Resources for Data Analysis and Classification
- The Fundamental Clustering Problems Suite (FCPS) offers a variety of
clustering problems that any algorithm should be able to handle when
facing real-world data. FCPS serves as an elementary benchmark for
clustering algorithms.
- Fionn Murtagh maintains a list of Multivariate Data Analysis
Software and Resources.
- StatLib
- The R Project for Statistical Computing
- Octave
- GNU Scientific Library
- GGobi is an open-source visualization program for exploring
high-dimensional data.
- PDL ("Perl Data Language") gives standard Perl the ability to
compactly store and speedily manipulate the large N-dimensional data
arrays which are the bread and butter of scientific computing.
- Scientific computing with Python
- Weka is a collection of machine learning algorithms for data mining tasks. The
algorithms can either be applied directly to a dataset or called from
your own Java code. Weka contains tools for data pre-processing,
classification, regression, clustering, association rules, and
visualization. It is also well-suited for developing new machine
learning schemes.