Fionn Murtagh's Multivariate Data Analysis Software and Resources Page


Contents

  1. Hierarchical Clustering Software in R/S-Plus
  2. MDA-J: Multivariate Data Analysis - Java
  3. Multivariate Data Analysis Software as Standalone Java Applications
  4. Gaussian Mixture Modeling with Bayes Factors in C
  5. Wavelet Transform on a Hierarchy or Dendrogram
  6. Multivariate Data Analysis Software in Fortran (and C)
  7. Resources and Links, including: JP Benzécri's Pascal Code for Correspondence Analysis; Multidimensional Scaling; Point Pattern Matching; book F. Murtagh, Multidimensional Clustering Algorithms, Physica-Verlag, 1985; Reading FITS Files in R.
  8. Software accompanying the book: Correspondence Analysis and Data Coding with R and Java

Legal notice: All software here may be freely used and incorporated into any system, for any purpose, provided that the origin and authorship of any code taken from here are acknowledged in the code and in other documentation.

1. Hierarchical Clustering Software in R/S-Plus

What is unique here:
  (1) All hierarchical clustering programs achieve the optimal O(n^2) computational bound using the nearest-neighbor chain algorithm.
  (2) The "stored dissimilarity" algorithm is used, implying O(n^2) storage (the "stored data" algorithm is an alternative, with O(n) storage but a greater absolute computational cost).
  (3) Hierarchical clustering is provided both as native R code and as linked C code (the latter is crucial for efficiency with large data sets); the two implementations are identical.
  (4) Weighting of cases/observations (rows) is supported.
  (5) Easy and straightforward linkage with correspondence analysis, to normalize (by "Euclideanizing") the data input to the hierarchical clustering.
  (6) The Ward minimum variance criterion is used in the software here. (See note below.)
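
As an illustration of the nearest-neighbor chain idea with stored dissimilarities and the Ward criterion, here is a minimal Python sketch. It is not the R or C code offered on this page: the function name `ward_nn_chain` is hypothetical, all observations are assumed to have unit weight, and the Lance-Williams formula is used to update the stored dissimilarities after each merge.

```python
def ward_nn_chain(points):
    """Ward clustering via a nearest-neighbor chain (illustrative sketch).

    Returns a list of (members_a, members_b, height) sorted by height.
    Assumes unit weights; not the author's implementation.
    """
    n = len(points)
    # Stored dissimilarities: O(n^2) memory. Between two singletons the
    # Ward criterion equals half the squared Euclidean distance.
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sq = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            d[i][j] = d[j][i] = 0.5 * sq

    alive = [True] * n                 # which slots still hold a cluster
    size = [1] * n                     # cluster cardinalities
    members = [[i] for i in range(n)]  # observations in each cluster
    merges = []
    chain = []

    for _ in range(n - 1):
        if not chain:
            chain.append(alive.index(True))
        # Grow the chain until its last two elements are reciprocal
        # nearest neighbors.
        while True:
            a = chain[-1]
            prev = chain[-2] if len(chain) > 1 else None
            best = prev
            best_d = d[a][prev] if prev is not None else float("inf")
            for k in range(n):
                # Strict "<" keeps the predecessor on ties, so the loop ends.
                if alive[k] and k != a and d[a][k] < best_d:
                    best, best_d = k, d[a][k]
            if best == prev:
                break
            chain.append(best)

        j = chain.pop()
        i = chain.pop()
        merges.append((sorted(members[i]), sorted(members[j]), best_d))

        # Lance-Williams update for Ward; the merged cluster reuses slot i.
        ni, nj = size[i], size[j]
        for k in range(n):
            if alive[k] and k != i and k != j:
                nk = size[k]
                new = ((ni + nk) * d[i][k] + (nj + nk) * d[j][k]
                       - nk * best_d) / (ni + nj + nk)
                d[i][k] = d[k][i] = new
        alive[j] = False
        size[i] = ni + nj
        members[i] = members[i] + members[j]

    # The chain finds merges out of height order; sort to get the
    # agglomeration sequence.
    merges.sort(key=lambda m: m[2])
    return merges


example_merges = ward_nn_chain([[0.0], [1.0], [10.0], [11.0]])
```

Each chain extension costs O(n), and the chain is extended O(n) times in total, which is where the O(n^2) bound comes from. The rest of the chain can be kept after a merge because the Ward criterion is reducible: merging two reciprocal nearest neighbors cannot bring the new cluster closer to any other cluster than its two parts were.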

2. MDA-J: Multivariate Data Analysis - Java


3. Multivariate Data Analysis Software as Standalone Java Applications


4. Gaussian Mixture Modeling with Bayes Factors in C

This is a new area. Programs in C will be uploaded soon, mainly for image segmentation (including multiband images) based on Markov random field models, with Bayes factor inference using the Bayesian information criterion (BIC), including BIC in the pseudolikelihood case.

5. Wavelet Transform on a Hierarchy or Dendrogram

A new hierarchical Haar wavelet transform in R is available (see the commented lines at the start of the code for an example of use). It works on a hierarchy produced by the foregoing hierarchical clustering programs. This hierarchical Haar wavelet transform carries out the following processing tasks: (i) from the data and a hierarchy, produce the wavelet transform; (ii) filter the wavelet coefficients, using a user-specified hard threshold; and (iii) reconstruct the data, i.e. perform the inverse wavelet transform.
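
The three tasks (transform, hard thresholding, reconstruction) can be sketched in Python as follows. This is a simplified illustration, not the R program offered here. It assumes an unweighted Haar scheme in which each node of the hierarchy stores the mean of its two children as the smooth and half their difference as the detail. The function names and the merge encoding are hypothetical: leaves are numbered 0..n-1, and the node created at merge step t is numbered n+t.

```python
def haar_forward(data, merges):
    """Return (root_smooth, details) for data laid out on a binary hierarchy."""
    n = len(data)
    smooth = {i: list(data[i]) for i in range(n)}
    details = []
    for step, (left, right) in enumerate(merges):
        sl, sr = smooth.pop(left), smooth.pop(right)
        # Smooth = mean of the two children; detail = half their difference.
        smooth[n + step] = [(a + b) / 2 for a, b in zip(sl, sr)]
        details.append([(a - b) / 2 for a, b in zip(sl, sr)])
    return smooth[n + len(merges) - 1], details


def hard_threshold(details, t):
    """Zero every detail coefficient whose absolute value is below t."""
    return [[c if abs(c) >= t else 0.0 for c in d] for d in details]


def haar_inverse(root, details, merges, n):
    """Rebuild the n leaf vectors from the root smooth and the details."""
    smooth = {n + len(merges) - 1: list(root)}
    for step in range(len(merges) - 1, -1, -1):
        left, right = merges[step]
        s, d = smooth.pop(n + step), details[step]
        smooth[left] = [a + b for a, b in zip(s, d)]
        smooth[right] = [a - b for a, b in zip(s, d)]
    return [smooth[i] for i in range(n)]


# Four one-dimensional observations; hierarchy: (0,1) -> 4, (2,3) -> 5, (4,5) -> 6.
data = [[0.0], [1.0], [10.0], [11.0]]
merges = [(0, 1), (2, 3), (4, 5)]
root, details = haar_forward(data, merges)
exact = haar_inverse(root, details, merges, len(data))
filtered = haar_inverse(root, hard_threshold(details, 1.0), merges, len(data))
```

With no thresholding the inverse recovers the input exactly. With a threshold of 1.0 the two small within-pair details are zeroed, so each pair is replaced by its mean while the large between-pair detail is kept.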

6. Multivariate Data Analysis Software in Fortran (and C)

The following is provided in case it is still of interest. (Of note: the agglomeration sequence visualization program.)

This is a collection of stand-alone routines, in Fortran (mostly) and C. Sample data sets are available. Indications are given on how to compile, link and run. Download the programs and run on your system. Many of these programs were originally used on a VAX/VMS system, and later on Linux and Solaris systems. Please notify the author of any problems (although the programs are provided "as is" and there are evident improvements which could be made). The example of compiling, linking and running given for principal components analysis (in Fortran) is similar to what is required for the other Fortran programs here.


7. Resources and Links


Author: fmurtagh at acm dot org
Homepage