\name{run.cluster.matrix}
\alias{run.cluster.matrix}
\title{Identify Equivalent Peaks from Different Subjects}
\description{
Takes the file generated by \code{\link{run.lrg.peaks}}, identifies equivalent peaks in each spectrum,
and fills in missing values.
}
\usage{
run.cluster.matrix(pre.align = FALSE, align.method = "spline",
                   trans.method = "shiftedlog", add.par = 0,
                   subtract.base = FALSE, lrg.only = TRUE,
                   calc.all.peaks = FALSE, masses = NULL,
                   isotope.dist = 7, cluster.method = "ppm",
                   cluster.constant = 10, num.pts = 5,
                   R2.thresh = 0.98, oneside.min = 1,
                   peak.method = "parabola", root.dir = ".",
                   base.dir, peak.dir, lrg.dir,
                   lrg.file = "lrg_peaks.RData",
                   overwrite = FALSE, use.par.file = FALSE,
                   par.file = "parameters.RData")
}
\arguments{
    \item{pre.align}{either \code{FALSE}, or a numeric vector of shifts to apply to spectra, or a two-component list (of the form described in the \code{Note} section below) to be used before identifying peaks from different spectra}
    \item{align.method}{alignment algorithm for peaks}
    \item{trans.method}{type of transformation to use on spectra before statistical analysis; currently, only \code{"shiftedlog"}, \code{"glog"}, and \code{"none"} are supported}
    \item{add.par}{additive parameter for \code{"shiftedlog"} or \code{"glog"} options for \code{trans.method}}
    \item{subtract.base}{logical; whether to subtract calculated baseline from spectrum}
    \item{lrg.only}{logical; whether to consider only peaks that have at least one \dQuote{large} peak; i.e., one identified by \code{run.lrg.peaks}}
    \item{calc.all.peaks}{logical; whether to calculate all possible peaks or only sufficiently large ones}
    \item{masses}{specific masses to test}
    \item{isotope.dist}{maximum distance for declaring isotopes}
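    \item{cluster.method}{method for deciding when two peaks in different spectra are the same peak; one of \code{"ppm"}, \code{"constant"}, or \code{"usewidth"} (see the \code{Note} section below)}
    \item{cluster.constant}{constant used by \code{cluster.method} to set the matching tolerance (see the \code{Note} section below)}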
    \item{num.pts}{number of consecutive points needed for peak fitting}
    \item{R2.thresh}{\eqn{R^2} value needed for peak fitting}
    \item{oneside.min}{minimum number of points on each side of local maximum for peak fitting}
    \item{peak.method}{method for locating peaks}
    \item{root.dir}{directory for parameters file and raw data}
    \item{base.dir}{directory for baseline files; default is \code{paste(root.dir, "/Baselines", sep = "")}}
    \item{peak.dir}{directory for peak location files; default is \code{paste(root.dir, "/All_Peaks", sep = "")}}
    \item{lrg.dir}{directory for large peaks file; default is \code{paste(root.dir, "/Large_Peaks", sep = "")}}
    \item{lrg.file}{name of file to store large peaks in}
    \item{overwrite}{logical; whether to replace existing files with new ones}
    \item{use.par.file}{logical; if \code{TRUE}, then parameters are read from \code{par.file} in directory \code{root.dir}}
    \item{par.file}{string containing name of parameters file}
}
\details{Reads in information from the file created by \code{\link{run.lrg.peaks}}, calculates the cluster matrix,
fills in missing values, and overwrites the file named \code{lrg.file} in \code{lrg.dir}.
The resulting file contains variables
\tabular{ll}{ \tab \cr
    \code{amps}\tab data frame of amplitudes created by \code{\link{run.lrg.peaks}}\cr
    \code{centers}\tab data frame of centers created by \code{\link{run.lrg.peaks}}\cr
    \code{clust.mat}\tab data frame with columns given by samples and rows given by the distinct peaks in the samples\cr
    \code{num.sig}\tab vector of the number of peaks in each row of \code{clust.mat} which were not missing\cr
    \code{lrg.peaks}\tab the data frame of significant peaks created by \code{\link{run.lrg.peaks}}\cr
}
    and is ready to be used by \code{\link{run.strong.peaks}}.
}
\value{
No value returned; the file is simply created.
}
\references{
Barkauskas, D.A. (2009) \dQuote{Statistical Analysis of Matrix-Assisted Laser Desorption/Ionization
Fourier Transform Ion Cyclotron Resonance Mass Spectrometry Data with Applications to Cancer
Biomarker Detection}.  Ph.D. dissertation, University of California at Davis.

Barkauskas, D.A. \emph{et al}. (2009) \dQuote{Detecting glycan cancer biomarkers in serum
samples using MALDI FT-ICR mass spectrometry data}.  \emph{Bioinformatics}, \bold{25}:2, 251--257.
}
\author{Don Barkauskas (\email{barkda@wald.ucdavis.edu})}
\note{If \code{use.par.file = TRUE}, then the parameters read in from the file overwrite any arguments entered in the
function call.

\code{pre.align} is used if the spectra have not already been aligned by the mass spectroscopists.
If it is not \code{FALSE}, it can either be a vector of additive shifts to be applied to the 
spectra, or a list with components \code{targets} and \code{actual}.  In the last case, \code{targets}
is a vector of target masses, and \code{actual} is a matrix with \code{length(targets)}
columns and a row for each spectrum, \code{actual[i,j]} being the mass in spectrum \code{i} that should
be matched exactly to \code{targets[j]}, with \code{NA} being a valid entry in \code{actual}.
The matching is done (depending on the number of non-missing values in row \code{i}) either with a
simple shift (one non-missing value), an affine transformation (two non-missing values), a 
piecewise affine transformation (three non-missing values), or an interpolation spline (four 
or more non-missing values).
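
As an illustration of the list form of \code{pre.align}, the following sketch builds a small example
with made-up masses and counts the non-missing entries in each row of \code{actual} to show which kind
of transformation the description above would select.  The values and the names \code{n.ok} and
\code{kind} are for illustration only and are not part of the package.
\preformatted{
# Illustrative values only: three spectra, four target masses
targets <- c(1000.5, 1500.7, 2000.9, 2500.1)
actual <- rbind(
    c(1000.6,     NA,     NA,     NA),
    c(1000.4, 1500.6,     NA,     NA),
    c(1000.5, 1500.8, 2001.0, 2500.3))
pre.align <- list(targets = targets, actual = actual)

# The number of non-missing matches in each row decides the transformation
n.ok <- rowSums(!is.na(pre.align$actual))
kind <- cut(n.ok, breaks = c(0, 1, 2, 3, Inf),
            labels = c("shift", "affine", "piecewise affine", "spline"))
data.frame(spectrum = seq_along(n.ok), n.ok = n.ok, kind = kind)
}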

Suppose \code{cluster.constant = K} and we have two peaks in different spectra with 
masses \eqn{m_1}{m[1]} and \eqn{m_2}{m[2]}.  If \code{cluster.method = "constant"}, then the peaks
are considered to be the same peak if we have \eqn{m_{2}-m_{1} < K}{m[2]-m[1] < K}.  If
\code{cluster.method = "ppm"}, then the peaks are considered to be the same peak if we 
have \eqn{m_{2}-m_{1} < Km_{2}/10^{6}}{m[2]-m[1] < K * m[2] * 1e-6}.  If
\code{cluster.method = "usewidth"}, then the algorithm uses the observation that
\code{log(Width_hat)} and \code{log(Center_hat)} appear to be linearly related.  Tolerances are
then computed using this relationship.
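
As a rough illustration of the \code{"constant"} and \code{"ppm"} criteria above, the sketch below uses a
hypothetical helper, \code{same.peak}, which is not part of the package.  It assumes that the two masses
are ordered so that the difference is non-negative; the \code{"usewidth"} method is not covered because
the fitted relationship is not given here.
\preformatted{
# Hypothetical helper, not part of the package: TRUE if two masses
# would be declared the same peak under the stated rule.
same.peak <- function(m1, m2, K = 10, method = c("ppm", "constant")) {
    method <- match.arg(method)
    lo <- min(m1, m2)
    hi <- max(m1, m2)   # assumption: order the masses so hi - lo >= 0
    tol <- switch(method, constant = K, ppm = K * hi * 1e-6)
    (hi - lo) < tol
}

# With K = 10 ppm the tolerance near mass 1000 is about 0.01
same.peak(1000.000, 1000.005)                                   # TRUE
same.peak(1000.000, 1000.020)                                   # FALSE
same.peak(1000.000, 1000.020, K = 0.05, method = "constant")    # TRUE
}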
}
\seealso{\code{\link{run.lrg.peaks}}, \code{\link{run.strong.peaks}}, \code{\link[splines]{interpSpline}}}
\examples{}