Complete Gene Expression Deconvolution by Semi-Supervised NMF

Description

Algorithms ‘ssKL’ and ‘ssFrobenius’ are modified versions of the original NMF algorithms from Brunet et al. (2004) and Lee et al. (2001). They use a set of known marker genes for each cell type to enforce the expected block expression pattern on the estimated signatures, as proposed in Gaujoux et al. (2011).

Usage

gedAlgorithm.ssKL(..., maxIter = 3000L, seed = "rprop", eps = 2.22044604925031e-16,
  .stop = NULL, data = NULL, markers = c("prior+semi", "semi"), log = NULL, ratio = NULL,
  copy = FALSE, sscale = FALSE, alpha = 0, stationary.th = .Machine$double.eps,
  check.interval = 5 * check.niter, check.niter = 10L)

gedAlgorithm.ssFrobenius(..., maxIter = 3000L, seed = "rprop", .stop = NULL, data = NULL,
  markers = c("prior+semi", "semi"), log = NULL, ratio = NULL, eps = NULL, sscale = TRUE,
  copy = FALSE, alpha = 0, stationary.th = .Machine$double.eps,
  check.interval = 5 * check.niter, check.niter = 10L)

Arguments

seed
default seeding method.
sscale
specifies how signatures (and proportions) are re-scaled at the end of each iteration. If TRUE, each signature is scaled separately so that it sums up to one. If 2, then each signature is scaled separately so that it sums up to one, and the inverse linear transformation is applied to the proportions (i.e. on the rows of the mixture coefficient matrix), so that the fitted matrix does not change. If FALSE, no re-scaling is performed at all.
alpha
numeric coefficient used to smoothly enforce a sum-up-to-one constraint on the proportions, by regularising the objective function. If NULL, no constraint is applied.
...
extra arguments passed to the function objective, which computes the objective value between x and y.
stationary.th
maximum absolute value of the gradient, for the objective function to be considered stationary.
check.interval
interval (in number of iterations) on which the stopping criterion is computed.
check.niter
number of successive iterations used to compute the stationarity criterion.
maxIter
maximum number of iterations to perform.
eps
small numeric value used to ensure numeric stability, by shifting up entries from zero to this fixed value.
.stop
specification of a stopping criterion that is used instead of the one associated with the NMF algorithm. It may be specified as:
  • the access key of a registered stopping criterion;
  • a single integer that specifies the exact number of iterations to perform, which will be honoured unless a lower value is explicitly passed in argument maxIter;
  • a single numeric value that specifies the stationarity threshold for the objective function, used with nmf.stop.stationary;
  • a function with signature (object="NMFStrategy", i="integer", y="matrix", x="NMF", ...), where object is the NMFStrategy object that describes the algorithm being run, i is the current iteration, y is the target matrix and x is the current value of the NMF model.
copy
logical that indicates if the update should be made on the original matrix directly (FALSE) or on a copy (TRUE). Note that the default shown in the usage above is FALSE. With copy=FALSE the memory footprint is very small, and some speed-up may be achieved in the case of big matrices. However, greater care should be taken due to the side effect of modifying the original matrix in place. We recommend that only experienced users use copy=FALSE.
data
marker list
markers
indicates what the markers are used for:
  1. ‘prior’ uses the DSA proportion estimation method from Zhong et al. (2013) to compute sensible initial proportions from average marker expression profiles in the mixed sample data;
  2. ‘semi’ enforces marker block patterns after each iteration;
  3. ‘post’ assigns estimated signatures a posteriori.
log
indicates if the data are in log-scale or should be converted to linear-scale. This is relevant because the DSA algorithm assumes that the input mixed data are in linear scale (i.e. not log-transformed). If NULL, then the scale of the data is detected by is_logscale and conversion to linear-scale is performed if necessary. If TRUE, the data are exponentiated (using base 2). If FALSE, the data are left unchanged (the detected log scale is displayed in verbose mode). If a number, then it is used as the log base to exponentiate the data.
ratio
expression ratio of markers between their own cell type and the other cell types.
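
The effect of a signature re-scaling that leaves the fitted matrix unchanged, as described for sscale = 2, can be illustrated with a short base R sketch. It is not the package's internal code: the matrices W (signatures) and H (proportions) are simulated, and the sketch assumes the re-scaling is a division of each signature by its column sum, with the same factors multiplied back onto the rows of H.

```r
# Illustrative only: simulated data, base R, hypothetical object names.
set.seed(1)
W <- matrix(runif(12), nrow = 4, ncol = 3)  # 4 genes x 3 cell types (signatures)
H <- matrix(runif(15), nrow = 3, ncol = 5)  # 3 cell types x 5 samples (coefficients)

s  <- colSums(W)             # one scaling factor per signature
W2 <- sweep(W, 2, s, "/")    # divide each signature (column of W) by its sum
H2 <- sweep(H, 1, s, "*")    # apply the inverse scaling to the rows of H

colSums(W2)                    # column sums of the re-scaled signatures
all.equal(W %*% H, W2 %*% H2)  # compares the fitted matrices before and after
```

Because column k of W is divided by the same factor that multiplies row k of H, each rank-one term of the product W %*% H is preserved.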

Details

These algorithms simultaneously estimate both the cell-specific signature matrix and the mixture proportion matrix, using a block-descent method that alternately estimates each matrix. Both re-scale the final proportion estimates so that they sum up to one.
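
The alternating scheme can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: it uses the standard multiplicative updates for the Frobenius objective, simulated data, a hypothetical marker list, and a hypothetical helper enforce_blocks() that assumes "block pattern" means a marker gene is set to a near-zero value in every cell type other than its own.

```r
# Illustrative sketch in base R; all names and data here are hypothetical.
set.seed(42)
n_genes <- 30; n_types <- 3; n_samples <- 10
eps <- .Machine$double.eps

# Hypothetical marker list: genes 1-3 mark type 1, 4-6 type 2, 7-9 type 3
markers <- list(1:3, 4:6, 7:9)

# Simulated signatures, proportions and mixed expression data
W_true <- matrix(runif(n_genes * n_types, 1, 10), n_genes, n_types)
for (k in seq_along(markers)) W_true[markers[[k]], -k] <- 0
H_true <- matrix(runif(n_types * n_samples), n_types, n_samples)
H_true <- sweep(H_true, 2, colSums(H_true), "/")
X <- W_true %*% H_true

# Assumed block pattern: markers are (near) zero outside their own cell type
enforce_blocks <- function(W, markers, eps) {
  for (k in seq_along(markers)) W[markers[[k]], -k] <- eps
  W
}

W <- enforce_blocks(matrix(runif(n_genes * n_types), n_genes, n_types), markers, eps)
H <- matrix(runif(n_types * n_samples), n_types, n_samples)

for (iter in 1:500) {
  # Update the proportions with the signatures held fixed
  H <- pmax(H * (t(W) %*% X) / (t(W) %*% W %*% H + eps), eps)
  # Update the signatures with the proportions held fixed,
  # then re-impose the marker block pattern
  W <- pmax(W * (X %*% t(H)) / (W %*% H %*% t(H) + eps), eps)
  W <- enforce_blocks(W, markers, eps)
}

# Final re-scaling: the proportions of each sample sum up to one
H_final <- sweep(H, 2, colSums(H), "/")
```

The pmax(..., eps) calls mirror the role described for the eps argument, shifting entries up from zero so that later multiplicative updates are not stuck at exactly zero.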

The functions gedAlgorithm.ssKL and gedAlgorithm.ssFrobenius are wrapper functions to the underlying NMF algorithms. They are primarily defined so that their specific arguments are correctly listed on this help page. The recommended way of applying these algorithms is via the ged interface (e.g., ged(..., method='ssKL')).

References

Brunet J, Tamayo P, Golub TR and Mesirov JP (2004). "Metagenes and molecular pattern discovery using matrix factorization." _Proceedings of the National Academy of Sciences of the United States of America_, *101*(12), pp. 4164-9. ISSN 0027-8424.

Lee DD and Seung H (2001). "Algorithms for non-negative matrix factorization." _Advances in neural information processing systems_.

Gaujoux R and Seoighe C (2011). "Semi-supervised Nonnegative Matrix Factorization for gene expression deconvolution: A case study." _Infection, genetics and evolution : journal of molecular epidemiology and evolutionary genetics in infectious diseases_. ISSN 1567-7257.