Estimating Cell Proportions from Known Signatures

Description

gedProportions implements a pre-processing pipeline for applying deconvolution methodologies that use a known set of cell type-specific signatures in order to estimate cell proportions in heterogeneous samples (e.g., Abbas et al. (2009) or Gong et al. (2011)).

Usage

gedProportions(object, x, method = "lsfit", CLsubset = NULL, subset = NULL, map.method = "auto", 
  ..., log = NULL, lbase = 2, normalize = c("none", "quantiles"), verbose = FALSE)

Arguments

object
target data, specified in any format supported by ged.
x
basis signature data. It can be an ExpressionSet-class object or a matrix whose columns contain cell-specific expression for each feature in the target data. If the gene identifier type of the basis matrix does not match that of the target matrix, identifiers are converted using convertIDs. If needed, this automatic conversion can be disabled using map.method=NA, which is the default when x is a matrix, whose rows are then assumed to match the rows of the target matrix.
method
method used to deconvolve the target data and estimate cell proportions. The method must be a deconvolution algorithm that can run using signatures as its only auxiliary input. The default method is ‘lsfit’, which implements the algorithm proposed by Abbas et al. (2009), based on standard regression. An alternative is the quadratic programming approach of Gong et al. (2011), which solves a non-negative least-squares problem with sum-to-one constraints on the proportions.
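
The regression idea behind the default method can be illustrated with a minimal base R sketch on simulated data. It does not call gedProportions or reproduce the package's ‘lsfit’ implementation: the cell type names, the simulated values, and the final clipping and rescaling step are assumptions made for illustration only (Abbas et al. (2009) handle negative coefficients differently).

```r
# Minimal sketch: estimate proportions by regressing mixed samples on signatures.
set.seed(42)

# Simulated signatures: 100 features x 3 hypothetical cell types (linear scale).
S <- matrix(rexp(300, rate = 1 / 100), nrow = 100, ncol = 3,
            dimnames = list(paste0("g", 1:100), c("Tcell", "Bcell", "Mono")))

# True proportions: 3 cell types x 2 samples, each column sums to one.
P <- matrix(c(0.6, 0.3, 0.1,
              0.2, 0.5, 0.3), nrow = 3,
            dimnames = list(colnames(S), c("s1", "s2")))

# Mixed target data with a little additive noise.
target <- S %*% P + matrix(rnorm(200, sd = 1), nrow = 100)

# Ordinary least squares without intercept, one regression per sample.
fit <- lsfit(S, target, intercept = FALSE)
est <- matrix(fit$coefficients, nrow = ncol(S), dimnames = dimnames(P))

# Simplification (assumption): clip negatives to zero, then rescale each
# sample so that its estimated proportions sum to one.
est <- pmax(est, 0)
prop <- sweep(est, 2, colSums(est), "/")
print(round(prop, 3))
```

With low noise the estimated columns of prop should lie close to the simulated proportions in P; on real data the quality of the fit depends heavily on how well the signatures match the samples.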
CLsubset
indicates the cell type(s) for which proportions are to be computed, as a vector of indexes or names that is used to subset the columns of x.
subset
optional subset of features to use in the estimation.
map.method
method used to convert the basis signature's identifiers to match the target data's own type of identifiers. See mapIDs. Identifier conversion can be disabled using map.method=NA.
...
extra arguments passed to ged.
log
logical that indicates if the computation should take place in log or linear scale. If TRUE, all non-log-scaled data (signatures and/or target) are log-transformed using base lbase. If FALSE, all log-scaled data (signatures and/or target) are exp-transformed using base lbase. If a number, then the function acts as if log=TRUE, using the value of log as lbase. If NULL, then log-transform is applied only if either the signatures or the target data is in log scale, otherwise non-log-scaled data is exp-transformed into linear values, via expb(A, lbase). If log=NA, no transformation is performed at all.
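
The handling of the log argument can be pictured with the following base R sketch. It is one possible reading of the description above, not the package's code: harmonise_scale, sig_is_log and target_is_log are hypothetical names, the scale of each matrix is supplied by the caller instead of being detected, and log=NULL is assumed to mean "use log scale if either input is already in log scale, linear scale otherwise".

```r
# Hypothetical helper: bring signatures and target data to a common scale.
harmonise_scale <- function(sig, target, sig_is_log, target_is_log,
                            log = NULL, lbase = 2) {
  if (is.null(log)) {
    # Assumed reading of log=NULL: work in log scale if either input is.
    log <- sig_is_log || target_is_log
  } else if (is.na(log)) {
    # log=NA: no transformation at all.
    return(list(sig = sig, target = target))
  } else if (is.numeric(log)) {
    # A number acts like log=TRUE with that number as the base.
    lbase <- log
    log <- TRUE
  }
  if (log) {
    # base::log is spelled out because the argument is also named `log`.
    if (!sig_is_log) sig <- base::log(sig, base = lbase)
    if (!target_is_log) target <- base::log(target, base = lbase)
  } else {
    if (sig_is_log) sig <- lbase^sig
    if (target_is_log) target <- lbase^target
  }
  list(sig = sig, target = target)
}

sig <- matrix(c(1, 2, 4, 8), nrow = 2)     # linear scale
target <- matrix(c(0, 1, 2, 3), nrow = 2)  # log2 scale

auto <- harmonise_scale(sig, target, FALSE, TRUE)               # both end in log2
lin  <- harmonise_scale(sig, target, FALSE, TRUE, log = FALSE)  # both end linear
b10  <- harmonise_scale(sig, target, FALSE, TRUE, log = 10)     # base 10 for sig
asis <- harmonise_scale(sig, target, FALSE, TRUE, log = NA)     # untouched
```

The sketch ignores the case where an input is already in log scale but in a different base than the one requested.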
lbase
numeric base used for the logarithmic/exponential transformations that are applied to the signature or data matrix.
normalize
character string that specifies the normalisation method to apply jointly to the combined data (signatures, data). The normalisation is performed after transforming the data and/or signatures if necessary.
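
Joint quantile normalisation can be sketched in base R as follows. This is a textbook version of the technique applied to the column-bound matrices, written for illustration; quantile_normalise and the simulated data are hypothetical, and the package may rely on a different implementation (for instance, one with different tie handling).

```r
# Hypothetical helper: force every column to share the same distribution.
quantile_normalise <- function(M) {
  # Rank of each value within its column (ties broken by order of appearance).
  ranks <- apply(M, 2, rank, ties.method = "first")
  # Reference distribution: mean of the sorted columns, quantile by quantile.
  ref <- rowMeans(apply(M, 2, sort))
  # Replace each value by the reference value of the same rank.
  out <- apply(ranks, 2, function(r) ref[r])
  dimnames(out) <- dimnames(M)
  out
}

set.seed(1)
features <- paste0("g", 1:10)
sig <- matrix(rexp(20, rate = 1 / 50), nrow = 10,
              dimnames = list(features, c("Tcell", "Bcell")))
target <- matrix(rexp(30, rate = 1 / 200), nrow = 10,
                 dimnames = list(features, paste0("s", 1:3)))

# Normalise signatures and target together, then split them again.
combined <- quantile_normalise(cbind(sig, target))
sig_n <- combined[, colnames(sig), drop = FALSE]
target_n <- combined[, colnames(target), drop = FALSE]
```

After this step every column of sig_n and target_n contains the same set of values, while the ordering of features within each column is preserved.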
verbose
logical that toggles verbosity. A number (integer) can be passed to specify the verbosity level (the higher the more messages are output). Passing verbose=Inf toggles debug mode (all messages). Note that because it appears after ... it must be fully named.

Details

The actual estimation is performed via the ged interface, using a suitable deconvolution method.

Before calling ged, the following pre-processing pipeline is applied to the data and/or the signature matrix:

  1. map the gene identifiers of the signature matrix into identifiers in the target global expression matrix;
  2. subset signatures and data matrices to a common set of features;
  3. transform signatures and data matrices to a common scale: linear or log (log scale is automatically detected using the same heuristic as GEO2R);
  4. normalise jointly the signatures and data matrices using quantile normalisation.

All steps are optional and can be disabled if needed (see argument details).
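
The first three steps can be pictured with a short base R sketch. It does not use the package's identifier mapping: id_map is a hypothetical lookup table standing in for the conversion step, and looks_linear is a hypothetical helper whose quantile thresholds are modelled on the check in the R script that GEO2R generates, so they should be treated as assumptions.

```r
# Hypothetical lookup from signature identifiers to target identifiers.
id_map <- c(sigA = "g1", sigB = "g2", sigC = "g3", sigD = "g9")

sig <- matrix(c(200, 15, 900, 40,
                30, 700, 60, 1200), nrow = 4,
              dimnames = list(names(id_map), c("Tcell", "Bcell")))
target <- matrix(c(120, 850, 4300, 15,
                   60, 2200, 990, 30,
                   75, 640, 5100, 45), nrow = 4,
                 dimnames = list(c("g1", "g2", "g3", "g4"),
                                 c("s1", "s2", "s3")))

# Step 1: map the signature identifiers onto the target identifiers.
rownames(sig) <- unname(id_map[rownames(sig)])

# Step 2: keep only the features present in both matrices.
common <- intersect(rownames(sig), rownames(target))
sig_c <- sig[common, , drop = FALSE]
target_c <- target[common, , drop = FALSE]

# Step 3 (detection only): quantile-based guess at whether the values are
# still on the linear scale. Thresholds are assumptions, see the text above.
looks_linear <- function(ex) {
  qx <- as.numeric(quantile(ex, c(0, 0.25, 0.5, 0.75, 0.99, 1), na.rm = TRUE))
  (qx[5] > 100) || (qx[6] - qx[1] > 50 && qx[2] > 0)
}

looks_linear(target_c)        # raw intensities
looks_linear(log2(target_c))  # the same data after a log2 transform
```

Such a heuristic can misjudge data with an unusual dynamic range, which is one reason to set the log argument explicitly when the scale of the inputs is known.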

References

Abbas AR, Wolslegel K, Seshasayee D, Modrusan Z and Clark HF (2009). "Deconvolution of blood microarray data identifies cellular activation patterns in systemic lupus erythematosus." _PloS one_, *4*(7), pp. e6098. ISSN 1932-6203.

Gong T, Hartmann N, Kohane IS, Brinkmann V, Staedtler F, Letzkus M, Bongiovanni S and Szustakowski JD (2011). "Optimal deconvolution of transcriptional profiling data using quadratic programming with application to complex clinical blood samples." _PloS one_, *6*(11), pp. e27156. ISSN 1932-6203.

See also

ged, gedBlood