Blood Sample Deconvolution

Description

The functions described here are dedicated to gene expression deconvolution of blood samples (i.e. whole blood or PBMCs).

gedBlood uses the methodology defined by Abbas et al. (2009), which relies on a fixed set of 17 cell type-specific signatures to estimate cell proportions in blood samples. Each signature corresponds to a white blood cell type in a resting or activated state (see section Details).

asCBC is a generic function with methods defined for character vectors, matrices, NMF models and MarkerList objects. See each method's description for more details.

refCBC is a named numeric vector that contains average Complete Blood Count (CBC) proportions, based on empirical studies in healthy patients. It contains proportions for Basophils, Lymphocytes, Eosinophils, Neutrophils and Monocytes.

gCBC generates a matrix of average Complete Blood Count (CBC) proportions for a given number of samples. The default proportions are based on empirical studies in healthy patients (see refCBC), and each sample is assigned the same proportions.

Usage

gedBlood(object, method = "lsfit", CLsubset = c("WB", "PBMCs"), ..., normalize = TRUE, 
  verbose = FALSE)

asCBC(object, ...)

S4 method for signature 'character'
asCBC(object, drop = FALSE, quiet = FALSE)

S4 method for signature 'NMF'
asCBC(object, drop = TRUE, ...)

S4 method for signature 'matrix'
asCBC(object, margin = 1, drop = TRUE, ...)

refCBC

gCBC(n = 1, sampleNames = NULL, counts = NULL)

Arguments

object
target data, specified in any format supported by ged. For asCBC, an object for which a suitable asCBC method is defined.
CLsubset
indicates the cell type(s) for which proportions are to be computed. Currently these can be any of the cell types for which a signature is available in the Abbas basis signature matrix (see examples for how to list them). In addition, this argument accepts the following values to indicate composite cell subsets:
  1. "WB" for Whole Blood, which includes all signatures (default).
  2. "PBMCs" for Peripheral Blood Mononuclear Cells, which excludes the Neutrophil signature.
...
extra arguments passed to gedProportions.
drop
logical that indicates if elements in object that cannot be mapped to a cell type should be removed from the returned mapping.
quiet
logical that indicates that the mapping should be performed quietly. If FALSE, then an error is thrown if none of the elements can be mapped, or, if in addition drop=FALSE, a warning is thrown if only some of the elements could be mapped.
margin
single numeric that indicates the margin to aggregate, according to the CBC cell type associated with its names (i.e. row names if margin=1L or column names if margin=2L).
n
number of samples in the generated CBC matrix
sampleNames
names of the samples, recycled or truncated if necessary, to match n.
counts
CBC data to use instead of the defaults. It must be a numeric vector.
method
method to use to deconvolve the target data and estimate cell proportions. The method must be a deconvolution algorithm that can run with signatures as its only auxiliary input. The default method is ‘lsfit’, which implements the algorithm proposed by Abbas et al. (2009), based on standard (least-squares) regression. An alternative is the quadratic programming approach from Gong et al. (2011), which solves a nonnegative least-squares problem with sum-to-one constraints on the proportions.
normalize
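character string that specifies the normalisation method to apply jointly to the combined data (signatures, data). The normalisation is performed after transforming the data and/or signatures if necessary. Note that the usage above gives a logical default (TRUE); see gedProportions for the accepted values.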
verbose
logical that toggles verbosity. A number (integer) can be passed to specify the verbosity level (the higher the more messages are output). Passing verbose=Inf toggles debug mode (all messages). Note that because it appears after ... it must be fully named.
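
As a rough illustration of the regression-based approach mentioned under method, the sketch below deconvolves simulated mixtures by ordinary least squares in base R. It does not use CellMix: the signature values, the cell type names and the helper deconv_ls are hypothetical, and negative coefficients are simply set to zero before rescaling, which may differ from what the ‘lsfit’ method actually does.

```r
# Toy signature matrix: 6 genes x 3 cell types (hypothetical values, not the Abbas data)
sig <- matrix(c(10, 1, 1,
                 9, 2, 1,
                 1, 8, 2,
                 2, 9, 1,
                 1, 1, 7,
                 1, 2, 9),
              nrow = 6, byrow = TRUE,
              dimnames = list(paste0("gene", 1:6),
                              c("Lymphocytes", "Monocytes", "Neutrophils")))

# Known proportions used to simulate two noise-free mixed samples
true_prop <- cbind(s1 = c(0.3, 0.1, 0.6), s2 = c(0.7, 0.3, 0.0))
rownames(true_prop) <- colnames(sig)
mix <- sig %*% true_prop

# Hypothetical helper: least-squares fit without intercept, one fit per sample,
# followed by clipping of negative values and rescaling so columns sum to one
deconv_ls <- function(mix, sig) {
  coefs <- lsfit(sig, mix, intercept = FALSE)$coefficients
  coefs <- matrix(coefs, nrow = ncol(sig),
                  dimnames = list(colnames(sig), colnames(mix)))
  coefs[coefs < 0] <- 0
  sweep(coefs, 2, colSums(coefs), "/")
}

est <- deconv_ls(mix, sig)

# Analogue of a PBMC-like subset: drop the Neutrophil signature before fitting
pbmc_sig <- sig[, colnames(sig) != "Neutrophils"]
pbmc_est <- deconv_ls(mix[, "s2", drop = FALSE], pbmc_sig)
```

Because the simulated data contain no noise, est should match true_prop up to numerical precision; with real expression data the estimates would only be approximate.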

Format

 Named num [1:5] 0.005 0.295 0.03 0.57 0.1
 - attr(*, "names")= chr [1:5] "Basophils" "Lymphocytes" "Eosinophils" "Neutrophils" ...
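
To make the structure of these objects concrete, the sketch below rebuilds a vector with the values shown above and replicates it into a matrix in the spirit of gCBC. It is a base R approximation, not the package code: the names ref_cbc and make_cbc, the default sample names and the orientation of the matrix (cell types in rows, samples in columns) are assumptions, and the fifth name is taken from the order given in the Description.

```r
# Values as printed in the Format section; "Monocytes" as fifth name is
# inferred from the order listed in the Description
ref_cbc <- c(Basophils = 0.005, Lymphocytes = 0.295, Eosinophils = 0.03,
             Neutrophils = 0.57, Monocytes = 0.1)

# Hypothetical stand-in for gCBC: every sample gets the same proportions
make_cbc <- function(n = 1, sampleNames = NULL, counts = ref_cbc) {
  if (is.null(sampleNames)) sampleNames <- paste0("S", seq_len(n))
  # Recycle or truncate the names so that there are exactly n of them
  sampleNames <- rep_len(sampleNames, n)
  matrix(counts, nrow = length(counts), ncol = n,
         dimnames = list(names(counts), sampleNames))
}

cbc <- make_cbc(3)
```

Each column of cbc is a copy of ref_cbc, so the columns sum to one.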

Details

The signatures used by gedBlood were designed by Abbas et al. (2009) to optimise their deconvolution power. They are available in the CellMix package as dataset Abbas.

gedBlood is currently essentially a shortcut for gedProportions(object, Abbas, ...); see gedProportions for details on other possible arguments.

Currently, asCBC methods only work correctly on objects whose cell types exactly match the names of signatures in the Abbas dataset.

Methods

  1. asCBC signature(object = "character"): This is the workhorse method that maps immune/blood cell type names to the CBC cell types: Monocytes, Basophils, Lymphocytes, Neutrophils and Eosinophils.

    It returns a factor whose names are the elements of object and whose values are their corresponding CBC cell types. If drop=FALSE the result has the same length as object; otherwise it only contains the elements that could be mapped to a cell type.

  2. asCBC signature(object = "NMF"): The results of gene expression deconvolution performed by ged are stored in NMFstd-class model objects, which contain the cell type-specific signatures and/or relative cell proportions.

    This method aggregates the data within each CBC group available in the data: it sums up the cell proportions and averages the signatures of the cell types in the group.

  3. asCBC signature(object = "matrix"): Aggregates along the given margin: sums rows (margin=1) or averages columns (margin=2) that map to the same CBC cell type.

  4. asCBC signature(object = "MarkerList"): This method combines markers of cell types that belong to the same CBC group.
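
The mapping and aggregation logic described in the methods above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package implementation: the lookup table uses hypothetical cell type names rather than the actual Abbas signature names, map_cbc and aggregate_cbc are made-up helpers, and the warning and error behaviour controlled by quiet is omitted.

```r
cbc_levels <- c("Monocytes", "Basophils", "Lymphocytes", "Neutrophils", "Eosinophils")

# Hypothetical lookup from detailed cell types to CBC groups
lookup <- c("T cells" = "Lymphocytes", "B cells" = "Lymphocytes",
            "NK cells" = "Lymphocytes", "Monocytes" = "Monocytes",
            "Neutrophils" = "Neutrophils")

# Character case: returns a named factor; unmapped elements are NA unless dropped
map_cbc <- function(x, drop = FALSE) {
  res <- factor(unname(lookup[x]), levels = cbc_levels)
  names(res) <- x
  if (drop) res <- res[!is.na(res)]
  res
}

# Matrix case: sum rows (margin = 1) or average columns (margin = 2) per CBC group
aggregate_cbc <- function(m, margin = 1) {
  labels <- if (margin == 1) rownames(m) else colnames(m)
  groups <- as.character(map_cbc(labels, drop = TRUE))
  keep <- names(map_cbc(labels, drop = TRUE))
  if (margin == 1) {
    rowsum(m[keep, , drop = FALSE], group = groups)
  } else {
    sums <- rowsum(t(m[, keep, drop = FALSE]), group = groups)
    sizes <- as.vector(table(groups)[rownames(sums)])
    t(sums / sizes)  # divide each group's sum by its number of members
  }
}

# Proportions: cell types in rows, samples in columns
prop <- matrix(c(0.2, 0.1, 0.1, 0.1, 0.5,
                 0.3, 0.2, 0.0, 0.2, 0.3), nrow = 5,
               dimnames = list(names(lookup), c("s1", "s2")))
agg <- aggregate_cbc(prop, margin = 1)

# Signatures: genes in rows, cell types in columns
sigs <- rbind(g1 = c(3, 6, 9, 1, 2), g2 = c(0, 3, 3, 8, 4))
colnames(sigs) <- names(lookup)
avg <- aggregate_cbc(sigs, margin = 2)
```

Summing is the natural choice for proportions, since the share of a CBC group is the total of its members, whereas averaging keeps aggregated signatures on the same expression scale as the originals.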

References

Abbas AR, Wolslegel K, Seshasayee D, Modrusan Z and Clark HF (2009). "Deconvolution of blood microarray data identifies cellular activation patterns in systemic lupus erythematosus." _PLoS ONE_, *4*(7), e6098. ISSN 1932-6203.

See also

gedProportions, ged