Automatic Selection of Gene Expression Deconvolution Algorithms

Description

Implements a simple automatic selection strategy that chooses a suitable deconvolution method given the input data.

Usage

selectGEDMethod(object, x = NULL, data = NULL, maxIter = 1L, ..., call. = FALSE, 
  quiet. = FALSE)

Arguments

call.
logical that indicates if one should return the suitable call to ged (TRUE), or just the name of the selected method.
quiet.
logical that indicates whether an error should be thrown when no algorithm able to fit the input data is found, or whether NULL should simply be returned. If explicitly set to FALSE, a note is displayed showing the selected algorithm and a brief justification for the choice.
object
global gene expression matrix-like data object (e.g., matrix or ExpressionSet)
x
input data used by the algorithm to deconvolve global gene expression.
data
optional data, typically a marker list, specified in a format that is supported by the factory function MarkerList.
maxIter
maximum number of iterations to perform. If method is missing, the value of this argument can influence which method is selected. See section Details.
...
extra arguments to allow extension, most of which are passed down to the deconvolution algorithm itself.

Details

The selection aims at finding an algorithm that is able to perform deconvolution based on the provided input data. The strategy is to choose amongst the possible algorithms available from the CellMix built-in registry, according to their respective data requirements.

Essentially, the choice of algorithm is made based on the dimensions of the target expression data object and the dimensions or type of the input data in x and data.

Currently, no attempt is made to choose the "best" algorithm, i.e. the one that would return the most accurate results (proportions or cell-specific signatures/differences) for the given data setting.

The selected algorithm is chosen only so as to be applicable to the input data. When possible, however, a state-of-the-art algorithm or the most commonly used algorithm is selected.
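
As a rough illustration of requirement-based filtering, the sketch below uses base R only and does not call CellMix. The table is copied from the output of selectGEDMethod() shown in the Examples; the helper candidates() is hypothetical and is not part of the package. It only lists the methods whose required input type matches. How the function then picks one method among these candidates (for instance depending on maxIter or on the presence of markers) is not modelled here.

```r
# Requirements table, copied from the selectGEDMethod() output in the Examples
req <- data.frame(
  Basis  = c(TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE),
  Coef   = c(FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE),
  Marker = c(FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE),
  row.names = c("lsfit", "cs-lsfit", "qprog", "cs-qprog", "DSA", "csSAM",
                "DSection", "ssKL", "ssFrobenius", "meanProfile", "deconf")
)

# Hypothetical helper (an assumption, not a CellMix function):
# list the methods whose required input type matches `input`
candidates <- function(input = c("none", "Basis", "Coef", "Marker")) {
  input <- match.arg(input)
  if (input == "none") {
    # methods that require neither signatures, proportions nor markers
    rownames(req)[!(req$Basis | req$Coef | req$Marker)]
  } else {
    rownames(req)[req[[input]]]
  }
}

candidates("Basis")
candidates("Coef")
candidates("Marker")
candidates("none")
```

With the table above, a signature matrix narrows the choice to lsfit and qprog, while supplying only the number of cell types leaves deconf as the sole candidate, which agrees with the results reported in the Examples.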

Examples


# ged methods requirements
selectGEDMethod()
##             Basis  Coef Marker
## lsfit        TRUE FALSE  FALSE
## cs-lsfit    FALSE  TRUE  FALSE
## qprog        TRUE FALSE  FALSE
## cs-qprog    FALSE  TRUE  FALSE
## DSA         FALSE FALSE   TRUE
## csSAM       FALSE  TRUE  FALSE
## DSection    FALSE  TRUE  FALSE
## ssKL        FALSE FALSE   TRUE
## ssFrobenius FALSE FALSE   TRUE
## meanProfile FALSE FALSE   TRUE
## deconf      FALSE FALSE  FALSE
# generate mixed expression data
x <- rmix(3, 100, 20)
dim(x)
##   Features    Samples Components 
##        100         20          3
sig <- basis(x)
prop <- coef(x)
ml <- getMarkers(x)

# one needs at least the number of cell types
try( selectGEDMethod(x) )
selectGEDMethod(x, 3)
## [1] "deconf"
# from signature basis matrix
selectGEDMethod(x, sig)
## [1] "lsfit"
selectGEDMethod(x, sig, quiet.=FALSE)
## [1] "lsfit"
# from cell proportion matrix
selectGEDMethod(x, prop)
## [1] "csSAM"
# from cell proportion matrix with multiple iterations
selectGEDMethod(x, prop, maxIter=10, quiet.=FALSE)
## [1] "DSection"
# from cell proportion matrix with markers
selectGEDMethod(x, prop, data=ml, quiet.=FALSE)
## [1] "cs-qprog"
# from marker genes
selectGEDMethod(x, ml)
## [1] "DSA"
# from marker genes with multiple iterations
selectGEDMethod(x, ml, maxIter=10, quiet.=FALSE)
## [1] "ssKL"