Partial Gene Expression Deconvolution with csSAM

Description

Estimates cell/tissue-specific expression signatures given known cell/tissue proportions, using standard least-squares, as implemented by the package csSAM.

The S3 method csTopTable for csSAM fits returns, for each feature, the false discovery rates of differential expression between groups of samples within each cell type, as computed by fdrCsSAM when running csSAM. These are returned as a list, with one element per cell type.

The S3 method csplot for csSAM fits plots the cumulative distributions of the cell-specific FDRs.

Usage

gedAlgorithm.csSAM(Y, x, data = NULL, nperms = 200,
      alternative = c("all", "two.sided", "greater", "less"),
      standardize = TRUE, medianCenter = TRUE, logRm = FALSE, logBase = 2,
      nonNeg = TRUE, verbose = lverbose())

S3 (csSAM)
`csTopTable`(x, alternative = c("two.sided", "greater", "less"), ...)

S3 (csSAM)
`csplot`(x, types = NULL, alternative = "all", xlab = "# called", ylab = "FDR", 
      ylim = c(0, 1), ...)

Arguments

types
indexes or names of the cell types to plot. They must be present in the fit data.
Y
target global gene expression matrix (n x p), with samples in columns, ordered in the same order as the cell proportions data in x.
x
known cell proportions as a matrix (k x p) or an NMF-class model containing the cell proportions in the coefficient matrix, with a normally empty basis matrix. The proportions must be ordered in the same order as the samples in the target matrix. For csTopTable and csplot, a csSAM fit as returned by ged.
data
specification of the sample groups. If not missing, it must be a factor or coercible to a factor, whose length equals the number of samples, i.e. columns, in the target matrix.
nperms
The number of permutations to perform. It is only used when computing cell-specific differential expression between groups specified in argument data.
alternative
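alternative hypothesis used when computing or extracting the FDRs: "two.sided", "greater" or "less". gedAlgorithm.csSAM and csplot also accept "all" (their default), which covers all three alternatives.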
standardize
Whether to standardize the samples. Default is TRUE.
medianCenter
Median center rhat distributions. Default is TRUE.
logRm
Exponentiate data for deconvolution stage. Default is FALSE.
logBase
Base of the logarithm used to determine the exponentiation factor. Default is 2.
nonNeg
For single channel arrays. Sets any cell-specific expression estimate that is negative to 0. It is conservative in its study of differential expression. Default is TRUE.
verbose
logical that indicates if verbose messages should be shown.
...
extra parameters passed to subsequent calls.
xlab
a label for the x axis, defaults to a description of x.
ylab
a label for the y axis, defaults to a description of y.
ylim
the y limits of the plot.

Details

All regressions are fitted using the function lsfit.
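
As a rough illustration of this least-squares step, the following base R sketch recovers cell-specific signatures from a simulated mixture with lsfit. It is not the csSAM code itself: the data are simulated, the object names (props, sig, Y, fit, sig_hat) are invented for the illustration, and fitting without an intercept is an assumption of this sketch.

```r
set.seed(1)
n_genes <- 50; n_cells <- 3; n_samples <- 20

# hypothetical known proportions (cell types x samples), columns sum to 1
props <- matrix(runif(n_cells * n_samples), n_cells, n_samples)
props <- sweep(props, 2, colSums(props), "/")

# hypothetical true signatures (genes x cell types)
sig <- matrix(rexp(n_genes * n_cells, rate = 0.2), n_genes, n_cells)

# noiseless global expression (genes x samples)
Y <- sig %*% props

# regress each gene's expression across samples on the proportions;
# lsfit expects observations (here: samples) in rows
# intercept = FALSE is an assumption of this sketch
fit <- lsfit(x = t(props), y = t(Y), intercept = FALSE)

# coefficients come back as (cell types x genes); transpose to genes x cell types
sig_hat <- t(fit$coefficients)
```

Because the simulated mixture contains no noise, sig_hat matches sig up to numerical precision. With real data the estimates are only as good as the supplied proportions and the linearity of the mixing.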

References

Shen-Orr SS, Tibshirani R, Khatri P, Bodian DL, Staedtler F, Perry NM, Hastie T, Sarwal MM, Davis MM and Butte AJ (2010). "Cell type-specific gene expression differences in complex tissues." _Nature methods_, *7*(4), pp. 287-9. ISSN 1548-7105.

Examples


# random global expression
x <- rmix(3, 100, 20)
basisnames(x) <- paste('Cell', 1:nbasis(x))
# extract true proportions
p <- coef(x)

# deconvolve using csSAM
res <- ged(x, p, 'csSAM')
head(basis(res))
##         Cell 1 Cell 2 Cell 3
## gene_1  8.3848  4.204 2.2006
## gene_2 11.5110  5.897 0.5704
## gene_3  9.3111  4.875 4.6795
## gene_4  6.9017  1.029 3.2170
## gene_5  5.8233  2.623 1.3222
## gene_6  0.8123  2.212 1.1104
# proportions are not updated
identical(coef(res), p)
## [1] TRUE

# estimate cell-specific differential expression between 2 groups
gr <- gl(2, 10)
res <- ged(x, p, 'csSAM', data = gr, nperms=20, verbose=TRUE)
##   Using ged algorithm: "csSAM"
##    Groups: 1=10L | 2=10L
##    Fitting cell-specific linear model ...    OK
##    Computing csSAM model statistics ...    OK
##    Computing fdr using 20 permutations ... 
##     Alternative 'two.sided' ...     OK
##     Alternative 'greater' ...     OK
##     Alternative 'less' ...     OK
##    OK
##   Timing:
##    user  system elapsed 
##   0.816   0.008   0.827 
##   GED final wrap up ...   OK
head(basis(res))
##           Cell 1   Cell 2  Cell 3
## gene_1  0.378861 -0.49548  0.1313
## gene_2 -0.365197  1.37997 -0.9378
## gene_3 -0.004709  0.03992  0.1743
## gene_4  0.682757 -1.43884  1.1413
## gene_5 -1.212247  0.28653  0.4643
## gene_6 -0.088401  1.15964 -0.5704
# plot FDRs
csplot(res)

# extract FDRs for the top differentially expressed genes in each cell type
# (named 'top' rather than 't' to avoid masking base::t)
top <- csTopTable(res)
str(top)
## List of 3
##  $ Cell 1: Named num [1:100] 0.759 0.759 0.759 0.759 0.759 ...
##   ..- attr(*, "names")= chr [1:100] "gene_5" "gene_7" "gene_9" "gene_15" ...
##  $ Cell 2: Named num [1:100] 0.05 0.425 0.48 0.48 0.48 ...
##   ..- attr(*, "names")= chr [1:100] "gene_12" "gene_74" "gene_61" "gene_84" ...
##  $ Cell 3: Named num [1:100] 0.85 0.85 0.85 0.85 0.962 ...
##   ..- attr(*, "names")= chr [1:100] "gene_12" "gene_22" "gene_48" "gene_78" ...
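
The list returned by csTopTable can be filtered with ordinary list tools. The sketch below uses a small hand-made list with the same shape as the output above (one named numeric vector of FDRs per cell type). The values, the object names and the 0.1 cutoff are illustrative assumptions, not results.

```r
# hypothetical stand-in for csTopTable(res): one named FDR vector per cell type
top <- list(
  `Cell 1` = c(gene_5 = 0.76, gene_7 = 0.76, gene_9 = 0.76),
  `Cell 2` = c(gene_12 = 0.05, gene_74 = 0.43, gene_61 = 0.48),
  `Cell 3` = c(gene_12 = 0.85, gene_22 = 0.85, gene_48 = 0.85)
)

cutoff <- 0.1  # illustrative FDR threshold, not a recommendation

# names of the features called at or below the cutoff, per cell type
called <- lapply(top, function(fdr) names(fdr)[fdr <= cutoff])

# number of calls per cell type
n_called <- lengths(called)
```

The same two lines should apply unchanged to an actual csTopTable result, since they rely only on the list-of-named-vectors structure shown by str above.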

See also

fdrCsSAM, csTopTable