A Comprehensive Toolbox for Gene Expression Deconvolution

Renaud Gaujoux and Cathal Seoighe. “CellMix: A Comprehensive Toolbox for Gene Expression Deconvolution.” Bioinformatics 2013. doi: 10.1093/bioinformatics/btt351.  [Free-access PDF]

Overview

This package contains methods and utilities for performing gene expression deconvolution, i.e. estimating cell type proportions and/or cell-specific gene expression signatures from global expression data in heterogeneous samples. Its main objectives are to provide:

  • a unified base framework for applying and developing deconvolution analysis;
  • a user-friendly and intuitive interface;
  • a resource of relevant auxiliary data such as benchmark datasets or cell marker gene lists.

See Algorithms for a list of all available deconvolution methods.
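The deconvolution problem described above can be illustrated with a small simulation. This is a minimal sketch in base R only, not the package's own interface: it assumes the usual linear mixture model in which global expression is a proportion-weighted sum of cell-type signatures, and the names `W`, `H`, `X` and `H_hat`, as well as the simulated values, are purely illustrative. Proportions are recovered here by ordinary least squares followed by clipping and rescaling, which is a simplification of the constrained methods listed in the help topics.

```r
# Linear mixture model: each mixed sample is a proportion-weighted
# sum of cell-type-specific signatures, i.e. X is approximately W %*% H.
set.seed(1)
n_genes <- 200
n_types <- 3
n_samples <- 10

# W: hypothetical cell-type signatures (genes x cell types)
W <- matrix(rexp(n_genes * n_types, rate = 0.1), n_genes, n_types,
            dimnames = list(NULL, c("typeA", "typeB", "typeC")))

# H: true proportions (cell types x samples); each column sums to 1
H <- matrix(runif(n_types * n_samples), n_types, n_samples)
H <- sweep(H, 2, colSums(H), "/")

# X: simulated global expression with small additive noise
X <- W %*% H + matrix(rnorm(n_genes * n_samples, sd = 0.5), n_genes, n_samples)

# Estimate proportions from X and known signatures W by least squares
# (qr.solve returns the least-squares fit for an over-determined system),
# then clip negative values and rescale each column to sum to 1.
H_hat <- qr.solve(W, X)
H_hat <- pmax(H_hat, 0)
H_hat <- sweep(H_hat, 2, colSums(H_hat), "/")

# Largest absolute difference between estimated and true proportions
max_abs_err <- max(abs(H_hat - H))
```

With known signatures this corresponds to estimating proportions only; estimating both signatures and proportions from `X` alone is a harder problem that this sketch does not attempt.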

Installation

# install BiocInstaller (which provides biocLite) if not already there
if( !require(BiocInstaller) ){
    # enable Bioconductor repositories
    # -> add Bioc-software
    setRepositories() 

    install.packages('BiocInstaller')
    library(BiocInstaller)
}
# or alternatively do: 
# source('http://www.bioconductor.org/biocLite.R')

# install (NB: this might ask you to update some of your packages)
biocLite('CellMix', siteRepos = 'http://web.cbio.uct.ac.za/~renaud/CRAN', type='both')
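Once the installation has finished, one way to confirm that the package can be found is sketched below. It uses base R functions only; the variable name `cellmix_installed` is illustrative.

```r
# Check whether CellMix can be loaded from the current library paths,
# without attaching it to the search path.
cellmix_installed <- requireNamespace("CellMix", quietly = TRUE)

if (cellmix_installed) {
  message("CellMix version: ", as.character(utils::packageVersion("CellMix")))
} else {
  message("CellMix not found; see the installation steps above.")
}
```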

Help topics

  • Abbas
    Basis Matrix for Whole Blood Expression Deconvolution
  • addNames
    Generating Names
  • applyBy(applyBy.ExpressionSet, applyBy.matrix, colApplyBy, colMaxsBy, colMeansBy, colMediansBy, colMinsBy, colSumsBy, rowApplyBy, rowMaxsBy, rowMeansBy, rowMediansBy, rowMinsBy, rowSumsBy)
    Group Apply
  • .atrack,MarkerList-method
    Heatmap Annotation Track for MarkerList Objects
  • barplot.MarkerList
    Plotting MarkerList Objects
  • basisfit(coeffit)
    Accessing Fit Data in Deconvolution Results
  • beeswarm.MarkerList(hist.MarkerList, profplot.MarkerList, screeplot.MarkerList, stripchart.MarkerList)
    Plotting Markers
  • biocann_map
    Retrieving ID Conversion Maps
  • biocann_object
    Retrieving Bioconductor Annotation Maps
  • boxplotBy(boxplotBy.default, boxplotBy.NMF)
    Split Boxplot by Group
  • cbind.ExpressionSet
    Combining Expression Matrices
  • cellMarkers(cellMarkersInfo)
    Loading Marker Lists from Registry
  • cellmix.options(cellmix.getOption, cellmix.printOptions, cellmix.resetOptions)
    Package Specific Options for CellMix
  • CellMix-package(CellMix)
    Gene Expression Deconvolution
  • checkConstraints
    Checking Linear Constraints
  • combine,MarkerList,factor-method
    Combine markers from multiple cell types of a MarkerList object, based on groups
  • Compare,MarkerList,numeric-method
    Operations on MarkerList Objects
  • convertIDs(convertIDs,ANY,ANY,ANY-method, convertIDs,ANY,ANY,NullIdentifier-method, convertIDs,ANY,list,missing-method, convertIDs,character,GeneIdentifierType,GeneIdentifierType-method, convertIDs,ExpressionSet,GeneIdentifierType,GeneIdentifierType-method, convertIDs,list,GeneIdentifierType,GeneIdentifierType-method, convertIDs,MarkerList,GeneIdentifierType,GeneIdentifierType-method, convertIDs,matrix,GeneIdentifierType,GeneIdentifierType-method, convertIDs-methods, mapIdentifiers,list,GeneIdentifierType,GeneIdentifierType-method)
    Converting Gene or Probeset IDs
  • csplot(csplot.character, csplot.NMFfit, csplot.NMFfitX)
    Plots Cell-Specific FDR Estimates
  • .csSAM
    Cell-specific Differential Expression with csSAM
  • csTopTable(csTopTable.array, csTopTable.character, csTopTable.matrix, csTopTable.NMFfit, csTopTable.NMFfitX)
    Compute Cell-Specific Statistics
  • DataSource(isDataSource)
    Gene Expression Data Sources
  • dim,ExpressionMix-method(dimnames<-,ExpressionMix,ANY-method, dimnames<-,ExpressionMix-method, dimnames,ExpressionMix-method, featureNames<-,ExpressionMix-method, sampleNames<-,ExpressionMix,ANY-method, sampleNames<-,ExpressionMix-method)
    Dimensions in ExpressionMix Objects
  • .DollarNames,ExpressionMix-method
    Auto-completion for ExpressionMix Objects
  • dropvalues(drop,MarkerList-method, hasDuplicated, mltype, nmark, nmark,ANY-method, nmark,list-method, nmark,MarkerList-method, nmark-methods, reverse, reverse,MarkerList-method, reverse-methods, rmDuplicated)
    Utility Functions for MarkerList Objects
  • DSAproportions(.DSAproportions)
    Digital Sorting Algorithm: Proportion Estimation Method
  • DSection
    DSection Gene Expression Deconvolution Method
  • enforceMarkers(enforceMarkers,matrix,list-method, enforceMarkers,matrix,numeric-method, enforceMarkers-methods, enforceMarkers,NMF,ANY-method)
    Enforcing Marker Block Patterns
  • eset(eset,ExpressionMix-method, eset,GEDdata_entry-method, eset-methods)
    Extracting Expression Data
  • ExpressionMix(ExpressionMix,character-method, ExpressionMix,ExpressionSet-method, ExpressionMix,matrix-method, ExpressionMix-methods, show,ExpressionMix-method)
    Factory Method for ExpressionMix Objects
  • ExpressionMix-class
    Class for Gene Expression Deconvolution Benchmark Datasets
  • ExpressionMix-subset([,ExpressionMix,ANY,ANY,ANY-method, [,ExpressionMix,ANY,ANY-method, [,ExpressionSet,MarkerList,ANY,ANY-method, [,ExpressionSet,MarkerList,ANY-method, [,MatrixData,MarkerList,ANY,ANY-method, [,MatrixData,MarkerList,ANY-method, [,NMF,MarkerList,ANY,ANY-method, [,NMF,MarkerList,ANY-method)
    Subsetting ExpressionMix Objects
  • extractMarkers(extractMarkers,ANY-method, extractMarkers,ExpressionSet-method, extractMarkers-methods, markerScoreMethod, scoreMarkers)
    Extract Markers from Pure Samples
  • featureNames(exprs, exprs,matrix-method, exprs-methods, featureNames<-, featureNames,MarkerList-method, featureNames,matrix-method, featureNames<--methods, featureNames-methods, sampleNames, sampleNames<-, sampleNames,matrix-method, sampleNames<--methods, sampleNames-methods)
    Extracting Feature Names
  • fetchData
    Fetching Data from Data Sources
  • flatten(flatten,list-method, flatten,MarkerList-method, flatten-methods)
    Flattening Marker Lists
  • ged(ged,ANY,ANY,character-method, ged,ANY,ANY,function-method, ged,ANY,ANY,missing-method, ged,ExpressionSet,ANY,GEDStrategy-method, ged,MatrixData,ANY,GEDStrategy-method, ged-methods)
    Main Interface for Gene Expression Deconvolution Methods
  • gedAlgorithm(gedAlgorithmInfo)
    Managing Gene Expression Deconvolution Algorithms
  • gedAlgorithm.cs_lsfit(cs-lsfit-ged)
    Cell-Specific Expression by Standard Least-Squares
  • gedAlgorithm.csSAM(csplot.csSAM, csSAM-ged, csTopTable.csSAM)
    Partial Gene Expression Deconvolution with csSAM
  • gedAlgorithm.deconf(deconf-ged)
    Complete Gene Expression Deconvolution: Method deconf
  • gedAlgorithm.DSA(DSA-ged)
    Complete Deconvolution using Digital Sorting Algorithm (DSA)
  • gedAlgorithm.DSection(csTopTable.DSection, DSection-ged)
    Partial Gene Expression Deconvolution with DSection
  • gedAlgorithm.lsfit(lsfit-ged)
    Partial Gene Expression Deconvolution by Standard Least-Squares
  • gedAlgorithm.meanProfile(meanProfile-ged)
    Partial Gene Expression Deconvolution: Marker Mean Expression Profile
  • gedAlgorithm.qprog(cs-qprog-ged, gedAlgorithm.cs_qprog, qprog-ged)
    Partial Gene Expression Deconvolution by Quadratic Programming
  • gedAlgorithm.ssKL(gedAlgorithm.ssFrobenius, ssFrobenius-ged, ssKL-ged)
    Complete Gene Expression Deconvolution by Semi-Supervised NMF
  • gedBlood(asCBC, asCBC,character-method, asCBC,MarkerList-method, asCBC,matrix-method, asCBC-methods, asCBC,NMF-method, gCBC, refCBC)
    Blood Sample Deconvolution
  • gedCheck
    Checking a Deconvolution Method
  • gedData(gedDataInfo)
    Loading Gene Expression Deconvolution Data
  • GEDdata-access(basis,GEDdata_entry-method, coef,GEDdata_entry-method, dims,GEDdata_entry-method, exprs,GEDdata_entry-method, nbasis,GEDdata_entry-method)
    Accessing Data from CellMix Dataset
  • GEDdownload
    Downloading Gene Expression Deconvolution Datasets
  • gedInput(anyRequired, gedIO, gedOutput, isRequired, onlyRequired)
    Checking Input and Output Data for GED Algorithms
  • gedProportions
    Estimating Cell Proportions from Known Signatures
  • GEDStrategy(GEDStrategy,character-method, GEDStrategy,function-method, GEDStrategy,GEDalgorithm_entry-method, GEDStrategy,GEDStrategy-method, GEDStrategy-methods, GEDStrategy,missing-method)
    Factory Methods for GEDStrategy Objects
  • GEDStrategy-class(show,GEDStrategy-method)
    Class for GED Algorithms
  • geneValues(annotation, annotation<-, annotation,GEDdata_entry-method, annotation<-,MarkerList,character-method, annotation,MarkerList-method, annotation<-,MarkerList,NULL-method, annotation<--methods, annotation-methods, connectivity, connectivity,MarkerList-method, connectivity-methods, details, details,MarkerList-method, details-methods, geneIds, geneIds<-, geneIds<-,MarkerList,list-method, geneIds,MarkerList-method, geneIds<--methods, geneIds-methods, geneIdType, geneIdType<-, geneIdType<-,MarkerList,character-method, geneIdType<-,MarkerList,GeneIdentifierType-method, geneIdType,MarkerList-method, geneIdType<-,MarkerList,NULL-method, geneIdType<--methods, geneIdType-methods, geneValues<-, geneValues,MarkerList-method, geneValues<--methods, geneValues-methods, hasValues, incidence, incidence,MarkerList-method, incidence-methods, marknames, marknames,list-method, marknames-methods, marknames,vector-method, nmf, nmf,MatrixData,MarkerList,ANY-method, nmf-methods, show,MarkerList-method)
    Accessing Data in Marker Lists
  • getGSE
    Downloading GSE Datasets from GEO
  • gmarkers(gMarkerList, rMarkerList, rmarkers)
    Generating Marker Lists
  • gpl2bioc(bioc2gpl)
    Mapping Bioconductor Annotation packages to GEO GPL Identifiers
  • GPL2bioc
    Map Between Bioconductor Annotation Packages and GEO GPL Identifiers
  • Grigoryev-markers
    Grigoryev - Cytometry Antigen Markers
  • GSE11058_pdata
    Cell Line Proportions for Dataset GSE11058
  • GSE20300_pdata
    Complete Blood Count for Dataset GSE20300
  • GSE3649_fdata
    Feature Annotation Data for Dataset GSE3649
  • GSE3649_pdata
    Phenotypic Annotation Data for Dataset GSE3649
  • HaemAtlas
    HaemAtlas Dataset - Immune Cells
  • hasAnnotation(getAnnotation, setAnnotation)
    Extracting Annotation from Objects
  • idFilter(idFilterAffy, idFilterAll, idFilterAuto, idFilterFirstN, idFilterInjective, idFilterMAuto, idFilterOneToMany, idFilterOneToOne)
    Gene Identifier Filtering Strategies
  • idtype(idtype,AnnDbBimap-method, idtype,ChipDb-method, idtype,ExpressionSet-method, idtype,GeneIdentifierType-method, idtype,list-method, idtype,MarkerList-method, idtype,matrix-method, idtype-methods, idtype,missing-method, idtype,NMF-method, idtype,NULL-method, idtype,ProbeAnnDbBimap-method, idtype,vector-method)
    Identifying Gene or Probe ID Type
  • initialize,ExpressionMix-method
    Initialize method for ExpressionMix object
  • intersect(intersect,MatrixData,character-method, intersect,MatrixData,logical-method, intersect,MatrixData,MarkerList-method, intersect,MatrixData,MatrixData-method)
    Enhanced Subsetting for Matrix-like Data
  • IRIS
    Immune Response In-Silico (IRIS) Data
  • is.annpkg(biocann_mapname, biocann_pkgname, biocann_pkgobject, is.anndb, revmap,character-method)
    Annotation Tools
  • is_logscale(log_transform)
    Detect Log-transformed Data
  • is.probeid(asGeneIdentifierType, is.idtype, is.probetype)
    Utility function for Biological Identifiers
  • kappa.MarkerList
    Condition Number of a Marker List
  • log,ExpressionSet-method(expb, expb,ExpressionSet-method, expb,matrix-method, expb-methods, exp,ExpressionSet-method, quantile.ExpressionSet, range,ExpressionSet-method)
    Numeric Computations on ExpressionSet objects
  • .mAbbas(Abbas-markers)
    Optimised Set of Marker Genes for Immune Cells
  • mapIDs
    Mapping Gene Identifiers
  • MarkerList(MarkerList,ANY-method, MarkerList,character-method, MarkerList,ExpressionSet-method, MarkerList,factor-method, MarkerList,integer-method, MarkerList,list-method, MarkerList,MarkerList-method, MarkerList,matrix-method, MarkerList-methods, MarkerList,missing-method, MarkerList,vector-method)
    Factory Method for Marker Lists
  • [,MarkerList,ANY,ANY-method([,MarkerList,ANY,ANY,ANY-method, [,MarkerList,list,ANY,ANY-method, [,MarkerList,list,ANY-method, [,MarkerList,missing,list,ANY-method, [,MarkerList,missing,list-method, subset,MarkerList-method)
    Subsetting Marker Lists
  • MarkerList-class(attachMarkers, getMarkers, has.markers, isMarkerList, summary,MarkerList-method)
    Class for Marker Gene Lists
  • markermap(basismarkermap, markermap,MarkerList,ExpressionSet-method, markermap,MarkerList,matrix-method, markermap,MarkerList,NMFfitX-method, markermap,MarkerList,NMF-method, markermap,MatrixData,ANY-method, markermap-methods)
    Heatmaps Highlighting Markers
  • markerScoreAbbas
    Marker Scoring Method: Abbas et al. (2009)
  • markerScoreHSD(selectMarkers.markerScore_HSD)
    Marker Scoring Method: Tukey Honest Significant Difference
  • markerScoreMaxcol
    Marker Scoring Method: Max Expression
  • markerScoreScorem(selectMarkers.markerScore_scorem)
    Marker Scoring Method: SCOREM
  • MarkerSetCollection(MarkerSetCollection-methods)
    Factory Methods for Marker Gene Set Collections
  • MarkerSetCollection-class
    Classes for Marker Gene Set Collections
  • markersGrigoryev
    Cytometry Antigen Expression from Grigoryev et al. (2010)
  • match_first
    Returns the index of the first match in a reference table
  • matchIndex(matchIndex,ANY,ANY-method, matchIndex,ANY,missing-method, matchIndex,list,character-method, matchIndex,list,ChipDb-method, matchIndex,list,ExpressionSet-method, matchIndex,list,matrix-method, matchIndex,list,NMF-method, matchIndex,list,ProbeAnnDbBimap-method, matchIndex-methods)
    Creating Mapping from Marker Lists
  • match.nmf(match.cell)
    Assigning Estimated Signature to Real Cell-Types
  • mergeList
    Merge two lists
  • .mHaemAtlas(HaemAtlas-markers)
    HaemAtlas Marker List for Immune Cells - Watkins et al. (2009)
  • .mIRIS(IRIS-markers)
    Marker Genes for Immune Cells (IRIS)
  • mixData(mixData<-, mixData,ExpressionMix-method, mixData<-,ExpressionMix,NMFstd-method, mixData<--methods, mixData-methods)
    Extracting Mixture Data
  • mlsei
    Multivariate Least Squares with Equalities and Inequalities
  • .mTDDB_HS(TDDB_HS, TissueDistributionDB_HS-markers)
    Human Tissue Specific Genes from the TissueDistributionDB Database
  • .mTDDB_RN(TDDB_RN, TissueDistributionDB_RN-markers)
    Rat Tissue Specific Genes from the TissueDistributionDB Database
  • .mTIGER(TIGER-markers)
    TiGER - Human Tissue Specific Genes
  • MySQLtoSQLite
    Load a MySQL Dump for Using with SQLite
  • npure(lpure, mixedSamples, pureSamples, wpure)
    Accessing Pure or Mixed Sample Data
  • nuIDdecode
    Convert nuID to Nucleotide Sequence
  • Palmer
    Palmer Dataset
  • propplot
    Cell Type Proportion Plot
  • regcheck(checkS4)
    Validity Functions of Registry Fields
  • reorder,MarkerList-method(sort.MarkerList)
    Reordering Marker Lists
  • rmix
    Generating Random Global Mixed Gene Expression Data
  • rproportions
    Generating Random Cell Type Proportions
  • rpure
    Generating Random Pure Cell Type Sample
  • sapply,MarkerList-method
    Applying Functions Along MarkerList Objects
  • selectGEDMethod
    Automatic Selection of Gene Expression Deconvolution Algorithms
  • selectMarkers(selectMarkers.MarkerList, selectMarkers.markerScore)
    Select Markers Based on Scores
  • setGEDMethod(removeGEDMethod)
    Register CellMix Deconvolution Methods
  • setMarkerList(removeMarkerList)
    Register CellMix MarkerLists
  • showData(showData,ExpressionMix-method, showData-methods)
    Show Data Available in an Object
  • SLE
    Gene Expression from Systemic Lupus Erythematosus Patients
  • stack.MarkerList(unstack.stackedMarkerList)
    Converting Marker Lists into data.frames
  • subsetML
    Subsetting Data with MarkerList Objects
  • TIGER
    TiGER Database
  • TissueDistributionDB_HS
    Human Tissue Specific Genes from TissueDistributionDB
  • TissueDistributionDB_RN
    Rat Tissue Specific Genes from TissueDistributionDB
  • userData
    User Data Directory
  • VeryGene
    VeryGene Dataset
  • VeryGene-markers
    VeryGene - Marker List for Human Tissues

Dependencies

  • R version: R >= 3.0
  • Depends: methods, pkgmaker, NMF, csSAM, registry, stats, stringr, GSEABase, Biobase
  • Imports: graphics, BiocGenerics, annotate, matrixStats, genefilter, AnnotationDbi, RSQLite, DBI, preprocessCore, limSolve, quadprog, corpcor, xtable, gtools, csSAM, beeswarm, graph, BiocInstaller, bibtex, digest, ggplot2, plyr
  • Suggests: limSolve, RUnit, lumi, hgu133a.db, hgu133b.db, hgu133plus2.db, illuminaHumanv2.db, rat2302.db, hom.Hs.inp.db, GEOmetadb, xlsx, GEOquery, ArrayExpress, biomaRt, knitr

Authors

  • Renaud Gaujoux (renaud at mancala.cbio.uct.ac.za) [aut, cre]
  • Cathal Seoighe [ths]
  • S. Schneider [ctb] (SCOREM code)
  • D. Repsilber [ctb] (method: deconf)
  • Shai Shen-Orr [ctb] (method: csSAM + discussion/testing)
  • T. Gong [ctb] (method: quadratic programming)

Maintainer

  • Renaud Gaujoux (renaud at mancala.cbio.uct.ac.za)
