The CellMix package implements methods for converting marker identifiers across platforms and organisms, so that a deconvolution analysis can combine multiple independent sources of data, e.g., use a marker gene list obtained from data on one platform to deconvolve gene expression data generated on another platform.
convertIDs(object, to, from, ...)

S4 (list,GeneIdentifierType,GeneIdentifierType)
`mapIdentifiers`(what, to, from, ..., verbose = FALSE)

S4 (MarkerList,GeneIdentifierType,GeneIdentifierType)
`convertIDs`(object, to, from, verbose = FALSE, nodups = NULL, unlist = TRUE, ...)

S4 (character,GeneIdentifierType,GeneIdentifierType)
`convertIDs`(object, to, from, method = "auto", unlist = TRUE, ...)

S4 (matrix,GeneIdentifierType,GeneIdentifierType)
`convertIDs`(object, to, from, ..., unlist = TRUE, rm.duplicates = NULL)

S4 (ExpressionSet,GeneIdentifierType,GeneIdentifierType)
`convertIDs`(object, to, from, ..., unlist = TRUE, rm.duplicates = NULL)
from: specification of the type of identifiers in object. This is only needed when the source type cannot be inferred from object itself.
...: extra arguments passed down to the workhorse method convertIDs,character,GeneIdentifierType,GeneIdentifierType. See each method's description for more details.
nodups: whether duplicated markers are removed (TRUE) or not (FALSE). If NULL, then duplicates are removed only if there were no duplicates in the source object.
method: specification, passed to mapIDs, that indicates how to carry out the mapping between the original and final identifier type.
unlist: whether the result is flattened into a vector with unlist2. In this case, the vector's names correspond to the source identifiers.
rm.duplicates: how duplicated matches are handled. rm.duplicates=FALSE does not allow any duplicated match and throws an error if any is present. If TRUE or NULL, only the first match is kept, but a warning is thrown only when NULL.

The function convertIDs
provides the main interface to convert gene/probeset IDs into IDs compatible with another given data set. It is typically useful to convert built-in marker gene lists (see cellMarkers).

The identifier conversion functions and methods defined in the CellMix package can be seen as extending the existing framework defined in the GSEABase package, with the generic mapIdentifiers.
signature(object = "list", to = "GeneIdentifierType", from = "GeneIdentifierType"):
Apply the conversion to each element of the list.
signature(object = "MarkerList", to = "GeneIdentifierType", from = "GeneIdentifierType"):
Convert IDs from a MarkerList object. In this case, argument unlist indicates if the result should be a simple list containing the mapping (a list) for each cell type or a MarkerList object (default).
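The nodups rule that applies when converting a MarkerList can be illustrated in plain R. The sketch below is not CellMix code: shared_markers, drop_shared_markers and resolve_nodups are hypothetical helpers, the gene IDs are made up, and it assumes that a marker shared by several cell types is dropped from all of them, which the text above does not specify.

```r
# Hypothetical helpers, not part of CellMix: a base-R illustration of the
# 'nodups' rule for markers that appear in more than one cell type.

# IDs found in at least two cell types (duplicates within one type are ignored)
shared_markers <- function(markers) {
  ids <- unlist(lapply(markers, unique), use.names = FALSE)
  unique(ids[duplicated(ids)])
}

# Assumption: every copy of a shared ID is dropped, not just the later ones
drop_shared_markers <- function(markers) {
  shared <- shared_markers(markers)
  lapply(markers, function(ids) ids[!ids %in% shared])
}

# nodups = TRUE/FALSE is taken as is; NULL means "remove duplicates only if
# the source object had none"
resolve_nodups <- function(nodups, source_markers) {
  if (is.null(nodups)) {
    length(shared_markers(source_markers)) == 0L
  } else {
    isTRUE(nodups)
  }
}

# Made-up converted markers: g1 and g3 are each claimed by two cell types
converted <- list(B = c("g1", "g2", "g3"), T = c("g3", "g4"), NK = c("g5", "g1"))
cleaned <- drop_shared_markers(converted)
str(cleaned)
```

Under this assumption, each cell type keeps only the markers that are specific to it after conversion.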
signature(object = "character", to = "GeneIdentifierType", from = "GeneIdentifierType"):
This is the workhorse method that is eventually called by all other convertIDs methods. The actual conversions are performed by mapIDs, to which all arguments in ... are passed, in particular arguments verbose and method.
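When one source ID maps to several target IDs, the result is naturally a list, and flattening it while keeping the source IDs as names is what the unlist argument refers to. The base-R sketch below mimics that behaviour without CellMix or unlist2; flatten_keep_names is a hypothetical helper and the probe IDs are made up for illustration.

```r
# Hypothetical helper: flatten a one-to-many mapping into a vector whose
# names are the source IDs, repeated once per match.
flatten_keep_names <- function(x) {
  stats::setNames(unlist(x, use.names = FALSE), rep(names(x), lengths(x)))
}

# Made-up mapping from three source IDs to probe IDs
mapping <- list(
  "673"   = c("probe_a", "probe_b"),
  "725"   = "probe_c",
  "10115" = NA_character_
)

flat <- flatten_keep_names(mapping)
print(flat)

# For comparison, base unlist() alters the names of multi-match elements
print(names(unlist(mapping)))
```

Repeating the source ID as the name keeps every match traceable to its query, at the cost of non-unique names in the result.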
signature(object = "matrix", to = "GeneIdentifierType", from = "GeneIdentifierType"):
Convert the row names of a matrix into other identifiers. In this case, argument unlist indicates if the converted IDs should be used to subset the original matrix object, or returned directly as a list.
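The interplay between row-name conversion and rm.duplicates can be sketched in base R. This is an illustration only, not the CellMix implementation: convert_rownames is a hypothetical helper, the map is a plain named character vector, and it assumes that a "duplicated match" means two source rows mapping to the same target ID.

```r
# Hypothetical helper, not the CellMix implementation: rename the rows of a
# matrix through a named character vector 'map' (names = source IDs).
convert_rownames <- function(x, map, rm.duplicates = NULL) {
  new_ids <- unname(map[rownames(x)])     # NA where the map has no entry
  mapped <- !is.na(new_ids)
  dup <- mapped & duplicated(new_ids)     # later rows hitting an ID already used
  if (any(dup)) {
    if (identical(rm.duplicates, FALSE)) {
      stop("duplicated matches: ", paste(unique(new_ids[dup]), collapse = ", "))
    }
    if (is.null(rm.duplicates)) {
      warning("duplicated matches found: keeping the first match only")
    }
  }
  keep <- mapped & !dup
  out <- x[keep, , drop = FALSE]          # subset the original matrix
  rownames(out) <- new_ids[keep]
  out
}

x <- matrix(1:8, nrow = 4,
            dimnames = list(c("p1", "p2", "p3", "p4"), c("s1", "s2")))
map <- c(p1 = "g1", p2 = "g2", p3 = "g1")  # p4 has no match, p3 duplicates p1
res <- convert_rownames(x, map, rm.duplicates = TRUE)
print(res)
```

Calling the helper with rm.duplicates = FALSE on the same input stops with an error, and leaving it NULL returns the same matrix as TRUE but with a warning.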
signature(object = "ExpressionSet", to = "GeneIdentifierType", from = "GeneIdentifierType"):
Convert the feature names of an ExpressionSet into other identifiers. In this case, argument unlist indicates if the converted IDs should be used to subset the original ExpressionSet object, or returned directly as a list.
signature(object = "ANY", to = "ANY", from = "NullIdentifier"):
Convert identifiers, inferring the type of origin from the object itself, but keep the annotation specification embedded in from.
signature(object = "ANY", to = "ANY", from = "ANY"):
Convert identifiers, inferring the type from the specifications in to and from, e.g., to='ENTREZID' or 'UNIGENE'. If not specified in either to or from, the annotation is taken from object. If from is missing, the source type is inferred from object itself.
signature(object = "ANY", to = "list", from = "missing"):
Convert identifiers using a given map or list of maps.
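Map-based conversion can be pictured as a plain lookup. The sketch below is a base-R illustration under a stated assumption, namely that a list of maps is applied as a chain of successive steps; apply_maps is a hypothetical helper and not a CellMix function. The two ID pairs are taken from the example output further down this page.

```r
# Hypothetical helper: look IDs up in one map, or in several maps applied
# one after the other (assumption: a list of maps is a chain of steps).
apply_maps <- function(ids, maps) {
  if (!is.list(maps)) maps <- list(maps)
  out <- ids
  for (m in maps) out <- unname(m[out])   # unmatched IDs become NA and stay NA
  stats::setNames(out, ids)
}

ids <- c("Hs.1", "Hs.2", "Hs.3")
unigene2entrez <- c("Hs.2" = "10")        # names = source IDs, values = targets
entrez2refseq  <- c("10" = "NM_000015")

one_step  <- apply_maps(ids, unigene2entrez)
two_steps <- apply_maps(ids, list(unigene2entrez, entrez2refseq))
print(two_steps)
```

The result keeps the original IDs as names, so unmapped queries remain visible as NA entries.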
signature(what = "list", to = "GeneIdentifierType", from = "GeneIdentifierType"):
Applies mapIdentifiers to each element in a list. All arguments in ... are passed to the subsequent calls to mapIdentifiers.
# load a marker list from the registry
m <- MarkerList('IRIS')
summary(m)
## Length Class Mode
## B 121 -none- numeric
## T 94 -none- numeric
## NK 24 -none- numeric
## Dendritic 86 -none- numeric
## Monocyte 103 -none- numeric
## Neutrophil 54 -none- numeric
## Lymphoid 302 -none- numeric
## Myeloid 449 -none- numeric
## Multiple 1037 -none- numeric
head(m[[1]])
## 205267_at 211048_s_at 206398_s_at 217823_s_at 217825_s_at 217826_s_at
## 6.884 5.298 4.678 4.481 4.374 4.206
# convert the markers' Affy probeset ids (chips hgu133a/hgu133b) to probeset ids for chip hgu133b
m2 <- convertIDs(m, 'hgu133b.db', verbose=2)
## # Converting 2270 markers from Annotation (hgu133a.db, hgu133b.db) to Annotation (hgu133b.db) ... OK [1402/2270 (1:1)]
## # Processing 2270 markers from Annotation (hgu133a.db, hgu133b.db) to Annotation (hgu133b.db) ...
## ** Processing ids for 'B' ... OK [69/121 (1:1)]
## ** Processing ids for 'T' ... OK [44/94 (1:1)]
## ** Processing ids for 'NK' ... OK [8/24 (1:1)]
## ** Processing ids for 'Dendritic' ... OK [43/86 (1:1)]
## ** Processing ids for 'Monocyte' ... OK [43/103 (1:1)]
## ** Processing ids for 'Neutrophil' ... OK [21/54 (1:1)]
## ** Processing ids for 'Lymphoid' ... OK [166/302 (1:1)]
## ** Processing ids for 'Myeloid' ... OK [238/449 (1:1)]
## ** Processing ids for 'Multiple' ... OK [636/1037 (1:1)]
## # Checking for duplicated marker(s) across cell-types ... OK [dropped 83/1268]
## OK [1185/2270 (1:1)]
summary(m2)
## Length Class Mode
## B 66 -none- numeric
## T 40 -none- numeric
## NK 6 -none- numeric
## Dendritic 41 -none- numeric
## Monocyte 40 -none- numeric
## Neutrophil 17 -none- numeric
## Lymphoid 150 -none- numeric
## Myeloid 219 -none- numeric
## Multiple 606 -none- numeric
#----------------------------------------------
# 1. Conversion from biological IDs
#----------------------------------------------
# For this kind of ID, a source annotation package can often be inferred
# from the ID type, using regular expression patterns (e.g. "^ENSG[0-9]+$"
# identifies Ensembl gene IDs)
ids <- c("Hs.1", "Hs.2", "Hs.3")
# get Entrez gene IDs (based on annotation from the org.Hs.eg.db package)
convertIDs(ids, 'ENTREZID', 'org.Hs.eg.db', verbose=TRUE)
## # Converting from Unigene (org.Hs.eg.db) to EntrezId (org.Hs.eg.db) ... OK [1/3 mapped (1:1)]
## Hs.1 Hs.2 Hs.3
## NA "10" NA
## attr(,"from")
## geneIdType: Unigene (org.Hs.eg.db)
## attr(,"to")
## geneIdType: EntrezId (org.Hs.eg.db)
# map to other IDs
convertIDs(ids, 'REFSEQ')
## Hs.1 Hs.2 Hs.3
## NA "NM_000015" NA
## attr(,"from")
## geneIdType: Unigene (org.Hs.eg.db)
## attr(,"to")
## geneIdType: Refseq (org.Hs.eg.db)
convertIDs(ids, 'ENSEMBL')
## Hs.1 Hs.2 Hs.3
## NA "ENSG00000156006" NA
## attr(,"from")
## geneIdType: Unigene (org.Hs.eg.db)
## attr(,"to")
## geneIdType: ENSEMBL (org.Hs.eg.db)
# convert across organisms
convertIDs(ids, 'rat2302.db')
## Warning: An error occured when converting ids cross-species from Homo
## sapiens to Rattus norvegicus: Error in names(destIDs) = dnames : attempt
## to set an attribute on NULL
## Hs.1 Hs.2 Hs.3
## NA NA NA
## attr(,"from")
## geneIdType: Unigene (org.Hs.eg.db)
## attr(,"to")
## geneIdType: Annotation (rat2302.db)
# get Affy probeset IDs for chip hgu133a
affy <- convertIDs(ids, 'hgu133a.db')
# assume we have a vector of IDs, e.g. Entrez gene ids
id <- c("673", "725", "10115")
# get associated probesets on chip hgu133a
convertIDs(id, 'hgu133a.db')
## 673 725 10115
## "206044_s_at" "208209_s_at" NA
## attr(,"from")
## geneIdType: EntrezId (hgu133a.db)
## attr(,"to")
## geneIdType: Annotation (hgu133a.db)
# get all associated probesets on chip hgu133a
convertIDs(id, 'hgu133a.db', method='all')
## 673 725 10115
## "206044_s_at" "208209_s_at" NA
## attr(,"from")
## geneIdType: EntrezId (hgu133a.db)
## attr(,"to")
## geneIdType: Annotation (hgu133a.db)
# same query, returned as a list with one element per source ID
convertIDs(id, 'hgu133a.db', method='all', unlist=FALSE)
## $`673`
## 673
## "206044_s_at"
##
## $`725`
## 725
## "208209_s_at"
##
## $`10115`
## [1] NA
##
## attr(,"from")
## geneIdType: EntrezId (hgu133a.db)
## attr(,"to")
## geneIdType: Annotation (hgu133a.db)
# specification using ProbeAnnDbBimap objects
library(hgu133b.db)
convertIDs(id, 'hgu133a.db', hgu133bENTREZID, verbose=2)
## # Converting from EntrezId (hgu133b.db) to Annotation (hgu133a.db) ...
## # Limiting query to EntrezId (hgu133b.db) ... [3 -> 1 id(s)]
## # Loading map(s) from EntrezId (hgu133b.db) to Annotation (hgu133a.db) [x-platform /x-id] ... OK [1 step(s)]
## # Mapping from EntrezId (hgu133a.db) to Annotation (hgu133a.db) [43827 entries] ... [1/1 mapped (1:1)]
## # Applying filtering strategy 'auto' ... (kept 1 2nd-affy probes) [1/1 passed (1:1)]
## OK [1/3 mapped (1:1)]
## 673 725 10115
## "206044_s_at" NA NA
## attr(,"from")
## geneIdType: EntrezId (hgu133b.db)
## attr(,"to")
## geneIdType: Annotation (hgu133a.db)
#----------------------------------------------
# 2. Conversion from probeset IDs
#----------------------------------------------
# For this kind of ID, a source annotation package is required, because it
# cannot be easily inferred from the ID type.
# get Affy probeset IDs for chip hgu133a from IDs for chip hgu133b
convertIDs(affy, 'hgu133a.db', 'hgu133b.db')
## <NA> 206797_at <NA>
## NA NA NA
## attr(,"from")
## geneIdType: Annotation (hgu133b.db)
## attr(,"to")
## geneIdType: Annotation (hgu133a.db)
# across organisms
convertIDs(affy, 'hgu133a.db', 'rat2302.db')
## <NA> 206797_at <NA>
## NA NA NA
## attr(,"from")
## geneIdType: Annotation (rat2302.db)
## attr(,"to")
## geneIdType: Annotation (hgu133a.db)
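
The first part of the examples notes that the type of biological IDs can often be inferred from regular expression patterns. A minimal base-R sketch of that idea follows. The patterns and the helper guess_id_type are illustrative assumptions, not the patterns CellMix actually uses, and they only cover the ID shapes that appear in the examples above.

```r
# Illustrative patterns only (assumptions, not taken from CellMix)
id_patterns <- c(
  UNIGENE  = "^[A-Z][a-z]{1,2}\\.[0-9]+$",  # e.g. Hs.2
  ENTREZID = "^[0-9]+$",                    # e.g. 673
  ENSEMBL  = "^ENSG[0-9]+$",                # e.g. ENSG00000156006
  REFSEQ   = "^[NX][MR]_[0-9]+$",           # e.g. NM_000015
  PROBEID  = "_at$"                         # e.g. 206044_s_at
)

# Hypothetical helper: return the first type whose pattern matches every
# non-missing ID, or NA when no single pattern fits all of them.
guess_id_type <- function(ids) {
  ids <- ids[!is.na(ids)]
  if (length(ids) == 0L) return(NA_character_)
  hits <- vapply(id_patterns, function(p) all(grepl(p, ids)), logical(1))
  if (any(hits)) names(hits)[hits][1] else NA_character_
}

print(guess_id_type(c("Hs.1", "Hs.2", "Hs.3")))
print(guess_id_type(c("673", "725", "10115")))
print(guess_id_type(c("206044_s_at", "208209_s_at")))
```

Probeset IDs illustrate why the second part of the examples requires an explicit source annotation package: a pattern such as "_at$" can flag an ID as a probeset but cannot tell which chip it belongs to.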