The S4 generic idtype automatically determines the type of gene/feature identifiers stored in objects, based on a combination of regular expression patterns and test functions.
idtype(object, ...)

S4 (missing)
`idtype`(object, def = FALSE)

S4 (ProbeAnnDbBimap)
`idtype`(object, limit = 500L, ...)

S4 (ChipDb)
`idtype`(object, limit = 500L, ...)

S4 (AnnDbBimap)
`idtype`(object, limit = 500L, ...)

S4 (MarkerList)
`idtype`(object, each = FALSE, ...)

S4 (vector)
`idtype`(object, each = FALSE, limit = NULL, no.match = "")
idtype,character-method. See each method's description for more details.
object is missing, which indicates that the result should contain the definition of the matching pattern/function of each type, or which type's definition should be included in the result list.
TRUE) or only the type of the vector as a whole (default).
limit elements are used. Otherwise it must be a subsetting logical or numeric vector.
'.', allowing the subsequent handling of such IDs as a group.
"ENTREZID" or "ENSEMBL")
"hgu133plus2.db".
nuIDdecode to try converting the ids into nucleotide sequences. Identification is positive if no error is thrown during the conversion.
A single character string (possibly empty) if each=FALSE (default) or a character vector of the same "length" as object otherwise.
idtype uses a heuristic based on a set of regular expressions and functions that uniquely match most common types of identifiers, such as UniGene, Entrez Gene, Affymetrix probe ids, Illumina probe ids, etc.
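The pattern-matching part of this heuristic can be illustrated in base R. The following is only a sketch: `id_patterns` and `matching_types` are hypothetical names, the patterns are a subset copied from the `idtype(def=TRUE)` output shown in the examples, and the real implementation may combine patterns and test functions differently.

```r
# Hypothetical subset of type definitions, copied from the output of
# idtype(def = TRUE); this is not the package's internal table.
id_patterns <- c(
  UNIGENE   = "^[A-Z][a-z]\\.[0-9]+$",
  ENSEMBL   = "^ENSG[0-9]+$",
  ENTREZID  = "^[0-9]+$",
  REFSEQ    = "^[XYN][MPR]_[0-9]+$",
  .Illumina = "^ILMN_[0-9]+$"
)

# Hypothetical helper: names of all patterns that match a single identifier.
matching_types <- function(id, patterns = id_patterns) {
  hits <- vapply(patterns, function(p) grepl(p, id), logical(1))
  names(patterns)[hits]
}

matching_types("Hs.1213")          # "UNIGENE"
matching_types("ENSG00000139618")  # "ENSEMBL"
matching_types("NM_000546")        # "REFSEQ"
matching_types("not-an-id")        # character(0)
```

Because each of these patterns is anchored with `^` and `$`, an identifier in this subset matches at most one type, which is what makes the type names usable as labels.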
signature(object = "missing"): Method for when idtype is called with its first argument missing, in which case it returns all or a subset of the known type names as a character vector, or optionally as a list that contains their definition, i.e. a regular expression or a matching function.
signature(object = "matrix"): Detects the type of identifiers used in the row names of a matrix.
signature(object = "ExpressionSet"): Detects the type of identifiers used in the feature names of an ExpressionSet object.
signature(object = "NMF"): Detects the type of identifiers used in the row names of the basis matrix of an NMF model.
signature(object = "ProbeAnnDbBimap"): Detects the type of the primary identifiers of a probe annotation bimap object.
To speed up the identification, only the first 500 probes are used by default, since the IDs are very likely to have been curated and to be of the same type. This can be changed using argument limit.
signature(object = "ChipDb"): Detects the type of the identifiers of a chip annotation object.
To speed up the identification, only the first 500 probes are used by default, since the IDs are very likely to have been curated and to be of the same type. This can be changed using argument limit.
signature(object = "AnnDbBimap"): Detects the type of the identifiers of an organism annotation object.
To speed up the identification, only the first 500 probes are used by default, since the IDs are very likely to have been curated and to be of the same type. This can be changed using argument limit.
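The effect of restricting detection to the first elements can be sketched in base R. `limit_ids` is a hypothetical helper, and the handling of `limit` shown here is an assumption: a single number keeps the first `limit` elements and `NULL` keeps everything. It does not cover the case where `limit` is a subsetting vector.

```r
# Hypothetical helper: restrict a vector of ids before type detection.
# Assumption: a single number means "use the first `limit` elements",
# and NULL means "use all elements".
limit_ids <- function(ids, limit = 500L) {
  if (is.null(limit)) return(ids)
  utils::head(ids, limit)
}

ids <- sprintf("%d_at", 1:2000)         # made-up probe ids

length(limit_ids(ids))                  # 500
length(limit_ids(ids, limit = 10L))     # 10
length(limit_ids(ids, limit = NULL))    # 2000
```

Checking only a prefix is reasonable when all ids come from one curated annotation source, since a handful of ids is usually as informative as the full set.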
signature(object = "GeneIdentifierType"): Returns the type of identifier defined by a GeneIdentifierType object.
Note that this method is a bit special in the sense that it will return the string ANNOTATION for annotation-based identifiers, but will not tell which platform it is relative to. This is different from what idtype would do if applied to the primary identifiers of the corresponding annotation package.
signature(object = "list"): Detects the type of all elements in a list, but provides the option of detecting the type of each element separately.
signature(object = "NULL"): Dummy method, defined for convenience, that returns ''.
signature(object = "vector"): This is the workhorse method that determines the type of ids contained in a character vector.
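The difference between typing a vector as a whole and typing each element can be sketched in base R. `type_of_all` and `type_of_each` are hypothetical helpers, not the package's code, and the rule used for the whole vector (a type is reported only if every id matches it) is an assumption that happens to agree with the mixed-type example further down.

```r
# Hypothetical subset of type definitions, copied from idtype(def = TRUE).
id_patterns <- c(
  UNIGENE     = "^[A-Z][a-z]\\.[0-9]+$",
  ENTREZID    = "^[0-9]+$",
  .Affymetrix = "(^AFFX[-_])|(^[0-9]+_([abfgilrsx]_)?([as]t)|(i))$"
)

# Type of each id: name of the first matching pattern, or `no.match`.
type_of_each <- function(ids, patterns = id_patterns, no.match = "") {
  vapply(ids, function(id) {
    hits <- names(patterns)[vapply(patterns, grepl, logical(1), x = id)]
    if (length(hits)) hits[1L] else no.match
  }, character(1))
}

# Type of the vector as a whole (assumed rule): the first pattern that
# every id matches, or `no.match` if there is none.
type_of_all <- function(ids, patterns = id_patterns, no.match = "") {
  for (type in names(patterns)) {
    if (all(grepl(patterns[[type]], ids))) return(type)
  }
  no.match
}

ids <- c("12345_at", "23232_at", "Hs.1213")
type_of_all(ids)    # "" because no single type matches all three ids
type_of_each(ids)   # ".Affymetrix", ".Affymetrix", "UNIGENE", named by id
```

Returning an empty string rather than an error for mixed input mirrors the `no.match = ""` default in the usage above and lets callers test the result with `nzchar()`.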
# all known types
idtype()
## [1] "UNIGENE" "ENSEMBL" "ENSEMBLTRANS" "ENSEMBLPROT"
## [5] "ENTREZID" "IMAGE" "GOID" "PFAM"
## [9] "REFSEQ" "ENZYME" "MAP" "GENEBANK"
## [13] "GENEBANK" "GENEBANK" "GENEBANK" "GENENAME"
## [17] ".Affymetrix" ".Illumina" ".Agilent" ".nuID"
# with their definitions
idtype(def=TRUE)
## $UNIGENE
## [1] "^[A-Z][a-z]\\.[0-9]+$"
##
## $ENSEMBL
## [1] "^ENSG[0-9]+$"
##
## $ENSEMBLTRANS
## [1] "^ENST[0-9]+$"
##
## $ENSEMBLPROT
## [1] "^ENSP[0-9]+$"
##
## $ENTREZID
## [1] "^[0-9]+$"
##
## $IMAGE
## [1] "^IMAGE:[0-9]+$"
##
## $GOID
## [1] "^GO:[0-9]+$"
##
## $PFAM
## [1] "^PF[0-9]+$"
##
## $REFSEQ
## [1] "^[XYN][MPR]_[0-9]+$"
##
## $ENZYME
## [1] "^[0-9]+(\\.(([0-9]+)|-)+){3}$"
##
## $MAP
## [1] "^(([0-9]{1,2})|([XY]))((([pq])|(cen))(([0-9]+(\\.[0-9]+)?)|(ter))?(-([0-9]{1,2})|([XY]))?(([pq]?)|(cen))((ter)|([0-9]+(\\.[0-9]+)?))?)?)?$"
##
## $GENEBANK
## [1] "^[A-Z][0-9]{5}$" "^[A-Z]{2}[0-9]{6}$"
##
## $GENEBANK
## [1] "^[A-Z]{3}[0-9]{5}$"
##
## $GENEBANK
## [1] "^[A-Z]{4}[0-9]{8}[0-9]?[0-9]?$"
##
## $GENEBANK
## [1] "^[A-Z]{5}[0-9]{7}$"
##
## $GENENAME
## [1] " "
##
## $.Affymetrix
## [1] "(^AFFX[-_])|(^[0-9]+_([abfgilrsx]_)?([as]t)|(i))$"
##
## $.Illumina
## [1] "^ILMN_[0-9]+$"
##
## $.Agilent
## [1] "^A_[0-9]+_P[0-9]+$"
##
## $.nuID
## function (x)
## !is.na(nuIDdecode(x, error = NA))
## <environment: 0xd36c138>
idtype(def='ENTREZID')
## [1] "^[0-9]+$"
idtype(def=c('ENTREZID', 'ENSEMBLTRANS'))
## $ENTREZID
## [1] "^[0-9]+$"
##
## $ENSEMBLTRANS
## [1] "^ENST[0-9]+$"
# from GeneIdentifierType objects
idtype(NullIdentifier())
## [1] ""
idtype(AnnotationIdentifier('hgu133a.db'))
## [1] "ANNOTATION"
# but idtype applied to the annotation package itself detects the type of its primary identifiers
## Not run:
##D library(hgu133a.db)
##D idtype(hgu133a.db)
## End(Not run)
idtype("12345_at")
## [1] ".Affymetrix"
idtype(c("12345_at", "23232_at", "555_x_at"))
## [1] ".Affymetrix"
# mixed types
ids <- c("12345_at", "23232_at", "Hs.1213")
idtype(ids) # not detected
## [1] ""
idtype(ids, each=TRUE)
## 12345_at 23232_at Hs.1213
## ".Affymetrix" ".Affymetrix" "UNIGENE"