|DaGO-Fun - Database for GO-based Functional Annotation Analysis|
|About the GOSS-FEAT Tool|
High-throughput biology technologies, such as microarray and proteomic experiments, usually produce long gene lists, and dedicated bioinformatics tools are needed to retrieve relevant information from them. Exploring Gene Ontology (GO) annotations of gene lists relevant to particular biological conditions has become a widespread practice for gaining initial insights into the biological meaning of these experiments. This strategy, referred to as enrichment analysis, enables the identification of the biological processes most pertinent to the phenomena being studied. Several approaches have contributed to gene functional analysis for high-throughput biological studies by systematically mapping a large number of interesting genes in a list to biological processes and statistically highlighting the most over-represented, or enriched, ones. However, none of these approaches has attempted to account for uncertainty in the gene annotations of the dataset under consideration. A gene may be annotated with a general GO term rather than a more specific one because more complete biological knowledge about the gene is lacking, which introduces uncertainty into the GO annotations. The GOSS-FEAT tool incorporates both the complex dependence structure of the GO directed acyclic graph (DAG) and the uncertainty in annotation data, using fuzzy expressions based on term semantic similarity measures.
1. Background or Reference Data
The DaGO-Fun enrichment analysis uses an organism-specific background (reference) set, so the user must select the organism under consideration. Enrichment analysis compares the annotation composition of your gene list with that of a population of background genes. By default, the DaGO-Fun population background consists of all genes in the organism's genome that have at least one annotation in the categories being analyzed. This default is a good choice for studies of genome-wide or near genome-wide scope. Currently, DaGO-Fun includes three population backgrounds, namely human (Homo sapiens), Mycobacterium tuberculosis and Mycobacterium leprae, and it also accepts a user-supplied customized background within these organisms.
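The background selection described above can be illustrated with a minimal Python sketch. This is not DaGO-Fun's code: the data layout (a dictionary mapping each gene to its GO terms per category), the category labels, and the function names `default_background` and `custom_background` are all hypothetical, chosen only to show the idea of keeping genes with at least one annotation in the category being analyzed.

```python
# Hypothetical layout: gene -> {category -> set of GO term IDs}.
# The category labels "BP" and "MF" are illustrative assumptions.
annotations = {
    "geneA": {"BP": {"GO:0008150"}, "MF": set()},
    "geneB": {"BP": set(), "MF": {"GO:0003674"}},
    "geneC": {"BP": {"GO:0009987"}},
}


def default_background(annotations, category):
    """Return the genes with at least one annotation in `category`."""
    return {gene for gene, by_cat in annotations.items() if by_cat.get(category)}


def custom_background(user_genes, annotations, category):
    """Restrict a user-supplied gene list to annotated genes of the organism.

    Assumption: genes unknown to the organism, or without an annotation
    in `category`, are dropped from the customized background.
    """
    return set(user_genes) & default_background(annotations, category)


bp_background = default_background(annotations, "BP")
custom = custom_background(["geneA", "geneB", "geneX"], annotations, "BP")
```

In this toy data, `geneB` is excluded from the "BP" background because it has no annotation in that category, and `geneX` is excluded from the customized background because it is not a known gene of the organism.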
2. About Term Semantic Similarity Measures and Fuzziness
GOSS-FEAT handles relationships between terms through their semantic similarity scores, so that all specific and meaningful annotations that may be relevant to the experiment are included. This is expected to improve the efficiency of enrichment analysis by overcoming the restrictions of the Boolean logic model used in previous approaches, which cannot handle uncertainty in gene annotation data. Specifically, we use fuzzy logic to model the occurrence of a given annotation in the corpus under consideration through semantic similarity between terms. The GOSS-FEAT tool supports all the current information content (IC)-based GO semantic similarity measures that we are aware of, namely topology-based approaches:
The fuzzy concept reflects the fact that the results of a given query are a function of a certain agreement score or level. In GOSS-FEAT, the frequency of occurrence of a term through a gene or protein is a fuzzy frequency, modeled using the GO similarity score between that term in the ontology under consideration and the set of GO terms annotating the gene.
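To make the idea of a fuzzy frequency concrete, here is a minimal Python sketch. It rests on assumptions that this page does not confirm: the similarity between a term and a gene's annotation set is taken as the best (maximum) pairwise similarity, genes whose score falls below the agreement level contribute nothing, and the toy similarity table `TOY_SIM` is invented for illustration. The actual GOSS-FEAT formulas may differ; see the associated publication.

```python
# Invented pairwise similarity scores between placeholder terms T1..T3.
TOY_SIM = {
    frozenset(("T1", "T2")): 0.8,
    frozenset(("T1", "T3")): 0.3,
}


def toy_sim(a, b):
    """Toy symmetric similarity in [0, 1]; identical terms score 1.0."""
    if a == b:
        return 1.0
    return TOY_SIM.get(frozenset((a, b)), 0.0)


def gene_membership(term, gene_terms, sim):
    """Degree to which a gene is treated as annotated with `term`.

    Assumption: best similarity between `term` and any term annotating
    the gene; 0.0 for a gene with no annotations.
    """
    return max((sim(term, t) for t in gene_terms), default=0.0)


def fuzzy_frequency(term, gene_annotations, sim, agreement=0.0):
    """Sum of gene memberships that reach the agreement level (assumed rule)."""
    total = 0.0
    for terms in gene_annotations.values():
        membership = gene_membership(term, terms, sim)
        if membership >= agreement:
            total += membership
    return total


genes = {"g1": {"T1"}, "g2": {"T2"}, "g3": {"T3"}, "g4": set()}

# Boolean model: only genes annotated with exactly T1 are counted.
boolean_count = sum(1 for terms in genes.values() if "T1" in terms)

fuzzy_all = fuzzy_frequency("T1", genes, toy_sim)
fuzzy_strict = fuzzy_frequency("T1", genes, toy_sim, agreement=0.5)
```

In this toy example the Boolean count sees only `g1`, whereas the fuzzy frequency also credits `g2` and `g3` in proportion to how similar their annotations are to `T1`. Raising the agreement level to 0.5 drops the weak contribution from `g3`.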
For more information, please refer to the associated publication: Gaston K. Mazandu and Nicola J. Mulder, "DaGO-Fun: Tool for Gene Ontology-based functional analysis using term information content measures", 2013 (DaGO-Fun preliminary paper, currently under review).