DaGO-Fun - Database for GO-based Functional Annotation Analysis

About the GOSP-FIT Tool

GOSP-FIT identifies candidate genes, that is, genes or proteins involved in a process similar to the processes underlying a specific biological phenomenon (a disorder or disease), using GO semantic similarity approaches. A wide range of approaches to candidate gene identification exists, but these approaches are impeded either by the nature of the biological phenomenon or by the sample size of people affected, and they are computationally expensive. Currently, several genes and proteins are annotated with Gene Ontology (GO) terms, and these annotations can be used to identify candidate genes or proteins given the biological processes underlying the phenomenon under study. This model relies only on knowledge of the functional characteristics of genes and proteins, and is more effective and economical for candidate gene identification, especially for complex phenomena.

1. Background or Reference Data

The DaGO-Fun protein identification tool uses an organism-specific background or reference, so the user must select the organism under consideration. Currently, DaGO-Fun includes three population backgrounds: human (Homo sapiens), Mycobacterium tuberculosis and Mycobacterium leprae.

2. About Term Semantic Similarity Measures and Fuzziness

The GOSP-FIT tool supports all the current IC-based GO semantic similarity measures that we are aware of. These comprise topology-based approaches:

  • GO-universal metric
  • Wang et al. approach
  • Zhang et al. approach

and annotation-based approaches, including:

  • Classic Resnik and Lin approaches and their possible enhancement XGraSM (eXtended Graph-based Semantic Similarity Measure), and
  • Enhancements of the Lin measure comprising Relevance (SimRel) and Li et al. (SimIC) approaches.
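
As an illustration of the annotation-based family listed above, here is a minimal sketch of the classic Resnik and Lin measures in their textbook form. The toy ontology, the term names (root, A, B, A1, A2) and the annotation counts are hypothetical and chosen only to keep the arithmetic easy to follow. This is not the DaGO-Fun implementation.

```python
import math

# Hypothetical toy ontology (not real GO terms): term -> set of parent terms.
PARENTS = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "A1": {"A"},
    "A2": {"A"},
}

# Hypothetical number of genes annotated directly to each term.
DIRECT_COUNTS = {"root": 0, "A": 1, "B": 4, "A1": 2, "A2": 1}


def ancestors(term):
    """Return the term itself together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = -log p(t), with p(t) the share of annotations at t or below it."""
    cumulative = dict.fromkeys(PARENTS, 0)
    for term, count in DIRECT_COUNTS.items():
        for anc in ancestors(term):
            cumulative[anc] += count
    total = cumulative["root"]
    # log(total / c) equals -log(c / total); unannotated terms are skipped.
    return {t: math.log(total / c) for t, c in cumulative.items() if c > 0}


IC = information_content()


def resnik(a, b):
    """Resnik: IC of the most informative common ancestor of a and b."""
    return max(IC[t] for t in ancestors(a) & ancestors(b))


def lin(a, b):
    """Lin: Resnik score normalised by the IC of the two terms."""
    denom = IC[a] + IC[b]
    return 2 * resnik(a, b) / denom if denom > 0 else 0.0
```

With these toy counts, lin("A1", "A2") is approximately 0.4, because the two terms share the ancestor A, while resnik("A1", "B") is 0, because their only common ancestor is the root.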

Note that the outputs of a given query depend on a chosen agreement score or level: the genes or proteins that GOSP-FIT identifies are involved in processes underlying a specific biological phenomenon at that agreement level. The membership degree of a process underlying the phenomenon under study in the set of GO terms annotating the candidate gene is modeled using term semantic similarity measures.
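
The role of the agreement level can be sketched as follows. This is a minimal illustration under stated assumptions, not the GOSP-FIT algorithm. It assumes that the membership degree of a process is its best similarity to any GO term annotating the gene, and that a gene is retained when this degree reaches the agreement level for every query process. The text above does not specify either aggregation rule. The function names, gene names, term names and similarity values are all hypothetical.

```python
# Hypothetical pairwise term similarities in [0, 1]; any term semantic
# similarity measure could be plugged in instead of this lookup table.
TOY_SIM = {("P1", "P2"): 0.4, ("P1", "P3"): 0.7, ("P2", "P3"): 0.1}

# Hypothetical gene -> set of annotating terms.
ANNOTATIONS = {"geneX": {"P3"}, "geneY": {"P2"}, "geneZ": {"P1", "P2"}}


def toy_sim(a, b):
    """Symmetric lookup; a term is fully similar to itself."""
    if a == b:
        return 1.0
    return TOY_SIM.get((a, b), TOY_SIM.get((b, a), 0.0))


def membership(process, gene_terms, sim):
    """Membership degree of `process` in the fuzzy set of a gene's terms.

    Assumption: the best-matching annotated term defines the degree.
    """
    return max((sim(process, t) for t in gene_terms), default=0.0)


def candidate_genes(processes, annotations, sim, agreement):
    """Return {gene: score} for genes reaching `agreement` on every process.

    Assumption: the score is the lowest membership degree over the processes.
    """
    hits = {}
    for gene, terms in annotations.items():
        degrees = [membership(p, terms, sim) for p in processes]
        score = min(degrees) if degrees else 0.0
        if score >= agreement:
            hits[gene] = score
    return hits


result = candidate_genes(["P1"], ANNOTATIONS, toy_sim, agreement=0.5)
```

Here result contains geneX (score 0.7) and geneZ (score 1.0). Lowering the agreement level to 0.4 would also admit geneY, which shows how the output of a query varies with the chosen level.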

For more information, please refer to the associated publication: Gaston K. Mazandu and Nicola J. Mulder, "DaGO-Fun: Tool for Gene Ontology-based functional analysis using term information content measures", 2013 (DaGO-Fun preliminary paper, currently under review).