DaGO-Fun - Database for GO-based Functional Annotation Analysis

About the IT-GOM Tool

IT-GOM is a tool that produces Information Content (IC) based semantic similarity scores computed from Gene Ontology (GO) annotations. The tool is currently based on the GO and GOA-UniProtKB data of 15 April 2014. Several GO term semantic similarity approaches have been introduced, enabling efficient exploitation of the enormous corpus of biological knowledge embedded in the GO DAG (Directed Acyclic Graph). The main use of these GO term similarity approaches is to compute the functional similarity between proteins or genes from their GO annotations, for analyses at the functional level. Several measures have been implemented to assess these similarity values, and each measure performs differently for different applications. IT-GOM arose from the need to provide a tool that integrates all the current IC-based semantic similarity measures, allowing researchers to explore different measures and to choose the approach most relevant to their specific applications.

1. GO term Semantic Similarity Approaches

The IT-GOM tool supports all the current IC-based GO semantic similarity measures that we are aware of. The topology-based approaches are:

  • the GO-universal metric introduced by Mazandu and Mulder,
  • the Wang et al. approach, and
  • the Zhang et al. approach.

The annotation-based approaches include:

  • the classic Resnik, Lin and Nunivers approaches and their possible enhancement with XGraSM (eXtended Graph-based Semantic Similarity Measure), and
  • enhancements of the Lin measure, comprising the Relevance (SimRel) and Li et al. (SimIC) approaches.

Note that the Jiang & Conrath approach is grouped under the Lin measure label because it is simply the non-normalized version of the Lin approach, and none of the other normalization schemes proposed so far has improved its performance. Furthermore, IT-GOM implements XGraSM (eXtended GraSM), which considers all informative common ancestors (ICA) of two terms rather than only the disjunctive common ancestors (DCA) used by classic GraSM. The performance of XGraSM, however, has yet to be evaluated.
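To make the annotation-based family more concrete, the following is a minimal sketch of how the Resnik, Lin, Nunivers, Jiang & Conrath, SimRel and SimIC scores are commonly defined from term IC values. It is not the IT-GOM implementation: the toy DAG, the term names and the annotation probabilities are hypothetical, and the formulas follow the definitions usually given in the literature, using the most informative common ancestor (MICA) of the two terms.

```python
import math

# Hypothetical toy DAG (child -> parents); these are not real GO terms.
PARENTS = {
    "root": set(),
    "a": {"root"},
    "b": {"root"},
    "c": {"a"},
    "d": {"a", "b"},
}

# Assumed annotation probabilities p(t); IC(t) = -ln p(t).
PROB = {"root": 1.0, "a": 0.5, "b": 0.4, "c": 0.1, "d": 0.2}
IC = {t: math.log(1.0 / p) for t, p in PROB.items()}


def ancestors(term):
    """Return the term itself together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def mica_ic(t1, t2):
    """IC of the most informative common ancestor of t1 and t2."""
    return max(IC[t] for t in ancestors(t1) & ancestors(t2))


def resnik(t1, t2):
    return mica_ic(t1, t2)


def lin(t1, t2):
    denom = IC[t1] + IC[t2]
    # Convention chosen for this sketch: two uninformative terms score 0.
    return 2.0 * mica_ic(t1, t2) / denom if denom > 0 else 0.0


def nunivers(t1, t2):
    denom = max(IC[t1], IC[t2])
    return mica_ic(t1, t2) / denom if denom > 0 else 0.0


def jiang_conrath_distance(t1, t2):
    """Non-normalized counterpart of Lin, expressed as a distance."""
    return IC[t1] + IC[t2] - 2.0 * mica_ic(t1, t2)


def sim_rel(t1, t2):
    """Relevance measure: Lin weighted by 1 - p(MICA)."""
    return lin(t1, t2) * (1.0 - math.exp(-mica_ic(t1, t2)))


def sim_ic(t1, t2):
    """Li et al. measure: Lin weighted by 1 - 1 / (1 + IC(MICA))."""
    return lin(t1, t2) * (1.0 - 1.0 / (1.0 + mica_ic(t1, t2)))
```

In this toy graph the terms "c" and "d" share the ancestors "a" and "root", so every score above is driven by the IC of "a". XGraSM is not sketched here, since it replaces the single MICA with a summary over all informative common ancestors.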

2. Functional Similarity Measures

In the previous version of the DaGO-Fun tool, each topology-based approach was implemented only with the functional similarity measure suggested by the authors of that approach. In this version, the topology-based term similarity approaches are implemented with all known protein functional similarity measures. Each annotation-based approach is implemented with four widely used functional similarity measures, which combine the GO term similarity scores supported by the IT-GOM tool: average (Avg), maximum (Max), best match average (BMA) and averaging best matches (ABM). In addition, we include the SimGIC model, based on the Jaccard index, which uses the IC of the terms directly to compute functional similarity. The IT-GOM tool also implements two other functional similarity measures that use the term IC directly: SimDIC (a Czekanowski or Lin-like measure), based on the Dice index, and SimUIC, based on the universal index.
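The sketch below illustrates how these functional similarity measures are commonly defined; it is not the IT-GOM code. The pairwise measures (Avg, Max, BMA, ABM) take a matrix of term similarity scores in which rows correspond to the terms of one protein and columns to the terms of the other. The direct measures (SimGIC, SimDIC, SimUIC) take the two proteins' term sets, assumed here to already include ancestor terms, and a dictionary of term IC values. All function names, and the "universal index" written as the shared IC divided by the larger of the two IC sums, are assumptions made for illustration.

```python
def avg(matrix):
    """Average of all pairwise term similarity scores."""
    return sum(map(sum, matrix)) / (len(matrix) * len(matrix[0]))


def maximum(matrix):
    """Highest pairwise term similarity score."""
    return max(map(max, matrix))


def _best_matches(matrix):
    """Best score for each row term and for each column term."""
    row_best = [max(row) for row in matrix]
    col_best = [max(col) for col in zip(*matrix)]
    return row_best, col_best


def bma(matrix):
    """Best match average: mean of the two per-protein best-match means."""
    rows, cols = _best_matches(matrix)
    return 0.5 * (sum(rows) / len(rows) + sum(cols) / len(cols))


def abm(matrix):
    """Averaging best matches: pooled mean over all best matches."""
    rows, cols = _best_matches(matrix)
    return (sum(rows) + sum(cols)) / (len(rows) + len(cols))


def _ic_sum(terms, ic):
    return sum(ic[t] for t in terms)


def sim_gic(a, b, ic):
    """Jaccard-like index: shared IC over the IC of the union."""
    union = _ic_sum(a | b, ic)
    return _ic_sum(a & b, ic) / union if union > 0 else 0.0


def sim_dic(a, b, ic):
    """Dice-like index: twice the shared IC over the summed IC of both sets."""
    total = _ic_sum(a, ic) + _ic_sum(b, ic)
    return 2.0 * _ic_sum(a & b, ic) / total if total > 0 else 0.0


def sim_uic(a, b, ic):
    """Universal-like index: shared IC over the larger of the two IC sums."""
    largest = max(_ic_sum(a, ic), _ic_sum(b, ic))
    return _ic_sum(a & b, ic) / largest if largest > 0 else 0.0


# Hypothetical inputs: a 3 x 2 term similarity matrix and toy IC values.
scores = [[1.0, 0.2], [0.4, 0.6], [0.0, 0.8]]
toy_ic = {"root": 0.0, "a": 1.0, "b": 2.0, "c": 3.0}
terms_p = {"root", "a", "c"}
terms_q = {"root", "a", "b"}

print(avg(scores), maximum(scores), bma(scores), abm(scores))
print(sim_gic(terms_p, terms_q, toy_ic),
      sim_dic(terms_p, terms_q, toy_ic),
      sim_uic(terms_p, terms_q, toy_ic))
```

BMA and ABM differ only in how the best matches are pooled: BMA averages each protein's best matches separately before combining them, whereas ABM pools all best matches first, so the two agree whenever both proteins have the same number of terms.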

For more information, please refer to the associated publication, the DaGO-Fun preliminary paper: Gaston K. Mazandu and Nicola J. Mulder, "DaGO-Fun: Tool for Gene Ontology-based functional analysis using term information content measures", BMC Bioinformatics, 2013.