DaGO-Fun - Database for GO-based Functional Annotation Analysis

About the GOSP-FCT Tool

Genome sequencing projects and high-throughput technologies have yielded complete genome sequences and functional genomics data for human and several other organisms, including important microbial pathogens of humans, animals and plants. Many genes and proteins are now annotated with Gene Ontology (GO) terms, and these annotations can be used to detect groups of functionally related proteins. Identifying gene or protein clusters from their functional annotations is likely to provide an effective approach for analyzing complex biological phenomena, by revealing meaningful patterns in gene or protein datasets and improving their biological interpretation. The GOSP-FCT tool models a given list of proteins or genes as a graph, or functional map, whose edges are weighted by the functional similarity scores between these proteins, making it easy to apply automated clustering methods to detect protein complexes or other biologically significant functional groupings.

1. Background or Reference Data and Clustering Approaches

The DaGO-Fun clustering tool uses all annotated proteins contained in the GO Annotation UniProt Knowledgebase (GOA-UniProtKB) file as background data, both when computing the protein functional similarity scores that weight the functional map and when clustering proteins based on their GO annotations. Keep in mind that the choice of semantic similarity approach significantly affects the output.
GOSP-FCT implements three clustering approaches: hierarchical clustering, graph spectral or k-means clustering, and the community detection model, which is referred to as a model-based approach.
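
To make the first of these approaches concrete, here is a minimal sketch of agglomerative hierarchical clustering driven by pairwise similarity scores. It is an illustration only, not the GOSP-FCT implementation: the average-linkage rule, the `min_similarity` stopping threshold, the function name and the toy scores are all assumptions introduced for this example.

```python
def average_linkage(sim, proteins, min_similarity):
    """Agglomerative clustering: repeatedly merge the two most similar
    clusters until no pair reaches min_similarity (hypothetical cutoff)."""
    clusters = [[p] for p in proteins]

    def cluster_sim(c1, c2):
        # Average-linkage: mean pairwise similarity between two clusters.
        total = sum(sim[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best_score, i, j = max(
            (cluster_sim(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if best_score < min_similarity:
            break
        merged = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so delete j first to keep index i valid
        del clusters[i]
        clusters.append(merged)
    return clusters


# Hypothetical functional similarity scores for four proteins.
proteins = ["P1", "P2", "P3", "P4"]
sim = {frozenset(pair): 0.1 for pair in
       [("P1", "P3"), ("P1", "P4"), ("P2", "P3"), ("P2", "P4")]}
sim[frozenset(("P1", "P2"))] = 0.9
sim[frozenset(("P3", "P4"))] = 0.8

clusters = average_linkage(sim, proteins, min_similarity=0.5)
```

With these toy scores, P1 and P2 end up in one cluster and P3 and P4 in another, because the average similarity between the two groups falls below the cutoff.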

2. Functional Similarity Measures and Functional Maps

GOSP-FCT sets relationships (edges) between genes or proteins in the functional map (or graph) using their functional similarity scores. Specifically, fuzzy logic is used to model the occurrence of a given annotation in the corpus under consideration through semantic similarity between terms. The GOSP-FCT tool supports all the current information content (IC)-based GO semantic similarity measures that we are aware of, namely topology-based approaches:

  • GO-universal metric under BMA model
  • Wang et al. approach under ABM model
  • Zhang et al. approach under BMA model
and annotation-based approaches, including:
  • All Direct Term-based approaches
  • Classic Resnik and Lin approaches and their possible enhancement, XGraSM (eXtended Graph-based Semantic Similarity Measure)
  • Enhancements of the Lin measure, comprising the Relevance (SimRel) and Li et al. (SimIC) approaches, under different Term Semantic-based Models
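
As an illustration of how annotation-based measures of this kind work, the sketch below computes the classic Resnik and Lin term similarities from information content and combines them into a protein-level score with a best-match average. The toy ontology, the term probabilities and all function names are hypothetical, and the averaging formula shown is the form commonly used in the literature; the exact definitions used by GOSP-FCT may differ.

```python
import math

# Hypothetical toy ontology: each term maps to its set of parent terms.
PARENTS = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "A1": {"A"},
    "A2": {"A"},
}
# Hypothetical probability that a term (or one of its descendants)
# annotates a protein in the background corpus.
PROB = {"root": 1.0, "A": 0.5, "B": 0.5, "A1": 0.25, "A2": 0.25}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen, stack = {term}, [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def ic(term):
    """Information content: rarer terms are more informative."""
    return -math.log(PROB[term])


def resnik(t1, t2):
    """IC of the most informative common ancestor."""
    return max(ic(t) for t in ancestors(t1) & ancestors(t2))


def lin(t1, t2):
    """Resnik score normalised by the IC of both terms."""
    denom = ic(t1) + ic(t2)
    # Both terms are the root: treated as 0 here, a convention for this sketch.
    return 2 * resnik(t1, t2) / denom if denom > 0 else 0.0


def best_match_average(terms_a, terms_b, sim):
    """Average, in both directions, of each term's best match."""
    best_a = [max(sim(a, b) for b in terms_b) for a in terms_a]
    best_b = [max(sim(a, b) for a in terms_a) for b in terms_b]
    return 0.5 * (sum(best_a) / len(best_a) + sum(best_b) / len(best_b))


# Functional similarity between two hypothetical proteins.
score = best_match_average(["A1"], ["A2", "B"], lin)
```

Here `lin("A1", "A2")` evaluates to about 0.5 because the two terms share the moderately informative ancestor `A`, while `lin("A1", "B")` is 0 because the only common ancestor is the root.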

Note that the output of a given query depends on the agreement level chosen. In GOSP-FCT, an edge (relationship) is set between two proteins only when the functional similarity score between them is greater than or equal to the agreement level selected by the user.
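
The thresholding rule described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `build_functional_map`, the dictionary-based graph representation and the example scores are hypothetical and are not taken from the tool itself.

```python
def build_functional_map(scores, agreement_level):
    """Build a weighted graph as a dict: protein -> {neighbour: score}.

    An edge is kept only when the functional similarity score is
    greater than or equal to the agreement level.
    """
    graph = {}
    for (a, b), score in scores.items():
        # Every protein becomes a vertex, even if it ends up isolated.
        graph.setdefault(a, {})
        graph.setdefault(b, {})
        if score >= agreement_level:
            graph[a][b] = score
            graph[b][a] = score
    return graph


# Hypothetical pairwise functional similarity scores.
scores = {
    ("P1", "P2"): 0.82,
    ("P1", "P3"): 0.30,
    ("P2", "P3"): 0.50,
}
fmap = build_functional_map(scores, agreement_level=0.5)
```

With an agreement level of 0.5, the P1-P2 and P2-P3 edges are kept (the latter because the comparison is inclusive), while the P1-P3 edge is dropped. Raising the level makes the functional map sparser, which in turn changes the clusters found on it.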

For more information, please refer to the associated publication: Gaston K. Mazandu and Nicola J. Mulder, "DaGO-Fun: Tool for Gene Ontology-based functional analysis using term information content measures", 2013 (DaGO-Fun preliminary paper, currently under review).