DaGO-Fun - Database for GO-based Functional Annotation Analysis

IT-GOM Help and Description

Welcome to the user guide for the IT-GOM tool, which computes Information Content (IC)-based semantic similarity scores from GO annotations. The tool provides a stepwise query selection menu: as the user builds a query, the menu adapts so that only the options relevant to the selections made so far remain available. This makes the IT-GOM interface easy to use and well suited to exploring GO term and protein semantic similarity scores.

1. Search step

This step allows the user to select the type of score and the ontology to which the GO terms under consideration belong. Note that in IT-GOM the three GO ontologies, Molecular Function (MF), Biological Process (BP) and Cellular Component (CC), are treated as independent. This step contains two drop-down lists: 'Measuring' (for the type of score) and 'Ontology'. Currently, the IT-GOM tool provides three types of scores: Term Information Content (IC) values, Term Semantic Similarity scores, and Protein Semantic Similarity (also called Protein Functional Similarity) scores. The 'Ontology' drop-down list is not available when measuring term IC.

2. Tool Category step

At this step, the user specifies the family of GO semantic similarity scores and how GO term IC values should be combined to produce GO term or protein semantic similarity scores. IT-GOM initially displays two drop-down lists: 'Select Family' and 'Select Approach'. It implements two main families: annotation-based and topology-based. If the user selects the annotation-based family, a checkbox appears that allows GO terms with the IEA (Inferred from Electronic Annotation) evidence code to be included or excluded. More information about GO evidence codes is available from the Gene Ontology documentation. For the topology-based family, this checkbox appears only if the user has chosen to measure protein semantic similarity scores, since a protein may be annotated with GO terms carrying the IEA evidence code. If the checkbox is not checked, GO terms with the IEA evidence code are ignored when computing GO term IC or protein semantic similarity scores.
In the 'Select Approach' drop-down list, the user chooses the method used to combine GO term IC values into semantic similarity scores. This list is not available when measuring GO term IC for the annotation-based family, or when the user chooses to compute protein semantic similarity scores directly from GO term IC (as for SimGIC, SimDIC, SimUIC and SimUI).
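
The effect of the IEA checkbox can be illustrated with a minimal Python sketch. It is not IT-GOM's own code: the Annotation record, the filter_annotations function and the sample accession are hypothetical names and values chosen for illustration, and the sketch only assumes that each annotation carries a GO evidence code.

```python
from typing import Iterable, List, NamedTuple


class Annotation(NamedTuple):
    """Hypothetical record for one protein-to-GO-term annotation."""
    protein: str   # UniProt accession (placeholder value below)
    go_id: str     # GO term identifier
    evidence: str  # GO evidence code, e.g. "IEA" or "IDA"


def filter_annotations(annotations: Iterable[Annotation],
                       include_iea: bool) -> List[Annotation]:
    """Keep every annotation if include_iea is True (checkbox checked);
    otherwise drop those whose evidence code is IEA."""
    if include_iea:
        return list(annotations)
    return [a for a in annotations if a.evidence != "IEA"]


# Illustrative data only; not taken from a real annotation file.
example = [
    Annotation("P12345", "GO:0008150", "IEA"),
    Annotation("P12345", "GO:0003674", "IDA"),
]
kept = filter_annotations(example, include_iea=False)
```

With the checkbox unchecked (include_iea=False), only the non-IEA annotation is passed on to any later IC or similarity computation.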

3. Options step

This step is available only when measuring protein semantic (functional) similarity scores with the annotation-based family. In this case, more information is needed on how to combine GO term IC values to produce these scores. This step contains two drop-down lists: 'Model' and 'Combination'. In the 'Model' drop-down list, the user indicates whether GO term IC should be used directly (Direct Term-based) or through GO term semantic similarity values (Term Similarity-based). The 'Combination' drop-down list is updated according to this choice.
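
To make the 'Direct Term-based' model more concrete, here is a minimal Python sketch of SimGIC as it is commonly described in the literature: a Jaccard index over two GO term sets in which each term is weighted by its IC. This guide does not state how IT-GOM computes it, so treat this as an illustration under that assumption. The function name, the toy term labels and the IC values are made up, and each term set is assumed to already include the ancestors of the annotated terms.

```python
from typing import Dict, Set


def sim_gic(terms_a: Set[str], terms_b: Set[str], ic: Dict[str, float]) -> float:
    """IC-weighted Jaccard index between two GO term sets.

    Assumes both sets are already closed under ancestors and that
    `ic` holds an IC value for every term in either set.
    """
    shared = terms_a & terms_b
    union = terms_a | terms_b
    denominator = sum(ic[t] for t in union)
    if denominator == 0:
        return 0.0  # no informative terms on either side
    return sum(ic[t] for t in shared) / denominator


# Toy values for illustration only.
toy_ic = {"T1": 1.0, "T2": 2.0, "T3": 3.0}
protein_a = {"T1", "T2"}
protein_b = {"T2", "T3"}
score = sim_gic(protein_a, protein_b, toy_ic)  # shared IC 2.0 over union IC 6.0
```

A 'Term Similarity-based' model would instead first compute pairwise term similarities and then combine them, which is why the 'Combination' list depends on the chosen model.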

4. User Input step and Result outputs

The user input consists of GO IDs, GO ID pairs or UniProt protein accession pairs, depending on the type of score to be computed. In response to the query, the IT-GOM tool produces a summary table on the next page of the user interface.

Measuring GO term IC scores:
In this case, the input is a list of GO IDs, aligned and pasted into the Input text area or uploaded from a file.
Output is a table with six columns as shown below:

GO ID     Ontology     Name                Status           Level   IC Score
GO ID1    B (if BP)    GO ID1 Term Name    A (if active)    5       23.66875
GO ID2    C (if CC)    GO ID2 Term Name    A (if active)    5       23.66875
GO ID3    F (if MF)    GO ID3 Term Name    A (if active)    5       23.66875

Note that in the IT-GOM tool, the level of a term is the length of the longest path (that is, the maximum number of links) from the root of the ontology down to that term. The root itself is at level 0, which is taken as the reference level. The status indicates whether a term is active (A), obsolete (O), or does not exist (N) in the ontology under consideration for the current settings. Clicking on an active GO ID displays the associated GO sub-graph using the AmiGO tool.
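
The definition of level as the longest path from the root can be sketched in Python as follows. The toy graph and the names PARENTS and level are hypothetical; a real GO graph would be loaded from an ontology file, which is not shown here.

```python
from functools import lru_cache

# Hypothetical toy DAG: each term maps to the list of its direct parents.
PARENTS = {
    "root": [],
    "a": ["root"],
    "b": ["root"],
    "c": ["a"],
    "d": ["c", "b"],  # reachable via root->a->c->d (3 links) and root->b->d (2 links)
}


@lru_cache(maxsize=None)
def level(term: str) -> int:
    """Length of the longest path from the root down to `term`.

    The root has no parents and sits at level 0.
    """
    parents = PARENTS[term]
    if not parents:
        return 0
    return 1 + max(level(p) for p in parents)
```

In this toy graph the term "d" gets level 3 rather than 2, because the longest of its paths from the root is the one that counts.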

Measuring GO term or Protein Semantic Similarity Scores:
The input and output formats are the same when measuring GO term or protein semantic similarity scores, except that for GO terms the entries are GO ID pairs and for proteins the entries are UniProt protein accession pairs. The pairs are aligned one per line, and the two members of each pair are separated by spaces or a tab, as in the following example:
Concept1   Concept2
Concept1   Concept3
Concept4   Concept5
Concept3   Concept5
Outputs are tables with three columns in the following format:

Concept Source    Concept Dest    Score

Each concept is linked to its source database so that its features can be viewed: proteins are linked to the UniProt database and GO terms to the GO database via the AmiGO tool. More details for a given pair can be displayed by selecting the concept pair of interest and clicking the view button at the bottom of the table. On the resulting page, clicking on a specific concept gives access to information about it: proteins are linked to the QuickGO tool, and GO terms to the AmiGO tool for viewing the GO sub-DAG of the term.
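
For users who prepare input files programmatically, the pair format described above (one pair per line, space- or tab-separated) can be checked with a short Python sketch before submission. The function name parse_pairs is hypothetical and is not part of IT-GOM; skipping blank lines is an assumption made here for convenience.

```python
from typing import List, Tuple


def parse_pairs(text: str) -> List[Tuple[str, str]]:
    """Parse lines of the form 'Concept1<space or tab>Concept2'.

    Blank lines are skipped; any other line that does not contain
    exactly two fields raises ValueError.
    """
    pairs = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()  # splits on any run of spaces or tabs
        if not fields:
            continue
        if len(fields) != 2:
            raise ValueError(
                f"line {line_number}: expected 2 fields, got {len(fields)}"
            )
        pairs.append((fields[0], fields[1]))
    return pairs


raw = "Concept1\tConcept2\nConcept1   Concept3\n\nConcept4 Concept5\n"
pairs = parse_pairs(raw)
```

The same check applies to GO ID pairs and to UniProt accession pairs, since both use the same layout.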

5. Important note on input limits:

We aim to let the IT-GOM tool compute results for as many user inputs as possible. However, because computational resources are limited, each query is capped in the number of GO terms, GO term pairs and protein pairs it may contain. When computing GO term IC scores, the maximum is 5000 GO terms; the tool displays the results 10 at a time, and the features of all 5000 GO terms can be viewed by downloading them as a text file. For GO term semantic similarity scores, the user can enter at most 3000 pairs per query, and likewise at most 3000 pairs per query for protein semantic similarity scores. Entries beyond these limits are ignored. If your data exceed these limits, you need to divide the input data, run the IT-GOM tool on each part separately, and merge the results at the end. Alternatively, you can contact the administrators, who are willing to collaborate and run large data sets for analysis.
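
The divide-and-merge workaround can be prepared with a small Python sketch. The helper names are hypothetical; the default batch size of 3000 matches the pair limit stated above (use 5000 for GO term IC queries). Submitting each batch to IT-GOM is done through the web interface and is not shown.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_batches(entries: Sequence[T], batch_size: int = 3000) -> List[List[T]]:
    """Split the input into consecutive batches of at most `batch_size` entries."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [list(entries[i:i + batch_size])
            for i in range(0, len(entries), batch_size)]


def merge_results(batch_results: Sequence[Sequence[T]]) -> List[T]:
    """Concatenate per-batch result rows, preserving the original order."""
    merged: List[T] = []
    for rows in batch_results:
        merged.extend(rows)
    return merged


# Illustrative input: 7000 made-up pairs give batches of 3000, 3000 and 1000.
all_pairs = [(f"A{i}", f"B{i}") for i in range(7000)]
batches = split_into_batches(all_pairs)
```

Each batch can then be saved to its own file, uploaded as a separate query, and the downloaded result tables concatenated with merge_results.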

For more information, please refer to the associated publication: "Gaston K. Mazandu and Nicola J. Mulder. DaGO-Fun: Tool for Gene Ontology-based functional analysis using term information content measures, 2013", DaGO-Fun preliminary paper currently under review.