DaGO-Fun - Database for GO-based Functional Annotation Analysis

FAQ: Frequently Asked Questions for the DaGO-Fun tool

  1. What is DaGO-Fun?
  2. What makes the DaGO-Fun tool different from other GO annotation based protein analysis tools?
  3. What is the best possible semantic measure approach or model I should use for my application?
  4. What protein accession numbers and gene identifiers does DaGO-Fun accept?
  5. What file formats can be uploaded/downloaded by DaGO-Fun?
  6. Who can use DaGO-Fun?
  7. Who do I contact if I find a GO annotation error for a protein or gene?
  8. How do I cite the DaGO-Fun tool?
  9. What computing technologies are used in the DaGO-Fun tool?
10. Does the DaGO-Fun tool limit the maximum number of genes, proteins, pairs of genes or proteins, GO
      terms or pairs of GO terms in a list?
11. What are the choices of population backgrounds in the DaGO-Fun tool?
12. Which DaGO-Fun application is more suitable to answer my questions?
13. Where do I find different protein-protein functional networks mentioned in the DaGO-Fun tool?
14. From which external databases were these functional networks extracted?
15. How are scores computed in these different protein-protein functional networks?

1. What is DaGO-Fun? DaGO-Fun 11.x was originally designed as a web-based tool that integrates information content-based GO semantic similarity measures. It quantifies term specificity in the GO structure through GO term Information Content (IC), and it measures GO term semantic similarity and protein functional similarity based on GO annotations. DaGO-Fun is the largest integrated tool for IC-based GO semantic measures. As a result of continual improvement, DaGO-Fun 13.x provides an enhanced set of bioinformatics tools, in addition to protein functional similarity measurements, to systematically summarize the relevant biological patterns in a given gene or protein list. DaGO-Fun is currently an integrated set of GO-based functional analysis tools that incorporates the large amount of biological knowledge GO offers for describing genes or groups of genes. It uses term semantic similarity measurements to help explain the biological phenomena underlying experimental data. The DaGO-Fun tool is an ongoing project that evolves with its users' needs and is committed to continually addressing the challenges of GO-based protein analysis and systems biology. DaGO-Fun will keep being upgraded, and more tools are under development.

2. What makes the DaGO-Fun tool different from other GO annotation based protein analysis tools? In the context of semantic similarity, DaGO-Fun integrates an unprecedented number of semantic similarity measures behind a single interface, and users can select among them according to their preferences. On the GO-based protein analysis side, existing tools and researchers have used GO slim for tasks that require GO term comparison. However, while using a subset of GO terms or a reduced version of GO, such as GO slim, makes GO terms and annotations easier to work with when relating genes, valuable information is lost in the simplification. The DaGO-Fun tool addresses this issue by incorporating the complex dependence structure of the Gene Ontology Directed Acyclic Graph (GO-DAG) and the uncertainty in annotation data, using fuzzy expressions through semantic similarity concepts.

3. What is the best possible semantic measure approach or model I should use for my application? Click here for guidance on selecting the most suitable semantic similarity approach for your application.

4. What protein accession numbers and gene identifiers does DaGO-Fun accept? DaGO-Fun accepts only UniProt protein accessions and gene names. The DaGO-Fun team urges users to first map their protein IDs to the recommended protein accessions using the UniProt ID mapping tool. Click here to browse the UniProt ID mapping tool.

5. What file formats can be uploaded/downloaded by DaGO-Fun? DaGO-Fun accepts plain text files (*.txt) containing either a single column or two tab-delimited columns, depending on the application. When the text area is used instead of a file upload, the two columns are separated by blank spaces. Two-column input is required only when computing term semantic similarity or protein functional similarity scores. The columns contain either protein accessions or GO IDs. The file should not have a header line; if it has one, that line should start with a hash (#).
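
The input rules above can be illustrated with a minimal Python sketch. The function name parse_dagofun_input and its paired flag are hypothetical and are not part of DaGO-Fun; the sketch also assumes that blank lines are ignored and that identifiers contain no whitespace, neither of which is stated above.

```python
from typing import List, Tuple


def parse_dagofun_input(text: str, paired: bool) -> List[Tuple[str, ...]]:
    """Parse single-column or two-column input text (hypothetical helper).

    paired=True expects two identifiers per line (term or protein pairs);
    paired=False expects one identifier per line.
    """
    expected = 2 if paired else 1
    records: List[Tuple[str, ...]] = []
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        # Skip blank lines (assumption) and header lines starting with '#'.
        if not line or line.startswith("#"):
            continue
        # str.split() with no argument splits on tabs and on blank spaces,
        # covering both the file format and the text-area format.
        fields = line.split()
        if len(fields) != expected:
            raise ValueError(
                f"line {line_no}: expected {expected} column(s), got {len(fields)}"
            )
        records.append(tuple(fields))
    return records


pairs = parse_dagofun_input("# GO term pairs\nGO:0008150\tGO:0003674\n", paired=True)
```

Raising an error on a wrong column count, rather than silently skipping the line, makes malformed input visible before it is submitted.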

6. Who can use DaGO-Fun? DaGO-Fun is free for anyone to use. Please refer to the license (Terms of Use) section for more information.

7. Who do I contact if I find a GO annotation error for a protein or gene? The DaGO-Fun team strives to aggregate biological knowledge about protein GO annotations into an organized structure that allows efficient protein or gene functional analysis across genome-scale datasets. However, DaGO-Fun does not guarantee the quality or accuracy of annotation data; see the Limited Warranty and Liability section (Terms of Use) for more details. If you find annotation errors, please contact the primary source of the annotation; refer to the GO Annotation UniProt Knowledgebase (GOA-UniProtKB) for more information. If you believe the errors are due to a systematic error in DaGO-Fun's applications, please contact the DaGO-Fun Team.

8. How do I cite the DaGO-Fun tool? You can cite the preliminary paper "Gaston K. Mazandu and Nicola J. Mulder. DaGO-Fun: Tool for Gene Ontology-based functional analysis using term information content measures, 2013", which is currently under review, or the poster "Gaston K. Mazandu and Nicola J. Mulder. DaGO-Fun: Tool for Gene Ontology-based functional analysis enhanced through semantic similarity measures", presented at the Intelligent Systems for Molecular Biology conference, ISMB/ECCB 2013. Please refer to the Copyright and license section (Terms of Use) for more details.

9. What computing technologies are used in the DaGO-Fun tool? The whole system is implemented on a LAMP (Linux, Apache, MySQL and PHP/Python) platform. That is, the DaGO-Fun tool is built with free software (GNU General Public Licence): it runs on a Linux Apache server, stores its data in a relational MySQL database, and presents a web interface written in PHP and HTML. The back-end consists of a set of query processing programs implemented in Python, and dynamic pages are built with JavaScript.

10. Does the DaGO-Fun tool limit the maximum number of genes, proteins, pairs of genes or proteins, GO terms or pairs of GO terms in a list? DaGO-Fun is designed to analyse inputs efficiently within the following sizes:

  • for the IT-GOM tool, a list of up to 3000 pairs of GO IDs, UniProt protein accessions or gene names can be submitted for GO term similarity and functional similarity queries. For GO term IC, the user can enter up to 5000 GO IDs.
  • for the GOSP-FIT tool, a list of at most 20 GO IDs belonging to the same GO ontology is recommended.
  • for the GOSS-FEAT tool, a target list of at most 2000 UniProt protein accessions or gene names is recommended.
  • for the GOSP-FCT tool, a list of no more than 200 UniProt protein accessions or gene names is recommended.

Please refer to the DaGO-Fun data inputs for more details. All DaGO-Fun applications have been tested with the quantities of data specified above and have been shown to return results within a few seconds or minutes, depending on the application and the volume of data uploaded. If a run takes much longer, repeat your web call or check that your input data is correct. Note that the limits on inputs are mainly due to the computational resources available, but also to visualization constraints and algorithm complexity, for example when running hierarchical clustering in GOSP-FCT. If you have trouble, please contact the DaGO-Fun Team for help.
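
A list can be checked against these sizes before submission. The following Python sketch is only an illustration: the dictionary keys and the function within_limit are hypothetical labels, not DaGO-Fun identifiers, and the numbers are the ones quoted in this answer. It does not check that GOSP-FIT terms belong to the same GO ontology.

```python
from typing import Dict, Sequence

# Maximum or recommended list sizes quoted in the FAQ answer.
# The keys are illustrative labels chosen for this sketch.
LIMITS: Dict[str, int] = {
    "IT-GOM pairs": 3000,      # pairs of GO IDs, accessions or gene names
    "IT-GOM IC terms": 5000,   # GO IDs for term IC
    "GOSP-FIT": 20,            # GO IDs from the same ontology
    "GOSS-FEAT": 2000,         # target accessions or gene names
    "GOSP-FCT": 200,           # accessions or gene names
}


def within_limit(application: str, entries: Sequence[object]) -> bool:
    """Return True when the list size does not exceed the quoted limit."""
    return len(entries) <= LIMITS[application]


go_terms = ["GO:0008150"] * 21
ok = within_limit("GOSP-FIT", go_terms)  # 21 entries against a limit of 20
```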

11. What are the choices of population backgrounds in the DaGO-Fun tool? DaGO-Fun uses different backgrounds depending on the application selected. When computing protein functional similarity scores and clustering proteins based on their GO annotations, DaGO-Fun uses all annotated proteins contained in the GO Annotation UniProt Knowledgebase (GOA-UniProtKB) file. Remember that the choice of semantic similarity approach will significantly affect the results. Enrichment analysis and protein identification based on GO annotations use an organism-specific background, so for these applications the user must select the organism under consideration. Enrichment analysis compares the annotation composition of your gene list with that of a population background of genes. The default population background in DaGO-Fun enrichment analysis is the set of genome-wide genes with at least one annotation in the categories being analysed. This default is a good choice for studies that are genome-wide or close to genome-wide in scope. Currently, DaGO-Fun includes three organism backgrounds, namely human (Homo sapiens), Mycobacterium tuberculosis and Mycobacterium leprae, as well as user-supplied custom backgrounds within these organisms.
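
To show why the background matters, here is a minimal Python sketch of a one-sided hypergeometric enrichment test. This FAQ does not state which statistic DaGO-Fun uses, so the test itself is an assumption (it is a common choice for GO enrichment), and the function name and parameters are hypothetical.

```python
from math import comb


def enrichment_p_value(k: int, n: int, K: int, N: int) -> float:
    """Probability of seeing at least k annotated genes by chance.

    N: genes in the population background
    K: background genes annotated with the GO term
    n: genes in the user's list
    k: genes in the user's list annotated with the GO term
    """
    upper = min(n, K)
    # Sum the hypergeometric probabilities for k, k+1, ..., upper.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Same gene list and term, two hypothetical background sizes:
p_small_background = enrichment_p_value(k=8, n=50, K=40, N=1000)
p_large_background = enrichment_p_value(k=8, n=50, K=40, N=4000)
```

Because N and K enter the calculation directly, the same gene list yields different p-values under different backgrounds, which is why the organism or custom background should match the scope of the study.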

12. Which DaGO-Fun application is more suitable to answer my questions? Click here; we hope this page helps you decide.

13. Where do I find different protein-protein functional networks mentioned in the DaGO-Fun tool? They can be viewed using the PINV (Protein Interaction Network Visualizer) tool, an open-source, web-based tool with a user-friendly interface. It provides fully interactive networks with a set of tools for querying, filtering and manipulating the visible subnetwork, which facilitates the visualization of different protein-protein functional networks. The tool is publicly accessible to all researchers working on applications that involve protein analysis at the functional level.

14. From which external databases were these functional networks extracted? These interactions were derived from sequence similarity and shared domains, co-expression data, and protein interaction databases, including STRING, IntAct, DIP, Reactome, BIND, GRID and MINT, plus PATRIC and HPIDB for host-pathogen interactions. Sequences were downloaded from the Integr8 project at the European Bioinformatics Institute (EBI), and protein domains were retrieved from the InterPro database. Co-expressed proteins were derived from similar patterns of mRNA expression measured by DNA arrays, downloaded from the Stanford Microarray Database (SMD) and the NCBI Gene Expression Omnibus (GEO) database.

15. How are scores computed in these different protein-protein functional networks? Click here to explore the details in the report provided.