|DaGO-Fun - Database for GO-based Functional Annotation Analysis|
|FAQ: Frequently Asked Questions for DaGO-Fun tool|
1. What is DaGO-Fun? DaGO-Fun 11.x was originally designed as a web-based online tool integrating information content (IC)-based GO semantic similarity measures. These measures quantify term specificity in the GO structure through GO term IC, measure GO term semantic similarity, and measure protein functional similarity based on GO annotations. DaGO-Fun provides the largest integrated tool for IC-based GO semantic measures. As a result of continual improvement, DaGO-Fun 13.x provides an enhanced set of bioinformatics tools that, in addition to measuring protein functional similarity, systematically summarize the relevant biological patterns in a given gene or protein list. DaGO-Fun is now an integrated set of GO-based functional analysis tools that draws on the large body of biological knowledge GO offers for describing genes or groups of genes, and it uses term semantic similarity measurements to help understand the biological phenomena underlying experimental data. DaGO-Fun is an ongoing project that evolves with its users' needs and demands and is committed to continually addressing the challenges of GO-based protein analysis and systems biology. DaGO-Fun will keep being upgraded, and more tools are under development.
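This FAQ does not list the individual measures DaGO-Fun implements. Purely as an illustration of what "IC-based" means, the sketch below computes annotation-based IC, IC(t) = -log p(t), on a hypothetical toy DAG (all term IDs, parents and annotations are invented) and then one well-known IC-based term similarity, Resnik's, which is the IC of the most informative common ancestor. It is not DaGO-Fun's own code.

```python
import math

# Hypothetical toy GO-DAG: term -> direct parents (invented IDs, not real GO terms).
PARENTS = {
    "GO:root": [],
    "GO:A": ["GO:root"],
    "GO:B": ["GO:root"],
    "GO:C": ["GO:A"],
    "GO:D": ["GO:A", "GO:B"],
}

# Hypothetical direct annotations: protein -> set of annotated terms.
ANNOTATIONS = {
    "P1": {"GO:C"},
    "P2": {"GO:D"},
    "P3": {"GO:B"},
    "P4": {"GO:C", "GO:D"},
}


def ancestors(term):
    """Return the term itself plus all of its ancestors in the DAG."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen


def information_content():
    """IC(t) = -log p(t), with p(t) the fraction of proteins annotated to t
    or to any of its descendants (annotations propagated up the DAG)."""
    counts = {t: 0 for t in PARENTS}
    for terms in ANNOTATIONS.values():
        # Each protein is counted at most once per term after propagation.
        propagated = set().union(*(ancestors(t) for t in terms))
        for t in propagated:
            counts[t] += 1
    n = len(ANNOTATIONS)
    return {t: -math.log(c / n) for t, c in counts.items() if c > 0}


def resnik(t1, t2, ic):
    """Resnik similarity: IC of the most informative common ancestor."""
    return max(ic[t] for t in ancestors(t1) & ancestors(t2))


ic = information_content()
print(round(resnik("GO:C", "GO:D", ic), 4))  # prints 0.2877 (shared ancestor GO:A)
```

Deeper, less frequently annotated terms get higher IC, so two terms whose only shared ancestor is the root score 0, while terms sharing a specific ancestor score higher.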
2. What makes the DaGO-Fun tool different from other GO annotation-based protein analysis tools? In the context of semantic similarity, DaGO-Fun integrates an unprecedented number of semantic similarity measures in a single interface, from which users can choose according to their preferences. On the GO-based protein analysis side, existing tools and researchers have used GO slim to perform tasks that require GO term comparison. However, while using a subset of GO terms or a reduced version of GO, such as GO slim, to relate genes makes GO terms and annotations easier to work with, valuable information is lost in the simplification. DaGO-Fun addresses this issue by incorporating the complex dependence structure of the Gene Ontology Directed Acyclic Graph (GO-DAG) and the uncertainty in annotation data, using fuzzy expressions through semantic similarity concepts.
3. What is the best semantic similarity approach or model to use for my application? Click here for guidance on selecting the most suitable semantic similarity approach for your application.
4. Which protein accession numbers and gene identifiers does DaGO-Fun accept? DaGO-Fun accepts only UniProt protein accessions and gene names. The DaGO-Fun Team urges users to first map their protein IDs to the recommended protein accessions using the UniProt ID mapping tool. Click here to browse the UniProt ID mapping tool.
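Before uploading, it can help to check which identifiers already look like UniProt accessions and which still need mapping. The sketch below uses the accession pattern described in UniProt's documentation (6- or 10-character accessions, no isoform suffix); the function name `split_identifiers` is hypothetical, and a non-matching token may still be a valid gene name, so this is only a pre-check, not DaGO-Fun's own validation.

```python
import re

# Accession pattern as described in the UniProt documentation (assumed current).
UNIPROT_ACC = re.compile(
    r"(?:[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})"
)


def split_identifiers(ids):
    """Partition identifiers into UniProt-like accessions and everything else.

    'Everything else' may be gene names (also accepted) or other IDs that
    should first be converted with the UniProt ID mapping tool.
    """
    accessions, others = [], []
    for raw in ids:
        token = raw.strip()
        if not token:
            continue  # skip blank entries
        if UNIPROT_ACC.fullmatch(token):
            accessions.append(token)
        else:
            others.append(token)
    return accessions, others


acc, other = split_identifiers(["P04637", "TP53", "ENSG00000141510"])
print(acc, other)
```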
5. Which file formats can be uploaded to or downloaded from DaGO-Fun? Depending on the application, DaGO-Fun accepts plain text files (*.txt) containing either a single column or two tab-delimited columns (entries may instead be separated by blank spaces when using the text area). Two-column data are required only when computing term semantic similarity or protein functional similarity scores; the columns then contain either protein accessions or GO IDs. Files should not have a header line; if one is present, it should start with a hash (#).
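The two-column input rules above can be sketched as a small parser. This is an illustration under stated assumptions, not DaGO-Fun's actual reader: the function name `parse_pairs` and the `from_text_area` flag are hypothetical, lines starting with `#` are treated as headers and skipped, and blank lines are ignored.

```python
def parse_pairs(text, from_text_area=False):
    """Parse two-column input into (left, right) pairs.

    Assumptions (sketch only): '#' lines are headers, blank lines are skipped,
    columns are tab-delimited in files and whitespace-delimited when pasted
    into the text area.
    """
    pairs = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split() if from_text_area else line.split("\t")
        fields = [f.strip() for f in fields]
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected 2 columns, got {len(fields)}")
        pairs.append((fields[0], fields[1]))
    return pairs


sample = "# term1\tterm2\nGO:0008150\tGO:0003674\n"
print(parse_pairs(sample))
```

Failing loudly on a line with the wrong number of columns makes a space-delimited file uploaded as tab-delimited easy to spot.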
10. Does DaGO-Fun limit the maximum number of genes, proteins, pairs of genes or proteins, GO terms, or pairs of GO terms in a list? The goal of DaGO-Fun's design is to be able to efficiently analyse:
Please refer to the DaGO-Fun data inputs for more details. All DaGO-Fun applications have been tested with the quantities of data specified above and have been shown to return results within a few seconds or minutes, depending on the application and the volume of data uploaded. If a run takes much longer, repeat your web call or check that your input data are correct. Note that input limits are mainly due to the available computational resources, but also to visualization constraints and algorithm complexity, for example when running hierarchical clustering in GOSP-FCT. If you have trouble, please contact the DaGO-Fun Team for help.
11. What population backgrounds are available in the DaGO-Fun tool? DaGO-Fun uses different backgrounds depending on the application selected. When computing protein functional similarity scores and clustering proteins based on their GO annotations, it uses all annotated proteins in the GO Annotation UniProt Knowledgebase (GOA-UniProtKB) file. Remember that the choice of semantic similarity approach will significantly affect the output results. Enrichment analysis and protein identification based on GO annotations use an organism-specific background, so for these applications the user must select the organism under consideration. Enrichment analysis compares the annotation composition of your gene list with that of a background population of genes. By default, DaGO-Fun's enrichment analysis uses as background the organism's genome-wide set of genes that have at least one annotation in the categories being analyzed. This default is a good choice for genome-wide or near-genome-wide studies. Currently, DaGO-Fun includes three organism backgrounds, namely human (Homo sapiens), Mycobacterium tuberculosis and Mycobacterium leprae, as well as user-supplied custom backgrounds within these organisms.
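The FAQ does not say which statistical test DaGO-Fun applies in enrichment analysis. The sketch below assumes a one-sided hypergeometric test, a common choice for comparing a gene list with a background, and builds the default background as described above (genes with at least one annotation in the analysed categories). Gene and term IDs are hypothetical, and annotations are assumed to be already propagated to ancestor terms.

```python
from math import comb


def default_background(annotations, categories):
    """Genes with at least one annotation in the analysed categories.

    annotations: gene -> set of GO terms (assumed already propagated).
    categories: set of GO terms being analysed.
    """
    return {g for g, terms in annotations.items() if terms & categories}


def hypergeom_enrichment_p(term, gene_list, annotations, background):
    """One-sided P(X >= k) under a hypergeometric model (assumed test)."""
    study = set(gene_list) & background  # study genes must lie in the background
    N = len(background)  # population size
    K = sum(1 for g in background if term in annotations[g])  # background hits
    n = len(study)  # study-set size
    k = sum(1 for g in study if term in annotations[g])  # study hits
    # Sum the probabilities of observing k or more hits in a sample of size n.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


ann = {"g1": {"GO:X"}, "g2": {"GO:X"}, "g3": {"GO:Y"}, "g4": set()}
bg = default_background(ann, {"GO:X", "GO:Y"})
print(hypergeom_enrichment_p("GO:X", ["g1", "g2"], ann, bg))
```

Genes with no annotation in the analysed categories (here `g4`) drop out of both the background and the study set, which is how the default background keeps unannotated genes from diluting the comparison.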
12. Which DaGO-Fun application is best suited to answer my questions? Click here; we hope this page helps you decide.
13. Where can I find the different protein-protein functional networks mentioned in the DaGO-Fun tool? They can be viewed with PINV (Protein Interaction Network Visualizer), a web-based, user-friendly and open-source tool that provides fully interactive networks together with a set of tools for querying, filtering and manipulating the visible subnetwork, making it easier to visualize the different protein-protein functional networks. The tool is publicly accessible to all researchers working on protein analysis at the functional level.
14. From which external databases were these functional networks extracted? The interactions were derived from sequence similarity and shared domains, coexpression data, and protein interaction databases, including STRING, IntAct, DIP, Reactome, BIND, GRID and MINT, as well as PATRIC and HPIDB for host-pathogen interactions. Sequences were downloaded from the Integr8 project at the European Bioinformatics Institute (EBI), and protein domains were retrieved from the InterPro database. Coexpressed proteins were identified from similar patterns of mRNA expression measured by DNA microarrays, with data downloaded from the Stanford Microarray Database (SMD) and the NCBI Gene Expression Omnibus (GEO) database.
15. How are scores computed in these different protein-protein functional networks? Click here to explore the details in the accompanying report.