Data Version::
GO Version: releases/2015-06-05 format-version 1.2
GOA-UniProtKB Version: 143 released on 27 May, 2015
Name: A-DaGO-Fun
Version: 15.1
Summary: A python package for an adaptive Gene Ontology semantic
similarity based functional analysis: Computing GO-based
semantic similarity scores and includes several other
biological applications related to GO semantic similarity
measures
Home-page: http://web.cbio.uct.ac.za/ITGOM/adagofun
Author: Gaston K Mazandu et al.
Author-email: gmazandu@cbio.uct.ac.za, gmazandu@gmail.com,
kuzamunu@aims.ac.za
License: Copyright (c) 2015 Gaston K Mazandu
Users can find essential information about obtaining A-DaGO-Fun
from its Home-page provided above. It is freely downloadable under
GNU General Public License (GPL), pre-compiled for Linux version
and protected by copyright laws. Users are free to copy, modify,
merge, publish, distribute and display information contained in
the package, provided that it is done with appropriate citation
of the package and by including the permission notice in all
copies or substantial portions of the module contained in this
package.
Permission is hereby granted, free of charge, to any person obtai-
ning a copy of this package and associated documentation files
to deal in the package without restriction, including without
limitation the rights to use, copy, modify, merge, publish, distri-
bute, sublicense and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.
A-DaGO-Fun IS DISTRIBUTED AND PROVIDED "AS IS" IN THE HOPE THAT IT
WILL BE USEFUL, BUT WITHOUT ANY WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHAN-
TABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN
NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
CLAIM, DAMAGES OR OTHER LIABILITY WHETHER IN AN ACTION OF CONTRACT,
TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
PACKAGE OR THE USE OR OTHER DEALINGS IN THE PACKAGE.
See .
Description:
==============================
The python A-DaGO-Fun package
==============================
.. See the PDF package documentation for more information
on the use of the tool and different GO semantic simi-
larity measures
A-DaGO-AFun is a repository of python modules for analyzing
protein or gene sets at the functional level based on Gene
Ontology annotations using information content-based sema-
ntic similaritymeasures. It contains six main functions
and implements 101 different functional similarity measures.
The main use cases of the library are:
* Computing Information Content (IC), term and protein
semantic similarity scores: getTermFeatures, termsim
and funcsim.
* Identifying enriched GO terms accounting for uncert-
ainty in an annotation dataset: gossfeat.
* Discovering functionally related or similar genes/pro-
teins based on their GO terms: proteinfct.
* Retrieving genes or proteins by their GO annotations
for disease gene and target discovery: proteinfit.
Installation
------------
To install A-DaGO-Fun is quite straightforward and is similar
to installation of any other python package. The whole package
is relatively large (around 96Mb) and contains five modules
and sets of files (GO term features and IC scores) available
for download.
Four packages, namely scipy, matplotlib, networkx and cPickle,
need to be installed prior to the installation and use of
A-DaGO-Fun. To install A-DaGO-Fun, the user needs to download
the 'tar.gz' file and extract all files as follows::
::
tar -xzf dagofun.tar.gz
and then install from the package folder. To do this, one uses
the following command::
::
python setup.py install --user
The A-DaGO-Fun package is free to use under GNU General Public
License. You are free to copy, distribute and display information
contained herein, provided that it is done with appropriate ci-
tation of the tool. Thus, by using the A-DaGO-Fun package, it
is assumed that you have read and accepted the agreement provi-
ded and that you agreed to be bound to all terms and conditions
of this agreement. Please, use the following command line to
see the package licence::
::
python setup.py --licence
The two commands above should be executed inside the dagofun
directory.
It is worth mentioning that A-DaGO-Fun is a portable python pa-
ckage and can be used without installing the whole package. You
only need to be on a directory containing dagofun folder, which
is a python library of A-DaGO-Fun.
Note that one module, namely tabulate.py for Pretty-print tabular
data, which is borrowed from other authors, specifically written
by `Sergey Astanin (s.astanin@gmail.com)' and collaborators.
Build status
------------
The main website for the A-DaGO-Fun package is
http://web.cbio.uct.ac.za/ITGOM/adagofun where users can find
essential information about A-DaGO-Fun. It is freely download-
able under GNU General Public License (GPL), pre-compiled for
Linux version and protected by copyright laws. Users are free
to copy, modify, merge, publish, distribute and display informa-
mation contained in the package, provided that it is done with
appropriate citation of the package and by including the permi-
ssion notice in all copies or substantial portions of the modu-
le contained in this package.
It is currently maintained by one member of the core-develo-
pment team, Gaston K. Mazandu >> import dagofun
>>> help(dagofun)
Currently, this package provides six main modules:
* TermFeatures.py, TermSimilarity.py, ProteinSimilarity.py,
* ProteinSearch.py,
* EnrichmentAnalysis.py and
* ProteinClustering.py
written independently and each containing a specific
functions:
* getTermFeatures, termsim and funcsim,
* proteinfit,
* gossfeat and proteinfct, respectively.
One can start a module GivenModule.py from the A-DaGO-Fun
package as follows:
>>> import dagofun.GivenModule as gm
>>> help(gm)
After starting the module GivenModule.py as above, one can
call or use the special function named gofunc of this module
by writting
>>> gm.gofunc(arguments)
The function named gofunc from the module GivenModule.py can
also be made available directly as follows:
>>> from dagofun.GivenModule import gofunc
After importing the function gofunc, to get help on how to use
the function, type the command above:
>>> help(gofunc)
The function can be called directy as follows:
>>> gofunc(parameters)
Finally, all the special functions in the package can be made
available as follows:
>>> from dagofun import *
and thus, each of the specific functions can be called directly
as described above. Function arguments depend on the function
and outputs a nicely formatted plain-text table, displayed either
on the screen or written in a file, named according to the input
data or file provided and the name displayed on the screen.
Detailed description and use of main functions
----------------------------------------------
(1) getTermFeatures
===================
This function from TermFeatures.py retrieves Information Content
(IC) scores and other GO term features from the GO directed acyclic
graph (DAG) structure. Given a GO term or list/tuple or a file
containing a list of GO IDs, this function retrieves these charac-
teristics of these GO terms in the GO DAG, including their IC scores,
provided the IC model. It uses GO-universal model ('u') by default,
i.e., if no model is provided the GO-universal model is used.
(1.1) Usage:
------------
This function requires at least one argument and takes up to four
arguments.
(a) a GO term, list/tuple or file containing a list of GO terms
(b) model, (c) drop and (d) output as described below.
>>> getTermFeatures(InputData, model='u', drop = 0, output=1)
This indicates that function argument (a) above is compulsory.
* IC model:
Symbol of four different IC models implemeted in this package are:
'u': For the GO-Universal (default if no IC model is provided)
'w': For Wang et al.
'z': For Zhang et al
'a': For Annotation-based
See PDF documentation for more details about these IC models
* drop : boolean variable only useful in the context of Annotation-
based approach and it is set to 0 if Inferred from Electronic Anno-
tation (IEA) evidence code should be considered and to 1 otherwise.
By default, it is set to 0.
*output: a boolean variable also set to 1 to display results on the
screen and to 0 in a file. By default (output=1), results are disp-
layed on the screen, and finally default table display uses the pa-
ckage module written by 'Sergey Astanin (s.astanin@gmail.com)'
and collaborators. It used tablefmt="rst".
If results are written onto a file, the name of the file is basica-
lly the name of the first parameter in the function followed by TF
and where ':' is replaced by '_', this is a case when this parame-
ter is a GO term.
The command above should be run on a Python interface. However, One
can produce these results using direct command lines without making
use of a Python interface as shown below.
However, it is important to note that these command lines are only
applicable in the case where user terms' or proteins' input data are
retrieved from a file.
python $(python -m site --user-site)/dagofun/TermFeatures.py InputData model drop output
Note that arguments should be in order as shown in the command and
$(python -m site --user-site)/dagofun/ can be replaced by the path to
the module being run, especially if the package has not been installed.
(1.2) Examples:
---------------
Note that you can be in any folder if the package was locally ins-
talled, otherwise make sure on a directory containing dagofun
folder, which is a python library of A-DaGO-Fun.
>>> from dagofun.TermFeatures import *
>>> getTermFeatures('tests/TestTerms.txt', 'w', output=0)
>>> terms = ['GO:0000001','GO:0048308', 'GO:0005385']
>>> getTermFeatures(terms, 'a', 1)
>>> getTermFeatures(terms, 'a', 1, 0)
>>> getTermFeatures(terms, 'a', output=1)
>>> getTermFeatures('tests/TestTerms.txt')
>>> getTermFeatures('tests/TestTerms.txt', 'z')
Please check the file TestTerms.txt from the tests folder in the
dagofun directory. This file provides a model file of GO IDs.
Following command lines are correct:
python $(python -m site --user-site)/dagofun/TermFeatures.py tests/TestTerms.txt
python $(python -m site --user-site)/dagofun/TermFeatures.py tests/TestTerms.txt w 0 1
(2) termsim
===========
This function from TermSimilarity.py computes semantic similarity
scores between GO term pairs. Given a two GO IDs or a list/tuple
or two lists/tuples of GO IDs or a file containing a list of GO
ID pairs, this function computes the semantic similarity scores
between GO ID pairs produced from these input GO IDs:
* For a given two GO terms as arguments, the function computes
the semantic similarity between these two terms.
* For a list of tuple of GO IDs, the function computes similari-
ty scores between all GO IDs pairs (a, b) for a and b in a
list or tuple with a != b.
* If two lists A and B are given, then similarity scores are
computed between all pairs (ai, bi) with ai in A and bi in B
and 0 <= i <= min(len(A), len(B)) - 1
* If the file of GO ID pairs is provided, the function computes
similarity scores between these GO ID pairs
(2.1) Usage:
------------
This function requires at least one argument and takes up to six
arguments.
(a) A GO ID, list/tuple of GO IDs or file containing GO ID pairs
(b) A GO ID or a list/tuple or the ontology under consideration
(c) Approach to be used, the function supports up to for
different approaches provided in a tuple/list. If non approach
is provided, GO-universal model ('u') is used by default,
(d) drop and (e) output as described previously.
>>> termsim('GOID1', 'GOID2', ontology='BP', approach='u', drop=0, output=1)
>>> termsim(GO1, GO2, ontology='BP', approach='u', drop=0, output=1)
>>> termsim(GO1, ontology ='BP', approach='u', drop=0, output=1)
>>> termsim('FileName', ontology='BP', approach='u', drop=0, output=1)
* ontology: note that we are dealing with three ontology indepen-
dently. Biological Process ('BP'), Molecular Function ('MF') and
Cellular Component (CC). If no ontology is given then 'BP' is used
* Term Semantic Similarity (SS) Approach:
Approach or list/tuple of approaches under consideration (up to four
(4) approaches can be considered). If more than 4 are given, these
beyond the 4th valid symbols are ignored.
'u': for the GO-Universal
'w': for Wang et al.
'z': for Zhang et al
'r': for Resnik
'xr': for XGraSM-Resnik
'n': for Nunivers
'xn': for XGraSM-Nunivers
'l': for Lin
'xl': for XGraSM-Lin
'li': for Lin enhancement by Li et al (SimIC).
's': for Lin enhancement by Relevance (Schlicker et al.)
The corresponding command line is:
python $(python -m site --user-site)/dagofun/TermSimilarity.py FileName ontology nappr approach drop output
where the argument nappr is the number of approaches to be executed as
the the module can retrieve term semantic similarity scores using more
than one approach and can go up to four different approaches.
See PDF documentation for more details about these term SS approaches
(2.2) Examples:
---------------
Note that you can be in any folder if the package was locally ins-
talled, otherwise make sure on a directory containing dagofun
folder, which is a python library of A-DaGO-Fun.
>>> from dagofun.TermSimilarity import *
>>> termsim('tests/TermSimTest.txt', approach = ('n','z','li'))
>>> termsim('tests/TermSimTest.txt', output=0)
>>> termsim('GO:0000001','GO:0048308', 'BP', ('xr', 'w', 'z', 'li'))
>>> terms=['GO:0000001','GO:0048308', 'GO:0048311', 'GO:0000002']
>>> termsim(terms, approach = ('n','z'))
Please check the file TestSimTest.txt from the tests folder in the
dagofun directory. This file provides a model file of GO ID pairs.
Following command lines are correct:
python $(python -m site --user-site)/dagofun/TermSimilarity.py tests/TermSimTest.txt
python $(python -m site --user-site)/dagofun/TermSimilarity.py tests/TermSimTest.txt BP 3 n z w
(3) funcsim
===========
This function from ProteinSimilarity.py retrieves functional simi-
larity scores between protein/gene pairs. Given a string represen-
ting the name of the file containing protein IDs and their asso-
ciated GO IDs or two sets of GO IDs or a dictionary with protein
or gene IDs and keys and list ot tuples of GO IDs as values, for which
functional similarity (SS) scores must be computed, this function
computes the functional similarity scores between them:
* For a given file or an dictionary, if the target list of protein
or gene ID pairs is not provided, this function computes similari-
ty scores between all protein/gene pairs (p, q) for p and q in
the file or dictionary.
(3.1) Usage:
------------
This function requires at least one argument and takes up to six
arguments.
(a) Dictionary of protein/gene and or file containing protein/
genes and associated GO IDs annotating them.
(b) Protein or gene ID or a list/tuple or the ontology under consideration
(c) Measure to be used, the function supports up to for
different approaches provided in a tuple/list. If non measure
is provided, GO-universal based BMA ('ubma') is used by default,
(d) drop and (e) output as described previously.
>>> funcsim(GO1, GO2, ontology='BP', measure='ubma', drop=0, output=1)
>>> funcsim(ProtGO, TargetPairs=[], ontology='BP', measure='u', drop=0, output=1)
>>> funcsim(FileName, TargetPairs = [], ontology='BP', measure='ubma', drop=0, output=1)
* ontology: note that we are dealing with three ontology indepen-
dently. Biological Process ('BP'), Molecular Function ('MF') and
Cellular Component (CC). If no ontology is given then 'BP' is used
* FS measure
------------
measure or tuple of measures under consideration (up to three measures
can be considered). The symbole of a given functional similarity
measure is constructed as follows:
The starting letter r, n, l, li, s, x, a, z, w, and u represent GO term
semantic similarity approaches and stand for Resnik, Nunivers, Lin,
Li, Relevance, XGraSM, Annotation-based, Zhang, Wang and GO-universal,
respectively. The suffixes gic, uic, dic, cou, cot, avg, bma, abm, bmm
hdf, vhdf and max represent SimGIC, SimUIC, SimDIC, SimCOU, SimCOT,
Average, Best Match Average, Average Best Matches, Hausdorff, Variant
Hausdorff and Max measures, respectively. In cases where the
prefix x is used, indicating XGraSM-based, it is immediately followed
by the approach prefix. For example:
'xlmax' for XGraSM-Lin based Average Functional Similarity Measure
'agic' for Annotation-based SimGIC Functional Similarity Measure
'zuic' for Zhang et al. based SimUIC Functional Similarity Measure
And the Union-Intersection functional similarity measure is represented
by the symbol 'ui'.
- If the functional measure is not provided, the GO-universal based Best
Match Average (ubma) is used by default.
See the PDF package documentation for more details about these FS
measures and their corresponding symbols.
The corresponding command line is:
python $(python -m site --user-site)/dagofun/ProteinSimilarity.py FileName TargetPairs ontology nmeas measure drop output
where the argument nmeas is the number of measures to be executed as
the the module can retrieve functional semantic similarity scores using
more than one approach and can go up to four different approaches.
(3.2) Examples:
---------------
Note that you can be in any folder if the package was locally ins-
talled, otherwise make sure on a directory containing dagofun
folder, which is a python library of A-DaGO-Fun.
>>> from dagofun.ProteinSimilarity import *
>>> funcsim('tests/TestProteins.txt', measure = ('wcou','acou','acot'))
>>> funcsim('tests/TestProteins.txt', ontology = 'BP', measure = 'agic', drop = 1, output=1)
>>> funcsim('tests/TestProteins.txt', measure = ('wcou','acou','acot'), output=0)
>>> proteinterms = {'Q5H9L2':['GO:0006355','GO:0006351'], 'P03891':['GO:0022904','GO:0044281','GO:0044237','GO:0006120'], 'Q5H9L2':['GO:0006355','GO:0006351']}
>>> funcsim(proteinterms, measure = ('wvhdf','zvhdf','nvhdf'))
>>> A = ['GO:0022904', 'GO:0044281', 'GO:0044237', 'GO:0006120']; B = ['GO:0006355', 'GO:0006351']
>>> funcsim(A, B, measure = ('ub','nto','db','ub'))
>>> funcsim('tests/SpecificRefSet1.txt')
>>> funcsim('tests/SpecificRefSet2.txt', measure = ('wcou','acou','acot', 'ubmm'))
Please check the file TestProteins.txt, SpecificRefSet1.txt and
SpecificRefSet2.txt from the tests folder in the
dagofun directory. This file provides a model file of Protein-GO
annotations.
(4) proteinfit
==============
This function from ProteinSearch.py retrieves genes or proteins con-
tributing to a given processes at a certain threshold or agreement
level based of protein or gene annotations. For a given two strings
representing the name of the file of background proteins, each with
its GO ID annotations, and the target GO ID file or list, or a dic-
tionary with protein or gene IDs and keys and list ot tuples of GO
IDs as values and the target GO ID file or list, the function iden-
tifies of candidate genes or proteins matching these terms or
meeting the threshold or agreement level.
(4.1) Usage:
------------
This function requires at least two arguments and takes up to six
arguments.
(a) The name of the file containing the list of protein or gene IDs
and their associated GO ID pairs.
(b) A list/tuple or name of the file containing the list of GO ID
targets for which, protein/gene should be identified.
(c) One of the GO ontologies: BP, MF and CC
(d) One of the term semantic similarity approaches. Refer to termsim
function
(e) The threshold score providing the semantic similarity degree at
which terms are considered to be semantically close in the GO struc-
ture. More specifically, the score at which an ancestor can be
considered to occur through its descendant.
(f) drop as described previously.
>>> proteinfit(AnnotationData, TargetGOIDs, ontology='BP', approach='u', score=0.3, drop = 0)
The function outputs:
(a) Summary statistics for different target GO IDs proteins, which
includes following fields:
GO ID, GO term, Term Level, Number of proteins, Average SS, p-value
and Bonferroni correction displayed on the screen or in a file
depending on the argument output.
(b) For each GO ID target, a summary statistics is provided a file
named using the GO ID under consideration with following fields:
Protein ID, GO-ID related to the term, GO-IDs with high-SS,
Maximum SS and Average SS
A default parameters are GO-universal ('u') for SS, score = 0.3 the
threshold or agreement level, Considering all GO evidence codes
(drop = 0) and display by default on the screen (outputs=1), and
finally default table display (tablefmt="rst") see tabulate package
written by 'Sergey Astanin' and collaborators.
The corresponding command line is:
python $(python -m site --user-site)/dagofun/ProteinSearch.py AnnotationData TargetGOIDs ontology approach score drop output
(4.2) Examples:
---------------
Note that you can be in any folder if the package was locally ins-
talled, otherwise make sure on a directory containing dagofun
folder, which is a python library of A-DaGO-Fun.
>>> from dagofun.ProteinSearch import *
>>> targets = ['GO:0006355', 'GO:001905', 'GO:0001658']
>>> proteinfit('tests/TestProteins.txt', targets)
>>> proteinfit('tests/TestProteins.txt', 'tests/TermSimTest.txt', 'BP', 'n')
>>> background = {'Q5H9L2':['GO:0006355','GO:0006351'], 'P03891':['GO:0022904','GO:0044281','GO:0044237','GO:0006120'], 'Q5H9L2':['GO:0006355','GO:0006351']}
>>> proteinfit(background, targets, approach='s', score = 0)
>>> targets = ['GO:0009597', 'GO:0046718', 'GO:0032725', 'GO:0032727', 'GO:0047088', 'GO:0045416']
>>> proteinfit('tests/SpecificRefSet1.txt', targets, score=0.0)
>>> targets = ['GO:0006612', 'GO:0001666', 'GO:0009611', 'GO:0009409', 'GO:0002679', 'GO:0009626', 'GO:0050832', 'GO:0009410', 'GO:0016045']
>>> proteinfit('tests/SpecificRefSet2.txt', targets, score=0.0)
Please check the file TestProteins.txt, SpecificRefSet1.txt and
SpecificRefSet2.txt from the ests folder in the dagofun directory.
This file provides a model file of GO ID pairs.
(5) gossfeat
============
This function from EnrichmentAnalysis.py retrieves biological
processes most pertinent to the experiment performed based on the
target set and background provided. Given two strings represen-
ting the name of the file of background proteins, each with its
GO ID annotations, and the target protein file, the function
identify biological processes most pertinent to the experiment
performed.
The function incorporates the complex dependence structure of the
GO DAG and the uncertainty in annotation data using fuzzy expres-
sions through GO term semantic similarity measures.
(5.1) Usage:
------------
This function requires at least two arguments and takes up to six
arguments.
(a) The name of the file containing the list of protein or gene IDs
and their associated GO ID pairs, as described previuosly.
(b) The name of the file containing the list of target proteins or
genes.
(c) Ontology, approch, threshold score and drop arguments work in
the same way as described previously (see 3.1, and 2.1).
(d) The significance level cut-off (pvalue) from which an identified
term is considered to be statistically significant, set to 0.05 by
default.
>>> gossfeat(ReferenceFile, TargetFile, ontology='BP', approach='u', score=0.3, pvalue=0.05, drop=0, output=0)
It is worth mentioning that the strict cases where the score 0 and
1 correspond to the well known traditional cases: score=0 when
considering true path rule and score=1 for effective occurence or
exact match
The corresponding command line is:
python $(python -m site --user-site)/dagofun/EnrichmentAnalysis.py ReferenceFile TargetFile ontology approach score pvalue drop output
(5.2) Examples:
---------------
>>> from dagofun.EnrichmentAnalysis import *
>>> gossfeat('tests/ReferenceSetTest.txt', "tests/TargetSetTest.txt", approach = 's', score=0.5)
>>> gossfeat('tests/ReferenceSetTest.txt', "tests/TargetSetTest.txt")
>>> gossfeat('tests/ReferenceSetTest.txt', "tests/TargetSetTest.txt", approach = 'n', score=0.0)
>>> gossfeat('tests/ReferenceSetTest.txt', "tests/TargetSetTest.txt", approach = 'w', score=1.0)
Note that for this specific function, resulting list all enriched
terms and their features are displayed in a file from the forder
where the module is being executed. For the above example, the
name of this file will be ReferenceSetTestEA.txt
Please check the file ReferenceSetTest.txt and TargetSetTest.txt from
the tests folder in the dagofun directory. This file provides a model
file of GO ID pairs.
(6) proteinfct
==============
This function from ProteinClustering.py allows the partitioning of a
gene or protein set into a set of biological meaningful sub-classes
using their functional closeness based on GO annotations and derived
from a selected semantic similarity model. For a given two strings
representing the name of the file of proteins, each with its GO ID
annotations, and the potential target proteins file or list to be
clustered, or a dictionary with protein or gene IDs and keys and list
ot tuples of GO IDs as values and the target protein file or list, the
function elucidates functionally related or similar genes/proteins
based on their GO termsidentifies of candidate genes or proteins
matching these terms or meeting the threshold or agreement level.
(6.1) Usage:
-----------
This function requires at least two arguments and takes up to nine
arguments.
(a) Dictionary of protein/gene and or name of the file containing
protein/genes and associated GO IDs annotating them.
(b) Protein or gene ID or a list/tuple or the ontology under conside-
ration
(c) For the functional similarity measure, refer to funcsim (see 3.1)
(d) The threshold score providing the functional similarity degree at
which proteins can be considered to be functionally close, set to 0.3
by defaut. The score 0 indicates that all protein pairs with func-
tional similarity score greater that 0.
(d) drop and output as described previously.
(e) Clustering model under consideration, and this function implements
three different models (mclust):
- hierarchical clustering (mclust = 1)
- Graph spectral clustering or kmeans (mclust = 2)
- community detecting model by Thomas Aynaud, 2009 (mclust = 3)
(f) Number of clusters (nclust) applies only for the kmeans model and
it is set to 0 by default. In this case, if mclust is less than 2 then
the community detecting model is applied instead of kmeans!
>>> proteinfct(AnnotationData, TargetIDs=[], ontology='BP', measure='ubma', score=0.3, mclust=1, nclust=0, drop=0, output=1)
The corresponding command line is:
python $(python -m site --user-site)/dagofun/ProteinClustering.py AnnotationData TargetIDs ontology measure score mclust nclust drop output
(6.2) Examples:
---------------
>>> from dagofun.ProteinClustering import *
>>> proteinfct('tests/TestProteins.txt', measure='nbma', score=0.0, mclust=2, nclust=3)
>>> proteinfct('tests/TestProteins.txt', mclust=2)
>>> proteinfct('tests/TestProteins.txt', measure='rbmm', score=0.0, mclust=1)
>>> background = {'Q5H9L2':['GO:0006355','GO:0006351'], 'P03891':['GO:0022904','GO:0044281','GO:0044237','GO:0006120'], 'Q5H9L2':['GO:0006355','GO:0006351']}
>>> proteinfct(background, measure='rbmm', score=0.0)
>>> proteinfct('tests/SpecificRefSet2.txt', measure='ubmm', score=0.0, mclust=3)
>>> proteinfct('tests/SpecificRefSet1.txt', score=0.0)
Version history
---------------
- 15.1: Initial A-DaGO-Fun release in June 2015.
Maintainer
----------
Gaston K. Mazandu
Email: gmazandu@cbio.uct.ac.za, gmazandu@gmail.com,
kuzamunu@aims.ac.za
Contributors
------------
Gaston K Mazandu, Emile R Chimusa, Mbiyavanga Mamana, Nicola J Mulder
Emails: gmazandu@cbio.uct.ac.za, emile@cbio.uct.ac.za,
mamana@aims.ac.za, nicola.mulder@uct.ac.za
Classifier: License :: GPL (>= 2)
Classifier: Operating System :: OS Independent, but tested only on Linux (Ubuntu)
Classifier: Programming Language :: Python :: 2.7.3, but not tested on the version less than 2.7.3
Classifier: Topic :: Software Development :: Libraries