Python Functions
Python 2.x needs to be set up so that it can be invoked from the command line as python. Try it out now: type python into the command prompt. If you get a Python prompt, you are all set. If you get an error about python not existing, keep reading.
You need to locate the directory where you installed Python. If you used the installer's default path, it probably looks something like C:\Python27. This directory needs to be added to your PATH environment variable. To do that, open Windows Explorer, right-click on Computer, and click Properties in the drop-down menu. In the left-hand pane, click Advanced system settings. In the dialog box, click the Environment Variables... button.
In that dialog box, find the upper box, whose label starts with User variables for, and click the New... button that belongs to that box.
In the New User Variable dialog box:
- Variable name: PATH
- Variable value: %PATH%;C:\Python27
Be sure to enter all the funny characters exactly as shown.
Then click OK on all the dialog boxes, save all your open documents, and log out. Log back in again, open a command prompt, and try out python.
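The variable value used above reads as "the existing PATH, a semicolon separator, then the Python directory". The sketch below mimics that composition in Python purely for illustration. The helper name append_to_path is made up here, and Windows performs the real expansion itself.

```python
def append_to_path(existing_path, new_dir, sep=";"):
    """Return existing_path with new_dir appended, unless it is already listed.

    Hypothetical helper: it only illustrates how a value such as
    %PATH%;C:\\Python27 is put together. It does not change any settings.
    """
    entries = [entry for entry in existing_path.split(sep) if entry]
    # Windows paths are case-insensitive, so compare in lower case.
    if new_dir.lower() not in [entry.lower() for entry in entries]:
        entries.append(new_dir)
    return sep.join(entries)


# Example PATH value chosen for illustration only.
print(append_to_path(r"C:\Windows;C:\Windows\System32", r"C:\Python27"))
```

Running the helper a second time on its own output leaves the value unchanged, which is why a missing or doubled semicolon is the usual thing to check when python is still not found.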
Biopython comes packaged along with Clotho.
Type python --version in the command prompt to see your Python version. If it is 2.7.x, then use this installer.
Note that you can return a dict from Python and it will be converted to a JSON object. A Python list will be converted to an array.
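As a rough illustration of that mapping, the sketch below uses the standard json module. Clotho's own conversion layer is not shown in this document, so treat the function name run and the field names as assumptions.

```python
import json


def run(sequence):
    """Hypothetical function body: returns a dict that contains a list."""
    return {
        "sequence": sequence,
        "length": len(sequence),
        "bases": sorted(set(sequence)),
    }


# The dict becomes a JSON object and the list becomes a JSON array.
print(json.dumps(run("atcgc"), sort_keys=True))
# prints {"bases": ["a", "c", "g", "t"], "length": 5, "sequence": "atcgc"}
```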
Reverse complement via BioPython
clotho.run2("py_biorc", ["atcgc"])
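For readers unfamiliar with the operation, here is a minimal pure-Python sketch of a reverse complement. It is not the code behind py_biorc, which goes through BioPython. The names COMPLEMENT and reverse_complement are chosen here for illustration, and only the four unambiguous bases are handled.

```python
# Complement table for the four unambiguous DNA bases, in both cases.
COMPLEMENT = {
    "a": "t", "t": "a", "c": "g", "g": "c",
    "A": "T", "T": "A", "C": "G", "G": "C",
}


def reverse_complement(sequence):
    """Complement every base and reverse the order.

    Raises KeyError for anything outside a, c, g, t (for example N).
    """
    return "".join(COMPLEMENT[base] for base in reversed(sequence))


print(reverse_complement("atcgc"))  # prints gcgat
```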
Importing a test Python module in src/main/python/lib/hello.py
clotho.run("org.andersonlab.py_greet", [])
Bill Cao's PCR predictor (please fill in a better example here)
clotho.run2("py_pcr", ["CGCTCCAAGCTGGGCTGTGTG", "CGATAGTTACCGGATAAGGC", "CGCTCCAAGCTGGGCTGTGTGCACGAACCCCCCGTTCAGCCCGACCGCTGCGCCTTATCCGGTAACTATCG"])
Mina Li's NCBI NucSeq fetcher
clotho.run2("py_nucseqfetch", ['123746834', '1322'])
Returns its own code via a clotho.get call
clotho.run("org.andersonlab.py_selfie", [])
Modifies itself via a clotho.set call
clotho.run("org.andersonlab.py_selfsetter", [])
Makes a clotho.run call which fails, and the Python function catches the resulting ClothoError
clotho.run2("py_error_recover", [])
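The recovery pattern can be sketched as follows. Everything here is a stand-in so that the example runs without a Clotho server: the ClothoError class, the FakeClotho stub, and the function ID org.andersonlab.py_does_not_exist are all hypothetical, and the real py_error_recover source may look different.

```python
class ClothoError(Exception):
    """Stand-in for the error type that Clotho provides to Python functions."""


class FakeClotho(object):
    """Hypothetical stub whose run call always fails."""

    def run(self, function_id, args):
        raise ClothoError("no such function: %s" % function_id)


def run(clotho):
    """Call another function and recover if that call fails."""
    try:
        return clotho.run("org.andersonlab.py_does_not_exist", [])
    except ClothoError as error:
        # Handle the failure here instead of letting it abort this function.
        return {"recovered": True, "message": str(error)}


print(run(FakeClotho()))
```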
Dies at various stages of execution (expect a run error and a clotho.say message)
clotho.run2("py_die_early", [])
clotho.run2("py_die_run", [])
Simulates the Python process crashing really hard (shows that Clotho will clean up properly)
clotho.run2("py_die_abrupt", [])
(Sans formatting for the time being.)
Author: Mina Li
Contact: [email protected]
Timeline: January 2014 - present
Summary: This is a document that outlines all the work I've done on Clotho.
***In src/main/python/lib
**BlastN, BlastP, BlastX, TBlastN, TBlastX
These are probably the most recent additions to the functions here. Essentially, they do what you can do at http://blast.ncbi.nlm.nih.gov/Blast.cgi, but from Python.
Input: array ([string sequence, int no_of_alignments] or [string sequence, null])
Output: Blast_Record (defined in ClothoPy)
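A sketch of how the two accepted input shapes might be unpacked on the Python side. The helper name parse_blast_args is hypothetical, and reading null (None in Python) as "no limit on alignments" is an assumption, not something the Blast functions are confirmed to do.

```python
def parse_blast_args(args):
    """Split [sequence, no_of_alignments] into a (sequence, limit) pair.

    A JSON null arrives in Python as None; it is passed through unchanged
    and assumed here to mean that no limit was requested.
    """
    if len(args) != 2:
        raise ValueError("expected [sequence, no_of_alignments]")
    sequence, limit = args
    if not sequence:
        raise ValueError("sequence must not be empty")
    if limit is not None:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(limit, bool) or not isinstance(limit, int) or limit < 1:
            raise ValueError("no_of_alignments must be a positive int or null")
    return sequence, limit


print(parse_blast_args(["acgt", 3]))
print(parse_blast_args(["acgt", None]))
```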
**convertfile, convertgb, convertid, nucseqfetch, id_to_poly, convertpoly
Note: I believe all of the functions that currently return JSON strings will probably return Polynucleotide objects instead once they are revised.
*convertfile - This isn't exactly functional yet, but I think it might be replaced by Max's page where you can drag and drop files into Clotho (but he then converts them with convertgb).
*convertgb - It expects to see a string representation of the contents of a Genbank file (like http://www.ncbi.nlm.nih.gov/nuccore/M98350.1), and turns it into a JSON string representation of a Polynucleotide.
*convertid - The input is an NCBI accession number in the form of a string, and it should return a string in the Genbank file format. (Basically what convertgb would be expecting.)
*nucseqfetch - This particular function expects a Genbank object and returns a Polynucleotide JSON string representation of its contents. (I'm almost 100% certain this is going to need to change; it was one of the first functions written.)
*filenucseqfetch - As with convertfile, it would expect a path to a Genbank file and return a Polynucleotide JSON string representation of its contents.
*id_to_poly - To go straight from an accession number (NCBI) to a Polynucleotide JSON string representation of the Genbank file, you would use this function.
*convertpoly - This function is the only function that goes the other way around, and turns a Polynucleotide object into a Genbank string.
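To make the Genbank-string-to-JSON direction concrete, here is a deliberately tiny sketch that pulls only the accession and the sequence out of Genbank text. It is not the convertgb implementation, which builds on modified BioPython code. The function name genbank_to_json and the output field names are assumptions and do not reflect the real Polynucleotide schema.

```python
import json


def genbank_to_json(genbank_text):
    """Extract the ACCESSION value and the ORIGIN sequence as a JSON string."""
    accession = None
    sequence_parts = []
    in_origin = False
    for line in genbank_text.splitlines():
        if line.startswith("//"):
            break  # end-of-record marker
        if in_origin:
            # Sequence lines look like "  61 gcgtccggac gcaacttctt";
            # the first token is the position and is dropped.
            sequence_parts.extend(line.split()[1:])
        elif line.startswith("ACCESSION"):
            fields = line.split()
            accession = fields[1] if len(fields) > 1 else None
        elif line.startswith("ORIGIN"):
            in_origin = True
    record = {"accession": accession, "sequence": "".join(sequence_parts)}
    return json.dumps(record, sort_keys=True)


# Shortened, made-up record for illustration only.
sample = "ACCESSION   EXAMPLE1\nORIGIN\n        1 tcactatagg ga\n//\n"
print(genbank_to_json(sample))
```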
**oligo_to_poly, poly_to_fasta
*oligo_to_poly - This function takes an Oligo object (defined elsewhere in Clotho) and turns it into a Polynucleotide in the form of a JSON string.
*poly_to_fasta - The input is a Polynucleotide object, which gets turned into a FASTA string.
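A FASTA string is a header line starting with > followed by the sequence, usually wrapped to a fixed width. The sketch below shows that shape. The function name to_fasta, the dict keys name and sequence, and the 60-character width are assumptions made for illustration, not the actual Polynucleotide fields or the poly_to_fasta source.

```python
def to_fasta(poly, width=60):
    """Format a dict with 'name' and 'sequence' keys as a FASTA string."""
    header = ">" + poly["name"]
    sequence = poly["sequence"]
    # Wrap the sequence into lines of at most `width` characters.
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header] + lines) + "\n"


print(to_fasta({"name": "example", "sequence": "atcgc"}))
```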
**registry_collector, reg_to_poly
*registry_collector - This takes in an ID for the iGEM database and returns a JSON string representation of the Part (defined elsewhere in Clotho).
*reg_to_poly - Takes in a Part JSON string and outputs a Polynucleotide JSON string.
**protein_by_name, protein_by_gene, protein_to_orf
Note: In part because of the addition of Polypeptide, anything labeled with "poly" is referring to Polynucleotide, and everything labeled with "protein" is referring to Polypeptide.
*protein_by_name, protein_by_gene - These two are almost the same, so I'm putting them together. The input is an array ([string organism, string protein_name OR string gene_name, int retmax]). Pass null for retmax if you don't want to set your own limit, in which case all results are returned. The output will need to change when we reassess, but currently it is a string representation of an array of Polypeptide objects.
*protein_to_orf - The function takes in a Polypeptide as input and returns its ORF (if it has one).
**act_parser, act_query, fetch_uniprot
*act_query - This function is used to obtain a chemical from Act.20n. The input is a string and the output is a JSON dictionary directly from the API.
*act_parser - This takes the dictionary returned by act_query (the output retrieved from Act.20n) and converts it into a SinglePathway schema.
*fetch_uniprot - This fetches a protein from the UniProt database using the ID as its input.
***In src/main/python/lib/ClothoPy
**accn_retrieval, protein_retrieval
They are essentially the same protocol: they both define a class call_accn which has a method retrieve_gb that takes in a list of accessions and grabs them from NCBI. To actually get them, since they're stored and not returned, there's another method returnGB for a specific accession you might be looking for, or you can grab all of them using the attribute .records.
The only difference is that accn_retrieval stores the records as Genbank (Polynucleotide) and protein_retrieval stores the records as Polypeptide.
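The store-then-look-up protocol described above can be sketched like this. The method names retrieve_gb and returnGB and the records attribute follow the description, but the class name AccessionStore, the constructor, and the injected fetch callable are assumptions that keep the sketch runnable without contacting NCBI.

```python
class AccessionStore(object):
    """Sketch of a class that fetches records and keeps them for later lookup."""

    def __init__(self, fetch):
        # `fetch` is a hypothetical callable: accession string -> record.
        self._fetch = fetch
        self._by_accession = {}
        self.records = []  # every stored record, in retrieval order

    def retrieve_gb(self, accessions):
        """Fetch and store each accession; nothing is returned."""
        for accession in accessions:
            if accession in self._by_accession:
                continue  # already stored, skip the duplicate
            record = self._fetch(accession)
            self._by_accession[accession] = record
            self.records.append(record)

    def returnGB(self, accession):
        """Return the stored record for one accession."""
        return self._by_accession[accession]


# A fake fetcher stands in for the NCBI request.
store = AccessionStore(lambda accession: {"id": accession})
store.retrieve_gb(["A1", "B2"])
print(store.returnGB("B2"))
print(len(store.records))
```

Injecting the fetcher is a design choice made only for this sketch: the same class could hold Genbank or Polypeptide records depending on what the callable returns.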
**genbank_holder, new_genbank_holder, protein_holder, blast_holder
These all define classes that hold representations of Genbank, New_Genbank, Polypeptide, and Blast_Record.
**ClothoAlignIO, ClothoInsdcIO, ClothoSeqIO, ClothoGenBankScanner
You will probably never need to know what's happening in these, but the general gist is that BioPython was insufficient for our needs, so I took its code and modified it to suit Clotho better.
testing all the functions
tested:
- convertID: output String
clotho.run("org.andersonlab.py_convertID", ["1234890"])
clotho.run('org.andersonlab.py_convertID', ['19203732'])
- convertGB: output String
clotho.run("org.andersonlab.py_convertGB", ['LOCUS CV961319 921 bp DNA EST 07-FEB-2011\nDEFINITION PYrpcy_2963 mycelium, Plich medium Phytophthora infestans cDNA, mRNA\n sequence.\nACCESSION CV961319\nVERSION CV961319.1 GI:58151110\nDBLINK BioSample:LIBEST_016732\nKEYWORDS EST.\nSOURCE Phytophthora infestans (potato late blight agent)\n ORGANISM Phytophthora infestans\n Eukaryota; Stramenopiles; Oomycetes; Peronosporales; Phytophthora.\nREFERENCE 1 (bases 1 to 921)\n AUTHORS Randall,T., Dwyer,R.A., Huitema,E., Beyer,K., Cvitanich,C.,\n Kelkar,H., Fong,A.M., Gates,K., Roberts,S., Yatzkan,E., Gaffney,T.,\n Law,M., Testa,A., Torto-Alalibo,A., Zhang,M., Zheng,L., Mueller,E.,\n Windass,J., Binder,A., Birch,P.R.J., Gisi,U., Govers,F., Gow,N.A.,\n Mauch,F., van West,P., Waugh,M.E., Yu,J., Boller,T., Kamoun,S.,\n Lam,S.T. and Judelson,H.S.\n TITLE Large-scale gene discovery in the oomycete Phytophthora infestans\n reveals likely components of phytopathogenicity shared with true\n fungi\n JOURNAL Mol. Plant Microbe Interact. 
18 (3), 229-243 (2005)\n PUBMED 15782637\nCOMMENT Contact: Judelson HS\n Department of Plant Pathology\n University of California\n Weber Hall, Riverside, CA 92521, USA\n Tel: 909 787 4199\n Fax: 909 787 4294\n Email: [email protected].\nFEATURES Location/Qualifiers\n source 1..921\n /mol_type="mRNA"\n /db_xref="taxon:4787"\n /sex="A1"\n /note="Vector: pSPORT1"\n /strain="88069"\n /organism="Phytophthora infestans"\n /clone_lib="LIBEST_016732 mycelium, Plich medium"\nORIGIN\n 1 tcactatagg gaaagctggt acgcctgcag gtaccggtcc ggaattcccg gtcgacccac\n 61 gcgtccggac gcaacttctt ttcgcaatgt tggccgctaa gtctctctct cgttgccggt\n 121 gttggacgtc gcttgctcgt agcgtcacgt ggcatggctg gaggccgtgc tgcctttaat\n 181 tggcgtgatc cgcttatgct ggatggccag ctgacggacg aggaggccat gattcaaaaa\n 241 tcggccaacg actactgcca ggggcaactg ctgccgcgca ttggagaagc gaaccgtaag\n 301 ggcaagtttg accgctccat tatgaaggaa atgggcgaaa tgggcttcct tggtcccacg\n 361 gtccagggct acggctgcgc cggtgtgggc tacgtgtcct atggactcat tgcgaacgca\n 421 gtggagcgtg ttgacagcgc ctacaggtcg gcgatgagtg tgcagtcgtc tctggtaatg\n 481 cacccaatta accaattcgg atctgacgag cagaaggaaa agtacctccc tcgtcttggc\n 541 actggcgaac tcattggctg gttcggcttg acggagccga aacacggatc agaccctgga\n 601 tcaatggaga cgcgtgctag actcaaagga gacaagtaca tcctcaacgg ctccaagaac\n 661 tggatcacca acgctccgat cgctgacgtg ttcctcgtct gggccaagga cgacgagggc\n 721 gacatccgtg gtttcattct ggagaaggtg ggtttactta ctcttcactt gtcacttgca\n 781 gttgactcac gtaacttcac ctgcatctac aggacttccc tggcctatca gctccctaca\n 841 tcgaaggcaa ggcgacgttg ttggcatctg ctactggtat gatcttcctg gaagacgtcg\n 901 aaagttccca aggagaacat g\n//\n'])
- nucseqfetch: output String
clotho.run("org.andersonlab.py_nucseqfetch", ["1234890"])
- protein_by_gene: output String
clotho.run("org.andersonlab.py_proteinbygene", "Salmonella enterica", "tyrB", 5)
- protein_by_name: output String
clotho.run("org.andersonlab.py_proteinbyname", "Halobacterium salinarum", "NAD-specific glutamate dehydrogenase A", 2)
- poly_to_fasta: output String
clotho.run("org.andersonlab.py_polytoFASTA", ['195591271'])
- registry_collector: output String
clotho.run('org.andersonlab.py_fetchRegistry', ['BBa_K189004'])
- fetch_uniprot: output dict
clotho.run("org.andersonlab.py_fetchuniprot", ['Q6GZX3'])
- convertpoly: output String
clotho.run("org.andersonlab.py_convertpoly", ["374333820"])
- reg_to_poly
clotho.run("org.andersonlab.py_registryToPoly", ["org.registry.part.BBa_K189004"])
- oligo_to_poly
clotho.run("org.andersonlab.py_oligoToPoly", ["ca581F"])
testing:
- ActToOperon: output String
clotho.run("org.andersonlab.py_act_to_operon", ["1-Butanol"])
- id_to_poly: output String
clotho.run("org.andersonlab.py_IDToPoly", ["BBa_B0000"])
(not JSON serializable)
- protein_to_orf
clotho.run("org.andersonlab.py_ProteinToORF", ['ABW97930.1'])
(not JSON serializable)
- Blast functions (BlastN, BlastP, BlastX, TBlastN, TBlastX)
clotho.run("org.andersonlab.py_blastp", 'mikkkkkmrkiiyfdfknfskfckkkfykyffnl', 3)
clotho.run("org.andersonlab.py_blastx", 'aagatggagaggcaaaattaaaatctatgaaaaattacaaaaaatttat', 3)
clotho.run("org.andersonlab.py_tblastn", ["mevrrggadgftvtlpslalargevaaltgqsgcgkstllemigailrpdtlgeyrlhqpevdiaaplmaanevamsairarelgfvlqhggllpwltvidnivlprrlagmdihshwlr", 3])
clotho.run("org.andersonlab.py_tblastx", 'aagatggagaggcaaaattaaaatctatgaaaaattacaaaaaatttat', 3)
(not JSON serializable)
to test:
- convertfile, filenucseqfetch