diff GEMBASSY-1.0.3/doc/text/genret.txt @ 0:8300eb051bea draft
Initial upload
author  ktnyt
date    Fri, 26 Jun 2015 05:19:29 -0400
                                   genret

Function

   Retrieves various gene related information from genome flatfile

Description

   genret reads in one or more genome flatfiles and retrieves various data
   from them. It is a wrapper program for the G-language REST service, in
   which the method to run is specified by giving its name to the "access"
   qualifier. By default, genret parses the input file to obtain the
   accession ID (or name) of the genome and uses it to query the G-language
   REST service. By setting the "accid" qualifier to false (or 0), genret
   instead parses the sequence and features of the genome, creates a
   GenBank formatted flatfile from them, and uploads that file to the
   G-language web server; the given method is then executed on the
   uploaded file.

   genret can perform a variety of tasks, including retrieval of the
   sequence upstream of, downstream of, or around the start or stop codon,
   retrieval of translated gene sequences, search of gene data by keyword,
   and re-annotation and retrieval of genome flatfiles. The set of genes can
   be given as flat text, as a regular expression, or as a file containing
   the list of genes.

   Details on the G-language REST service are available from the wiki page:

     http://www.g-language.org/wiki/rest

   Documentation on the G-language Genome Analysis Environment methods is
   provided at the Document Center:

     http://ws.g-language.org/gdoc/

Usage

   Here is a sample session with genret

   Retrieving sequences upstream, downstream, or around the start/stop
   codons. The following example retrieves the sequence around the start
   codon of every gene.

   Genes to access are specified by a regular expression; '*' (the
   default) stands for every gene.
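   The gene specifications accepted here and in Examples 2 and 3 (the
   default '*', a pattern, a delimited list of names, or a file named with
   "@" or "list::") can be illustrated with a short Python sketch. It is not
   genret's implementation: resolve_genes is a hypothetical name, and
   treating each delimited token as a full-match regular expression and
   file entries as literal names are assumptions made for illustration.

```python
import re
from pathlib import Path
from typing import List, Sequence

# File-name prefixes described in Example 3.
FILE_PREFIXES = ("@", "list::")


def resolve_genes(spec: str, all_genes: Sequence[str]) -> List[str]:
    """Hypothetical sketch: expand a genret-style gene specification.

    Mirrors the documented input forms only; this is not genret's own code.
    Matching gene names are returned in genome order.
    """
    spec = spec.strip()
    if spec in ("", "*"):
        # An empty answer accepts the default '*', i.e. every gene.
        return list(all_genes)

    for prefix in FILE_PREFIXES:
        if spec.startswith(prefix):
            # File of gene names (whitespace-separated); taken literally.
            names = Path(spec[len(prefix):]).read_text().split()
            patterns = [re.escape(name) for name in names]
            break
    else:
        # Flat text: tokens separated by spaces, commas or vertical bars,
        # each treated as a regular expression (an assumption).
        patterns = [tok for tok in re.split(r"[\s,|]+", spec) if tok]

    compiled = [re.compile(p) for p in patterns]
    return [g for g in all_genes if any(rx.fullmatch(g) for rx in compiled)]


genes = ["thrL", "thrA", "thrB", "thrC", "recA", "recB", "recC"]
print(resolve_genes("recA,recB", genes))  # ['recA', 'recB']
print(resolve_genes("thr[ABC]", genes))   # ['thrA', 'thrB', 'thrC']
```

   Splitting on the delimiters before compiling means "recA|recB" selects
   the same genes whether it is read as a list or as a regex alternation;
   a pattern that itself contains a space, comma, or bar would be split,
   which is a limitation of this sketch.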

   Available methods for this task are:
     after_startcodon
     after_stopcodon
     around_startcodon
     around_stopcodon
     before_startcodon
     before_stopcodon

% genret
Retrieves various gene related information from genome flatfile
Input nucleotide sequence(s): refseqn:NC_000913
Gene name(s) to lookup [*]:
Feature to access: around_startcodon
Full text output file [nc_000913.around_startcodon]:

   Example 2

   Specifying the target genes as flat text. Gene names can be separated by
   a space, a comma, or a vertical bar.

% genret
Retrieves various gene related information from genome flatfile
Input nucleotide sequence(s): refseqn:NC_000913
List of gene name(s) to report [*]: recA,recB
Name of gene feature to access: translation
Sequence output file [nc_000913.translation.genret]: stdout
>recA
MAIDENKQKALAAALGQIEKQFGKGSIMRLGEDRSMDVETISTGSLSLDIALGAGGLPMGR
IVEIYGPESSGKTTLTLQVIAAAQREGKTCAFIDAEHALDPIYARKLGVDIDNLLCSQPDT
GEQALEICDALARSGAVDVIVVDSVAALTPKAEIEGEIGDSHMGLAARMMSQAMRKLAGNL
KQSNTLLIFINQIRMKIGVMFGNPETTTGGNALKFYASVRLDIRRIGAVKEGENVVGSETR
VKVVKNKIAAPFKQAEFQILYGEGINFYGELVDLGVKEKLIEKAGAWYSYKGEKIGQGKAN
ATAWLKDNPETAKEIEKKVRELLLSNPNSTPDFSVDDSEGVAETNEDF
>recB
MSDVAETLDPLRLPLQGERLIEASAGTGKTFTIAALYLRLLLGLGGSAAFPRPLTVEELLV
VTFTEAATAELRGRIRSNIHELRIACLRETTDNPLYERLLEEIDDKAQAAQWLLLAERQMD
EAAVFTIHGFCQRMLNLNAFESGMLFEQQLIEDESLLRYQACADFWRRHCYPLPREIAQVV
FETWKGPQALLRDINRYLQGEAPVIKAPPPDDETLASRHAQIVARIDTVKQQWRDAVGELD
ALIESSGIDRRKFNRSNQAKWIDKISAWAEEETNSYQLPESLEKFSQRFLEDRTKAGGETP
RHPLFEAIDQLLAEPLSIRDLVITRALAEIRETVAREKRRRGELGFDDMLSRLDSALRSES
GEVLAAAIRTRFPVAMIDEFQDTDPQQYRIFRRIWHHQPETALLLIGDPKQAIYAFRGADI
FTYMKARSEVHAHYTLDTNWRSAPGMVNSVNKLFSQTDDAFMFREIPFIPVKSAGKNQALR
FVFKGETQPAMKMWLMEGESCGVGDYQSTMAQVCAAQIRDWLQAGQRGEALLMNGDDARPV
RASDISVLVRSRQEAAQVRDALTLLEIPSVYLSNRDSVFETLEAQEMLWLLQAVMTPEREN
TLRSALATSMMGLNALDIETLNNDEHAWDVVVEEFDGYRQIWRKRGVMPMLRALMSARNIA
ENLLATAGGERRLTDILHISELLQEAGTQLESEHALVRWLSQHILEPDSNASSQQMRLESD
KHLVQIVTIHKSKGLEYPLVWLPFITNFRVQEQAFYHDRHSFEAVLDLNAAPESVDLAEAE
RLAEDLRLLYVALTRSVWHCSLGVAPLVRRRGDKKGDTDVHQSALGRLLQKGEPQDAAGLR
TCIEALCDDDIAWQTAQTGDNQPWQVNDVSTAELNAKTLQRLPGDNWRVTSYSGLQQRGHG
IAQDLMPRLDVDAAGVASVVEEPTLTPHQFPRGASPGTFLHSLFEDLDFTQPVDPNWVREK
LELGGFESQWEPVLTEWITAVLQAPLNETGVSLSQLSARNKQVEMEFYLPISEPLIASQLD
TLIRQFDPLSAGCPPLEFMQVRGMLKGFIDLVFRHEGRYYLLDYKSNWLGEDSSAYTQQAM
AAAMQAHRYDLQYQLYTLALHRYLRHRIADYDYEHHFGGVIYLFLRGVDKEHPQQGIYTTR
PNAGLIALMDEMFAGMTLEEA

   Example 3

   Using a file with a list of gene names. The following example retrieves
   the strand direction of each gene listed in the file "gene_list.txt".
   Strings prefixed with "@" or "list::" are interpreted as file names.

% genret
Retrieves various gene features from genome flatfile
Input nucleotide sequence(s): refseqn:NC_000913
List of gene name(s) to report [*]: @gene_list.txt
Name of gene feature to access: direction
Full text output file [nc_000913.direction]: stdout
gene,direction
thrA,direct
thrB,direct
thrC,direct

   Example 4

   Retrieving translations of coding sequences. The following example
   retrieves the translated protein sequence of the "recA" gene.
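   Example 6 below returns the recA region as 30 bp upstream, the ATG start
   codon, and 30 bp downstream. As a minimal Python sketch, slicing that
   output at offset 30 and translating the remainder with the standard
   codon assignments (which NCBI translation table 11, the bacterial code,
   shares) reproduces the first residues of the recA translation shown in
   this example. translate is a hypothetical helper, not part of genret;
   the sketch assumes an in-frame sequence beginning with ATG and does not
   handle alternative start codons or genes on the complementary strand.

```python
# Codon assignments of the standard genetic code, in TCAG order.
# NCBI translation table 11 (bacteria) uses the same assignments and
# differs only in which codons can serve as alternative start codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence, stopping at the first stop codon.

    Assumes the first codon is ATG; alternative start codons are not
    converted to M, and codons with ambiguous bases become 'X'.
    """
    seq = cds.upper()
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        residue = CODON_TABLE.get(seq[pos:pos + 3], "X")
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# Output of Example 6: 30 bp upstream + start codon + 30 bp downstream.
around = ("ccggtattacccggcatgacaggagtaaaaatggctatcgacgaaaacaaacagaaagcgt"
          "tg")
upstream, from_start = around[:30], around[30:]
print(len(around), from_start[:3])  # 63 atg
print(translate(from_start))        # MAIDENKQKAL
```

   Building the codon table from the compact 64-letter string keeps the
   sketch free of third-party dependencies.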

% genret
Retrieves various gene related information from genome flatfile
Input nucleotide sequence(s): refseqn:NC_000913
Gene name(s) to lookup [*]: recA
Feature to access: translation
Full text output file [nc_000913.translation]: stdout
>recA
MAIDENKQKALAAALGQIEKQFGKGSIMRLGEDRSMDVETISTGSLSLDIALGAGGLPMGR
IVEIYGPESSGKTTLTLQVIAAAQREGKTCAFIDAEHALDPIYARKLGVDIDNLLCSQPDT
GEQALEICDALARSGAVDVIVVDSVAALTPKAEIEGEIGDSHMGLAARMMSQAMRKLAGNL
KQSNTLLIFINQIRMKIGVMFGNPETTTGGNALKFYASVRLDIRRIGAVKEGENVVGSETR
VKVVKNKIAAPFKQAEFQILYGEGINFYGELVDLGVKEKLIEKAGAWYSYKGEKIGQGKAN
ATAWLKDNPETAKEIEKKVRELLLSNPNSTPDFSVDDSEGVAETNEDF

   Example 5

   Retrieving feature information of genes. The following example
   retrieves the start position of each gene. Values of the other gene
   keys from the GenBank format can be retrieved in the same way (e.g.
   start, end, direction, GO*). Positions are 1-based.

% genret
Retrieves various gene related information from genome flatfile
Input nucleotide sequence(s): refseqn:NC_000913
Gene name(s) to lookup [*]:
Feature to access: start
Full text output file [nc_000913.start]:

   Example 6

   Passing extra arguments to a method. The following example retrieves 30
   base pairs upstream and 30 base pairs downstream of the start codon of
   the "recA" gene (63 bp including the codon itself). By default, the
   "around_startcodon" method returns 100 base pairs on each side of the
   start codon, as in the output of Example 1. The "-argument" qualifier
   passes different values to the method.

% genret refseqn:NC_000913 recA around_startcodon -argument 30,30 stdout
Retrieves various gene features from genome flatfile
>recA
ccggtattacccggcatgacaggagtaaaaatggctatcgacgaaaacaaacagaaagcgt
tg

   Example 7

   Re-annotating a flatfile. genret supports re-annotation of a genome
   flatfile via the Restauro-G service, developed by our team.
   Restauro-G uses the BLAST-Like Alignment Tool (BLAT) to search UniProtKB
   and annotates information including the description, comments, feature
   table, cross-references, COG family, position, and Pfam. The original
   software is available at http://restauro-g.iab.keio.ac.jp.

% genret refseqn:NC_000913 '*' annotate nc_000913-annotate.gbk
Retrieves various gene features from genome flatfile

Command line arguments

   Standard (Mandatory) qualifiers:
  [-sequence]          seqall     Nucleotide sequence(s) filename and optional
                                  format, or reference (input USA)
  [-gene]              string     [*] Gene name(s) to lookup (Any string)
  [-access]            string     Feature to access (Any string)
  [-outfile]           outfile    [*.genret] Full text output file

   Additional (Optional) qualifiers: (none)
   Advanced (Unprompted) qualifiers:
   -argument           string     Option to give to method (Any string)
   -[no]accid          boolean    [Y] Include to use sequence accession ID as
                                  query

   General qualifiers:
   -help               boolean    Report command line options and exit. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose

Input file format

   Database definitions for the examples are included in the
   embossrc_template file of the Keio Bioinformatics Web Service (KBWS)
   package.
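   With the refseqn database defined as above, genret can also be driven
   from a script by passing every mandatory qualifier explicitly, so that
   no prompt should be needed. The Python sketch below only assembles an
   argument list from the qualifiers documented under Command line
   arguments; genret_command and run_genret are hypothetical helper names,
   and genret is assumed to be installed and on PATH.

```python
import subprocess
from typing import List, Optional


def genret_command(sequence: str, gene: str, access: str,
                   outfile: str = "stdout",
                   argument: Optional[str] = None,
                   accid: bool = True) -> List[str]:
    """Assemble a genret argument list from the documented qualifiers.

    Hypothetical helper, not part of EMBOSS or GEMBASSY.
    """
    cmd = ["genret",
           "-sequence", sequence,   # e.g. a USA such as refseqn:NC_000913
           "-gene", gene,           # '*', a pattern, a list, or @file
           "-access", access,       # method name, e.g. around_startcodon
           "-outfile", outfile]
    if argument is not None:
        cmd += ["-argument", argument]  # extra method options, e.g. "30,30"
    if not accid:
        cmd.append("-noaccid")          # upload the flatfile instead of using the ID
    return cmd


def run_genret(**options) -> str:
    """Run genret (assumed to be on PATH) and return its standard output."""
    # genret always exits with status 0 (see Exit status), so callers
    # should inspect the returned text rather than the return code.
    completed = subprocess.run(genret_command(**options),
                               capture_output=True, text=True)
    return completed.stdout


# Equivalent of Example 6; printing the list does not execute genret.
print(genret_command("refseqn:NC_000913", "recA", "around_startcodon",
                     argument="30,30"))
```

   Passing the arguments as a list rather than one shell string also means
   the '*' gene pattern reaches genret unexpanded, without the shell
   quoting used in Example 7.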

   Input files for usage example 3

   File: gene_list.txt

thrA
thrB
thrC

Output file format

   Output files for usage example 1

   File: nc_000913.around_startcodon

>thrL
cgtgagtaaattaaaattttattgacttaggtcactaaatactttaaccaatataggcata
gcgcacagacagataaaaattacagagtacacaacatccatgaaacgcattagcaccacca
ttaccaccaccatcaccattaccacaggtaacggtgcgggctgacgcgtacaggaaacaca
gaaaaaagcccgcacctgac
>thrA
aggtaacggtgcgggctgacgcgtacaggaaacacagaaaaaagcccgcacctgacagtgc
gggctttttttttcgaccaaaggtaacgaggtaacaaccatgcgagtgttgaagttcggcg
gtacatcagtggcaaatgcagaacgttttctgcgtgttgccgatattctggaaagcaatgc
caggcaggggcaggtggcca

   [Part of this file has been deleted for brevity]

>yjjY
tgcatgtttgctacctaaattgccaactaaatcgaaacaggaagtacaaaagtccctgacc
tgcctgatgcatgctgcaaattaacatgatcggcgtaacatgactaaagtacgtaattgcg
ttcttgatgcactttccatcaacgtcaacaacatcattagcttggtcgtgggtactttccc
tcaggacccgacagtgtcaa
>yjtD
tttttctgcgacttacgttaagaatttgtaaattcgcaccgcgtaataagttgacagtgat
cacccggttcgcggttatttgatcaagaagagtggcaatatgcgtataacgattattctgg
tcgcacccgccagagcagaaaatattggggcagcggcgcgggcaatgaaaacgatggggtt
tagcgatctgcggattgtcg

   Output files for usage example 5

   File: nc_000913.start

gene,start
thrL,190
thrA,337
thrB,2801
thrC,3734
yaaX,5234
yaaA,5683
yaaJ,6529
talB,8238
mog,9306

   [Part of this file has been deleted for brevity]

yjjX,4631256
ytjC,4631820
rob,4632464
creA,4633544
creB,4634030
creC,4634719
creD,4636201
arcA,4637613
yjjY,4638425
yjtD,4638965

   Output files for usage example 7

   File: nc_000913-annotate.gbk

LOCUS       NC_000913            4639675 bp    DNA     circular BCT 25-OCT-2010
DEFINITION  Escherichia coli str. K-12 substr. MG1655 chromosome, complete
            genome.
ACCESSION   NC_000913
VERSION     NC_000913.2  GI:49175990
DBLINK      Project: 57779
KEYWORDS    .
SOURCE      Escherichia coli str. K-12 substr. MG1655
  ORGANISM  Escherichia coli str. K-12 substr. MG1655
            Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacteriales;

   [Part of this file has been deleted for brevity]

     CDS             2801..3733
                     /EC_number="2.7.1.39"
                     /codon_start="1"
                     /db_xref="GI:16127997"
                     /db_xref="ASAP:ABE-0000010"
                     /db_xref="UniProtKB/Swiss-Prot:P00547"
                     /db_xref="ECOCYC:EG10999"
                     /db_xref="EcoGene:EG10999"
                     /db_xref="GeneID:947498"
                     /function="enzyme; Amino acid biosynthesis: Threonine"
                     /function="1.5.1.8 metabolism; building block
                     biosynthesis; amino acids; threonine"
                     /function="7.1 location of gene products; cytoplasm"
                     /gene="thrB"
                     /gene_synonym="ECK0003; JW0002"
                     /locus_tag="b0003"
                     /note="GO_component: GO:0005737 - cytoplasm; GO_process:
                     GO:0009088 - threonine biosynthetic process"
                     /product="homoserine kinase"
                     /protein_id="NP_414544.1"
                     /rs_com="FUNCTION: Catalyzes the ATP-dependent
                     phosphorylation of L- homoserine to L-homoserine
                     phosphate (By similarity)."
                     /rs_com="CATALYTIC ACTIVITY: ATP + L-homoserine = ADP +
                     O-phospho-L- homoserine."
                     /rs_com="PATHWAY: Amino-acid biosynthesis; L-threonine
                     biosynthesis; L- threonine from L-aspartate: step 4/5."
                     /rs_com="SUBCELLULAR LOCATION: Cytoplasm (Potential)."
                     /rs_com="SIMILARITY: Belongs to the GHMP kinase family.
                     Homoserine kinase subfamily."
                     /rs_des="RecName: Full=Homoserine kinase; Short=HK;
                     Short=HSK; EC=2.7.1.39;"
                     /rs_protein="Level 1: similar to KHSE_ECODH 1.7e-180"
                     /rs_xr="EMBL; CP000948; ACB01208.1; -; Genomic_DNA."
                     /rs_xr="RefSeq; YP_001728986.1; -."
                     /rs_xr="ProteinModelPortal; B1XBC8; -."
                     /rs_xr="SMR; B1XBC8; 2-308."
                     /rs_xr="EnsemblBacteria; EBESCT00000012034;
                     EBESCP00000011562; EBESCG00000011096."
                     /rs_xr="GeneID; 6058639; -."
                     /rs_xr="GenomeReviews; CP000948_GR; ECDH10B_0003."
                     /rs_xr="KEGG; ecd:ECDH10B_0003; -."
                     /rs_xr="HOGENOM; HBG646290; -."
                     /rs_xr="OMA; GSAHADN; -."
                     /rs_xr="ProtClustDB; PRK01212; -."
                     /rs_xr="BioCyc; ECOL316385:ECDH10B_0003-MONOMER; -."
                     /rs_xr="GO; GO:0005737; C:cytoplasm;
                     IEA:UniProtKB-SubCell."
                     /rs_xr="GO; GO:0005524; F:ATP binding; IEA:UniProtKB-KW."
                     /rs_xr="GO; GO:0004413; F:homoserine kinase activity;
                     IEA:EC."
                     /rs_xr="GO; GO:0009088; P:threonine biosynthetic process;
                     IEA:UniProtKB-KW."
                     /rs_xr="HAMAP; MF_00384; Homoser_kinase; 1; -."
                     /rs_xr="InterPro; IPR006204; GHMP_kinase."
                     /rs_xr="InterPro; IPR013750; GHMP_kinase_C."
                     /rs_xr="InterPro; IPR006203; GHMP_knse_ATP-bd_CS."
                     /rs_xr="InterPro; IPR000870; Homoserine_kin."
                     /rs_xr="InterPro; IPR020568; Ribosomal_S5_D2-typ_fold."
                     /rs_xr="InterPro; IPR014721;
                     Ribosomal_S5_D2-typ_fold_subgr."
                     /rs_xr="Gene3D; G3DSA:3.30.230.10;
                     Ribosomal_S5_D2-type_fold; 1."
                     /rs_xr="Pfam; PF08544; GHMP_kinases_C; 1."
                     /rs_xr="Pfam; PF00288; GHMP_kinases_N; 1."
                     /rs_xr="PIRSF; PIRSF000676; Homoser_kin; 1."
                     /rs_xr="PRINTS; PR00958; HOMSERKINASE."
                     /rs_xr="SUPFAM; SSF54211; Ribosomal_S5_D2-typ_fold; 1."
                     /rs_xr="TIGRFAMs; TIGR00191; thrB; 1."
                     /rs_xr="PROSITE; PS00627; GHMP_KINASES_ATP; 1."
                     /transl_table="11"
                     /translation="MVKVYAPASSANMSVGFDVLGAAVTPVDGALLGDVVTVEAAETF
                     SLNNLGRFADKLPSEPRENIVYQCWERFCQELGKQIPVAMTLEKNMPIGSGLGSSACS
                     VVAALMAMNEHCGKPLNDTRLLALMGELEGRISGSIHYDNVAPCFLGGMQLMIEENDI
                     ISQQVPGFDEWLWVLAYPGIKVSTAEARAILPAQYRRQDCIAHGRHLAGFIHACYSRQ
                     PELAAKLMKDVIAEPYRERLLPGFRQARQAVAEIGAVASGISGSGPTLFALCDKPETA
                     QRVADWLGKNYLQNQEGFVHICRLDTAGARVLEN"

   [Part of this file has been deleted for brevity]

  4639201 gcgcagtcgg gcgaaatatc attactacgc cacgccagtt gaactggtgc cgctgttaga
  4639261 ggaaaaatct tcatggatga gccatgccgc gctggtgttt ggtcgcgaag attccgggtt
  4639321 gactaacgaa gagttagcgt tggctgacgt tcttactggt gtgccgatgg tggcggatta
  4639381 tccttcgctc aatctggggc aggcggtgat ggtctattgc tatcaattag caacattaat
  4639441 acaacaaccg gcgaaaagtg atgcaacggc agaccaacat caactgcaag ctttacgcga
  4639501 acgagccatg acattgctga cgactctggc agtggcagat gacataaaac tggtcgactg
  4639561 gttacaacaa cgcctggggc ttttagagca acgagacacg gcaatgttgc accgtttgct
  4639621 gcatgatatt gaaaaaaata tcaccaaata aaaaacgcct tagtaagtat ttttc
//

Data files

   None.

Notes

   None.

References

   Arakawa, K., Mori, K., Ikeda, K., Matsuzaki, T., Kobayashi, Y., and
   Tomita, M. (2003) G-language Genome Analysis Environment: A Workbench
   for Nucleotide Sequence Data Mining, Bioinformatics, 19, 305-306.

   Arakawa, K. and Tomita, M. (2006) G-language System as a Platform for
   large-scale analysis of high-throughput omics data, J. Pest Sci.,
   31, 7.

   Arakawa, K., Kido, N., Oshita, K., and Tomita, M. (2010) G-language
   Genome Analysis Environment with REST and SOAP Web Service Interfaces,
   Nucleic Acids Res., 38, W700-W705.

Warnings

   None.

Diagnostic Error Messages

   None.

Exit status

   It always exits with a status of 0.

Known bugs

   None.

See also

   entret     Retrieve sequence entries from flatfile databases and files
   seqret     Read and write (return) sequences

Author(s)

   Hidetoshi Itaya (celery@g-language.org)
   Institute for Advanced Biosciences, Keio University
   252-0882 Japan

   Kazuharu Arakawa (gaou@sfc.keio.ac.jp)
   Institute for Advanced Biosciences, Keio University
   252-0882 Japan

History

   2012 - Written by Hidetoshi Itaya

Target users

   This program is intended to be used by everyone and everything, from
   naive users to embedded scripts.

Comments

   None.