diff GEMBASSY-1.0.3/doc/text/genret.txt @ 2:8947fca5f715 draft default tip
Uploaded
author | ktnyt |
---|---|
date | Fri, 26 Jun 2015 05:21:44 -0400 |
parents | 84a17b3fad1f |
children |
--- a/GEMBASSY-1.0.3/doc/text/genret.txt Fri Jun 26 05:20:29 2015 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,451 +0,0 @@
- genret
-Function
-
- Retrieves various gene related information from genome flatfile
-
-Description
-
- genret reads in one or more genome flatfiles and retrieves various data from
- the input file. It is a wrapper program to the G-language REST service,
- where a method is specified by giving a string to the "method" qualifier. By
- default, genret will parse the input file to retrieve the accession ID
- (or name) of the genome to query G-language REST service. By setting the
- "accid" qualifier to false (or 0), genret will instead parse the sequence
- and features of the genome to create a GenBank formatted flatfile and upload
- the file to the G-language web server. Using the file uploaded, genret will
- execute the method provided.
-
- genret is able to perform a variety of tasks, incluing the retrieval of
- sequence upstream, downstream, or around the start or stop codon,
- translated gene sequences search of gene data by keyword, and re-annotation
- and retrieval of genome flatfiles. The set of genes can be given as flat
- text, regular expression, or a file containing the list of genes.
-
- Details on G-language REST service is available from the wiki page
-
- http://www.g-language.org/wiki/rest
-
- Documentation on G-language Genome Analysis Environment methods are
- provided at the Document Center
-
- http://ws.g-language.org/gdoc/
-
-Usage
-
- Here is a sample session with genret
-
- Retrieving sequences upstream, downstream, or around the start/stop codons.
- The following example shows the retrieval of sequence around the start
- codons of all genes.
-
- Genes to access are specified by regular expression. '*' stands for every
- gene.
-
- Available methods are:
- after_startcodon
- after_stopcodon
- around_startcodon
- around_stopcodon
- before_startcodon
- before_stopcodon
-
-% genret
-Retrieves various gene related information from genome flatfile
-Input nucleotide sequence(s): refseqn:NC_000913
-Gene name(s) to lookup [*]:
-Feature to access: around_startcodon
-Full text output file [nc_000913.around_startcodon]:
-
- Go to the input files for this example
- Go to the output files for this example
-
- Example 2
-
- Using flat text as target genes. The names can be split with with a space,
- comma, or vertical bar.
-
-% genret
-Retrieves various gene related information from genome flatfile
-Input nucleotide sequence(s): refseqn:NC_000913
-List of gene name(s) to report [*]: recA,recB
-Name of gene feature to access: translation
-Sequence output file [nc_000913.translation.genret]: stdout
->recA
-MAIDENKQKALAAALGQIEKQFGKGSIMRLGEDRSMDVETISTGSLSLDIALGAGGLPMGR
-IVEIYGPESSGKTTLTLQVIAAAQREGKTCAFIDAEHALDPIYARKLGVDIDNLLCSQPDT
-GEQALEICDALARSGAVDVIVVDSVAALTPKAEIEGEIGDSHMGLAARMMSQAMRKLAGNL
-KQSNTLLIFINQIRMKIGVMFGNPETTTGGNALKFYASVRLDIRRIGAVKEGENVVGSETR
-VKVVKNKIAAPFKQAEFQILYGEGINFYGELVDLGVKEKLIEKAGAWYSYKGEKIGQGKAN
-ATAWLKDNPETAKEIEKKVRELLLSNPNSTPDFSVDDSEGVAETNEDF
->recB
-MSDVAETLDPLRLPLQGERLIEASAGTGKTFTIAALYLRLLLGLGGSAAFPRPLTVEELLV
-VTFTEAATAELRGRIRSNIHELRIACLRETTDNPLYERLLEEIDDKAQAAQWLLLAERQMD
-EAAVFTIHGFCQRMLNLNAFESGMLFEQQLIEDESLLRYQACADFWRRHCYPLPREIAQVV
-FETWKGPQALLRDINRYLQGEAPVIKAPPPDDETLASRHAQIVARIDTVKQQWRDAVGELD
-ALIESSGIDRRKFNRSNQAKWIDKISAWAEEETNSYQLPESLEKFSQRFLEDRTKAGGETP
-RHPLFEAIDQLLAEPLSIRDLVITRALAEIRETVAREKRRRGELGFDDMLSRLDSALRSES
-GEVLAAAIRTRFPVAMIDEFQDTDPQQYRIFRRIWHHQPETALLLIGDPKQAIYAFRGADI
-FTYMKARSEVHAHYTLDTNWRSAPGMVNSVNKLFSQTDDAFMFREIPFIPVKSAGKNQALR
-FVFKGETQPAMKMWLMEGESCGVGDYQSTMAQVCAAQIRDWLQAGQRGEALLMNGDDARPV
-RASDISVLVRSRQEAAQVRDALTLLEIPSVYLSNRDSVFETLEAQEMLWLLQAVMTPEREN
-TLRSALATSMMGLNALDIETLNNDEHAWDVVVEEFDGYRQIWRKRGVMPMLRALMSARNIA
-ENLLATAGGERRLTDILHISELLQEAGTQLESEHALVRWLSQHILEPDSNASSQQMRLESD
-KHLVQIVTIHKSKGLEYPLVWLPFITNFRVQEQAFYHDRHSFEAVLDLNAAPESVDLAEAE
-RLAEDLRLLYVALTRSVWHCSLGVAPLVRRRGDKKGDTDVHQSALGRLLQKGEPQDAAGLR
-TCIEALCDDDIAWQTAQTGDNQPWQVNDVSTAELNAKTLQRLPGDNWRVTSYSGLQQRGHG
-IAQDLMPRLDVDAAGVASVVEEPTLTPHQFPRGASPGTFLHSLFEDLDFTQPVDPNWVREK
-LELGGFESQWEPVLTEWITAVLQAPLNETGVSLSQLSARNKQVEMEFYLPISEPLIASQLD
-TLIRQFDPLSAGCPPLEFMQVRGMLKGFIDLVFRHEGRYYLLDYKSNWLGEDSSAYTQQAM
-AAAMQAHRYDLQYQLYTLALHRYLRHRIADYDYEHHFGGVIYLFLRGVDKEHPQQGIYTTR
-PNAGLIALMDEMFAGMTLEEA
-
- Example 3
-
- Using a file with a list of gene names.
- The following example will retrieve the strand direction for each gene
- listed in the "gene_list.txt" file. String prefixed with an "@" or "list::"
- will be interpreted as file names.
-
-% genret
-Retrieves various gene features from genome flatfile
-Input nucleotide sequence(s): refseqn:NC_000913
-List of gene name(s) to report [*]: @gene_list.txt
-Name of gene feature to access: direction
-Full text output file [nc_000913.direction]: stdout
-gene,direction
-thrA,direct
-thrB,direct
-thrC,direct
-
- Go to the input files for this example
- Go to the output files for this example
-
- Example 4
-
- Retrieving translations of coding sequences.
- The following example will retrieve the translated protein sequence of
- the "recA" gene.
-
-% genret
-Retrieves various gene related information from genome flatfile
-Input nucleotide sequence(s): refseqn:NC_000913
-Gene name(s) to lookup [*]: recA
-Feature to access: translation
-Full text output file [nc_000913.translation]: stdout
->recA
-MAIDENKQKALAAALGQIEKQFGKGSIMRLGEDRSMDVETISTGSLSLDIALGAGGLPMGR
-IVEIYGPESSGKTTLTLQVIAAAQREGKTCAFIDAEHALDPIYARKLGVDIDNLLCSQPDT
-GEQALEICDALARSGAVDVIVVDSVAALTPKAEIEGEIGDSHMGLAARMMSQAMRKLAGNL
-KQSNTLLIFINQIRMKIGVMFGNPETTTGGNALKFYASVRLDIRRIGAVKEGENVVGSETR
-VKVVKNKIAAPFKQAEFQILYGEGINFYGELVDLGVKEKLIEKAGAWYSYKGEKIGQGKAN
-ATAWLKDNPETAKEIEKKVRELLLSNPNSTPDFSVDDSEGVAETNEDF
-
- Example 5
-
- Retrieving feature information of the genes.
- The following example will retrieve the start positions for each gene.
- The values for the keys in GenBank format is available for retrieval.
- (ex. start end direction GO* etc.)
- Positions will be returned with a 1 start value.
-
-% genret
-Retrieves various gene related information from genome flatfile
-Input nucleotide sequence(s): refseqn:NC_000913
-Gene name(s) to lookup [*]:
-Feature to access: start
-Full text output file [nc_000913.start]:
-
- Go to the input files for this example
- Go to the output files for this example
-
- Example 6
-
- Passing extra arguments to the methods.
- The following example shows the retrieval of 30 base pairs around the
- start codon of the "recA" gene. By default, the "around_startcodon" method
- returns 200 base pairs around the start codon. Using the "-argument"
- qualifier allows the user to change this value.
-
-% genret refseqn:NC_000913 recA around_startcodon -argument 30,30 stdout
-Retrieves various gene features from genome flatfile
->recA
-ccggtattacccggcatgacaggagtaaaaatggctatcgacgaaaacaaacagaaagcgt
-tg
-
- Example 7
-
- Re-annotating a flatfile.
- genret supports re-annotation of a genome flatfile via Restauro-G
- service developed by our team. Using the BLAST Like Alignment Tool,
- to refer the UniProt KB and annotates information including the description,
- comments, feature tables, cross references, COG family, position, and Pfam.
- The original software is available at [http://restauro-g.iab.keio.ac.jp].
-
-
-% genret refseqn:NC_000913 '*' annotate nc_000913-annotate.gbk
-Retrieves various gene features from genome flatfile
-
-Command line arguments
-
- Standard (Mandatory) qualifiers:
- [-sequence]   seqall    Nucleotide sequence(s) filename and optional
-                         format, or reference (input USA)
- [-gene]       string    [*] Gene name(s) to lookup (Any string)
- [-access]     string    Feature to access (Any string)
- [-outfile]    outfile   [*.genret] Full text output file
-
- Additional (Optional) qualifiers: (none)
- Advanced (Unprompted) qualifiers:
- -argument     string    Option to give to method (Any string)
- -[no]accid    boolean   [Y] Include to use sequence accession ID as
-                         query
-
- General qualifiers:
- -help         boolean   Report command line options and exit. More
-                         information on associated and general
-                         qualifiers can be found with -help -verbose
-
-Input file format
-
- Database definitions for the examples are included in the embossrc_template
- file of the Keio Bioinformatcs Web Service (KBWS) package.
-
- Input files for usage example 4
-
- File: gene_list.txt
-
-thrA
-thrB
-thrC
-
-Output file format
-
- Output files for usage example 1
-
- File: nc_000913.around_startcodon
-
->thrL
-cgtgagtaaattaaaattttattgacttaggtcactaaatactttaaccaatataggcata
-gcgcacagacagataaaaattacagagtacacaacatccatgaaacgcattagcaccacca
-ttaccaccaccatcaccattaccacaggtaacggtgcgggctgacgcgtacaggaaacaca
-gaaaaaagcccgcacctgac
->thrA
-aggtaacggtgcgggctgacgcgtacaggaaacacagaaaaaagcccgcacctgacagtgc
-gggctttttttttcgaccaaaggtaacgaggtaacaaccatgcgagtgttgaagttcggcg
-gtacatcagtggcaaatgcagaacgttttctgcgtgttgccgatattctggaaagcaatgc
-caggcaggggcaggtggcca
-
- [Part of this file has been deleted for brevity]
-
->yjjY
-tgcatgtttgctacctaaattgccaactaaatcgaaacaggaagtacaaaagtccctgacc
-tgcctgatgcatgctgcaaattaacatgatcggcgtaacatgactaaagtacgtaattgcg
-ttcttgatgcactttccatcaacgtcaacaacatcattagcttggtcgtgggtactttccc
-tcaggacccgacagtgtcaa
->yjtD
-tttttctgcgacttacgttaagaatttgtaaattcgcaccgcgtaataagttgacagtgat
-cacccggttcgcggttatttgatcaagaagagtggcaatatgcgtataacgattattctgg
-tcgcacccgccagagcagaaaatattggggcagcggcgcgggcaatgaaaacgatggggtt
-tagcgatctgcggattgtcg
-
- Output files for usage example 5
-
- File: nc_000913.start
-
-gene,start
-thrL,190
-thrA,337
-thrB,2801
-thrC,3734
-yaaX,5234
-yaaA,5683
-yaaJ,6529
-talB,8238
-mog,9306
-
- [Part of this file has been deleted for brevity]
-
-yjjX,4631256
-ytjC,4631820
-rob,4632464
-creA,4633544
-creB,4634030
-creC,4634719
-creD,4636201
-arcA,4637613
-yjjY,4638425
-yjtD,4638965
-
- Output files for usage example 7
-
- File: ecoli-annotate.gbk
-
-LOCUS NC_000913 4639675 bp DNA circular BCT 25-OCT-2010
-DEFINITION Escherichia coli str. K-12 substr. MG1655 chromosome, complete
- genome.
-ACCESSION NC_000913
-VERSION NC_000913.2 GI:49175990
-DBLINK Project: 57779
-KEYWORDS .
-SOURCE Escherichia coli str. K-12 substr. MG1655
- ORGANISM Escherichia coli str. K-12 substr. MG1655
- Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacteriales;
-
- [Part of this file has been deleted for brevity]
-
- CDS 2801..3733
- /EC_number="2.7.1.39"
- /codon_start="1"
- /db_xref="GI:16127997"
- /db_xref="ASAP:ABE-0000010"
- /db_xref="UniProtKB/Swiss-Prot:P00547"
- /db_xref="ECOCYC:EG10999"
- /db_xref="EcoGene:EG10999"
- /db_xref="GeneID:947498"
- /function="enzyme; Amino acid biosynthesis: Threonine"
- /function="1.5.1.8 metabolism; building block
- biosynthesis; amino acids; threonine"
- /function="7.1 location of gene products; cytoplasm"
- /gene="thrB"
- /gene_synonym="ECK0003; JW0002"
- /locus_tag="b0003"
- /note="GO_component: GO:0005737 - cytoplasm; GO_process:
- GO:0009088 - threonine biosynthetic process"
- /product="homoserine kinase"
- /protein_id="NP_414544.1"
- /rs_com="FUNCTION: Catalyzes the ATP-dependent
- phosphorylation of L- homoserine to L-homoserine
- phosphate (By similarity)."
- /rs_com="CATALYTIC ACTIVITY: ATP + L-homoserine = ADP +
- O-phospho-L- homoserine."
- /rs_com="PATHWAY: Amino-acid biosynthesis; L-threonine
- biosynthesis; L- threonine from L-aspartate: step 4/5."
- /rs_com="SUBCELLULAR LOCATION: Cytoplasm (Potential)."
- /rs_com="SIMILARITY: Belongs to the GHMP kinase family.
- Homoserine kinase subfamily."
- /rs_des="RecName: Full=Homoserine kinase; Short=HK;
- Short=HSK; EC=2.7.1.39;"
- /rs_protein="Level 1: similar to KHSE_ECODH 1.7e-180"
- /rs_xr="EMBL; CP000948; ACB01208.1; -; Genomic_DNA."
- /rs_xr="RefSeq; YP_001728986.1; -."
- /rs_xr="ProteinModelPortal; B1XBC8; -."
- /rs_xr="SMR; B1XBC8; 2-308."
- /rs_xr="EnsemblBacteria; EBESCT00000012034;
- EBESCP00000011562; EBESCG00000011096."
- /rs_xr="GeneID; 6058639; -."
- /rs_xr="GenomeReviews; CP000948_GR; ECDH10B_0003."
- /rs_xr="KEGG; ecd:ECDH10B_0003; -."
- /rs_xr="HOGENOM; HBG646290; -."
- /rs_xr="OMA; GSAHADN; -."
- /rs_xr="ProtClustDB; PRK01212; -."
- /rs_xr="BioCyc; ECOL316385:ECDH10B_0003-MONOMER; -."
- /rs_xr="GO; GO:0005737; C:cytoplasm;
- IEA:UniProtKB-SubCell."
- /rs_xr="GO; GO:0005524; F:ATP binding; IEA:UniProtKB-KW."
- /rs_xr="GO; GO:0004413; F:homoserine kinase activity;
- IEA:EC."
- /rs_xr="GO; GO:0009088; P:threonine biosynthetic process;
- IEA:UniProtKB-KW."
- /rs_xr="HAMAP; MF_00384; Homoser_kinase; 1; -."
- /rs_xr="InterPro; IPR006204; GHMP_kinase."
- /rs_xr="InterPro; IPR013750; GHMP_kinase_C."
- /rs_xr="InterPro; IPR006203; GHMP_knse_ATP-bd_CS."
- /rs_xr="InterPro; IPR000870; Homoserine_kin."
- /rs_xr="InterPro; IPR020568; Ribosomal_S5_D2-typ_fold."
- /rs_xr="InterPro; IPR014721;
- Ribosomal_S5_D2-typ_fold_subgr."
- /rs_xr="Gene3D; G3DSA:3.30.230.10;
- Ribosomal_S5_D2-type_fold; 1."
- /rs_xr="Pfam; PF08544; GHMP_kinases_C; 1."
- /rs_xr="Pfam; PF00288; GHMP_kinases_N; 1."
- /rs_xr="PIRSF; PIRSF000676; Homoser_kin; 1."
- /rs_xr="PRINTS; PR00958; HOMSERKINASE."
- /rs_xr="SUPFAM; SSF54211; Ribosomal_S5_D2-typ_fold; 1."
- /rs_xr="TIGRFAMs; TIGR00191; thrB; 1."
- /rs_xr="PROSITE; PS00627; GHMP_KINASES_ATP; 1."
- /transl_table="11"
- /translation="MVKVYAPASSANMSVGFDVLGAAVTPVDGALLGDVVTVEAAETF
- SLNNLGRFADKLPSEPRENIVYQCWERFCQELGKQIPVAMTLEKNMPIGSGLGSSACS
- VVAALMAMNEHCGKPLNDTRLLALMGELEGRISGSIHYDNVAPCFLGGMQLMIEENDI
- ISQQVPGFDEWLWVLAYPGIKVSTAEARAILPAQYRRQDCIAHGRHLAGFIHACYSRQ
- PELAAKLMKDVIAEPYRERLLPGFRQARQAVAEIGAVASGISGSGPTLFALCDKPETA
- QRVADWLGKNYLQNQEGFVHICRLDTAGARVLEN"
-
- [Part of this file has been deleted for brevity]
-
- 4639201 gcgcagtcgg gcgaaatatc attactacgc cacgccagtt gaactggtgc cgctgttaga
- 4639261 ggaaaaatct tcatggatga gccatgccgc gctggtgttt ggtcgcgaag attccgggtt
- 4639321 gactaacgaa gagttagcgt tggctgacgt tcttactggt gtgccgatgg tggcggatta
- 4639381 tccttcgctc aatctggggc aggcggtgat ggtctattgc tatcaattag caacattaat
- 4639441 acaacaaccg gcgaaaagtg atgcaacggc agaccaacat caactgcaag ctttacgcga
- 4639501 acgagccatg acattgctga cgactctggc agtggcagat gacataaaac tggtcgactg
- 4639561 gttacaacaa cgcctggggc ttttagagca acgagacacg gcaatgttgc accgtttgct
- 4639621 gcatgatatt gaaaaaaata tcaccaaata aaaaacgcct tagtaagtat ttttc
-//
-
-Data files
-
- None.
-
-Notes
-
- None.
-
-References
-
- Arakawa, K., Mori, K., Ikeda, K., Matsuzaki, T., Konayashi, Y., and
- Tomita, M. (2003) G-language Genome Analysis Environment: A Workbench
- for Nucleotide Sequence Data Mining, Bioinformatics, 19, 305-306.
-
- Arakawa, K. and Tomita, M. (2006) G-language System as a Platform for
- large-scale analysis of high-throughput omics data, J. Pest Sci.,
- 31, 7.
-
- Arakawa, K., Kido, N., Oshita, K., Tomita, M. (2010) G-language Genome
- Analysis Environment with REST and SOAP Web Service Interfaces,
- Nucleic Acids Res., 38, W700-W705.
-
-Warnings
-
- None.
-
-Diagnostic Error Messages
-
- None.
-
-Exit status
-
- It always exits with a status of 0.
-
-Known bugs
-
- None.
-
-See also
-
- entret Retrieve sequence entries from flatfile databases and files
- seqret Read and write (return) sequences
-
-Author(s)
-
- Hidetoshi Itaya (celery@g-language.org)
- Institute for Advanced Biosciences, Keio University
- 252-0882 Japan
-
- Kazuharu Arakawa (gaou@sfc.keio.ac.jp)
- Institute for Advanced Biosciences, Keio University
- 252-0882 Japan
-
-History
-
- 2012 - Written by Hidetoshi Itaya
-
-Target users
-
- This program is intended to be used by everyone and everything, from
- naive users to embedded scripts.
-
-Comments
-
- None.
-
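
The removed manual says gene names given as flat text may be separated by a space, comma, or vertical bar, and that a string prefixed with "@" or "list::" is read as a file name. The following is a minimal Python sketch of that convention only. The helper `parse_gene_spec` is hypothetical and is not part of genret or EMBOSS; the one-name-per-line file layout is assumed from the gene_list.txt example, and the real parsing inside genret may differ. Because this sketch splits on "|", it would also break up a regular expression that uses alternation.

```python
import re
from pathlib import Path


def parse_gene_spec(spec: str) -> list[str]:
    """Expand a gene specification into a list of gene names or patterns.

    Hypothetical helper: it mirrors the rules described in the manual text,
    not genret's actual implementation.
    """
    for prefix in ("@", "list::"):
        if spec.startswith(prefix):
            # Assumption: the file holds one gene name per line,
            # as in the gene_list.txt example (thrA, thrB, thrC).
            lines = Path(spec[len(prefix):]).read_text().splitlines()
            return [line.strip() for line in lines if line.strip()]
    # Flat text: names separated by spaces, commas, or vertical bars.
    return [name for name in re.split(r"[ ,|]+", spec.strip()) if name]


if __name__ == "__main__":
    print(parse_gene_spec("recA,recB"))  # ['recA', 'recB']
```

With a gene_list.txt file like the one in the manual, `parse_gene_spec("@gene_list.txt")` would give the three names thrA, thrB, and thrC under the assumptions above.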