gembassy: GEMBASSY-1.0.3/doc/text/genret.txt comparison

comparison GEMBASSY-1.0.3/doc/text/genret.txt @ 0:8300eb051bea draft

Initial upload

author	ktnyt
date	Fri, 26 Jun 2015 05:19:29 -0400
parents
children

comparison

equal deleted inserted replaced

--1:000000000000
+:8300eb051bea
+genret
+Function
+Retrieves various gene related information from genome flatfile
+Description
+genret reads in one or more genome flatfiles and retrieves various data from
+the input file. It is a wrapper program to the G-language REST service,
+where a method is specified by giving a string to the "method" qualifier. By
+default, genret will parse the input file to retrieve the accession ID
+(or name) of the genome to query G-language REST service. By setting the
+"accid" qualifier to false (or 0), genret will instead parse the sequence
+and features of the genome to create a GenBank formatted flatfile and upload
+the file to the G-language web server. Using the file uploaded, genret will
+execute the method provided.
+genret is able to perform a variety of tasks, incluing the retrieval of
+sequence upstream, downstream, or around the start or stop codon,
+translated gene sequences search of gene data by keyword, and re-annotation
+and retrieval of genome flatfiles. The set of genes can be given as flat
+text, regular expression, or a file containing the list of genes.
+Details on G-language REST service is available from the wiki page
+http://www.g-language.org/wiki/rest
+Documentation on G-language Genome Analysis Environment methods are
+provided at the Document Center
+http://ws.g-language.org/gdoc/
+Usage
+Here is a sample session with genret
+Retrieving sequences upstream, downstream, or around the start/stop codons.
+The following example shows the retrieval of sequence around the start
+codons of all genes.
+Genes to access are specified by regular expression. '*' stands for every
+gene.
+Available methods are:
+after_startcodon
+after_stopcodon
+around_startcodon
+around_stopcodon
+before_startcodon
+before_stopcodon
+% genret
+Retrieves various gene related information from genome flatfile
+Input nucleotide sequence(s): refseqn:NC_000913
+Gene name(s) to lookup [*]:
+Feature to access: around_startcodon
+Full text output file [nc_000913.around_startcodon]:
+Go to the input files for this example
+Go to the output files for this example
+Example 2
+Using flat text as target genes. The names can be split with with a space,
+comma, or vertical bar.
+% genret
+Retrieves various gene related information from genome flatfile
+Input nucleotide sequence(s): refseqn:NC_000913
+List of gene name(s) to report [*]: recA,recB
+Name of gene feature to access: translation
+Sequence output file [nc_000913.translation.genret]: stdout
+>recA
+MAIDENKQKALAAALGQIEKQFGKGSIMRLGEDRSMDVETISTGSLSLDIALGAGGLPMGR
+IVEIYGPESSGKTTLTLQVIAAAQREGKTCAFIDAEHALDPIYARKLGVDIDNLLCSQPDT
+GEQALEICDALARSGAVDVIVVDSVAALTPKAEIEGEIGDSHMGLAARMMSQAMRKLAGNL
+KQSNTLLIFINQIRMKIGVMFGNPETTTGGNALKFYASVRLDIRRIGAVKEGENVVGSETR
+VKVVKNKIAAPFKQAEFQILYGEGINFYGELVDLGVKEKLIEKAGAWYSYKGEKIGQGKAN
+ATAWLKDNPETAKEIEKKVRELLLSNPNSTPDFSVDDSEGVAETNEDF
+>recB
+MSDVAETLDPLRLPLQGERLIEASAGTGKTFTIAALYLRLLLGLGGSAAFPRPLTVEELLV
+VTFTEAATAELRGRIRSNIHELRIACLRETTDNPLYERLLEEIDDKAQAAQWLLLAERQMD
+EAAVFTIHGFCQRMLNLNAFESGMLFEQQLIEDESLLRYQACADFWRRHCYPLPREIAQVV
+FETWKGPQALLRDINRYLQGEAPVIKAPPPDDETLASRHAQIVARIDTVKQQWRDAVGELD
+ALIESSGIDRRKFNRSNQAKWIDKISAWAEEETNSYQLPESLEKFSQRFLEDRTKAGGETP
+RHPLFEAIDQLLAEPLSIRDLVITRALAEIRETVAREKRRRGELGFDDMLSRLDSALRSES
+GEVLAAAIRTRFPVAMIDEFQDTDPQQYRIFRRIWHHQPETALLLIGDPKQAIYAFRGADI
+FTYMKARSEVHAHYTLDTNWRSAPGMVNSVNKLFSQTDDAFMFREIPFIPVKSAGKNQALR
+FVFKGETQPAMKMWLMEGESCGVGDYQSTMAQVCAAQIRDWLQAGQRGEALLMNGDDARPV
+RASDISVLVRSRQEAAQVRDALTLLEIPSVYLSNRDSVFETLEAQEMLWLLQAVMTPEREN
+TLRSALATSMMGLNALDIETLNNDEHAWDVVVEEFDGYRQIWRKRGVMPMLRALMSARNIA
+ENLLATAGGERRLTDILHISELLQEAGTQLESEHALVRWLSQHILEPDSNASSQQMRLESD
+KHLVQIVTIHKSKGLEYPLVWLPFITNFRVQEQAFYHDRHSFEAVLDLNAAPESVDLAEAE
+RLAEDLRLLYVALTRSVWHCSLGVAPLVRRRGDKKGDTDVHQSALGRLLQKGEPQDAAGLR
+TCIEALCDDDIAWQTAQTGDNQPWQVNDVSTAELNAKTLQRLPGDNWRVTSYSGLQQRGHG
+IAQDLMPRLDVDAAGVASVVEEPTLTPHQFPRGASPGTFLHSLFEDLDFTQPVDPNWVREK
+LELGGFESQWEPVLTEWITAVLQAPLNETGVSLSQLSARNKQVEMEFYLPISEPLIASQLD
+TLIRQFDPLSAGCPPLEFMQVRGMLKGFIDLVFRHEGRYYLLDYKSNWLGEDSSAYTQQAM
+AAAMQAHRYDLQYQLYTLALHRYLRHRIADYDYEHHFGGVIYLFLRGVDKEHPQQGIYTTR
+PNAGLIALMDEMFAGMTLEEA
+Example 3
+Using a file with a list of gene names.
+The following example will retrieve the strand direction for each gene
+listed in the "gene_list.txt" file. String prefixed with an "@" or "list::"
+will be interpreted as file names.
+% genret
+Retrieves various gene features from genome flatfile
+Input nucleotide sequence(s): refseqn:NC_000913
+List of gene name(s) to report [*]: @gene_list.txt
+Name of gene feature to access: direction
+Full text output file [nc_000913.direction]: stdout
+gene,direction
+thrA,direct
+thrB,direct
+thrC,direct
+Go to the input files for this example
+Go to the output files for this example
+Example 4
+Retrieving translations of coding sequences.
+The following example will retrieve the translated protein sequence of
+the "recA" gene.
+% genret
+Retrieves various gene related information from genome flatfile
+Input nucleotide sequence(s): refseqn:NC_000913
+Gene name(s) to lookup [*]: recA
+Feature to access: translation
+Full text output file [nc_000913.translation]: stdout
+>recA
+MAIDENKQKALAAALGQIEKQFGKGSIMRLGEDRSMDVETISTGSLSLDIALGAGGLPMGR
+IVEIYGPESSGKTTLTLQVIAAAQREGKTCAFIDAEHALDPIYARKLGVDIDNLLCSQPDT
+GEQALEICDALARSGAVDVIVVDSVAALTPKAEIEGEIGDSHMGLAARMMSQAMRKLAGNL
+KQSNTLLIFINQIRMKIGVMFGNPETTTGGNALKFYASVRLDIRRIGAVKEGENVVGSETR
+VKVVKNKIAAPFKQAEFQILYGEGINFYGELVDLGVKEKLIEKAGAWYSYKGEKIGQGKAN
+ATAWLKDNPETAKEIEKKVRELLLSNPNSTPDFSVDDSEGVAETNEDF
+Example 5
+Retrieving feature information of the genes.
+The following example will retrieve the start positions for each gene.
+The values for the keys in GenBank format is available for retrieval.
+(ex. start end direction GO* etc.)
+Positions will be returned with a 1 start value.
+% genret
+Retrieves various gene related information from genome flatfile
+Input nucleotide sequence(s): refseqn:NC_000913
+Gene name(s) to lookup [*]:
+Feature to access: start
+Full text output file [nc_000913.start]:
+Go to the input files for this example
+Go to the output files for this example
+Example 6
+Passing extra arguments to the methods.
+The following example shows the retrieval of 30 base pairs around the
+start codon of the "recA" gene. By default, the "around_startcodon" method
+returns 200 base pairs around the start codon. Using the "-argument"
+qualifier allows the user to change this value.
+% genret refseqn:NC_000913 recA around_startcodon -argument 30,30 stdout
+Retrieves various gene features from genome flatfile
+>recA
+ccggtattacccggcatgacaggagtaaaaatggctatcgacgaaaacaaacagaaagcgt
+tg
+Example 7
+Re-annotating a flatfile.
+genret supports re-annotation of a genome flatfile via Restauro-G
+service developed by our team. Using the BLAST Like Alignment Tool,
+to refer the UniProt KB and annotates information including the description,
+comments, feature tables, cross references, COG family, position, and Pfam.
+The original software is available at [http://restauro-g.iab.keio.ac.jp].
+% genret refseqn:NC_000913 '*' annotate nc_000913-annotate.gbk
+Retrieves various gene features from genome flatfile
+Command line arguments
+Standard (Mandatory) qualifiers:
+[-sequence]          seqall     Nucleotide sequence(s) filename and optional
+format, or reference (input USA)
+[-gene]              string     [*] Gene name(s) to lookup (Any string)
+[-access]            string     Feature to access (Any string)
+[-outfile]           outfile    [*.genret] Full text output file
+Additional (Optional) qualifiers: (none)
+Advanced (Unprompted) qualifiers:
+-argument           string     Option to give to method (Any string)
+-[no]accid          boolean    [Y] Include to use sequence accession ID as
+query
+General qualifiers:
+-help               boolean    Report command line options and exit. More
+information on associated and general
+qualifiers can be found with -help -verbose
+Input file format
+Database definitions for the examples are included in the embossrc_template
+file of the Keio Bioinformatcs Web Service (KBWS) package.
+Input files for usage example 4
+File: gene_list.txt
+thrA
+thrB
+thrC
+Output file format
+Output files for usage example 1
+File: nc_000913.around_startcodon
+>thrL
+cgtgagtaaattaaaattttattgacttaggtcactaaatactttaaccaatataggcata
+gcgcacagacagataaaaattacagagtacacaacatccatgaaacgcattagcaccacca
+ttaccaccaccatcaccattaccacaggtaacggtgcgggctgacgcgtacaggaaacaca
+gaaaaaagcccgcacctgac
+>thrA
+aggtaacggtgcgggctgacgcgtacaggaaacacagaaaaaagcccgcacctgacagtgc
+gggctttttttttcgaccaaaggtaacgaggtaacaaccatgcgagtgttgaagttcggcg
+gtacatcagtggcaaatgcagaacgttttctgcgtgttgccgatattctggaaagcaatgc
+caggcaggggcaggtggcca
+[Part of this file has been deleted for brevity]
+>yjjY
+tgcatgtttgctacctaaattgccaactaaatcgaaacaggaagtacaaaagtccctgacc
+tgcctgatgcatgctgcaaattaacatgatcggcgtaacatgactaaagtacgtaattgcg
+ttcttgatgcactttccatcaacgtcaacaacatcattagcttggtcgtgggtactttccc
+tcaggacccgacagtgtcaa
+>yjtD
+tttttctgcgacttacgttaagaatttgtaaattcgcaccgcgtaataagttgacagtgat
+cacccggttcgcggttatttgatcaagaagagtggcaatatgcgtataacgattattctgg
+tcgcacccgccagagcagaaaatattggggcagcggcgcgggcaatgaaaacgatggggtt
+tagcgatctgcggattgtcg
+Output files for usage example 5
+File: nc_000913.start
+gene,start
+thrL,190
+thrA,337
+thrB,2801
+thrC,3734
+yaaX,5234
+yaaA,5683
+yaaJ,6529
+talB,8238
+mog,9306
+[Part of this file has been deleted for brevity]
+yjjX,4631256
+ytjC,4631820
+rob,4632464
+creA,4633544
+creB,4634030
+creC,4634719
+creD,4636201
+arcA,4637613
+yjjY,4638425
+yjtD,4638965
+Output files for usage example 7
+File: ecoli-annotate.gbk
+LOCUS       NC_000913            4639675 bp    DNA     circular BCT 25-OCT-2010
+DEFINITION  Escherichia coli str. K-12 substr. MG1655 chromosome, complete
+genome.
+ACCESSION   NC_000913
+VERSION     NC_000913.2  GI:49175990
+DBLINK      Project: 57779
+KEYWORDS    .
+SOURCE      Escherichia coli str. K-12 substr. MG1655
+ORGANISM  Escherichia coli str. K-12 substr. MG1655
+Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacteriales;
+[Part of this file has been deleted for brevity]
+CDS             2801..3733
+/EC_number="2.7.1.39"
+/codon_start="1"
+/db_xref="GI:16127997"
+/db_xref="ASAP:ABE-0000010"
+/db_xref="UniProtKB/Swiss-Prot:P00547"
+/db_xref="ECOCYC:EG10999"
+/db_xref="EcoGene:EG10999"
+/db_xref="GeneID:947498"
+/function="enzyme; Amino acid biosynthesis: Threonine"
+/function="1.5.1.8 metabolism; building block
+biosynthesis; amino acids; threonine"
+/function="7.1 location of gene products; cytoplasm"
+/gene="thrB"
+/gene_synonym="ECK0003; JW0002"
+/locus_tag="b0003"
+/note="GO_component: GO:0005737 - cytoplasm; GO_process:
+GO:0009088 - threonine biosynthetic process"
+/product="homoserine kinase"
+/protein_id="NP_414544.1"
+/rs_com="FUNCTION: Catalyzes the ATP-dependent
+phosphorylation of L- homoserine to L-homoserine
+phosphate (By similarity)."
+/rs_com="CATALYTIC ACTIVITY: ATP + L-homoserine = ADP +
+O-phospho-L- homoserine."
+/rs_com="PATHWAY: Amino-acid biosynthesis; L-threonine
+biosynthesis; L- threonine from L-aspartate: step 4/5."
+/rs_com="SUBCELLULAR LOCATION: Cytoplasm (Potential)."
+/rs_com="SIMILARITY: Belongs to the GHMP kinase family.
+Homoserine kinase subfamily."
+/rs_des="RecName: Full=Homoserine kinase; Short=HK;
+Short=HSK; EC=2.7.1.39;"
+/rs_protein="Level 1: similar to KHSE_ECODH 1.7e-180"
+/rs_xr="EMBL; CP000948; ACB01208.1; -; Genomic_DNA."
+/rs_xr="RefSeq; YP_001728986.1; -."
+/rs_xr="ProteinModelPortal; B1XBC8; -."
+/rs_xr="SMR; B1XBC8; 2-308."
+/rs_xr="EnsemblBacteria; EBESCT00000012034;
+EBESCP00000011562; EBESCG00000011096."
+/rs_xr="GeneID; 6058639; -."
+/rs_xr="GenomeReviews; CP000948_GR; ECDH10B_0003."
+/rs_xr="KEGG; ecd:ECDH10B_0003; -."
+/rs_xr="HOGENOM; HBG646290; -."
+/rs_xr="OMA; GSAHADN; -."
+/rs_xr="ProtClustDB; PRK01212; -."
+/rs_xr="BioCyc; ECOL316385:ECDH10B_0003-MONOMER; -."
+/rs_xr="GO; GO:0005737; C:cytoplasm;
+IEA:UniProtKB-SubCell."
+/rs_xr="GO; GO:0005524; F:ATP binding; IEA:UniProtKB-KW."
+/rs_xr="GO; GO:0004413; F:homoserine kinase activity;
+IEA:EC."
+/rs_xr="GO; GO:0009088; P:threonine biosynthetic process;
+IEA:UniProtKB-KW."
+/rs_xr="HAMAP; MF_00384; Homoser_kinase; 1; -."
+/rs_xr="InterPro; IPR006204; GHMP_kinase."
+/rs_xr="InterPro; IPR013750; GHMP_kinase_C."
+/rs_xr="InterPro; IPR006203; GHMP_knse_ATP-bd_CS."
+/rs_xr="InterPro; IPR000870; Homoserine_kin."
+/rs_xr="InterPro; IPR020568; Ribosomal_S5_D2-typ_fold."
+/rs_xr="InterPro; IPR014721;
+Ribosomal_S5_D2-typ_fold_subgr."
+/rs_xr="Gene3D; G3DSA:3.30.230.10;
+Ribosomal_S5_D2-type_fold; 1."
+/rs_xr="Pfam; PF08544; GHMP_kinases_C; 1."
+/rs_xr="Pfam; PF00288; GHMP_kinases_N; 1."
+/rs_xr="PIRSF; PIRSF000676; Homoser_kin; 1."
+/rs_xr="PRINTS; PR00958; HOMSERKINASE."
+/rs_xr="SUPFAM; SSF54211; Ribosomal_S5_D2-typ_fold; 1."
+/rs_xr="TIGRFAMs; TIGR00191; thrB; 1."
+/rs_xr="PROSITE; PS00627; GHMP_KINASES_ATP; 1."
+/transl_table="11"
+/translation="MVKVYAPASSANMSVGFDVLGAAVTPVDGALLGDVVTVEAAETF
+SLNNLGRFADKLPSEPRENIVYQCWERFCQELGKQIPVAMTLEKNMPIGSGLGSSACS
+VVAALMAMNEHCGKPLNDTRLLALMGELEGRISGSIHYDNVAPCFLGGMQLMIEENDI
+ISQQVPGFDEWLWVLAYPGIKVSTAEARAILPAQYRRQDCIAHGRHLAGFIHACYSRQ
+PELAAKLMKDVIAEPYRERLLPGFRQARQAVAEIGAVASGISGSGPTLFALCDKPETA
+QRVADWLGKNYLQNQEGFVHICRLDTAGARVLEN"
+[Part of this file has been deleted for brevity]
+4639201 gcgcagtcgg gcgaaatatc attactacgc cacgccagtt gaactggtgc cgctgttaga
+4639261 ggaaaaatct tcatggatga gccatgccgc gctggtgttt ggtcgcgaag attccgggtt
+4639321 gactaacgaa gagttagcgt tggctgacgt tcttactggt gtgccgatgg tggcggatta
+4639381 tccttcgctc aatctggggc aggcggtgat ggtctattgc tatcaattag caacattaat
+4639441 acaacaaccg gcgaaaagtg atgcaacggc agaccaacat caactgcaag ctttacgcga
+4639501 acgagccatg acattgctga cgactctggc agtggcagat gacataaaac tggtcgactg
+4639561 gttacaacaa cgcctggggc ttttagagca acgagacacg gcaatgttgc accgtttgct
+4639621 gcatgatatt gaaaaaaata tcaccaaata aaaaacgcct tagtaagtat ttttc
+//
+Data files
+None.
+Notes
+None.
+References
+Arakawa, K., Mori, K., Ikeda, K., Matsuzaki, T., Konayashi, Y., and
+Tomita, M. (2003) G-language Genome Analysis Environment: A Workbench
+for Nucleotide Sequence Data Mining, Bioinformatics, 19, 305-306.
+Arakawa, K. and Tomita, M. (2006) G-language System as a Platform for
+large-scale analysis of high-throughput omics data, J. Pest Sci.,
+31, 7.
+Arakawa, K., Kido, N., Oshita, K., Tomita, M. (2010) G-language Genome
+Analysis Environment with REST and SOAP Web Service Interfaces,
+Nucleic Acids Res., 38, W700-W705.
+Warnings
+None.
+Diagnostic Error Messages
+None.
+Exit status
+It always exits with a status of 0.
+Known bugs
+None.
+See also
+entret Retrieve sequence entries from flatfile databases and files
+seqret Read and write (return) sequences
+Author(s)
+Hidetoshi Itaya (celery@g-language.org)
+Institute for Advanced Biosciences, Keio University
+252-0882 Japan
+Kazuharu Arakawa (gaou@sfc.keio.ac.jp)
+Institute for Advanced Biosciences, Keio University
+252-0882 Japan
+History
+2012 - Written by Hidetoshi Itaya
+Target users
+This program is intended to be used by everyone and everything, from
+naive users to embedded scripts.
+Comments
+None.

Mercurial > repos > ktnyt > gembassy

comparison GEMBASSY-1.0.3/doc/text/genret.txt @ 0:8300eb051bea draft