diff GEMBASSY-1.0.3/doc/text/genret.txt @ 0:8300eb051bea draft

Initial upload
author ktnyt
date Fri, 26 Jun 2015 05:19:29 -0400
parents
children
line wrap: on
line diff
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/GEMBASSY-1.0.3/doc/text/genret.txt	Fri Jun 26 05:19:29 2015 -0400
@@ -0,0 +1,451 @@
+                                     genret
+Function
+
+   Retrieves various gene related information from genome flatfile
+
+Description
+
+   genret reads in one or more genome flatfiles and retrieves various data from
+   the input file. It is a wrapper program to the G-language REST service,
+   where a method is specified by giving a string to the "method" qualifier. By
+   default, genret will parse the input file to retrieve the accession ID
+   (or name) of the genome to query G-language REST service. By setting the
+   "accid" qualifier to false (or 0), genret will instead parse the sequence
+   and features of the genome to create a GenBank formatted flatfile and upload
+   the file to the G-language web server. Using the file uploaded, genret will
+   execute the method provided.
+
+   genret is able to perform a variety of tasks, incluing the retrieval of
+   sequence upstream, downstream, or around the start or stop codon,
+   translated gene sequences search of gene data by keyword, and re-annotation
+   and retrieval of genome flatfiles. The set of genes can be given as flat
+   text, regular expression, or a file containing the list of genes.
+
+   Details on G-language REST service is available from the wiki page
+
+   http://www.g-language.org/wiki/rest
+
+   Documentation on G-language Genome Analysis Environment methods are
+   provided at the Document Center
+
+   http://ws.g-language.org/gdoc/
+
+Usage
+
+   Here is a sample session with genret
+
+   Retrieving sequences upstream, downstream, or around the start/stop codons. 
+   The following example shows the retrieval of sequence around the start
+   codons of all genes.
+
+   Genes to access are specified by regular expression. '*' stands for every
+   gene.
+
+   Available methods are:
+      after_startcodon
+      after_stopcodon
+      around_startcodon
+      around_stopcodon
+      before_startcodon
+      before_stopcodon
+
+% genret
+Retrieves various gene related information from genome flatfile
+Input nucleotide sequence(s): refseqn:NC_000913
+Gene name(s) to lookup [*]:
+Feature to access: around_startcodon
+Full text output file [nc_000913.around_startcodon]:
+
+   Go to the input files for this example
+   Go to the output files for this example
+
+   Example 2
+
+   Using flat text as target genes. The names can be split with with a space,
+   comma, or vertical bar.
+
+% genret
+Retrieves various gene related information from genome flatfile
+Input nucleotide sequence(s): refseqn:NC_000913
+List of gene name(s) to report [*]: recA,recB
+Name of gene feature to access: translation
+Sequence output file [nc_000913.translation.genret]: stdout
+>recA
+MAIDENKQKALAAALGQIEKQFGKGSIMRLGEDRSMDVETISTGSLSLDIALGAGGLPMGR
+IVEIYGPESSGKTTLTLQVIAAAQREGKTCAFIDAEHALDPIYARKLGVDIDNLLCSQPDT
+GEQALEICDALARSGAVDVIVVDSVAALTPKAEIEGEIGDSHMGLAARMMSQAMRKLAGNL
+KQSNTLLIFINQIRMKIGVMFGNPETTTGGNALKFYASVRLDIRRIGAVKEGENVVGSETR
+VKVVKNKIAAPFKQAEFQILYGEGINFYGELVDLGVKEKLIEKAGAWYSYKGEKIGQGKAN
+ATAWLKDNPETAKEIEKKVRELLLSNPNSTPDFSVDDSEGVAETNEDF
+>recB
+MSDVAETLDPLRLPLQGERLIEASAGTGKTFTIAALYLRLLLGLGGSAAFPRPLTVEELLV
+VTFTEAATAELRGRIRSNIHELRIACLRETTDNPLYERLLEEIDDKAQAAQWLLLAERQMD
+EAAVFTIHGFCQRMLNLNAFESGMLFEQQLIEDESLLRYQACADFWRRHCYPLPREIAQVV
+FETWKGPQALLRDINRYLQGEAPVIKAPPPDDETLASRHAQIVARIDTVKQQWRDAVGELD
+ALIESSGIDRRKFNRSNQAKWIDKISAWAEEETNSYQLPESLEKFSQRFLEDRTKAGGETP
+RHPLFEAIDQLLAEPLSIRDLVITRALAEIRETVAREKRRRGELGFDDMLSRLDSALRSES
+GEVLAAAIRTRFPVAMIDEFQDTDPQQYRIFRRIWHHQPETALLLIGDPKQAIYAFRGADI
+FTYMKARSEVHAHYTLDTNWRSAPGMVNSVNKLFSQTDDAFMFREIPFIPVKSAGKNQALR
+FVFKGETQPAMKMWLMEGESCGVGDYQSTMAQVCAAQIRDWLQAGQRGEALLMNGDDARPV
+RASDISVLVRSRQEAAQVRDALTLLEIPSVYLSNRDSVFETLEAQEMLWLLQAVMTPEREN
+TLRSALATSMMGLNALDIETLNNDEHAWDVVVEEFDGYRQIWRKRGVMPMLRALMSARNIA
+ENLLATAGGERRLTDILHISELLQEAGTQLESEHALVRWLSQHILEPDSNASSQQMRLESD
+KHLVQIVTIHKSKGLEYPLVWLPFITNFRVQEQAFYHDRHSFEAVLDLNAAPESVDLAEAE
+RLAEDLRLLYVALTRSVWHCSLGVAPLVRRRGDKKGDTDVHQSALGRLLQKGEPQDAAGLR
+TCIEALCDDDIAWQTAQTGDNQPWQVNDVSTAELNAKTLQRLPGDNWRVTSYSGLQQRGHG
+IAQDLMPRLDVDAAGVASVVEEPTLTPHQFPRGASPGTFLHSLFEDLDFTQPVDPNWVREK
+LELGGFESQWEPVLTEWITAVLQAPLNETGVSLSQLSARNKQVEMEFYLPISEPLIASQLD
+TLIRQFDPLSAGCPPLEFMQVRGMLKGFIDLVFRHEGRYYLLDYKSNWLGEDSSAYTQQAM
+AAAMQAHRYDLQYQLYTLALHRYLRHRIADYDYEHHFGGVIYLFLRGVDKEHPQQGIYTTR
+PNAGLIALMDEMFAGMTLEEA
+
+   Example 3
+
+   Using a file with a list of gene names.
+   The following example will retrieve the strand direction for each gene
+   listed in the "gene_list.txt" file. String prefixed with an "@" or "list::"
+   will be interpreted as file names.
+
+% genret
+Retrieves various gene features from genome flatfile
+Input nucleotide sequence(s): refseqn:NC_000913
+List of gene name(s) to report [*]: @gene_list.txt
+Name of gene feature to access: direction
+Full text output file [nc_000913.direction]: stdout
+gene,direction
+thrA,direct
+thrB,direct
+thrC,direct
+
+   Go to the input files for this example
+   Go to the output files for this example
+
+   Example 4
+
+   Retrieving translations of coding sequences.
+   The following example will retrieve the translated protein sequence of
+   the "recA" gene.
+
+% genret
+Retrieves various gene related information from genome flatfile
+Input nucleotide sequence(s): refseqn:NC_000913
+Gene name(s) to lookup [*]: recA
+Feature to access: translation
+Full text output file [nc_000913.translation]: stdout
+>recA
+MAIDENKQKALAAALGQIEKQFGKGSIMRLGEDRSMDVETISTGSLSLDIALGAGGLPMGR
+IVEIYGPESSGKTTLTLQVIAAAQREGKTCAFIDAEHALDPIYARKLGVDIDNLLCSQPDT
+GEQALEICDALARSGAVDVIVVDSVAALTPKAEIEGEIGDSHMGLAARMMSQAMRKLAGNL
+KQSNTLLIFINQIRMKIGVMFGNPETTTGGNALKFYASVRLDIRRIGAVKEGENVVGSETR
+VKVVKNKIAAPFKQAEFQILYGEGINFYGELVDLGVKEKLIEKAGAWYSYKGEKIGQGKAN
+ATAWLKDNPETAKEIEKKVRELLLSNPNSTPDFSVDDSEGVAETNEDF
+
+   Example 5
+
+   Retrieving feature information of the genes.
+   The following example will retrieve the start positions for each gene.
+   The values for the keys in GenBank format is available for retrieval.
+   (ex. start end direction GO* etc.)
+   Positions will be returned with a 1 start value.
+
+% genret
+Retrieves various gene related information from genome flatfile
+Input nucleotide sequence(s): refseqn:NC_000913
+Gene name(s) to lookup [*]:
+Feature to access: start
+Full text output file [nc_000913.start]:
+
+   Go to the input files for this example
+   Go to the output files for this example
+
+   Example 6
+
+   Passing extra arguments to the methods.
+   The following example shows the retrieval of 30 base pairs around the
+   start codon of the "recA" gene. By default, the "around_startcodon" method
+   returns 200 base pairs around the start codon. Using the "-argument"
+   qualifier allows the user to change this value.
+
+% genret refseqn:NC_000913 recA around_startcodon -argument 30,30 stdout
+Retrieves various gene features from genome flatfile
+>recA
+ccggtattacccggcatgacaggagtaaaaatggctatcgacgaaaacaaacagaaagcgt
+tg
+
+   Example 7
+
+   Re-annotating a flatfile.
+   genret supports re-annotation of a genome flatfile via Restauro-G
+   service developed by our team. Using the BLAST Like Alignment Tool,
+   to refer the UniProt KB and annotates information including the description,
+   comments, feature tables, cross references, COG family, position, and Pfam.
+   The original software is available at [http://restauro-g.iab.keio.ac.jp].
+   
+
+% genret refseqn:NC_000913 '*' annotate nc_000913-annotate.gbk
+Retrieves various gene features from genome flatfile
+
+Command line arguments
+
+   Standard (Mandatory) qualifiers:
+  [-sequence]          seqall     Nucleotide sequence(s) filename and optional
+                                  format, or reference (input USA)
+  [-gene]              string     [*] Gene name(s) to lookup (Any string)
+  [-access]            string     Feature to access (Any string)
+  [-outfile]           outfile    [*.genret] Full text output file
+
+   Additional (Optional) qualifiers: (none)
+   Advanced (Unprompted) qualifiers:
+   -argument           string     Option to give to method (Any string)
+   -[no]accid          boolean    [Y] Include to use sequence accession ID as
+                                  query
+
+   General qualifiers:
+   -help               boolean    Report command line options and exit. More
+                                  information on associated and general
+                                  qualifiers can be found with -help -verbose
+
+Input file format
+
+   Database definitions for the examples are included in the embossrc_template
+   file of the Keio Bioinformatcs Web Service (KBWS) package.
+
+   Input files for usage example 4
+
+   File: gene_list.txt
+
+thrA
+thrB
+thrC
+
+Output file format
+
+   Output files for usage example 1
+
+   File: nc_000913.around_startcodon
+
+>thrL
+cgtgagtaaattaaaattttattgacttaggtcactaaatactttaaccaatataggcata
+gcgcacagacagataaaaattacagagtacacaacatccatgaaacgcattagcaccacca
+ttaccaccaccatcaccattaccacaggtaacggtgcgggctgacgcgtacaggaaacaca
+gaaaaaagcccgcacctgac
+>thrA
+aggtaacggtgcgggctgacgcgtacaggaaacacagaaaaaagcccgcacctgacagtgc
+gggctttttttttcgaccaaaggtaacgaggtaacaaccatgcgagtgttgaagttcggcg
+gtacatcagtggcaaatgcagaacgttttctgcgtgttgccgatattctggaaagcaatgc
+caggcaggggcaggtggcca
+
+   [Part of this file has been deleted for brevity]
+
+>yjjY
+tgcatgtttgctacctaaattgccaactaaatcgaaacaggaagtacaaaagtccctgacc
+tgcctgatgcatgctgcaaattaacatgatcggcgtaacatgactaaagtacgtaattgcg
+ttcttgatgcactttccatcaacgtcaacaacatcattagcttggtcgtgggtactttccc
+tcaggacccgacagtgtcaa
+>yjtD
+tttttctgcgacttacgttaagaatttgtaaattcgcaccgcgtaataagttgacagtgat
+cacccggttcgcggttatttgatcaagaagagtggcaatatgcgtataacgattattctgg
+tcgcacccgccagagcagaaaatattggggcagcggcgcgggcaatgaaaacgatggggtt
+tagcgatctgcggattgtcg
+
+   Output files for usage example 5
+
+   File: nc_000913.start
+
+gene,start
+thrL,190
+thrA,337
+thrB,2801
+thrC,3734
+yaaX,5234
+yaaA,5683
+yaaJ,6529
+talB,8238
+mog,9306
+
+   [Part of this file has been deleted for brevity]
+
+yjjX,4631256
+ytjC,4631820
+rob,4632464
+creA,4633544
+creB,4634030
+creC,4634719
+creD,4636201
+arcA,4637613
+yjjY,4638425
+yjtD,4638965
+
+   Output files for usage example 7
+
+   File: ecoli-annotate.gbk
+
+LOCUS       NC_000913            4639675 bp    DNA     circular BCT 25-OCT-2010
+DEFINITION  Escherichia coli str. K-12 substr. MG1655 chromosome, complete
+            genome.
+ACCESSION   NC_000913
+VERSION     NC_000913.2  GI:49175990
+DBLINK      Project: 57779
+KEYWORDS    .
+SOURCE      Escherichia coli str. K-12 substr. MG1655
+  ORGANISM  Escherichia coli str. K-12 substr. MG1655
+            Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacteriales;
+
+   [Part of this file has been deleted for brevity]
+
+     CDS             2801..3733
+                     /EC_number="2.7.1.39"
+                     /codon_start="1"
+                     /db_xref="GI:16127997"
+                     /db_xref="ASAP:ABE-0000010"
+                     /db_xref="UniProtKB/Swiss-Prot:P00547"
+                     /db_xref="ECOCYC:EG10999"
+                     /db_xref="EcoGene:EG10999"
+                     /db_xref="GeneID:947498"
+                     /function="enzyme; Amino acid biosynthesis: Threonine"
+                     /function="1.5.1.8 metabolism; building block
+                     biosynthesis; amino acids; threonine"
+                     /function="7.1 location of gene products; cytoplasm"
+                     /gene="thrB"
+                     /gene_synonym="ECK0003; JW0002"
+                     /locus_tag="b0003"
+                     /note="GO_component: GO:0005737 - cytoplasm; GO_process:
+                     GO:0009088 - threonine biosynthetic process"
+                     /product="homoserine kinase"
+                     /protein_id="NP_414544.1"
+                     /rs_com="FUNCTION: Catalyzes the ATP-dependent
+                     phosphorylation of L- homoserine to L-homoserine
+                     phosphate (By similarity)."
+                     /rs_com="CATALYTIC ACTIVITY: ATP + L-homoserine = ADP +
+                     O-phospho-L- homoserine."
+                     /rs_com="PATHWAY: Amino-acid biosynthesis; L-threonine
+                     biosynthesis; L- threonine from L-aspartate: step 4/5."
+                     /rs_com="SUBCELLULAR LOCATION: Cytoplasm (Potential)."
+                     /rs_com="SIMILARITY: Belongs to the GHMP kinase family.
+                     Homoserine kinase subfamily."
+                     /rs_des="RecName: Full=Homoserine kinase; Short=HK;
+                     Short=HSK; EC=2.7.1.39;"
+                     /rs_protein="Level 1: similar to KHSE_ECODH 1.7e-180"
+                     /rs_xr="EMBL; CP000948; ACB01208.1; -; Genomic_DNA."
+                     /rs_xr="RefSeq; YP_001728986.1; -."
+                     /rs_xr="ProteinModelPortal; B1XBC8; -."
+                     /rs_xr="SMR; B1XBC8; 2-308."
+                     /rs_xr="EnsemblBacteria; EBESCT00000012034;
+                     EBESCP00000011562; EBESCG00000011096."
+                     /rs_xr="GeneID; 6058639; -."
+                     /rs_xr="GenomeReviews; CP000948_GR; ECDH10B_0003."
+                     /rs_xr="KEGG; ecd:ECDH10B_0003; -."
+                     /rs_xr="HOGENOM; HBG646290; -."
+                     /rs_xr="OMA; GSAHADN; -."
+                     /rs_xr="ProtClustDB; PRK01212; -."
+                     /rs_xr="BioCyc; ECOL316385:ECDH10B_0003-MONOMER; -."
+                     /rs_xr="GO; GO:0005737; C:cytoplasm;
+                     IEA:UniProtKB-SubCell."
+                     /rs_xr="GO; GO:0005524; F:ATP binding; IEA:UniProtKB-KW."
+                     /rs_xr="GO; GO:0004413; F:homoserine kinase activity;
+                     IEA:EC."
+                     /rs_xr="GO; GO:0009088; P:threonine biosynthetic process;
+                     IEA:UniProtKB-KW."
+                     /rs_xr="HAMAP; MF_00384; Homoser_kinase; 1; -."
+                     /rs_xr="InterPro; IPR006204; GHMP_kinase."
+                     /rs_xr="InterPro; IPR013750; GHMP_kinase_C."
+                     /rs_xr="InterPro; IPR006203; GHMP_knse_ATP-bd_CS."
+                     /rs_xr="InterPro; IPR000870; Homoserine_kin."
+                     /rs_xr="InterPro; IPR020568; Ribosomal_S5_D2-typ_fold."
+                     /rs_xr="InterPro; IPR014721;
+                     Ribosomal_S5_D2-typ_fold_subgr."
+                     /rs_xr="Gene3D; G3DSA:3.30.230.10;
+                     Ribosomal_S5_D2-type_fold; 1."
+                     /rs_xr="Pfam; PF08544; GHMP_kinases_C; 1."
+                     /rs_xr="Pfam; PF00288; GHMP_kinases_N; 1."
+                     /rs_xr="PIRSF; PIRSF000676; Homoser_kin; 1."
+                     /rs_xr="PRINTS; PR00958; HOMSERKINASE."
+                     /rs_xr="SUPFAM; SSF54211; Ribosomal_S5_D2-typ_fold; 1."
+                     /rs_xr="TIGRFAMs; TIGR00191; thrB; 1."
+                     /rs_xr="PROSITE; PS00627; GHMP_KINASES_ATP; 1."
+                     /transl_table="11"
+                     /translation="MVKVYAPASSANMSVGFDVLGAAVTPVDGALLGDVVTVEAAETF
+                     SLNNLGRFADKLPSEPRENIVYQCWERFCQELGKQIPVAMTLEKNMPIGSGLGSSACS
+                     VVAALMAMNEHCGKPLNDTRLLALMGELEGRISGSIHYDNVAPCFLGGMQLMIEENDI
+                     ISQQVPGFDEWLWVLAYPGIKVSTAEARAILPAQYRRQDCIAHGRHLAGFIHACYSRQ
+                     PELAAKLMKDVIAEPYRERLLPGFRQARQAVAEIGAVASGISGSGPTLFALCDKPETA
+                     QRVADWLGKNYLQNQEGFVHICRLDTAGARVLEN"
+
+   [Part of this file has been deleted for brevity]
+
+  4639201 gcgcagtcgg gcgaaatatc attactacgc cacgccagtt gaactggtgc cgctgttaga
+  4639261 ggaaaaatct tcatggatga gccatgccgc gctggtgttt ggtcgcgaag attccgggtt
+  4639321 gactaacgaa gagttagcgt tggctgacgt tcttactggt gtgccgatgg tggcggatta
+  4639381 tccttcgctc aatctggggc aggcggtgat ggtctattgc tatcaattag caacattaat
+  4639441 acaacaaccg gcgaaaagtg atgcaacggc agaccaacat caactgcaag ctttacgcga
+  4639501 acgagccatg acattgctga cgactctggc agtggcagat gacataaaac tggtcgactg
+  4639561 gttacaacaa cgcctggggc ttttagagca acgagacacg gcaatgttgc accgtttgct
+  4639621 gcatgatatt gaaaaaaata tcaccaaata aaaaacgcct tagtaagtat ttttc
+//
+
+Data files
+
+   None.
+
+Notes
+
+   None.
+
+References
+
+   Arakawa, K., Mori, K., Ikeda, K., Matsuzaki, T., Konayashi, Y., and
+      Tomita, M. (2003) G-language Genome Analysis Environment: A Workbench
+      for Nucleotide Sequence Data Mining, Bioinformatics, 19, 305-306.
+
+   Arakawa, K. and Tomita, M. (2006) G-language System as a Platform for
+      large-scale analysis of high-throughput omics data, J. Pest Sci.,
+      31, 7.
+
+   Arakawa, K., Kido, N., Oshita, K., Tomita, M. (2010) G-language Genome
+      Analysis Environment with REST and SOAP Web Service Interfaces,
+      Nucleic Acids Res., 38, W700-W705.
+
+Warnings
+
+   None.
+
+Diagnostic Error Messages
+
+   None.
+
+Exit status
+
+   It always exits with a status of 0.
+
+Known bugs
+
+   None.
+
+See also
+
+   entret Retrieve sequence entries from flatfile databases and files
+   seqret Read and write (return) sequences
+
+Author(s)
+
+   Hidetoshi Itaya (celery@g-language.org)
+   Institute for Advanced Biosciences, Keio University
+   252-0882 Japan
+
+   Kazuharu Arakawa (gaou@sfc.keio.ac.jp)
+   Institute for Advanced Biosciences, Keio University
+   252-0882 Japan
+
+History
+
+   2012 - Written by Hidetoshi Itaya
+
+Target users
+
+   This program is intended to be used by everyone and everything, from
+   naive users to embedded scripts.
+
+Comments
+
+   None.
+