diff GEMBASSY-1.0.3/doc/text/genret.txt @ 2:8947fca5f715 draft default tip

Uploaded
author ktnyt
date Fri, 26 Jun 2015 05:21:44 -0400
parents 84a17b3fad1f
children
--- a/GEMBASSY-1.0.3/doc/text/genret.txt	Fri Jun 26 05:20:29 2015 -0400
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,451 +0,0 @@
-                                     genret
-Function
-
-   Retrieves various gene related information from genome flatfile
-
-Description
-
-   genret reads in one or more genome flatfiles and retrieves various data from
-   the input file. It is a wrapper program to the G-language REST service,
-   where a method is specified by giving a string to the "method" qualifier. By
-   default, genret parses the input file to retrieve the accession ID
-   (or name) of the genome and uses it to query the G-language REST service.
-   With the "accid" qualifier set to false (or 0), genret instead parses the
-   sequence and features of the genome, creates a GenBank-formatted flatfile,
-   and uploads it to the G-language web server. genret then executes the
-   requested method on the uploaded file.
-
-   genret is able to perform a variety of tasks, including the retrieval of
-   sequence upstream, downstream, or around the start or stop codon,
-   translated gene sequences, search of gene data by keyword, and re-annotation
-   and retrieval of genome flatfiles. The set of genes can be given as flat
-   text, regular expression, or a file containing the list of genes.
-
-   Details on the G-language REST service are available from the wiki page
-
-   http://www.g-language.org/wiki/rest
-
-   Documentation on G-language Genome Analysis Environment methods is
-   provided at the Document Center
-
-   http://ws.g-language.org/gdoc/
-
-Usage
-
-   Here is a sample session with genret
-
-   Retrieving sequences upstream, downstream, or around the start/stop codons. 
-   The following example shows the retrieval of sequence around the start
-   codons of all genes.
-
-   Genes to access are specified by regular expression. '*' stands for every
-   gene.
-
-   Available methods are:
-      after_startcodon
-      after_stopcodon
-      around_startcodon
-      around_stopcodon
-      before_startcodon
-      before_stopcodon
-
-% genret
-Retrieves various gene related information from genome flatfile
-Input nucleotide sequence(s): refseqn:NC_000913
-Gene name(s) to lookup [*]:
-Feature to access: around_startcodon
-Full text output file [nc_000913.around_startcodon]:
-
-   Go to the input files for this example
-   Go to the output files for this example
-
-   Example 2
-
-   Using flat text as target genes. The names can be separated with a space,
-   comma, or vertical bar.
-
-% genret
-Retrieves various gene related information from genome flatfile
-Input nucleotide sequence(s): refseqn:NC_000913
-List of gene name(s) to report [*]: recA,recB
-Name of gene feature to access: translation
-Sequence output file [nc_000913.translation.genret]: stdout
->recA
-MAIDENKQKALAAALGQIEKQFGKGSIMRLGEDRSMDVETISTGSLSLDIALGAGGLPMGR
-IVEIYGPESSGKTTLTLQVIAAAQREGKTCAFIDAEHALDPIYARKLGVDIDNLLCSQPDT
-GEQALEICDALARSGAVDVIVVDSVAALTPKAEIEGEIGDSHMGLAARMMSQAMRKLAGNL
-KQSNTLLIFINQIRMKIGVMFGNPETTTGGNALKFYASVRLDIRRIGAVKEGENVVGSETR
-VKVVKNKIAAPFKQAEFQILYGEGINFYGELVDLGVKEKLIEKAGAWYSYKGEKIGQGKAN
-ATAWLKDNPETAKEIEKKVRELLLSNPNSTPDFSVDDSEGVAETNEDF
->recB
-MSDVAETLDPLRLPLQGERLIEASAGTGKTFTIAALYLRLLLGLGGSAAFPRPLTVEELLV
-VTFTEAATAELRGRIRSNIHELRIACLRETTDNPLYERLLEEIDDKAQAAQWLLLAERQMD
-EAAVFTIHGFCQRMLNLNAFESGMLFEQQLIEDESLLRYQACADFWRRHCYPLPREIAQVV
-FETWKGPQALLRDINRYLQGEAPVIKAPPPDDETLASRHAQIVARIDTVKQQWRDAVGELD
-ALIESSGIDRRKFNRSNQAKWIDKISAWAEEETNSYQLPESLEKFSQRFLEDRTKAGGETP
-RHPLFEAIDQLLAEPLSIRDLVITRALAEIRETVAREKRRRGELGFDDMLSRLDSALRSES
-GEVLAAAIRTRFPVAMIDEFQDTDPQQYRIFRRIWHHQPETALLLIGDPKQAIYAFRGADI
-FTYMKARSEVHAHYTLDTNWRSAPGMVNSVNKLFSQTDDAFMFREIPFIPVKSAGKNQALR
-FVFKGETQPAMKMWLMEGESCGVGDYQSTMAQVCAAQIRDWLQAGQRGEALLMNGDDARPV
-RASDISVLVRSRQEAAQVRDALTLLEIPSVYLSNRDSVFETLEAQEMLWLLQAVMTPEREN
-TLRSALATSMMGLNALDIETLNNDEHAWDVVVEEFDGYRQIWRKRGVMPMLRALMSARNIA
-ENLLATAGGERRLTDILHISELLQEAGTQLESEHALVRWLSQHILEPDSNASSQQMRLESD
-KHLVQIVTIHKSKGLEYPLVWLPFITNFRVQEQAFYHDRHSFEAVLDLNAAPESVDLAEAE
-RLAEDLRLLYVALTRSVWHCSLGVAPLVRRRGDKKGDTDVHQSALGRLLQKGEPQDAAGLR
-TCIEALCDDDIAWQTAQTGDNQPWQVNDVSTAELNAKTLQRLPGDNWRVTSYSGLQQRGHG
-IAQDLMPRLDVDAAGVASVVEEPTLTPHQFPRGASPGTFLHSLFEDLDFTQPVDPNWVREK
-LELGGFESQWEPVLTEWITAVLQAPLNETGVSLSQLSARNKQVEMEFYLPISEPLIASQLD
-TLIRQFDPLSAGCPPLEFMQVRGMLKGFIDLVFRHEGRYYLLDYKSNWLGEDSSAYTQQAM
-AAAMQAHRYDLQYQLYTLALHRYLRHRIADYDYEHHFGGVIYLFLRGVDKEHPQQGIYTTR
-PNAGLIALMDEMFAGMTLEEA
-
-   Example 3
-
-   Using a file with a list of gene names.
-   The following example will retrieve the strand direction for each gene
-   listed in the "gene_list.txt" file. Strings prefixed with "@" or "list::"
-   are interpreted as file names.
-
-% genret
-Retrieves various gene features from genome flatfile
-Input nucleotide sequence(s): refseqn:NC_000913
-List of gene name(s) to report [*]: @gene_list.txt
-Name of gene feature to access: direction
-Full text output file [nc_000913.direction]: stdout
-gene,direction
-thrA,direct
-thrB,direct
-thrC,direct
-
-   Go to the input files for this example
-   Go to the output files for this example
-
-   Example 4
-
-   Retrieving translations of coding sequences.
-   The following example will retrieve the translated protein sequence of
-   the "recA" gene.
-
-% genret
-Retrieves various gene related information from genome flatfile
-Input nucleotide sequence(s): refseqn:NC_000913
-Gene name(s) to lookup [*]: recA
-Feature to access: translation
-Full text output file [nc_000913.translation]: stdout
->recA
-MAIDENKQKALAAALGQIEKQFGKGSIMRLGEDRSMDVETISTGSLSLDIALGAGGLPMGR
-IVEIYGPESSGKTTLTLQVIAAAQREGKTCAFIDAEHALDPIYARKLGVDIDNLLCSQPDT
-GEQALEICDALARSGAVDVIVVDSVAALTPKAEIEGEIGDSHMGLAARMMSQAMRKLAGNL
-KQSNTLLIFINQIRMKIGVMFGNPETTTGGNALKFYASVRLDIRRIGAVKEGENVVGSETR
-VKVVKNKIAAPFKQAEFQILYGEGINFYGELVDLGVKEKLIEKAGAWYSYKGEKIGQGKAN
-ATAWLKDNPETAKEIEKKVRELLLSNPNSTPDFSVDDSEGVAETNEDF
-
-   Example 5
-
-   Retrieving feature information of the genes.
-   The following example will retrieve the start positions for each gene.
-   The values of the keys in the GenBank format are available for retrieval
-   (e.g. start, end, direction, GO*).
-   Positions are reported using 1-based coordinates.
-
-% genret
-Retrieves various gene related information from genome flatfile
-Input nucleotide sequence(s): refseqn:NC_000913
-Gene name(s) to lookup [*]:
-Feature to access: start
-Full text output file [nc_000913.start]:
-
-   Go to the input files for this example
-   Go to the output files for this example
-
-   Example 6
-
-   Passing extra arguments to the methods.
-   The following example shows the retrieval of 30 base pairs around the
-   start codon of the "recA" gene. By default, the "around_startcodon" method
-   returns 200 base pairs around the start codon. Using the "-argument"
-   qualifier allows the user to change this value.
-
-% genret refseqn:NC_000913 recA around_startcodon -argument 30,30 stdout
-Retrieves various gene features from genome flatfile
->recA
-ccggtattacccggcatgacaggagtaaaaatggctatcgacgaaaacaaacagaaagcgt
-tg
-
-   Example 7
-
-   Re-annotating a flatfile.
-   genret supports re-annotation of a genome flatfile via the Restauro-G
-   service developed by our team. Restauro-G uses the BLAST-Like Alignment
-   Tool (BLAT) to search UniProtKB and annotates information including the
-   description, comments, feature tables, cross references, COG family,
-   position, and Pfam. The original software is available at
-   [http://restauro-g.iab.keio.ac.jp].
-
-% genret refseqn:NC_000913 '*' annotate nc_000913-annotate.gbk
-Retrieves various gene features from genome flatfile
-
-Command line arguments
-
-   Standard (Mandatory) qualifiers:
-  [-sequence]          seqall     Nucleotide sequence(s) filename and optional
-                                  format, or reference (input USA)
-  [-gene]              string     [*] Gene name(s) to lookup (Any string)
-  [-access]            string     Feature to access (Any string)
-  [-outfile]           outfile    [*.genret] Full text output file
-
-   Additional (Optional) qualifiers: (none)
-   Advanced (Unprompted) qualifiers:
-   -argument           string     Option to give to method (Any string)
-   -[no]accid          boolean    [Y] Include to use sequence accession ID as
-                                  query
-
-   General qualifiers:
-   -help               boolean    Report command line options and exit. More
-                                  information on associated and general
-                                  qualifiers can be found with -help -verbose
-
-Input file format
-
-   Database definitions for the examples are included in the embossrc_template
-   file of the Keio Bioinformatics Web Service (KBWS) package.
-
-   Input files for usage example 3
-
-   File: gene_list.txt
-
-thrA
-thrB
-thrC
-
-Output file format
-
-   Output files for usage example 1
-
-   File: nc_000913.around_startcodon
-
->thrL
-cgtgagtaaattaaaattttattgacttaggtcactaaatactttaaccaatataggcata
-gcgcacagacagataaaaattacagagtacacaacatccatgaaacgcattagcaccacca
-ttaccaccaccatcaccattaccacaggtaacggtgcgggctgacgcgtacaggaaacaca
-gaaaaaagcccgcacctgac
->thrA
-aggtaacggtgcgggctgacgcgtacaggaaacacagaaaaaagcccgcacctgacagtgc
-gggctttttttttcgaccaaaggtaacgaggtaacaaccatgcgagtgttgaagttcggcg
-gtacatcagtggcaaatgcagaacgttttctgcgtgttgccgatattctggaaagcaatgc
-caggcaggggcaggtggcca
-
-   [Part of this file has been deleted for brevity]
-
->yjjY
-tgcatgtttgctacctaaattgccaactaaatcgaaacaggaagtacaaaagtccctgacc
-tgcctgatgcatgctgcaaattaacatgatcggcgtaacatgactaaagtacgtaattgcg
-ttcttgatgcactttccatcaacgtcaacaacatcattagcttggtcgtgggtactttccc
-tcaggacccgacagtgtcaa
->yjtD
-tttttctgcgacttacgttaagaatttgtaaattcgcaccgcgtaataagttgacagtgat
-cacccggttcgcggttatttgatcaagaagagtggcaatatgcgtataacgattattctgg
-tcgcacccgccagagcagaaaatattggggcagcggcgcgggcaatgaaaacgatggggtt
-tagcgatctgcggattgtcg
-
-   Output files for usage example 5
-
-   File: nc_000913.start
-
-gene,start
-thrL,190
-thrA,337
-thrB,2801
-thrC,3734
-yaaX,5234
-yaaA,5683
-yaaJ,6529
-talB,8238
-mog,9306
-
-   [Part of this file has been deleted for brevity]
-
-yjjX,4631256
-ytjC,4631820
-rob,4632464
-creA,4633544
-creB,4634030
-creC,4634719
-creD,4636201
-arcA,4637613
-yjjY,4638425
-yjtD,4638965
-
-   Output files for usage example 7
-
-   File: nc_000913-annotate.gbk
-
-LOCUS       NC_000913            4639675 bp    DNA     circular BCT 25-OCT-2010
-DEFINITION  Escherichia coli str. K-12 substr. MG1655 chromosome, complete
-            genome.
-ACCESSION   NC_000913
-VERSION     NC_000913.2  GI:49175990
-DBLINK      Project: 57779
-KEYWORDS    .
-SOURCE      Escherichia coli str. K-12 substr. MG1655
-  ORGANISM  Escherichia coli str. K-12 substr. MG1655
-            Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacteriales;
-
-   [Part of this file has been deleted for brevity]
-
-     CDS             2801..3733
-                     /EC_number="2.7.1.39"
-                     /codon_start="1"
-                     /db_xref="GI:16127997"
-                     /db_xref="ASAP:ABE-0000010"
-                     /db_xref="UniProtKB/Swiss-Prot:P00547"
-                     /db_xref="ECOCYC:EG10999"
-                     /db_xref="EcoGene:EG10999"
-                     /db_xref="GeneID:947498"
-                     /function="enzyme; Amino acid biosynthesis: Threonine"
-                     /function="1.5.1.8 metabolism; building block
-                     biosynthesis; amino acids; threonine"
-                     /function="7.1 location of gene products; cytoplasm"
-                     /gene="thrB"
-                     /gene_synonym="ECK0003; JW0002"
-                     /locus_tag="b0003"
-                     /note="GO_component: GO:0005737 - cytoplasm; GO_process:
-                     GO:0009088 - threonine biosynthetic process"
-                     /product="homoserine kinase"
-                     /protein_id="NP_414544.1"
-                     /rs_com="FUNCTION: Catalyzes the ATP-dependent
-                     phosphorylation of L- homoserine to L-homoserine
-                     phosphate (By similarity)."
-                     /rs_com="CATALYTIC ACTIVITY: ATP + L-homoserine = ADP +
-                     O-phospho-L- homoserine."
-                     /rs_com="PATHWAY: Amino-acid biosynthesis; L-threonine
-                     biosynthesis; L- threonine from L-aspartate: step 4/5."
-                     /rs_com="SUBCELLULAR LOCATION: Cytoplasm (Potential)."
-                     /rs_com="SIMILARITY: Belongs to the GHMP kinase family.
-                     Homoserine kinase subfamily."
-                     /rs_des="RecName: Full=Homoserine kinase; Short=HK;
-                     Short=HSK; EC=2.7.1.39;"
-                     /rs_protein="Level 1: similar to KHSE_ECODH 1.7e-180"
-                     /rs_xr="EMBL; CP000948; ACB01208.1; -; Genomic_DNA."
-                     /rs_xr="RefSeq; YP_001728986.1; -."
-                     /rs_xr="ProteinModelPortal; B1XBC8; -."
-                     /rs_xr="SMR; B1XBC8; 2-308."
-                     /rs_xr="EnsemblBacteria; EBESCT00000012034;
-                     EBESCP00000011562; EBESCG00000011096."
-                     /rs_xr="GeneID; 6058639; -."
-                     /rs_xr="GenomeReviews; CP000948_GR; ECDH10B_0003."
-                     /rs_xr="KEGG; ecd:ECDH10B_0003; -."
-                     /rs_xr="HOGENOM; HBG646290; -."
-                     /rs_xr="OMA; GSAHADN; -."
-                     /rs_xr="ProtClustDB; PRK01212; -."
-                     /rs_xr="BioCyc; ECOL316385:ECDH10B_0003-MONOMER; -."
-                     /rs_xr="GO; GO:0005737; C:cytoplasm;
-                     IEA:UniProtKB-SubCell."
-                     /rs_xr="GO; GO:0005524; F:ATP binding; IEA:UniProtKB-KW."
-                     /rs_xr="GO; GO:0004413; F:homoserine kinase activity;
-                     IEA:EC."
-                     /rs_xr="GO; GO:0009088; P:threonine biosynthetic process;
-                     IEA:UniProtKB-KW."
-                     /rs_xr="HAMAP; MF_00384; Homoser_kinase; 1; -."
-                     /rs_xr="InterPro; IPR006204; GHMP_kinase."
-                     /rs_xr="InterPro; IPR013750; GHMP_kinase_C."
-                     /rs_xr="InterPro; IPR006203; GHMP_knse_ATP-bd_CS."
-                     /rs_xr="InterPro; IPR000870; Homoserine_kin."
-                     /rs_xr="InterPro; IPR020568; Ribosomal_S5_D2-typ_fold."
-                     /rs_xr="InterPro; IPR014721;
-                     Ribosomal_S5_D2-typ_fold_subgr."
-                     /rs_xr="Gene3D; G3DSA:3.30.230.10;
-                     Ribosomal_S5_D2-type_fold; 1."
-                     /rs_xr="Pfam; PF08544; GHMP_kinases_C; 1."
-                     /rs_xr="Pfam; PF00288; GHMP_kinases_N; 1."
-                     /rs_xr="PIRSF; PIRSF000676; Homoser_kin; 1."
-                     /rs_xr="PRINTS; PR00958; HOMSERKINASE."
-                     /rs_xr="SUPFAM; SSF54211; Ribosomal_S5_D2-typ_fold; 1."
-                     /rs_xr="TIGRFAMs; TIGR00191; thrB; 1."
-                     /rs_xr="PROSITE; PS00627; GHMP_KINASES_ATP; 1."
-                     /transl_table="11"
-                     /translation="MVKVYAPASSANMSVGFDVLGAAVTPVDGALLGDVVTVEAAETF
-                     SLNNLGRFADKLPSEPRENIVYQCWERFCQELGKQIPVAMTLEKNMPIGSGLGSSACS
-                     VVAALMAMNEHCGKPLNDTRLLALMGELEGRISGSIHYDNVAPCFLGGMQLMIEENDI
-                     ISQQVPGFDEWLWVLAYPGIKVSTAEARAILPAQYRRQDCIAHGRHLAGFIHACYSRQ
-                     PELAAKLMKDVIAEPYRERLLPGFRQARQAVAEIGAVASGISGSGPTLFALCDKPETA
-                     QRVADWLGKNYLQNQEGFVHICRLDTAGARVLEN"
-
-   [Part of this file has been deleted for brevity]
-
-  4639201 gcgcagtcgg gcgaaatatc attactacgc cacgccagtt gaactggtgc cgctgttaga
-  4639261 ggaaaaatct tcatggatga gccatgccgc gctggtgttt ggtcgcgaag attccgggtt
-  4639321 gactaacgaa gagttagcgt tggctgacgt tcttactggt gtgccgatgg tggcggatta
-  4639381 tccttcgctc aatctggggc aggcggtgat ggtctattgc tatcaattag caacattaat
-  4639441 acaacaaccg gcgaaaagtg atgcaacggc agaccaacat caactgcaag ctttacgcga
-  4639501 acgagccatg acattgctga cgactctggc agtggcagat gacataaaac tggtcgactg
-  4639561 gttacaacaa cgcctggggc ttttagagca acgagacacg gcaatgttgc accgtttgct
-  4639621 gcatgatatt gaaaaaaata tcaccaaata aaaaacgcct tagtaagtat ttttc
-//
-
-Data files
-
-   None.
-
-Notes
-
-   None.
-
-References
-
-   Arakawa, K., Mori, K., Ikeda, K., Matsuzaki, T., Kobayashi, Y., and
-      Tomita, M. (2003) G-language Genome Analysis Environment: A Workbench
-      for Nucleotide Sequence Data Mining, Bioinformatics, 19, 305-306.
-
-   Arakawa, K. and Tomita, M. (2006) G-language System as a Platform for
-      large-scale analysis of high-throughput omics data, J. Pestic. Sci.,
-      31, 7.
-
-   Arakawa, K., Kido, N., Oshita, K., Tomita, M. (2010) G-language Genome
-      Analysis Environment with REST and SOAP Web Service Interfaces,
-      Nucleic Acids Res., 38, W700-W705.
-
-Warnings
-
-   None.
-
-Diagnostic Error Messages
-
-   None.
-
-Exit status
-
-   It always exits with a status of 0.
-
-Known bugs
-
-   None.
-
-See also
-
-   entret Retrieve sequence entries from flatfile databases and files
-   seqret Read and write (return) sequences
-
-Author(s)
-
-   Hidetoshi Itaya (celery@g-language.org)
-   Institute for Advanced Biosciences, Keio University
-   252-0882 Japan
-
-   Kazuharu Arakawa (gaou@sfc.keio.ac.jp)
-   Institute for Advanced Biosciences, Keio University
-   252-0882 Japan
-
-History
-
-   2012 - Written by Hidetoshi Itaya
-
-Target users
-
-   This program is intended to be used by everyone and everything, from
-   naive users to embedded scripts.
-
-Comments
-
-   None.
-
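
For readers who still need to consume output in the format documented in the removed file, the two-column `gene,start` table from usage example 5 can be read with a few lines of Python. This is a minimal sketch and not part of GEMBASSY: the function name `parse_feature_table` is hypothetical, and the sample rows are copied from the `nc_000913.start` excerpt above.

```python
import csv
import io

# Sample rows copied from the nc_000913.start excerpt in the removed documentation.
SAMPLE = """gene,start
thrL,190
thrA,337
thrB,2801
thrC,3734
"""


def parse_feature_table(text):
    """Return (feature_name, {gene: value}) from two-column comma-separated text.

    Hypothetical helper: assumes a header line such as "gene,start"
    followed by one "gene,value" pair per line.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)  # e.g. ["gene", "start"]
    table = {}
    for row in reader:
        if len(row) != 2:
            continue  # skip blank or malformed lines
        gene, value = row
        table[gene] = value
    return header[1], table


feature, raw_values = parse_feature_table(SAMPLE)
# Start positions are integers; other features (e.g. direction) stay as strings.
start_positions = {gene: int(pos) for gene, pos in raw_values.items()}
print(feature, start_positions["thrB"])
```

Values are kept as strings inside the helper because the same layout is used for non-numeric features such as `direction` in usage example 3; the caller converts to `int` only where that makes sense.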