Mercurial > repos > ktnyt > gembassy
view GEMBASSY-1.0.3/doc/text/genret.txt @ 1:84a17b3fad1f draft
Uploaded
author | ktnyt |
---|---|
date | Fri, 26 Jun 2015 05:20:29 -0400 |
parents | 8300eb051bea |
children |
line wrap: on
line source
genret Function Retrieves various gene related information from genome flatfile Description genret reads in one or more genome flatfiles and retrieves various data from the input file. It is a wrapper program to the G-language REST service, where a method is specified by giving a string to the "method" qualifier. By default, genret will parse the input file to retrieve the accession ID (or name) of the genome to query G-language REST service. By setting the "accid" qualifier to false (or 0), genret will instead parse the sequence and features of the genome to create a GenBank formatted flatfile and upload the file to the G-language web server. Using the file uploaded, genret will execute the method provided. genret is able to perform a variety of tasks, incluing the retrieval of sequence upstream, downstream, or around the start or stop codon, translated gene sequences search of gene data by keyword, and re-annotation and retrieval of genome flatfiles. The set of genes can be given as flat text, regular expression, or a file containing the list of genes. Details on G-language REST service is available from the wiki page http://www.g-language.org/wiki/rest Documentation on G-language Genome Analysis Environment methods are provided at the Document Center http://ws.g-language.org/gdoc/ Usage Here is a sample session with genret Retrieving sequences upstream, downstream, or around the start/stop codons. The following example shows the retrieval of sequence around the start codons of all genes. Genes to access are specified by regular expression. '*' stands for every gene. Available methods are: after_startcodon after_stopcodon around_startcodon around_stopcodon before_startcodon before_stopcodon % genret Retrieves various gene related information from genome flatfile Input nucleotide sequence(s): refseqn:NC_000913 Gene name(s) to lookup [*]: Feature to access: around_startcodon Full text output file [nc_000913.around_startcodon]: Go to the input files for this example Go to the output files for this example Example 2 Using flat text as target genes. The names can be split with with a space, comma, or vertical bar. % genret Retrieves various gene related information from genome flatfile Input nucleotide sequence(s): refseqn:NC_000913 List of gene name(s) to report [*]: recA,recB Name of gene feature to access: translation Sequence output file [nc_000913.translation.genret]: stdout >recA MAIDENKQKALAAALGQIEKQFGKGSIMRLGEDRSMDVETISTGSLSLDIALGAGGLPMGR IVEIYGPESSGKTTLTLQVIAAAQREGKTCAFIDAEHALDPIYARKLGVDIDNLLCSQPDT GEQALEICDALARSGAVDVIVVDSVAALTPKAEIEGEIGDSHMGLAARMMSQAMRKLAGNL KQSNTLLIFINQIRMKIGVMFGNPETTTGGNALKFYASVRLDIRRIGAVKEGENVVGSETR VKVVKNKIAAPFKQAEFQILYGEGINFYGELVDLGVKEKLIEKAGAWYSYKGEKIGQGKAN ATAWLKDNPETAKEIEKKVRELLLSNPNSTPDFSVDDSEGVAETNEDF >recB MSDVAETLDPLRLPLQGERLIEASAGTGKTFTIAALYLRLLLGLGGSAAFPRPLTVEELLV VTFTEAATAELRGRIRSNIHELRIACLRETTDNPLYERLLEEIDDKAQAAQWLLLAERQMD EAAVFTIHGFCQRMLNLNAFESGMLFEQQLIEDESLLRYQACADFWRRHCYPLPREIAQVV FETWKGPQALLRDINRYLQGEAPVIKAPPPDDETLASRHAQIVARIDTVKQQWRDAVGELD ALIESSGIDRRKFNRSNQAKWIDKISAWAEEETNSYQLPESLEKFSQRFLEDRTKAGGETP RHPLFEAIDQLLAEPLSIRDLVITRALAEIRETVAREKRRRGELGFDDMLSRLDSALRSES GEVLAAAIRTRFPVAMIDEFQDTDPQQYRIFRRIWHHQPETALLLIGDPKQAIYAFRGADI FTYMKARSEVHAHYTLDTNWRSAPGMVNSVNKLFSQTDDAFMFREIPFIPVKSAGKNQALR FVFKGETQPAMKMWLMEGESCGVGDYQSTMAQVCAAQIRDWLQAGQRGEALLMNGDDARPV RASDISVLVRSRQEAAQVRDALTLLEIPSVYLSNRDSVFETLEAQEMLWLLQAVMTPEREN TLRSALATSMMGLNALDIETLNNDEHAWDVVVEEFDGYRQIWRKRGVMPMLRALMSARNIA ENLLATAGGERRLTDILHISELLQEAGTQLESEHALVRWLSQHILEPDSNASSQQMRLESD KHLVQIVTIHKSKGLEYPLVWLPFITNFRVQEQAFYHDRHSFEAVLDLNAAPESVDLAEAE RLAEDLRLLYVALTRSVWHCSLGVAPLVRRRGDKKGDTDVHQSALGRLLQKGEPQDAAGLR TCIEALCDDDIAWQTAQTGDNQPWQVNDVSTAELNAKTLQRLPGDNWRVTSYSGLQQRGHG IAQDLMPRLDVDAAGVASVVEEPTLTPHQFPRGASPGTFLHSLFEDLDFTQPVDPNWVREK LELGGFESQWEPVLTEWITAVLQAPLNETGVSLSQLSARNKQVEMEFYLPISEPLIASQLD TLIRQFDPLSAGCPPLEFMQVRGMLKGFIDLVFRHEGRYYLLDYKSNWLGEDSSAYTQQAM AAAMQAHRYDLQYQLYTLALHRYLRHRIADYDYEHHFGGVIYLFLRGVDKEHPQQGIYTTR PNAGLIALMDEMFAGMTLEEA Example 3 Using a file with a list of gene names. The following example will retrieve the strand direction for each gene listed in the "gene_list.txt" file. String prefixed with an "@" or "list::" will be interpreted as file names. % genret Retrieves various gene features from genome flatfile Input nucleotide sequence(s): refseqn:NC_000913 List of gene name(s) to report [*]: @gene_list.txt Name of gene feature to access: direction Full text output file [nc_000913.direction]: stdout gene,direction thrA,direct thrB,direct thrC,direct Go to the input files for this example Go to the output files for this example Example 4 Retrieving translations of coding sequences. The following example will retrieve the translated protein sequence of the "recA" gene. % genret Retrieves various gene related information from genome flatfile Input nucleotide sequence(s): refseqn:NC_000913 Gene name(s) to lookup [*]: recA Feature to access: translation Full text output file [nc_000913.translation]: stdout >recA MAIDENKQKALAAALGQIEKQFGKGSIMRLGEDRSMDVETISTGSLSLDIALGAGGLPMGR IVEIYGPESSGKTTLTLQVIAAAQREGKTCAFIDAEHALDPIYARKLGVDIDNLLCSQPDT GEQALEICDALARSGAVDVIVVDSVAALTPKAEIEGEIGDSHMGLAARMMSQAMRKLAGNL KQSNTLLIFINQIRMKIGVMFGNPETTTGGNALKFYASVRLDIRRIGAVKEGENVVGSETR VKVVKNKIAAPFKQAEFQILYGEGINFYGELVDLGVKEKLIEKAGAWYSYKGEKIGQGKAN ATAWLKDNPETAKEIEKKVRELLLSNPNSTPDFSVDDSEGVAETNEDF Example 5 Retrieving feature information of the genes. The following example will retrieve the start positions for each gene. The values for the keys in GenBank format is available for retrieval. (ex. start end direction GO* etc.) Positions will be returned with a 1 start value. % genret Retrieves various gene related information from genome flatfile Input nucleotide sequence(s): refseqn:NC_000913 Gene name(s) to lookup [*]: Feature to access: start Full text output file [nc_000913.start]: Go to the input files for this example Go to the output files for this example Example 6 Passing extra arguments to the methods. The following example shows the retrieval of 30 base pairs around the start codon of the "recA" gene. By default, the "around_startcodon" method returns 200 base pairs around the start codon. Using the "-argument" qualifier allows the user to change this value. % genret refseqn:NC_000913 recA around_startcodon -argument 30,30 stdout Retrieves various gene features from genome flatfile >recA ccggtattacccggcatgacaggagtaaaaatggctatcgacgaaaacaaacagaaagcgt tg Example 7 Re-annotating a flatfile. genret supports re-annotation of a genome flatfile via Restauro-G service developed by our team. Using the BLAST Like Alignment Tool, to refer the UniProt KB and annotates information including the description, comments, feature tables, cross references, COG family, position, and Pfam. The original software is available at [http://restauro-g.iab.keio.ac.jp]. % genret refseqn:NC_000913 '*' annotate nc_000913-annotate.gbk Retrieves various gene features from genome flatfile Command line arguments Standard (Mandatory) qualifiers: [-sequence] seqall Nucleotide sequence(s) filename and optional format, or reference (input USA) [-gene] string [*] Gene name(s) to lookup (Any string) [-access] string Feature to access (Any string) [-outfile] outfile [*.genret] Full text output file Additional (Optional) qualifiers: (none) Advanced (Unprompted) qualifiers: -argument string Option to give to method (Any string) -[no]accid boolean [Y] Include to use sequence accession ID as query General qualifiers: -help boolean Report command line options and exit. More information on associated and general qualifiers can be found with -help -verbose Input file format Database definitions for the examples are included in the embossrc_template file of the Keio Bioinformatcs Web Service (KBWS) package. Input files for usage example 4 File: gene_list.txt thrA thrB thrC Output file format Output files for usage example 1 File: nc_000913.around_startcodon >thrL cgtgagtaaattaaaattttattgacttaggtcactaaatactttaaccaatataggcata gcgcacagacagataaaaattacagagtacacaacatccatgaaacgcattagcaccacca ttaccaccaccatcaccattaccacaggtaacggtgcgggctgacgcgtacaggaaacaca gaaaaaagcccgcacctgac >thrA aggtaacggtgcgggctgacgcgtacaggaaacacagaaaaaagcccgcacctgacagtgc gggctttttttttcgaccaaaggtaacgaggtaacaaccatgcgagtgttgaagttcggcg gtacatcagtggcaaatgcagaacgttttctgcgtgttgccgatattctggaaagcaatgc caggcaggggcaggtggcca [Part of this file has been deleted for brevity] >yjjY tgcatgtttgctacctaaattgccaactaaatcgaaacaggaagtacaaaagtccctgacc tgcctgatgcatgctgcaaattaacatgatcggcgtaacatgactaaagtacgtaattgcg ttcttgatgcactttccatcaacgtcaacaacatcattagcttggtcgtgggtactttccc tcaggacccgacagtgtcaa >yjtD tttttctgcgacttacgttaagaatttgtaaattcgcaccgcgtaataagttgacagtgat cacccggttcgcggttatttgatcaagaagagtggcaatatgcgtataacgattattctgg tcgcacccgccagagcagaaaatattggggcagcggcgcgggcaatgaaaacgatggggtt tagcgatctgcggattgtcg Output files for usage example 5 File: nc_000913.start gene,start thrL,190 thrA,337 thrB,2801 thrC,3734 yaaX,5234 yaaA,5683 yaaJ,6529 talB,8238 mog,9306 [Part of this file has been deleted for brevity] yjjX,4631256 ytjC,4631820 rob,4632464 creA,4633544 creB,4634030 creC,4634719 creD,4636201 arcA,4637613 yjjY,4638425 yjtD,4638965 Output files for usage example 7 File: ecoli-annotate.gbk LOCUS NC_000913 4639675 bp DNA circular BCT 25-OCT-2010 DEFINITION Escherichia coli str. K-12 substr. MG1655 chromosome, complete genome. ACCESSION NC_000913 VERSION NC_000913.2 GI:49175990 DBLINK Project: 57779 KEYWORDS . SOURCE Escherichia coli str. K-12 substr. MG1655 ORGANISM Escherichia coli str. K-12 substr. MG1655 Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacteriales; [Part of this file has been deleted for brevity] CDS 2801..3733 /EC_number="2.7.1.39" /codon_start="1" /db_xref="GI:16127997" /db_xref="ASAP:ABE-0000010" /db_xref="UniProtKB/Swiss-Prot:P00547" /db_xref="ECOCYC:EG10999" /db_xref="EcoGene:EG10999" /db_xref="GeneID:947498" /function="enzyme; Amino acid biosynthesis: Threonine" /function="1.5.1.8 metabolism; building block biosynthesis; amino acids; threonine" /function="7.1 location of gene products; cytoplasm" /gene="thrB" /gene_synonym="ECK0003; JW0002" /locus_tag="b0003" /note="GO_component: GO:0005737 - cytoplasm; GO_process: GO:0009088 - threonine biosynthetic process" /product="homoserine kinase" /protein_id="NP_414544.1" /rs_com="FUNCTION: Catalyzes the ATP-dependent phosphorylation of L- homoserine to L-homoserine phosphate (By similarity)." /rs_com="CATALYTIC ACTIVITY: ATP + L-homoserine = ADP + O-phospho-L- homoserine." /rs_com="PATHWAY: Amino-acid biosynthesis; L-threonine biosynthesis; L- threonine from L-aspartate: step 4/5." /rs_com="SUBCELLULAR LOCATION: Cytoplasm (Potential)." /rs_com="SIMILARITY: Belongs to the GHMP kinase family. Homoserine kinase subfamily." /rs_des="RecName: Full=Homoserine kinase; Short=HK; Short=HSK; EC=2.7.1.39;" /rs_protein="Level 1: similar to KHSE_ECODH 1.7e-180" /rs_xr="EMBL; CP000948; ACB01208.1; -; Genomic_DNA." /rs_xr="RefSeq; YP_001728986.1; -." /rs_xr="ProteinModelPortal; B1XBC8; -." /rs_xr="SMR; B1XBC8; 2-308." /rs_xr="EnsemblBacteria; EBESCT00000012034; EBESCP00000011562; EBESCG00000011096." /rs_xr="GeneID; 6058639; -." /rs_xr="GenomeReviews; CP000948_GR; ECDH10B_0003." /rs_xr="KEGG; ecd:ECDH10B_0003; -." /rs_xr="HOGENOM; HBG646290; -." /rs_xr="OMA; GSAHADN; -." /rs_xr="ProtClustDB; PRK01212; -." /rs_xr="BioCyc; ECOL316385:ECDH10B_0003-MONOMER; -." /rs_xr="GO; GO:0005737; C:cytoplasm; IEA:UniProtKB-SubCell." /rs_xr="GO; GO:0005524; F:ATP binding; IEA:UniProtKB-KW." /rs_xr="GO; GO:0004413; F:homoserine kinase activity; IEA:EC." /rs_xr="GO; GO:0009088; P:threonine biosynthetic process; IEA:UniProtKB-KW." /rs_xr="HAMAP; MF_00384; Homoser_kinase; 1; -." /rs_xr="InterPro; IPR006204; GHMP_kinase." /rs_xr="InterPro; IPR013750; GHMP_kinase_C." /rs_xr="InterPro; IPR006203; GHMP_knse_ATP-bd_CS." /rs_xr="InterPro; IPR000870; Homoserine_kin." /rs_xr="InterPro; IPR020568; Ribosomal_S5_D2-typ_fold." /rs_xr="InterPro; IPR014721; Ribosomal_S5_D2-typ_fold_subgr." /rs_xr="Gene3D; G3DSA:3.30.230.10; Ribosomal_S5_D2-type_fold; 1." /rs_xr="Pfam; PF08544; GHMP_kinases_C; 1." /rs_xr="Pfam; PF00288; GHMP_kinases_N; 1." /rs_xr="PIRSF; PIRSF000676; Homoser_kin; 1." /rs_xr="PRINTS; PR00958; HOMSERKINASE." /rs_xr="SUPFAM; SSF54211; Ribosomal_S5_D2-typ_fold; 1." /rs_xr="TIGRFAMs; TIGR00191; thrB; 1." /rs_xr="PROSITE; PS00627; GHMP_KINASES_ATP; 1." /transl_table="11" /translation="MVKVYAPASSANMSVGFDVLGAAVTPVDGALLGDVVTVEAAETF SLNNLGRFADKLPSEPRENIVYQCWERFCQELGKQIPVAMTLEKNMPIGSGLGSSACS VVAALMAMNEHCGKPLNDTRLLALMGELEGRISGSIHYDNVAPCFLGGMQLMIEENDI ISQQVPGFDEWLWVLAYPGIKVSTAEARAILPAQYRRQDCIAHGRHLAGFIHACYSRQ PELAAKLMKDVIAEPYRERLLPGFRQARQAVAEIGAVASGISGSGPTLFALCDKPETA QRVADWLGKNYLQNQEGFVHICRLDTAGARVLEN" [Part of this file has been deleted for brevity] 4639201 gcgcagtcgg gcgaaatatc attactacgc cacgccagtt gaactggtgc cgctgttaga 4639261 ggaaaaatct tcatggatga gccatgccgc gctggtgttt ggtcgcgaag attccgggtt 4639321 gactaacgaa gagttagcgt tggctgacgt tcttactggt gtgccgatgg tggcggatta 4639381 tccttcgctc aatctggggc aggcggtgat ggtctattgc tatcaattag caacattaat 4639441 acaacaaccg gcgaaaagtg atgcaacggc agaccaacat caactgcaag ctttacgcga 4639501 acgagccatg acattgctga cgactctggc agtggcagat gacataaaac tggtcgactg 4639561 gttacaacaa cgcctggggc ttttagagca acgagacacg gcaatgttgc accgtttgct 4639621 gcatgatatt gaaaaaaata tcaccaaata aaaaacgcct tagtaagtat ttttc // Data files None. Notes None. References Arakawa, K., Mori, K., Ikeda, K., Matsuzaki, T., Konayashi, Y., and Tomita, M. (2003) G-language Genome Analysis Environment: A Workbench for Nucleotide Sequence Data Mining, Bioinformatics, 19, 305-306. Arakawa, K. and Tomita, M. (2006) G-language System as a Platform for large-scale analysis of high-throughput omics data, J. Pest Sci., 31, 7. Arakawa, K., Kido, N., Oshita, K., Tomita, M. (2010) G-language Genome Analysis Environment with REST and SOAP Web Service Interfaces, Nucleic Acids Res., 38, W700-W705. Warnings None. Diagnostic Error Messages None. Exit status It always exits with a status of 0. Known bugs None. See also entret Retrieve sequence entries from flatfile databases and files seqret Read and write (return) sequences Author(s) Hidetoshi Itaya (celery@g-language.org) Institute for Advanced Biosciences, Keio University 252-0882 Japan Kazuharu Arakawa (gaou@sfc.keio.ac.jp) Institute for Advanced Biosciences, Keio University 252-0882 Japan History 2012 - Written by Hidetoshi Itaya Target users This program is intended to be used by everyone and everything, from naive users to embedded scripts. Comments None.