comparison GEMBASSY-1.0.3/doc/text/genret.txt @ 2:8947fca5f715 draft default tip
Uploaded
author | ktnyt |
---|---|
date | Fri, 26 Jun 2015 05:21:44 -0400 |
parents | 84a17b3fad1f |
children |
1:84a17b3fad1f | 2:8947fca5f715 |
---|---|
1 genret | |
2 Function | |
3 | |
4 Retrieves various gene related information from genome flatfile | |
5 | |
6 Description | |
7 | |
8 genret reads in one or more genome flatfiles and retrieves various data from | |
9 the input file. It is a wrapper program for the G-language REST service, | |
10 where a method is specified by giving a string to the "access" qualifier. By | |
11 default, genret will parse the input file to retrieve the accession ID | |
12 (or name) of the genome to query the G-language REST service. By setting the | |
13 "accid" qualifier to false (or 0), genret will instead parse the sequence | |
14 and features of the genome to create a GenBank formatted flatfile and upload | |
15 the file to the G-language web server. Using the uploaded file, genret will | |
16 execute the method provided. | |
17 | |
18 genret is able to perform a variety of tasks, including the retrieval of | |
19 sequence upstream, downstream, or around the start or stop codon, | |
20 translated gene sequences, search of gene data by keyword, and re-annotation | |
21 and retrieval of genome flatfiles. The set of genes can be given as flat | |
22 text, a regular expression, or a file containing the list of genes. | |
23 | |
24 Details on the G-language REST service are available from the wiki page | |
25 | |
26 http://www.g-language.org/wiki/rest | |
27 | |
28 Documentation on G-language Genome Analysis Environment methods is | |
29 provided at the Document Center | |
30 | |
31 http://ws.g-language.org/gdoc/ | |
32 | |
33 Usage | |
34 | |
35 Here is a sample session with genret | |
36 | |
37 Retrieving sequences upstream, downstream, or around the start/stop codons. | |
38 The following example shows the retrieval of sequence around the start | |
39 codons of all genes. | |
40 | |
41 Genes to access are specified by regular expression. '*' stands for every | |
42 gene. | |
43 | |
44 Available methods are: | |
45 after_startcodon | |
46 after_stopcodon | |
47 around_startcodon | |
48 around_stopcodon | |
49 before_startcodon | |
50 before_stopcodon | |
51 | |
52 % genret | |
53 Retrieves various gene related information from genome flatfile | |
54 Input nucleotide sequence(s): refseqn:NC_000913 | |
55 Gene name(s) to lookup [*]: | |
56 Feature to access: around_startcodon | |
57 Full text output file [nc_000913.around_startcodon]: | |
58 | |
59 Go to the input files for this example | |
60 Go to the output files for this example | |
61 | |
62 Example 2 | |
63 | |
64 Using flat text as target genes. The names can be separated with a space, | |
65 comma, or vertical bar. | |
66 | |
67 % genret | |
68 Retrieves various gene related information from genome flatfile | |
69 Input nucleotide sequence(s): refseqn:NC_000913 | |
70 List of gene name(s) to report [*]: recA,recB | |
71 Name of gene feature to access: translation | |
72 Sequence output file [nc_000913.translation.genret]: stdout | |
73 >recA | |
74 MAIDENKQKALAAALGQIEKQFGKGSIMRLGEDRSMDVETISTGSLSLDIALGAGGLPMGR | |
75 IVEIYGPESSGKTTLTLQVIAAAQREGKTCAFIDAEHALDPIYARKLGVDIDNLLCSQPDT | |
76 GEQALEICDALARSGAVDVIVVDSVAALTPKAEIEGEIGDSHMGLAARMMSQAMRKLAGNL | |
77 KQSNTLLIFINQIRMKIGVMFGNPETTTGGNALKFYASVRLDIRRIGAVKEGENVVGSETR | |
78 VKVVKNKIAAPFKQAEFQILYGEGINFYGELVDLGVKEKLIEKAGAWYSYKGEKIGQGKAN | |
79 ATAWLKDNPETAKEIEKKVRELLLSNPNSTPDFSVDDSEGVAETNEDF | |
80 >recB | |
81 MSDVAETLDPLRLPLQGERLIEASAGTGKTFTIAALYLRLLLGLGGSAAFPRPLTVEELLV | |
82 VTFTEAATAELRGRIRSNIHELRIACLRETTDNPLYERLLEEIDDKAQAAQWLLLAERQMD | |
83 EAAVFTIHGFCQRMLNLNAFESGMLFEQQLIEDESLLRYQACADFWRRHCYPLPREIAQVV | |
84 FETWKGPQALLRDINRYLQGEAPVIKAPPPDDETLASRHAQIVARIDTVKQQWRDAVGELD | |
85 ALIESSGIDRRKFNRSNQAKWIDKISAWAEEETNSYQLPESLEKFSQRFLEDRTKAGGETP | |
86 RHPLFEAIDQLLAEPLSIRDLVITRALAEIRETVAREKRRRGELGFDDMLSRLDSALRSES | |
87 GEVLAAAIRTRFPVAMIDEFQDTDPQQYRIFRRIWHHQPETALLLIGDPKQAIYAFRGADI | |
88 FTYMKARSEVHAHYTLDTNWRSAPGMVNSVNKLFSQTDDAFMFREIPFIPVKSAGKNQALR | |
89 FVFKGETQPAMKMWLMEGESCGVGDYQSTMAQVCAAQIRDWLQAGQRGEALLMNGDDARPV | |
90 RASDISVLVRSRQEAAQVRDALTLLEIPSVYLSNRDSVFETLEAQEMLWLLQAVMTPEREN | |
91 TLRSALATSMMGLNALDIETLNNDEHAWDVVVEEFDGYRQIWRKRGVMPMLRALMSARNIA | |
92 ENLLATAGGERRLTDILHISELLQEAGTQLESEHALVRWLSQHILEPDSNASSQQMRLESD | |
93 KHLVQIVTIHKSKGLEYPLVWLPFITNFRVQEQAFYHDRHSFEAVLDLNAAPESVDLAEAE | |
94 RLAEDLRLLYVALTRSVWHCSLGVAPLVRRRGDKKGDTDVHQSALGRLLQKGEPQDAAGLR | |
95 TCIEALCDDDIAWQTAQTGDNQPWQVNDVSTAELNAKTLQRLPGDNWRVTSYSGLQQRGHG | |
96 IAQDLMPRLDVDAAGVASVVEEPTLTPHQFPRGASPGTFLHSLFEDLDFTQPVDPNWVREK | |
97 LELGGFESQWEPVLTEWITAVLQAPLNETGVSLSQLSARNKQVEMEFYLPISEPLIASQLD | |
98 TLIRQFDPLSAGCPPLEFMQVRGMLKGFIDLVFRHEGRYYLLDYKSNWLGEDSSAYTQQAM | |
99 AAAMQAHRYDLQYQLYTLALHRYLRHRIADYDYEHHFGGVIYLFLRGVDKEHPQQGIYTTR | |
100 PNAGLIALMDEMFAGMTLEEA | |
101 | |
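The separator rule described for Example 2 can be sketched in Python as
follows. This is only an illustration of the rule as stated (space, comma,
or vertical bar); the function name split_gene_names is hypothetical, and
the handling of repeated separators and empty fields is an assumption, not
documented genret behaviour.

```python
import re


def split_gene_names(text):
    """Split a flat-text gene list on spaces, commas, or vertical bars."""
    # Assumption: a run of separators counts as one, and empty fields are dropped.
    return [name for name in re.split(r"[ ,|]+", text.strip()) if name]


print(split_gene_names("recA,recB"))
print(split_gene_names("thrA thrB|thrC"))
```
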
102 Example 3 | |
103 | |
104 Using a file with a list of gene names. | |
105 The following example will retrieve the strand direction for each gene | |
106 listed in the "gene_list.txt" file. Strings prefixed with "@" or "list::" | |
107 are interpreted as file names. | |
108 | |
109 % genret | |
110 Retrieves various gene features from genome flatfile | |
111 Input nucleotide sequence(s): refseqn:NC_000913 | |
112 List of gene name(s) to report [*]: @gene_list.txt | |
113 Name of gene feature to access: direction | |
114 Full text output file [nc_000913.direction]: stdout | |
115 gene,direction | |
116 thrA,direct | |
117 thrB,direct | |
118 thrC,direct | |
119 | |
120 Go to the input files for this example | |
121 Go to the output files for this example | |
122 | |
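As a rough illustration of the "@" and "list::" convention used in
Example 3, the Python sketch below resolves a gene specification into a
list of names. The function resolve_gene_spec is hypothetical, and the
one-name-per-line file layout is taken from the gene_list.txt example; how
genret itself reads list files is not shown in this document.

```python
import os
import tempfile


def resolve_gene_spec(spec):
    """Return gene names for a spec; '@file' or 'list::file' reads a list file."""
    for prefix in ("@", "list::"):
        if spec.startswith(prefix):
            path = spec[len(prefix):]
            # Assumption: one gene name per line, blank lines ignored.
            with open(path) as handle:
                return [line.strip() for line in handle if line.strip()]
    # Anything else is treated as a literal gene name or pattern.
    return [spec]


# Demonstration with a temporary copy of the gene_list.txt example.
with tempfile.TemporaryDirectory() as tmp:
    list_path = os.path.join(tmp, "gene_list.txt")
    with open(list_path, "w") as handle:
        handle.write("thrA\nthrB\nthrC\n")
    print(resolve_gene_spec("@" + list_path))
    print(resolve_gene_spec("list::" + list_path))
    print(resolve_gene_spec("recA"))
```
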
123 Example 4 | |
124 | |
125 Retrieving translations of coding sequences. | |
126 The following example will retrieve the translated protein sequence of | |
127 the "recA" gene. | |
128 | |
129 % genret | |
130 Retrieves various gene related information from genome flatfile | |
131 Input nucleotide sequence(s): refseqn:NC_000913 | |
132 Gene name(s) to lookup [*]: recA | |
133 Feature to access: translation | |
134 Full text output file [nc_000913.translation]: stdout | |
135 >recA | |
136 MAIDENKQKALAAALGQIEKQFGKGSIMRLGEDRSMDVETISTGSLSLDIALGAGGLPMGR | |
137 IVEIYGPESSGKTTLTLQVIAAAQREGKTCAFIDAEHALDPIYARKLGVDIDNLLCSQPDT | |
138 GEQALEICDALARSGAVDVIVVDSVAALTPKAEIEGEIGDSHMGLAARMMSQAMRKLAGNL | |
139 KQSNTLLIFINQIRMKIGVMFGNPETTTGGNALKFYASVRLDIRRIGAVKEGENVVGSETR | |
140 VKVVKNKIAAPFKQAEFQILYGEGINFYGELVDLGVKEKLIEKAGAWYSYKGEKIGQGKAN | |
141 ATAWLKDNPETAKEIEKKVRELLLSNPNSTPDFSVDDSEGVAETNEDF | |
142 | |
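The translation output above is FASTA-like text (a ">name" header followed
by wrapped sequence lines). A minimal Python sketch for reading such output
into a dictionary is given below; parse_fasta is a hypothetical helper, and
the sample holds only the first two lines of the recA record shown above.

```python
def parse_fasta(text):
    """Map each '>name' header to its sequence with line breaks removed."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


# Truncated sample: only the first two sequence lines of the recA record.
sample = """>recA
MAIDENKQKALAAALGQIEKQFGKGSIMRLGEDRSMDVETISTGSLSLDIALGAGGLPMGR
IVEIYGPESSGKTTLTLQVIAAAQREGKTCAFIDAEHALDPIYARKLGVDIDNLLCSQPDT
"""
proteins = parse_fasta(sample)
print(sorted(proteins))
print(proteins["recA"][:10])
```
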
143 Example 5 | |
144 | |
145 Retrieving feature information of the genes. | |
146 The following example will retrieve the start positions for each gene. | |
147 The values for the keys in GenBank format are available for retrieval | |
148 (e.g. start, end, direction, GO*). | |
149 Positions are returned 1-based (the first base is position 1). | |
150 | |
151 % genret | |
152 Retrieves various gene related information from genome flatfile | |
153 Input nucleotide sequence(s): refseqn:NC_000913 | |
154 Gene name(s) to lookup [*]: | |
155 Feature to access: start | |
156 Full text output file [nc_000913.start]: | |
157 | |
158 Go to the input files for this example | |
159 Go to the output files for this example | |
160 | |
161 Example 6 | |
162 | |
163 Passing extra arguments to the methods. | |
164 The following example shows the retrieval of 30 base pairs around the | |
165 start codon of the "recA" gene. By default, the "around_startcodon" method | |
166 returns 200 base pairs around the start codon. Using the "-argument" | |
167 qualifier allows the user to change this value. | |
168 | |
169 % genret refseqn:NC_000913 recA around_startcodon -argument 30,30 stdout | |
170 Retrieves various gene features from genome flatfile | |
171 >recA | |
172 ccggtattacccggcatgacaggagtaaaaatggctatcgacgaaaacaaacagaaagcgt | |
173 tg | |
174 | |
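The output of Example 6 is 63 bases long, which is consistent with 30
bases upstream, the 3-base start codon, and 30 bases downstream. The
Python sketch below illustrates that windowing. It is not the G-language
implementation: the function is a hypothetical stand-in, it handles the
direct strand only, it ignores genome circularity, and the 100/100
defaults are an assumption inferred from the "200 base pairs" stated
above.

```python
def around_startcodon(genome, start, upstream=100, downstream=100):
    """Return upstream bases + start codon + downstream bases.

    `start` is the 1-based position of the first base of the start codon.
    Direct strand only; the window is clipped at the sequence ends.
    """
    codon_begin = start - 1  # convert 1-based position to 0-based index
    left = max(0, codon_begin - upstream)
    right = min(len(genome), codon_begin + 3 + downstream)
    return genome[left:right]


# The Example 6 output, used here as a tiny stand-in "genome".
example_output = (
    "ccggtattacccggcatgacaggagtaaaaatggctatcgacgaaaacaaacagaaagcgt"
    "tg"
)
window = around_startcodon(example_output, 31, 30, 30)
print(len(window))
print(window[30:33])
```
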
175 Example 7 | |
176 | |
177 Re-annotating a flatfile. | |
178 genret supports re-annotation of a genome flatfile via the Restauro-G | |
179 service developed by our team. Restauro-G uses the BLAST-Like Alignment Tool | |
180 (BLAT) to search UniProtKB and annotates information including the description, | |
181 comments, feature tables, cross references, COG family, position, and Pfam. | |
182 The original software is available at [http://restauro-g.iab.keio.ac.jp]. | |
183 | |
184 | |
185 % genret refseqn:NC_000913 '*' annotate nc_000913-annotate.gbk | |
186 Retrieves various gene features from genome flatfile | |
187 | |
188 Command line arguments | |
189 | |
190 Standard (Mandatory) qualifiers: | |
191 [-sequence] seqall Nucleotide sequence(s) filename and optional | |
192 format, or reference (input USA) | |
193 [-gene] string [*] Gene name(s) to lookup (Any string) | |
194 [-access] string Feature to access (Any string) | |
195 [-outfile] outfile [*.genret] Full text output file | |
196 | |
197 Additional (Optional) qualifiers: (none) | |
198 Advanced (Unprompted) qualifiers: | |
199 -argument string Option to give to method (Any string) | |
200 -[no]accid boolean [Y] Include to use sequence accession ID as | |
201 query | |
202 | |
203 General qualifiers: | |
204 -help boolean Report command line options and exit. More | |
205 information on associated and general | |
206 qualifiers can be found with -help -verbose | |
207 | |
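For scripted use, the qualifiers listed above can be assembled into an
argument list. The Python sketch below only builds the command; the helper
name build_genret_command and its defaults are hypothetical, while the
qualifier names (-sequence, -gene, -access, -outfile, -argument, -noaccid)
are taken from the table above.

```python
import shlex


def build_genret_command(sequence, access, gene="*", outfile="stdout",
                         argument=None, accid=True):
    """Assemble a genret argument list from the documented qualifiers."""
    command = [
        "genret",
        "-sequence", sequence,
        "-gene", gene,
        "-access", access,
        "-outfile", outfile,
    ]
    if argument is not None:
        command += ["-argument", argument]
    if not accid:
        # -noaccid is the negated form of the -[no]accid boolean qualifier.
        command.append("-noaccid")
    return command


command = build_genret_command(
    "refseqn:NC_000913", "around_startcodon", gene="recA", argument="30,30"
)
# shlex.join quotes arguments such as '*' for display in a shell.
print(shlex.join(command))
```

Passing the list to subprocess.run(command, check=True) would execute it,
assuming genret is installed and on the PATH. Using a list rather than a
single shell string avoids accidental shell expansion of the '*' gene
pattern.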
208 Input file format | |
209 | |
210 Database definitions for the examples are included in the embossrc_template | |
211 file of the Keio Bioinformatics Web Service (KBWS) package. | |
212 | |
213 Input files for usage example 3 | |
214 | |
215 File: gene_list.txt | |
216 | |
217 thrA | |
218 thrB | |
219 thrC | |
220 | |
221 Output file format | |
222 | |
223 Output files for usage example 1 | |
224 | |
225 File: nc_000913.around_startcodon | |
226 | |
227 >thrL | |
228 cgtgagtaaattaaaattttattgacttaggtcactaaatactttaaccaatataggcata | |
229 gcgcacagacagataaaaattacagagtacacaacatccatgaaacgcattagcaccacca | |
230 ttaccaccaccatcaccattaccacaggtaacggtgcgggctgacgcgtacaggaaacaca | |
231 gaaaaaagcccgcacctgac | |
232 >thrA | |
233 aggtaacggtgcgggctgacgcgtacaggaaacacagaaaaaagcccgcacctgacagtgc | |
234 gggctttttttttcgaccaaaggtaacgaggtaacaaccatgcgagtgttgaagttcggcg | |
235 gtacatcagtggcaaatgcagaacgttttctgcgtgttgccgatattctggaaagcaatgc | |
236 caggcaggggcaggtggcca | |
237 | |
238 [Part of this file has been deleted for brevity] | |
239 | |
240 >yjjY | |
241 tgcatgtttgctacctaaattgccaactaaatcgaaacaggaagtacaaaagtccctgacc | |
242 tgcctgatgcatgctgcaaattaacatgatcggcgtaacatgactaaagtacgtaattgcg | |
243 ttcttgatgcactttccatcaacgtcaacaacatcattagcttggtcgtgggtactttccc | |
244 tcaggacccgacagtgtcaa | |
245 >yjtD | |
246 tttttctgcgacttacgttaagaatttgtaaattcgcaccgcgtaataagttgacagtgat | |
247 cacccggttcgcggttatttgatcaagaagagtggcaatatgcgtataacgattattctgg | |
248 tcgcacccgccagagcagaaaatattggggcagcggcgcgggcaatgaaaacgatggggtt | |
249 tagcgatctgcggattgtcg | |
250 | |
251 Output files for usage example 5 | |
252 | |
253 File: nc_000913.start | |
254 | |
255 gene,start | |
256 thrL,190 | |
257 thrA,337 | |
258 thrB,2801 | |
259 thrC,3734 | |
260 yaaX,5234 | |
261 yaaA,5683 | |
262 yaaJ,6529 | |
263 talB,8238 | |
264 mog,9306 | |
265 | |
266 [Part of this file has been deleted for brevity] | |
267 | |
268 yjjX,4631256 | |
269 ytjC,4631820 | |
270 rob,4632464 | |
271 creA,4633544 | |
272 creB,4634030 | |
273 creC,4634719 | |
274 creD,4636201 | |
275 arcA,4637613 | |
276 yjjY,4638425 | |
277 yjtD,4638965 | |
278 | |
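The "gene,start" output is plain comma-separated text with a header row,
so it can be read with a standard CSV parser. The Python sketch below uses
the first rows of the file shown above; read_start_positions is a
hypothetical helper. Because the positions are 1-based, subtract 1 before
using them as indices into a Python string.

```python
import csv
import io


def read_start_positions(text):
    """Map gene name to its 1-based start position from 'gene,start' text."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["gene"]: int(row["start"]) for row in reader}


sample = "gene,start\nthrL,190\nthrA,337\nthrB,2801\nthrC,3734\n"
starts = read_start_positions(sample)

# Convert to 0-based indices for slicing a sequence held as a Python string.
zero_based = {gene: position - 1 for gene, position in starts.items()}
print(starts["thrB"], zero_based["thrB"])
```
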
279 Output files for usage example 7 | |
280 | |
281 File: nc_000913-annotate.gbk | |
282 | |
283 LOCUS NC_000913 4639675 bp DNA circular BCT 25-OCT-2010 | |
284 DEFINITION Escherichia coli str. K-12 substr. MG1655 chromosome, complete | |
285 genome. | |
286 ACCESSION NC_000913 | |
287 VERSION NC_000913.2 GI:49175990 | |
288 DBLINK Project: 57779 | |
289 KEYWORDS . | |
290 SOURCE Escherichia coli str. K-12 substr. MG1655 | |
291 ORGANISM Escherichia coli str. K-12 substr. MG1655 | |
292 Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacteriales; | |
293 | |
294 [Part of this file has been deleted for brevity] | |
295 | |
296 CDS 2801..3733 | |
297 /EC_number="2.7.1.39" | |
298 /codon_start="1" | |
299 /db_xref="GI:16127997" | |
300 /db_xref="ASAP:ABE-0000010" | |
301 /db_xref="UniProtKB/Swiss-Prot:P00547" | |
302 /db_xref="ECOCYC:EG10999" | |
303 /db_xref="EcoGene:EG10999" | |
304 /db_xref="GeneID:947498" | |
305 /function="enzyme; Amino acid biosynthesis: Threonine" | |
306 /function="1.5.1.8 metabolism; building block | |
307 biosynthesis; amino acids; threonine" | |
308 /function="7.1 location of gene products; cytoplasm" | |
309 /gene="thrB" | |
310 /gene_synonym="ECK0003; JW0002" | |
311 /locus_tag="b0003" | |
312 /note="GO_component: GO:0005737 - cytoplasm; GO_process: | |
313 GO:0009088 - threonine biosynthetic process" | |
314 /product="homoserine kinase" | |
315 /protein_id="NP_414544.1" | |
316 /rs_com="FUNCTION: Catalyzes the ATP-dependent | |
317 phosphorylation of L- homoserine to L-homoserine | |
318 phosphate (By similarity)." | |
319 /rs_com="CATALYTIC ACTIVITY: ATP + L-homoserine = ADP + | |
320 O-phospho-L- homoserine." | |
321 /rs_com="PATHWAY: Amino-acid biosynthesis; L-threonine | |
322 biosynthesis; L- threonine from L-aspartate: step 4/5." | |
323 /rs_com="SUBCELLULAR LOCATION: Cytoplasm (Potential)." | |
324 /rs_com="SIMILARITY: Belongs to the GHMP kinase family. | |
325 Homoserine kinase subfamily." | |
326 /rs_des="RecName: Full=Homoserine kinase; Short=HK; | |
327 Short=HSK; EC=2.7.1.39;" | |
328 /rs_protein="Level 1: similar to KHSE_ECODH 1.7e-180" | |
329 /rs_xr="EMBL; CP000948; ACB01208.1; -; Genomic_DNA." | |
330 /rs_xr="RefSeq; YP_001728986.1; -." | |
331 /rs_xr="ProteinModelPortal; B1XBC8; -." | |
332 /rs_xr="SMR; B1XBC8; 2-308." | |
333 /rs_xr="EnsemblBacteria; EBESCT00000012034; | |
334 EBESCP00000011562; EBESCG00000011096." | |
335 /rs_xr="GeneID; 6058639; -." | |
336 /rs_xr="GenomeReviews; CP000948_GR; ECDH10B_0003." | |
337 /rs_xr="KEGG; ecd:ECDH10B_0003; -." | |
338 /rs_xr="HOGENOM; HBG646290; -." | |
339 /rs_xr="OMA; GSAHADN; -." | |
340 /rs_xr="ProtClustDB; PRK01212; -." | |
341 /rs_xr="BioCyc; ECOL316385:ECDH10B_0003-MONOMER; -." | |
342 /rs_xr="GO; GO:0005737; C:cytoplasm; | |
343 IEA:UniProtKB-SubCell." | |
344 /rs_xr="GO; GO:0005524; F:ATP binding; IEA:UniProtKB-KW." | |
345 /rs_xr="GO; GO:0004413; F:homoserine kinase activity; | |
346 IEA:EC." | |
347 /rs_xr="GO; GO:0009088; P:threonine biosynthetic process; | |
348 IEA:UniProtKB-KW." | |
349 /rs_xr="HAMAP; MF_00384; Homoser_kinase; 1; -." | |
350 /rs_xr="InterPro; IPR006204; GHMP_kinase." | |
351 /rs_xr="InterPro; IPR013750; GHMP_kinase_C." | |
352 /rs_xr="InterPro; IPR006203; GHMP_knse_ATP-bd_CS." | |
353 /rs_xr="InterPro; IPR000870; Homoserine_kin." | |
354 /rs_xr="InterPro; IPR020568; Ribosomal_S5_D2-typ_fold." | |
355 /rs_xr="InterPro; IPR014721; | |
356 Ribosomal_S5_D2-typ_fold_subgr." | |
357 /rs_xr="Gene3D; G3DSA:3.30.230.10; | |
358 Ribosomal_S5_D2-type_fold; 1." | |
359 /rs_xr="Pfam; PF08544; GHMP_kinases_C; 1." | |
360 /rs_xr="Pfam; PF00288; GHMP_kinases_N; 1." | |
361 /rs_xr="PIRSF; PIRSF000676; Homoser_kin; 1." | |
362 /rs_xr="PRINTS; PR00958; HOMSERKINASE." | |
363 /rs_xr="SUPFAM; SSF54211; Ribosomal_S5_D2-typ_fold; 1." | |
364 /rs_xr="TIGRFAMs; TIGR00191; thrB; 1." | |
365 /rs_xr="PROSITE; PS00627; GHMP_KINASES_ATP; 1." | |
366 /transl_table="11" | |
367 /translation="MVKVYAPASSANMSVGFDVLGAAVTPVDGALLGDVVTVEAAETF | |
368 SLNNLGRFADKLPSEPRENIVYQCWERFCQELGKQIPVAMTLEKNMPIGSGLGSSACS | |
369 VVAALMAMNEHCGKPLNDTRLLALMGELEGRISGSIHYDNVAPCFLGGMQLMIEENDI | |
370 ISQQVPGFDEWLWVLAYPGIKVSTAEARAILPAQYRRQDCIAHGRHLAGFIHACYSRQ | |
371 PELAAKLMKDVIAEPYRERLLPGFRQARQAVAEIGAVASGISGSGPTLFALCDKPETA | |
372 QRVADWLGKNYLQNQEGFVHICRLDTAGARVLEN" | |
373 | |
374 [Part of this file has been deleted for brevity] | |
375 | |
376 4639201 gcgcagtcgg gcgaaatatc attactacgc cacgccagtt gaactggtgc cgctgttaga | |
377 4639261 ggaaaaatct tcatggatga gccatgccgc gctggtgttt ggtcgcgaag attccgggtt | |
378 4639321 gactaacgaa gagttagcgt tggctgacgt tcttactggt gtgccgatgg tggcggatta | |
379 4639381 tccttcgctc aatctggggc aggcggtgat ggtctattgc tatcaattag caacattaat | |
380 4639441 acaacaaccg gcgaaaagtg atgcaacggc agaccaacat caactgcaag ctttacgcga | |
381 4639501 acgagccatg acattgctga cgactctggc agtggcagat gacataaaac tggtcgactg | |
382 4639561 gttacaacaa cgcctggggc ttttagagca acgagacacg gcaatgttgc accgtttgct | |
383 4639621 gcatgatatt gaaaaaaata tcaccaaata aaaaacgcct tagtaagtat ttttc | |
384 // | |
385 | |
386 Data files | |
387 | |
388 None. | |
389 | |
390 Notes | |
391 | |
392 None. | |
393 | |
394 References | |
395 | |
396 Arakawa, K., Mori, K., Ikeda, K., Matsuzaki, T., Kobayashi, Y., and | |
397 Tomita, M. (2003) G-language Genome Analysis Environment: A Workbench | |
398 for Nucleotide Sequence Data Mining, Bioinformatics, 19, 305-306. | |
399 | |
400 Arakawa, K. and Tomita, M. (2006) G-language System as a Platform for | |
401 large-scale analysis of high-throughput omics data, J. Pest Sci., | |
402 31, 7. | |
403 | |
404 Arakawa, K., Kido, N., Oshita, K., and Tomita, M. (2010) G-language Genome | |
405 Analysis Environment with REST and SOAP Web Service Interfaces, | |
406 Nucleic Acids Res., 38, W700-W705. | |
407 | |
408 Warnings | |
409 | |
410 None. | |
411 | |
412 Diagnostic Error Messages | |
413 | |
414 None. | |
415 | |
416 Exit status | |
417 | |
418 It always exits with a status of 0. | |
419 | |
420 Known bugs | |
421 | |
422 None. | |
423 | |
424 See also | |
425 | |
426 entret Retrieve sequence entries from flatfile databases and files | |
427 seqret Read and write (return) sequences | |
428 | |
429 Author(s) | |
430 | |
431 Hidetoshi Itaya (celery@g-language.org) | |
432 Institute for Advanced Biosciences, Keio University | |
433 252-0882 Japan | |
434 | |
435 Kazuharu Arakawa (gaou@sfc.keio.ac.jp) | |
436 Institute for Advanced Biosciences, Keio University | |
437 252-0882 Japan | |
438 | |
439 History | |
440 | |
441 2012 - Written by Hidetoshi Itaya | |
442 | |
443 Target users | |
444 | |
445 This program is intended to be used by everyone and everything, from | |
446 naive users to embedded scripts. | |
447 | |
448 Comments | |
449 | |
450 None. | |
451 |