comparison GEMBASSY-1.0.3/doc/text/gbaseentropy.txt @ 0:8300eb051bea draft

Initial upload
author ktnyt
date Fri, 26 Jun 2015 05:19:29 -0400
-1:000000000000 0:8300eb051bea
1 gbaseentropy
2 Function
3
4 Calculates and graphs the sequence conservation using Shannon uncertainty
5
6 Description
7
8 This function calculates and graphs the sequence conservation in regions
9 around the start/stop codons using Shannon uncertainty (entropy). Smaller
10 values indicate higher conservation: the minimum value is 0 (one base only)
11 and the maximum is 2 bits (all four bases equally frequent). The entropy is
12 typically lowest around position 0 (the start/stop codon position).
13
14 The entropy H at position i with distribution P(i), summed over symbols j, is:
15 H(P(i)) = -sum_j(P(i,j) * log2(P(i,j)))
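As an illustration of the formula above, the following Python sketch computes
the entropy at each position across a set of equal-length sequence windows.
It is not the gbaseentropy implementation: the function names column_entropy
and positional_entropy are hypothetical, and the sketch assumes that P(i,j)
is the relative frequency of symbol j at position i.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy, in bits, of the symbols found in one column."""
    counts = Counter(column)
    total = sum(counts.values())
    # p * log2(1/p) is algebraically the same as -p * log2(p).
    # Symbols that never occur are absent from the Counter, so the
    # undefined 0 * log2(0) term is never evaluated.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def positional_entropy(windows):
    """Entropy at each position across equal-length sequence windows."""
    # zip(*windows) turns a list of rows into a sequence of columns.
    return [column_entropy(column) for column in zip(*windows)]


if __name__ == "__main__":
    # Toy windows, invented for illustration only.
    for h in positional_entropy(["ATG", "ATG", "CTG", "GTG"]):
        print(h)
```

A column holding four equally frequent bases scores 2 and a fully conserved
column scores 0, which matches the range given in the Description.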
16
17 The G-language SOAP service is provided by the
18 Institute for Advanced Biosciences, Keio University.
19 The original web service is located at the following URL:
20
21 http://www.g-language.org/wiki/soap
22
23 The WSDL (RPC/Encoded) file is located at:
24
25 http://soap.g-language.org/g-language.wsdl
26
27 Documentation on G-language Genome Analysis Environment methods is
28 provided at the Document Center:
29
30 http://ws.g-language.org/gdoc/
31
32 Usage
33
34 Here is a sample session with gbaseentropy
35
36 % gbaseentropy refseqn:NC_000913
37 Calculates and graphs the sequence conservation using Shanon uncertainty
38 (entropy)
39 Program compseq output file (optional) [nc_000913.gbaseentropy]:
40
43
44 Example 2
45
46 % gbaseentropy refseqn:NC_000913 -plot -graph png
47 Calculates and graphs the sequence conservation using Shanon uncertainty
48 (entropy)
49 Created gbaseentropy.1.png
50
53
54 Command line arguments
55
56 Calculates and graphs the sequence conservation using Shanon uncertainty
57 (entropy)
58 Version: EMBOSS:6.5.7.0 GEMBASSY:1.0.1
59
60 Standard (Mandatory) qualifiers (* if not always prompted):
61 [-sequence] seqall Nucleotide sequence(s) filename and optional
62 format, or reference (input USA)
63 * -graph xygraph [$EMBOSS_GRAPHICS value, or x11] Graph type
64 (ps, hpgl, hp7470, hp7580, meta, cps, x11,
65 tek, tekt, none, data, xterm, png, gif, svg)
66 * -outfile outfile [*.gbaseentropy] Program compseq output file
67 (optional)
68
69 Additional (Optional) qualifiers: (none)
70 Advanced (Unprompted) qualifiers:
71 -position selection [start] Either 'start' (around start codon)
72 or 'end' (around stop codon) to create the
73 PWM
74 -patlen integer [3] Length of oligomer to count (Any integer
75 value)
76 -upstream integer [30] Length upstream of specified position
77 to create PWM (Any integer value)
78 -downstream integer [30] Length downstream of specified position
79 to create PWM (Any integer value)
80 -[no]accid boolean [Y] Include to use sequence accession ID as
81 query
82 -plot toggle [N] Include to plot result
83
84 Associated qualifiers:
85
86 "-sequence" associated qualifiers
87 -sbegin1 integer Start of each sequence to be used
88 -send1 integer End of each sequence to be used
89 -sreverse1 boolean Reverse (if DNA)
90 -sask1 boolean Ask for begin/end/reverse
91 -snucleotide1 boolean Sequence is nucleotide
92 -sprotein1 boolean Sequence is protein
93 -slower1 boolean Make lower case
94 -supper1 boolean Make upper case
95 -scircular1 boolean Sequence is circular
96 -sformat1 string Input sequence format
97 -iquery1 string Input query fields or ID list
98 -ioffset1 integer Input start position offset
99 -sdbname1 string Database name
100 -sid1 string Entryname
101 -ufo1 string UFO features
102 -fformat1 string Features format
103 -fopenfile1 string Features file name
104
105 "-graph" associated qualifiers
106 -gprompt boolean Graph prompting
107 -gdesc string Graph description
108 -gtitle string Graph title
109 -gsubtitle string Graph subtitle
110 -gxtitle string Graph x axis title
111 -gytitle string Graph y axis title
112 -goutfile string Output file for non interactive displays
113 -gdirectory string Output directory
114
115 "-outfile" associated qualifiers
116 -odirectory string Output directory
117
118 General qualifiers:
119 -auto boolean Turn off prompts
120 -stdout boolean Write first file to standard output
121 -filter boolean Read first file from standard input, write
122 first file to standard output
123 -options boolean Prompt for standard and additional values
124 -debug boolean Write debug output to program.dbg
125 -verbose boolean Report some/full command line options
126 -help boolean Report command line options and exit. More
127 information on associated and general
128 qualifiers can be found with -help -verbose
129 -warning boolean Report warnings
130 -error boolean Report errors
131 -fatal boolean Report fatal errors
132 -die boolean Report dying program messages
133 -version boolean Report version number and exit
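The -upstream and -downstream qualifiers listed above define a window around
the chosen codon position. The Python sketch below shows one way such a
window and its relative coordinates could be derived. It is an assumption
for illustration, not the program's own logic: the helper names
window_around and relative_positions are hypothetical, and strand and
circular sequences are ignored.

```python
def window_around(sequence, anchor, upstream=30, downstream=30):
    """Return the slice covering anchor-upstream .. anchor+downstream.

    `anchor` is a 0-based index of the codon position of interest.
    Returns None when the window would run off either end.
    """
    start = anchor - upstream
    end = anchor + downstream + 1  # +1 so the downstream end is included
    if start < 0 or end > len(sequence):
        return None
    return sequence[start:end]


def relative_positions(upstream=30, downstream=30):
    """Coordinates relative to the anchor, e.g. -30 .. 30 by default."""
    return list(range(-upstream, downstream + 1))


if __name__ == "__main__":
    # Toy sequence, invented for illustration only.
    genome = "C" * 30 + "ATG" + "T" * 40
    window = window_around(genome, 30)
    print(len(window), relative_positions()[0], relative_positions()[-1])
```

With the default values of 30 this gives 61 positions running from -30 to
30, the same coordinate range that appears in the sample output file.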
134
135 Input file format
136
137 The database definitions for the following commands are available at
138 http://soap.g-language.org/kbws/embossrc
139
140 gbaseentropy reads one or more nucleotide sequences.
141
142 Output file format
143
144 gbaseentropy writes its output to a plain text file or to the EMBOSS
145 graphics device. Each data line holds a position and its entropy value.
146
147 File: nc_000913.gbaseentropy
148
149 Sequence: NC_000913
150 -30,1.98284
151 -29,1.97873
152 -28,1.97692
153 -27,1.97595
154 -26,1.97094
155 -25,1.96777
156 -24,1.96272
157 -23,1.96288
158 -22,1.95707
159
160 [Part of this file has been deleted for brevity]
161
162 21,1.93528
163 22,1.94470
164 23,1.95204
165 24,1.93139
166 25,1.95640
167 26,1.95711
168 27,1.93785
169 28,1.96060
170 29,1.94316
171 30,1.92581
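The text output shown above can be read back with a few lines of Python.
This is a minimal sketch that assumes the layout seen in the sample (a
"Sequence:" header followed by "position,entropy" lines); the function name
parse_gbaseentropy is hypothetical and not part of GEMBASSY.

```python
def parse_gbaseentropy(text):
    """Return {sequence name: [(position, entropy), ...]} from report text."""
    results = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("Sequence:"):
            # Start a new record for each "Sequence:" header.
            current = line.split(":", 1)[1].strip()
            results[current] = []
        elif current is not None and "," in line:
            position, value = line.split(",", 1)
            try:
                results[current].append((int(position), float(value)))
            except ValueError:
                # Skip anything that is not a numeric data line.
                continue
    return results


if __name__ == "__main__":
    sample = "Sequence: NC_000913\n-30,1.98284\n-29,1.97873\n-28,1.97692\n"
    rows = parse_gbaseentropy(sample)["NC_000913"]
    # The most conserved position is the one with the lowest entropy.
    print(min(rows, key=lambda row: row[1]))
```

Picking the row with the smallest entropy, as in the last line, locates the
most conserved position in the window.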
172
173
174 Data files
175
176 None.
177
178 Notes
179
180 None.
181
182 References
183
184 Arakawa, K., Mori, K., Ikeda, K., Matsuzaki, T., Kobayashi, Y., and
185 Tomita, M. (2003) G-language Genome Analysis Environment: A Workbench
186 for Nucleotide Sequence Data Mining, Bioinformatics, 19, 305-306.
187
188 Arakawa, K. and Tomita, M. (2006) G-language System as a Platform for
189 large-scale analysis of high-throughput omics data, J. Pest Sci.,
190 31, 7.
191
192 Arakawa, K., Kido, N., Oshita, K., and Tomita, M. (2010) G-language Genome
193 Analysis Environment with REST and SOAP Web Service Interfaces,
194 Nucleic Acids Res., 38, W700-W705.
195
196 Warnings
197
198 None.
199
200 Diagnostic Error Messages
201
202 None.
203
204 Exit status
205
206 It always exits with a status of 0.
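Because the exit status is always 0, a calling script cannot rely on it to
detect failure. The Python sketch below checks for the output file instead.
It is an assumption about reasonable usage rather than documented
behaviour: the helper names build_command and run_gbaseentropy are
hypothetical, and only the -outfile and -auto qualifiers listed under
Command line arguments are used.

```python
import os
import subprocess


def build_command(usa, outfile):
    """Argument list for a non-interactive gbaseentropy run."""
    # -auto turns off prompts; -outfile names the text report.
    return ["gbaseentropy", usa, "-outfile", outfile, "-auto"]


def run_gbaseentropy(usa, outfile):
    """Run gbaseentropy and verify that a non-empty report was written."""
    # Remove any stale report so an old file cannot mask a failed run.
    if os.path.exists(outfile):
        os.remove(outfile)
    # check=False: the exit status carries no information for this program.
    subprocess.run(build_command(usa, outfile), check=False)
    if not os.path.exists(outfile) or os.path.getsize(outfile) == 0:
        raise RuntimeError("gbaseentropy produced no output: " + outfile)
    return outfile


if __name__ == "__main__":
    # Printing the command does not require gbaseentropy to be installed.
    print(build_command("refseqn:NC_000913", "nc_000913.gbaseentropy"))
```

Calling run_gbaseentropy requires gbaseentropy on the PATH and network
access to the G-language service; build_command alone can be used to
inspect the command first.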
207
208 Known bugs
209
210 None.
211
212 See also
213
214 gbaseinformationcontent Calculates and graphs the sequence conservation
215 using information content
216 gbaserelativeentropy Calculates and graphs the sequence conservation
217 using Kullback-Leibler divergence (relative
218 entropy)
219
220 Author(s)
221
222 Hidetoshi Itaya (celery@g-language.org)
223 Institute for Advanced Biosciences, Keio University
224 252-0882 Japan
225
226 Kazuharu Arakawa (gaou@sfc.keio.ac.jp)
227 Institute for Advanced Biosciences, Keio University
228 252-0882 Japan
229
230 History
231
232 2012 - Written by Hidetoshi Itaya
233 2013 - Fixed by Hidetoshi Itaya
234
235 Target users
236
237 This program is intended to be used by everyone and everything, from
238 naive users to embedded scripts.
239
240 Comments
241
242 None.
243