comparison GEMBASSY-1.0.3/doc/html/gshuffleseq.html @ 0:8300eb051bea draft

Initial upload
author ktnyt
date Fri, 26 Jun 2015 05:19:29 -0400
-1:000000000000 0:8300eb051bea
1 <!--START OF HEADER - DON'T ALTER -->
2
3 <HTML>
4 <HEAD>
5 <TITLE> EMBOSS: gshuffleseq </TITLE>
6 </HEAD>
7 <BODY BGCOLOR="#FFFFFF" text="#000000">
8
9
10
11 <table align=center border=0 cellspacing=0 cellpadding=0>
12 <tr><td valign=top>
13 <A HREF="/" ONMOUSEOVER="self.status='Go to the EMBOSS home page';return true"><img border=0 src="http://soap.g-language.org/gembassy/emboss_explorer/manual/emboss_icon.jpg" alt="" width=150 height=48></a>
14 </td>
15 <td align=left valign=middle>
16 <b><font size="+6">
17 gshuffleseq
18 </font></b>
19 </td></tr>
20 </table>
21 <br>&nbsp;
22 <p>
23
24
25 <!--END OF HEADER-->
26
27
28
29
30
31
32 <H2> Function </H2>
33 Create randomized sequence with conserved k-mer composition
34 <!--
35 DON'T WRITE ANYTHING HERE.
36 IT IS DONE FOR YOU.
37 -->
38
39
40
41
42 <H2>Description</H2>
43 <p>
44 gshuffleseq shuffles and randomizes the given sequence, conserving the<br />
45 nucleotide/peptide k-mer content of the original sequence.<br />
46 <br />
47 For k=1, i.e. shuffling a sequence while preserving its single-nucleotide composition,<br />
48 the Fisher-Yates algorithm is employed.<br />
49 For k&gt;1, shuffling preserves the counts of all n-mers for n = 1 to k. For example,<br />
50 k=3 preserves the triplet, doublet, and single-nucleotide composition.<br />
51 Shuffling that preserves k-mer counts is non-trivial; it is solved with a<br />
52 graph-theoretical approach that takes random Eulerian walks in the graph of<br />
53 (k-1)-mers. See Jiang et al. (2008), Kandel et al. (1996), and Propp and Wilson (1998)<br />
54 for details of this algorithm.<br />
55 <br />
56 The G-language SOAP service is provided by the<br />
57 Institute for Advanced Biosciences, Keio University.<br />
58 The original web service is located at the following URL:<br />
59 <br />
60 http://www.g-language.org/wiki/soap<br />
61 <br />
62 The WSDL (RPC/Encoded) file is located at:<br />
63 <br />
64 http://soap.g-language.org/g-language.wsdl<br />
65 <br />
66 Documentation on the G-language Genome Analysis Environment methods is<br />
67 provided at the Document Center:<br />
68 <br />
69 http://ws.g-language.org/gdoc/<br />
70 <br />
71
72 </p>
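<p>
The two shuffling modes described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the gshuffleseq implementation: the function names fisher_yates_shuffle and shuffle_preserving_klets are hypothetical, and the k&gt;1 branch follows the general idea in the cited references (a random spanning tree, built here with loop-erased random walks, fixes the last edge used to leave each (k-1)-mer, and the remaining edges are ordered at random before the Eulerian walk is traced).
</p>

```python
import random
from collections import defaultdict


def fisher_yates_shuffle(items, rng):
    """Return a shuffled copy of items (Durstenfeld's in-place variant)."""
    items = list(items)
    for i in range(len(items) - 1, 0, -1):
        j = rng.randint(0, i)  # randint is inclusive on both ends
        items[i], items[j] = items[j], items[i]
    return items


def shuffle_preserving_klets(seq, k, rng=None):
    """Shuffle seq while keeping the counts of all n-mers for n = 1..k."""
    rng = rng or random.Random()
    if k == 1:
        return "".join(fisher_yates_shuffle(seq, rng))
    if k >= len(seq):
        return seq  # at most one k-mer, so there is nothing to rearrange

    # Vertices are (k-1)-mers; each k-mer is an edge from its prefix to its suffix.
    out_edges = defaultdict(list)
    for i in range(len(seq) - k + 1):
        out_edges[seq[i:i + k - 1]].append(seq[i + 1:i + k])
    start, end = seq[:k - 1], seq[len(seq) - k + 1:]

    # Loop-erased random walks (Wilson's algorithm) build a random spanning
    # tree directed toward `end`; last_exit[v] is the tree edge leaving v.
    in_tree = {end}
    last_exit = {}
    for v in list(out_edges):
        u = v
        while u not in in_tree:
            last_exit[u] = rng.choice(out_edges[u])
            u = last_exit[u]
        u = v
        while u not in in_tree:
            in_tree.add(u)
            u = last_exit[u]

    # Randomly order the edges leaving each vertex, keeping the tree edge last.
    order = {}
    for v, targets in out_edges.items():
        targets = list(targets)
        if v != end:
            targets.remove(last_exit[v])
        targets = fisher_yates_shuffle(targets, rng)
        if v != end:
            targets.append(last_exit[v])
        order[v] = iter(targets)

    # Follow the ordered edges from `start`; this traces an Eulerian walk.
    result, v = [start], start
    for _ in range(len(seq) - k + 1):
        v = next(order[v])
        result.append(v[-1])
    return "".join(result)
```

<p>
For example, shuffle_preserving_klets("ACGTACGGTCA", 2) returns a sequence with the same single-nucleotide and dinucleotide counts as the input; because the walk starts and ends at the original (k-1)-mers, the first and last k-1 characters are also unchanged.
</p>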
73
74 <H2>Usage</H2>
75
76 Here is a sample session with gshuffleseq
77
78 <table width="90%"><tr><td bgcolor="#CCFFFF"><pre>
79
80 % gshuffleseq tsw:hbb_human
81 Create randomized sequence with conserved k-mer composition
82 output sequence [hbb_human.fasta]:
83
84 </pre></td></tr></table>
85
86 Go to the <a href="#input">input files</a> for this example<br>
87 Go to the <a href="#output">output files</a> for this example<br><br>
88
89 <h2>Command line arguments</h2>
90
91 <table border cellspacing=0 cellpadding=3 bgcolor="#ccccff">
92 <tr bgcolor="#FFFFCC">
93 <th align="left">Qualifier</th>
94 <th align="left">Type</th>
95 <th align="left">Description</th>
96 <th align="left">Allowed values</th>
97 <th align="left">Default</th>
98 </tr>
99
100 <tr bgcolor="#FFFFCC">
101 <th align="left" colspan=5>Standard (Mandatory) qualifiers</th>
102 </tr>
103
104 <tr bgcolor="#FFFFCC">
105 <td>[-sequence]<br>(Parameter 1)</td>
106 <td>seqall</td>
107 <td>Sequence(s) filename and optional format, or reference (input USA)</td>
108 <td>Readable sequence(s)</td>
109 <td><b>Required</b></td>
110 </tr>
111
112 <tr bgcolor="#FFFFCC">
113 <td>[-outseq]<br>(Parameter 2)</td>
114 <td>seqout</td>
115 <td>Sequence filename and optional format (output USA)</td>
116 <td>Writeable sequence</td>
117 <td><i>&lt;*&gt;</i>.<i>format</i></td>
118 </tr>
119
120 <tr bgcolor="#FFFFCC">
121 <th align="left" colspan=5>Additional (Optional) qualifiers</th>
122 </tr>
123
124 <tr>
125 <td colspan=5>(none)</td>
126 </tr>
127
128 <tr bgcolor="#FFFFCC">
129 <th align="left" colspan=5>Advanced (Unprompted) qualifiers</th>
130 </tr>
131
132 <tr bgcolor="#FFFFCC">
133 <td>-k</td>
134 <td>integer</td>
135 <td>Sequence k-mer to preserve composition</td>
136 <td>Any integer value</td>
137 <td>1</td>
138 </tr>
139
140 </table>
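<p>
As a rough sketch of how the qualifiers in the table above might be combined from a script, the Python helper below assembles a command line. The helper names build_gshuffleseq_command and run_gshuffleseq are hypothetical, and the sketch assumes that gshuffleseq is installed and on the PATH; only the -sequence, -outseq, and -k qualifiers listed above are used.
</p>

```python
import shutil
import subprocess


def build_gshuffleseq_command(sequence, outseq, k=1):
    """Build an argument list from the qualifiers documented in the table."""
    return [
        "gshuffleseq",
        "-sequence", str(sequence),  # input USA, e.g. "tsw:hbb_human"
        "-outseq", str(outseq),      # output USA, e.g. "hbb_human.fasta"
        "-k", str(int(k)),           # k-mer length to preserve (default 1)
    ]


def run_gshuffleseq(sequence, outseq, k=1):
    """Run gshuffleseq if it is available; raise if it is not installed."""
    if shutil.which("gshuffleseq") is None:
        raise FileNotFoundError("gshuffleseq was not found on PATH")
    return subprocess.run(build_gshuffleseq_command(sequence, outseq, k))
```

<p>
Calling build_gshuffleseq_command("tsw:hbb_human", "hbb_human.fasta", 3) yields the argument list for a run that preserves triplet composition.
</p>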
141
142
143 <h2 id="input">Input file format</h2>
144
145 <p>
146 The database definitions for the following commands are available at<br />
147 http://soap.g-language.org/kbws/embossrc<br />
148 <br />
149 gshuffleseq reads one or more nucleotide or protein sequences.<br />
150 <br />
151
152 </p>
153
154 <h2 id="output">Output file format</h2>
155
156 <p>
157 The output from gshuffleseq is written to a sequence file (FASTA format in this example).<br />
158 <br />
159 File: hbb_human.fasta<br />
160 <br />
161 <table width="90%"><tr><td bgcolor="#CCFFCC">
162 &gt;HBB_HUMAN P68871 Hemoglobin subunit beta (Beta-globin) (Hemoglobin beta chain) (LVV-hemorphin-7)<br />
163 KGWLDLVAGAAHFVRRLKMLLEVDWAAHEERVGTSNPNNALKNEAADVEVHSPTHVNPTQ<br />
164 LVLVQVGFGTLHLQGVECPKPKPGGVALKPVAHLLAMKECTLVALGSDFYVDHGSDGEDK<br />
165 GFKAYVLATSFFAYTNFLHGKVKHVLF<br />
166 </td></tr></table>
167
168 </p>
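<p>
One way to check that an output file like the one above keeps the composition of its input is to compare k-mer counts. The Python sketch below is illustrative only; read_fasta, kmer_counts, and same_composition are hypothetical helper names, not part of gshuffleseq or EMBOSS.
</p>

```python
from collections import Counter


def read_fasta(text):
    """Parse FASTA text into a dict mapping each header line to its sequence."""
    records, header = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def same_composition(original, shuffled, k=1):
    """True if the two sequences share all n-mer counts for n = 1..k."""
    return all(
        kmer_counts(original, n) == kmer_counts(shuffled, n)
        for n in range(1, k + 1)
    )
```

<p>
With the default -k 1, same_composition(original, shuffled, 1) should hold for the input and output sequences; for a run with a larger k, pass the same k to the check.
</p>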
169
170 <h2>Data files</h2>
171
172 <p>
173 None.
174 </p>
175
176 <h2>Notes</h2>
177
178 <p>
179 None.
180 </p>
181
182 <h2>References</h2>
183
184 <pre>
185 Fisher R.A. and Yates F. (1938) "Example 12", Statistical Tables, London
186
187 Durstenfeld R. (1964) "Algorithm 235: Random permutation", CACM 7(7):420
188
189 Jiang M., Anderson J., Gillespie J., and Mayne M. (2008) "uShuffle:
190 a useful tool for shuffling biological sequences while preserving the
191 k-let counts", BMC Bioinformatics 9:192
192
193 Kandel D., Matias Y., Unger R., and Winkler P. (1996) "Shuffling biological
194 sequences", Discrete Applied Mathematics 71(1-3):171-185
195
196 Propp J.G. and Wilson D.B. (1998) "How to get a perfectly random sample
197 from a generic Markov chain and generate a random spanning tree of a
198 directed graph", Journal of Algorithms 27(2):170-217
199
200 Arakawa, K., Mori, K., Ikeda, K., Matsuzaki, T., Kobayashi, Y., and
201 Tomita, M. (2003) G-language Genome Analysis Environment: A Workbench
202 for Nucleotide Sequence Data Mining, Bioinformatics, 19, 305-306.
203
204 Arakawa, K. and Tomita, M. (2006) G-language System as a Platform for
205 large-scale analysis of high-throughput omics data, J. Pest Sci.,
206 31, 7.
207
208 Arakawa, K., Kido, N., Oshita, K., Tomita, M. (2010) G-language Genome
209 Analysis Environment with REST and SOAP Web Service Interfaces,
210 Nucleic Acids Res., 38, W700-W705.
211
212 </pre>
213
214 <h2>Warnings</h2>
215
216 <p>
217 None.
218 </p>
219
220 <h2>Diagnostic Error Messages</h2>
221
222 <p>
223 None.
224 </p>
225
226 <h2>Exit status</h2>
227
228 <p>
229 It always exits with a status of 0.
230 </p>
231
232 <h2>Known bugs</h2>
233
234 <p>
235 None.
236 </p>
237
238 <h2>See also</h2>
239
240 <table border cellpadding=4 bgcolor="#FFFFF0"><tr><th>Program name</th>
241 <th>Description</th></tr>
242
243 <tr>
244 <td><a href="shuffleseq.html">shuffleseq</a></td>
245 <td>Shuffles a set of sequences maintaining composition</td>
246 </tr>
247
248 </table>
249
250 <h2>Author(s)</h2>
251
252 <pre>
253 Hidetoshi Itaya (celery@g-language.org)
254 Institute for Advanced Biosciences, Keio University
255 252-0882 Japan
256
257 Kazuharu Arakawa (gaou@sfc.keio.ac.jp)
258 Institute for Advanced Biosciences, Keio University
259 252-0882 Japan</pre>
260
261 <h2>History</h2>
262
263 2012 - Written by Hidetoshi Itaya
264
265 <h2>Target users</h2>
266
267 This program is intended to be used by everyone and everything, from
268 naive users to embedded scripts.
269
270 <h2>Comments</h2>
271
272 None.
273
274 </BODY>
275 </HTML>