comparison GEMBASSY-1.0.3/doc/html/gshuffleseq.html @ 2:8947fca5f715 draft default tip
Uploaded
| author | ktnyt |
|---|---|
| date | Fri, 26 Jun 2015 05:21:44 -0400 |
| parents | 84a17b3fad1f |
| children |
| 1:84a17b3fad1f | 2:8947fca5f715 |
|---|---|
| 1 <!--START OF HEADER - DON'T ALTER --> | |
| 2 | |
| 3 <HTML> | |
| 4 <HEAD> | |
| 5 <TITLE> EMBOSS: gshuffleseq </TITLE> | |
| 6 </HEAD> | |
| 7 <BODY BGCOLOR="#FFFFFF" text="#000000"> | |
| 8 | |
| 9 | |
| 10 | |
| 11 <table align=center border=0 cellspacing=0 cellpadding=0> | |
| 12 <tr><td valign=top> | |
| 13 <A HREF="/" ONMOUSEOVER="self.status='Go to the EMBOSS home page';return true"><img border=0 src="http://soap.g-language.org/gembassy/emboss_explorer/manual/emboss_icon.jpg" alt="" width=150 height=48></a> | |
| 14 </td> | |
| 15 <td align=left valign=middle> | |
| 16 <b><font size="+6"> | |
| 17 gshuffleseq | |
| 18 </font></b> | |
| 19 </td></tr> | |
| 20 </table> | |
| 21 <br> | |
| 22 <p> | |
| 23 | |
| 24 | |
| 25 <!--END OF HEADER--> | |
| 26 | |
| 27 | |
| 28 | |
| 29 | |
| 30 | |
| 31 | |
| 32 <H2> Function </H2> | |
| 33 Create randomized sequence with conserved k-mer composition | |
| 34 <!-- | |
| 35 DON'T WRITE ANYTHING HERE. | |
| 36 IT IS DONE FOR YOU. | |
| 37 --> | |
| 38 | |
| 39 | |
| 40 | |
| 41 | |
| 42 <H2>Description</H2> | |
| 43 <p> | |
| 44 gshuffleseq shuffles and randomizes the given sequence, conserving the<br /> | |
| 45 nucleotide/peptide k-mer content of the original sequence.<br /> | |
| 46 <br /> | |
| 47 For k=1, i.e. shuffling a sequence while preserving single-nucleotide composition,<br /> | |
| 48 the Fisher-Yates algorithm is employed.<br /> | |
| 49 For k>1, shuffling preserves the composition of all shorter words as well (every length from 1 to k). For example,<br /> | |
| 50 k=3 preserves triplet, doublet, and single-nucleotide composition.<br /> | |
| 51 Shuffling that preserves k-mer composition is non-trivial; it is solved<br /> | |
| 52 with a graph-theoretical approach based on random Eulerian walks in the graph of<br /> | |
| 53 (k-1)-mers. See Jiang et al., Kandel et al., and Propp and Wilson for details<br /> | |
| 54 of this algorithm.<br /> | |
| 55 <br /> | |
| 56 The G-language SOAP service is provided by the<br /> | |
| 57 Institute for Advanced Biosciences, Keio University.<br /> | |
| 58 The original web service is located at the following URL:<br /> | |
| 59 <br /> | |
| 60 http://www.g-language.org/wiki/soap<br /> | |
| 61 <br /> | |
| 62 The WSDL (RPC/Encoded) file is located at:<br /> | |
| 63 <br /> | |
| 64 http://soap.g-language.org/g-language.wsdl<br /> | |
| 65 <br /> | |
| 66 Documentation on G-language Genome Analysis Environment methods is<br /> | |
| 67 provided at the Document Center:<br /> | |
| 68 <br /> | |
| 69 http://ws.g-language.org/gdoc/<br /> | |
| 70 <br /> | |
| 71 | |
| 72 </p> | |
| 73 | |
| 74 <H2>Usage</H2> | |
| 75 | |
| 76 Here is a sample session with gshuffleseq | |
| 77 | |
| 78 <table width="90%"><tr><td bgcolor="#CCFFFF"><pre> | |
| 79 | |
| 80 % gshuffleseq tsw:hbb_human | |
| 81 Create randomized sequence with conserved k-mer composition | |
| 82 output sequence [hbb_human.fasta]: | |
| 83 | |
| 84 </pre></td></tr></table> | |
| 85 | |
| 86 Go to the <a href="#input">input files</a> for this example<br> | |
| 87 Go to the <a href="#output">output files</a> for this example<br><br> | |
| 88 | |
| 89 <h2>Command line arguments</h2> | |
| 90 | |
| 91 <table border cellspacing=0 cellpadding=3 bgcolor="#ccccff"> | |
| 92 <tr bgcolor="#FFFFCC"> | |
| 93 <th align="left">Qualifier</th> | |
| 94 <th align="left">Type</th> | |
| 95 <th align="left">Description</th> | |
| 96 <th align="left">Allowed values</th> | |
| 97 <th align="left">Default</th> | |
| 98 </tr> | |
| 99 | |
| 100 <tr bgcolor="#FFFFCC"> | |
| 101 <th align="left" colspan=5>Standard (Mandatory) qualifiers</th> | |
| 102 </tr> | |
| 103 | |
| 104 <tr bgcolor="#FFFFCC"> | |
| 105 <td>[-sequence]<br>(Parameter 1)</td> | |
| 106 <td>seqall</td> | |
| 107 <td>Sequence(s) filename and optional format, or reference (input USA)</td> | |
| 108 <td>Readable sequence(s)</td> | |
| 109 <td><b>Required</b></td> | |
| 110 </tr> | |
| 111 | |
| 112 <tr bgcolor="#FFFFCC"> | |
| 113 <td>[-outseq]<br>(Parameter 2)</td> | |
| 114 <td>seqout</td> | |
| 115 <td>Sequence filename and optional format (output USA)</td> | |
| 116 <td>Writeable sequence</td> | |
| 117 <td><i><*></i>.<i>format</i></td> | |
| 118 </tr> | |
| 119 | |
| 120 <tr bgcolor="#FFFFCC"> | |
| 121 <th align="left" colspan=5>Additional (Optional) qualifiers</th> | |
| 122 </tr> | |
| 123 | |
| 124 <tr> | |
| 125 <td colspan=5>(none)</td> | |
| 126 </tr> | |
| 127 | |
| 128 <tr bgcolor="#FFFFCC"> | |
| 129 <th align="left" colspan=5>Advanced (Unprompted) qualifiers</th> | |
| 130 </tr> | |
| 131 | |
| 132 <tr bgcolor="#FFFFCC"> | |
| 133 <td>-k</td> | |
| 134 <td>integer</td> | |
| 135 <td>Length k of the k-mers whose composition is preserved</td> | |
| 136 <td>Any integer value</td> | |
| 137 <td>1</td> | |
| 138 </tr> | |
| 139 | |
| 140 </table> | |
| 141 | |
| 142 | |
| 143 <h2 id="input">Input file format</h2> | |
| 144 | |
| 145 <p> | |
| 146 The database definitions for the following commands are available at<br /> | |
| 147 http://soap.g-language.org/kbws/embossrc<br /> | |
| 148 <br /> | |
| 149 gshuffleseq reads one or more nucleotide or protein sequences.<br /> | |
| 150 <br /> | |
| 151 | |
| 152 </p> | |
| 153 | |
| 154 <h2 id="output">Output file format</h2> | |
| 155 | |
| 156 <p> | |
| 157 The output from gshuffleseq is written to a sequence file.<br /> | |
| 158 <br /> | |
| 159 File: hbb_human.fasta<br /> | |
| 160 <br /> | |
| 161 <table width="90%"><tr><td bgcolor="#CCFFCC"> | |
| 162 >HBB_HUMAN P68871 Hemoglobin subunit beta (Beta-globin) (Hemoglobin beta chain) (LVV-hemorphin-7)<br /> | |
| 163 KGWLDLVAGAAHFVRRLKMLLEVDWAAHEERVGTSNPNNALKNEAADVEVHSPTHVNPTQ<br /> | |
| 164 LVLVQVGFGTLHLQGVECPKPKPGGVALKPVAHLLAMKECTLVALGSDFYVDHGSDGEDK<br /> | |
| 165 GFKAYVLATSFFAYTNFLHGKVKHVLF<br /> | |
| 166 </td></tr></table> | |
| 167 | |
| 168 </p> | |
| 169 | |
| 170 <h2>Data files</h2> | |
| 171 | |
| 172 <p> | |
| 173 None. | |
| 174 </p> | |
| 175 | |
| 176 <h2>Notes</h2> | |
| 177 | |
| 178 <p> | |
| 179 None. | |
| 180 </p> | |
| 181 | |
| 182 <h2>References</h2> | |
| 183 | |
| 184 <pre> | |
| 185 Fisher R.A. and Yates F. (1938) "Example 12", Statistical Tables, London | |
| 186 | |
| 187 Durstenfeld R. (1964) "Algorithm 235: Random permutation", CACM 7(7):420 | |
| 188 | |
| 189 Jiang M., Anderson J., Gillespie J., and Mayne M. (2008) "uShuffle: | |
| 190 a useful tool for shuffling biological sequences while preserving the | |
| 191 k-let counts", BMC Bioinformatics 9:192 | |
| 192 | |
| 193 Kandel D., Matias Y., Unger R., and Winkler P. (1996) "Shuffling biological | |
| 194 sequences", Discrete Applied Mathematics 71(1-3):171-185 | |
| 195 | |
| 196 Propp J.G. and Wilson D.B. (1998) "How to get a perfectly random sample | |
| 197 from a generic Markov chain and generate a random spanning tree of a | |
| 198 directed graph", Journal of Algorithms 27(2):170-217 | |
| 199 | |
| 200 Arakawa, K., Mori, K., Ikeda, K., Matsuzaki, T., Kobayashi, Y., and | |
| 201 Tomita, M. (2003) G-language Genome Analysis Environment: A Workbench | |
| 202 for Nucleotide Sequence Data Mining, Bioinformatics, 19, 305-306. | |
| 203 | |
| 204 Arakawa, K. and Tomita, M. (2006) G-language System as a Platform for | |
| 205 large-scale analysis of high-throughput omics data, J. Pest Sci., | |
| 206 31, 7. | |
| 207 | |
| 208 Arakawa, K., Kido, N., Oshita, K., Tomita, M. (2010) G-language Genome | |
| 209 Analysis Environment with REST and SOAP Web Service Interfaces, | |
| 210 Nucleic Acids Res., 38, W700-W705. | |
| 211 | |
| 212 </pre> | |
| 213 | |
| 214 <h2>Warnings</h2> | |
| 215 | |
| 216 <p> | |
| 217 None. | |
| 218 </p> | |
| 219 | |
| 220 <h2>Diagnostic Error Messages</h2> | |
| 221 | |
| 222 <p> | |
| 223 None. | |
| 224 </p> | |
| 225 | |
| 226 <h2>Exit status</h2> | |
| 227 | |
| 228 <p> | |
| 229 It always exits with a status of 0. | |
| 230 </p> | |
| 231 | |
| 232 <h2>Known bugs</h2> | |
| 233 | |
| 234 <p> | |
| 235 None. | |
| 236 </p> | |
| 237 | |
| 238 <h2>See also</h2> | |
| 239 | |
| 240 <table border cellpadding=4 bgcolor="#FFFFF0"><tr><th>Program name</th> | |
| 241 <th>Description</th></tr> | |
| 242 | |
| 243 <tr> | |
| 244 <td><a href="shuffleseq.html">shuffleseq</a></td> | |
| 245 <td>Shuffles a set of sequences maintaining composition</td> | |
| 246 </tr> | |
| 247 | |
| 248 </table> | |
| 249 | |
| 250 <h2>Author(s)</h2> | |
| 251 | |
| 252 <pre> | |
| 253 Hidetoshi Itaya (celery@g-language.org) | |
| 254 Institute for Advanced Biosciences, Keio University | |
| 255 252-0882 Japan | |
| 256 | |
| 257 Kazuharu Arakawa (gaou@sfc.keio.ac.jp) | |
| 258 Institute for Advanced Biosciences, Keio University | |
| 259 252-0882 Japan</pre> | |
| 260 | |
| 261 <h2>History</h2> | |
| 262 | |
| 263 2012 - Written by Hidetoshi Itaya | |
| 264 | |
| 265 <h2>Target users</h2> | |
| 266 | |
| 267 This program is intended to be used by everyone and everything, from | |
| 268 naive users to embedded scripts. | |
| 269 | |
| 270 <h2>Comments</h2> | |
| 271 | |
| 272 None. | |
| 273 | |
| 274 </BODY> | |
| 275 </HTML> |
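The Description above names two techniques: a Fisher-Yates shuffle for k=1, and random Eulerian walks in the graph of (k-1)-mers for k>1. The following is a minimal Python sketch of that idea, not the gshuffleseq implementation. The function names `kmer_counts`, `fisher_yates` and `shuffle_kmer` are hypothetical. The sketch assumes the approach described by Kandel et al. and Jiang et al.: it picks a random spanning tree directed toward the last (k-1)-mer using Wilson's loop-erased random walks (Propp and Wilson), makes each tree edge the last exit from its vertex, and shuffles the remaining exits.

```python
import random
from collections import Counter, defaultdict


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def fisher_yates(items, rng):
    """Shuffle a list in place (Durstenfeld's variant of Fisher-Yates)."""
    for i in range(len(items) - 1, 0, -1):
        j = rng.randint(0, i)  # uniform over 0..i inclusive
        items[i], items[j] = items[j], items[i]


def shuffle_kmer(seq, k=1, rng=None):
    """Return a shuffled copy of seq with the same k-mer counts (illustrative helper)."""
    rng = rng or random.Random()
    if k < 1:
        raise ValueError("k must be >= 1")
    if k == 1:
        chars = list(seq)
        fisher_yates(chars, rng)
        return "".join(chars)
    if len(seq) <= k:
        return seq  # at most one k-mer, nothing to rearrange

    n = k - 1
    # Multigraph: vertices are (k-1)-mers, one edge per k-mer occurrence.
    edges = defaultdict(list)
    for i in range(len(seq) - n):
        edges[seq[i:i + n]].append(seq[i + 1:i + 1 + n])
    edges = dict(edges)
    first, root = seq[:n], seq[-n:]

    # Wilson's algorithm: loop-erased random walks toward the root choose,
    # for every non-root vertex, the index of its final exit edge.
    in_tree, exit_idx = {root}, {}
    for start in edges:
        u = start
        while u not in in_tree:
            exit_idx[u] = rng.randrange(len(edges[u]))
            u = edges[u][exit_idx[u]]
        u = start
        while u not in in_tree:
            in_tree.add(u)
            u = edges[u][exit_idx[u]]

    # Shuffle each vertex's exits, keeping the tree edge as the last one.
    for v, succ in edges.items():
        if v in exit_idx:
            tree_edge = succ.pop(exit_idx[v])
            fisher_yates(succ, rng)
            succ.append(tree_edge)
        else:
            fisher_yates(succ, rng)

    # Follow the ordered exits from the first (k-1)-mer: an Eulerian walk.
    out, v, used = [first], first, defaultdict(int)
    for _ in range(len(seq) - n):
        w = edges[v][used[v]]
        used[v] += 1
        out.append(w[-1])
        v = w
    return "".join(out)
```

Calling `shuffle_kmer("ACGTACGGTTACGATCG", k=3)` returns a random rearrangement of the input. Because the walk starts and ends on the same (k-1)-mers as the input, `kmer_counts` of the result should match that of the input for every length from 1 to k, which is the property the Description states.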
