comparison execute_dwt_var_perClass.xml @ 0:cb422b6f49d2 draft
Imported from capsule None
author | devteam |
---|---|
date | Mon, 27 Jan 2014 09:26:11 -0500 |
parents | |
children | 781e68074f84 |
-1:000000000000 | 0:cb422b6f49d2 |
---|---|
1 <tool id="compute_p-values_max_variances_feature_occurrences_in_one_dataset_using_discrete_wavelet_transfom" name="Compute P-values and Max Variances for Feature Occurrences" version="1.0.0"> | |
2 <description>in one dataset using Discrete Wavelet Transforms</description> | |
3 | |
4 <command interpreter="perl"> | |
5 execute_dwt_var_perClass.pl $inputFile $outputFile1 $outputFile2 $outputFile3 | |
6 </command> | |
7 | |
8 <inputs> | |
9 <param format="tabular" name="inputFile" type="data" label="Select the input file"/> | |
10 </inputs> | |
11 | |
12 <outputs> | |
13 <data format="tabular" name="outputFile1"/> | |
14 <data format="tabular" name="outputFile2"/> | |
15 <data format="pdf" name="outputFile3"/> | |
16 </outputs> | |
17 | |
18 <help> | |
19 | |
20 .. class:: infomark | |
21 | |
22 **What it does** | |
23 | |
24 This program generates plots and computes a table of maximum variances, p-values, and test orientations at multiple scales for the occurrences of a class of features in one dataset of DNA sequences, using a multiscale wavelet analysis technique. | |
25 | |
26 The program assumes that the user has one set of DNA sequences, S, consisting of one or more sequences of equal length. Each sequence in S is divided into the same number of intervals n, where n = 2^k and k is a positive integer (k >= 1). Thus, n can be any value in the set {2, 4, 8, 16, 32, 64, 128, ...}, and k is the number of scales. | |
27 | |
28 The program has one input file obtained as follows: | |
29 | |
30 For a given set of features, say motifs, the user counts the number of occurrences of each feature in each interval of each sequence in S, and builds a tabular file representing the count results in each interval of S. This is the input file of the program. | |
31 | |
32 The program gives three output files: | |
33 | |
34 - The first output file is a TABULAR format file giving the scale at which each feature has its maximum variance. | |
35 - The second output file is a TABULAR format file giving the variances, p-values, and test orientations for the occurrences of features at each scale, based on a random permutation test and a multiscale wavelet analysis technique. | |
36 - The third output file is a PDF file plotting the wavelet variances of each feature at each scale. | |
37 | |
38 ----- | |
39 | |
40 .. class:: warningmark | |
41 | |
42 **Note** | |
43 | |
44 - If the number of features is greater than 12, the program divides each output file into subfiles, such that each subfile holds the results for a group of 12 features, except the last subfile, which holds the results for the remaining features. For example, if the number of features is 17, the p-values file will consist of two subfiles, the first for features 1-12 and the second for features 13-17. The PDF file will consist of two pages in this case. | |
45 - To obtain empirical p-values, the program implements a random permutation test, so it gives slightly different results each time it is run on the same input file. | |
46 | |
47 ----- | |
48 | |
49 | |
50 **Example** | |
51 | |
52 Counting the occurrences of 8 features (motifs) in 16 intervals (one line per interval) of the set of DNA sequences S gives the following tabular file:: | |
53 | |
54 deletionHoptspot insertionHoptspot dnaPolPauseFrameshift indelHotspot topoisomeraseCleavageSite translinTarget vDjRecombinationSignal x-likeSite | |
55 226 403 416 221 1165 832 749 1056 | |
56 236 444 380 241 1223 746 782 1207 | |
57 242 496 391 195 1116 643 770 1219 | |
58 243 429 364 191 1118 694 783 1223 | |
59 244 410 371 236 1063 692 805 1233 | |
60 230 386 370 217 1087 657 787 1215 | |
61 275 404 402 214 1044 697 831 1188 | |
62 265 443 365 231 1086 694 782 1184 | |
63 255 390 354 246 1114 642 773 1176 | |
64 281 384 406 232 1102 719 787 1191 | |
65 263 459 369 251 1135 643 810 1215 | |
66 280 433 400 251 1159 701 777 1151 | |
67 278 385 382 231 1147 697 707 1161 | |
68 248 393 389 211 1162 723 759 1183 | |
69 251 403 385 246 1114 752 776 1153 | |
70 239 383 347 227 1172 759 789 1141 | |
71 | |
72 We notice that the number of scales here is 4 because 16 = 2^4. Running the program on the above input file gives the following 3 output files: | |
73 | |
74 The first output file:: | |
75 | |
76 motifs max_var at scale | |
77 deletionHoptspot NA | |
78 insertionHoptspot NA | |
79 dnaPolPauseFrameshift NA | |
80 indelHotspot NA | |
81 topoisomeraseCleavageSite 3 | |
82 translinTarget NA | |
83 vDjRecombinationSignal NA | |
84 x.likeSite NA | |
85 | |
86 The second output file:: | |
87 | |
88 motif 1_var 1_pval 1_test 2_var 2_pval 2_test 3_var 3_pval 3_test 4_var 4_pval 4_test | |
89 | |
90 deletionHoptspot 0.457 0.048 L 1.18 0.334 R 1.61 0.194 R 3.41 0.055 R | |
91 insertionHoptspot 0.556 0.109 L 1.34 0.272 R 1.59 0.223 R 2.02 0.157 R | |
92 dnaPolPauseFrameshift 1.42 0.089 R 0.66 0.331 L 0.421 0.305 L 0.121 0.268 L | |
93 indelHotspot 0.373 0.021 L 1.36 0.254 R 1.24 0.301 R 4.09 0.047 R | |
94 topoisomeraseCleavageSite 0.305 0.002 L 0.936 0.489 R 3.78 0.01 R 1.25 0.272 R | |
95 translinTarget 0.525 0.061 L 1.69 0.11 R 2.02 0.131 R 0.00891 0.069 L | |
96 vDjRecombinationSignal 0.68 0.138 L 0.957 0.46 R 2.35 0.071 R 1.03 0.357 R | |
97 x.likeSite 0.928 0.402 L 1.33 0.261 R 0.735 0.431 L 0.783 0.422 R | |
98 | |
99 The third output file: | |
100 | |
101 .. image:: ${static_path}/operation_icons/dwt_var_perClass.png | |
102 | |
103 </help> | |
104 | |
105 </tool> |
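
The help text above states two simple rules: the number of intervals n must equal 2^k, where k is the number of scales, and results for more than 12 features are split into groups of 12. The following Python sketch illustrates only that arithmetic. The function names `number_of_scales` and `feature_groups` are hypothetical and are not taken from `execute_dwt_var_perClass.pl`, whose internals are not shown here.

```python
def number_of_scales(n_intervals: int) -> int:
    """Return k such that n_intervals == 2**k, with k >= 1.

    Raises ValueError when the interval count is not a power of two,
    which the help text says the tool assumes.
    """
    if n_intervals < 2 or n_intervals & (n_intervals - 1) != 0:
        raise ValueError(f"{n_intervals} is not a power of two >= 2")
    # For an exact power of two, bit_length() - 1 is the exponent.
    return n_intervals.bit_length() - 1


def feature_groups(n_features: int, group_size: int = 12) -> list[tuple[int, int]]:
    """Return 1-based inclusive (first, last) feature ranges, one per subfile.

    Every group holds `group_size` features except possibly the last,
    which holds whatever remains.
    """
    return [
        (start, min(start + group_size - 1, n_features))
        for start in range(1, n_features + 1, group_size)
    ]


if __name__ == "__main__":
    print(number_of_scales(16))  # 16 intervals, as in the example input
    print(feature_groups(17))    # 17 features, as in the note
```

For the example input (16 intervals, 8 features) this gives 4 scales and a single group covering features 1-8. For the 17-feature case from the note it gives the ranges 1-12 and 13-17.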