0
|
1 <tool id="compute_p-values_max_variances_feature_occurrences_in_one_dataset_using_discrete_wavelet_transfom" name="Compute P-values and Max Variances for Feature Occurrences" version="1.0.0">
|
|
2 <description>in one dataset using Discrete Wavelet Transforms</description>
|
|
3
|
|
4 <command interpreter="perl">
|
|
5 execute_dwt_var_perClass.pl $inputFile $outputFile1 $outputFile2 $outputFile3
|
|
6 </command>
|
|
7
|
|
8 <inputs>
|
|
9 <param format="tabular" name="inputFile" type="data" label="Select the input file"/>
|
|
10 </inputs>
|
|
11
|
|
12 <outputs>
|
|
13 <data format="tabular" name="outputFile1"/>
|
|
14 <data format="tabular" name="outputFile2"/>
|
|
15 <data format="pdf" name="outputFile3"/>
|
|
16 </outputs>
|
|
17
|
|
18 <help>
|
|
19
|
|
20 .. class:: infomark
|
|
21
|
|
22 **What it does**
|
|
23
|
|
24 This program generates plots and computes a table of maximum variances, p-values, and test orientations at multiple scales for the occurrences of a class of features in one dataset of DNA sequences, using a multiscale wavelet analysis technique.
|
|
25
|
|
26 The program assumes that the user has one set of DNA sequences, S, which consists of one or more sequences of equal length. Each sequence in S is divided into the same number of intervals, n, such that n = 2^k, where k is an integer and k >= 1. Thus, n can be any value in the set {2, 4, 8, 16, 32, 64, 128, ...}, and k is the number of scales.
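The relation n = 2^k can be checked before preparing the input. The following is a minimal sketch and not part of the tool; the function name ``number_of_scales`` is hypothetical.

```python
def number_of_scales(n_intervals):
    """Return k such that n_intervals == 2**k with k >= 1, else raise ValueError."""
    # For an exact power of two, bit_length() - 1 is the exponent.
    k = int(n_intervals).bit_length() - 1
    if k >= 1 and 2 ** k == n_intervals:
        return k
    raise ValueError("the number of intervals must be 2, 4, 8, 16, ...")


print(number_of_scales(16))  # 16 intervals correspond to 4 scales
```

An interval count such as 12 raises ``ValueError`` because it is not a power of two.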
|
|
27
|
|
28 The program has one input file obtained as follows:
|
|
29
|
|
30 For a given set of features, say motifs, the user counts the number of occurrences of each feature in each interval of each sequence in S, and builds a tabular file representing the count results in each interval of S. This is the input file of the program.
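As an illustration of the expected layout, the sketch below formats counts as tab-separated text with one header line of feature names and one line per interval. It assumes the counts have already been computed; the helper name ``format_count_table`` is hypothetical, and the values are the first two intervals and first two features of the example further down.

```python
def format_count_table(feature_names, rows):
    """Return tab-separated text: one header line, then one line per interval."""
    width = len(feature_names)
    lines = ["\t".join(feature_names)]
    for row in rows:
        # Every interval needs exactly one count per feature.
        if len(row) != width:
            raise ValueError("each interval needs one count per feature")
        lines.append("\t".join(str(count) for count in row))
    return "\n".join(lines) + "\n"


features = ["deletionHoptspot", "insertionHoptspot"]
counts = [[226, 403], [236, 444]]
table = format_count_table(features, counts)
print(table, end="")
```

A real input file would contain all features as columns and a power-of-two number of interval rows.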
|
|
31
|
|
32 The program gives three output files:
|
|
33
|
|
34 - The first output file is a TABULAR format file giving the scale at which each feature has its maximum variance.
|
|
35 - The second output file is a TABULAR format file giving the variances, p-values, and test orientations for the occurrences of features at each scale, based on a random permutation test and a multiscale wavelet analysis technique.
|
|
36 - The third output file is a PDF file plotting the wavelet variances of each feature at each scale.
|
|
37
|
|
38 -----
|
|
39
|
|
40 .. class:: warningmark
|
|
41
|
|
42 **Note**
|
|
43
|
|
44 - If the number of features is greater than 12, the program divides each output file into subfiles, such that each subfile holds the results for a group of 12 features, except the last subfile, which holds the results for the remaining features. For example, if the number of features is 17, the p-values file will consist of two subfiles, the first for features 1-12 and the second for features 13-17. The PDF file will consist of two pages in this case.
|
|
45 - To obtain empirical p-values, the program implements a random permutation test, so it gives slightly different results each time it is run on the same input file.
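The run-to-run variation comes from the random shuffling in any permutation test. The sketch below shows the general idea only; it is not the tool's implementation. The statistic (mean squared finest-scale Haar detail coefficient), the left-tailed comparison, and the names ``haar_detail_variance`` and ``empirical_p_value`` are all assumptions made for illustration. The counts are the topoisomeraseCleavageSite column of the example input.

```python
import random


def haar_detail_variance(values):
    """Mean squared finest-scale Haar detail coefficient (illustrative statistic)."""
    details = [
        (values[i] - values[i + 1]) / 2 ** 0.5
        for i in range(0, len(values) - 1, 2)
    ]
    return sum(d * d for d in details) / len(details)


def empirical_p_value(values, n_permutations=1000, seed=None):
    """Left-tailed empirical p-value: share of shuffles at or below the observed statistic."""
    rng = random.Random(seed)
    observed = haar_detail_variance(values)
    hits = 0
    for _ in range(n_permutations):
        shuffled = values[:]
        rng.shuffle(shuffled)
        if observed >= haar_detail_variance(shuffled):
            hits += 1
    return hits / n_permutations


counts = [1165, 1223, 1116, 1118, 1063, 1087, 1044, 1086,
          1114, 1102, 1135, 1159, 1147, 1162, 1114, 1172]
# Without a seed, repeated calls usually differ slightly.
print(empirical_p_value(counts), empirical_p_value(counts))
# A fixed seed makes this sketch reproducible.
print(empirical_p_value(counts, seed=1) == empirical_p_value(counts, seed=1))
```

Increasing ``n_permutations`` in this sketch narrows the spread between runs at the cost of more computation.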
|
|
46
|
|
47 -----
|
|
48
|
|
49
|
|
50 **Example**
|
|
51
|
|
52 Counting the occurrences of 8 features (motifs) in 16 intervals (one line per interval) of a set of DNA sequences S gives the following tabular file::
|
|
53
|
|
54 deletionHoptspot insertionHoptspot dnaPolPauseFrameshift indelHotspot topoisomeraseCleavageSite translinTarget vDjRecombinationSignal x-likeSite
|
|
55 226 403 416 221 1165 832 749 1056
|
|
56 236 444 380 241 1223 746 782 1207
|
|
57 242 496 391 195 1116 643 770 1219
|
|
58 243 429 364 191 1118 694 783 1223
|
|
59 244 410 371 236 1063 692 805 1233
|
|
60 230 386 370 217 1087 657 787 1215
|
|
61 275 404 402 214 1044 697 831 1188
|
|
62 265 443 365 231 1086 694 782 1184
|
|
63 255 390 354 246 1114 642 773 1176
|
|
64 281 384 406 232 1102 719 787 1191
|
|
65 263 459 369 251 1135 643 810 1215
|
|
66 280 433 400 251 1159 701 777 1151
|
|
67 278 385 382 231 1147 697 707 1161
|
|
68 248 393 389 211 1162 723 759 1183
|
|
69 251 403 385 246 1114 752 776 1153
|
|
70 239 383 347 227 1172 759 789 1141
|
|
71
|
|
72 The number of scales here is 4 because 16 = 2^4. Running the program on the above input file gives the following 3 output files:
|
|
73
|
|
74 The first output file::
|
|
75
|
|
76 motifs max_var at scale
|
|
77 deletionHoptspot NA
|
|
78 insertionHoptspot NA
|
|
79 dnaPolPauseFrameshift NA
|
|
80 indelHotspot NA
|
|
81 topoisomeraseCleavageSite 3
|
|
82 translinTarget NA
|
|
83 vDjRecombinationSignal NA
|
|
84 x.likeSite NA
|
|
85
|
|
86 The second output file::
|
|
87
|
|
88 motif 1_var 1_pval 1_test 2_var 2_pval 2_test 3_var 3_pval 3_test 4_var 4_pval 4_test
|
|
89
|
|
90 deletionHoptspot 0.457 0.048 L 1.18 0.334 R 1.61 0.194 R 3.41 0.055 R
|
|
91 insertionHoptspot 0.556 0.109 L 1.34 0.272 R 1.59 0.223 R 2.02 0.157 R
|
|
92 dnaPolPauseFrameshift 1.42 0.089 R 0.66 0.331 L 0.421 0.305 L 0.121 0.268 L
|
|
93 indelHotspot 0.373 0.021 L 1.36 0.254 R 1.24 0.301 R 4.09 0.047 R
|
|
94 topoisomeraseCleavageSite 0.305 0.002 L 0.936 0.489 R 3.78 0.01 R 1.25 0.272 R
|
|
95 translinTarget 0.525 0.061 L 1.69 0.11 R 2.02 0.131 R 0.00891 0.069 L
|
|
96 vDjRecombinationSignal 0.68 0.138 L 0.957 0.46 R 2.35 0.071 R 1.03 0.357 R
|
|
97 x.likeSite 0.928 0.402 L 1.33 0.261 R 0.735 0.431 L 0.783 0.422 R
|
|
98
|
|
99 The third output file:
|
|
100
|
|
101 .. image:: ${static_path}/operation_icons/dwt_var_perClass.png
|
|
102
|
|
103 </help>
|
|
104
|
|
105 </tool>
|