annotate tools/regVariation/compute_motifs_frequency.xml @ 1:cdcb0ce84a1b

Uploaded
author xuebing
date Fri, 09 Mar 2012 19:45:15 -0500
parents 9071e359b9a3
children
rev   line source
0
9071e359b9a3 Uploaded
xuebing
parents:
diff changeset
1 <tool id="compute_motifs_frequency" name="Compute Motif Frequencies" version="1.0.0">
2 <description>in indel flanking regions</description>
3
4
5 <command interpreter="perl">
6 compute_motifs_frequency.pl $inputFile1 $inputFile2 $inputNumber3 $outputFile1 $outputFile2
7 </command>
8
9
10 <inputs>
11
12 <param format="tabular" name="inputFile1" type="data" label="Select motifs file"/>
13
14 <param format="tabular" name="inputFile2" type="data" label="Select indel flanking regions file from your history"/>
15
16 <param type="integer" name="inputNumber3" size="5" value="0" label="What is the size of each window?" help="'0' = all the upstream flanking sequence will be one window only, and the same for the downstream flanking sequence."/>
17
18 </inputs>
19
20
21 <outputs>
22 <data format="tabular" name="outputFile1"/>
23 <data format="tabular" name="outputFile2"/>
24 </outputs>
25
26 <tests>
27 <test>
28 <param name="inputFile1" value="motifs1.tabular" />
29 <param name="inputFile2" value="indelsFlankingSequences1.tabular" />
30 <param name="inputNumber3" value="0" />
31 <output name="outputFile1" file="flankingSequencesWindows0.tabular" />
32 <output name="outputFile2" file="motifFrequencies0.tabular" />
33 </test>
34
35 <test>
36 <param name="inputFile1" value="motifs1.tabular" />
37 <param name="inputFile2" value="indelsFlankingSequences1.tabular" />
38 <param name="inputNumber3" value="10" />
39 <output name="outputFile1" file="flankingSequencesWindows10.tabular" />
40 <output name="outputFile2" file="motifFrequencies10.tabular" />
41 </test>
42 </tests>
43
44
45 <help>
46
47 .. class:: infomark
48
49 **What it does**
50
51 This program computes the frequency of motifs in the flanking regions of indels found in a chromosome or a genome.
52 Each indel has an upstream flanking sequence and a downstream flanking sequence. Each of the upstream and downstream flanking
53 sequences will be divided into a certain number of windows according to the window size input by the user.
54 The frequency of a motif in a given window of one of the two flanking sequences is the total number of occurrences of
55 that motif in that window of that flanking sequence over all indels. The indel flanking regions file will be taken
56 from your history or uploaded, whereas the motifs file should be uploaded.
57
58 - The first input file is the motifs file and it is a tabular file consisting of two columns:
59
60 - the first column represents the motif name
61 - the second column represents the motif sequence, as follows::
62
63 dnaPolPauseFrameshift1 GAG
64 dnaPolPauseFrameshift2 ACG
65 xSites1 CCG
66
67 - The second input file is the indels flanking regions file and it is a tabular file consisting of five columns:
68
69 - the first column represents the indel start coordinate
70 - the second column represents the indel end coordinate
71 - the third column represents the indel length
72 - the fourth column represents the upstream flanking sequence
73 - the fifth column represents the downstream flanking sequence, as follows::
74
75 16694766 16694768 3 GTGGGTCCTGCCCAGCCTCTGCCTCAGAGGGAAGAGTAGAGAACTGGG AGAGCAGGTCCTTAGGGAGCCCGAGGAAGTCCCTGACGCCAGCTGTTCTCGCGGACGAA
76 25169542 25169545 4 caagcccacaagccttcagaccatagcaCGGGCTCCAGAGGTGTGAGG CAGGTCAGGTGCTTTAGAAGTCAAAAACTCTCAGTAAGGCAAATCACCCCCTATCTCCT
77 41929580 41929585 6 ggctgtcgtatggaatctggggctcaggactctgtcccatttctctaa accattctgcTTCAACCCAGACACTGACTGTTTTCCAAATTTACTTGTTTGTTTGTTTT
78
79
80 -----
81
82 .. class:: warningmark
83
84 **Notes**
85
86 - The lengths of the upstream flanking sequences must be equal for all indels.
87 - The lengths of the downstream flanking sequences must be equal for all indels.
88 - If the length L of the upstream flanking sequence is not an integer multiple of the window size S, in other words if L = m × S + r, where m is the integer quotient and r is the nonzero remainder, then the upstream flanking sequence will be divided into only m windows, starting from the indel, and the remaining r bases will not be considered. The same rule applies to the downstream flanking sequence.
89
90 -----
91
92 The **output** of this program is two files:
93
94 - The first output file is a tabular file and represents the windows of both upstream and downstream flanking sequences. It consists of multiple left columns representing the windows of the upstream flanking sequence, followed by one column representing the indels, then followed by multiple right columns representing the windows of the downstream flanking sequence, as follows::
95
96 cgaggtcagg agatcgagac catcctggct aacatggtga aatcccgtct ctactaaaaa indel aaatttatat ttataaacaa ttttaataca cctatgttta ttatacattt
97 GCCAGTTTAT GGTCTAACAA GGAGAGAAAC AGGGGGCTGA AGGGGTTTCT TAACCTCCAG indel TTCCGGGCTC TGTCCCTAAC CCCCAGCTAG GTAAGTGGCA AAGCACTTCT
98 CAGTGGGACC AAGCACTGAA CCACTTTGGG GAGAATCTCA CACTGGGGCC CTCTGACACC indel tatatatttt tttttttttt tttttttttt tttttttttg agatggtgtc
99 AGAGCAGCAG CACCCACTTT TGCAGTGTGT GACGTTGGTG GAGCCATCGA AGTCTGTGCT indel GAGCCCTCCC CAGTGCTCCG AGGAGCTGCT GTTCCCCCTG GAGCTCAGAA
100
101 - The second output file is a tabular file and represents the motif frequencies in every window of every flanking sequence. The first column on the left represents the names of motifs. The other columns represent the frequencies of motifs in the windows that correspond to the ones in the first output file, as follows::
102
103 dnaPolPauseFrameshift1 2 3 1 0 1 2 indel 0 2 2 1 3
104 dnaPolPauseFrameshift2 2 3 1 0 1 2 indel 0 2 2 1 3
105 xSites1 3 2 0 1 1 2 indel 1 1 3 2 3
106
107 </help>
108
109 </tool>
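
The windowing and counting rules described in the tool's help text can be sketched in a few lines of Python. This is only an illustration of the documented behaviour, not the logic of compute_motifs_frequency.pl, which is not shown here. The function names are hypothetical, and two choices are assumptions the help text does not settle: matching is case-insensitive, and overlapping occurrences of a motif are each counted.

```python
def split_windows(seq, size, upstream):
    """Split one flanking sequence into windows of `size` bases, starting at the indel."""
    if size == 0:
        return [seq]  # '0' means the whole flank is treated as a single window
    m = len(seq) // size  # number of complete windows; the remainder is dropped
    if upstream:
        # The indel lies to the right, so keep the m*size bases nearest the right end.
        kept = seq[len(seq) - m * size:]
    else:
        # The indel lies to the left, so keep the m*size bases nearest the left end.
        kept = seq[:m * size]
    return [kept[i:i + size] for i in range(0, m * size, size)]


def count_occurrences(window, motif):
    """Count occurrences of `motif` in `window` (assumed: case-insensitive, overlaps counted)."""
    window, motif = window.upper(), motif.upper()
    count, start = 0, window.find(motif)
    while start != -1:
        count += 1
        start = window.find(motif, start + 1)
    return count


def motif_frequencies(motifs, flanks, size):
    """Sum per-window motif counts over all indels.

    motifs: list of (name, sequence) pairs, as in the motifs file.
    flanks: list of (upstream, downstream) sequence pairs, one per indel.
    Returns {name: (upstream_counts, downstream_counts)}.
    """
    table = {}
    for name, motif in motifs:
        up_total, down_total = [], []
        for up_seq, down_seq in flanks:
            up = [count_occurrences(w, motif) for w in split_windows(up_seq, size, upstream=True)]
            down = [count_occurrences(w, motif) for w in split_windows(down_seq, size, upstream=False)]
            # Flank lengths are equal across indels, so the window lists line up.
            up_total = up if not up_total else [a + b for a, b in zip(up_total, up)]
            down_total = down if not down_total else [a + b for a, b in zip(down_total, down)]
        table[name] = (up_total, down_total)
    return table


# Made-up example data: one motif and two indels with equal-length flanks.
example_motifs = [("dnaPolPauseFrameshift1", "GAG")]
example_flanks = [("ttGAGAGcc", "GAGGAGa"), ("GAGTTTGAG", "cccgagT")]
print(motif_frequencies(example_motifs, example_flanks, 3))
```

Because each window is scanned separately, a motif that straddles a window boundary is not counted in this sketch; whether the Perl script behaves the same way is not stated in the help text.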