<tool id="microsatellite_birthdeath" name="Identify microsatellite births and deaths" version="1.0.0">
  <description>and causal mutational mechanisms from previously identified orthologous microsatellite sets</description>
  <command interpreter="perl">
    microsatellite_birthdeath.pl
    $alignment
    $orthfile
    $outfile
    $species
    "$tree_definition"
    $thresholds
    $separation
    $simthresh
  </command>
  <inputs>
    <page>
      <param format="maf" name="alignment" type="data" label="Select MAF alignments that have NOT been masked for nucleotide quality"/>

      <param format="txt" name="orthfile" type="data" label="Select raw microsatellite data"/>

      <param name="species" type="select" label="Select species" display="checkboxes" multiple="true" help="NOTE: Currently, users are requested to select one of these three combinations: hg18-panTro2-ponAbe2, hg18-panTro2-ponAbe2-rheMac2 or hg18-panTro2-ponAbe2-rheMac2-calJac1">
        <options>
          <filter type="data_meta" ref="alignment" key="species" />
        </options>
      </param>

      <param name="tree_definition" size="200" type="text" value="((((hg18,panTro2),ponAbe2),rheMac2),calJac1)" label="Tree definition of all species above, whether or not selected for microsatellite extraction"
             help="For example: ((((hg18,panTro2),ponAbe2),rheMac2),calJac1)"/>

      <param name="separation" size="10" type="integer" value="40" label="Total length of flanking DNA used for sequence-similarity comparisons among species"
             help="A value of 40 means that 20 bp of upstream and 20 bp of downstream DNA will be used for similarity comparisons."/>

      <param name="thresholds" size="15" type="text" value="9,10,12,12" label="Minimum thresholds for mono-, di-, tri- and tetranucleotide microsatellites"
             help="A value of 9,10,12,12 means that mononucleotide microsatellites with fewer than 9 repeats, dinucleotide microsatellites with fewer than 5 repeats, trinucleotide microsatellites with fewer than 4 repeats and tetranucleotide microsatellites with fewer than 3 repeats will be excluded from the output (that is, tracts shorter than 9, 10, 12 and 12 bp, respectively)."/>

      <param name="simthresh" size="10" type="integer" value="80" label="Percent sequence similarity of flanking regions (of the same length as the separation distance above)"
             help="Enter a value from 0 to 100"/>

    </page>
  </inputs>
  <outputs>
    <data format="txt" name="outfile" metadata_source="orthfile"/>
  </outputs>
  <tests>
    <test>
      <param name="alignment" value="regVariation/microsatellite/Galaxy17_unmasked_short.maf.gz"/>
      <param name="orthfile" value="regVariation/microsatellite/Galaxy17_short_raw.txt"/>
      <param name="thresholds" value="9,10,12,12"/>
      <param name="species" value="hg18,panTro2,ponAbe2,rheMac2,calJac1"/>
      <param name="tree_definition" value="((((hg18, panTro2), ponAbe2), rheMac2), calJac1)"/>
      <param name="separation" value="10"/>
      <param name="simthresh" value="85"/>
      <output name="outfile" file="regVariation/microsatellite/Galaxy17_unmasked_results.txt"/>
    </test>
  </tests>

  <help>

.. class:: infomark

**What it does**

This tool uses raw orthologous microsatellite clusters (identified by the tool "Extract orthologous microsatellites") to identify microsatellite births and deaths along individual lineages of a phylogenetic tree.

-----

.. class:: warningmark

**Note**

The tool generates a tab-separated output table, whose contents depend on the species being considered. Each row contains all information for one microsatellite locus across multiple species.
The table typically reads like this:

hg18.chr22 16153057 16153074 A 1 ins=,imot:0:tt;dels= ,9:t>c -panTro2 hg18:tttttttttttttttttt,ponAbe2:--tttttttttttttttt,panTro2:-----ttttctttttttt

hg18.chr22 16131711 16131722 ATGC 4 NA ,2:C>T +ponAbe2 hg18:CACGCATGCATG,ponAbe2:CATGCATGCATG,panTro2:CACGCATGCATG,rheMac2:CACGCGTGCATG

The columns list the following:

1: Chromosome/scaffold/contig of one of the species. The species chosen is the first species readable in the Newick tree submitted by the user.

2: Start coordinate

3: End coordinate

4: Motif of the microsatellite

5: Motif size

6: Insertion and deletion details. Insertions are separated from deletions by a ";", and individual insertions and deletions are separated from one another by commas. For illustration, consider the first row listed above:
in the entry "imot:0:tt", the tag (imot or imotf) marks an insertion, the number gives the position of the insertion within the microsatellite's alignment, and the final part gives the identity of the inserted nucleotides.

7: Substitution details. Individual substitutions are separated by commas. Each entry contains the position of the substitution event in the microsatellite's alignment and the nature of the substitution.

8: Inference of birth/death event. Births are indicated by "+", and deaths by "-". Events such as "-hg18:panTro2" suggest a death in the common ancestor of hg18 and panTro2, whereas events such as "-hg18.panTro2" indicate parallel, independent death events along the two lineages. Alternative interpretations of the event may also be listed, following a "/", such as:
"+hg18.+panTro2 / +hg18:panTro2"

9: Actual sequences in the alignment, separated by commas.
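
The column layout above can be read programmatically. The following is a minimal Python sketch, assuming the rows are tab-separated exactly as described; the column labels and the parse_row helper are hypothetical names introduced for illustration and are not part of the tool.

```python
from typing import Dict

# Labels chosen for this sketch only; the tool's output has no header row.
COLUMNS = ["locus", "start", "end", "motif", "motif_size",
           "indels", "substitutions", "event", "sequences"]


def parse_row(line: str) -> Dict[str, object]:
    """Split one tab-separated output row into the nine documented columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(COLUMNS):
        raise ValueError("expected 9 tab-separated columns, got %d" % len(fields))
    row: Dict[str, object] = dict(zip(COLUMNS, fields))
    row["start"] = int(fields[1])
    row["end"] = int(fields[2])
    row["motif_size"] = int(fields[4])
    # Column 7: comma-separated "position:change" entries; drop empty items.
    row["substitutions"] = [s for s in fields[6].split(",") if s]
    # Column 8: alternative interpretations are separated by "/".
    row["event"] = [e.strip() for e in fields[7].split("/")]
    # Column 9: comma-separated "species:sequence" pairs.
    row["sequences"] = dict(p.split(":", 1) for p in fields[8].split(","))
    return row


# Second example row from the table above, rebuilt with explicit tabs.
example = "\t".join([
    "hg18.chr22", "16131711", "16131722", "ATGC", "4", "NA", ",2:C>T",
    "+ponAbe2",
    "hg18:CACGCATGCATG,ponAbe2:CATGCATGCATG,panTro2:CACGCATGCATG,rheMac2:CACGCGTGCATG",
])
row = parse_row(example)
print(row["event"], row["substitutions"], sorted(row["sequences"]))
```

Column 6 is left as a raw string in this sketch, because only the "imot" entry format is described above.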

  </help>

</tool>