annotate deletion_predictor.xml @ 0:3771f6c914bc draft

Imported from capsule None
author wolma
date Sat, 13 Dec 2014 17:20:21 -0500
parents
children
<tool id="deletion_prediction" name="Deletion Prediction for paired-end data">
  <description>Predicts deletions in one or more aligned read samples based on coverage of the reference genome and on insert sizes</description>
  <requirements>
    <requirement type="package" version="0.1.5">mimodd</requirement>
  </requirements>
  <version_command>mimodd version -q</version_command>
  <command>
    mimodd delcall
    #for $l in $list_input
      ${l.bamfile}
    #end for
    $covfile -o $outputfile
    --max-cov $max_cov --min-size $min_size $include_uncovered $group_by_id --verbose
  </command>

  <inputs>
    <repeat name="list_input" title="Aligned reads input source" default="1" min="1">
      <param name="bamfile" type="data" format="bam" label="input BAM file" />
    </repeat>
    <param name="covfile" type="data" format="bcf" label="BCF variant call file to extract coverage from" help="Use the Variant Calling tool to generate this file."/>
    <param name="group_by_id" type="boolean" label="group reads based on read group id only" truevalue="-i" falsevalue="" checked="true" help="If selected, reads from different read groups are treated as strictly separate. If deselected, read groups with identical sample names are used together for identifying uncovered regions, but are still treated separately for the prediction of deletions." />
    <param name="include_uncovered" type="boolean" label="include low-coverage regions" truevalue="-u" falsevalue="" checked="true" help="If selected, regions that fulfill the coverage criteria below, but are not statistically significant deletions, will be included in the output." />
    <param name="max_cov" type="integer" value="0" label="maximal coverage allowed inside a low-coverage region (default: 0)" help="The maximal coverage allowed at a site for it to be considered part of a low-coverage region." />
    <param name="min_size" type="integer" value="100" label="minimal deletion size (default: 100)" help="A low-coverage region must consist of at least this number of consecutive bases that do not exceed the maximal coverage to be considered in further analyses."/>
  </inputs>

  <outputs>
    <data name="outputfile" format="gff" />
  </outputs>

  <help>
.. class:: infomark

**What it does**

The tool predicts deletions from paired-end data in a two-step process:

1) It finds regions of low coverage, i.e., candidate regions for deletions, by scanning a BCF file produced by the *Variant Calling* tool.

The *maximal coverage allowed inside a low-coverage region* and the *minimal deletion size* parameters are used at this step to define what is considered a low-coverage region.

.. class:: warningmark

The tool treats genome positions missing from the BCF input as zero coverage, so it is safe to use ONLY with BCF files produced by the *Variant Calling* tool or by other commands that retain the information for all sites.

2) It assesses every low-coverage region statistically for evidence of it being a real deletion. **This step requires paired-end data** since it relies on shifts in the distribution of read pair insert sizes around real deletions.

With *include low-coverage regions* deselected, the tool reports only deletions, i.e., the subset of low-coverage regions that pass the statistical test.
With the option selected, as it is by default, regions that failed the test are also reported.

With *group reads based on read group id only* selected, as it is by default, grouping of reads into samples is done strictly based on their read group IDs.
With the option deselected, grouping is done based on sample names in the first step of the analysis, i.e., the reads of all read groups with a shared sample name are used together to identify low-coverage regions.
In the second step, however, reads are regrouped by their read group IDs, i.e., the statistical assessment for real deletions is always done on a per-read-group basis.

**TIP:**
Deselecting *group reads based on read group id only* can be useful, for example, if you have both paired-end and single-end sequencing data for the same sample.

In this case, the two sets of reads will usually share a common sample name, but differ in their read groups.
With grouping based on sample names, the single-end data can be used together with the paired-end data to identify low-coverage regions, thus increasing overall coverage and the reliability of this step.
Still, the assessment of deletions will use only the paired-end data (the tool detects automatically that the single-end reads do not provide insert size information).

  </help>

</tool>
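
The first step described in the help text (finding low-coverage regions) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not MiModD's implementation: the function name `low_coverage_regions`, the dict-based coverage input and the `chrom_length` parameter are hypothetical, and a site is assumed to belong to a low-coverage region when its coverage is at most `max_cov`. Only the handling of missing positions as zero coverage and the roles of the two thresholds are taken from the tool description.

```python
from typing import Dict, List, Tuple


def low_coverage_regions(
    coverage: Dict[int, int],
    chrom_length: int,
    max_cov: int = 0,
    min_size: int = 100,
) -> List[Tuple[int, int]]:
    """Return (start, end) pairs, 1-based and inclusive, of candidate regions.

    Hypothetical helper: `coverage` maps a position to its read depth.
    """
    regions = []
    run_start = None  # first position of the current low-coverage run, if any
    for pos in range(1, chrom_length + 1):
        # Positions absent from the input count as zero coverage.
        cov = coverage.get(pos, 0)
        if cov > max_cov:
            # A well-covered site ends the current run; keep it if long enough.
            if run_start is not None and pos - run_start >= min_size:
                regions.append((run_start, pos - 1))
            run_start = None
        elif run_start is None:
            run_start = pos
    # Close a run that extends to the end of the sequence.
    if run_start is not None and chrom_length + 1 - run_start >= min_size:
        regions.append((run_start, chrom_length))
    return regions


# Toy example: positions 11-15 are missing from the input, so they count as uncovered.
toy_coverage = {pos: 10 for pos in list(range(1, 11)) + list(range(16, 21))}
print(low_coverage_regions(toy_coverage, chrom_length=20, min_size=5))  # → [(11, 15)]
```

In the tool itself, every region found this way would then go through the insert-size test of the second step, which this sketch does not attempt to model.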