annotate tools/picard/rgPicardHsMetrics.xml @ 0:9071e359b9a3

Uploaded
author xuebing
date Fri, 09 Mar 2012 19:37:19 -0500
<tool name="SAM/BAM Hybrid Selection Metrics" id="PicardHsMetrics" version="0.01">
  <description>for targeted resequencing data</description>
  <command interpreter="python">

    picard_wrapper.py -i "$input_file" -d "$html_file.files_path" -t "$html_file" --datatype "$input_file.ext"
    --baitbed "$bait_bed" --targetbed "$target_bed" -n "$out_prefix" --tmpdir "${__new_file_path__}"
    -j "${GALAXY_DATA_INDEX_DIR}/shared/jars/CalculateHsMetrics.jar"

  </command>
  <requirements><requirement type="package">picard</requirement></requirements>
  <inputs>
    <param format="sam,bam" name="input_file" type="data" label="SAM/BAM dataset to generate statistics for" />
    <param name="out_prefix" value="Picard HS Metrics" type="text" label="Title for the output file" help="Use to remind you what the job was for." size="80" />
    <param name="bait_bed" type="data" format="interval" label="Bait intervals: Sequences for bait in the design" help="In UCSC BED format" size="80" />
    <param name="target_bed" type="data" format="interval" label="Target intervals: Sequences for targets in the design" help="In UCSC BED format" size="80" />
    <!--

    To let users set the Java heap size, uncomment this option and add '-x "$maxheap"' to the <command> tag.
    If it is left commented out, the heap size defaults to the value specified in picard_wrapper.py.

    <param name="maxheap" type="select"
       help="If in doubt, try the default. If the job fails with a Java heap size error, try increasing it; larger jobs will require your own hardware."
       label="Java heap size">
      <option value="4G" selected="true">4GB (default)</option>
      <option value="8G">8GB (use if 4GB fails)</option>
      <option value="16G">16GB (use if 8GB fails)</option>
    </param>

    -->
  </inputs>
  <outputs>
    <data format="html" name="html_file" label="${out_prefix}.html" />
  </outputs>
  <tests>
    <test>
      <param name="out_prefix" value="HSMetrics" />
      <param name="input_file" value="picard_input_summary_alignment_stats.sam" ftype="sam" />
      <param name="bait_bed" value="picard_input_bait.bed" />
      <param name="target_bed" value="picard_input_bait.bed" />
      <param name="maxheap" value="8G" />
      <output name="html_file" file="picard_output_hs_transposed_summary_alignment_stats.html" ftype="html" lines_diff="212"/>
    </test>
  </tests>
  <help>

.. class:: infomark

**Summary**

Calculates a set of Hybrid Selection specific metrics from an aligned SAM or BAM file.

**Picard documentation**

This is a Galaxy wrapper for CalculateHsMetrics, a part of the external package Picard-tools_.

.. _Picard-tools: http://www.google.com/search?q=picard+samtools

-----

.. class:: infomark

**Inputs, outputs, and parameters**

Picard documentation says (reformatted for Galaxy):

Calculates a set of Hybrid Selection specific metrics from an aligned SAM or BAM file.

.. csv-table::
   :header-rows: 1

   "Option", "Description"
   "BAIT_INTERVALS=File","An interval list file that contains the locations of the baits used. Required."
   "TARGET_INTERVALS=File","An interval list file that contains the locations of the targets. Required."
   "INPUT=File","An aligned SAM or BAM file. Required."
   "OUTPUT=File","The output file to write the metrics to. Required. Cannot be used in conjunction with option(s) METRICS_FILE (M)"
   "METRICS_FILE=File","Legacy synonym for OUTPUT, should not be used. Required. Cannot be used in conjunction with option(s) OUTPUT (O)"
   "CREATE_MD5_FILE=Boolean","Whether to create an MD5 digest for any BAM files created. Default value: false"
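
For illustration only, here is a minimal Python sketch of how the options in the table above might be assembled into a CalculateHsMetrics command line. The helper name build_hsmetrics_command and the file names are hypothetical. picard_wrapper.py builds its own command, which may differ. In particular, it is assumed here that the wrapper's heap-size setting maps onto Java's standard -Xmx flag::

    import shlex


    def build_hsmetrics_command(jar, bait_intervals, target_intervals, bam, metrics_out, heap="4G"):
        """Assemble a CalculateHsMetrics call from the documented Picard options.

        Hypothetical helper for illustration; picard_wrapper.py may build the call differently.
        """
        return [
            "java",
            "-Xmx" + heap,  # standard JVM max-heap flag; mapping from the wrapper's -x option is assumed
            "-jar", jar,
            "BAIT_INTERVALS=" + bait_intervals,      # required
            "TARGET_INTERVALS=" + target_intervals,  # required
            "INPUT=" + bam,                          # aligned SAM or BAM, required
            "OUTPUT=" + metrics_out,                 # OUTPUT rather than the legacy METRICS_FILE
        ]


    # Hypothetical file names, for illustration only.
    cmd = build_hsmetrics_command("CalculateHsMetrics.jar", "bait.interval_list",
                                  "target.interval_list", "input.bam", "hs_metrics.txt")
    print(" ".join(shlex.quote(part) for part in cmd))

The arguments are kept as a list rather than one shell string, which avoids quoting problems with file paths. shlex.quote is used only to print a copy-pasteable version.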

**HsMetrics**

The set of metrics captured that are specific to a hybrid selection analysis.

Output Column Definitions::

  1. BAIT_SET: The name of the bait set used in the hybrid selection.
  2. GENOME_SIZE: The number of bases in the reference genome used for alignment.
  3. BAIT_TERRITORY: The number of bases which have one or more baits on top of them.
  4. TARGET_TERRITORY: The number of unique target bases in the experiment, where the targets are usually exons, etc.
  5. BAIT_DESIGN_EFFICIENCY: Target territory / bait territory. 1 = perfectly efficient; 0.5 = half of the baited bases are not target.
  6. TOTAL_READS: The total number of reads in the SAM or BAM file examined.
  7. PF_READS: The number of reads that pass the vendor's filter.
  8. PF_UNIQUE_READS: The number of PF reads that are not marked as duplicates.
  9. PCT_PF_READS: PF reads / total reads. The percentage of reads passing filter.
  10. PCT_PF_UQ_READS: PF unique reads / total reads.
  11. PF_UQ_READS_ALIGNED: The number of PF unique reads that are aligned with mapping score > 0 to the reference genome.
  12. PCT_PF_UQ_READS_ALIGNED: PF unique reads aligned / PF unique reads.
  13. PF_UQ_BASES_ALIGNED: The number of bases in the PF aligned reads that are mapped to a reference base. Accounts for clipping and gaps.
  14. ON_BAIT_BASES: The number of PF aligned bases that mapped to a baited region of the genome.
  15. NEAR_BAIT_BASES: The number of PF aligned bases that mapped to within a fixed interval of a baited region, but not on a baited region.
  16. OFF_BAIT_BASES: The number of PF aligned bases that mapped neither on nor near a bait.
  17. ON_TARGET_BASES: The number of PF aligned bases that mapped to a targeted region of the genome.
  18. PCT_SELECTED_BASES: On+near bait bases / PF bases aligned.
  19. PCT_OFF_BAIT: The percentage of aligned PF bases that mapped neither on nor near a bait.
  20. ON_BAIT_VS_SELECTED: The percentage of on+near bait bases that are on as opposed to near.
  21. MEAN_BAIT_COVERAGE: The mean coverage of all baits in the experiment.
  22. MEAN_TARGET_COVERAGE: The mean coverage of targets that received a coverage depth of at least 2 at one or more bases.
  23. PCT_USABLE_BASES_ON_BAIT: The percentage of available PF bases that are aligned, de-duplicated, and on bait.
  24. PCT_USABLE_BASES_ON_TARGET: The percentage of available PF bases that are aligned, de-duplicated, and on target.
  25. FOLD_ENRICHMENT: The fold by which the baited region has been amplified above genomic background.
  26. ZERO_CVG_TARGETS_PCT: The percentage of targets that did not reach coverage=2 over any base.
  27. FOLD_80_BASE_PENALTY: The fold over-coverage necessary to raise 80% of bases in "non-zero-cvg" targets to the mean coverage level in those targets.
  28. PCT_TARGET_BASES_2X: The percentage of ALL target bases achieving 2X or greater coverage.
  29. PCT_TARGET_BASES_10X: The percentage of ALL target bases achieving 10X or greater coverage.
  30. PCT_TARGET_BASES_20X: The percentage of ALL target bases achieving 20X or greater coverage.
  31. PCT_TARGET_BASES_30X: The percentage of ALL target bases achieving 30X or greater coverage.
  32. HS_LIBRARY_SIZE: The estimated number of unique molecules in the selected part of the library.
  33. HS_PENALTY_10X: The "hybrid selection penalty" incurred to get 80% of target bases to 10X. This metric should be interpreted as: if I have a design with 10 megabases of target and want to get 10X coverage, I need to sequence until PF_ALIGNED_BASES = 10^7 * 10 * HS_PENALTY_10X.
  34. HS_PENALTY_20X: The "hybrid selection penalty" incurred to get 80% of target bases to 20X. This metric should be interpreted as: if I have a design with 10 megabases of target and want to get 20X coverage, I need to sequence until PF_ALIGNED_BASES = 10^7 * 20 * HS_PENALTY_20X.
  35. HS_PENALTY_30X: The "hybrid selection penalty" incurred to get 80% of target bases to 30X. This metric should be interpreted as: if I have a design with 10 megabases of target and want to get 30X coverage, I need to sequence until PF_ALIGNED_BASES = 10^7 * 30 * HS_PENALTY_30X.
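
Here is a minimal Python sketch of the arithmetic behind a few of the columns above, using only the definitions given there. All function names and example values are hypothetical. The sketch makes three assumptions:

- ON_BAIT_BASES, NEAR_BAIT_BASES and OFF_BAIT_BASES together partition the PF aligned bases.
- FOLD_80_BASE_PENALTY is approximated as the mean depth divided by the 20th-percentile depth. Raising that base to the mean takes exactly this fold, and at least 80% of bases already sit at or above it.
- The depths are pooled over targets that reach depth 2 somewhere.

Picard's own implementation may differ in these details::

    import math


    def ratio(numerator, denominator):
        """numerator / denominator, or 0.0 if the denominator is zero."""
        return numerator / denominator if denominator else 0.0


    def selection_metrics(bait_territory, target_territory, on_bait, near_bait, off_bait):
        """Recompute four HsMetrics ratios from territory sizes and base counts."""
        aligned = on_bait + near_bait + off_bait   # assumed partition of PF aligned bases
        selected = on_bait + near_bait             # "selected" means on or near a bait
        return {
            "BAIT_DESIGN_EFFICIENCY": ratio(target_territory, bait_territory),
            "PCT_SELECTED_BASES": ratio(selected, aligned),
            "PCT_OFF_BAIT": ratio(off_bait, aligned),
            "ON_BAIT_VS_SELECTED": ratio(on_bait, selected),
        }


    def fold_80_base_penalty(depths_per_target, min_depth=2):
        """Mean depth / 20th-percentile depth over the bases of covered targets.

        A target counts as covered if some base reaches min_depth (2, as in
        ZERO_CVG_TARGETS_PCT). Sketch only; Picard's details may differ.
        """
        bases = sorted(d for target in depths_per_target
                       if target and max(target) >= min_depth
                       for d in target)
        if not bases:
            return 0.0  # no covered targets; returning 0.0 is a choice made for this sketch
        mean = sum(bases) / len(bases)
        p20 = bases[int(0.2 * len(bases))]  # at least 80% of bases are at or above this depth
        return mean / p20 if p20 else math.inf  # zero depth cannot be raised by any fold


    def bases_needed(target_bases, coverage, hs_penalty):
        """PF aligned bases to sequence, following the HS_PENALTY_nX interpretation."""
        return target_bases * coverage * hs_penalty


    # Hypothetical inputs, for illustration only.
    print(selection_metrics(200, 100, on_bait=600, near_bait=200, off_bait=200))
    # prints {'BAIT_DESIGN_EFFICIENCY': 0.5, 'PCT_SELECTED_BASES': 0.8, 'PCT_OFF_BAIT': 0.2, 'ON_BAIT_VS_SELECTED': 0.75}
    print(fold_80_base_penalty([[10, 10, 10, 10, 10], [2, 4, 6, 8, 10], [0, 0, 1, 1, 0]]))
    # prints 1.3333333333333333
    print(bases_needed(10_000_000, 10, 3.0))
    # prints 300000000.0

In the FOLD_80 example, the third target never reaches depth 2 and is excluded. The pooled bases then have a mean depth of 8 and a 20th-percentile depth of 6, so those bases need about 1.33-fold more sequencing to reach the mean.

With a hypothetical HS_PENALTY_10X of 3.0 and 10 megabases (10^7 bases) of target, reaching 10X would call for about 3 * 10^8 PF aligned bases.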

.. class:: warningmark

**Warning on SAM/BAM quality**

Many SAM/BAM files produced externally and uploaded to Galaxy do not fully conform to the SAM/BAM specification. Galaxy deals with this by running Picard with the **LENIENT**
validation stringency flag, which allows reads to be discarded if they are empty or unmapped. This appears
to be the only way to handle SAM/BAM files that cannot otherwise be parsed.
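
Here is a minimal Python sketch of adding this setting to a Picard argument list. It assumes the setting is passed through Picard's standard VALIDATION_STRINGENCY option, which accepts STRICT (the default), LENIENT and SILENT. How picard_wrapper.py actually passes it is not shown here, and the helper name with_stringency is hypothetical::

    # Values accepted by Picard's VALIDATION_STRINGENCY option; STRICT is Picard's default.
    VALID_STRINGENCIES = ("STRICT", "LENIENT", "SILENT")


    def with_stringency(picard_args, stringency="LENIENT"):
        """Return a copy of picard_args with VALIDATION_STRINGENCY set to stringency.

        Any VALIDATION_STRINGENCY argument already present is replaced, so the
        setting appears exactly once. Hypothetical helper, for illustration only.
        """
        if stringency not in VALID_STRINGENCIES:
            raise ValueError("unknown VALIDATION_STRINGENCY: " + stringency)
        kept = [arg for arg in picard_args if not arg.startswith("VALIDATION_STRINGENCY=")]
        return kept + ["VALIDATION_STRINGENCY=" + stringency]


    print(with_stringency(["INPUT=input.bam", "VALIDATION_STRINGENCY=STRICT"]))
    # prints ['INPUT=input.bam', 'VALIDATION_STRINGENCY=LENIENT']

With LENIENT, Picard reports validation problems as warnings instead of stopping. SILENT suppresses them entirely.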


  </help>
</tool>