comparison rgPicardHsMetrics.xml @ 4:f4d018471628 draft default tip
Uploaded
author | jpruab |
---|---|
date | Tue, 13 Aug 2013 12:09:14 -0400 |
parents | |
children |
3:08b477977410 | 4:f4d018471628 |
---|---|
1 <tool name="SAM/BAM Hybrid Selection Metrics" id="PicardHsMetrics" version="1.56.0"> | |
2 <description>for targeted resequencing data</description> | |
3 <command interpreter="python"> | |
4 | |
5 picard_wrapper.py -i "${input_file}" -d "${html_file.files_path}" -t "${html_file}" --datatype "${input_file.ext}" | |
6 --baitbed "${bait_bed}" --targetbed "${target_bed}" -n "${out_prefix}" --tmpdir "${__new_file_path__}" | |
7 -j "\$JAVA_JAR_PATH/CalculateHsMetrics.jar" | |
8 | |
9 </command> | |
10 <requirements><requirement type="package" version="1.56.0">picard</requirement></requirements> | |
11 <inputs> | |
12 <param format="sam,bam" name="input_file" type="data" label="SAM/BAM dataset to generate statistics for" /> | |
13 <param name="out_prefix" value="Picard HS Metrics" type="text" label="Title for the output file" help="Use to remind you what the job was for." size="80" /> | |
14 <param name="bait_bed" type="data" format="bed,interval" label="Bait intervals: Sequences for bait in the design" help="Note specific format requirements below!" size="80" /> | |
15 <param name="target_bed" type="data" format="bed,interval" label="Target intervals: Sequences for targets in the design" help="Note specific format requirements below!" size="80" /> | |
16 <!-- | |
17 | |
18 To let users set the Java heap size, uncomment this option and add '-x "$maxheap"' to the <command> tag. | |
19 If it stays commented out, the heap size defaults to the value specified within picard_wrapper.py. | |
20 | |
21 <param name="maxheap" type="select" | |
22 help="If in doubt, try the default. If it fails with a complaint about java heap size, try increasing it please - larger jobs will require your own hardware." | |
23 label="Java heap size"> | |
24 <option value="4G" selected = "true">4GB default </option> | |
25 <option value="8G" >8GB use if 4GB fails</option> | |
26 <option value="16G">16GB - try this if 8GB fails</option> | |
27 </param> | |
28 | |
29 --> | |
30 </inputs> | |
31 <outputs> | |
32 <data format="html" name="html_file" label="${out_prefix}.html" /> | |
33 </outputs> | |
34 <tests> | |
35 <test> | |
36 <!-- Uncomment this if maxheap parameter is enabled | |
37 <param name="maxheap" value="8G" /> | |
38 --> | |
39 <param name="out_prefix" value="HSMetrics" /> | |
40 <param name="input_file" value="picard_input_summary_alignment_stats.sam" ftype="sam" /> | |
41 <param name="bait_bed" value="picard_input_bait.bed" /> | |
42 <param name="target_bed" value="picard_input_bait.bed" /> | |
43 <output name="html_file" file="picard_output_hs_transposed_summary_alignment_stats.html" ftype="html" lines_diff="212"/> | |
44 </test> | |
45 </tests> | |
46 <help> | |
47 | |
48 .. class:: infomark | |
49 | |
50 **Summary** | |
51 | |
52 Calculates a set of Hybrid Selection specific metrics from an aligned SAM or BAM file. | |
53 | |
54 .. class:: warningmark | |
55 | |
56 **WARNING about bait and target files** | |
57 | |
58 Picard is very fussy about the bait and target file format. If these are not exactly right, it will fail with an error something like: | |
59 | |
60 Exception in thread "main" net.sf.picard.PicardException: Invalid interval record contains 6 fields: chr1 45787123 45787316 CASO_22G_25063 1000 + | |
61 | |
62 If you see an error like that from this tool, please do NOT report it to any of the Galaxy mailing lists, as it is not a bug! | |
63 It means you must reformat your bait and target files. Unfortunately, Galaxy cannot do that for you automatically. | |
64 | |
65 The required definition is described in the documentation at http://www.broadinstitute.org/gsa/wiki/index.php/Built-in_command-line_arguments | |
66 and the sample provided looks like this: | |
67 | |
68 chr1 1104841 1104940 + target_1 | |
69 chr1 1105283 1105599 + target_2 | |
70 chr1 1105712 1105860 + target_3 | |
71 chr1 1105960 1106119 + target_4 | |
72 | |
73 So your bait and target files MUST have 5 tab-delimited columns: chr, start, end, strand and name, in exactly that order. | |
74 Note that the Picard-mandated SAM header described in the documentation linked above is automagically added by the tool in Galaxy. | |
75 | |
76 .. class:: infomark | |
77 | |
78 **Picard documentation** | |
79 | |
80 This is a Galaxy wrapper for CalculateHsMetrics.jar, a part of the external package Picard-tools_. | |
81 | |
82 .. _Picard-tools: http://www.google.com/search?q=picard+samtools | |
83 | |
84 ----- | |
85 | |
86 .. class:: infomark | |
87 | |
88 **Inputs, outputs, and parameters** | |
89 | |
90 Picard documentation says (reformatted for Galaxy): | |
91 | |
92 Calculates a set of Hybrid Selection specific metrics from an aligned SAM or BAM file. | |
93 | |
94 .. csv-table:: | |
95 :header-rows: 1 | |
96 | |
97 "Option", "Description" | |
98 "BAIT_INTERVALS=File","An interval list file that contains the locations of the baits used. Required." | |
99 "TARGET_INTERVALS=File","An interval list file that contains the locations of the targets. Required." | |
100 "INPUT=File","An aligned SAM or BAM file. Required." | |
101 "OUTPUT=File","The output file to write the metrics to. Required. Cannot be used in conjunction with option(s) METRICS_FILE (M)" | |
102 "METRICS_FILE=File","Legacy synonym for OUTPUT, should not be used. Required. Cannot be used in conjunction with option(s) OUTPUT (O)" | |
103 "CREATE_MD5_FILE=Boolean","Whether to create an MD5 digest for any BAM files created. Default value: false" | |
104 | |
105 HsMetrics | |
106 | |
107 The set of metrics captured that are specific to a hybrid selection analysis. | |
108 | |
109 Output Column Definitions:: | |
110 | |
111 1. BAIT_SET: The name of the bait set used in the hybrid selection. | |
112 2. GENOME_SIZE: The number of bases in the reference genome used for alignment. | |
113 3. BAIT_TERRITORY: The number of bases which have one or more baits on top of them. | |
114 4. TARGET_TERRITORY: The unique number of target bases in the experiment where target is usually exons etc. | |
115 5. BAIT_DESIGN_EFFICIENCY: Target territory / bait territory. 1 = perfectly efficient, 0.5 = half of baited bases are not target. | |
116 6. TOTAL_READS: The total number of reads in the SAM or BAM file examined. | |
117 7. PF_READS: The number of reads that pass the vendor's filter. | |
118 8. PF_UNIQUE_READS: The number of PF reads that are not marked as duplicates. | |
119 9. PCT_PF_READS: PF reads / total reads. The percent of reads passing filter. | |
120 10. PCT_PF_UQ_READS: PF Unique Reads / Total Reads. | |
121 11. PF_UQ_READS_ALIGNED: The number of PF unique reads that are aligned with mapping score > 0 to the reference genome. | |
122 12. PCT_PF_UQ_READS_ALIGNED: PF Reads Aligned / PF Reads. | |
123 13. PF_UQ_BASES_ALIGNED: The number of bases in the PF aligned reads that are mapped to a reference base. Accounts for clipping and gaps. | |
124 14. ON_BAIT_BASES: The number of PF aligned bases that mapped to a baited region of the genome. | |
125 15. NEAR_BAIT_BASES: The number of PF aligned bases that mapped to within a fixed interval of a baited region, but not on a baited region. | |
126 16. OFF_BAIT_BASES: The number of PF aligned bases that mapped neither on nor near a bait. | |
127 17. ON_TARGET_BASES: The number of PF aligned bases that mapped to a targeted region of the genome. | |
128 18. PCT_SELECTED_BASES: On+Near Bait Bases / PF Bases Aligned. | |
129 19. PCT_OFF_BAIT: The percentage of aligned PF bases that mapped neither on nor near a bait. | |
130 20. ON_BAIT_VS_SELECTED: The percentage of on+near bait bases that are on as opposed to near. | |
131 21. MEAN_BAIT_COVERAGE: The mean coverage of all baits in the experiment. | |
132 22. MEAN_TARGET_COVERAGE: The mean coverage of targets that received at least coverage depth = 2 at one base. | |
133 23. PCT_USABLE_BASES_ON_BAIT: The number of aligned, de-duped, on-bait bases out of the PF bases available. | |
134 24. PCT_USABLE_BASES_ON_TARGET: The number of aligned, de-duped, on-target bases out of the PF bases available. | |
135 25. FOLD_ENRICHMENT: The fold by which the baited region has been amplified above genomic background. | |
136 26. ZERO_CVG_TARGETS_PCT: The number of targets that did not reach coverage=2 over any base. | |
137 27. FOLD_80_BASE_PENALTY: The fold over-coverage necessary to raise 80% of bases in "non-zero-cvg" targets to the mean coverage level in those targets. | |
138 28. PCT_TARGET_BASES_2X: The percentage of ALL target bases achieving 2X or greater coverage. | |
139 29. PCT_TARGET_BASES_10X: The percentage of ALL target bases achieving 10X or greater coverage. | |
140 30. PCT_TARGET_BASES_20X: The percentage of ALL target bases achieving 20X or greater coverage. | |
141 31. PCT_TARGET_BASES_30X: The percentage of ALL target bases achieving 30X or greater coverage. | |
142 32. HS_LIBRARY_SIZE: The estimated number of unique molecules in the selected part of the library. | |
143 33. HS_PENALTY_10X: The "hybrid selection penalty" incurred to get 80% of target bases to 10X. This metric should be interpreted as: if I have a design with 10 megabases of target, and want to get 10X coverage I need to sequence until PF_ALIGNED_BASES = 10^7 * 10 * HS_PENALTY_10X. | |
144 34. HS_PENALTY_20X: The "hybrid selection penalty" incurred to get 80% of target bases to 20X. This metric should be interpreted as: if I have a design with 10 megabases of target, and want to get 20X coverage I need to sequence until PF_ALIGNED_BASES = 10^7 * 20 * HS_PENALTY_20X. | |
145 35. HS_PENALTY_30X: The "hybrid selection penalty" incurred to get 80% of target bases to 30X. This metric should be interpreted as: if I have a design with 10 megabases of target, and want to get 30X coverage I need to sequence until PF_ALIGNED_BASES = 10^7 * 30 * HS_PENALTY_30X. | |
146 | |
147 .. class:: warningmark | |
148 | |
149 **Warning on SAM/BAM quality** | |
150 | |
151 Many SAM/BAM files produced externally and uploaded to Galaxy do not fully conform to SAM/BAM specifications. Galaxy deals with this by using the **LENIENT** | |
152 flag when it runs Picard, which allows reads to be discarded if they're empty or don't map. This appears to be the only way to deal with SAM/BAM that cannot be parsed. | |
153 | |
154 | |
155 </help> | |
156 </tool> |
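
The help text above says bait and target files must have exactly five tab-delimited columns (chr, start, end, strand, name), and shows Picard rejecting a six-field record. The following is a minimal sketch of one way to reorder such records before uploading them. It is not part of this tool or of picard_wrapper.py. It assumes the six fields follow the common BED order (chr, start, end, name, score, strand), as in the error example. The function names are hypothetical. Coordinates are passed through unchanged, so any offset adjustment your interval source may need is outside this sketch.

```python
def bed6_to_interval_line(line):
    """Reorder one 6-column BED record into the 5-column order from the help text.

    Assumed input order:  chr, start, end, name, score, strand
    Output order:         chr, start, end, strand, name
    """
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 6:
        raise ValueError("expected 6 tab-delimited fields, got %d" % len(fields))
    chrom, start, end, name, _score, strand = fields
    # The score column is dropped; coordinates are copied as-is (assumption).
    return "\t".join([chrom, start, end, strand, name])


def convert_bed6(lines):
    """Convert an iterable of BED lines, skipping blank, comment, track and browser lines."""
    converted = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith(("#", "track", "browser")):
            continue
        converted.append(bed6_to_interval_line(line))
    return converted
```

For example, passing an open file handle for a six-column BED file to `convert_bed6` and writing the returned lines, one per row, gives a file in the column order the tool expects.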