comparison bwa_macros.xml @ 2:e29bc5c169bc draft
Uploaded
author | devteam |
---|---|
date | Fri, 20 Mar 2015 12:09:08 -0400 |
parents | ff1ae217ccc2 |
children | ac30bfd3e2a8 |
comparison
1:c71dd035971e | 2:e29bc5c169bc |
---|---|
1 <macros> | 1 <macros> |
2 | |
3 <token name="@set_rg_string@"> | |
4 #set $rg_string = "@RG\tID:" + str($rg.ID) + "\tSM:" + str($rg.SM) + "\tPL:" + str($rg.PL) | |
5 #if $rg.LB | |
6 #set $rg_string += "\tLB:$rg.LB" | |
7 #end if | |
8 #if $rg.CN | |
9 #set $rg_string += "\tCN:$rg.CN" | |
10 #end if | |
11 #if $rg.DS | |
12 #set $rg_string += "\tDS:$rg.DS" | |
13 #end if | |
14 #if $rg.DT | |
15 #set $rg_string += "\tDT:$rg.DT" | |
16 #end if | |
17 #if $rg.FO | |
18 #set $rg_string += "\tFO:$rg.FO" | |
19 #end if | |
20 #if $rg.KS | |
21 #set $rg_string += "\tKS:$rg.KS" | |
22 #end if | |
23 #if $rg.PG | |
24 #set $rg_string += "\tPG:$rg.PG" | |
25 #end if | |
26 #if str($rg.PI) | |
27 #set $rg_string += "\tPI:$rg.PI" | |
28 #end if | |
29 #if $rg.PU | |
30 #set $rg_string += "\tPU:$rg.PU" | |
31 #end if | |
32 </token> | |
2 | 33 |
3 <token name="@RG@"> | 34 <token name="@RG@"> |
4 ----- | 35 ----- |
5 | 36 |
6 .. class:: warningmark | 37 .. class:: warningmark |
7 | 38 |
8 **Read Groups are Important!** | 39 **Read Groups are Important!** |
9 | 40 |
10 One of the recommended best practices in NGS analysis is adding read group information to BAM files. You can do this directly in the BWA interface using the | 41 One of the recommended best practices in NGS analysis is adding read group information to BAM files. You can do this directly in the BWA interface using the |
11 **Specify readgroup information?** widget. If you are not familiar with readgroups you should know that this is effectively a way to tag reads with an additional ID. | 42 **Specify read group information?** widget. If you are not familiar with read groups you should know that this is effectively a way to tag reads with an additional ID. |
12 This allows you to combine BAM files from, for example, multiple BWA runs into a single dataset. This significantly simplifies downstream processing as | 43 This allows you to combine BAM files from, for example, multiple BWA runs into a single dataset. This significantly simplifies downstream processing as |
13 instead of dealing with multiple datasets you have to handle only one. This is possible because the readgroup information allows you to identify | 44 instead of dealing with multiple datasets you have to handle only one. This is possible because the read group information allows you to identify |
14 data from different experiments even if they are combined in one file. Many downstream analysis tools such as variant callers (e.g., FreeBayes or Naive Variant Caller | 45 data from different experiments even if they are combined in one file. Many downstream analysis tools such as variant callers (e.g., FreeBayes or Naive Variant Caller |
15 present in Galaxy) are aware of readgroups and will automatically generate calls for each individual sample even if they are combined within a single file. | 46 present in Galaxy) are aware of readgroups and will automatically generate calls for each individual sample even if they are combined within a single file. |
16 | 47 |
17 **Description of read groups fields** | 48 **Description of read groups fields** |
18 | 49 |
49 @RG ID:FLOWCELL2.LANE2 PL:illumina LB:LIB-KID-1 SM:KID PI:200 | 80 @RG ID:FLOWCELL2.LANE2 PL:illumina LB:LIB-KID-1 SM:KID PI:200 |
50 @RG ID:FLOWCELL2.LANE3 PL:illumina LB:LIB-KID-2 SM:KID PI:400 | 81 @RG ID:FLOWCELL2.LANE3 PL:illumina LB:LIB-KID-2 SM:KID PI:400 |
51 @RG ID:FLOWCELL2.LANE4 PL:illumina LB:LIB-KID-2 SM:KID PI:400 | 82 @RG ID:FLOWCELL2.LANE4 PL:illumina LB:LIB-KID-2 SM:KID PI:400 |
52 | 83 |
53 Note the hierarchical relationship between read groups (unique for each lane) to libraries (sequenced on two lanes) and samples (across four lanes, two lanes for each library). | 84 Note the hierarchical relationship between read groups (unique for each lane) to libraries (sequenced on two lanes) and samples (across four lanes, two lanes for each library). |
54 </token> | 85 </token> |
55 <token name="@info@"> | 86 <token name="@info@"> |
56 ----- | 87 ----- |
57 | 88 |
58 .. class:: infomark | 89 .. class:: infomark |
59 | 90 |
60 **More info** | 91 **More info** |
64 1. https://biostar.usegalaxy.org/ | 95 1. https://biostar.usegalaxy.org/ |
65 2. https://www.biostars.org/ | 96 2. https://www.biostars.org/ |
66 3. https://github.com/lh3/bwa | 97 3. https://github.com/lh3/bwa |
67 4. http://bio-bwa.sourceforge.net/ | 98 4. http://bio-bwa.sourceforge.net/ |
68 | 99 |
69 </token> | 100 </token> |
70 | 101 |
71 <token name="@dataset_collections@"> | 102 <token name="@dataset_collections@"> |
72 ------ | 103 ------ |
73 | 104 |
74 **Dataset collections - processing large numbers of datasets at once** | 105 **Dataset collections - processing large numbers of datasets at once** |
75 | 106 |
76 This will be added shortly | 107 This will be added shortly |
77 | 108 |
78 | 109 |
79 </token> | 110 </token> |
80 | 111 <xml name="readgroup_params"> |
112 <conditional name="rg"> | |
113 <param name="rg_selector" type="select" label="Set read groups information?" help="-R; Specifying read group information can greatly simplify your downstream analyses by allowing combining multiple datasets. See help below for more details"> | |
114 <option value="set">Set</option> | |
115 <option value="do_not_set" selected="True">Do not set</option> | |
116 </param> | |
117 <when value="set"> | |
118 <param name="ID" type="text" value="" size="20" label="Read group identifier (ID)" help="This value must be unique among multiple samples in your experiment"> | |
119 <validator type="empty_field" /> | |
120 </param> | |
121 <param name="SM" type="text" value="" size="20" label="Read group sample name (SM)" help="This value should be descriptive. Use pool name where a pool is being sequenced" /> | |
122 <param name="PL" type="select" label="Platform/technology used to produce the reads (PL)"> | |
123 <option value="CAPILLARY">CAPILLARY</option> | |
124 <option value="LS454">LS454</option> | |
125 <option value="ILLUMINA">ILLUMINA</option> | |
126 <option value="SOLID">SOLID</option> | |
127 <option value="HELICOS">HELICOS</option> | |
128 <option value="IONTORRENT">IONTORRENT</option> | |
129 <option value="PACBIO">PACBIO</option> | |
130 </param> | |
131 <param name="LB" type="text" size="25" label="Library name (LB)" /> | |
132 <param name="CN" type="text" size="25" label="Sequencing center that produced the read (CN)" /> | |
133 <param name="DS" type="text" size="25" label="Description (DS)" /> | |
134 <param name="DT" type="text" size="25" label="Date that run was produced (DT)" help="ISO8601 format date or date/time, like YYYY-MM-DD" /> | |
135 <param name="FO" type="text" size="25" optional="true" label="Flow order (FO)" help="The array of nucleotide bases that correspond to the nucleotides used for each flow of each read. Multi-base flows are encoded in IUPAC format, and non-nucleotide flows by various other characters. Format: /\*|[ACMGRSVTWYHKDBN]+/"> | |
136 <validator type="regex" message="Invalid flow order">\*|[ACMGRSVTWYHKDBN]+$</validator> | |
137 </param> | |
138 <param name="KS" type="text" size="25" label="The array of nucleotide bases that correspond to the key sequence of each read (KS)" /> | |
139 <param name="PG" type="text" size="25" label="Programs used for processing the read group (PG)" /> | |
140 <param name="PI" type="integer" optional="true" label="Predicted median insert size (PI)" /> | |
141 <param name="PU" type="text" size="25" label="Platform unit (PU)" help="Unique identifier (e.g. flowcell-barcode.lane for Illumina or slide for SOLiD)" /> | |
142 </when> | |
143 <when value="do_not_set"> | |
144 <!-- do nothing --> | |
145 </when> | |
146 </conditional> | |
147 </xml> | |
81 | 148 |
82 </macros> | 149 </macros> |
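
The `@set_rg_string@` token removed in this revision assembled an `@RG` header line from the read group parameters: `ID`, `SM` and `PL` always, then each optional tag only when it has a value. The following is a minimal Python sketch of that same logic, not the wrapper's actual code. The function name `build_rg_string`, the dictionary input and the `sep` parameter are assumptions made for illustration; the diff does not say whether the consuming tool expects a real tab or the escaped two-character form, so the separator is left configurable.

```python
# Optional @RG tags, in the order the removed token appended them.
OPTIONAL_TAGS = ("LB", "CN", "DS", "DT", "FO", "KS", "PG", "PI", "PU")


def build_rg_string(rg, sep="\t"):
    """Assemble an @RG line from a dict of read group fields.

    `rg` must contain ID, SM and PL. Optional tags are appended only
    when they are present and non-empty, mirroring the `#if` checks
    in the Cheetah token. `sep` is a hypothetical knob for the field
    separator (a real tab by default).
    """
    parts = [
        "@RG",
        "ID:" + str(rg["ID"]),
        "SM:" + str(rg["SM"]),
        "PL:" + str(rg["PL"]),
    ]
    for tag in OPTIONAL_TAGS:
        value = rg.get(tag)
        # Compare the string form so an unset optional field is skipped,
        # similar to the `#if str($rg.PI)` test in the token.
        if value is not None and str(value) != "":
            parts.append(f"{tag}:{value}")
    return sep.join(parts)


# Example values taken from the read group table in the help text.
example = {
    "ID": "FLOWCELL2.LANE2",
    "SM": "KID",
    "PL": "ILLUMINA",
    "LB": "LIB-KID-1",
    "PI": 200,
}
print(build_rg_string(example))
```

Note that the tags come out in the token's order (`ID`, `SM`, `PL`, then the optional ones), which differs from the column order shown in the help text example.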