comparison shm_csr.xml @ 39:a24f8c93583a draft

Uploaded
author davidvanzessen
date Thu, 22 Dec 2016 09:39:27 -0500
parents 96c1276ceefe
children 64711f461c8e
38:05c62efdc393 39:a24f8c93583a
94 94
95 ----- 95 -----
96 96
97 **Input files** 97 **Input files**
98 98
99 IMGT/HighV-QUEST .zip and .txz are accepted as input files. 99 IMGT/HighV-QUEST .zip and .txz are accepted as input files. The file to be analysed can be selected using the dropdown menu.
100 100
101 .. class:: infomark 101 .. class:: infomark
102 102
103 Note: Files can be uploaded by using “get data” and “upload file” and selecting “IMGT archive“ as a file type. 103 Note: Files can be uploaded by using “get data” and “upload file” and selecting “IMGT archive“ as a file type. Special characters should be prevented in the file names of the uploaded samples as these can give errors when running the immune repertoire pipeline. Underscores are allowed in the file names.
104 104
105 ----- 105 -----
106 106
107 **Sequence starts at** 107 **Sequence starts at**
108 108
109 Identifies the region which will be included in the analysis (analysed region) 109 Identifies the region which will be included in the analysis (analysed region)
110 110
111 - Sequences which are missing a gene region (FR1/CDR1 etc) in the analysed region are excluded 111 - Sequences which are missing a gene region (FR1/CDR1 etc) in the analysed region are excluded.
112 - Sequences containing an ambiguous base in the analysed region are excluded 112 - Sequences containing an ambiguous base in the analysed region or the CDR3 are excluded.
113 - All other filtering/analysis is based on the analysed region 113 - All other filtering/analysis is based on the analysed region.
114 114
115 ----- 115 -----
116 116
117 **Functionality filter** 117 **Functionality filter**
118 118
119 Allows filtering on productive rearrangement, unproductive rearrangements or both based on the assignment provided by IMGT. 119 Allows filtering on productive rearrangements, unproductive rearrangements or both based on the assignment provided by IMGT.
120 120
121 **Filter unique sequences** 121 **Filter unique sequences**
122 122
123 *Remove unique:* 123 *Remove unique:*
124 124
125 125
126 This filter consists of two different steps. 126 This filter consists of two different steps.
127 127
128 Step 1: removes all sequences of which the nucleotide sequence in the “analysed region” (see sequence starts at filter) occurs only once. (Sub)classes are not taken into account in this filter step. 128 Step 1: removes all sequences of which the nucleotide sequence in the “analysed region” and the CDR3 (see sequence starts at filter) occurs only once. (Sub)classes are not taken into account in this filter step.
129 129
130 Step 2: removes all duplicate sequences (sequences with the exact same nucleotide sequence in the analysed region and the same (sub)class). 130 Step 2: removes all duplicate sequences (sequences with the exact same nucleotide sequence in the analysed region, the CDR3 and the same (sub)class).
131 131
132 .. class:: infomark 132 .. class:: infomark
133 133
134 Note: This means that sequences with the same nucleotide sequence but a different (sub)class will be included in the results of both (sub)classes. 134 This means that sequences with the same nucleotide sequence but a different (sub)class will be included in the results of both (sub)classes.
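
The two "Remove unique" steps described above can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the function name `remove_unique` and the record keys `region`, `cdr3` and `subclass` are hypothetical, and the sketch assumes the first occurrence of each duplicate is the one that is kept.

```python
from collections import Counter


def remove_unique(records):
    """Apply the two-step 'Remove unique' filter to a list of dicts.

    Each record is assumed (hypothetically) to carry the nucleotide sequence
    of the analysed region, the CDR3 sequence and the assigned (sub)class.
    """
    # Step 1: drop sequences whose analysed region + CDR3 occurs only once.
    # (Sub)classes are deliberately ignored when counting.
    counts = Counter((r["region"], r["cdr3"]) for r in records)
    kept = [r for r in records if counts[(r["region"], r["cdr3"])] > 1]

    # Step 2: drop duplicates that share sequence AND (sub)class,
    # keeping the first occurrence (an assumption of this sketch).
    seen = set()
    result = []
    for r in kept:
        key = (r["region"], r["cdr3"], r["subclass"])
        if key not in seen:
            seen.add(key)
            result.append(r)
    return result


records = [
    {"region": "ACGT", "cdr3": "TTG", "subclass": "IGA1"},
    {"region": "ACGT", "cdr3": "TTG", "subclass": "IGA1"},
    {"region": "ACGT", "cdr3": "TTG", "subclass": "IGG1"},
    {"region": "GGCC", "cdr3": "AAC", "subclass": "IGM"},
]
filtered = remove_unique(records)
```

In this example the `GGCC` sequence is removed in step 1 because it occurs once, while the `ACGT` sequence survives once for each of the two (sub)classes, as the note above describes.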
135 135
136 *Keep unique:* 136 *Keep unique:*
137 137
138 Removes all duplicate sequences (sequences with the exact same nucleotide sequence in the analysed region and the same (sub)class). 138 Removes all duplicate sequences (sequences with the exact same nucleotide sequence in the analysed region and the same (sub)class).
139 139
165 165
166 Allows the selection of a single sequence per clone. Different definitions of a clone can be chosen. 166 Allows the selection of a single sequence per clone. Different definitions of a clone can be chosen.
167 167
168 .. class:: infomark 168 .. class:: infomark
169 169
170 Note: The first sequence (in the data set) of each clone is always included in the analysis. When the first sequence is unmatched (no subclass assigned) the first matched sequence will be included. This means that altering the data order (by for instance sorting) can change the sequence which is included in the analysis and therefore slightly influence results. 170 Note: The first sequence (in the data set) of each clone is always included in the analysis. When the first sequence is unmatched (no subclass assigned) the first matched sequence will be included. This means that altering the data order (by for instance sorting) can change the sequence which is included in the analysis and therefore slightly influences the results.
171 171
172 ----- 172 -----
173 173
174 **Human Class/Subclass filter** 174 **Human Class/Subclass filter**
175 175
176 .. class:: warningmark 176 .. class:: warningmark
177 177
178 Note: This filter should only be applied when analysing human IGH data in which a (sub)class specific sequence is present. Otherwise please select the "do not assign (sub)class" option to prevent errors when running the pipeline. 178 Note: This filter should only be applied when analysing human IGH data in which a (sub)class specific sequence is present. Otherwise please select the do not assign (sub)class option to prevent errors when running the pipeline.
179 179
180 The class percentage is based on the ‘chunk hit percentage’ (see below). The subclass percentage is based on the ‘nt hit percentage’ (see below). 180 The class percentage is based on the ‘chunk hit percentage’ (see below). The subclass percentage is based on the ‘nt hit percentage’ (see below).
181 181
182 The SHM & CSR pipeline identifies human Cµ, Cα, Cγ and Cε constant genes by dividing the reference sequences for the subclasses (NG_001019) in 8 nucleotide chunks which overlap by 4 nucleotides. These overlapping chunks are then individually aligned in the right order to each input sequence. This alignment is used to calculate the chunk hit percentage and the nt hit percentage. 182 The SHM & CSR pipeline identifies human Cµ, Cα, Cγ and Cε constant genes by dividing the reference sequences for the subclasses (NG_001019) in 8 nucleotide chunks which overlap by 4 nucleotides. These overlapping chunks are then individually aligned in the right order to each input sequence. This alignment is used to calculate the chunk hit percentage and the nt hit percentage.
183 183
184 *Chunk hit percentage*: the percentage of the chunks that is aligned 184 *Chunk hit percentage*: The percentage of the chunks that is aligned
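
As a rough illustration of the chunking and the chunk hit percentage, the sketch below splits a reference into 8-nucleotide chunks that overlap by 4 nucleotides and counts how many are found, in order, in an input sequence. The function names are hypothetical, and exact substring matching stands in for the pipeline's alignment step, which is an assumption made here for simplicity.

```python
def overlapping_chunks(reference, size=8, overlap=4):
    """Split a reference into fixed-size chunks that overlap by `overlap` nt."""
    step = size - overlap
    return [reference[i:i + size]
            for i in range(0, len(reference) - size + 1, step)]


def chunk_hit_percentage(reference, sequence, size=8, overlap=4):
    """Percentage of reference chunks found in `sequence`, in the right order.

    Exact matching is an assumption of this sketch, not the pipeline's method.
    """
    chunks = overlapping_chunks(reference, size, overlap)
    if not chunks:
        return 0.0
    hits = 0
    search_from = 0
    for chunk in chunks:
        pos = sequence.find(chunk, search_from)
        if pos != -1:
            hits += 1
            # Later chunks must start further along the input sequence.
            search_from = pos + 1
    return 100.0 * hits / len(chunks)


reference = "AAAACCCCGGGGTTTT"              # toy reference, not NG_001019
chunks = overlapping_chunks(reference)
full_hit = chunk_hit_percentage(reference, "TTAAAACCCCGGGGTTTTAA")
partial_hit = chunk_hit_percentage(reference, "TTAAAACCCCGGGATTTT")
```

With the toy reference, a single mismatching nucleotide in the second input sequence removes two of the three chunk hits, because each nucleotide away from the ends is covered by two overlapping chunks.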
185 185
186 *Nt hit percentage*: The percentage of chunks covering the subclass specific nucleotide match with the different subclasses. The most stringent filter for the subclass is 70% ‘nt hit percentage’ which means that 5 out of 7 subclass specific nucleotides for Cα or 6 out of 8 subclass specific nucleotides of Cγ should match with the specific subclass. 186 *Nt hit percentage*: The percentage of chunks covering the subclass specific nucleotide match with the different subclasses. The most stringent filter for the subclass is 70% ‘nt hit percentage’ which means that 5 out of 7 subclass specific nucleotides for Cα or 6 out of 8 subclass specific nucleotides of Cγ should match with the specific subclass.
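
The arithmetic behind the 70% example can be checked with a small calculation. The helper name `min_matches` is hypothetical, and the sketch assumes the threshold is applied as "at least this percentage of the subclass specific nucleotides must match".

```python
def min_matches(n_specific, threshold_pct):
    """Smallest number of matching nucleotides reaching `threshold_pct`.

    Uses integer ceiling division to avoid floating point rounding.
    """
    return (threshold_pct * n_specific + 99) // 100


calpha = min_matches(7, 70)   # 7 subclass specific nucleotides for Calpha
cgamma = min_matches(8, 70)   # 8 subclass specific nucleotides for Cgamma
```

This agrees with the text: 5 out of 7 is about 71.4% and 6 out of 8 is 75%, while one match fewer (4 out of 7, or 5 out of 8) falls below 70%.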
187 187
188 ----- 188 -----
189 189
190 **Output new IMGT archives per class into your history?** 190 **Output new IMGT archives per class into your history?**
191 191
192 If yes is selected, additional output files (one for each class) will be added to the history which contain information of the sequences that passed the selected filtering criteria. These files are in the same format as the IMGT/HighV-QUEST output files and therefore are also compatible with many other analysis programs, such as IGGalaxy. 192 If yes is selected, additional output files (one for each class) will be added to the history which contain information of the sequences that passed the selected filtering criteria. These files are in the same format as the IMGT/HighV-QUEST output files and therefore are also compatible with many other analysis programs, such as the Immune repertoire pipeline.
193
194 -----
195
196 **Execute**
197
198 Upon pressing execute a new analysis is added to your history (right side of the page). Initially this analysis will be grey; once the analysis starts, its colour in the history changes to yellow. When the analysis is finished it turns green in the history. The analysis can then be opened by clicking on the eye icon of the analysis of interest. When an analysis turns red, an error has occurred while running it. Clicking on the analysis title shows additional information on the analysis. In addition, a bug icon appears, where more information on the error can be found.
193 199
194 ]]> 200 ]]>
195 </help> 201 </help>
196 </tool> 202 </tool>