comparison shm_csr.xml @ 15:61d0a6318711 draft

Uploaded
author davidvanzessen
date Thu, 17 Nov 2016 07:33:21 -0500
parents 59765d2c8890
children 949a30f04d9b
Comparison of revision 14:59765d2c8890 with revision 15:61d0a6318711.
Revision 14, lines 78–98 (replaced in revision 15): the help text consisted of the sentence “Takes an IMGT zip (http://www.imgt.org/HighV-QUEST/search.action) file and creates a summarization of the mutation analysis.” followed by the unique filter table that revision 15 keeps under **Filter unique sequences**.

Revision 15, lines 73–189:

<citations>
<citation type="doi">10.1093/nar/gks457</citation>
<citation type="doi">10.1093/bioinformatics/btv359</citation>
</citations>
<help>
<![CDATA[
**References**

Yaari, G., Uduman, M. and Kleinstein, S. H. (2012). Quantifying selection in high-throughput Immunoglobulin sequencing data sets. *Nucleic Acids Research*, 40 (17), pp. e134–e134. [`doi:10.1093/nar/gks457`_]

.. _doi:10.1093/nar/gks457: http://dx.doi.org/10.1093/nar/gks457

Gupta, N. T., Vander Heiden, J. A., Uduman, M., Gadala-Maria, D., Yaari, G. and Kleinstein, S. H. (2015). Change-O: a toolkit for analyzing large-scale B cell immunoglobulin repertoire sequencing data. *Bioinformatics*, 31 (20), pp. 3356–3358. [`doi:10.1093/bioinformatics/btv359`_]

.. _doi:10.1093/bioinformatics/btv359: http://dx.doi.org/10.1093/bioinformatics/btv359

-----

**Input files**

IMGT/HighV-QUEST .zip and .txz archives are accepted as input files.

.. class:: infomark

Note: Files can be uploaded by using “Get Data” and “Upload File” and selecting “IMGT archive” as the file type.

-----

**Sequence starts at**

Identifies the region that is included in the analysis (the analysed region).

- Sequences that are missing a gene region (FR1, CDR1, etc.) within the analysed region are excluded.
- Sequences containing an ambiguous base in the analysed region are excluded.
- All other filtering and analysis is based on the analysed region.

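The exclusion rules above can be sketched in Python. This is a minimal illustration and not the pipeline's own code. Several details are hypothetical: the `analysed_region` function, the record layout (a dict mapping region name to nucleotide string), and the assumption that the analysed region runs from the selected start region through FR3. Only A, C, G and T are treated as unambiguous bases.

```python
# Hypothetical sketch of the "Sequence starts at" filter: the region order
# FR1..FR3 and the record layout are assumptions for illustration.
REGION_ORDER = ["FR1", "CDR1", "FR2", "CDR2", "FR3"]
UNAMBIGUOUS = set("ACGT")


def analysed_region(regions, start):
    """Return the analysed-region sequence, or None if the record is excluded.

    regions: dict mapping region name to nucleotide string ("" when missing).
    start:   region where the analysed region begins, e.g. "CDR1".
    """
    parts = []
    for name in REGION_ORDER[REGION_ORDER.index(start):]:
        seq = regions.get(name, "").upper()
        if not seq:
            return None  # a region inside the analysed region is missing
        if not set(seq) <= UNAMBIGUOUS:
            return None  # ambiguous base (e.g. N) inside the analysed region
        parts.append(seq)
    return "".join(parts)


record = {"FR1": "", "CDR1": "GGA", "FR2": "TTC", "CDR2": "AAG", "FR3": "CTG"}
print(analysed_region(record, "CDR1"))  # GGATTCAAGCTG
print(analysed_region(record, "FR1"))   # None (FR1 is missing)
```

The same record is kept when the analysis starts at CDR1 but excluded when it starts at FR1, because the missing region then falls inside the analysed region.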
-----

**Functionality filter**

Allows filtering on productive rearrangements, unproductive rearrangements, or both, based on the assignment provided by IMGT.

-----

**Filter unique sequences**

*Remove unique:*

This filter consists of two steps.

Step 1: removes all sequences whose nucleotide sequence in the “analysed region” (see the *Sequence starts at* filter) occurs only once. (Sub)classes are not taken into account in this step.

Step 2: removes all duplicate sequences (sequences with exactly the same nucleotide sequence in the analysed region and the same (sub)class).

.. class:: infomark

Note: This means that sequences with the same nucleotide sequence but a different (sub)class will be included in the results of both (sub)classes.

*Keep unique:*

Removes all duplicate sequences (sequences with exactly the same nucleotide sequence in the analysed region and the same (sub)class).

Example of the sequences that are included when using the “remove unique” or the “keep unique” filter:

+--------------------------+
| unique filter            |
+--------+--------+--------+
| values | remove | keep   |
+--------+--------+--------+
| A      | A      | A      |
+--------+--------+--------+
| A      | B      | B      |
+--------+--------+--------+
| B      | D      | C      |
+--------+--------+--------+
| B      |        | D      |
+--------+--------+--------+
| C      |        |        |
+--------+--------+--------+
| D      |        |        |
+--------+--------+--------+
| D      |        |        |
+--------+--------+--------+

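The two filters can be sketched in Python to reproduce the table. This is an illustration under assumptions and not the tool's implementation. Each record is a hypothetical `(sequence, subclass)` tuple holding the analysed-region sequence, and `remove_unique` and `keep_unique` are illustrative names.

```python
from collections import Counter


def keep_unique(records):
    """Remove duplicates: same analysed-region sequence AND same (sub)class.

    records: list of (sequence, subclass) tuples; the first occurrence is kept.
    """
    seen = set()
    kept = []
    for record in records:
        if record not in seen:
            seen.add(record)
            kept.append(record)
    return kept


def remove_unique(records):
    """Step 1 drops sequences seen only once (class ignored); step 2 deduplicates."""
    counts = Counter(seq for seq, _ in records)
    step1 = [(seq, sub) for seq, sub in records if counts[seq] > 1]
    return keep_unique(step1)


# The table example, giving every value the same (hypothetical) subclass.
table = [(v, "IGA1") for v in ["A", "A", "B", "B", "C", "D", "D"]]
print([seq for seq, _ in remove_unique(table)])  # ['A', 'B', 'D']
print([seq for seq, _ in keep_unique(table)])    # ['A', 'B', 'C', 'D']

# Same sequence, different subclasses: both survive "remove unique".
print(remove_unique([("X", "IGA1"), ("X", "IGG1")]))
# [('X', 'IGA1'), ('X', 'IGG1')]
```

The last call mirrors the note above. A sequence seen twice with different subclasses passes step 1 and is not a duplicate in step 2, so it appears under both subclasses.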
-----

**Remove duplicates based on**

Allows the selection of a single sequence per clone. Different definitions of a clone can be chosen.

.. class:: infomark

Note: The first sequence (in the data set) of each clone is included in the analysis. When that first sequence is unmatched (no subclass assigned), the first matched sequence is included instead. This means that changing the order of the data (for instance by sorting) can change which sequence is included in the analysis, and can therefore slightly influence the results.

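The selection rule from the note can be sketched in Python as follows. Several pieces are hypothetical: the record layout, the `one_per_clone` helper, and the `clone_key` parameter, which stands in for whichever clone definition is selected. Keeping the first sequence when no sequence of a clone has a subclass is also an assumption.

```python
def one_per_clone(records, clone_key):
    """Select one record per clone, preferring the first one with a subclass.

    records:   dicts in data-set order (hypothetical layout).
    clone_key: function returning the clone identifier of a record; it stands
               in for whichever clone definition is selected.
    """
    chosen = {}  # clone id -> selected record; dicts keep insertion order
    for rec in records:
        cid = clone_key(rec)
        current = chosen.get(cid)
        if current is None:
            chosen[cid] = rec  # first sequence of this clone
        elif current["subclass"] is None and rec["subclass"] is not None:
            chosen[cid] = rec  # unmatched first sequence -> first matched one
    # If no record of a clone has a subclass, its first record is kept (assumption).
    return list(chosen.values())


def by_clone(rec):
    return rec["clone"]


data = [
    {"id": 1, "clone": "c1", "subclass": None},
    {"id": 2, "clone": "c1", "subclass": "IGG1"},
    {"id": 3, "clone": "c1", "subclass": "IGG2"},
    {"id": 4, "clone": "c2", "subclass": "IGA1"},
]
print([r["id"] for r in one_per_clone(data, by_clone)])                  # [2, 4]
print([r["id"] for r in one_per_clone(list(reversed(data)), by_clone)])  # [4, 3]
```

Reversing the input changes the selected record for clone `c1` from 2 to 3. This is the order dependence described in the note.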
-----

**Human Class/Subclass filter**

.. class:: warningmark

Note: This filter should only be applied when analysing human IGH data in which a (sub)class-specific sequence is present. Otherwise, please select the "do not assign (sub)class" option to prevent errors when running the pipeline.

The class percentage is based on the ‘chunk hit percentage’ (see below). The subclass percentage is based on the ‘nt hit percentage’ (see below).

The SHM & CSR pipeline identifies the human Cµ, Cα, Cγ and Cε constant genes by dividing the reference sequences for the subclasses (NG_001019) into 8-nucleotide chunks that overlap by 4 nucleotides. These overlapping chunks are then individually aligned, in order, to each input sequence. This alignment is used to calculate the chunk hit percentage and the nt hit percentage.

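The chunking step can be illustrated with a toy sequence. In this Python sketch the helper names are hypothetical, and the reference is not NG_001019. Exact in-order substring search stands in for the pipeline's chunk alignment, and a trailing piece shorter than 8 nucleotides is simply dropped. None of these details are taken from the pipeline itself.

```python
def overlapping_chunks(reference, size=8, overlap=4):
    """Cut `reference` into `size`-nt chunks, each starting `size - overlap` nt
    after the previous one; a trailing piece shorter than `size` is dropped
    (an assumption about how the end of the reference is handled)."""
    step = size - overlap
    return [reference[i:i + size]
            for i in range(0, len(reference) - size + 1, step)]


def chunk_hit_percentage(chunks, query):
    """Percentage of chunks found in `query` when searched left to right.

    Exact substring search is a simplified stand-in for the alignment step.
    """
    hits, pos = 0, 0
    for chunk in chunks:
        found = query.find(chunk, pos)
        if found != -1:
            hits += 1
            pos = found + 1  # the next chunk must start after this one
    return 100.0 * hits / len(chunks)


reference = "ACGTACGTTTGGCCAA"  # toy sequence, not NG_001019
chunks = overlapping_chunks(reference)
print(chunks)  # ['ACGTACGT', 'ACGTTTGG', 'TTGGCCAA']
print(round(chunk_hit_percentage(chunks, "GGACGTACGTTTGGCCTT"), 1))  # 66.7
```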
*Chunk hit percentage*: the percentage of the chunks that are aligned.

*Nt hit percentage*: the percentage of the chunks covering subclass-specific nucleotides that match each of the different subclasses. The most stringent filter for the subclass is a 70% ‘nt hit percentage’. This means that 5 out of 7 subclass-specific nucleotides for Cα, or 6 out of 8 subclass-specific nucleotides for Cγ, should match the specific subclass.

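The arithmetic behind the 70% threshold can be checked in Python. The sketch assumes the nt hit percentage is the share of subclass-specific positions that match. The `nt_hit_percentage` and `min_matches` helpers are hypothetical; `min_matches` finds the smallest count that reaches the threshold.

```python
THRESHOLD = 70.0  # most stringent 'nt hit percentage' mentioned above


def nt_hit_percentage(matched, specific):
    """Share of subclass-specific positions that match, as a percentage."""
    return 100.0 * matched / specific


def min_matches(specific, threshold=THRESHOLD):
    """Smallest number of matching positions that reaches the threshold."""
    return next(k for k in range(specific + 1)
                if nt_hit_percentage(k, specific) >= threshold)


for gene, specific in [("C-alpha", 7), ("C-gamma", 8)]:
    k = min_matches(specific)
    print(gene, k, "of", specific, round(nt_hit_percentage(k, specific), 1))
# C-alpha 5 of 7 71.4
# C-gamma 6 of 8 75.0
```

One fewer match gives 4 of 7 (57.1%) or 5 of 8 (62.5%), both below 70%. This is why 5 and 6 are the minimum counts quoted above.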
-----

**Output new IMGT archives per class into your history?**

If yes is selected, additional output files (one for each class) are added to the history, containing information on the sequences that passed the selected filtering criteria. These files are in the same format as the IMGT/HighV-QUEST output files and are therefore also compatible with many other analysis programs, such as IGGalaxy.

]]>
</help>
</tool>