comparison immuneml_train_repert.xml @ 6:2d3dd9ff7e84 draft

"planemo upload commit 74f2bd15d2b7723c8e5a22d743913706dc7d8333-dirty"
author immuneml
date Tue, 27 Jul 2021 09:30:50 +0000
parents ed3932e6d616
children 45ca02982e1f
5:48569213d91c 6:2d3dd9ff7e84
115 a disease status. One or more ML models are trained to classify repertoires based on the information within the sets of CDR3 sequences. Finally, the performance 115 a disease status. One or more ML models are trained to classify repertoires based on the information within the sets of CDR3 sequences. Finally, the performance
116 of the different methods is compared. 116 of the different methods is compared.
117 Alternatively, if you want to predict a property per immune receptor, such as antigen specificity, check out the 117 Alternatively, if you want to predict a property per immune receptor, such as antigen specificity, check out the
118 `Train immune receptor classifiers (simplified interface) <https://galaxy.immuneml.uio.no/root?tool_id=immuneml_train_classifiers>`_ tool instead. 118 `Train immune receptor classifiers (simplified interface) <https://galaxy.immuneml.uio.no/root?tool_id=immuneml_train_classifiers>`_ tool instead.
119 119
120 The full documentation can be found `here <https://docs.immuneml.uio.no/galaxy/galaxy_simple_repertoires.html>`_. 120 The full documentation can be found `here <https://docs.immuneml.uio.no/latest/galaxy/galaxy_simple_repertoires.html>`_.
121 121
122 **Basic terminology** 122 **Basic terminology**
123 123
124 In the context of ML, the characteristics to predict per repertoire are called **labels** and the values that these labels can take on are **classes**. 124 In the context of ML, the characteristics to predict per repertoire are called **labels** and the values that these labels can take on are **classes**.
125 One could thus have a label named ‘CMV_status’ with possible classes ‘positive’ and ‘negative’. The labels and classes must be present in the metadata 125 One could thus have a label named ‘CMV_status’ with possible classes ‘positive’ and ‘negative’. The labels and classes must be present in the metadata
126 file, in columns where the header and values correspond to the label and classes respectively. 126 file, in columns where the header and values correspond to the label and classes respectively.
127 127
128 .. image:: https://docs.immuneml.uio.no/_images/metadata_repertoire_classification.png 128 .. image:: https://docs.immuneml.uio.no/latest/_images/metadata_repertoire_classification.png
129 :height: 150 129 :height: 150
130 130
131 | 131 |
132 132
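As a rough illustration of how labels and classes relate to the metadata file, the sketch below reads a small, made-up metadata table and lists the classes found under one label. Only the ``CMV_status`` label and its ``positive``/``negative`` classes come from the text above; the other column names, the file names, and the helper ``classes_for_label`` are assumptions made for the example, not part of the tool.

```python
import csv
import io

# Hypothetical metadata content. Only the 'CMV_status' label and its classes
# are taken from the help text; the other columns and values are invented.
METADATA = """filename,subject_id,CMV_status
rep1.tsv,subj1,positive
rep2.tsv,subj2,negative
rep3.tsv,subj3,positive
"""


def classes_for_label(metadata_text, label):
    """Return the sorted classes found in the column whose header is `label`."""
    reader = csv.DictReader(io.StringIO(metadata_text))
    if label not in (reader.fieldnames or []):
        raise KeyError(f"label {label!r} is not a column in the metadata")
    return sorted({row[label] for row in reader})


if __name__ == "__main__":
    # The column header is the label; the distinct values are its classes.
    print(classes_for_label(METADATA, "CMV_status"))
```

A label that is missing from the header row raises ``KeyError`` here, which mirrors the requirement that labels and classes must be present in the metadata file.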
133 When training an ML model, the goal is for the model to learn **signals** within the data which discriminate between the different classes. An ML model 133 When training an ML model, the goal is for the model to learn **signals** within the data which discriminate between the different classes. An ML model
135 groups of similar receptors or short CDR3 subsequences in an immune repertoire. Our assumptions about what makes up a ‘signal’ determine how we 135 groups of similar receptors or short CDR3 subsequences in an immune repertoire. Our assumptions about what makes up a ‘signal’ determine how we
136 should represent our data to the ML model. This representation is called **encoding**. In this tool, the encoding is automatically chosen based on 136 should represent our data to the ML model. This representation is called **encoding**. In this tool, the encoding is automatically chosen based on
137 the user's assumptions about the dataset. 137 the user's assumptions about the dataset.
138 138
139 139
140 .. image:: https://docs.immuneml.uio.no/_images/repertoire_classification_overview.png 140 .. image:: https://docs.immuneml.uio.no/latest/_images/repertoire_classification_overview.png
141 :height: 500 141 :height: 500
142 142
143 | 143 |
144 | 144 |
145 145
164 encoding, the CDR3 regions are divided into overlapping subsequences and the (disease) signal may be characterized by the presence or absence of 164 encoding, the CDR3 regions are divided into overlapping subsequences and the (disease) signal may be characterized by the presence or absence of
165 certain sequence motifs in the CDR3 regions. Here, two similar CDR3 sequences are no longer independent, because they contain many identical subsequences. 165 certain sequence motifs in the CDR3 regions. Here, two similar CDR3 sequences are no longer independent, because they contain many identical subsequences.
166 A graphical representation of how a CDR3 sequence can be divided into k-mers, and how these k-mers can relate to specific positions in a 3D immune receptor 166 A graphical representation of how a CDR3 sequence can be divided into k-mers, and how these k-mers can relate to specific positions in a 3D immune receptor
167 (here: antibody) is shown in this figure: 167 (here: antibody) is shown in this figure:
168 168
169 .. image:: https://docs.immuneml.uio.no/_images/3mer_to_3d.png 169 .. image:: https://docs.immuneml.uio.no/latest/_images/3mer_to_3d.png
170 :height: 250 170 :height: 250
171 171
172 | 172 |
173 173
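The division of a CDR3 sequence into overlapping subsequences can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not immuneML's own encoder: the function names and the example sequences are made up, and k = 3 is chosen only to match the 3-mer figure above.

```python
def overlapping_kmers(sequence, k=3):
    """Split a sequence into all overlapping subsequences of length k."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    # A sequence shorter than k yields an empty list.
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def shared_kmers(seq_a, seq_b, k=3):
    """Return the k-mers that occur in both sequences, ignoring position."""
    return set(overlapping_kmers(seq_a, k)) & set(overlapping_kmers(seq_b, k))


if __name__ == "__main__":
    # Invented CDR3-like sequences that differ only in the last residue.
    first, second = "CASSEDNAF", "CASSEDNAY"
    print(overlapping_kmers(first))
    print(sorted(shared_kmers(first, second)))
```

Because the two example sequences differ in a single residue, all but one of their 3-mers are shared, which is why similar CDR3 sequences are no longer independent under this kind of encoding.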
174 The subsequences may be position-dependent or invariant. Position invariant means that if a subsequence, e.g., ‘EDNA’ occurs in different positions 174 The subsequences may be position-dependent or invariant. Position invariant means that if a subsequence, e.g., ‘EDNA’ occurs in different positions
214 - Archive: repertoire classification: a .zip file containing the complete output folder as it was produced by immuneML. This folder 214 - Archive: repertoire classification: a .zip file containing the complete output folder as it was produced by immuneML. This folder
215 contains the output of the TrainMLModel instruction including all trained models and their predictions, and report results. 215 contains the output of the TrainMLModel instruction including all trained models and their predictions, and report results.
216 Furthermore, the folder contains the complete YAML specification file for the immuneML run, the HTML output and a log file. 216 Furthermore, the folder contains the complete YAML specification file for the immuneML run, the HTML output and a log file.
217 217
218 - optimal_ml_settings.zip: a .zip file containing the raw files for the optimal trained ML settings (ML model, encoding). 218 - optimal_ml_settings.zip: a .zip file containing the raw files for the optimal trained ML settings (ML model, encoding).
219 This .zip file can subsequently be used as an input when `applying previously trained ML models to a new AIRR dataset in Galaxy <https://docs.immuneml.uio.no/galaxy/galaxy_apply_ml_models.html>`_. 219 This .zip file can subsequently be used as an input when `applying previously trained ML models to a new AIRR dataset in Galaxy <https://docs.immuneml.uio.no/latest/galaxy/galaxy_apply_ml_models.html>`_.
220 220
221 - repertoire_classification.yaml: the YAML specification file that was used by immuneML internally to run the analysis. This file can be 221 - repertoire_classification.yaml: the YAML specification file that was used by immuneML internally to run the analysis. This file can be
222 downloaded, altered, and run again by immuneML using the `Train machine learning models <https://galaxy.immuneml.uio.no/root?tool_id=immuneml_train_ml_model>`_ Galaxy tool. 222 downloaded, altered, and run again by immuneML using the `Train machine learning models <https://galaxy.immuneml.uio.no/root?tool_id=immuneml_train_ml_model>`_ Galaxy tool.
223 223
224 **More analysis options** 224 **More analysis options**