Mercurial > repos > immuneml > immuneml_tools
comparison immuneml_train_repert.xml @ 6:2d3dd9ff7e84 draft
"planemo upload commit 74f2bd15d2b7723c8e5a22d743913706dc7d8333-dirty"
author | immuneml |
---|---|
date | Tue, 27 Jul 2021 09:30:50 +0000 |
parents | ed3932e6d616 |
children | 45ca02982e1f |
comparison
5:48569213d91c | 6:2d3dd9ff7e84 |
---|---|
115 a disease status. One or more ML models are trained to classify repertoires based on the information within the sets of CDR3 sequences. Finally, the performance | 115 a disease status. One or more ML models are trained to classify repertoires based on the information within the sets of CDR3 sequences. Finally, the performance |
116 of the different methods is compared. | 116 of the different methods is compared. |
117 Alternatively, if you want to predict a property per immune receptor, such as antigen specificity, check out the | 117 Alternatively, if you want to predict a property per immune receptor, such as antigen specificity, check out the |
118 `Train immune receptor classifiers (simplified interface) <https://galaxy.immuneml.uio.no/root?tool_id=immuneml_train_classifiers>`_ tool instead. | 118 `Train immune receptor classifiers (simplified interface) <https://galaxy.immuneml.uio.no/root?tool_id=immuneml_train_classifiers>`_ tool instead. |
119 | 119 |
120 The full documentation can be found `here <https://docs.immuneml.uio.no/galaxy/galaxy_simple_repertoires.html>`_. | 120 The full documentation can be found `here <https://docs.immuneml.uio.no/latest/galaxy/galaxy_simple_repertoires.html>`_. |
121 | 121 |
122 **Basic terminology** | 122 **Basic terminology** |
123 | 123 |
124 In the context of ML, the characteristics to predict per repertoire are called **labels** and the values that these labels can take on are **classes**. | 124 In the context of ML, the characteristics to predict per repertoire are called **labels** and the values that these labels can take on are **classes**. |
125 One could thus have a label named ‘CMV_status’ with possible classes ‘positive’ and ‘negative’. The labels and classes must be present in the metadata | 125 One could thus have a label named ‘CMV_status’ with possible classes ‘positive’ and ‘negative’. The labels and classes must be present in the metadata |
126 file, in columns where the header and values correspond to the label and classes respectively. | 126 file, in columns where the header and values correspond to the label and classes respectively. |
127 | 127 |
128 .. image:: https://docs.immuneml.uio.no/_images/metadata_repertoire_classification.png | 128 .. image:: https://docs.immuneml.uio.no/latest/_images/metadata_repertoire_classification.png |
129 :height: 150 | 129 :height: 150 |
130 | 130 |
131 | | 131 | |
132 | 132 |
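As a rough illustration of the label and class terminology above, the following sketch reads a small metadata table and lists the classes found under a label column. The file contents, the `filename` and `subject_id` columns, and the helper `classes_for_label` are assumptions made up for this example; only the ‘CMV_status’ label with classes ‘positive’ and ‘negative’ comes from the text. This is not how immuneML itself parses the metadata file.

```python
import csv
import io

# Hypothetical metadata contents: one row per repertoire.
# The label is the column header ('CMV_status'); the classes are its values.
METADATA_CSV = """filename,subject_id,CMV_status
rep1.tsv,subject_1,positive
rep2.tsv,subject_2,negative
rep3.tsv,subject_3,positive
"""


def classes_for_label(metadata_text, label):
    """Return the sorted, distinct classes found in the column named `label`."""
    reader = csv.DictReader(io.StringIO(metadata_text))
    fieldnames = reader.fieldnames or []
    if label not in fieldnames:
        raise KeyError(f"label {label!r} is not a column in the metadata")
    return sorted({row[label] for row in reader})


if __name__ == "__main__":
    print(classes_for_label(METADATA_CSV, "CMV_status"))
```

A label that is missing from the header raises a `KeyError` here, which mirrors the requirement that labels and classes must be present in the metadata file.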
133 When training an ML model, the goal is for the model to learn **signals** within the data which discriminate between the different classes. An ML model | 133 When training an ML model, the goal is for the model to learn **signals** within the data which discriminate between the different classes. An ML model |
135 groups of similar receptors or short CDR3 subsequences in an immune repertoire. Our assumptions about what makes up a ‘signal’ determine how we | 135 groups of similar receptors or short CDR3 subsequences in an immune repertoire. Our assumptions about what makes up a ‘signal’ determine how we |
136 should represent our data to the ML model. This representation is called **encoding**. In this tool, the encoding is automatically chosen based on | 136 should represent our data to the ML model. This representation is called **encoding**. In this tool, the encoding is automatically chosen based on |
137 the user's assumptions about the dataset. | 137 the user's assumptions about the dataset. |
138 | 138 |
139 | 139 |
140 .. image:: https://docs.immuneml.uio.no/_images/repertoire_classification_overview.png | 140 .. image:: https://docs.immuneml.uio.no/latest/_images/repertoire_classification_overview.png |
141 :height: 500 | 141 :height: 500 |
142 | 142 |
143 | | 143 | |
144 | | 144 | |
145 | 145 |
164 encoding, the CDR3 regions are divided into overlapping subsequences and the (disease) signal may be characterized by the presence or absence of | 164 encoding, the CDR3 regions are divided into overlapping subsequences and the (disease) signal may be characterized by the presence or absence of |
165 certain sequence motifs in the CDR3 regions. Here, two similar CDR3 sequences are no longer independent, because they contain many identical subsequences. | 165 certain sequence motifs in the CDR3 regions. Here, two similar CDR3 sequences are no longer independent, because they contain many identical subsequences. |
166 A graphical representation of how a CDR3 sequence can be divided into k-mers, and how these k-mers can relate to specific positions in a 3D immune receptor | 166 A graphical representation of how a CDR3 sequence can be divided into k-mers, and how these k-mers can relate to specific positions in a 3D immune receptor |
167 (here: antibody) is shown in this figure: | 167 (here: antibody) is shown in this figure: |
168 | 168 |
169 .. image:: https://docs.immuneml.uio.no/_images/3mer_to_3d.png | 169 .. image:: https://docs.immuneml.uio.no/latest/_images/3mer_to_3d.png |
170 :height: 250 | 170 :height: 250 |
171 | 171 |
172 | | 172 | |
173 | 173 |
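The division of a CDR3 sequence into overlapping subsequences described above can be sketched in a few lines of Python. The function names and the two example sequences are hypothetical and chosen only for illustration; this is a minimal sketch of the idea, not the encoding code used by immuneML.

```python
def split_into_kmers(sequence, k=3):
    """Return all overlapping subsequences of length k, in order of position."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def shared_kmers(seq_a, seq_b, k=3):
    """Return the set of k-mers that occur in both sequences."""
    return set(split_into_kmers(seq_a, k)) & set(split_into_kmers(seq_b, k))


if __name__ == "__main__":
    # Two hypothetical CDR3 amino acid sequences that differ in one position.
    cdr3_a = "CASSLEDNAF"
    cdr3_b = "CASSLEDNTF"
    print(split_into_kmers(cdr3_a, k=3))
    print(sorted(shared_kmers(cdr3_a, cdr3_b, k=3)))
```

Because the two sequences differ in a single position, most of their 3-mers are identical, which is why similar CDR3 sequences are no longer independent under a subsequence-based encoding.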
174 The subsequences may be position-dependent or invariant. Position invariant means that if a subsequence, e.g., ‘EDNA’ occurs in different positions | 174 The subsequences may be position-dependent or invariant. Position invariant means that if a subsequence, e.g., ‘EDNA’ occurs in different positions |
214 - Archive: repertoire classification: a .zip file containing the complete output folder as it was produced by immuneML. This folder | 214 - Archive: repertoire classification: a .zip file containing the complete output folder as it was produced by immuneML. This folder |
215 contains the output of the TrainMLModel instruction including all trained models and their predictions, and report results. | 215 contains the output of the TrainMLModel instruction including all trained models and their predictions, and report results. |
216 Furthermore, the folder contains the complete YAML specification file for the immuneML run, the HTML output and a log file. | 216 Furthermore, the folder contains the complete YAML specification file for the immuneML run, the HTML output and a log file. |
217 | 217 |
218 - optimal_ml_settings.zip: a .zip file containing the raw files for the optimal trained ML settings (ML model, encoding). | 218 - optimal_ml_settings.zip: a .zip file containing the raw files for the optimal trained ML settings (ML model, encoding). |
219 This .zip file can subsequently be used as an input when `applying previously trained ML models to a new AIRR dataset in Galaxy <https://docs.immuneml.uio.no/galaxy/galaxy_apply_ml_models.html>`_. | 219 This .zip file can subsequently be used as an input when `applying previously trained ML models to a new AIRR dataset in Galaxy <https://docs.immuneml.uio.no/latest/galaxy/galaxy_apply_ml_models.html>`_. |
220 | 220 |
221 - repertoire_classification.yaml: the YAML specification file that was used by immuneML internally to run the analysis. This file can be | 221 - repertoire_classification.yaml: the YAML specification file that was used by immuneML internally to run the analysis. This file can be |
222 downloaded, altered, and run again by immuneML using the `Train machine learning models <https://galaxy.immuneml.uio.no/root?tool_id=immuneml_train_ml_model>`_ Galaxy tool. | 222 downloaded, altered, and run again by immuneML using the `Train machine learning models <https://galaxy.immuneml.uio.no/root?tool_id=immuneml_train_ml_model>`_ Galaxy tool. |
223 | 223 |
224 **More analysis options** | 224 **More analysis options** |