comparison immuneml_train_recept.xml @ 6:2d3dd9ff7e84 draft

"planemo upload commit 74f2bd15d2b7723c8e5a22d743913706dc7d8333-dirty"
author immuneml
date Tue, 27 Jul 2021 09:30:50 +0000
parents ed3932e6d616
children 45ca02982e1f
5:48569213d91c 6:2d3dd9ff7e84
96 antigen specificity. One or more ML models are trained to classify receptors based on the information within the CDR3 sequence(s). Finally, the performance 96 antigen specificity. One or more ML models are trained to classify receptors based on the information within the CDR3 sequence(s). Finally, the performance
97 of the different methods is compared. 97 of the different methods is compared.
98 Alternatively, if you want to predict a property per immune repertoire, such as disease status, check out the 98 Alternatively, if you want to predict a property per immune repertoire, such as disease status, check out the
99 `Train immune repertoire classifiers (simplified interface) <https://galaxy.immuneml.uio.no/root?tool_id=novice_immuneml_interface>`_ tool instead. 99 `Train immune repertoire classifiers (simplified interface) <https://galaxy.immuneml.uio.no/root?tool_id=novice_immuneml_interface>`_ tool instead.
100 100
101 The full documentation can be found `here <https://docs.immuneml.uio.no/galaxy/galaxy_simple_receptors.html>`_. 101 The full documentation can be found `here <https://docs.immuneml.uio.no/latest/galaxy/galaxy_simple_receptors.html>`_.
102 102
103 **Basic terminology** 103 **Basic terminology**
104 104
105 In the context of ML, the characteristics to predict per receptor are called **labels** and the values that these labels can 105 In the context of ML, the characteristics to predict per receptor are called **labels** and the values that these labels can
106 take on are **classes**. One could thus have a label named ‘epitope’ with possible classes ‘binding_gluten’ and ‘not_binding_gluten’. 106 take on are **classes**. One could thus have a label named ‘epitope’ with possible classes ‘binding_gluten’ and ‘not_binding_gluten’.
110 classes. An ML model that predicts classes is also referred to as a **classifier**. A signal can have a variety of definitions, 110 classes. An ML model that predicts classes is also referred to as a **classifier**. A signal can have a variety of definitions,
111 including the presence of a specific subsequence or conserved positions. Our assumptions about what makes up a ‘signal’ 111 including the presence of a specific subsequence or conserved positions. Our assumptions about what makes up a ‘signal’
112 determine how we should represent our data to the ML model. This representation is called **encoding**. In this tool, the encoding is automatically chosen based on 112 determine how we should represent our data to the ML model. This representation is called **encoding**. In this tool, the encoding is automatically chosen based on
113 the user's assumptions about the dataset. 113 the user's assumptions about the dataset.
114 114
115 .. image:: https://docs.immuneml.uio.no/_images/receptor_classification_overview.png 115 .. image:: https://docs.immuneml.uio.no/latest/_images/receptor_classification_overview.png
116 :height: 500 116 :height: 500
117 117
118 | 118 |
119 | 119 |
120 120
135 in the CDR3 regions. The CDR3 regions are divided into overlapping subsequences and the (antigen specificity) 135 in the CDR3 regions. The CDR3 regions are divided into overlapping subsequences and the (antigen specificity)
136 signal may be characterized by the presence or absence of certain sequence motifs in the CDR3 region. 136 signal may be characterized by the presence or absence of certain sequence motifs in the CDR3 region.
137 A graphical representation of how a CDR3 sequence can be divided into k-mers, and how these k-mers can relate to specific positions in a 3D immune receptor 137 A graphical representation of how a CDR3 sequence can be divided into k-mers, and how these k-mers can relate to specific positions in a 3D immune receptor
138 (here: antibody) is shown in this figure: 138 (here: antibody) is shown in this figure:
139 139
140 .. image:: https://docs.immuneml.uio.no/_images/3mer_to_3d.png 140 .. image:: https://docs.immuneml.uio.no/latest/_images/3mer_to_3d.png
141 :height: 250 141 :height: 250
142 142
143 | 143 |
144 144
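The k-mer decomposition described in the help text above can be illustrated with a minimal Python sketch. It is not immuneML's own implementation: the function names ``split_into_kmers`` and ``has_motif`` and the example CDR3 sequence are hypothetical and chosen only for illustration. The sketch splits a sequence into overlapping subsequences of length k and checks whether a given motif occurs among them.

```python
from typing import List


def split_into_kmers(sequence: str, k: int = 3) -> List[str]:
    """Return all overlapping subsequences of length k, in order of position."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    # A sequence shorter than k yields an empty list.
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def has_motif(sequence: str, motif: str) -> bool:
    """Check whether `motif` occurs among the k-mers of `sequence` (k = len(motif))."""
    return motif in split_into_kmers(sequence, len(motif))


# Hypothetical CDR3 amino acid sequence, used only as an example.
cdr3 = "CASSLGT"
kmers = split_into_kmers(cdr3, 3)
print(kmers)
print(has_motif(cdr3, "SSL"))
```

Because each k-mer starts one position after the previous one, a sequence of length n yields n - k + 1 k-mers, so a motif is found regardless of where in the CDR3 it occurs.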
145 The subsequences may be position dependent or invariant. Position invariant means that if a subsequence, e.g., 145 The subsequences may be position dependent or invariant. Position invariant means that if a subsequence, e.g.,
185 - Archive: receptor classification: a .zip file containing the complete output folder as it was produced by immuneML. This folder 185 - Archive: receptor classification: a .zip file containing the complete output folder as it was produced by immuneML. This folder
186 contains the output of the TrainMLModel instruction including all trained models and their predictions, and report results. 186 contains the output of the TrainMLModel instruction including all trained models and their predictions, and report results.
187 Furthermore, the folder contains the complete YAML specification file for the immuneML run, the HTML output and a log file. 187 Furthermore, the folder contains the complete YAML specification file for the immuneML run, the HTML output and a log file.
188 188
189 - optimal_ml_settings.zip: a .zip file containing the raw files for the optimal trained ML settings (ML model, encoding). 189 - optimal_ml_settings.zip: a .zip file containing the raw files for the optimal trained ML settings (ML model, encoding).
190 This .zip file can subsequently be used as an input when `applying previously trained ML models to a new AIRR dataset in Galaxy <https://docs.immuneml.uio.no/galaxy/galaxy_apply_ml_models.html>`_. 190 This .zip file can subsequently be used as an input when `applying previously trained ML models to a new AIRR dataset in Galaxy <https://docs.immuneml.uio.no/latest/galaxy/galaxy_apply_ml_models.html>`_.
191 191
192 - receptor_classification.yaml: the YAML specification file that was used by immuneML internally to run the analysis. This file can be 192 - receptor_classification.yaml: the YAML specification file that was used by immuneML internally to run the analysis. This file can be
193 downloaded, altered, and run again by immuneML using the `Train machine learning models <https://galaxy.immuneml.uio.no/root?tool_id=immuneml_train_ml_model>`_ Galaxy tool. 193 downloaded, altered, and run again by immuneML using the `Train machine learning models <https://galaxy.immuneml.uio.no/root?tool_id=immuneml_train_ml_model>`_ Galaxy tool.
194 194
195 **More analysis options** 195 **More analysis options**