comparison immuneml_train_recept.xml @ 6:2d3dd9ff7e84 draft
"planemo upload commit 74f2bd15d2b7723c8e5a22d743913706dc7d8333-dirty"
author | immuneml |
---|---|
date | Tue, 27 Jul 2021 09:30:50 +0000 |
parents | ed3932e6d616 |
children | 45ca02982e1f |
comparison
5:48569213d91c | 6:2d3dd9ff7e84 |
---|---|
96 antigen specificity. One or more ML models are trained to classify receptors based on the information within the CDR3 sequence(s). Finally, the performance | 96 antigen specificity. One or more ML models are trained to classify receptors based on the information within the CDR3 sequence(s). Finally, the performance |
97 of the different methods is compared. | 97 of the different methods is compared. |
98 Alternatively, if you want to predict a property per immune repertoire, such as disease status, check out the | 98 Alternatively, if you want to predict a property per immune repertoire, such as disease status, check out the |
99 `Train immune repertoire classifiers (simplified interface) <https://galaxy.immuneml.uio.no/root?tool_id=novice_immuneml_interface>`_ tool instead. | 99 `Train immune repertoire classifiers (simplified interface) <https://galaxy.immuneml.uio.no/root?tool_id=novice_immuneml_interface>`_ tool instead. |
100 | 100 |
101 The full documentation can be found `here <https://docs.immuneml.uio.no/galaxy/galaxy_simple_receptors.html>`_. | 101 The full documentation can be found `here <https://docs.immuneml.uio.no/latest/galaxy/galaxy_simple_receptors.html>`_. |
102 | 102 |
103 **Basic terminology** | 103 **Basic terminology** |
104 | 104 |
105 In the context of ML, the characteristics to predict per receptor are called **labels** and the values that these labels can | 105 In the context of ML, the characteristics to predict per receptor are called **labels** and the values that these labels can |
106 take on are **classes**. One could thus have a label named ‘epitope’ with possible classes ‘binding_gluten’ and ‘not_binding_gluten’. | 106 take on are **classes**. One could thus have a label named ‘epitope’ with possible classes ‘binding_gluten’ and ‘not_binding_gluten’. |
110 classes. An ML model that predicts classes is also referred to as a **classifier**. A signal can have a variety of definitions, | 110 classes. An ML model that predicts classes is also referred to as a **classifier**. A signal can have a variety of definitions, |
111 including the presence of a specific subsequence or conserved positions. Our assumptions about what makes up a ‘signal’ | 111 including the presence of a specific subsequence or conserved positions. Our assumptions about what makes up a ‘signal’ |
112 determines how we should represent our data to the ML model. This representation is called **encoding**. In this tool, the encoding is automatically chosen based on | 112 determines how we should represent our data to the ML model. This representation is called **encoding**. In this tool, the encoding is automatically chosen based on |
113 the user's assumptions about the dataset. | 113 the user's assumptions about the dataset. |
114 | 114 |
115 .. image:: https://docs.immuneml.uio.no/_images/receptor_classification_overview.png | 115 .. image:: https://docs.immuneml.uio.no/latest/_images/receptor_classification_overview.png |
116 :height: 500 | 116 :height: 500 |
117 | 117 |
118 | | 118 | |
119 | | 119 | |
120 | 120 |
135 in the CDR3 regions. The CDR3 regions are divided into overlapping subsequences and the (antigen specificity) | 135 in the CDR3 regions. The CDR3 regions are divided into overlapping subsequences and the (antigen specificity) |
136 signal may be characterized by the presence or absence of certain sequence motifs in the CDR3 region. | 136 signal may be characterized by the presence or absence of certain sequence motifs in the CDR3 region. |
137 A graphical representation of how a CDR3 sequence can be divided into k-mers, and how these k-mers can relate to specific positions in a 3D immune receptor | 137 A graphical representation of how a CDR3 sequence can be divided into k-mers, and how these k-mers can relate to specific positions in a 3D immune receptor |
138 (here: antibody) is shown in this figure: | 138 (here: antibody) is shown in this figure: |
139 | 139 |
140 .. image:: https://docs.immuneml.uio.no/_images/3mer_to_3d.png | 140 .. image:: https://docs.immuneml.uio.no/latest/_images/3mer_to_3d.png |
141 :height: 250 | 141 :height: 250 |
142 | 142 |
143 | | 143 | |
144 | 144 |
145 The subsequences may be position dependent or invariant. Position invariant means that if a subsequence, e.g., | 145 The subsequences may be position dependent or invariant. Position invariant means that if a subsequence, e.g., |
185 - Archive: receptor classification: a .zip file containing the complete output folder as it was produced by immuneML. This folder | 185 - Archive: receptor classification: a .zip file containing the complete output folder as it was produced by immuneML. This folder |
186 contains the output of the TrainMLModel instruction including all trained models and their predictions, and report results. | 186 contains the output of the TrainMLModel instruction including all trained models and their predictions, and report results. |
187 Furthermore, the folder contains the complete YAML specification file for the immuneML run, the HTML output and a log file. | 187 Furthermore, the folder contains the complete YAML specification file for the immuneML run, the HTML output and a log file. |
188 | 188 |
189 - optimal_ml_settings.zip: a .zip file containing the raw files for the optimal trained ML settings (ML model, encoding). | 189 - optimal_ml_settings.zip: a .zip file containing the raw files for the optimal trained ML settings (ML model, encoding). |
190 This .zip file can subsequently be used as an input when `applying previously trained ML models to a new AIRR dataset in Galaxy <https://docs.immuneml.uio.no/galaxy/galaxy_apply_ml_models.html>`_. | 190 This .zip file can subsequently be used as an input when `applying previously trained ML models to a new AIRR dataset in Galaxy <https://docs.immuneml.uio.no/latest/galaxy/galaxy_apply_ml_models.html>`_. |
191 | 191 |
192 - receptor_classification.yaml: the YAML specification file that was used by immuneML internally to run the analysis. This file can be | 192 - receptor_classification.yaml: the YAML specification file that was used by immuneML internally to run the analysis. This file can be |
193 downloaded, altered, and run again by immuneML using the `Train machine learning models <https://galaxy.immuneml.uio.no/root?tool_id=immuneml_train_ml_model>`_ Galaxy tool. | 193 downloaded, altered, and run again by immuneML using the `Train machine learning models <https://galaxy.immuneml.uio.no/root?tool_id=immuneml_train_ml_model>`_ Galaxy tool. |
194 | 194 |
195 **More analysis options** | 195 **More analysis options** |