comparison CHANGELOG.md @ 19:cc91d730cc4f draft
Fix syntax of Galaxy script for GECCO
author | althonos |
---|---|
date | Mon, 16 Jan 2023 18:35:56 +0000 |
parents | 3dd71eaa2909 |
children | 6ba37b7dea42 |
comparison
18:3dd71eaa2909 | 19:cc91d730cc4f |
---|---|
3 | 3 |
4 The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/) | 4 The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/) |
5 and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html). | 5 and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html). |
6 | 6 |
7 ## [Unreleased] | 7 ## [Unreleased] |
8 [Unreleased]: https://git.embl.de/grp-zeller/GECCO/compare/v0.9.5...master | 8 [Unreleased]: https://github.com/zellerlab/GECCO/compare/v0.9.6...master |
| | 9 |
| | 10 |
| | 11 ## [v0.9.6] - 2023-01-11 |
| | 12 [v0.9.6]: https://github.com/zellerlab/GECCO/compare/v0.9.5...v0.9.6 |
| | 13 |
| | 14 ### Added |
| | 15 - Gene Ontology annotations to `gecco.interpro` local metadata. |
| | 16 - Reference to Gene Ontology terms and derived functions to `gecco.model.Domain` objects. |
| | 17 - Gene color based on predicted function in `gecco.model.Gene.to_seq_feature`. |
| | 18 |
| | 19 ### Fixed |
| | 20 - Missing `gzip` import in the CLI preventing usage of gzip-compressed inputs. |
| | 21 - Invalid coordinates of domains found in reverse-strand genes. |
| | 22 - Detection of entry points with `importlib.metadata` on older Python versions. |
| | 23 |
| | 24 ### Changed |
| | 25 - `bgc_id` columns of cluster tables are renamed `cluster_id`. |
| | 26 - `gecco.model.ProductType` is renamed to `gecco.model.ClusterType`. |
| | 27 - Bumped `pyrodigal` dependency to `v2.0`. |
| | 28 - Bumped `pyhmmer` dependency to `v0.7`. |
9 | 29 |
10 | 30 |
11 ## [v0.9.5] - 2022-08-10 | 31 ## [v0.9.5] - 2022-08-10 |
12 [v0.9.5]: https://git.embl.de/grp-zeller/GECCO/compare/v0.9.4...v0.9.5 | 32 [v0.9.5]: https://github.com/zellerlab/GECCO/compare/v0.9.4...v0.9.5 |
13 | 33 |
14 ### Added | 34 ### Added |
15 - `gecco predict` command to predict BGCs from an annotated genome. | 35 - `gecco predict` command to predict BGCs from an annotated genome. |
16 - `Protein.with_seq` function to assign a new sequence to a protein object. | 36 - `Protein.with_seq` function to assign a new sequence to a protein object. |
17 | 37 |
19 - Issue with antiSMASH sideload JSON file generation in `gecco run` and `gecco predict`. | 39 - Issue with antiSMASH sideload JSON file generation in `gecco run` and `gecco predict`. |
20 - Make `gecco.orf` handle STOP codons consistently ([#9](https://github.com/zellerlab/GECCO/issues/9)). | 40 - Make `gecco.orf` handle STOP codons consistently ([#9](https://github.com/zellerlab/GECCO/issues/9)). |
21 | 41 |
22 | 42 |
23 ## [v0.9.4] - 2022-05-31 | 43 ## [v0.9.4] - 2022-05-31 |
24 [v0.9.4]: https://git.embl.de/grp-zeller/GECCO/compare/v0.9.3...v0.9.4 | 44 [v0.9.4]: https://github.com/zellerlab/GECCO/compare/v0.9.3...v0.9.4 |
25 | 45 |
26 ### Added | 46 ### Added |
27 - `classes_` property to `TypeClassifier` to access the `classes_` attribute of the `TypeBinarizer`. | 47 - `classes_` property to `TypeClassifier` to access the `classes_` attribute of the `TypeBinarizer`. |
28 - Alternative ORF finder `CDSFinder` which simply extracts CDS features from input sequences ([#8](https://github.com/zellerlab/GECCO/issues/8)). | 48 - Alternative ORF finder `CDSFinder` which simply extracts CDS features from input sequences ([#8](https://github.com/zellerlab/GECCO/issues/8)). |
29 - Support for annotating domains with "exclusive" HMMs to annotate genes with *at most* one HMM from the library. | 49 - Support for annotating domains with "exclusive" HMMs to annotate genes with *at most* one HMM from the library. |
37 ### Fixed | 57 ### Fixed |
38 - Broken MyPy type annotations in the `gecco.model` and `gecco.cli` modules. | 58 - Broken MyPy type annotations in the `gecco.model` and `gecco.cli` modules. |
39 | 59 |
40 | 60 |
41 ## [v0.9.3] - 2022-05-13 | 61 ## [v0.9.3] - 2022-05-13 |
42 [v0.9.3]: https://git.embl.de/grp-zeller/GECCO/compare/v0.9.2...v0.9.3 | 62 [v0.9.3]: https://github.com/zellerlab/GECCO/compare/v0.9.2...v0.9.3 |
43 | 63 |
44 ### Changed | 64 ### Changed |
45 - `--format` flag of `gecco annotate` and `gecco run` CLI commands is now made lowercase before giving value to `Bio.SeqIO`. | 65 - `--format` flag of `gecco annotate` and `gecco run` CLI commands is now made lowercase before giving value to `Bio.SeqIO`. |
46 | 66 |
47 ### Fixed | 67 ### Fixed |
48 - Genes with duplicate IDs being silently ignored in `HMMER.run`. | 68 - Genes with duplicate IDs being silently ignored in `HMMER.run`. |
49 | 69 |
50 | 70 |
51 ## [v0.9.2] - 2022-04-11 | 71 ## [v0.9.2] - 2022-04-11 |
52 [v0.9.2]: https://git.embl.de/grp-zeller/GECCO/compare/v0.9.1...v0.9.2 | 72 [v0.9.2]: https://github.com/zellerlab/GECCO/compare/v0.9.1...v0.9.2 |
53 | 73 |
54 ### Added | 74 ### Added |
55 - Padding of short sequences with empty genes when predicting probabilities in `ClusterCRF`. | 75 - Padding of short sequences with empty genes when predicting probabilities in `ClusterCRF`. |
56 | 76 |
57 ## [v0.9.1] - 2022-04-05 | 77 ## [v0.9.1] - 2022-04-05 |
58 [v0.9.1]: https://git.embl.de/grp-zeller/GECCO/compare/v0.9.1-alpha4...v0.9.1 | 78 [v0.9.1]: https://github.com/zellerlab/GECCO/compare/v0.9.1-alpha4...v0.9.1 |
59 | 79 |
60 ### Changed | 80 ### Changed |
61 - Make the `genes.tsv` and `features.tsv` table contain all genes even when they come from a contig too short to be processed by the CRF sliding window. | 81 - Make the `genes.tsv` and `features.tsv` table contain all genes even when they come from a contig too short to be processed by the CRF sliding window. |
62 - Replaced the `--force-clusters-tsv` flag with a `--force-tsv` flag to force writing TSV tables even when no genes or clusters were found in `gecco run` or `gecco annotate`. | 82 - Replaced the `--force-clusters-tsv` flag with a `--force-tsv` flag to force writing TSV tables even when no genes or clusters were found in `gecco run` or `gecco annotate`. |
63 | 83 |
64 ## [v0.9.1-alpha4] - 2022-03-31 | 84 ## [v0.9.1-alpha4] - 2022-03-31 |
65 [v0.9.1-alpha4]: https://git.embl.de/grp-zeller/GECCO/compare/v0.9.1-alpha3...v0.9.1-alpha4 | 85 [v0.9.1-alpha4]: https://github.com/zellerlab/GECCO/compare/v0.9.1-alpha3...v0.9.1-alpha4 |
66 | 86 |
67 Retrain internal model with: | 87 Retrain internal model with: |
68 ``` | 88 ``` |
69 $ python -m gecco -vv train --c1 0.4 --c2 0 --select 0.25 --window-size 20 \ | 89 $ python -m gecco -vv train --c1 0.4 --c2 0 --select 0.25 --window-size 20 \ |
70 -f mibig-2.0.proG2.Pfam-v35.0.features.tsv \ | 90 -f mibig-2.0.proG2.Pfam-v35.0.features.tsv \ |
72 -g GECCO-data/data/embeddings/mibig-2.0.proG2.genes.tsv \ | 92 -g GECCO-data/data/embeddings/mibig-2.0.proG2.genes.tsv \ |
73 -o models/v0.9.1-alpha4 | 93 -o models/v0.9.1-alpha4 |
74 ``` | 94 ``` |
75 | 95 |
76 ## [v0.9.1-alpha3] - 2022-03-23 | 96 ## [v0.9.1-alpha3] - 2022-03-23 |
77 [v0.9.1-alpha3]: https://git.embl.de/grp-zeller/GECCO/compare/v0.9.1-alpha2...v0.9.1-alpha3 | 97 [v0.9.1-alpha3]: https://github.com/zellerlab/GECCO/compare/v0.9.1-alpha2...v0.9.1-alpha3 |
78 | 98 |
79 ### Added | 99 ### Added |
80 - `gecco.model.GeneTable` class to store gene coordinates independently of protein domains. | 100 - `gecco.model.GeneTable` class to store gene coordinates independently of protein domains. |
81 | 101 |
82 ### Changed | 102 ### Changed |
83 - Refactored implementation of `load` and `dump` methods for `Table` classes into a dedicated base class. | 103 - Refactored implementation of `load` and `dump` methods for `Table` classes into a dedicated base class. |
84 - `gecco run` and `gecco annotate` now output a gene table in addition to the feature and cluster tables. | 104 - `gecco run` and `gecco annotate` now output a gene table in addition to the feature and cluster tables. |
85 - `gecco train` expects a gene table instead of a GFF file for the gene coordinates. | 105 - `gecco train` expects a gene table instead of a GFF file for the gene coordinates. |
86 | 106 |
87 ## [v0.9.1-alpha2] - 2022-03-23 | 107 ## [v0.9.1-alpha2] - 2022-03-23 |
88 [v0.9.1-alpha2]: https://git.embl.de/grp-zeller/GECCO/compare/v0.9.1-alpha1...v0.9.1-alpha2 | 108 [v0.9.1-alpha2]: https://github.com/zellerlab/GECCO/compare/v0.9.1-alpha1...v0.9.1-alpha2 |
89 | 109 |
90 ### Fixed | 110 ### Fixed |
91 - `TypeClassifier.trained` not being able to read unknown types from type tables. | 111 - `TypeClassifier.trained` not being able to read unknown types from type tables. |
92 | 112 |
93 ## [v0.9.1-alpha1] - 2022-03-20 | 113 ## [v0.9.1-alpha1] - 2022-03-20 |
94 [v0.9.1-alpha1]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.10...v0.9.1-alpha1 | 114 [v0.9.1-alpha1]: https://github.com/zellerlab/GECCO/compare/v0.8.10...v0.9.1-alpha1 |
95 Candidate release with support for a sliding window in the CRF prediction algorithm. | 115 Candidate release with support for a sliding window in the CRF prediction algorithm. |
96 | 116 |
97 ## [v0.8.10] - 2022-02-23 | 117 ## [v0.8.10] - 2022-02-23 |
98 [v0.8.10]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.9...v0.8.10 | 118 [v0.8.10]: https://github.com/zellerlab/GECCO/compare/v0.8.9...v0.8.10 |
99 ### Fixed | 119 ### Fixed |
100 - `--antismash-sideload` flag of `gecco run` causing command to crash. | 120 - `--antismash-sideload` flag of `gecco run` causing command to crash. |
101 | 121 |
102 ## [v0.8.9] - 2022-02-22 | 122 ## [v0.8.9] - 2022-02-22 |
103 [v0.8.9]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.8...v0.8.9 | 123 [v0.8.9]: https://github.com/zellerlab/GECCO/compare/v0.8.8...v0.8.9 |
104 ### Removed | 124 ### Removed |
105 - Prediction and support for the *Other* biosynthetic type of MIBiG clusters. | 125 - Prediction and support for the *Other* biosynthetic type of MIBiG clusters. |
106 | 126 |
107 ## [v0.8.8] - 2022-02-21 | 127 ## [v0.8.8] - 2022-02-21 |
108 [v0.8.8]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.7...v0.8.8 | 128 [v0.8.8]: https://github.com/zellerlab/GECCO/compare/v0.8.7...v0.8.8 |
109 ### Fixed | 129 ### Fixed |
110 - `ClusterRefiner` filtering method for edge genes not working as intended. | 130 - `ClusterRefiner` filtering method for edge genes not working as intended. |
111 - `gecco run` and `gecco annotate` commands crashing on missing input files instead of nicely rendering the error. | 131 - `gecco run` and `gecco annotate` commands crashing on missing input files instead of nicely rendering the error. |
112 | 132 |
113 ## [v0.8.7] - 2022-02-18 | 133 ## [v0.8.7] - 2022-02-18 |
114 [v0.8.7]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.6...v0.8.7 | 134 [v0.8.7]: https://github.com/zellerlab/GECCO/compare/v0.8.6...v0.8.7 |
115 ### Fixed | 135 ### Fixed |
116 - `interpro.json` metadata file not being included in distribution files. | 136 - `interpro.json` metadata file not being included in distribution files. |
117 - Missing docstring for `Protein.with_domains` method. | 137 - Missing docstring for `Protein.with_domains` method. |
118 ### Changed | 138 ### Changed |
119 - Bump minimum `scikit-learn` version to `v1.0` for Python3.7+. | 139 - Bump minimum `scikit-learn` version to `v1.0` for Python3.7+. |
120 | 140 |
121 ## [v0.8.6] - 2022-02-17 - YANKED | 141 ## [v0.8.6] - 2022-02-17 - YANKED |
122 [v0.8.6]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.5...v0.8.6 | 142 [v0.8.6]: https://github.com/zellerlab/GECCO/compare/v0.8.5...v0.8.6 |
123 ### Added | 143 ### Added |
124 - CLI flag for enabling region masking for contigs processed by Prodigal. | 144 - CLI flag for enabling region masking for contigs processed by Prodigal. |
125 - CLI flag for controlling region distance used for edge distance filtering. | 145 - CLI flag for controlling region distance used for edge distance filtering. |
126 ### Changed | 146 ### Changed |
127 - `gecco.model.Gene` and `gecco.model.Protein` are now immutable data classes. | 147 - `gecco.model.Gene` and `gecco.model.Protein` are now immutable data classes. |
131 ### Fixed | 151 ### Fixed |
132 - Mark `BGC0000930` as `Terpene` in the type classifier data. | 152 - Mark `BGC0000930` as `Terpene` in the type classifier data. |
133 - Progress bar messages are now in consistent format. | 153 - Progress bar messages are now in consistent format. |
134 | 154 |
135 ## [v0.8.5] - 2021-11-21 | 155 ## [v0.8.5] - 2021-11-21 |
136 [v0.8.5]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.4...v0.8.5 | 156 [v0.8.5]: https://github.com/zellerlab/GECCO/compare/v0.8.4...v0.8.5 |
137 ### Added | 157 ### Added |
138 - Minimal compatibility support for running GECCO inside of Galaxy workflows. | 158 - Minimal compatibility support for running GECCO inside of Galaxy workflows. |
139 | 159 |
140 ## [v0.8.4] - 2021-09-26 | 160 ## [v0.8.4] - 2021-09-26 |
141 [v0.8.4]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.3-post1...v0.8.4 | 161 [v0.8.4]: https://github.com/zellerlab/GECCO/compare/v0.8.3-post1...v0.8.4 |
142 ### Fixed | 162 ### Fixed |
143 - `gecco convert gbk --format bigslice` failing to run because of outdated code ([#5](https://github.com/zellerlab/GECCO/issues/5)). | 163 - `gecco convert gbk --format bigslice` failing to run because of outdated code ([#5](https://github.com/zellerlab/GECCO/issues/5)). |
144 - `gecco convert gbk --format bigslice` not creating files with names conforming to BiG-SLiCE expected input. | 164 - `gecco convert gbk --format bigslice` not creating files with names conforming to BiG-SLiCE expected input. |
145 ### Changed | 165 ### Changed |
146 - Bump minimum `pyrodigal` version to `v0.6.2` to use platform-accelerated code if supported. | 166 - Bump minimum `pyrodigal` version to `v0.6.2` to use platform-accelerated code if supported. |
147 | 167 |
148 ## [v0.8.3-post1] - 2021-08-23 | 168 ## [v0.8.3-post1] - 2021-08-23 |
149 [v0.8.3-post1]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.3...v0.8.3-post1 | 169 [v0.8.3-post1]: https://github.com/zellerlab/GECCO/compare/v0.8.3...v0.8.3-post1 |
150 ### Fixed | 170 ### Fixed |
151 - Wrong default value for `--threshold` being shown in `gecco run` help message. | 171 - Wrong default value for `--threshold` being shown in `gecco run` help message. |
152 | 172 |
153 ## [v0.8.3] - 2021-08-23 | 173 ## [v0.8.3] - 2021-08-23 |
154 [v0.8.3]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.2...v0.8.3 | 174 [v0.8.3]: https://github.com/zellerlab/GECCO/compare/v0.8.2...v0.8.3 |
155 ### Changed | 175 ### Changed |
156 - Default probability threshold for segmentation to 0.3 (from 0.4). | 176 - Default probability threshold for segmentation to 0.3 (from 0.4). |
157 | 177 |
158 ## [v0.8.2] - 2021-07-31 | 178 ## [v0.8.2] - 2021-07-31 |
159 [v0.8.2]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.1...v0.8.2 | 179 [v0.8.2]: https://github.com/zellerlab/GECCO/compare/v0.8.1...v0.8.2 |
160 ### Fixed | 180 ### Fixed |
161 - `gecco run` crashing on Python 3.6 because of missing `contextlib.nullcontext` class. | 181 - `gecco run` crashing on Python 3.6 because of missing `contextlib.nullcontext` class. |
162 ### Changed | 182 ### Changed |
163 - `gecco run` and `gecco annotate` will not try to count the number of profiles when given an external HMM file with the `--hmm` flag. | 183 - `gecco run` and `gecco annotate` will not try to count the number of profiles when given an external HMM file with the `--hmm` flag. |
164 - `PyHMMER.run` now reports the *p-value* of each domain in addition to the *e-value* as a `/note` qualifier. | 184 - `PyHMMER.run` now reports the *p-value* of each domain in addition to the *e-value* as a `/note` qualifier. |
165 | 185 |
166 ## [v0.8.1] - 2021-07-29 | 186 ## [v0.8.1] - 2021-07-29 |
167 [v0.8.1]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.0...v0.8.1 | 187 [v0.8.1]: https://github.com/zellerlab/GECCO/compare/v0.8.0...v0.8.1 |
168 ### Changed | 188 ### Changed |
169 - `gecco run` now filters out unneeded features before annotating, making it easier to analyze the results of a run with a custom `--model`. | 189 - `gecco run` now filters out unneeded features before annotating, making it easier to analyze the results of a run with a custom `--model`. |
170 ### Fixed | 190 ### Fixed |
171 - `gecco` reporting about using Pfam `v33.1` while actually using `v34.0` because of an outdated field in `gecco/hmmer/Pfam.ini`. | 191 - `gecco` reporting about using Pfam `v33.1` while actually using `v34.0` because of an outdated field in `gecco/hmmer/Pfam.ini`. |
172 ### Added | 192 ### Added |
173 - Missing documentation for the `strand` attribute of `gecco.model.Gene`. | 193 - Missing documentation for the `strand` attribute of `gecco.model.Gene`. |
174 | 194 |
175 ## [v0.8.0] - 2021-07-03 | 195 ## [v0.8.0] - 2021-07-03 |
176 [v0.8.0]: https://git.embl.de/grp-zeller/GECCO/compare/v0.7.0...v0.8.0 | 196 [v0.8.0]: https://github.com/zellerlab/GECCO/compare/v0.7.0...v0.8.0 |
177 ### Changed | 197 ### Changed |
178 - Retrain internal model using new sequence embeddings and remove broken/duplicate BGCs from MIBiG 2.0. | 198 - Retrain internal model using new sequence embeddings and remove broken/duplicate BGCs from MIBiG 2.0. |
179 - Bump minimum `pyhmmer` version to `v0.4.0` to improve exception handling. | 199 - Bump minimum `pyhmmer` version to `v0.4.0` to improve exception handling. |
180 - Bump minimum `pyrodigal` version to `v0.5.0` to fix sequence decoding on some platforms. | 200 - Bump minimum `pyrodigal` version to `v0.5.0` to fix sequence decoding on some platforms. |
181 - Use p-values instead of e-values to filter domains obtained with HMMER. | 201 - Use p-values instead of e-values to filter domains obtained with HMMER. |
193 - Outdated `gecco embed` command. | 213 - Outdated `gecco embed` command. |
194 - Unused `--truncate` flag from the `gecco train` CLI. | 214 - Unused `--truncate` flag from the `gecco train` CLI. |
195 - Tigrfam domains, which are not improving performance on the new training data. | 215 - Tigrfam domains, which are not improving performance on the new training data. |
196 | 216 |
197 ## [v0.7.0] - 2021-05-31 | 217 ## [v0.7.0] - 2021-05-31 |
198 [v0.7.0]: https://git.embl.de/grp-zeller/GECCO/compare/v0.6.3...v0.7.0 | 218 [v0.7.0]: https://github.com/zellerlab/GECCO/compare/v0.6.3...v0.7.0 |
199 ### Added | 219 ### Added |
200 - Support for writing an AntiSMASH sideload JSON file after a `gecco run` workflow. | 220 - Support for writing an AntiSMASH sideload JSON file after a `gecco run` workflow. |
201 - Code for converting GenBank files in BiG-SLiCE compatible format with the `gecco convert` subcommand. | 221 - Code for converting GenBank files in BiG-SLiCE compatible format with the `gecco convert` subcommand. |
202 - Documentation about using GECCO in combination with AntiSMASH or BiG-SLiCE. | 222 - Documentation about using GECCO in combination with AntiSMASH or BiG-SLiCE. |
203 ### Changed | 223 ### Changed |
205 - Internal domain composition shipped in the `gecco.types` with newer composition array obtained directly from MIBiG files. | 225 - Internal domain composition shipped in the `gecco.types` with newer composition array obtained directly from MIBiG files. |
206 ### Removed | 226 ### Removed |
207 - Outdated notice about `-vvv` verbosity level in the help message of the main `gecco` command. | 227 - Outdated notice about `-vvv` verbosity level in the help message of the main `gecco` command. |
208 | 228 |
209 ## [v0.6.3] - 2021-05-10 | 229 ## [v0.6.3] - 2021-05-10 |
210 [v0.6.3]: https://git.embl.de/grp-zeller/GECCO/compare/v0.6.2...v0.6.3 | 230 [v0.6.3]: https://github.com/zellerlab/GECCO/compare/v0.6.2...v0.6.3 |
211 ### Fixed | 231 ### Fixed |
212 - HMMER annotation not properly handling inputs with multiple contigs. | 232 - HMMER annotation not properly handling inputs with multiple contigs. |
213 - Some progress bar totals displaying as floats in the CLI. | 233 - Some progress bar totals displaying as floats in the CLI. |
214 ### Changed | 234 ### Changed |
215 - `PyHMMER` now sets the `Z` and `domZ` values from the number of proteins given to the search pipeline. | 235 - `PyHMMER` now sets the `Z` and `domZ` values from the number of proteins given to the search pipeline. |
216 - `gecco.cli` delegates imports to make CLI more responsive. | 236 - `gecco.cli` delegates imports to make CLI more responsive. |
217 - `pkg_resources` has been replaced with `importlib.resources` and `importlib.metadata` where applicable. | 237 - `pkg_resources` has been replaced with `importlib.resources` and `importlib.metadata` where applicable. |
218 - `multiprocessing.cpu_count` has been replaced with `os.cpu_count` where applicable. | 238 - `multiprocessing.cpu_count` has been replaced with `os.cpu_count` where applicable. |
219 | 239 |
220 ## [v0.6.2] - 2021-05-04 | 240 ## [v0.6.2] - 2021-05-04 |
221 [v0.6.2]: https://git.embl.de/grp-zeller/GECCO/compare/v0.6.1...v0.6.2 | 241 [v0.6.2]: https://github.com/zellerlab/GECCO/compare/v0.6.1...v0.6.2 |
222 ### Fixed | 242 ### Fixed |
223 - `gecco cv loto` crashing because of outdated code. | 243 - `gecco cv loto` crashing because of outdated code. |
224 ### Changed | 244 ### Changed |
225 - Logging-style prompt will only display if GECCO is running with `-vv` flag. | 245 - Logging-style prompt will only display if GECCO is running with `-vv` flag. |
226 ### Added | 246 ### Added |
227 - GECCO bioRxiv paper reference to `Cluster.to_seq_record` output record. | 247 - GECCO bioRxiv paper reference to `Cluster.to_seq_record` output record. |
228 | 248 |
229 ## [v0.6.1] - 2021-03-15 | 249 ## [v0.6.1] - 2021-03-15 |
230 [v0.6.1]: https://git.embl.de/grp-zeller/GECCO/compare/v0.6.0...v0.6.1 | 250 [v0.6.1]: https://github.com/zellerlab/GECCO/compare/v0.6.0...v0.6.1 |
231 ### Fixed | 251 ### Fixed |
232 - Progress bar not being disabled by `-q` flag in CLI. | 252 - Progress bar not being disabled by `-q` flag in CLI. |
233 - Fallback to using HMM name if accession is not available in `PyHMMER`. | 253 - Fallback to using HMM name if accession is not available in `PyHMMER`. |
234 - Group genes by source contig and process them separately in `PyHMMER` to avoid bogus E-values. | 254 - Group genes by source contig and process them separately in `PyHMMER` to avoid bogus E-values. |
235 ### Added | 255 ### Added |
237 - Support for using an arbitrary mapping of positives to negatives in `gecco embed`. | 257 - Support for using an arbitrary mapping of positives to negatives in `gecco embed`. |
238 ### Removed | 258 ### Removed |
239 - Unused and outdated `HMMER` and `DomainRow` classes from `gecco.hmmer`. | 259 - Unused and outdated `HMMER` and `DomainRow` classes from `gecco.hmmer`. |
240 | 260 |
241 ## [v0.6.0] - 2021-02-28 | 261 ## [v0.6.0] - 2021-02-28 |
242 [v0.6.0]: https://git.embl.de/grp-zeller/GECCO/compare/v0.5.5...v0.6.0 | 262 [v0.6.0]: https://github.com/zellerlab/GECCO/compare/v0.5.5...v0.6.0 |
243 ### Changed | 263 ### Changed |
244 - Updated internal model with a cleaned-up version of the MIBiG-2.0 | 264 - Updated internal model with a cleaned-up version of the MIBiG-2.0 |
245 Pfam-33.1/Tigrfam-15.0 embedding. | 265 Pfam-33.1/Tigrfam-15.0 embedding. |
246 - Updated internal InterPro catalog. | 266 - Updated internal InterPro catalog. |
247 ### Fixed | 267 ### Fixed |
248 - Features not being grouped together in `gecco cv` and `gecco train` | 268 - Features not being grouped together in `gecco cv` and `gecco train` |
249 when provided with a feature table where rows were not sorted by | 269 when provided with a feature table where rows were not sorted by |
250 protein IDs. | 270 protein IDs. |
251 | 271 |
252 ## [v0.5.5] - 2021-02-28 | 272 ## [v0.5.5] - 2021-02-28 |
253 [v0.5.5]: https://git.embl.de/grp-zeller/GECCO/compare/v0.5.4...v0.5.5 | 273 [v0.5.5]: https://github.com/zellerlab/GECCO/compare/v0.5.4...v0.5.5 |
254 ### Fixed | 274 ### Fixed |
255 - `gecco cv` bug causing only the last fold to be written. | 275 - `gecco cv` bug causing only the last fold to be written. |
256 | 276 |
257 ## [v0.5.4] - 2021-02-28 | 277 ## [v0.5.4] - 2021-02-28 |
258 [v0.5.4]: https://git.embl.de/grp-zeller/GECCO/compare/v0.5.3...v0.5.4 | 278 [v0.5.4]: https://github.com/zellerlab/GECCO/compare/v0.5.3...v0.5.4 |
259 ### Changed | 279 ### Changed |
260 - Replaced `verboselogs`, `coloredlogs` and `better-exceptions` with `rich`. | 280 - Replaced `verboselogs`, `coloredlogs` and `better-exceptions` with `rich`. |
261 ### Removed | 281 ### Removed |
262 - `tqdm` training dependency. | 282 - `tqdm` training dependency. |
263 ### Added | 283 ### Added |
264 - `gecco annotate` command to produce a feature table from a genomic file. | 284 - `gecco annotate` command to produce a feature table from a genomic file. |
265 - `gecco embed` to embed BGCs into non-BGC regions using feature tables. | 285 - `gecco embed` to embed BGCs into non-BGC regions using feature tables. |
266 | 286 |
267 ## [v0.5.3] - 2021-02-21 | 287 ## [v0.5.3] - 2021-02-21 |
268 [v0.5.3]: https://git.embl.de/grp-zeller/GECCO/compare/v0.5.2...v0.5.3 | 288 [v0.5.3]: https://github.com/zellerlab/GECCO/compare/v0.5.2...v0.5.3 |
269 ### Fixed | 289 ### Fixed |
270 - Coordinates of genes in output GenBank files. | 290 - Coordinates of genes in output GenBank files. |
271 - Potential issue with the number of CPUs in `PyHMMER.run`. | 291 - Potential issue with the number of CPUs in `PyHMMER.run`. |
272 ### Changed | 292 ### Changed |
273 - Bump required `pyrodigal` version to `v0.4.2` to fix buffer overflow. | 293 - Bump required `pyrodigal` version to `v0.4.2` to fix buffer overflow. |
274 | 294 |
275 ## [v0.5.2] - 2021-01-29 | 295 ## [v0.5.2] - 2021-01-29 |
276 [v0.5.2]: https://git.embl.de/grp-zeller/GECCO/compare/v0.5.1...v0.5.2 | 296 [v0.5.2]: https://github.com/zellerlab/GECCO/compare/v0.5.1...v0.5.2 |
277 ### Added | 297 ### Added |
278 - Support for downloading HMM files directly from GitHub releases assets. | 298 - Support for downloading HMM files directly from GitHub releases assets. |
279 - Validation of filtered HMMs with MD5 checksum. | 299 - Validation of filtered HMMs with MD5 checksum. |
280 ### Fixed | 300 ### Fixed |
281 - Invalid coordinates of protein domains in GenBank output files. | 301 - Invalid coordinates of protein domains in GenBank output files. |
282 - `gecco.interpro` module not being added to wheel distribution. | 302 - `gecco.interpro` module not being added to wheel distribution. |
283 ### Changed | 303 ### Changed |
284 - Bump required `pyhmmer` version to `v0.2.1`. | 304 - Bump required `pyhmmer` version to `v0.2.1`. |
285 | 305 |
286 ## [v0.5.1] - 2021-01-15 | 306 ## [v0.5.1] - 2021-01-15 |
287 [v0.5.1]: https://git.embl.de/grp-zeller/GECCO/compare/v0.5.0...v0.5.1 | 307 [v0.5.1]: https://github.com/zellerlab/GECCO/compare/v0.5.0...v0.5.1 |
288 ### Fixed | 308 ### Fixed |
289 - `--hmm` flag being ignored in `gecco run` command. | 309 - `--hmm` flag being ignored in `gecco run` command. |
290 - `PyHMMER` using HMM names instead of accessions, causing issues with Pfam HMMs. | 310 - `PyHMMER` using HMM names instead of accessions, causing issues with Pfam HMMs. |
291 | 311 |
292 ## [v0.5.0] - 2021-01-11 | 312 ## [v0.5.0] - 2021-01-11 |
293 [v0.5.0]: https://git.embl.de/grp-zeller/GECCO/compare/v0.4.5...v0.5.0 | 313 [v0.5.0]: https://github.com/zellerlab/GECCO/compare/v0.4.5...v0.5.0 |
294 ### Added | 314 ### Added |
295 - Explicit support for Python 3.9. | 315 - Explicit support for Python 3.9. |
296 ### Changed | 316 ### Changed |
297 - [`pyhmmer`](https://pypi.org/project/pyhmmer) is used to annotate protein sequences instead of HMMER3 binary `hmmsearch`. | 317 - [`pyhmmer`](https://pypi.org/project/pyhmmer) is used to annotate protein sequences instead of HMMER3 binary `hmmsearch`. |
298 - HMM files are stored in binary format to speed up parsing and reduce storage size. | 318 - HMM files are stored in binary format to speed up parsing and reduce storage size. |
299 - `tqdm` is now a *training*-only dependency. | 319 - `tqdm` is now a *training*-only dependency. |
300 - `gecco cv` now requires *training* dependencies. | 320 - `gecco cv` now requires *training* dependencies. |
301 | 321 |
302 ## [v0.4.5] - 2020-11-23 | 322 ## [v0.4.5] - 2020-11-23 |
303 [v0.4.5]: https://git.embl.de/grp-zeller/GECCO/compare/v0.4.4...v0.4.5 | 323 [v0.4.5]: https://github.com/zellerlab/GECCO/compare/v0.4.4...v0.4.5 |
304 ### Added | 324 ### Added |
305 - Additional `fold` column to cross-validation table output. | 325 - Additional `fold` column to cross-validation table output. |
306 ### Changed | 326 ### Changed |
307 - Use sequence ID instead of protein ID to extract type from cluster in `gecco cv`. | 327 - Use sequence ID instead of protein ID to extract type from cluster in `gecco cv`. |
308 - Install HMM data in pre-pressed format to make `hmmsearch` runs faster on short sequences. | 328 - Install HMM data in pre-pressed format to make `hmmsearch` runs faster on short sequences. |
309 - `gecco.orf` was rewritten to extract genes from input sequences in parallel. | 329 - `gecco.orf` was rewritten to extract genes from input sequences in parallel. |
310 | 330 |
311 ## [v0.4.4] - 2020-09-30 | 331 ## [v0.4.4] - 2020-09-30 |
312 [v0.4.4]: https://git.embl.de/grp-zeller/GECCO/compare/v0.4.3...v0.4.4 | 332 [v0.4.4]: https://github.com/zellerlab/GECCO/compare/v0.4.3...v0.4.4 |
313 ### Added | 333 ### Added |
314 - `gecco cv loto` command to run LOTO cross-validation using BGC types | 334 - `gecco cv loto` command to run LOTO cross-validation using BGC types |
315 for stratification. | 335 for stratification. |
316 - `header` keyword argument to `FeatureTable.dump` and `ClusterTable.dump` | 336 - `header` keyword argument to `FeatureTable.dump` and `ClusterTable.dump` |
317 to write the table without the column header allowing to append to an | 337 to write the table without the column header allowing to append to an |
323 the tables for every fold in memory. | 343 the tables for every fold in memory. |
324 ### Changed | 344 ### Changed |
325 - Bumped `pandas` training dependency to `v1.0`. | 345 - Bumped `pandas` training dependency to `v1.0`. |
326 | 346 |
327 ## [v0.4.3] - 2020-09-07 | 347 ## [v0.4.3] - 2020-09-07 |
328 [v0.4.3]: https://git.embl.de/grp-zeller/GECCO/compare/v0.4.2...v0.4.3 | 348 [v0.4.3]: https://github.com/zellerlab/GECCO/compare/v0.4.2...v0.4.3 |
329 ### Fixed | 349 ### Fixed |
330 - GenBank files being written with invalid `/cds` feature type. | 350 - GenBank files being written with invalid `/cds` feature type. |
331 ### Changed | 351 ### Changed |
332 - Blocked installation of Biopython `v1.78` or newer as it removes `Bio.Alphabet` | 352 - Blocked installation of Biopython `v1.78` or newer as it removes `Bio.Alphabet` |
333 and breaks the current code. | 353 and breaks the current code. |
334 | 354 |
335 ## [v0.4.2] - 2020-08-07 | 355 ## [v0.4.2] - 2020-08-07 |
336 [v0.4.2]: https://git.embl.de/grp-zeller/GECCO/compare/v0.4.1...v0.4.2 | 356 [v0.4.2]: https://github.com/zellerlab/GECCO/compare/v0.4.1...v0.4.2 |
337 ### Fixed | 357 ### Fixed |
338 - `TypeClassifier.predict_types` using inverse type probabilities when | 358 - `TypeClassifier.predict_types` using inverse type probabilities when |
339 given several clusters to process. | 359 given several clusters to process. |
340 | 360 |
341 ## [v0.4.1] - 2020-08-07 | 361 ## [v0.4.1] - 2020-08-07 |
342 [v0.4.1]: https://git.embl.de/grp-zeller/GECCO/compare/v0.4.0...v0.4.1 | 362 [v0.4.1]: https://github.com/zellerlab/GECCO/compare/v0.4.0...v0.4.1 |
343 ### Fixed | 363 ### Fixed |
344 - `gecco run` command crashing on input sequences not containing any genes. | 364 - `gecco run` command crashing on input sequences not containing any genes. |
345 | 365 |
346 ## [v0.4.0] - 2020-08-06 | 366 ## [v0.4.0] - 2020-08-06 |
347 [v0.4.0]: https://git.embl.de/grp-zeller/GECCO/compare/v0.3.0...v0.4.0 | 367 [v0.4.0]: https://github.com/zellerlab/GECCO/compare/v0.3.0...v0.4.0 |
348 ### Added | 368 ### Added |
349 - `gecco.model.ProductType` enum to model the biosynthetic class of a BGC. | 369 - `gecco.model.ProductType` enum to model the biosynthetic class of a BGC. |
350 ### Removed | 370 ### Removed |
351 - `pandas` interaction from internal data model. | 371 - `pandas` interaction from internal data model. |
352 - `ClusterCRF` code specific to cross-validation. | 372 - `ClusterCRF` code specific to cross-validation. |
354 - `pandas`, `fisher` and `statsmodels` dependencies are now optional. | 374 - `pandas`, `fisher` and `statsmodels` dependencies are now optional. |
355 - `gecco train` command expects a cluster table in addition to the feature | 375 - `gecco train` command expects a cluster table in addition to the feature |
356 table to know the types of the input BGCs. | 376 table to know the types of the input BGCs. |
357 | 377 |
358 ## [v0.3.0] - 2020-08-03 | 378 ## [v0.3.0] - 2020-08-03 |
359 [v0.3.0]: https://git.embl.de/grp-zeller/GECCO/compare/v0.2.2...v0.3.0 | 379 [v0.3.0]: https://github.com/zellerlab/GECCO/compare/v0.2.2...v0.3.0 |
360 ### Changed | 380 ### Changed |
361 - Replaced Nearest-Neighbours classifier with Random Forest to perform type | 381 - Replaced Nearest-Neighbours classifier with Random Forest to perform type |
362 prediction for candidate BGCs. | 382 prediction for candidate BGCs. |
363 - `gecco.knn` module was renamed to implementation-agnostic name `gecco.types`. | 383 - `gecco.knn` module was renamed to implementation-agnostic name `gecco.types`. |
364 ### Fixed | 384 ### Fixed |
365 - Extraction of domain composition taking a long time in `gecco train` command. | 385 - Extraction of domain composition taking a long time in `gecco train` command. |
366 ### Removed | 386 ### Removed |
367 - `--metric` argument to the `gecco run` CLI command. | 387 - `--metric` argument to the `gecco run` CLI command. |
368 | 388 |
369 ## [v0.2.2] - 2020-07-31 | 389 ## [v0.2.2] - 2020-07-31 |
370 [v0.2.2]: https://git.embl.de/grp-zeller/GECCO/compare/v0.2.1...v0.2.2 | 390 [v0.2.2]: https://github.com/zellerlab/GECCO/compare/v0.2.1...v0.2.2 |
371 ### Changed | 391 ### Changed |
372 - `Domain` and `Gene` can now carry qualifiers that are used when they | 392 - `Domain` and `Gene` can now carry qualifiers that are used when they |
373 are translated to a sequence feature. | 393 are translated to a sequence feature. |
374 ### Added | 394 ### Added |
375 - InterPro names, accessions, and HMMER e-value for each annotated domain | 395 - InterPro names, accessions, and HMMER e-value for each annotated domain |
376 in GenBank output files. | 396 in GenBank output files. |
377 | 397 |
378 ## [v0.2.1] - 2020-07-23 | 398 ## [v0.2.1] - 2020-07-23 |
379 [v0.2.1]: https://git.embl.de/grp-zeller/GECCO/compare/v0.2.0...v0.2.1 | 399 [v0.2.1]: https://github.com/zellerlab/GECCO/compare/v0.2.0...v0.2.1 |
380 ### Fixed | 400 ### Fixed |
381 - Various potential crashes in `ClusterRefiner` code. | 401 - Various potential crashes in `ClusterRefiner` code. |
382 ### Removed | 402 ### Removed |
383 - Unneeded feature dictionary filtering in `ClusterCRF` for models with | 403 - Unneeded feature dictionary filtering in `ClusterCRF` for models with |
384 Fisher Exact Test feature selection. | 404 Fisher Exact Test feature selection. |
385 | 405 |
386 ## [v0.2.0] - 2020-07-23 | 406 ## [v0.2.0] - 2020-07-23 |
387 [v0.2.0]: https://git.embl.de/grp-zeller/GECCO/compare/v0.1.1...v0.2.0 | 407 [v0.2.0]: https://github.com/zellerlab/GECCO/compare/v0.1.1...v0.2.0 |
388 ### Fixed | 408 ### Fixed |
389 - `pandas` warning about unsorted columns in `gecco run`. | 409 - `pandas` warning about unsorted columns in `gecco run`. |
390 ### Removed | 410 ### Removed |
391 - `Gene.probability` property, replaced by `Gene.maximum_probability` and | 411 - `Gene.probability` property, replaced by `Gene.maximum_probability` and |
392 `Gene.average_probability` properties to be explicit. | 412 `Gene.average_probability` properties to be explicit. |
395 selected with Fisher's Exact Test. | 415 selected with Fisher's Exact Test. |
396 - `ClusterRefiner` now removes genes on `Cluster` edges if they do not | 416 - `ClusterRefiner` now removes genes on `Cluster` edges if they do not |
397 contain any domain annotation. | 417 contain any domain annotation. |
398 | 418 |
399 ## [v0.1.1] - 2020-07-22 | 419 ## [v0.1.1] - 2020-07-22 |
400 [v0.1.1]: https://git.embl.de/grp-zeller/GECCO/compare/v0.1.0...v0.1.1 | 420 [v0.1.1]: https://github.com/zellerlab/GECCO/compare/v0.1.0...v0.1.1 |
401 ### Added | 421 ### Added |
402 - `ClusterCRF.predict_probabilities` to annotate a list of `Gene`. | 422 - `ClusterCRF.predict_probabilities` to annotate a list of `Gene`. |
403 ### Changed | 423 ### Changed |
404 - BGC probability is now stored at the `Domain` level instead of at the `Gene` | 424 - BGC probability is now stored at the `Domain` level instead of at the `Gene` |
405 level, independently of the feature extraction level used by the CRF. | 425 level, independently of the feature extraction level used by the CRF. |
408 - Added this changelog file to document changes in the code. | 428 - Added this changelog file to document changes in the code. |
409 - Added documentation to `gecco` submodules that lacked it. | 429 - Added documentation to `gecco` submodules that lacked it. |
410 - Included the `CHANGELOG.md` file in the generated docs. | 430 - Included the `CHANGELOG.md` file in the generated docs. |
411 | 431 |
412 ## [v0.1.0] - 2020-07-17 | 432 ## [v0.1.0] - 2020-07-17 |
413 [v0.1.0]: https://git.embl.de/grp-zeller/GECCO/compare/v0.0.1...v0.1.0 | 433 [v0.1.0]: https://github.com/zellerlab/GECCO/compare/v0.0.1...v0.1.0 |
414 Initial release. | 434 Initial release. |
415 | 435 |
416 ## [v0.0.1] - 2018-08-13 | 436 ## [v0.0.1] - 2018-08-13 |
417 [v0.0.1]: https://git.embl.de/grp-zeller/GECCO/compare/37afb97...v0.0.1 | 437 [v0.0.1]: https://github.com/zellerlab/GECCO/compare/37afb97...v0.0.1 |
418 Proof-of-concept. | 438 Proof-of-concept. |