comparison CHANGELOG.md @ 3:359232b58f6a draft

"Update Galaxy tool wrapper to follow the IUC best practices"
author althonos
date Sun, 21 Nov 2021 19:47:22 +0000
parents
children 169849dfb098
2:e618ab1c78d9 3:359232b58f6a
1 # Changelog
2 All notable changes to this project will be documented in this file.
3
4 The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
5 and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).
6
7 ## [Unreleased]
8 [Unreleased]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.5...master
9
10 ## [v0.8.5] - 2021-11-21
11 [v0.8.5]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.4...v0.8.5
12 ### Added
13 - Minimal compatibility support for running GECCO inside of Galaxy workflows.
14
15 ## [v0.8.4] - 2021-09-26
16 [v0.8.4]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.3-post1...v0.8.4
17 ### Fixed
18 - `gecco convert gbk --format bigslice` failing to run because of outdated code ([#5](https://github.com/zellerlab/GECCO/issues/5)).
19 - `gecco convert gbk --format bigslice` not creating files with names conforming to BiG-SLiCE expected input.
20 ### Changed
21 - Bump minimum `pyrodigal` version to `v0.6.2` to use platform-accelerated code if supported.
22
23 ## [v0.8.3-post1] - 2021-08-23
24 [v0.8.3-post1]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.3...v0.8.3-post1
25 ### Fixed
26 - Wrong default value for `--threshold` being shown in `gecco run` help message.
27
28 ## [v0.8.3] - 2021-08-23
29 [v0.8.3]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.2...v0.8.3
30 ### Changed
31 - Default probability threshold for segmentation to 0.3 (from 0.4).
32
33 ## [v0.9.0] - 2021-08-10 - **YANKED**
34 [v0.9.0]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.2...v0.9.0
35 ### Changed
36 - Retrain internal model using `--select=0.35` instead of `--select=0.25`.
37 - Change default *p-value* filter from 1e-9 to 1e-5 to detect more features.
38
39 ## [v0.8.2] - 2021-07-31
40 [v0.8.2]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.1...v0.8.2
41 ### Fixed
42 - `gecco run` crashing on Python 3.6 because of missing `contextlib.nullcontext` class.
43 ### Changed
44 - `gecco run` and `gecco annotate` will not try to count the number of profiles when given an external HMM file with the `--hmm` flag.
45 - `PyHMMER.run` now reports the *p-value* of each domain in addition to the *e-value* as a `/note` qualifier.
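The Python 3.6 crash fixed above stems from `contextlib.nullcontext` only being available from Python 3.7 onwards. A minimal sketch of one common compatibility fallback follows; it illustrates the general technique and is not necessarily how GECCO resolved the issue. The `maybe_open` helper is hypothetical.

```python
import contextlib

try:
    # Available in the standard library since Python 3.7.
    from contextlib import nullcontext
except ImportError:
    # Fallback for Python 3.6: a context manager that does nothing
    # and simply hands back the value it was given.
    @contextlib.contextmanager
    def nullcontext(enter_result=None):
        yield enter_result


def maybe_open(path=None, default=None):
    """Hypothetical helper: open `path` if given, else pass `default` through."""
    return open(path) if path is not None else nullcontext(default)
```

With this in place, `with maybe_open(default=value) as handle:` behaves the same on every supported interpreter.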
46
47 ## [v0.8.1] - 2021-07-29
48 [v0.8.1]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.0...v0.8.1
49 ### Changed
50 - `gecco run` now filters out unneeded features before annotating, making it easier to analyze the results of a run with a custom `--model`.
51 ### Fixed
52 - `gecco` reporting Pfam `v33.1` while actually using `v34.0` because of an outdated field in `gecco/hmmer/Pfam.ini`.
53 ### Added
54 - Missing documentation for the `strand` attribute of `gecco.model.Gene`.
55
56 ## [v0.8.0] - 2021-07-03
57 [v0.8.0]: https://git.embl.de/grp-zeller/GECCO/compare/v0.7.0...v0.8.0
58 ### Changed
59 - Retrain internal model using new sequence embeddings and remove broken/duplicate BGCs from MIBiG 2.0.
60 - Bump minimum `pyhmmer` version to `v0.4.0` to improve exception handling.
61 - Bump minimum `pyrodigal` version to `v0.5.0` to fix sequence decoding on some platforms.
62 - Use p-values instead of e-values to filter domains obtained with HMMER.
63 - `gecco cv` and `gecco train` now seed the RNG with a user-defined seed before shuffling rows of training data.
64 ### Fixed
65 - Extraction of BGC compositions for the type predictor while training.
66 - `ClusterCRF.trained` failing to open an external model.
67 ### Added
68 - `Domain.pvalue` attribute to access the p-value of a domain annotation.
69 - Mandatory `pvalue` column to `FeatureTable` objects.
70 - Support for loading several feature tables in `gecco train` and `gecco cv`.
71 - Warnings to `ClusterCRF.fit` when selecting uninformative features.
72 - `--correction` flag to `gecco train` and `gecco cv`, to select a multiple testing correction method when computing p-values with Fisher's Exact Test.
73 ### Removed
74 - Outdated `gecco embed` command.
75 - Unused `--truncate` flag from the `gecco train` CLI.
76 - Tigrfam domains, which are not improving performance on the new training data.
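Seeding the RNG before shuffling, as `gecco cv` and `gecco train` do since this release, makes the row order reproducible across runs. The sketch below shows the general idea with the standard library only; the function name `shuffled_rows` and its arguments are assumptions for illustration, not GECCO's actual code.

```python
import random


def shuffled_rows(rows, seed):
    """Return a shuffled copy of `rows`, reproducible for a given `seed`."""
    # A dedicated Random instance avoids touching the global RNG state.
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    return shuffled
```

Calling `shuffled_rows(rows, seed)` twice with the same seed yields the same order, so cross-validation folds can be compared between runs.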
77
78 ## [v0.7.0] - 2021-05-31
79 [v0.7.0]: https://git.embl.de/grp-zeller/GECCO/compare/v0.6.3...v0.7.0
80 ### Added
81 - Support for writing an AntiSMASH sideload JSON file after a `gecco run` workflow.
82 - Code for converting GenBank files to a BiG-SLiCE-compatible format with the `gecco convert` subcommand.
83 - Documentation about using GECCO in combination with AntiSMASH or BiG-SLiCE.
84 ### Changed
85 - Minimum Biopython version to `v1.73` for compatibility with older bioinformatics tooling.
86 - Internal domain composition shipped in `gecco.types` replaced with a newer composition array obtained directly from MIBiG files.
87 ### Removed
88 - Outdated notice about `-vvv` verbosity level in the help message of the main `gecco` command.
89
90 ## [v0.6.3] - 2021-05-10
91 [v0.6.3]: https://git.embl.de/grp-zeller/GECCO/compare/v0.6.2...v0.6.3
92 ### Fixed
93 - HMMER annotation not properly handling inputs with multiple contigs.
94 - Some progress bar totals displaying as floats in the CLI.
95 ### Changed
96 - `PyHMMER` now sets the `Z` and `domZ` values from the number of proteins given to the search pipeline.
97 - `gecco.cli` delegates imports to make CLI more responsive.
98 - `pkg_resources` has been replaced with `importlib.resources` and `importlib.metadata` where applicable.
99 - `multiprocessing.cpu_count` has been replaced with `os.cpu_count` where applicable.
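The two standard-library replacements listed above can be sketched as follows. This is an illustration under assumptions, not GECCO's code: the helper names `installed_version` and `available_cpus` are hypothetical, and `importlib.metadata` requires Python 3.8 or newer.

```python
import os
from importlib.metadata import PackageNotFoundError, version


def installed_version(name):
    """Return the installed version of distribution `name`, or None if absent."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def available_cpus():
    """Return the CPU count, falling back to 1 when it cannot be determined."""
    # Unlike multiprocessing.cpu_count, os.cpu_count returns None
    # instead of raising when the count is unknown.
    return os.cpu_count() or 1
```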
100
101 ## [v0.6.2] - 2021-05-04
102 [v0.6.2]: https://git.embl.de/grp-zeller/GECCO/compare/v0.6.1...v0.6.2
103 ### Fixed
104 - `gecco cv loto` crashing because of outdated code.
105 ### Changed
106 - Logging-style prompt will only display if GECCO is running with `-vv` flag.
107 ### Added
108 - GECCO bioRxiv paper reference to `Cluster.to_seq_record` output record.
109
110 ## [v0.6.1] - 2021-03-15
111 [v0.6.1]: https://git.embl.de/grp-zeller/GECCO/compare/v0.6.0...v0.6.1
112 ### Fixed
113 - Progress bar not being disabled by `-q` flag in CLI.
114 - Fallback to using HMM name if accession is not available in `PyHMMER`.
115 - Group genes by source contig and process them separately in `PyHMMER` to avoid bogus E-values.
116 ### Added
117 - `psutil` dependency to get the number of physical CPU cores on the host machine.
118 - Support for using an arbitrary mapping of positives to negatives in `gecco embed`.
119 ### Removed
120 - Unused and outdated `HMMER` and `DomainRow` classes from `gecco.hmmer`.
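The "bogus E-values" fix above is easier to follow with the relationship between the two scores in mind: in HMMER, an E-value is the p-value of a hit scaled by the size of the search space (`Z` for sequences, `domZ` for domains). The sketch below only illustrates that arithmetic; the function name and the numbers are made up for the example.

```python
def evalue(pvalue, z):
    """Expected number of false positives at this score among `z` targets."""
    return pvalue * z


# The same hit looks less significant when the search space is larger.
small_search = evalue(1e-9, 1_000)
large_search = evalue(1e-9, 100_000)
```

Because the E-value depends on how many targets were searched, the same domain hit can receive different E-values depending on how the input is batched, whereas its p-value stays the same.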
121
122 ## [v0.6.0] - 2021-02-28
123 [v0.6.0]: https://git.embl.de/grp-zeller/GECCO/compare/v0.5.5...v0.6.0
124 ### Changed
125 - Updated internal model with a cleaned-up version of the MIBiG-2.0
126 Pfam-33.1/Tigrfam-15.0 embedding.
127 - Updated internal InterPro catalog.
128 ### Fixed
129 - Features not being grouped together in `gecco cv` and `gecco train`
130 when provided with a feature table where rows were not sorted by
131 protein IDs.
132
133 ## [v0.5.5] - 2021-02-28
134 [v0.5.5]: https://git.embl.de/grp-zeller/GECCO/compare/v0.5.4...v0.5.5
135 ### Fixed
136 - `gecco cv` bug causing only the last fold to be written.
137
138 ## [v0.5.4] - 2021-02-28
139 [v0.5.4]: https://git.embl.de/grp-zeller/GECCO/compare/v0.5.3...v0.5.4
140 ### Changed
141 - Replaced `verboselogs`, `coloredlogs` and `better-exceptions` with `rich`.
142 ### Removed
143 - `tqdm` training dependency.
144 ### Added
145 - `gecco annotate` command to produce a feature table from a genomic file.
146 - `gecco embed` to embed BGCs into non-BGC regions using feature tables.
147
148 ## [v0.5.3] - 2021-02-21
149 [v0.5.3]: https://git.embl.de/grp-zeller/GECCO/compare/v0.5.2...v0.5.3
150 ### Fixed
151 - Coordinates of genes in output GenBank files.
152 - Potential issue with the number of CPUs in `PyHMMER.run`.
153 ### Changed
154 - Bump required `pyrodigal` version to `v0.4.2` to fix buffer overflow.
155
156 ## [v0.5.2] - 2021-01-29
157 [v0.5.2]: https://git.embl.de/grp-zeller/GECCO/compare/v0.5.1...v0.5.2
158 ### Added
159 - Support for downloading HMM files directly from GitHub releases assets.
160 - Validation of filtered HMMs with MD5 checksum.
161 ### Fixed
162 - Invalid coordinates of protein domains in GenBank output files.
163 - `gecco.interpro` module not being added to wheel distribution.
164 ### Changed
165 - Bump required `pyhmmer` version to `v0.2.1`.
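Checksum validation of a downloaded file, as introduced for the filtered HMMs in this release, can be sketched with `hashlib`. This is a generic illustration, not GECCO's implementation; the function names and the chunk size are assumptions.

```python
import hashlib


def md5_hexdigest(path, chunk_size=65536):
    """Compute the MD5 hex digest of a file without loading it all in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_valid(path, expected):
    """Check a file against an expected MD5 hex digest (case-insensitive)."""
    return md5_hexdigest(path) == expected.lower()
```

MD5 is adequate here for detecting corrupted or truncated downloads, but it should not be relied on as protection against deliberate tampering.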
166
167 ## [v0.5.1] - 2021-01-15
168 [v0.5.1]: https://git.embl.de/grp-zeller/GECCO/compare/v0.5.0...v0.5.1
169 ### Fixed
170 - `--hmm` flag being ignored in `gecco run` command.
171 - `PyHMMER` using HMM names instead of accessions, causing issues with Pfam HMMs.
172
173 ## [v0.5.0] - 2021-01-11
174 [v0.5.0]: https://git.embl.de/grp-zeller/GECCO/compare/v0.4.5...v0.5.0
175 ### Added
176 - Explicit support for Python 3.9.
177 ### Changed
178 - [`pyhmmer`](https://pypi.org/project/pyhmmer) is used to annotate protein sequences instead of HMMER3 binary `hmmsearch`.
179 - HMM files are stored in binary format to speed up parsing and reduce storage size.
180 - `tqdm` is now a *training*-only dependency.
181 - `gecco cv` now requires *training* dependencies.
182
183 ## [v0.4.5] - 2020-11-23
184 [v0.4.5]: https://git.embl.de/grp-zeller/GECCO/compare/v0.4.4...v0.4.5
185 ### Added
186 - Additional `fold` column to cross-validation table output.
187 ### Changed
188 - Use sequence ID instead of protein ID to extract type from cluster in `gecco cv`.
189 - Install HMM data in pre-pressed format to make `hmmsearch` runs faster on short sequences.
190 - `gecco.orf` was rewritten to extract genes from input sequences in parallel.
191
192 ## [v0.4.4] - 2020-09-30
193 [v0.4.4]: https://git.embl.de/grp-zeller/GECCO/compare/v0.4.3...v0.4.4
194 ### Added
195 - `gecco cv loto` command to run LOTO cross-validation using BGC types
196 for stratification.
197 - `header` keyword argument to `FeatureTable.dump` and `ClusterTable.dump`
198 to write the table without the column header, making it possible to append
199 to an existing table.
200 - `__getitem__` implementation for `FeatureTable` and `ClusterTable`
201 that returns a single row or a sub-table from a table.
202 ### Fixed
203 - `gecco cv` command now writes results iteratively instead of holding
204 the tables for every fold in memory.
205 ### Changed
206 - Bumped `pandas` training dependency to `v1.0`.
207
208 ## [v0.4.3] - 2020-09-07
209 [v0.4.3]: https://git.embl.de/grp-zeller/GECCO/compare/v0.4.2...v0.4.3
210 ### Fixed
211 - GenBank files being written with invalid `/cds` feature type.
212 ### Changed
213 - Blocked installation of Biopython `v1.78` or newer as it removes `Bio.Alphabet`
214 and breaks the current code.
215
216 ## [v0.4.2] - 2020-08-07
217 [v0.4.2]: https://git.embl.de/grp-zeller/GECCO/compare/v0.4.1...v0.4.2
218 ### Fixed
219 - `TypeClassifier.predict_types` using inverse type probabilities when
220 given several clusters to process.
221
222 ## [v0.4.1] - 2020-08-07
223 [v0.4.1]: https://git.embl.de/grp-zeller/GECCO/compare/v0.4.0...v0.4.1
224 ### Fixed
225 - `gecco run` command crashing on input sequences not containing any genes.
226
227 ## [v0.4.0] - 2020-08-06
228 [v0.4.0]: https://git.embl.de/grp-zeller/GECCO/compare/v0.3.0...v0.4.0
229 ### Added
230 - `gecco.model.ProductType` enum to model the biosynthetic class of a BGC.
231 ### Removed
232 - `pandas` interaction from internal data model.
233 - `ClusterCRF` code specific to cross-validation.
234 ### Changed
235 - `pandas`, `fisher` and `statsmodels` dependencies are now optional.
236 - `gecco train` command expects a cluster table in addition to the feature
237 table to know the types of the input BGCs.
238
239 ## [v0.3.0] - 2020-08-03
240 [v0.3.0]: https://git.embl.de/grp-zeller/GECCO/compare/v0.2.2...v0.3.0
241 ### Changed
242 - Replaced Nearest-Neighbours classifier with Random Forest to perform type
243 prediction for candidate BGCs.
244 - `gecco.knn` module was renamed to implementation-agnostic name `gecco.types`.
245 ### Fixed
246 - Extraction of domain composition taking a long time in `gecco train` command.
247 ### Removed
248 - `--metric` argument to the `gecco run` CLI command.
249
250 ## [v0.2.2] - 2020-07-31
251 [v0.2.2]: https://git.embl.de/grp-zeller/GECCO/compare/v0.2.1...v0.2.2
252 ### Changed
253 - `Domain` and `Gene` can now carry qualifiers that are used when they
254 are translated to a sequence feature.
255 ### Added
256 - InterPro names, accessions, and HMMER e-value for each annotated domain
257 in GenBank output files.
258
259 ## [v0.2.1] - 2020-07-23
260 [v0.2.1]: https://git.embl.de/grp-zeller/GECCO/compare/v0.2.0...v0.2.1
261 ### Fixed
262 - Various potential crashes in `ClusterRefiner` code.
263 ### Removed
264 - Unneeded feature dictionary filtering in `ClusterCRF` for models with
265 Fisher Exact Test feature selection.
266
267 ## [v0.2.0] - 2020-07-23
268 [v0.2.0]: https://git.embl.de/grp-zeller/GECCO/compare/v0.1.1...v0.2.0
269 ### Fixed
270 - `pandas` warning about unsorted columns in `gecco run`.
271 ### Removed
272 - `Gene.probability` property, replaced by `Gene.maximum_probability` and
273 `Gene.average_probability` properties to be explicit.
274 ### Changed
275 - Internal model now uses `Pfam` and `Tigrfam` with the top 35% features
276 selected with Fisher's Exact Test.
277 - `ClusterRefiner` now removes genes on `Cluster` edges if they do not
278 contain any domain annotation.
279
280 ## [v0.1.1] - 2020-07-22
281 [v0.1.1]: https://git.embl.de/grp-zeller/GECCO/compare/v0.1.0...v0.1.1
282 ### Added
283 - `ClusterCRF.predict_probabilities` to annotate a list of `Gene`.
284 ### Changed
285 - BGC probability is now stored at the `Domain` level instead of at the `Gene`
286 level, independently of the feature extraction level used by the CRF.
287 - `ClusterKNN` will use the model path provided to `gecco run` if any.
288 ### Docs
289 - Added this changelog file to document changes in the code.
290 - Added documentation to `gecco` submodules that were missing it.
291 - Included the `CHANGELOG.md` file in the generated docs.
292
293 ## [v0.1.0] - 2020-07-17
294 [v0.1.0]: https://git.embl.de/grp-zeller/GECCO/compare/v0.0.1...v0.1.0
295 Initial release.
296
297 ## [v0.0.1] - 2018-08-13
298 [v0.0.1]: https://git.embl.de/grp-zeller/GECCO/compare/37afb97...v0.0.1
299 Proof-of-concept.