comparison clipkit_repo/docs/README.md @ 0:49b058e85902 draft

"planemo upload for repository https://github.com/jlsteenwyk/clipkit commit cbe1e8577ecb1a46709034a40dff36052e876e7a-dirty"
author padge
date Fri, 25 Mar 2022 13:04:31 +0000
-1:000000000000 0:49b058e85902
1 <p align="center">
2 <a href="https://github.com/jlsteenwyk/clipkit">
3 <img src="https://raw.githubusercontent.com/JLSteenwyk/ClipKIT/master/docs/_static/img/logo.jpg" alt="Logo" width="400">
4 </a>
5 <p align="center">
6 <a href="https://jlsteenwyk.com/ClipKIT/">Docs</a>
7 ·
8 <a href="https://github.com/jlsteenwyk/clipkit/issues">Report Bug</a>
9 ·
10 <a href="https://github.com/jlsteenwyk/clipkit/issues">Request Feature</a>
11 </p>
12 <p align="center">
13 <a href="https://lbesson.mit-license.org/" alt="License">
14 <img src="https://img.shields.io/badge/License-MIT-blue.svg">
15 </a>
16 <a href="https://pypi.org/project/clipkit/" alt="PyPI - Python Version">
17 <img src="https://img.shields.io/pypi/pyversions/clipkit">
18 </a>
19 <a href="https://github.com/JLSteenwyk/ClipKIT/actions" alt="Build">
20 <img src="https://img.shields.io/github/workflow/status/jlsteenwyk/clipkit/CI/master">
21 </a>
22 <a href="https://codecov.io/gh/jlsteenwyk/clipkit" alt="Coverage">
23 <img src="https://codecov.io/gh/jlsteenwyk/clipkit/branch/master/graph/badge.svg?token=0J49I6441V">
24 </a>
25 <a href="https://github.com/jlsteenwyk/clipkit/graphs/contributors" alt="Contributors">
26 <img src="https://img.shields.io/github/contributors/jlsteenwyk/clipkit">
27 </a>
28 <a href="https://twitter.com/intent/follow?screen_name=jlsteenwyk" alt="Author Twitter">
29 <img src="https://img.shields.io/twitter/follow/jlsteenwyk?style=social&logo=twitter"
30 alt="follow on Twitter">
31 </a>
32 </p>
33 </p>
34
35 ClipKIT is a fast and flexible alignment trimming tool that keeps phylogenetically informative sites and removes others.<br /><br />
36 If you find ClipKIT useful, please cite *ClipKIT: a multiple sequence alignment trimming software for accurate phylogenomic inference*. bioRxiv. doi: [10.1101/2020.06.08.140384](https://www.biorxiv.org/content/10.1101/2020.06.08.140384v1).
37 <br /><br />
38
39
40 ---
41
42
43 ## Guide
44 [Quick Start](#quick-start)<br />
45 [Advanced Usage](#advanced-usage)<br />
46 [Performance Assessment](#performance-assessment)<br />
47 [FAQ](#faq)
48
49
50 ---
51
52 ## Quick Start
53 ### 1) Installation
54 To install from PyPI, use the following command:
55 ```shell
56 pip install clipkit
57 ```
58 <br />
59
60 To install from source, use the following commands:
61 ```shell
62 git clone https://github.com/JLSteenwyk/ClipKIT.git
63 cd ClipKIT/
64 make install
65 ```
66 <br />
67
68 If you run into permission errors when executing *make install*, create a virtual environment for your installation:
69 ```shell
70 git clone https://github.com/JLSteenwyk/ClipKIT.git
71 cd ClipKIT/
72 python -m venv .venv
73 source .venv/bin/activate
74 make install
75 ```
76 Note that the virtual environment must be activated whenever you use ClipKIT.
77 <br />
78
79 ### 2) Usage
80 To use ClipKIT in its simplest form, execute the following command:
81 ```
82 clipkit <input>
83 ```
84 This writes an output file named after the input file, with the suffix ".clipkit" appended.
85
86 <br />
87
88 ---
89
90 ## Advanced Usage
91 This section describes the various features and options of ClipKIT.<br />
92 \- [Modes](#modes)<br />
93 \- [Output](#output)<br />
94 \- [Log](#log)<br />
95 \- [Complementary](#complementary)<br />
96 \- [All options](#all-options)
97
98 <br />
99
100 ### Modes
101 ClipKIT can be run with five different modes (gappy, kpic, kpic-gappy, kpi, and kpi-gappy), which are specified with the -m/--mode argument.<br />
102 *Default: 'gappy'*<br />
103 * gappy: trim all sites that are above a threshold of gappyness (default: 0.9)<br />
104 * kpic (alias: medium): keep only parsimony informative and constant sites<br />
105 * kpic-gappy (alias: medium-gappy): a combination of kpic- and gappy-based trimming<br />
106 * kpi (alias: heavy): keep only parsimony informative sites<br />
107 * kpi-gappy (alias: heavy-gappy): a combination of kpi- and gappy-based trimming<br />
108 ```
109 # gappy-based trimming
110 clipkit <input>
111 clipkit <input> -m gappy
112
113 # kpic-based trimming
114 clipkit <input> -m kpic
115 clipkit <input> -m medium
116
117 # kpic- and gappy-based trimming
118 clipkit <input> -m kpic-gappy
119 clipkit <input> -m medium-gappy
120
121 # kpi-based trimming
122 clipkit <input> -m kpi
123 clipkit <input> -m heavy
124
125 # kpi- and gappy-based trimming
126 clipkit <input> -m kpi-gappy
127 clipkit <input> -m heavy-gappy
128 ```
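
The mode descriptions above can be sketched in a few lines of Python. This is an illustrative sketch, not ClipKIT's implementation: the function names, the set of gap symbols, the definition of a constant site (one character state seen at least twice), and the choice to trim only sites whose gappyness is strictly above the threshold are all assumptions made for the example.

```python
from collections import Counter

GAP_CHARS = {"-", "?"}  # assumption: symbols counted as gaps

# Aliases taken from the mode list above.
ALIASES = {
    "medium": "kpic",
    "medium-gappy": "kpic-gappy",
    "heavy": "kpi",
    "heavy-gappy": "kpi-gappy",
}


def classify_site(column):
    """Return (gappyness, is_constant, is_parsimony_informative) for one column."""
    gaps = sum(1 for ch in column if ch in GAP_CHARS)
    gappyness = gaps / len(column)
    counts = Counter(ch.upper() for ch in column if ch not in GAP_CHARS)
    # Parsimony informative: at least two states, each seen at least twice.
    is_pi = sum(1 for n in counts.values() if n >= 2) >= 2
    # Constant (assumed definition): a single state seen at least twice.
    is_const = len(counts) == 1 and next(iter(counts.values())) >= 2
    return gappyness, is_const, is_pi


def keep_site(column, mode="gappy", gap_threshold=0.9):
    """Decide whether a column survives trimming under the given mode."""
    mode = ALIASES.get(mode, mode)
    gappyness, is_const, is_pi = classify_site(column)
    passes_gap = gappyness <= gap_threshold  # assumed: trim only if strictly above
    if mode == "gappy":
        return passes_gap
    if mode == "kpic":
        return is_pi or is_const
    if mode == "kpic-gappy":
        return (is_pi or is_const) and passes_gap
    if mode == "kpi":
        return is_pi
    if mode == "kpi-gappy":
        return is_pi and passes_gap
    raise ValueError(f"unknown mode: {mode}")


def trim(alignment, mode="gappy", gap_threshold=0.9):
    """Trim a {name: sequence} alignment whose sequences all have equal length."""
    names = list(alignment)
    columns = list(zip(*(alignment[name] for name in names)))
    kept = [i for i, col in enumerate(columns) if keep_site(col, mode, gap_threshold)]
    return {name: "".join(alignment[name][i] for i in kept) for name in names}


# Hypothetical toy alignment, for illustration only.
toy = {"a": "AAC-", "b": "AAC-", "c": "ATG-", "d": "ATGA"}
print(trim(toy, mode="kpi"))
```

In the toy alignment, the first column is constant, the second and third are parsimony informative, and the fourth is mostly gaps, so each mode keeps a different subset of columns.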
129
130 <br />
131
132 ### Output
133
134 By default, output files will have the same name as the input file with the suffix ".clipkit"
135 appended to the name. Users can specify output file names with the -o option.
136
137 ```
138 # specify output
139 clipkit <input> -o <output>
140 ```
141
142 <br />
143
144 ### Log
145 It can be very useful to have information about each position in an alignment. For example, this information can be used for alignment diagnostics or for fine-tuning trimming parameters. To create the log file, use the -l/--log option. Using this option will create a four-column file with the suffix '.clipkit.log'. *Default: off*
146 * col1: position in the alignment (starting at 1)
147 * col2: reports if site was trimmed or kept (trim or keep, respectively)
148 * col3: reports if the site is constant or not (Const or nConst), parsimony informative or not (PI or nPI), or neither (nConst, nPI)
149 * col4: reports the gappyness of the position (number of gaps / entries in alignment)
150 <br />
151
152 ```
153 clipkit <input> -l
154 ```
155 This writes a log file with the suffix ".clipkit.log".
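
A log file with the four columns described above could be read with a short Python sketch like the one below. It assumes the columns are separated by whitespace and that the third column may itself contain a space (as in "nConst, nPI"); neither detail is stated in this README, and the sample lines are made up for illustration. The names `LogRow`, `parse_log_line`, and `parse_log` are hypothetical.

```python
from typing import NamedTuple


class LogRow(NamedTuple):
    position: int      # col1: position in the alignment (starting at 1)
    status: str        # col2: "trim" or "keep"
    site_class: str    # col3: e.g. "Const", "PI", or "nConst, nPI"
    gappyness: float   # col4: number of gaps / entries in alignment


def parse_log_line(line):
    """Parse one log line, assuming whitespace-separated fields."""
    parts = line.split()
    if len(parts) < 4:
        raise ValueError(f"expected at least 4 fields, got: {line!r}")
    # The first two fields and the last field are single tokens;
    # everything in between belongs to the site classification.
    return LogRow(int(parts[0]), parts[1], " ".join(parts[2:-1]), float(parts[-1]))


def parse_log(text):
    """Parse all non-empty lines of a log file's contents."""
    return [parse_log_line(line) for line in text.splitlines() if line.strip()]


# Made-up sample content, not real ClipKIT output.
SAMPLE = """1 keep PI 0.0
2 trim nConst, nPI 0.95
3 keep Const 0.1
"""

rows = parse_log(SAMPLE)
trimmed_positions = [row.position for row in rows if row.status == "trim"]
print(trimmed_positions)
```

Collecting the trimmed positions this way makes it easy to check how a change to the gappyness threshold affects which sites are removed.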
156
157 <br />
158
159 ### Complementary
160 Having an alignment of the sequences that were trimmed can be useful for other analyses. To obtain an alignment of the sequences that were trimmed, use the -c/--complementary option.
161 *Default: off*<br />
162
163 ```
164 clipkit <input> -c
165 ```
166 This writes a complementary alignment file with the suffix ".clipkit.complementary".
167
168 <br />
169
170 ### All options
171 | Option | Usage and meaning |
172 | ------------- | ------------------ |
173 | -h/--help | Print help message |
174 | -v/--version | Print software version |
175 | -o/--output | Specify output file name |
176 | -m/--mode | Specify trimming mode. *Default: gappy* |
177 | -g/--gaps | Specify gappyness threshold (between 0 and 1). *Default: 0.9* |
178 | -if/--input_file_format | Specify input file format*. *Default: auto-detect* |
179 | -of/--output_file_format | Specify output file format*. *Default: input file type* |
180 | -l/--log | Create a log file. *Default: off* |
181 | -c/--complementary | Create a complementary alignment file. *Default: off* |
182
183 *Acceptable file formats include: [fasta](https://en.wikipedia.org/wiki/FASTA_format), [clustal](http://meme-suite.org/doc/clustalw-format.html), [maf](http://www.bx.psu.edu/~dcking/man/maf.xhtml), [mauve](http://darlinglab.org/mauve/user-guide/files.html), [phylip](http://scikit-bio.org/docs/0.2.3/generated/skbio.io.phylip.html), [phylip-sequential](http://rosalind.info/glossary/phylip-format/), [phylip-relaxed](https://www.hiv.lanl.gov/content/sequence/FORMAT_CONVERSION/FormatExplain.html), [stockholm](https://en.wikipedia.org/wiki/Stockholm_format)
184 <br />
185 <br />
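
When ClipKIT is called from a pipeline, the options in the table above can be assembled programmatically. The following Python sketch builds an argument list using only the short flags listed in the table; the helper names `build_clipkit_command` and `expected_default_outputs` are hypothetical, and the default file names assume that each suffix is appended to the input file name and that -o is not used.

```python
def build_clipkit_command(input_path, mode=None, gaps=None, output=None,
                          log=False, complementary=False):
    """Build an argument list for the clipkit command line (does not run it)."""
    cmd = ["clipkit", str(input_path)]
    if mode is not None:
        cmd += ["-m", mode]
    if gaps is not None:
        # The table states the threshold lies between 0 and 1.
        if not 0 <= gaps <= 1:
            raise ValueError("gappyness threshold must be between 0 and 1")
        cmd += ["-g", str(gaps)]
    if output is not None:
        cmd += ["-o", str(output)]
    if log:
        cmd.append("-l")
    if complementary:
        cmd.append("-c")
    return cmd


def expected_default_outputs(input_path, log=False, complementary=False):
    """List output files expected when -o is not given (assumed naming)."""
    base = f"{input_path}.clipkit"
    outputs = [base]
    if log:
        outputs.append(f"{base}.log")
    if complementary:
        outputs.append(f"{base}.complementary")
    return outputs


cmd = build_clipkit_command("alignment.fa", mode="kpic-gappy", gaps=0.8, log=True)
print(cmd)
print(expected_default_outputs("alignment.fa", log=True))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` in an environment where ClipKIT is installed; `alignment.fa` is a placeholder file name.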
186
187 ---
188
189 ## Performance Assessment
190 In brief, a performance assessment and comparison of multiple alignment trimming programs revealed that ClipKIT, in nearly any mode, is a top-performing software. Here, we provide greater detail on the datasets used to assess alignment trimming performance.
191 <p align="center">
192 <a href="https://www.biorxiv.org/content/10.1101/2020.06.08.140384v1">
193 <img src="https://raw.githubusercontent.com/JLSteenwyk/ClipKIT/master/docs/_static/img/Performance_summary.jpg" alt="Performance Summary" width="1000">
194 </a>
195 </p>
196
197 **ClipKIT is a top-performing software for trimming multiple sequence alignments.** Across a total of 138,152 multiple sequence alignments (MSAs) from empirical (left) and simulated (right) datasets, desirability-based integration of accuracy and support metrics per MSA facilitated the comparison of relative software performance and revealed ClipKIT is a top-performing software. MSA trimming approaches are ordered along the x-axis from the highest-performing software to the lowest-performing software according to average desirability-based rank. Abbreviations of trimmers and parameters are as follows: ClipKIT: g = gappy mode; ClipKIT: kc = kpic; ClipKIT: kcg = kpic-gappy; ClipKIT: k = kpi mode; ClipKIT: kg = kpi-gappy mode; BMGE = BMGE default; BMGE 0.3 = 0.3 entropy threshold; BMGE 0.7 = 0.7 entropy threshold; trimAl: s = strict; trimAl: sp = strictplus; Noisy = default; Gblocks = default; No trim = no trimming.
198
199 For additional performance details, please see the manuscript *ClipKIT: a multiple sequence alignment trimming software for accurate phylogenomic inference*. bioRxiv. doi: [10.1101/2020.06.08.140384](https://www.biorxiv.org/content/10.1101/2020.06.08.140384v1).
200
201 <br /><br /><br />
202
203 ---
204
205 ## FAQ
206
207 <strong>If tree inference with no trim works well, why even trim?</strong>
208
209 Tree inference with trimmed multiple sequence alignments is more computationally efficient. In other words, shorter alignments require less computational time and memory during tree search. We found that ClipKIT reduced computation time by an average of 20%. As datasets continue to grow, an alignment trimming algorithm that can reduce computational time will be of great value.
210
211 <br />
212
213 <strong>Does ClipKIT trim amino acids, nucleotides, or codons?</strong>
214
215 ClipKIT trims amino acid and nucleotide alignments. Currently, ClipKIT does not trim codons.
216
217 <br />
218
219 <strong>Is there a website version of ClipKIT?</strong>
220
221 Currently, there is no website version of ClipKIT.
222
223 <br />
224
225 ---
226
227 ## Developers
228 * [Jacob Steenwyk](https://jlsteenwyk.github.io/)<br />
229 * [Thomas Buida](https://tjbiii.com)<br />
230 <br />
231
232 ## All Team Members
233 * [Jacob Steenwyk](https://jlsteenwyk.github.io/)<br />
234 * [Thomas Buida](https://tjbiii.com)<br />
235 * [Yuanning Li](https://scholar.google.com/citations?user=65ygCIsAAAAJ&hl=en&oi=ao)
236 * [Xing-Xing Shen](https://xingxingshen.github.io/)
237 * [Antonis Rokas](https://as.vanderbilt.edu/rokaslab/)
238 <br />