diff ITSx.xml @ 1:b433586432d7 draft
Uploaded
author | okorol |
---|---|
date | Tue, 24 Mar 2015 16:18:36 -0400 |
parents | f82c70f54bd7 |
children | 7c914d783d36 |
--- a/ITSx.xml Tue Mar 24 12:02:48 2015 -0400
+++ b/ITSx.xml Tue Mar 24 16:18:36 2015 -0400
@@ -13,7 +13,7 @@
   <inputs>
     <param name="input" type="data" format="fasta" label="Input Fasta"/>
     <param name="cpu" type="integer" value="1" label="cpu"/>
-    <param name="complement" type="boolean" checked="true" truevalue="--complement T" falsevalue="--complement F" label="Checks both DNA strands against the database"/>
+    <param name="complement" type="boolean" checked="true" truevalue="--complement T" falsevalue="--complement F" label="Check both DNA strands against the database"/>
     <param name="heuristics" type="boolean" checked="false" truevalue="--heuristics T" falsevalue="--heuristics F" label="Use HMMER's heuristic filtering"/>
     <param name="reset" type="boolean" checked="false" truevalue="--reset T" falsevalue="--reset F" label="Re-creates the HMM-database before ITSx is run"/>
     <param name="preserve" type="boolean" checked="false" truevalue="--preserve T" falsevalue="--preserve F" label=" Preserve sequence headers instead of printing out ITSx headers"/>
@@ -38,134 +38,57 @@
     <regex match="error" source="both" level="fatal"/>
   </stdio>

-  <test></test>
+  <tests>
+    <test>
+      <param name="input" value="test-data/testITSsequences.fasta"/>
+      <param name="cpu" value="1" />
+      <param name="complement" value="--complement T"/>
+      <param name="reset" value="--reset F" />
+      <param name="preserve" value="--preserve F" />
+
+      <output name="positions" file="test-data/expectedOutput.positions.txt" />
+      <output name="fullfasta" file="test-data/expectedOutput.full.fasta" />
+      <output name="summary" file="test-data/expectedOutput.summary.txt" />
+      <output name="problematic" file="test-data/expectedOutput.problematic.txt" />
+    </test>
+  </tests>
+
   <help>
-ITSx -- Identifies ITS sequences and extracts the ITS region
+**What it does**
+
+Identifies ITS sequences and extracts the ITS regions
+
+ITSx is an open source software utility to extract the highly variable ITS1 and ITS2 subregions from ITS sequences, which is commonly used as a molecular barcode for e.g. fungi. As the inclusion of parts of the neighbouring, very conserved, ribosomal genes (SSU, 5S and LSU rRNA sequences) in the sequence identification process can lead to severely misleading results, ITSx identifies and extracts only the ITS regions themselves.
+
+------
+
+
+**Info**
+Galaxy wrapper:
+
+Microbial Biodiversity Bioinformatics Group
+Agriculture and Agri-Food Canada
+
+Contact: Oksana Korol, oksana.korol[at]agr.gc.ca
+         mbb[at]agr.gc.ca
+
+
+ITSx tool:
+
+Version: 1.0.11
 Source code available at: http://microbiology.se/software/itsx
-Version: 1.0.6
-ITSx -- Identifies ITS sequences and extracts the ITS region
 Copyright (C) 2012-2013 Johan Bengtsson-Palme et al.
 Contact: Johan Bengtsson-Palme, johan[at]microbiology.se
 Programmer: Johan Bengtsson-Palme
-Full installation instructions can be found in the User's Guide.
-A quick installation guide follows below.
-ITSx requires Perl and HMMER3.
-
-1) Perl is usually installed on Unix-like systems by default. If not, it can be retrieved from http://www.perl.org/
-
-2) HMMER3 can be found at http://hmmer.janelia.org/software
-Download it and follow the on site instructions for installation.
-
-3) Obtain the ITSx package from http://microbiology.se/software/itsx
-Unpack the tarball and move into the newly created "ITSx" directory.
-
-4) Copy the ITSx file and the ITSx_db directory to your preferred bin directory.
-
-5) To test if ITSx was successfully installed type "ITSx --help" on the command-line. You should now see the ITSx help message.
-
-To run ITSx, you need a FASTA-formatted output file. You can e.g. use the test.fasta file supplied with the package. To check for ITS sequences in the test file, type "ITSx -i test.fasta -o test" on the command line. If you are on a multicore machine, you might want to use the "--cpu 2" option to speed up the processes by using two (or more) cores.
-
-New features in this version:
-- Fixed a bug causing over-reporting of chimeras
-
-
-If you encounter a bug or some other strange behaviour, please report it to:
-johan[at]microbiology.se
-
-This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.You should have received a copy of the GNU General Public License along with this program, in a file called 'license.txt'. If not, see: http://www.gnu.org/licenses/.
-
-----
-
-Usage: ITSx -i [input file] -o [output file]
-
-Options:
-
--i {file} : DNA FASTA input file to investigate
-
--o {file} : Base for the names of output file(s)
-
--p {directory} : A path to a directory of HMM-profile collections representing ITS conserved regions, default is in the same directory as ITSx itself
-
---date {T or F} : Adds a date and time stamp to the output directory, off (F) by default
-
---reset {T or F} : Re-creates the HMM-database before ITSx is run, off (F) by default
-
-Sequence selection options:
-
--t {character code} : Profile set to use for the search, see the User's Guide (comma-separated), default is all
-
--E {value} : Domain E-value cutoff for a sequence to be included in the output, default = 1e-5
-
--S {value} : Domain score cutoff for a sequence to be included in the output, default = 0
-
--N {value} : The minimal number of domains that must match a sequence before it is included, default = 2
-
---selection_priority {sum, domains, eval, score} : Selects what will be of highest priority when determining the origin of the sequence, default is sum
-
---search_eval {value} : The E-value cutoff used in the HMMER search, high numbers may slow down the process, cannot be used with the --search_score option, default is 0.01

+**Citation**

---search_score {value} : The score cutoff used in the HMMER search, low numbers may slow down the process, cannot be used with the --search_eval option, default is to used E-value cutoff, not score
-
---allow_single_domain {e-value,score or F} : Allow inclusion of sequences that only find a single domain, given that they meet the given E-value and score thresholds, on with parameters 1e-9,0 by default
-
---allow_reorder {T or F} : Allows profiles to be in the wrong order on extracted sequences, off (F) by default
-
---complement {T or F} : Checks both DNA strands against the database, creating reverse complements, on (T) by default
-
---cpu {value} : the number of CPU threads to use, default is 1
-
---multi_thread {T or F} : Multi-thread the HMMER-search, on (T) if number of CPUs (--cpu option > 1), else off (F) by default
-
---heuristics {T or F} : Selects whether to use HMMER's heuristic filtering, off (F) by default
-
-Output options:
-
---summary {T or F} : Summary of results output, on (T) by default
-
---graphical {T or F} : 'Graphical' output, on (T) by default
-
---fasta {T or F} : FASTA-format output of extracted ITS sequences, on (T) by default
-
---preserve {T or F} : Preserve sequence headers in input file instead of printing out ITSx headers, off (F) by default
-
---save_regions {SSU,ITS1,5.8S,ITS2,LSU,all,none} : A comma separated list of regions to output separate FASTA files for, 'ITS1,ITS2' by default
-
---anchor {integer or HMM} : Saves an additional number of bases before and after each extracted region. If set to 'HMM' all bases matching the corresponding HMM will be output, default = 0

+Bengtsson-Palme, Johan and Ryberg, Martin and Hartmann, Martin and Branco, Sara and Wang, Zheng and Godhe, Anna and De Wit, Pierre and Sánchez-García, Marisol and Ebersberger, Ingo and de Sousa, Filipe and Amend, Anthony and Jumpponen, Ari and Unterseher, Martin and Kristiansson, Erik and Abarenkov, Kessy and Bertrand, Yann J. K. and Sanli, Kemal and Eriksson, K. Martin and Vik, Unni and Veldre, Vilmar and Nilsson, R. Henrik. Improved software detection and extraction of ITS1 and ITS2 from ribosomal ITS sequences of fungi and other eukaryotes for analysis of environmental sequencing data. Methods in Ecology and Evolution, 4;10:914-919, 2013.

---partial {integer} : Saves additional FASTA-files for full and partial ITS sequences longer than the specified cutoff, default = 0 (off)
-
---concat {T or F} : Saves a FASTA-file with concatenated ITS sequences (with 5.8S removed), off (F) by default
-
---minlen {integer} : Minimum length the ITS regions must be to be outputted in the concatenated file (see above), default = 0
-
---positions {T or F} : Table format output containing the positions ITS sequences were found in, on (T) by default
-
---table {T or F} : Table format output of sequences containing probable ITS sequences, off (F) by default
-
---not_found {T or F} : Saves a list of non-found entries, on (T) by default
-
---detailed_results {T or F} : Saves a tab-separated list of all results, off (F) by default
-
---truncate {T or F} : Truncates the FASTA output to only contain the actual ITS sequences found, on (T) by default
-
---silent {T or F} : Supresses printing progress info to stderr, off (F) by default
-
---graph_scale {value} : Sets the scale of the graph output, if value is zero, a percentage view is shown, default = 0
-
---save_raw {T or F} : Saves all raw data for searches etc. instead of removing it on finish, off (F) by default
-
--h : displays this help message
-
---help : displays this help message
-
---bugs : displays the bug fixes and known bugs in this version of ITSx
-
---license : displays licensing information

   </help>
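
A note on the `<tests>` block added in this changeset: the fragment below is only a sketch of how the same test might be written, not part of the commit. It rests on two assumptions. First, that the Galaxy test framework resolves `value` and `file` names relative to the tool's `test-data` directory, so the `test-data/` prefix could be dropped. Second, that boolean parameters are set in tests with `true` or `false` rather than with their `truevalue`/`falsevalue` strings. Both should be checked against the Galaxy version in use. The parameter and output names are copied from the diff.

```xml
<!-- Hypothetical variant of the <tests> block; not the committed version. -->
<!-- Assumption: file names are looked up inside the tool's test-data directory. -->
<!-- Assumption: boolean test parameters take "true"/"false". -->
<tests>
  <test>
    <param name="input" value="testITSsequences.fasta"/>
    <param name="cpu" value="1"/>
    <param name="complement" value="true"/>
    <param name="reset" value="false"/>
    <param name="preserve" value="false"/>

    <output name="positions" file="expectedOutput.positions.txt"/>
    <output name="fullfasta" file="expectedOutput.full.fasta"/>
    <output name="summary" file="expectedOutput.summary.txt"/>
    <output name="problematic" file="expectedOutput.problematic.txt"/>
  </test>
</tests>
```

If the installed Galaxy release accepts the committed form as written, no change is needed; the variant only shows what the test could look like under the assumptions above.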