changeset 0:0968856c687c draft

planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tool_collections/kraken2/kraken2/ commit 6ad48e582972ec27cdc0d401f877dfe172057231
author iuc
date Thu, 14 Mar 2019 05:16:48 -0400
parents
children d4bb87ca916d
files README.rst kraken2.xml macros.xml test-data/kraken2_databases.loc test-data/kraken_test1.fa test-data/kraken_test1_output.tab test-data/test_db/hash.k2d test-data/test_db/opts.k2d test-data/test_db/taxo.k2d tool-data/kraken2_databases.loc.sample tool_data_table_conf.xml.sample tool_data_table_conf.xml.test
diffstat 12 files changed, 311 insertions(+), 0 deletions(-)
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/README.rst	Thu Mar 14 05:16:48 2019 -0400
@@ -0,0 +1,42 @@
+Introduction
+============
+Kraken is a taxonomic sequence classifier that assigns taxonomic labels to DNA sequences. Kraken examines the $k$-mers within a query sequence and uses the information within those $k$-mers to query a database. That database maps $k$-mers to the lowest common ancestor (LCA) of all genomes known to contain a given $k$-mer.
+
+The first version of Kraken used a large indexed and sorted list of $k$-mer/LCA pairs as its database. While fast, the large memory requirements posed some problems for users, and so Kraken 2 was created to provide a solution to those problems.
+
+Kraken 2 differs from Kraken 1 in several important ways:
+
+1. Only minimizers of the $k$-mers in the query sequences are used as database queries. Similarly, only minimizers of the $k$-mers in the reference sequences in the database's genomic library are stored in the database. We will also refer to the minimizers as $\ell$-mers, where $\ell \leq k$. All $k$-mers are considered to have the same LCA as their minimizer's database LCA value.
+2. Kraken 2 uses a compact hash table that is a probabilistic data structure. This means that occasionally, database queries will fail by either returning the wrong LCA, or by not resulting in a search failure when a queried minimizer was never actually stored in the database. By incurring the risk of these false positives in the data structure, Kraken 2 is able to achieve faster speeds and lower memory requirements. Users should be aware that database false positive errors occur in less than 1% of queries, and can be compensated for by use of confidence scoring thresholds.
+3. Kraken 2 has the ability to build a database from amino acid sequences and perform a translated search of the query sequences against that database.
+4. Kraken 2 utilizes spaced seeds in the storage and querying of minimizers to improve classification accuracy.
+5. Kraken 2 provides support for "special" databases that are not based on NCBI's taxonomy. These are currently limited to three popular 16S databases.
+
+Because Kraken 2 only stores minimizers in its hash table, and $k$ can be much larger than $\ell$, only a small percentage of the possible $\ell$-mers in a genomic library are actually deposited in the database. This creates a situation similar to the Kraken 1 "MiniKraken" databases; however, preliminary testing has shown the accuracy of a reduced Kraken 2 database to be quite similar to the full-sized Kraken 2 database, while Kraken 1's MiniKraken databases often resulted in a substantial loss of per-read sensitivity.
+
+The Kraken 2 paper is currently under preparation. Until it is released, please cite the original Kraken paper if you use Kraken 2 in your research. Thank you!
+Page: https://ccb.jhu.edu/software/kraken2/
+
+System Requirements
+===================
+- Disk space: Construction of a Kraken 2 standard database requires approximately 100 GB of disk space. A test on 01 Jan 2018 of the default installation showed 42 GB of disk space was used to store the genomic library files, 26 GB was used to store the taxonomy information from NCBI, and 29 GB was used to store the Kraken 2 compact hash table.
+
+- As with Kraken 1, we strongly advise against using NFS storage for the Kraken 2 database if at all possible.
+
+- Memory: To run efficiently, Kraken 2 requires enough free memory to hold the database (primarily the hash table) in RAM. While this can be accomplished with a ramdisk, Kraken 2 will by default load the database into process-local RAM; the --memory-mapping switch to kraken2 will avoid doing so. The default database size is 29 GB (as of Jan. 2018), and you will need slightly more than that in RAM if you want to build the default database.
+
+- Dependencies: Kraken 2 currently makes extensive use of Linux utilities such as sed, find, and wget. Many scripts are written using the Bash shell, and the main scripts are written using Perl. Core programs needed to build the database and run the classifier are written in C++11, and need to be compiled using a somewhat recent version of g++ that will support C++11. Multithreading is handled using OpenMP. Downloads of NCBI data are performed by wget and rsync. Most Linux systems will have all of the above listed programs and development libraries available either by default or via package download.
+
+- Unlike Kraken 1, Kraken 2 does not use an external $k$-mer counter. However, by default, Kraken 2 will attempt to use the dustmasker or segmasker programs provided as part of NCBI's BLAST suite to mask low-complexity regions (see "Masking of Low-complexity Sequences" in the Kraken 2 manual).
+
+- MacOS NOTE: MacOS and other non-Linux operating systems are not explicitly supported by the developers, and MacOS users should refer to the Kraken-users group for support in installing the appropriate utilities to allow for full operation of Kraken 2. We will attempt to use MacOS-compliant code when possible, but development and testing time is at a premium and we cannot guarantee that Kraken 2 will install and work to its full potential on a default installation of MacOS.
+
+- In particular, we note that the default MacOS X installation of GCC does not have support for OpenMP. Without OpenMP, Kraken 2 is limited to single-threaded operation, resulting in slower build and classification runtimes.
+
+- Network connectivity: Kraken 2's standard database build and download commands expect unfettered FTP and rsync access to the NCBI FTP server. If you're working behind a proxy, you may need to set certain environment variables (such as ftp_proxy or RSYNC_PROXY) in order to get these commands to work properly.
+
+- Kraken 2's scripts default to using rsync for most downloads; however, you may find that your network situation prevents use of rsync. In such cases, you can try the --use-ftp option to kraken2-build to force the downloads to occur via FTP.
+
+- MiniKraken: At present, users with low-memory computing environments can replicate the "MiniKraken" functionality of Kraken 1 in two ways: first, by increasing the value of $k$ with respect to $\ell$ (using the --kmer-len and --minimizer-len options to kraken2-build); and secondly, through downsampling of minimizers (from both the database and query sequences) using a hash function. This second option is performed if the --max-db-size option to kraken2-build is used; however, the two options are not mutually exclusive. In a difference from Kraken 1, Kraken 2 does not require building a full database and then shrinking it to obtain a reduced database.
+
+
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/kraken2.xml	Thu Mar 14 05:16:48 2019 -0400
@@ -0,0 +1,147 @@
+<?xml version="1.0"?>
+<tool id="kraken2" name="Kraken2" version="@TOOL_VERSION@+galaxy0">
+    <description>
+        assign taxonomic labels to sequencing reads
+    </description>
+    <macros>
+        <import>macros.xml</import>
+    </macros>
+    <requirements>
+        <requirement type="package" version="@TOOL_VERSION@">kraken2</requirement>
+    </requirements>
+    <version_command>kraken2 --version</version_command>
+    <command detect_errors="exit_code">
+        <![CDATA[
+        kraken2
+            --threads \${GALAXY_SLOTS:-1}
+            --db '${kraken2_database.fields.path}'
+
+            #if $quick:
+                --quick
+            #end if
+
+            #if $single_paired.single_paired_selector == 'yes':
+                --paired
+                '${single_paired.forward_input}' '${single_paired.reverse_input}'
+            #elif $single_paired.single_paired_selector == "collection":
+                --paired '${single_paired.input_pair.forward}' '${single_paired.input_pair.reverse}'
+            #else:
+                '${single_paired.input_sequences}'
+            #end if
+
+            #if $split_reads:
+                --classified-out '${classified_out}' --unclassified-out '${unclassified_out}'
+            #end if
+
+            --confidence '${confidence}'
+
+            --minimum-base-quality '${min_base_quality}'
+
+            #if $use_names:
+                --use-names
+            #end if
+
+            #if $report.create_report:
+                --report '${report_output}'
+                #if $report.use_mpa_style:
+                    --use-mpa-style
+                #end if
+                #if $report.report_zero_counts:
+                    --report-zero-counts
+                #end if
+            #end if
+
+            > '${output}'
+    ]]></command>
+    <inputs>
+        <conditional name="single_paired">
+            <param name="single_paired_selector" type="select" label="Single or paired reads" help="--paired">
+                <option value="collection">Collection</option>
+                <option value="yes">Paired</option>
+                <option selected="True" value="no">Single</option>
+            </param>
+            <when value="collection">
+                <param format="@INTYPES@" name="input_pair" type="data_collection" collection_type="paired" label="Collection of paired reads"/>
+            </when>
+            <when value="yes">
+                <param format="@INTYPES@" name="forward_input" type="data" label="Forward strand"/>
+                <param format="@INTYPES@" name="reverse_input" type="data" label="Reverse strand"/>
+            </when>
+            <when value="no">
+                <param format="@INTYPES@" label="Input sequences" name="input_sequences" type="data"/>
+            </when>
+        </conditional>
+
+        <param name="use_names" type="boolean" label="Print scientific names instead of just taxids"/>
+
+        <param name="confidence" type="float" label="Confidence" value="0.0" help="Confidence score threshold. Must be in [0, 1]">
+            <validator type="in_range" min="0.0" max="1.0" message="Confidence score threshold should be between 0 and 1" />
+        </param>
+
+        <param name="min_base_quality" type="integer" label="Minimum Base Quality" value="0" help="Minimum base quality used in classification (only effective with FASTQ input)"/>
+
+        <param name="quick" type="boolean" label="Enable quick operation" help="Quick operation (use first hit)"/>
+
+        <param name="split_reads" type="boolean" label="Split classified and unclassified outputs?" help="Sets --unclassified-out and --classified-out"/>
+
+        <section name="report" title="Create Report" expanded="false">
+            <param name="create_report" type="boolean" label="Print a report with aggregate counts/clade to file" help="--report" optional="true"/>
+            <param name="use_mpa_style" type="boolean" label="Format report output like Kraken 1's kraken-mpa-report" help="--use-mpa-style" optional="true"/>
+            <param name="report_zero_counts" type="boolean" label="Report counts for ALL taxa, even if counts are zero" help="--report-zero-counts" optional="true"/>
+        </section>
+
+        <expand macro="input_database"/>
+
+    </inputs>
+    <outputs>
+        <data name="classified_out" format_source="input_sequences" label="${tool.name} on ${on_string}: Classified reads">
+            <filter>(split_reads)</filter>
+        </data>
+        <data name="unclassified_out" format_source="input_sequences" label="${tool.name} on ${on_string}: Unclassified reads">
+            <filter>(split_reads)</filter>
+        </data>
+        <data name="report_output" format="tabular" label="Report: ${tool.name} on ${on_string}">
+            <filter>(report['create_report'])</filter>
+        </data>
+        <data name="output" format="tabular" label="${tool.name} on ${on_string}: Classification"/>
+        <!--<data format="tabular" label="${tool.name} on ${on_string}: Translated classification" name="translated" />-->
+    </outputs>
+
+    <tests>
+        <test>
+            <param name="single_paired_selector" value="no"/>
+            <param name="input_sequences" value="kraken_test1.fa" ftype="fasta"/>
+            <param name="split_reads" value="false"/>
+            <param name="quick" value="false"/>
+            <param name="confidence" value=".2"/>
+            <param name="only-classified-output" value="false"/>
+            <param name="kraken2_database" value="test_entry"/>
+            <output name="output" file="kraken_test1_output.tab" ftype="tabular"/>
+        </test>
+    </tests>
+    <help>
+        <![CDATA[
+**What it does**
+
+Kraken2 is a taxonomic sequence classifier that assigns taxonomic labels to short DNA reads. It does this by examining the k-mers within a read and querying a database with those k-mers. This database contains a mapping of every k-mer in Kraken's genomic library to the lowest common ancestor (LCA) in a taxonomic tree of all genomes that contain that k-mer. The set of LCA taxa that correspond to the k-mers in a read are then analyzed to create a single taxonomic label for the read; this label can be any of the nodes in the taxonomic tree. Kraken is designed to be rapid, sensitive, and highly precise.
+
+-----
+
+**Output Format**
+
+Each sequence classified by Kraken results in a single line of output. Output lines contain five tab-delimited fields; from left to right, they are::
+
+    1. "C"/"U": a one letter code indicating that the sequence was either classified or unclassified.
+    2. The sequence ID, obtained from the FASTA/FASTQ header.
+    3. The taxonomy ID Kraken 2 used to label the sequence; this is 0 if the sequence is unclassified.
+    4. The length of the sequence in bp.
+    5. A space-delimited list indicating the LCA mapping of each k-mer in the sequence. For example, "562:13 561:4 A:31 0:1 562:3" would indicate that:
+            a) the first 13 k-mers mapped to taxonomy ID #562
+            b) the next 4 k-mers mapped to taxonomy ID #561
+            c) the next 31 k-mers contained an ambiguous nucleotide
+            d) the next k-mer was not in the database
+            e) the last 3 k-mers mapped to taxonomy ID #562
+        ]]>
+    </help>
+    <expand macro="citations" />
+</tool>
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/macros.xml	Thu Mar 14 05:16:48 2019 -0400
@@ -0,0 +1,19 @@
+<?xml version="1.0"?>
+<macros>
+    <token name="@TOOL_VERSION@">2.0.7_beta</token>
+    <token name="@INTYPES@">
+        fasta,fastq,fasta.gz,fasta.bz2,fastq.gz,fastq.bz2,fastqsanger
+    </token>
+    <xml name="input_database">
+        <param label="Select a Kraken2 database" name="kraken2_database" type="select">
+            <options from_data_table="kraken2_databases">
+                <validator message="No Kraken2 database is available" type="no_options" />
+            </options>
+        </param>
+    </xml>
+    <xml name="citations">
+        <citations>
+            <citation type="doi">10.1186/gb-2014-15-3-r46</citation>
+        </citations>
+    </xml>
+</macros>
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/kraken2_databases.loc	Thu Mar 14 05:16:48 2019 -0400
@@ -0,0 +1,6 @@
+# Tab separated with three columns:
+# - value (Galaxy records this in the Galaxy DB)
+# - name (Galaxy shows this in the UI)
+# - path (folder name containing the Kraken DB)
+#
+test_entry	"Test Database"	${__HERE__}/test_db
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/kraken_test1.fa	Thu Mar 14 05:16:48 2019 -0400
@@ -0,0 +1,70 @@
+>gi|145231|gb|M33724.1|ECOALPHOA Escherichia coli K-12 truncated PhoA (phoA) gene, partial cds; and transposon Mu dI, partial sequence
+CAAAGCTCCGGGCCTCACCCAGGCGCTAAATACCAAAGATGGCGCAGTGATGGTGATGAGTTACGGGAAC
+TCCGAAGAGGATTCACAAGAACATACCGGCAGTCAGTTGCGTATTGCGGCGTATGGCCCGCATGCCGCCA
+ATGAAGCGGCGCACGAAAAACGCGAAAGCGT
+
+>gi|145232|gb|M33725.1|ECOALPHOB Escherichia coli K12 phoA pseudogene and transposon Mu dl-R, partial sequence
+CTGTCATAAAGTTGTCACGGCCGAGACTTATAGTCGCTTTGTTTTTATTTTTTAATGTATTTGTACATGG
+AGAAAATAAAGTGAAACAAAGCACTATTGCACTGGCACTCTTACCGTTACTGTTTACCCCTGTGACAAAA
+GCCCGGACACCAGTGAAGCGGCGCACGAAAAACGCGAAAGCGT
+
+>gi|145234|gb|M33727.1|ECOALPHOE Escherichia coli K12 upstream sequence of psiA5::Mu dI. is identical to psiA30 upstream sequence; putative (phoA) pseudogene and transposon Mu dl-R, partial sequence
+TTGTTTTTATTTTTTAATGTATTTGTACATGGAGAAAATAAAGTGAAACAAAGCACTATTGCACTGGTGA
+AGCGGCGCACGAAAAACGCGAAAGCGT
+
+>gi|146195|gb|J01619.1|ECOGLTA Eschericia coli gltA gene, sdhCDAB operon and sucABCD operons, complete sequence
+GAATTCGACCGCCATTGCGCAAGGCATCGCCATGACCAGGCAGGATACAAAAGAGAGTCGATAAATATTC
+ACGGTGTCCATACCTGATAAATATTTTATGAAAGGCGGCGATGATGCCGCAAAATAATACTTATTTATAA
+TCCAGCACGTAGGTTGCGTTAGCGGTTACTTCACCTGCCGTGACATCGACTGCATTATCAATTTGTTCCA
+TCCAGGCGAAAAAGTTCAGCGTCTGTTCTGATGAGCTTGCATCCAGGTCAAGATCTGGCGCGGCTGAACC
+TAATACGATGTTACCGTCATTTTTGTCCATCAGTCGTACACCGACCCCAGTTGCTTCGCCTGCACTGGTG
+TTGCTCAACAAAGGCGTAGCACCAGTTGTCTTAGCCGTGCTATCGAAGGTTACGCCAAACTTTGGATACC
+GGCATTCCGCTACCGTTGTCAGAAGCAGGCAGATCACAGTTGATCAAGCGAATGTCGACGGCCACTTTAT
+TGCTATGATGCTCCCGGTTTATATGGGTTGTCGTGACTTGTCCAAGATCTATGTTTTTATCAATATCTTC
+TGGATGAATTTCACAAGGTGCTTCAATAACCTCCCCCTTAAAGTGAATTTCGCCAGAACCTTCATCAGCA
+GCATAAACAGGTGCAGTGAACAGCAGAGATACGGCCAGTGCGGCCAATGTTTTTTGTCCTTTAAACATAA
+CAGAGTCCTTTAAGGATATAGAATAGGGGTATAGCTACGCCAGAATATCGTATTTGATTATTGCTAGTTT
+TTAGTTTTGCTTAAAAAATATTGTTAGTTTTATTAAATTGGAAAACTAAATTATTGGTATCATGAATTGT
+TGTATGATGATAAATATAGGGGGGATATGATAGACGTCATTTTCATAGGGTTATAAAATGCGACTACCAT
+GAAGTTTTTAATTCAAAGTATTGGGTTGCTGATAATTTGAGCTGTTCTATTCTTTTTAAATATCTATATA
+GGTCTGTTAATGGATTTTATTTTTACAAGTTTTTTGTGTTTAGGCATATAAAAATCAAGCCCGCCATATG
+AACGGCGGGTTAAAATATTTACAACTTAGCAATCGAACCATTAACGCTTGATATCGCTTTTAAAGTCGCG
+TTTTTCATATCCTGTATACAGCTGACGCGGACGGGCAATCTTCATACCGTCACTGTGCATTTCGCTCCAG
+TGGGCGATCCAGCCAACGGTACGTGCCATTGCGAAAATGACGGTGAACATGGAAGACGGAATACCCATCG
+CTTTCAGGATGATACCAGAGTAGAAATCGACGTTCGGGTACAGTTTCTTCTCGATAAAGTACGGGTCGTT
+CAGCGCGATGTTTTCCAGCTCCATAGCCACTTCCAGCAGGTCATCCTTCGTGCCCAGCTCTTTCAGCACT
+TCATGGCAGGTTTCACGCATTACGGTGGCGCGCGGGTCGTAATTTTTGTACACGCGGTGACCGAAGCCCA
+TCAGGCGGAAAGAATCATTTTTGTCTTTCGCACGACGAAAAAATTCCGGAATGTGTTTAACGGAGCTGAT
+TTCTTCCAGCATTTTCAGCGCCGCTTCGTTAGCACCGCCGTGCGCAGGTCCCCACAGTGAAGCAATACCT
+GCTGCGATACAGGCAAACGGGTTCGCACCCGAAGAGCCAGCGGTACGCACGGTGGAGGTAGAGGCGTTCT
+GTTCATGGTCAGCGTGCAGGATCAGAATACGGTCCATAGCACGTTCCAGAATCGGATTAACTTCATACGG
+TTCGCACGGCGTGGAGAACATCATATTCAGGAAGTTACCGGCGTAGGAGAGATCGTTGCGCGGGTAAACA
+AATGGCTGACCAATGGAATACTTGTAACACATCGCGGCCATGGTCGGCATTTTCGACAGCAGGCGGAACG
+CGGCAATTTCACGGTGACGAGGATTGTTAACATCCAGCGAGTCGTGATAGAACGCCGCCAGCGCGCCGGT
+AATACCACACATGACTGCCATTGGATGCGAGTCGCGACGGAAAGCATGGAACAGACGGGTAATCTGCTCG
+TGGATCATGGTATGACGGGTCACCGTAGTTTTAAATTCGTCATACTGTTCCTGAGTCGGTTTTTCACCAT
+TCAGCAGGATGTAACAAACTTCCAGGTAGTTAGAATCGGTCGCCAGCTGATCGATCGGGAAACCGCGGTG
+CAGCAAAATACCTTCATCACCATCAATAAAAGTAATTTTAGATTCGCAGGATGCGGTTGAAGTGAAGCCT
+GGGTCAAAGGTGAACACACCTTTTGAACCGAGAGTACGGATATCAATAACATCTTGACCCAGCGTGCCTT
+TCAGCACATCCAGTTCAACAGCTGTATCCCCGTTGAGGGTGAGTTTTGCTTTTGTATCAGCCATTTAAGG
+TCTCCTTAGCGCCTTATTGCGTAAGACTGCCGGAACTTAAATTTGCCTTCGCACATCAACCTGGCTTTAC
+CCGTTTTTTATTTGGCTCGCCGCTCTGTGAAAGAGGGGAAAACCTGGGTACAGAGCTCTGGGCGCTTGCA
+GGTAAAGGATCCATTGATGACGAATAAATGGCGAATCAAGTACTTAGCAATCCGAATTATTAAACTTGTC
+TACCACTAATAACTGTCCCGAATGAATTGGTCAATACTCCACACTGTTACATAAGTTAATCTTAGGTGAA
+ATACCGACTTCATAACTTTTACGCATTATATGCTTTTCCTGGTAATGTTTGTAACAACTTTGTTGAATGA
+TTGTCAAATTAGATGATTAAAAATTAAATAAATGTTGTTATCGTGACCTGGATCACTGTTCAGGATAAAA
+CCCGACAAACTATATGTAGGTTAATTGTAATGATTTTGTGAACAGCCTATACTGCCGCCAGTCTCCGGAA
+CACCCTGCAATCCCGAGCCACCCAGCGTTGTAACGTGTCGTTTTCGCATCTGGAAGCAGTGTTTTGCATG
+ACGCGCAGTTATAGAAAGGACGCTGTCTGACCCGCAAGCAGACCGGAGGAAGGAAATCCCGACGTCTCCA
+GGTAACAGAAAGTTAACCTCTGTGCCCGTAGTCCCCAGGGAATAATAAGAACAGCATGTGGGCGTTATTC
+ATGATAAGAAATGTGAAAAAACAAAGACCTGTTAATCTGGACCTACAGACCATCCGGTTCCCCATCACGG
+CGATAGCGTCCATTCTCCATCGCGTTTCCGGTGTGATCACCTTTGTTGCAGTGGGCATCCTGCTGTGGCT
+TCTGGGTACCAGCCTCTCTTCCCCTGAAGGTTTCGAGCAAGCTTCCGCGATTATGGGCAGCTTCTTCGTC
+AAATTTATCATGTGGGGCATCCTTACCGCTCTGGCGTATCACGTCGTCGTAGGTATTCGCCACATGATGA
+TGGATTTTGGCTATCTGGAAGAAACATTCGAAGCGGGTAAACGCTCCGCCAAAATCTCCTTTGTTATTAC
+TGTCGTGCTTTCACTTCTCGCAGGAGTCCTCGTATGGTAAGCAACGCCTCCGCATTAGGACGCAATGGCG
+TACATGATTTCATCCTCGTTCGCGCTACCGCTATCGTCCTGACGCTCTACATCATTTATATGGTCGGTTT
+TTTCGCTACCAGTGGCGAGCTGACATATGAAGTCTGGATCGGTTTCTTCGCCTCTGCGTTCACCAAAGTG
+TTCACCCTGCTGGCGCTGTTTTCTATCTTGATCCATGCCTGGATCGGCATGTGGCAGGTGTTGACCGACT
+ACGTTAAACCGCTGGCTTTGCGCCTGATGCTGCAACTGGTGATTGTCGTTGCACTGGTGGTTTACGTGAT
+TTATGGATTCGTTGTGGTGTGGGGTGTGTGATGAAATTGCCAGTCAGAGAATTTGATGCAGTTGTGATTG
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/kraken_test1_output.tab	Thu Mar 14 05:16:48 2019 -0400
@@ -0,0 +1,4 @@
+U	gi|145231|gb|M33724.1|ECOALPHOA	0	171	0:137
+U	gi|145232|gb|M33725.1|ECOALPHOB	0	183	0:149
+U	gi|145234|gb|M33727.1|ECOALPHOE	0	97	0:63
+U	gi|146195|gb|J01619.1|ECOGLTA	0	3850	0:3816
Binary file test-data/test_db/hash.k2d has changed
Binary file test-data/test_db/opts.k2d has changed
Binary file test-data/test_db/taxo.k2d has changed
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/tool-data/kraken2_databases.loc.sample	Thu Mar 14 05:16:48 2019 -0400
@@ -0,0 +1,7 @@
+# Expect three columns, tab separated, as follows:
+# - value (Galaxy records this in the Galaxy DB)
+# - name (Galaxy shows this in the UI)
+# - path with or without trailing slash (folder name containing the Kraken DB)
+#
+# e.g.
+# plants2018<tab>Plant genomes (2018)<tab>/path/to/krakenDB/plants_2018
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/tool_data_table_conf.xml.sample	Thu Mar 14 05:16:48 2019 -0400
@@ -0,0 +1,8 @@
+<?xml version="1.0"?>
+<tables>
+    <!-- Locations of Kraken database in the required format -->
+    <table name="kraken2_databases" comment_char="#">
+        <columns>value, name, path</columns>
+        <file path="tool-data/kraken2_databases.loc" />
+    </table>
+</tables>
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/tool_data_table_conf.xml.test	Thu Mar 14 05:16:48 2019 -0400
@@ -0,0 +1,8 @@
+<?xml version="1.0"?>
+<tables>
+    <!-- Locations of Kraken database in the required format -->
+    <table name="kraken2_databases" comment_char="#">
+        <columns>value, name, path</columns>
+        <file path="${__HERE__}/test-data/kraken2_databases.loc" />
+    </table>
+</tables>