comparison mashmap.xml @ 8:9ba0184870ef draft
planemo upload
author | fubar |
---|---|
date | Sat, 24 Feb 2024 06:58:16 +0000 |
parents | 53f601fb8664 |
children | dc53eb4354a6 |
7:53f601fb8664 | 8:9ba0184870ef |
---|---|
1 <tool name="mashmap" id="mashmap" version="1.19.2" profile="22.05"> | 1 <tool name="mashmap" id="mashmap" version="1.19.2" profile="22.05"> |
2 <!--Source in git at: https://github.com/fubar2/galaxy_tf_overlay--> | 2 <!--Source in git at: https://github.com/fubar2/galaxy_tf_overlay--> |
3 <!--Created by toolfactory@galaxy.org at 24/02/2024 15:30:44 using the Galaxy Tool Factory.--> | 3 <!--Created by toolfactory@galaxy.org at 24/02/2024 15:58:22 using the Galaxy Tool Factory.--> |
4 <description>Fast local alignment boundaries</description> | 4 <description>Fast local alignment boundaries</description> |
5 <requirements> | 5 <requirements> |
6 <requirement version="3.1.3" type="package">mashmap</requirement> | 6 <requirement version="3.1.3" type="package">mashmap</requirement> |
7 <requirement version="1.19.2" type="package">samtools</requirement> | 7 <requirement version="1.19.2" type="package">samtools</requirement> |
8 </requirements> | 8 </requirements> |
63 </tests> | 63 </tests> |
64 <help><![CDATA[ | 64 <help><![CDATA[ |
65 *MashMap* implements a fast and approximate algorithm for computing local alignment boundaries between long DNA sequences. It can be useful for mapping genome assembly or long reads (PacBio/ONT) to reference genome(s). Given a minimum alignment length and an identity threshold for the desired local alignments, | 65 *MashMap* implements a fast and approximate algorithm for computing local alignment boundaries between long DNA sequences. It can be useful for mapping genome assembly or long reads (PacBio/ONT) to reference genome(s). Given a minimum alignment length and an identity threshold for the desired local alignments, |
66 | 66 |
67 Mashmap computes alignment boundaries and identity estimates using k-mers. It does not compute the alignments explicitly, but rather estimates an unbiased k-mer based Jaccard similarity using a combination of minmers (a novel winnowing scheme) and MinHash. This is then converted to an estimate of sequence identity using the Mash distance. An appropriate k-mer sampling rate is automatically determined using the given minimum local alignment length and identity thresholds. | 67 Mashmap computes alignment boundaries and identity estimates using k-mers. It does not compute the alignments explicitly, but rather estimates an unbiased k-mer based Jaccard similarity using a combination of minmers (a novel winnowing scheme) and MinHash. This is then converted to an estimate of sequence identity using the Mash distance. An appropriate k-mer sampling rate is automatically determined using the given minimum local alignment length and identity thresholds. |
68 | 68 |
69 As an example, Mashmap can map a human genome assembly to the human reference genome in about one minute total execution time and < 4 GB memory using just 8 CPU threads, achieving more than an order of magnitude improvement in both runtime and memory over alternative methods. We describe the algorithms associated with Mashmap, and report on speed, scalability, and accuracy of the software in the publications listed below. Unlike traditional mappers, MashMap does not compute exact sequence alignments. In future, we plan to add an optional alignment support to generate base-to-base alignments. | 69 Output is in *paf* format |
70 | 70 This is space-delimited, with each line consisting of query name, length, 0-based start, end, strand, target name, length, start, end and mapping nucleotide identity. |
71 Map set of query sequences against a reference genome: | 71 Details at https://github.com/lh3/miniasm/blob/master/PAF.md |
72 | 72 |
73 mashmap -r reference.fna -q query.fa | 73 More details at the Mashmap github repository https://github.com/marbl/MashMap |
74 | 74 |
75 The output is space-delimited with each line consisting of query name, length, 0-based start, end, strand, target name, length, start, end and mapping nucleotide identity. | |
76 | |
77 Map set of query seqences against a list of reference genomes: | |
78 | |
79 mashmap --rl referenceList.txt -q query.fa | |
80 | |
81 File 'referenceList.txt' containing the list of reference genomes should contain path to the reference genomes, one per line. | |
82 | |
83 Source code: https://github.com/marbl/MashMap | |
84 ]]></help> | 75 ]]></help> |
85 <citations> | 76 <citations> |
86 <citation type="doi">10.1093/bioinformatics/btad512</citation> | 77 <citation type="doi">10.1093/bioinformatics/btad512</citation> |
87 <citation type="doi">10.1093/bioinformatics/bts573</citation> | 78 <citation type="doi">10.1093/bioinformatics/bts573</citation> |
88 </citations> | 79 </citations> |
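
The help text added in revision 8 describes each output line as query name, length, 0-based start, end, strand, target name, length, start, end and mapping nucleotide identity. A minimal Python sketch of reading one such line follows. It assumes the fields are whitespace-separated and appear in exactly that order, and it ignores any additional columns. The `Mapping` type and `parse_mapping_line` function are hypothetical names used for illustration; they are not part of MashMap or of this Galaxy tool.

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    """One mapping record, with field order taken from the tool help text."""
    query_name: str
    query_len: int
    query_start: int   # 0-based, per the help text
    query_end: int
    strand: str
    target_name: str
    target_len: int
    target_start: int
    target_end: int
    identity: float    # mapping nucleotide identity, as reported


def parse_mapping_line(line: str) -> Mapping:
    """Parse a single output line into a Mapping.

    Assumption: fields are separated by spaces or tabs, so str.split()
    with no argument handles either delimiter. Extra trailing columns
    are ignored.
    """
    fields = line.split()
    if len(fields) < 10:
        raise ValueError(f"expected at least 10 fields, got {len(fields)}")
    q, ql, qs, qe, strand, t, tl, ts, te, ident = fields[:10]
    return Mapping(q, int(ql), int(qs), int(qe), strand,
                   t, int(tl), int(ts), int(te), float(ident))


if __name__ == "__main__":
    # Made-up example line, not real MashMap output.
    record = parse_mapping_line("read1 5000 0 5000 + chr1 100000 200 5200 98.5")
    print(record.query_name, record.target_name, record.identity)
```

Splitting on any whitespace is a deliberate choice here: the help text calls the output space-delimited, while the linked PAF description uses tabs, and this sketch accepts both.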