diff flye.xml @ 11:291923e6f276 draft
planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/flye commit acf41fab409bef4882d5d12cbf991452b408076e
author | bgruening |
---|---|
date | Mon, 18 Mar 2024 12:44:09 +0000 |
parents | cb8dfd28c16f |
children | 3e4f8642c77e |
--- a/flye.xml Wed Oct 26 13:37:35 2022 +0000
+++ b/flye.xml Mon Mar 18 12:44:09 2024 +0000
@@ -3,8 +3,9 @@
 <macros>
 <import>macros.xml</import>
 </macros>
+ <expand macro="edam_ontology"/>
+ <expand macro="xrefs"/>
 <expand macro="requirements" />
- <expand macro="edam_ontology"/>
 <version_command>flye --version</version_command>
 <command detect_errors="exit_code"><![CDATA[
 #for $counter, $input in enumerate($inputs):
@@ -17,7 +18,7 @@
 #elif $input.is_of_type('fasta'):
 #set $ext = 'fasta'
 #end if
- ln -s '$input' ./input_${counter}.${ext} &&
+ ln -sf '$input' ./input_${counter}.${ext} &&
 #end for
 flye
 $mode_conditional.mode
@@ -228,12 +229,12 @@
 </output>
 <output name="assembly_gfa" ftype="txt">
 <assert_contents>
- <has_size value="420252" delta="100"/>
+ <has_size value="419414" delta="100"/>
 </assert_contents>
 </output>
 <output name="consensus" ftype="fasta">
 <assert_contents>
- <has_size value="427129" delta="100"/>
+ <has_size value="426277" delta="100"/>
 </assert_contents>
 </output>
 </test>
@@ -252,17 +253,17 @@
 </output>
 <output name="assembly_graph" ftype="graph_dot">
 <assert_contents>
- <has_size value="1273" delta="100"/>
+ <has_size value="1500" delta="100"/>
 </assert_contents>
 </output>
 <output name="assembly_gfa" ftype="txt">
 <assert_contents>
- <has_size value="420252" delta="100"/>
+ <has_size value="418422" delta="100"/>
 </assert_contents>
 </output>
 <output name="consensus" ftype="fasta">
 <assert_contents>
- <has_size value="427129" delta="100"/>
+ <has_size value="425147" delta="200"/>
 </assert_contents>
 </output>
 </test>
@@ -287,12 +288,12 @@
 </output>
 <output name="assembly_gfa" ftype="txt">
 <assert_contents>
- <has_size value="420252" delta="100"/>
+ <has_size value="418511" delta="100"/>
 </assert_contents>
 </output>
 <output name="consensus" ftype="fasta">
 <assert_contents>
- <has_size value="427129" delta="100"/>
+ <has_size value="425267" delta="100"/>
 </assert_contents>
 </output>
 </test>
@@ -301,7 +302,7 @@
 <param name="inputs" ftype="fastq.gz" value="ecoli_hifi_01.fastq.gz,ecoli_hifi_02.fastq.gz,ecoli_hifi_03.fastq.gz,ecoli_hifi_04.fastq.gz,ecoli_hifi_05.fastq.gz,ecoli_hifi_06.fastq.gz,ecoli_hifi_07.fastq.gz,ecoli_hifi_08.fastq.gz,ecoli_hifi_09.fastq.gz"/>
 <param name="mode" value="--nano-hq"/>
 <param name="min_overlap" value="1000"/>
- <param name="scaffolding" value="true"/>
+ <param name="scaffold" value="true"/>
 <output name="assembly_info" ftype="tabular">
 <assert_contents>
 <has_size value="286" delta="100"/>
@@ -314,12 +315,12 @@
 </output>
 <output name="assembly_gfa" ftype="txt">
 <assert_contents>
- <has_size value="420252" delta="100"/>
+ <has_size value="419414" delta="1000"/>
 </assert_contents>
 </output>
 <output name="consensus" ftype="fasta">
 <assert_contents>
- <has_size value="427129" delta="100"/>
+ <has_size value="426277" delta="1000"/>
 </assert_contents>
 </output>
 </test>
@@ -353,8 +354,6 @@
 </tests>
 <help><![CDATA[
-.. class:: infomark
-
 **Purpose**
 Flye is a de novo assembler for single molecule sequencing reads, such as those produced by PacBio and Oxford Nanopore Technologies.
@@ -364,8 +363,6 @@
 ----
-.. class:: infomark
-
 **Quick usage**
 Input reads can be in FASTA or FASTQ format, uncompressed or compressed with gz. Currently, PacBio (raw, corrected, HiFi) and ONT reads
@@ -380,17 +377,13 @@
 ----
-.. class:: infomark
-
 **Outputs**
 The main output files are:
- ::
-
- - Final assembly: contains contigs and possibly scaffolds (see below).
- - Final repeat graph: note that the edge sequences might be different (shorter) than contig sequences, because contigs might include multiple graph edges.
- - Extra information about contigs (such as length or coverage).
+* Final assembly: contains contigs and possibly scaffolds (see below).
+* Final repeat graph: note that the edge sequences might be different (shorter) than contig sequences, because contigs might include multiple graph edges.
+* Extra information about contigs (such as length or coverage).
 Each contig is formed by a single unique graph edge. If possible, unique contigs are extended with the sequence from flanking unresolved repeats on the graph. Thus, a contig fully contains the corresponding graph edge (with the same id), but might be longer then this edge. This is somewhat similar to unitig-contig relation in
@@ -402,53 +395,42 @@
 Extra information about contigs/scaffolds is output into the assembly_info.txt file. It is a tab-delimited table with the columns as follows:
- ::
+* Contig/scaffold id
+* Length
+* Coverage
+* Is circular, (Y)es or (N)o
+* Is repetitive, (Y)es or (N)o
+* Multiplicity (based on coverage)
+* Alternative group
+* Graph path (graph path corresponding to this contig/scaffold).
-
- - Contig/scaffold id
- - Length
- - Coverage
- - Is circular, (Y)es or (N)o
- - Is repetitive, (Y)es or (N)o
- - Multiplicity (based on coverage)
- - Alternative group
- - Graph path (graph path corresponding to this contig/scaffold).
-
-Scaffold gaps are marked with ?? symbols, and * symbol denotes a terminal graph node. Alternative contigs (representing alternative haplotypes) will have the same alt.
-group ID. Primary contigs are marked by *.
+Scaffold gaps are marked with `??` symbols, and `*` symbol denotes a terminal graph node. Alternative contigs (representing alternative haplotypes) will have the same alt.
+group ID. Primary contigs are marked by `*`.
 ----
-.. class:: infomark
-
 **Algorithm Description**
 This is a brief description of the Flye algorithm. Please refer to the manuscript for more detailed information. The draft contig extension is organized as follows:
- ::
-
- - K-mer counting / erroneous k-mer pre-filtering
- - Solid k-mer selection (k-mers with sufficient frequency, which are unlikely to be erroneous)
- - Contig extension. The algorithm starts from a single read and extends it with a next overlapping read (overlaps are dynamically detected using the selected solid k-mers).
+* K-mer counting / erroneous k-mer pre-filtering
+* Solid k-mer selection (k-mers with sufficient frequency, which are unlikely to be erroneous)
+* Contig extension. The algorithm starts from a single read and extends it with a next overlapping read (overlaps are dynamically detected using the selected solid k-mers).
 Note that we do not attempt to resolve repeats at this stage, thus the reconstructed contigs might contain misassemblies. Flye then aligns the reads on these draft contigs using minimap2 and calls a consensus. Afterwards, Flye performs repeat analysis as follows:
- ::
-
- - Repeat graph is constructed from the (possibly misassembled) contigs
- - In this graph all repeats longer than minimum overlap are collapsed
- - The algorithm resolves repeats using the read information and graph structure
- - The unbranching paths in the graph are output as contigs
+* Repeat graph is constructed from the (possibly misassembled) contigs
+* In this graph all repeats longer than minimum overlap are collapsed
+* The algorithm resolves repeats using the read information and graph structure
+* The unbranching paths in the graph are output as contigs
 If enabled, after resolving bridged repeats, Trestle module attempts to resolve simple unbridged repeats (of multiplicity 2) using the heterogeneities between repeat copies. Finally, Flye performs polishing of the resulting assembly to correct the remaining errors:
- ::
-
- - Alignment of all reads to the current assembly using minimap2
- - Partition the alignment into mini-alignments (bubbles)
- - Error correction of each bubble using a maximum likelihood approach
-
+* Alignment of all reads to the current assembly using minimap2
+* Partition the alignment into mini-alignments (bubbles)
+* Error correction of each bubble using a maximum likelihood approach
 The polishing steps could be repeated, which might slightly increase quality for some datasets.
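
Most of the test hunks in this changeset only retune `has_size` expectations. The following is a minimal sketch of what such an assertion checks, assuming `delta` is an absolute tolerance in bytes around `value`. The helper names `size_within` and `file_has_size` are hypothetical, and this is not Galaxy's own implementation.

```python
import os


def size_within(actual: int, expected: int, delta: int = 0) -> bool:
    """Return True when `actual` lies within +/- `delta` bytes of `expected`.

    Assumption: this mirrors the intent of <has_size value="..." delta="..."/>.
    """
    return abs(actual - expected) <= delta


def file_has_size(path: str, expected: int, delta: int = 0) -> bool:
    """Apply the same tolerance check to the size of a file on disk."""
    return size_within(os.path.getsize(path), expected, delta)


if __name__ == "__main__":
    # Hypothetical case: the tool now writes an assembly_gfa of exactly 419414 bytes.
    print(size_within(419414, 420252, 100))  # old expectation from the diff
    print(size_within(419414, 419414, 100))  # updated expectation from the diff
```

Under that assumption, a 419414-byte `assembly_gfa` would fall outside the old window (420252 ± 100) and inside the updated one, which is consistent with the expected values being refreshed rather than the tolerances alone being widened.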