# HG changeset patch
# User iuc
# Date 1553532916 14400
# Node ID 9a1626faa05c93e00022baa7c26af2793ed34bd6
planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tools/berokka commit 387f04ffbf5205aaaa7b46e9e3d518edb62a538f
diff -r 000000000000 -r 9a1626faa05c README.rst
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/README.rst Mon Mar 25 12:55:16 2019 -0400
@@ -0,0 +1,14 @@
+Galaxy Wrapper for Berokka
+==========================
+
+Trim, circularise, orient & filter long read bacterial genome assemblies.
+
+Detailed Description
+--------------------
+
+View original Berokka documentation here: https://github.com/tseemann/berokka/blob/master/README.md
+
+License
+-------
+
+`GPLv3 `_
diff -r 000000000000 -r 9a1626faa05c berokka.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/berokka.xml Mon Mar 25 12:55:16 2019 -0400
@@ -0,0 +1,107 @@
+ Trim, circularise, orient and filter long read bacterial genome assemblies
+ berokka
+` allows you to remove contigs which match 50% of sequences in this file. Berokka comes with the standard PacBio control sequence; you can provide your own FASTA file using this option.
+
+* `Read Length ` can be adjusted for datasets that do not appear to circularise. It affects the length of the match Berokka attempts to find using BLAST.
+
+* `Fuzz` can be used to accept a local alignment that lies within X bp of a global alignment (default '5').
+
+* `Annotation` can be set to "No" to ensure that the FASTA descriptions are not altered between the input and output FASTA files.
+ + ]]> + + +@UNPUBLISHED{Seemann2016, + author = {Seemann, Torsten}, + title = {Berokka: Faster Trim, circularise and orient long read bacterial genome assemblies}, + year = {2016}, + url = {https://github.com/tseemann/berokka}, +} + + + diff -r 000000000000 -r 9a1626faa05c test-data/berokka_test1.fasta --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/berokka_test1.fasta Mon Mar 25 12:55:16 2019 -0400 @@ -0,0 +1,70 @@ +>gi|145231|gb|M33724.1|ECOALPHOA Escherichia coli K-12 truncated PhoA (phoA) gene, partial cds; and transposon Mu dI, partial sequence +CAAAGCTCCGGGCCTCACCCAGGCGCTAAATACCAAAGATGGCGCAGTGATGGTGATGAGTTACGGGAAC +TCCGAAGAGGATTCACAAGAACATACCGGCAGTCAGTTGCGTATTGCGGCGTATGGCCCGCATGCCGCCA +ATGAAGCGGCGCACGAAAAACGCGAAAGCGT + +>gi|145232|gb|M33725.1|ECOALPHOB Escherichia coli K12 phoA pseudogene and transposon Mu dl-R, partial sequence +CTGTCATAAAGTTGTCACGGCCGAGACTTATAGTCGCTTTGTTTTTATTTTTTAATGTATTTGTACATGG +AGAAAATAAAGTGAAACAAAGCACTATTGCACTGGCACTCTTACCGTTACTGTTTACCCCTGTGACAAAA +GCCCGGACACCAGTGAAGCGGCGCACGAAAAACGCGAAAGCGT + +>gi|145234|gb|M33727.1|ECOALPHOE Escherichia coli K12 upstream sequence of psiA5::Mu dI. 
is identical to psiA30 upstream sequence; putative (phoA) pseudogene and transposon Mu dl-R, partial sequence +TTGTTTTTATTTTTTAATGTATTTGTACATGGAGAAAATAAAGTGAAACAAAGCACTATTGCACTGGTGA +AGCGGCGCACGAAAAACGCGAAAGCGT + +>gi|146195|gb|J01619.1|ECOGLTA Eschericia coli gltA gene, sdhCDAB operon and sucABCD operons, complete sequence +GAATTCGACCGCCATTGCGCAAGGCATCGCCATGACCAGGCAGGATACAAAAGAGAGTCGATAAATATTC +ACGGTGTCCATACCTGATAAATATTTTATGAAAGGCGGCGATGATGCCGCAAAATAATACTTATTTATAA +TCCAGCACGTAGGTTGCGTTAGCGGTTACTTCACCTGCCGTGACATCGACTGCATTATCAATTTGTTCCA +TCCAGGCGAAAAAGTTCAGCGTCTGTTCTGATGAGCTTGCATCCAGGTCAAGATCTGGCGCGGCTGAACC +TAATACGATGTTACCGTCATTTTTGTCCATCAGTCGTACACCGACCCCAGTTGCTTCGCCTGCACTGGTG +TTGCTCAACAAAGGCGTAGCACCAGTTGTCTTAGCCGTGCTATCGAAGGTTACGCCAAACTTTGGATACC +GGCATTCCGCTACCGTTGTCAGAAGCAGGCAGATCACAGTTGATCAAGCGAATGTCGACGGCCACTTTAT +TGCTATGATGCTCCCGGTTTATATGGGTTGTCGTGACTTGTCCAAGATCTATGTTTTTATCAATATCTTC +TGGATGAATTTCACAAGGTGCTTCAATAACCTCCCCCTTAAAGTGAATTTCGCCAGAACCTTCATCAGCA +GCATAAACAGGTGCAGTGAACAGCAGAGATACGGCCAGTGCGGCCAATGTTTTTTGTCCTTTAAACATAA +CAGAGTCCTTTAAGGATATAGAATAGGGGTATAGCTACGCCAGAATATCGTATTTGATTATTGCTAGTTT +TTAGTTTTGCTTAAAAAATATTGTTAGTTTTATTAAATTGGAAAACTAAATTATTGGTATCATGAATTGT +TGTATGATGATAAATATAGGGGGGATATGATAGACGTCATTTTCATAGGGTTATAAAATGCGACTACCAT +GAAGTTTTTAATTCAAAGTATTGGGTTGCTGATAATTTGAGCTGTTCTATTCTTTTTAAATATCTATATA +GGTCTGTTAATGGATTTTATTTTTACAAGTTTTTTGTGTTTAGGCATATAAAAATCAAGCCCGCCATATG +AACGGCGGGTTAAAATATTTACAACTTAGCAATCGAACCATTAACGCTTGATATCGCTTTTAAAGTCGCG +TTTTTCATATCCTGTATACAGCTGACGCGGACGGGCAATCTTCATACCGTCACTGTGCATTTCGCTCCAG +TGGGCGATCCAGCCAACGGTACGTGCCATTGCGAAAATGACGGTGAACATGGAAGACGGAATACCCATCG +CTTTCAGGATGATACCAGAGTAGAAATCGACGTTCGGGTACAGTTTCTTCTCGATAAAGTACGGGTCGTT +CAGCGCGATGTTTTCCAGCTCCATAGCCACTTCCAGCAGGTCATCCTTCGTGCCCAGCTCTTTCAGCACT +TCATGGCAGGTTTCACGCATTACGGTGGCGCGCGGGTCGTAATTTTTGTACACGCGGTGACCGAAGCCCA +TCAGGCGGAAAGAATCATTTTTGTCTTTCGCACGACGAAAAAATTCCGGAATGTGTTTAACGGAGCTGAT +TTCTTCCAGCATTTTCAGCGCCGCTTCGTTAGCACCGCCGTGCGCAGGTCCCCACAGTGAAGCAATACCT 
+GCTGCGATACAGGCAAACGGGTTCGCACCCGAAGAGCCAGCGGTACGCACGGTGGAGGTAGAGGCGTTCT +GTTCATGGTCAGCGTGCAGGATCAGAATACGGTCCATAGCACGTTCCAGAATCGGATTAACTTCATACGG +TTCGCACGGCGTGGAGAACATCATATTCAGGAAGTTACCGGCGTAGGAGAGATCGTTGCGCGGGTAAACA +AATGGCTGACCAATGGAATACTTGTAACACATCGCGGCCATGGTCGGCATTTTCGACAGCAGGCGGAACG +CGGCAATTTCACGGTGACGAGGATTGTTAACATCCAGCGAGTCGTGATAGAACGCCGCCAGCGCGCCGGT +AATACCACACATGACTGCCATTGGATGCGAGTCGCGACGGAAAGCATGGAACAGACGGGTAATCTGCTCG +TGGATCATGGTATGACGGGTCACCGTAGTTTTAAATTCGTCATACTGTTCCTGAGTCGGTTTTTCACCAT +TCAGCAGGATGTAACAAACTTCCAGGTAGTTAGAATCGGTCGCCAGCTGATCGATCGGGAAACCGCGGTG +CAGCAAAATACCTTCATCACCATCAATAAAAGTAATTTTAGATTCGCAGGATGCGGTTGAAGTGAAGCCT +GGGTCAAAGGTGAACACACCTTTTGAACCGAGAGTACGGATATCAATAACATCTTGACCCAGCGTGCCTT +TCAGCACATCCAGTTCAACAGCTGTATCCCCGTTGAGGGTGAGTTTTGCTTTTGTATCAGCCATTTAAGG +TCTCCTTAGCGCCTTATTGCGTAAGACTGCCGGAACTTAAATTTGCCTTCGCACATCAACCTGGCTTTAC +CCGTTTTTTATTTGGCTCGCCGCTCTGTGAAAGAGGGGAAAACCTGGGTACAGAGCTCTGGGCGCTTGCA +GGTAAAGGATCCATTGATGACGAATAAATGGCGAATCAAGTACTTAGCAATCCGAATTATTAAACTTGTC +TACCACTAATAACTGTCCCGAATGAATTGGTCAATACTCCACACTGTTACATAAGTTAATCTTAGGTGAA +ATACCGACTTCATAACTTTTACGCATTATATGCTTTTCCTGGTAATGTTTGTAACAACTTTGTTGAATGA +TTGTCAAATTAGATGATTAAAAATTAAATAAATGTTGTTATCGTGACCTGGATCACTGTTCAGGATAAAA +CCCGACAAACTATATGTAGGTTAATTGTAATGATTTTGTGAACAGCCTATACTGCCGCCAGTCTCCGGAA +CACCCTGCAATCCCGAGCCACCCAGCGTTGTAACGTGTCGTTTTCGCATCTGGAAGCAGTGTTTTGCATG +ACGCGCAGTTATAGAAAGGACGCTGTCTGACCCGCAAGCAGACCGGAGGAAGGAAATCCCGACGTCTCCA +GGTAACAGAAAGTTAACCTCTGTGCCCGTAGTCCCCAGGGAATAATAAGAACAGCATGTGGGCGTTATTC +ATGATAAGAAATGTGAAAAAACAAAGACCTGTTAATCTGGACCTACAGACCATCCGGTTCCCCATCACGG +CGATAGCGTCCATTCTCCATCGCGTTTCCGGTGTGATCACCTTTGTTGCAGTGGGCATCCTGCTGTGGCT +TCTGGGTACCAGCCTCTCTTCCCCTGAAGGTTTCGAGCAAGCTTCCGCGATTATGGGCAGCTTCTTCGTC +AAATTTATCATGTGGGGCATCCTTACCGCTCTGGCGTATCACGTCGTCGTAGGTATTCGCCACATGATGA +TGGATTTTGGCTATCTGGAAGAAACATTCGAAGCGGGTAAACGCTCCGCCAAAATCTCCTTTGTTATTAC +TGTCGTGCTTTCACTTCTCGCAGGAGTCCTCGTATGGTAAGCAACGCCTCCGCATTAGGACGCAATGGCG 
+TACATGATTTCATCCTCGTTCGCGCTACCGCTATCGTCCTGACGCTCTACATCATTTATATGGTCGGTTT +TTTCGCTACCAGTGGCGAGCTGACATATGAAGTCTGGATCGGTTTCTTCGCCTCTGCGTTCACCAAAGTG +TTCACCCTGCTGGCGCTGTTTTCTATCTTGATCCATGCCTGGATCGGCATGTGGCAGGTGTTGACCGACT +ACGTTAAACCGCTGGCTTTGCGCCTGATGCTGCAACTGGTGATTGTCGTTGCACTGGTGGTTTACGTGAT +TTATGGATTCGTTGTGGTGTGGGGTGTGTGATGAAATTGCCAGTCAGAGAATTTGATGCAGTTGTGATTG diff -r 000000000000 -r 9a1626faa05c test-data/results_1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/results_1 Mon Mar 25 12:55:16 2019 -0400 @@ -0,0 +1,5 @@ +#sequence status old_len new_len trimmed +gi|145231|gb|M33724.1|ECOALPHOA kept 171 171 0 +gi|145232|gb|M33725.1|ECOALPHOB kept 183 183 0 +gi|145234|gb|M33727.1|ECOALPHOE kept 97 97 0 +gi|146195|gb|J01619.1|ECOGLTA kept 3850 3850 0 diff -r 000000000000 -r 9a1626faa05c test-data/results_2 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/results_2 Mon Mar 25 12:55:16 2019 -0400 @@ -0,0 +1,5 @@ +#sequence status old_len new_len trimmed +gi|145231|gb|M33724.1|ECOALPHOA kept 171 171 0 +gi|145232|gb|M33725.1|ECOALPHOB kept 183 183 0 +gi|145234|gb|M33727.1|ECOALPHOE kept 97 97 0 +gi|146195|gb|J01619.1|ECOGLTA kept 3850 3850 0 diff -r 000000000000 -r 9a1626faa05c test-data/results_3 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/results_3 Mon Mar 25 12:55:16 2019 -0400 @@ -0,0 +1,5 @@ +#sequence status old_len new_len trimmed +gi|145231|gb|M33724.1|ECOALPHOA kept 171 171 0 +gi|145232|gb|M33725.1|ECOALPHOB kept 183 183 0 +gi|145234|gb|M33727.1|ECOALPHOE kept 97 97 0 +gi|146195|gb|J01619.1|ECOGLTA kept 3850 3850 0 diff -r 000000000000 -r 9a1626faa05c test-data/trimmed_1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/trimmed_1 Mon Mar 25 12:55:16 2019 -0400 @@ -0,0 +1,78 @@ +>gi|145231|gb|M33724.1|ECOALPHOA Escherichia coli K-12 truncated PhoA (phoA) gene, partial cds; and transposon Mu dI, partial sequence +CAAAGCTCCGGGCCTCACCCAGGCGCTAAATACCAAAGATGGCGCAGTGATGGTGATGAG 
+TTACGGGAACTCCGAAGAGGATTCACAAGAACATACCGGCAGTCAGTTGCGTATTGCGGC +GTATGGCCCGCATGCCGCCAATGAAGCGGCGCACGAAAAACGCGAAAGCGT +>gi|145232|gb|M33725.1|ECOALPHOB Escherichia coli K12 phoA pseudogene and transposon Mu dl-R, partial sequence +CTGTCATAAAGTTGTCACGGCCGAGACTTATAGTCGCTTTGTTTTTATTTTTTAATGTAT +TTGTACATGGAGAAAATAAAGTGAAACAAAGCACTATTGCACTGGCACTCTTACCGTTAC +TGTTTACCCCTGTGACAAAAGCCCGGACACCAGTGAAGCGGCGCACGAAAAACGCGAAAG +CGT +>gi|145234|gb|M33727.1|ECOALPHOE Escherichia coli K12 upstream sequence of psiA5::Mu dI. is identical to psiA30 upstream sequence; putative (phoA) pseudogene and transposon Mu dl-R, partial sequence +TTGTTTTTATTTTTTAATGTATTTGTACATGGAGAAAATAAAGTGAAACAAAGCACTATT +GCACTGGTGAAGCGGCGCACGAAAAACGCGAAAGCGT +>gi|146195|gb|J01619.1|ECOGLTA Eschericia coli gltA gene, sdhCDAB operon and sucABCD operons, complete sequence +GAATTCGACCGCCATTGCGCAAGGCATCGCCATGACCAGGCAGGATACAAAAGAGAGTCG +ATAAATATTCACGGTGTCCATACCTGATAAATATTTTATGAAAGGCGGCGATGATGCCGC +AAAATAATACTTATTTATAATCCAGCACGTAGGTTGCGTTAGCGGTTACTTCACCTGCCG +TGACATCGACTGCATTATCAATTTGTTCCATCCAGGCGAAAAAGTTCAGCGTCTGTTCTG +ATGAGCTTGCATCCAGGTCAAGATCTGGCGCGGCTGAACCTAATACGATGTTACCGTCAT +TTTTGTCCATCAGTCGTACACCGACCCCAGTTGCTTCGCCTGCACTGGTGTTGCTCAACA +AAGGCGTAGCACCAGTTGTCTTAGCCGTGCTATCGAAGGTTACGCCAAACTTTGGATACC +GGCATTCCGCTACCGTTGTCAGAAGCAGGCAGATCACAGTTGATCAAGCGAATGTCGACG +GCCACTTTATTGCTATGATGCTCCCGGTTTATATGGGTTGTCGTGACTTGTCCAAGATCT +ATGTTTTTATCAATATCTTCTGGATGAATTTCACAAGGTGCTTCAATAACCTCCCCCTTA +AAGTGAATTTCGCCAGAACCTTCATCAGCAGCATAAACAGGTGCAGTGAACAGCAGAGAT +ACGGCCAGTGCGGCCAATGTTTTTTGTCCTTTAAACATAACAGAGTCCTTTAAGGATATA +GAATAGGGGTATAGCTACGCCAGAATATCGTATTTGATTATTGCTAGTTTTTAGTTTTGC +TTAAAAAATATTGTTAGTTTTATTAAATTGGAAAACTAAATTATTGGTATCATGAATTGT +TGTATGATGATAAATATAGGGGGGATATGATAGACGTCATTTTCATAGGGTTATAAAATG +CGACTACCATGAAGTTTTTAATTCAAAGTATTGGGTTGCTGATAATTTGAGCTGTTCTAT +TCTTTTTAAATATCTATATAGGTCTGTTAATGGATTTTATTTTTACAAGTTTTTTGTGTT +TAGGCATATAAAAATCAAGCCCGCCATATGAACGGCGGGTTAAAATATTTACAACTTAGC 
+AATCGAACCATTAACGCTTGATATCGCTTTTAAAGTCGCGTTTTTCATATCCTGTATACA +GCTGACGCGGACGGGCAATCTTCATACCGTCACTGTGCATTTCGCTCCAGTGGGCGATCC +AGCCAACGGTACGTGCCATTGCGAAAATGACGGTGAACATGGAAGACGGAATACCCATCG +CTTTCAGGATGATACCAGAGTAGAAATCGACGTTCGGGTACAGTTTCTTCTCGATAAAGT +ACGGGTCGTTCAGCGCGATGTTTTCCAGCTCCATAGCCACTTCCAGCAGGTCATCCTTCG +TGCCCAGCTCTTTCAGCACTTCATGGCAGGTTTCACGCATTACGGTGGCGCGCGGGTCGT +AATTTTTGTACACGCGGTGACCGAAGCCCATCAGGCGGAAAGAATCATTTTTGTCTTTCG +CACGACGAAAAAATTCCGGAATGTGTTTAACGGAGCTGATTTCTTCCAGCATTTTCAGCG +CCGCTTCGTTAGCACCGCCGTGCGCAGGTCCCCACAGTGAAGCAATACCTGCTGCGATAC +AGGCAAACGGGTTCGCACCCGAAGAGCCAGCGGTACGCACGGTGGAGGTAGAGGCGTTCT +GTTCATGGTCAGCGTGCAGGATCAGAATACGGTCCATAGCACGTTCCAGAATCGGATTAA +CTTCATACGGTTCGCACGGCGTGGAGAACATCATATTCAGGAAGTTACCGGCGTAGGAGA +GATCGTTGCGCGGGTAAACAAATGGCTGACCAATGGAATACTTGTAACACATCGCGGCCA +TGGTCGGCATTTTCGACAGCAGGCGGAACGCGGCAATTTCACGGTGACGAGGATTGTTAA +CATCCAGCGAGTCGTGATAGAACGCCGCCAGCGCGCCGGTAATACCACACATGACTGCCA +TTGGATGCGAGTCGCGACGGAAAGCATGGAACAGACGGGTAATCTGCTCGTGGATCATGG +TATGACGGGTCACCGTAGTTTTAAATTCGTCATACTGTTCCTGAGTCGGTTTTTCACCAT +TCAGCAGGATGTAACAAACTTCCAGGTAGTTAGAATCGGTCGCCAGCTGATCGATCGGGA +AACCGCGGTGCAGCAAAATACCTTCATCACCATCAATAAAAGTAATTTTAGATTCGCAGG +ATGCGGTTGAAGTGAAGCCTGGGTCAAAGGTGAACACACCTTTTGAACCGAGAGTACGGA +TATCAATAACATCTTGACCCAGCGTGCCTTTCAGCACATCCAGTTCAACAGCTGTATCCC +CGTTGAGGGTGAGTTTTGCTTTTGTATCAGCCATTTAAGGTCTCCTTAGCGCCTTATTGC +GTAAGACTGCCGGAACTTAAATTTGCCTTCGCACATCAACCTGGCTTTACCCGTTTTTTA +TTTGGCTCGCCGCTCTGTGAAAGAGGGGAAAACCTGGGTACAGAGCTCTGGGCGCTTGCA +GGTAAAGGATCCATTGATGACGAATAAATGGCGAATCAAGTACTTAGCAATCCGAATTAT +TAAACTTGTCTACCACTAATAACTGTCCCGAATGAATTGGTCAATACTCCACACTGTTAC +ATAAGTTAATCTTAGGTGAAATACCGACTTCATAACTTTTACGCATTATATGCTTTTCCT +GGTAATGTTTGTAACAACTTTGTTGAATGATTGTCAAATTAGATGATTAAAAATTAAATA +AATGTTGTTATCGTGACCTGGATCACTGTTCAGGATAAAACCCGACAAACTATATGTAGG +TTAATTGTAATGATTTTGTGAACAGCCTATACTGCCGCCAGTCTCCGGAACACCCTGCAA +TCCCGAGCCACCCAGCGTTGTAACGTGTCGTTTTCGCATCTGGAAGCAGTGTTTTGCATG +ACGCGCAGTTATAGAAAGGACGCTGTCTGACCCGCAAGCAGACCGGAGGAAGGAAATCCC 
+GACGTCTCCAGGTAACAGAAAGTTAACCTCTGTGCCCGTAGTCCCCAGGGAATAATAAGA +ACAGCATGTGGGCGTTATTCATGATAAGAAATGTGAAAAAACAAAGACCTGTTAATCTGG +ACCTACAGACCATCCGGTTCCCCATCACGGCGATAGCGTCCATTCTCCATCGCGTTTCCG +GTGTGATCACCTTTGTTGCAGTGGGCATCCTGCTGTGGCTTCTGGGTACCAGCCTCTCTT +CCCCTGAAGGTTTCGAGCAAGCTTCCGCGATTATGGGCAGCTTCTTCGTCAAATTTATCA +TGTGGGGCATCCTTACCGCTCTGGCGTATCACGTCGTCGTAGGTATTCGCCACATGATGA +TGGATTTTGGCTATCTGGAAGAAACATTCGAAGCGGGTAAACGCTCCGCCAAAATCTCCT +TTGTTATTACTGTCGTGCTTTCACTTCTCGCAGGAGTCCTCGTATGGTAAGCAACGCCTC +CGCATTAGGACGCAATGGCGTACATGATTTCATCCTCGTTCGCGCTACCGCTATCGTCCT +GACGCTCTACATCATTTATATGGTCGGTTTTTTCGCTACCAGTGGCGAGCTGACATATGA +AGTCTGGATCGGTTTCTTCGCCTCTGCGTTCACCAAAGTGTTCACCCTGCTGGCGCTGTT +TTCTATCTTGATCCATGCCTGGATCGGCATGTGGCAGGTGTTGACCGACTACGTTAAACC +GCTGGCTTTGCGCCTGATGCTGCAACTGGTGATTGTCGTTGCACTGGTGGTTTACGTGAT +TTATGGATTCGTTGTGGTGTGGGGTGTGTGATGAAATTGCCAGTCAGAGAATTTGATGCA +GTTGTGATTG diff -r 000000000000 -r 9a1626faa05c test-data/trimmed_2 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/trimmed_2 Mon Mar 25 12:55:16 2019 -0400 @@ -0,0 +1,78 @@ +>gi|145231|gb|M33724.1|ECOALPHOA Escherichia coli K-12 truncated PhoA (phoA) gene, partial cds; and transposon Mu dI, partial sequence +CAAAGCTCCGGGCCTCACCCAGGCGCTAAATACCAAAGATGGCGCAGTGATGGTGATGAG +TTACGGGAACTCCGAAGAGGATTCACAAGAACATACCGGCAGTCAGTTGCGTATTGCGGC +GTATGGCCCGCATGCCGCCAATGAAGCGGCGCACGAAAAACGCGAAAGCGT +>gi|145232|gb|M33725.1|ECOALPHOB Escherichia coli K12 phoA pseudogene and transposon Mu dl-R, partial sequence +CTGTCATAAAGTTGTCACGGCCGAGACTTATAGTCGCTTTGTTTTTATTTTTTAATGTAT +TTGTACATGGAGAAAATAAAGTGAAACAAAGCACTATTGCACTGGCACTCTTACCGTTAC +TGTTTACCCCTGTGACAAAAGCCCGGACACCAGTGAAGCGGCGCACGAAAAACGCGAAAG +CGT +>gi|145234|gb|M33727.1|ECOALPHOE Escherichia coli K12 upstream sequence of psiA5::Mu dI. 
is identical to psiA30 upstream sequence; putative (phoA) pseudogene and transposon Mu dl-R, partial sequence +TTGTTTTTATTTTTTAATGTATTTGTACATGGAGAAAATAAAGTGAAACAAAGCACTATT +GCACTGGTGAAGCGGCGCACGAAAAACGCGAAAGCGT +>gi|146195|gb|J01619.1|ECOGLTA Eschericia coli gltA gene, sdhCDAB operon and sucABCD operons, complete sequence +GAATTCGACCGCCATTGCGCAAGGCATCGCCATGACCAGGCAGGATACAAAAGAGAGTCG +ATAAATATTCACGGTGTCCATACCTGATAAATATTTTATGAAAGGCGGCGATGATGCCGC +AAAATAATACTTATTTATAATCCAGCACGTAGGTTGCGTTAGCGGTTACTTCACCTGCCG +TGACATCGACTGCATTATCAATTTGTTCCATCCAGGCGAAAAAGTTCAGCGTCTGTTCTG +ATGAGCTTGCATCCAGGTCAAGATCTGGCGCGGCTGAACCTAATACGATGTTACCGTCAT +TTTTGTCCATCAGTCGTACACCGACCCCAGTTGCTTCGCCTGCACTGGTGTTGCTCAACA +AAGGCGTAGCACCAGTTGTCTTAGCCGTGCTATCGAAGGTTACGCCAAACTTTGGATACC +GGCATTCCGCTACCGTTGTCAGAAGCAGGCAGATCACAGTTGATCAAGCGAATGTCGACG +GCCACTTTATTGCTATGATGCTCCCGGTTTATATGGGTTGTCGTGACTTGTCCAAGATCT +ATGTTTTTATCAATATCTTCTGGATGAATTTCACAAGGTGCTTCAATAACCTCCCCCTTA +AAGTGAATTTCGCCAGAACCTTCATCAGCAGCATAAACAGGTGCAGTGAACAGCAGAGAT +ACGGCCAGTGCGGCCAATGTTTTTTGTCCTTTAAACATAACAGAGTCCTTTAAGGATATA +GAATAGGGGTATAGCTACGCCAGAATATCGTATTTGATTATTGCTAGTTTTTAGTTTTGC +TTAAAAAATATTGTTAGTTTTATTAAATTGGAAAACTAAATTATTGGTATCATGAATTGT +TGTATGATGATAAATATAGGGGGGATATGATAGACGTCATTTTCATAGGGTTATAAAATG +CGACTACCATGAAGTTTTTAATTCAAAGTATTGGGTTGCTGATAATTTGAGCTGTTCTAT +TCTTTTTAAATATCTATATAGGTCTGTTAATGGATTTTATTTTTACAAGTTTTTTGTGTT +TAGGCATATAAAAATCAAGCCCGCCATATGAACGGCGGGTTAAAATATTTACAACTTAGC +AATCGAACCATTAACGCTTGATATCGCTTTTAAAGTCGCGTTTTTCATATCCTGTATACA +GCTGACGCGGACGGGCAATCTTCATACCGTCACTGTGCATTTCGCTCCAGTGGGCGATCC +AGCCAACGGTACGTGCCATTGCGAAAATGACGGTGAACATGGAAGACGGAATACCCATCG +CTTTCAGGATGATACCAGAGTAGAAATCGACGTTCGGGTACAGTTTCTTCTCGATAAAGT +ACGGGTCGTTCAGCGCGATGTTTTCCAGCTCCATAGCCACTTCCAGCAGGTCATCCTTCG +TGCCCAGCTCTTTCAGCACTTCATGGCAGGTTTCACGCATTACGGTGGCGCGCGGGTCGT +AATTTTTGTACACGCGGTGACCGAAGCCCATCAGGCGGAAAGAATCATTTTTGTCTTTCG +CACGACGAAAAAATTCCGGAATGTGTTTAACGGAGCTGATTTCTTCCAGCATTTTCAGCG +CCGCTTCGTTAGCACCGCCGTGCGCAGGTCCCCACAGTGAAGCAATACCTGCTGCGATAC 
+AGGCAAACGGGTTCGCACCCGAAGAGCCAGCGGTACGCACGGTGGAGGTAGAGGCGTTCT +GTTCATGGTCAGCGTGCAGGATCAGAATACGGTCCATAGCACGTTCCAGAATCGGATTAA +CTTCATACGGTTCGCACGGCGTGGAGAACATCATATTCAGGAAGTTACCGGCGTAGGAGA +GATCGTTGCGCGGGTAAACAAATGGCTGACCAATGGAATACTTGTAACACATCGCGGCCA +TGGTCGGCATTTTCGACAGCAGGCGGAACGCGGCAATTTCACGGTGACGAGGATTGTTAA +CATCCAGCGAGTCGTGATAGAACGCCGCCAGCGCGCCGGTAATACCACACATGACTGCCA +TTGGATGCGAGTCGCGACGGAAAGCATGGAACAGACGGGTAATCTGCTCGTGGATCATGG +TATGACGGGTCACCGTAGTTTTAAATTCGTCATACTGTTCCTGAGTCGGTTTTTCACCAT +TCAGCAGGATGTAACAAACTTCCAGGTAGTTAGAATCGGTCGCCAGCTGATCGATCGGGA +AACCGCGGTGCAGCAAAATACCTTCATCACCATCAATAAAAGTAATTTTAGATTCGCAGG +ATGCGGTTGAAGTGAAGCCTGGGTCAAAGGTGAACACACCTTTTGAACCGAGAGTACGGA +TATCAATAACATCTTGACCCAGCGTGCCTTTCAGCACATCCAGTTCAACAGCTGTATCCC +CGTTGAGGGTGAGTTTTGCTTTTGTATCAGCCATTTAAGGTCTCCTTAGCGCCTTATTGC +GTAAGACTGCCGGAACTTAAATTTGCCTTCGCACATCAACCTGGCTTTACCCGTTTTTTA +TTTGGCTCGCCGCTCTGTGAAAGAGGGGAAAACCTGGGTACAGAGCTCTGGGCGCTTGCA +GGTAAAGGATCCATTGATGACGAATAAATGGCGAATCAAGTACTTAGCAATCCGAATTAT +TAAACTTGTCTACCACTAATAACTGTCCCGAATGAATTGGTCAATACTCCACACTGTTAC +ATAAGTTAATCTTAGGTGAAATACCGACTTCATAACTTTTACGCATTATATGCTTTTCCT +GGTAATGTTTGTAACAACTTTGTTGAATGATTGTCAAATTAGATGATTAAAAATTAAATA +AATGTTGTTATCGTGACCTGGATCACTGTTCAGGATAAAACCCGACAAACTATATGTAGG +TTAATTGTAATGATTTTGTGAACAGCCTATACTGCCGCCAGTCTCCGGAACACCCTGCAA +TCCCGAGCCACCCAGCGTTGTAACGTGTCGTTTTCGCATCTGGAAGCAGTGTTTTGCATG +ACGCGCAGTTATAGAAAGGACGCTGTCTGACCCGCAAGCAGACCGGAGGAAGGAAATCCC +GACGTCTCCAGGTAACAGAAAGTTAACCTCTGTGCCCGTAGTCCCCAGGGAATAATAAGA +ACAGCATGTGGGCGTTATTCATGATAAGAAATGTGAAAAAACAAAGACCTGTTAATCTGG +ACCTACAGACCATCCGGTTCCCCATCACGGCGATAGCGTCCATTCTCCATCGCGTTTCCG +GTGTGATCACCTTTGTTGCAGTGGGCATCCTGCTGTGGCTTCTGGGTACCAGCCTCTCTT +CCCCTGAAGGTTTCGAGCAAGCTTCCGCGATTATGGGCAGCTTCTTCGTCAAATTTATCA +TGTGGGGCATCCTTACCGCTCTGGCGTATCACGTCGTCGTAGGTATTCGCCACATGATGA +TGGATTTTGGCTATCTGGAAGAAACATTCGAAGCGGGTAAACGCTCCGCCAAAATCTCCT +TTGTTATTACTGTCGTGCTTTCACTTCTCGCAGGAGTCCTCGTATGGTAAGCAACGCCTC +CGCATTAGGACGCAATGGCGTACATGATTTCATCCTCGTTCGCGCTACCGCTATCGTCCT 
+GACGCTCTACATCATTTATATGGTCGGTTTTTTCGCTACCAGTGGCGAGCTGACATATGA +AGTCTGGATCGGTTTCTTCGCCTCTGCGTTCACCAAAGTGTTCACCCTGCTGGCGCTGTT +TTCTATCTTGATCCATGCCTGGATCGGCATGTGGCAGGTGTTGACCGACTACGTTAAACC +GCTGGCTTTGCGCCTGATGCTGCAACTGGTGATTGTCGTTGCACTGGTGGTTTACGTGAT +TTATGGATTCGTTGTGGTGTGGGGTGTGTGATGAAATTGCCAGTCAGAGAATTTGATGCA +GTTGTGATTG diff -r 000000000000 -r 9a1626faa05c test-data/trimmed_3 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/trimmed_3 Mon Mar 25 12:55:16 2019 -0400 @@ -0,0 +1,78 @@ +>gi|145231|gb|M33724.1|ECOALPHOA Escherichia coli K-12 truncated PhoA (phoA) gene, partial cds; and transposon Mu dI, partial sequence +CAAAGCTCCGGGCCTCACCCAGGCGCTAAATACCAAAGATGGCGCAGTGATGGTGATGAG +TTACGGGAACTCCGAAGAGGATTCACAAGAACATACCGGCAGTCAGTTGCGTATTGCGGC +GTATGGCCCGCATGCCGCCAATGAAGCGGCGCACGAAAAACGCGAAAGCGT +>gi|145232|gb|M33725.1|ECOALPHOB Escherichia coli K12 phoA pseudogene and transposon Mu dl-R, partial sequence +CTGTCATAAAGTTGTCACGGCCGAGACTTATAGTCGCTTTGTTTTTATTTTTTAATGTAT +TTGTACATGGAGAAAATAAAGTGAAACAAAGCACTATTGCACTGGCACTCTTACCGTTAC +TGTTTACCCCTGTGACAAAAGCCCGGACACCAGTGAAGCGGCGCACGAAAAACGCGAAAG +CGT +>gi|145234|gb|M33727.1|ECOALPHOE Escherichia coli K12 upstream sequence of psiA5::Mu dI. 
is identical to psiA30 upstream sequence; putative (phoA) pseudogene and transposon Mu dl-R, partial sequence +TTGTTTTTATTTTTTAATGTATTTGTACATGGAGAAAATAAAGTGAAACAAAGCACTATT +GCACTGGTGAAGCGGCGCACGAAAAACGCGAAAGCGT +>gi|146195|gb|J01619.1|ECOGLTA Eschericia coli gltA gene, sdhCDAB operon and sucABCD operons, complete sequence +GAATTCGACCGCCATTGCGCAAGGCATCGCCATGACCAGGCAGGATACAAAAGAGAGTCG +ATAAATATTCACGGTGTCCATACCTGATAAATATTTTATGAAAGGCGGCGATGATGCCGC +AAAATAATACTTATTTATAATCCAGCACGTAGGTTGCGTTAGCGGTTACTTCACCTGCCG +TGACATCGACTGCATTATCAATTTGTTCCATCCAGGCGAAAAAGTTCAGCGTCTGTTCTG +ATGAGCTTGCATCCAGGTCAAGATCTGGCGCGGCTGAACCTAATACGATGTTACCGTCAT +TTTTGTCCATCAGTCGTACACCGACCCCAGTTGCTTCGCCTGCACTGGTGTTGCTCAACA +AAGGCGTAGCACCAGTTGTCTTAGCCGTGCTATCGAAGGTTACGCCAAACTTTGGATACC +GGCATTCCGCTACCGTTGTCAGAAGCAGGCAGATCACAGTTGATCAAGCGAATGTCGACG +GCCACTTTATTGCTATGATGCTCCCGGTTTATATGGGTTGTCGTGACTTGTCCAAGATCT +ATGTTTTTATCAATATCTTCTGGATGAATTTCACAAGGTGCTTCAATAACCTCCCCCTTA +AAGTGAATTTCGCCAGAACCTTCATCAGCAGCATAAACAGGTGCAGTGAACAGCAGAGAT +ACGGCCAGTGCGGCCAATGTTTTTTGTCCTTTAAACATAACAGAGTCCTTTAAGGATATA +GAATAGGGGTATAGCTACGCCAGAATATCGTATTTGATTATTGCTAGTTTTTAGTTTTGC +TTAAAAAATATTGTTAGTTTTATTAAATTGGAAAACTAAATTATTGGTATCATGAATTGT +TGTATGATGATAAATATAGGGGGGATATGATAGACGTCATTTTCATAGGGTTATAAAATG +CGACTACCATGAAGTTTTTAATTCAAAGTATTGGGTTGCTGATAATTTGAGCTGTTCTAT +TCTTTTTAAATATCTATATAGGTCTGTTAATGGATTTTATTTTTACAAGTTTTTTGTGTT +TAGGCATATAAAAATCAAGCCCGCCATATGAACGGCGGGTTAAAATATTTACAACTTAGC +AATCGAACCATTAACGCTTGATATCGCTTTTAAAGTCGCGTTTTTCATATCCTGTATACA +GCTGACGCGGACGGGCAATCTTCATACCGTCACTGTGCATTTCGCTCCAGTGGGCGATCC +AGCCAACGGTACGTGCCATTGCGAAAATGACGGTGAACATGGAAGACGGAATACCCATCG +CTTTCAGGATGATACCAGAGTAGAAATCGACGTTCGGGTACAGTTTCTTCTCGATAAAGT +ACGGGTCGTTCAGCGCGATGTTTTCCAGCTCCATAGCCACTTCCAGCAGGTCATCCTTCG +TGCCCAGCTCTTTCAGCACTTCATGGCAGGTTTCACGCATTACGGTGGCGCGCGGGTCGT +AATTTTTGTACACGCGGTGACCGAAGCCCATCAGGCGGAAAGAATCATTTTTGTCTTTCG +CACGACGAAAAAATTCCGGAATGTGTTTAACGGAGCTGATTTCTTCCAGCATTTTCAGCG +CCGCTTCGTTAGCACCGCCGTGCGCAGGTCCCCACAGTGAAGCAATACCTGCTGCGATAC 
+AGGCAAACGGGTTCGCACCCGAAGAGCCAGCGGTACGCACGGTGGAGGTAGAGGCGTTCT +GTTCATGGTCAGCGTGCAGGATCAGAATACGGTCCATAGCACGTTCCAGAATCGGATTAA +CTTCATACGGTTCGCACGGCGTGGAGAACATCATATTCAGGAAGTTACCGGCGTAGGAGA +GATCGTTGCGCGGGTAAACAAATGGCTGACCAATGGAATACTTGTAACACATCGCGGCCA +TGGTCGGCATTTTCGACAGCAGGCGGAACGCGGCAATTTCACGGTGACGAGGATTGTTAA +CATCCAGCGAGTCGTGATAGAACGCCGCCAGCGCGCCGGTAATACCACACATGACTGCCA +TTGGATGCGAGTCGCGACGGAAAGCATGGAACAGACGGGTAATCTGCTCGTGGATCATGG +TATGACGGGTCACCGTAGTTTTAAATTCGTCATACTGTTCCTGAGTCGGTTTTTCACCAT +TCAGCAGGATGTAACAAACTTCCAGGTAGTTAGAATCGGTCGCCAGCTGATCGATCGGGA +AACCGCGGTGCAGCAAAATACCTTCATCACCATCAATAAAAGTAATTTTAGATTCGCAGG +ATGCGGTTGAAGTGAAGCCTGGGTCAAAGGTGAACACACCTTTTGAACCGAGAGTACGGA +TATCAATAACATCTTGACCCAGCGTGCCTTTCAGCACATCCAGTTCAACAGCTGTATCCC +CGTTGAGGGTGAGTTTTGCTTTTGTATCAGCCATTTAAGGTCTCCTTAGCGCCTTATTGC +GTAAGACTGCCGGAACTTAAATTTGCCTTCGCACATCAACCTGGCTTTACCCGTTTTTTA +TTTGGCTCGCCGCTCTGTGAAAGAGGGGAAAACCTGGGTACAGAGCTCTGGGCGCTTGCA +GGTAAAGGATCCATTGATGACGAATAAATGGCGAATCAAGTACTTAGCAATCCGAATTAT +TAAACTTGTCTACCACTAATAACTGTCCCGAATGAATTGGTCAATACTCCACACTGTTAC +ATAAGTTAATCTTAGGTGAAATACCGACTTCATAACTTTTACGCATTATATGCTTTTCCT +GGTAATGTTTGTAACAACTTTGTTGAATGATTGTCAAATTAGATGATTAAAAATTAAATA +AATGTTGTTATCGTGACCTGGATCACTGTTCAGGATAAAACCCGACAAACTATATGTAGG +TTAATTGTAATGATTTTGTGAACAGCCTATACTGCCGCCAGTCTCCGGAACACCCTGCAA +TCCCGAGCCACCCAGCGTTGTAACGTGTCGTTTTCGCATCTGGAAGCAGTGTTTTGCATG +ACGCGCAGTTATAGAAAGGACGCTGTCTGACCCGCAAGCAGACCGGAGGAAGGAAATCCC +GACGTCTCCAGGTAACAGAAAGTTAACCTCTGTGCCCGTAGTCCCCAGGGAATAATAAGA +ACAGCATGTGGGCGTTATTCATGATAAGAAATGTGAAAAAACAAAGACCTGTTAATCTGG +ACCTACAGACCATCCGGTTCCCCATCACGGCGATAGCGTCCATTCTCCATCGCGTTTCCG +GTGTGATCACCTTTGTTGCAGTGGGCATCCTGCTGTGGCTTCTGGGTACCAGCCTCTCTT +CCCCTGAAGGTTTCGAGCAAGCTTCCGCGATTATGGGCAGCTTCTTCGTCAAATTTATCA +TGTGGGGCATCCTTACCGCTCTGGCGTATCACGTCGTCGTAGGTATTCGCCACATGATGA +TGGATTTTGGCTATCTGGAAGAAACATTCGAAGCGGGTAAACGCTCCGCCAAAATCTCCT +TTGTTATTACTGTCGTGCTTTCACTTCTCGCAGGAGTCCTCGTATGGTAAGCAACGCCTC +CGCATTAGGACGCAATGGCGTACATGATTTCATCCTCGTTCGCGCTACCGCTATCGTCCT 
+GACGCTCTACATCATTTATATGGTCGGTTTTTTCGCTACCAGTGGCGAGCTGACATATGA +AGTCTGGATCGGTTTCTTCGCCTCTGCGTTCACCAAAGTGTTCACCCTGCTGGCGCTGTT +TTCTATCTTGATCCATGCCTGGATCGGCATGTGGCAGGTGTTGACCGACTACGTTAAACC +GCTGGCTTTGCGCCTGATGCTGCAACTGGTGATTGTCGTTGCACTGGTGGTTTACGTGAT +TTATGGATTCGTTGTGGTGTGGGGTGTGTGATGAAATTGCCAGTCAGAGAATTTGATGCA +GTTGTGATTG
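The `results_*` tables list each contig with a status and two lengths, `old_len` and `new_len`, plus a `trimmed` count. The `trimmed_*` FASTA outputs wrap sequences at 60 columns, whereas `berokka_test1.fasta` wraps at 70, so a byte-for-byte diff against the input says nothing about whether the sequences changed. Below is a minimal sketch of a wrap-insensitive consistency check. It is a hypothetical helper, not part of Berokka or of this Galaxy wrapper. It assumes the following:

* The results table has five whitespace-separated columns and a `#` header line, as in the test data.
* `trimmed` should equal `old_len - new_len`.
* Records are matched by the first word of the FASTA header.

```python
"""Hypothetical sanity check for Berokka test outputs (not part of the wrapper)."""
from typing import Dict, List, NamedTuple


class ResultRow(NamedTuple):
    sequence: str
    status: str
    old_len: int
    new_len: int
    trimmed: int


def parse_results(text: str) -> List[ResultRow]:
    """Parse a results table such as test-data/results_1.

    Assumption: five whitespace-separated columns; lines starting with '#' are headers.
    """
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        seq_id, status, old_len, new_len, trimmed = line.split()
        rows.append(ResultRow(seq_id, status, int(old_len), int(new_len), int(trimmed)))
    return rows


def read_fasta(text: str) -> Dict[str, str]:
    """Map each record ID (first word after '>') to its sequence with line wrapping removed."""
    records: Dict[str, List[str]] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def check_consistency(results_text: str, input_fasta: str, output_fasta: str) -> List[str]:
    """Return a list of problems; an empty list means the three files agree on lengths."""
    problems = []
    before = read_fasta(input_fasta)
    after = read_fasta(output_fasta)
    for row in parse_results(results_text):
        # Assumption: 'trimmed' is the number of bases removed from the contig.
        if row.old_len - row.new_len != row.trimmed:
            problems.append(f"{row.sequence}: trimmed != old_len - new_len")
        if row.sequence not in before:
            problems.append(f"{row.sequence}: missing from input FASTA")
        elif len(before[row.sequence]) != row.old_len:
            problems.append(f"{row.sequence}: input length != old_len")
        # Only compare output length for contigs that appear in the output.
        if row.sequence in after and len(after[row.sequence]) != row.new_len:
            problems.append(f"{row.sequence}: output length != new_len")
    return problems
```

For the shipped test data, every contig is reported as `kept` with `trimmed` equal to 0. In that case you could also check `read_fasta(input_text) == read_fasta(output_text)` to confirm that only the line wrapping differs between `berokka_test1.fasta` and `trimmed_1`.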