changeset 0:0996169ac2e8 draft
Uploaded v0.0.2, previously only on the TestToolShed
author | peterjc |
---|---|
date | Fri, 21 Nov 2014 06:41:12 -0500 |
parents | |
children | 5ae1c0312aaa |
files | test-data/NC_010642.fna tools/clc_assembly_cell/README.rst tools/clc_assembly_cell/clc_assembler.xml tools/clc_assembly_cell/clc_mapper.xml tools/clc_assembly_cell/tool_dependencies.xml |
diffstat | 5 files changed, 679 insertions(+), 0 deletions(-) |
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/NC_010642.fna Fri Nov 21 06:41:12 2014 -0500 @@ -0,0 +1,245 @@ +>gi|187250362|ref|NC_010642.1| Panthera tigris mitochondrion, complete genome +GGGTTAATGACTAATCAGCCCATGATCACACATAACTGTGGTGTCATGCATTTGGTATTTTTAATTTTTA +GGGGGTCGAACTTGCTATGACTCAGCTATGACCTAAAGGTCCCGACTCAGTCAAATATAATGTAGCTGGA +CTTATTCTCTATGCGGGGGTTCCACACGTACAACAAACAAGGTGTTATTCAGTCAATGGTCACAGGACAT +ATACTTAAATTCCTATTGTTCCACAGGACACGGCATGCGCGCACCCACGTATACGCGTACACGTATACAC +GTATACACGTACACACGTACACACGTACACACGTACACACGTACACACGTATACACGTATACACGTATAC +ACGTATACACGTATACACGTATACACGTATACACGTATACACGTATACACGTATACACGTACACACGTAC +ACACGTACACACGTACACACGTATACACGTATACACGTATACACGTATACACGTATACGCGTACACGTAC +ACACGTACACACGTACACACGTACACACGTACACACGTACACACGTACACACGTACACACGTACACACGT +ACACACGTACACACGTACACACGTACACACGTACACACGTATACACGTATACACGTATACACGTATACAC +ATGCAAACTTTTTTGATTTAGTAAATAATTAGCTTAACCAAACCCCCCTTACCCCCCGTTAATCTTATTT +ATTATAGTACGTGTTTATTTCTGTCTTGCCAAACCCCAAAAACAAGACTAAACCCGTATCTAGGCACAAG +GCCTAAGATTAACGTTTACAAACTCTACCAACCCCATCATTACCAATTATTAATACTAAATCATAACTTC +GTTCGCAGTTATCTATAGATACGACAACCCGATCTCTAATTCGTCCCTATCGAACAATATTTACATACCC +AACAACCCTATGTCTTGGTTAATGTAGCTTAAATATATTAAAGCAAGGCACTGAAAATGCCTAGATGGGC +CGCCAGGCTCCATAAACATAAAGGTTTGGTCCTAGCCTTTCCATTAGTTGTTAATAAAATTACACATGCA +AGCCTCCGCATCCCGGTGAAAATGCCCTCTAAATCACCCAGTGATCCAAAGGAGCCGGTATCAAGTACAC +AACCATTGTAGCTCATGACACCTTGCTCAGCCACACCCCCACGGGACACAGCAGTGATAAAAATTAAGCC +ATGAATGAAAGTTCGACTAAGCTATATTAAATTAGGGTTGGTAAATTTCGTGCCAGCCACCGCGGTCATA +CGATTAACCCAAACTAATAGACCCACGGCGTAAAGCGTGTTACAGAAAAAAGTATACTAAAGTTAAGCCT +TAACTAGGCTGTAAAAAGCCACAGTTAACGTAAAAATACAGCACGAAAGTAACTTTAATATTTCTGACCA +CACGATAGCTAAGACCCAAACTGGGATTAGATACCCCACTATGCTTAGCCCTAAACCTAGATAGTTAACC +CAAACAAAACTATCCGCCAGAGAACTACTAGCAACAGCTTAAAACTCAAAGGACTTGGCGGTGCTTTATA +TCCCTCTAGAGGAGCCTGTTCCATAATCGATAAACCCCGATAAACCTCACCATCTCTTGCTAATTCAGCC +TATATACCGCCATCTTCAGCAAACCCTAAAAAGGAAGAAAAGTAAGCACAAGTATCTTAACATAAAAAAG 
+TTAGGTCAAGGTGTAGCCCATGAGATGGGGAAGTAATGGGCTACATTTTCTATAACTAGAACATCCACGA +AAATCCTTATGAAATTAAGTATTAAAGGAGGATTTAGTAGTAAATTCGAGAATAGAGAGCTCGATTGAAT +CGGGCCATGAAGCACGCACACACCGCCCGTCACCCTCCTCAAGTGATTAGACCCCAAAGAAACCTATTCA +AACCACTACACCCACAAGAGGAGACAAGTCGTAACAAGGTAAGCATACTGGAAAGTGTGCTTGGATAACA +AGATGTAGCTTAAACAAAGCATCTGGCCTACACCCAGAAGATTTCATATTAAACTGACCATCTTGAGCTA +GAGCTAGCCCAACTACCCATAAACACAACTAACATTAGAAAGTAAAACAAAACATTTAGTTACTCTAAAA +AGTATAGGAGATAGAAATTTAACTCGGCGCTATAGAAAAAGTACCGCAAGGGAATGATGAAAGAAAAAAC +TAAAAGCACTATACAGCAAAGATTACCCCTTGTACCTTTTGCATAATGAATTAGCTAGAATAACCTAACA +AAGAGAACTTCAGCTAGGCCCCCCGAAACCAGACGAGCTACCCATAAACAATCTATTACAGGATGAACTC +GTCTATGTTGCAAAATAGTGAGAAGATTTATGGGTAGAGGTGAAAAGCCTAACGAGCCTGGTGATAGCTG +GTTGCCCAGAACAGAATCTTAGTTCAGCTTTAAACTTACCTCAAAAACCCTAAAATTCCAATGTAAGTTT +AAAATATAGTCTAAAAAGGTACAGCTTTTTAGAATCTAGGATACAGCCTTAATTAGAGAGTAAGCATATA +ACACAAACCATAGTTGGCCTAAAAGCAGCCACCAATTAAGAAAGCGTTCAAGCTCAACAATCAAAACATC +TCAATGTCAAAAAACGCAACCAACTCCTAATCTAAAACTGGGCTAATCTATTTAACAATAGAAGCAATAA +TGCTAATATGAGTAACAAGAAATACTTCTCCCGCGCATAAGCTTATATCAGAACGGATAACCACTGATAG +TTAACAACAAGATATATATAACCTAACTACAAGCAAAATATCAGACTAATTGTTAACCCAACACAGGCAT +GCAATTTAGGGAAAGATTAAAAGAAGTAAAAGGAACTCGGCAAACACAAGCCCCGCCTGTTTACCAAAAA +CATCACCTCTAGCATTTCCAGTATTAGAGGCACTGCCTGCCCAGTGACATTAGTTAAACGGCCGCGGTAT +CCTGACCGTGCAAAGGTAGCATAATCATTTGTTCCTTAAATAGGGACTTGTATGAATGGCCACACGAGGG +CTTTACTGTCTCTTACTTCCAATCCGTGAAATTGACCTTCCCGTGAAGAGGCGGGAATACGACAATAAGA +CGAGAAGACCCTGTGGAGCTTTAATTAATCGACCCAAAGAGACCTTAATAACCAACCGACAGGAACAACA +GACCTCTGCCATGGGCCGACAATTTAGGTTGGGGTGACCTCGGAGAATAAAACAACCTCCGAGTGATTTA +AATCTAGACTGACCAGTCGAAAGTATTACATCACTTATTGATCCAAAGCTTGATCAACGGAACAAGTTAC +CCCAGGGATAACAGCGCAATCCTATTTCAGAGTCCATATCGACAATAGGGTTTACGACCTCGATGTTGGA +TCAGGACATCCCGATGGTGCAGCAGCTATCAAAGGTTCGTTTGTTCAACGATTAAAGTCCTACGTGATCT +GAGTTCAGACCGGAGTAATCCAGGTCGGTTTCTATCTATTAAATAATTTCTCCCAGTACGAAAGGACAAG +AGAAATAAGGCCCACTTTACCAAAGCGCCTTTAACCAAATAGATGATATAATCTCAATCTAAACAGTTTA 
+TCTAAACATATCACCCGTAGAGCTCGGGTTTGTTAGGGTGGCAGAGCCCGGCAATTGCATAAAACTTAAG +CTTTTACTATCAGAGGTTCAACTCCTCTCCCTAACAGCATGTTTATAATCAATATTCTCTCATTAATTAT +CCCTATTCTCTTCGCCGTAGCCTTCCTAACCCTAGTTGAACGTAAAGTACTGGGCTACATACAACTCCGT +AAAGGACCAAACGTCGTAGGACCATACGGCCTACTTCAGCCTATTGCAGACGCCATGAAACTCTTCACTA +AAGAACCCCTCCGACCCCTCACATCCTCCACATTCATATTCATCACAGCACCCATCCTAGCTCTTACACT +AGCCCTAACCATATGAATCCCACTGCCCATACCATACCCACTCATTAACATAAACTTAGGAGTGCTATTT +ATACTAGCTATGTCCAGCCTAGCTGTTTACTCCATTCTATGATCAGGATGGGCTTCAAACTCAAAATATG +CCCTAATCGGAGCCCTACGAGCCGTAGCCCAAACAATCTCATATGAAGTCACATTAGCTATCATTCTCTT +ATCAGTACTACTAATAAATGGATCCTTCACATTAGCTGCACTAATTACCACCCAAGAATACATCTGGCTC +ATCATCCCTGCATGACCCCTAGCCATAATATGATTCATCTCCACACTAGCAGAAACCAACCGAGCTCCAT +TTGATCTAACAGAAGGAGAATCAGAACTCGTTTCCGGATTCAACGTAGAATACGCAGCAGGCCCCTTTGC +CCTATTTTTTCTAGCAGAATACGCTAATATTATCATAATAAACATCCTCACAACAATCTTATTTTTCGGA +GCATTCCATAATCCCTATATACCAGAACTATATACTATCAACTTCACTGTAAAAACCCTAATTCTAACAA +CCACCTTCCTATGGATCCGAGCATCTTATCCACGATTCCGATATGACCAATTAATGCACCTCCTATGAAA +AAACTTCCTACCCCTTACTCTAGCCCTATGCATATGACACGTCTCCCTACCCATCATTACAGCAAGCATT +CCACCCCAAACATAAGAAATATGTCTGACAAAAGAATTACTTTGATAGAGTAAAACATAGAGGTTTAAGC +CCTCTTATTTCTAGAATTATAGGAATCGAACCTAATCCTAAGAATCCAAAAATCTTCGTGCTACCAATAT +TACACCACATTCTAAGTAAGGTCAGCTAAATAAGCTATCGGGCCCATACCCCGAAAATGTTGGTTTACAC +CCTTCCCATACTAATCAAACCCCCTATCCTCACCATCATTATACTAACCGTTATCTCAGGAACTATAATC +GTAATAACAACTTCTCACTGACTTATAGTCTGAATTGGCTTCGAAATAAACCTATTAGCTATTATTCCCA +TCCTCATGAAAAAATATAACCCACGAGCCATAGAAGCAGCCACAAAATACTTCCTGACACAAGCAACCGC +TTCAATACTCCTAATAATAGGAATTATCATCAACCTGCTGCACTCAGGACAATGAACCGTATCAAAAGAC +CTCAACCCCATGGCATCCATTATAATAACAACCGCCTTAGCAGTAAAACTAGGACTAGCCCCATTCCACT +TCTGAGTGCCCGAAGTTACACAAGGAATCTCCTTGTCTTCAGGCCTGATCCTACTCACATGACAAAAAAT +CGCACCACTATCAATTCTTTACCAAATTTCACCCACCATTAACCCCAACCTACTCCTAGCAATAGCCATT +ATATCAGTTATAATCGGAGGCTGAGGGGGACTTAACCAAACCCAGCTACGAAAAATCATAGCATACTCCT +CAATCGCCCATATAGGTTGAATAACAGCCATCATAATATATAGCCCCACAATAATAATTTTAAACCTGAC 
+TATCTATATCATTATAACACTAACCACTTTCATGTTACTCATATACAACTCCACCACAACAACATTATCC +TTATCACAAACATGAAACAAAACGCCCCTGATCACCTCACTTATCCTACTGCTAATAATGTCTCTGGGCG +GCCTCCCCCCACTCTCTGGCTTCATCCCAAAATGAATAATCATTCAAAAACTAACCAAAAATGAAATAAT +CATAATACCCACACTACTAGCCATAACAGCACTACTTAACCTGTACTTCTACATACGACTAACATATACC +ACTGCACTAACTATATTCCCCTCAAACAACTGTATAAAAATAAAATGACGGTTCAAATGCACAAAAAAAA +TAATCTTTTTACCCCCCTTAATCGTAATGTCCACCATGCTACTCCCACTCACACCAATACTATCCGTCCT +AGATTAGAAAGTTTAGGTTAAACTAGACCAAGAGCCTTCAAAGCTCTAAGTAAGCCCTATAGAATTAACT +TCTGCATACCTATTAACTCTAAGGACTGGAAGAATCTATCTTACATCAATTGACTGCAAATCAAACACTT +TAATTAAGCTAAGCCCTTACTAGATTGGTGGGCCCTAACCCCACGAAATTTTAGTTAACAGCTAAATACC +CTAATCAACTGGCTTCAATCTACTTCTCCCGCCGCCTGGAAAAAAAAGGCGGGAGAAGCCCCGGCAGCGT +CAAGCTGCTTCTTTGAATTTGCAATTCAATATGACATTCACTACAGGACTTGGTAAAAAGAGGGTTAGAA +CCTCCTGTCTTTAGATTTACAGTCTAATGCTTACTCAGCCATTTTACCTATGTTCATAAACCGCTGACTA +TTTTCAACCAATCACAAGGATATTGGAACTCTTTACCTTTTATTTGGCGCCTGGGCTGGTATAGTGGGGA +CTGCCCTCAGTCTCCTAATTCGAGCCGAACTGGGTCAACCTGGCACACTACTAGGAGATGACCAAATTTA +TAATGTAGTAGTTACTGCCCATGCCTTTGTGATAATCTTTTTTATAGTAATGCCTATTATAATTGGAGGA +TTCGGAAACTGGCTAGTTCCGTTAATAATCGGAGCCCCCGATATGGCATTCCCTCGAATGAATAACATAA +GCTTCTGACTCCTTCCCCCATCCTTCCTACTTCTGCTCGCATCGTCTATGGTAGAAGCTGGGGCAGGAAC +TGGGTGGACAGTATACCCACCCCTAGCTGGCAACCTAGCCCATGCAGGAGCATCCGTGGATCTAACTATT +TTTTCACTACACCTAGCAGGCGTCTCCTCAATCTTAGGTGCTATTAATTTTATTACTACTATTATTAATA +TAAAACCGCCCGCTATGTCCCAATACCAAACACCCCTGTTTGTTTGATCGGTTCTAATTACTGCTGTGTT +GCTACTTCTATCACTGCCAGTTTTAGCAGCAGGCATCACCATGCTACTGACAGATCGAAATCTAAATACC +ACATTTTTTGATCCTGCCGGGGGAGGAGACCCCATCTTATATCAACACCTATTCTGATTCTTCGGTCACC +CAGAAGTCTATATCTTAATCCTGCCCGGGTTTGGAATAATTTCACATATTGTCACCTACTACTCAGGCAA +AAAAGAACCTTTTGGCTACATGGGGATAGTCTGAGCCATAATGTCAATTGGCTTTCTGGGCTTTATCGTA +TGGGCCCATCACATGTTTACTGTAGGGATAGATGTGGATACACGAGCATACTTTACGTCAGCTACTATAA +TTATCGCTATTCCTACTGGGGTAAAAGTATTTAGCTGATTGGCCACTCTTCACGGGGGTAATATTAAATG +GTCTCCCGCTATACTATGGGCTTTGGGATTCATTTTCCTATTCACCGTAGGGGGCTTAACAGGAATTGTA 
+CTAGCAAACTCCTCATTGGATATTGTCCTTCACGACACATACTACGTAGTAGCCCACTTCCACTACGTCT +TGTCAATAGGAGCAGTATTTGCTATTATAGGGGGCTTCGTTCACTGATTCCCCTTATTCTCAGGGTATAC +TCTTGATAATACTTGGGCAAAAGTTCATTTTACGATCATGTTCGTAGGTGTCAATATAACGTTTTTCCCT +CAGCATTTCCTAGGCCTGTCTGGGATGCCTCGACGTTATTCTGACTATCCAGACGCGTATACAACTTGAA +ACACAATCTCCTCAATAGGCTCTTTTATTTCACTAACAGCAGTAATATTAATAGTCTTTATAATGTGAGA +AGCTTTCGCATCAAAGCGAGAAGTAGCCACAGTGGAACTAACCACAACTAATCTCGAATGACTTCACGGA +TGTCCTCCTCCGTATCACACATTTGAAGAGCCAGCCTACGTGCTGTTAAAATAAGAAAGGAAGGAATCGA +ACCTCCTTAGACTGGTTTCAAGCCAATACCATAACCACTATGTCTTTCTCAATTAAGAAGTATTAGTAAA +ATAATTACATAACTTTGTCAAAGTTAAATTATAGGTTTAAGCCCTATGTGCTTCCATGGCATACCCCTTC +CAACTAGGTTTTCAAGATGCTACATCCCCCATTATAGAAGAGCTTTTACACTTCCATGATCATACATTAA +TAATTGTATTCCTAATTAGCTCCCTAGTCCTCTACATTATCTCATTAATACTGACAACTAAACTTACGCA +TACAAGCACAATAGATGCCCAAGAAGTAGAAACTATCTGAACCATTTTACCAGCCATCATCTTAATTCTC +ATTGCCCTGCCTTCCTTACGAATTCTCTATATAATAGATGAGATTAATAATCCCTCCCTCACTGTAAAGA +CTATAGGACATCAGTGATACTGAAGTTATGAGTACACCGACTATGAGGACCTAAGCTTCGACTCCTACAT +AATCCCCACTCAAGAGTTAAAGCCCGGAGAGCTCCGACTACTAGAAGTTGATAACCGAGTAGTGTTGCCA +ATAGAAGTGACTATTCGCATGTTAGTCTCATCAGAGGACGTACTGCACTCGTGAGCCATCCCATCCCTGG +GCCTAAAAACTGACGCTATCCCAGGCCGACTAAACCAAACAACCCTAATAGGCACACGGCCTGGGCTATA +TTATGGTCAGTGCTCAGAAATCTGCGGCTCAAATCACAGTTTTATGCCCATTGTCCTTGAACTAGTCCCG +CTGTCATATTTCGAAAAATGATCTGCATCTATGCTGTAATTTCACTAAGAAGCTAAATTAGCGTTAACCT +TTTAAGTTAAAAACTGGGAGTTCAAACCTCCCCTTAGTGACATGCCACAGTTAGACACATCAACCTGATT +TATTACTATTATTTCAATAATCATGACACTGTTCGTTATATTTCAACTAAAAATCTCAAAACATCTGTAC +CCATCAAGCCCAGAACCCAAATCTACAGCTGCATTAAAACAGCCGAGTCCCTGAGAAAAAAAATGAACGA +AAATCTATTCACCTCTTTTACTACCCCAACAATAATAGGACTGCCTGTTGTTGTGTTAATCGTTATGTTC +CCCAGCATTCTATTTCCCTCGCCTAACCGACTAATTAATAACCGCCTAGTCTCACTCCAACAATGATTAG +TACAACTTACATTAAAGCAAATACTGATTACCCACAATTACAAAGGACAAACCTGGGCCCTAATACTTAT +GTCTCTCATTTTATTTATTGGGTCAACAAATCTGCTAGGTCTACTACCTCACTCATTTACTCCAACTACC +CAATTATCAATAAACCTAGGCATAGCCATCCCCTTGTGAGCCGGCACCGTAATCACTGGATTCCGTCACA 
+AAACTAAAGCATCCTTGGCCCACTTTCTACCACAAGGAACACCAGTCCCCTTAATCCCTATGCTCGTAAT +TATCGAAACTATCAGCCTTTTTATCCAGCCCGTAGCCCTAGCCGTACGACTCACAGCTAATATTACTGCA +GGCCATTTATTAATACACCTAATCGGAGGAGCTGCTTTAGCCCTAACAAATATTAGTGCCCCTACTGCTT +TAATTACCTTTATCATCCTCATCCTACTGACAATTCTTGAATTCGCTGTAGCTCTAATCCAAGCCTATGT +TTTTACCCTACTTGTGAGCCTGTATTTACATGATAATACTTAATGACCCACCAAACCCACGCATATCACA +TGGTTAATCCCAGCCCATGGCCACTTACAGGGGCCCTTTCGGCCCTACTAATAACCTCAGGCCTGGCTAT +ATGATTTCACTATAACTCAATACTACTATTAACTCTAGGAATAACCACTAACCTATTGACTATATACCAA +TGGTGACGAGACATCATTCGGGAGAGCACATTCCAAGGCCACCACACACCCATTGTTCAAAAAGGCCTCC +GATACGGAATAATCCTTTTCATCATCTCAGAAGTATTCTTCTTCGCAGGTTTTTTCTGGGCCTTCTATCA +CTCAAGCCTGGCCCCGACCCCCGAATTGGGAGGATGCTGGCCACCAACAGGTATTATTCCCCTAAACCCC +CTAGAAGTCCCACTACTCAATACTTCTGTGCTCTTAGCTTCCGGAGTGTCAATCACCTGAGCCCATCATA +GCCTAATAGAAGGAAATCGAAAACACATGCTCCAAGCACTATTTATTACAATCTCCCTAGGAGTCTATTT +TACCCTCCTCCAAGCCTCTGAGTACTATGAAACATCATTTACAATCTCGGACGGAGTTTATGGGTCCACC +TTTTTCATAGCCACAGGGTTCCACGGACTACACGTAATTATTGGCTCTACCTTCCTAATCGTATGTTTCT +TGCGCCAACTAAAATACCACTTCACATCGAGCCACCATTTTGGATTCGAAGCCGCTGCTTGATATTGACA +TTTCGTAGACGTGGTTTGACTGTTCTTATACGTTTCCATTTATTGATGAGGATCCTATTCCCTTAGTATC +AACAAGTACAGTTGACTTCCAATCAACCAGTTTCGGTATAATCCGAAAGGGAATAATAAACATAATACTC +GCTCTACTCACCAACACACTTCTATCCACACTACTTGTACTCATCGCGTTCTGACTACCCCAACTAAACA +CCTATGCAGAAAAAGCAAGTCCTTATGAATGTGGATTTGACCCCATAGGATCCGCTCGCCTGCCCTTCTC +CATAAAATTCTTCCTAGTAGCTATCACATTCTTGCTATTCGACCTAGAAATTGCACTACTGCTCCCTCTT +CCCTGAGCCTCACAAACAAACAAACTGTCAACCATGCTTATCACAGCCCTTCTACTAATCTCCCTATTAG +CCGCAAGCCTAGCCTACGAGTGAACCCAAAAAGGATTAGAATGAACTGAATATGATAATTAGTTTAAACT +AAAACAAATGATTTCGACTCATTAGATTGTAGCTTACCCTATAATTATCAAATGTCCATAGTCTATGTTA +ACATCTTCCTGGCTTTCATCGTATCACTCATAGGACTATTAATGTACCGATCCCACTTAATATCCTCCCT +TCTATGCCTAGAAGGCATAATACTATCCCTATTTATTATGATAACCATGGCAGTTCTAAACAATCACTTT +ACACTAGCTAGCATGACCCCCATTATCCTGCTAGTATTTGCAGCCTGCGAGGCAGCACTGGGCTTGTCCC +TACTAGTAATGGTATCAAATACATATGGTACCGACTATGTACAAAACCTAAACCTCTTGCAATGCTAAAA 
+ATTATTATCCCCACTGCCATACTCATACCAATAACATGATTATCAAAACCCAGCATAATCTGAATTAACT +CAACCACCTATAGTTTTCTGATCAGCCTTGTTAGCCTGTCCTACTTAAATCAACTAGGCGACAACAGCCT +AAATCTCTCATTACTATTTTTCTCAGACTCACTCTCTGCACCCCTACTAGTATTAACAACATGACTCTTA +CCACTAATGCTCATGGCTAGTCAATCACACCTGTCAAAAGAGACCCTAGCCCGAAAAAAACTATACATTA +CAATACTTATTATCCTACAACTCCTCTTAATTATAACATTCACCGCTACAGAACTGATTATATTCTACAT +TCTATTCGAAGCTACATTAATCCCTACTCTTATTATTATCACTCGATGAGGCAATCAAACAGAGCGACTA +AACGCTGGTCTGTACTTTCTATTCTACACCCTGGTAGGCTCACTACCCCTCCTAGTCGCACTACTATACA +TCCAAAACACAACAGGAACTCTGAACTTCCTAATTATTCAATACTGAGCCAAACCAATTTCAGCCACCTG +ATCTAATATCTTTCTCTGACTAGCATGCATAATAGCATTCATAGTAAAAATACCTCTATATGGGCTCCAC +CTGTGACTACCAAAAGCACATGTCGAAGCTCCCATTGCCGGCTCAATAGTCCTTGCTGCTGTACTGTTGA +AGCTAGGAGGATATGGAATGATACGCATTACAATCCTACTCAACCCCACAACAAACCAAATGGCATACCC +CTTCATAATGCTATCCCTATGGGGAATAATTATAACAAGCTCTATTTGTCTACGCCAGACAGACCTAAAA +TCCCTAATCGCATATTCATCCGTAAGCCATATGGCCCTAGTAATCGTGGCCGTACTAATTCAAACACCTT +GGAGTTACATAGGAGCCACAGCTCTTATAATCGCCCACGGACTAACTTCCTCAGTGCTATTTTGCCTTGC +AAACTCAAACTACGAACGAATCCATAGCCGAACAATAATTCTCGCACGAGGCCTACAAACCATCCTCCCC +CTAATAGCTGCTTGATGACTACTGGCCAGCCTCGCAAACCTGGCCCTACCTCCTACTATTAACCTAATTG +CAGAGCTATTTGTAGTAGTGGCCTCCTTTTCATGATCTAACATAACCATTACTCTCATGGGCACAAATAT +CATCATCACAGCCCTATATACCCTCTACATACTCATTACAACCCAACGAGGCAAATATACACACCACATT +AAAAACATCAATCCATCATTCACACGAGAAAATGCCCTAATAACACTTCATCTGCTCCCACTTTTTCTCT +TATCTCTCAACCCCAAAATCGTACTAGGTCCTATTTACTGTAAATATAGTTTAATAAAAACATTAGATTG +TGAATCTAATAATAGAAGTGCAAATCTTCCTATTTAAACGAAAAAGTATGCAAGAGCTGCTAACTCATGC +CCCCACGTATAAAAACGTGGCTTTTTCAACTTTTATAGGATAGAAGTAATCCATTGGTCTTAGGAGCCAA +AAAATTGGTGCAACTCCAAATAAAAGTAATAAACCTACTTACCTCCTCTATACTCACTGCGATATTTATC +CTACTCCTACCTATCATTACATCCAACACTCAATTATATAAAAGTAACCTATACCCTCACTATGTAAAAA +CCACAATCTCTTACGCCTTTACCATTAGTATAATCCCAGCCATAATATTCATTTCCTCCGGACAAGAGAT +AACCATCTCAAACTGATGTTGACTATCAATTCAAACCCTTAAATTATCACTAAGCTTCAAACTAGATTAT +TTCTCGATCATCTTCATCCCAGTAGCACTTTTCGTTACATGGTCGATCATAGAATTCTCAATGTGATACA 
+TACACACAGATCCCTATATTAACCAGTTCTTTAAGTACCTCCTTATATTCCTAATCACTATAATGATCTT +AGTGACCGCCAATAATCTATTTCAGCTGTTTATTGGATGGGAGGGAGTAGGAATTATATCTTTCCTACTT +ATCGGATGATGATATGGTCGAGCAGACGCAAACACTGCCGCCCTGCAAGCAATTCTCTACAACCGTATTG +GTGATGTAGGATTTATCATGGCCATAGCATGATTCCTTACCAACCTAAATGCATGAAACCTCCAACAAAT +CTTTATCACTCAACATGAAAGCCTGAATATGCCATTACTAGGACTCCTCCTAGCCGCCACAGGCAAGTCC +GCCCAATTTGGCCTACACCCATGATTGCCATCAGCCATAGAAGGTCCAACTCCCGTCTCCGCCCTACTCC +ACTCAAGCACAATAGTTGTAGCCGGAGTCTTCTTATTAATCCGCTTCCACCCACTCATAGAACAAAATAA +AGCCATACAAACCCTCACTCTATGCCTGGGGGCCATCACAACCCTATTCACAGCCATCTGTGCCCTCACA +CAAAATGATATTAAAAAAATTGTTGCTTTCTCAACTTCAAGCCAATTAGGCCTGATAATCGTTACTATCG +GAATTAACCAACCCTACCTTGCATTCCTGCATATCTGCACACACGCATTTTTTAAAGCCATATTATTCAT +GTGCTCCGGATCAATTATCCACAGTCTAAACGACGAGCAAGATATTCGAAAAATAGGCGGACTATATAAA +CCAATACCCTTTACTACCACCTCCCTTATTATCGGAAGCCTCGCATTAACAGGCATGCCATTCCTAACAG +GCTTTTACTCCAAAGACCTAATCATCGAGACAGCCAATACGTCGTATACCAACGCCTGAGCCCTATTGGT +CACTCTCATTGCTACATCCCTCACAGCCGCCTATAGTACTCGAATCATATTCTTTGCACTCCTGGGGCAA +CCCCGATTCAACTCCCTAAGCCCAATCAATGAAAACAACCCCCACCTCATCAACTCCATTAAACGTCTCT +TAATTGGAAGCATTTTTGCAGGATACTTGATCTCCCATAACATCCCCCCAACGACCATCCCACAAATGAC +CATACCCTGCCACCTAAAACTAACTGCTCTCGCCATGACCATCATAGGCTTTATCCTGGCATTAGAGCTT +AACCTCGTGGCTAAAAACTTAAAATTTAAATACCCCTCAAATCTTTTTAAGTTTTCTAACCTCCTCGGGT +ACTTTCCAATCGTAATTCACCGCCTCCCATCGATAATAAGCCTAACCATAAGCCAAAAATCCGCATCGAT +ACTATTAGATATAATCTGGCTAGAAAATGTAATACCAAAATCCATCTCCCACTTCCAAATAAAAATATCA +ACCGCCGTATCTAATCAGAAGGGACTAGTTAAGCTCTACTTCCTATCCTTCATAATCACCCTGACCCTTA +GCCTACTCTTACTTAGTTTCCACGAGTAACCTCTATAATCACCAATACACCAATAAGCAAAGACCAACCA +GTGACAACCACTAGCCAGGTTCCATAACTATACAGTGCTGCAATTCCTATGGCCTCCTCACTAAAAAACC +CCGAGTCACCCGTATCATAGATCACTCAATCACCCGCACCATTAAACTTAAACACAACCTCAACCTCATC +TTCCTTTAAAATATAGCAAGCAGTCAACAACTCCGCTAATACCCCCGTAATAAACGCACCTAATACGGCT +TTATTAGATGTCCACGCCTCGGGGTAGGGCTCAGTAGCCATAGCTGTAGTGTACCCAAACACCACAAGCA +TGCCCCCCAAATAAATTAAAAAAACTATTAAACCTAAAAATGACCCCCCAAAATTCAATACAATACCGCA 
+ACCAACACCACCAGCCACAATCAATCCAAGCCCACCATAAATAGGAGAAGGCTTTGAAGAAAAACTCACA +AAGCTCACCACGAAAATTGTACTTAAAATAAATACAATGTATGTTATCATAATTCTCACATGGATTCTAA +CCACGACCAATGATATGAAAAACCATCGTTGTATTTCAACTATAAGAACTTAATGACCAACATTCGAAAA +TCACACCCCCTTATCAAAATTATTAATCACTCATTTATTGACCTACCCGCCCCATCCAATATTTCAGCAT +GATGAAACTTTGGCTCCTTACTAGGGGTGTGCTTAATCTTACAAATCCTCACTGGCCTCTTTCTAGCCAT +ACACTACACATCAGACACAATAACCGCTTTCTCATCAGTTACCCACATTTGCCGCGACGTAAACTACGGC +TGGATTATCCGATATCTACATGCCAACGGAGCCTCCATATTCTTTATCTGTCTATACATGCACGTAGGAC +GAGGAATATACTACGGCTCCTACACCTTCTCAAAAACATGAAATATCGGGATTGTGCTATTGTTTACGGT +CATGGCTACAGCCTTCATAGGATATGTCTTACCATGAGGACAAATATCATTCTGAGGGGCAACCGTAATC +ACCAACCTCCTGTCAGCAATCCCATATATTGGGACCGACCTAGTAGAGTGAATCTGAGGGGGTTTCTCAG +TAGACAAAGCTACCCTGACACGATTCTTTGCCTTCCACTTCATCCTTCCGTTTATCGTCTCAGCCCTAGC +AGCAGTCCACCTCCTATTCCTTCACGAAACAGGATCCAATAACCCCTCAGGAATGGTGTCCGACTCAGAC +AAAATCCCATTCCACCCATACTACACAATTAAAGATATCTTAGGCCTCTTAGTACTAATCCTAACCCTCA +CACTACTCGTCCTATTCTCACCAGACCTATTAGGAGACCCTGATAACTACATCCCCGCCAACCCCCTAAA +TACCCCTCCCCATATTAAGCCCGAATGGTATTTCCTATTCGCATACGCAATCCTCCGATCTATTCCCAAT +AAACTAGGAGGAGTTCTAGCCCTAGTCTTATCCATCTTAATCTTAGCCACTATCCCTGCCCTCCACACAT +CCAAACAACGAGGAATAATGTTTCGACCGCTAAGCCAATGCTTATTCTGACTCTTAGTGGCAGACCTTCT +AACCCTAACATGAATTGGTGGCCAACCTGTAGAACACCCCTTTATTGCCATCGGCCAACTAGCCTCTATC +CTATACTTCTTCATCCTCCTAGTCTTAATCCCCATCTCAGGCATTATTGAAAACCGCCTCCTTAAATGAA +GAGTCTTCGTAGTATATAAATTACTTTGGTCTTGTAAACCAAAAAAGGAGAATATGTACTCTCCCTAAGA +CTTCAAGGAAGAAGCAATAGCCCCACCATCAGCACCCAAAGCTGAAATTCTTTCTTAAACTATTCCTTGC +CAATACCAAAAAACAACCCCATGACTTTCATAATTCATATATTGCATATACCCGTACTGTGCTTGCCCAG +TATGTCCTCATCCCCACAAAAAATAAGTGAAAAAATCCTCAATCCCCGTTAATACAGAACACACAACACG +AAATAACCTGTTAACTACCGGACCCCCCCCCTCCCCCCGTTAACACATTACGTAGGGCATACTATGTATA +TCGGGCATTAATCGCCTGTCCCCATGAATATTAAGCATGTACAGTAGTTTATATATTTTACATAAGGCAT +ACTATGTATATCGTGCATTAATCCCTTGTCCCCATGAATATTAAGCATGTACAGTAGTTCATATATATTA +CATAAAACATAATAGTGCTTAATCGTGCATATTCATGATTTAAAACAGTTCTTTCATGGATCTCAACTAT 
+CCGAAAAAGCTTAATCACCTGGCCTCGAAAAACCAACAACCCTTGCTCGAGCGTGTACCTCTTCTCGCTC +CGGGCCCATTTCAACGTGGGGGTGTCTATAGTGAAACTATACCTGGCATCTGGTTCTTACTTCAGGGTCA +TGACATTCTTAAATCCAATCCTTCAACTTTCTCAAATAGGACATCTCGAT +
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tools/clc_assembly_cell/README.rst Fri Nov 21 06:41:12 2014 -0500 @@ -0,0 +1,126 @@ +Galaxy wrapper for the CLC Assembly Cell suite from CLCbio +========================================================== + +This wrapper is copyright 2013 by Peter Cock, The James Hutton Institute +(formerly SCRI, Scottish Crop Research Institute), UK. All rights reserved. +See the licence text below. + +CLC Assembly Cell is the commercial command line assembly suite from CLCbio. +It uses SIMD instructions to parallelize and accelerate its assembly +algorithms, and is also very memory efficient, making it an appealing choice +for complex genomes where the RAM requirements exclude other popular tools. + +For more information: +http://www.clcbio.com/products/clc-assembly-cell/ + +You can download the CLC Assembly Cell User Manual here, currently v4.2 +http://www.clcbio.com/files/usermanuals/CLC_Assembly_Cell_User_Manual.pdf + +There is also an online manual here: +http://clcsupport.com/clcassemblycell/current/index.php?manual=Introduction.html + +There is currently a free trial download here: +http://www.clcbio.com/?action=transfer_user&productVersion=4.2&productID=6982&productName=CLC+Assembly+Cell&nonce=db842e3f95 + +This wrapper is available from the Galaxy Tool Shed at: +http://toolshed.g2.bx.psu.edu/view/peterjc/clc_assembly_cell + +This Galaxy wrapper was written and tested using CLC Assembly Cell +version 4.10.86742 + + +Automated Installation +====================== + +This should be straightforward: Galaxy should automatically download and +install the wrapper from the Galaxy Tool Shed. However, you will need to +manually install the CLC Assembly Cell software, and set the environment +variable ``$CLC_ASSEMBLY_CELL`` to the directory containing the binaries +(and in particular, the ``clc_assembler`` binary). 
For example: + +$ export CLC_ASSEMBLY_CELL=/opt/clcbio/clc-assembly-cell-4.1.0-linux_64/ + + +Manual Installation +=================== + +First install the CLC Assembly Cell software as described above. + +To install the wrapper, copy or move the following files under the Galaxy tools +folder, e.g. in a ``tools/clcbio/`` folder: + +* clc_assembler.xml (Galaxy tool definition) +* clc_mapper.xml (Galaxy tool definition) +* README.rst (this file) + +You will also need to modify the ``tool_conf.xml`` file to tell Galaxy to offer +the tools. Just add these lines, for example next to other assembly tools:: + + <tool file="clc_assembly_cell/clc_assembler.xml" /> + <tool file="clc_assembly_cell/clc_mapper.xml" /> + +If you wish to run the unit tests, also move/copy the ``test-data/`` files +under Galaxy's ``test-data/`` folder. Then run:: + + $ ./run_tests.sh -id clc_assembler + $ ./run_tests.sh -id clc_mapper + +That's it. + + +History +======= + +======= ====================================================================== +Version Changes +------- ---------------------------------------------------------------------- +v0.0.1 - Initial public release. +v0.0.2 - Actually use the ``$CLC_ASSEMBLY_CELL`` environment variable. + - Enabled and fixed the tests. 
+======= ====================================================================== + + +Developers +========== + +Development is on this GitHub repository: +https://github.com/peterjc/pico_galaxy/tree/master/tools/clc_assembly_cell + +For making the "Galaxy Tool Shed" http://toolshed.g2.bx.psu.edu/ tarball use +the following command from the Galaxy root folder:: + + $ tar -czf clcbio.tar.gz tools/clc_assembly_cell/README.rst tools/clc_assembly_cell/clc_assembler.xml tools/clc_assembly_cell/clc_mapper.xml tools/clc_assembly_cell/tool_dependencies.xml test-data/NC_010642.fna + +Check this worked:: + + $ tar -tzf clcbio.tar.gz + tools/clc_assembly_cell/README.rst + tools/clc_assembly_cell/clc_assembler.xml + tools/clc_assembly_cell/clc_mapper.xml + tools/clc_assembly_cell/tool_dependencies.xml + test-data/NC_010642.fna + + +Licence (MIT) +============= + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in +all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +THE SOFTWARE. + +NOTE: This is the licence for the Galaxy Wrapper only. 
The CLCbio tools are +commercial, and are available and licenced separately.
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tools/clc_assembly_cell/clc_assembler.xml Fri Nov 21 06:41:12 2014 -0500 @@ -0,0 +1,133 @@ +<tool id="clc_assembler" name="CLC assembler" version="0.0.2"> + <description>Assembles reads giving a FASTA file</description> + <requirements> + <requirement type="binary">clc_assembler</requirement> + </requirements> + <version_command>\${CLC_ASSEMBLY_CELL:-/mnt/apps/clcBio/clc-assembly-cell-4.1.0-linux_64/}clc_assembler | grep -i version</version_command> + <command>\${CLC_ASSEMBLY_CELL:-/mnt/apps/clcBio/clc-assembly-cell-4.1.0-linux_64/}clc_assembler +#for $rg in $read_group +##-------------------------------------- +#if str($rg.segments.type) == "paired" +-p $rg.segments.placement $rg.segments.dist_mode $rg.segments.min_size $rg.segments.max_size -q -i "$rg.segments.filename1" "$rg.segments.filename2" +#end if +##-------------------------------------- +#if str($rg.segments.type) == "interleaved" +-p $rg.segments.placement $rg.segments.dist_mode $rg.segments.min_size $rg.segments.max_size -q "$rg.segments.filename" +#end if +##-------------------------------------- +#if str($rg.segments.type) == "none" +-p no -q +#for $f in $rg.segments.filenames +"$f" +#end for +#end if +##-------------------------------------- +#end for +-m $min_contig_len +-o "$out_fasta" +--cpus \${GALAXY_SLOTS:-4} +-v | grep -v "^Progress: "</command> + <stdio> + <!-- Assume anything other than zero is an error --> + <exit_code range="1:" /> + <exit_code range=":-1" /> + </stdio> + <inputs> + <repeat name="read_group" title="Read Group" min="1"> + <conditional name="segments"> + <param name="type" type="select" label="Are these paired reads?"> + <option value="paired">Paired reads (as two files)</option> + <option value="interleaved">Paired reads (as one interleaved file)</option> + <option value="none">Unpaired reads (single or orphan reads)</option> + </param> + <when value="paired"> + <param name="placement" type="select" label="Pairing type 
(segment placing)"> + <option value="fb">---> <--- (e.g. Sanger capillary or Solexa/Illumina paired-end library)</option> + <option value="bf"><--- ---> (e.g. Solexa/Illumina mate-pair library)</option> + <option value="ff">---> ---></option> + <option value="bb"><--- <---</option> + </param> + <param name="dist_mode" type="select" label="How is the fragment distance measured?"> + <option value="ss">Start to start (e.g. Sanger capillary or Solexa/Illumina libraries)</option> + <option value="se">Start to end</option> + <option value="es">End to start</option> + <option value="ee">End to end</option> + </param> + <!-- TODO - min/max validation done via the <code> tag? --> + <param name="min_size" type="integer" optional="false" min="0" value="" + label="Minimum size of 'good' DNA templates in the library preparation" /> + <param name="max_size" type="integer" optional="false" min="0" value="" + label="Maximum size of 'good' DNA templates in the library preparation" /> + <param name="filename1" type="data" format="fastq,fasta" required="true" label="Read file one"/> + <param name="filename2" type="data" format="fastq,fasta" required="true" label="Read file two"/> + </when> + <when value="interleaved"> + <param name="placement" type="select" label="Pairing type (segment placing)"> + <option value="fb">---> <--- (e.g. Sanger capillary or Solexa/Illumina paired-end library)</option> + <option value="bf"><--- ---> (e.g. Solexa/Illumina mate-pair library)</option> + <option value="ff">---> ---></option> + <option value="bb"><-- <--</option> + </param> + <param name="dist_mode" type="select" label="How is the fragment distance measured?"> + <option value="ss">Start to start (e.g. Sanger capillary or Solexa/Illumina libraries)</option> + <option value="se">Start to end</option> + <option value="es">End to start</option> + <option value="ee">End to end</option> + </param> + <!-- TODO - min/max validation done via the <code> tag? 
--> + <param name="min_size" type="integer" optional="false" min="0" value="" + label="Minimum size of 'good' DNA templates in the library preparation" /> + <param name="max_size" type="integer" optional="false" min="0" value="" + label="Maximum size of 'good' DNA templates in the library preparation" /> + <param name="filename" type="data" format="fastq,fasta" required="true" label="Interleaved read file"/> + </when> + <when value="none"> + <param name="filenames" type="data" format="fastq,fasta" multiple="true" required="true" label="Read file(s)" + help="Multiple files allowed, for example several files of orphan reads." /> + </when> + </conditional> + </repeat> + <param name="min_contig_len" type="integer" optional="false" min="1" value="200" label="Minimum contig length"/> + <!-- Word size? --> + <!-- Bubble size? --> + <!-- Scaffolding options? --> + <!-- AGP / GFF output? --> + </inputs> + <!-- min/max validation? <code file="clc_validator.py" /> --> + <outputs> + <data name="out_fasta" format="fasta" label="CLCbio assembler contigs (FASTA)" /> + </outputs> + <tests> + <test> + <param name="read_group_0|segments|type" value="interleaved" /> + <param name="read_group_0|segments|placement" value="fb" /> + <param name="read_group_0|segments|dist_mode" value="ss" /> + <param name="read_group_0|segments|min_size" value="1" /> + <param name="read_group_0|segments|max_size" value="1000" /> + <param name="read_group_0|segments|dist_mode" value="ss" /> + <param name="read_group_0|segments|filename" value="SRR639755_mito_pairs.fastq.gz" ftype="fastqsanger" /> + <param name="min_contig_len" value="200" /> + <output name="out_fasta" file="SRR639755_mito_pairs.clc4_de_novo.fasta" ftype="fasta" /> + </test> + </tests> + <help> + +**What it does** + +Runs the ``clc_assembler`` tool giving a FASTA output file. You would then +typically map the same set of reads onto this assembly using ``clc_mapper`` +to perform any downstream analysis using the mapped reads. 
+ + +**Citation** + +If you use this Galaxy tool in work leading to a scientific publication please +cite this wrapper as: + +Peter J.A. Cock (2013), Galaxy wrapper for the CLC Assembly Cell suite from CLCbio +http://toolshed.g2.bx.psu.edu/view/peterjc/clc_assembly_cell + +This wrapper is available to install into other Galaxy Instances via the Galaxy +Tool Shed at http://toolshed.g2.bx.psu.edu/view/peterjc/clc_assembly_cell + </help> +</tool>
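A minimal shell sketch of how the ``\${CLC_ASSEMBLY_CELL:-...}`` prefix in the ``<version_command>`` and ``<command>`` above resolves, assuming Galaxy's Cheetah templating has already turned ``\$`` into a plain ``$``. The helper name ``resolve_clc_binary`` is hypothetical and exists only for this illustration; the fallback path is the one written in the wrapper. Because the value is concatenated directly with the binary name, the variable needs a trailing slash, as in the README's ``export`` example.

```shell
# Hypothetical helper (not part of the wrapper): prints the path the wrapper's
# command line would use for a given CLC binary name.
resolve_clc_binary() {
    # ${VAR:-default} expands to $VAR if it is set and non-empty,
    # otherwise to the default written after ":-".
    printf '%s%s\n' \
        "${CLC_ASSEMBLY_CELL:-/mnt/apps/clcBio/clc-assembly-cell-4.1.0-linux_64/}" \
        "$1"
}

# Variable unset: the hard-coded fallback directory is used.
unset CLC_ASSEMBLY_CELL
resolve_clc_binary clc_assembler
# prints /mnt/apps/clcBio/clc-assembly-cell-4.1.0-linux_64/clc_assembler

# Variable set with a trailing slash, as in the README example.
CLC_ASSEMBLY_CELL=/opt/clcbio/clc-assembly-cell-4.1.0-linux_64/
resolve_clc_binary clc_assembler
# prints /opt/clcbio/clc-assembly-cell-4.1.0-linux_64/clc_assembler

# Variable set WITHOUT a trailing slash: directory and binary name run together.
CLC_ASSEMBLY_CELL=/opt/clcbio/clc-assembly-cell-4.1.0-linux_64
resolve_clc_binary clc_assembler
# prints /opt/clcbio/clc-assembly-cell-4.1.0-linux_64clc_assembler
```

The same ``:-`` form is what gives ``--cpus \${GALAXY_SLOTS:-4}`` its default of 4 when Galaxy does not set ``GALAXY_SLOTS``.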
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tools/clc_assembly_cell/clc_mapper.xml Fri Nov 21 06:41:12 2014 -0500 @@ -0,0 +1,169 @@ +<tool id="clc_mapper" name="CLC Mapper" version="0.0.2"> + <description>Maps reads giving a SAM/BAM file</description> + <requirements> + <requirement type="binary">clc_mapper</requirement> + <requirement type="binary">clc_cas_to_sam</requirement> + <requirement type="binary">samtools</requirement> + <requirement type="package" version="0.1.19">samtools</requirement> + </requirements> + <version_command>\${CLC_ASSEMBLY_CELL:-/mnt/apps/clcBio/clc-assembly-cell-4.1.0-linux_64/}clc_mapper | grep -i version</version_command> + <command>echo Mapping reads with clc_mapper... +&& \${CLC_ASSEMBLY_CELL:-/mnt/apps/clcBio/clc-assembly-cell-4.1.0-linux_64/}clc_mapper +#for $ref in $references +#if str($ref.ref_type)=="circular" +-d -z "$ref.ref_file" +#else +-d "$ref.ref_file" +#end if +#end for +#for $rg in $read_group +##-------------------------------------- +#if str($rg.segments.type) == "paired" +-p $rg.segments.placement $rg.segments.dist_mode $rg.segments.min_size $rg.segments.max_size -q -i "$rg.segments.filename1" "$rg.segments.filename2" +#end if +##-------------------------------------- +#if str($rg.segments.type) == "interleaved" +-p $rg.segments.placement $rg.segments.dist_mode $rg.segments.min_size $rg.segments.max_size -q "$rg.segments.filename" +#end if +##-------------------------------------- +#if str($rg.segments.type) == "none" +-p no -q +#for $f in $rg.segments.filenames +"$f" +#end for +#end if +##-------------------------------------- +#end for +-o "temp_job.cas" +--cpus \${GALAXY_SLOTS:-4} +## TODO - filtering out the progress lines seems to mess up the multiple commands +## | grep -v "^Progress: " +##=========================================== +## TODO - I've required all the input in Sanger FASTQ format (or FASTA) so can +## use the offset 33, rather then the CLCbio default of 64 which is only for +## obsolete 
Illumina FASTQ files. Really need this option per input file... +&& echo Converting CAS file to BAM with clc_cas_to_sam... +&& \${CLC_ASSEMBLY_CELL:-/mnt/apps/clcBio/clc-assembly-cell-4.1.0-linux_64/}clc_cas_to_sam --cas "temp_job.cas" -o "temp_job.bam" --no-progress --qualityoffset 33 +&& rm "temp_job.cas" +##=========================================== +&& echo Sorting BAM file with samtools... +&& samtools sort "temp_job.bam" "temp_sorted" +&& mv "temp_sorted.bam" "$out_bam" +&& echo Indexing BAM file with samtools... +&& samtools index "$out_bam"</command> + <stdio> + <!-- Assume anything other than zero is an error --> + <exit_code range="1:" /> + <exit_code range=":-1" /> + </stdio> + <!-- Job splitting with merge via clc_join_mappings? --> + <inputs> + <!-- Support linear and circular references (-z) --> + <repeat name="references" title="Reference Sequence" min="1"> + <param name="ref_file" type="data" format="fasta" required="true" label="Reference sequence(s) (FASTA)" /> + <param name="ref_type" type="select" label="Reference type"> + <option value="linear">Linear (e.g. most chromosomes)</option> + <option value="circular">Circular (e.g. bacterial chromosomes, mitochondria)</option> + </param> + </repeat> + <repeat name="read_group" title="Read Group" min="1"> + <conditional name="segments"> + <param name="type" type="select" label="Are these paired reads?"> + <option value="paired">Paired reads (as two files)</option> + <option value="interleaved">Paired reads (as one interleaved file)</option> + <option value="none">Unpaired reads (single or orphan reads)</option> + </param> + <when value="paired"> + <param name="placement" type="select" label="Pairing type (segment placing)"> + <option value="fb">---> <--- (e.g. Sanger capillary or Solexa/Illumina paired-end library)</option> + <option value="bf"><--- ---> (e.g. 
Solexa/Illumina mate-pair library)</option>
+                        <option value="ff">---&gt; ---&gt;</option>
+                        <option value="bb">&lt;--- &lt;---</option>
+                    </param>
+                    <param name="dist_mode" type="select" label="How is the fragment distance measured?">
+                        <option value="ss">Start to start (e.g. Sanger capillary or Solexa/Illumina libraries)</option>
+                        <option value="se">Start to end</option>
+                        <option value="es">End to start</option>
+                        <option value="ee">End to end</option>
+                    </param>
+                    <!-- TODO - min/max validation done via the <code> tag? -->
+                    <param name="min_size" type="integer" optional="false" min="0" value=""
+                           label="Minimum size of 'good' DNA templates in the library preparation" />
+                    <param name="max_size" type="integer" optional="false" min="0" value=""
+                           label="Maximum size of 'good' DNA templates in the library preparation" />
+                    <param name="filename1" type="data" format="fastqsanger,fasta" required="true" label="Read file one"
+                           help="FASTA or Sanger FASTQ accepted." />
+                    <param name="filename2" type="data" format="fastqsanger,fasta" required="true" label="Read file two"
+                           help="FASTA or Sanger FASTQ accepted." />
+                </when>
+                <when value="interleaved">
+                    <param name="placement" type="select" label="Pairing type (segment placing)">
+                        <option value="fb">---&gt; &lt;--- (e.g. Sanger capillary or Solexa/Illumina paired-end library)</option>
+                        <option value="bf">&lt;--- ---&gt; (e.g. Solexa/Illumina mate-pair library)</option>
+                        <option value="ff">---&gt; ---&gt;</option>
+                        <option value="bb">&lt;--- &lt;---</option>
+                    </param>
+                    <param name="dist_mode" type="select" label="How is the fragment distance measured?">
+                        <option value="ss">Start to start (e.g. Sanger capillary or Solexa/Illumina libraries)</option>
+                        <option value="se">Start to end</option>
+                        <option value="es">End to start</option>
+                        <option value="ee">End to end</option>
+                    </param>
+                    <!-- TODO - min/max validation done via the <code> tag?
-->
+                    <param name="min_size" type="integer" optional="false" min="0" value=""
+                           label="Minimum size of 'good' DNA templates in the library preparation" />
+                    <param name="max_size" type="integer" optional="false" min="0" value=""
+                           label="Maximum size of 'good' DNA templates in the library preparation" />
+                    <param name="filename" type="data" format="fastqsanger,fasta" required="true" label="Interleaved read file"
+                           help="FASTA or Sanger FASTQ accepted."/>
+                </when>
+                <when value="none">
+                    <param name="filenames" type="data" format="fastqsanger,fasta" multiple="true" required="true" label="Read file(s)"
+                           help="Multiple files allowed, for example several files of orphan reads. FASTA or Sanger FASTQ accepted." />
+                </when>
+            </conditional>
+        </repeat>
+        <!-- Length fraction (-l), default 0.5 -->
+        <!-- Similarity (-s), default 0.8 -->
+        <!-- Option for unmapped reads via clc_unmapped_reads ? -->
+    </inputs>
+    <outputs>
+        <data name="out_bam" format="bam" label="CLCbio mapping (BAM)" />
+    </outputs>
+    <tests>
+        <!-- CLC's SAM header @PG and @RG lines include filenames so will change -->
+        <test>
+            <param name="ref_file" value="NC_010642.fna" ftype="fasta" />
+            <param name="ref_type" value="circular" />
+            <param name="read_group_0|segments|type" value="interleaved" />
+            <param name="read_group_0|segments|placement" value="fb" />
+            <param name="read_group_0|segments|dist_mode" value="ss" />
+            <param name="read_group_0|segments|min_size" value="1" />
+            <param name="read_group_0|segments|max_size" value="1000" />
+            <param name="read_group_0|segments|dist_mode" value="ss" />
+            <param name="read_group_0|segments|filename" value="SRR639755_mito_pairs.fastq.gz" ftype="fastqsanger" />
+            <output name="out_bam" file="SRR639755_mito_pairs_vs_NC_010642_clc.bam" ftype="bam" lines_diff="4"/>
+        </test>
+    </tests>
+    <help>
+
+**What it does**
+
+Runs the CLCbio tool ``clc_mapper`` which produces a proprietary binary
+CAS format file, which is immediately processed using
``clc_cas_to_sam``
+to generate a self-contained standard BAM file, which is then sorted
+and indexed using ``samtools``.
+
+
+**Citation**
+
+If you use this Galaxy tool in work leading to a scientific publication please
+cite this wrapper as:
+
+Peter J.A. Cock (2013), Galaxy wrapper for the CLC Assembly Cell suite from CLCbio
+http://toolshed.g2.bx.psu.edu/view/peterjc/clc_assembly_cell
+
+This wrapper is available to install into other Galaxy Instances via the Galaxy
+Tool Shed at http://toolshed.g2.bx.psu.edu/view/peterjc/clc_assembly_cell
+    </help>
+</tool>
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/clc_assembly_cell/tool_dependencies.xml Fri Nov 21 06:41:12 2014 -0500
@@ -0,0 +1,6 @@
+<?xml version="1.0"?>
+<tool_dependency>
+    <package name="samtools" version="0.1.19">
+        <repository changeset_revision="923adc89c666" name="package_samtools_0_1_19" owner="iuc" toolshed="https://toolshed.g2.bx.psu.edu" />
+    </package>
+</tool_dependency>