# HG changeset patch
# User devteam
# Date 1400517247 14400
# Node ID 0b4e3602679422ba323e05c379593ba58fdfb6df
Imported from capsule None

diff -r 000000000000 -r 0b4e36026794 tabular_to_fasta.py
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/tabular_to_fasta.py	Mon May 19 12:34:07 2014 -0400
@@ -0,0 +1,68 @@
+#!/usr/bin/env python
+"""
+Input: tabular file, title column(s), sequence column
+Output: fasta
+Convert each row of a tab-delimited file into a FASTA record.
+"""
+import sys, os
+
+assert sys.version_info[:2] >= ( 2, 4 )
+
+def stop_err( msg ):
+    sys.stderr.write( msg )
+    sys.exit()
+
+def __main__():
+    infile = sys.argv[1]
+    title_col = sys.argv[2]
+    seq_col = sys.argv[3]
+    outfile = sys.argv[4]
+
+    if title_col == None or title_col == 'None' or seq_col == None or seq_col == 'None':
+        stop_err( "Columns not specified." )
+    try:
+        seq_col = int( seq_col ) - 1
+    except:
+        stop_err( "Invalid Sequence Column: %s." %str( seq_col ) )
+
+    title_col_list = title_col.split( ',' )
+    out = open( outfile, 'w' )
+    skipped_lines = 0
+    first_invalid_line = 0
+    invalid_line = ""
+    i = 0
+
+    for i, line in enumerate( open( infile ) ):
+        error = False
+        line = line.rstrip( '\r\n' )
+        if line and not line.startswith( '#' ):
+            fields = line.split( '\t' )
+            fasta_title = []
+            for j in title_col_list:
+                try:
+                    j = int( j ) - 1
+                    fasta_title.append( fields[j] )
+                except:
+                    skipped_lines += 1
+                    if not invalid_line:
+                        first_invalid_line = i + 1
+                        invalid_line = line
+                    error = True
+                    break
+            if not error:
+                try:
+                    fasta_seq = fields[seq_col]
+                    if fasta_title[0].startswith( ">" ):
+                        fasta_title[0] = fasta_title[0][1:]
+                    print >> out, ">%s\n%s" % ( "_".join( fasta_title ), fasta_seq )
+                except:
+                    skipped_lines += 1
+                    if not invalid_line:
+                        first_invalid_line = i + 1
+                        invalid_line = line
+    out.close()
+
+    if skipped_lines > 0:
+        print 'Data issue: skipped %d blank or invalid lines starting at #%d: "%s"' % ( skipped_lines, first_invalid_line, invalid_line )
+
+if __name__ == "__main__" : __main__()
\ No newline at end of file
diff -r 000000000000 -r 0b4e36026794 tabular_to_fasta.xml
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/tabular_to_fasta.xml	Mon May 19 12:34:07 2014 -0400
@@ -0,0 +1,43 @@
+
+    converts tabular file to FASTA format
+    tabular_to_fasta.py $input $title_col $seq_col $output
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+**What it does**
+
+Converts tab-delimited data into FASTA-formatted sequences.
+
+-----------
+
+**Example**
+
+Suppose this is a sequence file produced by an Illumina (Solexa) sequencer::
+
+    5 300 902 419 GACTCATGATTTCTTACCTATTAGTGGTTGAACATC
+    5 300 880 431 GTGATATGTATGTTGACGGCCATAAGGCTGCTTCTT
+
+Selecting **c3** and **c4** as the **Title column(s)** and **c5** as the **Sequence column** will result in::
+
+    >902_419
+    GACTCATGATTTCTTACCTATTAGTGGTTGAACATC
+    >880_431
+    GTGATATGTATGTTGACGGCCATAAGGCTGCTTCTT
+
+
+
\ No newline at end of file
diff -r 000000000000 -r 0b4e36026794 test-data/1.fasta
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/1.fasta	Mon May 19 12:34:07 2014 -0400
@@ -0,0 +1,2 @@
+>hg17
+gtttgccatcttttgctgctctagggaatccagcagctgtcaccatgtaaacaagcccaggctagaccaGTTACCCTCATCATCTTAGCTGATAGCCAGCCAGCCACCACAGGCAtgagtcaggccatattgctggacccacagaattatgagctaaataaatagtcttgggttaagccactaagttttaggcatagtgtgttatgtaTCTCACAAACATATAAGACTGTGTGTTTGTTGACTGGAGGAAGAGATGCTATAAAGACCACCTTTTAAAACTTCCC-------------------------------AAATACT-GCCACTGATGTCCTG-----ATGGAGGTA-------TGAA-------------------AACATCCACTAAAATTTGTGGTTTATTCATTTTTCATTATTTTGTTTAAGGAGGTCTATAGTGGAAGAGGGAGATATTTGGggaaatt---ttgtatagactagctttcacgatgttagggaattattattgtgtgataatggtcttgcagttaca-cagaaattcttccttattttttgggaa---gcaccaaag----tagggat---aaaatgtcatgatgtgtgcaatacactttaaaatgtttttg-----ccaaaataatt----------------aatgaagc--aaatatggaaa-ataataattattaaatctaggtgatgggtatattgtagttcactatagtattgcacacttttctgtatgtttaaatttttcatttaaaaaaaaactttgagc-----tagacaccaggctatgagctaggagcatagcaatgaccaa----------------------------------------------------------------------------------------------atagactcctaccaa--------------------------------------------------ctc-aaagaatgcacattctCTGGGAAACATGTTTCCATTAGGAAGCCTCGAATGCAATGTGACTGTGGTCTCCAGGACCTG-TGTGATCCTGGCTTTTCCTGTTCCCTCCG---CATCATCACTGCAGGTGTGTTTTCCCAAGTTTTAAACATTTA------CCTTCCCAGTGGCCTTGCGTCTAGAGGAATCCCTGTATAGTGGT-ACATGAATATAACACATAACAAA-AATCATCTCTATGGTGTGTGTTGTTCCTGGGGTTCAattcagcaaattttccctg-ggcacccatgtgttcttggcactggaaaagtaccgggactgaaacagttgatggccca-atccctgtcctct---taaaacctaagggaggagaTGGAAAGGGG-CACCCAACCCAGACTGAGAGACAGGAATTAGCTGCAAGGGGAACTAGGAAAAGCTTCTTTA---AGGATGGAGAGGCCCTAGTGGAAT-GGGGAGATTCTTCCGGGAGAAGCGATGGATGCACAGTTGGGCATCCCCACAGACGGACTGGAAAGAAAAAAGGCCTGGAGGAATCAATGTG-------CAATGTATGTGTGTTCCCTGGTTcaagggctgg-gaactttctcta-aagggccaggtagaaaacattttaggctttctaagccaag--gcaaaat-tgaggatattacatgggtacttatacaacaagaataaacaatt---tacacaattttttgttgacagaattcaaaactttat----agacacagaaatgcaaatttcctgtaattttcccat-gagaactattcttct--tttgttttgttttgcgacAGGGTTGCGCtgatcctcccgcctcagtctccctaagtgctgagatgttgcaggaagtcagggaccccgaacagagagatcggctggagccgtggcagaggaacataaattttgaagatttcattttaatatggacacttatcagttcccaaataatacttttataatttt
ttatgcctgtctttgctttaatctcttaatcctgttatcttcataagctaaggatgtacgtcacctcaggaccactgtgataattgtgttaactgtacagattgattgcaaaacatgtgtgtttgaacaatatgaaatcagtgcaccttgaaaaagagcagaataacagcaatttttagggaacaagggaagacaactataaggtctgactgcctgcggggtcgggcaaagggagccatatttttcttcttgcagagagcctataaatagacctgcaagtaggagagatattgctaatttcttttgctagcatggaatattaatattaacaccctgggaaaggaatgcattcctggggggaggtctataaatggccgctctgggaatgtctatcctacgcaatggagataaggactgagatacgccctggtctcctgcagtaccctcaggcttactagggtggtgaaaaactccgccctggtaaatttgtggtcagaccagttttctgctctcgaacactgttttctgttgtttaagatgtttatcaagacaatacgtgcaccgctgaacacagacccttatcagtagttctcctttttgccctttgaagcatgtgatctactccctgttttacaccccctcaccttttgaaacccttaataaaaaacttgctggttt-gaggctcaggtgggcatcacagtactaccgatatgtgatgtcacccccggcggcccagctgtaaaattcctctctttgtactctctctctttatttctcagccagctgacacttatggaaaatagaaagaacctacgttgaaatattgggggcaggttcccccaataTCTGGTGCCCAACGTGGGAtactgagattacaagcatgagccactgcatctggcctcttcttttgatttttttttttcaaacttttacaaatgtagaaaccattcttagcttttgggcattaccaaacccggcagtgg-caggctcggttcaccaacgtcatttgcagttccccgCTTTATGTTATGGgttttgttttgttttgtttttttt-attgagacagagtttcactcttgttgcccaggctgtagtgcaatggtctgatcttggctcactgcaacctccacttcccaggttcaagccattctcctgcctcagcctctcaagtagctgggattacagacactcaccaccacacctggctaattttgtatttttagtagagatgaggtttcaccatgttggccaggctggtctcgaaatcctgacctcaggtgatccacccaccttggcctcccaaagtgctgggattacaggcttgagctaccacgcctggctGGGTTGGTTCTCAATGGAGTGGTTTGTTTTTGGAGCTGCTCT-GCGCAGtggggaccagaataggcctg-------------------ggttcctagcccattgctattcctt----accagctgtggattctaaggaaagtcatttaacctcgctggaccttag-attcctcatccctgaaGCCCAAGGGTaaaacaaaacaaaacaaaacaaaacaaaccaaCCCATCATGTAAAGCGGGGAACTACAAACGATACAGGTGAAACATGCCTACCACACCACTCACAGGCT--ATGATGACAAAAACGTGGCTACATCTGGGACCACCCCCCAACCCCCACTTTGTACGTAGGAAATACGGAGTTGAGGATGGAGACCCACAGTATGTCCAGAGTGTCCCCAAAGGCCACAGTGCCCGCCTGGAGCCCTCCAGAGAGCGTGCACTCCCTGGGGTGCCAGCCAGAGACAACTTGCCCTGAGGCTTGGAACTCGATTCTCCGCGTGCCAGAGAAGGGGTGGGACTTCAGAACCCCCAACCCCGCAATCTGGGTCGGGGAGCCTGGCGCACTGCGGGCCGCTCCCTCTAACCCTGGGCTTCCCTG------GCGTCCAGGGCCGTCGG-----------GGCCGAGTCCCGATTCGCT
CCCACCCCGAAGCCGCGCCAGGACCAACGAGGGCGCAGCCGTATGCCCCAGCCCGCTCCGCGGAGCCCCTCACAGCCAcccccgccccgaccgcgccccgcgcggcTCGAAGCACCTTCCCAAGGGGCTGGTCCTTGC----------GCCATAGTCGCGCCGGAGCCTCTGGAGGGACATCAAGGATTTCTC-GCTCCTACCAGCCACCCCCAAATTTTTGGGAGGTACCCAAGGGTGCGCGCGTGGCTCCTGGCGCGCCGAGGCCCTCCCTCGAGGCCCCGCGAGGTGCACACTGC---------GGGCCCAGGGCTAGCAGCCGCCCGGCACGTCGCTACCCTGAGGGGCGGGGCGGGAGCTGGCGCTAGAAATGCGCCGGGGCCTGCGGGGCAGTTGCGCAAGTTGTGATCGGGCCGCTATAAGAGGGGCGGGCAGGCATGGAGCCCCGTAGGAATCGCAGCGCCAGCGGTTGCAAGGTAAGGCCC-CGGCGCGCTCCTTCCTCCTTCTCTGCTGGTCTTTCTTGGCAGGCCACAGGGCCCCACACAACTCTGGATCCCGGGGAAACTGAGTCAGG-AGGGATGCAGGGCGGATGGCTTAGTTCTGGACTATGATAGCTTTGTACCGAG-----TTCTAGCCAGATAGAAGGTTACCGGGAGCTGGGGAGCGTTGGATTTGCTGCTGGGCTGTGCCGGTGCCCAGAAGGCA------GGACCTTGCAGAACCAGCCAGGTCCCTGGGAGACTGTCAGACCCACCAACCTGGTGGCATTCGCAGAGCTGAGATGCATTGGAAATTGCCTTGGGCACATCCCCAAAGATCAGGATGTCCCACCCCAGTCTGAAGGAGA---TAAAGTTGGGGGTAGGAGAGACGCAG-ATGCAAGTGATCAGTCTC---AGTCCCAGACATTGCCTTGCTCTGCGGGTAGGAATTCAGGATTCATTTTCCAGGGAAG--------TTCCTGACCTCTGAATGAGAGGGGCTGTGTAAGGCCAATGCCTGGG-AGGAAGGCAAGGATGAGTAGAGGTGGGGGGAAACAAGTGTCAGGAAGA--------------------------------------------CTCAAA---------ATCTTC--------------------------------------------------------------CAGAGAAATTGT-----GCAGGGTCTTACCAGATCTGTCCTCAAAGCCATGCAAATTGCCTTCTTTGCAATGCAT-ACAATGAGGTGTCTCTGGGGGTCAGAACTGG-----------------------TTATTAGGGAACTTCTAGCCAGGACTGCTAAATACGCGCTGTTGG---------CCCACCAGGCTCACCTATAGCCT-TCCTTCAGTCTGGGCTTGGTTTGGATTTCACTGTGGGTGCCATCGCCTTTACACTCCTGTTTCTATAGTTTAAAGATAGTGGTGCTTTGGGAAAG---TGACTCCTTAAATACAGTTAGGTCCAAGTGA-GACAAGTGGCCTGGCTGTCATTTCAGAATAGCAGCTTCCAAGAGG----------TGATTAATTTCTGTTGGAAGGGTGAT-CTTTGGGGAGGT--GGGTGAAGAGCAGAGACTTGGTGGTACCGTTCCAGGAGCACAGGCTCTCT-----TCCTTTGCA--GTGCAGAATGACCTCTGGCAGCCGGAGTTGTGTTTGTT--------CTGTAGGATTCTGAGGTGGGCCATGGGCAGCTGGAACTGGG-----GAATTTTGCCAATCTCTTTCATATTAGGATTGTCTGCAGAACCAGATATGGAGG------CTTCTAGCAACGTGAGTGCTCCTGTTCTAATGCCCTTAGAAACAAGAAGGCCACACTGATCATTTCTCTCACTTAGGCAGGGAGACAAGGCAAGAGAGAAACAGT-----------------GGATGC--TTTTAGGTTCTTTCCCTTC
CCAAGCAGTTGTGGACATTGGGCTGA-GGGGAACATTTCCACATTGGCTAAAGGAGCGTCCTCCTCATATTTTGTACATTTTATACCCAA--AATAA-CTCTTCTTGGTATTT-GGGGAAATATTTTCCTCCCCGTCC------------ATTCCAGGAAATGGCTCCAAGTGCCAAGGACAGAGCCAGGGAAGTTGCAATGAATTCCTGCCCGTCAGCCCCAGGCAGATGCCTTGCACGTCTGAGTGGCCCATGCAGAGCGTGGAGGTGGCCGCC----------------ACGGAACC-TGGGTCAATGT-CCCACCCCCG----CTTAGATGCCA-CCAGGGG--CGTGGGAGCCAAGGAG--AGAAGAGGGGCTCCAGGAAGGTAGAGTCCTTGTGTCTTGTGCATCTGTGAACAGCACTGGTATGATTTAAAGGAAAATTGAGCCAAATTTTCCGGCAGTCAGTT-----ACCCCATCCCCACCGGGGTAGGAGTCTGGCAGCCGCAGCTCCATTCTGGCCAGTCGGCAGAGAGCCTTGAAATTCTTCTTTGTCCACACAGTTGTCTCAGAGAAACAG--AGAGGTT-GTTTCTGCTTAAAAACAACACACTTGGTGTCTGGGCCCACAGACTCCTTTGCACTTATTCCACGTGTGACAGCCAATGTGCCTCGTTGCTTAGCAGACAGCATGTTACCGTCTTTCCTGCTCAGTTTGTTAG--------------CTCTATGGAATGGAATTTATAATCAATGCCCATACCAACATTTCACTAATATCATAGGAGATTTAGTCTCCATCTGGGTGTACATTACATTTGC--TCTGGGG-TGCTCCAGGC--TGGGGGGTTGCCAAGGAAGAGAAGAGAAACCGCAGAGAAGAC---GGGAGGGCAGGGCAGGGGTCTCTGAGAAGGGGAGGGGTCCCAGAGTGCAGGAGCAGGAGCCAGGCTC---------ATGAAAGGGGCCACGGGCGGGAGTATCCAGGGACGGCAGTCAAGATGGAGCACAGCTTAGG--AAGCTGAAGGGAATCCTGGCCCACCTGGGTGCTAGAGGGCACATAGGAAGTGCAGGAAGCAGACCAAGGTCCCCAAGAGAGGGAGACCTGGACGCTGAAGCATTTTCTGTCTTTATTAAG-------------ACAACTCCGTAAGAATTCCTGCTGGGCCAAAGTGAATTCTAGGATGCGACTTTAAGATGGGAGCAAGCGAACCATTGAGGAGGCAGGTTACCCTAGTTAGCCAATGCAGATCGAGAATGGGAAATCTTTCatttattcatgcaacagatatttaacgaagccctgccgtgttccaggcctgtgatagatgctggaacaggtacagaga----------tAc-------aggtgtcattaattgatcaggg--caacctctc---cttctgagt--cttgctggagcttcagatgc-ccctcacacagagctcgagggagcctc-aacaattgatcagaagtcaggcaccatggctcacgcatataatcccagcactttgggaggccaaggcaggtggatcactggagcccaggagttccagatcagctggggcaacatggcaaaaccccatctctattaaaaaaaaaaaaagtaactggatgtgatggtacacacctgtagtcccagctacttgggaggctgagaggtgggagaattgcttgagcccgggaagtcgggggtccagtgagccttgatcacaccactgcactccagcctgagtgacagagcaagaccctgacacacacacacacacacacacacacacacacacagattagagctgaaacaggagtagaaacctatctg-tatctctgATGA-GATCAGATC---------TTTCTGATGAACAGAAAGAATGTAACCCCTGTACTCACACCCTCTCTGCTGGTTACATATGTTAACACGATTTCTCAAATGAGGCTTTTGGTTGCAAATAAGAGAAAAT
CACTCACGCT-GGCCCTGTG--TTTTTCAAATTGTTTATTGTGATCAACATTTGAAAAAAGAGCCGAGACTCTCAAGAGTGCATTACCCACGGTAAGGGTGAATTTT-ACTTCTTGACACTTATTTCTCTTACATGTATCTATCTGTCTC-----AAATGAAAAATATATTTAGAAAGTTGAAAGCTATCCAAGTGAGTATAAGAAAAGAGTATCTCACCCTGAAGGCTAAGGACAGGGAGGGC---------------------------CACCAGGCCTCACGAGGACCCAGGAACCACAAAGAAGGCT-AGGAAGGAGCACAGGCGGTGACCATACTCTGGCTCAGTGGCTATGTGGGCTCTGGTCTCTCTCAGCTGTTCCATGCATATGAGGCCAAATGTGGCTACCCTAGAGCTTCTGAGCCCTCAACAGAGATGAACTGGACTCTCTGCAGCCCCACTCTAAATTCCTAAGAGAGAAGTTGATTGACCCAATCAGGGTCAGGAGAAGGAAGGGAGGAGGAAAGGGAGGAGAGAAGAGCCTCTTCGTCTCTTGCCTACCACTGGCCAGGCAATTGTAGCCAAGGGGGCTGGAGTGTAAATGCAAACATAGCCATCAAGGGTtgtgtatgtgtgtgtgtgtgtctgtgtgtgtgtatgtgtgtCTCTTGGGTAGGTTAGA-TCTCCCAGGAGGTCCCTACTAAACAGACTTAAGCCCGCAAAATTTTAGCTCTCCAGCCTCACACACTCCACCCCTCTACCATATTGAATCTTCCCAAACCAACTATGGCTTTCCCTAACTCCGGAGc------ttggcctggaatgccctgcttcccctctttcccctggggaacgcctgtccttcaggcctcagttcacacactgcctcccttgcaaagctctccTCCCATCCCCGGAGTCCCT--CTTCCCCTTTGTTCTTTGGGTTCTATGCTTCTTCCCTCATAACTCCCACCAGGTTGTGTTAAAATGAGTTGTTCAAGGTCCTGTCTGTTCCACTAGATTCTGAGCAACTTGGAGAACGAAGATCCAAACTTCGCTGCCTTTATTTCCTCCTTTGTTCTTTTCTCATCCCCAAGTCCCTTCCAACTTGGAGTTATgaagaaaggaaggaaggaagggtgggagggaagaaCAGGAGGGGATCCCACAGG-AGAATGTGTATAGGGAGAGGACTCAGACTAGCTAAAGCTTTTCCCTCATAATTAATAGCAAATACCATGTTACCTGAATTTAATTCACAGTAGCATACAAAAGACTCGCTTTGTTCT-------CCCCA---------TTGATGTCATCAGAGG--------------------GCTGTGGG--------------CAGGCCTAATCTTGGCTCAGGAGGCCCTCCAGCCTGGATCTAAAGAGCAGCAGATGggccaggctcggtggctcatgcctgtaatcccagcattttgggaggccgaggcgggtggatcacgaggtcaggagtttgagaccagcctggccaagatggtgaagcctcgtctctactaaaaatacaaaaattagccaggtgcggtggtgggcgcctgtatttccagctacccgggaggctgaggaggctgaggcaggagaatcgcttgaacccgggaggcggaggttgcagtgagccgaggtcacgccactgcactctagcctgggcaacagagcaagactccgtcaaaaaaaaaataaaaaaataaaaaaataaaaaaaataaaGAGGAGCACACATCTCTGCCCATCCTAACTCCCACTTTGACATTGAGGTCCCCAGGATGGAGGGTCTGCCTCCATCTGCCTTGTCCCCTG-CAATGGTGGGAAGGTGATGGAGCTCAAGTCTAGAGGCCACCAGCTTCTTAGGGAGG--TAGGAGGTG---------------GAGGGTGGGGTGC-GGGCCCTGCACACAACTGCCAAGTGAGGATGGGGGTGGG-GTCCACCTGAGGAT
AAGTAACAGTGAGGCTGGTGCAGAGGACCCAGGTGGAGGTAGACAGCAGAATTTGTGGTGGGGT--GGATGGCAC-ATTATATAAGCCTCTCTTGC------TGCCCTGT---TTACTGAGATTGTTTCAttatcttttttggcttttgtttttaagagatggggtcttgctgtgtcacacaggctggagtgcactgtgtgatcatacctcactgcagcctcgacatcctgggctcaggcaaacctcccaccttggcctcccaagtagctgggaccacaagcgtttgccaccacactcagctatttttatttttattttta--ttttttttagagatggggtcttgctgtgtcgcccaggctggtcttgaactcctgggctcaagcgatcctcctgccttggcctcccaaagccctgggattataggctgagccaccacacccagccACATTTCATCTGTGCAGCTCCAGGGGCTCCACATTCT-ACTCTTCTCATTTCTTCTCCAGGGTACCC----------ATGGCAAGGGATGAGGGT--AGAAGATGGGGCA--GCCAGGCCTTGATTAAAGGAGAAGGAAGGCAGCCTGTGGAGAGG---GCAGCC---C---AGGGAG---TGCAGAGAGAAGTGGGCCATGAGGGAGA---CAGCAGAGTGCAGGCTGCGTCC---CAAATGAGCATACAGCCCACTGTGAGCCCACC--ATCTTCCTAGA-GA--CCCCTCTCCTCTCC-AGGAGCTGCTTCAGTAGCACTCA---------GAGGAAAGAATGATGC--------TGTATCAACATTTCAGCAGCTCATCTTTTAACTCTAAGAAAATGGCAGCTCCTAAATGTTCAA--AACTGCTTTGGAAACTTCT---GGAGAGAGGTTTTGCAGCTCAGGCAGACAGCTGATCGCGGCCTTTCTTCCACCCCAACCCATGCTCTCCCCATGCT--CTCCTGCCACAGCTGCAGCGGGCCCCTGGGTCCTACATTTGCAG-CCCTTTGTCTCTGAGCT-----CAGACTTCCAATTCCAAGCGGCAGCTGGGCAGGCTCACCAGCATGT---CCAGCCAGTACTAGGACATCAGCAGGAGC----CCAACCACCTCTTTCCAAAATCTCTCCTCATGTCTCTCCTAGTTTCCATCTCCATCCTTCTAGTCAGCCAGGCTGAAAACATT-----------------TGCTCCTCAGGGTGCAGAAGGGAAAGCTTTGCCTCCCTTCCTGGTGCTCACTGCCCCTGCGATTCCAGCCCAAGCCCTCCCCGGCTCCTCACC----------CTGGTGTCAGCTGGAAGCCACCATCTCCTAAACCCACCTGtgttcttccacctctgc--------cagggctgc-cctctcctccaccttcacaaactcaattcctacccattctcaggtcccttatcaaatgccatctcctccatgatgcctccctgattccccTGCTGGAaataatggtgataacagctaag--gcattggggttggctacgtgccaggcaaggagttggcactttacatgctttatctcatttcagccacataacatcgacaggt-ggcattatgattcatatcatccccatctgatagccaggaaaactgagtcccagagaggttagc-cactttcctagggccCTGTGCTCTGACTCAAGCATAGCTCTGAGGAACTCTAGCATTCATCAGTTTAAGCACCATGACTTTCTTTGCTGAGTCACCCAAGGCAT-TTCTTCATTTAAATGTTCTTCCTTGGCCAGGCGCAGTGGCTCAggcccaatgcggtggctcacgcctgtaatctcaacactttgggaggccgaggtgggcagataatctgaggtcaggagttcaagaccagcctggccaacatggtgaaaccccatctctactaaaaatacaaaaaaatgaggctgggcgtgatgactcacacctgtaatcccagcactttgggaggcca
aggcaggtggattacatgaggtcaggagttcgagaccagcctggccaacatggtgaaatcctatctctattaaaaatacaaaaaattagccaggcatggtggcaggcacctgtaatcccagctacttgggaggctgaggcaggaaaatggcttgaacccgggaggtggaggttgcagtgagccaaggttgcaccattgcactccagcctgggcaaaaagagggaaacatcgtctaaaaaagaaaaaaaaaaaattagccaggctgggtggtgcatgcccgtaattccagctactcaggaggatgaagcaagagaattgcttgaacccaggaggcagagattacagtgagctgagatcacaacactgcactccagcctaggtaaagaacaagactccatctcaaaaataaataaataaaaataaaTGTTCTTCCTTGCAATGAAGTTAAATATGTAAATTCTCAAACCAGTTGCTTAAGGGCACAGTTTTGTTCTTTACCTATATTTTTAACAAATATTTTATGTAAGTAGTTGAC-AAAATCAAATACTGT-GTACACTACCGAGGCTTCCCTGGGAAAGCCATCAG-CCTCTGCCCCATCCCTTCCCACTCCTGATT-CCACTTTCCTGTGTTTCCATATCTTTTTCATGTCTGTTTCTGGCCCACAGTGGGCGATCAATACATGTTAGCCACCAACCATCAAACCTATATTGAGTAATTATGGTATGTCAGGCACTATGCTCAATGAAATTGTAttaggcttgtacaaaagtaattgtggtttttaagagtaatggcaaaaacggcagttactttcgcaccaacTATTTGCTGCCTTGAATTATTCCTCCTCTC-CTCATCCCTAAACCCTGCTCCTCCCAGCCATTCTTCCTCCCCTTCTTGGGCCATGGCCAGGCCCCACCCAGGTACTAAGACTCAGGTGAACCAAGGAAGACTTAATGCCCACTCTTTTCTGATGCCCATGTT--GGCATGTGTTAAGtcggttagcattaagtttggctgcatttagcagagacccaaaagaacagtgccttttaaaaggcagaggttatgtctctcacacacacccagcacaagtccaag-------------------------accagcatggcatctcagctccatcaa--cctcaggaaccgagctcctgcagctccctgccctgcagttgataaggtgaggtctttgtcctcctggttcaagatggtgctagaatgttggctaccatatctatagtccaggcatcagaatggagcaagggatgaaaaaggaagagatgaaggcacacgacaggttcctgagagctggcacaggacacttctgcttatatttcactggccagaacttagtcacatggtcacacctagttgggagactctgagaagtaa----agtatttattctagatggccatatccctacc-taagacttggagttttctatgactggggaagaacggaagacaagatattgggaaagactagcagcctctactaAAAGGGTGATCtgtgttgatgtgcgtgtgtgtgtgatgtttgtatg---agcatgtgtgt-tatgtgttgt--gtgtTGGTGGGGCA--GATTCTTGCGAGCACTTTGGTCTCAGATGGACCTGCTACCAGTTCTCTCTGCAGACCCCCATAGGTTTCTCCTAAACCTGGCCT-CTCCTATTAGGCAGCCTTACTCAGCGGCAGCTTCTCAGCTCCATGTTTTCAAGGAACCACAATTTATTTCCAGCATCCACTGAAGCATATTATCAGTGGTGATAGAGGGGGCTTGTAAAACTGTTTTTCCACTTAGGTATTAGAGGGTGGCCATTATTTGAGAGTGAC-----TATGACCACAGTTAATCTGGTAATAAATTCTCTTGGGTAGGAGGGGGAAAGGAAAGGATGCTTTAAGGAAGCATCTTGCCAGGAGACACAAAGCTAACAAGAGTGGAGCCTGCAG---------
-------------------CTGGAGCCGCAGAGCCTAATCACTACACCCGCCCATCTCTGCTAGGGTTTCATGACTTCGTATCGGGGATTAGCAGTATTTAACTCTGTTGCACAAACATTTGGTGTA-----TTATTCAGGTAACAAGTAGCTAATAGAGGAAGTTTTACTTTTTTAAGACATAA--------------------ATTTGCCTTTTCCCAAATTACTTGGTACATAGTAC-TTTTCATGTTTGAAGTTGAGATGTGGGTACAATACCATAGCTTTATTCCAGAGCAGGGTATTTGTTTCCAAATGCCATGTTCCCAGCAGCTGCCCTTGACTGGGAATTGGGGTG-----TGATTTGGGCTTTTCCTTAAATCCTTGA-----GGAGCTGGA---GGGGTGGGTGGCTCGCACTCCTGCTTTctgg---------atctgaatc--------------ctgactctgtcatggacctgtt-tgactttgggcaagttgactcctattcctgagccccatat-ttttctcttctgtgaaattcagattaaaaA-AACATGGCTTTGATCAAACATTATAAATAATATATAGACAGACTGCTTGTTTTTATTGTATTGCCAG-AAATGAATCCTACTAATATTGCCATCTATGGACAGAAAATGTATTACCTGTCTTCATCAAGACCCAGACGAGGAAGAACACGAAAAGCGGAGATTAATTTTACTGCCATCTCCAGAACCGTCATCCTAATATTTACTTACAT-TTTATTATTATTTCAGGCTCATGCACATATACTTAGCATGGATCATTGGCCACAGACTCGCATACATTTAACTTTATTACCTTT-TGCCTCATGTATCTCATTAAAATTTTGCTGCTTAATCAAGGATCTGCATATTATTTTAATTTTAGAATTCACAGTTCCAAGACTTTGAAAGTTTCAAGCGTTCTGGGTGaatgtgttatgc--tctctcccgccaccatgtctttataccccctgatttctcagccact-atggcaaccactttctactcttagtagcccatatttag--tccaatccccagctcaggagacacttcttccaggg--agccccctgtgccttccagtagtatcttgtacctgccctttttgcaaagctctttcctcctggcttagaatggcccattgacctgtttgtttctcctattaaactgtaagccactcgagggtagagagcatctgttgttcaccattgcatcctcggtgctgagcactgcgtctgacatattatttagaaggtcagtaagtgctagtgggatTCAGGCTCCCAGTGGGTGGGAGAGAAAGGACGTAAGGAAGCAAGTGGTAAAGGCCCTCACAGA-GTATCAGCAGGCTGGTGTGA-GGGAGAAATGCAGAGGATGGGTGAGTAGCA-----TAATCGCTAATGAT-AGGGTAATGATAGAGCACATTTCACAACACCTTt-aagccctttcacgtgcatcagataatttgatcctcataaaagcctagagatagatatattacagg-gatgaaggtggagtattttgtggttatgtgatatg-tttaaaattatgcagtgagtaaatgactgggttcaaaccagaccttaaaagtctgttatctttccCTCG-AGCATGCAATGAAGTCTACATCATCCCTACCATGTCCATTTGATCACACCCTGGCCTCACAGCTCTGTGGTCTACAGGATACCTCATGGTGGTTTTATTGACCAGACAATAATCCTCTTTCTAAGGGGATGCATTTCATTAATACATATGTAGATCATGAATTGTCTTTGACTTTGAGGGGATGGTAGC----CAGAGCAGAAAGCAAAGCTGATTTTCATCCCCGTCTGGTAATGTGGTTGGTAATGTGAAGA-TGGGTGTATTCTGAGATACCGGCTCCTTGCAGTGTGTGGTTCCTTCTGTTTTCAGGCCC------AAGAAGCCCATCCTGGGAAGG
AAAATGCATTGGGGAACCCTGTGCGGATTCTTGTGGCTTTGGCCCTATCTTTTCTATGTCCAAGCTGTGCCCATCCAAAAAGTCCAAGATGACACCAAAACCCTCATCAAGACAATTGTCACCAGGATCAATGACATTTCACACACGGTAAGGAG---AGTATGCGGGGACAAA---GTAGAACTGCAGCCAGCCCAGCACTGGCTCCTAGTGGCACT-GGACC-CAGATAGtccaagaaacatttattgaacgcctcctgaatgccaggcacctactggaagctgagaaGGATTTGAAAGCACAGGGC-TCCACTCTTTCTGGTTGTTTCTTTTGGCCCCTCTGCCTGCTGAGATTCCAGGGGTTAGTGG--------------------------------------------------------TTCTAATTCTAAACCACTCCAAGAACATTTGATTTTGCTACATGTTTCCATTTAAAAATCATAGGATTTGggctgggtgtggtggcttgtacctgtcatcccagcactttgggaggccaaagcaggaggatcattcgagcccaagagttcgagaccagcctgggcagcatagggagaccccatctctacaaaaataataaaaaatgttagctgggcatggtggtgtgtacctgtggtcccagctaggggaggctgagatggaaggatcacctgagcctgggaggttgaggctgcagtgggccctgatcatgccaccgtgctccagcctgggtgacagagtgagaccttgtctcaaaataaataaataaataaataaaAGTCATAGGATTTgatcaggcatgatgggtcacatctgtaagcccattgctttaggaggccaaggtaggaggatcagttgaggccaggagttcaagaccagcctgggcaacatggcaagacctctctctctaatttttaaaaaaataaaaaTTAAAGATAAGAAAAAAATCATAGGATTCTCATGAGGCCTCACGTGCTTATTTTCAACCTACCAAGGGGAAACCCAGGCCTCAGCGATTAGCTGAGC----------CACATGCAGGCACAG------------------------CCACTG-----TCTCTTTCCTTCCTGTCCCCTCTGTCCCCACCTTCTGCGCTCGCCTTCCTCCCTGACTTCACTTCCTTGAATCTTAGTGCCTACGACCAGAGGGAGCTGTGAAGTTCCTTG----TGTCCCATTGGCAGGAA-CAAGACCCCCAGAAGCATCTCCTCAGGGC------CTCTA-----TCCCATCTC-TAGATGTGCTTGTCATTAGG-Gttct-------------tgtagttccagctgatctctggccctgccgctcaaagatacccaaaagagcgagtctaccctttttcacattcaaccctctactgatttgcaaatagcagtcagtgcccaccctggtcttttctctggggtccagcaggcctagaccttcagccattttcctgatgaGGTCTGTAtttgaaattaggaagattaagtttgaatcttcacacttctgat----gtctgtgagatcttcagcaagttccttact--gtctttaagccttgt-tttcatcatctggataatggggatatcacacacta-ttcacaaggttgttatgaggcctaaattagctaaagcaATTGAATCCTCCTTACCCCCTGCATGGAGCTCTCTGGAGACTTCCACGTCTCCTGGTCATTGTGGGTGTCTTATGGTA-GTCTTGGGCAGTTAGGGAGAAGTTAGGTGTCTGGAAGCAAAGATGGCTCAGAACTAGATAGAGTC-TTGGGCATTTTATA-GATAAAAACTCTT--GTCTCCtttaaaaataataaaaaaaaattaGCTGGGCATATTAGCCACTCAGCAAGACTGCACGTGATAGATCCCGAGTGCCCCACCTTGGGTGGTGTAATACACAATATCACGGGAGCCCCGGGTAGTAACCACGGAGGTG
TCAGCCTCAGTGCTGTGGGCAGATG-GATGGGGAGAGCC--TCCCGG-AACTGGAGTCACTGGAGCA----------------------------GGGTTGGGGGGCCTCACTGAGGGTACGGCCTTGATCTCTAAGGAGGAGGGACTGCCTGGAAAAGC-TGACTGGGAGGGAGGACTCGGCTGGGGGTAGAAGGGA----------CTAGGGAAGGCTGGGGGTGGGGGTGCTTATGGAGGACCTCAGATGCCTGGGGAACAGACTCCACTAAATAAAACATATGAAACCATGGCTGGTTCTTCAGCAGAGGCCATGTAGAGAAAGGAATGACCTAGGAAAGTTGGCCTGGAAGTGGAGGGAAGGATGGTGTGGGAAAAGCAGGAA--------TCTCGGAGACCAGCTTAGAGGCTTGGCAGTCACCTGGGTGCAGG-ATACAAGGGCCTGAGCCAAAGTGGTGAGGGAGGGTGGAAGGAGGCAGCCCAGAGAATGACCCTCCATGCCCACGGGGAAGGCAGAGGGCTCT-GAGAGCGA--TTCCTCCCACATG-CT-GAGCACTTGTTCTCCCTCTTCCTCCTGCATAGCAGTCAGTCTCCTCCAAACAGAAAGTCACCGGTTTGGACTTCATTCCTGGGCTCCACCCCATCCTGACCTTATCCAAGATGGACCAGACACTGGCAGTCTACCAACAGATCCTCACCAGTATGCCTTCCAGAAACGTGATCCAAATATCCAACGACCTGGAGAACCTCCGGGATCTTCTTCACGTGCTGGCCTTCTCTAAGAGCTGCCACTTGCCCTGGGCCAGTGGCCTGGAGACCTTGGACAGCCTGGGGGGTGTCCTGGAAGCTTCAGGCTACTCCACAGAGGTGGTGGCCCTGAGCAGGCTGCAGGGGTCTCTGCAGGACATGCTGTGGCAGCTGGACCTCAGCCCTGGGTGCTGAGGCCTTGAAGGTCACTCTTCCTGCAAGGACTACGTTAAGGGAAGGAACTCTGGCTTCCAGGTATCTCCAGGATTGAAGAGCATTGCATGGACACCCCTTATCCAGGACTCTGTCAATT--TCCCTGACTCCTCTAAGCCACTCTTCCAAAGGCATAAGACCCTAAGCCTCCTTTTGCTTGAAACCAAAGATATATACACAGGATCCTATTCTCACCAGGAAGGGGG-TCCACCC-AGCAAAGAGTGGGCTGCATCTGGGATTCCCACCAAGGTCTTCAGCCATCA---ACAAGAGTTGTCTTGTCCCCTCT-TGACCCATCT-----------------CCCCCTCACTGAATGCCTCAATGTGACCAGGGGTGATTTCAGAGAGGGCAGAGGGGTAGGCAGAGCCTTTGGATGACCA--GAACAAGGTTCCCTCTGAGAATTCCAAGGAGTTCCATGAAGACCACATCCACACACG--CAGGAACTCCC--AGCAACACAAGCTGGAA---GCACATGTTTATTTATTCTGCATTTTATTCTGGATGGATTTGAAGCAAAGCACCAGCTTCTCCAGGCTCTTTGGGGTCAGCCAGGGCCAGGGGTCTCCCTGGAGTGCAGTTTCCAATCCCATAGATGGGTC-TGGCTGAGCTGAACCCA---TTTTGAGTGACT----CGAGGGTTGGG-TTCATCTGAGCAAGAGCTGGCAAAGGTGGCTCTCCAGTTAGTTCTCTCGTAACTGGTTTCATTTCTACTGTGACTGATGTTACATCACAGTGTTTGCAATGGTGTTGCCCTGAGTGGATCTCCAAGGACCAGGTTATTTTAAAA---AGATTTGTTTTGTCAAGTGTCATATGTAGGTGTCTGCACCCAGGGGTGGG-GAATGTTTGGGCAGAAGGGAGAAGGATCTAGAATGTGTTTTCTGAATAACATTTGTGTGGTGGGTTCTTTGGAAGGAGTGAGA-TCATTTTCTTATCTTCTGCAATTGCTTAGGATGTTTTTCATGAAAA------------TAGCTCTTTCAG-GG
GGGTTGTGAGGCCTGGCCAGGCACCCCCTGGAGAGAAGTTTCTGGCCCTGGCTGACCCCAAAGAGCCTGGAGAAGCTGATGCTTTGCTTCAAATCCATCCAGAATAAAACGCAAAGGGCTGAAAGCCATTTGTTGGGGCAGTGGTAAGCTCTGGCTTTCTCCGACTGCTAGGGAGTGGTCTTTCCTATCATGGAGTGACGGTCCCACACTGGTGACTGCGATCTTCAGAGCAGGGGTCCTTGGTGT-GACCCTCTGAATGGTCCAGGGTTGATCACACTCTGGGTTTATTACATGGCAG-----TGTTCCTATTTGGGGCTTGCATGCCAAATTGTAGTTCTTGTCTGATTGGCTCACCC-AAGCAAGGCCAAAATTACCAAAAATCTTGGGGGG--TTTTTACTC-CAGTGGTGAAGAAAACTCCTTTAGCAGG-TGGTCCTGAGACCT-GACAAGCACTGCTAGGCGAGTGCCAGGACTCCCCAGGCCAGGCCACCAGGATGGCCCTTCCCACTGGAGGTCACATTCAGGAAGATGAAAGAGGAGGTTTGGGGTCTGCCACCATCCTGCTGCTGTGTTTTTGCTATCACACAGTGGGTGGTGGATCTGTCCAAGGAAACTTGAATCAAAGCAGTTAAC-TTTAAGactgagcacctgcttcatgctcagccctgactggtgctataggctggagaagctcacccaataaacattaagatt-gaggcctgccctcagggatcttgcattcccagtggTCAAACC-GCACTCACCCATGTGCCAAGGTGGGGTA-TTTACCACAGCAG--CTGAACAGCCAAATGCATGGTGCAGTTGACAGCAGGTGGGAAATGGTATGAGCTGAGGGGGGCCGTGCCCAGGGGCCCACAGG-GAACCCTGCTTGCACTTTGTAACATGTTTA-----CTTTTCagggcatcttagctt---ctatta-----tagccacatccctttga---aacaagataactgagaatttaaaaataagaa-----aata--TGACCCCAAAGAGCCTGGAGAAGCTGATGCTTTGCTTCAAATCCATCCAGAATAAAACGCAGACCCCAAAGAGCCTGGAGAAGCTGATGCTTTGCTTCAAATCCATCCAGAATAAAACGCAGATGCTTTGCTTCAAATCCATCCAGAATAAAACGCAAATGACCCCAAAGAGCCTGGAGAAGCTGATGCTTTGCTTCAAATCCATCCAGAATAAAACGCATGACCCCAAAGAGCCTGGAGAAGCTGATGCTTTGCTTCAAATCCATCCAGAATAAAACGCAGACCCCAAAGAGCCTGGAGAAGCTGATGCTTTGCTTCAAATCCATCCAGAATAAAACGCAGATGCTTTGCTTCAAATCCATCCAGAATAAAACGCAGACCCCAAAGAGCCTGGAGAAGCTGATGCTTTGCTTCAAATCCATCCAGAATAAAACGCAGACCCCAAAGAGCCTGGAGAAGCTGATGCTTTGCTTCAAATCCATCCAGAATAAAACGCAGACCCCAAAGAGCCTGGAGAAGCTGATGCTTTGCTTCAAATCCATCCAGAATAAAACGCAGATGCTTTGCTTCAAATCCATCCAGAATAAAACGCA diff -r 000000000000 -r 0b4e36026794 test-data/a.tab --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/a.tab Mon May 19 12:34:07 2014 -0400 @@ -0,0 +1,15 @@ +CHR SNP BP A1 TEST NMISS BETA STAT P +1 rs1181876 3671541 T DOMDEV 958 -1.415 -3.326 0.0009161 +1 rs10492923 5092886 C ADD 1007 5.105 4.368 1.382e-05 +1 rs10492923 5092886 C DOMDEV 1007 -5.612 -4.249 2.35e-05 
+1 rs10492923 5092886 C GENO_2DF 1007 NA 19.9 4.775e-05
+1 rs1801133 11778965 T ADD 1022 1.23 3.97 7.682e-05
+1 rs1801133 11778965 T GENO_2DF 1022 NA 16.07 0.0003233
+1 rs1361912 12663121 A ADD 1021 12.69 4.093 4.596e-05
+1 rs1361912 12663121 A DOMDEV 1021 -12.37 -3.945 8.533e-05
+1 rs1361912 12663121 A GENO_2DF 1021 NA 17.05 0.0001982
+1 rs1009806 19373138 G ADD 1021 -1.334 -3.756 0.0001826
+1 rs1009806 19373138 G GENO_2DF 1021 NA 19.36 6.244e-05
+1 rs873654 29550948 A DOMDEV 1012 1.526 3.6 0.0003339
+1 rs10489527 36800027 C ADD 1016 12.67 4.114 4.211e-05
+1 rs10489527 36800027 C DOMDEV 1016 -13.05 -4.02 6.249e-05
diff -r 000000000000 -r 0b4e36026794 test-data/solexa.tabular
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/solexa.tabular	Mon May 19 12:34:07 2014 -0400
@@ -0,0 +1,30 @@
+5 300 902 419 GACTCATGATTTCTTACCTATTAGTGGTTGAACATC
+5 300 880 431 GTGATATGTATGTTGACGGCCATAAGGCTGCTTCTT
+5 300 896 461 GTTGTCGATAGAACTTCATGTGCCTGTAAAACAAGT
+5 300 890 751 ACCAACCAGAACGTGAAAAAGCGTCCTGCGTGTAGC
+5 300 897 443 GTTTATGTTGGTTTCATGGTTTTGTCTAACTTTATC
+5 300 906 879 GCTTTACCGTCTTTCCAGAAATTGTTCCAAGTATCG
+5 300 894 484 GCTTGTTTACGAATTAAATCGAAGTGGACTGCTGGC
+5 300 890 811 GTTATAACGCCGAAGCGGTAAAAATTTTTATTTTTT
+5 300 889 530 GTTCTCACTTCTGTTACTCCAGCTTCTTCGGCACCT
+5 300 898 886 GTGGCCTGTTGATTCTAAAGGTTAGTTTCTTCACGC
+5 300 878 205 GTGACCGCATAAAGTGCACAACATGGAAATGAGGAC
+5 300 920 381 GCAGATCGCTACACGCAGGACGCTTTTTCACGTTCT
+5 300 886 453 GCTCGTTATGGTTTCCGTTGCTGCCATCTCAAAAAC
+5 300 893 365 GTTGACGGCCATAAGGCTGCTTCTGACGTTCGTGAT
+5 300 892 801 GTCAAGGACTGGTTTAGATATGAGTCACATTTTGTT
+5 300 881 945 GTGCTGAGTTTTTTTCTGTTACTGTGACATTAATTT
+5 300 883 811 GACCTTGCTGCTAAAGGTCTAGGAGCTAAAGAATGG
+5 300 899 654 GGAAAATGAGAAAATTCGACCTATCCTTGCGCAGCT
+5 300 883 644 GAGTCTCATTTTGCATCTCGGCAATCTCTTTCTGAT
+5 300 879 877 GTCATAAGAGGTTTTACCTCCAAATGAAGAAATAAC
+5 300 900 821 GCTGGTAATGGTGGTTTTTTTTTTTTTTTTTTTTTT
+5 300 886 965 GTTGAGGCTTGCGTTTATGGTACGCTGGACTTTGTA
+5 300 933 442 GAGGAGAGTGCAGGTATTAATATCAAGGTTTGTGAG
+5 300 879 782 GGCGACTTCACGCCAGAATACGAAATACCAGGTATT
+5 300 923 441 GCTCATTCAGGCTTCTGCCGTTTTGGATTTAACCGA
+5 300 877 772 GGGATGAACATAATAAGCAATGACGGCAGCAATAAA
+5 300 918 831 GTATTTTACCAATGACCAAATCAAAGAAATGACTCG
+5 300 904 940 GTTTTTAGTGAGTTGTTCCATTCTTTAGCTCCTAGA
+5 300 956 880 GTATTGATAAAGCTGTTGCCGATACTTAGCACTATT
+5 300 883 755 GCGTACTTATTCGCCACCATGATTATTACCAGTGTT
diff -r 000000000000 -r 0b4e36026794 test-data/tabular_to_fasta_out1.fasta
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/tabular_to_fasta_out1.fasta	Mon May 19 12:34:07 2014 -0400
@@ -0,0 +1,60 @@
+>5_300_902_419
+GACTCATGATTTCTTACCTATTAGTGGTTGAACATC
+>5_300_880_431
+GTGATATGTATGTTGACGGCCATAAGGCTGCTTCTT
+>5_300_896_461
+GTTGTCGATAGAACTTCATGTGCCTGTAAAACAAGT
+>5_300_890_751
+ACCAACCAGAACGTGAAAAAGCGTCCTGCGTGTAGC
+>5_300_897_443
+GTTTATGTTGGTTTCATGGTTTTGTCTAACTTTATC
+>5_300_906_879
+GCTTTACCGTCTTTCCAGAAATTGTTCCAAGTATCG
+>5_300_894_484
+GCTTGTTTACGAATTAAATCGAAGTGGACTGCTGGC
+>5_300_890_811
+GTTATAACGCCGAAGCGGTAAAAATTTTTATTTTTT
+>5_300_889_530
+GTTCTCACTTCTGTTACTCCAGCTTCTTCGGCACCT
+>5_300_898_886
+GTGGCCTGTTGATTCTAAAGGTTAGTTTCTTCACGC
+>5_300_878_205
+GTGACCGCATAAAGTGCACAACATGGAAATGAGGAC
+>5_300_920_381
+GCAGATCGCTACACGCAGGACGCTTTTTCACGTTCT
+>5_300_886_453
+GCTCGTTATGGTTTCCGTTGCTGCCATCTCAAAAAC
+>5_300_893_365
+GTTGACGGCCATAAGGCTGCTTCTGACGTTCGTGAT
+>5_300_892_801
+GTCAAGGACTGGTTTAGATATGAGTCACATTTTGTT
+>5_300_881_945
+GTGCTGAGTTTTTTTCTGTTACTGTGACATTAATTT
+>5_300_883_811
+GACCTTGCTGCTAAAGGTCTAGGAGCTAAAGAATGG
+>5_300_899_654
+GGAAAATGAGAAAATTCGACCTATCCTTGCGCAGCT
+>5_300_883_644
+GAGTCTCATTTTGCATCTCGGCAATCTCTTTCTGAT
+>5_300_879_877
+GTCATAAGAGGTTTTACCTCCAAATGAAGAAATAAC
+>5_300_900_821
+GCTGGTAATGGTGGTTTTTTTTTTTTTTTTTTTTTT
+>5_300_886_965
+GTTGAGGCTTGCGTTTATGGTACGCTGGACTTTGTA
+>5_300_933_442
+GAGGAGAGTGCAGGTATTAATATCAAGGTTTGTGAG
+>5_300_879_782
+GGCGACTTCACGCCAGAATACGAAATACCAGGTATT
+>5_300_923_441
+GCTCATTCAGGCTTCTGCCGTTTTGGATTTAACCGA
+>5_300_877_772
+GGGATGAACATAATAAGCAATGACGGCAGCAATAAA
+>5_300_918_831
+GTATTTTACCAATGACCAAATCAAAGAAATGACTCG
+>5_300_904_940
+GTTTTTAGTGAGTTGTTCCATTCTTTAGCTCCTAGA
+>5_300_956_880
+GTATTGATAAAGCTGTTGCCGATACTTAGCACTATT
+>5_300_883_755
+GCGTACTTATTCGCCACCATGATTATTACCAGTGTT
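
Reviewer note: the conversion that tabular_to_fasta.py performs can be sketched in Python 3 as follows (the script in this changeset targets Python 2). This is a minimal illustration, not the committed code. The function name `tabular_to_fasta` and its parameters are hypothetical, column numbers are assumed to be 1-based as on the tool's command line, and rows that are too short are simply dropped here, whereas the original also counts and reports them.

```python
def tabular_to_fasta(lines, title_cols, seq_col):
    """Yield FASTA records built from tab-delimited lines.

    Hypothetical helper for illustration only; title_cols and seq_col
    are 1-based column numbers.
    """
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # blank lines and comment lines are ignored
        fields = line.split("\t")
        try:
            title = [fields[c - 1] for c in title_cols]
            seq = fields[seq_col - 1]
            # Avoid a doubled '>' when the first title field already has one.
            if title[0].startswith(">"):
                title[0] = title[0][1:]
        except IndexError:
            continue  # row lacks a requested column: skip it
        yield ">%s\n%s" % ("_".join(title), seq)


# The two rows from the tool's help text, with tabs between columns.
rows = [
    "5\t300\t902\t419\tGACTCATGATTTCTTACCTATTAGTGGTTGAACATC",
    "5\t300\t880\t431\tGTGATATGTATGTTGACGGCCATAAGGCTGCTTCTT",
]
records = list(tabular_to_fasta(rows, title_cols=[3, 4], seq_col=5))
print("\n".join(records))
```

With title columns 3 and 4 and sequence column 5, the sketch produces the `>902_419` and `>880_431` records shown in the help text. Passing `title_cols=[1, 2, 3, 4]` instead yields titles of the form `5_300_902_419`, as in test-data/tabular_to_fasta_out1.fasta.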