annotate fastx_barcode_splitter.pl @ 4:0fb7e9130a70 draft default tip

planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
author lparsons
date Mon, 02 May 2016 17:04:32 -0400
parents e7b7cdc1834d
children
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
1 #!/usr/bin/perl
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
2
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
3 # FASTX-toolkit - FASTA/FASTQ preprocessing tools.
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
4 # Copyright (C) 2009 A. Gordon (gordon@cshl.edu)
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
5 #
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
6 # Lance Parsons (lparsons@princeton.edu)
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
7 # 3/21/2011 - Modified to accept separate index file for barcodes
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
8 # 4/6/2011 - Modified to cleanup bad barcode identifiers (esp. useful for Galaxy)
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
9 # 4/28/2016 - Modified summary output to remove file paths and add comment
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
10 # character '#'
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
11
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
12 # This program is free software: you can redistribute it and/or modify
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
13 # it under the terms of the GNU Affero General Public License as
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
14 # published by the Free Software Foundation, either version 3 of the
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
15 # License, or (at your option) any later version.
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
16 #
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
17 # This program is distributed in the hope that it will be useful,
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
18 # but WITHOUT ANY WARRANTY; without even the implied warranty of
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
19 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
20 # GNU Affero General Public License for more details.
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
21 #
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
22 # You should have received a copy of the GNU Affero General Public License
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
23 # along with this program. If not, see <http://www.gnu.org/licenses/>.
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
24
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
25 use strict;
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
26 use warnings;
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
27 use IO::Handle;
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
28 use Data::Dumper;
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
29 use Getopt::Long;
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
30 use Carp;
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
31
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
32 ##
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
33 ## This program splits a FASTQ/FASTA file into several smaller files,
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
34 ## Based on barcode matching.
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
35 ##
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
36 ## run with "--help" for usage information
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
37 ##
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
38 ## Assaf Gordon <gordon@cshl.edu> , 11sep2008
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
39
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
40 # Forward declarations
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
41 sub load_barcode_file ($);
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
42 sub parse_command_line ;
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
43 sub match_sequences ;
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
44 sub mismatch_count($$) ;
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
45 sub print_results;
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
46 sub open_and_detect_input_format;
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
47 sub open_index_and_detect_input_format($);
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
48 sub read_index_record;
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
49 sub read_record;
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
50 sub write_record($);
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
51 sub usage();
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
52
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
53 # Global flags and arguments,
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
54 # Set by command line argumens
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
55 my $barcode_file ;
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
56 my $barcodes_at_eol = 0 ;
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
57 my $barcodes_at_bol = 0 ;
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
58 my $index_read_file ;
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
59 my $exact_match = 0 ;
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
60 my $allow_partial_overlap = 0;
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
61 my $allowed_mismatches = 1;
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
62 my $newfile_suffix = '';
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
63 my $newfile_prefix ;
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
64 my $quiet = 0 ;
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
65 my $debug = 0 ;
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
66 my $fastq_format = 1;
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
67 my $index_fastq_format = 1;
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
68 my $read_id_check_strip_characters = 1;
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
69
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
70 # Global variables
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
71 # Populated by 'create_output_files'
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
72 my %filenames;
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
73 my %files;
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
74 my %counts = ( 'unmatched' => 0 );
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
75 my $barcodes_length;
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
76 my @barcodes;
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
77 my $input_file_io;
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
78
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
79
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
80 # The Four lines per record in FASTQ format.
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
81 # (when using FASTA format, only the first two are used)
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
82 my $seq_name;
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
83 my $seq_bases;
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
84 my $seq_name2;
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
85 my $seq_qualities;
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
86
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
87 # Values used for index read file
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
88 my $index_seq_name;
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
89 my $index_seq_bases;
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
90 my $index_seq_name2;
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
91 my $index_seq_qualities;
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
92
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
93
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
94
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
95 #
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
96 # Start of Program
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
97 #
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
98 parse_command_line ;
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
99
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
100 load_barcode_file ( $barcode_file ) ;
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
101
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
102 open_and_detect_input_format;
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
103
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
104 if (defined $index_read_file) {open_index_and_detect_input_format ( $index_read_file );}
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
105
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
106 match_sequences ;
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
107
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
108 print_results unless $quiet;
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
109
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
110 #
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
111 # End of program
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
112 #
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
113
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
114
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
115
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
116
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
117
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
118
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
119
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
120
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
121 sub parse_command_line {
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
122 my $help;
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
123
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
124 usage() if (scalar @ARGV==0);
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
125
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
126 my $result = GetOptions ( "bcfile=s" => \$barcode_file,
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
127 "eol" => \$barcodes_at_eol,
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
128 "bol" => \$barcodes_at_bol,
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
129 "idxfile=s" => \$index_read_file,
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
130 "idxidstrip=i" => \$read_id_check_strip_characters,
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
131 "exact" => \$exact_match,
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
132 "prefix=s" => \$newfile_prefix,
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
133 "suffix=s" => \$newfile_suffix,
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
134 "quiet" => \$quiet,
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
135 "partial=i" => \$allow_partial_overlap,
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
136 "debug" => \$debug,
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
137 "mismatches=i" => \$allowed_mismatches,
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
138 "help" => \$help
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
139 ) ;
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
140
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
141 usage() if ($help);
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
142
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
143 die "Error: barcode file not specified (use '--bcfile [FILENAME]')\n" unless defined $barcode_file;
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
144 die "Error: prefix path/filename not specified (use '--prefix [PATH]')\n" unless defined $newfile_prefix;
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
145
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
146 if (! defined $index_read_file) {
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
147 if ($barcodes_at_bol == $barcodes_at_eol) {
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
148 die "Error: can't specify both --eol & --bol\n" if $barcodes_at_eol;
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
149 die "Error: must specify either --eol or --bol or --idxfile\n" ;
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
150 }
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
151 }
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
152 elsif ($barcodes_at_bol || $barcodes_at_eol) {
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
153 die "Error: Must specify only one of --idxfile, --eol, or --bol";
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
154 }
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
155
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
156 die "Error: invalid for value partial matches (valid values are 0 or greater)\n" if $allow_partial_overlap<0;
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
157
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
158 $allowed_mismatches = 0 if $exact_match;
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
159
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
160 die "Error: invalid value for mismatches (valid values are 0 or more)\n" if ($allowed_mismatches<0);
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
161
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
162 die "Error: partial overlap value ($allow_partial_overlap) bigger than " .
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
163 "max. allowed mismatches ($allowed_mismatches)\n" if ($allow_partial_overlap > $allowed_mismatches);
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
164
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
165
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
166 exit unless $result;
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
167 }
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
168
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
169
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
170
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
171 #
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
172 # Read the barcode file
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
173 #
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
174 sub load_barcode_file ($) {
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
175 my $filename = shift or croak "Missing barcode file name";
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
176
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
177 open BCFILE,"<$filename" or die "Error: failed to open barcode file ($filename)\n";
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
178 while (<BCFILE>) {
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
179 next if m/^#/;
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
180 chomp;
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
181 my ($ident, $barcode) = split('\t') ;
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
182
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
183 $barcode = uc($barcode);
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
184
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
185 # Sanity checks on the barcodes
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
186 die "Error: bad data at barcode file ($filename) line $.\n" unless defined $barcode;
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
187 die "Error: bad barcode value ($barcode) at barcode file ($filename) line $.\n"
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
188 unless $barcode =~ m/^[AGCT]+$/;
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
189
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
190 # Cleanup Identifiers (only allow alphanumeric, replace others with dash '-')
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
191 $ident =~ s/[^A-Za-z0-9]/-/g;
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
192 die "Error: bad identifier value ($ident) at barcode file ($filename) line $. (must be alphanumeric)\n"
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
193 unless $ident =~ m/^[A-Za-z0-9-]+$/;
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
194
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
195 die "Error: badcode($ident, $barcode) is shorter or equal to maximum number of " .
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
196 "mismatches ($allowed_mismatches). This makes no sense. Specify fewer mismatches.\n"
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
197 if length($barcode)<=$allowed_mismatches;
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
198
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
199 $barcodes_length = length($barcode) unless defined $barcodes_length;
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
200 die "Error: found barcodes in different lengths. this feature is not supported yet.\n"
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
201 unless $barcodes_length == length($barcode);
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
202
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
203 push @barcodes, [$ident, $barcode];
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
204
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
205 if ($allow_partial_overlap>0) {
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
206 foreach my $i (1 .. $allow_partial_overlap) {
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
207 substr $barcode, ($barcodes_at_bol)?0:-1, 1, '';
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
208 push @barcodes, [$ident, $barcode];
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
209 }
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
210 }
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
211 }
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
212 close BCFILE;
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
213
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
214 if ($debug) {
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
215 print STDERR "barcode\tsequence\n";
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
216 foreach my $barcoderef (@barcodes) {
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
217 my ($ident, $seq) = @{$barcoderef};
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
218 print STDERR $ident,"\t", $seq ,"\n";
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
219 }
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
220 }
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
221 }
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
222
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
223 # Create one output file for each barcode.
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
224 # (Also create a file for the dummy 'unmatched' barcode)
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
225 sub create_output_files {
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
226 my %barcodes = map { $_->[0] => 1 } @barcodes; #generate a uniq list of barcode identifiers;
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
227 $barcodes{'unmatched'} = 1 ;
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
228
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
229 foreach my $ident (keys %barcodes) {
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
230 my $new_filename = $newfile_prefix . $ident . $newfile_suffix;
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
231 $filenames{$ident} = $new_filename;
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
232 open my $file, ">$new_filename" or die "Error: failed to create output file ($new_filename)\n";
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
233 $files{$ident} = $file ;
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
234 }
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
235 }
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
236
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
237 sub match_sequences {
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
238
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
239 my %barcodes = map { $_->[0] => 1 } @barcodes; #generate a uniq list of barcode identifiers;
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
240 $barcodes{'unmatched'} = 1 ;
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
241
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
242 #reset counters
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
243 foreach my $ident ( keys %barcodes ) {
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
244 $counts{$ident} = 0;
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
245 }
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
246
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
247 create_output_files;
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
248
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
249 # Read file FASTQ file
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
250 # split accotding to barcodes
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
251 while ( read_record ) {
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
252 chomp $seq_name;
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
253 chomp $seq_bases;
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
254 if (defined $index_read_file) {
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
255 read_index_record() or die "Error: Unable to read index sequence for sequence name ($seq_name), check to make sure the file lengths match.\n";
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
256 chomp $index_seq_name;
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
257 chomp $index_seq_bases;
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
258
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
259 # Assert that the read ids match
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
260 my $seq_name_match = &strip_read_id($seq_name);
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
261 my $index_seq_name_match = &strip_read_id($index_seq_name);
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
262 if ($seq_name_match ne $index_seq_name_match) {
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
263 die "Error: Index sequence name ($index_seq_name) does not match sequence name ($seq_name)\n";
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
264 }
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
265
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
266 }
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
267
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
268 print STDERR "sequence $seq_bases: \n" if $debug;
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
269
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
270 my $best_barcode_mismatches_count = $barcodes_length;
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
271 my $best_barcode_ident = undef;
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
272
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
273 #Try all barcodes, find the one with the lowest mismatch count
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
274 foreach my $barcoderef (@barcodes) {
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
275 my ($ident, $barcode) = @{$barcoderef};
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
276
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
277 # Get DNA fragment (in the length of the barcodes)
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
278 # The barcode will be tested only against this fragment
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
279 # (no point in testing the barcode against the whole sequence)
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
280 my $sequence_fragment;
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
281 if ($barcodes_at_bol) {
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
282 $sequence_fragment = substr $seq_bases, 0, $barcodes_length;
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
283 } elsif ($barcodes_at_eol) {
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
284 $sequence_fragment = substr $seq_bases, - $barcodes_length;
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
285 } else {
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
286 $sequence_fragment = substr $index_seq_bases, 0, $barcodes_length;
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
287 }
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
288
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
289 my $mm = mismatch_count($sequence_fragment, $barcode) ;
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
290
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
291 # if this is a partial match, add the non-overlap as a mismatch
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
292 # (partial barcodes are shorter than the length of the original barcodes)
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
293 $mm += ($barcodes_length - length($barcode));
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
294
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
295 if ( $mm < $best_barcode_mismatches_count ) {
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
296 $best_barcode_mismatches_count = $mm ;
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
297 $best_barcode_ident = $ident ;
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
298 }
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
299 }
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
300
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
301 $best_barcode_ident = 'unmatched'
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
302 if ( (!defined $best_barcode_ident) || $best_barcode_mismatches_count>$allowed_mismatches) ;
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
303
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
304 print STDERR "sequence $seq_bases matched barcode: $best_barcode_ident\n" if $debug;
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
305
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
306 $counts{$best_barcode_ident}++;
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
307
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
308 #get the file associated with the matched barcode.
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
309 #(note: there's also a file associated with 'unmatched' barcode)
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
310 my $file = $files{$best_barcode_ident};
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
311
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
312 write_record($file);
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
313 }
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
314 }
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
315
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
316 # Strip end of readids when matching to avoid mismatch between read 1, 2, 3, etc.
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
317 sub strip_read_id {
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
318 my $read_id = shift;
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
319 my $stripped_read_id = $read_id;
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
320 if ($read_id_check_strip_characters) {
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
321 if ($read_id =~ /@([^:]+):([0-9]+):([^:]+):([0-9]+):([0-9]+):([0-9]+):([0-9]+) ([0-9]+):([YN]):([0-9]+):([ACGT]+){0,1}/) { # CASAVA 1.8+
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
322 my @parts = split(/ /,$read_id);
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
323 $stripped_read_id = $parts[0];
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
324 } else { # CASAVA 1.7 and earlier
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
325 $stripped_read_id = substr($read_id, 0, length($read_id)-$read_id_check_strip_characters);
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
326 }
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
327 }
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
328 return $stripped_read_id;
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
329 }
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
330
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
331
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
332 #Quickly calculate hamming distance between two strings
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
333 #
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
334 #NOTE: Strings must be same length.
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
335 # returns number of different characters.
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
336 #see http://www.perlmonks.org/?node_id=500235
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
337 sub mismatch_count($$) { length( $_[ 0 ] ) - ( ( $_[ 0 ] ^ $_[ 1 ] ) =~ tr[\0][\0] ) }
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
338
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
339
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
340
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
341 sub print_results
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
342 {
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
343 print "# Barcode\tCount\n";
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
344 my $total = 0 ;
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
345 foreach my $ident (sort keys %counts) {
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
346 print $ident, "\t", $counts{$ident},"\n";
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
347 $total += $counts{$ident};
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
348 }
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
349 print "total\t",$total,"\n";
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
350 }
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
351
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
352
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
353 sub read_record
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
354 {
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
355 $seq_name = $input_file_io->getline();
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
356
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
357 return undef unless defined $seq_name; # End of file?
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
358
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
359 $seq_bases = $input_file_io->getline();
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
360 die "Error: bad input file, expecting line with sequences\n" unless defined $seq_bases;
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
361
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
362 # If using FASTQ format, read two more lines
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
363 if ($fastq_format) {
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
364 $seq_name2 = $input_file_io->getline();
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
365 die "Error: bad input file, expecting line with sequence name2\n" unless defined $seq_name2;
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
366
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
367 $seq_qualities = $input_file_io->getline();
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
368 die "Error: bad input file, expecting line with quality scores\n" unless defined $seq_qualities;
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
369 }
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
370 return 1;
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
371 }
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
372
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
373 sub write_record($)
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
374 {
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
375 my $file = shift;
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
376
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
377 croak "Bad file handle" unless defined $file;
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
378
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
379 print $file $seq_name,"\n";
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
380 print $file $seq_bases,"\n";
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
381
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
382 #if using FASTQ format, write two more lines
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
383 if ($fastq_format) {
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
384 print $file $seq_name2;
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
385 print $file $seq_qualities;
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
386 }
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
387 }
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
388
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
389 sub open_and_detect_input_format
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
390 {
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
391 $input_file_io = new IO::Handle;
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
392 die "Failed to open STDIN " unless $input_file_io->fdopen(fileno(STDIN),"r");
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
393
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
394 # Get the first characeter, and push it back
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
395 my $first_char = $input_file_io->getc();
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
396 $input_file_io->ungetc(ord $first_char);
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
397
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
398 if ($first_char eq '>') {
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
399 # FASTA format
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
400 $fastq_format = 0 ;
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
401 print STDERR "Detected FASTA format\n" if $debug;
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
402 } elsif ($first_char eq '@') {
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
403 # FASTQ format
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
404 $fastq_format = 1;
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
405 print STDERR "Detected FASTQ format\n" if $debug;
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
406 } else {
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
407 die "Error: unknown file format. First character = '$first_char' (expecting > or \@)\n";
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
408 }
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
409 }
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
410
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
411
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
412 sub open_index_and_detect_input_format($) {
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
413 my $filename = shift or croak "Missing index read file name";
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
414
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
415 open IDXFILE,"<$filename" or die "Error: failed to open index read file ($filename)\n";
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
416
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
417 # Get the first line, and reset file pointer
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
418 my $first_line = <IDXFILE>;
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
419 my $first_char = substr($first_line, 0, 1);
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
420 seek(IDXFILE, 0, 0);
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
421
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
422 if ($first_char eq '>') {
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
423 # FASTA format
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
424 $index_fastq_format = 0 ;
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
425 print STDERR "Detected FASTA format for index file\n" if $debug;
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
426 } elsif ($first_char eq '@') {
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
427 # FASTQ format
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
428 $index_fastq_format = 1;
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
429 print STDERR "Detected FASTQ format for index file\n" if $debug;
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
430 } else {
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
431 die "Error: unknown index file format. First character = '$first_char' (expecting > or \@)\n";
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
432 }
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
433 }
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
434
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
435 sub read_index_record
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
436 {
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
437 $index_seq_name = <IDXFILE>;
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
438
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
439 return undef unless defined $index_seq_name; # End of file?
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
440
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
441 $index_seq_bases = <IDXFILE>;
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
442 die "Error: bad input file, expecting line with sequences\n" unless defined $index_seq_bases;
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
443
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
444 # If using FASTQ format, read two more lines
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
445 if ($index_fastq_format) {
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
446 $index_seq_name2 = <IDXFILE>;
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
447 die "Error: bad input file, expecting line with sequence name2\n" unless defined $index_seq_name2;
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
448
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
449 $index_seq_qualities = <IDXFILE>;
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
450 die "Error: bad input file, expecting line with quality scores\n" unless defined $index_seq_qualities;
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
451 }
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
452 return 1;
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
453 }
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
454
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
455 sub usage()
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
456 {
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
457 print<<EOF;
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
458 Barcode Splitter, by Assaf Gordon (gordon\@cshl.edu), 11sep2008
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
459
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
460 This program reads FASTA/FASTQ file and splits it into several smaller files,
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
461 Based on barcode matching.
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
462 FASTA/FASTQ data is read from STDIN (format is auto-detected.)
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
463 Output files will be writen to disk.
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
464 Summary will be printed to STDOUT.
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
465
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
466 usage: $0 --bcfile FILE --prefix PREFIX [--suffix SUFFIX] [--bol|--eol|--idxfile]
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
467 [--mismatches N] [--exact] [--partial N] [--idxidstrip N]
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
468 [--help] [--quiet] [--debug]
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
469
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
470 Arguments:
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
471
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
472 --bcfile FILE - Barcodes file name. (see explanation below.)
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
473 --prefix PREFIX - File prefix. will be added to the output files. Can be used
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
474 to specify output directories.
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
475 --suffix SUFFIX - File suffix (optional). Can be used to specify file
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
476 extensions.
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
477 --bol - Try to match barcodes at the BEGINNING of sequences.
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
478 (What biologists would call the 5' end, and programmers
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
479 would call index 0.)
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
480 --eol - Try to match barcodes at the END of sequences.
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
481 (What biologists would call the 3' end, and programmers
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
482 would call the end of the string.)
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
483 --idxfile FILE - Read barcodes from separate index file (fasta or fastq)
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
484 NOTE: one of --bol, --eol, --idxfile must be specified,
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
485 but not more than one.
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
486 --idxidstrip N - When using index file, strip this number of characters
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
487 from the end of the sequence id before matching.
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
488 Automatically detects CASAVA 1.8 format and strips at a
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
489 space in the id, use 0 to disable this.
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
490 (Default is 1).
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
491 --mismatches N - Max. number of mismatches allowed. default is 1.
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
492 --exact - Same as '--mismatches 0'. If both --exact and --mismatches
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
493 are specified, '--exact' takes precedence.
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
494 --partial N - Allow partial overlap of barcodes. (see explanation below.)
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
495 (Default is not partial matching)
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
496 --quiet - Don't print counts and summary at the end of the run.
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
497 (Default is to print.)
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
498 --debug - Print lots of useless debug information to STDERR.
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
499 --help - This helpful help screen.
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
500
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
501 Example (Assuming 's_2_100.txt' is a FASTQ file, 'mybarcodes.txt' is
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
502 the barcodes file):
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
503
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
504 \$ cat s_2_100.txt | $0 --bcfile mybarcodes.txt --bol --mismatches 2 \\
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
505 --prefix /tmp/bla_ --suffix ".txt"
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
506
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
507 Barcode file format
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
508 -------------------
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
509 Barcode files are simple text files. Each line should contain an identifier
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
510 (descriptive name for the barcode), and the barcode itself (A/C/G/T),
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
511 separated by a TAB character. Example:
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
512
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
513 #This line is a comment (starts with a 'number' sign)
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
514 BC1 GATCT
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
515 BC2 ATCGT
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
516 BC3 GTGAT
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
517 BC4 TGTCT
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
518
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
519 For each barcode, a new FASTQ file will be created (with the barcode's
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
520 identifier as part of the file name). Sequences matching the barcode
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
521 will be stored in the appropriate file.
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
522
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
523 Running the above example (assuming "mybarcodes.txt" contains the above
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
524 barcodes), will create the following files:
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
525 /tmp/bla_BC1.txt
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
526 /tmp/bla_BC2.txt
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
527 /tmp/bla_BC3.txt
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
528 /tmp/bla_BC4.txt
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
529 /tmp/bla_unmatched.txt
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
530 The 'unmatched' file will contain all sequences that didn't match any barcode.
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
531
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
532 Barcode matching
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
533 ----------------
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
534
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
535 ** Without partial matching:
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
536
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
537 Count mismatches between the FASTA/Q sequences and the barcodes.
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
538 The barcode which matched with the lowest mismatches count (providing the
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
539 count is small or equal to '--mismatches N') 'gets' the sequences.
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
540
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
541 Example (using the above barcodes):
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
542 Input Sequence:
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
543 GATTTACTATGTAAAGATAGAAGGAATAAGGTGAAG
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
544
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
545 Matching with '--bol --mismatches 1':
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
546 GATTTACTATGTAAAGATAGAAGGAATAAGGTGAAG
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
547 GATCT (1 mismatch, BC1)
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
548 ATCGT (4 mismatches, BC2)
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
549 GTGAT (3 mismatches, BC3)
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
550 TGTCT (3 mismatches, BC4)
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
551
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
552 This sequence will be classified as 'BC1' (it has the lowest mismatch count).
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
553 If '--exact' or '--mismatches 0' were specified, this sequence would be
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
554 classified as 'unmatched' (because, although BC1 had the lowest mismatch count,
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
555 it is above the maximum allowed mismatches).
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
556
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
557 Matching with '--eol' (end of line) does the same, but from the other side
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
558 of the sequence.
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
559
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
560 ** With partial matching (very similar to indels):
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
561
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
562 Same as above, with the following addition: barcodes are also checked for
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
563 partial overlap (number of allowed non-overlapping bases is '--partial N').
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
564
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
565 Example:
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
566 Input sequence is ATTTACTATGTAAAGATAGAAGGAATAAGGTGAAG
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
567 (Same as above, but note the missing 'G' at the beginning.)
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
568
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
569 Matching (without partial overlapping) against BC1 yields 4 mismatches:
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
570 ATTTACTATGTAAAGATAGAAGGAATAAGGTGAAG
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
571 GATCT (4 mismatches)
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
572
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
573 Partial overlapping would also try the following match:
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
574 -ATTTACTATGTAAAGATAGAAGGAATAAGGTGAAG
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
575 GATCT (1 mismatch)
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
576
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
577 Note: scoring counts a missing base as a mismatch, so the final
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
578 mismatch count is 2 (1 'real' mismatch, 1 'missing base' mismatch).
4
0fb7e9130a70 planemo upload for repository https://github.com/lparsons/galaxy_tools/tree/master/tools/fastx_barcode_splitter_enhanced commit 460463a5406419fe8e113467bdb8bd093d21e7c5
lparsons
parents: 3
diff changeset
579 If running with '--mismatches 2' (meaning allowing upto 2 mismatches) - this
0
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
580 seqeunce will be classified as BC1.
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
581
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
582 EOF
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
583
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
584 exit 1;
84bbf4fd24c3 Initial toolshed version with support for separate index reads and automatic loading of results into Galaxy history.
lparsons
parents:
diff changeset
585 }