annotate micomplete.pl @ 0:23e768cddc7d draft

Initial commit of micomplete
author lionelguy
date Thu, 17 Oct 2013 08:39:52 -0400
parents
children
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
0
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
1 #!/usr/bin/perl -w
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
2
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
3 =head1 SYNOPSIS
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
4
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
5 micomplete.pl - Given a series of HMM profiles and a query proteome, and optionally, weights for the profiles, returns the completeness of a genome.
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
6
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
7 =head1 USAGE
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
8
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
9 micomplete.pl -h|--hmm_profiles -p|--proteome [-w|--weights] [-e|--evalue] [-s|--save-result-tab] [-r|--save-hmm-results] [-t|--threads]
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
10 micomplete.pl --help
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
11
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
12 =head1 INPUT
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
13
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
14 =head2 -h|--hmm
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
15
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
16 A single file containing the HMM profiles against which the proteome will be run.
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
17
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
18 =head2 -p|--proteome
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
19
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
20 A fasta file containing all protein sequences in a genome.
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
21
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
22 =head2 [-c|--cut-off]
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
23
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
24 The evalue cut-off to accept the presence of a HMM. 1e-10 by default.
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
25
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
26 =head2 [-w|--weights]
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
27
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
28 Optionally, the weights to apply to the HMMs, for the completeness report. A tab file containing (i) a column with names corresponding to the names in the hmm profiles and (ii) a column with with the weighths.
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
29
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
30 =head2 [-e|--evalue]
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
31
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
32 Shows evalue to the best hit, instead of presence/absence.
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
33
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
34 =head2 [-s|--save-result-tab]
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
35
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
36 Save the result per marker
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
37
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
38 =head2 [-r|--save-hmm-tab]
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
39
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
40 Save hmmsearch tabular result for this file.
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
41
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
42 =head2 [-a|--save-hmm-alignments]
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
43
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
44 Save hmmsearch tabular result for this file.
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
45
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
46 =head2 [-t|--threads]
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
47
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
48 Number of threads to use in the hmmsearch. 15 by default.
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
49
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
50 =head2 --help
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
51
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
52 Displays a help message.
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
53
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
54 =head1 OUTPUT
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
55
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
56 A completeness report expressed in fraction, and a list of all hmms identified as positive
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
57
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
58 =head1 AUTHOR
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
59
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
60 Lionel Guy (lionel.guy@icm.uu.se)
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
61
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
62 =head1 DATE
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
63
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
64 Mon Oct 14 14:09:46 CEST 2013
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
65
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
66 =cut
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
67
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
68 # libraries
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
69 use strict;
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
70 use Pod::Usage;
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
71 use Getopt::Long;
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
72 # For safe use of tempfiles
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
73 use File::Temp qw/ tempfile tempdir /;
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
74 use File::Copy qw/ cp /;
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
75
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
76 # Options
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
77 my ($hmm_file, $fasta_in, $weights_file, $display_eval);
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
78 my $cutoff = 1e-10;
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
79 my ($save_tab_file, $save_aln_file, $save_res_tab);
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
80 my $nthreads = 15;
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
81 my $help;
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
82 GetOptions(
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
83 'h|hmm=s' => \$hmm_file,
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
84 'p|proteome=s' => \$fasta_in,
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
85 'c|cut-off=s' => \$cutoff,
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
86 'w|weights=s' => \$weights_file,
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
87 'e|evalue' => \$display_eval,
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
88 'r|save-hmm-tab=s' => \$save_tab_file,
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
89 'a|save-hmm-alignments=s' => \$save_aln_file,
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
90 's|save-result-tab=s' => \$save_res_tab,
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
91 't|threads=s' => \$nthreads,
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
92 'help' => \$help,
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
93 );
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
94
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
95 # Tempfile if not saving results
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
96 my $template = 'tempXXXXXXXX';
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
97 my ($fh_tab, $tab_file) = tempfile($template, DIR => '.', UNLINK => 1);
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
98 my ($fh_aln, $aln_file) = tempfile($template, DIR => '.', UNLINK => 1);
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
99
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
100 # Check options
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
101 pod2usage(-verbose => 1) if $help;
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
102 pod2usage(-message => "No hmm profile file (-h|--hmm) defined\n",
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
103 -exitstatus => 2 ) unless ($hmm_file);
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
104 pod2usage(-message => "No proteome file (-p|--proteome) defined\n",
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
105 -exitstatus => 2 ) unless ($fasta_in);
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
106
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
107 # Parse eventual weights
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
108 my %hmms;
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
109 if ($weights_file){
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
110 open WEIGHTS, '<', $weights_file or die "Cannot open $weights_file\n";
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
111 while (<WEIGHTS>){
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
112 next if /^#/;
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
113 my ($name, $weight) = split;
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
114 $hmms{$name} = $weight;
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
115 }
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
116 }
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
117
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
118 # Parse HMM, lists names
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
119 open HMMS, '<', $hmm_file or die "Cannot open $hmm_file\n";
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
120 while (<HMMS>){
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
121 next unless /^NAME/;
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
122 my ($bla, $name) = split;
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
123 if ($weights_file) {
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
124 die "HMM $name doesn't have weight defined" unless $hmms{$name};
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
125 } else {
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
126 $hmms{$name}++;
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
127 }
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
128 }
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
129
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
130 # Get sum of weights
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
131 my $total;
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
132 foreach (values %hmms) {
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
133 $total += $_;
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
134 }
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
135 #print STDERR "Sum of weights: $total\n";
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
136
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
137 # Run hmmsearch
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
138 my $cmd = "hmmsearch --tblout $tab_file --cpu $nthreads ";
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
139 $cmd .= "--noali " unless $save_aln_file;
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
140 $cmd .= "-E $cutoff $hmm_file $fasta_in > $aln_file";
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
141 system($cmd) == 0 or die ("Command $cmd failed: $?");
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
142
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
143 # Read HMMER output. Assumes the first hit is always the best
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
144 open HMMER, '<', $tab_file or die $!;
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
145 my %seen;
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
146 my $sum;
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
147 while (<HMMER>){
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
148 next if /^#/;
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
149 my ($gid, $dash, $hname, $hid, $evalue) = split;
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
150 #my ($bla, $gi, $ref, $pid) = split(/\|/, $gid);
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
151 die ("No gid at $_\n") unless $gid;
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
152 die ("HMM not seen in the hmm file") unless $hmms{$hname};
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
153 # Skip if not first hit, or if evalue under cut-off
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
154 unless ($seen{$hname} || $evalue > $cutoff){
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
155 $sum += $hmms{$hname};
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
156 $seen{$hname} = $evalue;
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
157 }
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
158 }
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
159
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
160 # Save results if required
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
161 cp($tab_file, $save_tab_file) if ($save_tab_file);
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
162 cp($aln_file, $save_aln_file) if ($save_aln_file);
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
163
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
164 # Print results
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
165 printf "Completeness: %.4f", $sum/$total;
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
166 print "\nN markers found: ", scalar(keys %seen), " out of ",
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
167 scalar(keys %hmms), "\n";
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
168 if ($save_res_tab){
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
169 open TAB, '>', $save_res_tab or die "Could not open $save_res_tab: $?\n";
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
170 print TAB "#marker\t";
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
171 if ($display_eval) {
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
172 print TAB "evalue\n";
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
173 } else {
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
174 print TAB "present\n";
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
175 }
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
176 foreach my $hmm (sort keys %hmms){
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
177 print TAB "$hmm\t";
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
178 if ($seen{$hmm}){
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
179 if ($display_eval){
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
180 print TAB "$seen{$hmm}\n";
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
181 } else {
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
182 print TAB "1\n";
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
183 }
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
184 } else {
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
185 if ($display_eval){
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
186 print TAB "NA\n";
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
187 } else {
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
188 print TAB "0\n";
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
189 }
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
190 }
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
191 }
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
192 }
23e768cddc7d Initial commit of micomplete
lionelguy
parents:
diff changeset
193