annotate tools/protein_analysis/tmhmm2.py @ 6:a290c6d4e658

Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
author peterjc
date Tue, 07 Jun 2011 18:07:09 -0400
parents 6901298ac16c
children 9b45a8743100
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
0
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
1 #!/usr/bin/env python
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
2 """Wrapper for TMHMM v2.0 for use in Galaxy.
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
3
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
4 This script takes exactly two command line arguments - an input protein FASTA
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
5 filename and an output tabular filename. It then calls the standalone TMHMM
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
6 v2.0 program (not the webservice) requesting the short output (one line per
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
7 protein).
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
8
2
6901298ac16c Migrated tool version 0.0.5 from old tool shed archive to new tool shed repository
peterjc
parents: 1
diff changeset
9 The first major feature is cleaning up the tabular output. The short form raw
6901298ac16c Migrated tool version 0.0.5 from old tool shed archive to new tool shed repository
peterjc
parents: 1
diff changeset
10 output from TMHMM v2.0 looks like this (six columns tab separated):
0
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
11
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
12 gi|2781234|pdb|1JLY|B len=304 ExpAA=0.01 First60=0.00 PredHel=0 Topology=o
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
13 gi|4959044|gb|AAD34209.1|AF069992_1 len=600 ExpAA=0.00 First60=0.00 PredHel=0 Topology=o
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
14 gi|671626|emb|CAA85685.1| len=473 ExpAA=0.19 First60=0.00 PredHel=0 Topology=o
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
15 gi|3298468|dbj|BAA31520.1| len=107 ExpAA=59.37 First60=31.17 PredHel=3 Topology=o23-45i52-74o89-106i
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
16
2
6901298ac16c Migrated tool version 0.0.5 from old tool shed archive to new tool shed repository
peterjc
parents: 1
diff changeset
17 If there are any additional 'comment' lines starting with the hash (#)
6901298ac16c Migrated tool version 0.0.5 from old tool shed archive to new tool shed repository
peterjc
parents: 1
diff changeset
18 character these are ignored by this script.
6901298ac16c Migrated tool version 0.0.5 from old tool shed archive to new tool shed repository
peterjc
parents: 1
diff changeset
19
0
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
20 In order to make it easier to use in Galaxy, this wrapper script simplifies
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
21 this to remove the redundant tags, and instead adds a comment line at the
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
22 top with the column names:
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
23
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
24 #ID len ExpAA First60 PredHel Topology
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
25 gi|2781234|pdb|1JLY|B 304 0.01 60 0.00 0 o
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
26 gi|4959044|gb|AAD34209.1|AF069992_1 600 0.00 0 0.00 0 o
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
27 gi|671626|emb|CAA85685.1| 473 0.19 0.00 0 o
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
28 gi|3298468|dbj|BAA31520.1| 107 59.37 31.17 3 o23-45i52-74o89-106i
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
29
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
30 The second major potential feature is taking advantage of multiple cores
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
31 (since TMHMM v2.0 itself is single threaded) by dividing the input FASTA file
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
32 into chunks and running multiple copies of TMHMM in parallel. I would normally
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
33 use Python's multiprocessing library in this situation but it requires at
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
34 least Python 2.6 and at the time of writing Galaxy still supports Python 2.4.
1
3ff1dcbb9440 Migrated tool version 0.0.3 from old tool shed archive to new tool shed repository
peterjc
parents: 0
diff changeset
35
3ff1dcbb9440 Migrated tool version 0.0.3 from old tool shed archive to new tool shed repository
peterjc
parents: 0
diff changeset
36 Also tmhmm2 can fail without returning an error code, for example if run on a
3ff1dcbb9440 Migrated tool version 0.0.3 from old tool shed archive to new tool shed repository
peterjc
parents: 0
diff changeset
37 64 bit machine with only the 32 bit binaries installed. This script will spot
3ff1dcbb9440 Migrated tool version 0.0.3 from old tool shed archive to new tool shed repository
peterjc
parents: 0
diff changeset
38 when there is no output from tmhmm2, and raise an error.
0
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
39 """
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
40 import sys
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
41 import os
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
42 from seq_analysis_utils import stop_err, split_fasta, run_jobs
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
43
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
44 FASTA_CHUNK = 500
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
45
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
46 if len(sys.argv) != 4:
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
47 stop_err("Require three arguments, number of threads (int), input protein FASTA file & output tabular file")
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
48 try:
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
49 num_threads = int(sys.argv[1])
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
50 except:
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
51 num_threads = 0
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
52 if num_threads < 1:
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
53 stop_err("Threads argument %s is not a positive integer" % sys.argv[1])
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
54 fasta_file = sys.argv[2]
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
55 tabular_file = sys.argv[3]
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
56
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
57 def clean_tabular(raw_handle, out_handle):
1
3ff1dcbb9440 Migrated tool version 0.0.3 from old tool shed archive to new tool shed repository
peterjc
parents: 0
diff changeset
58 """Clean up tabular TMHMM output, returns output line count."""
3ff1dcbb9440 Migrated tool version 0.0.3 from old tool shed archive to new tool shed repository
peterjc
parents: 0
diff changeset
59 count = 0
0
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
60 for line in raw_handle:
2
6901298ac16c Migrated tool version 0.0.5 from old tool shed archive to new tool shed repository
peterjc
parents: 1
diff changeset
61 if not line.strip() or line.startswith("#"):
6901298ac16c Migrated tool version 0.0.5 from old tool shed archive to new tool shed repository
peterjc
parents: 1
diff changeset
62 #Ignore any blank lines or comment lines
0
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
63 continue
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
64 parts = line.rstrip("\r\n").split("\t")
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
65 try:
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
66 identifier, length, expAA, first60, predhel, topology = parts
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
67 except:
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
68 assert len(parts)!=6
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
69 stop_err("Bad line: %r" % line)
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
70 assert length.startswith("len="), line
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
71 length = length[4:]
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
72 assert expAA.startswith("ExpAA="), line
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
73 expAA = expAA[6:]
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
74 assert first60.startswith("First60="), line
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
75 first60 = first60[8:]
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
76 assert predhel.startswith("PredHel="), line
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
77 predhel = predhel[8:]
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
78 assert topology.startswith("Topology="), line
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
79 topology = topology[9:]
1
3ff1dcbb9440 Migrated tool version 0.0.3 from old tool shed archive to new tool shed repository
peterjc
parents: 0
diff changeset
80 out_handle.write("%s\t%s\t%s\t%s\t%s\t%s\n" \
0
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
81 % (identifier, length, expAA, first60, predhel, topology))
1
3ff1dcbb9440 Migrated tool version 0.0.3 from old tool shed archive to new tool shed repository
peterjc
parents: 0
diff changeset
82 count += 1
3ff1dcbb9440 Migrated tool version 0.0.3 from old tool shed archive to new tool shed repository
peterjc
parents: 0
diff changeset
83 return count
0
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
84
1
3ff1dcbb9440 Migrated tool version 0.0.3 from old tool shed archive to new tool shed repository
peterjc
parents: 0
diff changeset
85 #Note that if the input FASTA file contains no sequences,
3ff1dcbb9440 Migrated tool version 0.0.3 from old tool shed archive to new tool shed repository
peterjc
parents: 0
diff changeset
86 #split_fasta returns an empty list (i.e. zero temp files).
0
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
87 fasta_files = split_fasta(fasta_file, tabular_file, FASTA_CHUNK)
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
88 temp_files = [f+".out" for f in fasta_files]
2
6901298ac16c Migrated tool version 0.0.5 from old tool shed archive to new tool shed repository
peterjc
parents: 1
diff changeset
89 jobs = ["tmhmm -short %s > %s" % (fasta, temp)
0
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
90 for fasta, temp in zip(fasta_files, temp_files)]
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
91
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
92 def clean_up(file_list):
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
93 for f in file_list:
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
94 if os.path.isfile(f):
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
95 os.remove(f)
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
96
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
97 if len(jobs) > 1 and num_threads > 1:
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
98 #A small "info" message for Galaxy to show the user.
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
99 print "Using %i threads for %i tasks" % (min(num_threads, len(jobs)), len(jobs))
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
100 results = run_jobs(jobs, num_threads)
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
101 for fasta, temp, cmd in zip(fasta_files, temp_files, jobs):
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
102 error_level = results[cmd]
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
103 if error_level:
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
104 try:
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
105 output = open(temp).readline()
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
106 except IOError:
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
107 output = ""
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
108 clean_up(fasta_files)
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
109 clean_up(temp_files)
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
110 stop_err("One or more tasks failed, e.g. %i from %r gave:\n%s" % (error_level, cmd, output),
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
111 error_level)
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
112 del results
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
113 del jobs
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
114
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
115 out_handle = open(tabular_file, "w")
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
116 out_handle.write("#ID\tlen\tExpAA\tFirst60\tPredHel\tTopology\n")
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
117 for temp in temp_files:
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
118 data_handle = open(temp)
1
3ff1dcbb9440 Migrated tool version 0.0.3 from old tool shed archive to new tool shed repository
peterjc
parents: 0
diff changeset
119 count = clean_tabular(data_handle, out_handle)
0
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
120 data_handle.close()
1
3ff1dcbb9440 Migrated tool version 0.0.3 from old tool shed archive to new tool shed repository
peterjc
parents: 0
diff changeset
121 if not count:
3ff1dcbb9440 Migrated tool version 0.0.3 from old tool shed archive to new tool shed repository
peterjc
parents: 0
diff changeset
122 clean_up(fasta_files)
3ff1dcbb9440 Migrated tool version 0.0.3 from old tool shed archive to new tool shed repository
peterjc
parents: 0
diff changeset
123 clean_up(temp_files)
3ff1dcbb9440 Migrated tool version 0.0.3 from old tool shed archive to new tool shed repository
peterjc
parents: 0
diff changeset
124 stop_err("No output from tmhmm2")
0
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
125 out_handle.close()
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
126
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
127 clean_up(fasta_files)
bca9bc7fdaef Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
128 clean_up(temp_files)