Mercurial > repos > peterjc > tmhmm_and_signalp
annotate tools/protein_analysis/tmhmm2.py @ 1:3ff1dcbb9440
Migrated tool version 0.0.3 from old tool shed archive to new tool shed repository
author | peterjc |
---|---|
date | Tue, 07 Jun 2011 18:04:05 -0400 |
parents | bca9bc7fdaef |
children | 6901298ac16c |
rev | line source |
---|---|
0
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
1 #!/usr/bin/env python |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
2 """Wrapper for TMHMM v2.0 for use in Galaxy. |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
3 |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
4 This script takes exactly two command line arguments - an input protein FASTA |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
5 filename and an output tabular filename. It then calls the standalone TMHMM |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
6 v2.0 program (not the webservice) requesting the short output (one line per |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
7 protein). |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
8 |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
9 First major feature is cleaning up the tabular output. The raw output from |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
10 TMHMM v2.0 looks like this (six columns tab separated): |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
11 |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
12 gi|2781234|pdb|1JLY|B len=304 ExpAA=0.01 First60=0.00 PredHel=0 Topology=o |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
13 gi|4959044|gb|AAD34209.1|AF069992_1 len=600 ExpAA=0.00 First60=0.00 PredHel=0 Topology=o |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
14 gi|671626|emb|CAA85685.1| len=473 ExpAA=0.19 First60=0.00 PredHel=0 Topology=o |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
15 gi|3298468|dbj|BAA31520.1| len=107 ExpAA=59.37 First60=31.17 PredHel=3 Topology=o23-45i52-74o89-106i |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
16 |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
17 In order to make it easier to use in Galaxy, this wrapper script simplifies |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
18 this to remove the redundant tags, and instead adds a comment line at the |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
19 top with the column names: |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
20 |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
21 #ID len ExpAA First60 PredHel Topology |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
22 gi|2781234|pdb|1JLY|B 304 0.01 60 0.00 0 o |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
23 gi|4959044|gb|AAD34209.1|AF069992_1 600 0.00 0 0.00 0 o |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
24 gi|671626|emb|CAA85685.1| 473 0.19 0.00 0 o |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
25 gi|3298468|dbj|BAA31520.1| 107 59.37 31.17 3 o23-45i52-74o89-106i |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
26 |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
27 The second major potential feature is taking advantage of multiple cores |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
28 (since TMHMM v2.0 itself is single threaded) by dividing the input FASTA file |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
29 into chunks and running multiple copies of TMHMM in parallel. I would normally |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
30 use Python's multiprocessing library in this situation but it requires at |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
31 least Python 2.6 and at the time of writing Galaxy still supports Python 2.4. |
1
3ff1dcbb9440
Migrated tool version 0.0.3 from old tool shed archive to new tool shed repository
peterjc
parents:
0
diff
changeset
|
32 |
3ff1dcbb9440
Migrated tool version 0.0.3 from old tool shed archive to new tool shed repository
peterjc
parents:
0
diff
changeset
|
33 Also tmhmm2 can fail without returning an error code, for example if run on a |
3ff1dcbb9440
Migrated tool version 0.0.3 from old tool shed archive to new tool shed repository
peterjc
parents:
0
diff
changeset
|
34 64 bit machine with only the 32 bit binaries installed. This script will spot |
3ff1dcbb9440
Migrated tool version 0.0.3 from old tool shed archive to new tool shed repository
peterjc
parents:
0
diff
changeset
|
35 when there is no output from tmhmm2, and raise an error. |
0
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
36 """ |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
37 import sys |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
38 import os |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
39 from seq_analysis_utils import stop_err, split_fasta, run_jobs |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
40 |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
41 FASTA_CHUNK = 500 |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
42 |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
43 if len(sys.argv) != 4: |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
44 stop_err("Require three arguments, number of threads (int), input protein FASTA file & output tabular file") |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
45 try: |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
46 num_threads = int(sys.argv[1]) |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
47 except: |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
48 num_threads = 0 |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
49 if num_threads < 1: |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
50 stop_err("Threads argument %s is not a positive integer" % sys.argv[1]) |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
51 fasta_file = sys.argv[2] |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
52 tabular_file = sys.argv[3] |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
53 |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
54 def clean_tabular(raw_handle, out_handle): |
1
3ff1dcbb9440
Migrated tool version 0.0.3 from old tool shed archive to new tool shed repository
peterjc
parents:
0
diff
changeset
|
55 """Clean up tabular TMHMM output, returns output line count.""" |
3ff1dcbb9440
Migrated tool version 0.0.3 from old tool shed archive to new tool shed repository
peterjc
parents:
0
diff
changeset
|
56 count = 0 |
0
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
57 for line in raw_handle: |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
58 if not line: |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
59 continue |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
60 parts = line.rstrip("\r\n").split("\t") |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
61 try: |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
62 identifier, length, expAA, first60, predhel, topology = parts |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
63 except: |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
64 assert len(parts)!=6 |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
65 stop_err("Bad line: %r" % line) |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
66 assert length.startswith("len="), line |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
67 length = length[4:] |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
68 assert expAA.startswith("ExpAA="), line |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
69 expAA = expAA[6:] |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
70 assert first60.startswith("First60="), line |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
71 first60 = first60[8:] |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
72 assert predhel.startswith("PredHel="), line |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
73 predhel = predhel[8:] |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
74 assert topology.startswith("Topology="), line |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
75 topology = topology[9:] |
1
3ff1dcbb9440
Migrated tool version 0.0.3 from old tool shed archive to new tool shed repository
peterjc
parents:
0
diff
changeset
|
76 out_handle.write("%s\t%s\t%s\t%s\t%s\t%s\n" \ |
0
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
77 % (identifier, length, expAA, first60, predhel, topology)) |
1
3ff1dcbb9440
Migrated tool version 0.0.3 from old tool shed archive to new tool shed repository
peterjc
parents:
0
diff
changeset
|
78 count += 1 |
3ff1dcbb9440
Migrated tool version 0.0.3 from old tool shed archive to new tool shed repository
peterjc
parents:
0
diff
changeset
|
79 return count |
0
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
80 |
1
3ff1dcbb9440
Migrated tool version 0.0.3 from old tool shed archive to new tool shed repository
peterjc
parents:
0
diff
changeset
|
81 #Note that if the input FASTA file contains no sequences, |
3ff1dcbb9440
Migrated tool version 0.0.3 from old tool shed archive to new tool shed repository
peterjc
parents:
0
diff
changeset
|
82 #split_fasta returns an empty list (i.e. zero temp files). |
0
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
83 fasta_files = split_fasta(fasta_file, tabular_file, FASTA_CHUNK) |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
84 temp_files = [f+".out" for f in fasta_files] |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
85 jobs = ["tmhmm %s > %s" % (fasta, temp) |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
86 for fasta, temp in zip(fasta_files, temp_files)] |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
87 |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
88 def clean_up(file_list): |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
89 for f in file_list: |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
90 if os.path.isfile(f): |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
91 os.remove(f) |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
92 |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
93 if len(jobs) > 1 and num_threads > 1: |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
94 #A small "info" message for Galaxy to show the user. |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
95 print "Using %i threads for %i tasks" % (min(num_threads, len(jobs)), len(jobs)) |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
96 results = run_jobs(jobs, num_threads) |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
97 for fasta, temp, cmd in zip(fasta_files, temp_files, jobs): |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
98 error_level = results[cmd] |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
99 if error_level: |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
100 try: |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
101 output = open(temp).readline() |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
102 except IOError: |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
103 output = "" |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
104 clean_up(fasta_files) |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
105 clean_up(temp_files) |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
106 stop_err("One or more tasks failed, e.g. %i from %r gave:\n%s" % (error_level, cmd, output), |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
107 error_level) |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
108 del results |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
109 del jobs |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
110 |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
111 out_handle = open(tabular_file, "w") |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
112 out_handle.write("#ID\tlen\tExpAA\tFirst60\tPredHel\tTopology\n") |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
113 for temp in temp_files: |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
114 data_handle = open(temp) |
1
3ff1dcbb9440
Migrated tool version 0.0.3 from old tool shed archive to new tool shed repository
peterjc
parents:
0
diff
changeset
|
115 count = clean_tabular(data_handle, out_handle) |
0
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
116 data_handle.close() |
1
3ff1dcbb9440
Migrated tool version 0.0.3 from old tool shed archive to new tool shed repository
peterjc
parents:
0
diff
changeset
|
117 if not count: |
3ff1dcbb9440
Migrated tool version 0.0.3 from old tool shed archive to new tool shed repository
peterjc
parents:
0
diff
changeset
|
118 clean_up(fasta_files) |
3ff1dcbb9440
Migrated tool version 0.0.3 from old tool shed archive to new tool shed repository
peterjc
parents:
0
diff
changeset
|
119 clean_up(temp_files) |
3ff1dcbb9440
Migrated tool version 0.0.3 from old tool shed archive to new tool shed repository
peterjc
parents:
0
diff
changeset
|
120 stop_err("No output from tmhmm2") |
0
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
121 out_handle.close() |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
122 |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
123 clean_up(fasta_files) |
bca9bc7fdaef
Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff
changeset
|
124 clean_up(temp_files) |