Mercurial > repos > peterjc > tmhmm_and_signalp
annotate tools/protein_analysis/psortb.py @ 20:a19b3ded8f33 draft
v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
author | peterjc |
---|---|
date | Thu, 21 Sep 2017 11:35:20 -0400 |
parents | f3ecd80850e2 |
children | 238eae32483c |
rev | line source |
---|---|
11
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
1 #!/usr/bin/env python |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
2 """Wrapper for psortb for use in Galaxy. |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
3 |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
4 This script takes exactly six command line arguments - which includes the |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
5 number of threads, and the input protein FASTA filename and output |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
6 tabular filename. It then splits up the FASTA input and calls multiple |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
7 copies of the standalone psortb v3 program, then collates the output. |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
8 e.g. Rather than this, |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
9 |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
10 psort $type -c $cutoff -d $divergent -o long $sequence > $outfile |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
11 |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
12 Call this: |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
13 |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
14 psort $threads $type $cutoff $divergent $sequence $outfile |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
15 |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
16 If ommitting -c or -d options, set $cutoff and $divergent to zero or blank. |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
17 |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
18 Note that this is somewhat redundant with job-splitting available in Galaxy |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
19 itself (see the SignalP XML file for settings), but both can be applied. |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
20 |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
21 Additionally it ensures the header line (with the column names) starts |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
22 with a # character as used elsewhere in Galaxy. |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
23 """ |
20
a19b3ded8f33
v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents:
19
diff
changeset
|
24 |
a19b3ded8f33
v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents:
19
diff
changeset
|
25 from __future__ import print_function |
a19b3ded8f33
v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents:
19
diff
changeset
|
26 |
11
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
27 import os |
20
a19b3ded8f33
v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents:
19
diff
changeset
|
28 import sys |
11
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
29 import tempfile |
20
a19b3ded8f33
v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents:
19
diff
changeset
|
30 |
a19b3ded8f33
v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents:
19
diff
changeset
|
31 from seq_analysis_utils import run_jobs, split_fasta, thread_count |
11
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
32 |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
33 FASTA_CHUNK = 500 |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
34 |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
35 if "-v" in sys.argv or "--version" in sys.argv: |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
36 """Return underlying PSORTb's version""" |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
37 sys.exit(os.system("psort --version")) |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
38 |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
39 if len(sys.argv) != 8: |
19 | 40 sys.exit("Require 7 arguments, number of threads (int), type (e.g. archaea), " |
11
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
41 "output (e.g. terse/normal/long), cutoff, divergent, input protein " |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
42 "FASTA file & output tabular file") |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
43 |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
44 num_threads = thread_count(sys.argv[1], default=4) |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
45 org_type = sys.argv[2] |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
46 out_type = sys.argv[3] |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
47 cutoff = sys.argv[4] |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
48 if cutoff.strip() and float(cutoff.strip()) != 0.0: |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
49 cutoff = "-c %s" % cutoff |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
50 else: |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
51 cutoff = "" |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
52 divergent = sys.argv[5] |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
53 if divergent.strip() and float(divergent.strip()) != 0.0: |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
54 divergent = "-d %s" % divergent |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
55 else: |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
56 divergent = "" |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
57 fasta_file = sys.argv[6] |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
58 tabular_file = sys.argv[7] |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
59 |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
60 if out_type == "terse": |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
61 header = ['SeqID', 'Localization', 'Score'] |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
62 elif out_type == "normal": |
19 | 63 sys.exit("Normal output not implemented yet, sorry.") |
11
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
64 elif out_type == "long": |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
65 if org_type == "-n": |
19 | 66 # Gram negative bacteria |
11
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
67 header = ['SeqID', 'CMSVM-_Localization', 'CMSVM-_Details', 'CytoSVM-_Localization', 'CytoSVM-_Details', |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
68 'ECSVM-_Localization', 'ECSVM-_Details', 'ModHMM-_Localization', 'ModHMM-_Details', |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
69 'Motif-_Localization', 'Motif-_Details', 'OMPMotif-_Localization', 'OMPMotif-_Details', |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
70 'OMSVM-_Localization', 'OMSVM-_Details', 'PPSVM-_Localization', 'PPSVM-_Details', |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
71 'Profile-_Localization', 'Profile-_Details', |
20
a19b3ded8f33
v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents:
19
diff
changeset
|
72 'SCL-BLAST-_Localization', 'SCL-BLAST-_Details', |
a19b3ded8f33
v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents:
19
diff
changeset
|
73 'SCL-BLASTe-_Localization', 'SCL-BLASTe-_Details', |
11
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
74 'Signal-_Localization', 'Signal-_Details', |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
75 'Cytoplasmic_Score', 'CytoplasmicMembrane_Score', 'Periplasmic_Score', 'OuterMembrane_Score', |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
76 'Extracellular_Score', 'Final_Localization', 'Final_Localization_Details', 'Final_Score', |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
77 'Secondary_Localization', 'PSortb_Version'] |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
78 elif org_type == "-p": |
19 | 79 # Gram positive bacteria |
11
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
80 header = ['SeqID', 'CMSVM+_Localization', 'CMSVM+_Details', 'CWSVM+_Localization', 'CWSVM+_Details', |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
81 'CytoSVM+_Localization', 'CytoSVM+_Details', 'ECSVM+_Localization', 'ECSVM+_Details', |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
82 'ModHMM+_Localization', 'ModHMM+_Details', 'Motif+_Localization', 'Motif+_Details', |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
83 'Profile+_Localization', 'Profile+_Details', |
20
a19b3ded8f33
v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents:
19
diff
changeset
|
84 'SCL-BLAST+_Localization', 'SCL-BLAST+_Details', |
a19b3ded8f33
v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents:
19
diff
changeset
|
85 'SCL-BLASTe+_Localization', 'SCL-BLASTe+_Details', |
11
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
86 'Signal+_Localization', 'Signal+_Details', |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
87 'Cytoplasmic_Score', 'CytoplasmicMembrane_Score', 'Cellwall_Score', |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
88 'Extracellular_Score', 'Final_Localization', 'Final_Localization_Details', 'Final_Score', |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
89 'Secondary_Localization', 'PSortb_Version'] |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
90 elif org_type == "-a": |
19 | 91 # Archaea |
11
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
92 header = ['SeqID', 'CMSVM_a_Localization', 'CMSVM_a_Details', 'CWSVM_a_Localization', 'CWSVM_a_Details', |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
93 'CytoSVM_a_Localization', 'CytoSVM_a_Details', 'ECSVM_a_Localization', 'ECSVM_a_Details', |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
94 'ModHMM_a_Localization', 'ModHMM_a_Details', 'Motif_a_Localization', 'Motif_a_Details', |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
95 'Profile_a_Localization', 'Profile_a_Details', |
20
a19b3ded8f33
v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents:
19
diff
changeset
|
96 'SCL-BLAST_a_Localization', 'SCL-BLAST_a_Details', |
a19b3ded8f33
v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents:
19
diff
changeset
|
97 'SCL-BLASTe_a_Localization', 'SCL-BLASTe_a_Details', |
11
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
98 'Signal_a_Localization', 'Signal_a_Details', |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
99 'Cytoplasmic_Score', 'CytoplasmicMembrane_Score', 'Cellwall_Score', |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
100 'Extracellular_Score', 'Final_Localization', 'Final_Localization_Details', 'Final_Score', |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
101 'Secondary_Localization', 'PSortb_Version'] |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
102 else: |
19 | 103 sys.exit("Expected -n, -p or -a for the organism type, not %r" % org_type) |
11
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
104 else: |
19 | 105 sys.exit("Expected terse, normal or long for the output type, not %r" % out_type) |
11
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
106 |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
107 tmp_dir = tempfile.mkdtemp() |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
108 |
19 | 109 |
11
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
110 def clean_tabular(raw_handle, out_handle): |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
111 """Clean up tabular TMHMM output, returns output line count.""" |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
112 global header |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
113 count = 0 |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
114 for line in raw_handle: |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
115 if not line.strip() or line.startswith("#"): |
19 | 116 # Ignore any blank lines or comment lines |
11
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
117 continue |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
118 parts = [x.strip() for x in line.rstrip("\r\n").split("\t")] |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
119 if parts == header: |
19 | 120 # Ignore the header line |
11
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
121 continue |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
122 if not parts[-1] and len(parts) == len(header) + 1: |
19 | 123 # Ignore dummy blank extra column, e.g. |
124 # "...2.0\t\tPSORTb version 3.0\t\n" | |
11
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
125 parts = parts[:-1] |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
126 assert len(parts) == len(header), \ |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
127 "%i fields, not %i, in line:\n%r" % (len(line), len(header), line) |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
128 out_handle.write(line) |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
129 count += 1 |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
130 return count |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
131 |
20
a19b3ded8f33
v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents:
19
diff
changeset
|
132 |
19 | 133 # Note that if the input FASTA file contains no sequences, |
134 # split_fasta returns an empty list (i.e. zero temp files). | |
11
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
135 fasta_files = split_fasta(fasta_file, os.path.join(tmp_dir, "tmhmm"), FASTA_CHUNK) |
19 | 136 temp_files = [f + ".out" for f in fasta_files] |
11
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
137 jobs = ["psort %s %s %s -o %s %s > %s" % (org_type, cutoff, divergent, out_type, fasta, temp) |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
138 for fasta, temp in zip(fasta_files, temp_files)] |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
139 |
19 | 140 |
11
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
141 def clean_up(file_list): |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
142 for f in file_list: |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
143 if os.path.isfile(f): |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
144 os.remove(f) |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
145 try: |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
146 os.rmdir(tmp_dir) |
19 | 147 except Exception: |
11
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
148 pass |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
149 |
20
a19b3ded8f33
v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents:
19
diff
changeset
|
150 |
11
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
151 if len(jobs) > 1 and num_threads > 1: |
19 | 152 # A small "info" message for Galaxy to show the user. |
20
a19b3ded8f33
v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents:
19
diff
changeset
|
153 print("Using %i threads for %i tasks" % (min(num_threads, len(jobs)), len(jobs))) |
11
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
154 results = run_jobs(jobs, num_threads) |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
155 for fasta, temp, cmd in zip(fasta_files, temp_files, jobs): |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
156 error_level = results[cmd] |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
157 if error_level: |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
158 try: |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
159 output = open(temp).readline() |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
160 except IOError: |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
161 output = "" |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
162 clean_up(fasta_files + temp_files) |
19 | 163 sys.exit("One or more tasks failed, e.g. %i from %r gave:\n%s" % (error_level, cmd, output), |
11
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
164 error_level) |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
165 del results |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
166 del jobs |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
167 |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
168 out_handle = open(tabular_file, "w") |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
169 out_handle.write("#%s\n" % "\t".join(header)) |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
170 count = 0 |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
171 for temp in temp_files: |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
172 data_handle = open(temp) |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
173 count += clean_tabular(data_handle, out_handle) |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
174 data_handle.close() |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
175 if not count: |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
176 clean_up(fasta_files + temp_files) |
19 | 177 sys.exit("No output from psortb") |
11
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
178 out_handle.close() |
20
a19b3ded8f33
v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents:
19
diff
changeset
|
179 print("%i records" % count) |
11
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
180 |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
181 clean_up(fasta_files + temp_files) |