Mercurial > repos > peterjc > tmhmm_and_signalp
annotate tools/protein_analysis/psortb.py @ 22:e1afa4b0b682 draft
"This is v0.2.12 with black formating and Python 3 next fix etc"
author | peterjc |
---|---|
date | Thu, 17 Jun 2021 08:34:58 +0000 |
parents | 238eae32483c |
children |
#!/usr/bin/env python
"""Wrapper for psortb for use in Galaxy.

This script takes exactly seven command line arguments: the number of
threads, the organism type, the output type, the cutoff, the divergent
cutoff, the input protein FASTA filename and the output tabular filename.
It then splits up the FASTA input and calls multiple copies of the
standalone psortb v3 program, then collates the output.

For example, rather than this:

psort $type -c $cutoff -d $divergent -o long $sequence > $outfile

Call this:

psortb.py $threads $type long $cutoff $divergent $sequence $outfile

To omit the -c or -d option, set $cutoff or $divergent to zero or blank.

Note that this is somewhat redundant with job-splitting available in Galaxy
itself (see the SignalP XML file for settings), but both can be applied.

Additionally it ensures the header line (with the column names) starts
with a # character as used elsewhere in Galaxy.
"""

from __future__ import print_function

import os
import sys
import tempfile

from seq_analysis_utils import run_jobs, split_fasta, thread_count

FASTA_CHUNK = 500

if "-v" in sys.argv or "--version" in sys.argv:
    # Return the underlying PSORTb's version
    sys.exit(os.system("psort --version"))

if len(sys.argv) != 8:
    sys.exit(
        "Require 7 arguments, number of threads (int), organism type "
        "(-n, -p or -a), output (e.g. terse/normal/long), cutoff, divergent, "
        "input protein FASTA file & output tabular file"
    )

num_threads = thread_count(sys.argv[1], default=4)
org_type = sys.argv[2]
out_type = sys.argv[3]
# A blank or zero cutoff means the -c option is omitted
cutoff = sys.argv[4]
if cutoff.strip() and float(cutoff.strip()) != 0.0:
    cutoff = "-c %s" % cutoff
else:
    cutoff = ""
# A blank or zero divergent value means the -d option is omitted
divergent = sys.argv[5]
if divergent.strip() and float(divergent.strip()) != 0.0:
    divergent = "-d %s" % divergent
else:
    divergent = ""
fasta_file = sys.argv[6]
tabular_file = sys.argv[7]

if out_type == "terse":
    header = ["SeqID", "Localization", "Score"]
elif out_type == "normal":
    sys.exit("Normal output not implemented yet, sorry.")
elif out_type == "long":
    if org_type == "-n":
        # Gram negative bacteria
        header = [
            "SeqID",
            "CMSVM-_Localization",
            "CMSVM-_Details",
            "CytoSVM-_Localization",
            "CytoSVM-_Details",
            "ECSVM-_Localization",
            "ECSVM-_Details",
            "ModHMM-_Localization",
            "ModHMM-_Details",
            "Motif-_Localization",
            "Motif-_Details",
            "OMPMotif-_Localization",
            "OMPMotif-_Details",
            "OMSVM-_Localization",
            "OMSVM-_Details",
            "PPSVM-_Localization",
            "PPSVM-_Details",
            "Profile-_Localization",
            "Profile-_Details",
            "SCL-BLAST-_Localization",
            "SCL-BLAST-_Details",
            "SCL-BLASTe-_Localization",
            "SCL-BLASTe-_Details",
            "Signal-_Localization",
            "Signal-_Details",
            "Cytoplasmic_Score",
            "CytoplasmicMembrane_Score",
            "Periplasmic_Score",
            "OuterMembrane_Score",
            "Extracellular_Score",
            "Final_Localization",
            "Final_Localization_Details",
            "Final_Score",
            "Secondary_Localization",
            "PSortb_Version",
        ]
    elif org_type == "-p":
        # Gram positive bacteria
        header = [
            "SeqID",
            "CMSVM+_Localization",
            "CMSVM+_Details",
            "CWSVM+_Localization",
            "CWSVM+_Details",
            "CytoSVM+_Localization",
            "CytoSVM+_Details",
            "ECSVM+_Localization",
            "ECSVM+_Details",
            "ModHMM+_Localization",
            "ModHMM+_Details",
            "Motif+_Localization",
            "Motif+_Details",
            "Profile+_Localization",
            "Profile+_Details",
            "SCL-BLAST+_Localization",
            "SCL-BLAST+_Details",
            "SCL-BLASTe+_Localization",
            "SCL-BLASTe+_Details",
            "Signal+_Localization",
            "Signal+_Details",
            "Cytoplasmic_Score",
            "CytoplasmicMembrane_Score",
            "Cellwall_Score",
            "Extracellular_Score",
            "Final_Localization",
            "Final_Localization_Details",
            "Final_Score",
            "Secondary_Localization",
            "PSortb_Version",
        ]
    elif org_type == "-a":
        # Archaea
        header = [
            "SeqID",
            "CMSVM_a_Localization",
            "CMSVM_a_Details",
            "CWSVM_a_Localization",
            "CWSVM_a_Details",
            "CytoSVM_a_Localization",
            "CytoSVM_a_Details",
            "ECSVM_a_Localization",
            "ECSVM_a_Details",
            "ModHMM_a_Localization",
            "ModHMM_a_Details",
            "Motif_a_Localization",
            "Motif_a_Details",
            "Profile_a_Localization",
            "Profile_a_Details",
            "SCL-BLAST_a_Localization",
            "SCL-BLAST_a_Details",
            "SCL-BLASTe_a_Localization",
            "SCL-BLASTe_a_Details",
            "Signal_a_Localization",
            "Signal_a_Details",
            "Cytoplasmic_Score",
            "CytoplasmicMembrane_Score",
            "Cellwall_Score",
            "Extracellular_Score",
            "Final_Localization",
            "Final_Localization_Details",
            "Final_Score",
            "Secondary_Localization",
            "PSortb_Version",
        ]
    else:
        sys.exit("Expected -n, -p or -a for the organism type, not %r" % org_type)
else:
    sys.exit("Expected terse, normal or long for the output type, not %r" % out_type)

tmp_dir = tempfile.mkdtemp()


def clean_tabular(raw_handle, out_handle):
    """Clean up tabular PSORTb output, returns output line count."""
    global header
    count = 0
    for line in raw_handle:
        if not line.strip() or line.startswith("#"):
            # Ignore any blank lines or comment lines
            continue
        parts = [x.strip() for x in line.rstrip("\r\n").split("\t")]
        if parts == header:
            # Ignore the header line
            continue
        if not parts[-1] and len(parts) == len(header) + 1:
            # Ignore dummy blank extra column, e.g.
            # "...2.0\t\tPSORTb version 3.0\t\n"
            parts = parts[:-1]
        # Report the number of fields found, not the length of the raw line
        assert len(parts) == len(header), "%i fields, not %i, in line:\n%r" % (
            len(parts),
            len(header),
            line,
        )
        out_handle.write(line)
        count += 1
    return count


19 | 208 # Note that if the input FASTA file contains no sequences, |
209 # split_fasta returns an empty list (i.e. zero temp files). | |
11
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
210 fasta_files = split_fasta(fasta_file, os.path.join(tmp_dir, "tmhmm"), FASTA_CHUNK) |
19 | 211 temp_files = [f + ".out" for f in fasta_files] |
21
238eae32483c
"Check this is up to date with all 2020 changes (black etc)"
peterjc
parents:
20
diff
changeset
|
212 jobs = [ |
238eae32483c
"Check this is up to date with all 2020 changes (black etc)"
peterjc
parents:
20
diff
changeset
|
213 "psort %s %s %s -o %s %s > %s" |
238eae32483c
"Check this is up to date with all 2020 changes (black etc)"
peterjc
parents:
20
diff
changeset
|
214 % (org_type, cutoff, divergent, out_type, fasta, temp) |
238eae32483c
"Check this is up to date with all 2020 changes (black etc)"
peterjc
parents:
20
diff
changeset
|
215 for fasta, temp in zip(fasta_files, temp_files) |
238eae32483c
"Check this is up to date with all 2020 changes (black etc)"
peterjc
parents:
20
diff
changeset
|
216 ] |
11
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
217 |
19 | 218 |
11
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
219 def clean_up(file_list): |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
220 for f in file_list: |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
221 if os.path.isfile(f): |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
222 os.remove(f) |
99b82a2b1272
Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
peterjc
parents:
diff
changeset
|
223 try: |
224 os.rmdir(tmp_dir) |
19 | 225 except Exception: |
11 | 226 pass |
227 |
20 | 228 |
11 | 229 if len(jobs) > 1 and num_threads > 1: |
19 | 230 # A small "info" message for Galaxy to show the user. |
20 | 231 print("Using %i threads for %i tasks" % (min(num_threads, len(jobs)), len(jobs))) |
11 | 232 results = run_jobs(jobs, num_threads) |
233 for fasta, temp, cmd in zip(fasta_files, temp_files, jobs): |
234 error_level = results[cmd] |
235 if error_level: |
236 try: |
237 output = open(temp).readline() |
238 except IOError: |
239 output = "" |
240 clean_up(fasta_files + temp_files) |
21 | 241 sys.stderr.write( |
242 "One or more tasks failed, e.g. %i from %r gave:\n%s\n" |
243 % (error_level, cmd, output) |
244 ) |
245 sys.exit(error_level) |
11 | 246 del results |
247 del jobs |
248 |
249 out_handle = open(tabular_file, "w") |
250 out_handle.write("#%s\n" % "\t".join(header)) |
251 count = 0 |
252 for temp in temp_files: |
253 data_handle = open(temp) |
254 count += clean_tabular(data_handle, out_handle) |
255 data_handle.close() |
256 if not count: |
257 clean_up(fasta_files + temp_files) |
19 | 258 sys.exit("No output from psortb") |
11 | 259 out_handle.close() |
20 | 260 print("%i records" % count) |
11 | 261 |
262 clean_up(fasta_files + temp_files) |
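
The tail of the script follows a common pattern: build one shell command per FASTA chunk, run the commands with a bounded number of workers, look up each return code by its command string, and stop at the first failure before collating anything. The sketch below illustrates that pattern in isolation. The names `run_jobs_sketch` and `first_failure` are hypothetical; the wrapper's real `run_jobs` helper is imported from elsewhere and its implementation is not shown in this listing, so this is an assumption about its behaviour (a mapping from command string to return code), not a copy of it.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_jobs_sketch(jobs, threads):
    """Run shell command strings and return {command: return code}.

    Hypothetical stand-in for the wrapper's run_jobs helper; it only
    mirrors the interface that the listing appears to rely on.
    """
    if not jobs:
        # Mirrors the case where the FASTA splitter yields no chunks.
        return {}
    workers = max(1, min(threads, len(jobs)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so codes line up with jobs.
        codes = list(pool.map(lambda cmd: subprocess.call(cmd, shell=True), jobs))
    return dict(zip(jobs, codes))


def first_failure(jobs, results):
    """Return (command, return code) for the first failed job, else None."""
    for cmd in jobs:
        if results[cmd]:
            return cmd, results[cmd]
    return None


if __name__ == "__main__":
    # Placeholder commands for illustration only, not real psort calls.
    demo_jobs = ["exit 0", "exit 3"]
    demo_results = run_jobs_sketch(demo_jobs, threads=2)
    print(first_failure(demo_jobs, demo_results))
```

Keying the results by command string works here because every command embeds its own chunk and output filenames, so no two commands are identical. A caller that could produce duplicate commands would need a different key, such as the job index.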