annotate scripts/S01_find_orf_on_multiple_alignment.py @ 1:c79bdda8abfb draft default tip
planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3a118aa934e6406cc8b0b24d006af6365c277519
author | abims-sbr |
---|---|
date | Thu, 09 Jun 2022 12:40:00 +0000 |
parents | eb95bf7f90ae |
children |
#!/usr/bin/env python
# coding: utf8
# Author: Eric Fontanillas
# Modification: 03/09/14 by Julie BAFFARD
# Last modification: 10/09/21 by Charlotte Berthelier

"""
Description: Predict potential ORFs on the basis of 2 criteria + 1 optional criterion

- CRITERION 1

Get the longest part of the sequence alignment without a stop codon "*",
test the 3 potential ORFs, and check the best coding sequence with a BLAST

- CRITERION 2

This longest part should be > 150 nt or 50 aa

- CRITERION 3 [OPTIONAL]

A start codon "M" should be present in this longest part, before the last 50 aa

OUTPUTS:
"05_CDS_aa" & "05_CDS_nuc" => DO NOT INCLUDE THIS CRITERION
"06_CDS_with_M_aa" & "06_CDS_with_M_nuc" => INCLUDE THIS CRITERION
"""
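The three criteria in the docstring can be illustrated on a single translated sequence. The sketch below is not the script's implementation (the script works column by column on a multiple alignment); the function names `longest_stop_free_segment` and `passes_criteria` are hypothetical, and the default of 50 aa is taken from the docstring.

```python
def longest_stop_free_segment(seq_aa):
    """Return the longest stretch of seq_aa that contains no stop ("*")."""
    # Splitting on "*" yields every stop-free stretch; keep the longest one.
    return max(seq_aa.split("*"), key=len)


def passes_criteria(seq_aa, minimal_cds_length=50, require_start=False):
    """Apply criteria 1-2, and optionally criterion 3, to one aa sequence."""
    # Criterion 1: longest part without a stop codon
    segment = longest_stop_free_segment(seq_aa)
    # Criterion 2: strictly longer than the minimal length (in aa)
    if len(segment) <= minimal_cds_length:
        return False
    # Criterion 3 (optional): an "M" located before the last
    # `minimal_cds_length` amino acids of the segment
    if require_start:
        first_m = segment.find("M")
        return 0 <= first_m < len(segment) - minimal_cds_length
    return True


candidate = "AA*" + "K" * 10 + "M" + "K" * 60 + "*KK"
print(passes_criteria(candidate))                      # criteria 1 and 2 only
print(passes_criteria(candidate, require_start=True))  # criteria 1, 2 and 3
```

Keeping criterion 3 behind a flag mirrors the two output families named in the docstring, one without and one with the start-codon requirement.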

import os
import re
import argparse

from Bio.Blast import NCBIWWW
from Bio.Blast import NCBIXML
from dico import dico


def code_universel(file1):
    """ Creates a hash (dict) of the genetic code (key: codon; value: amino acid) """
    bash_code_universel = {}

    with open(file1, "r") as file:
        for line in file:
            item = line.split(" ")
            length1 = len(item)
            if length1 == 3:
                # Three fields: drop the final character of the last one
                # (presumably the trailing newline)
                key = item[0]
                value = item[2][:-1]
                bash_code_universel[key] = value
            else:
                key = item[0]
                value = item[2]
                bash_code_universel[key] = value

    return bash_code_universel
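The parsing in `code_universel` depends on the exact spacing of each line and on every line ending with a newline. As a hedged alternative, the sketch below takes the first and last whitespace-separated tokens of each line. The helper name `read_codon_table` and the table layout written to the temporary file (`CODON`, two spaces, amino acid) are assumptions, since the real genetic-code file is not shown in the script.

```python
import os
import tempfile


def read_codon_table(path):
    """Map codon -> amino acid, using the first and last token of each line."""
    table = {}
    with open(path, "r") as handle:
        for line in handle:
            tokens = line.split()  # splits on any run of whitespace
            if len(tokens) < 2:
                continue  # skip blank or malformed lines
            table[tokens[0]] = tokens[-1]
    return table


# Hypothetical table content; the last line deliberately has no newline.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("AUG  M\nUAA  *\nGCC  A")
    path = tmp.name

codon_table = read_codon_table(path)
os.remove(path)
print(codon_table)
```

Under this assumed layout, slicing the last field with `[:-1]` would empty the amino acid of a final line that lacks a newline, which is why the sketch relies on `split()` instead.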


def multiple3(seq):
    """
    Tests if the sequence length is a multiple of 3, and if not removes the extra bases.
    A codon may be lost when testing the ORFs (as the reading frame is shifted).
    """

    multiple = len(seq) % 3
    if multiple != 0:
        return seq[:-multiple], multiple
    return seq, multiple
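A short usage example may help here. The function body is restated from the script so that the block runs on its own; the input sequences are made up for illustration.

```python
def multiple3(seq):
    """Trim trailing bases so that len(seq) becomes a multiple of 3."""
    multiple = len(seq) % 3
    if multiple != 0:
        return seq[:-multiple], multiple
    return seq, multiple


# 8 bases: the 2 trailing bases do not form a complete codon
trimmed, extra = multiple3("ATGGCCTA")
print(trimmed, extra)

# 6 bases: already a multiple of 3, returned unchanged
intact, extra_intact = multiple3("ATGGCC")
print(intact, extra_intact)
```

The second return value is the number of bases removed, which lets the caller know whether the sequence was trimmed.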


def detect_methionine(seq_aa, ortho, minimal_cds_length):
    """ Detects whether a methionine is present in the aa sequence """

    size = len(seq_aa)
    cutoff_last_50aa = size - minimal_cds_length

    # Find all indices of occurrences of "M" in a string of aa
    list_indices = [pos for pos, char in enumerate(seq_aa) if char == "M"]

    # If some "M" are present, check that the first "M" found is not in the
    # last 50 aa (index < cutoff_last_50aa); otherwise it may not be a CDS
    if list_indices:
        first_m = list_indices[0]
        if first_m < cutoff_last_50aa:
            ortho = 1  # means orthologs found

    return ortho
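The cutoff logic is easier to follow on short inputs. The block below is a compact restatement of `detect_methionine` (using `str.find` rather than a list of indices, which should behave the same way), called with a deliberately small `minimal_cds_length` of 5; the sequences are invented for the example.

```python
def detect_methionine(seq_aa, ortho, minimal_cds_length):
    """Set ortho to 1 when the first "M" lies before the final cutoff window."""
    cutoff = len(seq_aa) - minimal_cds_length
    first_m = seq_aa.find("M")  # -1 when no "M" is present
    if first_m != -1 and first_m < cutoff:
        ortho = 1
    return ortho


# Length 10 with a window of 5 aa: the cutoff index is 5
early_m = detect_methionine("KKMKKKKKKK", 0, 5)  # first "M" at index 2
late_m = detect_methionine("KKKKKKKMKK", 0, 5)   # first "M" at index 7
no_m = detect_methionine("KKKKKKKKKK", 0, 5)     # no "M" at all
print(early_m, late_m, no_m)
```

Note that `ortho` is only ever raised to 1 and never reset, so a caller that passes `ortho=1` gets 1 back whatever the sequence contains.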


def reverse_complement2(seq):
    """ Reverse complement DNA sequence """
    # Each base in seq1 is paired with the character 6 positions further on
    # (A->T, T->A, C->G, G->C, N->N, '-'->'-', and the lowercase equivalents)
    seq1 = 'ATCGN-TAGCN-atcgn-tagcn-'
    seq_dict = {seq1[i]: seq1[i + 6]
                for i in range(24) if i < 6 or 12 <= i <= 16}

    return "".join([seq_dict[base] for base in reversed(seq)])
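The same mapping can be written with `str.maketrans` and `str.translate`. This is a swapped-in technique rather than the script's own code, and the name `reverse_complement` is hypothetical. One behavioural difference: characters outside the table pass through unchanged here, whereas the dictionary lookup in `reverse_complement2` raises a `KeyError`.

```python
# Translation table: each base maps to its complement; "-" is left untouched
# because it is absent from the table.
COMPLEMENT = str.maketrans("ATCGNatcgn", "TAGCNtagcn")


def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


aligned = reverse_complement("ATGC-N")  # gaps and N are preserved
lower = reverse_complement("atgc")      # case is preserved
print(aligned, lower)
```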


def simply_get_orf(seq_dna, gen_code):
    """ Translate a DNA sequence into its amino acid sequence, codon by codon """
    # The lookup uses uppercase RNA codons (T -> U)
    seq_by_codons = [seq_dna.upper().replace('T', 'U')[i:i + 3]
                     for i in range(0, len(seq_dna), 3)]
    # Codons absent from gen_code (e.g. gaps) are translated as '?'
    seq_by_aa = [gen_code.get(codon, '?') for codon in seq_by_codons]

    return ''.join(seq_by_aa)
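A small worked example shows what the translation returns. The function is restated so the block is self-contained, and `toy_code` is a three-entry table invented for the example, not the genetic code file used by the script.

```python
def simply_get_orf(seq_dna, gen_code):
    """Translate DNA codon by codon; unknown codons become '?'."""
    rna = seq_dna.upper().replace("T", "U")
    codons = [rna[i:i + 3] for i in range(0, len(rna), 3)]
    return "".join(gen_code.get(codon, "?") for codon in codons)


# Hypothetical mini genetic code keyed by RNA codons
toy_code = {"AUG": "M", "GCC": "A", "UAA": "*"}

# Mixed case input with an alignment gap codon ("---")
protein = simply_get_orf("atgGCC---TAA", toy_code)
print(protein)
```

The gap codon is not in the table, so it shows up as `?` in the output, while the stop codon is rendered as `*`.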


def find_good_ORF_criteria_3(bash_aligned_nc_seq, bash_codeUniversel, minimal_cds_length, min_spec):
    # Multiple sequence based: based on the alignment of several sequences (orthogroup)
    # Criterion 1: get the segment in the alignment with no stop codon

    # 1 - Get the list of aligned aa seq for each reading frame
    #     (6 in total: 3 forward, 3 reverse complement)
    print("1 - Get the list of aligned aa seq for the 3 ORF")
    bash_of_aligned_aa_seq_3ORF = {}
    bash_of_aligned_nuc_seq_3ORF = {}
    BEST_LONGUEST_SUBSEQUENCE_LIST_POSITION = []

    for fasta_name in bash_aligned_nc_seq.keys():
        # Get the sequence, check that its length is a multiple of 3, then get the 6 ORFs
        sequence_nc = bash_aligned_nc_seq[fasta_name]
        new_sequence_nc, modulo = multiple3(sequence_nc)
        new_sequence_rev = reverse_complement2(new_sequence_nc)
        # For each seq of the multialignment => give the 6 ORFs (in nuc)
        bash_of_aligned_nuc_seq_3ORF[fasta_name] = [
            new_sequence_nc, new_sequence_nc[1:-2], new_sequence_nc[2:-1],
            new_sequence_rev, new_sequence_rev[1:-2], new_sequence_rev[2:-1]]

        seq_prot_ORF1 = simply_get_orf(new_sequence_nc, bash_codeUniversel)
        seq_prot_ORF2 = simply_get_orf(new_sequence_nc[1:-2], bash_codeUniversel)
        seq_prot_ORF3 = simply_get_orf(new_sequence_nc[2:-1], bash_codeUniversel)
        seq_prot_ORF4 = simply_get_orf(new_sequence_rev, bash_codeUniversel)
        seq_prot_ORF5 = simply_get_orf(new_sequence_rev[1:-2], bash_codeUniversel)
        seq_prot_ORF6 = simply_get_orf(new_sequence_rev[2:-1], bash_codeUniversel)

        # For each seq of the multialignment => give the 6 ORFs (in aa)
        bash_of_aligned_aa_seq_3ORF[fasta_name] = [
            seq_prot_ORF1, seq_prot_ORF2, seq_prot_ORF3,
            seq_prot_ORF4, seq_prot_ORF5, seq_prot_ORF6]
eb95bf7f90ae
planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1-dirty
abims-sbr
parents:
diff
changeset
|
133 |
eb95bf7f90ae
planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1-dirty
abims-sbr
parents:
diff
changeset
|
134 # 2 - Test for the best ORF (Get the longuest segment in the alignment with no codon stop ... for each ORF ... the longuest should give the ORF) |
1
c79bdda8abfb
planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3a118aa934e6406cc8b0b24d006af6365c277519
abims-sbr
parents:
0
diff
changeset
|
135 print("2 - Test for the best ORF") |
0
eb95bf7f90ae
planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1-dirty
abims-sbr
parents:
diff
changeset
|
136 BEST_MAX = 0 |
eb95bf7f90ae
planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1-dirty
abims-sbr
parents:
diff
changeset
|
137 |
eb95bf7f90ae
planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1-dirty
abims-sbr
parents:
diff
changeset
|
138 for i in [0,1,2,3,4,5]: # Test the 6 ORFs |
eb95bf7f90ae
planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1-dirty
abims-sbr
parents:
diff
changeset
|
139 ORF_Aligned_aa = [] |
eb95bf7f90ae
planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1-dirty
abims-sbr
parents:
diff
changeset
|
140 ORF_Aligned_nuc = [] |
eb95bf7f90ae
planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1-dirty
abims-sbr
parents:
diff
changeset
|
141 |
eb95bf7f90ae
planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1-dirty
abims-sbr
parents:
diff
changeset
        # 2.1 - Get the alignment of sequences for a given ORF
        # Compare the 1st ORF between all sequences => list them in ORF_Aligned_aa, then do the same for the 2nd ORF, and then the 3rd
        for fasta_name in bash_of_aligned_aa_seq_3ORF.keys():
            ORFsequence = bash_of_aligned_aa_seq_3ORF[fasta_name][i]
            aa_length = len(ORFsequence)
            ORF_Aligned_aa.append(ORFsequence)   # List of all amino-acid sequences in ORF number "i"

        n = i+1

        for fasta_name in bash_of_aligned_nuc_seq_3ORF.keys():
            ORFsequence = bash_of_aligned_nuc_seq_3ORF[fasta_name][i]
            nuc_length = len(ORFsequence)
            ORF_Aligned_nuc.append(ORFsequence)   # List of all nucleotide sequences in ORF number "i"

        # 2.2 - Get the list of sublists of positions without stop codon in the alignment
        # For each ORF, we now have the list of sequences available (i.e. THE ALIGNMENT IN A GIVEN ORF)
        # Next step is to get the longest subsequence without stop
        # We look for a stop "*" in each column of the alignment, and get the positions of the segments between the columns with "*"
        # Note: positions are stored 1-based (j is incremented before being appended).
        # A segment is only stored when a later column contains "*", so positions after the last stop column are not kept.
        MAX_LENGTH = 0
        LONGUEST_SEGMENT_UNSTOPPED = ""
        j = 0 # Start from first position in alignment
        List_of_List_subsequences = []
        List_positions_subsequence = []
        while j < aa_length:
            column = []
            for seq in ORF_Aligned_aa:
                column.append(seq[j])
            j = j+1
            if "*" in column:
                List_of_List_subsequences.append(List_positions_subsequence) # Add previous list of positions
                List_positions_subsequence = []                               # Re-initialise list of positions
            else:
                List_positions_subsequence.append(j)

        # 2.3 - Among all the sublists (separated by columns with stop codon "*"), get the longest one (BEST SEGMENT for a given ORF)
        LONGUEST_SUBSEQUENCE_LIST_POSITION = []
        MAX = 0
        for sublist in List_of_List_subsequences:
            if len(sublist) > MAX and len(sublist) > minimal_cds_length:
                MAX = len(sublist)
                LONGUEST_SUBSEQUENCE_LIST_POSITION = sublist

        # 2.4 - Test whether the longest subsequence starts exactly at the beginning of the original sequence (i.e. the ORF may be truncated)
        # Positions are 1-based (step 2.2 increments j before appending it), so the first column is position 1, not 0
        if LONGUEST_SUBSEQUENCE_LIST_POSITION != []:
            if LONGUEST_SUBSEQUENCE_LIST_POSITION[0] == 1:
                CDS_maybe_truncated = 1
            else:
                CDS_maybe_truncated = 0
        else:
            CDS_maybe_truncated = 0


        # 2.5 - Test whether this BEST SEGMENT for the given ORF is better than the one found for the other ORFs (GET THE BEST ORF)
        # Test whether it is the best ORF so far
        if MAX > BEST_MAX:
            BEST_MAX = MAX
            BEST_ORF = i+1
            BEST_LONGUEST_SUBSEQUENCE_LIST_POSITION = LONGUEST_SUBSEQUENCE_LIST_POSITION


    # 3 - ONCE we have this best segment (BEST CODING SEGMENT)
    # ==> GET THE STARTING and ENDING POSITIONS (in aa positions and in nuc positions)
    # And get the INDEX of the best ORF [0, 1, or 2]
    print("3 - ONCE we have this better segment")
    if BEST_LONGUEST_SUBSEQUENCE_LIST_POSITION != []:
        pos_MIN_aa = BEST_LONGUEST_SUBSEQUENCE_LIST_POSITION[0]
        pos_MIN_aa = pos_MIN_aa - 1   # 1-based first position -> 0-based slice start
        pos_MAX_aa = BEST_LONGUEST_SUBSEQUENCE_LIST_POSITION[-1]   # 1-based last position, used as exclusive slice end


        BESTORF_bash_of_aligned_aa_seq = {}
        BESTORF_bash_of_aligned_aa_seq_CODING = {}
        for fasta_name in bash_of_aligned_aa_seq_3ORF.keys():
            index_BEST_ORF = BEST_ORF-1  # because lists are indexed from 0 to 2 in LIST_3_ORF, while ORF numbers go from 1 to 3
            seq = bash_of_aligned_aa_seq_3ORF[fasta_name][index_BEST_ORF]
            seq_coding = seq[pos_MIN_aa:pos_MAX_aa]
            BESTORF_bash_of_aligned_aa_seq[fasta_name] = seq
            BESTORF_bash_of_aligned_aa_seq_CODING[fasta_name] = seq_coding

        # 4 - Get the corresponding positions (START/END of BEST CODING SEGMENT) in the nucleotide alignment
        print("4 - Get the corresponding position")
        pos_MIN_nuc = pos_MIN_aa * 3   # one amino acid spans three nucleotides
        pos_MAX_nuc = pos_MAX_aa * 3

        BESTORF_bash_aligned_nc_seq = {}
        BESTORF_bash_aligned_nc_seq_CODING = {}
        for fasta_name in bash_aligned_nc_seq.keys():
            seq = bash_of_aligned_nuc_seq_3ORF[fasta_name][index_BEST_ORF]
            seq_coding = seq[pos_MIN_nuc:pos_MAX_nuc]
            BESTORF_bash_aligned_nc_seq[fasta_name] = seq
            BESTORF_bash_aligned_nc_seq_CODING[fasta_name] = seq_coding
            seq_cutted = re.sub(r'^.*?[a-zA-Z]', '', seq)   # drops everything up to and including the first letter
            sequence_for_blast = (fasta_name+'\n'+seq_cutted+'\n')
            good_ORF_found = False
            try:
                #result_handle = ""
                #blast_records = ""
                # logger.debug("sequence_for_blast = %s ", sequence_for_blast)
                print('sequence_for_blast = %s ' % sequence_for_blast, end=' ', flush=True)
                result_handle = NCBIWWW.qblast("blastn", "/db/nt/current/fasta/nt.fsa", sequence_for_blast, expect=0.001, hitlist_size=1)
                blast_records = NCBIXML.parse(result_handle)
            except Exception:
                # Any failure of the BLAST request is treated as "no hit"
                good_ORF_found = False
            else:
                for blast_record in blast_records:
                    for alignment in blast_record.alignments:
                        for hsp in alignment.hsps:
                            if hsp.expect < 0.001:
                                good_ORF_found = True
            print("good_ORF_found = %s" %good_ORF_found)

    else: # no CDS found
        BESTORF_bash_aligned_nc_seq = {}
        BESTORF_bash_aligned_nc_seq_CODING = {}
        BESTORF_bash_of_aligned_aa_seq = {}
        BESTORF_bash_of_aligned_aa_seq_CODING = {}

    # Check whether there is an "M" or not and, if at least one "M" is present, that it is not in the last 50 aa

    BESTORF_bash_of_aligned_aa_seq_CDS_with_M = {}
    BESTORF_bash_of_aligned_nuc_seq_CDS_with_M = {}

    Ortho = 0
    for fasta_name in BESTORF_bash_of_aligned_aa_seq_CODING.keys():
        seq_aa = BESTORF_bash_of_aligned_aa_seq_CODING[fasta_name]
        Ortho = detect_methionine(seq_aa, Ortho, minimal_cds_length)  ### DEF6 ###

    # CASE 1: an "M" is present and correctly located (not in the last 50 aa)
    if Ortho == 1:
        BESTORF_bash_of_aligned_aa_seq_CDS_with_M = BESTORF_bash_of_aligned_aa_seq_CODING
        BESTORF_bash_of_aligned_nuc_seq_CDS_with_M = BESTORF_bash_aligned_nc_seq_CODING

    # CASE 2: the CDS may be truncated, so the "M" may be missing
    if Ortho == 0 and CDS_maybe_truncated == 1:
        BESTORF_bash_of_aligned_aa_seq_CDS_with_M = BESTORF_bash_of_aligned_aa_seq_CODING
        BESTORF_bash_of_aligned_nuc_seq_CDS_with_M = BESTORF_bash_aligned_nc_seq_CODING

    # CASE 3: CDS not truncated AND no "M" found in a good position (i.e. before the last 50 aa)
    ## => the 2 bash "CDS_with_M" are left empty ("{}")

    return(BESTORF_bash_aligned_nc_seq, BESTORF_bash_aligned_nc_seq_CODING, BESTORF_bash_of_aligned_nuc_seq_CDS_with_M, BESTORF_bash_of_aligned_aa_seq, BESTORF_bash_of_aligned_aa_seq_CODING, BESTORF_bash_of_aligned_aa_seq_CDS_with_M)

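The column scan of steps 2.2 to 4 can be illustrated in isolation. The sketch below is not part of the script: `stop_free_segments` and `longest_segment` are hypothetical helper names, and the toy alignment is made up. It uses 0-based half-open intervals instead of the 1-based position lists above and, unlike the loop in step 2.2, it also keeps the segment that follows the last stop column. That last point is an assumption about the desired behaviour, not something the script does.

```python
def stop_free_segments(aligned_aa):
    """Return (start, end) half-open column intervals in which
    no sequence of the alignment carries a stop codon ("*")."""
    length = len(aligned_aa[0])
    segments = []
    start = None  # start column of the segment currently being extended
    for j in range(length):
        has_stop = any(seq[j] == "*" for seq in aligned_aa)
        if has_stop:
            if start is not None:
                segments.append((start, j))
                start = None
        elif start is None:
            start = j
    if start is not None:
        # Assumption: keep the trailing segment after the last stop column
        segments.append((start, length))
    return segments


def longest_segment(segments, min_len):
    """Pick the first longest interval strictly longer than min_len, or None."""
    best = None
    for start, end in segments:
        size = end - start
        if size > min_len and (best is None or size > best[1] - best[0]):
            best = (start, end)
    return best


# Toy amino-acid alignment (hypothetical data), two sequences of 9 columns
alignment = ["MKT*LLVA*", "MKTQLLVAG"]
segments = stop_free_segments(alignment)
best = longest_segment(segments, min_len=2)

# aa -> nucleotide coordinates: one amino acid spans three nucleotides
nuc_span = (best[0] * 3, best[1] * 3)
```

With half-open intervals the amino-acid slice is `seq[best[0]:best[1]]` and the nucleotide slice is `seq[nuc_span[0]:nuc_span[1]]`, so no `- 1` correction is needed when converting between the two coordinate systems.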
def write_output_file(results_dict, name_elems, path_out):
    # Write results_dict ({fasta_name: sequence}) as a FASTA file; nothing is written for an empty dict
    if results_dict != {}:
        name_elems[3] = str(len(results_dict.keys()))   # 4th field of the file name = number of sequences
        new_name = "_".join(name_elems)

        with open("%s/%s" %(path_out,new_name), "w") as out1:
            for fasta_name in results_dict.keys():
                seq = results_dict[fasta_name]
                out1.write("%s\n" %fasta_name)
                out1.write("%s\n" %seq)

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("codeUniversel", help="File describing the genetic code (code_universel_modified.txt)")
    parser.add_argument("min_cds_len", help="Minimal length of a CDS (in amino-acids)", type=int)
    parser.add_argument("min_spec", help="Minimal number of species per alignment")
    parser.add_argument("list_files", help="File with all input file names")
    args = parser.parse_args()

    minimal_cds_length = int(args.min_cds_len) # in aa number
    bash_codeUniversel = code_universel(args.codeUniversel)
    minimum_species = int(args.min_spec)

    # Inputs from file containing the list of input file names
    list_files = []
    with open(args.list_files, 'r') as f:
        for line in f.readlines():
            list_files.append(line.strip('\n'))

    # Directories for results
    dirs = ["04_BEST_ORF_nuc", "04_BEST_ORF_aa", "05_CDS_nuc", "05_CDS_aa", "06_CDS_with_M_nuc", "06_CDS_with_M_aa"]
    for directory in dirs:
        os.mkdir(directory)

    count_file_processed, count_file_with_CDS, count_file_without_CDS, count_file_with_CDS_plus_M = 0, 0, 0, 0
    count_file_with_cds_and_enought_species, count_file_with_cds_M_and_enought_species = 0, 0

    # ! : Currently, files are named "Orthogroup_x_y_sequences.fasta", where x is the number of the orthogroup (not important, just here to make a distinct name),
    # and y is the number of sequences/species in the group. These files are outputs of blastalign, where species can be removed. y is then modified.
    name_elems = ["orthogroup", "0", "with", "0", "species.fasta"]

    # by fixing the counter here, there will be some "holes" in the output directories (missing numbers), but the groups between directories will correspond
    #n0 = 0

    for file in list_files:
        #n0 += 1

        count_file_processed = count_file_processed + 1
        nb_gp = file.split('_')[1] # Keep trace of the orthogroup number
        fasta_file_path = "./%s" %file
        bash_fasta = dico(fasta_file_path)
        BESTORF_nuc, BESTORF_nuc_CODING, BESTORF_nuc_CDS_with_M, BESTORF_aa, BESTORF_aa_CODING, BESTORF_aa_CDS_with_M = find_good_ORF_criteria_3(bash_fasta, bash_codeUniversel, minimal_cds_length, minimum_species)

        name_elems[1] = nb_gp

        # Update counts and write group in corresponding output directory
        if BESTORF_nuc != {}:
            count_file_with_CDS += 1
            if len(BESTORF_nuc.keys()) >= minimum_species :
                count_file_with_cds_and_enought_species += 1
                write_output_file(BESTORF_nuc, name_elems, dirs[0]) # OUTPUT BESTORF_nuc
                write_output_file(BESTORF_aa, name_elems, dirs[1]) # The most interesting
        else:
            count_file_without_CDS += 1

        if BESTORF_nuc_CODING != {} and len(BESTORF_nuc_CODING.keys()) >= minimum_species:
            write_output_file(BESTORF_nuc_CODING, name_elems, dirs[2])
            write_output_file(BESTORF_aa_CODING, name_elems, dirs[3])

        if BESTORF_nuc_CDS_with_M != {}:
            count_file_with_CDS_plus_M += 1
            if len(BESTORF_nuc_CDS_with_M.keys()) >= minimum_species :
                count_file_with_cds_M_and_enought_species += 1
                write_output_file(BESTORF_nuc_CDS_with_M, name_elems, dirs[4])
                write_output_file(BESTORF_aa_CDS_with_M, name_elems, dirs[5])

    print("*************** CDS detection ***************")
    print("\nFiles processed: %d" %count_file_processed)
    print("\tFiles with CDS: %d" %count_file_with_CDS)
    print("\tFiles with CDS and at least %s species: %d" %(minimum_species, count_file_with_cds_and_enought_species))
    print("\t\tFiles with CDS plus M (codon start): %d" %count_file_with_CDS_plus_M)
    print("\t\tFiles with CDS plus M (codon start) and at least %s species: %d" %(minimum_species,count_file_with_cds_M_and_enought_species) )
    print("\tFiles without CDS: %d \n" %count_file_without_CDS)
    print("")

if __name__ == '__main__':
    main()
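
The naming convention described in the comments of main() can be illustrated with a short standalone sketch. It is not part of the original script: `output_name_for` is a hypothetical helper, and the input name used in the example is made up to match the "Orthogroup_x_y_sequences.fasta" pattern mentioned above. It only shows how the orthogroup number is carried over from the input name and how the species count is filled in, assuming the count is the number of sequences actually kept.

```python
def output_name_for(input_file, n_sequences):
    """Hypothetical helper: derive the output file name for one orthogroup."""
    # The orthogroup number is the second underscore-separated field,
    # mirroring `file.split('_')[1]` in main().
    nb_gp = input_file.split("_")[1]

    # Same template as `name_elems` in main().
    name_elems = ["orthogroup", "0", "with", "0", "species.fasta"]
    name_elems[1] = nb_gp

    # The script fills this field with the number of sequences kept,
    # which may be lower than the y value in the input file name.
    name_elems[3] = str(n_sequences)
    return "_".join(name_elems)


# Example with a made-up input name: 7 sequences in, 5 kept.
print(output_name_for("Orthogroup_12_7_sequences.fasta", 5))
```

Because the orthogroup number is taken from the input name rather than from a running counter, the same group keeps the same number in every output directory, at the cost of gaps in the numbering.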