annotate smart_toolShed/commons/core/coord/PathUtils.py @ 0:e0f8dcca02ed

Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
author yufei-luo
date Thu, 17 Jan 2013 10:52:14 -0500
parents
children
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
0
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
1 # Copyright INRA (Institut National de la Recherche Agronomique)
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
2 # http://www.inra.fr
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
3 # http://urgi.versailles.inra.fr
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
4 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
5 # This software is governed by the CeCILL license under French law and
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
6 # abiding by the rules of distribution of free software. You can use,
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
7 # modify and/ or redistribute the software under the terms of the CeCILL
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
8 # license as circulated by CEA, CNRS and INRIA at the following URL
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
9 # "http://www.cecill.info".
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
10 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
11 # As a counterpart to the access to the source code and rights to copy,
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
12 # modify and redistribute granted by the license, users are provided only
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
13 # with a limited warranty and the software's author, the holder of the
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
14 # economic rights, and the successive licensors have only limited
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
15 # liability.
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
16 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
17 # In this respect, the user's attention is drawn to the risks associated
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
18 # with loading, using, modifying and/or developing or reproducing the
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
19 # software by the user in light of its specific status of free software,
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
20 # that may mean that it is complicated to manipulate, and that also
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
21 # therefore means that it is reserved for developers and experienced
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
22 # professionals having in-depth computer knowledge. Users are therefore
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
23 # encouraged to load and test the software's suitability as regards their
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
24 # requirements in conditions enabling the security of their systems and/or
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
25 # data to be ensured and, more generally, to use and operate it in the
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
26 # same conditions as regards security.
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
27 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
28 # The fact that you are presently reading this means that you have had
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
29 # knowledge of the CeCILL license and that you accept its terms.
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
30
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
31
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
32 import os
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
33 import sys
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
34 import copy
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
35 from commons.core.coord.Path import Path
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
36 from commons.core.coord.SetUtils import SetUtils
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
37 from commons.core.coord.Map import Map
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
38 from commons.core.coord.AlignUtils import AlignUtils
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
39 from commons.core.checker.RepetException import RepetDataException
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
40
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
41 ## Static methods for the manipulation of Path instances
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
42 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
43 class PathUtils ( object ):
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
44
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
45 ## Change the identifier of each Set instance in the given list
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
46 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
47 # @param lPaths list of Path instances
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
48 # @param newId new identifier
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
49 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
50 def changeIdInList(lPaths, newId):
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
51 for iPath in lPaths:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
52 iPath.id = newId
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
53
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
54 changeIdInList = staticmethod( changeIdInList )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
55
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
56
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
57 ## Return a list of Set instances containing the query range from a list of Path instances
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
58 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
59 # @param lPaths a list of Path instances
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
60 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
61 def getSetListFromQueries(lPaths):
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
62 lSets = []
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
63 for iPath in lPaths:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
64 lSets.append( iPath.getSubjectAsSetOfQuery() )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
65 return lSets
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
66
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
67 getSetListFromQueries = staticmethod( getSetListFromQueries )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
68
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
69 #TODO: add tests !!!!
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
70 ## Return a list of Set instances containing the query range from a list of Path instances
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
71 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
72 # @param lPaths a list of Path instances
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
73 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
74 @staticmethod
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
75 def getSetListFromSubjects(lPaths):
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
76 lSets = []
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
77 for iPath in lPaths:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
78 lSets.append( iPath.getQuerySetOfSubject() )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
79 return lSets
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
80
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
81
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
82 ## Return a sorted list of Range instances containing the subjects from a list of Path instances
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
83 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
84 # @param lPaths a list of Path instances
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
85 # @note meaningful only if all Path instances have same identifier
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
86 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
87 def getRangeListFromSubjects( lPaths ):
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
88 lRanges = []
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
89 for iPath in lPaths:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
90 lRanges.append( iPath.range_subject )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
91 if lRanges[0].isOnDirectStrand():
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
92 return sorted( lRanges, key=lambda iRange: ( iRange.getMin(), iRange.getMax() ) )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
93 else:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
94 return sorted( lRanges, key=lambda iRange: ( iRange.getMax(), iRange.getMin() ) )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
95
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
96 getRangeListFromSubjects = staticmethod( getRangeListFromSubjects )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
97
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
98
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
99 ## Return a tuple with min and max of query coordinates from Path instances in the given list
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
100 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
101 # @param lPaths a list of Path instances
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
102 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
103 def getQueryMinMaxFromPathList(lPaths):
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
104 qmin = -1
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
105 qmax = -1
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
106 for iPath in lPaths:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
107 if qmin == -1:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
108 qmin = iPath.range_query.start
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
109 qmin = min(qmin, iPath.range_query.getMin())
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
110 qmax = max(qmax, iPath.range_query.getMax())
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
111 return (qmin, qmax)
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
112
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
113 getQueryMinMaxFromPathList = staticmethod( getQueryMinMaxFromPathList )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
114
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
115
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
116 ## Return a tuple with min and max of subject coordinates from Path instances in the given list
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
117 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
118 # @param lPaths lists of Path instances
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
119 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
120 def getSubjectMinMaxFromPathList(lPaths):
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
121 smin = -1
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
122 smax = -1
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
123 for iPath in lPaths:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
124 if smin == -1:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
125 smin = iPath.range_subject.start
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
126 smin = min(smin, iPath.range_subject.getMin())
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
127 smax = max(smax, iPath.range_subject.getMax())
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
128 return (smin, smax)
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
129
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
130 getSubjectMinMaxFromPathList = staticmethod( getSubjectMinMaxFromPathList )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
131
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
132
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
133 ## Return True if the query range of any Path instance from the first list overlaps with the query range of any Path instance from the second list
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
134 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
135 # @param lPaths1: list of Path instances
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
136 # @param lPaths2: list of Path instances
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
137 # @return boolean
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
138 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
139 def areQueriesOverlappingBetweenPathLists( lPaths1, lPaths2 ):
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
140 lSortedPaths1 = PathUtils.getPathListSortedByIncreasingMinQueryThenMaxQuery( lPaths1 )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
141 lSortedPaths2 = PathUtils.getPathListSortedByIncreasingMinQueryThenMaxQuery( lPaths2 )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
142 i = 0
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
143 j = 0
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
144 while i != len(lSortedPaths1):
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
145 while j != len(lSortedPaths2):
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
146 if not lSortedPaths1[i].range_query.isOverlapping( lSortedPaths2[j].range_query ):
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
147 j += 1
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
148 else:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
149 return True
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
150 i += 1
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
151 return False
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
152
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
153 areQueriesOverlappingBetweenPathLists = staticmethod( areQueriesOverlappingBetweenPathLists )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
154
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
155
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
156 ## Show Path instances contained in the given list
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
157 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
158 # @param lPaths a list of Path instances
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
159 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
160 def showList(lPaths):
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
161 for iPath in lPaths:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
162 iPath.show()
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
163
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
164 showList = staticmethod( showList )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
165
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
166
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
167 ## Write Path instances contained in the given list
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
168 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
169 # @param lPaths a list of Path instances
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
170 # @param fileName name of the file to write the Path instances
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
171 # @param mode the open mode of the file ""w"" or ""a""
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
172 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
173 def writeListInFile(lPaths, fileName, mode="w"):
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
174 AlignUtils.writeListInFile(lPaths, fileName, mode)
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
175
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
176 writeListInFile = staticmethod( writeListInFile )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
177
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
178
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
179 ## Return new list of Path instances with no duplicate
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
180 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
181 # @param lPaths a list of Path instances
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
182 # @param useOnlyCoord boolean if True, check only coordinates and sequence names
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
183 # @return lUniqPaths a path instances list
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
184 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
185 def getPathListWithoutDuplicates(lPaths, useOnlyCoord = False):
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
186 if len(lPaths) < 2:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
187 return lPaths
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
188 lSortedPaths = PathUtils.getPathListSortedByIncreasingMinQueryThenMaxQueryThenIdentifier( lPaths )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
189 lUniqPaths = [ lSortedPaths[0] ]
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
190 if useOnlyCoord:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
191 for iPath in lSortedPaths[1:]:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
192 if iPath.range_query.start != lUniqPaths[-1].range_query.start \
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
193 or iPath.range_query.end != lUniqPaths[-1].range_query.end \
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
194 or iPath.range_query.seqname != lUniqPaths[-1].range_query.seqname \
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
195 or iPath.range_subject.start != lUniqPaths[-1].range_subject.start \
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
196 or iPath.range_subject.end != lUniqPaths[-1].range_subject.end \
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
197 or iPath.range_subject.seqname != lUniqPaths[-1].range_subject.seqname:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
198 lUniqPaths.append( iPath )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
199 else:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
200 for iPath in lSortedPaths[1:]:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
201 if iPath != lUniqPaths[-1]:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
202 lUniqPaths.append( iPath )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
203 return lUniqPaths
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
204
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
205 getPathListWithoutDuplicates = staticmethod( getPathListWithoutDuplicates )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
206
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
207
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
208 def getPathListWithoutDuplicatesOnQueryCoord(lPaths):
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
209 if len(lPaths) < 2:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
210 return lPaths
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
211 lSortedPaths = PathUtils.getPathListSortedByIncreasingMinQueryThenMaxQueryThenIdentifier( lPaths )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
212 lUniqPaths = [ lSortedPaths[0] ]
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
213 for iPath in lSortedPaths[1:]:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
214 if iPath.range_query.start != lUniqPaths[-1].range_query.start \
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
215 or iPath.range_query.end != lUniqPaths[-1].range_query.end \
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
216 or iPath.range_query.seqname != lUniqPaths[-1].range_query.seqname:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
217 lUniqPaths.append( iPath )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
218 return lUniqPaths
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
219
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
220 getPathListWithoutDuplicatesOnQueryCoord = staticmethod(getPathListWithoutDuplicatesOnQueryCoord)
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
221
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
222
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
223 ## Split a Path list in several Path lists according to the identifier
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
224 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
225 # @param lPaths a list of Path instances
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
226 # @return a dictionary which keys are identifiers and values Path lists
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
227 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
228 def getDictOfListsWithIdAsKey( lPaths ):
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
229 dId2PathList = {}
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
230 for iPath in lPaths:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
231 if dId2PathList.has_key( iPath.id ):
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
232 dId2PathList[ iPath.id ].append( iPath )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
233 else:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
234 dId2PathList[ iPath.id ] = [ iPath ]
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
235 return dId2PathList
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
236
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
237 getDictOfListsWithIdAsKey = staticmethod( getDictOfListsWithIdAsKey )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
238
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
239
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
240 ## Split a Path file in several Path lists according to the identifier
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
241 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
242 # @param pathFile name of the input Path file
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
243 # @return a dictionary which keys are identifiers and values Path lists
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
244 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
245 def getDictOfListsWithIdAsKeyFromFile( pathFile ):
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
246 dId2PathList = {}
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
247 pathFileHandler = open( pathFile, "r" )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
248 while True:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
249 line = pathFileHandler.readline()
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
250 if line == "":
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
251 break
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
252 iPath = Path()
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
253 iPath.setFromString( line )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
254 if dId2PathList.has_key( iPath.id ):
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
255 dId2PathList[ iPath.id ].append( iPath )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
256 else:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
257 dId2PathList[ iPath.id ] = [ iPath ]
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
258 pathFileHandler.close()
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
259 return dId2PathList
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
260
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
261 getDictOfListsWithIdAsKeyFromFile = staticmethod( getDictOfListsWithIdAsKeyFromFile )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
262
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
263
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
264 ## Return a list of Path list(s) obtained while splitting a list of connected Path instances according to another based on query coordinates
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
265 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
266 # @param lToKeep: a list of Path instances to keep (reference)
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
267 # @param lToUnjoin: a list of Path instances to unjoin
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
268 # @return: list of Path list(s) (can be empty if one of the input lists is empty)
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
269 # @warning: all the path instances in a given list MUST be connected (i.e. same identifier)
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
270 # @warning: all the path instances in a given list MUST NOT overlap neither within each other nor with the Path instances of the other list
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
271 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
272 def getPathListUnjoinedBasedOnQuery( lToKeep, lToUnjoin ):
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
273 lSortedToKeep = PathUtils.getPathListSortedByIncreasingMinQueryThenMaxQuery( lToKeep )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
274 lSortedToUnjoin = PathUtils.getPathListSortedByIncreasingMinQueryThenMaxQuery( lToUnjoin )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
275 if lToUnjoin == []:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
276 return []
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
277 if lToKeep == []:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
278 return [ lToUnjoin ]
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
279
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
280 lLists = []
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
281 k = 0
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
282 while k < len(lSortedToKeep):
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
283 j1 = 0
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
284 while j1 < len(lSortedToUnjoin) and lSortedToKeep[k].range_query.getMin() > lSortedToUnjoin[j1].range_query.getMax():
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
285 j1 += 1
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
286 if j1 == len(lSortedToUnjoin):
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
287 break
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
288 if j1 != 0:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
289 lLists.append( lSortedToUnjoin[:j1] )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
290 del lSortedToUnjoin[:j1]
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
291 j1 = 0
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
292 if k+1 == len(lSortedToKeep):
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
293 break
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
294 j2 = j1
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
295 if j2 < len(lSortedToUnjoin) and lSortedToKeep[k+1].range_query.getMin() > lSortedToUnjoin[j2].range_query.getMax():
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
296 while j2 < len(lSortedToUnjoin) and lSortedToKeep[k+1].range_query.getMin() > lSortedToUnjoin[j2].range_query.getMax():
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
297 j2 += 1
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
298 lLists.append( lSortedToUnjoin[j1:j2] )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
299 del lSortedToUnjoin[j1:j2]
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
300 k += 1
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
301
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
302 if lLists != [] or k == 0:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
303 lLists.append( lSortedToUnjoin )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
304 return lLists
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
305
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
306 getPathListUnjoinedBasedOnQuery = staticmethod( getPathListUnjoinedBasedOnQuery )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
307
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
308
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
309 ## Return the identity of the Path list, the identity of each instance being weighted by the length of each query range
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
310 # All Paths should have the same query and subject.
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
311 # The Paths are merged using query coordinates only.
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
312 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
313 # @param lPaths list of Path instances
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
314 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
315 def getIdentityFromPathList( lPaths, checkSubjects=True ):
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
316 if len( PathUtils.getListOfDistinctQueryNames( lPaths ) ) > 1:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
317 msg = "ERROR: try to compute identity from Paths with different queries"
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
318 sys.stderr.write( "%s\n" % msg )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
319 sys.stderr.flush()
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
320 raise Exception
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
321 if checkSubjects and len( PathUtils.getListOfDistinctSubjectNames( lPaths ) ) > 1:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
322 msg = "ERROR: try to compute identity from Paths with different subjects"
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
323 sys.stderr.write( "%s\n" % msg )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
324 sys.stderr.flush()
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
325 raise Exception
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
326 identity = 0
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
327 lMergedPaths = PathUtils.mergePathsInListUsingQueryCoordsOnly( lPaths )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
328 lQuerySets = PathUtils.getSetListFromQueries( lMergedPaths )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
329 lMergedQuerySets = SetUtils.mergeSetsInList( lQuerySets )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
330 totalLengthOnQry = SetUtils.getCumulLength( lMergedQuerySets )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
331 for iPath in lMergedPaths:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
332 identity += iPath.identity * iPath.getLengthOnQuery()
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
333 weightedIdentity = identity / float(totalLengthOnQry)
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
334 if weightedIdentity < 0:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
335 msg = "ERROR: weighted identity '%.2f' outside range" % weightedIdentity
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
336 sys.stderr.write("%s\n" % msg)
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
337 sys.stderr.flush()
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
338 raise Exception
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
339 elif weightedIdentity > 100:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
340 msg = "ERROR: weighted identity '%.2f' outside range" % weightedIdentity
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
341 sys.stderr.write("%s\n" % msg)
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
342 sys.stderr.flush()
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
343 raise RepetDataException(msg)
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
344 return weightedIdentity
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
345
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
346 getIdentityFromPathList = staticmethod( getIdentityFromPathList )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
347
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
348
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
349 ## Return a list of Path instances sorted in increasing order according to the min of the query, then the max of the query, and finally their initial order.
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
350 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
351 # @param lPaths list of Path instances
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
352 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
353 def getPathListSortedByIncreasingMinQueryThenMaxQuery(lPaths):
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
354 return sorted( lPaths, key=lambda iPath: ( iPath.getQueryMin(), iPath.getQueryMax() ) )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
355
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
356 getPathListSortedByIncreasingMinQueryThenMaxQuery = staticmethod( getPathListSortedByIncreasingMinQueryThenMaxQuery )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
357
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
358
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
359 ## Return a list of Path instances sorted in increasing order according to the min of the query, then the max of the query, then their identifier, and finally their initial order.
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
360 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
361 # @param lPaths list of Path instances
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
362 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
363 def getPathListSortedByIncreasingMinQueryThenMaxQueryThenIdentifier(lPaths):
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
364 return sorted( lPaths, key=lambda iPath: ( iPath.getQueryMin(), iPath.getQueryMax(), iPath.getIdentifier() ) )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
365
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
366 getPathListSortedByIncreasingMinQueryThenMaxQueryThenIdentifier = staticmethod( getPathListSortedByIncreasingMinQueryThenMaxQueryThenIdentifier )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
367
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
368
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
369 ## Return a list of Path instances sorted in increasing order according to the min of the query, then the max of the query, then the min of the subject, then the max of the subject and finally their initial order.
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
370 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
371 # @param lPaths list of Path instances
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
372 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
373 @staticmethod
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
374 def getPathListSortedByIncreasingMinQueryThenMaxQueryThenMinSubjectThenMaxSubject(lPaths):
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
375 return sorted(lPaths, key=lambda iPath: (iPath.getQueryMin(), iPath.getQueryMax(), iPath.getSubjectMin(), iPath.getSubjectMax()))
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
376
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
377
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
378 ## Return a list of the distinct identifiers
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
379 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
380 # @param lPaths list of Path instances
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
381 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
382 def getListOfDistinctIdentifiers( lPaths ):
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
383 sDistinctIdentifiers = set()
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
384 for iPath in lPaths:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
385 sDistinctIdentifiers.add(iPath.id)
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
386 return list(sDistinctIdentifiers)
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
387
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
388 getListOfDistinctIdentifiers = staticmethod( getListOfDistinctIdentifiers )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
389
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
390
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
391 ## Return a list of the distinct query names present in the collection
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
392 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
393 # @param lPaths list of Path instances
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
394 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
395 def getListOfDistinctQueryNames( lPaths ):
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
396 sDistinctQueryNames = set()
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
397 for iPath in lPaths:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
398 sDistinctQueryNames.add(iPath.range_query.seqname)
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
399 return list(sDistinctQueryNames)
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
400
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
401 getListOfDistinctQueryNames = staticmethod( getListOfDistinctQueryNames )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
402
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
403
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
404 ## Return a list of the distinct subject names present in the collection
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
405 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
406 # @param lPaths list of Path instances
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
407 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
408 def getListOfDistinctSubjectNames( lPaths ):
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
409 sDistinctSubjectNames = set()
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
410 for iPath in lPaths:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
411 sDistinctSubjectNames.add(iPath.range_subject.seqname)
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
412 return list(sDistinctSubjectNames)
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
413
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
414 getListOfDistinctSubjectNames = staticmethod( getListOfDistinctSubjectNames )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
415
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
416
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
417 ## Return a list of lists containing query coordinates of the connections sorted in increasing order.
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
418 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
419 # @param lConnectedPaths: list of Path instances having the same identifier
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
420 # @param minLength: threshold below which connections are not reported (default= 0 bp)
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
421 # @note: return only connections longer than threshold
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
422 # @note: if coordinate on query ends at 100, return 101
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
423 # @warning: Path instances MUST be sorted in increasing order according to query coordinates
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
424 # @warning: Path instances MUST be on direct query strand (and maybe on reverse subject strand)
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
425 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
426 def getListOfJoinCoordinatesOnQuery(lConnectedPaths, minLength=0):
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
427 lJoinCoordinates = []
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
428 for i in xrange(1,len(lConnectedPaths)):
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
429 startJoin = lConnectedPaths[i-1].range_query.end
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
430 endJoin = lConnectedPaths[i].range_query.start
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
431 if endJoin - startJoin + 1 > minLength:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
432 lJoinCoordinates.append( [ startJoin + 1, endJoin - 1 ] )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
433 return lJoinCoordinates
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
434
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
435 getListOfJoinCoordinatesOnQuery = staticmethod( getListOfJoinCoordinatesOnQuery )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
436
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
437
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
438 ## Return the length on the query of all Path instance in the given list
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
439 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
440 # @param lPaths list of Path instances
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
441 # @note overlapping ranges are not summed but truncated.
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
442 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
443 def getLengthOnQueryFromPathList( lPaths ):
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
444 lSets = PathUtils.getSetListFromQueries( lPaths )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
445 lMergedSets = SetUtils.mergeSetsInList( lSets )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
446 length = SetUtils.getCumulLength( lMergedSets )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
447 return length
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
448
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
449 getLengthOnQueryFromPathList = staticmethod( getLengthOnQueryFromPathList )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
450
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
451
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
452 ## Convert a Path file into an Align file
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
453 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
454 # @param pathFile: name of the input Path file
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
455 # @param alignFile: name of the output Align file
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
456 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
457 def convertPathFileIntoAlignFile(pathFile, alignFile):
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
458 pathFileHandler = open( pathFile, "r" )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
459 alignFileHandler = open( alignFile, "w" )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
460 iPath = Path()
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
461 while True:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
462 line = pathFileHandler.readline()
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
463 if line == "":
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
464 break
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
465 iPath.setFromString( line )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
466 iAlign = iPath.getAlignInstance()
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
467 iAlign.write( alignFileHandler )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
468 pathFileHandler.close()
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
469 alignFileHandler.close()
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
470
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
471 convertPathFileIntoAlignFile = staticmethod( convertPathFileIntoAlignFile )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
472
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
473 #TODO: duplicated method => to rename with the name of the next method (which is called) ?
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
474 ## Convert a Path File into a Map file with query coordinates only
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
475 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
476 # @param pathFile: name of the input Path file
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
477 # @param mapFile: name of the output Map file
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
478 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
479 def convertPathFileIntoMapFileWithQueryCoordsOnly( pathFile, mapFile ):
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
480 pathFileHandler = open( pathFile, "r" )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
481 mapFileHandler = open( mapFile, "w" )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
482 p = Path()
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
483 while True:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
484 line = pathFileHandler.readline()
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
485 if line == "":
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
486 break
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
487 p.reset()
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
488 p.setFromTuple( line.split("\t") )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
489 p.writeSubjectAsMapOfQuery( mapFileHandler )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
490 pathFileHandler.close()
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
491 mapFileHandler.close()
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
492
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
493 convertPathFileIntoMapFileWithQueryCoordsOnly = staticmethod( convertPathFileIntoMapFileWithQueryCoordsOnly )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
494
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
495
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
496 ## for each line of a given Path file, write the coordinates of the subject on the query as one line in a Map file
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
497 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
498 # @param pathFile: name of the input Path file
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
499 # @param mapFile: name of the output Map file
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
500 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
501 def convertPathFileIntoMapFileWithSubjectsOnQueries( pathFile, mapFile ):
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
502 PathUtils.convertPathFileIntoMapFileWithQueryCoordsOnly( pathFile, mapFile )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
503 convertPathFileIntoMapFileWithSubjectsOnQueries = staticmethod( convertPathFileIntoMapFileWithSubjectsOnQueries )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
504
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
505
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
506 ## Merge matches on queries
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
507 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
508 # @param inFile: name of the input Path file
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
509 # @param outFile: name of the output Path file
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
510 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
511 def mergeMatchesOnQueries(inFile, outFile):
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
512 mapFile = "%s.map" % ( inFile )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
513 PathUtils.convertPathFileIntoMapFileWithQueryCoordsOnly( inFile, mapFile )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
514 cmd = "mapOp"
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
515 cmd += " -q %s" % ( mapFile )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
516 cmd += " -m"
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
517 cmd += " 2>&1 > /dev/null"
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
518 exitStatus = os.system( cmd )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
519 if exitStatus != 0:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
520 print "ERROR: mapOp returned %i" % ( exitStatus )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
521 sys.exit(1)
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
522 os.remove( mapFile )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
523 mergeFile = "%s.merge" % ( mapFile )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
524 mergeFileHandler = open( mergeFile, "r" )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
525 outFileHandler = open( outFile, "w" )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
526 m = Map()
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
527 while True:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
528 line = mergeFileHandler.readline()
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
529 if line == "":
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
530 break
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
531 m.reset()
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
532 m.setFromString( line, "\t" )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
533 m.writeAsQueryOfPath( outFileHandler )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
534 mergeFileHandler.close()
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
535 os.remove( mergeFile )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
536 outFileHandler.close()
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
537
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
538 mergeMatchesOnQueries = staticmethod( mergeMatchesOnQueries )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
539
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
540
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
541 ## Filter chains of Path(s) which length is below a given threshold
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
542 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
543 # @param lPaths: list of Path instances
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
544 # @param minLengthChain: minimum length of a chain to be kept
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
545 # @note: a chain may contain a single Path instance
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
546 # @return: a list of Path instances
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
547 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
548 def filterPathListOnChainLength( lPaths, minLengthChain ):
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
549 lFilteredPaths = []
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
550 dPathnum2Paths = PathUtils.getDictOfListsWithIdAsKey( lPaths )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
551 for pathnum in dPathnum2Paths.keys():
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
552 length = PathUtils.getLengthOnQueryFromPathList( dPathnum2Paths[ pathnum ] )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
553 if length >= minLengthChain:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
554 lFilteredPaths += dPathnum2Paths[ pathnum ]
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
555 return lFilteredPaths
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
556
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
557 filterPathListOnChainLength = staticmethod( filterPathListOnChainLength )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
558
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
559
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
560 ## Return a Path list from a Path file
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
561 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
562 # @param pathFile string name of a Path file
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
563 # @return a list of Path instances
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
564 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
565 def getPathListFromFile( pathFile ):
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
566 lPaths = []
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
567 pathFileHandler = open( pathFile, "r" )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
568 while True:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
569 line = pathFileHandler.readline()
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
570 if line == "":
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
571 break
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
572 iPath = Path()
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
573 iPath.setFromString( line )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
574 lPaths.append( iPath )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
575 pathFileHandler.close()
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
576 return lPaths
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
577
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
578 getPathListFromFile = staticmethod( getPathListFromFile )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
579
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
580
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
581 ## Convert a chain into a 'pathrange'
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
582 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
583 # @param lPaths a list of Path instances with the same identifier
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
584 # @note: the min and max of each Path is used
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
585 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
586 def convertPathListToPathrange( lPaths ):
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
587 if len(lPaths) == 0:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
588 return
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
589 if len(lPaths) == 1:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
590 return lPaths[0]
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
591 iPathrange = copy.deepcopy( lPaths[0] )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
592 iPathrange.identity = lPaths[0].identity * lPaths[0].getLengthOnQuery()
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
593 cumulQueryLength = iPathrange.getLengthOnQuery()
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
594 for iPath in lPaths[1:]:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
595 if iPath.id != iPathrange.id:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
596 msg = "ERROR: two Path instances in the chain have different identifiers"
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
597 sys.stderr.write( "%s\n" % ( msg ) )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
598 sys.exit(1)
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
599 if iPathrange.range_subject.isOnDirectStrand() != iPath.range_subject.isOnDirectStrand():
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
600 msg = "ERROR: two Path instances in the chain are on different strands"
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
601 sys.stderr.write( "%s\n" % ( msg ) )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
602 sys.exit(1)
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
603 iPathrange.range_query.start = min( iPathrange.range_query.start, iPath.range_query.start )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
604 iPathrange.range_query.end = max( iPathrange.range_query.end, iPath.range_query.end )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
605 if iPathrange.range_subject.isOnDirectStrand():
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
606 iPathrange.range_subject.start = min( iPathrange.range_subject.start, iPath.range_subject.start )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
607 iPathrange.range_subject.end = max( iPathrange.range_subject.end, iPath.range_subject.end )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
608 else:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
609 iPathrange.range_subject.start = max( iPathrange.range_subject.start, iPath.range_subject.start )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
610 iPathrange.range_subject.end = min( iPathrange.range_subject.end, iPath.range_subject.end )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
611 iPathrange.e_value = min( iPathrange.e_value, iPath.e_value )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
612 iPathrange.score += iPath.score
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
613 iPathrange.identity += iPath.identity * iPath.getLengthOnQuery()
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
614 cumulQueryLength += iPath.getLengthOnQuery()
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
615 iPathrange.identity = iPathrange.identity / float(cumulQueryLength)
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
616 return iPathrange
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
617
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
618 convertPathListToPathrange = staticmethod( convertPathListToPathrange )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
619
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
620
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
621 ## Convert a Path file into an Align file via 'pathrange'
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
622 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
623 # @param pathFile: name of the input Path file
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
624 # @param alignFile: name of the output Align file
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
625 # @param verbose integer verbosity level
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
626 # @note: the min and max of each Path is used
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
627 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
628 def convertPathFileIntoAlignFileViaPathrange( pathFile, alignFile, verbose=0 ):
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
629 lPaths = PathUtils.getPathListFromFile( pathFile )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
630 dId2PathList = PathUtils.getDictOfListsWithIdAsKey( lPaths )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
631 lIds = dId2PathList.keys()
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
632 lIds.sort()
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
633 if verbose > 0:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
634 msg = "number of chains: %i" % ( len(lIds) )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
635 sys.stdout.write( "%s\n" % ( msg ) )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
636 sys.stdout.flush()
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
637 alignFileHandler = open( alignFile, "w" )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
638 for identifier in lIds:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
639 iPath = PathUtils.convertPathListToPathrange( dId2PathList[ identifier ] )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
640 iAlign = iPath.getAlignInstance()
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
641 iAlign.write( alignFileHandler )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
642 alignFileHandler.close()
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
643
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
644 convertPathFileIntoAlignFileViaPathrange = staticmethod( convertPathFileIntoAlignFileViaPathrange )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
645
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
646
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
647 ## Split a list of Path instances according to the name of the query
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
648 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
649 # @param lInPaths list of align instances
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
650 # @return lOutPathLists list of align instances lists
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
651 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
652 def splitPathListByQueryName( lInPaths ):
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
653 lInSortedPaths = sorted( lInPaths, key=lambda o: o.range_query.seqname )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
654 lOutPathLists = []
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
655 if len(lInSortedPaths) != 0 :
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
656 lPathsForCurrentQuery = []
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
657 previousQuery = lInSortedPaths[0].range_query.seqname
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
658 for iPath in lInSortedPaths :
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
659 currentQuery = iPath.range_query.seqname
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
660 if previousQuery != currentQuery :
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
661 lOutPathLists.append( lPathsForCurrentQuery )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
662 previousQuery = currentQuery
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
663 lPathsForCurrentQuery = []
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
664 lPathsForCurrentQuery.append( iPath )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
665
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
666 lOutPathLists.append(lPathsForCurrentQuery)
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
667
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
668 return lOutPathLists
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
669
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
670 splitPathListByQueryName = staticmethod( splitPathListByQueryName )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
671
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
672
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
673 ## Create an Path file from each list of Path instances in the input list
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
674 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
675 # @param lPathList list of lists with Path instances
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
676 # @param pattern string
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
677 # @param dirName string
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
678 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
679 def createPathFiles( lPathList, pattern, dirName="" ):
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
680 nbFiles = len(lPathList)
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
681 countFile = 1
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
682 if dirName != "" :
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
683 if dirName[-1] != "/":
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
684 dirName = dirName + '/'
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
685 os.mkdir( dirName )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
686
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
687 for lPath in lPathList:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
688 fileName = dirName + pattern + "_%s.path" % ( str(countFile).zfill( len(str(nbFiles)) ) )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
689 PathUtils.writeListInFile( lPath, fileName )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
690 countFile += 1
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
691
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
692 createPathFiles = staticmethod( createPathFiles )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
693
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
694
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
695 ## Return a list of Path instances sorted in increasing order according to the min, then the inverse of the query length, and finally their initial order
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
696 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
697 # @param lPaths: list of Path instances
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
698 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
699 def getPathListSortedByIncreasingQueryMinThenInvQueryLength( lPaths ):
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
700 return sorted( lPaths, key=lambda iPath: ( iPath.getQueryMin(), 1 / float(iPath.getLengthOnQuery()) ) )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
701
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
702 getPathListSortedByIncreasingQueryMinThenInvQueryLength = staticmethod( getPathListSortedByIncreasingQueryMinThenInvQueryLength )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
703
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
704
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
705 ## Merge all overlapping Path instances in a list without considering the identifiers
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
706 # Start by sorting the Path instances by their increasing min coordinate
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
707 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
708 # @return: a new list with the merged Path instances
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
709 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
710 def mergePathsInList( lPaths ):
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
711 lMergedPaths = []
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
712 if len(lPaths)==0:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
713 return lMergedPaths
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
714
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
715 lSortedPaths = PathUtils.getPathListSortedByIncreasingQueryMinThenInvQueryLength( lPaths )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
716
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
717 prev_count = 0
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
718 for iPath in lSortedPaths[0:]:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
719 if prev_count != len(lSortedPaths):
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
720 for i in lSortedPaths[ prev_count + 1: ]:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
721 if iPath.isOverlapping( i ):
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
722 iPath.merge( i )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
723 isAlreadyInList = False
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
724 for newPath in lMergedPaths:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
725 if newPath.isOverlapping( iPath ):
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
726 isAlreadyInList = True
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
727 newPath.merge( iPath )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
728 lMergedPaths [ lMergedPaths.index( newPath ) ] = newPath
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
729 if not isAlreadyInList:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
730 lMergedPaths.append( iPath )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
731 prev_count += 1
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
732 return lMergedPaths
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
733
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
734 mergePathsInList = staticmethod( mergePathsInList )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
735
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
736
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
737 ## Merge all overlapping Path instances in a list without considering if subjects are overlapping.
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
738 # Start by sorting the Path instances by their increasing min coordinate.
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
739 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
740 # @return: a new list with the merged Path instances
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
741 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
742 def mergePathsInListUsingQueryCoordsOnly( lPaths ):
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
743 lMergedPaths = []
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
744 if len(lPaths)==0:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
745 return lMergedPaths
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
746
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
747 lSortedPaths = PathUtils.getPathListSortedByIncreasingQueryMinThenInvQueryLength( lPaths )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
748
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
749 prev_count = 0
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
750 for iPath in lSortedPaths[0:]:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
751 if prev_count != len(lSortedPaths):
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
752 for i in lSortedPaths[ prev_count + 1: ]:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
753 if iPath.isQueryOverlapping( i ):
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
754 iPath.merge( i )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
755 isAlreadyInList = False
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
756 for newPath in lMergedPaths:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
757 if newPath.isQueryOverlapping( iPath ):
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
758 isAlreadyInList = True
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
759 newPath.merge( iPath )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
760 lMergedPaths [ lMergedPaths.index( newPath ) ] = newPath
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
761 if not isAlreadyInList:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
762 lMergedPaths.append( iPath )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
763 prev_count += 1
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
764 return lMergedPaths
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
765
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
766 mergePathsInListUsingQueryCoordsOnly = staticmethod( mergePathsInListUsingQueryCoordsOnly )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
767
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
768
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
769 ## Convert a Path file into a GFF file
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
770 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
771 # @param pathFile: name of the input Path file
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
772 # @param gffFile: name of the output GFF file
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
773 # @param source: source to write in the GFF file (column 2)
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
774 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
775 # @note the 'path' query is supposed to correspond to the 'gff' first column
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
776 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
777 def convertPathFileIntoGffFile( pathFile, gffFile, source="REPET", verbose=0 ):
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
778 dId2PathList = PathUtils.getDictOfListsWithIdAsKeyFromFile( pathFile )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
779 if verbose > 0:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
780 msg = "number of chains: %i" % ( len(dId2PathList.keys()) )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
781 sys.stdout.write( "%s\n" % msg )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
782 sys.stdout.flush()
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
783 gffFileHandler = open( gffFile, "w" )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
784 for id in dId2PathList.keys():
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
785 if len( dId2PathList[ id ] ) == 1:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
786 iPath = dId2PathList[ id ][0]
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
787 string = iPath.toStringAsGff( ID="%i" % iPath.getIdentifier(),
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
788 source=source )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
789 gffFileHandler.write( "%s\n" % string )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
790 else:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
791 iPathrange = PathUtils.convertPathListToPathrange( dId2PathList[ id ] )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
792 string = iPathrange.toStringAsGff( ID="ms%i" % iPathrange.getIdentifier(),
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
793 source=source )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
794 gffFileHandler.write( "%s\n" % string )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
795 count = 0
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
796 for iPath in dId2PathList[ id ]:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
797 count += 1
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
798 string = iPath.toStringAsGff( type="match_part",
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
799 ID="mp%i-%i" % ( iPath.getIdentifier(), count ),
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
800 Parent="ms%i" % iPathrange.getIdentifier(),
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
801 source=source )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
802 gffFileHandler.write( "%s\n" % string )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
803 gffFileHandler.close()
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
804
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
805 convertPathFileIntoGffFile = staticmethod( convertPathFileIntoGffFile )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
806
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
807
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
808 ## Convert a Path file into a Set file
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
809 # replace old parser.pathrange2set
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
810 # @param pathFile: name of the input Path file
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
811 # @param setFile: name of the output Set file
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
812 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
813 def convertPathFileIntoSetFile( pathFile, setFile ):
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
814 pathFileHandler = open( pathFile, "r" )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
815 setFileHandler = open( setFile, "w" )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
816 iPath = Path()
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
817 while True:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
818 line = pathFileHandler.readline()
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
819 if line == "":
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
820 break
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
821 iPath.setFromString( line )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
822 iSet = iPath.getSubjectAsSetOfQuery()
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
823 iSet.write( setFileHandler )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
824 pathFileHandler.close()
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
825 setFileHandler.close()
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
826
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
827 convertPathFileIntoSetFile = staticmethod( convertPathFileIntoSetFile )
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
828
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
829 ## Write Path File without duplicated Path (same query, same subject and same coordinate)
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
830 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
831 # @param inputFile: name of the input Path file
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
832 # @param outputFile: name of the output Path file
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
833 #
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
834 def removeInPathFileDuplicatedPathOnQueryNameQueryCoordAndSubjectName(inputFile, outputFile):
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
835 f = open(inputFile, "r")
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
836 line = f.readline()
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
837 previousQuery = ""
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
838 previousSubject = ""
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
839 lPaths = []
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
840 while line:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
841 iPath = Path()
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
842 iPath.setFromString(line)
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
843 query = iPath.getQueryName()
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
844 subject = iPath.getSubjectName()
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
845 if (query != previousQuery or subject != previousSubject) and lPaths != []:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
846 lPathsWithoutDuplicate = PathUtils.getPathListWithoutDuplicatesOnQueryCoord(lPaths)
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
847 PathUtils.writeListInFile(lPathsWithoutDuplicate, outputFile, "a")
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
848 lPaths = []
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
849 lPaths.append(iPath)
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
850 previousQuery = query
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
851 previousSubject = subject
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
852 line = f.readline()
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
853 lPathsWithoutDuplicate = PathUtils.getPathListWithoutDuplicatesOnQueryCoord(lPaths)
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
854 PathUtils.writeListInFile(lPathsWithoutDuplicate, outputFile, "a")
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
855 f.close()
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
856 removeInPathFileDuplicatedPathOnQueryNameQueryCoordAndSubjectName = staticmethod(removeInPathFileDuplicatedPathOnQueryNameQueryCoordAndSubjectName)
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
857
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
858