annotate smart_toolShed/SMART/Java/Python/CountReadGCPercent.py @ 0:e0f8dcca02ed
Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
| author | yufei-luo | 
|---|---|
| date | Thu, 17 Jan 2013 10:52:14 -0500 | 
| parents | |
| children | |
#!/usr/bin/env python

from optparse import OptionParser
from commons.core.parsing.FastaParser import FastaParser
from commons.core.writer.Gff3Writer import Gff3Writer
from SMART.Java.Python.structure.TranscriptContainer import TranscriptContainer
from SMART.Java.Python.misc.Progress import Progress
from commons.core.utils.RepetOptionParser import RepetOptionParser
from Gnome_tools.CountGCPercentBySlidingWindow import CountGCPercentBySlidingWindow


class CountReadGCPercent(object):
    """Tag each transcript of a GFF3 file with the GC and N percentages
    of its sequence, read from a reference FASTA file."""

    def __init__(self):
        self.referenceReader = None
        self.gffReader = None
        self.outputWriter = None
        self.verbose = 0

    def setInputReferenceFile(self, fileName):
        # Only the file name is stored; the FASTA parser is created in write().
        self.referenceReader = fileName

    def setInputGffFile(self, fileName):
        self.gffReader = TranscriptContainer(fileName, 'gff3', self.verbose)

    def setOutputFileName(self, fileName):
        self.outputWriter = Gff3Writer(fileName, self.verbose)

    def readGffAnnotation(self):
        # Record every exonic position, per chromosome.
        # Note: coveredRegions is populated here but not read elsewhere in this file.
        self.coveredRegions = {}
        progress = Progress(self.gffReader.getNbTranscripts(), "Reading gff3 annotation file", self.verbose)
        for transcript in self.gffReader.getIterator():
            chromosome = transcript.getChromosome()
            if chromosome not in self.coveredRegions:
                self.coveredRegions[chromosome] = {}
            for exon in transcript.getExons():
                # Exon coordinates are inclusive, hence the +1.
                for position in range(exon.getStart(), exon.getEnd()+1):
                    self.coveredRegions[chromosome][position] = 1
            progress.inc()
        progress.done()

    def write(self):
        iParser = FastaParser(self.referenceReader)
        iParser.setTags()
        iGetGCPercentBySW = CountGCPercentBySlidingWindow()
        progress = Progress(self.gffReader.getNbTranscripts(), "Writing output file", self.verbose)
        for transcript in self.gffReader.getIterator():
            chromosome = transcript.getChromosome()
            GCpercent = 0
            nPercent = 0
            # GCpercent and nPercent are overwritten for each exon, so a
            # multi-exon transcript is tagged with the last exon's values.
            for exon in transcript.getExons():
                for sequenceName in iParser.getTags().keys():
                    if sequenceName != chromosome:
                        continue
                    else:
                        subSequence = iParser.getSubSequence(sequenceName, exon.getStart(), exon.getEnd(), 1)
                        GCpercent, nPercent = iGetGCPercentBySW.getGCPercentAccordingToNAndNPercent(subSequence)
                        print("GCpercent = %f, nPercent = %f" % (GCpercent, nPercent))
            transcript.setTagValue("GCpercent", GCpercent)
            transcript.setTagValue("NPercent", nPercent)
            self.outputWriter.addTranscript(transcript)
            progress.inc()
        progress.done()

    def run(self):
        self.readGffAnnotation()
        if self.outputWriter is not None:
            self.write()


if __name__ == "__main__":
    description = "Count GC percent for each read against a genome."
    usage = "CountReadGCPercent.py -i <fasta file> -j <gff3 file> -o <output gff3 file> [-v <verbose>] [-h]"
    examples = "\nExample: \n"
    examples += "\t$ python CountReadGCPercent.py -i file.fasta -j annotation.gff -o output.gff3"
    examples += "\n\n"
    parser = RepetOptionParser(description = description, usage = usage, version = "v1.0", epilog = examples)
    parser.add_option( '-i', '--inputGenome', dest='fastaFile', help='fasta file [compulsory]', default= None )
    parser.add_option( '-j', '--inputAnnotation', dest='gffFile', help='gff3 file [compulsory]', default= None)
    parser.add_option( '-o', '--output', dest='outputFile', help='output gff3 file [compulsory]', default= None )
    parser.add_option( '-v', '--verbose', dest='verbose', help='verbosity level (default=0/1)',type="int", default= 0 )
    (options, args) = parser.parse_args()

    readGCPercent = CountReadGCPercent()
    readGCPercent.setInputReferenceFile(options.fastaFile)
    readGCPercent.setInputGffFile(options.gffFile)
    readGCPercent.setOutputFileName(options.outputFile)
    readGCPercent.run()
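
The script delegates the per-exon computation to `CountGCPercentBySlidingWindow.getGCPercentAccordingToNAndNPercent`, whose source is not part of this file. The following is only a minimal sketch of what such a helper might compute. It assumes that the GC percentage is taken over the non-N bases and the N percentage over the full sequence length; both are assumptions, and the function name `gc_and_n_percent` is hypothetical rather than part of S-MART.

```python
def gc_and_n_percent(sequence):
    """Return (gc_percent, n_percent) for a nucleotide string.

    Assumption: GC percent is computed over non-N bases only, and
    N percent over the whole sequence. The real S-MART helper may differ.
    """
    sequence = sequence.upper()
    length = len(sequence)
    if length == 0:
        # Avoid dividing by zero on an empty sequence.
        return 0.0, 0.0
    n_count = sequence.count("N")
    gc_count = sequence.count("G") + sequence.count("C")
    informative = length - n_count  # bases that are not N
    gc_percent = 100.0 * gc_count / informative if informative else 0.0
    n_percent = 100.0 * n_count / length
    return gc_percent, n_percent


if __name__ == "__main__":
    gc, n = gc_and_n_percent("ACGTNNGC")
    print("GCpercent = %f, nPercent = %f" % (gc, n))
```

Excluding N from the denominator keeps masked or unknown bases from diluting the GC value, while the separate N percentage still shows how much of the region was uninformative.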
