pal_finder: pal_filter.py annotate

annotate pal_filter.py @ 9:52dbe2089d14 draft default tip

Version 0.02.04.8 (update fastq subsetting).

author	pjbriggs
date	Wed, 04 Jul 2018 06:05:52 -0400
parents	8159dab5dbdb
children

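Annotated lines come from two changesets by pjbriggs:
rev 3  e1a14ed7a9d6  Updated to version 0.02.04.4 (new pal_filter script)
rev 4  cb56cc1d5c39  Updates to the palfilter.py utility. (parent: rev 3)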
#!/usr/bin/python -tt
"""
pal_filter
https://github.com/graemefox/pal_filter

Graeme Fox - 03/03/2016 - graeme.fox@manchester.ac.uk
Tested on 64-bit Ubuntu, with Python 2.7

~~~~~~~~~~~~~~~~~~~
PROGRAM DESCRIPTION

Program to pick optimum loci from the output of pal_finder_v0.02.04

This program can be used to filter output from pal_finder and choose the
'optimum' loci.

For the paper referencing this workflow, see Griffiths et al.
(unpublished as of 15/02/2016) (sarah.griffiths-5@postgrad.manchester.ac.uk)

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This program also contains a quality-check method to improve the rate of PCR
success. For this QC method, paired-end reads are assembled using
PANDAseq, so you must have PANDAseq installed.

For the paper referencing this assembly-QC method, see Fox et al.
(unpublished as of 15/02/2016) (graeme.fox@manchester.ac.uk)

For best results in PCR for marker development, I suggest enabling all the
filter options AND the assembly-based QC.

~~~~~~~~~~~~
REQUIREMENTS

Must have Biopython installed (www.biopython.org).

If you wish to perform the assembly QC step, you must have:
PANDAseq (https://github.com/neufeld/pandaseq)
PANDAseq must be on your $PATH, i.e. runnable from any directory.

~~~~~~~~~~~~~~~~
REQUIRED OPTIONS

-i forward_paired_ends.fastQ
-j reverse_paired_ends.fastQ
-p pal_finder output - by default pal_finder names this "x_PAL_summary.txt"

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
BY DEFAULT THIS PROGRAM DOES NOTHING. ENABLE SOME OF THE OPTIONS BELOW.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NON-REQUIRED OPTIONS

-assembly: turn on the PANDAseq assembly QC step

-primers: filter microsatellite loci to just those which have primers designed

-occurrences: filter microsatellite loci to those with primers
which appear only once in the dataset

-rankmotifs: filter microsatellite loci to just those with perfect motifs.
Rank the output by size of motif (largest first).

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
For repeat analysis, the following extra non-required options may be useful:

Since PANDAseq assembly and fastq -> fasta conversion are slow, run them
once to generate the files, then skip either or both steps on later runs
with the following:

-a: skip assembly step
-c: skip fastq -> fasta conversion step

Just make sure the assembled/converted files stay in the correct directory
under the filename(s) the first run gave them.

~~~~~~~~~~~~~~~
EXAMPLE USAGE:

pal_filter.py -i R1.fastq -j R2.fastq
    -p pal_finder_output.tabular -primers -occurrences -rankmotifs -assembly

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"""
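The docstring asks users to keep converted files "under the filename(s) the first run gave them" before using `-c`. The sketch below (Python 3, standard library only) makes that concrete. It assumes the `-c` step reuses the files written by `fastq_to_fasta()`, which names its output `<basename>_filtered.fasta` as a relative path, so the file lands in the current working directory. The helpers `filtered_fasta_name` and `check_skip_conversion` are hypothetical and are not part of pal_filter.

```python
import os


def filtered_fasta_name(fastq_path):
    # Mirror fastq_to_fasta(): drop the directory and the extension, then add
    # "_filtered.fasta". The result is relative, so it resolves against the
    # current working directory.
    stem = os.path.splitext(os.path.basename(fastq_path))[0]
    return stem + "_filtered.fasta"


def check_skip_conversion(fastq_paths, skip_conversion):
    # Hypothetical pre-flight check: when -c is given, every converted file
    # must already exist; otherwise fail early with a clear message.
    names = [filtered_fasta_name(p) for p in fastq_paths]
    if skip_conversion:
        missing = [n for n in names if not os.path.exists(n)]
        if missing:
            raise IOError("-c given but converted file(s) missing: "
                          + ", ".join(missing))
    return names


if __name__ == "__main__":
    # → ['R1_filtered.fasta', 'R2_filtered.fasta']
    print(check_skip_conversion(["/data/R1.fastq", "/data/R2.fastq"], False))
```

Failing fast like this would surface a moved or renamed file before the slow indexing step runs, rather than partway through filtering.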
import Bio, subprocess, argparse, csv, os, re, time
from Bio import SeqIO
__version__ = "1.0.0"
############################################################
# Function List #
############################################################
def ReverseComplement1(seq):
    """
    take a nucleotide sequence and reverse-complement it
    """
    seq_dict = {'A':'T','T':'A','G':'C','C':'G'}
    return "".join([seq_dict[base] for base in reversed(seq)])

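`ReverseComplement1` looks up each base in a dictionary. A common alternative, sketched below in Python 3, builds a translation table once with `str.maketrans` and reverses the translated string. As written it also maps `N` and lower-case bases, which is an illustrative extension and not necessarily the behaviour the author intended. The name `reverse_complement` is hypothetical.

```python
# Translation table built once: complements upper- and lower-case bases and
# leaves the ambiguity code N (n) unchanged.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    # Complement every base, then reverse the result.
    return seq.translate(_COMPLEMENT)[::-1]


print(reverse_complement("AACGTN"))  # → NACGTT
```

Any character missing from the table (for example another IUPAC code such as `R`) passes through unchanged rather than raising, so validate input first if that matters.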
def fastq_to_fasta(input_file, wanted_set):
    """
    take a file in fastq format, convert to fasta format and filter on
    the set of sequences that we want to keep
    """
    file_name = os.path.splitext(os.path.basename(input_file))[0]
    with open(file_name + "_filtered.fasta", "w") as out:
        for record in SeqIO.parse(input_file, "fastq"):
            ID = str(record.id)
            SEQ = str(record.seq)
            if ID in wanted_set:
                out.write(">" + ID + "\n" + SEQ + "\n")

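`fastq_to_fasta` relies on Biopython's `SeqIO.parse`. For illustration only, the same filter-and-convert step can be sketched with the standard library (Python 3). This sketch assumes every FASTQ record occupies exactly four unwrapped lines (header, sequence, `+`, qualities). Like Biopython's `record.id`, it takes the ID to be the header text up to the first whitespace. The generator name `fastq_to_fasta_lines` is hypothetical.

```python
import io


def fastq_to_fasta_lines(fastq_handle, wanted_set):
    # Yield FASTA text for each wanted record.
    # Assumes unwrapped 4-line records: @header, sequence, '+', qualities.
    while True:
        header = fastq_handle.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # tolerate blank lines between or after records
        seq = fastq_handle.readline().rstrip("\r\n")
        fastq_handle.readline()  # '+' separator line
        fastq_handle.readline()  # quality line, not needed for FASTA
        fields = header[1:].split()
        if not header.startswith("@") or not fields:
            raise ValueError("malformed FASTQ header: " + header.strip())
        record_id = fields[0]  # ID = header text up to the first whitespace
        if record_id in wanted_set:
            yield ">" + record_id + "\n" + seq + "\n"


example = io.StringIO("@read1 extra\nACGT\n+\nIIII\n@read2\nGGCC\n+\nIIII\n")
# prints two lines: ">read2" then "GGCC"
print("".join(fastq_to_fasta_lines(example, {"read2"})), end="")
```

Because it is a generator, memory use stays flat. A caller would write the yielded text to a `*_filtered.fasta` file, as `fastq_to_fasta` does.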
def strip_barcodes(input_file, wanted_set):
    """
    take a fasta file whose record IDs carry sequencing barcodes and strip
    the barcode (everything after the last colon) from each ID. Filter on
    the set of sequences that we want to keep
    """
    file_name = os.path.splitext(os.path.basename(input_file))[0]
    with open(file_name + "_adapters_removed.fasta", "w") as out:
        for record in SeqIO.parse(input_file, "fasta"):
            match = re.search(r'\S*:', record.id)
            if match:
                correct = match.group().rstrip(":")
            else:
                correct = str(record.id)
            SEQ = str(record.seq)
            if correct in wanted_set:
                out.write(">" + correct + "\n" + SEQ + "\n")

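The regular expression in `strip_barcodes` is greedy. `\S*:` matches as many non-space characters as possible, so the ID is cut at its last colon, and an ID with no colon is kept whole. The short Python 3 check below isolates that rule. The example IDs are made up for illustration and are not taken from real PANDAseq output.

```python
import re


def strip_after_last_colon(record_id):
    # Same rule as strip_barcodes(): greedy match up to the last ':',
    # then drop the trailing ':'; IDs without a colon pass through unchanged.
    match = re.search(r'\S*:', record_id)
    if match:
        return match.group().rstrip(":")
    return record_id


print(strip_after_last_colon("M00001:12:FC1:1:1101:1552:1331:ACGTAC"))
# → M00001:12:FC1:1:1101:1552:1331
print(strip_after_last_colon("read_without_colon"))
# → read_without_colon
```

Because `rstrip(":")` removes every trailing colon, an ID such as `a::b` collapses to `a`.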
############################################################
# MAIN PROGRAM #
############################################################
print "\n~~~~~~~~~~"
print "pal_filter"
print "~~~~~~~~~~"
print "Version: " + __version__
time.sleep(1)
print "\nFind the optimum loci in your pal_finder output and increase "\
      "the rate of successful microsatellite marker development"
print "\nSee Griffiths et al. (currently unpublished) for more details......"
time.sleep(2)

# Get values for all the required and optional arguments

parser = argparse.ArgumentParser(description='pal_filter')
parser.add_argument('-i', '--input1', help='Forward paired-end fastq file',
                    required=True)

parser.add_argument('-j', '--input2', help='Reverse paired-end fastq file',
                    required=True)

parser.add_argument('-p', '--pal_finder', help='Output from pal_finder',
                    required=True)

parser.add_argument('-assembly', help='Perform the PANDAseq-based QC',
                    action='store_true')

parser.add_argument('-a', '--skip_assembly', help='If the assembly has '
                    'already been run, skip it with -a', action='store_true')

parser.add_argument('-c', '--skip_conversion', help='If the fastq to fasta '
                    'conversion has already been run, skip it with -c',
                    action='store_true')

parser.add_argument('-primers', help='Filter pal_finder output to just those '
                    'loci which have primers designed', action='store_true')

parser.add_argument('-occurrences',
                    help='Filter pal_finder output to just loci with primers '
                    'which only occur once in the dataset',
                    action='store_true')

parser.add_argument('-rankmotifs',
                    help='Filter pal_finder output to just loci which are a '
                    'perfect repeat unit. Also, rank the loci by motif size '
                    '(largest first)', action='store_true')

parser.add_argument('-v', '--get_version', help='Print the version number of '
                    'this pal_filter script', action='store_true')

args = parser.parse_args()
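The parser mixes argparse's usual short and long options with single-dash, multi-character flags such as `-primers` and `-assembly`. argparse accepts these because an argument that exactly matches a registered option string is resolved before any attempt to split it into short options. Each `store_true` flag defaults to `False`. The Python 3 harness below rebuilds the options (without help text or `-v`) to show how a typical command line parses. `build_parser` is a hypothetical name for this test harness and is not part of pal_filter.

```python
import argparse


def build_parser():
    # Same option strings and destinations as pal_filter's parser.
    parser = argparse.ArgumentParser(description='pal_filter')
    parser.add_argument('-i', '--input1', required=True)
    parser.add_argument('-j', '--input2', required=True)
    parser.add_argument('-p', '--pal_finder', required=True)
    parser.add_argument('-assembly', action='store_true')
    parser.add_argument('-a', '--skip_assembly', action='store_true')
    parser.add_argument('-c', '--skip_conversion', action='store_true')
    parser.add_argument('-primers', action='store_true')
    parser.add_argument('-occurrences', action='store_true')
    parser.add_argument('-rankmotifs', action='store_true')
    return parser


args = build_parser().parse_args(
    ['-i', 'R1.fastq', '-j', 'R2.fastq', '-p', 'pal_finder_output.tabular',
     '-primers', '-occurrences'])
print(args.primers, args.occurrences, args.rankmotifs, args.assembly)
# → True True False False
```

A flag given with only a single dash gets its `dest` from that string (`-assembly` becomes `args.assembly`). This is why the script can test `args.assembly`, `args.primers` and so on directly.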

# Handle -v before the no-filter exit below, so that the version is printed
# even when no filter option is supplied
if args.get_version:
    print "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~"
    print "pal_filter version is " + __version__ + " (03/03/2016)"
    print "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n"

if not args.assembly and not args.primers and not args.occurrences \
        and not args.rankmotifs:
    print "\nNo optional arguments supplied."
    print "\nBy default this program does nothing."
    print "\nNo files produced and no modifications made."
    print "\nFinished.\n"
    exit()
else:
    print "\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~"
    print "Checking supplied filtering parameters:"
    print "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~"
    time.sleep(2)
    if args.primers:
        print "-primers flag supplied."
        print "Filtering pal_finder output on the \"Primers found (1=y,0=n)\"" \
              " column."
        print "Only rows where primers have successfully been designed will"\
              " pass the filter.\n"
        time.sleep(2)
    if args.occurrences:
        print "-occurrences flag supplied."
        print "Filtering pal_finder output on the \"Occurrences of Forward" \
              " Primer in Reads\" and \"Occurrences of Reverse Primer" \
              " in Reads\" columns."
        print "Only rows where both primers occur only a single time in the"\
              " reads pass the filter.\n"
        time.sleep(2)
    if args.rankmotifs:
        print "-rankmotifs flag supplied."
        print "Filtering pal_finder output on the \"Motifs(bases)\" column to" \
              " just those with perfect repeats."
        print "Only rows containing 'perfect' repeats will pass the filter."
        print "Also, ranking output by size of motif (largest first).\n"
        time.sleep(2)
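The messages above name the pal_finder columns that the filters read, while the filtering code further down addresses them by position (`row[5]`, `row[15]`, `row[16]`). The Python 3 sketch below expresses the `-primers` and `-occurrences` filters by column name with `csv.DictReader`, so it would keep working if the columns moved. It assumes the header row contains exactly the names printed above and applies each filter independently. The `read_pair_id` column and the three-row table are invented for illustration. The `-rankmotifs` filter is left out because its parsing of "Motifs(bases)" is not shown here.

```python
import csv
import io

PRIMERS_FOUND = "Primers found (1=y,0=n)"
FWD_OCC = "Occurrences of Forward Primer in Reads"
REV_OCC = "Occurrences of Reverse Primer in Reads"


def passes_filters(row, primers, occurrences):
    # -primers: keep only rows where primers were designed.
    if primers and row[PRIMERS_FOUND] != "1":
        return False
    # -occurrences: keep only rows whose forward AND reverse primers
    # each occur exactly once in the reads.
    if occurrences and not (row[FWD_OCC] == "1" and row[REV_OCC] == "1"):
        return False
    return True


def filter_rows(handle, primers=False, occurrences=False):
    # Tab-separated input with a header row, read into one dict per row.
    reader = csv.DictReader(handle, delimiter="\t")
    return [row for row in reader if passes_filters(row, primers, occurrences)]


header = "\t".join(["read_pair_id", PRIMERS_FOUND, FWD_OCC, REV_OCC])
table = io.StringIO("\n".join([
    header,
    "pair1\t1\t1\t1",  # primers designed, each seen once -> kept
    "pair2\t1\t3\t1",  # forward primer seen 3 times -> dropped
    "pair3\t0\t0\t0",  # no primers designed -> dropped
]) + "\n")
kept = filter_rows(table, primers=True, occurrences=True)
print([row["read_pair_id"] for row in kept])  # → ['pair1']
```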

# index the raw fastq files so that the sequences can be pulled out and
# added to the filtered output file
print "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~"
print "Indexing FastQ files....."
print "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~"
R1fastq_sequences_index = SeqIO.index(args.input1, 'fastq')
R2fastq_sequences_index = SeqIO.index(args.input2, 'fastq')
print "Indexing complete."
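`SeqIO.index` does not load the reads into memory. It records where each record starts in the file and parses a record only when it is looked up, which keeps the R1 and R2 indexes cheap even for large FASTQ files. The Python 3 sketch below illustrates that idea with the standard library. It assumes every record spans exactly four unwrapped lines and works on a binary handle, so `tell()` and `seek()` use plain byte offsets. It also reproduces how the filtering loop further down turns a looked-up read into two tab-separated fields (`>header` and sequence). The functions `build_offset_index`, `fetch` and `as_tsv_fields` are hypothetical.

```python
import io


def build_offset_index(handle):
    # Map each record ID to the byte offset of its '@' header line.
    # Assumes unwrapped 4-line FASTQ records on a binary handle.
    index = {}
    while True:
        offset = handle.tell()
        header = handle.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # tolerate blank lines between or after records
        for _ in range(3):  # skip sequence, '+' and quality lines
            handle.readline()
        record_id = header[1:].split()[0].decode("ascii")
        if record_id in index:
            raise ValueError("duplicate read ID: " + record_id)
        index[record_id] = offset
    return index


def fetch(handle, index, record_id):
    # Jump straight to the stored offset and parse only that record.
    handle.seek(index[record_id])
    header = handle.readline().decode("ascii").rstrip("\r\n")
    seq = handle.readline().decode("ascii").rstrip("\r\n")
    return header[1:], seq


def as_tsv_fields(header, seq):
    # '>header<TAB>sequence': two columns, like the .format("fasta") output
    # with its first newline turned into a tab and the rest removed.
    return ">" + header + "\t" + seq


fastq = io.BytesIO(b"@r1 desc\nACGT\n+\nIIII\n@r2\nGGTT\n+\nIIII\n")
idx = build_offset_index(fastq)
print(idx)  # → {'r1': 0, 'r2': 21}
print(as_tsv_fields(*fetch(fastq, idx, "r2")))
```

Only the offsets stay in memory; each lookup costs one seek plus two line reads.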
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	226
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	227 # create a set to hold the filtered output
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	228 wanted_lines = set()
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	229
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	230 # get lines from the pal_finder output which meet filter settings
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	231 # read the pal_finder output file into a csv reader
4 cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	232 with open (args.pal_finder) as csvfile_infile:
3 e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	233 csv_f = csv.reader(csvfile_infile, delimiter='\t')
4 cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	234 header = csv_f.next()
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	235 header.extend(("R1_Sequence_ID", \
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	236 "R1_Sequence", \
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	237 "R2_Sequence_ID", \
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	238 "R2_Sequence" + "\n"))
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	239 with open( \
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	240 os.path.splitext(os.path.basename(args.pal_finder))[0] + \
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	241 ".filtered", 'w') as csvfile_outfile:
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	242 # write the header line for the output file
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	243 csvfile_outfile.write('\t'.join(header))
3 e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	244 for row in csv_f:
4 cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	245 # get the sequence ID
3 e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	246 seq_ID = row[0]
4 cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	247 # get the raw sequence reads and convert to a format that can
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	248 # go into a tsv file
3 e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	249 R1_sequence = R1fastq_sequences_index[seq_ID].format("fasta").\
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	250 replace("\n","\t",1).replace("\n","")
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	251 R2_sequence = R2fastq_sequences_index[seq_ID].format("fasta").\
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	252 replace("\n","\t",1).replace("\n","")
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	253 seq_info = "\t" + R1_sequence + "\t" + R2_sequence + "\n"
4 cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	254 # navigate through all different combinations of filter options
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	255 # if the primer filter is switched on
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	256 if args.primers:
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	257 # check the primers-designed field
3 e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	258 if row[5] == "1":
4 cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	259 # if the primer-occurrence filter is switched on
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	260 if args.occurrences:
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	261 # check the occurrences of primers field
3 e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	262 if (row[15] == "1" and row[16] == "1"):
4 cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	263 # if rank by motif is switched on
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	264 if args.rankmotifs:
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	265 # check for perfect motifs
3 e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	266 if row[1].count('(') == 1:
4 cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	267 # all 3 filters switched on
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	268 # write line out to output
3 e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	269 csvfile_outfile.write('\t'.join(row) + \
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	270 seq_info)
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	271 else:
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	272 csvfile_outfile.write('\t'.join(row) + seq_info)
4 cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	273 elif args.rankmotifs:
3 e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	274 if row[1].count('(') == 1:
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	275 csvfile_outfile.write('\t'.join(row) + seq_info)
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	276 else:
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	277 csvfile_outfile.write('\t'.join(row) + seq_info)
4 cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	278 elif args.occurrences:
3 e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	279 if (row[15] == "1" and row[16] == "1"):
4 cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	280 if args.rankmotifs:
3 e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	281 if row[1].count('(') == 1:
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	282 csvfile_outfile.write('\t'.join(row) + seq_info)
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	283 else:
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	284 csvfile_outfile.write('\t'.join(row) + seq_info)
4 cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	285 elif args.rankmotifs:
3 e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	286 if row[1].count('(') == 1:
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	287 csvfile_outfile.write('\t'.join(row) + seq_info)
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	288 else:
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	289 csvfile_outfile.write('\t'.join(row) + seq_info)
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	290
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	291 # if filter_rank_motifs is active, order the file by the size of the motif
4 cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	292 if args.rankmotifs:
3 e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	293 rank_motif = []
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	294 ranked_list = []
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	295 # read in the non-ordered file and add every entry to rank_motif list
4 cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	296 with open( \
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	297 os.path.splitext(os.path.basename(args.pal_finder))[0] + \
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	298 ".filtered") as csvfile_ranksize:
3 e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	299 csv_rank = csv.reader(csvfile_ranksize, delimiter='\t')
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	300 header = csv_rank.next()
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	301 for line in csv_rank:
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	302 rank_motif.append(line)
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	303
4 cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	304 # open the file ready to write the ordered list
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	305 with open( \
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	306 os.path.splitext(os.path.basename(args.pal_finder))[0] + \
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	307 ".filtered", 'w') as rank_outfile:
3 e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	308 rankwriter = csv.writer(rank_outfile, delimiter='\t', \
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	309 lineterminator='\n')
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	310 rankwriter.writerow(header)
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	311 count = 2
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	312 while count < 10:
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	313 for row in rank_motif:
4 cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	314 # count size of motif
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	315 motif = re.search(r'[ATCG]+', row[1])
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	316 if motif:
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	317 the_motif = motif.group()
4 cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	318 # rank it and write into ranked_list
3 e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	319 if len(the_motif) == count:
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	320 ranked_list.insert(0, row)
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	321 count = count + 1
4 cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	322 # write out the ordered list into the .filtered file
3 e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	323 for row in ranked_list:
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	324 rankwriter.writerow(row)
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	325
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	326 print "\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~"
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	327 print "Checking assembly flags supplied:"
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	328 print "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~"
4 cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	329 if not args.assembly:
3 e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	330 print "Assembly flag not supplied. Not performing assembly QC.\n"
4 cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	331 if args.assembly:
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	332 print "-assembly flag supplied: Perform PandaSeq assembly quality checks."
3 e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	333 print "See Fox et al. (currently unpublished) for full details on the"\
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	334 " quality-check process.\n"
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	335 time.sleep(5)
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	336
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	337 # Get readID, F primers, R primers and motifs from filtered pal_finder output
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	338 seqIDs = []
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	339 motif = []
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	340 F_primers = []
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	341 R_primers = []
4 cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	342 with open( \
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	343 os.path.splitext(os.path.basename(args.pal_finder))[0] + \
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	344 ".filtered") as input_csv:
3 e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	345 pal_finder_csv = csv.reader(input_csv, delimiter='\t')
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	346 header = pal_finder_csv.next()
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	347 for row in pal_finder_csv:
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	348 seqIDs.append(row[0])
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	349 motif.append(row[1])
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	350 F_primers.append(row[7])
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	351 R_primers.append(row[9])
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	352
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	353 # Get the set of unique IDs we want
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	354 wanted = set()
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	355 for line in seqIDs:
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	356 wanted.add(line)
4 cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	357 """
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	358 Assemble the paired end reads into overlapping contigs using PandaSeq
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	359 (can be skipped with the -a flag if assembly has already been run
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	360 and the appropriate files are in the same directory as the script,
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	361 and named "Assembly.fasta" and "Assembly_adapters_removed.fasta")
3 e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	362
4 cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	363 The first time you run the script you MUST NOT enable the -a flag,
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	364 but you can skip the assembly in subsequent analyses using the
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	365 -a flag.
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	366 """
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	367 if not args.skip_assembly:
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	368 pandaseq_command = 'pandaseq -A pear -f ' + args.input1 + ' -r ' + \
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	369 args.input2 + ' -o 25 -t 0.95 -w Assembly.fasta'
3 e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	370 subprocess.call(pandaseq_command, shell=True)
4 cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	371 strip_barcodes("Assembly.fasta", wanted)
3 e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	372 print "\nPaired-end reads have been assembled into overlapping reads."
4 cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	373 print "\nFor future analysis, you can skip this assembly step using" \
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	374 " the -a flag, provided that the Assembly.fasta file" \
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	375 " is intact and in the same location."
3 e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	376 else:
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	377 print "\n(Skipping the assembly step as you provided the -a flag)"
4 cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	378 """
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	379 Fastq files need to be converted to fasta. The first time you run the script
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	380 you MUST NOT enable the -c flag, but you can skip the conversion
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	381 later using the -c flag. Make sure the fasta files stay in the same location
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	382 and keep their original filenames.
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	383 """
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	384 if not args.skip_conversion:
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	385 fastq_to_fasta(args.input1, wanted)
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	386 fastq_to_fasta(args.input2, wanted)
3 e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	387 print "\nThe input fastq files have been converted to the fasta format."
4 cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	388 print "\nFor any future analysis, you can skip this conversion step" \
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	389 " using the -c flag, provided that the fasta files" \
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	390 " are intact and in the same location."
3 e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	391 else:
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	392 print "\n(Skipping the fastq -> fasta conversion as you provided the" \
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	393 " -c flag).\n"
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	394
4 cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	395 # get the files and everything else needed
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	396 # Assembled fasta file
3 e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	397 assembly_file = "Assembly_adapters_removed.fasta"
4 cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	398 # filtered R1 reads
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	399 R1_fasta = os.path.splitext( \
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	400 os.path.basename(args.input1))[0] + "_filtered.fasta"
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	401 # filtered R2 reads
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	402 R2_fasta = os.path.splitext( \
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	403 os.path.basename(args.input2))[0] + "_filtered.fasta"
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	404 outputfilename = os.path.splitext(os.path.basename(args.input1))[0]
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	405 # parse the files with SeqIO
3 e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	406 assembly_sequences = SeqIO.parse(assembly_file,'fasta')
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	407 R1fasta_sequences = SeqIO.parse(R1_fasta,'fasta')
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	408
4 cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	409 # create some empty lists to hold the ID tags we are interested in
3 e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	410 assembly_IDs = []
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	411 fasta_IDs = []
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	412
4 cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	413 # populate the above lists with sequence IDs
3 e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	414 for sequence in assembly_sequences:
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	415 assembly_IDs.append(sequence.id)
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	416 for sequence in R1fasta_sequences:
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	417 fasta_IDs.append(sequence.id)
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	418
4 cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	419 # Index the assembly, R1 and R2 fasta files
3 e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	420 assembly_sequences_index = SeqIO.index(assembly_file,'fasta')
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	421 R1fasta_sequences_index = SeqIO.index(R1_fasta,'fasta')
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	422 R2fasta_sequences_index = SeqIO.index(R2_fasta,'fasta')
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	423
4 cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	424 # prepare the output file
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	425 with open ( \
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	426 outputfilename + "_pal_filter_assembly_output.txt", 'w') \
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	427 as outputfile:
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	428 # write the headers for the output file
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	429 output_header = ("readPairID", \
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	430 "Forward Primer",\
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	431 "F Primer Position in Assembled Read", \
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	432 "Reverse Primer", \
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	433 "R Primer Position in Assembled Read", \
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	434 "Motifs(bases)", \
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	435 "Assembled Read ID", \
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	436 "Assembled Read Sequence", \
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	437 "Raw Forward Read ID", \
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	438 "Raw Forward Read Sequence", \
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	439 "Raw Reverse Read ID", \
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	440 "Raw Reverse Read Sequence\n")
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	441 outputfile.write("\t".join(output_header))
3 e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	442
4 cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	443 # cycle through parameters from the pal_finder output
3 e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	444 for x, y, z, a in zip(seqIDs, F_primers, R_primers, motif):
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	445 if str(x) in assembly_IDs:
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	446 # get the raw sequences ready to go into the output file
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	447 assembly_seq = (assembly_sequences_index.get_raw(x).decode())
4 cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	448 # fasta entries need to be converted to a single line so they sit
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	449 # nicely in the output
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	450 assembly_output = assembly_seq.replace("\n","\t").strip('\t')
3 e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	451 R1_fasta_seq = (R1fasta_sequences_index.get_raw(x).decode())
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	452 R1_output = R1_fasta_seq.replace("\n","\t",1).replace("\n","")
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	453 R2_fasta_seq = (R2fasta_sequences_index.get_raw(x).decode())
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	454 R2_output = R2_fasta_seq.replace("\n","\t",1).replace("\n","")
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	455 assembly_no_id = '\n'.join(assembly_seq.split('\n')[1:])
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	456
4 cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	457 # check that both primer sequences can be seen in the
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	458 # assembled contig
5 8159dab5dbdb Bug fix to pal_filter.py. pjbriggs parents: 4 diff changeset	459 if ((y in assembly_no_id) or \
8159dab5dbdb Bug fix to pal_filter.py. pjbriggs parents: 4 diff changeset	460 (ReverseComplement1(y) in assembly_no_id)) and \
8159dab5dbdb Bug fix to pal_filter.py. pjbriggs parents: 4 diff changeset	461 ((z in assembly_no_id) or \
8159dab5dbdb Bug fix to pal_filter.py. pjbriggs parents: 4 diff changeset	462 (ReverseComplement1(z) in assembly_no_id)):
3 e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	463 if y in assembly_no_id:
4 cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	464 # get the positions of the primers in the assembly
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	465 # (can be used to predict fragment length)
3 e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	466 F_position = assembly_no_id.index(y)+len(y)+1
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	467 if ReverseComplement1(y) in assembly_no_id:
4 cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	468 F_position = assembly_no_id.index( \
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	469 ReverseComplement1(y))+len(ReverseComplement1(y))+1
3 e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	470 if z in assembly_no_id:
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	471 R_position = assembly_no_id.index(z)+1
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	472 if ReverseComplement1(z) in assembly_no_id:
4 cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	473 R_position = assembly_no_id.index( \
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	474 ReverseComplement1(z))+1
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	475 output = (str(x),
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	476 str(y),
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	477 str(F_position),
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	478 str(z),
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	479 str(R_position),
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	480 str(a),
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	481 str(assembly_output),
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	482 str(R1_output),
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	483 str(R2_output + "\n"))
cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	484 outputfile.write("\t".join(output))
3 e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	485 print "\nPANDAseq quality check complete."
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	486 print "Results from PANDAseq quality check (and filtering, if any" \
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	487 " filters enabled) written to output file" \
4 cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	488 " ending \"_pal_filter_assembly_output.txt\".\n"
3 e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	489 print "Filtering of pal_finder results complete."
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	490 print "Filtered results written to output file ending \".filtered\"."
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	491 print "\nFinished\n"
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	492 else:
4 cb56cc1d5c39 Updates to the palfilter.py utility. pjbriggs parents: 3 diff changeset	493 if args.skip_assembly or args.skip_conversion:
3 e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	494 print "\nERROR: You cannot supply the -a flag or the -c flag without" \
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	495 " also supplying the -assembly flag.\n"
e1a14ed7a9d6 Updated to version 0.02.04.4 (new pal_filter script) pjbriggs parents: diff changeset	496 print "\nProgram Finished\n"
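
This annotate view drops indentation, so the nesting of the `if args.primers` / `elif args.occurrences` / `elif args.rankmotifs` cascade has to be inferred from its comments. Read that way, a row is written to the `.filtered` file when every enabled filter passes. That rule can be sketched as a single predicate. This is a minimal Python 3 sketch (the script itself is Python 2). `passes_filters` and its keyword arguments are hypothetical names that stand in for `args.primers`, `args.occurrences` and `args.rankmotifs`. The column indices are the ones the script reads, and their meanings are assumed from its comments.

```python
# Hypothetical helper: the primer / occurrence / perfect-motif filters of
# pal_filter.py collapsed into one predicate (nesting inferred from comments).

def passes_filters(row, primers=False, occurrences=False, rankmotifs=False):
    """Return True if a pal_finder row survives every enabled filter."""
    # -primers: column 5 must be "1" (primers designed for this read pair)
    if primers and row[5] != "1":
        return False
    # -occurrences: both primer-occurrence columns must be "1"
    if occurrences and not (row[15] == "1" and row[16] == "1"):
        return False
    # -rankmotifs: a single "(" marks a single, perfect motif such as "(AC)12"
    if rankmotifs and row[1].count("(") != 1:
        return False
    return True


if __name__ == "__main__":
    example = ["pair1", "(AC)12"] + ["0"] * 15
    example[5] = example[15] = example[16] = "1"
    print(passes_filters(example, primers=True, occurrences=True,
                         rankmotifs=True))
```

When the filters are written as guard clauses, a fourth filter needs only one more `if` instead of another level of duplicated `csvfile_outfile.write(...)` branches.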
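
The `-rankmotifs` block reorders the `.filtered` file by repeat-unit length and keeps only lengths 2 to 9 (`count = 2` ... `while count < 10`). Motif fields take the form `(AC)12`. For such a field, the pattern `[ATCG]*` can match zero characters, so `re.search` returns an empty match at position 0. As a result, `len(the_motif)` is 0 and no row reaches `ranked_list`. Using `[ATCG]+` forces at least one base. The same ranking can be sketched with a sort instead of the `while` / `insert(0, row)` loop. This is a hedged Python 3 sketch. `motif_length` and `rank_by_motif` are hypothetical helpers, and the 2 to 9 bounds mirror the script's loop.

```python
import re

# "[ATCG]*" would match the empty string at index 0 of "(AC)12";
# "+" requires at least one base, so the first repeat unit is found.
MOTIF_RE = re.compile(r"[ATCG]+")


def motif_length(motif_field):
    """Length of the first repeat unit in a field such as "(AC)12"."""
    match = MOTIF_RE.search(motif_field)
    return len(match.group()) if match else 0


def rank_by_motif(rows, min_len=2, max_len=9):
    """Keep rows whose motif length is in [min_len, max_len], longest first.

    sorted() is stable even with reverse=True, so rows sharing a motif
    length keep their original file order.
    """
    kept = [r for r in rows if min_len <= motif_length(r[1]) <= max_len]
    return sorted(kept, key=lambda r: motif_length(r[1]), reverse=True)
```

The script's `insert(0, row)` also puts the longest motifs first, but it reverses the file order of rows that share a motif length. The sort above keeps those ties in file order instead.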
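
The assembly step builds the PandaSeq command as a single string and runs it with `shell=True`. With that approach, the shell would split or interpret any input filename that contains spaces or shell metacharacters. The sketch below passes the same options as an argument list instead. `pandaseq_argv` and `run_pandaseq` are hypothetical names, and the only flags used are the ones already present in the script's command.

```python
import subprocess


def pandaseq_argv(forward_fastq, reverse_fastq, output="Assembly.fasta"):
    """Build the script's PandaSeq command line as an argv list."""
    return ["pandaseq", "-A", "pear",
            "-f", forward_fastq, "-r", reverse_fastq,
            "-o", "25", "-t", "0.95", "-w", output]


def run_pandaseq(forward_fastq, reverse_fastq):
    # No shell is involved, so each filename reaches pandaseq as one
    # argument. check_call raises CalledProcessError on a non-zero exit.
    subprocess.check_call(pandaseq_argv(forward_fastq, reverse_fastq))
```

`check_call` stops the run if the assembly fails, whereas the script's `subprocess.call` ignores the return code. If a failed assembly should not abort the run, swap it back.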
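
The R1 and R2 records are flattened with `replace("\n", "\t", 1).replace("\n", "")`, which yields an ID column followed by the whole sequence in a single column. The sketch below expresses that transformation as a plain function, using the hypothetical name `fasta_record_to_columns`. It assumes a `str` input. Under Python 3, Biopython's `SeqIO.index(...).get_raw(id)` returns bytes, which is why the script calls `.decode()`.

```python
def fasta_record_to_columns(raw_record):
    """Return the FASTA header line and the unwrapped sequence, tab-joined."""
    # Split off the header at the first newline, then drop every remaining
    # newline so a line-wrapped sequence becomes one field.
    header, _, body = raw_record.partition("\n")
    return header + "\t" + body.replace("\n", "")


if __name__ == "__main__":
    print(fasta_record_to_columns(">read1 desc\nACGT\nTTGA\n"))
```

The assembled read is flattened differently, with `replace("\n", "\t").strip("\t")`. If its FASTA record were line-wrapped, each sequence line would land in its own column, shifting the remaining fields relative to the output header.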
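
The primer-position step accepts a contig only if both primers occur in it, in either orientation. It then reports two 1-based positions: the base after the forward primer and the start of the reverse primer match. The sketch below illustrates that calculation. `reverse_complement` is a hypothetical stand-in for the script's `ReverseComplement1()`, which is defined outside this excerpt and may differ, for example in how it handles ambiguity codes. `primer_positions` is likewise an illustrative name.

```python
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Hypothetical stand-in for ReverseComplement1() (A/C/G/T/N only)."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def primer_positions(contig, f_primer, r_primer):
    """Return (F_position, R_position) as the script computes them, or None.

    F_position: 1-based position of the first base after the forward primer.
    R_position: 1-based position of the first base of the reverse primer.
    """
    f_rc = reverse_complement(f_primer)
    r_rc = reverse_complement(r_primer)
    # Both primers must be present, in either orientation.
    if not ((f_primer in contig or f_rc in contig) and
            (r_primer in contig or r_rc in contig)):
        return None
    # As in the script, a reverse-complement hit overrides a direct hit.
    f_hit = f_rc if f_rc in contig else f_primer
    r_hit = r_rc if r_rc in contig else r_primer
    f_position = contig.index(f_hit) + len(f_hit) + 1
    r_position = contig.index(r_hit) + 1
    return f_position, r_position
```

The sketch expects an unwrapped sequence string. In the script, `assembly_no_id` keeps the line breaks of a multi-line FASTA record. In that case, a primer that spans a wrap would be missed, and the reported positions would include the newline characters.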