comparison tools/spades_3_0/plot_spades_stats.xml @ 8:ff058438080a draft
Version 0.8, supports SPAdes 3.0.0
author | lionelguy |
---|---|
date | Wed, 05 Feb 2014 05:19:03 -0500 |
parents | |
children |
7:95ddc2380130 | 8:ff058438080a |
---|---|
1 <tool id="plot_spades_stats" name="SPAdes stats" version="0.1"> | |
2 <description>coverage vs. length plot</description> | |
3 <requirements> | |
4 <requirement type="package">R</requirement> | |
5 </requirements> | |
6 <command interpreter="bash">r_wrapper.sh $script_file</command> | |
7 | |
8 <inputs> | |
9 <param name="input_scaffolds" type="data" format="tabular" label="Scaffold stats"/> | |
10 <param name="input_contigs" type="data" format="tabular" label="Contig stats"/> | |
11 <param name="length_co" type="integer" value="1000" min="0" label="Length cut-off" help="Contigs with length under that value are shown in red"/> | |
12 <param name="coverage_co" type="integer" value="10" min="0" label="Coverage cut-off" help="Contigs with coverage under that value are shown in red"/> | |
13 </inputs> | |
14 <configfiles> | |
15 <configfile name="script_file"> | |
16 ## Set up R error handling to write to stderr | |
17 options( show.error.messages=F, | |
18 error = function () { | |
19 cat( geterrmessage(), file=stderr() ); q( "no", 1, F ) | |
20 } ) | |
21 files = c("${input_contigs}", "${input_scaffolds}") | |
22 types = c("Contigs", "Scaffolds") | |
23 | |
24 ## Start plotting device | |
25 png("${out_file}", w=500, h=1000) | |
26 par(mfrow=c(2,1)) | |
27 | |
28 ## Loop over the two files | |
29 for (i in 1:length(types)){ | |
30 seqs = read.table(files[i], header=FALSE, comment.char="#") | |
31 colnames = c("name", "length", "coverage") | |
32 names(seqs) = colnames | |
33 | |
34 ## Stats over all sequences | |
35 sl_all = sort(seqs\$length, decreasing=TRUE) | |
36 cs_all = cumsum(sl_all) | |
37 s_all = sum(seqs\$length) | |
38 n50_idx_all = which(cs_all >= 0.5*s_all)[1] | |
39 n90_idx_all = which(cs_all >= 0.9*s_all)[1] | |
40 n50_all = sl_all[n50_idx_all] | |
41 n90_all = sl_all[n90_idx_all] | |
42 | |
43 ## Filter short seqs, redo stats | |
44 seqs_filt = seqs[seqs\$length >= ${length_co} & seqs\$coverage >= ${coverage_co},] | |
45 if (nrow(seqs_filt) > 0){ | |
46 sl_filt = sort(seqs_filt\$length, decreasing=TRUE) | |
47 cs_filt = cumsum(sl_filt) | |
48 s_filt = sum(seqs_filt\$length) | |
49 n50_idx_filt = which(cs_filt >= 0.5*s_filt)[1] | |
50 n90_idx_filt = which(cs_filt >= 0.9*s_filt)[1] | |
51 n50_filt = sl_filt[n50_idx_filt] | |
52 n90_filt = sl_filt[n90_idx_filt] | |
53 } else { n50_filt = NA; n90_filt = NA } | |
54 seqs_bad = seqs[seqs\$length < ${length_co} | seqs\$coverage < ${coverage_co},] | |
55 | |
56 ## Length vs coverage | |
57 plot(length~coverage, data=seqs, log="xy", type="n", main=paste(types[i], ": coverage vs. length", sep=""), xlab="Coverage", ylab="Length") | |
58 if (nrow(seqs_bad) > 0){ | |
59 points(length~coverage, data=seqs_bad, cex=0.5, col="red") | |
60 } | |
61 if (nrow(seqs_filt) > 0){ | |
62 points(length~coverage, data=seqs_filt, cex=0.5, col="black") | |
63 } | |
64 abline(v=${coverage_co}, h=${length_co}, lty=2, col=grey(0.3)) | |
65 legend(x="topleft", legend=c("Before/after filtering", paste(c("N50: ", "N90: ", "Median cov.: "), c(n50_all, n90_all, round(median(seqs\$coverage))), rep("/", 3), c(n50_filt, n90_filt, round(median(seqs_filt\$coverage))), sep="")), cex=0.8) | |
66 } | |
67 dev.off() | |
68 </configfile> | |
69 </configfiles> | |
70 <outputs> | |
71 <data format="png" name="out_file" /> | |
72 </outputs> | |
73 <help> | |
74 **What it does** | |
75 | |
76 Using the output of SPAdes (a FASTA file and a stats file for each of the contigs and scaffolds), this tool produces a coverage vs. length plot. Each dot represents a contig or scaffold. Given a coverage cut-off and a length cut-off, sequences that do not meet both criteria are shown in red. Some statistics are also given (N50, N90, median coverage), both before and after filtering. | |
77 | |
78 Use the "filter SPAdes output" tool to actually filter sequences. | |
79 </help> | |
80 </tool> |
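
The N50 and N90 values shown in the plot legend can be tried out in isolation. The following is a minimal R sketch, assuming the common definition: with lengths sorted from longest to shortest, Nx is the length of the sequence at which the running total first reaches the given share of the total length. The helper `nx` and the example lengths are hypothetical and not part of the tool.

```r
## Hypothetical helper (not part of the tool): Nx of a vector of sequence lengths.
nx <- function(lengths, fraction = 0.5) {
  # No sequences (e.g. everything was filtered out): nothing to report
  if (length(lengths) == 0) return(NA)
  sl <- sort(lengths, decreasing = TRUE)  # longest first
  cs <- cumsum(sl)                        # running total of lengths
  # First position where the running total reaches the requested share
  sl[which(cs >= fraction * sum(sl))[1]]
}

lens <- c(100, 80, 60, 40, 20)  # made-up lengths, total 300
n50 <- nx(lens, 0.5)            # running total reaches 150 at the 2nd sequence
n90 <- nx(lens, 0.9)            # running total reaches 270 at the 4th sequence
```

Returning `NA` for an empty input keeps a call such as `paste("N50: ", nx(numeric(0)))` from failing when no sequence passes the cut-offs.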