<tool id="getLetterDistribution1" name="get letter distribution">
  <description>Calculate distribution for each nucleotide per position for all short reads</description>
  <command interpreter="python">
    WrappGetLetterDistribution.py -i $formatType.inputFileName
    #if $formatType.FormatInputFileName == 'fasta':
      -f fasta
    #else
      -f fastq
    #end if
    -c $ouputFileNameCSV -a $ouputFileNamePNG1 -b $ouputFileNamePNG2
  </command>
  <inputs>
    <conditional name="formatType">
      <param name="FormatInputFileName" type="select" label="Input File Format">
        <option value="fasta">fasta</option>
        <option value="fastq" selected="true">fastq</option>
      </param>
      <when value="fasta">
        <param name="inputFileName" format="fasta" type="data" label="Fasta Input File"/>
      </when>
      <when value="fastq">
        <param name="inputFileName" format="fastq" type="data" label="Fastq Input File"/>
      </when>
    </conditional>
  </inputs>

  <outputs>
    <data name="ouputFileNameCSV" format="tabular" label="[getLetterDistribution] CSV File"/>
    <data name="ouputFileNamePNG1" format="png" label="[getLetterDistribution] PNG File 1"/>
    <data name="ouputFileNamePNG2" format="png" label="[getLetterDistribution] PNG File 2"/>
  </outputs>
  <tests>
    <test>
      <param name="FormatInputFileName" value="fastq" />
      <param name="inputFileName" value="short_fastq.fastq" />
      <output name="ouputFileNameCSV" file="exp_getletterdistribution_short_fastq.csv" />
    </test>
  </tests>

  <help>
The script computes the nucleotide distribution of the input sequences. Besides a tabular (CSV) file, it produces two plots. The first plot shows the nucleotide content of the reads: a point (*x*, *y*) on curve **A** means that *y* sequences have an **A** content of *x* %.
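
As an illustration only, the curves of the first plot can be sketched in Python as one histogram per nucleotide. This assumes reads are plain strings and that percentages are rounded to whole numbers; `content_histogram` is a hypothetical helper, not code taken from WrappGetLetterDistribution.py.

```python
from collections import Counter

NUCLEOTIDES = "ACGT"


def content_histogram(reads):
    """For each nucleotide, map a percentage x to the number y of reads
    whose content of that nucleotide is x %.

    Assumptions (not taken from WrappGetLetterDistribution.py): reads are
    upper-cased, empty reads are skipped, and percentages are rounded with
    Python's round() (nearest integer, ties to even).
    """
    histograms = {n: Counter() for n in NUCLEOTIDES}
    for read in reads:
        if not read:
            continue
        read = read.upper()
        for n in NUCLEOTIDES:
            percent = round(100 * read.count(n) / len(read))
            histograms[n][percent] += 1
    return histograms


if __name__ == "__main__":
    hist = content_histogram(["AACG", "ACGT", "AAAA"])
    # Curve A: one read at 25 %, one at 50 %, one at 100 % A.
    print(sorted(hist["A"].items()))  # → [(25, 1), (50, 1), (100, 1)]
```

Each (*x*, *y*) pair of a histogram is one point of the corresponding curve.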

The second plot shows the average nucleotide distribution at each position of the reads. It can be used, for instance, to detect a composition bias in the first nucleotides. A point (*x*, *y*) on curve **A** means that, at position *x*, *y* % of the nucleotides are **A**. A point (*x*, *y*) on curve **#** means that *y* % of the sequences are at least *x* nucleotides long. By definition, this curve is non-increasing. It also explains why the tails of the other curves are sometimes erratic: few sequences reach those positions, so the percentages there are computed from only a handful of reads.
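
A minimal Python sketch of how the per-position curves and the **#** curve could be computed, assuming reads are plain strings and positions are 1-based. `position_profile` and its `alphabet` parameter are hypothetical and only illustrate the definitions above; they are not the tool's own code.

```python
def position_profile(reads, alphabet="ACGT"):
    """Return a list of (x, row) pairs, one per 1-based position x.

    row[n] is the percentage of nucleotide n among the reads that reach
    position x; row["#"] is the percentage of all reads that are at least
    x nucleotides long. Assumption: reads are upper-cased and empty reads
    are ignored.
    """
    reads = [r.upper() for r in reads if r]
    if not reads:
        return []
    profile = []
    for pos in range(max(len(r) for r in reads)):
        # Only reads long enough to have this position contribute to it.
        column = [r[pos] for r in reads if len(r) > pos]
        row = {n: 100 * column.count(n) / len(column) for n in alphabet}
        row["#"] = 100 * len(column) / len(reads)
        profile.append((pos + 1, row))
    return profile


if __name__ == "__main__":
    for x, row in position_profile(["ACG", "AC", "A"]):
        # "#" falls from 100 to about 66.7 to about 33.3, so the values
        # at later positions rest on fewer and fewer reads.
        print(x, row)
```

Because `column` can only shrink as `pos` grows, the **#** values never increase, which matches the definition of that curve.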
  </help>
</tool>