<tool name="FastQC: Comprehensive QC" id="fastqc" version="0.53">
  <description>reporting for short read sequence</description>
  <command interpreter="python">
    rgFastQC.py -i "$input_file" -d "$html_file.files_path" -o "$html_file" -n "$out_prefix" -f "$input_file.ext" -j "$input_file.name"
    #if $contaminants.dataset and str($contaminants) > ''
        -c "$contaminants"
    #end if
    -e fastqc
  </command>
  <requirements>
    <requirement type="package" version="0.10.1">fastqc_dist_0_10_1</requirement>
  </requirements>
  <inputs>
    <param format="fastqsanger,fastq,bam,sam" name="input_file" type="data" label="Short read data from your current history" />
    <param name="out_prefix" value="FastQC" type="text" label="Title for the output file - to remind you what the job was for" size="80"
           help="Letters and numbers only please - other characters will be removed">
      <sanitizer invalid_char="">
        <valid initial="string.letters,string.digits"/>
      </sanitizer>
    </param>
    <param name="contaminants" type="data" format="tabular" optional="true" label="Contaminant list"
           help="tab delimited file with 2 columns: name and sequence. For example: Illumina Small RNA RT Primer CAAGCAGAAGACGGCATACGA"/>
  </inputs>
  <outputs>
    <data format="html" name="html_file" label="${out_prefix}_${input_file.name}.html" />
  </outputs>
  <tests>
    <test>
      <param name="input_file" value="1000gsample.fastq" />
      <param name="out_prefix" value="fastqc_out" />
      <param name="contaminants" value="fastqc_contaminants.txt" ftype="tabular" />
      <output name="html_file" file="fastqc_report.html" ftype="html" lines_diff="100"/>
    </test>
  </tests>
  <help>

.. class:: infomark

**Purpose**

Quote from FastQC_:

FastQC aims to provide a simple way to do some quality control checks on raw
sequence data coming from high throughput sequencing pipelines.
It provides a modular set of analyses which you can use to give a quick
impression of whether your data has any problems of
which you should be aware before doing any further analysis.

The main functions of FastQC are:

- Import of data from BAM, SAM or FastQ files (any variant)
- Providing a quick overview to tell you in which areas there may be problems
- Summary graphs and tables to quickly assess your data
- Export of results to an HTML based permanent report
- Offline operation to allow automated generation of reports without running the interactive application

FastQC_ is the best place to look for documentation - it's very good.
Some features of the Galaxy wrapper you are using are described below.

-----

.. class:: infomark

**This Galaxy Tool**

You are using FastQC_ in Galaxy.
This is easy because it has been packaged into a Galaxy tool by the Intergalactic Utilities Commission.
The tool wraps the external package FastQC_, which is documented on its own website.
Kindly acknowledge FastQC as well as this tool if you use it.
FastQC incorporates the Picard-tools_ libraries for SAM/BAM processing.

The contaminants file parameter was borrowed from the independently developed
fastqcwrapper contributed to the Galaxy Community Tool Shed by Jim Johnson.

-----

.. class:: infomark

**Inputs and outputs**

This wrapper accepts a Galaxy fastq, sam or bam dataset as the input read file to check.
It also takes an optional list of contaminants, in the form of
a tab-delimited file with 2 columns: name and sequence.

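As a rough illustration only, a contaminants file with a single entry could look like the block below, with the name and the sequence separated by one tab character. The entry is the example given in the parameter help; any further rows would follow the same two-column pattern::

    Illumina Small RNA RT Primer	CAAGCAGAAGACGGCATACGA

Spaces inside the name are fine, since only the tab separates the two columns.
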
FastQC_ produces a single HTML output file containing all of the results. The wrapper adjusts this file slightly so that it displays well in Galaxy. The results include the following:

- Basic Statistics
- Per base sequence quality
- Per sequence quality scores
- Per base sequence content
- Per base GC content
- Per sequence GC content
- Per base N content
- Sequence Length Distribution
- Sequence Duplication Levels
- Overrepresented sequences
- Kmer Content

All except Basic Statistics and Overrepresented sequences are plots.

.. _FastQC: http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/
.. _Picard-tools: http://picard.sourceforge.net/index.shtml

  </help>
</tool>