Mercurial > repos > devteam > fastqc
view rgFastQC.xml @ 13:9337dd1fbc66 draft
planemo upload for repository https://github.com/galaxyproject/tools-iuc/tree/master/tools/fastqc commit 8991687e2ec5f75d3841c613ea5d8ffda0389654
author | iuc |
---|---|
date | Mon, 05 Jun 2017 13:49:57 -0400 |
parents | 484e86282f4b |
children | 2b0c9d9fc6ca |
line wrap: on
line source
<tool id="fastqc" name="FastQC" version="0.69"> <description>Read Quality reports</description> <requirements> <requirement type="package" version="0.11.5">fastqc</requirement> </requirements> <command detect_errors="exit_code"><![CDATA[ #import re #set input_name = re.sub('[^\w\-\s]', '_', str($input_file.element_identifier)) python '$__tool_directory__/rgFastQC.py' -i '$input_file' -d '$html_file.files_path' -o '$html_file' -t '$text_file' -f '$input_file.ext' -j '$input_name' #if $contaminants.dataset and str($contaminants) > '' -c '$contaminants' #end if #if $limits.dataset and str($limits) > '' -l '$limits' #end if ]]></command> <inputs> <param format="fastq,fastq.gz,fastq.bz2,bam,sam" name="input_file" type="data" label="Short read data from your current history" /> <param name="contaminants" type="data" format="tabular" optional="true" label="Contaminant list" help="tab delimited file with 2 columns: name and sequence. For example: Illumina Small RNA RT Primer CAAGCAGAAGACGGCATACGA" /> <param name="limits" type="data" format="txt" optional="true" label="Submodule and Limit specifing file" help="a file that specifies which submodules are to be executed (default=all) and also specifies the thresholds for the each submodules warning parameter" /> </inputs> <outputs> <data format="html" name="html_file" label="${tool.name} on ${on_string}: Webpage" /> <data format="txt" name="text_file" label="${tool.name} on ${on_string}: RawData" /> </outputs> <tests> <test> <param name="input_file" value="1000gsample.fastq" /> <param name="contaminants" value="fastqc_contaminants.txt" ftype="tabular" /> <output name="html_file" file="fastqc_report.html" ftype="html" lines_diff="100"/> <output name="text_file" file="fastqc_data.txt" ftype="txt" lines_diff="100"/> </test> <test> <param name="input_file" value="1000gsample.fastq" /> <param name="limits" value="fastqc_customlimits.txt" ftype="txt" /> <output name="html_file" file="fastqc_report2.html" ftype="html" compare="sim_size" delta="4096"/> <output name="text_file" file="fastqc_data2.txt" ftype="txt" compare="sim_size"/> </test> <test> <param name="input_file" value="1000gsample.fastq.gz" ftype="fastq.gz" /> <param name="contaminants" value="fastqc_contaminants.txt" ftype="tabular" /> <output name="html_file" file="fastqc_report.html" ftype="html" lines_diff="100"/> <output name="text_file" file="fastqc_data.txt" ftype="txt" lines_diff="100"/> </test> <test> <param name="input_file" value="1000gsample.fastq.bz2" ftype="fastq.bz2" /> <param name="contaminants" value="fastqc_contaminants.txt" ftype="tabular" /> <output name="html_file" file="fastqc_report.html" ftype="html" lines_diff="100"/> <output name="text_file" file="fastqc_data.txt" ftype="txt" lines_diff="100"/> </test> </tests> <help> .. class:: infomark **Purpose** FastQC aims to provide a simple way to do some quality control checks on raw sequence data coming from high throughput sequencing pipelines. It provides a modular set of analyses which you can use to give a quick impression of whether your data has any problems of which you should be aware before doing any further analysis. The main functions of FastQC are: - Import of data from BAM, SAM or FastQ/FastQ.gz files (any variant), - Providing a quick overview to tell you in which areas there may be problems - Summary graphs and tables to quickly assess your data - Export of results to an HTML based permanent report - Offline operation to allow automated generation of reports without running the interactive application ----- .. class:: infomark **FastQC** This is a Galaxy wrapper. It merely exposes the external package FastQC_ which is documented at FastQC_ Kindly acknowledge it as well as this tool if you use it. FastQC incorporates the Picard-tools_ libraries for SAM/BAM processing. The contaminants file parameter was borrowed from the independently developed fastqcwrapper contributed to the Galaxy Community Tool Shed by J. Johnson. Adaption to version 0.11.2 by T. McGowan. ----- .. class:: infomark **Inputs and outputs** FastQC_ is the best place to look for documentation - it's very good. A summary follows below for those in a tearing hurry. This wrapper will accept a Galaxy fastq, fastq.gz, sam or bam as the input read file to check. It will also take an optional file containing a list of contaminants information, in the form of a tab-delimited file with 2 columns, name and sequence. As another option the tool takes a custom limits.txt file that allows setting the warning thresholds for the different modules and also specifies which modules to include in the output. The tool produces a basic text and a HTML output file that contain all of the results, including the following: - Basic Statistics - Per base sequence quality - Per sequence quality scores - Per base sequence content - Per base GC content - Per sequence GC content - Per base N content - Sequence Length Distribution - Sequence Duplication Levels - Overrepresented sequences - Kmer Content All except Basic Statistics and Overrepresented sequences are plots. .. _FastQC: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/ .. _Picard-tools: https://broadinstitute.github.io/picard/ </help> <citations> <citation type="bibtex"> @unpublished{andrews_s, author = {Andrews, S.}, keywords = {bioinformatics, ngs, qc}, priority = {2}, title = {{FastQC A Quality Control tool for High Throughput Sequence Data}}, url = {http://www.bioinformatics.babraham.ac.uk/projects/fastqc/} } </citation> </citations> </tool>