Mercurial > repos > iuc > featurecounts

<tool id="featurecounts" name="featureCounts" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="20.09">
    <description>Measure gene expression in RNA-Seq experiments from SAM or BAM files</description>
    <macros>
        <token name="@TOOL_VERSION@">2.0.3</token>
        <token name="@VERSION_SUFFIX@">2</token>

        <macro name="conditional_gff_opions">
            <param name="gff_feature_type" type="text" value="exon" argument="-t" label="GFF feature type filter" help="Specify the feature type. Only rows which have the matched matched feature type in the provided GTF annotation file will be included for read counting. `exon' by default."/>
            <param name="gff_feature_attribute" type="text" value="gene_id" argument="-g" label="GFF gene identifier" help="Specify the attribute type used to group features (eg. exons) into meta-features (eg. genes), when GTF annotation is provided. `gene_id' by default. This attribute type is usually the gene identifier. This argument is useful for the meta-feature level summarization. Ex: if the 9th column is 'gene_id &quot;ENSG00000223972&quot;; gene_name &quot;DDX11L1&quot; gene_source &quot;havana&quot;' (GTF) or 'gene_id=ENSG00000223972; gene_name=DDX11L1; gene_source=havana' (GFF), the available attributes for this feature are 'gene_id', 'gene_name' and 'gene_source'."/>
            <param name="summarization_level" type="boolean" truevalue=" -f" falsevalue="" argument="-f" label="On feature level" help="If specified, read summarization will be performed at the feature level. By default (-f is not specified), the read summarization is performed at the meta-feature level."/>
        </macro>
    </macros>
    <xrefs>
        <xref type="bio.tools">featurecounts</xref>
    </xrefs>
    <requirements>
        <requirement type="package" version="@TOOL_VERSION@">subread</requirement>
        <requirement type="package" version="1.16.1">samtools</requirement>
    </requirements>

    <version_command>featureCounts -v 2&gt;&amp;1 | grep .</version_command>
    <command detect_errors="exit_code"><![CDATA[
        ## Export fc path for its built-in annotation
        export FC_PATH=\$(command -v featureCounts | sed 's@/bin/featureCounts$@@') &&

        ## Check whether all alignments are from the same type (bam || sam)
        featureCounts

            #if $anno.anno_select=="history":
                -a '$anno.reference_gene_sets'
                -F "GTF"
            #elif $anno.anno_select=="cached":
                -a '$anno.reference_gene_sets_cached.fields.path'
                -F "GTF"
            #elif $anno.anno_select=="builtin":
                -a \${FC_PATH}/annotation/${anno.bgenome}_RefSeq_exon.txt
                -F "SAF"
            #end if

            -o "output"
            -T \${GALAXY_SLOTS:-2}

            -s  $strand_specificity

            -Q  $read_filtering_parameters.mapping_quality
            $read_filtering_parameters.splitonly
            $read_filtering_parameters.primary
            $read_filtering_parameters.ignore_dup

            #if $anno.anno_select!="builtin":
                -t '$anno.gff_feature_type'
                -g '$anno.gff_feature_attribute'
                    $anno.summarization_level
            #end if

                $extended_parameters.multifeatures.multifeat
                #if $extended_parameters.multifeatures.multifeat != "":
                    $extended_parameters.multifeatures.fraction
                #end if

                $extended_parameters.exon_exon_junction_read_counting_enabled.count_exon_exon_junction_reads
                #if str($extended_parameters.exon_exon_junction_read_counting_enabled.count_exon_exon_junction_reads) == "-J":
                    #if $extended_parameters.exon_exon_junction_read_counting_enabled.genome:
                        -G '$extended_parameters.exon_exon_junction_read_counting_enabled.genome'
                    #end if
                #end if

                $extended_parameters.long_reads

                $extended_parameters.by_read_group

                $extended_parameters.largest_overlap
            --minOverlap  $extended_parameters.min_overlap
            --fracOverlap $extended_parameters.frac_overlap
            --fracOverlapFeature $extended_parameters.frac_overlap_feature
                $extended_parameters.read_reduction
                #if $extended_parameters.R:
                    $extended_parameters.R
                #end if
                #if str($extended_parameters.read_extension_5p) != "0":
                    --readExtension5 $extended_parameters.read_extension_5p
                #end if

                #if str($extended_parameters.read_extension_3p) != "0":
                    --readExtension3 $extended_parameters.read_extension_3p
                #end if

                #if str($pe_parameters.paired_end_status) != "single_end":
                    -p
                    $pe_parameters.only_both_ends
                    $pe_parameters.exclude_chimerics
                    #if str($pe_parameters.paired_end_status) == "PE_fragments":
                        --countReadPairs
                        #if $pe_parameters.check_distance_enabled.checkFragLength == "true":
                            --checkFragLength
                            --minFragLength $pe_parameters.check_distance_enabled.minimum_fragment_length
                            --maxFragLength $pe_parameters.check_distance_enabled.maximum_fragment_length
                        #end if
                    #end if
                #end if

        '${alignment}'

        ## Remove comment and add sample name to header
        && grep -v "^#" "output"
        | sed -e 's|${alignment}|${alignment.element_identifier}|g'
        > body.txt
        ## Set the right columns for the tabular formats
        #if $format.value == "tabdel_medium":
            && cut -f 1,7 body.txt > expression_matrix.txt

            ## Paste doesn't allow a non ordered list of columns: -f 1,7,8,6 will only return columns 1,7 and 8
            ## Thus the gene length column (last column) has to be added separately
            && cut -f 6 body.txt > gene_lengths.txt
            && paste expression_matrix.txt gene_lengths.txt > expression_matrix.txt.bak
            && mv -f expression_matrix.txt.bak '${output_medium}'
        #elif $format.value == "tabdel_short":
            && cut -f 1,7 body.txt > '${output_short}'
        #else:
            && cp body.txt '${output_full}'
        #end if

        #if str($include_feature_length_file) == "true":
            && cut -f 1,6 body.txt > '${output_feature_lengths}'
        #end if

        #if str($extended_parameters.exon_exon_junction_read_counting_enabled.count_exon_exon_junction_reads) == "-J":
            && sed -e 's|${alignment}|${alignment.element_identifier}|g' 'output.jcounts' > '${output_jcounts}'
        #end if

        #if $extended_parameters.R:
            && samtools sort --no-PG -o '$output_bam' -@ \${GALAXY_SLOTS:-2} -T "\${TMPDIR:-.}" *.featureCounts.bam
        #end if
        && sed -e 's|${alignment}|${alignment.element_identifier}|g' 'output.summary' > '${output_summary}'
    ]]></command>
    <inputs>
        <param name="alignment"
               type="data"
               multiple="false"
               format="unsorted.bam,bam,sam"
               label="Alignment file"
               help="The input alignment file(s) where the gene expression has to be counted. The file can have a SAM or BAM format; but ALL files must be in the same format. Unless you are using a Gene annotation file from the History, these files must have the database/genome attribute already specified e.g. hg38, not the default: ?">
        </param>

        <param name="strand_specificity"
               type="select"
               label="Specify strand information"
               argument="-s"
               help="Indicate if the data is stranded and if strand-specific read counting should be performed. Strand setting must be the same as the strand settings used to produce the mapped BAM input(s)">
            <option value="0" selected="true">Unstranded</option>
            <option value="1">Stranded (Forward)</option>
            <option value="2">Stranded (Reverse)</option>
        </param>

        <conditional name="anno">
            <param name="anno_select" type="select" label="Gene annotation file">
                <option value="builtin">featureCounts built-in</option>
                <option value="cached" selected="True">locally cached</option>
                <option value="history">A GFF/GTF file in your history</option>
            </param>
            <when value="builtin">
                <param name="bgenome" type="select" label="Select built-in genome" help="Built-in gene annotations for genomes hg38, hg19, mm10 and mm9 are included in featureCounts">
                    <options from_data_table="featurecounts_anno">
                        <filter type="data_meta" key="dbkey" ref="alignment" column="dbkey"/>
                        <filter type="sort_by" column="1"/>
                    </options>
                    <validator type="no_options" message="An built-in annotation file is not available for the genome build associated with the selected input file"/>
                </param>
            </when>
            <when value="cached">
                <param name="reference_gene_sets_cached" type="select" label="Using locally cached annotation" help="If the annotation file you require is not listed here, please contact the Galaxy administrator.">
                    <options from_data_table="gene_sets">
                        <filter type="data_meta" key="dbkey" ref="alignment" column="dbkey"/>
                        <filter type="sort_by" column="2"/>
                    </options>
                    <validator type="no_options" message="A cached annotation file is not available for the genome build associated with the selected input file"/>
                </param>
                <expand macro="conditional_gff_opions"/>
            </when>
            <when value="history">
                <param name="reference_gene_sets"
                       format="gff,gtf,gff3"
                       type="data"
                       label="Gene annotation file"
                       help="The program assumes that the provided annotation file is in GFF/GTF format. Make sure that the gene annotation file corresponds to the same reference genome as used for the alignment.">
                </param>
                <expand macro="conditional_gff_opions"/>
            </when>
        </conditional>

        <param name="format"
               type="select"
               label="Output format"
               help="The output format will be tabular, select the preferred columns here">
            <option value="tabdel_short">Gene-ID "\t" read-count (MultiQC/DESeq2/edgeR/limma-voom compatible)</option>
            <option value="tabdel_medium">Gene-ID "\t" read-count "\t" gene-length</option>
            <option value="tabdel_full">featureCounts 1.4.0+ default (includes regions provided by the GTF file)</option>
        </param>

        <param name="include_feature_length_file"
               type="boolean"
               truevalue="true"
               falsevalue="false"
               checked="false"
               label="Create gene-length file"
               help="Creates a tabular file that contains the effective (nucleotides used for counting reads) length of the feature; might be useful for estimating FPKM/RPKM"/>

        <conditional name="pe_parameters">
            <param name="paired_end_status" type="select" label="Does the input have read pairs?" help="Were the bam files generated by aligning the output of a paired-end sequencing experiment? If yes, the tool can consider 2 reads = 1 read pair as 1 entity to count. Alternatively, you can opt to consider treating the read pairs as 2 individual reads to count seperately.">
                <option value="single_end" selected="True">No, single-end.</option>
                <option value="PE_individual">Yes, paired-end but still count them as if individual reads.</option>
                <option value="PE_fragments">Yes, paired-end and count them as 1 single fragment.</option>
            </param>

            <when value="single_end"/>
            <when value="PE_individual">
                <param name="only_both_ends"
                    type="boolean"
                    truevalue=" -B"
                    falsevalue=""
                    argument="-B"
                    label="Only allow fragments with both reads aligned"
                    help="If specified, only fragments that have both ends successfully aligned will be considered for summarization. This option is only applicable for paired-end reads."/>

                <param name="exclude_chimerics"
                    type="boolean"
                    truevalue=" -C"
                    falsevalue=""
                    argument="-C"
                    checked="true"
                    label="Exclude chimeric fragments"
                    help="If specified, the chimeric fragments (those fragments that have their two ends aligned to different chromosomes) will NOT be included for summarization. This option is only applicable for paired-end read data."/>
            </when>
            <when value="PE_fragments">
                <conditional name="check_distance_enabled">
                    <param argument="--checkFragLength" type="select" label="Check paired-end distance" help="If specified, paired-end distance will be checked when assigning fragments to meta-features or features. This option is only applicable when -p (Count fragments instead of reads) is specified. The distance thresholds can be specified using -d and -D (minimum and maximum fragment/template length) options.">
                        <option value="true">Check the distance between paired reads</option>
                        <option value="false" selected="True">Do not check</option>
                    </param>
                    <when value="true">
                        <param name="minimum_fragment_length"
                            type="integer"
                            value="50"
                            min="0"
                            argument="-d"
                            label="Minimum fragment/template length."/>
                        <param name="maximum_fragment_length"
                            type="integer"
                            value="600"
                            min="1"
                            argument="-D"
                            label="Maximum fragment/template length."/>
                    </when>
                    <when value="false"/>
                </conditional>

                <param name="only_both_ends"
                    type="boolean"
                    truevalue=" -B"
                    falsevalue=""
                    argument="-B"
                    label="Only allow fragments with both reads aligned"
                    help="If specified, only fragments that have both ends successfully aligned will be considered for summarization. This option is only applicable for paired-end reads."/>

                <param name="exclude_chimerics"
                    type="boolean"
                    truevalue=" -C"
                    falsevalue=""
                    argument="-C"
                    checked="true"
                    label="Exclude chimeric fragments"
                    help="If specified, the chimeric fragments (those fragments that have their two ends aligned to different chromosomes) will NOT be included for summarization. This option is only applicable for paired-end read data."/>
            </when>
        </conditional>

        <section name="read_filtering_parameters" title="Read filtering options">
            <param name="mapping_quality"
                   type="integer"
                   value="0"
                   min="0"
                   argument="-Q"
                   label="Minimum mapping quality per read"
                   help="The minimum mapping quality score a read must satisfy in order to be counted. For paired-end reads, at least one end should satisfy this criteria. 0 by default."/>
            <param name="splitonly" type="select" display="radio" label="Filter split alignments" help="Split alignments are alignments with CIGAR string containing 'N', e.g. exon spanning reads in RNASeq.">
                <option value="">No filtering: count split and non-split alignments</option>
                <option value="--splitOnly">Count only split alignments (--splitOnly)</option>
                <option value="--nonSplitOnly">Count only non-split alignments (--nonSplitOnly)</option>
            </param>
            <param type="boolean"
                   truevalue=" --primary"
                   falsevalue=""
                   argument="--primary"
                   label="Only count primary alignments"
                   help="If specified, only primary alignments will be counted. Primary and secondary alignments are identified using bit 0x100 in theFlag field of SAM/BAM files. All primary alignments in a dataset will be counted regardless of whether they are from multi-mapping reads or not ('-M' is ignored)."/>
            <param name="ignore_dup"
                   type="boolean"
                   truevalue=" --ignoreDup"
                   falsevalue=""
                   argument="--ignoreDup"
                   label="Ignore reads marked as duplicate"
                   help="If specified, reads that were marked as duplicates will be ignored. Bit Ox400 in the FLAG field of a SAM/BAM file is used for identifying duplicate reads. In paired end data, the entire read pair will be ignored if at least one end is found to be a duplicate read."/>
        </section>

        <section name="extended_parameters" title="Advanced options">
            <conditional name = "multifeatures">
                <param name="multifeat" type="select" label="Allow reads to map to multiple features" help="Setting -O, -M and --fraction">
                    <option value="" selected="true">Disabled: reads that align to multiple features or overlapping features are excluded</option>
                    <option value="-M">Enabled: multi-mapping reads are included (-M)</option>
                    <option value="-O">Enabled: multi-overlapping features are included (-O)</option>
                    <option value="-O -M">Enabled: both multi-mapping and multi-overlapping features are included (-M -O)</option>
                </param>
                <when value=""/>
                <when value="-M">
                        <param type="boolean"
                            truevalue="--fraction"
                            falsevalue=""
                            argument="--fraction"
                            label="Assign fractions to multi-mapping reads"
                            help="If specified, a fractional count 1/x will be generated for each multi-mapping read, where x is the number of alignments (indicated by 'NH' tag) reported for the read."/>
                </when>
                <when value="-O">
                        <param type="boolean"
                            truevalue="--fraction"
                            falsevalue=""
                            argument="--fraction"
                            label="Assign fractions to multi-overlapping features"
                            help="If specified, a fractional count 1/y will be generated for each multi-overlapping feature, where y is the number of features overlapping with the read."/>
                </when>
                <when value="-O -M">
                        <param type="boolean"
                            truevalue="--fraction"
                            falsevalue=""
                            argument="--fraction"
                            label="Assign fractions to both multi-mapping reads and multi-overlapping features"
                            help="If specified, a fractional count 1/(x*y) will be generated, where x is the number of alignments (indicated by 'NH' tag) and y the number of overlapping features."/>
                </when>
            </conditional>

            <conditional name="exon_exon_junction_read_counting_enabled">
                <param name="count_exon_exon_junction_reads" type="select" argument="-J" label="Exon-exon junctions" help="Junctions are identified from those exon-spanning reads (containing ‘N’ in CIGAR string) in input data. The output result includes names of primary and secondary genes that overlap at least one of the two splice sites of a junction.">
                    <option value="-J">Count reads supporting each exon-exon junction.</option>
                    <option value="" selected="True">Do not count</option>
                </param>
                <when value="-J">
                    <param name="genome" argument="-G" type="data" format="fasta" optional="true"
                           label="Reference sequence file"
                           help="The FASTA-format file that contains the reference sequences used in read mapping can be used to improve read counting for junctions"/>
                </when>
                <when value=""/>
            </conditional>

            <param name="long_reads" argument="-L" type="boolean" truevalue="-L" falsevalue=""
                   label="Long reads"
                   help="If specified, long reads such as Nanopore and PacBio reads will be counted. Long read counting can only run in one thread and only reads (not read-pairs) can be counted."/>

            <param name="by_read_group" argument="--byReadGroup" type="boolean" truevalue="--byReadGroup" falsevalue=""
                  label="Count reads by read group"
                  help="If specified, reads are counted for each read group separately. The 'RG' tag must be present in the input BAM/SAM alignment files."/>

            <param name="largest_overlap"
                   type="boolean"
                   truevalue=" --largestOverlap"
                   falsevalue=""
                   argument="--largestOverlap"
                   label="Largest overlap"
                   help="If specified, reads (or fragments) will be assigned to the target that has the largest number of overlapping bases"/>

            <param name="min_overlap"
                   type="integer"
                   value="1"
                   argument="--minOverlap"
                   label="Minimum bases of overlap"
                   help="Specify the minimum required number of overlapping bases between a read (or a fragment) and a feature. 1 by default. If a negative value is provided, the read will be extended from both ends."/>

            <param name="frac_overlap"
                  type="integer"
                  value="0"
                  min="0"
                  max="1"
                  argument="--fracOverlap"
                  label="Minimum fraction (of read) overlapping a feature"
                  help="Specify the minimum required fraction of overlapping bases between a read (or a fragment) and a feature. Value should be within range [0,1]. 0 by default. Number of overlapping bases is counted from both reads if paired end. Both this option and '--minOverlap' need to be satisfied for read assignment."/>

            <param name="frac_overlap_feature"
                     type="integer"
                     value="0"
                     min="0"
                     max="1"
                     argument="--fracOverlapFeature"
                     label="Minimum fraction (of feature) overlapping a read"
                     help="Specify the minimum required fraction of bases included in a feature overlapping bases between a read (or a read-pair). Value should be within range [0,1]. 0 by default."/>

            <param name="read_extension_5p"
                   type="integer"
                   value="0"
                   min="0"
                   argument="--readExtension5"
                   label="Read 5' extension"
                   help="Reads are extended upstream by ... bases from their 5' end"/>

            <param name="read_extension_3p"
                   type="integer"
                   value="0"
                   min="0"
                   argument="--readExtension3"
                   label="Read 3' extension"
                   help="Reads are extended upstream by ... bases from their 3' end"/>

            <param name="read_reduction"
                   type="select"
                   label="Reduce read to single position"
                   argument="--read2pos"
                   help="The read is reduced to its 5' most base or 3'most base. Read summarization is then performed based on the single base the the read is reduced to.">
                <option value="" selected="true">Leave the read as it is</option>
                <option value="--read2pos 5">Reduce it to the 5' end</option>
                <option value="--read2pos 3">Reduce it to the 3' end</option>
            </param>

            <param type="boolean"
                   truevalue="-R BAM"
                   falsevalue=""
                   argument="-R"
                   label="Annotates the alignment file with 'XS:Z:'-tags to described per read or read-pair the corresponding assigned feature(s)."
                   help=""/>

        </section>
    </inputs>
    <outputs>
        <data format="tabular"
              name="output_medium"
              label="${tool.name} on ${on_string}: Counts (with length)">
            <filter>format == "tabdel_medium"</filter>
            <actions>
                <action name="column_names" type="metadata" default="Geneid,${alignment.element_identifier},Length"/>
            </actions>
        </data>

        <data format="bam"
              name="output_bam"
              label="${tool.name} on ${on_string}: Alignment file">
            <filter>extended_parameters['R']</filter>
        </data>

        <data format="tabular"
              name="output_short"
              label="${tool.name} on ${on_string}: Counts">
            <filter>format == "tabdel_short"</filter>
            <actions>
                <action name="column_names" type="metadata" default="Geneid,${alignment.element_identifier}"/>
            </actions>
        </data>

        <data format="tabular"
              name="output_full"
              label="${tool.name} on ${on_string}: Counts (with location)">
            <filter>format == "tabdel_full"</filter>
            <actions>
                <action name="column_names" type="metadata" default="Geneid,Chr,Start,End,Strand,Length,${alignment.element_identifier}"/>
            </actions>
        </data>

        <data format="tabular"
              name="output_summary"
              label="${tool.name} on ${on_string}: Summary">
            <actions>
                <action name="column_names" type="metadata" default="Status,${alignment.element_identifier}"/>
            </actions>
        </data>

        <data format="tabular"
              name="output_feature_lengths"
              label="${tool.name} on ${on_string}: Feature lengths">
            <filter>include_feature_length_file</filter>
            <actions>
                <action name="column_names" type="metadata" default="Feature,Length"/>
            </actions>
        </data>

        <data name="output_jcounts" format="tabular"
              label="${tool.name} on ${on_string}: Junction counts">
            <filter>extended_parameters['exon_exon_junction_read_counting_enabled']['count_exon_exon_junction_reads']</filter>
            <actions>
                <action name="column_names" type="metadata"
                    default="PrimaryGene,SecondaryGene,Site1_chr,Site1_location,Site1_strand,Site2_chr,Site2_location,Site2_strand,${alignment.element_identifier}"/>
            </actions>
        </data>
    </outputs>
    <tests>
        <test expect_num_outputs="3">
            <param name="alignment" value="featureCounts_input1.bam" ftype="unsorted.bam" dbkey="hg38"/>
            <param name="anno_select" value="history"/>
            <param name="reference_gene_sets" value="featureCounts_guide.gff" ftype="gff" dbkey="hg38"/>
            <param name="format" value="tabdel_medium"/>
            <param name="include_feature_length_file" value="true"/>
            <output name="output_medium" file="output_1_medium.tab">
                <metadata name="column_names" value="Geneid,featureCounts_input1.bam,Length"/>
            </output>
            <output name="output_summary" file="output_1_summary.tab">
                <metadata name="column_names" value="Status,featureCounts_input1.bam"/>
            </output>
        </test>
        <test expect_num_outputs="3">
            <param name="alignment" value="featureCounts_input1.bam" ftype="unsorted.bam" dbkey="hg38"/>
            <param name="anno_select" value="history"/>
            <param name="reference_gene_sets" value="featureCounts_guide.gff" ftype="gff" dbkey="hg38"/>
            <param name="format" value="tabdel_full"/>
            <param name="include_feature_length_file" value="true"/>
            <output name="output_full" file="output_1_full.tab">
                <metadata name="column_names" value="Geneid,Chr,Start,End,Strand,Length,featureCounts_input1.bam"/>
            </output>
            <output name="output_summary" file="output_1_summary.tab">
                <metadata name="column_names" value="Status,featureCounts_input1.bam"/>
            </output>
            <output name="output_feature_lengths" file="output_feature_lengths.tab">
                <metadata name="column_names" value="Feature,Length"/>
            </output>
        </test>
        <test expect_num_outputs="4">
            <param name="alignment" value="featureCounts_input1.bam" ftype="unsorted.bam" dbkey="hg38"/>
            <param name="anno_select" value="history"/>
            <param name="reference_gene_sets" value="featureCounts_guide.gff" ftype="gff" dbkey="hg38"/>
            <param name="format" value="tabdel_short"/>
            <param name="include_feature_length_file" value="true"/>
            <param name="count_exon_exon_junction_reads" value="-J"/>
            <output name="output_short" file="output_1_short.tab">
                <metadata name="column_names" value="Geneid,featureCounts_input1.bam"/>
            </output>
            <output name="output_summary" file="output_1_summary.tab">
                <metadata name="column_names" value="Status,featureCounts_input1.bam"/>
            </output>
            <output name="output_jcounts" file="output_1_jcounts.tab">
                <metadata name="column_names" value="PrimaryGene,SecondaryGene,Site1_chr,Site1_location,Site1_strand,Site2_chr,Site2_location,Site2_strand,featureCounts_input1.bam"/>
            </output>
        </test>
        <!-- Ensure featureCounts built-in annotation works -->
        <test expect_num_outputs="3">
            <param name="alignment" value="pairend_strandspecific_51mer_hg19_chr1_1-100000.bam" ftype="unsorted.bam" dbkey="hg19"/>
            <param name="anno_select" value="builtin"/>
            <param name="format" value="tabdel_short"/>
            <conditional name="pe_parameters">
                <param name="paired_end_status" value="PE_individual"/>
            </conditional>
            <section name="extended_parameters">
                <param name="R" value="true"/>
            </section>
            <output name="output_short" file="output_builtin_hg19.tab">
                <metadata name="column_names" value="Geneid,pairend_strandspecific_51mer_hg19_chr1_1-100000.bam"/>
            </output>
            <output name="output_summary" file="output_summary_builtin_hg19.tab"/>
            <output name="output_bam" file="output.bam" ftype="bam"/>
        </test>
        <!-- Ensure fragment counting works -->
        <test expect_num_outputs="3">
            <param name="alignment" value="pairend_strandspecific_51mer_hg19_chr1_1-100000.bam" ftype="unsorted.bam" dbkey="hg19"/>
            <param name="anno_select" value="builtin"/>
            <param name="format" value="tabdel_short"/>
            <conditional name="pe_parameters">
                <param name="paired_end_status" value="PE_fragments"/>
            </conditional>
            <section name="extended_parameters">
                <param name="R" value="true"/>
            </section>
            <output name="output_short" file="output_builtin_hg19_fragment.tab">
                <metadata name="column_names" value="Geneid,pairend_strandspecific_51mer_hg19_chr1_1-100000.bam"/>
            </output>
            <output name="output_summary" file="output_summary_builtin_hg19_fragments.tab"/>
            <output name="output_bam" file="output_fragments.bam" ftype="bam"/>
        </test>
        <!-- Ensure cached GTFs work -->
        <test expect_num_outputs="3">
            <param name="alignment" value="featureCounts_input1.bam" ftype="unsorted.bam" dbkey="hg38"/>
            <param name="anno_select" value="cached"/>
            <param name="format" value="tabdel_medium"/>
            <param name="include_feature_length_file" value="true"/>
            <output name="output_medium" file="output_1_medium.tab">
                <metadata name="column_names" value="Geneid,featureCounts_input1.bam,Length"/>
            </output>
            <output name="output_summary" file="output_1_summary.tab">
                <metadata name="column_names" value="Status,featureCounts_input1.bam"/>
            </output>
        </test>
        <!-- Ensure BAM output works -->
        <test expect_num_outputs="3">
            <param name="alignment" value="subset.sorted.bam" ftype="bam"/>
            <param name="anno_select" value="history"/>
            <param name="reference_gene_sets" value="small.gtf" ftype="gtf"/>
            <section name="extended_parameters">
                <param name="R" value="true"/>
            </section>
            <output name="output_bam" value="subset.sorted.featurecounts.bam" compare="sim_size"/>
            <output name="output_short" file="subset.sorted.featurecounts_short.tab">
                <metadata name="column_names" value="Geneid,subset.sorted.bam"/>
            </output>
            <output name="output_summary" file="subset.sorted.featurecounts_summary.tab"/>
        </test>
    </tests>

    <help><![CDATA[
featureCounts
#############

Overview
--------
FeatureCounts is a light-weight read counting program written entirely in the C programming language. It can be used to count both gDNA-seq and RNA-seq reads for genomic features in in SAM/BAM files. FeatureCounts is part of the Subread_ package.

Input formats
-------------
Alignments should be provided in either:

 - SAM format, http://samtools.sourceforge.net/samtools.shtml#5
 - BAM format

Annotations for gene regions should be provided in the GFF/GTF format:

 - http://genome.ucsc.edu/FAQ/FAQformat.html#format3
 - http://www.ensembl.org/info/website/upload/gff.html

Alternatively, the featureCounts built-in annotations for genomes hg38, hg19, mm10 and mm9 can be used through selecting the built-in option above. These annotation files are in simplified annotation format (SAF) as shown below. The GeneID column contains Entrez gene identifiers and each entry (row) is taken as a feature (e.g. an exon).

Example - **Built-in annotation format**:

  ======  ====  =======  =======  ======
  GeneID  Chr   Start    End      Strand
  ======  ====  =======  =======  ======
  497097  chr1  3204563  3207049  -
  497098  chr1  3411783  3411982  -
  497099  chr1  3660633  3661579  -
  ======  ====  =======  =======  ======

These annotation files can be found in the `Subread package`_. You can see the version of Subread used by this wrapper in the tool form above under `Options > Requirements`. To create the files, the annotations were downloaded from NCBI RefSeq database and then adapted by merging overlapping exons from the same gene to form a set of disjoint exons for each gene. Genes with the same Entrez gene identifiers were also merged into one gene. See the `Subread User's Guide`_ for more information. Gene names can be obtained for these Entrez identifiers with the Galaxy **annotateMyIDs** tool.

Output format
-------------
FeatureCounts produces a table containing counted reads, per gene, per row. Optionally the last column can be set to be the effective gene-length. These tables are compatible with the DESeq2, edgeR and limma-voom Galaxy wrappers by IUC.

.. _Subread: http://subread.sourceforge.net/
.. _`Subread User's Guide`: https://bioconductor.org/packages/release/bioc/vignettes/Rsubread/inst/doc/SubreadUsersGuide.pdf
.. _`Subread package`: https://sourceforge.net/projects/subread/files/
    ]]></help>
    <citations>
        <citation type="doi">10.1093/bioinformatics/btt656</citation>
    </citations>
</tool>
author	iuc
date	Sat, 09 Sep 2023 21:18:37 +0000
parents	6f66ae7c5f7a
children	1b851f87fbe0