
<tool id="format_cd_hit_output" name="Format cd-hit outputs" version="1.0.0+galaxy1">
    <description>to rename representative sequences with cluster name and/or extract distribution inside clusters given a mapping file</description>

    <requirements>
    </requirements>

    <stdio>
        <exit_code range="1:" />
    </stdio>

    <version_command>
    </version_command>

    <command><![CDATA[
        python '$__tool_directory__/format_cd_hit_output.py'
            --input_cluster_info '$input_cluster_info'

            #if $rename_repr_seq.rename_repr_seq_test == "true"
                --input_representative_sequences '$rename_repr_seq.input_representative_sequences'
                --output_representative_sequences '$output_representative_sequences'
            #end if

            #if $extract_cat_distri.extract_cat_distri_test == "true"
                --input_mapping '$extract_cat_distri.input_mapping'
                --output_category_distribution '$output_category_distribution'
                $extract_cat_distri.number_sum
            #end if
    ]]>
    </command>

    <inputs>
        <param type="data" format="txt" name="input_cluster_info" label="Cluster info" help="(--input_cluster_info)"/>
        <conditional name="rename_repr_seq">
            <param name='rename_repr_seq_test' type="select" label="Rename representative sequences with the corresponding cluster name?" help="">
                <option value="true" selected="true">Yes</option>
                <option value="false">No</option>
            </param>
            <when value="true">
                <param type="data" format="fasta"
                    name="input_representative_sequences"
                    label="Representative sequences" help="(--input_representative_sequences)"/>
            </when>
            <when value="false" />
        </conditional>

        <conditional name="extract_cat_distri">
            <param name='extract_cat_distri_test' type="select" label="Extract category distribution of each cluster?" help="">
                <option value="true" selected="true">Yes</option>
                <option value="false">No</option>
            </param>
            <when value="true">
                <param type="data" format="tabular,tsv,csv" name="input_mapping" label="Mapping file" help="The mapping file is a tabular file with 2 columns: the first column contains the sequence names and the second the corresponding category (--input_mapping)"/>
                <param name='number_sum' type='boolean' checked="true" truevalue="--number_sum 1" falsevalue="" label="Sum sequence number for each category?" help="If unchecked, the sizes of the sequences in each category are summed instead; this requires size information in the sequence names (--number_sum)"/>
            </when>
            <when value="false" />
        </conditional>
    </inputs>

    <outputs>
        <data name="output_representative_sequences" format="fasta"
            label="${tool.name} on ${on_string}: Renamed representative sequences">
            <filter>((rename_repr_seq['rename_repr_seq_test'] == "true"))</filter>
        </data>
        <data name="output_category_distribution" format="tabular"
            label="${tool.name} on ${on_string}: Category distribution">
            <filter>((extract_cat_distri['extract_cat_distri_test'] == "true"))</filter>
        </data>
    </outputs>

    <tests>
        <test>
            <param name="input_cluster_info" value="input_cluster_info.txt"/>
            <param name="rename_repr_seq_test" value="true"/>
            <param name="input_representative_sequences" value="input_representative_sequences.fasta"/>
            <param name="extract_cat_distri_test" value="true"/>
            <param name="input_mapping" value="input_mapping.txt"/>
            <param name="number_sum" value="true"/>
            <output name="output_representative_sequences" file="rename_representative_sequences_output.fasta"/>
            <output name="output_category_distribution">
                <assert_contents>
                    <has_size value="139937" delta="100" />
                    <has_n_lines n="2990"/>
                    <has_text text="amoA_arc_19F-TV1-T0-b"/>
                </assert_contents>
            </output>
        </test>
    </tests>

    <help><![CDATA[

**What it does**

This tool formats cd-hit outputs (cluster information and cluster representative sequences): it renames representative sequences with their cluster name and/or extracts the category distribution inside clusters, given a mapping file.

The tool takes as input:

 - The cd-hit output file with cluster information
 - The cd-hit output file with representative sequences for each cluster (optional)
 - A mapping file in tabular format, with the sequence names (matching the ones in the cluster information file) in the first column and the corresponding categories in the second column (optional)

The tool generates different outputs depending on the chosen parameters:

 - A file with the representative sequence of each cluster, renamed with the cluster name
 - A tabular file with one line per cluster and one column per category (plus one column with the number of sequences in the cluster); each cell gives the number of sequences of the given category in the cluster

]]>
    </help>

    <citations></citations>
</tool>
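
The category distribution table described in the help text can be illustrated with a minimal Python sketch. This is not the code of `format_cd_hit_output.py`: the function name `category_distribution`, the column names `cluster` and `sequence_number`, and the toy data are all hypothetical. The sketch assumes the cluster information has already been parsed into a dictionary mapping each cluster name to its member sequence names, and that the default behaviour (counting sequences per category) is wanted.

```python
from collections import Counter


def category_distribution(clusters, mapping):
    """Count, for each cluster, how many member sequences fall in each category.

    clusters: dict of cluster name -> list of sequence names (assumed pre-parsed)
    mapping:  dict of sequence name -> category (the two mapping file columns)
    """
    categories = sorted(set(mapping.values()))
    # Hypothetical header: one line per cluster, one column per category,
    # plus one column with the number of sequences in the cluster.
    header = ["cluster", "sequence_number"] + categories
    rows = []
    for name, members in clusters.items():
        # Sequences absent from the mapping are counted in the cluster size only.
        counts = Counter(mapping[seq] for seq in members if seq in mapping)
        rows.append([name, len(members)] + [counts[cat] for cat in categories])
    return header, rows


# Hypothetical toy data, for illustration only.
clusters = {"cluster_0": ["seq1", "seq2", "seq3"], "cluster_1": ["seq4"]}
mapping = {"seq1": "soil", "seq2": "water", "seq3": "soil", "seq4": "water"}

header, rows = category_distribution(clusters, mapping)
for line in [header] + rows:
    print("\t".join(str(value) for value in line))
```

With the toy data, `cluster_0` has three sequences, two in `soil` and one in `water`. Summing sequence sizes instead of counting sequences (the unchecked `--number_sum` case) would need the size parsed from each sequence name, which this sketch does not attempt.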
author	bgruening
date	Wed, 19 Oct 2022 14:42:33 +0000
parents	4015e9d6d277