view sam2interval.xml @ 0:8c737b8ddc45 draft

Uploaded tool tarball.
author devteam
date Mon, 26 Aug 2013 15:12:38 -0400
parents
children 75557c0908a9
line wrap: on
line source

<tool id="sam2interval" name="Convert SAM" version="1.0.1">
  <description>to interval</description>
  <command interpreter="python">sam2interval.py --input_sam_file=$input1 $print_all > $out_file1
  </command>
  <inputs>
    <param format="sam" name="input1" type="data" label="Select dataset to convert"/>
    <param name="print_all" type="select" label="Print all?" help="Do you want to retain original SAM fields? See example below.">
        <option value="-p">Yes</option>
        <option value="">No</option>
    </param>
  </inputs>
 <outputs>
    <data format="interval" name="out_file1" label="Converted Interval" />
  </outputs>
<tests>
    <test>          
        <param name="input1" value="sam_bioinf_example.sam" ftype="sam"/>
        <param name="print_all" value="Yes"/>
        <output name="out_file1" file="sam2interval_printAll.dat" ftype="interval"/>
    </test>
    <test>          
        <param name="input1" value="sam_bioinf_example.sam" ftype="sam"/>
        <param name="print_all" value="No"/>
        <output name="out_file1" file="sam2interval_noprintAll.dat" ftype="interval"/>
    </test>
    <test>
        <param name="input1" value="sam2interval-test3.sam" ftype="sam"/>
        <param name="print_all" value="No"/>
        <output name="out_file1" file="sam2interval_with_unmapped_reads_noprintAll.dat" ftype="interval"/>
    </test>

</tests>
  <help>

**What it does**

Converts positional information from a SAM dataset into interval format with 0-based start and 1-based end. CIGAR string of SAM format is used to compute the end coordinate.

-----

**Example**

Converting the following dataset::

 r001 163 ref  7 30 8M2I4M1D3M = 37  39 TTAGATAAAGGATACTA *
 r002   0 ref  9 30 3S6M1P1I4M *  0   0 AAAAGATAAGGATA    *
 r003   0 ref  9 30       5H6M *  0   0 AGCTAA            * NM:i:1
 r004   0 ref 16 30    6M14N5M *  0   0 ATAGCTTCAGC       *
 r003  16 ref 29 30       6H5M *  0   0 TAGGC             * NM:i:0
 r001  83 ref 37 30         9M =  7 -39 CAGCGCCAT         *

into Interval format will produce the following if *Print all?* is set to **Yes**::

 ref  6 22 + r001 163 ref  7 30 8M2I4M1D3M = 37  39 TTAGATAAAGGATACTA *
 ref  8 19 + r002   0 ref  9 30 3S6M1P1I4M *  0   0 AAAAGATAAGGATA    *
 ref  8 14 + r003   0 ref  9 30 5H6M       *  0   0 AGCTAA            * NM:i:1
 ref 15 40 + r004   0 ref 16 30 6M14N5M    *  0   0 ATAGCTTCAGC       *
 ref 28 33 - r003  16 ref 29 30 6H5M       *  0   0 TAGGC             * NM:i:0
 ref 36 45 - r001  83 ref 37 30 9M         =  7 -39 CAGCGCCAT         *
 
Setting  *Print all?* to **No** will generate the following::

 ref  6 22 + r001
 ref  8 19 + r002
 ref  8 14 + r003
 ref 15 40 + r004
 ref 28 33 - r003
 ref 36 45 - r001


  </help>
</tool>