view VCFStorage_wrapper.xml @ 7:a6b557df86db draft

Uploaded
author urgi-team
date Tue, 15 Dec 2015 05:35:36 -0500

<tool id="VCFStorage" name="VCFStorage" version="0.01">
  <description>Stores information from variant calling in a table: creates a tabular file with SNP information</description>
  <requirements>
    <requirement type="package" version="1.0">VCF_Gandalf_Tools</requirement>
  </requirements>
  <version_command>
  VCFStorage.py --version
  </version_command>
  <command interpreter="python">
    VCFStorage_wrapper.py -f $inputFasta -o $outputVCFStorage 
    ## genome list
    #for $VCF in $VCFFile
     -l $VCF.strainName
     -L $VCF.inputStrainVCF
    #end for
  </command>
  <inputs>
    <param name="inputFasta" type="data" format="fasta" label="Input genome sequence file name (fasta)"/>
    <repeat name="VCFFile" title="VCF list" min="1">
        <param name="strainName" size="20" type="text" value="V1" label="Strain name (no spaces allowed)"/>
        <param name="inputStrainVCF" type="data" format="vcf" label="Select VCF file"/>
    </repeat>
  </inputs>
  <outputs>
    <data format="tabular" name="outputVCFStorage" label="${tool.name} on ${on_string} (tabular)"/>
  </outputs>
  <tests>
    <test>
        <param name="inputFasta" ftype="fasta" value="chr17.fa" />
        <param name="strainName" value="V1"/>
        <param name="inputStrainVCF" ftype="vcf" value="chr17.VCF"/>
        <output name="outputVCFStorage" ftype="tabular" file="Expchr17.tab"/>
    </test>
  </tests>
  <help><![CDATA[
    **Stores information from variant calling in a table. It creates a tabular file with SNP information.**
    
-----

**what it does :**

VCFStorage.py is a Python script that stores data from multiple VCF files in a single tabular marker file. Each VCF file becomes a new column in the final output.

-----

**input format :**

Several files are required as input:

 - the fasta file of your genomic sequence
 - multiple VCF files (one per strain). It is strongly advised to flag filtered positions in the FILTER column (column 7) rather than removing their lines from the VCF: a removed line is reported as U (not in the VCF) instead of F (filtered).

-----

**output format :**

The result is a tab-delimited file in which genomic positions are in rows and strains are in columns (in the order in which you gave the VCF files).

For each position and each genome, a code is attributed : 

- for the reference : ::

    A,T,G,C for the corresponding nucleotide

- for the genomes : ::

    U if the position is not reported in the VCF file
    R if the base is identical to the reference
    F if the base has been filtered in the column FILTER (column 7) of the VCF 
    A,T,G,C if the genome has a validated SNP at the position
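
As an illustration, the coding rules above can be sketched in Python. This is a minimal sketch, not the code of VCFStorage.py: the names ``encode_call`` and ``build_column`` are hypothetical, and it assumes that a FILTER value of ``.`` or ``PASS`` means "not filtered" and that the filter test takes precedence over the comparison with the reference (as position 40 of the example in this help suggests).

```python
def encode_call(ref_base, record):
    """Return the one-letter code for one strain at one position.

    `record` is None when the position is absent from the VCF, otherwise an
    (alt, filter) tuple taken from columns 5 and 7 of the VCF line.
    """
    if record is None:
        return "U"  # position not reported in the VCF
    alt, filt = record
    # Assumption: "." and "PASS" both mean "not filtered".
    if filt not in (".", "PASS"):
        return "F"  # filtered in the FILTER column
    if alt == "." or alt.upper() == ref_base.upper():
        return "R"  # same base as the reference
    return alt.upper()  # validated SNP


def build_column(sequence, calls):
    """Encode every position of `sequence`; `calls` maps 1-based POS to (alt, filter)."""
    return [encode_call(base, calls.get(pos))
            for pos, base in enumerate(sequence, start=1)]


# Values copied from positions 17, 37, 40 and 60 of the chr_17 example in this help.
calls = {17: ("G", "G_AF;G_DP"), 37: ("G", "."), 40: ("T", "G_AF"), 60: (".", ".")}
sequence = "ccctaa" * 10  # first 60 bases of the example fasta
column = build_column(sequence, calls)
print(column[16], column[36], column[39], column[59], column[0])  # F G F R U
```

Each strain would contribute one such column, placed after the CHROM, POS and reference columns of the output table.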


-----

**example :**

fasta input file (genomic sequence): ::
    
    >chr_17
    ccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaa
    TACGCGCGCGCCTAACCCTACGACTTTAACCTACTCTAAACTCTCCTACTAGTACGTCTT
    
VCF input file : ::

    ##fileformat=VCFv4.1
    ##fileDate=20140725
    ##source=freeBayes v0.9.13-2-ga830efd
    ##reference=exmple.fsa
    ##phasing=none
    ##DetectedFormat=freebayes
    ##FILTER=<ID=G_AN,Description="The SNP has been filtered ; out of AN range(over 2)">
    ##FILTER=<ID=G_AF,Description="The SNP has been filtered ; out of AF range(under 0.9)">
    ##FILTER=<ID=G_DP,Description="The SNP has been filtered ; out of DP range(15 - 35)">
    ##FILTER=<ID=InDel,Description="The SNP has been filtered ; InDel detected">
    ##FILTER=<ID=Nmatch,Description="The SNP has been filtered ; reference base detected : N">
    #CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    V1
    chr_17    17    .    A    G    529.213    G_AF;G_DP    AB=0.583333;ABP=5.18177;AC=1;AF=0.5;AN=2;AO=21;CIGAR=1X;DP=36;DPB=36;DPRA=0;EPP=3.1137;EPPR=3.15506;GTI=0;LEN=1;MEANALT=1;MQM=60;MQMR=60;NS=1;NUMALT=1;ODDS=77.012;PAIRED=1;PAIREDR=1;PAO=0;PQA=0;PQR=0;PRO=0;QA=751;QR=535;RO=15;RPP=5.59539;RPPR=4.31318;RUN=1;SAF=11;SAP=3.1137;SAR=10;SRF=5;SRP=6.62942;SRR=10;TYPE=snp;technology.illumina=1;G_AN=2;G_AF=0.58;G_DP=36;G_Base=G    GT:DP:RO:QR:AO:QA:GL    0/1:36:15:535:21:751:-10,0,-10
    chr_17    37    .    C    G    1082.38    .    AB=0;ABP=0;AC=2;AF=1;AN=2;AO=34;CIGAR=1X;DP=34;DPB=34;DPRA=0;EPP=3.26577;EPPR=0;GTI=0;LEN=1;MEANALT=1;MQM=60;MQMR=0;NS=1;NUMALT=1;ODDS=48.0391;PAIRED=1;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=1243;QR=0;RO=0;RPP=15.5282;RPPR=0;RUN=1;SAF=18;SAP=3.26577;SAR=16;SRF=0;SRP=0;SRR=0;TYPE=snp;technology.illumina=1;G_AN=2;G_AF=1.00;G_DP=34;G_Base=G    GT:DP:RO:QR:AO:QA:GL    1/1:34:0:0:34:1243:-10,-9.23017,0
    chr_17    40    .    T    T    825.518    G_AF    AB=0;ABP=0;AC=2;AF=1;AN=2;AO=29;CIGAR=1X;DP=34;DPB=34;DPRA=0;EPP=6.67934;EPPR=13.8677;GTI=0;LEN=1;MEANALT=1;MQM=60;MQMR=60;NS=1;NUMALT=1;ODDS=8.92992;PAIRED=1;PAIREDR=1;PAO=0;PQA=0;PQR=0;PRO=0;QA=1082;QR=178;RO=5;RPP=9.07545;RPPR=13.8677;RUN=1;SAF=13;SAP=3.68421;SAR=16;SRF=5;SRP=13.8677;SRR=0;TYPE=snp;technology.illumina=1;G_AN=2;G_AF=0.85;G_DP=34;G_Base=T    GT:DP:RO:QR:AO:QA:GL    1/1:34:5:178:29:1082:-10,0,-6.82575
    chr_17    60    .    A    .    699.741    .    AB=0;ABP=0;AC=2;AF=1;AN=2;AO=22;CIGAR=1X;DP=22;DPB=22;DPRA=0;EPP=17.2236;EPPR=0;GTI=0;LEN=1;MEANALT=1;MQM=60;MQMR=0;NS=1;NUMALT=1;ODDS=32.2544;PAIRED=1;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=823;QR=0;RO=0;RPP=9.32731;RPPR=0;RUN=1;SAF=12;SAP=3.40511;SAR=10;SRF=0;SRP=0;SRR=0;TYPE=snp;technology.illumina=1;G_AN=2;G_AF=1.00;G_DP=22;G_Base=G    GT:DP:RO:QR:AO:QA:GL    1/1:22:0:0:22:823:-10,-5.98732,0
    chr_17    73    .    T    .    846.299    .    AB=0;ABP=0;AC=2;AF=1;AN=2;AO=27;CIGAR=1X;DP=27;DPB=27;DPRA=0;EPP=16.6021;EPPR=0;GTI=0;LEN=1;MEANALT=1;MQM=60;MQMR=0;NS=1;NUMALT=1;ODDS=38.84;PAIRED=1;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=1002;QR=0;RO=0;RPP=5.02092;RPPR=0;RUN=1;SAF=21;SAP=21.1059;SAR=6;SRF=0;SRP=0;SRR=0;TYPE=snp;technology.illumina=1;G_AN=2;G_AF=1.00;G_DP=27;G_Base=T    GT:DP:RO:QR:AO:QA:GL    1/1:27:0:0:27:1002:-10,-7.34226,0
    chr_17    81    .    C    T    764.464    .    AB=0;ABP=0;AC=2;AF=1;AN=2;AO=25;CIGAR=1X;DP=25;DPB=25;DPRA=0;EPP=13.5202;EPPR=0;GTI=0;LEN=1;MEANALT=1;MQM=60;MQMR=0;NS=1;NUMALT=1;ODDS=36.1324;PAIRED=1;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=902;QR=0;RO=0;RPP=3.79203;RPPR=0;RUN=1;SAF=19;SAP=17.6895;SAR=6;SRF=0;SRP=0;SRR=0;TYPE=snp;technology.illumina=1;G_AN=2;G_AF=1.00;G_DP=25;G_Base=T    GT:DP:RO:QR:AO:QA:GL    1/1:25:0:0:25:902:-10,-6.76842,0
    chr_17    105    .    C    T    1154    G_DP    AB=0;ABP=0;AC=2;AF=1;AN=2;AO=37;CIGAR=1X;DP=37;DPB=37;DPRA=0;EPP=5.88603;EPPR=0;GTI=0;LEN=1;MEANALT=1;MQM=60;MQMR=0;NS=1;NUMALT=1;ODDS=52.0047;PAIRED=1;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=1336;QR=0;RO=0;RPP=19.9713;RPPR=0;RUN=1;SAF=23;SAP=7.76406;SAR=14;SRF=0;SRP=0;SRR=0;TYPE=snp;technology.illumina=1;G_AN=2;G_AF=1.00;G_DP=37;G_Base=T    GT:DP:RO:QR:AO:QA:GL    1/1:37:0:0:37:1336:-10,-10,0
    chr_17    112    .    G    A    1276.25    G_DP    AB=0;ABP=0;AC=2;AF=1;AN=2;AO=40;CIGAR=1X;DP=40;DPB=40;DPRA=0;EPP=10.8276;EPPR=0;GTI=0;LEN=1;MEANALT=1;MQM=60;MQMR=0;NS=1;NUMALT=1;ODDS=55.9501;PAIRED=1;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=1471;QR=0;RO=0;RPP=10.8276;RPPR=0;RUN=1;SAF=26;SAP=10.8276;SAR=14;SRF=0;SRP=0;SRR=0;TYPE=snp;technology.illumina=1;G_AN=2;G_AF=1.00;G_DP=40;G_Base=A    GT:DP:RO:QR:AO:QA:GL    1/1:40:0:0:40:1471:-10,-10,0

expected result : ::

    CHROM    POS    reference    V1
    chr_17    1    C    U
    chr_17    2    C    U
    chr_17    3    C    U
    chr_17    4    T    U
    chr_17    5    A    U
    chr_17    6    A    U
    chr_17    7    C    U
    chr_17    8    C    U
    chr_17    9    C    U
    chr_17    10    T    U
    chr_17    11    A    U
    chr_17    12    A    U
    chr_17    13    C    U
    chr_17    14    C    U
    chr_17    15    C    U
    chr_17    16    T    U
    chr_17    17    A    F
    chr_17    18    A    U
    chr_17    19    C    U
    chr_17    20    C    U
    chr_17    21    C    U
    chr_17    22    T    U
    chr_17    23    A    U
    chr_17    24    A    U
    chr_17    25    C    U
    chr_17    26    C    U
    chr_17    27    C    U
    chr_17    28    T    U
    chr_17    29    A    U
    chr_17    30    A    U
    chr_17    31    C    U
    chr_17    32    C    U
    chr_17    33    C    U
    chr_17    34    T    U
    chr_17    35    A    U
    chr_17    36    A    U
    chr_17    37    C    G
    chr_17    38    C    U
    chr_17    39    C    U
    chr_17    40    T    F
    chr_17    41    A    U
    chr_17    42    A    U
    chr_17    43    C    U
    chr_17    44    C    U
    chr_17    45    C    U
    chr_17    46    T    U
    chr_17    47    A    U
    chr_17    48    A    U
    chr_17    49    C    U
    chr_17    50    C    U
    chr_17    51    C    U
    chr_17    52    T    U
    chr_17    53    A    U
    chr_17    54    A    U
    chr_17    55    C    U
    chr_17    56    C    U
    chr_17    57    C    U
    chr_17    58    T    U
    chr_17    59    A    U
    chr_17    60    A    R
    chr_17    61    T    U
    chr_17    62    A    U
    chr_17    63    C    U
    chr_17    64    G    U
    chr_17    65    C    U
    chr_17    66    G    U
    chr_17    67    C    U
    chr_17    68    G    U
    chr_17    69    C    U
    chr_17    70    G    U
    chr_17    71    C    U
    chr_17    72    C    U
    chr_17    73    T    R
    chr_17    74    A    U
    chr_17    75    A    U
    chr_17    76    C    U
    chr_17    77    C    U
    chr_17    78    C    U
    chr_17    79    T    U
    chr_17    80    A    U
    chr_17    81    C    T
    chr_17    82    G    U
    chr_17    83    A    U
    chr_17    84    C    U
    chr_17    85    T    U
    chr_17    86    T    U
    chr_17    87    T    U
    chr_17    88    A    U
    chr_17    89    A    U
    chr_17    90    C    U
    chr_17    91    C    U
    chr_17    92    T    U
    chr_17    93    A    U
    chr_17    94    C    U
    chr_17    95    T    U
    chr_17    96    C    U
    chr_17    97    T    U
    chr_17    98    A    U
    chr_17    99    A    U
    chr_17    100    A    U
    chr_17    101    C    U
    chr_17    102    T    U
    chr_17    103    C    U
    chr_17    104    T    U
    chr_17    105    C    F
    chr_17    106    C    U
    chr_17    107    T    U
    chr_17    108    A    U
    chr_17    109    C    U
    chr_17    110    T    U
    chr_17    111    A    U
    chr_17    112    G    F
    chr_17    113    T    U
    chr_17    114    A    U
    chr_17    115    C    U
    chr_17    116    G    U
    chr_17    117    T    U
    chr_17    118    C    U
    chr_17    119    T    U
    chr_17    120    T    U

]]>
  </help>
</tool>