view classifier/classifier.xml @ 0:ef9c2044d86a draft
Uploaded
author | nanettec |
---|---|
date | Fri, 18 Mar 2016 05:15:29 -0400 |
parents | |
children |
<tool id="classifier5" name="Classify eQTLs" version="5.0.0">
  <description> as cis or trans</description>
  <command interpreter="python">
    classifier.py --rscript \$R_SCRIPT_PATH/classifier/eqtl_genes_positions_plot.txt --input1 $input1 --input2 $input2 --input3 $input3 --input4 $input4 --output1 $output1 --output2 $output2 --output3 $output3 --output4 $output4 --output5 $output5 --output6 $output6 --output7 $output7 --output8 $output8
  </command>
  <inputs>
    <param label="eQTL results file" name="input1" type="data" format="tabular" help="A tabular file with the mapped eQTLs and their associated statistics"></param>
    <param label="Chr summary file" name="input2" type="data" format="tabular" help="A tabular file with a data summary per chromosome (bp)"></param>
    <param label="Gene positions file" name="input3" type="data" format="tabular" help="A tabular file with the positions (bp) of each gene"></param>
    <param label="Lookup table file" name="input4" type="data" format="tabular" help="A tabular file with cM and bp positions for each interval"></param>
  </inputs>
  <outputs>
    <data format="tabular" name="output1" />
    <data format="tabular" name="output2" />
    <data format="tabular" name="output3" />
    <data format="tabular" name="output4" />
    <data format="tabular" name="output5" />
    <data format="tabular" name="output6" />
    <data format="tabular" name="output7" />
    <data format="pdf" name="output8" />
  </outputs>
  <requirements>
    <requirement type="set_environment">R_SCRIPT_PATH</requirement>
  </requirements>
  <tests>
    <test>
    </test>
  </tests>
  <help>

**What it does**

Calculates the average genetic interval size across all eQTLs.
Classifies an eQTL as 'cis' if it maps within half the above-mentioned interval size of the gene exhibiting the eQTL.
Classifies an eQTL as 'trans' if it maps to a different region of the genome than the location of the gene exhibiting the eQTL (further than half the above-mentioned interval size from the gene).
Classifies an eQTL as 'no_result' if the location of the target gene is not known.

-------

**Example input files**

eQTL results file, where each row corresponds to an eQTL (21 columns; only part of the file is shown)::

    trait_name  trait_number  eQTL_number  chr  peak_marker  peak_position  peak_LR     peak_LOD     R2         TR2        S            additive      dominance  LOD1_L_m  LOD1_L_pos  LOD1_R_m  LOD1_R_pos  LOD2_L_m  LOD2_L_pos  LOD2_R_m  LOD2_R_pos
    geneA       106           2            10   4            0.5206         13.0002477  2.821053751  0.1067186  0.2802598  2741.216084  -80.0805117   0          3         0.4045      5         0.6791      3         0.3583      5         0.7505
    geneB       434           3            6    3            0.1455         13.000651   2.821141267  0.0881461  0.3710748  38.650035    502.7692948   0          2         0.0847      3         0.2153      1         0.0112      3         0.2763
    geneC       343           2            4    10           1.1039         13.0012249  2.821265803  0.1168611  0.3068127  42.9667077   -101.8310204  0          10        1.0217      10        1.1078      9         0.9838      10        1.1118
    geneD       384           1            1    19           2.3414         13.0022994  2.82149897   0.1372476  0.1985604  2.1933164    -688.0268455  0          19        2.1956      20        2.4956      19        2.0883      20        2.5488
    geneD       267           2            9    8            1.2052         13.0026682  2.821578999  0.0862225  0.3794662  55.4157254   278.1351403   0          7         1.2023      8         1.2277      7         1.1994      8         1.2466

Chromosome summary file, where each row corresponds to a chromosome (6 columns; only part of the file is shown).
The last row gives the total across the genome::

    chr    markers  cM       bp          int_positions  bins
    1      27       324.4    301354135   177            176
    2      14       169.11   237068873   92             91
    3      19       221.29   232140174   123            122
    4      20       188.37   241473504   105            104
    5      20       203.82   217872852   110            109
    6      17       195.85   169174353   106            105
    Total  117      1302.84  1399083891  713            707

Gene positions file, where each row corresponds to a gene (4 columns; only part of the file is shown)::

    gene   chr  start_bp   end_bp
    geneA  1    33214735   33217244
    geneB  2    216416829  216433258
    geneC  6    162556092  162559012
    geneD  4    197750322  197751855
    geneE  7    144322379  144325978
    geneF  10   88551726   88552391
    geneG  8    163218231  163219697
    geneH  4    28738352   28739816
    geneI  5    180868777  180878474
    geneJ  5    182124005  182130631

Lookup table file, where each row corresponds to a 2 cM interval (6 columns; only part of the file is shown)::

    id  chr  marker  int     cM     bp       length_cM
    1   1    1       0.0001  0.0    2038278  2.0
    2   1    1       0.0201  2.0    2466324  2.0
    3   1    1       0.0401  4.0    2894370  2.0
    4   1    1       0.0601  6.0    3322416  1.53
    5   1    2       0.0754  7.53   3649871  2.0
    6   1    2       0.0954  9.53   4095673  2.0
    7   1    2       0.1154  11.53  4541476  2.0
    8   1    2       0.1354  13.53  4987278  2.0

-------

**Example output files**

eQTL full classification file, where each row corresponds to an eQTL (16 columns; only part of the file is shown).
A classification column was added to the eQTL results file::

    gene   index  chr  start_marker  start_int  end_marker  end_int  peak_marker  peak_int  peakLR      rsq        rtsq       parent_up_reg  classification  eQTL_bin  gene_bin
    geneA  1      6    13            1.5139     15          1.6431   13           1.5539    12.7532485  0.1337606  0.3630217  parentA        trans           691       800
    geneC  2      9    5             0.8106     6           0.9614   6            0.9214    20.344489   0.1559524  0.3123026  parentB        trans           902       700
    geneC  3      9    8             1.2052     8           1.2452   8            1.2052    16.6822024  0.1244943  0.314542   parentA        cis             917       920
    geneD  4      9    1             0.0001     2           0.2395   1            0.1201    19.531317   0.1753893  0.4300621  parentA        cis             860       862
    geneH  5      1    1             0.0001     1           0.1001   1            0.0001    19.5727096  0.1373944  0.392982   parentB        trans           939       465
    geneH  6      1    9             1.0268     11          1.2164   10           1.1261    13.5560176  0.095168   0.4823061  parentB        trans           1000      465
    geneH  7      6    14            1.5977     15          1.8031   15           1.7231    19.8953622  0.3181244  0.3909106  parentB        no_result       904       904
    geneI  8      9    7             1.0982     9           1.3079   8            1.2052    20.3966235  0.1305025  0.4233788  parentA        cis             977       969

eQTL cis classification file, where each row corresponds to a cis eQTL (16 columns; only part of the file is shown)::

    gene   index  chr  start_marker  start_int  end_marker  end_int  peak_marker  peak_int  peakLR      rsq        rtsq       parent_up_reg  classification  eQTL_bin  gene_bin
    geneC  3      9    8             1.2052     8           1.2452   8            1.2052    16.6822024  0.1244943  0.314542   parentA        cis             917       920
    geneD  4      9    1             0.0001     2           0.2395   1            0.1201    19.531317   0.1753893  0.4300621  parentA        cis             860       862
    geneI  8      9    7             1.0982     9           1.3079   8            1.2052    20.3966235  0.1305025  0.4233788  parentA        cis             977       969

eQTL trans classification file, where each row corresponds to a trans eQTL (16 columns; only part of the file is shown)::

    gene   index  chr  start_marker  start_int  end_marker  end_int  peak_marker  peak_int  peakLR      rsq        rtsq       parent_up_reg  classification  eQTL_bin  gene_bin
    geneA  1      6    13            1.5139     15          1.6431   13           1.5539    12.7532485  0.1337606  0.3630217  parentA        trans           691       800
    geneC  2      9    5             0.8106     6           0.9614   6            0.9214    20.344489   0.1559524  0.3123026  parentB        trans           902       700
    geneH  5      1    1             0.0001     1           0.1001   1            0.0001    19.5727096  0.1373944  0.392982   parentB        trans           939       465
    geneH  6      1    9             1.0268     11          1.2164   10           1.1261    13.5560176  0.095168   0.4823061  parentB        trans           1000      465

Classification summary file, where each row corresponds to a class (6 columns)::

    class      number_eQTLs  percentage_eQTLs  average_peakLR  average_rsq  average_rtsq
    cis        4712          14.93%            36.0            0.29         0.47
    trans      20726         65.69%            36.0            0.29         0.47
    no_result  6111          19.369%           20.1            0.16         0.39
    total      31549         100.0%            19.5            0.16         0.38

Chromosome summary v2 file, where each row corresponds to a chromosome (11 columns; only part of the file is shown). The last row gives the total across the genome::

    chr    markers  cM       bp          interval.positions  bins  genes  cis eQTL  trans eQTL  unknown eQTL  all eQTL
    1      27       324.4    301354135   177                 176   5185   782       3209        761           4752
    2      14       169.11   237068873   92                  91    3782   512       1897        510           2919
    3      19       221.29   232140174   123                 122   3608   469       2098        614           3181
    4      20       188.37   241473504   105                 104   3389   493       2006        491           2990
    5      20       203.82   217872852   110                 109   3964   657       3077        762           4496
    6      17       195.85   169174353   106                 105   2744   413       1933        516           2862
    Total  117      1302.84  1399083891  713                 707   22672  3326      14220       3654          21200

Gene positions v2 file, where each row corresponds to a gene (9 columns; only part of the file is shown)::

    gene   chr  start_bp   end_bp     num_eQTL  num_cis_eQTL  num_trans_eQTL  num_unknown_eQTL  gene_bin
    geneA  6    155513712  155518148  1         0             1               0                 682
    geneB  4    230729005  230729064  0         0             0               0                 472
    geneC  2    172852270  172853086  2         0             1               1                 229
    geneD  1    282744902  282749375  3         0             3               0                 154
    geneE  2    6556394    6560322    0         0             0               0                 189

eQTL per gene summary file (2 columns)::

    Average number of eQTLs per gene with eQTL              2.4
    Average number of cis eQTLs per gene with cis eQTL      1.0
    Average number of trans eQTLs per gene with trans eQTL  1.8
    Number of genes with only cis eQTL (no trans)           1402 (8.5%)
    Number of genes with only trans eQTL (no cis)           11042 (66.7%)
    Number of genes with cis and trans eQTL                 4121 (24.9%)
    Number of genes with cis or trans eQTL                  16565 (100.0%)

eQTL vs gene position plot (in PDF format, produced using R).

  </help>
</tool>
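
The classification rule described in the tool's help text can be illustrated with a short Python sketch. This is not the code in `classifier.py`, which is not shown here. The names `Eqtl`, `GenePosition`, `average_interval_size` and `classify` are hypothetical, and the sketch assumes that the eQTL peak and the gene position are expressed in the same unit on the same coordinate scale, that the interval size is the width of each eQTL's support interval, and that an eQTL on a different chromosome from its gene counts as trans.

```python
from statistics import mean
from typing import NamedTuple, Optional, Sequence


class Eqtl(NamedTuple):
    """Hypothetical record for one mapped eQTL (not the tool's own type)."""
    gene: str
    chrom: str
    start: float  # left edge of the eQTL interval
    end: float    # right edge of the eQTL interval
    peak: float   # peak position of the eQTL


class GenePosition(NamedTuple):
    """Hypothetical record for a gene location, in the same unit as Eqtl."""
    chrom: str
    position: float


def average_interval_size(eqtls: Sequence[Eqtl]) -> float:
    # Assumption: "interval size" means end minus start for each eQTL.
    return mean(e.end - e.start for e in eqtls)


def classify(eqtl: Eqtl, gene: Optional[GenePosition], avg_interval: float) -> str:
    # Unknown gene location: the eQTL cannot be classified.
    if gene is None:
        return "no_result"
    # cis: same chromosome and within half the average interval size.
    same_chrom = eqtl.chrom == gene.chrom
    if same_chrom and abs(eqtl.peak - gene.position) <= avg_interval / 2:
        return "cis"
    # Everything else maps to a different region than the gene.
    return "trans"


if __name__ == "__main__":
    # Made-up illustration data, not taken from the example files.
    eqtls = [
        Eqtl("geneA", "1", 100.0, 110.0, 105.0),
        Eqtl("geneB", "2", 50.0, 70.0, 60.0),
        Eqtl("geneC", "3", 0.0, 15.0, 5.0),
    ]
    genes = {
        "geneA": GenePosition("1", 108.0),
        "geneB": GenePosition("5", 60.0),
    }
    avg = average_interval_size(eqtls)
    for e in eqtls:
        print(e.gene, classify(e, genes.get(e.gene), avg))
```

With the made-up data above, the average interval size is 15, so geneA (3 units from its gene on the same chromosome) is reported as cis, geneB (gene on another chromosome) as trans, and geneC (no known gene position) as no_result. The real tool also reports `eQTL_bin` and `gene_bin` columns, so it may compare binned positions rather than raw coordinates.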