comparison bed_to_protein_map.xml @ 0:024ed7b0ad93 draft
planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/bed_to_protein_map commit 2d39f681f77eedc840c17aebe4ddc8f66c8a7c62-dirty
author | galaxyp |
---|---|
date | Thu, 04 Jan 2018 16:29:38 -0500 |
parents | |
children | a7c58b43cbaa |
-1:000000000000 | 0:024ed7b0ad93 |
---|---|
1 <tool id="bed_to_protein_map" name="bed to protein map" version="0.1.0"> | |
2 <description>genomic location of proteins for MVP</description> | |
3 <requirements> | |
4 </requirements> | |
5 <stdio> | |
6 <exit_code range="1:" /> | |
7 </stdio> | |
8 <command><![CDATA[ | |
9 python '$__tool_directory__/bed_to_protein_map.py' '$input' '$output' | |
10 ]]></command> | |
11 <inputs> | |
12 <param name="input" type="data" format="bed" label="A BED file with 12 columns; thickStart and thickEnd define the protein coding region"/> | |
13 </inputs> | |
14 <outputs> | |
15 <data name="output" format="tabular"> | |
16 <actions> | |
17 <action name="column_names" type="metadata" default="name,chrom,start,end,strand,cds_start,cds_end"/> | |
18 </actions> | |
19 </data> | |
20 </outputs> | |
21 <tests> | |
22 <test> | |
23 <param name="input" ftype="bed" value="input.bed"/> | |
24 <output name="output" file="output.tabular"/> | |
25 </test> | |
26 </tests> | |
27 <help><![CDATA[ | |
28 Convert a BED format file of the proteins from a proteomics search database into a tabular format for the Multiomics Visualization Platform (MVP). | |
29 | |
30 Example input BED dataset:: | |
31 | |
32 X 276352 291629 ENST00000430923 20 + 284187 291629 80,80,80 5 42,148,137,129,131 0,7814,12380,14295,15146 | |
33 X 304749 318819 ENST00000326153 20 - 305073 318787 80,80,80 10 448,153,149,209,159,68,131,71,138,381 0,2610,2982,6669,8016,9400,10140,10479,12164,13689 | |
34 | |
35 | |
36 Output:: | |
37 | |
38 name chrom start end strand cds_start cds_end | |
39 ENST00000430923 X 284187 284314 + 0 127 | |
40 ENST00000430923 X 288732 288869 + 127 264 | |
41 ENST00000430923 X 290647 290776 + 264 393 | |
42 ENST00000430923 X 291498 291629 + 393 524 | |
43 ENST00000326153 X 318438 318787 - 0 349 | |
44 ENST00000326153 X 316913 317051 - 349 487 | |
45 ENST00000326153 X 315228 315299 - 487 558 | |
46 ENST00000326153 X 314889 315020 - 558 689 | |
47 ENST00000326153 X 314149 314217 - 689 757 | |
48 ENST00000326153 X 312765 312924 - 757 916 | |
49 ENST00000326153 X 311418 311627 - 916 1125 | |
50 ENST00000326153 X 307731 307880 - 1125 1274 | |
51 ENST00000326153 X 307359 307512 - 1274 1427 | |
52 ENST00000326153 X 305073 305197 - 1427 1551 | |
53 | |
54 | |
55 The tabular output can be converted to a sqlite database using the Query_Tabular_ tool. | |
56 | |
57 The sqlite table should be named: feature_cds_map | |
58 The names for the columns should be: name,chrom,start,end,strand,cds_start,cds_end | |
59 | |
60 This SQL query returns the genomic location of a peptide sequence in a protein, where acc_name is the protein name and cds_offset is the CDS position (multiply the amino acid position by 3 to get the CDS position):: | |
61 | |
62 SELECT distinct chrom, CASE WHEN strand = '+' THEN start + cds_offset - cds_start ELSE end - (cds_offset - cds_start) END as "pos" | |
63 FROM feature_cds_map | |
64 WHERE name = acc_name AND cds_offset >= cds_start AND cds_offset < cds_end | |
65 | |
66 | |
67 .. _Query_Tabular: https://toolshed.g2.bx.psu.edu/view/iuc/query_tabular/1ea4e668bf73 | |
68 | |
69 ]]></help> | |
70 </tool> |
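
The help text above describes two steps: splitting each BED12 record into per-exon coding segments with cumulative CDS offsets, and looking up a genomic position from a CDS offset. The following Python sketch illustrates both under stated assumptions. It is not the `bed_to_protein_map.py` script shipped with the tool; the function names `bed12_to_protein_map` and `genomic_position` are hypothetical, input fields are assumed to be whitespace-separated, and the minus-strand branch counts back from the exon `end` as `end - (cds_offset - cds_start)`, which is this sketch's reading of the query.

```python
import sqlite3


def bed12_to_protein_map(bed_line):
    """Return (name, chrom, start, end, strand, cds_start, cds_end) rows
    for one BED12 line. Illustrative only, not the tool's own script."""
    f = bed_line.split()
    chrom, chrom_start, name, strand = f[0], int(f[1]), f[3], f[5]
    thick_start, thick_end = int(f[6]), int(f[7])
    # blockSizes and blockStarts may carry a trailing comma, so skip empties.
    sizes = [int(x) for x in f[10].split(",") if x]
    rel_starts = [int(x) for x in f[11].split(",") if x]

    # Clip every exon to the coding region [thickStart, thickEnd).
    coding = []
    for size, rel in zip(sizes, rel_starts):
        start = max(chrom_start + rel, thick_start)
        end = min(chrom_start + rel + size, thick_end)
        if start < end:  # drop exons that lie entirely in a UTR
            coding.append((start, end))

    # On the minus strand the CDS begins at the highest coordinate.
    if strand == "-":
        coding.reverse()

    rows, cds_pos = [], 0
    for start, end in coding:
        length = end - start
        rows.append((name, chrom, start, end, strand, cds_pos, cds_pos + length))
        cds_pos += length
    return rows


def genomic_position(rows, acc_name, aa_pos):
    """Load rows into an in-memory feature_cds_map table and run the lookup.
    aa_pos is multiplied by 3 to obtain the CDS offset, as the help text says."""
    conn = sqlite3.connect(":memory:")
    # "end" is quoted because END is also an SQL keyword.
    conn.execute(
        'CREATE TABLE feature_cds_map (name TEXT, chrom TEXT, start INTEGER, '
        '"end" INTEGER, strand TEXT, cds_start INTEGER, cds_end INTEGER)'
    )
    conn.executemany("INSERT INTO feature_cds_map VALUES (?,?,?,?,?,?,?)", rows)
    query = """
        SELECT DISTINCT chrom,
               CASE WHEN strand = '+' THEN start + :off - cds_start
                    ELSE "end" - (:off - cds_start) END AS pos
        FROM feature_cds_map
        WHERE name = :name AND :off >= cds_start AND :off < cds_end
    """
    result = conn.execute(query, {"name": acc_name, "off": aa_pos * 3}).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    line = ("X 276352 291629 ENST00000430923 20 + 284187 291629 80,80,80 5 "
            "42,148,137,129,131 0,7814,12380,14295,15146")
    rows = bed12_to_protein_map(line)
    for row in rows:
        print("\t".join(str(v) for v in row))
    print(genomic_position(rows, "ENST00000430923", 50))
```

Applied to the first example BED line, `bed12_to_protein_map` yields the four ENST00000430923 rows shown in the Output table of the help text.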