comparison regex_tabular.xml @ 0:60d04307b027 draft

planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/regex_find_replace commit 568a615b191482c54ecb31399ba27f78d6c71510
author galaxyp
date Wed, 18 Jan 2017 17:45:20 -0500 (2017-01-18)
parents
children 209b7c5ee9d7
-1:000000000000 0:60d04307b027
1 <tool id="regexColumn1" name="Column Regex Find And Replace" version="1.0.0">
2 <description></description>
3 <command interpreter="python">regex.py --input '$input' --output '$out_file1' --column $field --input_display_name '$input.display_name'
4 #for $check in $checks:
5 --pattern='$check.pattern' --replacement='$check.replacement'
6 #end for
7 </command>
8 <inputs>
9 <param format="tabular" name="input" type="data" label="Select cells from"/>
10 <param name="field" label="using column" type="data_column" data_ref="input" />
11 <repeat name="checks" title="Check">
12 <param name="pattern" size="40" type="text" value="chr([0-9A-Za-z])+" label="Find Regex" help="Enter literal text or a regular expression (see the help section below for syntax)">
13 <sanitizer>
14 <valid>
15 <add preset="string.printable"/>
16 <remove value="&#92;" />
17 <remove value="&apos;" />
18 </valid>
19 <mapping initial="none">
20 <add source="&#92;" target="__backslash__" />
21 <add source="&apos;" target="__sq__"/>
22 </mapping>
23 </sanitizer>
24 </param>
25 <param name="replacement" size="40" type="text" value="newchr\1" label="Replacement">
26 <sanitizer>
27 <valid>
28 <add preset="string.printable"/>
29 <remove value="&#92;" />
30 <remove value="&apos;" />
31 </valid>
32 <mapping initial="none">
33 <add source="&#92;" target="__backslash__" />
34 <add source="&apos;" target="__sq__"/>
35 </mapping>
36 </sanitizer>
37 </param>
38 </repeat>
39 </inputs>
40 <outputs>
41 <data format="input" name="out_file1" metadata_source="input" />
42 </outputs>
43 <tests>
44 <test>
45 <param name="input" value="find_tabular_1.txt" ftype="tabular" />
46 <param name="field" value="2" />
47 <param name="pattern" value="moo"/>
48 <param name="replacement" value="cow" />
49 <output name="out_file1" file="replace_tabular_1.txt"/>
50 </test>
51 <test>
52 <param name="input" value="find_tabular_1.txt" ftype="tabular" />
53 <param name="field" value="1" />
54 <param name="pattern" value="moo"/>
55 <param name="replacement" value="cow" />
56 <output name="out_file1" file="replace_tabular_2.txt"/>
57 </test>
58 </tests>
59 <help>
60
61 .. class:: warningmark
62
63 **This tool will attempt to reuse the metadata from your first input.** To change metadata assignments click on the "edit attributes" link of the history item generated by this tool.
64
65 .. class:: infomark
66
67 **TIP:** If your data is not TAB delimited, use *Text Manipulation-&gt;Convert*
68
69 -----
70
71 This tool goes through the specified input file line by line and,
72 if the text in the selected column matches the specified regular expression pattern,
73 replaces the matching text with the corresponding replacement.
74
75 This tool can be used to change between the chromosome naming conventions of UCSC and Ensembl.
76
77 For example, to remove the **chr** part of the reference sequence name in the first column of this GFF file::
78
79 ##gff-version 2
80 ##Date: Thu Mar 23 11:21:17 2006
81 ##bed2gff.pl $Rev: 601 $
82 ##Input file: ./database/files/61c6c604e0ef50b280e2fd9f1aa7da61.dat
83 chr1 bed2gff CCDS1000.1_cds_0_0_chr1_148325916_f 148325916 148325975 . + . score "0";
84 chr21 bed2gff CCDS13614.1_cds_0_0_chr21_32707033_f 32707033 32707192 . + . score "0";
85 chrX bed2gff CCDS14606.1_cds_0_0_chrX_122745048_f 122745048 122745924 . + . score "0";
86
87 Setting::
88
89 using column: c1
90 Find Regex: chr([0-9]+|X|Y|M[Tt]?)
91 Replacement: \1
92
93 produces::
94
95 ##gff-version 2
96 ##Date: Thu Mar 23 11:21:17 2006
97 ##bed2gff.pl $Rev: 601 $
98 ##Input file: ./database/files/61c6c604e0ef50b280e2fd9f1aa7da61.dat
99 1 bed2gff CCDS1000.1_cds_0_0_chr1_148325916_f 148325916 148325975 . + . score "0";
100 21 bed2gff CCDS13614.1_cds_0_0_chr21_32707033_f 32707033 32707192 . + . score "0";
101 X bed2gff CCDS14606.1_cds_0_0_chrX_122745048_f 122745048 122745924 . + . score "0";
102
103
104 This tool uses Python regular expressions with the **re.sub()** function.
105 More information about Python regular expressions can be found here:
106 http://docs.python.org/library/re.html.
107
108 The regex **chr([0-9]+|X|Y|M[Tt]?)** matches the text **chr** followed by one of: one or more digits, the letter X, the letter Y, or the letter M (optionally followed by a single letter T or t).
109 The parentheses **()** capture the part of the text they match, which can be reused in the replacement text with a backslash-number reference: **\\1**
110
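The column-wise replacement described above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual **regex.py**: the helper name **replace_in_column** is hypothetical, and the sketch assumes tab-delimited input and a 1-based column number. The sample rows are taken from the GFF example, shortened to three columns.

```python
import re


def replace_in_column(line, column, pattern, replacement):
    """Apply re.sub() to one tab-separated cell; column is 1-based (hypothetical helper)."""
    cells = line.rstrip("\n").split("\t")
    index = column - 1
    # Lines with too few columns (for example header comments) are left unchanged.
    if index in range(len(cells)):
        cells[index] = re.sub(pattern, replacement, cells[index])
    return "\t".join(cells)


pattern = r"chr([0-9]+|X|Y|M[Tt]?)"
replacement = r"\1"  # reuse whatever the parentheses captured

lines = [
    "##gff-version 2",
    "chr1\tbed2gff\tCCDS1000.1_cds_0_0_chr1_148325916_f",
    "chrX\tbed2gff\tCCDS14606.1_cds_0_0_chrX_122745048_f",
]
converted = [replace_in_column(line, 1, pattern, replacement) for line in lines]
for line in converted:
    print(line)
```

Only the first column changes; the **chr1** and **chrX** substrings in the third column are untouched because the substitution is restricted to the selected cell.
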
111
112 In the replacement pattern, use the special token #{input_name} to insert the input dataset's display name.
113 The name can be modified by a second find/replace check. Suppose you want to insert the sample id of your dataset,
114 named **Sample ABC123**, into the dataset itself, which currently contains the lines::

115 Data 1
116 Data 2
117 Data 3
118
119 You can use the following checks::

120 Find Regex: Data
121 Replacement: #{input_name} Data
122
123 Find Regex: Sample (\S+)
124 Replacement: \1
125
126 The result will be::

127 ABC123 Data 1
128 ABC123 Data 2
129 ABC123 Data 3
130
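The two chained checks above can be reproduced with the following sketch. It assumes that the token is expanded by plain string substitution in the replacement text before each check runs, and that checks are applied in order to the same cell; the function name **apply_checks** is hypothetical and is not taken from the tool's source.

```python
import re

INPUT_NAME_TOKEN = "#{input_name}"


def apply_checks(cell, checks, input_name):
    """Run each (pattern, replacement) pair in order on one cell (hypothetical helper)."""
    for pattern, replacement in checks:
        # Assumption: the token is expanded by simple text substitution.
        replacement = replacement.replace(INPUT_NAME_TOKEN, input_name)
        cell = re.sub(pattern, replacement, cell)
    return cell


checks = [
    (r"Data", "#{input_name} Data"),  # first check inserts the display name
    (r"Sample (\S+)", r"\1"),         # second check keeps only the sample id
]
result = [
    apply_checks(cell, checks, "Sample ABC123")
    for cell in ["Data 1", "Data 2", "Data 3"]
]
print(result)
```

After the first check each cell reads **Sample ABC123 Data N**; the second check then reduces the name to **ABC123**. With plain text substitution, a display name that itself contains a backslash would be interpreted by **re.sub()** as part of the replacement syntax.
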
131
132
133 Galaxy aggressively escapes input supplied to tools, so if something
134 is not working, please let us know and we can check whether escaping is
135 the cause. If you would like help constructing regular
136 expressions for your inputs, please contact us at help@msi.umn.edu.
137
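As an illustration of the escaping mentioned above, the sanitizer mapping declared for the pattern and replacement parameters turns a backslash into **__backslash__** and a single quote into **__sq__** before the values reach the command line. The sketch below shows one way a script could reverse that mapping; it is an assumption about how such tokens might be handled, and the function name **unescape** is hypothetical.

```python
# Tokens taken from the sanitizer mapping in the tool definition.
SANITIZER_MAPPING = {
    "__backslash__": "\\",
    "__sq__": "'",
}


def unescape(value):
    """Restore the characters that the sanitizer mapping replaced (hypothetical helper)."""
    for token, char in SANITIZER_MAPPING.items():
        value = value.replace(token, char)
    return value


print(unescape("newchr__backslash__1"))
```

If a pattern behaves unexpectedly, checking whether it contains characters outside this mapping is a reasonable first step.
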
138 </help>
139 </tool>