comparison multi_join_serial/multi_join_serial.xml @ 0:1b7d0d2a3543 draft

Uploaded
author mir-bioinf
date Wed, 15 Apr 2015 14:23:56 -0400
parents
children
-1:000000000000 0:1b7d0d2a3543
1 <tool id="Multi_Join_serial" name="Join multiple" version="0.0.1" force_history_refresh="True">
2 <description>tab delimited files serially</description>
3 <!-- cms commenting out to troubleshoot -->
4 <command interpreter="perl">
5 #for $j, $s in enumerate( $Files )
6 #silent $j
7 #end for
8
9 #for $i, $s in enumerate( $Files )
10 /opt/galaxy/galaxy-dist/tools/ngs_rna/Unreleased/run-multi_join_serial.pl --join_file $s.joinMe --join_col $s.joinCol --iteration $i --totalfiles $j --with_header $headerYes --resultsfile $Joined_all --log $log
11 ##print "loop iteration $i.\n";
12 ;
13 #end for
14 </command>
15 <inputs>
16 <repeat name="Files" title="Join file">
17 <param name="joinMe" type="data" checked="yes" format="tabular" label="Join" />
18 <param name="joinCol" label="using column" type="data_column" data_ref="joinMe" />
19 </repeat>
20 <param name="headerYes" type="select" label="Treat first line as header?" help="If the header starts with #, it will NOT be read, so set this field to No. Otherwise, set it to Yes only if the first line is a header in ALL files.">
21 <option value="yes" selected="true">Yes</option>
22 <option value="no">No</option>
23 </param>
24 </inputs>
25 <outputs>
26 <data format="tabular" name="Joined_all" label="Multi-Join result"/>
27 <data format="txt" name="log" label="debug_info"/>
28 </outputs>
29 <tests>
30 <test>
31 <param name="Files_0|joinMe" value="multi_join_serial_in1.tab" ftype="tabular"/>
32 <param name="Files_0|joinCol" value="1"/>
33 <param name="Files_1|joinMe" value="multi_join_serial_in2.tab" ftype="tabular"/>
34 <param name="Files_1|joinCol" value="1"/>
35 <param name="Files_2|joinMe" value="multi_join_serial_in3.tab" ftype="tabular"/>
36 <param name="Files_2|joinCol" value="2"/>
37 <param name="headerYes" value="yes"/>
38 <output name="Joined_all" value="multi_join_serial_out.tab" ftype="tabular"/>
39 <output name="log" value="multi_join_serial_debug.txt" ftype="txt"/>
40 </test>
41 </tests>
42 <help>
43
44 This tool performs a left-outer join on multiple (at least two) files using a Perl script that Ron wrote (thanks, Ron!). The joined file has the same number of rows as the first file chosen; matches from each subsequent file are appended where present, and rows in the first file without a match in a given file get empty cells for that file's columns. If none of the input files has a header, a simple column-number header is added to the output file, with "C1" marking the start of each file's set of columns.
45
46 To convert the left-outer join result to an inner join result (only rows common to all datasets), run the "Filter out rows and columns with non-numeric values" tool with the following options selected (the last 3 options, all drop-down select menus):
47 1. Replace/remove: Empty only
48 2. Remove entire column or row (leave default)
49 3. Remove non-numeric/empty cell-containing ROWS from dataset
50
51
52 .. class:: warningmark
53
54 This tool may fail because the system runs out of memory, depending on the number and size of the input files and the number of matching lines: the higher these are, the more likely the tool is to fail. A red output dataset saying "Job killed" typically means the system hit an out-of-memory error and killed the job. This issue has not yet been addressed.
55
56
57 **Steps:**
58 1. Click Add new Join file for each tab-delimited file you'd like to add, then select the column you want to join on.
59 2. After adding all files to join, select whether the headers should all be preserved (this should be Yes if all input datasets have headers).
60 3. Click Execute.
61 4. Please report any issues and/or suggestions to Christy.
62
63 -----
64
65 **Example**
66
67 Dataset1::
68
69 chr1 10 20 geneA
70 chr1 50 80 geneB
71 chr5 10 40 geneL
72
73 Dataset2::
74
75 geneA tumor-suppressor
76 geneB Foxp2
77 geneC Gnas1
78 geneE INK4a
79
80 Joining the 4th column of Dataset1 with the 1st column of Dataset2, no header, will yield::
81
82 C1 C2 C3 C4 C1 C2
83 chr1 10 20 geneA geneA tumor-suppressor
84 chr1 50 80 geneB geneB Foxp2
85 chr5 10 40 geneL
86
87
88 </help>
89
90
91 </tool>
92
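
Note: the left-outer join described in the help text above can be sketched in Python roughly as follows. This is a minimal illustration only, not the logic of run-multi_join_serial.pl, whose source is not part of this changeset. The function names `left_outer_join` and `join_serially` are hypothetical, and keeping only the first match for a repeated key is an assumption. The data are the Dataset1/Dataset2 rows from the help example, and the last line mimics the inner-join conversion by dropping rows that contain empty cells.

```python
def left_outer_join(left_rows, left_col, right_rows, right_col):
    """Append the matching right row to every left row (columns are 1-based)."""
    width = max((len(row) for row in right_rows), default=0)
    index = {}
    for row in right_rows:
        # Assumption: if a key repeats in the right file, keep the first match.
        index.setdefault(row[right_col - 1], row)
    blank = [""] * width  # empty cells for left rows without a match
    return [row + index.get(row[left_col - 1], blank) for row in left_rows]


def join_serially(tables):
    """tables: list of (rows, join_col) pairs; the first pair is the base file."""
    (base_rows, base_col), others = tables[0], tables[1:]
    result = [list(row) for row in base_rows]
    for rows, col in others:
        # New columns are appended on the right, so base_col stays valid.
        result = left_outer_join(result, base_col, [list(r) for r in rows], col)
    return result


dataset1 = [
    ["chr1", "10", "20", "geneA"],
    ["chr1", "50", "80", "geneB"],
    ["chr5", "10", "40", "geneL"],
]
dataset2 = [
    ["geneA", "tumor-suppressor"],
    ["geneB", "Foxp2"],
    ["geneC", "Gnas1"],
    ["geneE", "INK4a"],
]

# Join column 4 of Dataset1 with column 1 of Dataset2.
joined = join_serially([(dataset1, 4), (dataset2, 1)])
for row in joined:
    print("\t".join(row))

# Inner-join equivalent: keep only rows with no empty cells.
inner = [row for row in joined if all(cell != "" for cell in row)]
```

The output keeps all three rows of Dataset1; the geneL row ends in two empty cells, and `inner` retains only the geneA and geneB rows.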