annotate multi_join_serial/multi_join_serial.xml @ 0:1b7d0d2a3543 draft

Uploaded
author mir-bioinf
date Wed, 15 Apr 2015 14:23:56 -0400
parents
children
rev   line source
0
1b7d0d2a3543 Uploaded
mir-bioinf
parents:
diff changeset
1 <tool id="Multi_Join_serial" name="Join multiple" version="0.0.1" force_history_refresh="True">
2 <description>tab delimited files serially</description>
3 <!-- cms commenting out to troubleshoot -->
4 <command interpreter="perl">
5 #for $j, $s in enumerate( $Files )
6 #silent $j
7 #end for
8
9 #for $i, $s in enumerate( $Files )
10 /opt/galaxy/galaxy-dist/tools/ngs_rna/Unreleased/run-multi_join_serial.pl --join_file $s.joinMe --join_col $s.joinCol --iteration $i --totalfiles $j --with_header $headerYes --resultsfile $Joined_all --log $log
11 ##print "loop iteration $i.\n";
12 ;
13 #end for
14 </command>
15 <inputs>
16 <repeat name="Files" title="Join file">
17 <param name="joinMe" type="data" checked="yes" format="tabular" label="Join" />
18 <param name="joinCol" label="using column" type="data_column" data_ref="joinMe" />
19 </repeat>
20 <param name="headerYes" type="select" label="Treat first line as header?" help="If header starts with #, it will NOT be read, so this field should be set to no. Otherwise it can be set to yes if first line is header for ALL FILES.">
21 <option value="yes" selected="true">Yes</option>
22 <option value="no">No</option>
23 </param>
24 </inputs>
25 <outputs>
26 <data format="tabular" name="Joined_all" label="Multi-Join result"/>
27 <data format="txt" name="log" label="debug_info"/>
28 </outputs>
29 <tests>
30 <test>
31 <param name="Files_0|joinMe" value="multi_join_serial_in1.tab" ftype="tabular"/>
32 <param name="Files_0|joinCol" value="1"/>
33 <param name="Files_1|joinMe" value="multi_join_serial_in2.tab" ftype="tabular"/>
34 <param name="Files_1|joinCol" value="1"/>
35 <param name="Files_2|joinMe" value="multi_join_serial_in3.tab" ftype="tabular"/>
36 <param name="Files_2|joinCol" value="2"/>
37 <param name="headerYes" value="yes"/>
38 <output name="Joined_all" value="multi_join_serial_out.tab" ftype="tabular"/>
39 <output name="log" value="multi_join_serial_debug.txt" ftype="txt"/>
40 </test>
41 </tests>
42 <help>
43
44 This tool performs a left-outer join on multiple (at least two) tab-delimited files using a Perl script that Ron wrote (thanks, Ron!). The joined file has the same number of rows as the first file chosen; matches from each subsequent file are appended where present, and rows of the first file without a match get empty cells. If none of the input files has a header, a simple column-number header is added to the output file, with "C1" marking the start of each file's set of columns.
45
46 To convert the left-outer join result to an inner join result (only rows common to all datasets), run the "Filter out rows and columns with non-numeric values" tool with the following settings for its last 3 options (all are drop-down select menus):
47 1. Replace/remove: Empty only
48 2. Remove entire column or row (leave default)
49 3. Remove non-numeric/empty cell-containing ROWS from dataset
50
51
52 .. class:: warningmark
53
54 This tool may fail if the system runs out of memory, depending on the number and size of the input files and the number of matching lines. The larger these are, the more likely the tool is to fail. A red output dataset saying "Job killed" typically means the system ran out of memory and killed the job. This issue has not yet been addressed.
55
56
57 **Steps:**
58 1. Click Add new Join file for each tab-delimited file you'd like to join, and select the column to join on.
59 2. After adding all files to join, select whether the headers should all be preserved (this should be Yes if all input datasets have headers).
60 3. Click Execute.
61 4. Please report any issues and/or suggestions to Christy.
62
63 -----
64
65 **Example**
66
67 Dataset1::
68
69 chr1 10 20 geneA
70 chr1 50 80 geneB
71 chr5 10 40 geneL
72
73 Dataset2::
74
75 geneA tumor-suppressor
76 geneB Foxp2
77 geneC Gnas1
78 geneE INK4a
79
80 Joining the 4th column of Dataset1 with the 1st column of Dataset2, no header, will yield::
81
82 C1 C2 C3 C4 C1 C2
83 chr1 10 20 geneA geneA tumor-suppressor
84 chr1 50 80 geneB geneB Foxp2
85 chr5 10 40 geneL
86
87
88 </help>
89
90
91 </tool>
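
The help text in the listing above describes the join only in words. As a rough illustration, the Python sketch below reproduces the documented behavior on the two example datasets, plus the left-outer to inner conversion. It is not the Perl script the tool runs: the function names `multi_join` and `inner_only` are hypothetical, column numbers are 1-based as in the tool form, and keeping only the first match per key is an assumption made so that the output has as many rows as the first file.

```python
def multi_join(files):
    """Serially left-outer join a list of (rows, join_col) pairs.

    rows is a list of lists of strings; join_col is 1-based.
    Assumption: only the first matching row per key is kept.
    """
    (base_rows, base_col), *others = files
    keys = [row[base_col - 1] for row in base_rows]
    result = [list(row) for row in base_rows]
    for rows, col in others:
        first_match = {}
        for row in rows:
            # setdefault keeps the first row seen for each key
            first_match.setdefault(row[col - 1], row)
        width = max((len(r) for r in rows), default=0)
        for key, out in zip(keys, result):
            # unmatched rows are padded with empty cells
            out.extend(first_match.get(key, [""] * width))
    return result


def inner_only(rows):
    """Drop rows containing an empty cell (left-outer to inner join)."""
    return [row for row in rows if all(cell != "" for cell in row)]


dataset1 = [
    ["chr1", "10", "20", "geneA"],
    ["chr1", "50", "80", "geneB"],
    ["chr5", "10", "40", "geneL"],
]
dataset2 = [
    ["geneA", "tumor-suppressor"],
    ["geneB", "Foxp2"],
    ["geneC", "Gnas1"],
    ["geneE", "INK4a"],
]

joined = multi_join([(dataset1, 4), (dataset2, 1)])
for row in joined:
    print("\t".join(row))
```

Applying `inner_only(joined)` to this result would keep only the geneA and geneB rows, which corresponds to the filtering step the help text describes.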
92
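
The `<command>` block in the listing uses two Cheetah loops: the first only advances `$j` to the last zero-based index of the `Files` repeat, and the second emits one script invocation per file, each followed by `;`. The Python sketch below mimics that expansion to show what the resulting command line might look like. The function name `build_commands` and the shortened script name are hypothetical, and the reading that `--totalfiles` receives the last index rather than the file count is inferred from the template, not confirmed against the Perl script.

```python
# Shortened, hypothetical stand-in for the full script path in <command>.
SCRIPT = "run-multi_join_serial.pl"


def build_commands(files, with_header, results_file, log_file):
    """Return one command string per (path, join_col) pair.

    Mirrors the template: --iteration is the zero-based loop index and
    --totalfiles is the last index (len(files) - 1), as left in $j by
    the first Cheetah loop.
    """
    last_index = len(files) - 1
    commands = []
    for i, (path, col) in enumerate(files):
        commands.append(
            f"{SCRIPT} --join_file {path} --join_col {col} "
            f"--iteration {i} --totalfiles {last_index} "
            f"--with_header {with_header} "
            f"--resultsfile {results_file} --log {log_file}"
        )
    return commands


cmds = build_commands(
    [("in1.tab", 1), ("in2.tab", 1), ("in3.tab", 2)],
    with_header="yes",
    results_file="joined.tab",
    log_file="debug.txt",
)
# The template terminates every invocation with ';'
print(" ; ".join(cmds) + " ;")
```

Because every invocation writes to the same results file, the script presumably uses `--iteration` and `--totalfiles` to tell the first and last passes apart; that detail lives in the Perl script and is not visible in this listing.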