annotate multi_join_serial.xml @ 2:3a9cc859f4c1 draft

Uploaded
author mir-bioinf
date Wed, 15 Apr 2015 14:43:04 -0400
parents
children 0aa0ebcd307c

<tool id="Multi_Join_serial" name="Join multiple" version="0.0.1" force_history_refresh="True">
  <description>tab delimited files serially</description>
  <!-- cms commenting out to troubleshoot -->
  <command interpreter="perl">
    ## First pass: leave the last zero-based index of the Files repeat in j (passed below as --totalfiles)
    #for $j, $s in enumerate( $Files )
      #silent $j
    #end for

    #for $i, $s in enumerate( $Files )
      /opt/galaxy/galaxy-dist/tools/ngs_rna/Unreleased/run-multi_join_serial.pl --join_file $s.joinMe --join_col $s.joinCol --iteration $i --totalfiles $j --with_header $headerYes --resultsfile $Joined_all --log $log
      ##print "loop iteration $i.\n";
      ;
    #end for
  </command>
  <inputs>
    <repeat name="Files" title="Join file">
      <param name="joinMe" type="data" checked="yes" format="tabular" label="Join" />
      <param name="joinCol" label="using column" type="data_column" data_ref="joinMe" />
    </repeat>
    <param name="headerYes" type="select" label="Treat first line as header?" help="A header line that starts with # is NOT read, so set this field to No in that case. Otherwise, set it to Yes only if the first line is a header in ALL files.">
      <option value="yes" selected="true">Yes</option>
      <option value="no">No</option>
    </param>
  </inputs>
  <outputs>
    <data format="tabular" name="Joined_all" label="Multi-Join result"/>
    <data format="txt" name="log" label="debug_info"/>
  </outputs>
  <tests>
    <test>
      <param name="Files_0|joinMe" value="multi_join_serial_in1.tab" ftype="tabular"/>
      <param name="Files_0|joinCol" value="1"/>
      <param name="Files_1|joinMe" value="multi_join_serial_in2.tab" ftype="tabular"/>
      <param name="Files_1|joinCol" value="1"/>
      <param name="Files_2|joinMe" value="multi_join_serial_in3.tab" ftype="tabular"/>
      <param name="Files_2|joinCol" value="2"/>
      <param name="headerYes" value="yes"/>
      <output name="Joined_all" value="multi_join_serial_out.tab" ftype="tabular"/>
      <output name="log" value="multi_join_serial_debug.txt" ftype="txt"/>
    </test>
  </tests>
  <help>

This tool performs a left-outer join on multiple (at least two) files using a Perl script that Ron wrote (thanks, Ron!). The joined file has the same number of rows as the first file chosen, and matches from each subsequent file are appended where present. Rows in the first file that have no match in another file get empty cells for that file's columns. If none of the input files has a header, a simple column-number header is added to the output file, restarting at "C1" to mark where each file's columns begin.

To convert the left-outer join result to an inner join result (only rows common to all datasets), run the "Filter out rows and columns with non-numeric values" tool with the following options selected (the last 3 options, all drop-down select menus):
1. Replace/remove: Empty only
2. Remove entire column or row (leave default)
3. Remove non-numeric/empty cell-containing ROWS from dataset


.. class:: warningmark

This tool may fail because the system runs out of memory, depending on the number and size of the input files and the number of matching lines. The higher these are, the more likely the tool is to fail. A red output dataset saying "Job killed" typically means the system ran out of memory and the job was killed as a result. This issue has not yet been addressed.


**Steps:**
1. Click Add new Join file for each tab-delimited file you'd like to add, then select the file and the column you want to join on.
2. After adding all files to join, choose whether the first line should be treated as a header (this should be Yes only if all input datasets have headers).
3. Click Execute.
4. Please report any issues and/or suggestions to Christy.

-----

**Example**

Dataset1::

  chr1  10  20  geneA
  chr1  50  80  geneB
  chr5  10  40  geneL

Dataset2::

  geneA  tumor-suppressor
  geneB  Foxp2
  geneC  Gnas1
  geneE  INK4a

Joining the 4th column of Dataset1 with the 1st column of Dataset2, with no header in either file, yields::

  C1    C2  C3  C4     C1     C2
  chr1  10  20  geneA  geneA  tumor-suppressor
  chr1  50  80  geneB  geneB  Foxp2
  chr5  10  40  geneL


  </help>


</tool>
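
The left-outer join described in the help text can be sketched in a few lines of Python. This only illustrates the semantics and is not the Perl script the tool calls: the function name `left_outer_join_serial`, the in-memory list-of-rows representation, and the first-match-wins handling of repeated keys are all assumptions made for this sketch.

```python
def left_outer_join_serial(tables, join_cols):
    """Join tables left to right, keeping every row of the first table.

    tables    -- list of tables, each a list of rows (lists of strings)
    join_cols -- 1-based join column for each table, as in the tool form
    """
    result = [list(row) for row in tables[0]]
    key_index = join_cols[0] - 1
    for table, col in zip(tables[1:], join_cols[1:]):
        width = max((len(row) for row in table), default=0)
        # Assumption: when a key repeats in a later table, the first row wins.
        lookup = {}
        for row in table:
            lookup.setdefault(row[col - 1], row)
        for row in result:
            match = lookup.get(row[key_index])
            # Rows without a match are padded with empty cells.
            row.extend(match if match is not None else [""] * width)
    return result


# The two datasets from the help text, joined on column 4 and column 1.
dataset1 = [
    ["chr1", "10", "20", "geneA"],
    ["chr1", "50", "80", "geneB"],
    ["chr5", "10", "40", "geneL"],
]
dataset2 = [
    ["geneA", "tumor-suppressor"],
    ["geneB", "Foxp2"],
    ["geneC", "Gnas1"],
    ["geneE", "INK4a"],
]
joined = left_outer_join_serial([dataset1, dataset2], [4, 1])

# Dropping rows that contain an empty cell turns the result into an inner join.
inner = [row for row in joined if all(cell != "" for cell in row)]

for row in joined:
    print("\t".join(row))
```

Every row of Dataset1 is kept, and the geneL row ends in two empty cells because Dataset2 has no geneL entry. Filtering out rows with empty cells mirrors the inner-join conversion that the help text delegates to the filter tool.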
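
How the two Cheetah loops in the `<command>` block expand can be mimicked in Python. This is only a sketch of the template logic, assuming Cheetah keeps the loop variable `$j` after the first loop ends, as a Python `for` loop does. The function name `render_commands` and the example file names are hypothetical.

```python
SCRIPT = "/opt/galaxy/galaxy-dist/tools/ngs_rna/Unreleased/run-multi_join_serial.pl"


def render_commands(files, header_yes, results_file, log_file):
    """Return one command line per (join_file, join_col) pair in `files`."""
    # First loop in the template: only the final value of j survives it.
    j = None
    for j, _ in enumerate(files):
        pass

    commands = []
    # Second loop: one script call per file; i is the zero-based iteration.
    for i, (join_file, join_col) in enumerate(files):
        commands.append(
            f"{SCRIPT} --join_file {join_file} --join_col {join_col} "
            f"--iteration {i} --totalfiles {j} --with_header {header_yes} "
            f"--resultsfile {results_file} --log {log_file} ;"
        )
    return commands


# Hypothetical inputs: three files joined on columns 1, 1 and 2.
commands = render_commands(
    [("in1.tab", 1), ("in2.tab", 1), ("in3.tab", 2)],
    header_yes="yes",
    results_file="joined.tab",
    log_file="log.txt",
)
for line in commands:
    print(line)
```

Under that assumption, `--totalfiles` receives the last zero-based index (2 for three files) rather than the file count, and every call writes to the same results file and log.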