hadoop_galaxy: cat_paths.xml comparison

comparison cat_paths.xml @ 1:30bd2584b6a0 draft default tip

Uploaded

author	crs4
date	Wed, 15 Oct 2014 09:39:16 -0400
parents	7698311d4466
children

comparison

equal deleted inserted replaced

-:7698311d4466
+:30bd2584b6a0
-<tool id="hadoop_galaxy_cat_paths" name="Cat paths" version="0.1.0">
-<description>Concatenate all components of a pathset into a single file.</description>
-<requirements>
-<requirement type="package" version="0.11">pydoop</requirement>
-<requirement type="package" version="0.1.1">hadoop-galaxy</requirement>
-</requirements>
-<command>
-#if $use_hadoop
-dist_cat_paths
-#else
-cat_paths
-#end if
-#if $delete_source
---delete-source
-#end if
-$input_pathset $output_path
-</command>
-<inputs>
-<param name="input_pathset" type="data" format="pathset" label="Input pathset">
-<validator type="empty_field" />
-</param>
-<param name="delete_source" type="boolean" checked="false" label="Delete remote input data"
-help="This option makes the tool move the data rather than copy it" />
-<param name="use_hadoop" type="boolean" checked="false" label="Use Hadoop-based program"
-help="The Galaxy workspace must be accessible by the Hadoop cluster (see help for details)" />
-</inputs>
-<outputs>
-<!-- TODO: can we read the format from input pathset and transfer it to output? -->
-<data name="output_path" format="data" label="Concatenated dataset $input_pathset.name" />
-</outputs>
-<stdio>
-<exit_code range="1:" level="fatal" />
-</stdio>
-<help>
-Datasets represented as pathsets can be split in a number of files.
-This tool takes all of them and concatenates them into a single output file.
-In your workflow, you'll need to explicitly set the appropriate data format on the
-output dataset with an Action to "Change Datatype".
-"Delete remote input data" option
-====================================
-With this option, after the data has been concated into the new Galaxy dataset,
-the original files that were referenced by the pathset are deleted.  This effectively
-tells the action to "move" the data instead of a "copying" it and helps
-avoid amassing intermediate data in your Hadoop workspace.
-"Use Hadoop-based program" option
-====================================
-With this option you will use your entire Hadoop cluster to simultaneously write
-multiple parts of the final file.  For this to be possible, the Hadoop nodes
-must be able to access the Galaxy file space directly.  In addition, to achieve
-reasonable results the Galaxy workspace should on a parallel shared file system.
-</help>
-</tool>

Mercurial > repos > crs4 > hadoop_galaxy

comparison cat_paths.xml @ 1:30bd2584b6a0 draft default tip