comparison text_exporter.xml @ 2:cf0d72c7b482 draft

Update.
author galaxyp
date Fri, 10 May 2013 17:31:05 -0400
1:5c65f8116244 2:cf0d72c7b482
1 <tool id="openms_text_exporter" version="0.1.0" name="Text Exporter">
2 <description>
3 </description>
4 <macros>
5 <import>macros.xml</import>
6 </macros>
7 <expand macro="stdio" />
8 <expand macro="requires" />
9 <command interpreter="python">
10 openms_wrapper.py \
11 --executable '__SHELL__' --config $link \
12 --executable 'TextExporter' --config $config
13 </command>
14 <configfiles>
15 <configfile name="link">ln -s '${type.input}' 'input.${type.input.ext}'</configfile>
16 <configfile name="config">[simple_options]
17 in=input.${type.input.ext}
18 out=${out}
19 #set $input_type = str($type.input_type)
20 #if $input_type == "featurexml"
21 feature!minimal=${type.minimal}
22 #end if
23 no_ids=${no_ids}
24 </configfile>
25 </configfiles>
26 <inputs>
27 <conditional name="type">
28 <param name="input_type" type="select" label="Input Type">
29 <option value="featurexml">Features (FeatureXML)</option>
30 <option value="consensusxml">Consensus (ConsensusXML)</option>
31 <option value="idxml">Identifications (IdXML)</option>
32 <option value="mzml">Peak List (mzML)</option>
33 </param>
34 <when value="mzml">
35 <param format="mzml" name="input" type="data" label="Input Peak List"/>
36 </when>
37 <when value="featurexml">
38 <param format="featurexml" name="input" type="data" label="Input Features"/>
39 <param name="minimal" type="boolean" label="Minimal Output" help="Set this flag to write only three attributes: RT, m/z, and intensity." truevalue="true" falsevalue="false" />
40 </when>
41 <when value="consensusxml">
42 <param format="consensusxml" name="input" type="data" label="Input Consensus"/>
43 </when>
44 <when value="idxml">
45 <param format="idxml" name="input" type="data" label="Input Identifications"/>
46 </when>
47 </conditional>
48 <param name="no_ids" type="boolean" label="Suppress IDs" help="Suppresses output of identification data." truevalue="true" falsevalue="false" />
49 </inputs>
50 <outputs>
51 <data format="txt" name="out" />
52 </outputs>
53 <help>
54 **What it does**
55
56 The goal of this tool is to create output in a table format that is easily readable in Excel or OpenOffice. Lines in the output correspond to rows in the table.
57
58 Output files begin with comment lines, starting with the special character "#". The last such line(s) will be a header with column names, but this may be preceded by more general comments.
59
60 Because the OpenMS XML formats contain different kinds of data in a hierarchical structure, TextExporter produces somewhat unusual TSV/CSV files for many inputs: Different lines in the output may belong to different types of data, and the number of columns and the meanings of the individual fields depend on the type. In such cases, the first column always contains an indicator (in capital letters) for the data type of the current line. In addition, some lines have to be understood relative to a previous line, if there is a hierarchical relationship in the data. (See below for details and examples.)
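
As a rough illustration of reading such a mixed-type file, the sketch below skips the "#" comment lines and groups the remaining rows by the indicator in their first column. The function name `group_rows_by_type`, the tab separator, and the sample lines are assumptions made for this example; they are not taken from real TextExporter output.

```python
import csv
import io
from collections import defaultdict


def group_rows_by_type(lines, separator="\t"):
    """Group data rows by the indicator found in their first column."""
    groups = defaultdict(list)
    # Comment lines (including the column-name header) start with "#".
    data_lines = (line for line in lines if not line.startswith("#"))
    for row in csv.reader(data_lines, delimiter=separator):
        if not row:
            continue  # ignore blank lines
        indicator, fields = row[0], row[1:]
        groups[indicator].append(fields)
    return dict(groups)


# Hypothetical sample, shortened for illustration (assumed tab-separated).
sample = io.StringIO(
    "#FEATURE\trt\tmz\tintensity\n"
    "FEATURE\t100.5\t500.25\t12000\n"
    "PEPTIDE\t100.4\t500.26\t0.9\n"
    "FEATURE\t200.0\t650.75\t8000\n"
)
groups = group_rows_by_type(sample)
```

Grouping by indicator discards the order between rows of different types, so it only suits row types that can be read on their own; rows that depend on a preceding line need an order-preserving pass instead.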
61
62 Missing values are represented by "-1" or "nan" in numeric fields and by blanks in character/text fields.
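
A downstream script might normalize these placeholders before analysis. The helpers below are a minimal sketch with hypothetical names (`parse_numeric`, `parse_text`); they treat every "-1" in a numeric field as missing, which is an assumption that may be wrong for columns where -1 is a legitimate value.

```python
import math


def parse_numeric(text):
    """Convert a numeric field to float, mapping "-1" and "nan" to None."""
    value = float(text)
    # Assumption: -1 always means "missing" in the column being parsed.
    if math.isnan(value) or value == -1:
        return None
    return value


def parse_text(text):
    """Return a text field unchanged, mapping blank fields to None."""
    return text if text.strip() else None
```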
63
64 Depending on the input and the parameters, the output contains the following columns:
65
66 featureXML input:
67
68 first column: RUN / PROTEIN / UNASSIGNEDPEPTIDE / FEATURE / PEPTIDE (indicator for the type of data in the current row)
69 a RUN line contains information about a protein identification run; further columns: run_id, score_type, score_direction, date_time, search_engine_version, parameters
70 a PROTEIN line contains data of a protein identified in the previously listed run; further columns: score, rank, accession, coverage, sequence
71 an UNASSIGNEDPEPTIDE line contains data of a peptide hit that was not assigned to any feature; further columns: rt, mz, score, rank, sequence, charge, aa_before, aa_after, score_type, search_identifier, accessions
72 a FEATURE line contains data of a single feature; further columns: rt, mz, intensity, charge, width, quality, rt_quality, mz_quality, rt_start, rt_end
73 a PEPTIDE line contains data of a peptide hit annotated to the previous feature; further columns: same as for UNASSIGNEDPEPTIDE
74 With the no_ids flag, only FEATURE lines (without the FEATURE indicator) are written.
75
76 With the feature:minimal flag, only the rt, mz, and intensity columns of FEATURE lines are written.
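
Because a PEPTIDE line is interpreted relative to the FEATURE line before it, a reader has to walk the rows in order. The following sketch nests peptide hits under their feature, using the column names listed above. The function name `attach_peptides` and the sample rows are hypothetical, and the code assumes the rows were already split into fields with comment lines removed.

```python
FEATURE_COLUMNS = ["rt", "mz", "intensity", "charge", "width", "quality",
                   "rt_quality", "mz_quality", "rt_start", "rt_end"]
PEPTIDE_COLUMNS = ["rt", "mz", "score", "rank", "sequence", "charge",
                   "aa_before", "aa_after", "score_type",
                   "search_identifier", "accessions"]


def attach_peptides(rows):
    """Nest each PEPTIDE row under the FEATURE row that precedes it."""
    features = []
    for row in rows:
        indicator, fields = row[0], row[1:]
        if indicator == "FEATURE":
            feature = dict(zip(FEATURE_COLUMNS, fields))
            feature["peptides"] = []
            features.append(feature)
        elif indicator == "PEPTIDE":
            if not features:
                raise ValueError("PEPTIDE row without a preceding FEATURE row")
            features[-1]["peptides"].append(dict(zip(PEPTIDE_COLUMNS, fields)))
        # Other row types (RUN, PROTEIN, UNASSIGNEDPEPTIDE) are skipped here.
    return features


# Hypothetical rows for illustration only.
rows = [
    ["FEATURE", "100.5", "500.25", "12000", "2", "10.0",
     "0.8", "0.9", "0.7", "95.0", "105.0"],
    ["PEPTIDE", "100.4", "500.26", "0.9", "1", "PEPTIDER", "2",
     "K", "A", "score", "run1", "P12345"],
    ["FEATURE", "200.0", "650.75", "8000", "3", "12.0",
     "0.6", "0.7", "0.5", "194.0", "206.0"],
]
features = attach_peptides(rows)
```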
77
78 consensusXML input:
79
80 Output format produced for the out parameter:
81
82 first column: MAP / RUN / PROTEIN / UNASSIGNEDPEPTIDE / CONSENSUS / PEPTIDE (indicator for the type of data in the current row)
83 a MAP line contains information about a sub-map; further columns: id, filename, label, size (potentially followed by further columns containing meta data, depending on the input)
84 a CONSENSUS line contains data of a single consensus feature; further columns: rt_cf, mz_cf, intensity_cf, charge_cf, width_cf, quality_cf, rt_X0, mz_X0, ..., rt_X1, mz_X1, ...
85 "..._cf" columns refer to the consensus feature itself, "..._Xi" columns refer to a sub-feature from the map with ID "Xi" (no quality column in this case); missing sub-features are indicated by "nan" values
86 see above for the formats of RUN, PROTEIN, UNASSIGNEDPEPTIDE, PEPTIDE lines
87 With the no_ids flag, only MAP and CONSENSUS lines are written.
88
89 Output format produced for the consensus_centroids parameter:
90
91 one line per consensus centroid
92 columns: rt, mz, intensity, charge, width, quality
93 Output format produced for the consensus_elements parameter:
94
95 one line per sub-feature (element) of a consensus feature
96 first column: H / L (indicator for new/repeated element)
97 H indicates a new element, L indicates the replication of the first element of the current consensus feature (for plotting)
98 further columns: rt, mz, intensity, charge, width, rt_cf, mz_cf, intensity_cf, charge_cf, width_cf, quality_cf
99 "..._cf" columns refer to the consensus feature, the other columns refer to the sub-feature
100 Output format produced for the consensus_features parameter:
101
102 one line per consensus feature (suitable for processing with e.g. R)
103 columns: same as for a CONSENSUS line above, followed by additional columns for identification data
104 additional columns: peptide_N0, n_diff_peptides_N0, protein_N0, n_diff_proteins_N0, peptide_N1, ...
105 "..._Ni" columns refer to the identification run with index "Ni", n_diff_... stands for "number of different ..."; different peptides/proteins in one column are separated by "/"
106 With the no_ids flag, the additional columns are not included.
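
Since several peptides or proteins can share one column, separated by "/", a consumer may want to split such a field into individual entries. This is a small sketch with a hypothetical helper name (`split_ids`); it assumes that "/" never occurs inside a single peptide or protein entry.

```python
def split_ids(field):
    """Split a "/"-separated identification field into a list of entries."""
    # Empty fragments (for example from a blank field) are dropped.
    return [entry for entry in field.split("/") if entry]
```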
107
108 idXML input:
109
110 first column: RUN / PROTEIN / PEPTIDE (indicator for the type of data in the current row)
111 see above for the formats of RUN, PROTEIN, PEPTIDE lines
112 additional column for PEPTIDE lines: predicted_rt
113 With the id:proteins_only flag, only RUN and PROTEIN lines are written.
114
115 With the id:peptides_only flag, only PEPTIDE lines (without the PEPTIDE indicator) are written.
116
117 With the id:first_dim_rt flag, the additional columns rt_first_dim and predicted_rt_first_dim are included for PEPTIDE lines.
118
119 **Citation**
120
121 For the underlying tool, please cite ``Marc Sturm, Andreas Bertsch, Clemens Gröpl, Andreas Hildebrandt, Rene Hussong, Eva Lange, Nico Pfeifer, Ole Schulz-Trieglaff, Alexandra Zerck, Knut Reinert, and Oliver Kohlbacher, 2008. OpenMS – an Open-Source Software Framework for Mass Spectrometry. BMC Bioinformatics 9: 163. doi:10.1186/1471-2105-9-163.``
122
123 If you use this tool in Galaxy, please cite Chilton J, et al. https://bitbucket.org/galaxyp/galaxyp-toolshed-openms
124 </help>
125 </tool>