annotate pmids_to_pubtator_matrix.xml @ 2:ada4c7b3fc39 draft default tip

"planemo upload for repository https://github.com/dlal-group/simtext commit 63f67ee02be2eb4323a5ba5dcdd33d1fd0b7c24e"
author dlalgroup
date Thu, 08 Oct 2020 05:39:58 +0000
parents 3f4adc85ba5d
children
rev   line source
0
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
1 <tool id="pmids_to_pubtator_matrix" name="pmids_to_pubtator_matrix" version="@VERSION@">
2 <description>Per row, extract scientific terms from PMIDs using PubTator and generate a binary matrix</description>
3 <macros>
4 <import>macros.xml</import>
5 </macros>
6 <expand macro="requirements">
7 <requirement type="package" version="2.0.1">r-argparse</requirement>
8 <requirement type="package" version="1.4.0">r-stringr</requirement>
9 <requirement type="package" version="1.95_4.12">r-rcurl</requirement>
10 <requirement type="package" version="1.4.3">r-stringi</requirement>
11 </expand>
12 <expand macro="stdio"/>
13 <command><![CDATA[Rscript
14 '${__tool_directory__}/pmids_to_pubtator_matrix.R'
15 --input '$input'
16 --output '$output'
17 --number '$number'
18 $byid
2
ada4c7b3fc39 "planemo upload for repository https://github.com/dlal-group/simtext commit 63f67ee02be2eb4323a5ba5dcdd33d1fd0b7c24e"
dlalgroup
parents: 0
diff changeset
19 --categories
0
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
20 #for $category in $categories:
2
ada4c7b3fc39 "planemo upload for repository https://github.com/dlal-group/simtext commit 63f67ee02be2eb4323a5ba5dcdd33d1fd0b7c24e"
dlalgroup
parents: 0
diff changeset
21 '$category'
0
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
22 #end for
23 ]]>
24 </command>
25 <inputs>
26 <param argument="--input" label="Input file" name="input" optional="false" type="data" format="tabular" help="input"/>
2
ada4c7b3fc39 "planemo upload for repository https://github.com/dlal-group/simtext commit 63f67ee02be2eb4323a5ba5dcdd33d1fd0b7c24e"
dlalgroup
parents: 0
diff changeset
27 <param argument="--categories" name="categories" type="select" label="categories" multiple="true" display="checkboxes">
0
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
28 <option value="Gene">Genes</option>
29 <option value="Disease">Diseases</option>
30 <option value="Mutation">Mutations</option>
31 <option value="Chemical">Chemicals</option>
32 <option value="Species">Species</option>
33 </param>
34 <param argument="--byid" label="Extract the most frequent gene IDs / MeSH IDs instead of specific scientific terms." name="byid" type="boolean" truevalue="--byid" falsevalue="" help="byid" checked="false"/>
35 <param argument="--number" label="Number of most frequent terms/IDs to extract." name="number" optional="true" type="integer" help="number" value="50"/>
36 </inputs>
37 <outputs>
38 <data format="tabular" name="output" />
39 </outputs>
40 <tests>
41 <test>
42 <param name="input" value="pubmed_by_queries_output" ftype="tabular"/>
43 <output name="output" value="pmids_to_pubtator_matrix_output"/>
44 </test>
45 </tests>
46 <help><![CDATA[
47
48 For each row, the tool takes all PMIDs and extracts "Gene", "Disease", "Mutation", "Chemical" and "Species" terms from the corresponding abstracts, using PubTator annotations. The user can choose the categories from which terms are extracted. The extracted terms are combined into one large binary matrix, in which 0 means the term is not present in the abstracts of that row and 1 means the term is present in the abstracts of that row. The user can decide whether the scientific terms are extracted and used as they are or grouped by their gene IDs / MeSH IDs (several terms are often grouped into one ID). The user can also specify the number of most frequent terms/IDs to extract per row.
49
50 Input: Output of the 'abstracts_by_pmids' tool, or a tab-delimited table with columns containing PMIDs. The names of the PMID columns should start with "PMID", e.g. "PMID_1", "PMID_2", etc.
51
52 Output: Binary matrix in which each column represents one of the extracted terms.
53
54 ]]></help>
55 <expand macro="citations"/>
56 </tool>