annotate pmids_to_pubtator_matrix.xml @ 2:ada4c7b3fc39 draft default tip
"planemo upload for repository https://github.com/dlal-group/simtext commit 63f67ee02be2eb4323a5ba5dcdd33d1fd0b7c24e"
author | dlalgroup |
---|---|
date | Thu, 08 Oct 2020 05:39:58 +0000 |
parents | 3f4adc85ba5d |
children |
rev | line source |
---|---|
0
3f4adc85ba5d
"planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
<tool id="pmids_to_pubtator_matrix" name="pmids_to_pubtator_matrix" version="@VERSION@">
    <description>Per row, extract scientific terms from PMIDs using PubTator and generate a binary matrix</description>
    <macros>
        <import>macros.xml</import>
    </macros>
    <expand macro="requirements">
        <requirement type="package" version="2.0.1">r-argparse</requirement>
        <requirement type="package" version="1.4.0">r-stringr</requirement>
        <requirement type="package" version="1.95_4.12">r-rcurl</requirement>
        <requirement type="package" version="1.4.3">r-stringi</requirement>
    </expand>
    <expand macro="stdio"/>
    <command><![CDATA[Rscript
        '${__tool_directory__}/pmids_to_pubtator_matrix.R'
        --input '$input'
        --output '$output'
        --number '$number'
        $byid
        --categories
        #for $category in $categories:
            '$category'
        #end for
    ]]>
    </command>
    <inputs>
        <param argument="--input" label="Input file" name="input" optional="false" type="data" format="tabular" help="input"/>
        <param argument="--categories" name="categories" type="select" label="categories" multiple="true" display="checkboxes">
            <option value="Gene">Genes</option>
            <option value="Disease">Diseases</option>
            <option value="Mutation">Mutations</option>
            <option value="Chemical">Chemicals</option>
            <option value="Species">Species</option>
        </param>
        <param argument="--byid" label="If you want to find common gene IDs / mesh IDs instead of specific scientific terms." name="byid" type="boolean" truevalue="--byid" falsevalue="" help="byid" checked="false"/>
        <param argument="--number" label="Number of most frequent terms/IDs to extract." name="number" optional="true" type="integer" help="number" value="50"/>
    </inputs>
    <outputs>
        <data format="tabular" name="output" />
    </outputs>
    <tests>
        <test>
            <param name="input" value="pubmed_by_queries_output" ftype="tabular"/>
            <output name="output" value="pmids_to_pubtator_matrix_output"/>
        </test>
    </tests>
    <help><![CDATA[

For each row, the tool takes all PMIDs and extracts "Gene", "Disease", "Mutation", "Chemical" and "Species" terms from the corresponding abstracts, using PubTator annotations. The user can choose the categories from which terms are extracted. The extracted terms are combined into one large binary matrix, where 0 means the term is not present in the abstracts of that row and 1 means the term is present in the abstracts of that row. The user can decide whether the scientific terms are used as extracted or grouped by their gene IDs / MeSH IDs (several terms are often grouped into one ID). The user can also specify the number of most frequent terms/IDs to extract per row.

Input: Output of 'abstracts_by_pmids' tool, or tab-delimited table with columns containing PMIDs. The names of the PMID columns should start with "PMID", e.g. "PMID_1", "PMID_2" etc.

Output: Binary matrix in which each column represents one of the extracted terms.

    ]]></help>
    <expand macro="citations"/>
</tool>
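
The help text describes the output as a binary matrix with one column per extracted term. The following is a minimal Python sketch of that idea only, not the logic of `pmids_to_pubtator_matrix.R`: it assumes the terms for each row have already been extracted, makes no PubTator request, and breaks ties among equally frequent terms by first appearance, which may differ from the actual script. The function names `top_terms` and `binary_matrix` and the sample terms are hypothetical and chosen for illustration.

```python
from collections import Counter


def top_terms(terms, number):
    """Return the `number` most frequent terms in one row's term list."""
    return [term for term, _ in Counter(terms).most_common(number)]


def binary_matrix(rows, number=50):
    """Build a 0/1 matrix: one entry per input row, one column per kept term.

    `rows` maps a row name to the list of terms found in that row's abstracts
    (hypothetical input shape; the real tool reads PMIDs and queries PubTator).
    """
    # Keep only the most frequent terms of each row.
    kept = {name: set(top_terms(terms, number)) for name, terms in rows.items()}
    # Columns are the union of all kept terms, sorted for a stable order.
    columns = sorted(set().union(*kept.values()))
    # 1 = term present in that row, 0 = term not present.
    matrix = {
        name: [1 if col in terms else 0 for col in columns]
        for name, terms in kept.items()
    }
    return columns, matrix


# Made-up example data, for illustration only.
rows = {
    "row_1": ["BRCA1", "breast cancer", "BRCA1", "human"],
    "row_2": ["TP53", "human", "TP53"],
}
columns, matrix = binary_matrix(rows, number=2)
print(columns)
for name, values in matrix.items():
    print(name, values)
```

With `number=2`, each row contributes at most two terms, so a term that is present in a row but not among its most frequent ones is recorded as 0 for that row.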