annotate README.md @ 1:f55a02423699 draft

"planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
author dlalgroup
date Thu, 24 Sep 2020 04:28:18 +0000
parents 3f4adc85ba5d
children
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
0
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
1 # SimText
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
2
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
3 A text mining framework for interactive analysis and visualization of similarities among biomedical entities.
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
4
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
5 ## Brief overview of tools:
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
6
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
7 - pubmed_by_queries:
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
8
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
9 For each search query, PMIDs or abstracts from PubMed are saved.
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
10
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
11 - abstracts_by_pmids:
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
12
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
13 For all PMIDs in each row of a table the according abstracts are saved in additional columns.
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
14
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
15 - text_to_wordmatrix:
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
16
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
17 The most frequent words of text from each row are extracted and united in one large binary matrix.
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
18
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
19 - pmids_to_pubtator_matrix:
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
20
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
21 For PMIDs of each row, scientific words are extracted using PubTator annotations and subsequently united in one large binary matrix.
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
22
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
23 - simtext_app:
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
24
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
25 Shiny app with word clouds, dimension reduction plot, dendrogram of hierarchical clustering and table with words and their frequency among the search queries.
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
26
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
27 ## Requirements
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
28
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
29 - R (version > 4.0.0)
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
30
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
31 ## Installation
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
32
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
33 ```
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
34 $ mkdir -p <path>/simtext
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
35 $ cd <path>/simtext
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
36 $ git clone https://github.com/dlal-group/simtext
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
37 ```
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
38
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
39 ## pubmed_by_queries
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
40
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
41 This tool uses a set of search queries to download a defined number of abstracts or PMIDs for each search query from PubMed. PubMed's search rules and syntax apply. Users can obtain an API key from the Settings page of their NCBI account (to create an account, visit http://www.ncbi.nlm.nih.gov/account/).
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
42
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
43 Input:
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
44
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
45 Tab-delimited table with a list of search queries (biomedical entities of interest) in one column. The column header should start with "ID_" (e.g., "ID_gene" if search queries are genes).
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
46
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
47 Usage:
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
48 ```
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
49 $ Rscript pubmed_by_queries.R [-h] [-i INPUT] [-o OUTPUT] [-n NUMBER] [-a] [-k KEY] [--install_packages]
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
50 ```
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
51
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
52 Optional arguments:
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
53 ```
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
54 -h, --help show help message
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
55 -i INPUT, --input INPUT input file name. add path if file is not in working directory
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
56 -o OUTPUT, --output OUTPUT output file name [default "pubmed_by_queries_output"]
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
57 -n NUMBER, --number NUMBER number of PMIDs or abstracts to save per ID [default "5"]
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
58 -a, --abstract if abstracts instead of PMIDs should be retrieved use --abstracts
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
59 -k KEY, --key KEY if NCBI API key is available, add it to speed up the download of PubMed data
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
60 --install_packages if you want to auto install missing required packages
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
61 ```
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
62
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
63 Output:
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
64
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
65 A table with additional columns containing PMIDs or abstracts from PubMed.
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
66
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
67 ## abstracts_by_pmids
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
68
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
69 This tool retrieves abstracts for a matrix of PMIDs. The abstract text is saved in additional columns.
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
70
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
71 Input:
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
72
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
73 Tab-delimited table with rows representing biomedical entities and columns containing the corresponding PMIDs. The names of the PMID columns should start with “PMID_” (e.g., “PMID_1”, “PMID_2” etc.).
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
74
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
75 Usage:
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
76 ```
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
77 $ Rscript abstracts_by_pmid.R [-h] [-i INPUT] [-o OUTPUT]
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
78 ```
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
79
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
80 Optional arguments:
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
81 ```
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
82 -h, --help show help message
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
83 -i INPUT, --input INPUT input file name. add path if file is not in working directory
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
84 -o OUTPUT, --output OUTPUT output file name [default "abstracts_by_pmids_output"]
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
85 --install_packages if you want to auto install missing required packages
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
86 ```
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
87
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
88 Output:
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
89
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
90 A table with additional columns containing abstract texts.
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
91
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
92 ## text_to_wordmatrix
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
93
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
94 The tool extracts for each row the most frequent words from the text in columns starting with "ABSTRACT" or "TEXT. The extracted words from each row are united in one large binary matrix, with 0= word not frequently occurring in text of that row and 1= word frequently present in text of that row.
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
95
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
96 Input:
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
97
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
98 The output of ‘pubmed_by_queries’ or ‘abstracts_by_pmids’ tools, or a tab-delimited table with text in columns starting with "ABSTRACT" or "TEXT".
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
99
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
100 Usage:
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
101 ```
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
102 $ Rscript text_to_wordmatrix.R [-h] [-i INPUT] [-o OUTPUT] [-n NUMBER] [-r] [-l] [-w] [-s] [-p]
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
103 ```
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
104
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
105 Optional arguments:
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
106 ```
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
107 -h, --help show help message
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
108 -i INPUT, --input INPUT input file name. add path if file is not in working directory
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
109 -o OUTPUT, --output OUTPUT output file name. [default "text_to_wordmatrix_output"]
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
110 -n NUMBER, --number NUMBER number of most frequent words that should be extracted per row [default "50"]
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
111 -r, --remove_num remove any numbers in text
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
112 -l, --lower_case by default all characters are translated to lower case. otherwise use -l
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
113 -w, --remove_stopwords by default a set of english stopwords (e.g., 'the' or 'not') are removed. otherwise use -w
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
114 -s, --stemDoc apply Porter's stemming algorithm: collapsing words to a common root to aid comparison of vocabulary
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
115 -p, --plurals by default words in plural and singular are merged to the singular form. otherwise use -p
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
116 -- install_packages if you want to auto install missing required packages
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
117 ```
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
118
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
119 Output:
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
120
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
121 A binary matrix in that each column represents one of the extracted words.
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
122
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
123 ## pmids_to_pubtator_matrix
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
124
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
125 The tool uses all PMIDs per row and extracts "Gene", "Disease", "Mutation", "Chemical" and "Species" terms of the corresponding abstracts, using PubTator annotations. The user can choose from which categories terms should be extracted. The extracted terms are united in one large binary matrix, with 0= term not present in abstracts of that row and 1= term present in abstracts of that row. The user can decide if the scientific terms should be extracted and used as they are or if they should be grouped by their geneIDs/ meshIDs (several terms are often grouped into one ID). Also, by default all terms are extracted, otherwise the user can specify a number of most frequent words to extract per row.
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
126
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
127 Input:
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
128
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
129 Output of 'abstracts_by_pmids' tool, or tab-delimited table with columns containing PMIDs. The names of the PMID columns should start with "PMID", e.g. "PMID_1", "PMID_2" etc.
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
130
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
131 Usage:
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
132 ```
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
133 $ Rscript pmids_to_pubtator_matrix.R [-h] [-i INPUT] [-o OUTPUT] [-b BYID] [-n NUMBER][-c {Gene,Disease,Mutation,Chemical,Species} [{Gene,Disease,Mutation,Chemical,Species} ...]]
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
134 ```
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
135
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
136 Optional arguments:
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
137 ```
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
138 -h, --help show help message
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
139 -i INPUT, --input INPUT input file name. add path if file is not in workind directory
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
140 -o OUTPUT, --output OUTPUT output file name. [default "pmids_to_pubtator_matrix_output"]
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
141 -b, --byid if you want to find common gene IDs / mesh IDs instead of specific scientific terms.
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
142 -n NUMBER, --number NUMBER number of most frequent terms/IDs to extract. by default all terms/IDs are extracted.
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
143 -c [...], --categories [...] PubTator categories that should be considered [default "('Gene', 'Disease', 'Mutation','Chemical')"]
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
144 -- install_packages if you want to auto install missing required packages
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
145 ```
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
146
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
147 Output:
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
148
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
149 Binary matrix in that each column represents one of the extracted terms.
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
150
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
151 ## simtext_app
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
152
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
153 The tool enables the exploration of data generated by ‘text_to_wordmatrix’ or ‘pmids_to_pubtator_matrix’ tools in a Shiny local instance. The following features can be generated: 1) word clouds for each initial search query, 2) dimension reduction and hierarchical clustering of binary matrices, and 3) tables with words and their frequency in the search queries.
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
154
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
155 Input:
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
156
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
157 1) Input 1:
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
158 Tab-delimited table with
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
159 - A column with initial search queries starting with "ID_" (e.g., "ID_gene" if initial search queries were genes).
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
160 - Column(s) with grouping factor(s) to compare pre-existing categories of the initial search queries with the grouping based on text. The column names should start with "GROUPING_". If the column name is "GROUPING_disorder", "disorder" will be shown as a grouping variable in the app.
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
161 2) Input 2:
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
162 The output of ‘text_to_wordmatrix’ or ‘pmids_to_pubtator_matrix’ tools, or a binary matrix.
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
163
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
164 Usage:
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
165 ```
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
166 $ Rscript simtext_app.R [-h] [-i INPUT] [-m MATRIX] [-p PORT]
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
167 ```
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
168
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
169 Optional arguments:
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
170 ```
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
171 -h, --help show help message
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
172 -i INPUT, --input INPUT input file name. add path if file is not in working directory
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
173 -m MATRIX, --matrix MATRIX matrix file name. add path if file is not in working directory
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
174 -p PORT, --port PORT specify port, otherwise randomly selected
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
175 --host specify host
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
176 -- install_packages if you want to auto install missing required packages
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
177 ```
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
178
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
179 Output:
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
180
3f4adc85ba5d "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
181 SimText app