Mercurial > repos > dlalgroup > pmids_to_pubtator_matrix

  <tool id="pmids_to_pubtator_matrix" name="pmids_to_pubtator_matrix" version="@VERSION@">
    <description>Per row, extract scientific terms from PMIDs susing PubTator and generate a binary matrix</description>
    <macros>
        <import>macros.xml</import>
    </macros>
    <expand macro="requirements">
      <requirement type="package" version="2.0.1">r-argparse</requirement>
      <requirement type="package" version="1.4.0">r-stringr</requirement>
      <requirement type="package" version="1.95_4.12">r-rcurl</requirement>
      <requirement type="package" version="1.4.3">r-stringi</requirement>
    </expand>
    <expand macro="stdio"/>
    <command><![CDATA[Rscript
      '${__tool_directory__}/pmids_to_pubtator_matrix.R'
      --input '$input'
      --output '$output'
      --number '$number'
      $byid
      --categories
      #for $category in $categories:
      '$category'
      #end for
      ]]>
    </command>
    <inputs>
      <param argument="--input" label="Input file" name="input" optional="false" type="data" format="tabular" help="input"/>
      <param argument="--categories" name="categories" type="select" label="categories" multiple="true" display="checkboxes">
        <option value="Gene">Genes</option>
        <option value="Disease">Diseases</option>
        <option value="Mutation">Mutations</option>
        <option value="Chemical">Chemicals</option>
        <option value="Species">Species</option>
      </param>
       <param argument="--byid" label="If you want to find common gene IDs / mesh IDs instead of specific scientific terms." name="byid" type="boolean" truevalue="--byid" falsevalue="" help="byid" checked="false"/>
        <param argument="--number" label="Number of most frequent terms/IDs to extract." name="number" optional="true" type="integer" help="number" value="50"/>
    </inputs>
    <outputs>
      <data format="tabular" name="output" />
    </outputs>
    <tests>
        <test>
          <param name="input" value="pubmed_by_queries_output" ftype="tabular"/>
          <output name="output" value="pmids_to_pubtator_matrix_output"/>
        </test>
    </tests>
    <help><![CDATA[

The tool uses all PMIDs per row and extracts "Gene", "Disease", "Mutation", "Chemical" and "Species" terms of the corresponding abstracts, using PubTator annotations. The user can choose from which categories terms should be extracted. The extracted terms are united in one large binary matrix, with 0= term not present in abstracts of that row and 1= term present in abstracts of that row. The user can decide if the scientific terms should be extracted and used as they are or if they should be grouped by their geneIDs/ meshIDs (several terms are often grouped into one ID). The the user can specify a number of most frequent words to extract per row.

Input: Output of 'abstracts_by_pmids' tool, or tab-delimited table with columns containing PMIDs. The names of the PMID columns should start with "PMID", e.g. "PMID_1", "PMID_2" etc.

Output: Binary matrix in that each column represents one of the extracted terms.

        ]]></help>
    <expand macro="citations"/>
</tool>
author	dlalgroup
date	Thu, 08 Oct 2020 05:39:58 +0000
parents	3f4adc85ba5d
children