comparison phage_host_prediction/run_galaxy.xml @ 2:3e1e8be4e65c draft default tip

Uploaded
author pedro_araujo
date Fri, 02 Apr 2021 10:11:13 +0000
1:d9cda08472ea 2:3e1e8be4e65c
1 <tool id="run_galaxy" name="PhageHostPrediction" version="0.1.0" python_template_version="3.5">
2 <description>prediction of phage-bacteria interactions</description>
3 <requirements>
4 <requirement type="package">biopython</requirement>
5 <requirement type="package">scikit-learn</requirement>
6 <requirement type="package">numpy</requirement>
7 <requirement type="package">pandas</requirement>
8 <requirement type="package">scikit-bio</requirement>
9 </requirements>
10 <command detect_errors="exit_code" interpreter="python3"><![CDATA[
11 $__tool_directory__/run_galaxy.py
12 $input_phage.phage_input_type $input_phage.phages
13
14 $input_bact.bact_input_type $input_bact.bacts
15
16 $adv.run_interpro $adv.ml_model
17
18 ]]></command>
19 <inputs>
20 <conditional name="input_phage">
21 <param type="select" name="phage_input_type" label='Phage input:'>
22 <option value="ID" selected="true">NCBI IDs (comma separated)</option>
23 <option value="seq_file" selected="false">Sequence fasta file (only one organism)</option>
24 </param>
25 <when value="ID">
26 <param type="text" name="phages" label='Phage IDs'/>
27 </when>
28 <when value="seq_file">
29 <param type="data" name="phages" label='Phage fasta file' format="fasta"/>
30 </when>
31 </conditional>
32
33 <conditional name="input_bact">
34 <param type="select" name="bact_input_type" label='Bacteria input:'>
35 <option value="ID" selected="true">NCBI IDs (comma separated)</option>
36 <option value="seq_file" selected="false">Sequence fasta file (only one organism)</option>
37 </param>
38 <when value="ID">
39 <param type="text" name="bacts" label='Bacteria IDs'/>
40 </when>
41 <when value="seq_file">
42 <param type="data" name="bacts" label='Bacteria fasta file' format="fasta"/>
43 </when>
44 </conditional>
45
46 <section name='adv' title='Advanced Options' expanded='false'>
47 <param type="boolean" name="run_interpro" label='Perform interpro search' checked="false" truevalue="True" falsevalue="False" />
48 <param type="select" name="ml_model" label="Machine learning model">
49 <option value="RandomForests" selected="true">Random Forests</option>
50 <option value="SVM">SVM</option>
51 </param>
52 </section>
53 </inputs>
54 <outputs>
55 <data name="output1" format="tabular" from_work_dir="output.tsv" />
56 </outputs>
57 <help>
58
59 PhageHostPrediction
60 ===================
61
62 Predict interactions between phages and bacterial strains.
63
64 PhageHostPrediction is a Python script that predicts phage-host interactions for *E. coli*, *K. pneumoniae* and *A. baumannii* phages using supervised machine learning models. The models were built from a dataset of 23 987 entries and 252 features, with balanced 'Yes' and 'No' outputs. The positive interaction cases are described in the file "NCBI_Phage_Bacteria_Data.csv", included with this tool, while the negative cases were generated by randomly pairing phages with bacteria of different species.
65
66 The prediction uses the complete host proteome and the phage tail proteins, which are inferred within the tool. This inference relies on a locally built database of phage protein functions, available in the file "phagesProteins.json"; proteins of unknown function are predicted against this database. InterProScan can optionally be run to support this prediction.
67
68 **Inputs:**
69
70 * phage/bacteria input type: NCBI IDs or a fasta file;
71 * ID: must be a GenBank ID whose record includes the described proteome;
72 * fasta file: must contain the whole proteome of the organism;
73 * machine learning model: random forests have better predictive power, while SVM can be slightly faster to run;
74 * InterPro search: should predict tail proteins with higher confidence, but significantly increases the run time.
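
The command template passes these inputs to ``run_galaxy.py`` as six positional values, in this order: phage input type, phage input, bacteria input type, bacteria input, InterPro flag and model name. A minimal sketch of how a script could unpack them is shown here. The names ``parse_galaxy_args`` and ``split_ids`` and the dictionary layout are hypothetical and are not taken from ``run_galaxy.py``.

```python
# Hypothetical helpers: not taken from run_galaxy.py.

def split_ids(text):
    """Turn a comma-separated string of NCBI IDs into a clean list."""
    return [item.strip() for item in text.split(",") if item.strip()]


def parse_galaxy_args(argv):
    """Unpack the six positional values in the order the command template uses."""
    if len(argv) != 6:
        raise ValueError("expected 6 arguments, got %d" % len(argv))
    phage_type, phages, bact_type, bacts, run_interpro, ml_model = argv

    # The select parameters only offer these values.
    for kind in (phage_type, bact_type):
        if kind not in ("ID", "seq_file"):
            raise ValueError("unknown input type: %s" % kind)
    if ml_model not in ("RandomForests", "SVM"):
        raise ValueError("unknown model: %s" % ml_model)

    return {
        "phage_input_type": phage_type,
        # IDs become a list; a fasta path is kept as a plain string.
        "phages": split_ids(phages) if phage_type == "ID" else phages,
        "bact_input_type": bact_type,
        "bacts": split_ids(bacts) if bact_type == "ID" else bacts,
        # The boolean parameter is rendered as the text "True" or "False".
        "run_interpro": run_interpro == "True",
        "ml_model": ml_model,
    }
```

Validating the values up front is a design choice of this sketch: a mismatch between the wrapper and the script then fails early, before any sequence is downloaded or searched.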
75
76 **Outputs:**
77 This tool outputs a tabular file with the phage-host pair in the first column and the prediction result in the second.
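
As a rough illustration of consuming this output, the sketch below reads a two-column tabular file with Python's ``csv`` module. It assumes tab-separated values, no header row, and the column order described here. ``read_predictions`` is a hypothetical name and is not part of the tool.

```python
import csv


# Hypothetical reader for the tabular output. Assumes tab-separated rows of
# (phage-host pair, prediction) and no header row.
def read_predictions(path):
    predictions = {}
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) != 2:
                continue  # skip blank or malformed lines
            pair, result = row
            predictions[pair] = result
    return predictions
```

If the file does carry a header row, it would appear in the returned dictionary as an ordinary entry and would need to be dropped by the caller.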
78
79 **Requirements:**
80
81 * Biopython
82 * Scikit-learn
83 * Numpy
84 * Pandas
85 * Scikit-bio
86 * BLAST_ - must be installed locally, with its executables reachable through the PATH environment variable
87 * InterProScan_ (optional) - must be installed locally, with its executable reachable through the PATH environment variable
88
89 .. _BLAST: https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/
90 .. _InterProScan: http://www.ebi.ac.uk/interpro/download/
91
92 </help>
93 </tool>