annotate phage_host_prediction/README.rst @ 2:3e1e8be4e65c draft default tip

Uploaded
author pedro_araujo
date Fri, 02 Apr 2021 10:11:13 +0000

**PhageHost**

Predict interactions between phages and bacterial strains.

PhageHost is a set of Python scripts that predict phage-host interactions for *E. coli*, *K. pneumoniae* and *A. baumannii* phages using supervised machine learning models. The models were built from a dataset of 23,987 entries and 252 features, with balanced 'Yes' and 'No' outputs. The positive interaction cases are described in the file "NCBI_Phage_Bacteria_Data.csv", which is included with this tool, while the negative cases were generated by randomly pairing phages with bacteria of different species.
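
The construction of the negative set described above can be sketched as follows. This is a minimal illustration, not the code used to build the PhageHost dataset: it assumes that "different species" means a species other than any known host species of the phage, and the function name, argument names and example identifiers are all hypothetical.

```python
import random


def sample_negative_pairs(positives, species_of, n, seed=0):
    """Draw up to n random (phage, bacterium) pairs labelled 'No'.

    positives  -- list of known (phage, bacterium) interactions
    species_of -- dict mapping each bacterium to its species
    Assumption: a pair is a valid negative when the bacterium belongs to a
    species that is not among the known host species of the phage.
    """
    rng = random.Random(seed)

    # Collect the known host species of every phage.
    host_species = {}
    for phage, bacterium in positives:
        host_species.setdefault(phage, set()).add(species_of[bacterium])

    # Enumerate every cross-species pair, then sample without replacement.
    candidates = [
        (phage, bacterium)
        for phage in sorted(host_species)
        for bacterium in sorted(species_of)
        if species_of[bacterium] not in host_species[phage]
    ]
    return rng.sample(candidates, min(n, len(candidates)))


# Hypothetical example data.
positives = [("phage_A", "ecoli_1"), ("phage_B", "kpneu_1")]
species_of = {
    "ecoli_1": "E. coli",
    "ecoli_2": "E. coli",
    "kpneu_1": "K. pneumoniae",
    "abaum_1": "A. baumannii",
}

# Draw as many negatives as there are positives to keep the classes balanced.
negatives = sample_negative_pairs(positives, species_of, n=len(positives))
```

Drawing as many negatives as there are positives is what keeps the 'Yes' and 'No' classes balanced in this sketch.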

The prediction relies on the complete host proteome and on phage tail proteins, which are inferred within the tool. This inference uses a locally built database of phage protein functions, available in the file "phagesProteins.json". The function of unknown proteins is predicted by comparison against this database. InterProScan can optionally be used to support this prediction.

**Inputs:**

* phage/bacteria genome format: GenBank ID or FASTA file;
* ID: must be a GenBank ID whose record describes the proteome;
* FASTA file: must contain the whole proteome of the organism;
* machine learning model: random forest has better predictive power, while SVM can be slightly faster to run;
* InterPro search: should predict tail proteins with higher confidence, but significantly increases the run time.
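
Because the FASTA input must hold protein sequences rather than a nucleotide genome, it can help to check a file before running the tool. The sketch below is not part of PhageHost; the function names and the example records are hypothetical, and the protein test is a simple heuristic (any residue letter outside the nucleotide alphabet).

```python
def parse_fasta(lines):
    """Return a dict mapping each FASTA header to its full sequence."""
    records = {}
    header = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def looks_like_protein(sequence):
    """Heuristic: a protein contains letters that are not nucleotide codes."""
    return bool(set(sequence.upper()) - set("ACGTUN"))


# Hypothetical two-record proteome.
example = """>prot_1 hypothetical tail fiber
MKTLLVAG
AELQ
>prot_2
MSDNE
"""

proteome = parse_fasta(example.splitlines())
all_protein = all(looks_like_protein(seq) for seq in proteome.values())
```

A file for which `all_protein` is false most likely contains nucleotide sequences and should be replaced by the translated proteome.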

**Outputs:**

This tool outputs a tabular file with the phage-host pairs in the first column and the prediction result in the second.
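
A two-column file of this kind can be read with the Python standard library. The sketch below assumes a tab-delimited file without a header row and 'Yes'/'No' result labels; the delimiter, the pair label format and the function name are assumptions, not documented behaviour of the tool.

```python
import csv
import io


def read_predictions(handle, delimiter="\t"):
    """Map each phage-host pair (column 1) to its prediction (column 2)."""
    predictions = {}
    for row in csv.reader(handle, delimiter=delimiter):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        predictions[row[0]] = row[1]
    return predictions


# Hypothetical output content; a real run would open the output file instead.
example = io.StringIO("phage_A - ecoli_1\tYes\nphage_A - abaum_1\tNo\n")

predictions = read_predictions(example)
predicted_hosts = [pair for pair, result in predictions.items() if result == "Yes"]
```

For a real file, pass `open(path, newline="")` in place of the `io.StringIO` object.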

**Requirements:**

* Biopython
* Scikit-learn
* Numpy
* Pandas
* Scikit-bio
* BLAST_ - must be installed locally and globally available through the ``PATH`` environment variable
* InterProScan_ (optional) - must be installed locally and globally available through the ``PATH`` environment variable
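
Whether the external programs are reachable can be checked from Python before running the tool. This sketch assumes the executables are named ``blastp`` (from BLAST+) and ``interproscan.sh`` (the InterProScan launcher); which executables PhageHost actually calls is not stated here, so adjust the names as needed.

```python
import shutil


def find_tools(executables):
    """Return the full path of each executable, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in executables}


# Assumed executable names; PhageHost may call different ones.
tools = find_tools(["blastp", "interproscan.sh"])

for name, path in tools.items():
    print(f"{name}: {path or 'not found on PATH'}")
```

A value of `None` for `blastp` means BLAST still needs to be installed or added to the `PATH`; a missing `interproscan.sh` only matters if the InterPro search option is enabled.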

.. _BLAST: https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/
.. _InterProScan: http://www.ebi.ac.uk/interpro/download/