# HG changeset patch
# User pedro_araujo
# Date 1612450170 0
# Node ID cb678661a1da2bb201d08b8001d13bf7d37ee1dd
# Parent ca8d2b919299d76d8ce6e4d0bb83f2a3971b7d55
Deleted selected files

diff -r ca8d2b919299 -r cb678661a1da README.rst
--- a/README.rst Sun Jan 31 11:39:16 2021 +0000
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,32 +0,0 @@
-
-**PhageHostPrediction**
-
-Predict interactions between phages and bacterial strains.
-
-PhageHostPrediction is a python script that predicts phage-host interactions for *E. coli*, *K. pneumoniae* and *A. baumannii* phages, using supervised machine learning models. The models were built from a dataset containing 252 features and 23 987 entries with balanced outputs of 'Yes' and 'No'. The positive cases of interaction predicted are described in the file "NCBI_Phage_Bacteria_Data.csv", contained within this tool, while the negative were randomly assigned by pairing phages with bacteria of different species.
-
-The prediction resorts to complete host proteome and to phage tail proteins, that are inferred within the tool. This inference is made with a locally created database of phage protein functions, available in the file "phagesProteins.json". Unknown proteins are predicted against this database. To help with this prediction, the use of InterProScan is made optional.
-
-**Inputs:**
-
-* phage/bacteria genome format: ID vs fasta;
-* ID: must be a GenBank ID, with the proteome described;
-* fasta file: must contain the whole proteome of the organism;
-* machine learning model: random forests have better predictive power, while SVM can be slightly faster to run;
-* interpro search: should predict tails with higher confidence, but it significantly increases time to run.
-
-**Outputs:**
-This tool outputs a tabular file in which phage-host pairs are present in the first column and the prediction result in the second.
-
-**Requirements:**
-
-* Biopython
-* Scikit-learn
-* Numpy
-* Pandas
-* Scikit-bio
-* BLAST_ - must be installed locally and available globally as an environment variable
-* InterProScan_ (optional) - must be installed locally and available globally as an environment variable
-
-.. _BLAST: https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/
-.. _InterProScan: http://www.ebi.ac.uk/interpro/download/
diff -r ca8d2b919299 -r cb678661a1da run_galaxy.xml
--- a/run_galaxy.xml Sun Jan 31 11:39:16 2021 +0000
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,93 +0,0 @@
-
- prediction of phage-bacteria interactions
-
- biopython
- scikit-learn
- numpy
- pandas
- scikit-bio
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-PhageHostPrediction
-===================
-
-Predict interactions between phages and bacterial strains.
-
-PhageHostPrediction is a python script that predicts phage-host interactions for *E. coli*, *K. pneumoniae* and *A. baumannii* phages, using supervised machine learning models. The models were built from a dataset containing 252 features and 23 987 entries with balanced outputs of 'Yes' and 'No'. The positive cases of interaction predicted are described in the file "NCBI_Phage_Bacteria_Data.csv", contained within this tool, while the negative were randomly assigned by pairing phages with bacteria of different species.
-
-The prediction resorts to complete host proteome and to phage tail proteins, that are inferred within the tool. This inference is made with a locally created database of phage protein functions, available in the file "phagesProteins.json". Unknown proteins are predicted against this database. To help with this prediction, the use of InterProScan is made optional.
-
-**Inputs:**
-
-* phage/bacteria genome format: ID vs fasta;
-* ID: must be a GenBank ID, with the proteome described;
-* fasta file: must contain the whole proteome of the organism;
-* machine learning model: random forests have better predictive power, while SVM can be slightly faster to run;
-* interpro search: should predict tails with higher confidence, but it significantly increases time to run.
-
-**Outputs:**
-this tool outputs a tabular file in which phage-host pairs are present in the first column and the prediction result in the second.
-
-**Requirements:**
-
-* Biopython
-* Scikit-learn
-* Numpy
-* Pandas
-* Scikit-bio
-* BLAST_ - must be installed locally and available globally as an environment variable
-* InterProScan_ (optional) - must be installed locally and available globally as an environment variable
-
-.. _BLAST: https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/
-.. _InterProScan: http://www.ebi.ac.uk/interpro/download/
-
-
-
\ No newline at end of file
diff -r ca8d2b919299 -r cb678661a1da tool_dependencies.xml
--- a/tool_dependencies.xml Sun Jan 31 11:39:16 2021 +0000
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,7 +0,0 @@
-
-
-
-
-
-
-