comparison README.txt @ 0:457fd8fd681a draft
planemo upload for repository https://github.com/galaxyproject/tools-iuc/tree/master/tools/VirHunter commit 628688c1302dbf972e48806d2a5bafe27847bdcc
author | iuc |
---|---|
date | Wed, 09 Nov 2022 12:19:26 +0000 |
-1:000000000000 | 0:457fd8fd681a |
---|---|
1 # VirHunter | |
2 | |
3 VirHunter is a deep learning method that uses Convolutional Neural Networks (CNNs) and a Random Forest Classifier to identify viruses in sequencing datasets. More precisely, VirHunter classifies previously assembled contigs as viral, host or bacterial (contamination). | |
4 | |
5 ## System Requirements | |
6 Installing VirHunter requires a Unix environment with [Python 3.8](http://www.python.org/). | |
7 It was tested on Linux and macOS operating systems. | |
8 VirHunter is not yet fully compatible with MacBooks that use the M1 chip. | |
9 | |
10 To run VirHunter you need to have git and conda already installed. | |
11 If you are installing conda for the first time, we suggest using | |
12 the lightweight [miniconda](https://docs.conda.io/en/latest/miniconda.html). | |
13 Alternatively, you can install the dependencies with pip. | |
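The requirements above can be checked before installation with a short script. This is a minimal sketch, not part of VirHunter: the function names `missing_tools` and `python_is_supported` are hypothetical, and the check assumes that "installed" means the `git` and `conda` executables are on `PATH`.

```python
import shutil
import sys

# Assumption: the executables are named exactly as the README refers to them.
REQUIRED_TOOLS = ("git", "conda")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def python_is_supported(version_info=sys.version_info, expected=(3, 8)):
    """Return True when the major.minor version equals `expected`."""
    return tuple(version_info[:2]) == expected


if __name__ == "__main__":
    if not python_is_supported():
        print("Warning: VirHunter targets Python 3.8, found", sys.version.split()[0])
    for tool in missing_tools():
        print("Missing required tool:", tool)
```

If you install the dependencies with pip instead of conda, `conda` can be dropped from the tuple passed to `missing_tools`.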
14 | |
15 ## Installation | |
16 | |
17 To install VirHunter, download it from GitHub and then install the dependencies. | |
18 | |
19 First, clone the repository from [GitHub](https://github.com/cbib/virhunter): | |
20 | |
21 git clone https://github.com/cbib/virhunter.git | |
22 | |
23 Go to the VirHunter root folder: | |
24 | |
25 cd virhunter/ | |
26 | |
27 ### Installing dependencies with Conda | |
28 | |
29 First, create the environment from the envs/environment.yml file. | |
30 The installation may take around 500 MB of disk space. | |
31 | |
32 conda env create -f envs/environment.yml | |
33 | |
34 Second, activate the environment: | |
35 | |
36 conda activate virhunter | |
37 | |
38 ### Installing dependencies with pip | |
39 | |
40 If you don't have Conda installed on your system, you can install the Python dependencies with pip: | |
41 | |
42 pip install -r envs/requirements.txt | |
43 | |
44 On macOS, you will also need to install the wget utility to run some of the scripts (the Conda installation already includes it). You can do this with the brew package manager: | |
45 | |
46 brew install wget | |
47 | |
48 ### Testing your installation of VirHunter | |
49 | |
50 You can test that VirHunter was successfully installed using the toy dataset we provide. | |
51 IMPORTANT: the toy dataset is intended only to test that VirHunter has been installed correctly and that all the scripts can be executed. | |
52 These models should not be used for prediction on your own datasets! | |
53 | |
54 First, download the toy dataset: | |
55 | |
56 bash scripts/download_test_installation.sh | |
57 | |
58 Then run the bash script that calls the testing, training and prediction Python scripts of VirHunter. | |
59 Note that the training process may take some time (up to an hour). | |
60 | |
61 bash scripts/test_installation.sh | |
62 | |
63 | |
64 ## Using VirHunter for prediction | |
65 | |
66 To run VirHunter you can use the pre-trained models or train VirHunter yourself (described in the next section). | |
67 Pre-trained model weights are available for multiple host plants. | |
68 You can download them using the download_weights.sh script: | |
69 | |
70 bash scripts/download_weights.sh | |
71 | |
72 Once the weights are downloaded, you can start the prediction: | |
73 | |
74 python virhunter/predict.py --test_ds /path/to/test_ds_1 | |
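When several datasets need to be processed, the command above can be wrapped in a small driver. This is a sketch under stated assumptions: the helper names `build_predict_command` and `predict_all` are hypothetical, and only the `--test_ds` option shown above is used; any other options that `predict.py` may need are not covered here.

```python
import subprocess
import sys


def build_predict_command(test_ds, script="virhunter/predict.py"):
    """Build the argument list for one prediction run.

    Only the --test_ds option documented in the README is passed.
    """
    return [sys.executable, str(script), "--test_ds", str(test_ds)]


def predict_all(datasets, dry_run=True):
    """Run (or, with dry_run=True, just print) one prediction per dataset."""
    commands = [build_predict_command(ds) for ds in datasets]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the loop at the first failing run
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Placeholder paths; replace them with your own datasets.
    predict_all(["/path/to/test_ds_1", "/path/to/test_ds_2"])
```

Run it from the VirHunter root folder with the virhunter environment activated, and pass `dry_run=False` once the printed commands look right.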
75 | |
76 After prediction, VirHunter produces two CSV files and one optional FASTA file: | |
77 | |
78 1. The first file ends with _predicted_fragments.csv. | |
79 It is an intermediate result containing the predictions of the three CNN networks (probabilities of belonging to each of the virus/plant/bacteria classes) and of the RF classifier for each fragment of every contig. | |
80 | |
81 2. The second file ends with _predicted.csv. | |
82 This file contains final predictions for contigs calculated from the previous file. | |
83 - id - fasta header of a contig. | |
84 - length - length of the contig. | |
85 - # viral fragments, # plant fragments and # bacterial fragments - the number of fragments of the contig that received the corresponding class prediction from the RF classifier. | |
86 - decision - class assigned to the contig by VirHunter. | |
87 - # viral / # total - number of viral fragments divided by the total number of fragments of the contig. | |
88 - # viral / # total * length - number of viral fragments divided by the total number of fragments of the contig, multiplied by the contig length. It is used to display the most relevant contigs first. | |
89 | |
90 3. The FASTA file ends with _viral.fasta. It contains the contigs that were predicted as viral by VirHunter. | |
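The _predicted.csv file described above can be post-processed with the Python standard library. The sketch below lists the contigs classified as viral, ordered by the `# viral / # total * length` column. It rests on assumptions: the CSV header names are taken to match the column names listed above exactly, the viral class label is assumed to be `virus`, and the function name `top_viral_contigs` and the example rows are made up for illustration. Check the header and labels of a real output file before relying on it.

```python
import csv
import io

# Assumption: the header matches the column name given in the README.
SCORE_COLUMN = "# viral / # total * length"


def top_viral_contigs(csv_file, viral_label="virus", limit=10):
    """Return (id, score) pairs of viral contigs, highest score first.

    `viral_label` is an assumed value of the `decision` column.
    """
    reader = csv.DictReader(csv_file)
    viral_rows = [row for row in reader if row["decision"] == viral_label]
    viral_rows.sort(key=lambda row: float(row[SCORE_COLUMN]), reverse=True)
    return [(row["id"], float(row[SCORE_COLUMN])) for row in viral_rows[:limit]]


# Made-up rows for illustration only; they are not real VirHunter output.
example = io.StringIO(
    "id,length,# viral fragments,# plant fragments,# bacterial fragments,"
    "decision,# viral / # total,# viral / # total * length\n"
    "contig_1,3000,2,1,0,virus,0.667,2000.0\n"
    "contig_2,5000,0,5,0,plant,0.0,0.0\n"
    "contig_3,8000,7,1,0,virus,0.875,7000.0\n"
)
print(top_viral_contigs(example))
```

For a real file, open it with `open(path, newline="")` and pass the file object to `top_viral_contigs`.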