comparison README.txt @ 0:457fd8fd681a draft

planemo upload for repository https://github.com/galaxyproject/tools-iuc/tree/master/tools/VirHunter commit 628688c1302dbf972e48806d2a5bafe27847bdcc
author iuc
date Wed, 09 Nov 2022 12:19:26 +0000

# VirHunter

VirHunter is a deep learning method that uses Convolutional Neural Networks (CNNs) and a Random Forest (RF) classifier to identify viruses in sequencing datasets. More precisely, VirHunter classifies previously assembled contigs as viral, host, or bacterial (contamination).

## System Requirements
Installing VirHunter requires a Unix environment with [Python 3.8](http://www.python.org/).
It has been tested on Linux and macOS.
VirHunter is not yet fully compatible with Apple M1 MacBooks.

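The requirements above can be checked from Python before installing. The following is a minimal sketch, not part of VirHunter: `check_environment` is a hypothetical helper that compares the running interpreter and platform against the stated requirements (Python 3.8, Linux or macOS) and flags Apple Silicon because of the M1 caveat.

```python
import platform
import sys


def check_environment(version=None, system=None, machine=None):
    """Return a list of warnings about the current environment.

    Hypothetical helper (not part of VirHunter). The checks mirror the README:
    Python 3.8, a tested Unix system (Linux or macOS), limited M1 support.
    Arguments default to the running interpreter so they can be overridden in tests.
    """
    version = version if version is not None else sys.version_info[:2]
    system = system if system is not None else platform.system()
    machine = machine if machine is not None else platform.machine()

    warnings = []
    if tuple(version[:2]) != (3, 8):
        warnings.append(f"Python {version[0]}.{version[1]} detected; VirHunter targets Python 3.8")
    if system not in ("Linux", "Darwin"):
        warnings.append(f"{system} is not a tested system (Linux or macOS)")
    # On Apple Silicon, a native interpreter reports "arm64".
    if system == "Darwin" and machine == "arm64":
        warnings.append("Apple Silicon (M1) detected; VirHunter is not yet fully compatible")
    return warnings


if __name__ == "__main__":
    for message in check_environment():
        print("WARNING:", message)
```

Note that `platform.machine()` describes the interpreter, not the hardware: a Python running under Rosetta on an M1 Mac reports `x86_64`, so this check only catches native arm64 interpreters.
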
To run VirHunter, you need git and conda installed.
If you are installing conda for the first time, we suggest
the lightweight [Miniconda](https://docs.conda.io/en/latest/miniconda.html).
Alternatively, you can install the dependencies with pip.

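Whether these tools are available on your `PATH` can be checked with Python's standard `shutil.which`. The sketch below uses a hypothetical helper, `installation_route`, which is not part of VirHunter. It follows the rule stated above: git is always required, conda is the preferred route, and pip is the fallback.

```python
import shutil


def installation_route(which=shutil.which):
    """Suggest how to install VirHunter's dependencies.

    Hypothetical helper (not part of VirHunter). Returns a tuple
    (route, missing) where route is "conda", "pip" or None and
    missing lists the required tools that were not found.
    `which` is injectable so the logic can be tested without touching PATH.
    """
    missing = [] if which("git") else ["git"]
    if which("conda"):
        route = "conda"
    elif which("pip") or which("pip3"):
        route = "pip"
    else:
        route = None
        missing.append("conda (or pip)")
    return route, missing


if __name__ == "__main__":
    route, missing = installation_route()
    print("Suggested route:", route)
    if missing:
        print("Missing:", ", ".join(missing))
```
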
## Installation

To install VirHunter, download it from GitHub and then install its dependencies.

First, clone the repository from [GitHub](https://github.com/cbib/virhunter):

    git clone https://github.com/cbib/virhunter.git

Go to the VirHunter root folder:

    cd virhunter/

### Installing dependencies with Conda

First, create the environment from the envs/environment.yml file.
The installation may take around 500 MB of disk space.

    conda env create -f envs/environment.yml

Then activate the environment:

    conda activate virhunter

### Installing dependencies with pip

If Conda is not installed on your system, you can install the Python dependencies with pip:

    pip install -r envs/requirements.txt

On macOS, you will also need wget to run some of the scripts (the Conda environment already includes it). You can install it with the Homebrew package manager:

    brew install wget

### Testing your installation of VirHunter

You can check that VirHunter was installed successfully on the toy dataset we provide.
IMPORTANT: the toy dataset is intended only to test that VirHunter is installed correctly and that all the scripts can be executed.
The models trained on it should not be used for prediction on your own datasets!

First, download the toy dataset:

    bash scripts/download_test_installation.sh

Then run the bash script that calls VirHunter's testing, training, and prediction Python scripts.
Note that the training step may take some time (up to an hour).

    bash scripts/test_installation.sh

## Using VirHunter for prediction

To run VirHunter, you can either use the pre-trained models or train VirHunter yourself (described in the next section).
Pre-trained model weights are available for multiple host plants.
You can download them with the download_weights.sh script:

    bash scripts/download_weights.sh

Once the weights are downloaded, you can start the prediction:

    python virhunter/predict.py --test_ds /path/to/test_ds_1

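If you have several datasets, the command above can be scripted. The following is a minimal sketch that uses only the `--test_ds` option shown above. It assumes that it is run from the VirHunter root folder inside the activated environment. `build_predict_command` and `predict_all` are hypothetical helpers, and any other options `predict.py` may accept are not covered.

```python
import subprocess
import sys


def build_predict_command(test_ds, script="virhunter/predict.py"):
    """Build the prediction command from the README for one dataset.

    Uses the current interpreter (sys.executable) so the active environment
    is reused; only the --test_ds option from the README is passed.
    """
    return [sys.executable, script, "--test_ds", str(test_ds)]


def predict_all(datasets, run=subprocess.run):
    """Run prediction on each dataset in turn.

    check=True makes subprocess.run raise CalledProcessError on the first
    failing dataset, so later datasets are not processed silently.
    """
    for test_ds in datasets:
        run(build_predict_command(test_ds), check=True)
```

For example, `predict_all(["/path/to/test_ds_1", "/path/to/test_ds_2"])` would run the two predictions one after the other. The paths here are placeholders. Running the datasets sequentially keeps memory use predictable, which matters because each run loads the CNN models.
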
After prediction, VirHunter produces two CSV files and one optional FASTA file:

1. The first file ends with `_predicted_fragments.csv`.
   It is an intermediate result containing the predictions of the three CNNs (probabilities of belonging to each of the virus/plant/bacteria classes) and of the RF classifier for each fragment of every contig.

2. The second file ends with `_predicted.csv`.
   This file contains the final predictions for contigs, calculated from the previous file:
   - `id`: FASTA header of the contig.
   - `length`: length of the contig.
   - `# viral fragments`, `# plant fragments` and `# bacterial fragments`: the number of the contig's fragments that received the corresponding class prediction from the RF classifier.
   - `decision`: class assigned to the contig by VirHunter.
   - `# viral / # total`: number of viral fragments divided by the total number of fragments of the contig.
   - `# viral / # total * length`: number of viral fragments divided by the total number of fragments of the contig, multiplied by the contig length. It is used to display the most relevant contigs first.

3. The FASTA file ends with `_viral.fasta`. It contains the contigs predicted as viral by VirHunter.

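The two ratio columns of `_predicted.csv` can be recomputed from the fragment counts to rank contigs. The following is a minimal sketch with the standard `csv` module. It makes three assumptions: the file is comma-separated, its headers match the column names listed above, and every fragment received exactly one of the three RF labels, so the total is the sum of the three counts. `rank_contigs` is a hypothetical helper, not part of VirHunter.

```python
import csv

# Column names as listed in this README; the exact CSV headers are an assumption.
ID, LENGTH = "id", "length"
COUNTS = ("# viral fragments", "# plant fragments", "# bacterial fragments")


def rank_contigs(lines):
    """Rank contigs from a *_predicted.csv file by viral evidence.

    `lines` is an open file (opened with newline="") or any iterable of CSV lines.
    Returns (id, viral_fraction, score) tuples sorted by score, highest first,
    where viral_fraction = # viral / # total and score = viral_fraction * length.
    """
    ranked = []
    for row in csv.DictReader(lines):
        counts = [float(row[name]) for name in COUNTS]
        total = sum(counts)  # assumes each fragment has exactly one RF label
        fraction = counts[0] / total if total else 0.0
        score = fraction * float(row[LENGTH])
        ranked.append((row[ID], fraction, score))
    ranked.sort(key=lambda item: item[2], reverse=True)
    return ranked
```

For example, `rank_contigs(open("sample_predicted.csv", newline=""))[:10]` would list the ten highest-scoring contigs. Here `sample_predicted.csv` is a placeholder file name. Weighting the viral fraction by length favours long contigs with consistent viral signal over short contigs with only a few fragments.
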