Mercurial > repos > fubar > egapx_runner
annotate README.md @ 0:d9c5c5b87fec draft
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
author | fubar |
---|---|
date | Sat, 03 Aug 2024 11:16:53 +0000 |
parents | |
children | c8e1543546f8 |
rev | line source |
---|---|
0
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
1 # Eukaryotic Genome Annotation Pipeline - External (EGAPx) |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
2 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
3 EGAPx is the publicly accessible version of the updated NCBI [Eukaryotic Genome Annotation Pipeline](https://www.ncbi.nlm.nih.gov/genome/annotation_euk/process/). |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
4 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
5 EGAPx takes an assembly fasta file, a taxid of the organism, and RNA-seq data. Based on the taxid, EGAPx will pick protein sets and HMM models. The pipeline runs `miniprot` to align protein sequences, and `STAR` to align RNA-seq to the assembly. Protein alignments and RNA-seq read alignments are then passed to `Gnomon` for gene prediction. In the first step of `Gnomon`, the short alignments are chained together into putative gene models. In the second step, these predictions are further supplemented by _ab-initio_ predictions based on HMM models. The final annotation for the input assembly is produced as a `gff` file. |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
6 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
7 We currently have protein datasets posted that are suitable for most vertebrates and arthropods: |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
8 - Chordata - Mammalia, Sauropsida, Actinopterygii (ray-finned fishes) |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
9 - Insecta - Hymenoptera, Diptera, Lepidoptera, Coleoptera, Hemiptera |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
10 - Arthropoda - Arachnida, other Arthropoda |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
11 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
12 We will be adding datasets for plants and other invertebrates in the next couple of months. Fungi, protists and nematodes are currently out-of-scope for EGAPx pending additional refinements. |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
13 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
14 We currently have protein datasets posted for most vertebrates (mammals, sauropsids, ray-finned fishes) and arthropods. We will be adding datasets for more arthropods, vertebrates and plants in the next couple of months. Fungi, protists and nematodes are currently out-of-scope for EGAPx pending additional refinements. |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
15 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
16 **Warning:** |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
17 The current version is an alpha release with limited features and organism scope to collect initial feedback on execution. Outputs are not yet complete and not intended for production use. Please open a GitHub [Issue](https://github.com/ncbi/egapx/issues) if you encounter any problems with EGAPx. You can also write to cgr@nlm.nih.gov to give us your feedback or if you have any questions. |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
18 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
19 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
20 **Security Notice:** |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
21 EGAPx has dependencies in and outside of its execution path that include several thousand files from the [NCBI C++ toolkit](https://www.ncbi.nlm.nih.gov/toolkit), and more than a million total lines of code. Static Application Security Testing has shown a small number of verified buffer overrun security vulnerabilities. Users should consult with their organizational security team on risk and if there is concern, consider mitigating options like running via VM or cloud instance. |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
22 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
23 **License:** |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
24 See the EGAPx license [here](https://github.com/ncbi/egapx/blob/main/LICENSE). |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
25 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
26 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
27 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
28 ## Prerequisites |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
29 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
30 - Docker or Singularity |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
31 - AWS batch, UGE cluster, or a r6a.4xlarge machine (32 CPUs, 256GB RAM) |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
32 - Nextflow v.23.10.1 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
33 - Python v.3.9+ |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
34 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
35 Notes: |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
36 - General configuration for AWS Batch is described in the Nextflow documentation at https://www.nextflow.io/docs/latest/aws.html |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
37 - See Nextflow installation at https://www.nextflow.io/docs/latest/getstarted.html |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
38 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
39 ## The workflow files |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
40 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
41 - Clone the EGAPx repo: |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
42 ``` |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
43 git clone https://github.com/ncbi/egapx.git |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
44 cd egapx |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
45 ``` |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
46 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
47 ## Input data format |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
48 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
49 Input to EGAPx is in the form of a YAML file. |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
50 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
51 - The following are the _required_ key-value pairs for the input file: |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
52 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
53 ``` |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
54 genome: path to assembled genome in FASTA format |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
55 taxid: NCBI Taxonomy identifier of the target organism |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
56 reads: RNA-seq data |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
57 ``` |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
58 You can obtain taxid from the [NCBI Taxonomy page](https://www.ncbi.nlm.nih.gov/taxonomy). |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
59 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
60 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
61 - RNA-seq data can be supplied in any one of the following ways: |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
62 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
63 ``` |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
64 reads: [ array of paths to reads FASTA or FASTQ files] |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
65 reads: [ array of SRA run IDs ] |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
66 reads: [SRA Study ID] |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
67 reads: SRA query for reads |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
68 ``` |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
69 - If you are using your local reads, then the FASTA/FASTQ files should be provided in the following format: |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
70 ``` |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
71 reads: |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
72 - path_to_Sample1_R1.gz |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
73 - path_to_Sample1_R2.gz |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
74 - path_to_Sample2_R1.gz |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
75 - path_to_Sample2_R2.gz |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
76 ``` |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
77 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
78 - If you provide an SRA Study ID, all the SRA run ID's belonging to that Study ID will be included in the EGAPx run. |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
79 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
80 - The following are the _optional_ key-value pairs for the input file: |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
81 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
82 - A protein set. A taxid-based protein set will be chosen if no protein set is provided. |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
83 ``` |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
84 proteins: path to proteins data in FASTA format. |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
85 ``` |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
86 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
87 - HMM file used in Gnomon training. A taxid-based HMM will be chosen if no HMM file is provided. |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
88 ``` |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
89 hmm: path to HMM file |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
90 ``` |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
91 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
92 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
93 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
94 ## Input example |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
95 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
96 - A test example YAML file `./examples/input_D_farinae_small.yaml` is included in the `egapx` folder. Here, the RNA-seq data is provided as paths to the reads FASTA files. These FASTA files are a sampling of the reads from the complete SRA read files to expedite testing. |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
97 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
98 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
99 ``` |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
100 genome: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/020/809/275/GCF_020809275.1_ASM2080927v1/GCF_020809275.1_ASM2080927v1_genomic.fna.gz |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
101 taxid: 6954 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
102 reads: |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
103 - https://ftp.ncbi.nlm.nih.gov/genomes/TOOLS/EGAP/data/Dermatophagoides_farinae_small/SRR8506572.1 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
104 - https://ftp.ncbi.nlm.nih.gov/genomes/TOOLS/EGAP/data/Dermatophagoides_farinae_small/SRR8506572.2 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
105 - https://ftp.ncbi.nlm.nih.gov/genomes/TOOLS/EGAP/data/Dermatophagoides_farinae_small/SRR9005248.1 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
106 - https://ftp.ncbi.nlm.nih.gov/genomes/TOOLS/EGAP/data/Dermatophagoides_farinae_small/SRR9005248.2 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
107 ``` |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
108 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
109 - To specify an array of NCBI SRA datasets: |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
110 ``` |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
111 reads: |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
112 - SRR8506572 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
113 - SRR9005248 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
114 ``` |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
115 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
116 - To specify an SRA entrez query: |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
117 ``` |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
118 reads: 'txid6954[Organism] AND biomol_transcript[properties] NOT SRS024887[Accession] AND (SRR8506572[Accession] OR SRR9005248[Accession] )' |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
119 ``` |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
120 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
121 **Note:** Both the above examples will have more RNA-seq data than the `input_D_farinae_small.yaml` example. To make sure the entrez query does not produce a large number of SRA runs, please run it first at the [NCBI SRA page](https://www.ncbi.nlm.nih.gov/sra). If there are too many SRA runs, then select a few of them and list it in the input yaml. |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
122 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
123 - First, test EGAPx on the example provided (`input_D_farinae_small.yaml`, a dust mite) to make sure everything works. This example usually runs under 30 minutes depending upon resource availability. There are other examples you can try: `input_C_longicornis.yaml`, a green fly, and `input_Gavia_tellata.yaml`, a bird. These will take close to two hours. You can prepare your input YAML file following these examples. |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
124 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
125 ## Run EGAPx |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
126 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
127 - The `egapx` folder contains the following directories: |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
128 - examples |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
129 - nf |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
130 - test |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
131 - third_party_licenses |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
132 - ui |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
133 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
134 - The runner script is within the ui directory (`ui/egapx.py`). |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
135 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
136 - Create a virtual environment where you can run EGAPx. There is a `requirements.txt` file. PyYAML will be installed in this environment. |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
137 ``` |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
138 python -m venv /path/to/new/virtual/environment |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
139 source /path/to/new/virtual/environment/bin/activate |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
140 pip install -r ui/requirements.txt |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
141 ``` |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
142 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
143 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
144 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
145 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
146 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
147 - Run EGAPx for the first time to copy the config files so you can edit them: |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
148 ``` |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
149 python3 ui/egapx.py ./examples/input_D_farinae_small.yaml -o example_out |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
150 ``` |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
151 - When you run `egapx.py` for the first time it copies the template config files to the directory `./egapx_config`. |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
152 - You will need to edit these templates to reflect the actual parameters of your setup. |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
153 - For AWS Batch execution, set up AWS Batch Service following advice in the AWS link above. Then edit the value for `process.queue` in `./egapx_config/aws.config` file. |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
154 - For execution on the local machine you don't need to adjust anything. |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
155 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
156 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
157 - Run EGAPx with the following command for real this time. |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
158 - For AWS Batch execution, replace temp_datapath with an existing S3 bucket. |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
159 - For local execution, use a local path for `-w` |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
160 ``` |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
161 python3 ui/egapx.py ./examples/input_D_farinae_small.yaml -e aws -w s3://temp_datapath/D_farinae -o example_out |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
162 ``` |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
163 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
164 - use `-e aws` for AWS batch using Docker image |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
165 - use `-e docker` for using Docker image |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
166 - use `-e singularity` for using the Singularity image |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
167 - use `-e biowulf_cluster` for Biowulf cluster using Singularity image |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
168 - use '-e slurm` for using SLURM in your HPC. |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
169 - Note that for this option, you have to edit `./egapx_config/slurm.config` according to your cluster specifications. |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
170 - type `python3 ui/egapx.py  -h ` for the help menu |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
171 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
172 ``` |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
173 $ ui/egapx.py -h |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
174 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
175 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
176 !!WARNING!! |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
177 This is an alpha release with limited features and organism scope to collect initial feedback on execution. Outputs are not yet complete and not intended for production use. |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
178 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
179 usage: egapx.py [-h] [-o OUTPUT] [-e EXECUTOR] [-c CONFIG_DIR] [-w WORKDIR] [-r REPORT] [-n] [-st] |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
180 [-so] [-dl] [-lc LOCAL_CACHE] [-q] [-v] [-fn FUNC_NAME] |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
181 [filename] |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
182 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
183 Main script for EGAPx |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
184 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
185 optional arguments: |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
186 -h, --help show this help message and exit |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
187 -e EXECUTOR, --executor EXECUTOR |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
188 Nextflow executor, one of docker, singularity, aws, or local (for NCBI |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
189 internal use only). Uses corresponding Nextflow config file |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
190 -c CONFIG_DIR, --config-dir CONFIG_DIR |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
191 Directory for executor config files, default is ./egapx_config. Can be also |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
192 set as env EGAPX_CONFIG_DIR |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
193 -w WORKDIR, --workdir WORKDIR |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
194 Working directory for cloud executor |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
195 -r REPORT, --report REPORT |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
196 Report file prefix for report (.report.html) and timeline (.timeline.html) |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
197 files, default is in output directory |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
198 -n, --dry-run |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
199 -st, --stub-run |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
200 -so, --summary-only Print result statistics only if available, do not compute result |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
201 -lc LOCAL_CACHE, --local-cache LOCAL_CACHE |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
202 Where to store the downloaded files |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
203 -q, --quiet |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
204 -v, --verbose |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
205 -fn FUNC_NAME, --func_name FUNC_NAME |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
206 func_name |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
207 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
208 run: |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
209 filename YAML file with input: section with at least genome: and reads: parameters |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
210 -o OUTPUT, --output OUTPUT |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
211 Output path |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
212 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
213 download: |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
214 -dl, --download-only Download external files to local storage, so that future runs can be |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
215 isolated |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
216 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
217 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
218 ``` |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
219 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
220 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
221 ## Test run |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
222 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
223 ``` |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
224 $ python3 ui/egapx.py examples/input_D_farinae_small.yaml -e aws -o example_out -w s3://temp_datapath/D_farinae |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
225 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
226 !!WARNING!! |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
227 This is an alpha release with limited features and organism scope to collect initial feedback on execution. Outputs are not yet complete and not intended for production use. |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
228 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
229 N E X T F L O W ~ version 23.10.1 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
230 Launching `/../home/user/egapx/ui/../nf/ui.nf` [golden_mercator] DSL2 - revision: c134f40af5 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
231 in egapx block |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
232 executor > awsbatch (67) |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
233 [f5/3007b8] process > egapx:setup_genome:get_genome_info [100%] 1 of 1 ✔ |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
234 [32/a1bfa5] process > egapx:setup_proteins:convert_proteins [100%] 1 of 1 ✔ |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
235 [96/621c4b] process > egapx:miniprot:run_miniprot [100%] 1 of 1 ✔ |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
236 [6d/766c2f] process > egapx:paf2asn:run_paf2asn [100%] 1 of 1 ✔ |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
237 [56/f1dd6b] process > egapx:best_aligned_prot:run_best_aligned_prot [100%] 1 of 1 ✔ |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
238 [c1/ccc4a3] process > egapx:align_filter_sa:run_align_filter_sa [100%] 1 of 1 ✔ |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
239 [e0/5548d0] process > egapx:run_align_sort [100%] 1 of 1 ✔ |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
240 [a8/456a0e] process > egapx:star_index:build_index [100%] 1 of 1 ✔ |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
241 [d5/6469a6] process > egapx:star_simplified:exec (1) [100%] 2 of 2 ✔ |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
242 [64/99ab35] process > egapx:bam_strandedness:exec (2) [100%] 2 of 2 ✔ |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
243 [98/a12969] process > egapx:bam_strandedness:merge [100%] 1 of 1 ✔ |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
244 [78/0d7007] process > egapx:bam_bin_and_sort:calc_assembly_sizes [100%] 1 of 1 ✔ |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
245 [74/bb014e] process > egapx:bam_bin_and_sort:bam_bin (2) [100%] 2 of 2 ✔ |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
246 [39/3cdd00] process > egapx:bam_bin_and_sort:merge_prepare [100%] 1 of 1 ✔ |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
247 [01/f64e38] process > egapx:bam_bin_and_sort:merge (1) [100%] 1 of 1 ✔ |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
248 [aa/47a002] process > egapx:bam2asn:convert (1) [100%] 1 of 1 ✔ |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
249 [45/6661b3] process > egapx:rnaseq_collapse:generate_jobs [100%] 1 of 1 ✔ |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
250 [64/68bc37] process > egapx:rnaseq_collapse:run_rnaseq_collapse (3) [100%] 9 of 9 ✔ |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
251 [18/bff1ac] process > egapx:rnaseq_collapse:run_gpx_make_outputs [100%] 1 of 1 ✔ |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
252 [a4/76a4a5] process > egapx:get_hmm_params:run_get_hmm [100%] 1 of 1 ✔ |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
253 [3c/b71c42] process > egapx:chainer:run_align_sort (1) [100%] 1 of 1 ✔ |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
254 [e1/340b6d] process > egapx:chainer:generate_jobs [100%] 1 of 1 ✔ |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
255 [c0/477d02] process > egapx:chainer:run_chainer (16) [100%] 16 of 16 ✔ |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
256 [9f/27c1c8] process > egapx:chainer:run_gpx_make_outputs [100%] 1 of 1 ✔ |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
257 [5c/8f65d0] process > egapx:gnomon_wnode:gpx_qsubmit [100%] 1 of 1 ✔ |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
258 [34/6ab0c9] process > egapx:gnomon_wnode:annot (1) [100%] 10 of 10 ✔ |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
259 [a9/e38221] process > egapx:gnomon_wnode:gpx_qdump [100%] 1 of 1 ✔ |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
260 [bc/8ebca4] process > egapx:annot_builder:annot_builder_main [100%] 1 of 1 ✔ |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
261 [5f/6b72c0] process > egapx:annot_builder:annot_builder_input [100%] 1 of 1 ✔ |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
262 [eb/1ccdd0] process > egapx:annot_builder:annot_builder_run [100%] 1 of 1 ✔ |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
263 [4d/6c33db] process > egapx:annotwriter:run_annotwriter [100%] 1 of 1 ✔ |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
264 [b6/d73d18] process > export [100%] 1 of 1 ✔ |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
265 Waiting for file transfers to complete (1 files) |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
266 Completed at: 27-Mar-2024 11:43:15 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
267 Duration : 27m 36s |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
268 CPU hours : 4.2 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
269 Succeeded : 67 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
270 ``` |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
271 ## Output |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
272 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
273 Look at the output in the out diectory (`example_out`) that was supplied in the command line. The annotation file is called `accept.gff`. |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
274 ``` |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
275 accept.gff |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
276 annot_builder_output |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
277 nextflow.log |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
278 run.report.html |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
279 run.timeline.html |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
280 run.trace.txt |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
281 run_params.yaml |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
282 ``` |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
283 The `nextflow.log` is the log file that captures all the process information and their work directories. `run_params.yaml` has all the parameters that were used in the EGAPx run. More information about the process time and resources can be found in the other run* files. |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
284 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
285 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
286 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
287 ## Intermediate files |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
288 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
289 In the above log, each line denotes the process that completed in the workflow. The first column (_e.g._ `[96/621c4b]`) is the subdirectory where the intermediate output files and logs are found for the process in the same line, _i.e._, `egapx:miniprot:run_miniprot`. To see the intermediate files for that process, you can go to the work directory path that you had supplied and traverse to the subdirectory `96/621c4b`: |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
290 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
291 ``` |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
292 $ aws s3 ls s3://temp_datapath/D_farinae/96/ |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
293 PRE 06834b76c8d7ceb8c97d2ccf75cda4/ |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
294 PRE 621c4ba4e6e87a4d869c696fe50034/ |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
295 $ aws s3 ls s3://temp_datapath/D_farinae/96/621c4ba4e6e87a4d869c696fe50034/ |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
296 PRE output/ |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
297 2024-03-27 11:19:18 0 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
298 2024-03-27 11:19:28 6 .command.begin |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
299 2024-03-27 11:20:24 762 .command.err |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
300 2024-03-27 11:20:26 762 .command.log |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
301 2024-03-27 11:20:23 0 .command.out |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
302 2024-03-27 11:19:18 13103 .command.run |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
303 2024-03-27 11:19:18 129 .command.sh |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
304 2024-03-27 11:20:24 276 .command.trace |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
305 2024-03-27 11:20:25 1 .exitcode |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
306 $ aws s3 ls s3://temp_datapath/D_farinae/96/621c4ba4e6e87a4d869c696fe50034/output/ |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
307 2024-03-27 11:20:24 17127134 aligns.paf |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
308 ``` |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
309 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
310 ## Offline mode |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
311 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
312 If you do not have internet access from your cluster, you can run EGAPx in offline mode. To do this, you would first pull the Singularity image, then download the necessary files from NCBI FTP using `egapx.py` script, and then finally use the path of the downloaded folder in the run command. Here is an example of how to download the files and execute EGAPx in the Biowulf cluster. |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
313 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
314 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
315 - Download the Singularity image: |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
316 ``` |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
317 rm egap*sif |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
318 singularity cache clean |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
319 singularity pull docker://ncbi/egapx:0.2-alpha |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
320 ``` |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
321 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
322 - Clone the repo: |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
323 ``` |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
324 git clone https://github.com/ncbi/egapx.git |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
325 cd egapx |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
326 ``` |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
327 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
328 - Download EGAPx related files from NCBI: |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
329 ``` |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
330 python3 ui/egapx.py -dl -lc ../local_cache |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
331 ``` |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
332 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
333 - Download SRA reads: |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
334 ``` |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
335 prefetch SRR8506572 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
336 prefetch SRR9005248 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
337 fasterq-dump --skip-technical --threads 6 --split-files --seq-defline ">\$ac.\$si.\$ri" --fasta -O sradir/ ./SRR8506572 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
338 fasterq-dump --skip-technical --threads 6 --split-files --seq-defline ">\$ac.\$si.\$ri" --fasta -O sradir/ ./SRR9005248 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
339 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
340 ``` |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
341 You should see downloaded files inside the 'sradir' folder": |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
342 ``` |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
343 ls sradir/ |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
344 SRR8506572_1.fasta SRR8506572_2.fasta SRR9005248_1.fasta SRR9005248_2.fasta |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
345 ``` |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
346 Now edit the file paths of SRA reads files in `examples/input_D_farinae_small.yaml` to include the above SRA files. |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
347 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
348 - Run `egapx.py` first to edit the `biowulf_cluster.config`: |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
349 ``` |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
350 ui/egapx.py examples/input_D_farinae_small.yaml -e biowulf_cluster -w dfs_work -o dfs_out -lc ../local_cache |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
351 echo "process.container = '/path_to_/egapx_0.2-alpha.sif'" >> egapx_config/biowulf_cluster.config |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
352 ``` |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
353 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
354 - Run `egapx.py`: |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
355 ``` |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
356 ui/egapx.py examples/input_D_farinae_small.yaml -e biowulf_cluster -w dfs_work -o dfs_out -lc ../local_cache |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
357 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
358 ``` |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
359 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
360 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
361 ## References |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
362 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
363 Buchfink B, Reuter K, Drost HG. Sensitive protein alignments at tree-of-life scale using DIAMOND. Nat Methods. 2021 Apr;18(4):366-368. doi: 10.1038/s41592-021-01101-x. Epub 2021 Apr 7. PMID: 33828273; PMCID: PMC8026399. |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
364 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
365 Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, Whitwham A, Keane T, McCarthy SA, Davies RM, Li H. Twelve years of SAMtools and BCFtools. Gigascience. 2021 Feb 16;10(2):giab008. doi: 10.1093/gigascience/giab008. PMID: 33590861; PMCID: PMC7931819. |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
366 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
367 Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013 Jan 1;29(1):15-21. doi: 10.1093/bioinformatics/bts635. Epub 2012 Oct 25. PMID: 23104886; PMCID: PMC3530905. |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
368 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
369 Li H. Protein-to-genome alignment with miniprot. Bioinformatics. 2023 Jan 1;39(1):btad014. doi: 10.1093/bioinformatics/btad014. PMID: 36648328; PMCID: PMC9869432. |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
370 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
371 Shen W, Le S, Li Y, Hu F. SeqKit: A Cross-Platform and Ultrafast Toolkit for FASTA/Q File Manipulation. PLoS One. 2016 Oct 5;11(10):e0163962. doi: 10.1371/journal.pone.0163962. PMID: 27706213; PMCID: PMC5051824. |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
372 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
373 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
374 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
375 ## Contact us |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
376 |
d9c5c5b87fec
planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4
fubar
parents:
diff
changeset
|
377 Please open a GitHub [Issue](https://github.com/ncbi/egapx/issues) if you encounter any problems with EGAPx. You can also write to cgr@nlm.nih.gov to give us your feedback or if you have any questions. |