comparison README.rst @ 14:d00e15139065 draft

planemo upload for repository https://github.com/usegalaxy-au/tools-au commit d490defa32d9c318137d2d781243b392cb14110d-dirty
author galaxy-australia
date Tue, 28 Feb 2023 01:15:42 +0000
parents 7fbec959cf2b
children f9eb041c518c
13:c0e71cb2bd1b 14:d00e15139065
73 73
74 REFERENCE DATA 74 REFERENCE DATA
75 ~~~~~~~~~~~~~~ 75 ~~~~~~~~~~~~~~
76 76
77 Alphafold needs reference data to run. The wrapper expects this data to 77 Alphafold needs reference data to run. The wrapper expects this data to
78 be present at ``/data/alphafold_databases``. A custom DB root can be read from 78 be present at ``/data/alphafold_databases``. A custom path will be read from
79 the ALPHAFOLD_DB environment variable, if set. To download the AlphaFold, 79 the ``ALPHAFOLD_DB`` environment variable, if set.
80 reference data, run the following shell script command in the tool directory. 80
81 81 To download the AlphaFold reference DBs:
82 :: 82
83 83 ::
84 # Set databases root 84
85 ALPHAFOLD_DB_ROOT=/data/alphafold_databases 85 # Set your AlphaFold DB path
86 86 ALPHAFOLD_DB=/data/alphafold_databases
87 # make folders if needed 87
88 mkdir -p $ALPHAFOLD_DB_ROOT 88 # Set your target AlphaFold version
89 89 ALPHAFOLD_VERSION= # e.g. 2.1.2
90 # download ref data 90
91 bash scripts/download_all_data.sh $ALPHAFOLD_DB_ROOT 91 # Download repo
92 92 wget https://github.com/deepmind/alphafold/archive/refs/tags/v${ALPHAFOLD_VERSION}.tar.gz
93 This will install the reference data to ``/data/alphafold_databases``. 93 tar xzf v${ALPHAFOLD_VERSION}.tar.gz
94
95 # Ensure dirs
96 mkdir -p $ALPHAFOLD_DB
97
98 # Download
99 bash alphafold*/scripts/download_all_data.sh $ALPHAFOLD_DB
100
101 You will most likely want to run this as a background job, as it will take a
102 very long time (7+ days in Australia).
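
Because the download runs for days, one way to keep it alive after you log out is to start it under ``nohup`` and log its output. The following is a minimal sketch, not part of the wrapper: ``run_detached``, ``DETACHED_PID`` and the ``download.log`` file name are hypothetical names chosen for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: run_detached is a hypothetical helper, not part of AlphaFold
# or of this wrapper. It runs "<script> <db_dir>" under nohup so the job
# survives logout, and writes all output to <log_file>.
run_detached() {
    local script="$1" db_dir="$2" log_file="$3"
    mkdir -p "$db_dir"
    nohup bash "$script" "$db_dir" > "$log_file" 2>&1 &
    DETACHED_PID=$!   # PID of the background job, for a later `wait` or `kill -0`
    echo "Started $script as PID $DETACHED_PID, logging to $log_file"
}

# Example (assumes the repo was unpacked as in the README snippet):
# run_detached alphafold*/scripts/download_all_data.sh "$ALPHAFOLD_DB" download.log
# tail -f download.log
```

A terminal multiplexer such as ``screen`` or ``tmux`` would serve the same purpose; ``nohup`` is used here only because it needs no extra setup.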
103
104 This will install the reference data to your ``$ALPHAFOLD_DB``.
94 To check this has worked, ensure the final folder structure is as 105 To check this has worked, ensure the final folder structure is as
95 follows: 106 follows:
96 107
97 :: 108 ::
109
110 # NOTE: this structure will change between minor AlphaFold versions
111 # The tree shown below was updated for v2.3.1
98 112
99 data/alphafold_databases 113 data/alphafold_databases
100 ├── bfd 114 ├── bfd
101 │   ├── bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_a3m.ffdata 115 │   ├── bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_a3m.ffdata
102 │   ├── bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_a3m.ffindex 116 │   ├── bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_a3m.ffindex
103 │   ├── bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_cs219.ffdata 117 │   ├── bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_cs219.ffdata
104 │   ├── bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_cs219.ffindex 118 │   ├── bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_cs219.ffindex
105 │   ├── bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_hhm.ffdata 119 │   ├── bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_hhm.ffdata
106 │   └── bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_hhm.ffindex 120 │   └── bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_hhm.ffindex
107 ├── mgnify 121 ├── mgnify
108 │   └── mgy_clusters_2018_12.fa 122 │   └── mgy_clusters_2022_05.fa
109 ├── params 123 ├── params
110 │   ├── LICENSE 124 │   ├── LICENSE
111 │   ├── params_model_1.npz 125 │   ├── params_model_1.npz
126 │   ├── params_model_1_multimer_v3.npz
112 │   ├── params_model_1_ptm.npz 127 │   ├── params_model_1_ptm.npz
113 │   ├── params_model_2.npz 128 │   ├── params_model_2.npz
129 │   ├── params_model_2_multimer_v3.npz
114 │   ├── params_model_2_ptm.npz 130 │   ├── params_model_2_ptm.npz
115 │   ├── params_model_3.npz 131 │   ├── params_model_3.npz
132 │   ├── params_model_3_multimer_v3.npz
116 │   ├── params_model_3_ptm.npz 133 │   ├── params_model_3_ptm.npz
117 │   ├── params_model_4.npz 134 │   ├── params_model_4.npz
135 │   ├── params_model_4_multimer_v3.npz
118 │   ├── params_model_4_ptm.npz 136 │   ├── params_model_4_ptm.npz
119 │   ├── params_model_5.npz 137 │   ├── params_model_5.npz
138 │   ├── params_model_5_multimer_v3.npz
120 │   └── params_model_5_ptm.npz 139 │   └── params_model_5_ptm.npz
121 ├── pdb70 140 ├── pdb70
122 │   ├── md5sum 141 │   ├── md5sum
123 │   ├── pdb70_a3m.ffdata 142 │   ├── pdb70_a3m.ffdata
124 │   ├── pdb70_a3m.ffindex 143 │   ├── pdb70_a3m.ffindex
129 │   ├── pdb70_hhm.ffindex 148 │   ├── pdb70_hhm.ffindex
130 │   └── pdb_filter.dat 149 │   └── pdb_filter.dat
131 ├── pdb_mmcif 150 ├── pdb_mmcif
132 │   ├── mmcif_files 151 │   ├── mmcif_files
133 │   └── obsolete.dat 152 │   └── obsolete.dat
134 ├── uniclust30 153 ├── pdb_seqres
135 │   └── uniclust30_2018_08 154 │   └── pdb_seqres.txt
155 ├── uniprot
156 │   └── uniprot.fasta
157 ├── uniref30
158 │   ├── UniRef30_2021_03.md5sums
159 │   ├── UniRef30_2021_03_a3m.ffdata
160 │   ├── UniRef30_2021_03_a3m.ffindex
161 │   ├── UniRef30_2021_03_cs219.ffdata
162 │   ├── UniRef30_2021_03_cs219.ffindex
163 │   ├── UniRef30_2021_03_hhm.ffdata
164 │   └── UniRef30_2021_03_hhm.ffindex
136 └── uniref90 165 └── uniref90
137 └── uniref90.fasta 166 └── uniref90.fasta
138 167
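
The top-level part of this check can be scripted. The sketch below only tests that the directories from the v2.3.1 tree exist; it does not verify file contents or checksums. ``check_db_layout`` is a hypothetical helper name, and the directory list is an assumption that must be adjusted for other AlphaFold versions.

```shell
#!/usr/bin/env bash
# Sketch only: check_db_layout is a hypothetical helper.
# The directory list mirrors the v2.3.1 tree and will differ between versions.
check_db_layout() {
    local root="$1" missing=0 dir
    for dir in bfd mgnify params pdb70 pdb_mmcif pdb_seqres uniprot uniref30 uniref90; do
        if [ ! -d "$root/$dir" ]; then
            echo "missing: $root/$dir" >&2
            missing=1
        fi
    done
    # Exit status 0 if every directory exists, 1 otherwise
    return "$missing"
}

# Example:
# check_db_layout "${ALPHAFOLD_DB:-/data/alphafold_databases}" && echo "layout OK"
```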
139 In more recent releases of the AlphaFold tool, you will need to download an 168 In more recent releases of the AlphaFold tool, you will need to download an
140 additional file to allow the ``reduced_dbs`` option: 169 additional file to allow the ``reduced_dbs`` option:
141 170
142 :: 171 ::
149 178
150 data/alphafold_databases 179 data/alphafold_databases
151 ├── small_bfd 180 ├── small_bfd
152 │   └── bfd-first_non_consensus_sequences.fasta 181 │   └── bfd-first_non_consensus_sequences.fasta
153 182
183
184 **Upgrading database versions**
185
186 When upgrading to a new minor version of AlphaFold, you will most likely have to
187 upgrade the reference database. This can be a pain, due to the size of the
188 databases and the obscurity around what has changed. The simplest way to do
189 this is to create a new directory and download the DBs from scratch.
190 However, you can save a considerable amount of time by downloading only the
191 components that have changed.
192
193 If you wish to continue hosting prior versions of the tool, you must maintain
194 the reference DBs for each version. The ``ALPHAFOLD_DB`` environment variable
195 must then be set separately for each tool version in your job conf (on Galaxy
196 AU this is currently `configured with TPV <https://github.com/usegalaxy-au/infrastructure/blob/master/files/galaxy/dynamic_job_rules/production/total_perspective_vortex/tools.yml#L1515-L1554>`_).
197
198 To minimize redundancy between DB versions, we have symlinked the database
199 components that are unchanged between versions. In ``v2.1.2 -> v2.3.1`` the BFD
200 database is the only component that is persistent, but it is by far the
201 largest on disk.
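
Symlinking a shared component into a new DB root could look roughly like the sketch below. ``link_shared_component`` is a hypothetical helper, and the versioned directory names in the example are made up for illustration; they are not the paths used on Galaxy AU.

```shell
#!/usr/bin/env bash
# Sketch only: link_shared_component is a hypothetical helper.
# Symlinks <old_root>/<component> into <new_root> so the data is stored once.
link_shared_component() {
    local old_root="$1" new_root="$2" component="$3" old_abs
    if [ ! -d "$old_root/$component" ]; then
        echo "no such component: $old_root/$component" >&2
        return 1
    fi
    if [ -e "$new_root/$component" ]; then
        echo "already present: $new_root/$component" >&2
        return 1
    fi
    old_abs="$(cd "$old_root" && pwd)"   # absolute path so the link resolves from anywhere
    mkdir -p "$new_root"
    ln -s "$old_abs/$component" "$new_root/$component"
}

# Example (hypothetical paths):
# link_shared_component /data/alphafold_databases_2.1.2 /data/alphafold_databases_2.3.1 bfd
```

The remaining components would still need to be downloaded into the new root. Note that the symlink makes the new root depend on the old one, so the old directory cannot be deleted while the link is in use.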
154 202
155 203
156 JOB DESTINATION 204 JOB DESTINATION
157 ~~~~~~~~~~~~~~~ 205 ~~~~~~~~~~~~~~~
158 206