view egapx_runner.xml @ 6:a7304162d737 draft
planemo upload for repository https://github.com/ncbi/egapx commit 9e59da535540cb4d5c1c412bb2b0969744dfb0b0-dirty
author | fubar |
---|---|
date | Sun, 04 Aug 2024 02:30:36 +0000 |
parents | 6effccc966d0 |
children | 9c778770514f |
<tool name="egapx_runner" id="egapx_runner" version="6.0.1" profile="22.05">
  <!--Source in git at: https://github.com/fubar2/galaxy_tf_overlay-->
  <!--Created by toolfactory@galaxy.org at 03/08/2024 10:40:32 using the Galaxy Tool Factory.-->
  <description>Runs egapx</description>
  <requirements>
    <requirement version="3.12.3" type="package">python</requirement>
    <requirement version="24.04.4-0" type="package">nextflow</requirement>
    <requirement version="6.0.1" type="package">pyyaml</requirement>
  </requirements>
  <version_command><![CDATA[echo "6.0.1"]]></version_command>
  <command><![CDATA[mkdir -p ./egapx_config &&
#set econfigfile = $econfig + '.config'
cp '$__tool_directory__/ui/assets/config/executor/$econfigfile' ./egapx_config/ &&
python '$__tool_directory__/ui/egapx.py' '$yamlconfig' -e '$econfig' -o 'egapx_out']]></command>
  <inputs>
    <param name="yamlconfig" type="data" optional="false" label="egapx configuration yaml file to execute" help="" format="yaml,txt" multiple="false"/>
    <param name="econfig" type="select" label="Workflow run configuration to suit the machine in use" help="Docker minimal will run the sample minimal dustmite yaml">
      <option value="docker_minimal">Docker_minimal: supports only the minimal dust mite example yaml using 6GB and 4 cores</option>
      <option value="singularity">Singularity: requires at least 128GB ram and 32 cores. 256GB and 64 cores recommended</option>
      <option value="docker">Docker: requires at least 128GB ram and 32 cores. 256GB and 64 cores recommended</option>
    </param>
  </inputs>
  <outputs>
    <collection name="egapx_out" type="list" label="Outputs from egapx">
      <discover_datasets pattern="__name_and_ext__" directory="egapx_out" visible="false"/>
    </collection>
  </outputs>
  <tests>
    <test>
      <output_collection name="egapx_out" count="8"/>
      <param name="yamlconfig" value="yamlconfig_sample"/>
      <param name="econfig" value="docker_minimal"/>
    </test>
  </tests>
  <help><![CDATA[
if you encounter any problems with EGAPx.
You can also write to cgr@nlm.nih.gov to give us your feedback or if you have any questions.

EGAPx is the publicly accessible version of the updated NCBI [Eukaryotic Genome Annotation Pipeline](https://www.ncbi.nlm.nih.gov/genome/annotation_euk/process/).

EGAPx takes an assembly fasta file, a taxid of the organism, and RNA-seq data. Based on the taxid, EGAPx will pick protein sets and HMM models. The pipeline runs `miniprot` to align protein sequences, and `STAR` to align RNA-seq to the assembly. Protein alignments and RNA-seq read alignments are then passed to `Gnomon` for gene prediction. In the first step of `Gnomon`, the short alignments are chained together into putative gene models. In the second step, these predictions are further supplemented by _ab-initio_ predictions based on HMM models. The final annotation for the input assembly is produced as a `gff` file.

**Security Notice:**

EGAPx has dependencies in and outside of its execution path that include several thousand files from the [NCBI C++ toolkit](https://www.ncbi.nlm.nih.gov/toolkit), and more than a million total lines of code. Static Application Security Testing has shown a small number of verified buffer overrun security vulnerabilities. Users should consult with their organizational security team on risk and if there is concern, consider mitigating options like running via VM or cloud instance.

*To specify an array of NCBI SRA datasets in yaml*

::

   reads:
     - SRR8506572
     - SRR9005248

*To specify an SRA entrez query*

::

   reads: 'txid6954[Organism] AND biomol_transcript[properties] NOT SRS024887[Accession] AND (SRR8506572[Accession] OR SRR9005248[Accession] )'

**Note:** Both the above examples will have more RNA-seq data than the `input_D_farinae_small.yaml` example. To make sure the entrez query does not produce a large number of SRA runs, please run it first at the [NCBI SRA page](https://www.ncbi.nlm.nih.gov/sra).
If there are too many SRA runs, then select a few of them and list them in the input yaml.

Output
=======

EGAPx output will appear as a collection in the user history. The main annotation file is called *accept.gff*.

::

   accept.gff
   annot_builder_output
   nextflow.log
   run.report.html
   run.timeline.html
   run.trace.txt
   run_params.yaml

The *nextflow.log* is the log file that captures all the process information and their work directories. ``run_params.yaml`` has all the parameters that were used in the EGAPx run. More information about the process time and resources can be found in the other run* files.

Intermediate files
==================

In the log, each line denotes the process that completed in the workflow. The first column (_e.g._ `[96/621c4b]`) is the subdirectory where the intermediate output files and logs are found for the process in the same line, _i.e._, `egapx:miniprot:run_miniprot`. To see the intermediate files for that process, you can go to the work directory path that you had supplied and traverse to the subdirectory `96/621c4b`:

::

   $ aws s3 ls s3://temp_datapath/D_farinae/96/
                              PRE 06834b76c8d7ceb8c97d2ccf75cda4/
                              PRE 621c4ba4e6e87a4d869c696fe50034/
   $ aws s3 ls s3://temp_datapath/D_farinae/96/621c4ba4e6e87a4d869c696fe50034/
                              PRE output/
   2024-03-27 11:19:18          0
   2024-03-27 11:19:28          6 .command.begin
   2024-03-27 11:20:24        762 .command.err
   2024-03-27 11:20:26        762 .command.log
   2024-03-27 11:20:23          0 .command.out
   2024-03-27 11:19:18      13103 .command.run
   2024-03-27 11:19:18        129 .command.sh
   2024-03-27 11:20:24        276 .command.trace
   2024-03-27 11:20:25          1 .exitcode
   $ aws s3 ls s3://temp_datapath/D_farinae/96/621c4ba4e6e87a4d869c696fe50034/output/
   2024-03-27 11:20:24   17127134 aligns.paf
]]></help>
  <citations>
    <citation type="doi">10.1093/bioinformatics/bts573</citation>
  </citations>
</tool>