view egapx_runner.xml @ 4:6592ae57bb8b draft
planemo upload for repository https://github.com/ncbi/egapx commit cb2d8304fde9fad4348296c3a51b7992ac5b83bb
author | fubar |
---|---|
date | Sun, 04 Aug 2024 00:57:18 +0000 (7 months ago) |
parents | a3b158471bd3 |
children | 6effccc966d0 |
<tool name="egapx_runner" id="egapx_runner" version="6.0.1" profile="22.05">
  <!--Source in git at: https://github.com/fubar2/galaxy_tf_overlay-->
  <!--Created by toolfactory@galaxy.org at 03/08/2024 10:40:32 using the Galaxy Tool Factory.-->
  <description>Runs egapx</description>
  <requirements>
    <requirement version="3.12.3" type="package">python</requirement>
    <requirement version="24.04.4-0" type="package">nextflow</requirement>
    <requirement version="6.0.1" type="package">pyyaml</requirement>
  </requirements>
  <version_command><![CDATA[echo "6.0.1"]]></version_command>
  <command><![CDATA[mkdir -p ./egapx_config &&
#set econfigfile = $econfig + '.config'
cp '$__tool_directory__/ui/assets/config/executor/$econfigfile' ./egapx_config/ &&
python '$__tool_directory__/ui/egapx.py' '$yamlconfig' -e '$econfig' -o 'egapx_out']]></command>
  <inputs>
    <param name="yamlconfig" type="data" optional="false" label="egapx configuration yaml file to execute" help="" format="yaml,txt" multiple="false"/>
    <param name="econfig" type="select" label="Workflow run configuration to suit the machine in use" help="Docker minimal will run the sample minimal dustmite yaml">
      <option value="docker_minimal">Docker_minimal supports only the minimal dust mite example yaml using 6GB and 4 cores</option>
      <option value="singularity">Singularity requires at least 128GB ram and 32 cores. 256GB and 64 cores recommended</option>
      <option value="docker">Docker requires at least 128GB ram and 32 cores. 256GB and 64 cores recommended</option>
    </param>
  </inputs>
  <outputs>
    <collection name="egapx_out" type="list" label="Outputs from egapx">
      <discover_datasets pattern="__name_and_ext__" directory="egapx_out" visible="false"/>
    </collection>
  </outputs>
  <tests>
    <test>
      <output_collection name="egapx_out" count="8"/>
      <param name="yamlconfig" value="yamlconfig_sample"/>
      <param name="econfig" value="docker_minimal"/>
    </test>
  </tests>
  <help><![CDATA[
if you encounter any problems with EGAPx.
You can also write to cgr@nlm.nih.gov with feedback or questions.

EGAPx is the publicly accessible version of the updated NCBI [Eukaryotic Genome Annotation Pipeline](https://www.ncbi.nlm.nih.gov/genome/annotation_euk/process/).

EGAPx takes an assembly fasta file, a taxid of the organism, and RNA-seq data. Based on the taxid, EGAPx will pick protein sets and HMM models. The pipeline runs `miniprot` to align protein sequences, and `STAR` to align RNA-seq reads to the assembly. Protein alignments and RNA-seq read alignments are then passed to `Gnomon` for gene prediction. In the first step of `Gnomon`, the short alignments are chained together into putative gene models. In the second step, these predictions are further supplemented by _ab-initio_ predictions based on HMM models. The final annotation for the input assembly is produced as a `gff` file.

**Security Notice:**

EGAPx has dependencies in and outside of its execution path that include several thousand files from the [NCBI C++ toolkit](https://www.ncbi.nlm.nih.gov/toolkit), and more than a million total lines of code. Static Application Security Testing has shown a small number of verified buffer overrun security vulnerabilities. Users should consult with their organizational security team on risk and, if there is concern, consider mitigating options like running via VM or cloud instance.

*To specify an array of NCBI SRA datasets in yaml*

::

    reads:
      - SRR8506572
      - SRR9005248

*To specify an SRA entrez query*

::

    reads: 'txid6954[Organism] AND biomol_transcript[properties] NOT SRS024887[Accession] AND (SRR8506572[Accession] OR SRR9005248[Accession] )'

**Note:** Both of the above examples will have more RNA-seq data than the `input_D_farinae_small.yaml` example. To make sure the entrez query does not produce a large number of SRA runs, please run it first at the [NCBI SRA page](https://www.ncbi.nlm.nih.gov/sra).
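
As a minimal sketch of the array form shown above, the following Python helper builds the `reads:` block for a hand-picked list of SRA runs. The function name `reads_block` and the accession check (an `SRR`, `ERR` or `DRR` prefix followed by digits) are illustrative assumptions and are not part of EGAPx.

```python
import re

# Assumed shape of an SRA run accession: SRR, ERR or DRR followed by digits.
SRA_RUN = re.compile(r"^[SED]RR\d+$")


def reads_block(runs):
    """Return the text of a `reads:` yaml section listing the given SRA runs."""
    bad = [run for run in runs if not SRA_RUN.match(run)]
    if bad:
        raise ValueError("not SRA run accessions: " + ", ".join(bad))
    lines = ["reads:"]
    for run in runs:
        # One yaml list item per run, indented under the `reads:` key.
        lines.append("  - " + run)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # The two runs used in the example above.
    print(reads_block(["SRR8506572", "SRR9005248"]), end="")
```

The returned text can be pasted into the input yaml in place of an entrez query once a small set of runs has been chosen.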
If there are too many SRA runs, select a few of them and list them in the input yaml.

Output
======

EGAPx output will appear as a collection in the user history. The main annotation file is called *accept.gff*.

::

    accept.gff
    annot_builder_output
    nextflow.log
    run.report.html
    run.timeline.html
    run.trace.txt
    run_params.yaml

The *nextflow.log* file captures all the process information and the work directory of each process. ``run_params.yaml`` has all the parameters that were used in the EGAPx run. More information about process time and resources can be found in the other run* files.

Intermediate files
------------------

In the log, each line denotes a process that completed in the workflow. The first column (_e.g._ `[96/621c4b]`) is the subdirectory where the intermediate output files and logs are found for the process named on the same line, _e.g._ `egapx:miniprot:run_miniprot`. To see the intermediate files for that process, go to the work directory path that you supplied and traverse to the subdirectory `96/621c4b`:

::

    $ aws s3 ls s3://temp_datapath/D_farinae/96/
                               PRE 06834b76c8d7ceb8c97d2ccf75cda4/
                               PRE 621c4ba4e6e87a4d869c696fe50034/
    $ aws s3 ls s3://temp_datapath/D_farinae/96/621c4ba4e6e87a4d869c696fe50034/
                               PRE output/
    2024-03-27 11:19:18          0
    2024-03-27 11:19:28          6 .command.begin
    2024-03-27 11:20:24        762 .command.err
    2024-03-27 11:20:26        762 .command.log
    2024-03-27 11:20:23          0 .command.out
    2024-03-27 11:19:18      13103 .command.run
    2024-03-27 11:19:18        129 .command.sh
    2024-03-27 11:20:24        276 .command.trace
    2024-03-27 11:20:25          1 .exitcode
    $ aws s3 ls s3://temp_datapath/D_farinae/96/621c4ba4e6e87a4d869c696fe50034/output/
    2024-03-27 11:20:24   17127134 aligns.paf
]]></help>
  <citations>
    <citation type="doi">10.1093/bioinformatics/bts573</citation>
  </citations>
</tool>