Mercurial > repos > cpt > cpt_sar_finder
diff SAR_finder.xml @ 1:112751823323 draft
planemo upload commit 94b0cd1fff0826c6db3e7dc0c91c0c5a8be8bb0c
author | cpt |
---|---|
date | Mon, 05 Jun 2023 02:52:57 +0000 |
parents | |
children |
line wrap: on
line diff
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/SAR_finder.xml Mon Jun 05 02:52:57 2023 +0000 @@ -0,0 +1,73 @@ +<tool id="edu.tamu.cpt.sar.sar_finder" name="SAR Finder" version="1.0"> + <description>SAR Domain Finder</description> + <macros> + <import>macros.xml</import> + </macros> + <expand macro="requirements"> + </expand> + <command detect_errors="aggressive"><![CDATA[ +python '$__tool_directory__/SAR_finder.py' +'$fa' +--sar_min '$sar_min' +--sar_max '$sar_max' +--out_fa '$out_fa' +--out_gff3 '$out_gff3' +--out_stat '$out_stat' + ]]></command> + <inputs> + <param label="Multi FASTA File" name="fa" type="data" format="fasta"/> + <param label="SAR domain minimal size" name="sar_min" type="integer" value="15"/> + <param label="SAR domain maximum size" name="sar_max" type="integer" value="20"/> + </inputs> + <outputs> + <data format="tabular" name="out_stat" label="candidate_SAR_stats.tsv"/> + <data format="fasta" name="out_fa" label="candidate_SAR.fa"/> + <data format="gff3" name="out_gff3" label="candidate_SAR.gff3"/> + </outputs> + <tests> + <test> + <param name="fa" value="simple-proteins.fa"/> + <param name="sar_min" value="15"/> + <param name="sar_max" value="20"/> + <output name="out_stat" file="candidate_SAR_stats.tsv"/> + <output name="out_fa" file="candidate_SAR.fa"/> + <output name="out_gff3" file="candidate_SAR.gff3"/> + </test> + </tests> + <help><![CDATA[ + +**What it does** +A tool that analyzes the sequence within the first 50 residues of a protein for a weakly hydrophobic domain called Signal-Anchor-Release (aka SAR). +The tool finds proteins that contain a stretch (default 15-20 residues) of hydrophobic residues (Ile, Leu, Val, Phe, Tyr, Trp, Met, Gly, Ala, Ser) and +calculates the % Gly/Ala/Ser/Thr residues in the hydrophobic stretch. The net charge on the N-terminus is also displayed to aid in determining the +SAR orientation in the membrane.[2] + +Definition: A Signal-Anchor-Release (SAR) domain is an N-terminal, weakly hydrophobic transmembrane region rich in Gly/Ala and/or Ser (and sometimes Thr) residues. +The SAR domain is sometimes found in phage lysis proteins, including endolysins and holins. The SAR domain can be released from the membrane in a proton +motive force-dependent manner. Known SAR domains in phage endolysins often have >50-60% Gly/Ala/Ser/Thr content. SAR endolysins are expected to have a net positive +charge on the N-terminus by the positive-inside rule. + +**INPUT** --> Protein Multi FASTA + +**OUTPUT** --> + +* Multi FASTA with candidate proteins that pass the SAR domain criteria + +* Tabular summary file that lists every subdomain fitting the criteria for each potential SAR domain-containing protein with the following: protein name/sequence/length, SAR length/start/sequence/end, individual and total GAST% content in SAR, and N-terminal sequence/net charge + +* Multi GFF3 for unique candidate SAR domain-containing proteins + + ]]></help> + <citations> + <citation type="doi">10.1371/journal.pcbi.1008214</citation> + <citation type="doi">https://dx.doi.org/10.1016/bs.aivir.2018.09.003</citation> + <citation type="bibtex"> + @unpublished{galaxyTools, + author = {C. Ross}, + title = {CPT Galaxy Tools}, + year = {2020-}, + note = {https://github.com/tamu-cpt/galaxy-tools/} + } + </citation> + </citations> +</tool>