Mercurial > repos > dereeper > pangenome_explorer
diff COG/bac-genomics-scripts/rename_fasta_id/README.md @ 3:e42d30da7a74 draft
Uploaded
| author | dereeper |
|---|---|
| date | Thu, 30 May 2024 11:52:25 +0000 |
| parents | |
| children |
line wrap: on
line diff
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/COG/bac-genomics-scripts/rename_fasta_id/README.md Thu May 30 11:52:25 2024 +0000 @@ -0,0 +1,117 @@ +rename_fasta_id +=============== + +`rename_fasta_id.pl` is a script to rename fasta IDs according to regular expressions. + +* [Synopsis](#synopsis) +* [Description](#description) +* [Usage](#usage) +* [Options](#options) + * [Mandatory options](#mandatory-options) + * [Optional options](#optional-options) +* [Output](#output) +* [Run environment](#run-environment) +* [Author - contact](#author---contact) +* [Citation, installation, and license](#citation-installation-and-license) +* [Changelog](#changelog) + +## Synopsis + + perl rename_fasta_id.pl -i file.fasta -p "NODE_.+$" -r "K-12_" -n -a c > out.fasta + +**or** + + zcat file.fasta.gz | perl rename_fasta_id.pl -i - -p "coli" -r "" -o > out.fasta + +## Description + +This script uses the built-in Perl substitution operator `s///` to +replace strings in FASTA IDs. To do this, a **pattern** and a +**replacement** have to be provided (Perl regular expression syntax +can be used). The leading '>' character for the FASTA ID will be +removed before the substitution and added again afterwards. FASTA +IDs will be searched for matches with the **pattern**, and if found +the **pattern** will be replaced by the **replacement**. + +**IMPORTANT**: Enclose the **pattern** and the **replacement** in +quotation marks (' or ") if they contain characters that would be +interpreted by the shell (e.g. pipes '|', brackets etc.). + +For substitutions without any appendices in a UNIX OS you can of +course just use the great +[`sed`](https://www.gnu.org/software/sed/manual/sed.html) (see +`man sed`), e.g.: + + sed 's/^>pattern/>replacement/' file.fasta + +## Usage + + perl rename_fasta_id.pl -i file.fasta -p "T" -r "a" -c -g -o + +## Options + +### Mandatory options + +- -i, -input + +Input FASTA file or piped STDIN (-) from a gzipped file + +- -p, -pattern + +Pattern to be replaced in FASTA ID + +- -r, -replacement + +Replacement to replace the pattern with. To entirely remove the pattern use '' or "" as input for **-r**. + +### Optional options + +- -h, -help + +Help (perldoc POD) + +- -c, -case-insensitive + +Match pattern case-insensitive + +- -g, -global + +Replace pattern globally in the string + +- -n, -numerate + +Append a numeration/the count of the pattern hits to the replacement. This is e.g. useful to number contigs consecutively in a draft genome. + +- -a, -append + +Append a string after the numeration, e.g. 'c' for chromosome + +- -o, -output + +Verbose output of the substitutions that were carried out, printed to *STDERR* + +- -v, -version + +Print version number to *STDERR* + +## Output + +- *STDOUT* + +The FASTA file with substituted ID lines is printed to *STDOUT*. Redirect or pipe into another tool as needed. + +## Run environment + +The Perl script runs under Windows and UNIX flavors. + +## Author - contact + +Andreas Leimbach (aleimba[at]gmx[dot]de; Microbial Genome Plasticity, Institute of Hygiene, University of Muenster) + +## Citation, installation, and license + +For [citation](https://github.com/aleimba/bac-genomics-scripts#citation), [installation](https://github.com/aleimba/bac-genomics-scripts#installation-recommendations), and [license](https://github.com/aleimba/bac-genomics-scripts#license) information please see the repository main [*README.md*](https://github.com/aleimba/bac-genomics-scripts/blob/master/README.md). + +## Changelog + +- v0.1 (09.11.2014)
