annotate README.md @ 0:2b970db61912 draft

planemo upload for repository https://github.com/galaxyproject/tools-iuc/tree/master/tools/microsatbed commit 275acb787c01484c6e435c8864090d377c3fde75
author iuc
date Sun, 21 Jul 2024 07:19:00 +0000
parents
children 3f6e5c701945
## microsatellites to bed features

**Convert short repetitive sequences to bed features**

Microsatellites are usually defined as short DNA patterns repeated in an unbroken sequence.
A microsatellite pattern or *motif* can be any combination of nucleotides, typically from 1 to 6 nt in length.

This tool selects microsatellites and related features from a fasta sequence input file and writes them to a single bed track, suitable for viewing in a genome browser such as JBrowse2.

All motifs of the selected lengths can be reported as individual features in the output bed file, or specific motifs can be provided, in which case all
others are ignored. In all cases, a minimum required number of repeats can be specified. For example, requiring 2 or more repeats of the trimer *ACG* will report
every occurrence of *ACGACG*, *ACGACGACG*, *ACGACGACGACG* and so on as individual bed features. Similarly, requiring 3 repeats of any trimer will
report every distinct 3 nucleotide pattern with 3 or more sequential repeats, including *ACGACGACG* as well as every other such pattern, for example *CTCCTCCTC*.

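The minimum-repeat rule for a single motif can be illustrated with a regular expression. This is only a sketch using Python's standard `re` module, not how pytrf or this tool is implemented; the function name `motif_bed_lines` and the `MOTIF_count` naming of the fourth bed column are assumptions made for illustration.

```python
import re


def motif_bed_lines(chrom, seq, motif, min_repeats):
    """Return bed-style lines for runs of `motif` with at least `min_repeats` copies.

    Hypothetical helper for illustration; not part of pytrf or the Galaxy tool.
    """
    seq = seq.upper()
    # (?:ACG){2,} matches ACGACG, ACGACGACG, ... as one unbroken run
    pattern = re.compile(f"(?:{re.escape(motif)}){{{min_repeats},}}")
    lines = []
    for m in pattern.finditer(seq):
        repeats = (m.end() - m.start()) // len(motif)
        # bed coordinates: 0-based start, exclusive end
        lines.append(f"{chrom}\t{m.start()}\t{m.end()}\t{motif}_{repeats}")
    return lines


if __name__ == "__main__":
    for line in motif_bed_lines("chr1", "TTACGACGTTACGACGACGAA", "ACG", 2):
        print(line)
```

Each unbroken run becomes one feature, so *ACGACG* and *ACGACGACG* in the example sequence are reported separately because two other bases lie between them.
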
For other output formats, the native pytrf command line *findstr* can be used to produce gff, csv or tsv output containing all exact short tandem repeats, as
described at the end of https://pytrf.readthedocs.io/en/latest

A fasta file must be supplied for processing. A built-in genome can be selected, or any fasta file can be selected from the current history. Note that all
symbols are treated as valid nucleotides by pytrf, so extraneous characters such as *-* or *N* in the input fasta may appear as unexpected bed features. Lowercase fasta symbols are converted
to uppercase, to prevent them from being reported as distinct motifs.

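Because every symbol is treated as a nucleotide, it may help to check a fasta input for non-ACGT characters before running the tool. The following is a minimal sketch using only the Python standard library; the function name `non_acgt_counts` is hypothetical and the check is not part of the tool itself.

```python
from collections import Counter


def non_acgt_counts(fasta_text):
    """Count symbols other than A, C, G, T in the sequence lines of a fasta string.

    Hypothetical pre-flight check; sequence lines are uppercased first,
    mirroring the uppercase conversion described above.
    """
    counts = Counter()
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and fasta header lines
        counts.update(ch for ch in line.upper() if ch not in "ACGT")
    return dict(counts)


if __name__ == "__main__":
    print(non_acgt_counts(">s1\nacgtNNnn--\nACGT\n"))
```

A non-empty result suggests that runs of those symbols, such as long stretches of *N*, could show up as features in the bed output.
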
**Filter motifs by length**

The default tool form setting is to select all dimer motif patterns.

Additional motif lengths from 1 to 6 nt can be selected in the multiple-select drop-down list. All features will be returned in a single bed file. For each selected motif length,
the minimum number of repeats required for reporting can be adjusted. **Tandem repeats** are defined as at least 2 repeats of any pattern. This tool allows singleton motifs to be reported,
so it is not restricted to short tandem repeats (STR).

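Selecting by motif length with a minimum repeat count can be sketched with a regular expression backreference. This is an illustration under stated assumptions, not the pytrf algorithm: the function name `tandem_repeats` is hypothetical, any symbol is accepted as part of a motif, and a simple left-to-right scan like this one may frame or split repeats differently from pytrf (for example, a run of *AAAAAA* would match here as two copies of the trimer *AAA*).

```python
import re


def tandem_repeats(seq, motif_len, min_repeats):
    """Return (start, end, motif) for runs of any motif of length `motif_len`.

    Hypothetical sketch: with min_repeats=1, singleton motifs are reported too.
    """
    if motif_len < 1 or min_repeats < 1:
        raise ValueError("motif_len and min_repeats must be at least 1")
    seq = seq.upper()
    # (.{3})\1{2,} means: any 3 symbols, followed by 2 or more exact copies
    pattern = re.compile(r"(.{%d})\1{%d,}" % (motif_len, min_repeats - 1))
    return [(m.start(), m.end(), m.group(1)) for m in pattern.finditer(seq)]


if __name__ == "__main__":
    print(tandem_repeats("GGCTCCTCCTCAA", 3, 3))
    print(tandem_repeats("acacACACgt", 2, 2))
```

Lowercase input is uppercased before matching, so *acacACAC* is found as a single run of the dimer *AC* instead of two distinct motifs.
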
**Filter motifs by pattern**

This option allows a motif pattern to be specified as a text string such as *CG* or *ATC*. Multiple motifs can be specified as a comma-separated string such as *CG,ATC*.
All features will be returned in a single bed file.

The minimum number of repeats for all motifs can be set to match specific requirements.

For example, technical sequencing read bias may be influenced by the density of specific dimers, whether they are repeated or not,
such as in https://github.com/arangrhie/T2T-Polish/tree/master/pattern

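A comma-separated motif string with one shared minimum repeat count could be handled along the following lines. This is a sketch only; `parse_motifs` and `motif_features` are hypothetical names, and the tool's own parsing and matching may differ.

```python
import re


def parse_motifs(text):
    """Split a comma-separated motif string such as 'CG,ATC' into uppercase motifs."""
    return [part.strip().upper() for part in text.split(",") if part.strip()]


def motif_features(chrom, seq, motif_text, min_repeats=1):
    """Return (chrom, start, end, motif) rows for every listed motif.

    Hypothetical sketch: min_repeats=1 also reports motifs that are not repeated,
    which is what a dimer density track needs.
    """
    seq = seq.upper()
    rows = []
    for motif in parse_motifs(motif_text):
        pattern = re.compile("(?:%s){%d,}" % (re.escape(motif), min_repeats))
        for m in pattern.finditer(seq):
            rows.append((chrom, m.start(), m.end(), motif))
    # one combined list, ordered by position, as for a single bed file
    return sorted(rows, key=lambda row: (row[1], row[2]))


if __name__ == "__main__":
    for row in motif_features("chr1", "TTCGCGAATCGG", "CG, atc"):
        print(*row, sep="\t")
```

Features from different motifs may overlap, as *ATC* and the second *CG* do in the example sequence; each is still reported as its own row.
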
**Run pytrf findstr to create a csv, tsv or gff format output with all perfect STR**

This selection runs the pytrf *findstr* option to create gff/csv/tsv outputs as described at the end of https://pytrf.readthedocs.io/en/latest/.

Quoted here:

*A Tandem repeat (TR) in genomic sequence is a set of adjacent short DNA sequence repeated consecutively. The core sequence or repeat unit is generally called motif.
According to the motif length, tandem repeats can be classified as microsatellites and minisatellites. Microsatellites are also known as simple sequence repeats (SSRs)
or short tandem repeats (STRs) with motif length of 1-6 bp. Minisatellites are also sometimes referred to as variable number of tandem repeats (VNTRs) has longer motif length than microsatellites.
Pytrf is a lightweight Python C extension for identification of tandem repeats. The pytrf enables to fastly identify both exact or perfect SSRs.
It also can find generic tandem repeats with any size of motif, such as with maximum motif length of 100 bp. Additionally, it has capability of finding approximate or imperfect tandem repeats*