<tool id="fa_gc_content_1" name="Discover CTCF Sites for Forward Strand">
  <description>by weight matrix scanning of a FASTA sequence</description>
  <command interpreter="perl">check.pl $input $input2 $output</command>
  <inputs>
    <param format="fasta" name="input" type="data" label="Forward Strand Sequence File"/>
    <param format="fasta" name="input2" type="data" label="Forward Strand Coordinate File"/>
  </inputs>
  <outputs>
    <data format="tabular" name="output"/>
  </outputs>
  <tests>
    <test>
      <param name="input" value="fa_gc_content_input.fa"/>
      <param name="input2" value="fa_gc_content_input2.fa"/>
      <output name="output" file="concatenated.txt"/>
    </test>
  </tests>
  <help>
Background:

This tool computationally predicts CTCF binding sites in a nucleotide sequence located on the forward (+) strand. Two input files are required. The first is the nucleotide sequence of interest on the + strand in FASTA format (it can be obtained from the UCSC Genome Browser or Ensembl). The second is a FASTA-formatted file giving the chromosome and the genomic position of the first nucleotide of that sequence, separated by a tab. For example, if the sequence of interest is on chromosome 3 and starts at genomic position 1850000, the first line of the second input file must be a FASTA tag line, and the second line must be chr3 and 1850000 separated by a tab.
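
The following Python sketch shows the layout of the second input and how a position within the first sequence maps back to the genome. It is an illustration, not the tool's Perl code. `read_coordinate_file` and `genomic_position` are hypothetical helpers, and the sketch assumes hit positions are 0-based offsets from the first nucleotide.

```python
def read_coordinate_file(text):
    """Hypothetical helper: return (chromosome, start) from the second input file.

    Assumes line 1 is a FASTA tag (starting with '>') and line 2 holds the
    chromosome and the genomic position of the sequence's first nucleotide,
    separated by a tab (any whitespace is accepted here).
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) >= 2 and lines[0].startswith(">"):
        chrom, start = lines[1].split()[:2]
        return chrom, int(start)
    raise ValueError("expected a FASTA tag line followed by a chromosome/position line")


def genomic_position(start, offset):
    """Hypothetical helper: genomic coordinate of a 0-based offset in the sequence."""
    return start + offset


# Example from the text: chromosome 3, first nucleotide at position 1850000.
chrom, start = read_coordinate_file(">region\nchr3\t1850000\n")
print(chrom, genomic_position(start, 0), genomic_position(start, 25))
# prints: chr3 1850000 1850025
```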

Details of Algorithm:

CTCF sites are predicted with a weight matrix whose entries are given by the following equation:

w(σ,j) = log2( ((f(σ,j) + sqrt(N) · b(σ)) / (N + sqrt(N))) / b(σ) )

Here, w(σ,j) is the weight of nucleotide σ at position j, and f(σ,j) is the number of times nucleotide σ occurs at position j (column j) of the binding sites. N is the total number of binding sites (equivalently, the sum of all nucleotide occurrences in a column), and b(σ) is the prior background frequency of nucleotide σ.
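
As a rough illustration, the equation can be applied to a set of aligned binding sites in Python. This is a sketch, not the tool's own Perl implementation. The uniform background of 0.25 per nucleotide, the restriction to sites containing only A, C, G and T, and the toy sites in the example are assumptions chosen to make the arithmetic easy to check.

```python
import math

# Assumed uniform background b(σ); the tool may use different priors.
UNIFORM_BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def weight(f, n, b):
    """w(σ,j) = log2(((f(σ,j) + sqrt(N) * b(σ)) / (N + sqrt(N))) / b(σ))."""
    root_n = math.sqrt(n)
    return math.log2(((f + root_n * b) / (n + root_n)) / b)


def weight_matrix(sites, background=UNIFORM_BACKGROUND):
    """Return one {nucleotide: weight} dict per column of equal-length aligned sites.

    Assumes the sites contain only A, C, G and T, so every column sums to N.
    """
    n = len(sites)  # N: number of binding sites
    matrix = []
    for j in range(len(sites[0])):
        column = [site[j].upper() for site in sites]
        # f(σ,j) is the count of nucleotide σ in column j
        matrix.append({s: weight(column.count(s), n, background[s]) for s in background})
    return matrix


# Toy example: four aligned 4-mers (not real CTCF sites).
matrix = weight_matrix(["ACGT", "ACGA", "ACGT", "ACGT"])
print(round(matrix[0]["A"], 3), round(matrix[0]["C"], 3), round(matrix[3]["T"], 3))
# prints: 1.585 -1.585 1.222
```

With N = 4, a nucleotide seen at every site gets log2(3) and an unseen one gets log2(1/3). The sqrt(N) · b(σ) term acts as a pseudocount, so a nucleotide absent from a column receives a finite negative weight rather than negative infinity.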

For any candidate sequence of length m, where m is the number of columns in the matrix, the weights of its nucleotides at the corresponding columns are summed. This sum estimates the likelihood that the sequence is an instance of a CTCF binding site. Through the background frequencies b(σ), it also takes into account the GC content of the genomic region being scanned.
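
A minimal Python sketch of this scoring step slides a window of length m along the forward strand and sums the matching column weights. `gc_background` and `score_windows` are hypothetical names. The sketch makes two assumptions for illustration: windows containing non-ACGT symbols are skipped, and b(σ) is derived from the region's GC content, with G and C each at half the GC fraction.

```python
def gc_background(region):
    """Hypothetical helper: background frequencies b(σ) from the region's GC content.

    Assumes the region contains at least one A/C/G/T; a region that is entirely
    GC or entirely AT gives a zero frequency, which the weight equation cannot use.
    """
    region = region.upper()
    acgt = sum(region.count(s) for s in "ACGT")
    gc = (region.count("G") + region.count("C")) / acgt
    return {"A": (1 - gc) / 2, "C": gc / 2, "G": gc / 2, "T": (1 - gc) / 2}


def score_windows(sequence, matrix):
    """Return (offset, score) for every length-m window, summing w(σ,j) over columns j."""
    m = len(matrix)
    seq = sequence.upper()
    scores = []
    for offset in range(len(seq) - m + 1):
        window = seq[offset:offset + m]
        if any(s not in matrix[0] for s in window):
            continue  # skip windows containing N or other non-ACGT symbols
        scores.append((offset, sum(matrix[j][s] for j, s in enumerate(window))))
    return scores


# Toy 2-column matrix with made-up weights that favour "AC".
toy = [
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 2.0, "G": -1.0, "T": -1.0},
]
print(gc_background("GCGA"))
# prints: {'A': 0.125, 'C': 0.375, 'G': 0.375, 'T': 0.125}
print(score_windows("TACNAC", toy))
# prints: [(0, -2.0), (1, 3.0), (4, 3.0)]
```

Windows with higher sums are better matches to the matrix. The score cutoff the tool uses to call a site is not stated here, so the sketch does not assume one.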

Citation and further help: for further details of the algorithm, please refer to:

Khan MA, Soto-Jimenez LM, Howe T, Streit A, Sosinsky A, Stern CD (2013). Computational tools and resources for prediction and analysis of gene regulatory regions in the chick genome. Genesis. doi:10.1002/dvg.22375

For questions, email ucbtmaf@ucl.ac.uk
  </help>
</tool>