comparison check.xml @ 1:e3c4e5ff7f74 draft

Uploaded
author mkhan1980
date Mon, 04 Mar 2013 06:37:58 -0500
comparison
0:ebad609b8a6d 1:e3c4e5ff7f74
1 <tool id="fa_gc_content_1" name="Discover CTCF Sites for Forward Strand">
2 <description></description>
3 <command interpreter="perl">check.pl $input $input2 $output</command>
4 <inputs>
5 <param format="fasta" name="input" type="data" label="Forward Strand Sequence File"/>
6 <param format="fasta" name="input2" type="data" label="Forward Strand Coordinate file"/>
7 </inputs>
8
9
10 <outputs>
11 <data format="tabular" name="output" />
12 </outputs>
13
14 <tests>
15 <test>
16 <param name="input" value="fa_gc_content_input.fa"/>
17 <param name="input2" value="fa_gc_content_input2.fa"/>
18 <output name="output" file="concatenated.txt"/>
19 </test>
20 </tests>
21
22 <help>
23 Background:
24 This tool computationally predicts CTCF sites in a nucleotide sequence located on the forward strand. It requires two input files. The first is the nucleotide sequence of interest on the + strand in FASTA format (this can be obtained from the UCSC Genome Browser or Ensembl). The second must be a FASTA-formatted file giving the chromosome and the genomic position of the first nucleotide of that sequence, separated by a tab. For example, if the sequence of interest is located on chromosome 3 and starts at genomic position 1850000, the first line of the second input file must be a FASTA header line and the second line must be chr3 1850000 (with a tab between the two fields).
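
As an illustration of the second input file described above, the following Python sketch reads the chromosome and start position from its two lines. It is not the parsing done by check.pl, which is not shown here; the function name read_coordinates and the header text "region" are hypothetical, and the sketch assumes the FASTA header line begins with ">".

```python
def read_coordinates(lines):
    """Return (chromosome, start) from the lines of the coordinate file.

    Assumes line 1 is a FASTA header and line 2 holds the chromosome
    and the start position separated by a single tab.
    """
    header, record = lines[0], lines[1]
    if not header.startswith(">"):
        raise ValueError("first line must be a FASTA header")
    chromosome, start = record.rstrip("\n").split("\t")
    return chromosome, int(start)


# The example from the text: chromosome 3, starting position 1850000.
print(read_coordinates([">region\n", "chr3\t1850000\n"]))  # ('chr3', 1850000)
```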
25
26 Details of Algorithm:
27 CTCF sites are predicted by applying the following equation:
28 w(σ,j) = log2(((f(σ,j) + sqrt(N) × b(σ)) / (N + sqrt(N))) / b(σ))
29
30 Where w(σ,j) is the weight of nucleotide σ at position j, f(σ,j) is the number of occurrences of σ at position j, N is the total number of binding sites (the sum of all nucleotide occurrences in the column), and b(σ) is the prior background frequency of nucleotide σ.
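
The weight equation above can be sketched in Python as follows. This is a minimal illustration rather than the code used by the tool; the function name pwm_weight is hypothetical, and f(σ,j) is assumed to be the count of nucleotide σ in column j.

```python
import math


def pwm_weight(count, total, background):
    """Weight w(sigma, j) for one nucleotide at one matrix column.

    count      -- f(sigma, j), occurrences of the nucleotide in column j (assumed)
    total      -- N, the sum of all nucleotide occurrences in the column
    background -- b(sigma), prior background frequency of the nucleotide
    """
    pseudo = math.sqrt(total)  # the sqrt(N) term of the equation
    corrected = (count + pseudo * background) / (total + pseudo)
    return math.log2(corrected / background)


# A nucleotide seen exactly at its background rate (25 of 100 sites,
# background 0.25) gets a weight of zero.
print(pwm_weight(25, 100, 0.25))  # 0.0
```

Nucleotides seen more often than the background frequency predicts receive positive weights, and rarer ones receive negative weights. Because of the sqrt(N) × b(σ) term, a count of zero still gives a finite weight.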
31
32 For any sequence of length m, where m is the number of columns in the matrix, the sum of the weights of its nucleotides at the corresponding columns estimates the likelihood that the sequence is an instance of a CTCF binding site. The calculation takes into account the GC content of the genomic region being scanned.
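
The summation described above can be sketched as a sliding window over the input sequence. The matrix below is a made-up three-column example, not the CTCF matrix, and the names score_window and scan are hypothetical. Any score threshold applied by the tool is not covered.

```python
def score_window(window, weights):
    """Sum the weights of the window's nucleotides, one matrix column each."""
    return sum(column[base] for base, column in zip(window, weights))


def scan(sequence, weights):
    """Yield (offset, score) for every window of length m in the sequence."""
    m = len(weights)  # m is the number of matrix columns
    for start in range(len(sequence) - m + 1):
        yield start, score_window(sequence[start:start + m], weights)


# Toy matrix with invented weights that favours the motif ACG.
toy_weights = [
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 1.0, "T": -1.0},
]
best = max(scan("TTACGTT", toy_weights), key=lambda hit: hit[1])
print(best)  # (2, 3.0)
```

Adding the reported offset to the start position given in the second input file would place a hit on the chromosome, assuming offsets are counted from the first nucleotide of the sequence.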
33
34
35 Citation and further help: For further details of the algorithm, please refer to:
36
37 Khan MA, Soto-Jimenez LM, Howe T, Streit A, Sosinsky A, Stern CD (2013). Computational tools and resources for prediction and analysis of gene regulatory regions in the chick genome. Genesis. doi:10.1002/dvg.22375
38
39 For queries/questions, email ucbtmaf@ucl.ac.uk
40 </help>
41
42
43 </tool>