annotate check2.xml @ 3:5f235b95619f draft

Uploaded
author mkhan1980
date Mon, 04 Mar 2013 06:38:53 -0500
parents
children
<tool id="fa_gc_content_2" name="Discover CTCF Sites for Reverse Strand">
  <description></description>
  <command interpreter="perl">check2.pl $input $input2 $output</command>
  <inputs>
    <param format="fasta" name="input" type="data" label="Reverse Strand Sequence File"/>
    <param format="fasta" name="input2" type="data" label="Reverse Strand Coordinate file"/>
  </inputs>


  <outputs>
    <data format="tabular" name="output" />
  </outputs>

  <tests>
    <test>
      <param name="input" value="fa_gc_content_input3.fa"/>
      <param name="input2" value="fa_gc_content_input4.fa"/>
      <output name="output" file="concatenated.txt"/>
    </test>
  </tests>

  <help>
Background:
This tool computationally predicts CTCF sites in a nucleotide sequence located on the reverse strand. Two input files are required. The first is the nucleotide sequence of interest on the minus (-) strand in FASTA format (this can be obtained from the UCSC Genome Browser or Ensembl). The second must be a FASTA-formatted file containing the chromosome number and the genomic position of the last nucleotide of the sequence, separated by a tab. For example, if the sequence of interest is located on chromosome 3 and ends at genomic position 1870000, the first line of the second input file must start with a FASTA tag, and the second line must be chr3 and 1870000 separated by a tab.
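As an illustration, the coordinate file for the chromosome 3 example could be generated with a few lines of Python. This is only a sketch: the function name and the header text after the FASTA tag are assumptions made for the example, and check2.pl may expect a different header.

```python
def coordinate_file_text(chrom, end_position, header="reverse_strand_coordinates"):
    """Return the two-line coordinate file described in the help text.

    The header text is a hypothetical placeholder; only the leading FASTA
    tag and the tab-separated second line come from the tool description.
    """
    return ">{0}\n{1}\t{2}\n".format(header, chrom, end_position)


if __name__ == "__main__":
    # Example from the help text: chromosome 3, ending position 1870000.
    print(coordinate_file_text("chr3", 1870000), end="")
```

Writing the returned text to a file gives an input that can be uploaded as the second dataset.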

Details of Algorithm:
CTCF sites are predicted by applying the following equation:
w(a,j) = log2(((f(a,j) + sqrt(N) * b(a)) / (N + sqrt(N))) / b(a))

where w(a,j) is the weight of nucleotide a at position j, f(a,j) is the number of occurrences of nucleotide a in column j, N is the total number of binding sites (the sum of all nucleotide occurrences in the column), and b(a) is the prior background frequency of nucleotide a.
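The weight equation can be sketched in Python as follows. This is an illustrative sketch rather than the code in check2.pl; the function and argument names are assumptions, with count standing for f, total for N, and background for b.

```python
import math


def position_weight(count, total, background):
    """Weight of one nucleotide in one matrix column.

    count      -- f: occurrences of the nucleotide in the column
    total      -- N: sum of all nucleotide occurrences in the column
    background -- b: prior background frequency of the nucleotide
    """
    pseudo = math.sqrt(total)
    # Pseudocount-corrected frequency of the nucleotide in this column.
    corrected = (count + pseudo * background) / (total + pseudo)
    # Log-ratio of the corrected frequency to the background frequency.
    return math.log2(corrected / background)


if __name__ == "__main__":
    # Hypothetical column: 100 binding sites, 70 of them with this
    # nucleotide, and a uniform background frequency of 0.25.
    print(round(position_weight(70, 100, 0.25), 3))
```

A nucleotide seen exactly as often as the background predicts receives a weight of zero, an enriched nucleotide a positive weight, and a depleted one a negative weight. The sqrt(N) term keeps the weight finite even when a nucleotide never occurs in the column.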

The sum of the weights of the corresponding nucleotides across all columns of the matrix then estimates the likelihood that any sequence of length m (the number of columns in the matrix) is an instance of a CTCF binding site, taking into account the GC content of the genomic region being scanned.
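The scoring step can be sketched as a sliding window over the sequence. The example below is a minimal sketch under stated assumptions, not the implementation in check2.pl: the count matrix is a toy three-column example rather than the CTCF matrix, all function names are hypothetical, and the background frequencies are assumed to split the GC content evenly between G and C and the remainder evenly between A and T.

```python
import math

BASES = "ACGT"


def background_from_gc(gc_fraction):
    """Assumed background model: G and C share gc_fraction, A and T the rest."""
    at_fraction = 1.0 - gc_fraction
    return {"A": at_fraction / 2, "C": gc_fraction / 2,
            "G": gc_fraction / 2, "T": at_fraction / 2}


def weight_matrix(counts, background):
    """Turn per-column nucleotide counts into per-column log2 weights."""
    matrix = []
    for column in counts:
        total = sum(column[base] for base in BASES)
        pseudo = math.sqrt(total)
        matrix.append({
            base: math.log2(((column[base] + pseudo * background[base])
                             / (total + pseudo)) / background[base])
            for base in BASES
        })
    return matrix


def window_scores(sequence, matrix):
    """Sum the column weights for every window of length m in the sequence."""
    m = len(matrix)
    scores = []
    for start in range(len(sequence) - m + 1):
        window = sequence[start:start + m]
        scores.append(sum(matrix[j][window[j]] for j in range(m)))
    return scores


if __name__ == "__main__":
    # Toy 3-column count matrix (hypothetical, not the CTCF matrix).
    counts = [
        {"A": 8, "C": 0, "G": 1, "T": 1},
        {"A": 0, "C": 9, "G": 1, "T": 0},
        {"A": 1, "C": 0, "G": 9, "T": 0},
    ]
    matrix = weight_matrix(counts, background_from_gc(0.5))
    scores = window_scores("TTACGTT", matrix)
    best = max(range(len(scores)), key=scores.__getitem__)
    print(best, round(scores[best], 3))
```

In this toy example the window starting at index 2 (ACG) matches the most frequent nucleotide in every column and therefore scores highest. The sketch only handles the characters A, C, G and T; ambiguous bases such as N would need separate handling.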


Citation and further help: For further details of the algorithm, please refer to:

Khan MA, Soto-Jimenez LM, Howe T, Streit A, Sosinsky A, Stern CD (2013). Computational tools and resources for prediction and analysis of gene regulatory regions in the chick genome. Genesis. doi:10.1002/dvg.22375


For queries/questions, email ucbtmaf@ucl.ac.uk
  </help>

</tool>