changeset 0:496bc4eff47e draft
Initial version.
author | pjbriggs |
---|---|
date | Wed, 19 Nov 2014 07:56:27 -0500 |
parents | |
children | 571cb77ab9e7 |
files | README.markdown test-data/weeder2_matrix.out test-data/weeder2_motifs.out test-data/weeder_in.fa tool_dependencies.xml weeder2_wrapper.sh weeder2_wrapper.xml |
diffstat | 7 files changed, 1752 insertions(+), 0 deletions(-) |
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/README.markdown Wed Nov 19 07:56:27 2014 -0500
@@ -0,0 +1,20 @@
+weeder2
+=======
+
+Galaxy tool for motif discovery using weeder2.
+
+weeder2_wrapper
+---------------
+
+XML and wrapper script for weeder motif discovery package version 2.0.
+
+`weeder2` can be obtained from <http://159.149.160.51/modtools/downloads/weeder2.html>.
+
+To add to Galaxy add the following to tool_conf.xml:
+
+    <tool file="weeder2/weeder2_wrapper.xml" />
+
+### Changes ###
+
+2.0.0.0: initial version
+
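The README's installation step (adding a `<tool>` entry to `tool_conf.xml`) can also be scripted. The following is a minimal sketch, not part of this changeset: the helper name `add_tool_entry` and the section id `motif_tools` are hypothetical, and it assumes the usual `tool_conf.xml` layout of a `<toolbox>` root containing `<section>` elements.

```python
import xml.etree.ElementTree as ET


def add_tool_entry(tool_conf_xml: str, tool_file: str, section_id: str) -> str:
    """Return tool_conf XML with a <tool file=...> entry added (hypothetical helper).

    The entry goes into the <section> whose id matches section_id, or directly
    under the root element if no such section exists. An entry that is already
    present is not duplicated.
    """
    root = ET.fromstring(tool_conf_xml)
    section = root.find(f"./section[@id='{section_id}']")
    parent = section if section is not None else root
    already_listed = any(t.get("file") == tool_file for t in parent.iter("tool"))
    if not already_listed:
        ET.SubElement(parent, "tool", {"file": tool_file})
    return ET.tostring(root, encoding="unicode")


# "motif_tools" is an illustrative section id, not one defined by this repository.
example_conf = '<toolbox><section id="motif_tools" name="Motif tools"></section></toolbox>'
updated_conf = add_tool_entry(example_conf, "weeder2/weeder2_wrapper.xml", "motif_tools")
print(updated_conf)
```

Editing the file by hand, as the README describes, is equivalent; the sketch only makes the step repeatable.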
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/weeder2_matrix.out Wed Nov 19 07:56:27 2014 -0500
@@ -0,0 +1,100 @@
+>MAT1 GTTTCAATTA
+A 0.1739 0.08497 0.1198 0.1538 0.186 0.8184 0.6871 0.09156 0.1002 0.7403
+C 0.06162 0.05256 0.05743 0.07398 0.5592 0.08087 0.0987 0.08757 0.149 0.08036
+G 0.591 0.06033 0.07754 0.07127 0.1033 0.0563 0.08422 0.04673 0.08862 0.06081
+T 0.1735 0.8021 0.7452 0.701 0.1515 0.04446 0.1299 0.7741 0.6622 0.1185
+>MAT2 CATTTTAA
+A 0.216 0.8128 0.1075 0.04678 0.1112 0.1399 0.7346 0.7969
+C 0.5444 0.04193 0.0509 0.03932 0.0429 0.08711 0.1099 0.09147
+G 0.1067 0.03289 0.07156 0.0785 0.0966 0.09202 0.05706 0.0382
+T 0.1328 0.1124 0.77 0.8354 0.7493 0.681 0.09836 0.0734
+>MAT3 GTGAATTA
+A 0.2434 0.0588 0.09038 0.7708 0.7064 0.0692 0.08129 0.7448
+C 0.08343 0.02873 0.1008 0.06039 0.0821 0.07782 0.09324 0.09242
+G 0.5181 0.06202 0.6983 0.08538 0.05523 0.08203 0.1107 0.09517
+T 0.1551 0.8504 0.1106 0.08347 0.1562 0.7709 0.7148 0.0676
+>MAT4 TCAATCAT
+A 0.05132 0.186 0.8183 0.6756 0.1425 0.08545 0.8726 0.1838
+C 0.02852 0.5378 0.07891 0.05957 0.02786 0.6724 0.03639 0.1337
+G 0.0695 0.1008 0.02175 0.09712 0.07326 0.07966 0.0383 0.06208
+T 0.8507 0.1753 0.08099 0.1677 0.7564 0.1625 0.05268 0.6204
+>MAT5 TGTTTAAT
+A 0.1111 0.09112 0.1881 0.1074 0.1057 0.6769 0.7668 0.1993
+C 0.02537 0.03582 0.06521 0.08013 0.08927 0.09377 0.03432 0.08804
+G 0.07262 0.7214 0.06612 0.05896 0.07036 0.1072 0.116 0.09061
+T 0.7909 0.1516 0.6806 0.7536 0.7347 0.1221 0.08288 0.6221
+>MAT6 ATTACT
+A 0.8781 0.0835 0.07038 0.7901 0.03473 0.1075
+C 0.04152 0.05455 0.05837 0.03709 0.7755 0.04605
+G 0.0322 0.01277 0.1088 0.07223 0.04238 0.01572
+T 0.04814 0.8492 0.7624 0.1005 0.1473 0.8308
+>MAT7 TCACAT
+A 0.08233 0.1013 0.9032 0.139 0.8283 0.0876
+C 0.04274 0.7009 0.02867 0.7396 0.0355 0.04803
+G 0.04677 0.1109 0.01307 0.0554 0.0163 0.05845
+T 0.8282 0.08693 0.05509 0.06599 0.1199 0.8059
+>MAT8 TACATT
+A 0.09867 0.7322 0.1194 0.8806 0.101 0.09423
+C 0.1068 0.06715 0.743 0.03825 0.03645 0.02894
+G 0.1046 0.07625 0.0467 0.01049 0.07555 0.07099
+T 0.6899 0.1244 0.09086 0.07061 0.7871 0.8058
+>MAT9 TTGACA
+A 0.1287 0.06713 0.09732 0.8048 0.1278 0.8784
+C 0.07599 0.01615 0.136 0.02754 0.7339 0.02517
+G 0.07764 0.03641 0.6902 0.03821 0.05137 0.01588
+T 0.7176 0.8803 0.0764 0.1295 0.08697 0.0806
+>MAT10 AATAAT
+A 0.821 0.7797 0.1145 0.7009 0.8508 0.1043
+C 0.05975 0.04103 0.05818 0.09164 0.07329 0.07718
+G 0.05042 0.1458 0.06827 0.08449 0.04593 0.05097
+T 0.06887 0.03344 0.759 0.1229 0.02993 0.7675
+>MAT11 ATGACT
+A 0.7923 0.07691 0.08105 0.8363 0.111 0.123
+C 0.07045 0.02479 0.07396 0.02863 0.7081 0.04261
+G 0.06487 0.02256 0.6694 0.03558 0.06369 0.01484
+T 0.07241 0.8757 0.1756 0.0995 0.1172 0.8196
+>MAT12 TTGAAA
+A 0.08972 0.0524 0.2142 0.8734 0.7765 0.8008
+C 0.08565 0.01703 0.08046 0.009563 0.09992 0.05955
+G 0.07526 0.08263 0.6446 0.04169 0.04608 0.03337
+T 0.7494 0.8479 0.06075 0.07533 0.07746 0.1063
+>MAT13 ATTTTA
+A 0.7686 0.09324 0.05315 0.07052 0.1079 0.7442
+C 0.05296 0.05566 0.02564 0.04286 0.09559 0.0825
+G 0.06642 0.0539 0.07598 0.07754 0.08916 0.06427
+T 0.112 0.7972 0.8452 0.8091 0.7073 0.1091
+>MAT14 TAAACA
+A 0.1209 0.7643 0.8407 0.816 0.1373 0.8505
+C 0.06995 0.06399 0.03895 0.04513 0.685 0.05353
+G 0.09693 0.07741 0.05489 0.04938 0.08995 0.02176
+T 0.7122 0.09432 0.06543 0.08946 0.08774 0.07422
+>MAT15 ATGATT
+A 0.7798 0.03895 0.1394 0.8016 0.09996 0.05479
+C 0.04628 0.04674 0.09287 0.04385 0.1069 0.06237
+G 0.07995 0.03341 0.6306 0.0422 0.05873 0.09202
+T 0.09401 0.8809 0.1371 0.1123 0.7344 0.7908
+>MAT16 AGTATT
+A 0.8169 0.03667 0.1017 0.7024 0.1984 0.05968
+C 0.01586 0.0534 0.06469 0.08485 0.01561 0.04079
+G 0.03184 0.739 0.03857 0.072 0.06428 0.04865
+T 0.1354 0.1709 0.7951 0.1407 0.7217 0.8509
+>MAT17 TTGAGT
+A 0.1122 0.08252 0.1535 0.8677 0.167 0.08711
+C 0.05038 0.01835 0.1439 0.01705 0.09193 0.05017
+G 0.1 0.04409 0.647 0.02826 0.5944 0.03802
+T 0.7373 0.855 0.05572 0.08702 0.1467 0.8247
+>MAT18 TAAAAC
+A 0.09976 0.6641 0.8508 0.8037 0.7918 0.1444
+C 0.1021 0.08641 0.05693 0.04963 0.05966 0.5613
+G 0.05432 0.1092 0.04048 0.04923 0.04498 0.06765
+T 0.7438 0.1403 0.05175 0.09743 0.1036 0.2267
+>MAT19 GTGAAT
+A 0.1593 0.04483 0.06949 0.8955 0.7175 0.1105
+C 0.07553 0.01876 0.09003 0.01629 0.08374 0.06563
+G 0.6152 0.0609 0.7443 0.03019 0.08214 0.05749
+T 0.15 0.8755 0.09618 0.05804 0.1166 0.7664
+>MAT20 AATACA
+A 0.8268 0.7796 0.1396 0.6772 0.1349 0.8374
+C 0.05488 0.04317 0.05118 0.06514 0.7076 0.03498
+G 0.04696 0.05148 0.06801 0.08048 0.04932 0.01976
+T 0.0714 0.1257 0.7412 0.1771 0.1081 0.1079
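The matrix test file above uses a FASTA-like layout: a `>NAME MOTIF` header followed by one row of per-position frequencies for each of A, C, G and T. A minimal sketch of reading that layout is shown here; it is not part of the changeset, the function names are hypothetical, and it assumes the file contains only such headers and four-base rows. Taking the highest-frequency base at each position appears to reproduce the motif string in the header, which the example checks for MAT6.

```python
from typing import Dict, List


def parse_weeder2_matrices(text: str) -> Dict[str, dict]:
    """Parse matrix text into {name: {"motif": str, "rows": {base: [floats]}}}."""
    matrices: Dict[str, dict] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line, e.g. ">MAT6 ATTACT"
            name, motif = line[1:].split()
            current = {"motif": motif, "rows": {}}
            matrices[name] = current
        elif current is not None:
            # Frequency row, e.g. "A 0.8781 0.0835 ..."
            base, *values = line.split()
            current["rows"][base] = [float(v) for v in values]
    return matrices


def consensus(rows: Dict[str, List[float]]) -> str:
    """Return the highest-frequency base at each matrix column."""
    width = len(next(iter(rows.values())))
    return "".join(max(rows, key=lambda b: rows[b][i]) for i in range(width))


# MAT6 copied from test-data/weeder2_matrix.out
SAMPLE = """>MAT6 ATTACT
A 0.8781 0.0835 0.07038 0.7901 0.03473 0.1075
C 0.04152 0.05455 0.05837 0.03709 0.7755 0.04605
G 0.0322 0.01277 0.1088 0.07223 0.04238 0.01572
T 0.04814 0.8492 0.7624 0.1005 0.1473 0.8308
"""

parsed = parse_weeder2_matrices(SAMPLE)
print(parsed["MAT6"]["motif"], consensus(parsed["MAT6"]["rows"]))
```

A check like this could be useful when comparing a tool run against the expected test output, since it does not depend on exact floating-point formatting.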
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/weeder2_motifs.out Wed Nov 19 07:56:27 2014 -0500
@@ -0,0 +1,1250 @@
+COMMAND LINE:
+
+weeder2 -f weeder_in.fa -chipseq -sim 0.95 -O MM
+
+MOTIFS SUMMARY:
+
+1) GTTTCAATTA (TAATTGAAAC) 2.38
+2) CATTTTAA (TTAAAATG) 2.082
+3) GTGAATTA (TAATTCAC) 1.969
+4) TCAATCAT (ATGATTGA) 1.944
+5) TGTTTAAT (ATTAAACA) 1.874
+6) ATTACT (AGTAAT) 1.823
+7) TCACAT (ATGTGA) 1.8
+8) TACATT (AATGTA) 1.773
+9) TTGACA (TGTCAA) 1.733
+10) AATAAT (ATTATT) 1.714
+11) ATGACT (AGTCAT) 1.702
+12) TTGAAA (TTTCAA) 1.693
+13) ATTTTA (TAAAAT) 1.691
+14) TAAACA (TGTTTA) 1.685
+15) ATGATT (AATCAT) 1.678
+16) AGTATT (AATACT) 1.664
+17) TTGAGT (ACTCAA) 1.574
+18) TAAAAC (GTTTTA) 1.572
+19) GTGAAT (ATTCAC) 1.568
+20) AATACA (TGTATT) 1.564
+
+
+DETAILED RESULTS:
+
+1) GTTTCAATTA (TAATTGAAAC) 2.38
+
+Matrix: MAT1 GTTTCAATTA
+A 0.1739 0.08497 0.1198 0.1538 0.186 0.8184 0.6871 0.09156 0.1002 0.7403
+C 0.06162 0.05256 0.05743 0.07398 0.5592 0.08087 0.0987 0.08757 0.149 0.08036
+G 0.591 0.06033 0.07754 0.07127 0.1033 0.0563 0.08422 0.04673 0.08862 0.06081
+T 0.1735 0.8021 0.7452 0.701 0.1515 0.04446 0.1299 0.7741 0.6622 0.1185
+
+OCCURRENCES:
+>chr1:8797248-879744 35 GTTTCAATTA 1 174 +
+>chr1:3467418-346761 1 TTTTCAATTA 0.934855 187 -
+>chr1:5072821-507302 8 GTTTGAATTA 0.928879 153 -
+>chr1:9932599-993279 45 GTTTCAATCA 0.919923 124 +
+>chr1:9013525-901372 37 GTTTCACTTA 0.908195 51 +
+>chr1:9956513-995671 48 GTTTCCATTA 0.88494 114 -
+>chr1:9813405-981360 44 TTTTAAATTA 0.876639 141 +
+>chr1:7768736-776893 34 ATTTTAATTA 0.871314 131 -
+>chr1:9956513-995671 48 ATTTCAATCA 0.854845 177 -
+>chr1:7444756-744495 33 GTTTAAATGA 0.852292 99 +
+>chr1:7386917-738711 29 TTTACAATTA 0.849485 45 +
+>chr1:7303541-730374 27 GTTTTAATGA 0.8469 8 -
+>chr1:5805347-580554 10 GTTCAAATTA 0.843965 178 -
+>chr1:7768736-776893 34 GTATTAATTA 0.83883 127 +
+>chr1:6588721-658892 21 GTTGTAATTA 0.83815 138 +
+>chr1:7388025-738822 30 CTTTCAATCA 0.83733 1 +
+>chr1:9962925-996312 49 
GTCTAAATTA 0.834485 190 - +>chr1:4562216-456241 4 ATTTCAACTA 0.827808 147 + +>chr1:6878570-687877 25 GTTTTGATTA 0.817499 91 - +>chr1:6396504-639670 18 GTTTCATTTT 0.816051 171 + +>chr1:7768736-776893 34 GTTTCATCTA 0.805952 147 + +>chr1:9460371-946057 38 GTATCAATTT 0.805422 84 + +>chr1:6721868-672206 23 GTTTCAGATA 0.799444 28 - +>chr1:6090845-609104 11 TTTTTAATCA 0.79117 168 - +>chr1:9948240-994844 47 GTTTCAACTC 0.789921 109 - +>chr1:6266967-626716 15 GCTCCAATTA 0.785236 145 - +>chr1:5072821-507302 8 GTTTAATTCA 0.774774 150 + +>chr1:6090845-609104 11 ATTGTAATTA 0.773071 87 + +>chr1:8851545-885174 36 TTTTGACTTA 0.771928 49 - +>chr1:6878570-687877 25 ATTACAATCA 0.769475 34 - +>chr1:6090845-609104 11 GTTAAAATGA 0.766922 148 - +>chr1:4662531-466273 5 GTTTTACTCA 0.764511 34 - +>chr1:4562216-456241 4 GTTGAAATCA 0.763465 145 - +>chr1:6090845-609104 11 ATTACAATAA 0.761859 85 - +>chr1:9013525-901372 37 GTTTAAGTAA 0.760027 162 + +>chr1:6205539-620573 13 GTTAAAATTT 0.7594 156 + +>chr1:7413722-741392 32 TTTTCATTGA 0.758428 125 - +>chr1:9932599-993279 45 GTTTGATTAA 0.754251 45 - +>chr1:6588721-658892 21 TTATCAATAA 0.749598 145 + +>chr1:6205539-620573 13 GTATAAATTT 0.747207 160 - +>chr1:4774948-477514 6 TTTACAATTC 0.74652 118 - +>chr1:4833767-483396 7 GGTTAAATCA 0.745976 183 + +>chr1:6588721-658892 21 ATTACAACTA 0.742438 136 - +>chr1:9948240-994844 47 GTTGAAACTA 0.736427 111 + +>chr1:4833767-483396 7 ATTACAAGTA 0.736066 103 - +>chr1:9574131-957433 42 ATTACCATTA 0.734492 157 - +>chr1:6721868-672206 23 AATTCAGTTA 0.728969 72 + +>chr1:4774948-477514 6 GATCTAATTA 0.726685 11 - +>chr1:6266967-626716 15 ATTTCAACTC 0.724842 175 + +>chr1:4833767-483396 7 GTAACGATTA 0.698174 62 - +********** + +2) CATTTTAA (TTAAAATG) 2.082 + +Matrix: MAT2 CATTTTAA +A 0.216 0.8128 0.1075 0.04678 0.1112 0.1399 0.7346 0.7969 +C 0.5444 0.04193 0.0509 0.03932 0.0429 0.08711 0.1099 0.09147 +G 0.1067 0.03289 0.07156 0.0785 0.0966 0.09202 0.05706 0.0382 +T 0.1328 0.1124 0.77 0.8354 0.7493 0.681 0.09836 0.0734 + 
+OCCURRENCES: +>chr1:9813405-981360 44 CATTTTAA 1 131 - +>chr1:7768736-776893 34 CATTTTAA 1 134 - +>chr1:7388801-738900 31 CATTTTAA 1 62 + +>chr1:6205539-620573 13 CATTTTAA 1 28 - +>chr1:6090845-609104 11 CATTTTAA 1 149 + +>chr1:3467418-346761 1 CATTTTAA 1 182 + +>chr1:9942670-994287 46 AATTTTAA 0.939956 134 - +>chr1:9013525-901372 37 AATTTTAA 0.939956 112 - +>chr1:6205539-620573 13 AATTTTAA 0.939956 157 - +>chr1:8851545-885174 36 TATTTTAA 0.924741 103 - +>chr1:6262414-626261 14 TATTTTAA 0.924741 183 + +>chr1:7386917-738711 29 GATTTTAA 0.919973 30 - +>chr1:6266967-626716 15 GATTTTAA 0.919973 2 - +>chr1:5072821-507302 8 GATTTTAA 0.919973 114 - +>chr1:7444756-744495 33 CATTTAAA 0.901069 100 - +>chr1:9932599-993279 45 CATTTGAA 0.892317 175 - +>chr1:6266967-626716 15 CATTTCAA 0.891418 174 + +>chr1:6750817-675101 24 CATTTTCA 0.885779 154 - +>chr1:6090845-609104 11 CATTTTCA 0.885779 5 + +>chr1:5168670-516887 9 CATTTTCA 0.885779 26 - +>chr1:6360131-636033 17 CATTTTTA 0.883664 135 + +>chr1:9948240-994844 47 CATTATAA 0.883344 56 - +>chr1:5805347-580554 10 CATTGTAA 0.880668 126 - +>chr1:9948240-994844 47 CAATTTAA 0.878877 16 + +>chr1:9942670-994287 46 CAATTTAA 0.878877 130 + +>chr1:6090845-609104 11 CAATTTAA 0.878877 122 + +>chr1:6396504-639670 18 CATTTTGA 0.876113 175 + +>chr1:9013525-901372 37 CAGTTTAA 0.872299 108 + +>chr1:6266967-626716 15 CTTTTTAA 0.871938 44 - +>chr1:6262414-626261 14 CTTTTTAA 0.871938 187 - +>chr1:6090845-609104 11 CTTTTTAA 0.871938 171 - +>chr1:4774948-477514 6 CATTCTAA 0.870849 60 + +>chr1:6090845-609104 11 CACTTTAA 0.868521 126 - +>chr1:8797248-879744 35 CATTTTAT 0.867711 146 - +>chr1:4024990-402519 2 CATTTTAG 0.861275 117 - +>chr1:8851545-885174 36 AATTTGAA 0.832272 183 - +>chr1:5805347-580554 10 AATTTGAA 0.832272 179 + +>chr1:9956513-995671 48 AATTTCAA 0.831373 180 - +>chr1:6588721-658892 21 AATTTCAA 0.831373 101 - +>chr1:3467418-346761 1 AATTTTCA 0.825735 191 - +>chr1:7386917-738711 29 AATTGTAA 0.820624 46 - +>chr1:4774948-477514 6 AATTGTAA 
0.820624 119 + +>chr1:9956513-995671 48 AATTTTAC 0.810972 184 + +>chr1:7413722-741392 32 AATTTTAC 0.810972 10 + +>chr1:6472202-647240 20 AATTTTAC 0.810972 32 + +>chr1:6090845-609104 11 TATTGTAA 0.805409 86 + +>chr1:9554705-955490 40 AATGTTAA 0.801568 46 + +>chr1:8851545-885174 36 AATGTTAA 0.801568 99 + +>chr1:6878570-687877 25 GATTGTAA 0.800641 35 + +>chr1:7444756-744495 33 CATTATAC 0.75436 70 - +********** + +3) GTGAATTA (TAATTCAC) 1.969 + +Matrix: MAT3 GTGAATTA +A 0.2434 0.0588 0.09038 0.7708 0.7064 0.0692 0.08129 0.7448 +C 0.08343 0.02873 0.1008 0.06039 0.0821 0.07782 0.09324 0.09242 +G 0.5181 0.06202 0.6983 0.08538 0.05523 0.08203 0.1107 0.09517 +T 0.1551 0.8504 0.1106 0.08347 0.1562 0.7709 0.7148 0.0676 + +OCCURRENCES: +>chr1:6266967-626716 15 GTGAATTA 1 15 + +>chr1:8851545-885174 36 ATGAATTA 0.94756 84 - +>chr1:6437830-643803 19 ATGAATTA 0.94756 128 - +>chr1:5072821-507302 8 TTGAATTA 0.930703 153 - +>chr1:6878570-687877 25 CTGAATTA 0.917022 85 + +>chr1:6721868-672206 23 CTGAATTA 0.917022 71 - +>chr1:9568576-956877 41 GTGATTTA 0.89497 22 + +>chr1:4833767-483396 7 GTGATTTA 0.89497 186 - +>chr1:6262414-626261 14 GTGAATGA 0.884673 174 + +>chr1:4402453-440265 3 GTGAATCA 0.881343 131 - +>chr1:6437830-643803 19 GTGAATAA 0.879062 123 + +>chr1:9956513-995671 48 GTGAATTG 0.875981 77 - +>chr1:6878570-687877 25 GTGAATTG 0.875981 53 + +>chr1:6277750-627795 16 GTGAATTC 0.875456 27 - +>chr1:7768736-776893 34 GTGGATTA 0.86916 11 + +>chr1:6721868-672206 23 GTGGATTA 0.86916 168 + +>chr1:3467418-346761 1 GTGTATTA 0.868795 4 - +>chr1:7209035-720923 26 GTGAAGTA 0.868484 12 - +>chr1:6183701-618390 12 GTGAACTA 0.867682 60 + +>chr1:4024990-402519 2 GTGAACTA 0.867682 97 - +>chr1:6396504-639670 18 GGGAATTA 0.849489 51 + +>chr1:7768736-776893 34 ATTAATTA 0.835366 129 + +>chr1:9978791-997899 50 ATAAATTA 0.831511 27 + +>chr1:7768736-776893 34 ATAAATTA 0.831511 104 + +>chr1:6183701-618390 12 ATGAATCA 0.828904 71 + +>chr1:5072821-507302 8 ATGAATCA 0.828904 85 - +>chr1:9932599-993279 45 
ATGACTTA 0.828376 12 - +>chr1:9574131-957433 42 ATGACTTA 0.828376 55 + +>chr1:5072821-507302 8 ATGAATAA 0.826623 21 + +>chr1:9574131-957433 42 TTGATTTA 0.825673 145 - +>chr1:8797248-879744 35 TTGATTTA 0.825673 7 + +>chr1:8797248-879744 35 TTCAATTA 0.816637 176 + +>chr1:3467418-346761 1 TTCAATTA 0.816637 187 - +>chr1:7768736-776893 34 ATGTATTA 0.816356 125 + +>chr1:5072821-507302 8 ATGTATTA 0.816356 42 + +>chr1:3467418-346761 1 ATGTATTA 0.816356 26 - +>chr1:9568576-956877 41 ATGAAGTA 0.816044 150 + +>chr1:6721868-672206 23 TTGAATGA 0.815376 85 + +>chr1:7209035-720923 26 ATGAACTA 0.815242 42 + +>chr1:6878570-687877 25 ATGCATTA 0.81195 43 - +>chr1:8851545-885174 36 TTGACTTA 0.811518 49 - +>chr1:9942670-994287 46 TTGAATTG 0.806684 162 + +>chr1:6090845-609104 11 TTGCATTA 0.795092 39 + +>chr1:6719561-671976 22 ACGAATTA 0.790695 71 - +>chr1:9942670-994287 46 GTCATTTA 0.780905 11 - +>chr1:6472202-647240 20 GTCATTTA 0.780905 144 + +>chr1:9956513-995671 48 GTGATTGA 0.779643 176 + +>chr1:9932599-993279 45 GTGATTGA 0.779643 127 - +>chr1:4774948-477514 6 GTGATCTA 0.762652 15 - +>chr1:6205539-620573 13 GTCTATTA 0.75473 143 + +********** + +4) TCAATCAT (ATGATTGA) 1.944 + +Matrix: MAT4 TCAATCAT +A 0.05132 0.186 0.8183 0.6756 0.1425 0.08545 0.8726 0.1838 +C 0.02852 0.5378 0.07891 0.05957 0.02786 0.6724 0.03639 0.1337 +G 0.0695 0.1008 0.02175 0.09712 0.07326 0.07966 0.0383 0.06208 +T 0.8507 0.1753 0.08099 0.1677 0.7564 0.1625 0.05268 0.6204 + +OCCURRENCES: +>chr1:8851545-885174 36 TCAATCAT 1 120 + +>chr1:7444756-744495 33 TCAATCAT 1 11 + +>chr1:9013525-901372 37 TTAATCAT 0.932721 174 + +>chr1:6090845-609104 11 TTAATCAT 0.932721 167 - +>chr1:4774948-477514 6 TTAATCAT 0.932721 139 - +>chr1:4833767-483396 7 TCAATCAA 0.918952 128 - +>chr1:5072821-507302 8 TGAATCAT 0.918887 84 - +>chr1:9956513-995671 48 TCAATCAC 0.909665 176 - +>chr1:9932599-993279 45 TCAATCAC 0.909665 127 + +>chr1:9574131-957433 42 TCATTCAT 0.90572 83 - +>chr1:7303541-730374 27 TCATTCAT 0.90572 107 - 
+>chr1:8797248-879744 35 TCAATTAT 0.905366 177 + +>chr1:7388025-738822 30 TCAATCAG 0.896366 4 + +>chr1:7444756-744495 33 TCAAACAT 0.886061 130 - +>chr1:6472202-647240 20 TCACTCAT 0.885652 20 - +>chr1:8797248-879744 35 TCTATCAT 0.863139 56 + +>chr1:6277750-627795 16 TCTATCAT 0.863139 95 + +>chr1:9574131-957433 42 TCCATCAT 0.862751 116 + +>chr1:6266967-626716 15 TCCATCAT 0.862751 112 - +>chr1:9574131-957433 42 TAAATCAA 0.853657 145 + +>chr1:8797248-879744 35 TAAATCAA 0.853657 7 - +>chr1:9932599-993279 45 TTAATCAA 0.851673 45 + +>chr1:6878570-687877 25 TTAATCAA 0.851673 90 + +>chr1:7386917-738711 29 TCAATCCT 0.844783 2 + +>chr1:9568576-956877 41 TAAATCAC 0.84437 22 - +>chr1:4833767-483396 7 TAAATCAC 0.84437 186 + +>chr1:8851545-885174 36 TAATTCAT 0.840425 84 + +>chr1:6437830-643803 19 TAATTCAT 0.840425 128 + +>chr1:9962925-996312 49 TAAATTAT 0.84007 189 - +>chr1:7768736-776893 34 TAAATTAT 0.84007 105 + +>chr1:7303541-730374 27 TTATTCAT 0.838441 169 + +>chr1:5072821-507302 8 TTATTCAT 0.838441 21 - +>chr1:4402453-440265 3 TGAATCAA 0.837839 130 - +>chr1:9932599-993279 45 TAAGTCAT 0.827328 12 + +>chr1:9574131-957433 42 TAAGTCAT 0.827328 55 - +>chr1:6721868-672206 23 TCATTCAA 0.824672 85 - +>chr1:6183701-618390 12 TGATTCAT 0.824607 71 - +>chr1:5072821-507302 8 TGATTCAT 0.824607 85 + +>chr1:3467418-346761 1 TCAATTAA 0.824318 186 - +>chr1:6437830-643803 19 TGAATTAT 0.824253 127 - +>chr1:6266967-626716 15 TGAATTAT 0.824253 16 + +>chr1:4024990-402519 2 TAAAACAT 0.820765 82 + +>chr1:9942670-994287 46 TTAAACAT 0.818782 80 + +>chr1:6360131-636033 17 TTAAACAT 0.818782 143 + +>chr1:5072821-507302 8 TTAAACAT 0.818782 148 - +>chr1:6588721-658892 21 TCAATAAC 0.800726 148 + +>chr1:6262414-626261 14 TCAGTTAT 0.797989 137 + +>chr1:6205539-620573 13 TCTATCAA 0.782091 96 - +>chr1:4662531-466273 5 TAAATCGT 0.779842 56 + +>chr1:4833767-483396 7 TTAATCGT 0.777858 61 + +********** + +5) TGTTTAAT (ATTAAACA) 1.874 + +Matrix: MAT5 TGTTTAAT +A 0.1111 0.09112 0.1881 0.1074 0.1057 0.6769 0.7668 
0.1993 +C 0.02537 0.03582 0.06521 0.08013 0.08927 0.09377 0.03432 0.08804 +G 0.07262 0.7214 0.06612 0.05896 0.07036 0.1072 0.116 0.09061 +T 0.7909 0.1516 0.6806 0.7536 0.7347 0.1221 0.08288 0.6221 + +OCCURRENCES: +>chr1:6360131-636033 17 TGTTTAAT 1 142 - +>chr1:5072821-507302 8 TGTTTAAT 1 149 + +>chr1:9942670-994287 46 TGTTTAAA 0.919846 79 - +>chr1:6878570-687877 25 TGATTAAT 0.906643 89 - +>chr1:4774948-477514 6 TGATTAAT 0.906643 140 + +>chr1:9013525-901372 37 TGTTTAAG 0.899248 161 + +>chr1:6360131-636033 17 TGTTTAAG 0.899248 6 + +>chr1:5805347-580554 10 TGTTTAAC 0.898761 16 - +>chr1:9978791-997899 50 TGTTTGAT 0.892015 68 + +>chr1:6396504-639670 18 TGTTTGAT 0.892015 10 - +>chr1:4562216-456241 4 TGTTTGAT 0.892015 141 + +>chr1:6396504-639670 18 TTTTTAAT 0.891988 59 + +>chr1:6090845-609104 11 TTTTTAAT 0.891988 170 - +>chr1:9460371-946057 38 TGTTTCAT 0.889461 121 - +>chr1:9568576-956877 41 TGTTCAAT 0.877642 144 + +>chr1:7388801-738900 31 TGTTTAGT 0.876624 88 - +>chr1:6719561-671976 22 TGTCTAAT 0.872337 67 + +>chr1:6090845-609104 11 TGATTAAA 0.826489 168 + +>chr1:7444756-744495 33 TGTTTGAA 0.811861 131 + +>chr1:5072821-507302 8 TGTTTGAA 0.811861 156 - +>chr1:9932599-993279 45 TGTTTCAA 0.809307 123 + +>chr1:8797248-879744 35 TGTTTCAA 0.809307 173 + +>chr1:6396504-639670 18 TGTTTCAA 0.809307 179 - +>chr1:9932599-993279 45 TGATTAAC 0.805404 44 - +>chr1:9013525-901372 37 TGATTAAC 0.805404 173 - +>chr1:9574131-957433 42 TGATTTAT 0.801481 144 - +>chr1:9568576-956877 41 TGATTTAT 0.801481 23 + +>chr1:8797248-879744 35 TGATTTAT 0.801481 8 + +>chr1:8851545-885174 36 TGATTGAT 0.798658 119 - +>chr1:4833767-483396 7 TGATTGAT 0.798658 129 + +>chr1:9948240-994844 47 TGTATAAA 0.797343 126 + +>chr1:6205539-620573 13 TGTATAAA 0.797343 163 - +>chr1:9932599-993279 45 TGTTTAGA 0.79647 148 + +>chr1:4562216-456241 4 TGTTTAGA 0.79647 165 - +>chr1:6183701-618390 12 TGATTCAT 0.796104 71 - +>chr1:5072821-507302 8 TGATTCAT 0.796104 85 + +>chr1:9948240-994844 47 TGTTTATA 0.790191 128 - 
+>chr1:9978791-997899 50 TGATAAAT 0.787394 72 + +>chr1:6437830-643803 19 TGAATAAT 0.78414 124 + +>chr1:4024990-402519 2 TGATTATT 0.776988 186 + +>chr1:6183701-618390 12 TGTATAAC 0.776258 8 - +>chr1:6878570-687877 25 TGTTCTAT 0.77248 74 + +>chr1:3467418-346761 1 TGTATTAT 0.772335 25 - +>chr1:3467418-346761 1 TGTATTAT 0.772335 3 - +>chr1:7413722-741392 32 AGTTTAAC 0.769901 187 + +>chr1:9574131-957433 42 TGTATGAT 0.769512 119 - +>chr1:9568576-956877 41 TGTATAGT 0.754121 107 + +>chr1:7444756-744495 33 AGTATAAT 0.748637 69 + +>chr1:6090845-609104 11 AGTATAAT 0.748637 93 - +>chr1:6721868-672206 23 GGTATAAT 0.741335 124 - +********** + +6) ATTACT (AGTAAT) 1.823 + +Matrix: MAT6 ATTACT +A 0.8781 0.0835 0.07038 0.7901 0.03473 0.1075 +C 0.04152 0.05455 0.05837 0.03709 0.7755 0.04605 +G 0.0322 0.01277 0.1088 0.07223 0.04238 0.01572 +T 0.04814 0.8492 0.7624 0.1005 0.1473 0.8308 + +OCCURRENCES: +>chr1:9978791-997899 50 ATTACT 1 31 + +>chr1:9568576-956877 41 ATTACT 1 154 - +>chr1:9013525-901372 37 ATTACT 1 183 - +>chr1:7209035-720923 26 ATTACT 1 183 + +>chr1:6878570-687877 25 ATTACT 1 64 - +>chr1:6719561-671976 22 ATTACT 1 137 + +>chr1:6588721-658892 21 ATTACT 1 9 - +>chr1:6472202-647240 20 ATTACT 1 176 - +>chr1:6472202-647240 20 ATTACT 1 29 - +>chr1:6360131-636033 17 ATTACT 1 86 + +>chr1:6262414-626261 14 ATTACT 1 160 + +>chr1:6090845-609104 11 ATTACT 1 20 - +>chr1:5805347-580554 10 ATTACT 1 176 - +>chr1:5072821-507302 8 ATTACT 1 46 + +>chr1:4774948-477514 6 ATTACT 1 194 - +>chr1:3467418-346761 1 ATTACT 1 32 + +>chr1:9962925-996312 49 ATTATT 0.866209 188 - +>chr1:9962925-996312 49 ATTATT 0.866209 133 + +>chr1:9942670-994287 46 ATTATT 0.866209 73 + +>chr1:8797248-879744 35 ATTATT 0.866209 192 + +>chr1:6878570-687877 25 ATTATT 0.866209 41 - +>chr1:6719561-671976 22 ATTATT 0.866209 181 - +>chr1:6719561-671976 22 ATTATT 0.866209 145 - +>chr1:6472202-647240 20 ATTATT 0.866209 5 + +>chr1:6437830-643803 19 ATTATT 0.866209 126 - +>chr1:6396504-639670 18 ATTATT 0.866209 55 + 
+>chr1:6090845-609104 11 ATTATT 0.866209 84 + +>chr1:6090845-609104 11 ATTATT 0.866209 43 + +>chr1:5805347-580554 10 ATTATT 0.866209 65 - +>chr1:4024990-402519 2 ATTATT 0.866209 188 + +>chr1:3467418-346761 1 ATTATT 0.866209 24 - +>chr1:9942670-994287 46 ATGACT 0.860789 34 + +>chr1:9932599-993279 45 ATGACT 0.860789 14 - +>chr1:9574131-957433 42 ATGACT 0.860789 55 + +>chr1:9013525-901372 37 ATGACT 0.860789 193 + +>chr1:7768736-776893 34 ATGACT 0.860789 121 - +>chr1:7444756-744495 33 ATGACT 0.860789 194 - +>chr1:7444756-744495 33 ATGACT 0.860789 40 - +>chr1:7303541-730374 27 ATGACT 0.860789 111 + +>chr1:6472202-647240 20 ATGACT 0.860789 143 - +>chr1:6183701-618390 12 ATGACT 0.860789 175 - +>chr1:4662531-466273 5 ATGACT 0.860789 126 + +>chr1:6878570-687877 25 ATTACA 0.845949 38 - +>chr1:6719561-671976 22 ATTACA 0.845949 127 + +>chr1:6588721-658892 21 ATTACA 0.845949 140 - +>chr1:6090845-609104 11 ATTACA 0.845949 89 - +>chr1:5805347-580554 10 ATTACA 0.845949 147 + +>chr1:5805347-580554 10 ATTACA 0.845949 125 + +>chr1:4833767-483396 7 ATTACA 0.845949 107 - +>chr1:4774948-477514 6 ATTACA 0.845949 9 - +********** + +7) TCACAT (ATGTGA) 1.8 + +Matrix: MAT7 TCACAT +A 0.08233 0.1013 0.9032 0.139 0.8283 0.0876 +C 0.04274 0.7009 0.02867 0.7396 0.0355 0.04803 +G 0.04677 0.1109 0.01307 0.0554 0.0163 0.05845 +T 0.8282 0.08693 0.05509 0.06599 0.1199 0.8059 + +OCCURRENCES: +>chr1:9956513-995671 48 TCACAT 1 18 + +>chr1:9813405-981360 44 TCACAT 1 66 - +>chr1:9568576-956877 41 TCACAT 1 33 + +>chr1:8797248-879744 35 TCACAT 1 32 + +>chr1:7303541-730374 27 TCACAT 1 80 + +>chr1:7209035-720923 26 TCACAT 1 127 - +>chr1:6878570-687877 25 TCACAT 1 101 + +>chr1:6878570-687877 25 TCACAT 1 51 - +>chr1:6588721-658892 21 TCACAT 1 22 - +>chr1:6472202-647240 20 TCACAT 1 74 - +>chr1:6277750-627795 16 TCACAT 1 31 + +>chr1:6266967-626716 15 TCACAT 1 171 + +>chr1:6205539-620573 13 TCACAT 1 33 - +>chr1:5805347-580554 10 TCACAT 1 131 - +>chr1:4774948-477514 6 TCACAT 1 132 - +>chr1:4774948-477514 6 TCACAT 1 
19 + +>chr1:4662531-466273 5 TCACAT 1 31 - +>chr1:4402453-440265 3 TCACAT 1 135 + +>chr1:4024990-402519 2 TCACAT 1 66 - +>chr1:3467418-346761 1 TCACAT 1 79 + +>chr1:3467418-346761 1 TCACAT 1 53 + +>chr1:9978791-997899 50 TGACAT 0.870153 84 - +>chr1:9942670-994287 46 TGACAT 0.870153 93 + +>chr1:9574131-957433 42 TGACAT 0.870153 111 + +>chr1:9554705-955490 40 TGACAT 0.870153 53 - +>chr1:7303541-730374 27 TGACAT 0.870153 99 + +>chr1:7209035-720923 26 TGACAT 0.870153 121 - +>chr1:6878570-687877 25 TGACAT 0.870153 69 + +>chr1:6437830-643803 19 TGACAT 0.870153 6 - +>chr1:6183701-618390 12 TGACAT 0.870153 140 + +>chr1:4833767-483396 7 TGACAT 0.870153 29 - +>chr1:9956513-995671 48 TAACAT 0.868041 137 + +>chr1:9574131-957433 42 TAACAT 0.868041 104 - +>chr1:9554705-955490 40 TAACAT 0.868041 47 - +>chr1:8851545-885174 36 TAACAT 0.868041 100 - +>chr1:8797248-879744 35 TAACAT 0.868041 103 + +>chr1:7768736-776893 34 TAACAT 0.868041 80 - +>chr1:6090845-609104 11 TAACAT 0.868041 154 + +>chr1:9962925-996312 49 TCAAAT 0.867804 40 - +>chr1:9932599-993279 45 TCAAAT 0.867804 176 + +>chr1:9813405-981360 44 TCAAAT 0.867804 55 + +>chr1:8851545-885174 36 TCAAAT 0.867804 188 - +>chr1:8851545-885174 36 TCAAAT 0.867804 184 + +>chr1:8851545-885174 36 TCAAAT 0.867804 96 + +>chr1:6719561-671976 22 TCAAAT 0.867804 142 + +>chr1:5805347-580554 10 TCAAAT 0.867804 180 - +>chr1:5805347-580554 10 TCAAAT 0.867804 121 + +>chr1:4833767-483396 7 TCAAAT 0.867804 126 - +>chr1:4833767-483396 7 TCAAAT 0.867804 25 + +>chr1:4774948-477514 6 TTACAT 0.864876 8 - +********** + +8) TACATT (AATGTA) 1.773 + +Matrix: MAT8 TACATT +A 0.09867 0.7322 0.1194 0.8806 0.101 0.09423 +C 0.1068 0.06715 0.743 0.03825 0.03645 0.02894 +G 0.1046 0.07625 0.0467 0.01049 0.07555 0.07099 +T 0.6899 0.1244 0.09086 0.07061 0.7871 0.8058 + +OCCURRENCES: +>chr1:9962925-996312 49 TACATT 1 130 + +>chr1:9942670-994287 46 TACATT 1 158 + +>chr1:9932599-993279 45 TACATT 1 105 - +>chr1:9460371-946057 38 TACATT 1 173 + +>chr1:9013525-901372 37 TACATT 
1 59 + +>chr1:6721868-672206 23 TACATT 1 46 + +>chr1:6588721-658892 21 TACATT 1 160 - +>chr1:6472202-647240 20 TACATT 1 92 - +>chr1:6396504-639670 18 TACATT 1 64 - +>chr1:6205539-620573 13 TACATT 1 167 + +>chr1:4024990-402519 2 TACATT 1 121 - +>chr1:3467418-346761 1 TACATT 1 29 + +>chr1:4662531-466273 5 CACATT 0.865945 30 - +>chr1:4024990-402519 2 CACATT 0.865945 65 - +>chr1:3467418-346761 1 CACATT 0.865945 54 + +>chr1:9942670-994287 46 GACATT 0.865452 94 + +>chr1:9554705-955490 40 GACATT 0.865452 52 - +>chr1:7388801-738900 31 GACATT 0.865452 176 - +>chr1:7303541-730374 27 GACATT 0.865452 100 + +>chr1:6437830-643803 19 GACATT 0.865452 84 - +>chr1:6437830-643803 19 GACATT 0.865452 5 - +>chr1:6205539-620573 13 GACATT 0.865452 114 + +>chr1:6183701-618390 12 GACATT 0.865452 83 + +>chr1:4833767-483396 7 GACATT 0.865452 28 - +>chr1:4562216-456241 4 GACATT 0.865452 157 + +>chr1:9978791-997899 50 TAAATT 0.856652 28 + +>chr1:9962925-996312 49 TAAATT 0.856652 191 - +>chr1:9948240-994844 47 TAAATT 0.856652 17 - +>chr1:9942670-994287 46 TAAATT 0.856652 131 - +>chr1:9813405-981360 44 TAAATT 0.856652 144 + +>chr1:7768736-776893 34 TAAATT 0.856652 105 + +>chr1:6205539-620573 13 TAAATT 0.856652 161 - +>chr1:6090845-609104 11 TAAATT 0.856652 123 - +>chr1:4774948-477514 6 TAAATT 0.856652 73 + +>chr1:9956513-995671 48 TATATT 0.85009 155 + +>chr1:9942670-994287 46 TATATT 0.85009 119 + +>chr1:9813405-981360 44 TATATT 0.85009 31 - +>chr1:9013525-901372 37 TATATT 0.85009 78 - +>chr1:7768736-776893 34 TATATT 0.85009 2 - +>chr1:7413722-741392 32 TATATT 0.85009 87 + +>chr1:7413722-741392 32 TATATT 0.85009 85 - +>chr1:6472202-647240 20 TATATT 0.85009 2 + +>chr1:6360131-636033 17 TATATT 0.85009 26 - +>chr1:9813405-981360 44 TACAAT 0.842287 28 + +>chr1:9535189-953538 39 TACAAT 0.842287 43 + +>chr1:7386917-738711 29 TACAAT 0.842287 47 + +>chr1:6878570-687877 25 TACAAT 0.842287 36 - +>chr1:6090845-609104 11 TACAAT 0.842287 87 - +>chr1:5805347-580554 10 TACAAT 0.842287 127 + +>chr1:4774948-477514 
6 TACAAT 0.842287 120 - +********** + +9) TTGACA (TGTCAA) 1.733 + +Matrix: MAT9 TTGACA +A 0.1287 0.06713 0.09732 0.8048 0.1278 0.8784 +C 0.07599 0.01615 0.136 0.02754 0.7339 0.02517 +G 0.07764 0.03641 0.6902 0.03821 0.05137 0.01588 +T 0.7176 0.8803 0.0764 0.1295 0.08697 0.0806 + +OCCURRENCES: +>chr1:9942670-994287 46 TTGACA 1 92 + +>chr1:8851545-885174 36 TTGACA 1 190 + +>chr1:8797248-879744 35 TTGACA 1 125 - +>chr1:7768736-776893 34 TTGACA 1 75 - +>chr1:7444756-744495 33 TTGACA 1 9 - +>chr1:7209035-720923 26 TTGACA 1 122 - +>chr1:6437830-643803 19 TTGACA 1 7 - +>chr1:6205539-620573 13 TTGACA 1 136 + +>chr1:6090845-609104 11 TTGACA 1 181 + +>chr1:5805347-580554 10 TTGACA 1 119 - +>chr1:4833767-483396 7 TTGACA 1 30 - +>chr1:4024990-402519 2 TTGACA 1 21 + +>chr1:6719561-671976 22 TTCACA 0.875237 86 - +>chr1:6437830-643803 19 TTCACA 0.875237 122 - +>chr1:6277750-627795 16 TTCACA 0.875237 30 + +>chr1:6266967-626716 15 TTCACA 0.875237 170 + +>chr1:6262414-626261 14 TTCACA 0.875237 59 + +>chr1:6090845-609104 11 TTCACA 0.875237 30 + +>chr1:5805347-580554 10 TTCACA 0.875237 132 - +>chr1:5805347-580554 10 TTCACA 0.875237 43 + +>chr1:5168670-516887 9 TTCACA 0.875237 155 + +>chr1:5168670-516887 9 TTCACA 0.875237 24 - +>chr1:4774948-477514 6 TTCACA 0.875237 133 - +>chr1:4402453-440265 3 TTCACA 0.875237 134 + +>chr1:9978791-997899 50 ATGACA 0.867422 85 - +>chr1:9956513-995671 48 ATGACA 0.867422 106 + +>chr1:9932599-993279 45 ATGACA 0.867422 140 + +>chr1:9574131-957433 42 ATGACA 0.867422 179 - +>chr1:9574131-957433 42 ATGACA 0.867422 110 + +>chr1:7768736-776893 34 ATGACA 0.867422 139 + +>chr1:6878570-687877 25 ATGACA 0.867422 68 + +>chr1:6360131-636033 17 ATGACA 0.867422 149 + +>chr1:6262414-626261 14 ATGACA 0.867422 102 - +>chr1:6183701-618390 12 ATGACA 0.867422 108 - +>chr1:6183701-618390 12 ATGACA 0.867422 42 + +>chr1:6090845-609104 11 ATGACA 0.867422 146 - +>chr1:9962925-996312 49 TTAACA 0.866518 121 + +>chr1:9932599-993279 45 TTAACA 0.866518 43 - +>chr1:9574131-957433 42 
TTAACA 0.866518 105 - +>chr1:9554705-955490 40 TTAACA 0.866518 48 - +>chr1:8851545-885174 36 TTAACA 0.866518 101 - +>chr1:6266967-626716 15 TTAACA 0.866518 42 - +>chr1:6090845-609104 11 TTAACA 0.866518 153 + +>chr1:4024990-402519 2 TTAACA 0.866518 79 - +>chr1:9978791-997899 50 TTGATA 0.854356 71 + +>chr1:9568576-956877 41 TTGATA 0.854356 190 - +>chr1:9460371-946057 38 TTGATA 0.854356 85 - +>chr1:6588721-658892 21 TTGATA 0.854356 146 - +>chr1:6205539-620573 13 TTGATA 0.854356 96 + +>chr1:4833767-483396 7 TTGATA 0.854356 23 - +********** + +10) AATAAT (ATTATT) 1.714 + +Matrix: MAT10 AATAAT +A 0.821 0.7797 0.1145 0.7009 0.8508 0.1043 +C 0.05975 0.04103 0.05818 0.09164 0.07329 0.07718 +G 0.05042 0.1458 0.06827 0.08449 0.04593 0.05097 +T 0.06887 0.03344 0.759 0.1229 0.02993 0.7675 + +OCCURRENCES: +>chr1:9962925-996312 49 AATAAT 1 188 + +>chr1:9962925-996312 49 AATAAT 1 133 - +>chr1:9942670-994287 46 AATAAT 1 73 - +>chr1:8797248-879744 35 AATAAT 1 192 - +>chr1:6878570-687877 25 AATAAT 1 41 + +>chr1:6719561-671976 22 AATAAT 1 181 + +>chr1:6719561-671976 22 AATAAT 1 145 + +>chr1:6472202-647240 20 AATAAT 1 5 - +>chr1:6437830-643803 19 AATAAT 1 126 + +>chr1:6396504-639670 18 AATAAT 1 55 - +>chr1:6090845-609104 11 AATAAT 1 84 - +>chr1:6090845-609104 11 AATAAT 1 43 - +>chr1:5805347-580554 10 AATAAT 1 65 + +>chr1:4024990-402519 2 AATAAT 1 188 - +>chr1:3467418-346761 1 AATAAT 1 24 + +>chr1:9962925-996312 49 AATTAT 0.867789 189 - +>chr1:8797248-879744 35 AATTAT 0.867789 179 + +>chr1:7768736-776893 34 AATTAT 0.867789 107 + +>chr1:6719561-671976 22 AATTAT 0.867789 182 - +>chr1:6719561-671976 22 AATTAT 0.867789 146 - +>chr1:6588721-658892 21 AATTAT 0.867789 143 + +>chr1:6437830-643803 19 AATTAT 0.867789 127 - +>chr1:6396504-639670 18 AATTAT 0.867789 54 + +>chr1:6266967-626716 15 AATTAT 0.867789 18 + +>chr1:6090845-609104 11 AATTAT 0.867789 92 + +>chr1:6090845-609104 11 AATTAT 0.867789 83 + +>chr1:5805347-580554 10 AATTAT 0.867789 66 - +>chr1:8851545-885174 36 AATCAT 0.860628 122 + 
+>chr1:8797248-879744 35 AATCAT 0.860628 138 - +>chr1:7444756-744495 33 AATCAT 0.860628 13 + +>chr1:6090845-609104 11 AATCAT 0.860628 167 - +>chr1:5168670-516887 9 AATCAT 0.860628 31 - +>chr1:5072821-507302 8 AATCAT 0.860628 84 - +>chr1:4774948-477514 6 AATCAT 0.860628 139 - +>chr1:9978791-997899 50 AGTAAT 0.854996 31 - +>chr1:9568576-956877 41 AGTAAT 0.854996 154 + +>chr1:9013525-901372 37 AGTAAT 0.854996 183 + +>chr1:7209035-720923 26 AGTAAT 0.854996 183 - +>chr1:6878570-687877 25 AGTAAT 0.854996 64 + +>chr1:6719561-671976 22 AGTAAT 0.854996 137 - +>chr1:6588721-658892 21 AGTAAT 0.854996 9 + +>chr1:6472202-647240 20 AGTAAT 0.854996 176 + +>chr1:6472202-647240 20 AGTAAT 0.854996 29 + +>chr1:6360131-636033 17 AGTAAT 0.854996 86 - +>chr1:6262414-626261 14 AGTAAT 0.854996 160 - +>chr1:6090845-609104 11 AGTAAT 0.854996 20 + +>chr1:5805347-580554 10 AGTAAT 0.854996 176 + +>chr1:5072821-507302 8 AGTAAT 0.854996 46 - +>chr1:4774948-477514 6 AGTAAT 0.854996 194 + +>chr1:3467418-346761 1 AGTAAT 0.854996 32 - +********** + +11) ATGACT (AGTCAT) 1.702 + +Matrix: MAT11 ATGACT +A 0.7923 0.07691 0.08105 0.8363 0.111 0.123 +C 0.07045 0.02479 0.07396 0.02863 0.7081 0.04261 +G 0.06487 0.02256 0.6694 0.03558 0.06369 0.01484 +T 0.07241 0.8757 0.1756 0.0995 0.1172 0.8196 + +OCCURRENCES: +>chr1:9942670-994287 46 ATGACT 1 34 + +>chr1:9932599-993279 45 ATGACT 1 14 - +>chr1:9574131-957433 42 ATGACT 1 55 + +>chr1:9013525-901372 37 ATGACT 1 193 + +>chr1:7768736-776893 34 ATGACT 1 121 - +>chr1:7444756-744495 33 ATGACT 1 194 - +>chr1:7444756-744495 33 ATGACT 1 40 - +>chr1:7303541-730374 27 ATGACT 1 111 + +>chr1:6472202-647240 20 ATGACT 1 143 - +>chr1:6183701-618390 12 ATGACT 1 175 - +>chr1:4662531-466273 5 ATGACT 1 126 + +>chr1:9978791-997899 50 ATTACT 0.888601 31 + +>chr1:9568576-956877 41 ATTACT 0.888601 154 - +>chr1:9013525-901372 37 ATTACT 0.888601 183 - +>chr1:7209035-720923 26 ATTACT 0.888601 183 + +>chr1:6878570-687877 25 ATTACT 0.888601 64 - +>chr1:6719561-671976 22 ATTACT 0.888601 
137 + +>chr1:6588721-658892 21 ATTACT 0.888601 9 - +>chr1:6472202-647240 20 ATTACT 0.888601 176 - +>chr1:6472202-647240 20 ATTACT 0.888601 29 - +>chr1:6360131-636033 17 ATTACT 0.888601 86 + +>chr1:6262414-626261 14 ATTACT 0.888601 160 + +>chr1:6090845-609104 11 ATTACT 0.888601 20 - +>chr1:5805347-580554 10 ATTACT 0.888601 176 - +>chr1:5072821-507302 8 ATTACT 0.888601 46 + +>chr1:4774948-477514 6 ATTACT 0.888601 194 - +>chr1:3467418-346761 1 ATTACT 0.888601 32 + +>chr1:8797248-879744 35 ATAACT 0.867272 132 + +>chr1:7209035-720923 26 ATAACT 0.867272 79 + +>chr1:6360131-636033 17 ATAACT 0.867272 111 - +>chr1:6262414-626261 14 ATAACT 0.867272 139 - +>chr1:6183701-618390 12 ATAACT 0.867272 7 - +>chr1:5072821-507302 8 ATAACT 0.867272 25 + +>chr1:9956513-995671 48 ATGATT 0.866692 35 - +>chr1:9013525-901372 37 ATGATT 0.866692 176 - +>chr1:8851545-885174 36 ATGATT 0.866692 122 - +>chr1:8797248-879744 35 ATGATT 0.866692 138 + +>chr1:7444756-744495 33 ATGATT 0.866692 13 - +>chr1:6090845-609104 11 ATGATT 0.866692 167 + +>chr1:5168670-516887 9 ATGATT 0.866692 31 + +>chr1:5072821-507302 8 ATGATT 0.866692 84 + +>chr1:4774948-477514 6 ATGATT 0.866692 139 + +>chr1:8851545-885174 36 ATGAAT 0.865302 86 - +>chr1:7303541-730374 27 ATGAAT 0.865302 171 - +>chr1:7303541-730374 27 ATGAAT 0.865302 107 + +>chr1:6437830-643803 19 ATGAAT 0.865302 130 - +>chr1:6183701-618390 12 ATGAAT 0.865302 71 + +>chr1:5072821-507302 8 ATGAAT 0.865302 127 + +>chr1:5072821-507302 8 ATGAAT 0.865302 87 - +>chr1:5072821-507302 8 ATGAAT 0.865302 21 + +********** + +12) TTGAAA (TTTCAA) 1.693 + +Matrix: MAT12 TTGAAA +A 0.08972 0.0524 0.2142 0.8734 0.7765 0.8008 +C 0.08565 0.01703 0.08046 0.009563 0.09992 0.05955 +G 0.07526 0.08263 0.6446 0.04169 0.04608 0.03337 +T 0.7494 0.8479 0.06075 0.07533 0.07746 0.1063 + +OCCURRENCES: +>chr1:9956513-995671 48 TTGAAA 1 180 + +>chr1:9948240-994844 47 TTGAAA 1 112 + +>chr1:9932599-993279 45 TTGAAA 1 125 - +>chr1:8797248-879744 35 TTGAAA 1 175 - +>chr1:7444756-744495 33 TTGAAA 1 
161 - +>chr1:7388025-738822 30 TTGAAA 1 2 - +>chr1:7309743-730994 28 TTGAAA 1 146 - +>chr1:7209035-720923 26 TTGAAA 1 54 - +>chr1:6750817-675101 24 TTGAAA 1 153 + +>chr1:6588721-658892 21 TTGAAA 1 101 + +>chr1:6396504-639670 18 TTGAAA 1 179 + +>chr1:6266967-626716 15 TTGAAA 1 176 - +>chr1:6090845-609104 11 TTGAAA 1 102 + +>chr1:4562216-456241 4 TTGAAA 1 148 - +>chr1:3467418-346761 1 TTGAAA 1 190 + +>chr1:6205539-620573 13 TTAAAA 0.903286 157 + +>chr1:6205539-620573 13 TTAAAA 0.903286 28 + +>chr1:6090845-609104 11 TTAAAA 0.903286 171 + +>chr1:6090845-609104 11 TTAAAA 0.903286 151 - +>chr1:5072821-507302 8 TTAAAA 0.903286 114 + +>chr1:4024990-402519 2 TTAAAA 0.903286 81 + +>chr1:3467418-346761 1 TTAAAA 0.903286 184 - +>chr1:9942670-994287 46 TTGACA 0.847972 92 + +>chr1:8851545-885174 36 TTGACA 0.847972 190 + +>chr1:8797248-879744 35 TTGACA 0.847972 125 - +>chr1:7768736-776893 34 TTGACA 0.847972 75 - +>chr1:7444756-744495 33 TTGACA 0.847972 9 - +>chr1:7209035-720923 26 TTGACA 0.847972 122 - +>chr1:6437830-643803 19 TTGACA 0.847972 7 - +>chr1:6205539-620573 13 TTGACA 0.847972 136 + +>chr1:6090845-609104 11 TTGACA 0.847972 181 + +>chr1:5805347-580554 10 TTGACA 0.847972 119 - +>chr1:4833767-483396 7 TTGACA 0.847972 30 - +>chr1:4024990-402519 2 TTGACA 0.847972 21 + +>chr1:9942670-994287 46 TTGAAT 0.843956 162 + +>chr1:7444756-744495 33 TTGAAT 0.843956 134 + +>chr1:7413722-741392 32 TTGAAT 0.843956 123 - +>chr1:6721868-672206 23 TTGAAT 0.843956 85 + +>chr1:6277750-627795 16 TTGAAT 0.843956 35 - +>chr1:6277750-627795 16 TTGAAT 0.843956 25 + +>chr1:6183701-618390 12 TTGAAT 0.843956 87 + +>chr1:6090845-609104 11 TTGAAT 0.843956 119 - +>chr1:6090845-609104 11 TTGAAT 0.843956 115 + +>chr1:5072821-507302 8 TTGAAT 0.843956 155 - +>chr1:9978791-997899 50 TTGATA 0.842925 71 + +>chr1:9568576-956877 41 TTGATA 0.842925 190 - +>chr1:9460371-946057 38 TTGATA 0.842925 85 - +>chr1:6588721-658892 21 TTGATA 0.842925 146 - +>chr1:6205539-620573 13 TTGATA 0.842925 96 + +>chr1:4833767-483396 7 
TTGATA 0.842925 23 - +********** + +13) ATTTTA (TAAAAT) 1.691 + +Matrix: MAT13 ATTTTA +A 0.7686 0.09324 0.05315 0.07052 0.1079 0.7442 +C 0.05296 0.05566 0.02564 0.04286 0.09559 0.0825 +G 0.06642 0.0539 0.07598 0.07754 0.08916 0.06427 +T 0.112 0.7972 0.8452 0.8091 0.7073 0.1091 + +OCCURRENCES: +>chr1:9956513-995671 48 ATTTTA 1 185 + +>chr1:9942670-994287 46 ATTTTA 1 135 - +>chr1:9813405-981360 44 ATTTTA 1 132 - +>chr1:9013525-901372 37 ATTTTA 1 135 - +>chr1:9013525-901372 37 ATTTTA 1 113 - +>chr1:8851545-885174 36 ATTTTA 1 104 - +>chr1:8797248-879744 35 ATTTTA 1 147 - +>chr1:7768736-776893 34 ATTTTA 1 135 - +>chr1:7413722-741392 32 ATTTTA 1 11 + +>chr1:7388801-738900 31 ATTTTA 1 63 + +>chr1:7386917-738711 29 ATTTTA 1 31 - +>chr1:6472202-647240 20 ATTTTA 1 33 + +>chr1:6266967-626716 15 ATTTTA 1 3 - +>chr1:6262414-626261 14 ATTTTA 1 184 + +>chr1:6205539-620573 13 ATTTTA 1 158 - +>chr1:6205539-620573 13 ATTTTA 1 29 - +>chr1:6090845-609104 11 ATTTTA 1 150 + +>chr1:5805347-580554 10 ATTTTA 1 155 + +>chr1:5072821-507302 8 ATTTTA 1 115 - +>chr1:4833767-483396 7 ATTTTA 1 122 - +>chr1:4024990-402519 2 ATTTTA 1 118 - +>chr1:3467418-346761 1 ATTTTA 1 183 + +>chr1:3467418-346761 1 ATTTTA 1 134 + +>chr1:8851545-885174 36 ATTTGA 0.857659 188 + +>chr1:8851545-885174 36 ATTTGA 0.857659 184 - +>chr1:8851545-885174 36 ATTTGA 0.857659 96 - +>chr1:6719561-671976 22 ATTTGA 0.857659 142 - +>chr1:5805347-580554 10 ATTTGA 0.857659 180 + +>chr1:5805347-580554 10 ATTTGA 0.857659 121 - +>chr1:4833767-483396 7 ATTTGA 0.857659 126 + +>chr1:4833767-483396 7 ATTTGA 0.857659 25 - +>chr1:9813405-981360 44 ATTGTA 0.83155 28 - +>chr1:9535189-953538 39 ATTGTA 0.83155 43 - +>chr1:7386917-738711 29 ATTGTA 0.83155 47 - +>chr1:6878570-687877 25 ATTGTA 0.83155 36 + +>chr1:6090845-609104 11 ATTGTA 0.83155 87 + +>chr1:5805347-580554 10 ATTGTA 0.83155 127 - +>chr1:4774948-477514 6 ATTGTA 0.83155 120 + +>chr1:9948240-994844 47 ATTATA 0.829934 57 - +>chr1:7444756-744495 33 ATTATA 0.829934 71 - 
+>chr1:6721868-672206 23 ATTATA 0.829934 124 + +>chr1:6266967-626716 15 ATTATA 0.829934 19 + +>chr1:6090845-609104 11 ATTATA 0.829934 93 + +>chr1:9956513-995671 48 ATGTTA 0.822869 137 - +>chr1:9574131-957433 42 ATGTTA 0.822869 104 + +>chr1:9554705-955490 40 ATGTTA 0.822869 47 + +>chr1:8851545-885174 36 ATGTTA 0.822869 100 + +>chr1:8797248-879744 35 ATGTTA 0.822869 103 - +>chr1:7768736-776893 34 ATGTTA 0.822869 80 + +>chr1:6090845-609104 11 ATGTTA 0.822869 154 - +********** + +14) TAAACA (TGTTTA) 1.685 + +Matrix: MAT14 TAAACA +A 0.1209 0.7643 0.8407 0.816 0.1373 0.8505 +C 0.06995 0.06399 0.03895 0.04513 0.685 0.05353 +G 0.09693 0.07741 0.05489 0.04938 0.08995 0.02176 +T 0.7122 0.09432 0.06543 0.08946 0.08774 0.07422 + +OCCURRENCES: +>chr1:9948240-994844 47 TAAACA 1 130 + +>chr1:9942670-994287 46 TAAACA 1 81 + +>chr1:9932599-993279 45 TAAACA 1 148 - +>chr1:9574131-957433 42 TAAACA 1 5 + +>chr1:9013525-901372 37 TAAACA 1 161 - +>chr1:7768736-776893 34 TAAACA 1 169 + +>chr1:7388801-738900 31 TAAACA 1 90 + +>chr1:6472202-647240 20 TAAACA 1 12 - +>chr1:6360131-636033 17 TAAACA 1 144 + +>chr1:6360131-636033 17 TAAACA 1 6 - +>chr1:5805347-580554 10 TAAACA 1 18 + +>chr1:5072821-507302 8 TAAACA 1 149 - +>chr1:5072821-507302 8 TAAACA 1 15 + +>chr1:4562216-456241 4 TAAACA 1 167 + +>chr1:7209035-720923 26 TAAAGA 0.862934 168 + +>chr1:6721868-672206 23 TAAAGA 0.862934 96 - +>chr1:6588721-658892 21 TAAAGA 0.862934 84 - +>chr1:6437830-643803 19 TAAAGA 0.862934 173 - +>chr1:6396504-639670 18 TAAAGA 0.862934 5 + +>chr1:6205539-620573 13 TAAAGA 0.862934 190 + +>chr1:6205539-620573 13 TAAAGA 0.862934 25 - +>chr1:4024990-402519 2 TAAAGA 0.862934 36 + +>chr1:9962925-996312 49 TTAACA 0.845673 121 + +>chr1:9932599-993279 45 TTAACA 0.845673 43 - +>chr1:9574131-957433 42 TTAACA 0.845673 105 - +>chr1:9554705-955490 40 TTAACA 0.845673 48 - +>chr1:8851545-885174 36 TTAACA 0.845673 101 - +>chr1:6266967-626716 15 TTAACA 0.845673 42 - +>chr1:6090845-609104 11 TTAACA 0.845673 153 + 
+>chr1:4024990-402519 2 TTAACA 0.845673 79 - +>chr1:9956513-995671 48 TCAACA 0.838688 85 + +>chr1:9568576-956877 41 TCAACA 0.838688 88 - +>chr1:8797248-879744 35 TCAACA 0.838688 127 + +>chr1:6878570-687877 25 TCAACA 0.838688 2 - +>chr1:6262414-626261 14 TCAACA 0.838688 130 - +>chr1:6205539-620573 13 TCAACA 0.838688 134 - +>chr1:6183701-618390 12 TCAACA 0.838688 54 - +>chr1:9932599-993279 45 TAATCA 0.832635 46 + +>chr1:9013525-901372 37 TAATCA 0.832635 175 + +>chr1:6878570-687877 25 TAATCA 0.832635 91 + +>chr1:6090845-609104 11 TAATCA 0.832635 168 - +>chr1:4774948-477514 6 TAATCA 0.832635 140 - +>chr1:4024990-402519 2 TAATCA 0.832635 186 - +>chr1:3467418-346761 1 TAATCA 0.832635 50 + +>chr1:9956513-995671 48 TATACA 0.821406 153 - +>chr1:9948240-994844 47 TATACA 0.821406 126 - +>chr1:9568576-956877 41 TATACA 0.821406 107 - +>chr1:6437830-643803 19 TATACA 0.821406 105 + +>chr1:6205539-620573 13 TATACA 0.821406 165 + +>chr1:6183701-618390 12 TATACA 0.821406 10 + +********** + +15) ATGATT (AATCAT) 1.678 + +Matrix: MAT15 ATGATT +A 0.7798 0.03895 0.1394 0.8016 0.09996 0.05479 +C 0.04628 0.04674 0.09287 0.04385 0.1069 0.06237 +G 0.07995 0.03341 0.6306 0.0422 0.05873 0.09202 +T 0.09401 0.8809 0.1371 0.1123 0.7344 0.7908 + +OCCURRENCES: +>chr1:9956513-995671 48 ATGATT 1 35 - +>chr1:9013525-901372 37 ATGATT 1 176 - +>chr1:8851545-885174 36 ATGATT 1 122 - +>chr1:8797248-879744 35 ATGATT 1 138 + +>chr1:7444756-744495 33 ATGATT 1 13 - +>chr1:6090845-609104 11 ATGATT 1 167 + +>chr1:5168670-516887 9 ATGATT 1 31 + +>chr1:5072821-507302 8 ATGATT 1 84 + +>chr1:4774948-477514 6 ATGATT 1 139 + +>chr1:9962925-996312 49 ATAATT 0.885497 189 + +>chr1:8797248-879744 35 ATAATT 0.885497 179 - +>chr1:7768736-776893 34 ATAATT 0.885497 107 - +>chr1:6719561-671976 22 ATAATT 0.885497 182 + +>chr1:6719561-671976 22 ATAATT 0.885497 146 + +>chr1:6588721-658892 21 ATAATT 0.885497 143 - +>chr1:6437830-643803 19 ATAATT 0.885497 127 + +>chr1:6396504-639670 18 ATAATT 0.885497 54 - +>chr1:6266967-626716 15 
ATAATT 0.885497 18 - +>chr1:6090845-609104 11 ATAATT 0.885497 92 - +>chr1:6090845-609104 11 ATAATT 0.885497 83 - +>chr1:5805347-580554 10 ATAATT 0.885497 66 + +>chr1:9962925-996312 49 ATTATT 0.884969 188 - +>chr1:9962925-996312 49 ATTATT 0.884969 133 + +>chr1:9942670-994287 46 ATTATT 0.884969 73 + +>chr1:8797248-879744 35 ATTATT 0.884969 192 + +>chr1:6878570-687877 25 ATTATT 0.884969 41 - +>chr1:6719561-671976 22 ATTATT 0.884969 181 - +>chr1:6719561-671976 22 ATTATT 0.884969 145 - +>chr1:6472202-647240 20 ATTATT 0.884969 5 + +>chr1:6437830-643803 19 ATTATT 0.884969 126 - +>chr1:6396504-639670 18 ATTATT 0.884969 55 + +>chr1:6090845-609104 11 ATTATT 0.884969 84 + +>chr1:6090845-609104 11 ATTATT 0.884969 43 + +>chr1:5805347-580554 10 ATTATT 0.884969 65 - +>chr1:4024990-402519 2 ATTATT 0.884969 188 + +>chr1:3467418-346761 1 ATTATT 0.884969 24 - +>chr1:9978791-997899 50 ATCATT 0.874651 77 - +>chr1:9932599-993279 45 ATCATT 0.874651 18 + +>chr1:9013525-901372 37 ATCATT 0.874651 40 - +>chr1:7444756-744495 33 ATCATT 0.874651 104 - +>chr1:6277750-627795 16 ATCATT 0.874651 98 + +>chr1:6205539-620573 13 ATCATT 0.874651 38 + +>chr1:5168670-516887 9 ATCATT 0.874651 30 - +>chr1:4774948-477514 6 ATCATT 0.874651 138 - +>chr1:6472202-647240 20 ATGACT 0.853739 143 - +>chr1:6183701-618390 12 ATGACT 0.853739 175 - +>chr1:4662531-466273 5 ATGACT 0.853739 126 + +>chr1:9942670-994287 46 ACGATT 0.805545 105 - +>chr1:4833767-483396 7 ACGATT 0.805545 63 - +>chr1:4662531-466273 5 ACGATT 0.805545 58 - +********** + +16) AGTATT (AATACT) 1.664 + +Matrix: MAT16 AGTATT +A 0.8169 0.03667 0.1017 0.7024 0.1984 0.05968 +C 0.01586 0.0534 0.06469 0.08485 0.01561 0.04079 +G 0.03184 0.739 0.03857 0.072 0.06428 0.04865 +T 0.1354 0.1709 0.7951 0.1407 0.7217 0.8509 + +OCCURRENCES: +>chr1:9574131-957433 42 AGTATT 1 164 - +>chr1:8851545-885174 36 AGTATT 1 143 - +>chr1:7444756-744495 33 AGTATT 1 137 - +>chr1:7388801-738900 31 AGTATT 1 75 - +>chr1:7209035-720923 26 AGTATT 1 10 - +>chr1:6262414-626261 14 AGTATT 1 
181 + +>chr1:5805347-580554 10 AGTATT 1 152 + +>chr1:4833767-483396 7 AGTATT 1 18 + +>chr1:4402453-440265 3 AGTATT 1 177 + +>chr1:9978791-997899 50 AGTAAT 0.881262 31 - +>chr1:9568576-956877 41 AGTAAT 0.881262 154 + +>chr1:9013525-901372 37 AGTAAT 0.881262 183 + +>chr1:7209035-720923 26 AGTAAT 0.881262 183 - +>chr1:6878570-687877 25 AGTAAT 0.881262 64 + +>chr1:6719561-671976 22 AGTAAT 0.881262 137 - +>chr1:6588721-658892 21 AGTAAT 0.881262 9 + +>chr1:6472202-647240 20 AGTAAT 0.881262 176 + +>chr1:6472202-647240 20 AGTAAT 0.881262 29 + +>chr1:6360131-636033 17 AGTAAT 0.881262 86 - +>chr1:6262414-626261 14 AGTAAT 0.881262 160 - +>chr1:6090845-609104 11 AGTAAT 0.881262 20 + +>chr1:5805347-580554 10 AGTAAT 0.881262 176 + +>chr1:5072821-507302 8 AGTAAT 0.881262 46 - +>chr1:4774948-477514 6 AGTAAT 0.881262 194 + +>chr1:3467418-346761 1 AGTAAT 0.881262 32 - +>chr1:9962925-996312 49 ATTATT 0.871068 188 - +>chr1:9962925-996312 49 ATTATT 0.871068 133 + +>chr1:9942670-994287 46 ATTATT 0.871068 73 + +>chr1:8797248-879744 35 ATTATT 0.871068 192 + +>chr1:6878570-687877 25 ATTATT 0.871068 41 - +>chr1:6719561-671976 22 ATTATT 0.871068 181 - +>chr1:6719561-671976 22 ATTATT 0.871068 145 - +>chr1:6472202-647240 20 ATTATT 0.871068 5 + +>chr1:6437830-643803 19 ATTATT 0.871068 126 - +>chr1:6396504-639670 18 ATTATT 0.871068 55 + +>chr1:6090845-609104 11 ATTATT 0.871068 84 + +>chr1:6090845-609104 11 ATTATT 0.871068 43 + +>chr1:5805347-580554 10 ATTATT 0.871068 65 - +>chr1:4024990-402519 2 ATTATT 0.871068 188 + +>chr1:3467418-346761 1 ATTATT 0.871068 24 - +>chr1:9956513-995671 48 TGTATT 0.845348 68 - +>chr1:9948240-994844 47 TGTATT 0.845348 84 + +>chr1:9942670-994287 46 TGTATT 0.845348 156 - +>chr1:9574131-957433 42 TGTATT 0.845348 48 + +>chr1:7768736-776893 34 TGTATT 0.845348 126 + +>chr1:5072821-507302 8 TGTATT 0.845348 43 + +>chr1:4024990-402519 2 TGTATT 0.845348 173 + +>chr1:3467418-346761 1 TGTATT 0.845348 114 + +>chr1:3467418-346761 1 TGTATT 0.845348 27 - +>chr1:3467418-346761 1 
TGTATT 0.845348 5 - +********** + +17) TTGAGT (ACTCAA) 1.574 + +Matrix: MAT17 TTGAGT +A 0.1122 0.08252 0.1535 0.8677 0.167 0.08711 +C 0.05038 0.01835 0.1439 0.01705 0.09193 0.05017 +G 0.1 0.04409 0.647 0.02826 0.5944 0.03802 +T 0.7373 0.855 0.05572 0.08702 0.1467 0.8247 + +OCCURRENCES: +>chr1:9978791-997899 50 TTGAGT 1 34 - +>chr1:9956513-995671 48 TTGAGT 1 83 - +>chr1:9574131-957433 42 TTGAGT 1 43 + +>chr1:9568576-956877 41 TTGAGT 1 90 + +>chr1:8851545-885174 36 TTGAGT 1 94 - +>chr1:7413722-741392 32 TTGAGT 1 165 + +>chr1:7388801-738900 31 TTGAGT 1 78 - +>chr1:7309743-730994 28 TTGAGT 1 133 + +>chr1:7209035-720923 26 TTGAGT 1 90 + +>chr1:6719561-671976 22 TTGAGT 1 140 - +>chr1:6262414-626261 14 TTGAGT 1 132 + +>chr1:6183701-618390 12 TTGAGT 1 56 + +>chr1:5072821-507302 8 TTGAGT 1 174 - +>chr1:4833767-483396 7 TTGAGT 1 3 + +>chr1:9942670-994287 46 TTGAAT 0.89953 162 + +>chr1:7444756-744495 33 TTGAAT 0.89953 134 + +>chr1:7413722-741392 32 TTGAAT 0.89953 123 - +>chr1:6721868-672206 23 TTGAAT 0.89953 85 + +>chr1:6277750-627795 16 TTGAAT 0.89953 35 - +>chr1:6277750-627795 16 TTGAAT 0.89953 25 + +>chr1:6183701-618390 12 TTGAAT 0.89953 87 + +>chr1:6090845-609104 11 TTGAAT 0.89953 119 - +>chr1:6090845-609104 11 TTGAAT 0.89953 115 + +>chr1:5072821-507302 8 TTGAAT 0.89953 155 - +>chr1:9932599-993279 45 TTGATT 0.89477 47 - +>chr1:9574131-957433 42 TTGATT 0.89477 147 - +>chr1:8851545-885174 36 TTGATT 0.89477 118 - +>chr1:8797248-879744 35 TTGATT 0.89477 7 + +>chr1:6878570-687877 25 TTGATT 0.89477 92 - +>chr1:4833767-483396 7 TTGATT 0.89477 128 + +>chr1:4562216-456241 4 TTGATT 0.89477 144 + +>chr1:4402453-440265 3 TTGATT 0.89477 130 + +>chr1:4024990-402519 2 TTGATT 0.89477 185 + +>chr1:9962925-996312 49 TTAAGT 0.884013 119 - +>chr1:9932599-993279 45 TTAAGT 0.884013 11 + +>chr1:9932599-993279 45 TTAAGT 0.884013 9 - +>chr1:9574131-957433 42 TTAAGT 0.884013 58 - +>chr1:9013525-901372 37 TTAAGT 0.884013 164 + +>chr1:8797248-879744 35 TTAAGT 0.884013 120 + +>chr1:8797248-879744 35 
TTAAGT 0.884013 84 - +>chr1:6396504-639670 18 TTAAGT 0.884013 109 - +>chr1:6360131-636033 17 TTAAGT 0.884013 43 - +>chr1:6396504-639670 18 TTGACT 0.881899 34 - +>chr1:4833767-483396 7 TTGACT 0.881899 6 - +>chr1:9813405-981360 44 ATGAGT 0.853079 136 + +>chr1:6588721-658892 21 ATGAGT 0.853079 75 - +>chr1:6472202-647240 20 ATGAGT 0.853079 20 + +>chr1:6277750-627795 16 ATGAGT 0.853079 42 - +>chr1:6262414-626261 14 ATGAGT 0.853079 178 + +>chr1:6205539-620573 13 ATGAGT 0.853079 20 + +********** + +18) TAAAAC (GTTTTA) 1.572 + +Matrix: MAT18 TAAAAC +A 0.09976 0.6641 0.8508 0.8037 0.7918 0.1444 +C 0.1021 0.08641 0.05693 0.04963 0.05966 0.5613 +G 0.05432 0.1092 0.04048 0.04923 0.04498 0.06765 +T 0.7438 0.1403 0.05175 0.09743 0.1036 0.2267 + +OCCURRENCES: +>chr1:9813405-981360 44 TAAAAC 1 140 - +>chr1:7388801-738900 31 TAAAAC 1 51 + +>chr1:7303541-730374 27 TAAAAC 1 12 + +>chr1:6472202-647240 20 TAAAAC 1 133 - +>chr1:4774948-477514 6 TAAAAC 1 124 + +>chr1:4662531-466273 5 TAAAAC 1 38 + +>chr1:4024990-402519 2 TAAAAC 1 82 + +>chr1:9956513-995671 48 TAAAAT 0.917837 185 - +>chr1:9942670-994287 46 TAAAAT 0.917837 135 + +>chr1:9813405-981360 44 TAAAAT 0.917837 132 + +>chr1:9013525-901372 37 TAAAAT 0.917837 135 + +>chr1:9013525-901372 37 TAAAAT 0.917837 113 + +>chr1:8851545-885174 36 TAAAAT 0.917837 104 + +>chr1:8797248-879744 35 TAAAAT 0.917837 147 + +>chr1:7768736-776893 34 TAAAAT 0.917837 135 + +>chr1:7413722-741392 32 TAAAAT 0.917837 11 - +>chr1:7388801-738900 31 TAAAAT 0.917837 63 - +>chr1:7386917-738711 29 TAAAAT 0.917837 31 + +>chr1:6472202-647240 20 TAAAAT 0.917837 33 - +>chr1:6266967-626716 15 TAAAAT 0.917837 3 + +>chr1:6262414-626261 14 TAAAAT 0.917837 184 - +>chr1:6205539-620573 13 TAAAAT 0.917837 158 + +>chr1:6205539-620573 13 TAAAAT 0.917837 29 + +>chr1:6090845-609104 11 TAAAAT 0.917837 150 - +>chr1:5805347-580554 10 TAAAAT 0.917837 155 - +>chr1:5072821-507302 8 TAAAAT 0.917837 115 + +>chr1:4833767-483396 7 TAAAAT 0.917837 122 + +>chr1:4024990-402519 2 TAAAAT 0.917837 
118 + +>chr1:3467418-346761 1 TAAAAT 0.917837 183 - +>chr1:3467418-346761 1 TAAAAT 0.917837 134 - +>chr1:9962925-996312 49 TTAAAC 0.871367 7 + +>chr1:9942670-994287 46 TTAAAC 0.871367 80 + +>chr1:9013525-901372 37 TTAAAC 0.871367 162 - +>chr1:9013525-901372 37 TTAAAC 0.871367 110 - +>chr1:8797248-879744 35 TTAAAC 0.871367 118 - +>chr1:7444756-744495 33 TTAAAC 0.871367 99 - +>chr1:7413722-741392 32 TTAAAC 0.871367 188 - +>chr1:7388801-738900 31 TTAAAC 0.871367 37 + +>chr1:6360131-636033 17 TTAAAC 0.871367 143 + +>chr1:6360131-636033 17 TTAAAC 0.871367 7 - +>chr1:5805347-580554 10 TTAAAC 0.871367 17 + +>chr1:5072821-507302 8 TTAAAC 0.871367 150 - +>chr1:4833767-483396 7 TAAATC 0.831007 94 - +>chr1:4662531-466273 5 TAAATC 0.831007 56 + +>chr1:9574131-957433 42 TAATAC 0.826572 163 + +>chr1:7768736-776893 34 TAATAC 0.826572 127 - +>chr1:5072821-507302 8 TAATAC 0.826572 44 - +>chr1:4833767-483396 7 TAATAC 0.826572 19 - +>chr1:3467418-346761 1 TAATAC 0.826572 26 + +>chr1:3467418-346761 1 TAATAC 0.826572 4 + +********** + +19) GTGAAT (ATTCAC) 1.568 + +Matrix: MAT19 GTGAAT +A 0.1593 0.04483 0.06949 0.8955 0.7175 0.1105 +C 0.07553 0.01876 0.09003 0.01629 0.08374 0.06563 +G 0.6152 0.0609 0.7443 0.03019 0.08214 0.05749 +T 0.15 0.8755 0.09618 0.05804 0.1166 0.7664 + +OCCURRENCES: +>chr1:9956513-995671 48 GTGAAT 1 79 - +>chr1:7209035-720923 26 GTGAAT 1 156 - +>chr1:6878570-687877 25 GTGAAT 1 53 + +>chr1:6588721-658892 21 GTGAAT 1 30 - +>chr1:6437830-643803 19 GTGAAT 1 123 + +>chr1:6277750-627795 16 GTGAAT 1 29 - +>chr1:6266967-626716 15 GTGAAT 1 15 + +>chr1:6262414-626261 14 GTGAAT 1 174 + +>chr1:5168670-516887 9 GTGAAT 1 154 - +>chr1:4402453-440265 3 GTGAAT 1 133 - +>chr1:4024990-402519 2 GTGAAT 1 62 + +>chr1:9574131-957433 42 ATGAAT 0.89386 83 + +>chr1:9574131-957433 42 ATGAAT 0.89386 51 - +>chr1:9013525-901372 37 ATGAAT 0.89386 117 - +>chr1:8851545-885174 36 ATGAAT 0.89386 86 - +>chr1:7303541-730374 27 ATGAAT 0.89386 171 - +>chr1:7303541-730374 27 ATGAAT 0.89386 107 + 
+>chr1:6437830-643803 19 ATGAAT 0.89386 130 - +>chr1:6183701-618390 12 ATGAAT 0.89386 71 + +>chr1:5072821-507302 8 ATGAAT 0.89386 127 + +>chr1:5072821-507302 8 ATGAAT 0.89386 87 - +>chr1:5072821-507302 8 ATGAAT 0.89386 21 + +>chr1:9942670-994287 46 TTGAAT 0.891676 162 + +>chr1:7444756-744495 33 TTGAAT 0.891676 134 + +>chr1:7413722-741392 32 TTGAAT 0.891676 123 - +>chr1:6721868-672206 23 TTGAAT 0.891676 85 + +>chr1:6277750-627795 16 TTGAAT 0.891676 35 - +>chr1:6277750-627795 16 TTGAAT 0.891676 25 + +>chr1:6183701-618390 12 TTGAAT 0.891676 87 + +>chr1:6090845-609104 11 TTGAAT 0.891676 119 - +>chr1:6090845-609104 11 TTGAAT 0.891676 115 + +>chr1:5072821-507302 8 TTGAAT 0.891676 155 - +>chr1:9956513-995671 48 GTGATT 0.860082 176 + +>chr1:9932599-993279 45 GTGATT 0.860082 129 - +>chr1:9568576-956877 41 GTGATT 0.860082 22 + +>chr1:9554705-955490 40 GTGATT 0.860082 126 + +>chr1:6472202-647240 20 GTGATT 0.860082 76 + +>chr1:4833767-483396 7 GTGATT 0.860082 188 - +>chr1:3467418-346761 1 GTGATT 0.860082 51 - +>chr1:9956513-995671 48 GTTAAT 0.849088 112 + +>chr1:9932599-993279 45 GTTAAT 0.849088 44 + +>chr1:9574131-957433 42 GTTAAT 0.849088 106 + +>chr1:9554705-955490 40 GTTAAT 0.849088 49 + +>chr1:9013525-901372 37 GTTAAT 0.849088 173 + +>chr1:6721868-672206 23 GTTAAT 0.849088 172 - +>chr1:4662531-466273 5 GTTAAT 0.849088 159 - +>chr1:9942670-994287 46 GTCAAT 0.847655 91 - +>chr1:7768736-776893 34 GTCAAT 0.847655 76 + +>chr1:7444756-744495 33 GTCAAT 0.847655 10 + +>chr1:7209035-720923 26 GTCAAT 0.847655 123 + +********** + +20) AATACA (TGTATT) 1.564 + +Matrix: MAT20 AATACA +A 0.8268 0.7796 0.1396 0.6772 0.1349 0.8374 +C 0.05488 0.04317 0.05118 0.06514 0.7076 0.03498 +G 0.04696 0.05148 0.06801 0.08048 0.04932 0.01976 +T 0.0714 0.1257 0.7412 0.1771 0.1081 0.1079 + +OCCURRENCES: +>chr1:9962925-996312 49 AATACA 1 83 + +>chr1:9956513-995671 48 AATACA 1 68 + +>chr1:9948240-994844 47 AATACA 1 84 - +>chr1:9942670-994287 46 AATACA 1 156 + +>chr1:9574131-957433 42 AATACA 1 48 - 
+>chr1:7768736-776893 34 AATACA 1 126 - +>chr1:5072821-507302 8 AATACA 1 43 - +>chr1:4024990-402519 2 AATACA 1 173 - +>chr1:3467418-346761 1 AATACA 1 114 - +>chr1:3467418-346761 1 AATACA 1 27 + +>chr1:3467418-346761 1 AATACA 1 5 + +>chr1:9956513-995671 48 AATTCA 0.883548 78 + +>chr1:9942670-994287 46 AATTCA 0.883548 163 - +>chr1:9013525-901372 37 AATTCA 0.883548 116 + +>chr1:8851545-885174 36 AATTCA 0.883548 85 + +>chr1:7303541-730374 27 AATTCA 0.883548 56 + +>chr1:6878570-687877 25 AATTCA 0.883548 86 - +>chr1:6878570-687877 25 AATTCA 0.883548 54 - +>chr1:6721868-672206 23 AATTCA 0.883548 72 + +>chr1:6437830-643803 19 AATTCA 0.883548 129 + +>chr1:6277750-627795 16 AATTCA 0.883548 28 + +>chr1:6277750-627795 16 AATTCA 0.883548 26 - +>chr1:6266967-626716 15 AATTCA 0.883548 16 - +>chr1:6090845-609104 11 AATTCA 0.883548 118 + +>chr1:6090845-609104 11 AATTCA 0.883548 116 - +>chr1:5072821-507302 8 AATTCA 0.883548 154 + +>chr1:9956513-995671 48 AATATA 0.860396 155 - +>chr1:9942670-994287 46 AATATA 0.860396 119 - +>chr1:9813405-981360 44 AATATA 0.860396 31 + +>chr1:9013525-901372 37 AATATA 0.860396 78 + +>chr1:7768736-776893 34 AATATA 0.860396 2 + +>chr1:7413722-741392 32 AATATA 0.860396 87 - +>chr1:7413722-741392 32 AATATA 0.860396 85 + +>chr1:6472202-647240 20 AATATA 0.860396 2 - +>chr1:6360131-636033 17 AATATA 0.860396 26 + +>chr1:9813405-981360 44 ATTACA 0.84772 147 + +>chr1:6878570-687877 25 ATTACA 0.84772 38 - +>chr1:6719561-671976 22 ATTACA 0.84772 127 + +>chr1:6588721-658892 21 ATTACA 0.84772 140 - +>chr1:6090845-609104 11 ATTACA 0.84772 89 - +>chr1:5805347-580554 10 ATTACA 0.84772 147 + +>chr1:5805347-580554 10 ATTACA 0.84772 125 + +>chr1:4833767-483396 7 ATTACA 0.84772 107 - +>chr1:4774948-477514 6 ATTACA 0.84772 9 - +>chr1:7388801-738900 31 AATACT 0.830134 75 + +>chr1:7209035-720923 26 AATACT 0.830134 10 + +>chr1:6262414-626261 14 AATACT 0.830134 181 - +>chr1:5805347-580554 10 AATACT 0.830134 152 - +>chr1:4833767-483396 7 AATACT 0.830134 18 - 
+>chr1:4402453-440265 3 AATACT 0.830134 177 - +********** +
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/weeder_in.fa Wed Nov 19 07:56:27 2014 -0500 @@ -0,0 +1,100 @@ +>chr1:3467418-3467618 +CCATAATACACCCTTGTATAGCTAATAATACATTACTTTGATCCTGTGCTAATCACATTGATCCTGGGAAAAGGAAACTCACATACCTCTGTGTATGAGGAAGCTCACTGTACTGTATTCTTCAAGGAGTGGAATTTTATCACCACTTCACACCATTTGGAACAGGCAGAGGAATTCTAGGCATTTTAATTGAAAATTCA +>chr1:4024990-4025190 +CTTACTTTGCAGTGATCCCGTTGACACAAACTGTGTAAAGAAATCAGAGTGAGAACAGGTGGTGAATGTGACTCAGAGTGTTAAAACATGTGCACGTAGTTCACTTGCTGTCTTACCTAAAATGTAGCACAGCAGTGTGCCCAAAGGAATTTGGGGGTCTAGTAGATCTACATGTATTTATTGCTTGATTATTTCTCTTG +>chr1:4402453-4402653 +TAAAACATGTCTCTAGTAGTCACAGTGCCCACACTGCTGATGGAGTCACCTTTTCCAGGGAACGGCTGCTGCTCTCGATATCTGGTGGATCTCTGGAAAGACTTGTGCTGATCTCTCTCTGCCCCTTCCTTGATTCACATCTCAAGGGACCGAGAAGGGAGGGAAAACACCAGTCCAGTATTTCCTATCAGTTCAGCGGG +>chr1:4562216-4562416 +TTGCTGAGCCCTGCTGCCCGCACTGCAACGCTGGGCTCTCTCATCACCTTTGCCAACACCTGCCTCTTCTGTGCAGCCTTCCATCAAGTATGACGGGAGCCCTCACCAAGGACGATTCTAGACATATTTAGCAGCAATATTGTTTGATTTCAACTAGACATTGTTCTAAACACAAAGGAGTTCAAGGCTGGGGCTTTGAT +>chr1:4662531-4662731 +GTCAACTTCAGCCTCCTGGGACAACGCACAATGTGAGTAAAACTGCCTTTCATCATAAATCGTTCTTACCCACCCCGAATCCCAGGGAGAACTGACAAACAGCCTGCCTGTGAGAGGAAAGGCAGATGACTCCTTCAACTTCGAGGTAGGAGCAAAACATTAACCCAGTGCAACCACAACAATCTAGGCAGGATGAAGGG +>chr1:4774948-4775148 +CGTAAATATGTAATTAGATCACATATGGCTAAGGAAAACTCCAATTTTTGCTAGGGACACATTCTAATTTCCTAAATTCCTAAAAGATAAGCTCCTGGCAACTTGTCCCCCTCCACAGAATTGTAAAACTGATGTGAAATGATTAATTGCTTCGGTAAAGTTCCTCATGTAGCTGTAAGTTCCCCAGAAATAAAGTAATT +>chr1:4833767-4833967 +TTTTGAGTCAAGATAGAAGTATTATCAAATGTCAAGAGCGGGAGCAGAGTTCTTGGGAGCTTAATCGTTACTTAGATGAGAAACACCTGGATGGATTTAGGGTACTTGTAATGGCCTAGAATAAAATTTGATTGATGGCTAATCTCTGGGGTATAGTCCACTGTTGTACAACCACTAAGTGAGGTTAAATCACTTGCTTC +>chr1:5072821-5073021 +AAAATGAAGTTAGGTAAACAATGAATAACTTTTTCTGAGACATGTATTACTGGTGGGTCTACATATACTTATACCATGGGTCTATGATTCATTTCTCAGTCTGAGTCTTATGCTTAAAATCGGAGTATGAATGCCTGGCCCGCTGAGATGTTTAATTCAAACAAACCCAAACCACTCAAAGGTCATAAACCAAGCTTCTT +>chr1:5168670-5168870 
+AAAGTCAGCACTCCTCAGGAACTTGTGAAAATGATTTCCATCTGCTAAAAGAAAACATTTCCCTTGCCTCTGGCAGAATGGACATATGGGATATGCATGGACCCCTCTCAGCAGATCAGAGCCCATGCAGGCCTCCAGGCTCCCACAGTCCCTATTCACAAGTACTTAGTCCTTGCTGCCCTCCCCACCTACTTATCCTC +>chr1:5805347-5805547 +TCTATTTCAGAGAAGGTTAAACAAAAAACAGGTCTGGATATTTTCACAACCACTTGAAGTCAGCAATAATTTCTTAGTCTGACAAGGACCATGAGAAATTCTTTCTATACCTTGTCTGTGTCAAATTACAATGTGAAGTTTCCAGAATTACAGTATTTTATTTCTGAACTTGCAAAGTAATTTGAACAGGTTTTTCCTGG +>chr1:6090845-6091045 +GAGCCATTTTCATATGTACAGTAATGCTTTTCACAAAGTTGCATTATTCTAGATGCGTATGTGTGTGTCTGCAGACTGAACCAATTATTGTAATTATACTCTTGAAAAAACTGCTTGAATTCAATTTAAAGTGGCTTCTGCTAGCTGTCATTTTAACATCCTGCTTATGATTAAAAAGACTTGACACTTGGAATAAACTA +>chr1:6183701-6183901 +GTAATTAGTTATACAAGAACACCTGTCACAGTGGCTGATCCATGACAGGTCTTTGTTGAGTGAACTAACCATGAATCAGTAAGACATTGAATGCAGGGCATGCTGAGTGTCATGGTGTGGACACGTGTCAGACGGTGGCTGACATCCTTAGGAACTGCAATGCCTAAGGGAAGTAGTCATAGGTATAGCTGGTGTTCTCC +>chr1:6205539-6205739 +TAATCCTGTCCCTTCCGTCATGAGTCTTTAAAATGTGATCATTCCATGAACTGGTACATAGGACTGATGCCACGGAACACTCGTGGACTCACTTATTGATAGAGTCCCTGCAAGACATTCAGTTGGGTGCGGATGTTGACAAGTCTATTATGAGCGTTAAAATTTATACATTTTGCTAACCATTTTCTGTAAAGAATTGG +>chr1:6262414-6262614 +CTGAGTCCAAGTTGTGTGTGTGTTGCTGCTGCTGTGCTAGGCACTGGTTACAGCTTCCTTCACACACTGACCGACTGCCAGCCACTCAGAAGTCTGGAGGTTGTCATTCGTTCCTTCAGTAGGTGGCAGTGTTGAGTCAGTTATTGGTTCATCACTTAGATTACTAGTGTTCCGTGAATGAGTATTTTAAAAAGCACACA +>chr1:6266967-6267167 +ATTAAAATCTGATAGTGAATTAtagttacctttctgttggctgttaaaaagccaaggtaacctgtagaagaaagggtggtgtttggcatatggtttcagatgtccatgatgatgatggagcagaggtgacaggtggcagacagctaattggagcagcagcagctgagagttcacatttcaacTCTCTTCCAGGTTCAACC +>chr1:6277750-6277950 +CCCGTCTCCTTTCAGCTTGTCTTATTGAATTCACATTCAAAACTCATTGCCACCATCAAGGAGGAAAAATCTGGAAAACAAAAGTGCTCTGCTATCTATCATTCCAAGACCAGCTTACAGCTGCCAGAAACACCCAGAAAAGCAAGCATCTCAGGAAGCTGAAATCAGTCCTCAGAGCTAATGCCAGGAAAGGGCTCTCT +>chr1:6360131-6360331 
+gtttttgtttaaggcagagttttgcaatatagtccagggtaaacttaagctccagtcctcctgcttcagcctccggagtgccaagattactggcatgtgccaccctgactagttatgtgggttctTTCTTAAAGCATTTTTATTAAACATGACAGAGGTTCCTAGCCTGATGCTTTGCTGTACACTTACATAGAAAATGA +>chr1:6396504-6396704 +TAAATAAAGATCAAACAACAAACAAAACTGGGAAGTCAACTGATCACTGCGGGAATTATTTTTAATGTAGGGTCATCTTTCTTCCACACGGTGGCGCCTCGCTGTCAGACTTAAAAACACTGAAACATTGTGGCCACCAACAACTCCAGATGGGCTTCTCTGTGCTAGTAGTTTCATTTTGAAACAGAAAGTCAGTACTT +>chr1:6437830-6438030 +GTTAAATGTCAAGTTCTTGCACCCCCTGAACCTCGGTGCAAGGAAGGCAGGGCTGAAAGGAGGGGATTGTGTCTTCTCCAGGAAATGTCTGAAGTTTTGTTGGATATACAACCGCTGCAGGTGTGAATAATTCATCTTTTGTTTCAGTCCTGAATGCAGTCCAGGAGTCAGCTCTTTACAAAGCCTCATCTCACTTACTG +>chr1:6472202-6472402 +ATATATTATTCTGTTTAGGATGAGTGACAGTAATTTTACTAGTTCTTAGTGTCCACCTTATTTGGTGGCTAGGATGTGATTGTGGGACAGAAATGTACAGTGACAGAGTGCTAAGTAGTTCTATGCAGGTTTGTTTTAGGCTAGTCATTTATGTTGCATTCCCATAGGTCCCAACAGTAATCTCCCTTATAGTTACCATA +>chr1:6588721-6588921 +GGTCAGTTAGTAATGCCTTCCATGTGAGCATTCACCCAAGTGTGGAATCTCCAGGTGGGATCTCTCCTGACAGGACTCATTTTTCTTTATAAAAGTAGTCTTGAAATTTCCCTTTTTGCTCTGATGAGCACTGTCTAGTTGTAATTATCAATAACAATAAATGTAGACACAAAATGTTTGGCATATTGTGATGCAGTGTC +>chr1:6719561-6719761 +AAGTGACAGCGCTCCTGCAAGCTCTCCTGCTGTATCTTGAGGCACTTTTCTTTTCAGTATCGGTGCTGTCTAATTCGTGAGCTTCTGTGAAGCAAGCGCAGGAGGGTACAAGGCCTACAGGGAGGAATTACACCTAATTACTCAAATAATTGCTAGGCTGTTGCCTTCTTAGTCACTGAAAATAATTTCCAGGTCCCTGG +>chr1:6721868-6722068 +ACTGTGTCCTGGAGCACATGGGGTTTGTATCTGAAACCTGTAAAGTACATTGGAACCTTTCTCAAGAGTCTAATTCAGTTAGCCTTGAATGAGAGTCTTTATAGTGACTGCCATTCTTCTGCTATTATACCAACCACAAAGATTTAAAGTTTTGTGTTTCCTTGTCTGTGGATTAACTACAGGTGACAAGGAAAGGCAAG +>chr1:6750817-6751017 +GTTCTAGTCTCCAGGGAGGGAAGCCACAGCAGATGGGGTTAGGGCAGAGGGCTGACGCATCTGACAGCAAACGGCGGTGAGGCTGCAGAGTAGAAACTATGAGCGTCAGGAAGCTCCTTCTCCCAGGAGACCCAACTAACCACTGAAGAAAGTTGAAAATGAGCCCTTTGAGAAACGCTATTAAAAGTAGTAAAAAAATG +>chr1:6878570-6878770 
+ATGTTGATCTTGCCTCATGGGCTTTTGTCAGGCTGATTGTAATAATGCATATGTGAATTGTGGAGTAATGACATGTTCTATGTCCTGAATTAATCAAAACTCACATCCTTCCTGTCTTTGTACATGTGGAAACAGATGTTCTGCATTATCAGCAATCTGAACCCTGCCAGCCAGCCAGCCTCCTCCATCAGTCTCTTTTC +>chr1:7209035-7209235 +aatcaggaaaatacttcacagtcttgctttcaggcaatctgatgaactaattttttcaactgatgatgcttgtcccaaataactctagcttgagtgatggtgacaaaaaTCCTCAAACACATGTCAATGTGAACATTTTGCCAAAAGAGGGCAGCATTCACCTACTATAAAGAGCCTGAAAAATTACTGCAGAGGAGGAT +>chr1:7303541-7303741 +AATTGCTTCATTAAAACCAAGTTTTTCTTTGTTCATTAGGCGTTAGCCAGATGGGAATTCAGTGTTTTTAAGCAGACACTCACATGGGGTTTTGTTTCTGACATTGATGAATGACTGCCTGCATCCCAAGATGGAAGTTTCCACCCTGGGCTCTGACTGCAACTTTTGTTATTCATAGCAGAAGTCACACCAGTCCACAG +>chr1:7309743-7309943 +ttcatccttctctcctccccccacccccacctctcctcgtcctttttcacttccagtccccctttcaggttatccagttcatccaatgctgctggagagctacatgtagtgaggtttgggtggagaggagctttgagttttgagctttcaaaagccatcggttgtttccagtgtgttttctctgcttcctatgtgtggtt +>chr1:7386917-7387117 +ttcaatcctttaaaaatatccaatatcttttaaaatccaaagtctttacaattaaaagtctcttaactgtgggatctgagttcgaggccagcctagagtacagagtgagttccagggctacacagagaaacccagcctcaaaaaaaaaacaaaaAACAAAACAAAACAAAAAGCAAACAAAAAGCGAGCGGGAGCATTGC +>chr1:7388025-7388225 +CTTTCAATCAGGAAGCTTCCCCACTTGCTATTTCCAGGAGCAGGGAATAGCGCCAGAAGTCCCGCCCGCCCGGCTGTGACTGACAGCCGCGCCTTCCAATGGAAGGCTGGATTTCTTCCCCCCCCCCTCCCCCACACACACCCACCGTCCATGTCTGACAAGAGGCAGTGTCTGAGGCTTTTTTTCCCTCTATAAAGGAA +>chr1:7388801-7389001 +ggcgtgtgccacccctgcctggcGACCCTAACACTCTTAAACCCCCCAAATAAAACCCTAGCATTTTAAATAAAAATACTCAAGAGCACTAAACACTACTCttaaaaaaaaaaaaaaaaaaaGCAGCAAAACCCTGACCCTCCCCCCCaaaaaaaaaaaaaaaaaaaaaaCTCTAAATGTCTGGGGCGGGGCCGGAGCAC +>chr1:7413722-7413922 +tacaggggcaattttacagagacaaattgcagagagaacaggtaaagacagaataagttagagactgagaacgagccagatcagaatatattgccagagttagtttgagactaagcagaactattcaatgaaaaggagagagaagtcagaccaagtcagtcaacttgagtcaacccagaattgctgagtttaaccagcca +>chr1:7444756-7444956 
+gagatatttgtcaatcatccttattttctaagtgacaggagtcatattcagaagtctttacctatgccagtataatgaagTGTTTTCCATTtgtggtggtttaaatgatgatggcctctataggcttctatgtttgaatactcggATTCTTCCAAAACAGTTTCAAAACTGTACCTGGAAACACTATTTAACTAGTCATA +>chr1:7768736-7768936 +aaatatagtggtggattaccttagggtaaggagctacgtctgttttctagctctcaagctacagttaccttacctgtcaatgttatcagaagtctggggaagaataaattatgatctaagagtcatgtattaattaaaatgacagagtttcatctatggtccattacctaaacagttcccaagtccctgcggtaaccatg +>chr1:8797248-8797448 +catttattgatttatgtaggctgaaccaccttcacatctgtgggatgaagtctactctatcatggtggtgctattttttatgtacttaattccatttcttcataacatctttatgtggtttaagtgtcaacataactatgatttcataaaatgagctgggaaaggtttcttctgtttcaattatgtggaacattattgga +>chr1:8851545-8851745 +gaggtgtagtggcttagaacagctctgctttatcagcacccaattttgtaagtcaaaagtcagacatgggatagatggactcttaattcatagactcaaatgttaaaatagaggGGGAATCAATCATCCATGAAGACAAAAGAATACTGGTGGTCTGTTTGGATTTCAGCCAGAGCAATCTGTTCAAATTTGACAGCAGA +>chr1:9013525-9013725 +AGAAACACCACACACACATTTCATTTTTCTGAAGAAAGAAATGATCTGCTGTTTCACTTACATTCCACACACCAGGGAATATAAAGTCTAAATAGCACTCTAGTCTGCAGTTTAAAATTCATGTGTCACAAGAATAAAATAACAAGTCAGAGTCTCCAGTTGTTTAAGTAAGGTTAATCATAAGTAATCTTTATGACTCT +>chr1:9460371-9460571 +ccaaccaccaagtcaccgggttcaaggcctgtgcttgactgggaaccaggctgtctatgcactgcctcacTGCACCGCAGCTGGTATCAATTtgtcttagtcagagttactattgttatgatgaaacaccatgagcaaagagcaaattggggaggaaatggcttatttagcttacatttccactgttcatcatcaaaaga +>chr1:9535189-9535389 +TTTACCCATAGACACTGTGGTGTAGACGCTCCATTCGGAAGCTACAATGCAGGCACTTCCAAGAGTTTGAGCAGCCCGCGTCCTACTGCACTACCTCTGCCCCACAGCATGCTGGGAAACGTAGTCCCAACCAGGTCCTGAGCTGGTTAGCCAACCCTCAGCGCCAGTCGGGCCAACATCCGGTGACGAATCCAAGTCCC +>chr1:9554705-9554905 +ACACCAGCCCTTTGTGTGCCCCAGGGCTCCAGGTGCTGTGTGGGGAATGTTAATGTCAGAAGCCCGGGACTTGGACCCAAGCCCAGGCTTCAGTTCACAGTATGTACTGTGTGCACACATTGGCAGTGATTCCAGGGGCCTGTATCCCTTCTCCCTTTAGGGAAGGGAATTTCGGCCATGCTGAGACATAGCTCTGGCCT +>chr1:9568576-9568776 
+TCTAAGTTGAGCCACGTCACTGTGATTTATTGTCACATGCCACAATAAGCATGTTTTGCTCTCTTGTCCCTTCTGCAGGAAGGGCTGTGTTGAGTTGGGCATTGTTTGTATAGTGCTGGACACAGCTGCAGGGTGTTTTTACGTGTTCAATGAAGTAATTCTGTTCTCAGAAAGCTCAGACAAGTACAGTATCAAAACAG +>chr1:9574131-9574331 +AGGGTAAACAGTTCCATTTGTGACCATGAAATCCCCCATCTGTTGAGTGTATTCATGACTTAAGAGTTCTCGGCAGAGACAGATGAATGAACCTGCCTCCCCGATGTTAATGACATCCATCATACAACTGAACATGTGGCTTCATAAATCAAGGGCTAATGGTAATACTGTCTGCTCGTGTCATGAACAAGCTATCCGAT +>chr1:9738400-9738600 +CTTCTCACTTCTCCACTGCCTTAGCCGTTGCCCCGAACGTAACGGCCACCACCCCACCCCGCACTCACACTCACTCACTCTCGCTCTCTCCCTCAGACACAGACATACACGCCCTCACTGAGACTGCGCAGGCGTAGCTTCTGCTCTGCCCTCTGGGAACCAGAGTCTTCCGGCTCCTCTCTCGCGAAGGAGTGCTAGGC +>chr1:9813405-9813605 +TTTAGTTTGCTAGCAGCTGTCAGGAAGTACAATATAGGTCGAAGGACCCCATGTTCAAATCTCTAATGTGAGGACAGCCTGGGCCCCTCAGGAAAGTGAAAGCGGTGTGTCTTTCCCGTTCTCTGGTTTTTTAAAATGAGTTTTAAATTACAAgcatgtgccgctggagtccaggtgttggcttctctggggttggggtt +>chr1:9932599-9932799 +CGGCAAAAACTTAAGTCATCATTGTGCCAGCTATTAAGGCCCTGTTAATCAAACCTCAAAGAAAAAAAAAAACCACACACACAATTTGTACCTTGTTATTGGCAAATGTAGTCCTGGCAGCTTGTTTCAATCACTCTCCATGACAAGTGTTTAGAAAATATTTGTTCAAGCACCTTCAAATGAGCAGGTTTTGTTGCACC +>chr1:9942670-9942870 +AAGAGAAACATAAATGACCCTAGTCAGAAATGCATGACTTTCCTATATAAAAGCCTTTCACCCTCCGGCAAGATTATTTTTAAACATAATATTGACATTGAGATAATCGTAGCACTTATATATTTTGTCCAATTTAAAATTCCAAATCTGTGGAAAATACATTGAATTGTCAGAAAATAGGGCATTGATCTAGATGAAAC +>chr1:9948240-9948440 +CAATTCTCCAGCTCACAATTTAAAAGCTTCTGAGAGTTAAATGCTATGGCATAGTTTATAATGGAGCACTCGAGTGACTGATGTGTATTTCAGTGAACTGGTCTCTCGGAGTTGAAACTACGACTTGTATAAACAGGAATGGAAATCCTTCTGGGTTTATGTAGTTAGGCAAGAATATTCCCAACTGTGTGCCGCTCCCA +>chr1:9956513-9956713 +TCCCACTCATGACCACTTCACATATACGTAAAGAAATCATGCAAAATGGTTTCTCTTGGCTAATCTGAATACAGACCAATTCACTCAACATACAGGCAATCTAATATGACAGTTAATGGAAACATGGGTTTTCTTCTAACATTCTTGGTGGCTGTATATTCCACTAGATTTAACTGTGATTGAAATTTTACTCTATGGGG +>chr1:9962925-9963125 
+TATTTTTTAAACCACCAAGGTTCAAGCACCCCCTAGCATATTTGAAGAAAGGAAACGTCTGTCAGAAGTGCTCGCTGTGACAAATACAGAGGGAAGTGACAATGTGCTGCCTTTGCTTACTTAACACCATACATTATTTTTAACTGAACAGTGAAGGCATTTGTTCAGAACCAGCCTTTACATAAAAAATAATTTAGACC +>chr1:9978791-9978991 +AAATCCTTTCACTCTCCTGCCCCTCGATAAATTACTCAAGGCACCAGACACTTTTCTGGACAGTCTCTGTTTGATAAATGATCATGTCATGCTATCGCTTAGAGGCGCACTGCAAAATTCTGAGTGGCCAATTGTCTTTCCCTGCAGGCTGGTGCCCTGCTCCTGCCTGGGTGTTTGTGGCAGGCGGTCTCAGCTTTAAT
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tool_dependencies.xml Wed Nov 19 07:56:27 2014 -0500 @@ -0,0 +1,39 @@ +<?xml version="1.0"?> +<tool_dependency> + <package name="weeder" version="2.0"> + <install version="1.0"> + <actions> + <action type="download_by_url">http://159.149.160.51/modtools/downloads/weeder2.0.tar.gz</action> + <action type="shell_command"> + g++ weeder2.cpp -o weeder2 -O3 + </action> + <!-- Move weeder2 executable --> + <action type="move_file"> + <source>weeder2</source> + <destination>$INSTALL_DIR/bin</destination> + </action> + <!-- Move data files --> + <action type="move_directory_files"> + <source_directory>FreqFiles</source_directory> + <destination_directory>$INSTALL_DIR/FreqFiles</destination_directory> + </action> + <!-- Set environment variables --> + <action type="set_environment"> + <environment_variable name="WEEDER_DIR" action="set_to">$INSTALL_DIR</environment_variable> + </action> + <action type="set_environment"> + <environment_variable name="WEEDER_FREQFILES_DIR" action="set_to">$INSTALL_DIR/FreqFiles</environment_variable> + </action> + <action type="set_environment"> + <environment_variable action="prepend_to" name="PATH">$INSTALL_DIR/bin</environment_variable> + </action> + </actions> + </install> + <readme>Installs Weeder 2.0 + + See http://159.149.160.51/modtools/downloads/weeder2.html + and http://159.149.160.51/modtools/ + </readme> + </package> +</tool_dependency> +
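The dependency definition above exports `WEEDER_DIR` and `WEEDER_FREQFILES_DIR` and prepends `$INSTALL_DIR/bin` to `PATH`. The following is a minimal sketch of a sanity check for that environment. The function name `check_weeder_env` is hypothetical and is not part of this changeset.

```shell
# Hypothetical helper (not part of the repository): verify the environment
# that tool_dependencies.xml is expected to set up.
check_weeder_env() {
    # weeder2 should be reachable through PATH
    if ! command -v weeder2 >/dev/null 2>&1 ; then
        echo "weeder2 executable not found on PATH" >&2
        return 1
    fi
    # WEEDER_FREQFILES_DIR should name an existing directory
    if [ -z "${WEEDER_FREQFILES_DIR:-}" ] || [ ! -d "$WEEDER_FREQFILES_DIR" ] ; then
        echo "WEEDER_FREQFILES_DIR is unset or not a directory" >&2
        return 1
    fi
    return 0
}
```

Running `check_weeder_env` before invoking the wrapper by hand may help to separate installation problems from problems with the input data.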
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/weeder2_wrapper.sh Wed Nov 19 07:56:27 2014 -0500 @@ -0,0 +1,62 @@ +#!/bin/sh -e +# +# Wrapper script to run weeder2 as a Galaxy tool +# +# Usage: weeder2_wrapper.sh FASTA_IN SPECIES_CODE MOTIFS_OUT MATRIX_OUT [ ARGS... ] +# +# ARGS: zero or more arguments to supply directly to weeder2 +# +# Process command line +FASTA_IN=$1 +SPECIES_CODE=$2 +MOTIFS_OUT=$3 +MATRIX_OUT=$4 +# +# Other arguments +ARGS="" +while [ ! -z "$5" ] ; do + ARGS="$ARGS $5" + shift +done +# +# Link to input file +ln -s "$FASTA_IN" +# +# Link to the FreqFiles directory as weeder2 executable +# expects it to be in the same directory +freqfiles_dir=$WEEDER_FREQFILES_DIR +if [ -d "$freqfiles_dir" ] ; then + echo "Linking to FreqFiles directory" + ln -s "$freqfiles_dir" FreqFiles +else + echo "ERROR FreqFiles directory not found" >&2 + exit 1 +fi +# +# Construct names of input and output files +fasta=`basename "$FASTA_IN"` +motifs_out=$fasta.w2 +matrix_out=$fasta.matrix.w2 +# +# Construct and run weeder command +# NB weeder logs output to stderr so redirect to stdout +# to prevent the Galaxy tool reporting failure +weeder_cmd="weeder2 -f $fasta -O $SPECIES_CODE $ARGS" +echo "Running $weeder_cmd" +status=0 +$weeder_cmd 2>&1 || status=$? +if [ $status -ne 0 ] ; then + echo weeder2 command finished with nonzero exit code $status >&2 + echo Command was: $weeder_cmd + exit $status +fi +# +# Move outputs to final destinations +if [ -e "$motifs_out" ] ; then + /bin/mv "$motifs_out" "$MOTIFS_OUT" +fi +if [ -e "$matrix_out" ] ; then + /bin/mv "$matrix_out" "$MATRIX_OUT" +fi +# +# Done
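The wrapper above takes four positional arguments and forwards anything after them to `weeder2`. A minimal sketch of that argument handling follows, assuming the same argument order. `build_weeder_cmd` is a hypothetical name used only for illustration, and the paths in the example call are made up.

```shell
# Sketch only: build the weeder2 command line in the same way as the wrapper.
# build_weeder_cmd is a hypothetical name used for illustration.
build_weeder_cmd() {
    fasta_in=$1
    species_code=$2
    # $3 and $4 are the Galaxy output paths; they are not passed to weeder2
    shift 4
    cmd="weeder2 -f $(basename "$fasta_in") -O $species_code"
    # Anything left over is handed to weeder2 unchanged
    if [ $# -gt 0 ] ; then
        cmd="$cmd $*"
    fi
    echo "$cmd"
}

# Example: arguments of the kind Galaxy might pass for a ChIP-seq run on mouse data
build_weeder_cmd /tmp/dataset_1.dat MM motifs.out matrix.out -chipseq -top 100
```

The sketch only prints the command. The wrapper itself also links the input file and the FreqFiles directory into the working directory before running `weeder2`, then moves the `.w2` and `.matrix.w2` outputs to the paths Galaxy supplied.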
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/weeder2_wrapper.xml Wed Nov 19 07:56:27 2014 -0500 @@ -0,0 +1,181 @@ +<tool id="motiffinding_weeder2" name="Weeder2" version="2.0.0"> + <description>Motif discovery in sequences from coregulated genes of a single species</description> + <command interpreter="bash">weeder2_wrapper.sh + $sequence_file $species_code + $output_motifs_file $output_matrix_file + $strands + #if $chipseq.use_chipseq + -chipseq -top $chipseq.top + #end if + #if str( $advanced_options.advanced_options_selector ) == "on" + -maxm $advanced_options.n_motifs_report + -b $advanced_options.n_motifs_build + -sim $advanced_options.sim_threshold + -em $advanced_options.em_cycles + #end if +</command> + <requirements> + <requirement type="package" version="2.0">weeder</requirement> + </requirements> + <inputs> + <param name="sequence_file" type="data" format="fasta" label="Input sequence" /> + <param name="species_code" type="select" label="Species to use for background comparison"> + <!-- Hard code options for now + See weeder's "organisms.txt" for full list + --> + <option value="HS">Homo sapiens (HS)</option> + <option value="MM">Mus musculus (MM)</option> + <option value="DM">Drosophila melanogaster (DM)</option> + <option value="SC">Saccharomyces cerevisiae (SC)</option> + <option value="AT">Arabidopsis thaliana (AT)</option> + </param> + <param name="strands" label="Use both strands of sequence" type="boolean" + truevalue="" falsevalue="-ss" checked="True" + help="If not checked then use -ss option" /> + <conditional name="chipseq"> + <param name="use_chipseq" type="boolean" + label="Use the ChIP-seq heuristic" + help="Speeds up the computation (-chipseq)" + truevalue="yes" falsevalue="no" checked="on" /> + <when value="yes"> + <param name="top" type="integer" value="100" + label="Number of top input sequences with oligos to scan for" + help="Increase this value to improve the chance of finding motifs enriched only in a subset of your input 
sequences (-top)" /> + </when> + <when value="no"></when> + </conditional> + <conditional name="advanced_options"> + <param name="advanced_options_selector" type="select" + label="Display advanced options"> + <option value="off">Hide</option> + <option value="on">Display</option> + </param> + <when value="on"> + <param name="n_motifs_report" type="integer" value="25" + label="Number of discovered motifs to report" help="(-maxm)" /> + <param name="n_motifs_build" type="integer" value="50" + label="Number of top scoring motifs to build occurrences matrix profiles and outputs for" + help="(-b)" /> + <param name="sim_threshold" type="float" min="0.0" max="1.0" value="0.95" + label="Similarity threshold for the redundancy filter" + help="Remove motifs that are too similar, with lower values imposing a stricter filter. Must be between 0.0 and 1.0 (-sim)" /> + <param name="em_cycles" type="integer" min="0" max="100" value="1" + label="Number of expectation maximization (EM) cycles to perform" + help="Number of cycles must be between 0 and 100 (-em)" /> + </when> + <when value="off"> + </when> + </conditional> + </inputs> + <outputs> + <data name="output_motifs_file" format="txt" label="Weeder2 on ${on_string} (motifs)" /> + <data name="output_matrix_file" format="txt" label="Weeder2 on ${on_string} (matrix)" /> + </outputs> + <tests> + <test> + <param name="sequence_file" value="weeder_in.fa" ftype="fasta" /> + <param name="species_code" value="MM" /> + <output name="output_motifs_file" file="weeder2_motifs.out" lines_diff="2" /> + <output name="output_matrix_file" file="weeder2_matrix.out" /> + </test> + </tests> + <help> + +.. class:: infomark + +**What it does** + +Weeder2 is a program for finding novel motifs (transcription factor binding sites) +conserved in a set of regulatory regions of related genes. + +------------- + +.. class:: infomark + +**Usage advice** + +Guidelines on how to use this tool can be seen in Zambelli et al. 
2014 (see link +below), but the following is a brief guide. Please note that a **motif** is a model +or matrix that describes a set of sequences that may differ in base composition. +**Oligos** are specific sequences found within the input sequences or genomic +background. + +**Input sequence** (in FASTA format) should be short (100-200bp) and be reasonably +expected to contain one or more enriched motifs. This is not generally an issue with +transcription factor ChIP-seq derived sequences centred on the summit of binding +regions that are expected to contain a dominant motif and possibly secondary motifs. + +There is **no need to mask sequence for repetitive sequence** as factors may +legitimately bind repetitive sequence. + +**Use both strands of sequence** by default, unless there is a specific reason not +to do so. + +**Species to use for background comparison** should match the genome used to +generate the **input sequence**. The background genome motif frequencies are +generated from within the promoter regions of annotated genes and are shown to be a +good background for both promoter and other regulatory regions. + +**Use the ChIP-seq heuristic** (-chipseq) when there are a large number of +input sequences (hundreds or thousands). When -chipseq is used Weeder will use +only oligos from the first 100 sequences (the default for -top) to build motifs with which it scans +all of the input sequences. This reduces the computation time without too much +risk of losing important motifs. Even if not strictly necessary, it is advisable to +order input sequences by their significance, e.g. fold enrichment or p-value. For +large data sets (-top) should be set to a number equating to at least 10 to 20% of +input sequences (as recommended by the authors). + +**Number of discovered motifs to report** (-maxm) limits the number of reported +motifs even if there are more than -maxm. 
**Number of top scoring motifs to build +occurrences matrix profiles and outputs for** (-b) changes the number of top +scoring motifs of length 6, 8 and 10 for which the occurrence matrix is built. +Increasing -b may result in a larger number of reported motifs, but potentially with +more motifs of low significance, and it increases the computation time. If increasing -b does +not result in more motifs in your results it means that the additional motifs are +filtered out by the redundancy filter or that the maximum number of reported motifs +set by -maxm has been reached. + +**Similarity threshold for the redundancy filter** (-sim) default setting is +recommended. + +**Number of expectation maximization (EM) cycles to perform** (-em) default is +recommended. The option is included to help "clean up" the resulting motif matrices. +In this version the number of EM steps can be increased, which can be useful for +motifs with highly redundant stretches of sequence. + +------------- + +.. class:: infomark + +**A note on the results** + +The resulting matrices are produced by scanning (by default both strands) for +oligos of length 6, 8 and 10, allowing 1, 2 and 3 substitutions respectively. The +matrices within the matrix.w2 file can be input into other tools. The recommended +next step is to use **STAMP** (http://www.benoslab.pitt.edu/stamp/), which displays +the motifs as logos and identifies matches with libraries of known DNA binding +motifs, such as TRANSFAC or JASPAR. + +------------- + +.. class:: infomark + +**Credits** + +This Galaxy tool has been developed by Peter Briggs and Ian Donaldson within the +Bioinformatics Core Facility at the University of Manchester, and runs the Weeder2 +motif discovery package: + + * Zambelli, F., Pesole, G. and Pavesi, G. 2014. Using Weeder, Pscan, and PscanChIP + for the Discovery of Enriched Transcription Factor Binding Site Motifs in + Nucleotide Sequences. Current Protocols in Bioinformatics. 47:2.11:2.11.1–2.11.31. 
+ * http://onlinelibrary.wiley.com/doi/10.1002/0471250953.bi0211s47/full + +This tool is compatible with Weeder 2.0: + + * http://159.149.160.51/modtools/downloads/weeder2.html + +Please kindly acknowledge this Galaxy tool, the Weeder package and the utility +scripts if you use them in your work. + </help> +</tool>
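The help text recommends setting `-top` to at least 10 to 20% of the input sequences for large data sets. A minimal sketch of that arithmetic follows. `suggest_top` is a hypothetical helper, and both the 20% default and the floor of 100 (the tool's default `-top` value) are assumptions made for this sketch, not part of Weeder or of this tool.

```shell
# Sketch only: suggest a -top value as a percentage of the input sequences.
# suggest_top, its 20% default and the floor of 100 are illustrative choices.
suggest_top() {
    fasta=$1
    percent=${2:-20}
    # Count FASTA records by their ">" header lines
    n_seqs=$(grep -c '^>' "$fasta" || true)
    # Integer arithmetic, rounding up
    top=$(( (n_seqs * percent + 99) / 100 ))
    # Assumption: never suggest less than the wrapper's default of 100
    if [ "$top" -lt 100 ] ; then
        top=100
    fi
    echo "$top"
}
```

For example, `suggest_top peaks.fa 15` on a file containing 1000 sequences would print 150, which could then be entered as the `-top` value in the tool form. The file name `peaks.fa` is made up for this example.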