changeset 0:1625927fc16f draft

"Release v0.8.4"
author althonos
date Sun, 21 Nov 2021 16:53:12 +0000
parents
children 0699939e6dd6
files README.rst gecco.xml test-data/BGC0001866.1_cluster_1.gbk test-data/BGC0001866.fna test-data/clusters.tsv test-data/features.tsv
diffstat 6 files changed, 1928 insertions(+), 0 deletions(-)
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/README.rst	Sun Nov 21 16:53:12 2021 +0000
@@ -0,0 +1,136 @@
+Hi, I’m GECCO!
+==============
+
+🦎 Overview
+---------------
+
+GECCO (Gene Cluster prediction with Conditional Random Fields) is a fast
+and scalable method for identifying putative novel Biosynthetic Gene
+Clusters (BGCs) in genomic and metagenomic data using Conditional Random
+Fields (CRFs).
+
+|GitLabCI| |License| |Coverage| |Docs| |Source| |Mirror| |Changelog|
+|Issues| |Preprint| |PyPI| |Bioconda| |Versions| |Wheel|
+
+🔧 Installing GECCO
+-------------------
+
+GECCO is implemented in `Python <https://www.python.org/>`__, and
+supports `all versions <https://endoflife.date/python>`__ from Python
+3.6 onwards. It requires additional libraries that can be installed directly
+from `PyPI <https://pypi.org>`__, the Python Package Index.
+
+Use `pip <https://pip.pypa.io/en/stable/>`__ to install GECCO on
+your machine:
+
+.. code:: console
+
+   $ pip install gecco-tool
+
+If you’d rather use `Conda <https://conda.io>`__, a package is available
+in the `bioconda <https://bioconda.github.io/>`__ channel. You can
+install with:
+
+.. code:: console
+
+   $ conda install -c bioconda gecco
+
+This will install GECCO, its dependencies, and the data needed to run
+predictions. This requires around 100MB of data to be downloaded, so it
+could take some time depending on your Internet connection. Once done,
+you will have a ``gecco`` command available in your ``$PATH``.
+
+*Note that GECCO uses*\ `HMMER3 <http://hmmer.org/>`__\ *, which can
+only run on PowerPC and recent x86-64 machines running a POSIX operating
+system. Therefore, Linux and OSX are supported platforms, but GECCO will
+not be able to run on Windows.*
+
+🧬 Running GECCO
+-----------------
+
+Once ``gecco`` is installed, you can run it from the terminal by giving
+it a FASTA or GenBank file with the genomic sequence you want to
+analyze, as well as an output directory:
+
+.. code:: console
+
+   $ gecco run --genome some_genome.fna -o some_output_dir
+
+Additional parameters of interest are:
+
+-  ``--jobs``, which controls the number of threads that will be spawned
+   by GECCO whenever a step can be parallelized. The default, *0*, will
+   autodetect the number of CPUs on the machine using
+   `os.cpu_count <https://docs.python.org/3/library/os.html#os.cpu_count>`__.
+-  ``--cds``, controlling the minimum number of consecutive genes a BGC
+   region must have to be detected by GECCO (default is 3).
+-  ``--threshold``, controlling the minimum probability for a gene to be
+   considered part of a BGC region. Using a lower number will increase
+   the number (and possibly length) of predictions, but reduce accuracy.
+
+🔖 Reference
+-------------
+
+GECCO can be cited using the following preprint:
+
+   **Accurate de novo identification of biosynthetic gene clusters with
+   GECCO**. Laura M Carroll, Martin Larralde, Jonas Simon Fleck, Ruby
+   Ponnudurai, Alessio Milanese, Elisa Cappio Barazzone, Georg Zeller.
+   bioRxiv 2021.05.03.442509;
+   `doi:10.1101/2021.05.03.442509 <https://doi.org/10.1101/2021.05.03.442509>`__
+
+💭 Feedback
+------------
+
+⚠️ Issue Tracker
+~~~~~~~~~~~~~~~~
+
+Found a bug? Have an enhancement request? Head over to the `GitHub
+issue tracker <https://github.com/zellerlab/GECCO/issues>`__ if you need
+to report or ask something. If you are filing a bug report, please
+include as much information as you can about the issue, and try to
+recreate the same bug in a simple, easily reproducible situation.
+
+🏗️ Contributing
+~~~~~~~~~~~~~~~~
+
+Contributions are more than welcome! See
+`CONTRIBUTING.md <https://github.com/althonos/pyhmmer/blob/master/CONTRIBUTING.md>`__
+for more details.
+
+⚖️ License
+----------
+
+This software is provided under the `GNU General Public License v3.0 or
+later <https://choosealicense.com/licenses/gpl-3.0/>`__. GECCO is
+developed by the `Zeller
+Team <https://www.embl.de/research/units/scb/zeller/index.html>`__ at
+the `European Molecular Biology Laboratory <https://www.embl.de/>`__ in
+Heidelberg.
+
+.. |GitLabCI| image:: https://img.shields.io/gitlab/pipeline/grp-zeller/GECCO/master?gitlab_url=https%3A%2F%2Fgit.embl.de&style=flat-square&maxAge=600
+   :target: https://git.embl.de/grp-zeller/GECCO/-/pipelines/
+.. |License| image:: https://img.shields.io/badge/license-GPLv3-blue.svg?style=flat-square&maxAge=2678400
+   :target: https://choosealicense.com/licenses/gpl-3.0/
+.. |Coverage| image:: https://img.shields.io/codecov/c/gh/zellerlab/GECCO?style=flat-square&maxAge=600
+   :target: https://codecov.io/gh/zellerlab/GECCO/
+.. |Docs| image:: https://img.shields.io/badge/docs-gecco.embl.de-green.svg?maxAge=2678400&style=flat-square
+   :target: https://gecco.embl.de
+.. |Source| image:: https://img.shields.io/badge/source-GitHub-303030.svg?maxAge=2678400&style=flat-square
+   :target: https://github.com/zellerlab/GECCO/
+.. |Mirror| image:: https://img.shields.io/badge/mirror-EMBL-009f4d?style=flat-square&maxAge=2678400
+   :target: https://git.embl.de/grp-zeller/GECCO/
+.. |Changelog| image:: https://img.shields.io/badge/keep%20a-changelog-8A0707.svg?maxAge=2678400&style=flat-square
+   :target: https://github.com/zellerlab/GECCO/blob/master/CHANGELOG.md
+.. |Issues| image:: https://img.shields.io/github/issues/zellerlab/GECCO.svg?style=flat-square&maxAge=600
+   :target: https://github.com/zellerlab/GECCO/issues
+.. |Preprint| image:: https://img.shields.io/badge/preprint-bioRxiv-darkblue?style=flat-square&maxAge=2678400
+   :target: https://www.biorxiv.org/content/10.1101/2021.05.03.442509v1
+.. |PyPI| image:: https://img.shields.io/pypi/v/gecco-tool.svg?style=flat-square&maxAge=3600
+   :target: https://pypi.python.org/pypi/gecco-tool
+.. |Bioconda| image:: https://img.shields.io/conda/vn/bioconda/gecco?style=flat-square&maxAge=3600
+   :target: https://anaconda.org/bioconda/gecco
+.. |Versions| image:: https://img.shields.io/pypi/pyversions/gecco-tool.svg?style=flat-square&maxAge=3600
+   :target: https://pypi.org/project/gecco-tool/#files
+.. |Wheel| image:: https://img.shields.io/pypi/wheel/gecco-tool?style=flat-square&maxAge=3600
+   :target: https://pypi.org/project/gecco-tool/#files
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/gecco.xml	Sun Nov 21 16:53:12 2021 +0000
@@ -0,0 +1,86 @@
+<?xml version='1.0' encoding='utf-8'?>
+<tool id="gecco" name="GECCO" version="0.8.4" python_template_version="3.5">
+    <description>GECCO (Gene Cluster prediction with Conditional Random Fields) is a fast and scalable method for identifying putative novel Biosynthetic Gene Clusters (BGCs) in genomic and metagenomic data using Conditional Random Fields (CRFs).</description>
+    <requirements>
+        <requirement type="package" version="0.8.4">gecco</requirement>
+    </requirements>
+    <version_command>gecco --version</version_command>
+    <command detect_errors="aggressive"><![CDATA[
+
+        #if str($input.ext) == 'genbank':
+            #set $file_extension = 'gbk'
+        #else:
+            #set $file_extension = $input.ext
+        #end if
+        ln -s '$input' input_tempfile.$file_extension &&
+
+        gecco -vv run -g input_tempfile.$file_extension &&
+        mv input_tempfile.features.tsv '$features' &&
+        mv input_tempfile.clusters.tsv '$clusters'
+
+    ]]></command>
+    <inputs>
+        <param name="input" type="data" format="genbank,fasta" label="Sequence file in GenBank or FASTA format"/>
+    </inputs>
+    <outputs>
+        <collection name="records" type="list" label="${tool.name} detected Biosynthetic Gene Clusters on ${on_string} (GenBank)">
+            <discover_datasets pattern="(?P&lt;designation&gt;.*)\.gbk" ext="genbank" visible="false" />
+        </collection>
+        <data name="features" format="tabular" label="${tool.name} summary of detected features on ${on_string} (TSV)"/>
+        <data name="clusters" format="tabular" label="${tool.name} summary of detected BGCs on ${on_string} (TSV)"/>
+    </outputs>
+    <tests>
+        <test>
+            <param name="input" value="BGC0001866.fna"/>
+            <output name="features" file="features.tsv"/>
+            <output name="clusters" file="clusters.tsv"/>
+            <output_collection name="records" type="list">
+                <element name="BGC0001866.1_cluster_1" file="BGC0001866.1_cluster_1.gbk" ftype="genbank" lines_diff="2"/>
+            </output_collection>
+        </test>
+    </tests>
+    <help>
+<![CDATA[
+
+**Overview**
+
+GECCO is a fast and scalable method for identifying putative novel Biosynthetic Gene Clusters (BGCs) in genomic and metagenomic data using Conditional Random Fields (CRFs).
+It is developed in the Zeller group and is part of the suite of computational microbiome analysis tools hosted at EMBL.
+
+**Input**
+
+GECCO works with DNA sequences, and loads them using Biopython, allowing it to support a large variety of formats, including the common FASTA and GenBank files.
+
+**Output**
+
+GECCO will create the following files once done (using the same prefix as the input file):
+
+- features.tsv: The features file, containing the identified proteins and domains in the input sequences.
+- clusters.tsv: If any were found, a clusters file, containing the coordinates of the predicted clusters, along with their putative biosynthetic type.
+- {sequence}_cluster_{N}.gbk: If any BGCs were found, a GenBank file per cluster, containing the cluster sequence annotated with its member proteins and domains.
+
+**Contact**
+
+If you have any questions about GECCO, run into any issue, or would like to make a feature request, please create an issue in the GitHub repository.
+You can also contact Martin Larralde directly via email. If you want to contribute to GECCO, please have a look at the contribution guide first, and feel free to
+open a pull request on the GitHub repository.
+
+]]>
+    </help>
+    <citations>
+        <citation type="bibtex">
+@article {Carroll2021.05.03.442509,
+	author = {Carroll, Laura M. and Larralde, Martin and Fleck, Jonas Simon and Ponnudurai, Ruby and Milanese, Alessio and Cappio Barazzone, Elisa and Zeller, Georg},
+	title = {Accurate de novo identification of biosynthetic gene clusters with GECCO},
+	elocation-id = {2021.05.03.442509},
+	year = {2021},
+	doi = {10.1101/2021.05.03.442509},
+	publisher = {Cold Spring Harbor Laboratory},
+	abstract = {Biosynthetic gene clusters (BGCs) are enticing targets for (meta)genomic mining efforts, as they may encode novel, specialized metabolites with potential uses in medicine and biotechnology. Here, we describe GECCO (GEne Cluster prediction with COnditional random fields; https://gecco.embl.de), a high-precision, scalable method for identifying novel BGCs in (meta)genomic data using conditional random fields (CRFs). Based on an extensive evaluation of de novo BGC prediction, we found GECCO to be more accurate and over 3x faster than a state-of-the-art deep learning approach. When applied to over 12,000 genomes, GECCO identified nearly twice as many BGCs compared to a rule-based approach, while achieving higher accuracy than other machine learning approaches. Introspection of the GECCO CRF revealed that its predictions rely on protein domains with both known and novel associations to secondary metabolism. The method developed here represents a scalable, interpretable machine learning approach, which can identify BGCs de novo with high precision. Competing Interest Statement: The authors have declared no competing interest.},
+	URL = {https://www.biorxiv.org/content/early/2021/05/04/2021.05.03.442509},
+	eprint = {https://www.biorxiv.org/content/early/2021/05/04/2021.05.03.442509.full.pdf},
+	journal = {bioRxiv}
+}
+        </citation>
+    </citations>
+</tool>
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/BGC0001866.1_cluster_1.gbk	Sun Nov 21 16:53:12 2021 +0000
@@ -0,0 +1,1110 @@
+LOCUS       BGC0001866.1_cluster_1 32633 bp    DNA     linear   UNK 21-NOV-2021
+DEFINITION  BGC0001866.1 Byssochlamys spectabilis strain CBS 101075 chromosome
+            Unknown C8Q69scaffold_14, whole genome shotgun sequence.
+ACCESSION   BGC0001866.1_cluster_1
+VERSION     BGC0001866.1_cluster_1
+KEYWORDS    .
+SOURCE      .
+  ORGANISM  .
+            .
+REFERENCE   1
+  AUTHORS   Laura M Carroll, Martin Larralde, Jonas Simon Fleck, Ruby
+            Ponnudurai, Alessio Milanese, Elisa Cappio Barazzone, Georg Zeller
+  TITLE     Accurate de novo identification of biosynthetic gene clusters with
+            GECCO
+  JOURNAL   bioRxiv (2021.05.03.442509)
+  REMARK    doi:10.1101/2021.05.03.442509
+COMMENT     ##GECCO-Data-START##
+            version                :: GECCO v0.8.4
+            creation_date          :: 2021-11-21T16:33:58.470847
+            biosyn_class           :: Polyketide
+            alkaloid_probability   :: 0.0
+            polyketide_probability :: 0.98
+            ripp_probability       :: 0.0
+            saccharide_probability :: 0.0
+            terpene_probability    :: 0.0
+            nrp_probability        :: 0.14
+            other_probability      :: 0.0
+            ##GECCO-Data-END##
+FEATURES             Location/Qualifiers
+     CDS             complement(1..1143)
+                     /inference="ab initio prediction:Prodigal:2.6"
+                     /transl_table=11
+                     /locus_tag="BGC0001866.1_1"
+                     /translation="MWIYEVDGHYIEPRRADTFLIWAGERYSAMIRLDKKPMDYSIRVP
+                     DGGYSQMIAAFGILRYKNGDPNARQKPDRFGVTTISKPYFDYNAWPMRDAVFLDKLDLP
+                     PWPRKVPAAHGDDMHVLYLGKANSTWEFTLSGKKKYPPDRSAYEPLLYNVNSEQAHDDD
+                     LIIRTQNGTWQDIVLQVGHSPLWPVDFPHAVHKHANKYWRIGGGQGLWNYSSVEEAMAD
+                     QPESFNMVNPPYRDTFLTEFTGAMWVVLRYQVTSPGAWLLHCHFEMHLDNGMAMAILDG
+                     VDKWPHVPPEYTQGFHGFREHELPGPAGFWGLVSKILRPESLVWAGGAAVVLLSLFIGG
+                     LWRLWQRRMQGTYYVLSQEDERDRFSMDKEAWKSEETKRM*"
+     misc_feature    1..189
+                     /inference="protein motif"
+                     /db_xref="PFAM:PF00394"
+                     /db_xref="InterPro:IPR001117"
+                     /note="e-value: 2.1941888078432915e-08"
+                     /note="p-value: 8.178117062405111e-12"
+                     /function="Multicopper oxidase"
+                     /standard_name="PF00394"
+     misc_feature    448..843
+                     /inference="protein motif"
+                     /db_xref="PFAM:PF07731"
+                     /db_xref="InterPro:IPR011706"
+                     /note="e-value: 3.9374169295176556e-23"
+                     /note="p-value: 1.467542649838858e-26"
+                     /function="Multicopper oxidase"
+                     /standard_name="PF07731"
+     CDS             1179..1670
+                     /inference="ab initio prediction:Prodigal:2.6"
+                     /transl_table=11
+                     /locus_tag="BGC0001866.1_2"
+                     /translation="MSSLRSSSHSPSGLPGQPRLPLLDRSREHSLPGDRAGWRTRSRLR
+                     ATDLLSMVRMGSTYTIIRDMNYTDDESPGRSPFVCDSVIRPALVHERDLLVNKPLMART
+                     IDAPFAVEKNTIDATDFISQSTRNVLISVHWNHTRSAVGCLHLLLYTGSSCSSPSQKAS
+                     *"
+     CDS             complement(2167..2376)
+                     /inference="ab initio prediction:Prodigal:2.6"
+                     /transl_table=11
+                     /locus_tag="BGC0001866.1_3"
+                     /translation="MPAYLLLLACNVLLVLGAHVQRELVLTWEEGAPNGQSRQMIKTNG
+                     QFPSPTLIFDEGDDVEVGGISFAN*"
+     CDS             2559..3032
+                     /inference="ab initio prediction:Prodigal:2.6"
+                     /transl_table=11
+                     /locus_tag="BGC0001866.1_4"
+                     /translation="MLFNSEVGVEEHVVLWSFQETTSITMAEEIKLTPLETFAQAISAS
+                     AKTIATYCRDSGHPQLSDDNSSGLTGDVLPPSAPQAVTAARQTILEASYRLQQLVTEPS
+                     QYLPRLTVYVSVEQSPMKDQTNDRKAPAPGCLTLAVPFQNPGAHPRARHQDIL*"
+     CDS             3007..3576
+                     /inference="ab initio prediction:Prodigal:2.6"
+                     /transl_table=11
+                     /locus_tag="BGC0001866.1_5"
+                     /translation="MQGTRTYYELATEAKVPLHQLQSIARMAITGSFLREPEPNIVAHS
+                     RTSAHFVENPSLRDWTLFLAEDTAPMAMKLVEATEKWGDTRSKTETAFNLALGTDLAFF
+                     KYLSSNPQFTQKFSGYMKNVTASEGTSIKHLVNGFDWASLGNAIVVDVRLQSSFTPYRS
+                     HTDVIFYRLAVLLVMQALLSRNRSPI*"
+     CDS             3600..4043
+                     /inference="ab initio prediction:Prodigal:2.6"
+                     /transl_table=11
+                     /locus_tag="BGC0001866.1_6"
+                     /translation="MVTSTSKDNREKTPLPETVASRISFESHDFFKPQPVQNADVYLLR
+                     MILHDWSFKEAGEILANLVPSVKQGARILIMDTVLPRHGTVPVTEEALLRVRDMTMMET
+                     FNSHEREIDEWKDLIQGVHTGLRVQQVIQPAGSSMAIIEVVRG*"
+     misc_feature    3648..3962
+                     /inference="protein motif"
+                     /db_xref="PFAM:PF00891"
+                     /db_xref="InterPro:IPR001077"
+                     /note="e-value: 4.743887678074703e-16"
+                     /note="p-value: 1.7681280946979883e-19"
+                     /function="O-methyltransferase domain"
+                     /standard_name="PF00891"
+     CDS             4337..4792
+                     /inference="ab initio prediction:Prodigal:2.6"
+                     /transl_table=11
+                     /locus_tag="BGC0001866.1_7"
+                     /translation="MTQIVFGIAPTLLKTFSHLTALDLWRPSAPYVFDPVTSSTYLGTI
+                     ADGVEEFLGIFYGQDTGGSNRFAPPKPYIPSRHSFINASTAGAACPQPYVPLPADPYTV
+                     LTNVSEDCLSLRIARPENTKSTAKLPVMVWLYGGAYNRLPTDLQWET*"
+     misc_feature    4478..4756
+                     /inference="protein motif"
+                     /db_xref="PFAM:PF00135"
+                     /db_xref="InterPro:IPR002018"
+                     /note="e-value: 4.674605664377319e-21"
+                     /note="p-value: 1.7423055029360116e-24"
+                     /function="Carboxylesterase family"
+                     /standard_name="PF00135"
+     CDS             5038..5466
+                     /inference="ab initio prediction:Prodigal:2.6"
+                     /transl_table=11
+                     /locus_tag="BGC0001866.1_8"
+                     /translation="MQDQRLGIEWIKNHISAFGGDPDNITLFGEDEGATYIALHILSNH
+                     EVPFHRAILQSGAAITHHDVNGNRSARNFAAVAARCNCLSDGDRQVDSQDTVDCLRRVP
+                     MEDLVNATFEVAHSVDPVNGFRALYVLLHFPSHKCKQD*"
+     misc_feature    5041..5379
+                     /inference="protein motif"
+                     /db_xref="PFAM:PF00135"
+                     /db_xref="InterPro:IPR002018"
+                     /note="e-value: 3.9706994470948554e-30"
+                     /note="p-value: 1.4799476135277136e-33"
+                     /function="Carboxylesterase family"
+                     /standard_name="PF00135"
+     CDS             5477..6253
+                     /inference="ab initio prediction:Prodigal:2.6"
+                     /transl_table=11
+                     /locus_tag="BGC0001866.1_9"
+                     /translation="MPAVDGYMIPDEPSNLLSRGQVPANISILAGWTRDESSMSVPTSI
+                     RTAADAASFISTQFPLLNASTIHHFLTSLYPESDFTTNSPSSPEKVTPAWRATSALHRD
+                     LTLTCPTIFQAWSLRLSSNCTTPVYLYELRQSPFATALNNSGVGYLGIVHFSDVPYVFN
+                     ELERTYYITDPEENKLAQRMSASWTAFASGAFPLCERSERSLGRWEEAYGGDRVCRDRM
+                     PEHVRVKGIGDNGDQDDGDEIGKLMARCGFINRLEY*"
+     misc_feature    5480..6103
+                     /inference="protein motif"
+                     /db_xref="PFAM:PF00135"
+                     /db_xref="InterPro:IPR002018"
+                     /note="e-value: 1.4185801852307574e-15"
+                     /note="p-value: 5.287291037013632e-19"
+                     /function="Carboxylesterase family"
+                     /standard_name="PF00135"
+     CDS             7412..8683
+                     /inference="ab initio prediction:Prodigal:2.6"
+                     /transl_table=11
+                     /locus_tag="BGC0001866.1_10"
+                     /translation="MTGARFDESDHKWTVEGINGSHGTIRIRCRWYILALGFASKPYIP
+                     DFEGLNRFQGPCFHSSAWPQEGIDLKGRRVAVVGTGASAVQIIQTISKEVGHLTVYQRT
+                     PCTAMPMRQQSLTPEYQDNFKASGEMAATMRRTKYERFGGQDVQFVSRRWHEDTPEQRR
+                     AVFEQAWQKGGFHLLLSTYFEVFDDVEVNHAAWRFWAEKSRERIHNTKYKDILAPLEAV
+                     HAFGGKRTPFEQDYFEAFNRRNVDLIDMKASPILSFAEKGIITQNEGLQEFDVIILATG
+                     FDTNTGALTSIHIQDTDGILLKDRWSYDGVMTTFGMSTSKFPNMFFFYGPQAPTAFSNG
+                     PSCIELQGEFVEELILDMIGKGVTRVDTTSEAEKRWKESTLSLWNQFVFSSTKGFYTGE
+                     NIPGKKAEPLNWYVLVLGLGVSKR*"
+     misc_feature    7448..7783
+                     /inference="protein motif"
+                     /db_xref="PFAM:PF13434"
+                     /db_xref="InterPro:IPR025700"
+                     /note="e-value: 5.777178703900199e-08"
+                     /note="p-value: 2.153253337271785e-11"
+                     /function="L-lysine 6-monooxygenase (NADPH-requiring)"
+                     /standard_name="PF13434"
+     misc_feature    7517..7717
+                     /inference="protein motif"
+                     /db_xref="PFAM:PF00743"
+                     /db_xref="InterPro:IPR020946"
+                     /note="e-value: 5.089108077410868e-07"
+                     /note="p-value: 1.8967976434628658e-10"
+                     /function="Flavin-binding monooxygenase-like"
+                     /standard_name="PF00743"
+     CDS             9454..10038
+                     /inference="ab initio prediction:Prodigal:2.6"
+                     /transl_table=11
+                     /locus_tag="BGC0001866.1_11"
+                     /translation="MCRGRLTRTVDERGIVSTESAHAAQRHHLASHVLDARFAGSIARL
+                     GSLCLFLALLVAFVQELQKSESHHQRSGGVGLEDRRVVREGLVKPVVTHFGYVPFRRRS
+                     CGMGSQVRCGDSSVIHQEVDIPILGGDVVDDALKVSMRGNAALDRVDVAMGLSQIVSTI
+                     VIALWTWFVLNQPTAWLLLVRARAVARVCRL*"
+     CDS             10763..11191
+                     /inference="ab initio prediction:Prodigal:2.6"
+                     /transl_table=11
+                     /locus_tag="BGC0001866.1_12"
+                     /translation="MRAGQLVPLVSTPTPSCLALQIVFCCCSTFLSDPLVLQNHRKMAD
+                     EQKTPLESGQQPAVAQHTSTAELQTEKPGQMNGNGTADKPGPPGGKPFGPGMGPPIQYP
+                     TGFKLYSIMTGLYLASFLTALVGWRSITDLTDSETYIG*"
+     CDS             11204..12316
+                     /inference="ab initio prediction:Prodigal:2.6"
+                     /transl_table=11
+                     /locus_tag="BGC0001866.1_13"
+                     /translation="MLVVAIPQITDHFNSIDDIGWYGSAYLLTFCAFQLLFGKIYSFYN
+                     PKWVFLSAVLIFEIGSAICGAAPNSTALIIGRAIAGLGSSGIFGGSVIITFFTVPLHQR
+                     PIYTGIAGVIFALASSVGPLIGGGFTNNVSWRWCFYINLPVGALTVVTILLFLNLPPAR
+                     KAGTPLREQLLQMDPLGNLCLIPGIICLLLAIQWGGSTYAWSNGRIVALLVLAGVLLIA
+                     FVGVQLWLQDKGTIPPRVMKQRSIAAGMAFTICVTAGFMSFNYYLPIWFQAIKNASSFH
+                     SGVMMLPTVISSGVASLACGFIIHRVGYYTPFMIGGSVLMAIGAGLLTTFTPTTEHPKW
+                     IGYQVLWALGCGMSTFQPPFFARCIFVGGY*"
+     misc_feature    11204..12289
+                     /inference="protein motif"
+                     /db_xref="PFAM:PF07690"
+                     /db_xref="InterPro:IPR011701"
+                     /note="e-value: 5.839871260376694e-37"
+                     /note="p-value: 2.1766199255969786e-40"
+                     /function="Major Facilitator Superfamily"
+                     /standard_name="PF07690"
+     misc_feature    11252..11935
+                     /inference="protein motif"
+                     /db_xref="PFAM:PF06609"
+                     /db_xref="InterPro:IPR010573"
+                     /note="e-value: 9.543170598318239e-09"
+                     /note="p-value: 3.55690294383833e-12"
+                     /function="Fungal trichothecene efflux pump (TRI12)"
+                     /standard_name="PF06609"
+     CDS             12335..12781
+                     /inference="ab initio prediction:Prodigal:2.6"
+                     /transl_table=11
+                     /locus_tag="BGC0001866.1_14"
+                     /translation="MQQASLAAQTVLPKPDAPIGISLIFFSQSLGGSVFLAVDDSIYSN
+                     RLAAKLGSIPNLPQSALTNTGATNIRNLVAPQYLGRLLGGYNDALMDVFRVAVASSCAC
+                     VVAAAFMEWKNVRAAKAAGPGGPGGPGGPGGPGGPEGLRGGNKV*"
+     CDS             14574..15566
+                     /inference="ab initio prediction:Prodigal:2.6"
+                     /transl_table=11
+                     /locus_tag="BGC0001866.1_15"
+                     /translation="MTFEEMLSRPSPPPFAGPSHNSNRPTNMASTNQDQYYHDKGKHGE
+                     TMDEMLQTLVPDSVQFIEFPNTAREDQKQHPELRSEEEYSDYRSKSLFEEGLARIAPDC
+                     AGGIMDVLYGEEALVQMPNLPSSTHEGSSNTHVTSSHNCTRAVMENLAKLYQVCAPAGV
+                     ENGSHPTTDQVLKANSDAMKDAADLLACPCAKDFCFPIILGITACRVLAWYQVVIDMYD
+                     PEIPMATMPTAREDIKHCPIAFGAYQLDEEVSQAMTSQFVLRNLRAMTRFVKTYVENFC
+                     SDINKNRPGSCSLIYRSLGTFMQTRLGNTIEQLEDRLAAFDGEYTKNIG*"
+     misc_feature    14988..15245
+                     /inference="protein motif"
+                     /db_xref="PFAM:PF08493"
+                     /db_xref="InterPro:IPR013700"
+                     /note="e-value: 2.6165794251055913e-17"
+                     /note="p-value: 9.752439154325723e-21"
+                     /function="Aflatoxin regulatory protein"
+                     /standard_name="PF08493"
+     CDS             16827..18797
+                     /inference="ab initio prediction:Prodigal:2.6"
+                     /transl_table=11
+                     /locus_tag="BGC0001866.1_16"
+                     /translation="MAICGIAVRLPGGISNDAQLWDFLLAKRDARSQVPGSRYNISGYH
+                     SDSGKHGTSKSKYGYFLDESVDLGTLDTSFFSFTKLELEYIDPCQRQLLEVVRECFESA
+                     GEVNYRGKDIGCFVGSFGDDWTENLTHDEQTSAKYPLMVGGDFATPNRVSYEYNLHGPS
+                     VSIRTACSSSLVALHSACLSIQNGDCSAAIVAGFNLILTPTMTMIMSSKGVLSADGSSK
+                     SFDADADGYGRGEAVNAVYIKPLHDAIRDGNPIRAVIRGTATNSDGKSAGFTVPSADAQ
+                     EDVIRKAYKAAGISDLSQTAFVECHGTGTTVGDPIEVAAIANTFGGDMYIGSVKPNVGH
+                     SEGASGLTSLIKAVLAVENRTIPPNIKFNTPNPKIPFEAKKITVPVEATPWPWNRCVRA
+                     SVNSFGMGGVNAHVIIESADNFTPPTSEVIEEHDSTPQLLLFSANTQDSLEAMIQRNLA
+                     YLRENTDSLRDLVYTMGARREHLSFRAASIVHSDMSVTTASFGKAPSSPPDIVMVFAGQ
+                     GAQWPGMGVELFKSNATFRRSILEMDSVLQSLPDAPAWSIADEISKEHQTSMLYLSSYS
+                     QPICTALQVALVNTLFELNIRPYAVIGHSSGELAAAYAAGRLTASQAVTLAYYRGIVAG
+                     KVAQAGCMAAVGMGASEIIHF*"
+     misc_feature    16830..17570
+                     /inference="protein motif"
+                     /db_xref="PFAM:PF00109"
+                     /db_xref="InterPro:IPR014030"
+                     /note="e-value: 9.025888536170949e-60"
+                     /note="p-value: 3.364103069761815e-63"
+                     /function="Beta-ketoacyl synthase, N-terminal domain"
+                     /standard_name="PF00109"
+     misc_feature    17595..17930
+                     /inference="protein motif"
+                     /db_xref="PFAM:PF02801"
+                     /db_xref="InterPro:IPR014031"
+                     /note="e-value: 2.2171445990751238e-35"
+                     /note="p-value: 8.263677223537547e-39"
+                     /function="Beta-ketoacyl synthase, C-terminal domain"
+                     /standard_name="PF02801"
+     misc_feature    17937..18287
+                     /inference="protein motif"
+                     /db_xref="PFAM:PF16197"
+                     /db_xref="InterPro:IPR032821"
+                     /note="e-value: 3.8698172759236842e-25"
+                     /note="p-value: 1.4423471024687604e-28"
+                     /function="Ketoacyl-synthetase C-terminal extension"
+                     /standard_name="PF16197"
+     misc_feature    18360..18770
+                     /inference="protein motif"
+                     /db_xref="PFAM:PF00698"
+                     /db_xref="InterPro:IPR014043"
+                     /note="e-value: 1.0799913424517567e-26"
+                     /note="p-value: 4.025312495161225e-30"
+                     /function="Acyl transferase domain"
+                     /standard_name="PF00698"
+     CDS             18806..22078
+                     /inference="ab initio prediction:Prodigal:2.6"
+                     /transl_table=11
+                     /locus_tag="BGC0001866.1_17"
+                     /translation="MVVACENSPSSVTISGDIDQVQYVMQEISLAHPEILCRQIKSDTA
+                     YHSHHMKSVGDTYHSFINPFFRGETEVNCQPVHFFSTVTGDELSDGDHVGPKYWQQNLE
+                     SRVLFQGALENIISRQRSRHLLFLDVSPHSTLAGPIRQTLEQAEVAHPYVPCLIRFKNC
+                     AESFLSTIGQLYSHRQPLDFNMLTNPDRTAKVLTDVPTYPWQHGYSNLYTTRQNNEWLF
+                     RKQPKHELLGTRVVDSTDNEPCWRNVLYLEHVTWLRDHKVSGNIVFPAAGYVMMAGEAV
+                     RQIGSTASGFIVRQMVLDTAMVLNQSNPTEIVTSLRKHRRDRWYSFTISSHNGVKWIEH
+                     CYGEVAQGHDLPSTASTHKENLSRDINVSNWYKTLSRGGVEFGPAFQCVESQSCSVTSN
+                     TVSGRIVSKVNESTYYPVHPTQLDSVLHIVYGAIYKGFDWQVESLPVPTSIGEIMIGEC
+                     VSDLDVTMWADVSRNSNILVNGEAFGSDGCLLIRIKDIVLRPLGANQACFEEDESHAGA
+                     RLLWKPSMQFLNLADLIQTPVNWTKQTMLLNDFTSLCIERALCLLHAQGVSSSISHLQK
+                     YQDWLQRQPKPSSEQSMESLVEKILATSAAPCARAMIKVLDNIVPICKGEIDALEVLMG
+                     DDTLYELYNYLNEVDRTPLFDSLGHYQPQQRILEIGAGTGGTTAKILPRTKYSTYTFTD
+                     ISAAFFPAAKDRFQCHANVVYRTLDITKDPLDQGFEPESFDLIVAANVLHATPNLYETL
+                     SNVRKLLHPRGKLLLEELCGDAKFTNFIVGVLPGWWAGESDGRADEPYISPDRWDSILK
+                     AAGFNPLDDVAFDAAPPLHSLAFMLASPSCVPESPLKRNVTLLSDVTSSEIAVRMQKQL
+                     LSRGYSVGVQSLDQSLMDGEDVIILVDTVSPFFHNLDSRKLSTFQNLLRELQRSHSGAL
+                     WVTRSIQIDCRDPRYSPTLGVARTVRSEFGLDFGTCEVDTLKYTSIGLVIDVFEAFHGR
+                     RHGQNAYPEYEYAIREDTVHIGRLSSFSVQEELRRIQKAHVETKDNRISLVAGTSGFDS
+                     LAWQADAGQQVQLLGDDEVELQVDTAGVNFLVRCSFQFQGES*"
+     misc_feature    18809..19258
+                     /inference="protein motif"
+                     /db_xref="PFAM:PF00698"
+                     /db_xref="InterPro:IPR014043"
+                     /note="e-value: 2.639223271303753e-16"
+                     /note="p-value: 9.836836642950999e-20"
+                     /function="Acyl transferase domain"
+                     /standard_name="PF00698"
+     misc_feature    19487..20317
+                     /inference="protein motif"
+                     /db_xref="PFAM:PF14765"
+                     /db_xref="InterPro:IPR020807"
+                     /note="e-value: 2.520598829779557e-60"
+                     /note="p-value: 9.394703055458656e-64"
+                     /function="Polyketide synthase dehydratase"
+                     /standard_name="PF14765"
+     misc_feature    20786..21256
+                     /inference="protein motif"
+                     /db_xref="PFAM:PF13489"
+                     /note="e-value: 1.0131254482174088e-12"
+                     /note="p-value: 3.776091868123029e-16"
+                     /function="Methyltransferase domain"
+                     /standard_name="PF13489"
+     misc_feature    20801..21133
+                     /inference="protein motif"
+                     /db_xref="PFAM:PF13847"
+                     /db_xref="InterPro:IPR025714"
+                     /note="e-value: 8.939870258494623e-11"
+                     /note="p-value: 3.332042586095648e-14"
+                     /function="Methyltransferase domain"
+                     /standard_name="PF13847"
+     misc_feature    20804..21097
+                     /inference="protein motif"
+                     /db_xref="PFAM:PF13649"
+                     /db_xref="InterPro:IPR041698"
+                     /note="e-value: 2.319131521369124e-13"
+                     /note="p-value: 8.643799930559537e-17"
+                     /function="Methyltransferase domain"
+                     /standard_name="PF13649"
+     misc_feature    20807..21103
+                     /inference="protein motif"
+                     /db_xref="PFAM:PF08242"
+                     /db_xref="InterPro:IPR013217"
+                     /note="e-value: 3.6288099491186147e-22"
+                     /note="p-value: 1.3525195486837923e-25"
+                     /function="Methyltransferase domain"
+                     /standard_name="PF08242"
+     misc_feature    20807..21106
+                     /inference="protein motif"
+                     /db_xref="PFAM:PF08241"
+                     /db_xref="InterPro:IPR013216"
+                     /note="e-value: 5.245291385894328e-12"
+                     /note="p-value: 1.9550098344742185e-15"
+                     /function="Methyltransferase domain"
+                     /standard_name="PF08241"
+     CDS             22416..22889
+                     /inference="ab initio prediction:Prodigal:2.6"
+                     /transl_table=11
+                     /locus_tag="BGC0001866.1_18"
+                     /translation="MQTVLINSASDGVGLAAIQISKMIGATIYATVIGEDKVEYLTASH
+                     GIPRDHIFNSRDSSFLDGIMRVTNGRGVDLVLTSLSADFIQASCDCVANFGKLVNLSKP
+                     TAANQGQFPIDSFHPNMSYASVDIIDYIKRRPKESKRYVITFRHSYQLCPACN*"
+     misc_feature    22449..22766
+                     /inference="protein motif"
+                     /db_xref="PFAM:PF00107"
+                     /db_xref="InterPro:IPR013149"
+                     /note="e-value: 1.0960342036668699e-15"
+                     /note="p-value: 4.085106983476965e-19"
+                     /function="Zinc-binding dehydrogenase"
+                     /standard_name="PF00107"
+     CDS             22922..24277
+                     /inference="ab initio prediction:Prodigal:2.6"
+                     /transl_table=11
+                     /locus_tag="BGC0001866.1_19"
+                     /translation="MELYKQGHIQPITPVKTFTATDIRQCFDYMQSGQHIGQLRLSLKS
+                     QDTFIEAVCSPKTMIFQSDASYLLVGGLGGLGAEIARWMAEHGARNLIFLSRSADAESN
+                     IRLFRELESQGCSVQAIKGSVCNASDVKRAISAARIKLKGIFNMSMVLQDASLLKMSSD
+                     EWNAATGPKIQGTWNLHDASLDQDLDFFLLFSSMGGILGIPGQANYASANTFMDAFVQF
+                     RHSSHLPASVIDIGEVQGIGHVANNPEILNRLKLLECARMSQKDLFHAITIAISHSLPP
+                     QTLDYSRYENPAQFITGLRDTTGMLDSTGGKSMLLDSRLAAYVGNSAAVTAPTETKTSA
+                     NKLNNFVSSAATDSAILSEPSATQFVSLEIARWVFDLLMKPVDDDSEIDLSRSLVDVGL
+                     DSLAAVEMRSWLKSSLGLDISVLEIMASPSLAAMGEHVIRELVRKFGGDNKN*"
+     misc_feature    23114..23638
+                     /inference="protein motif"
+                     /db_xref="PFAM:PF08659"
+                     /db_xref="InterPro:IPR013968"
+                     /note="e-value: 1.5141662612831146e-61"
+                     /note="p-value: 5.643556695054471e-65"
+                     /function="KR domain"
+                     /standard_name="PF08659"
+     misc_feature    23123..23584
+                     /inference="protein motif"
+                     /db_xref="PFAM:PF00106"
+                     /db_xref="InterPro:IPR002347"
+                     /note="e-value: 1.1379002942545491e-07"
+                     /note="p-value: 4.2411490654288077e-11"
+                     /function="short chain dehydrogenase"
+                     /standard_name="PF00106"
+     misc_feature    24071..24232
+                     /inference="protein motif"
+                     /db_xref="PFAM:PF00550"
+                     /db_xref="InterPro:IPR009081"
+                     /note="e-value: 3.359618716013185e-10"
+                     /note="p-value: 1.2521873708584363e-13"
+                     /function="Phosphopantetheine attachment site"
+                     /standard_name="PF00550"
+     CDS             25423..25710
+                     /inference="ab initio prediction:Prodigal:2.6"
+                     /transl_table=11
+                     /locus_tag="BGC0001866.1_20"
+                     /translation="MAQKLRFYLFGDQTYDYDEQLRALLTSHDPVVRSFLERAYYTLRA
+                     EVARIPNGYQARISRFSSIAELLSQRREHGVDASLEQALTVVYQLASFMR*"
+     misc_feature    25444..25704
+                     /inference="protein motif"
+                     /db_xref="PFAM:PF16073"
+                     /db_xref="InterPro:IPR032088"
+                     /note="e-value: 1.3071857188363548e-23"
+                     /note="p-value: 4.872104803713585e-27"
+                     /function="Starter unit:ACP transacylase in aflatoxin
+                     biosynthesis"
+                     /standard_name="PF16073"
+     CDS             26198..29653
+                     /inference="ab initio prediction:Prodigal:2.6"
+                     /transl_table=11
+                     /locus_tag="BGC0001866.1_21"
+                     /translation="MSRPYISAYASGGVTISGPPSVLAELRNTPGLSKLRAKDIPIHAP
+                     YHSSAIFNQCDVETILSSALIDLASRATHVPILSTGTGRLVWAGTLPAAIQSALQDVLL
+                     RPISWENMSCGISTCLQSIDPSEVEVIPIATLAGPLLCRSVQVAKSQIPATIDPKNDVM
+                     NEAQSQIAEAMDRAKIAIVGMSGRFPGAENVDSLWELLMAGRDMCKEVPPTRWNVDTHV
+                     DPTGKRKNTSKIRWGCWLDNPDMFDARFFNMSPREAPQVDPAQRIALITAYEAIEQAGI
+                     VPGRTPSTQEDRVGVFFGTTSNDWCESNSGQDIDTYYIPGANRAFIPGRINYVFKFSGP
+                     SYSIDTACSSSLSALHVACNALWHGDIDTAIAGGTNVLTNPDMTAGLDRGHFLSATGNC
+                     KTFDDTADGYCRGEGVATVVLKRMDDAIADKDPILGVIRGVYTNHSAEAESITRPHVGA
+                     QKAIFQHVLNHSGIRPQDISYIEMHGTGTQAGDMREMTSVLDTFSPQYPGAIQREKPLY
+                     LGAVKSNIGHGESVSGVTALVKVIMMMQNNTIPPHRGVHTRLNRRFPSNLDERNVHIAF
+                     QATEWPRGQTPRRAFINNFSAAGGNSSVLVEDPPLILKEEGADPRSSHVIAVSAKSPSA
+                     LRKNLESMRRYAMSEHTEKSLCELSYTTTARRIHHSHRLMFAGSSLEDILREMESKLAI
+                     KEPFSPCAPLQSVIFTFTGQGAQYPGMGQVFFNNFSVFRSDLCRLDDLAQKLGFPTFLP
+                     IFSASTHARLEGFTPTVVQLANTCMQLALTRLWVSWGIRPSAVVGHSIGEYAALNTAGV
+                     LSDADTVYLVGKRAQLLEEKCNRGSHTMLAALASFEKVSRLLDSAPCEVACINGPEEIV
+                     LAGPRSHMTDIQKILVAHSIRCTMLQVPFAFHSSQVDPILQDFQSAIEGVTFHKPTIPV
+                     ISPLLGDFVTETGTFNPNYLARHCREPVNILQALRQASTMNLVHDSSVVMEFGPHPVVS
+                     GMVKSTLGNSIKALPTLQRNRNTWEVLTESVSTLYCMGFDINWTEYHRDFPSSQRVLRL
+                     PSYSWDLKSYWIPYRNDWTLYKGDIVPESSIALPTHQNKPHSTSPKQQAPTPILETTTL
+                     HRIVDEKSTEGTFSITCESDVSRPDLSPLVQGHKVEGIGLCTPV*"
+     misc_feature    26201..26338
+                     /inference="protein motif"
+                     /db_xref="PFAM:PF16073"
+                     /db_xref="InterPro:IPR032088"
+                     /note="e-value: 8.208876065249628e-11"
+                     /note="p-value: 3.059588544632735e-14"
+                     /function="Starter unit:ACP transacylase in aflatoxin
+                     biosynthesis"
+                     /standard_name="PF16073"
+     misc_feature    26729..27475
+                     /inference="protein motif"
+                     /db_xref="PFAM:PF00109"
+                     /db_xref="InterPro:IPR014030"
+                     /note="e-value: 2.667462237983852e-82"
+                     /note="p-value: 9.942088102809735e-86"
+                     /function="Beta-ketoacyl synthase, N-terminal domain"
+                     /standard_name="PF00109"
+     misc_feature    27497..27862
+                     /inference="protein motif"
+                     /db_xref="PFAM:PF02801"
+                     /db_xref="InterPro:IPR014031"
+                     /note="e-value: 2.4031043351141288e-34"
+                     /note="p-value: 8.956780973217029e-38"
+                     /function="Beta-ketoacyl synthase, C-terminal domain"
+                     /standard_name="PF02801"
+     misc_feature    27896..28216
+                     /inference="protein motif"
+                     /db_xref="PFAM:PF16197"
+                     /db_xref="InterPro:IPR032821"
+                     /note="e-value: 2.535893425129411e-07"
+                     /note="p-value: 9.451708628883381e-11"
+                     /function="Ketoacyl-synthetase C-terminal extension"
+                     /standard_name="PF16197"
+     misc_feature    28322..29233
+                     /inference="protein motif"
+                     /db_xref="PFAM:PF00698"
+                     /db_xref="InterPro:IPR014043"
+                     /note="e-value: 4.597134671955754e-38"
+                     /note="p-value: 1.7134307387088164e-41"
+                     /function="Acyl transferase domain"
+                     /standard_name="PF00698"
+     CDS             29804..30544
+                     /inference="ab initio prediction:Prodigal:2.6"
+                     /transl_table=11
+                     /locus_tag="BGC0001866.1_22"
+                     /translation="MVIEKALMPLNAGPQLLRVTASLIWSEKEASVRFYSVDVRRPSSK
+                     SQMNTNIHNSQENHTETVQHSHCRIKFSDRSTYQAYQEQISAVKARMFEMKTNSSSGRT
+                     YRFNGPMAYNMVQALAEFHPDYRCIDETILDNETLEAACTVSFGNVKKEGVFHTHPGYI
+                     DGLTQSGGFVMNANDKTNLGVEVFVNHGWDSFQLYEPVTDDRSYQTHVRMRPAESNQWK
+                     GDVVVLSGENLVACVRGLTVSRET*"
+     misc_feature    29918..30535
+                     /inference="protein motif"
+                     /db_xref="PFAM:PF14765"
+                     /db_xref="InterPro:IPR020807"
+                     /note="e-value: 7.778696660229127e-11"
+                     /note="p-value: 2.8992533209948296e-14"
+                     /function="Polyketide synthase dehydratase"
+                     /standard_name="PF14765"
+     CDS             30591..32633
+                     /inference="ab initio prediction:Prodigal:2.6"
+                     /transl_table=11
+                     /locus_tag="BGC0001866.1_23"
+                     /translation="MLTTFQIQGVPRRVLRYILQSSAKTTQTATSSVPAPSQAPVMVPQ
+                     IVQVPKAKPISQISGTLTEALRIICEQSGVPLAELTDDATFANIGVDSLLALTITSAFV
+                     EELDLDVDSSLFMDYPTVADLKRFFDKINTQHAPAPAPVSDAPKQLQPSSSPVASATPS
+                     APIHGRSKFESVLNILTEESGVEMAGLPDSTALADIGIDSLLSLVVTSRLNDELELDVS
+                     SEDFNDCLTIRDLKAHFMSKNSDNGSSAVLTPQPSRDSALPERTRPRVADTSDEEDAPV
+                     SANEFTTSARSTSKYMAVLNIISEESGMAIEDFTDNVMFADIGIDSLLSLVIGGRIREE
+                     LSFDLEVDSLFVDYPDVKGLRSFFGFESNKTATNPTASQSSSSISSGTSVFDTSPSPTD
+                     LDILTPESSLSQEEFEQPLTIATKPLPPATSVTLQGLPSKAHKILFLFPDGSGSATSYA
+                     KLPRLGADVAIIGLNSPYLMDGANMTCTFDELVTLYLTEIQRRQPAGPYHLGGWSAGGI
+                     LAYRAAQILQKAAANPQKPVVESLLLLDSPPPTGLGKLPKHFFDYCDQIGIFGQGTAKA
+                     PEWLITHFQGTNSVLHEYHATPFSFGTAPRTGIIWASQTVFETRAVAPPPVRPDDTEDM
+                     KFLTERRTDFSAGSWGHMFPGTEVLIETAYGADHFSLLVSLLFRD*"
+     misc_feature    30789..30974
+                     /inference="protein motif"
+                     /db_xref="PFAM:PF00550"
+                     /db_xref="InterPro:IPR009081"
+                     /note="e-value: 5.884377030377924e-14"
+                     /note="p-value: 2.193207987468477e-17"
+                     /function="Phosphopantetheine attachment site"
+                     /standard_name="PF00550"
+     misc_feature    31110..31304
+                     /inference="protein motif"
+                     /db_xref="PFAM:PF00550"
+                     /db_xref="InterPro:IPR009081"
+                     /note="e-value: 3.9212317886052276e-10"
+                     /note="p-value: 1.461510170930014e-13"
+                     /function="Phosphopantetheine attachment site"
+                     /standard_name="PF00550"
+     misc_feature    31485..31670
+                     /inference="protein motif"
+                     /db_xref="PFAM:PF00550"
+                     /db_xref="InterPro:IPR009081"
+                     /note="e-value: 1.367829688372301e-08"
+                     /note="p-value: 5.098135252971677e-12"
+                     /function="Phosphopantetheine attachment site"
+                     /standard_name="PF00550"
+     misc_feature    31917..32240
+                     /inference="protein motif"
+                     /db_xref="PFAM:PF00975"
+                     /db_xref="InterPro:IPR001031"
+                     /note="e-value: 6.711355516947163e-24"
+                     /note="p-value: 2.5014370171252933e-27"
+                     /function="Thioesterase domain"
+                     /standard_name="PF00975"
+ORIGIN
+        1 ttacatccgc ttagtctcct cggacttcca tgcttccttg tccattgaga aacgatccct
+       61 ctcatcttct tgggacaaaa cataatatgt tccttgcatc cttcgttgcc acagacgcca
+      121 cagaccccca atgaaaagcg acagtagaac aaccgcagct cctcccgccc ataccaaact
+      181 ctccggccgc agaatcttcg atacaagacc ccaaaacccc gctgggcctg gcagttcgtg
+      241 ttcccgaaac ccatggaaac cctgcgtata ctccggcgga acgtggggcc acttatctac
+      301 accgtctagg atagccattg ccattccatt atccagatgc atctcgaaat ggcagtgtaa
+      361 cagccacgcg cctggcgacg taacctggta ccgcagcaca acccacatcg ctcccgtgaa
+      421 ctctgtcagg aacgtgtctc ggtatggggg attcaccatg ttaaagcttt cgggctggtc
+      481 ggccattgct tcttctaccg aggaatagtt ccacaggcct tgcccccctc caatccgcca
+      541 gtatttgttc gcgtgtttat ggaccgcgtg cgggaagtcg actggccata atggtgaatg
+      601 acctacctgc agtacgatgt cttgccaggt gccgttctgg gttcggatga tcaaatcgtc
+      661 atcatgggcc tgctcggaat tgacgttgta aagaaggggt tcgtaggctg agcggtctgg
+      721 gggatatttt ttcttcccac tcagagtaaa ttcccaagtg gagttagcct tgcccaagta
+      781 caacacatgc atatcatctc cgtgggcagc tggaactttt ctcggccaag gaggcaggtc
+      841 aagcttgtct aggaaaacag catctcgcat cggccaagcg ttgtaatcga aatagggctt
+      901 agatatagtc gtgacaccaa atcggtctgg tttctgacgg gcgttggggt ctccattttt
+      961 gtatcggagg atcccgaatg ctgcgatcat ctgggaatat ccaccatctg ggacacggat
+     1021 cgagtagtcc atgggcttct tatccagtct gatcatggct gaatatcgct ctccggccca
+     1081 aatcaggaag gtgtctgccc ggcgaggctc gatgtaatgg ccgtcgactt catagatcca
+     1141 catttcgtgc tcatcgattg ttggctggag ggtcttgaat gtcgagcctc cgatccagtt
+     1201 cacactcgcc cagcggtctg ccgggtcaac ctcgattgcc tctgttggac cggagtaggg
+     1261 aacacagcct tcctggagac cgggcgggat ggcggacacg ttcccgtctg cgagccacgg
+     1321 accttctgtc gatggtacga atgggaagca cctatacgat aatcagagac atgaactata
+     1381 ccgatgatga atctcctggc cgctcaccct ttgtctgtga tagtgtcatt cggccagctc
+     1441 ttgtgcatga acgggatctg cttgtcaata agccattgat ggccaggaca atagacgctc
+     1501 cctttgccgt tgagaagaat actatcgacg caactgattt tataagtcag tcgactagaa
+     1561 acgtactcat atcagtacac tggaatcata caagatctgc agtcggctgt ttgcatctgc
+     1621 ttctttatac tgggtcgagt tgtagtagtc ccagtcagaa agcatcatga tgtgaggatt
+     1681 gttgcttgcg cgctccatgg cagcaatgtc ctcgggatct tcgctgatca ttgcccaggg
+     1741 tccagcggtg cctggctttc ggctgccttc tgtgagctac atccaacaat agcagagaga
+     1801 aaacaaaccg aatgaacaat gcaccatata agccatccag cagagtagcc cgagaatgag
+     1861 agtggtacct gattctgtca gtctacttct gactgggtgt gtttctcatg acgacctacc
+     1921 agtattgacc aggggggtag gcagtaaagc gatagacgta gctctcacca ggctctatag
+     1981 gcttctgact cagacctgga actccatctg accacggcgt atcttgcatc cttcattaca
+     2041 ccattagcac ctggcgaaat atcgaaggtt gaataggtct actcacaaaa ttccatgcca
+     2101 gtggattgtg gtgttctcat gcatgtaatt acgaacaaca atctgttctt cgtcagtcct
+     2161 gcagcctcaa ttggcgaaag atatgccacc gacttcaacg tcgtcgccct cgtcgaatat
+     2221 aagcgtcgga gatggaaatt gaccatttgt cttgatcatc tgccgagatt ggccgttggg
+     2281 agcgccctct tcccaggtga gcactagctc gcgttggaca tgggcgccca ggacgagcaa
+     2341 tacattgcag gccagcagta aaagataggc cggcattgtg aaagaatgga tagcctgatt
+     2401 agttcagttg cggttatgcc tggtattaat atcacggttg ggatacttgg ggctgaagaa
+     2461 ctcattaagc cccggttctt cggggaccga aaacattgcc aacctcactg ccacaatact
+     2521 cgtactcttc ggattgaaac agttactgag tgaaattaat gttatttaat agcgaagttg
+     2581 gagtcgaaga acatgtcgta ctctggtcat tccaagaaac aacgtcgatt actatggcag
+     2641 aggaaatcaa gctgactccc ctggagacct tcgcacaggc aatcagtgcc tctgcgaaga
+     2701 ctattgcaac ttactgcaga gactccggtc atcctcaact gtccgatgat aattctagcg
+     2761 gcctcactgg ggatgttctc cccccttccg caccacaggc agtcaccgcc gccagacaga
+     2821 ccatcttgga ggcatcgtac cgactacagc aattggtcac tgagcctagc caatacctgc
+     2881 cgcgactgac cgtttacgtg agtgttgaac aatctcccat gaaagatcaa actaacgaca
+     2941 gaaaagcccc agcacctggc tgccttacgc tggctgtgcc atttcagaat cccggagctc
+     3001 atccccgtgc aaggcaccag gacatactat gagctggcta cagaagccaa agttcctctt
+     3061 catcaactgc agagcattgc aagaatggca attactggga gctttctccg agagccggag
+     3121 cccaatatcg tcgcccacag caggacgtca gcccattttg ttgagaatcc ttcgctccgt
+     3181 gactggacac tattcctggc agaggatacc gcgcccatgg cgatgaagct tgttgaggcg
+     3241 actgaaaagt ggggagacac gaggagcaag acagagacgg cctttaacct ggcgctgggc
+     3301 acggatctgg ccttcttcaa gtatctttcc agcaacccgc agttcaccca gaaattctcg
+     3361 ggatatatga aaaatgtgac agcgagcgag ggtactagca tcaaacatct cgtcaacgga
+     3421 ttcgactggg cgagcctcgg aaatgcgatc gtggttgatg tacgtcttca atcgtcgttt
+     3481 actccatacc gatcccatac tgatgtgata ttctacaggt tggcggttct actggtcatg
+     3541 caagcattgc tctcgcggaa tcgttccccg atctgaaatt catcgtgcaa gacctgccca
+     3601 tggtgacatc tacctcgaag gacaatcgcg aaaagacccc tctcccagag acggtcgctt
+     3661 cccgcatctc cttcgagagc catgacttct tcaagcctca gccggtgcag aatgcggatg
+     3721 tctatcttct tcgcatgatt ctgcatgact ggtcattcaa agaagcaggc gagatccttg
+     3781 ccaatctagt accgtccgtc aagcagggtg ctcggatcct tattatggac actgtgcttc
+     3841 cccgtcatgg tactgtcccc gtaactgagg aagcgttgct tcgtgtgcga gatatgacga
+     3901 tgatggagac attcaacagc catgagcggg agattgacga gtggaaggac ctgattcagg
+     3961 gggtgcatac tgggcttcgg gtgcagcagg tcattcagcc ggcggggagc tcaatggcga
+     4021 tcattgaggt tgttcgggga tgacgagact catcacgtat tggtaccttt cccttgggct
+     4081 gcgcaaaatc ctacatattg agctgtaagg ctgtgtggca cacatggggt agatggataa
+     4141 acgaggagtt tagttttcgg accccgaaag cgatctgcgg atcgactcat gttttgatct
+     4201 catgtcagca tctagagtaa gtatctagag tacttctgat tgctatagta catgaaacca
+     4261 tgatattcct acctatggat ccggaagcaa tctaagtagt tctgatcggt cttttccttc
+     4321 cgttaccacg atgtttatga cccagatcgt gttcgggata gccccgaccc ttctcaagac
+     4381 cttctcccac ctcactgccc tcgatctctg gcgaccatcc gcaccctacg tgttcgatcc
+     4441 tgtcacgagt agcacctacc tagggactat agccgatggg gtcgaggagt tccttgggat
+     4501 cttctacggc caagacacgg gtggatcgaa ccgtttcgca cccccaaagc cctatattcc
+     4561 ctcccgccac agtttcatta atgcgagcac ggcgggcgcc gcatgtcccc agccctatgt
+     4621 tcctctgcca gccgatccat ataccgttct caccaatgta tcagaggact gtctcagcct
+     4681 gcgcattgcg cgaccagaaa atacgaagtc tactgcgaag ctgcccgtga tggtttggct
+     4741 gtatggaggt gcgtacaaca gattacccac tgatctccag tgggagacat gaagaccgct
+     4801 aactgaaatg acaggaggag cttccgtcgg aacagcctat gatgtatcgt ataatcctgt
+     4861 cggactgatc cagcagtccg tggtgaatgg gagtccagtg atctacgtcg ctatcaacta
+     4921 ccgggtaaac cgtacgcaga ccgatagtca cttttaaaat tcatatctaa caagccttca
+     4981 gtttttggac atgccttctc ggacgctctt ctaaagtcca agtccacgaa tctggctatg
+     5041 caagaccaac gtcttgggat cgaatggatc aagaatcata tttctgcgtt cggaggcgat
+     5101 ccagacaata tcaccctctt cggagaagac gagggtgcaa cgtacatcgc tcttcacatt
+     5161 ctctcaaacc atgaagtgcc atttcataga gcaatcttgc agagtggagc cgccataacg
+     5221 catcacgatg tcaacgggaa tagatccgcg aggaacttcg cggccgtcgc ggccaggtgc
+     5281 aattgtctct ctgatggcga ccgacaggta gactcccaag acacagttga ctgtctccga
+     5341 cgagttccta tggaagatct agtcaacgca acgtttgaag tcgcgcactc tgttgatccc
+     5401 gtgaacgggt tccgcgcatt gtacgtcctc ttacactttc cctctcacaa atgcaagcaa
+     5461 gactaacagc cccagtatgc ccgccgtaga cggctacatg ataccggatg agccatccaa
+     5521 ccttctttca agaggccaag taccagccaa tatatctatc ctagcaggat ggacgcgcga
+     5581 cgaaagctcc atgtccgtcc caacgagtat ccgcaccgca gccgacgctg cctccttcat
+     5641 ctcaacccaa tttccgttat tgaacgcctc aactatccac cactttctca catctctcta
+     5701 ccccgaatcg gactttacca ccaacagtcc atcttcaccg gagaaagtta cgcccgcctg
+     5761 gcgcgcgaca tcagctctcc acagagacct tacccttacg tgcccgacta tcttccaagc
+     5821 atggtccctg cgcctctcat ccaactgtac aactccggtg tatctgtacg aactgcgtca
+     5881 aagtcccttc gcgacagctc tcaacaacag tggtgtggga tacttgggca tcgtacattt
+     5941 ctcggatgta ccgtatgtct ttaatgagtt ggaaagaaca tactacatta ctgacccgga
+     6001 ggagaataag cttgctcaga ggatgagcgc gagctggact gctttcgcaa gcggtgcttt
+     6061 ccctctttgt gaacggtctg agagatcatt ggggagatgg gaggaggcgt atggggggga
+     6121 cagggtttgt agggatcgaa tgccagagca tgtacgggtg aaaggtattg gggataatgg
+     6181 cgaccaggat gatggggatg agatagggaa gcttatggca aggtgtgggt ttattaatcg
+     6241 gttggagtac taggatgatc gagggtcctg ggagactagg tacttactca gtccacgtgg
+     6301 aatctccgag tgagggtaga ggagattccg tcttggccct gggatctggt ttttgagtga
+     6361 acaatgccac gtctatatga ggaggtcgat ggaagttgca aattcggagc aaagacagta
+     6421 agtatacagg cataataata ataatcggtc ctgttagtgg cgggccagtt ctgccggtcc
+     6481 attacccctg agtaccttac tgtctgtacg tacttagctt agctacccaa actcgagtcc
+     6541 aacgaagaat acttcgtcca caatggaact gttcgtttcc tcggagggat gcttgcacat
+     6601 ttgcgtcccg atcccattgg gatgtgacgg ataccttcta gctgggtact tcctatggtc
+     6661 gatatcagct ggttcgggca cttaaactta ttctccaaag ccgaaggaac tgtactcgaa
+     6721 atggatctaa aacagcgctg ccccgtttcc cgacccgacc agacccttcg cggcatgctt
+     6781 caaggcatac ctaggtaggg cgtaatcaat gtttcaatgg tgttctgttc tattggatcc
+     6841 cttcacctga cttcgaaccc tgattgggtt gacggtgtcc gaaaaacccg catgggggcc
+     6901 aaagccggta gaatcgatct gatatcggct gggtaatgga tggttggaga tggaaatata
+     6961 tatgtttagt cgcttaagtc ctttatggtc cacaaaatac ctagcggata cattattata
+     7021 tgattggcca aaatggcaag tatgcaagaa gtcgatgcgc tcgtggtcgg tgctggattc
+     7081 ggcggcttgt ggatgacaaa caggtacgtg ttcagccttg ttgcgaggag atgtatttca
+     7141 gagttgatag cgttgcgaca gactgaaaga ggccggtcta aatgtcctct gcgttgagaa
+     7201 agcacctcaa gcaggcggag tctggtactg gaactgctac cccggcgccc gggtagacag
+     7261 tcgctaccct gtctatcagt attccgacga gtcgctctgt aaagactgga attggagtga
+     7321 acttttccca gggtacgaag agatccgcaa gtacctctcc tacgcggtcg acaagtggca
+     7381 gctaaacagc cacatccggt acaatacaac agtgacaggt gcacgcttcg acgagtcgga
+     7441 ccacaaatgg accgttgaag gtattaatgg ctcacatggg acgatacgca tccggtgtcg
+     7501 ttggtacatc cttgccctag ggtttgcctc aaagccatac atccccgact tcgagggtct
+     7561 caaccgcttc cagggtccct gcttccactc ctcggcgtgg ccacaggaag gaatcgacct
+     7621 gaaaggcaga cgcgtcgccg tcgttggcac aggtgcaagc gctgtgcaga tcatccagac
+     7681 catttccaaa gaagtcggcc acttgaccgt ctaccagcgg accccgtgca ctgccatgcc
+     7741 aatgagacag cagtccctga cacccgaata ccaggacaat ttcaaagcct ccggcgagat
+     7801 ggcagccacg atgcgtcgta caaagtacga gagatttggt ggccaagacg tgcagtttgt
+     7861 tagccgtcgc tggcacgaag atacccctga gcaacggcgt gcagtgttcg agcaagcctg
+     7921 gcagaagggc ggattccacc tgctcctatc cacctacttc gaagtcttcg acgacgtgga
+     7981 ggtgaaccac gccgcatggc gattctgggc cgagaagtcg cgcgagcgca tccataacac
+     8041 caaatacaag gacatcctag ccccgctaga agcggtccac gccttcggag gaaaacgtac
+     8101 cccgttcgaa caagattact tcgaggcctt taaccgtcgt aatgtagatc taatcgatat
+     8161 gaaagcgtca ccaattctct cctttgcgga gaaaggcatc ataacccaga acgagggctt
+     8221 gcaggaattt gatgtaatca ttctagcgac tggcttcgac acaaacactg gcgccctaac
+     8281 atccatccat atccaagaca cagatggcat cctactcaag gatcgttgga gctacgacgg
+     8341 cgttatgacg accttcggga tgtcgactag caagttccct aacatgtttt tcttctacgg
+     8401 gccgcaggca ccgacagctt tctcgaatgg accctcgtgc attgaactac agggcgagtt
+     8461 cgtcgaggag ctgatcctcg atatgatcgg gaaaggtgta acgcgtgtcg acacgactag
+     8521 tgaggccgag aagaggtgga aggagtcgac tttgtcgctg tggaaccaat ttgtgttttc
+     8581 ttctacaaag ggattctata cgggggagaa tatcccgggg aagaaagctg agccgttgaa
+     8641 ttggtacgta ctagtccttg gtttaggtgt ctcaaaacgc tgatgtgtga tataggtttg
+     8701 ggggctttcc tcgctatagg aaggcgttga cggagtgtcg cgacggcggg tataaggaat
+     8761 attctcttcg gtctttgcct aaagttccgg atccagagca cagggggttg attgataagg
+     8821 tggccgttgt tacatcggcc cagccggtgg gagcgtaatg catcgttatg gaaagtttca
+     8881 gtgactggct atctcaatga agcgccttct ttccccccgc gcgctttgtc ttctgtggac
+     8941 gataatttac tctgagttac aattacattc actcgagact gaagtgagag taggccaatt
+     9001 gaaggtgaag aattagccgg tggactcgag gagaaagaca aagcagagga aataaatcga
+     9061 cgaaactgta ttatcacggt ttgcaatcaa tagcttaaga tactaacgca gggctgaata
+     9121 ttagcatttc aatatgaact tggcttattc gacacataac cgccatccac aagcagaacc
+     9181 atcccattac agtacgcccc agcttttgat gccaaatata caatggttcc cccgatctcc
+     9241 tgcgcatccc caaacctctc ttctggggcc agtgaagcag gcagtcgacc tccagttgct
+     9301 ttgtagaagt tgcccatcat gtcggtgttg aaatctgcgt cctgtcagca ctgatgacct
+     9361 ttcgatctgg ataatggcaa acatacatcc aggggcaatg acattaactc tgatcttcca
+     9421 cggggccaga tatgtggcta gattcttcat catatgtgtc gcggccgcct tactcgcacc
+     9481 gtagatgaac gcggaattgt ctctaccgaa tccgcccacg ctgctcaacg tcaccacctg
+     9541 gcttcgcacg tactcgatgc gcgctttgcc ggctccattg cgcgcctggg aagcctttgc
+     9601 ctcttcctcg ctcttcttgt tgcctttgtc caagagctcc agaaaagcga aagtcaccat
+     9661 cagcgaagcg gtggtgttgg tctcgaagac cgccgcgtag tccgcgaagg acttgtgaaa
+     9721 ccagtagtca cgcactttgg atatgtccca tttcggcgtc ggtcgtgtgg catggggtcc
+     9781 caggttcggt gtggagattc cagcgttatt caccaggagg ttgatatacc catccttggc
+     9841 ggcgatgtgg tcgacgacgc cttgaaggtc agcatgcgag gtaacgctgc actggaccgg
+     9901 gtggatgttg ccatgggtct atcacagata gtcagtacca tagtgattgc tttgtggact
+     9961 tggtttgtct tgaaccaacc gactgcttgg ctgcttcttg taagggctcg agccgtcgcc
+    10021 cgagtatgta gactttagcc ccgttggcct cgagggcctt cgccatcatc agcccaagcc
+    10081 ctgttcctcc gcctgttatc acggccacac ggccagtgat gtcgaacagg ctcgatgcgg
+    10141 ttaatgcttt tccatccatg gtctgttctg ttgttggtgc atcaaagcag taagagagga
+    10201 agatccgtcc aaacgggcgc gctttcttct actctggatg gccttccata tcttgcctcc
+    10261 ccctctctct ctctctctgt tcttccatgg taattgtgca gctccaaatg ttttcggata
+    10321 ccgacgtgaa gctgagatgg gagaacatgt cgcgatcagc cactaagtat ctacgtagtc
+    10381 ctagatagaa agcaaacaca cttagaacat cggttgttag cacatgcagc cactgtcgat
+    10441 cctcacactc tagaatgcta ccgtaggcga ctaggccaac ggatggatga cttactaaat
+    10501 agacgacaat aaaacgacca gccagtcggg ttcatcccag aaatgagcga tcgatctctc
+    10561 attgctgcat cggacaccga agcttttctc gtggcttcga tgaagctacc taggtagcca
+    10621 tggatatcga tccttggggg ctgagacagc tacaaaccct gcacgagaac acacggaagt
+    10681 actagcgaca taaacacatc aggtctcaga ttcttgctag ccccaagatt gcaaaagtac
+    10741 cacgtatgtt gacaggcatc gcgtgagagc aggacaactt gttcctctcg tttccacccc
+    10801 cacgccatcg tgtctggcac ttcagatcgt attctgctgc tgctcgactt ttctctcaga
+    10861 tccgcttgtg cttcaaaacc accgcaagat ggcagacgag caaaagaccc ccttggagag
+    10921 cggccagcaa ccagcagtcg cacaacacac atcgaccgct gagctgcaga cagaaaagcc
+    10981 cggccagatg aatggtaatg gaacagcaga caagccaggc cctccaggag gcaaaccttt
+    11041 tgggcctggc atgggcccgc ctatacagta tcccactgga ttcaagctct actcgatcat
+    11101 gaccgggctc tacctcgcga gctttctcac ggctttggta ggttggcgct cgatcaccga
+    11161 tctgacggac agtgaaactt atatcggatg acaggatcgt accgtgttgg tggtcgccat
+    11221 tccccaaatt accgaccatt tcaactcgat tgacgacatc ggttggtacg gcagtgcata
+    11281 tctgctcacc ttttgcgcct tccagctgtt gtttggcaaa atttactcgt tctacaatcc
+    11341 caaatgggtc tttctgtcag ctgttttgat cttcgagatc ggatcggcca tttgcggtgc
+    11401 tgctcctaac tccactgcct tgatcatcgg tcgtgcgatc gcgggtcttg ggtcttcagg
+    11461 gatttttggt ggaagtgtca ttatcacctt cttcacggtc cccttgcatc agcgaccgat
+    11521 ctacactggc atcgccggcg tcatattcgc gttggcctcc tcagtcggac cgctcatcgg
+    11581 tggtggattc accaacaacg tgtcctggcg gtggtgcttc tatatcaacc tacccgttgg
+    11641 agccctgacc gtggtgacta ttctgctgtt cttgaacctg ccacccgctc gtaaggccgg
+    11701 gacacctctc cgtgaacagt tactgcaaat ggacccactg ggtaaccttt gcttaattcc
+    11761 tggtatcatt tgccttctgc ttgccatcca atggggtggc tcaacatatg cgtggagcaa
+    11821 tggtcggata gttgcactgt tggttttggc cggtgttctc ttaatcgcct ttgtgggagt
+    11881 ccagctgtgg ctgcaggata agggcactat ccctccacgc gtgatgaagc agcgcagcat
+    11941 tgcggctggt atggcattca cgatctgtgt gaccgcaggc ttcatgtctt tcaactacta
+    12001 cctcccgatc tggttccaag caatcaaaaa tgcgtcatcc ttccactcgg gtgtgatgat
+    12061 gttgcccaca gtaatctcat caggagtagc cagcttggcc tgtggattca tcattcatcg
+    12121 agttggatac tacacgccgt ttatgatcgg tggctctgtg ctcatggcga ttggcgcagg
+    12181 tttgctcacc acgttcacgc ccaccacaga gcaccctaaa tggattggct atcaggttct
+    12241 atgggcatta ggatgcggaa tgagtacgtt tcagcctcct ttctttgccc gctgtatctt
+    12301 tgtcggcgga tactaactcg acctctgagc aggcatgcag caagcctccc tggccgccca
+    12361 aacagtgctc ccgaagcctg acgcgccaat tggtatctcg ctcatctttt tctcacaatc
+    12421 attaggcggt tcagtatttc tagcggtgga cgattccatc tacagcaacc ggcttgcggc
+    12481 caaactgggc agcattccca atctgcctca gtcagcgctg acaaacacgg gagccaccaa
+    12541 tattcgcaac ttggtggcac ctcaatattt gggtcgtctg cttggcggct acaatgacgc
+    12601 attaatggat gtcttccggg ttgcggtagc cagcagttgc gcgtgcgtag tagctgctgc
+    12661 ctttatggaa tggaagaatg tcagggcggc caaggcagct ggaccaggtg gcccaggggg
+    12721 tccagggggt ccagggggtc cagggggtcc tgaaggcctg aggggaggaa acaaggtata
+    12781 atcgaataaa cccccaaatc cggaaatatg catgtatagt aagcaagtag caaggttgtc
+    12841 taagatgtta tgttatggtc tcctgggtag acaggcgtac tatcaggcca tttgcagtaa
+    12901 tacgaaatag agatagattg atagaagacc atctcccaca atggttctga aaagccgatt
+    12961 gtctaaatga ttgatattgc tatatcacta aagagattgt aagctaggta tactttatag
+    13021 atacagggaa gggctagtaa aaaaagaaat aataataata attgatgaaa agcttcagat
+    13081 tatgtcttgt tctggtatta ttctagtgat taacagtaat gctgatctag ctgacaagca
+    13141 ttaaggagct ccaataagct ctaagatggc ggagcattcg ttaggcgtaa taaacctact
+    13201 aactagtgat gcacttctcc gtatatagtg agactatacg aactattaca gccacgcatc
+    13261 agagaacccc taagctgcct gggggctcga cccaggatta tatgtatatg caagaattac
+    13321 tcgggggggt tttgaatcct acattagcca ccacattcga cgacggtagg ttaaccactg
+    13381 aggggtagcc tatataatct cgtgtacagg tgggagataa ggatagttag cgtgtagcat
+    13441 gatttcaaat tgcttgattc ttatccaatt atttgtatta aacgtccaat atagttgtaa
+    13501 agttttggag gtttctttcc attaggtagg tataatgtcg aaggtccagg tagaccaatt
+    13561 aggagttctg atctttggat ggatggtaac aaatgctctc cgacgtacct agtaccatat
+    13621 agatgactga agcgaactcc tgcgagggtc gaatgatccg acacaacgcg gagtacactg
+    13681 cgggctcatt gcttgcatat cactagcagc agatgaatca cacgcgctaa tccagagtag
+    13741 gagtttcgtg tgatccaggc actaaaagct cagaacagca agatatgttg gagaacagtc
+    13801 tcctacagga ttacaggatt ttcaattatt ttatcaaata taccgcgcaa aggtaaagat
+    13861 ctcaaccctc ggttacgctc taggctctgt ggtccaaagt ataggtagga gctgcagtaa
+    13921 tgggctgggc cagaagagga tcgtcctatc agagaaatat gacctgtagt tagatctcta
+    13981 caaggcgaaa ttgcatccgc tagggtagta tcgcgcatgg cgggaaggat ttgggggcca
+    14041 ttttagttct cttgagattg gaagtgccta aggaccgcta ggatgtccta taatagaaat
+    14101 aagcatccat cctgttaaag ctggtaagga ctgaaattgt ctatatggcc ggtctgtgct
+    14161 gatcaactcc aagtctcccc tcctcttatc gataggttcg tctggtttat agcgcttgct
+    14221 ccgctagtaa gtcgccagtc aaaccctatc tcaacatgtc gtggcagggt catgaaggcc
+    14281 ccaaggtgaa gctgcggtcg gcatgcgacc ggtgctctgc gaacaaggtc aaatgcactc
+    14341 aggagaagcc cgaatgcgag cgttgtcgtc ttctgagctt gccttgcaac tatagccgtt
+    14401 cgatgcggat agggaaacca cccaaatccc gacagcgtgg attgtccaat attgacccga
+    14461 agactctcat gggcgggact gttaccaaaa agctcaggcc gtgtccgagc gcgccagagt
+    14521 cagcgtgtag aggttcattc gaagatggag atgggggccc gtggacagaa actatgacat
+    14581 tcgaggagat gttatcccgt ccctccccac ctccctttgc tggaccgtca cacaattcaa
+    14641 accgtcctac gaacatggct tctacgaacc aagatcagta ctatcacgac aaagggaaac
+    14701 atggcgaaac aatggatgag atgctgcaga cgctggtccc agactcagtg cagttcatag
+    14761 agttccccaa cacagcccgg gaagaccaga aacaacatcc agaactcagg tcggaagaag
+    14821 agtatagtga ttatagatcc aagtctctct tcgaggaagg cttggcacgc atcgcacctg
+    14881 attgtgctgg gggtattatg gacgtcttat atggcgaaga ggccttagtg cagatgccca
+    14941 atctgccctc gagtacgcat gaaggaagct caaataccca tgttacttca tcccacaact
+    15001 gtacgagagc cgtgatggag aatctagcca agctatacca agtatgcgca cctgctggag
+    15061 tagagaatgg ttcccacccc acaaccgacc aagtactcaa agcaaacagc gacgccatga
+    15121 aggacgctgc ggatcttttg gcgtgcccgt gtgccaagga cttctgcttt cccatcatac
+    15181 ttggcatcac agcatgcaga gttctggcct ggtatcaggt cgtgatcgac atgtatgacc
+    15241 cggagattcc catggccacg atgccaacgg cacgcgagga catcaagcac tgtccgattg
+    15301 catttggagc gtatcagctg gatgaggagg tgagccaagc aatgaccagt caattcgttc
+    15361 tacgaaacct ccgtgcaatg actcgatttg tcaagaccta tgtggagaac ttctgctctg
+    15421 atatcaataa gaaccggcca ggaagctgta gcctcatcta ccgctccctg ggcactttta
+    15481 tgcagacgcg gcttggaaat accattgagc aactggagga tcggttggct gcatttgatg
+    15541 gcgagtatac aaagaacatc ggatagcttc gggcaccgaa acaatatctg ccttttattg
+    15601 cttctcaatc aagcagttaa tggattccta aaccttccac accaaagttg gggcaacggc
+    15661 tccagggaga ggaacgtcaa aaggacgatg gcagaatatt tataccacgt ggaaaccctc
+    15721 actgaaactt gatttaggag ttgtttggat ataatcgacc tgtacagatc ttcaatgaca
+    15781 gtttgtcttt cggggtctcg tcttcctgcg cctcgggaat cctccgggcc tgggagtttt
+    15841 tgagtagctg ctattgacgt ggagaaggca cggagatcag gagagaaaaa tgctacagtc
+    15901 gagataaggc aagcaggata tagctcacga caagatatag tatatgaaag agcaactaga
+    15961 aactgactgg caatctgtaa cgtacatgaa caatgatcct ccatattctg ggtccctccg
+    16021 atccatatat aggatttgta tctaaaacaa tccaaaaagc cgatataaag gattagcagc
+    16081 agaaataata aggcaaaaga ggatctatac catcatagca gcccaagcgc atgttctaaa
+    16141 cccctgcggg gatatcaatc tacccgagtt gcatgcccta ggcaaataat catgtcacgg
+    16201 gttccccgtg acactatata gatgacatct gtattctgct gtatccagac ctctgtgccg
+    16261 taatattaat aacttgtaga tttaataaga aaataatatt ttctttctct aagaggggca
+    16321 agttacccct caccttagac cccgcttggt ttactttctt gagtatgaaa tgcacagacc
+    16381 gcgagacgtg tcacataatg ataactatta ataattgacc actcaataat ttaaatctca
+    16441 agtttaatct ctctgaggtt gaaggatttg tctgttctaa tctcaaattt attacatttc
+    16501 ccctccaata tgagctattg aaagactctt ataaacaaac actatagaag caaggagaga
+    16561 atatagaaaa taaagtatag aaaagaaaga aagaaagaaa gaaaagaaag aaagaaaaga
+    16621 aagaaaagaa agaaaagaaa gaaaagaaag aaaagaaaga aaagaaagaa aagaaagaaa
+    16681 agaaagaaaa cacaaactcg gaatcgtcta cacaattcaa cagcatgtct actagtgtac
+    16741 gtactgcctg cttgagaata atcgttcttg atactaaatt actcggtgtc tagacttgtg
+    16801 attcattgcc tggtacttct accccaatgg caatctgcgg tatagccgtg agactgcccg
+    16861 gtggaatttc caacgatgct cagctctggg actttctcct tgccaagcgc gatgctagat
+    16921 cacaagtccc cggtagtcgg tacaatattt cgggatacca ctccgattcg ggaaagcacg
+    16981 ggacttcaaa atcgaagtat gggtattttc tcgatgaatc tgtcgacctc ggaaccctcg
+    17041 atacctcgtt tttcagcttt acgaagcttg aactggagta cattgatccc tgtcaacgtc
+    17101 aactgttgga agttgtccga gagtgttttg aaagcgccgg agaggtcaac tatcgtggaa
+    17161 aggatatcgg atgttttgtt ggctcgtttg gcgatgattg gaccgaaaac cttacacatg
+    17221 atgaacaaac atcggccaag tatcctctga tggttggagg tgactttgct actccaaatc
+    17281 gagtctcata cgagtacaac cttcatgggc caagtgtgag cattcgaacg gcctgctcct
+    17341 cgtcgctcgt cgcgcttcac tctgcatgtt tatcgatcca gaatggagac tgctcagctg
+    17401 ccattgttgc tggtttcaac ctcattctca cccccacaat gaccatgatc atgtcgtcaa
+    17461 agggcgtact ttctgctgat gggtcttcca aatcatttga tgcggacgcg gacggatatg
+    17521 gccgaggaga ggcggtcaat gccgtataca tcaagccgct ccatgacgcc atacgtgatg
+    17581 gaaaccctat acgtgccgtt attaggggta ctgctacaaa ttccgacgga aagtcggcag
+    17641 gttttactgt acccagtgct gatgcccagg aggacgtcat tcggaaagcc tataaagctg
+    17701 ctggtataag tgacctctcc cagaccgcgt ttgtggaatg ccatggcaca ggaaccacag
+    17761 tcggtgaccc tatagaggtt gcggcaattg cgaatacctt tggcggcgat atgtacattg
+    17821 ggtccgtcaa gcccaatgtg ggccattctg aaggtgcatc ggggctcacc tctttgatca
+    17881 aggcagtgct ggctgttgag aaccgcacca ttccacctaa catcaaattc aataccccaa
+    17941 accccaagat tccgtttgag gctaagaaaa tcactgtccc tgtggaggca accccatggc
+    18001 cttggaaccg ctgtgtaaga gccagtgtga acagctttgg aatggggggc gtcaatgctc
+    18061 atgttatcat cgagtctgcc gataatttca ctccaccaac ctcagaagtg atcgaagagc
+    18121 atgattcaac cccacaatta ctgttattct cagcgaacac gcaggattcc ctcgaagcaa
+    18181 tgattcaacg gaaccttgct tatctgaggg agaatactga ctctctgcgt gatcttgtat
+    18241 ataccatggg tgcaaggcgg gaacaccttt cgttccgggc tgcatccatt gtacacagcg
+    18301 acatgtcggt tactacagcc tcgtttggaa aagctccctc aagtccaccg gacatcgtca
+    18361 tggtcttcgc gggtcagggc gctcagtggc caggcatggg agttgagctt ttcaagagca
+    18421 atgccacctt tcgaaggtca attttggaaa tggatagtgt cttgcagagt ctgcccgatg
+    18481 cccctgcctg gtcgatcgca gatgaaatct caaaggaaca ccagacaagc atgctttatc
+    18541 tatcctctta ctcgcaaccg atatgcactg cacttcaggt cgcactggtt aatactcttt
+    18601 tcgagctcaa catcaggccc tatgccgtca ttggtcattc gtcaggggag cttgccgcgg
+    18661 cctacgctgc aggtaggctc actgctagcc aggctgtgac gctcgcctac taccgcggca
+    18721 ttgtggcagg aaaagtagca caggccgggt gtatggccgc ggttggaatg ggagcgtctg
+    18781 agattatcca tttttgagac caggagtggt tgtcgcttgt gagaactctc catccagtgt
+    18841 cactatatct ggtgatatcg atcaggtgca gtatgtgatg caggaaatat ctttggcaca
+    18901 cccagagatc ctgtgccggc aaatcaagag tgataccgcc tatcattcac atcatatgaa
+    18961 atctgtcgga gatacatatc attcattcat caatcccttt ttccgcggtg agaccgaggt
+    19021 caactgtcaa cccgtccact ttttttccac ggttactggc gatgaactga gtgatggaga
+    19081 tcatgtggga cccaaatact ggcaacagaa tctagagtcc cgggtcttgt tccaaggagc
+    19141 tttggaaaac atcatttctc gccaaagaag ccgccatctt ttatttctcg acgtttcgcc
+    19201 tcatagcact ctcgctggcc caattcgtca gacgctggag caagcagaag tggcccatcc
+    19261 ttatgtccct tgtctgattc gttttaaaaa ctgtgctgaa tctttccttt ctaccattgg
+    19321 tcaactgtat tcgcacaggc aaccccttga tttcaatatg ctcaccaacc cggatcgaac
+    19381 tgccaaagtg ctcaccgatg taccgaccta tccatggcaa catggctact ccaatctgta
+    19441 cacgacacgc caaaataatg aatggctatt ccgcaaacaa ccgaaacacg aacttctggg
+    19501 aactcgagtg gttgatagca cagataatga gccctgctgg agaaatgttc tttatctaga
+    19561 acatgtcaca tggcttcggg atcacaaagt ctctggcaat attgtctttc cagcggctgg
+    19621 ctatgtcatg atggctggag aggccgtgcg acagatcggc tcaactgcgt ccggattcat
+    19681 cgtccgccaa atggttctgg acaccgcaat ggtcctgaat cagtcaaacc ccacggagat
+    19741 cgtaacctcc cttcggaagc atcgacggga tcgatggtat agcttcacaa tcagttcaca
+    19801 caacggcgtg aagtggattg agcactgtta cggggaagtt gcccagggtc atgacctacc
+    19861 ctccacggcg agcacacaca aagaaaacct ctctcgggat atcaacgtct ccaattggta
+    19921 caaaaccctc agtcgtgggg gggttgaatt tggacctgcc tttcaatgcg tcgaaagcca
+    19981 atcctgctca gtcaccagta acactgtgtc cggaaggatc gtatctaagg tgaatgaatc
+    20041 gacatactat ccagttcacc ctacacagtt ggattccgtt ctccacatcg tatacggtgc
+    20101 aatctacaaa gggttcgact ggcaggttga atcattgcct gttccaacga gtataggaga
+    20161 aattatgatt ggcgaatgtg tatcggactt agatgtcacg atgtgggcgg atgtctccag
+    20221 aaatagcaat atactcgtta acggggaggc ctttgggtct gatggctgtc ttttgatacg
+    20281 catcaaagac atcgttcttc gaccccttgg ggcaaatcag gcttgttttg aagaagatga
+    20341 atcccacgct ggagcaaggc tgctttggaa accaagtatg caattcctca atcttgccga
+    20401 tcttattcaa acgccggtaa attggacgaa gcaaacgatg ctattgaatg attttacttc
+    20461 tttgtgcatt gaacgagctc tctgtcttct tcacgcgcag ggtgtgtctt caagcatcag
+    20521 ccatctccaa aaatatcagg actggctcca aagacagcca aaaccaagtt ctgagcaaag
+    20581 catggaatca ttggttgaga agatcctagc tacttcggca gcaccctgtg cccgagcaat
+    20641 gattaaagta ttggacaaca ttgtcccgat ctgcaaaggt gaaatcgatg ctttagaagt
+    20701 cctcatgggt gatgatacac tttacgaact ctacaactac ctcaatgagg ttgaccgcac
+    20761 tccacttttc gactctctag gtcactacca gccgcaacag cgaattcttg agattggggc
+    20821 tggtacagga ggaacaacag caaagattct tcctcggacg aagtactcga catatacatt
+    20881 cacagatatt tccgcggcct tttttccagc agcaaaagac cgttttcaat gtcatgcgaa
+    20941 cgtagtctac cggacacttg acataaccaa agaccccttg gaccagggtt tcgaaccaga
+    21001 gtcgtttgat ctgattgttg cagcaaatgt tttgcacgcc acaccgaatc tgtacgagac
+    21061 gctcagcaac gtccgcaagc tcctccaccc gcgggggaag ctattgttag aggaactttg
+    21121 tggtgacgct aagttcacca atttcattgt cggggttctg cctggctggt gggcaggcga
+    21181 gtctgatggc agagcagacg agccctacat ctctcccgac cgttgggaca gtatcctgaa
+    21241 ggctgctggt ttcaaccctc tcgacgatgt tgcgtttgat gcagcaccgc cacttcatag
+    21301 ccttgcattc atgcttgcaa gcccttcttg tgtccctgaa tccccattga aaagaaacgt
+    21361 cacgttgctg tctgatgtga catcatcaga aattgcagta cgaatgcaga agcaattact
+    21421 ttctagaggg tatagtgtcg gtgttcagtc actcgatcag tctttgatgg atggggaaga
+    21481 cgtcattatt cttgtcgata cagtatctcc tttcttccat aacctggata gtagaaagct
+    21541 ttctaccttt cagaacctcc tacgtgaact gcagcgatca cattccgggg ccctttgggt
+    21601 gacccgatcg attcagatag actgcaggga tcctcgatac tcacctactc tcggcgtcgc
+    21661 acgcacggtt cgctccgaat ttggcctgga cttcgggaca tgtgaggtcg acactctcaa
+    21721 gtacacgagc atcgggctcg tcattgatgt tttcgaggcg tttcatggcc ggcgtcatgg
+    21781 acaaaatgcc taccctgaat atgaatatgc cattcgggaa gacacagttc atatcggacg
+    21841 attatcttct ttctctgtcc aggaggaatt gaggcggatc caaaaagcgc atgtggaaac
+    21901 caaggacaac agaatatctc ttgtggctgg gacaagtggg tttgactctt tagcttggca
+    21961 ggccgatgca gggcagcaag ttcagttact gggagatgac gaggttgaac ttcaggtaga
+    22021 cactgctggt gtgaacttcc tggtacgttg ctcatttcaa ttccaaggag aaagctaatg
+    22081 gaacaaggat atctatacct gtttgggcaa tttggaaatg ccttgtacgg gaatgggtct
+    22141 cgaggccgct ggtattgtgc ttcgggttgg tgcacaagtt cagggcctta gtccgggtga
+    22201 cagaatagct acatttggcc ctggggcctt tgcaacgaca atgatacttc cagaatcacg
+    22261 ctgcatcaag atccctgact gtctaacact ccaagatgca gccgtcatgc ctcttgcgtt
+    22321 tgccactgcc cttcacggat tgtgtgacct cggaaatttg caaaagaatc aggtaattgt
+    22381 gccgttgcct ttgatagata caccactgac tggacgtgca gactgttctt atcaattctg
+    22441 cctccgatgg agtaggcctt gctgctatcc aaatcagtaa aatgattgga gcgaccatct
+    22501 acgccactgt cattggcgag gataaagtgg aatatcttac agcgtcgcat ggtatcccgc
+    22561 gagatcacat tttcaattcg cgcgattcat cgtttctaga tggaataatg agagtaacta
+    22621 atggccgtgg agttgatctg gttctaacct cactttctgc agacttcatt caggcttcct
+    22681 gtgactgtgt tgctaacttt ggtaaactgg tcaacctttc taagccgact gcagccaacc
+    22741 aaggacaatt tcccatagat tcctttcatc ctaacatgtc atacgcttcc gtcgacatca
+    22801 ttgactacat taagcgtaga ccaaaagaga gtaaacggta cgtgatcact ttcaggcact
+    22861 cgtaccagct ttgtcctgca tgtaactgac atgtgcgcgt tccagtctgt tggaggaaat
+    22921 cgtggagctt tataaacagg gacatatcca acctattacc cctgtgaaga catttactgc
+    22981 gacggatatc cgacaatgct ttgattacat gcaaagcggg caacatattg gccagctgag
+    23041 gctctccttg aaatcacaag atacgttcat tgaggcggtt tgctctccca agaccatgat
+    23101 cttccaaagt gatgcgtctt acctcctcgt cggggggctt ggagggttgg gtgctgagat
+    23161 agcaaggtgg atggctgagc atggtgccag aaatctcatc ttcttgtctc gaagtgccga
+    23221 tgcagagtct aatatcagac tattccgtga actggaaagt caaggctgtt ccgtgcaagc
+    23281 aatcaaggga agcgtctgca acgcatctga cgttaagaga gccatttcag ccgccaggat
+    23341 caaattgaaa ggcattttta atatgtccat ggtgctacag gatgcatcgt tgctcaaaat
+    23401 gtcatctgat gaatggaacg ctgcaaccgg tccaaagatt caaggcactt ggaatctgca
+    23461 cgacgcaagc ttggaccagg atcttgactt tttcctcctt ttcagctcaa tgggtggtat
+    23521 tcttggaata cctggacagg ccaattatgc ctctgcgaac accttcatgg atgcgtttgt
+    23581 gcaattccgc catagctccc atttgcctgc ttctgtcatt gacattggtg aagtccaagg
+    23641 tattggacat gtcgccaata acccggagat tttgaatcga ctcaaattgc tcgaatgtgc
+    23701 ccgcatgagc caaaaggatc tgttccacgc catcacaatt gccatatctc atagccttcc
+    23761 tccacaaacc ctggattaca gccgttacga aaacccagcg caattcatca ctgggctgag
+    23821 agacaccact ggtatgctgg acagcaccgg cggaaagtcc atgttgctgg atagtcgcct
+    23881 agctgcttat gttggcaatt ctgctgctgt caccgcaccc acggagacga aaacttcggc
+    23941 gaacaaactc aacaatttcg tgtcttcagc tgccacagac tctgccattc tttcagaacc
+    24001 ctcggcgacg caatttgtta gcttggagat cgcaagatgg gtattcgacc ttctcatgaa
+    24061 accggtggac gatgattcag agattgacct ttctcgctcc ctcgttgatg tcggacttga
+    24121 cagtctggct gctgtcgaga tgcggtcttg gttaaaatcc tctctggggt tggacattag
+    24181 cgttctggaa atcatggcat cccccagtct ggctgcgatg ggtgagcatg ttataagaga
+    24241 gctggttaga aagttcgggg gagacaacaa aaattgagag ggtagatagc gtcatgtgaa
+    24301 attcttataa ggtacttata attcgaaaca agtggtttct tgattcaatc gctcctaaga
+    24361 catttatgcc tagaatccgc cttctacgtc agtcagtcat gtattctatc gtaactcatc
+    24421 cgacagctag aaagtggttg ccagctcatt aggcgactat ttcaaaacct tcgcaggtca
+    24481 actttctgat cggtcgtcct tgaataaaaa cgtaagtaac ttaactaatt ctctgcctat
+    24541 agtcgagatc ttaagccgac cctctacgtg aggatttagc ttttttttcc ctttcatatt
+    24601 accttattag tataacttca ggaacttcta tatgtattct ccgattagcc tggtcatatt
+    24661 tgcggacagc caagcagcgc taaaagctct ccgaagacct cgcacggtat aagggttaaa
+    24721 aaatgccgat tgtgctcatg aaacattttc ctacatcttg attcaattct aaacggccaa
+    24781 gtctcccttg acaaacatcc cgattgctga gataattata gaggagttaa tctaaaactt
+    24841 tctatatgag ataggacgat atgtcgctga ttactcctct gggcttcagg cttaaagtct
+    24901 agaattagaa tctgcatctc atcccttgaa cgacgtccgg ataaaatcca gtcctcggga
+    24961 catgtccctc ggaatatcaa aggcagagtg ctgatgccag tgtcaattaa tccccgacaa
+    25021 gcgaataggc cccgaatgca ctatgtttac aatgcgcggg gttcacaatc ggtcctttgt
+    25081 tggaggtgta catctatccc gaaagcgccc ccgtcttcgg actccgaatg ccatcctaat
+    25141 accccgacac ggcaaatgac ctgcatatca gcatgagcta ggcaagttta tagcacgtgg
+    25201 ctgttcgtat agatccgaca tgcggattgg cttcatagaa tcaagcgcca ttgtgcgtac
+    25261 gccgcatgta acagtccaag cttatcatgg ccggcaaata cccagggcat tgatcaatac
+    25321 attggtattg ctatgactcg aagaagaata tcggggtaaa tctttccctt ccgtacacta
+    25381 ctctctgttt caacaacaat atttgccctt tgcgtcccaa acatggcgca aaagcttcgt
+    25441 ttttacctgt ttggtgatca gacctatgat tacgatgagc agctgagggc tcttctcacg
+    25501 agccatgacc ctgttgtgag gtcgtttctc gagcgagcct attatactct ccgagcagag
+    25561 gttgcccgta taccaaacgg ataccaggca cggatttcgc gcttctctag catcgctgaa
+    25621 ctactttccc aacggcggga acatggagta gacgccagct tggagcaggc tttaactgtt
+    25681 gtttaccagc ttgcgtcgtt tatgcggtga gtaacattct ttctctctct ctctctctct
+    25741 ctctctctct ctctctctct ctctctctct ctctctctgt ttttgttttt ttttttttcc
+    25801 tcctgtgcac agctaattgc atgcagactt cattctgaac gaagcctttc gtacccatca
+    25861 gcggacgatg cgcattgtct tggtctgtgc actggtgccc taagtgcagc ggctgtgagc
+    25921 agctccaggt cactcagcga actgctgccg gccgcgattg agacagtgat actcgccttt
+    25981 cgaaccggtc tccacgccag tgactctgga agacggattg aagagtcatc agcggcagca
+    26041 aaatgctggt caatatcgtt gcaaggtttg gaaggccatg tagccaggaa gctgctggag
+    26101 gaatggtcca ataaaaaggt agaccacctt catacaccat tacgttgtgc gtctcggaca
+    26161 ataaaactaa gcttatcacc aacagagact ccctccaatg tcaaggccat atataagcgc
+    26221 atatgcatct ggcggagtca ccatcagtgg tccgccatca gttctggcag agcttcgaaa
+    26281 cacacctggg ctgtctaaac tgcgtgctaa agacattcca atacacgcac catatcactc
+    26341 ctcagcaata ttcaatcaat gtgacgtaga gacaattctg agttctgcgt taatcgatct
+    26401 ggcctctcgc gcgactcatg ttccaattct atcaaccggt accggacgac tggtctgggc
+    26461 aggcactctt ccagccgcaa tacagtccgc cctgcaggat gtactccttc gtccgataag
+    26521 ctgggaaaat atgtcatgtg gaataagcac ctgtcttcag tccatagatc caagcgaggt
+    26581 ggaggtgatc ccaatcgcca ccttggccgg cccactgctc tgtcgctcag tacaggtcgc
+    26641 aaaaagccag attccagcta caatcgatcc aaagaacgat gtcatgaacg aagcacaaag
+    26701 tcaaattgca gaggctatgg accgagccaa aattgccata gtgggcatgt ccggtcgttt
+    26761 tccaggcgct gagaatgtcg attctctctg ggagcttttg atggccggcc gtgatatgtg
+    26821 caaagaggta ccacccaccc ggtggaacgt tgacactcac gttgatccca ctggtaaacg
+    26881 gaagaatacc agcaagatcc gatggggctg ctggctcgac aacccggata tgtttgacgc
+    26941 gcggtttttc aacatgtctc cgcgagaggc gccgcaggtt gatccggccc agcgaattgc
+    27001 gctcatcact gcgtacgaag ctattgagca agctggtatc gttccaggga gaacgccatc
+    27061 aacgcaggaa gatcgagtgg gcgtcttctt cggtacgacc agcaatgact ggtgcgagag
+    27121 caacagcggg caagacatcg acacgtatta cattccgggc gccaaccgtg ccttcatccc
+    27181 aggccgaatc aactacgtgt tcaagttcag cgggcccagc tatagcatcg acactgcatg
+    27241 tagttcaagc ctgtccgcgc ttcatgtagc atgtaatgcc ctctggcatg gagatatcga
+    27301 cactgcaatt gcgggtggca ctaatgtcct cacgaacccc gatatgactg ccggcctgga
+    27361 cagaggccac ttcctttccg caactggtaa ctgcaagacg tttgatgaca ccgctgatgg
+    27421 gtactgccgt ggtgaaggcg tagcgaccgt cgtcctcaag cgcatggatg acgctattgc
+    27481 agacaaggat ccaatcctag gtgtgatccg tggcgtatat accaaccact ctgcagaagc
+    27541 tgagtcaatc acacggcctc atgtcggcgc ccaaaaagcc attttccaac atgtcttgaa
+    27601 tcactcgggt attcgacctc aggatatcag ttacattgag atgcacggaa ctggaaccca
+    27661 ggcaggagac atgcgagaga tgacctccgt gcttgatacc tttagcccgc agtacccagg
+    27721 agcaatccag cgagaaaagc ctttatatct gggggccgtc aagtcaaaca tcggacatgg
+    27781 agagtctgtt tcgggggtta cagcccttgt caaagtgata atgatgatgc agaacaacac
+    27841 tatacctccc catcgcggag tccacacgcg cttaaatcgg aggtttccct caaacctcga
+    27901 tgaacgaaat gttcatattg cattccaggc gaccgagtgg ccccgcgggc agactccccg
+    27961 acgagctttt atcaacaatt tcagtgccgc tggggggaat agctcagttc tagtagagga
+    28021 cccaccactg atactgaagg aagagggtgc tgatcctagg tcatctcatg ttattgcagt
+    28081 gtccgctaaa tcaccttcgg cattgaggaa gaacctagag tctatgcgtc gatatgcgat
+    28141 gtcagaacat acagaaaaat ctctatgtga gctgtcttat accacaacag ctcgacgcat
+    28201 tcaccactcg catcggttga tgtttgctgg gtcatctcta gaggatattc tgcgtgagat
+    28261 ggagagcaag ttagcgatta aagaaccatt cagtccttgc gcaccacttc aatcggtcat
+    28321 tttcaccttc accggccaag gcgcacaata cccgggaatg ggtcaagtct tttttaataa
+    28381 cttctccgtg ttccggtctg atctctgccg ccttgacgat ttggcccaaa agcttggatt
+    28441 tccgacattt ctcccgattt tctcagcaag tacccatgcc agactggaag gtttcacacc
+    28501 cactgtggtc cagcttgcca atacgtgcat gcagcttgca ctcaccaggc tctgggtgtc
+    28561 gtggggtatt cgtccgtcgg cagtagtcgg tcacagcatt ggggagtacg cagcgctgaa
+    28621 cacggcgggc gtcctgtccg acgcggacac ggtttatttg gtaggcaaaa gagcccagct
+    28681 gctcgaggag aagtgcaacc gagggtcaca cactatgctg gcagcgctgg cctctttcga
+    28741 aaaggtgtca cgtctacttg atagcgcacc gtgtgaggtt gcgtgtatca acggacccga
+    28801 ggagatcgtt ctcgctggac cgcgttcgca catgacagat atccaaaaga tcctcgtggc
+    28861 gcattcaatt agatgcacca tgctgcaagt cccatttgca ttccattcgt cccaggtgga
+    28921 tccaattctg caagacttcc agtctgcaat cgaaggcgtt accttccata aaccaactat
+    28981 cccggtcatt agtccactcc ttggtgattt tgtgacagaa actgggacct tcaacccaaa
+    29041 ctatctggca cgccattgcc gggaaccagt gaacatacta caagcacttc gccaagccag
+    29101 cacaatgaat cttgtccatg acagcagcgt agtcatggag tttggaccac atcctgtcgt
+    29161 atcaggcatg gtgaaatcaa cgctggggaa cagcatcaag gcacttccca ctctgcaacg
+    29221 gaaccgaaac acctgggaag tactcacgga gagcgtgtca acactatact gtatgggatt
+    29281 cgacatcaac tggaccgagt accatcgaga ttttccatca tcgcagcgtg tcttgcgact
+    29341 cccatcgtac tcctgggatc tgaagtcgta ctggattccg taccggaatg attggactct
+    29401 gtacaagggc gatattgtgc ctgaatcaag catcgcgctg ccaacccacc aaaacaagcc
+    29461 acacagtaca tcgccgaaac agcaagcacc gacaccaatc ctggagacga caacattaca
+    29521 ccggattgtg gacgagaagt ccaccgaagg gacgttttca atcacatgcg agtcagatgt
+    29581 atcccgacca gacctcagcc ctctggttca gggccataag gtcgaaggga tcggactttg
+    29641 tacaccggta tgaatctcca cactcatgtt cgctgcgcag cataatcact gactccttct
+    29701 gcagtccgtt tatgccgata taggattcac gctgggaaat taccttctag atcgtttccc
+    29761 aactcgattc ggaccggata ctaaagttgt ggatgtcacg gacatggtga ttgaaaaggc
+    29821 tcttatgccg ttgaatgcgg gaccacaatt actgcgagtc acggcttcat taatctggtc
+    29881 cgagaaagag gcttctgtcc ggttctacag cgtggatgta agacgtccct cttctaaatc
+    29941 tcagatgaat actaatattc ataattccca ggaaaatcac accgaaacag tacaacattc
+    30001 ccactgccgc attaaattca gcgaccgttc aacgtaccaa gcctatcaag agcaaatctc
+    30061 cgccgttaag gctcgtatgt ttgagatgaa gaccaactcc tcatcgggta gaacctaccg
+    30121 attcaacgga ccaatggcat acaatatggt gcaggcgttg gcggaattcc acccggatta
+    30181 ccggtgtatt gacgagacga ttctcgacaa cgagacactc gaagcagcct gtacagtcag
+    30241 cttcgggaat gtcaagaagg agggtgtatt ccacacacat cctggctata tagatggact
+    30301 cacgcagtcg ggcgggtttg tgatgaacgc taacgacaag actaatctcg gagtagaagt
+    30361 gttcgttaat catgggtggg actcgttcca gttgtacgag cctgtcactg atgatcgttc
+    30421 gtatcagact catgttcgga tgaggccggc ggagtcgaat cagtggaagg gtgatgtggt
+    30481 cgttctaagt ggggagaatt tggtcgcttg tgttcgagga ttgacggtaa gtcgagagac
+    30541 ctaagtaaca atctcctgtt tagaggagaa aaaagaaaga gaaagcggat ttgctgacta
+    30601 ccttccagat ccaaggagta cccaggcgag tcctgcggta tatcctgcaa agcagtgcaa
+    30661 aaaccacaca gacagccact tcgagcgtgc ctgccccgtc tcaagctccg gtgatggtgc
+    30721 cacagattgt ccaagtacca aaagctaagc ctatctccca aatttccggg accctgacag
+    30781 aggctctccg gattatttgt gaacaaagtg gtgtgcctct agcagagctc acggatgatg
+    30841 caactttcgc gaacatcggc gtagactctc tcctagcgct gactatcaca agtgcatttg
+    30901 ttgaggagct ggatctagac gtcgattctt ccttgttcat ggactatcct actgtggcgg
+    30961 acctgaagcg gttcttcgac aagatcaaca cgcagcatgc tccggcacca gccccggtat
+    31021 cagacgcgcc aaagcaatta caaccaagca gtagcccagt tgcatctgct actccgtctg
+    31081 cacccatcca tggcagatcg aaatttgaat cagttcttaa catccttacc gaggaaagtg
+    31141 gtgttgaaat ggcaggtctt ccggactcta ctgcgcttgc agacataggt atcgattcgc
+    31201 tcttgtccct ggtagtcacg agccggctga acgatgagtt agagctagat gtgtcgtctg
+    31261 aagacttcaa tgactgtctg actatccggg atctcaaggc acatttcatg tccaagaact
+    31321 ccgacaatgg ttcgtctgcg gttcttactc ctcagccatc tcgggactcc gcactccctg
+    31381 agcgcacgag acctagggtc gctgatacaa gcgatgagga ggatgcaccg gtttcagcaa
+    31441 atgaattcac aaccagtgcc cgctctacat ctaagtatat ggctgtgctc aacataattt
+    31501 ccgaagaaag cggcatggca atcgaagact tcaccgacaa tgtaatgttc gcagatatcg
+    31561 gaatagactc gctgctgtcc ttggtcattg gaggtagaat acgggaagag ctatctttcg
+    31621 acctcgaggt ggactctctt ttcgtggact acccagatgt caagggactg aggtcatttt
+    31681 tcggatttga gagcaacaag acggcgacaa atccaactgc gagtcaatcg tcttcgtcca
+    31741 tttcaagcgg cacttcggtc ttcgatacat caccttctcc cacagactta gacatcctaa
+    31801 ctccagaatc cagcctctca caagaggagt tcgagcaacc gctcacaata gcaacaaagc
+    31861 cacttccacc cgcaacttca gtcactctgc agggtttacc ctccaaggca cacaagatac
+    31921 ttttcctttt cccagatggc tctggctcag caacatcata cgcgaaactc ccccgactcg
+    31981 gtgcggacgt agccattatc ggcctgaact caccctacct gatggacggc gccaacatga
+    32041 cctgcacctt cgacgagctc gttacactgt acctcacaga aatccagcga cgtcaacccg
+    32101 caggcccata ccacttgggc ggctggtccg ccggtggcat tctcgcttac cgcgctgcgc
+    32161 aaatcctcca aaaagccgcc gccaaccccc agaaaccagt agtagaatcc ctgctcctcc
+    32221 tcgactctcc accaccaaca gggctcggca agctccccaa acatttcttt gactactgtg
+    32281 accaaattgg cattttcggg caagggacag ccaaggcccc ggagtggctg atcacccatt
+    32341 tccagggcac gaactccgtt ctgcacgaat accacgccac gccgttctca ttcggtacag
+    32401 cacccagaac tgggatcatc tgggcttcgc agacagtgtt cgagacgagg gccgtggcgc
+    32461 ccccacctgt acgtcctgac gatacggagg acatgaagtt tttgacggag cgacggacag
+    32521 atttctcggc cgggtcttgg ggacatatgt ttcctggtac agaggtattg attgagacgg
+    32581 cctatggggc ggatcatttt agtttgctgg tgagtcttct cttccgtgat taa
+//
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/BGC0001866.fna	Sun Nov 21 16:53:12 2021 +0000
@@ -0,0 +1,556 @@
+>BGC0001866.1 Byssochlamys spectabilis strain CBS 101075 chromosome Unknown C8Q69scaffold_14, whole genome shotgun sequence
+GTGCTAATTCTATAAGGCTCTCTATATAAGGCTCTATATACTATAGAAGCTGCGGATAGC
+TATTTTTCTTATAGTCAAAATGAATTTTAATTAATAGTTATTACGAGGAGGGAACTAGAG
+GATCTAAATACACGATTCAAATGACGCGGTAAATAGCCTCCGTGGTCGTCGTTTTCGGCG
+TCATTTTCTCACATTCTATGTGTAAGTAAACTAGACACCATTTTGTACCGTGGCTTCTTG
+CATCCTACCTTAATCTGCTCCATGGCAATCCACTCAACCTTTCATGCTAGTTGATATATT
+CAATGTCGTTATTGGTGGAGGCGTATGTTATGAACTATGAAAGTGCTTACATCCGCTTAG
+TCTCCTCGGACTTCCATGCTTCCTTGTCCATTGAGAAACGATCCCTCTCATCTTCTTGGG
+ACAAAACATAATATGTTCCTTGCATCCTTCGTTGCCACAGACGCCACAGACCCCCAATGA
+AAAGCGACAGTAGAACAACCGCAGCTCCTCCCGCCCATACCAAACTCTCCGGCCGCAGAA
+TCTTCGATACAAGACCCCAAAACCCCGCTGGGCCTGGCAGTTCGTGTTCCCGAAACCCAT
+GGAAACCCTGCGTATACTCCGGCGGAACGTGGGGCCACTTATCTACACCGTCTAGGATAG
+CCATTGCCATTCCATTATCCAGATGCATCTCGAAATGGCAGTGTAACAGCCACGCGCCTG
+GCGACGTAACCTGGTACCGCAGCACAACCCACATCGCTCCCGTGAACTCTGTCAGGAACG
+TGTCTCGGTATGGGGGATTCACCATGTTAAAGCTTTCGGGCTGGTCGGCCATTGCTTCTT
+CTACCGAGGAATAGTTCCACAGGCCTTGCCCCCCTCCAATCCGCCAGTATTTGTTCGCGT
+GTTTATGGACCGCGTGCGGGAAGTCGACTGGCCATAATGGTGAATGACCTACCTGCAGTA
+CGATGTCTTGCCAGGTGCCGTTCTGGGTTCGGATGATCAAATCGTCATCATGGGCCTGCT
+CGGAATTGACGTTGTAAAGAAGGGGTTCGTAGGCTGAGCGGTCTGGGGGATATTTTTTCT
+TCCCACTCAGAGTAAATTCCCAAGTGGAGTTAGCCTTGCCCAAGTACAACACATGCATAT
+CATCTCCGTGGGCAGCTGGAACTTTTCTCGGCCAAGGAGGCAGGTCAAGCTTGTCTAGGA
+AAACAGCATCTCGCATCGGCCAAGCGTTGTAATCGAAATAGGGCTTAGATATAGTCGTGA
+CACCAAATCGGTCTGGTTTCTGACGGGCGTTGGGGTCTCCATTTTTGTATCGGAGGATCC
+CGAATGCTGCGATCATCTGGGAATATCCACCATCTGGGACACGGATCGAGTAGTCCATGG
+GCTTCTTATCCAGTCTGATCATGGCTGAATATCGCTCTCCGGCCCAAATCAGGAAGGTGT
+CTGCCCGGCGAGGCTCGATGTAATGGCCGTCGACTTCATAGATCCACATTTCGTGCTCAT
+CGATTGTTGGCTGGAGGGTCTTGAATGTCGAGCCTCCGATCCAGTTCACACTCGCCCAGC
+GGTCTGCCGGGTCAACCTCGATTGCCTCTGTTGGACCGGAGTAGGGAACACAGCCTTCCT
+GGAGACCGGGCGGGATGGCGGACACGTTCCCGTCTGCGAGCCACGGACCTTCTGTCGATG
+GTACGAATGGGAAGCACCTATACGATAATCAGAGACATGAACTATACCGATGATGAATCT
+CCTGGCCGCTCACCCTTTGTCTGTGATAGTGTCATTCGGCCAGCTCTTGTGCATGAACGG
+GATCTGCTTGTCAATAAGCCATTGATGGCCAGGACAATAGACGCTCCCTTTGCCGTTGAG
+AAGAATACTATCGACGCAACTGATTTTATAAGTCAGTCGACTAGAAACGTACTCATATCA
+GTACACTGGAATCATACAAGATCTGCAGTCGGCTGTTTGCATCTGCTTCTTTATACTGGG
+TCGAGTTGTAGTAGTCCCAGTCAGAAAGCATCATGATGTGAGGATTGTTGCTTGCGCGCT
+CCATGGCAGCAATGTCCTCGGGATCTTCGCTGATCATTGCCCAGGGTCCAGCGGTGCCTG
+GCTTTCGGCTGCCTTCTGTGAGCTACATCCAACAATAGCAGAGAGAAAACAAACCGAATG
+AACAATGCACCATATAAGCCATCCAGCAGAGTAGCCCGAGAATGAGAGTGGTACCTGATT
+CTGTCAGTCTACTTCTGACTGGGTGTGTTTCTCATGACGACCTACCAGTATTGACCAGGG
+GGGTAGGCAGTAAAGCGATAGACGTAGCTCTCACCAGGCTCTATAGGCTTCTGACTCAGA
+CCTGGAACTCCATCTGACCACGGCGTATCTTGCATCCTTCATTACACCATTAGCACCTGG
+CGAAATATCGAAGGTTGAATAGGTCTACTCACAAAATTCCATGCCAGTGGATTGTGGTGT
+TCTCATGCATGTAATTACGAACAACAATCTGTTCTTCGTCAGTCCTGCAGCCTCAATTGG
+CGAAAGATATGCCACCGACTTCAACGTCGTCGCCCTCGTCGAATATAAGCGTCGGAGATG
+GAAATTGACCATTTGTCTTGATCATCTGCCGAGATTGGCCGTTGGGAGCGCCCTCTTCCC
+AGGTGAGCACTAGCTCGCGTTGGACATGGGCGCCCAGGACGAGCAATACATTGCAGGCCA
+GCAGTAAAAGATAGGCCGGCATTGTGAAAGAATGGATAGCCTGATTAGTTCAGTTGCGGT
+TATGCCTGGTATTAATATCACGGTTGGGATACTTGGGGCTGAAGAACTCATTAAGCCCCG
+GTTCTTCGGGGACCGAAAACATTGCCAACCTCACTGCCACAATACTCGTACTCTTCGGAT
+TGAAACAGTTACTGAGTGAAATTAATGTTATTTAATAGCGAAGTTGGAGTCGAAGAACAT
+GTCGTACTCTGGTCATTCCAAGAAACAACGTCGATTACTATGGCAGAGGAAATCAAGCTG
+ACTCCCCTGGAGACCTTCGCACAGGCAATCAGTGCCTCTGCGAAGACTATTGCAACTTAC
+TGCAGAGACTCCGGTCATCCTCAACTGTCCGATGATAATTCTAGCGGCCTCACTGGGGAT
+GTTCTCCCCCCTTCCGCACCACAGGCAGTCACCGCCGCCAGACAGACCATCTTGGAGGCA
+TCGTACCGACTACAGCAATTGGTCACTGAGCCTAGCCAATACCTGCCGCGACTGACCGTT
+TACGTGAGTGTTGAACAATCTCCCATGAAAGATCAAACTAACGACAGAAAAGCCCCAGCA
+CCTGGCTGCCTTACGCTGGCTGTGCCATTTCAGAATCCCGGAGCTCATCCCCGTGCAAGG
+CACCAGGACATACTATGAGCTGGCTACAGAAGCCAAAGTTCCTCTTCATCAACTGCAGAG
+CATTGCAAGAATGGCAATTACTGGGAGCTTTCTCCGAGAGCCGGAGCCCAATATCGTCGC
+CCACAGCAGGACGTCAGCCCATTTTGTTGAGAATCCTTCGCTCCGTGACTGGACACTATT
+CCTGGCAGAGGATACCGCGCCCATGGCGATGAAGCTTGTTGAGGCGACTGAAAAGTGGGG
+AGACACGAGGAGCAAGACAGAGACGGCCTTTAACCTGGCGCTGGGCACGGATCTGGCCTT
+CTTCAAGTATCTTTCCAGCAACCCGCAGTTCACCCAGAAATTCTCGGGATATATGAAAAA
+TGTGACAGCGAGCGAGGGTACTAGCATCAAACATCTCGTCAACGGATTCGACTGGGCGAG
+CCTCGGAAATGCGATCGTGGTTGATGTACGTCTTCAATCGTCGTTTACTCCATACCGATC
+CCATACTGATGTGATATTCTACAGGTTGGCGGTTCTACTGGTCATGCAAGCATTGCTCTC
+GCGGAATCGTTCCCCGATCTGAAATTCATCGTGCAAGACCTGCCCATGGTGACATCTACC
+TCGAAGGACAATCGCGAAAAGACCCCTCTCCCAGAGACGGTCGCTTCCCGCATCTCCTTC
+GAGAGCCATGACTTCTTCAAGCCTCAGCCGGTGCAGAATGCGGATGTCTATCTTCTTCGC
+ATGATTCTGCATGACTGGTCATTCAAAGAAGCAGGCGAGATCCTTGCCAATCTAGTACCG
+TCCGTCAAGCAGGGTGCTCGGATCCTTATTATGGACACTGTGCTTCCCCGTCATGGTACT
+GTCCCCGTAACTGAGGAAGCGTTGCTTCGTGTGCGAGATATGACGATGATGGAGACATTC
+AACAGCCATGAGCGGGAGATTGACGAGTGGAAGGACCTGATTCAGGGGGTGCATACTGGG
+CTTCGGGTGCAGCAGGTCATTCAGCCGGCGGGGAGCTCAATGGCGATCATTGAGGTTGTT
+CGGGGATGACGAGACTCATCACGTATTGGTACCTTTCCCTTGGGCTGCGCAAAATCCTAC
+ATATTGAGCTGTAAGGCTGTGTGGCACACATGGGGTAGATGGATAAACGAGGAGTTTAGT
+TTTCGGACCCCGAAAGCGATCTGCGGATCGACTCATGTTTTGATCTCATGTCAGCATCTA
+GAGTAAGTATCTAGAGTACTTCTGATTGCTATAGTACATGAAACCATGATATTCCTACCT
+ATGGATCCGGAAGCAATCTAAGTAGTTCTGATCGGTCTTTTCCTTCCGTTACCACGATGT
+TTATGACCCAGATCGTGTTCGGGATAGCCCCGACCCTTCTCAAGACCTTCTCCCACCTCA
+CTGCCCTCGATCTCTGGCGACCATCCGCACCCTACGTGTTCGATCCTGTCACGAGTAGCA
+CCTACCTAGGGACTATAGCCGATGGGGTCGAGGAGTTCCTTGGGATCTTCTACGGCCAAG
+ACACGGGTGGATCGAACCGTTTCGCACCCCCAAAGCCCTATATTCCCTCCCGCCACAGTT
+TCATTAATGCGAGCACGGCGGGCGCCGCATGTCCCCAGCCCTATGTTCCTCTGCCAGCCG
+ATCCATATACCGTTCTCACCAATGTATCAGAGGACTGTCTCAGCCTGCGCATTGCGCGAC
+CAGAAAATACGAAGTCTACTGCGAAGCTGCCCGTGATGGTTTGGCTGTATGGAGGTGCGT
+ACAACAGATTACCCACTGATCTCCAGTGGGAGACATGAAGACCGCTAACTGAAATGACAG
+GAGGAGCTTCCGTCGGAACAGCCTATGATGTATCGTATAATCCTGTCGGACTGATCCAGC
+AGTCCGTGGTGAATGGGAGTCCAGTGATCTACGTCGCTATCAACTACCGGGTAAACCGTA
+CGCAGACCGATAGTCACTTTTAAAATTCATATCTAACAAGCCTTCAGTTTTTGGACATGC
+CTTCTCGGACGCTCTTCTAAAGTCCAAGTCCACGAATCTGGCTATGCAAGACCAACGTCT
+TGGGATCGAATGGATCAAGAATCATATTTCTGCGTTCGGAGGCGATCCAGACAATATCAC
+CCTCTTCGGAGAAGACGAGGGTGCAACGTACATCGCTCTTCACATTCTCTCAAACCATGA
+AGTGCCATTTCATAGAGCAATCTTGCAGAGTGGAGCCGCCATAACGCATCACGATGTCAA
+CGGGAATAGATCCGCGAGGAACTTCGCGGCCGTCGCGGCCAGGTGCAATTGTCTCTCTGA
+TGGCGACCGACAGGTAGACTCCCAAGACACAGTTGACTGTCTCCGACGAGTTCCTATGGA
+AGATCTAGTCAACGCAACGTTTGAAGTCGCGCACTCTGTTGATCCCGTGAACGGGTTCCG
+CGCATTGTACGTCCTCTTACACTTTCCCTCTCACAAATGCAAGCAAGACTAACAGCCCCA
+GTATGCCCGCCGTAGACGGCTACATGATACCGGATGAGCCATCCAACCTTCTTTCAAGAG
+GCCAAGTACCAGCCAATATATCTATCCTAGCAGGATGGACGCGCGACGAAAGCTCCATGT
+CCGTCCCAACGAGTATCCGCACCGCAGCCGACGCTGCCTCCTTCATCTCAACCCAATTTC
+CGTTATTGAACGCCTCAACTATCCACCACTTTCTCACATCTCTCTACCCCGAATCGGACT
+TTACCACCAACAGTCCATCTTCACCGGAGAAAGTTACGCCCGCCTGGCGCGCGACATCAG
+CTCTCCACAGAGACCTTACCCTTACGTGCCCGACTATCTTCCAAGCATGGTCCCTGCGCC
+TCTCATCCAACTGTACAACTCCGGTGTATCTGTACGAACTGCGTCAAAGTCCCTTCGCGA
+CAGCTCTCAACAACAGTGGTGTGGGATACTTGGGCATCGTACATTTCTCGGATGTACCGT
+ATGTCTTTAATGAGTTGGAAAGAACATACTACATTACTGACCCGGAGGAGAATAAGCTTG
+CTCAGAGGATGAGCGCGAGCTGGACTGCTTTCGCAAGCGGTGCTTTCCCTCTTTGTGAAC
+GGTCTGAGAGATCATTGGGGAGATGGGAGGAGGCGTATGGGGGGGACAGGGTTTGTAGGG
+ATCGAATGCCAGAGCATGTACGGGTGAAAGGTATTGGGGATAATGGCGACCAGGATGATG
+GGGATGAGATAGGGAAGCTTATGGCAAGGTGTGGGTTTATTAATCGGTTGGAGTACTAGG
+ATGATCGAGGGTCCTGGGAGACTAGGTACTTACTCAGTCCACGTGGAATCTCCGAGTGAG
+GGTAGAGGAGATTCCGTCTTGGCCCTGGGATCTGGTTTTTGAGTGAACAATGCCACGTCT
+ATATGAGGAGGTCGATGGAAGTTGCAAATTCGGAGCAAAGACAGTAAGTATACAGGCATA
+ATAATAATAATCGGTCCTGTTAGTGGCGGGCCAGTTCTGCCGGTCCATTACCCCTGAGTA
+CCTTACTGTCTGTACGTACTTAGCTTAGCTACCCAAACTCGAGTCCAACGAAGAATACTT
+CGTCCACAATGGAACTGTTCGTTTCCTCGGAGGGATGCTTGCACATTTGCGTCCCGATCC
+CATTGGGATGTGACGGATACCTTCTAGCTGGGTACTTCCTATGGTCGATATCAGCTGGTT
+CGGGCACTTAAACTTATTCTCCAAAGCCGAAGGAACTGTACTCGAAATGGATCTAAAACA
+GCGCTGCCCCGTTTCCCGACCCGACCAGACCCTTCGCGGCATGCTTCAAGGCATACCTAG
+GTAGGGCGTAATCAATGTTTCAATGGTGTTCTGTTCTATTGGATCCCTTCACCTGACTTC
+GAACCCTGATTGGGTTGACGGTGTCCGAAAAACCCGCATGGGGGCCAAAGCCGGTAGAAT
+CGATCTGATATCGGCTGGGTAATGGATGGTTGGAGATGGAAATATATATGTTTAGTCGCT
+TAAGTCCTTTATGGTCCACAAAATACCTAGCGGATACATTATTATATGATTGGCCAAAAT
+GGCAAGTATGCAAGAAGTCGATGCGCTCGTGGTCGGTGCTGGATTCGGCGGCTTGTGGAT
+GACAAACAGGTACGTGTTCAGCCTTGTTGCGAGGAGATGTATTTCAGAGTTGATAGCGTT
+GCGACAGACTGAAAGAGGCCGGTCTAAATGTCCTCTGCGTTGAGAAAGCACCTCAAGCAG
+GCGGAGTCTGGTACTGGAACTGCTACCCCGGCGCCCGGGTAGACAGTCGCTACCCTGTCT
+ATCAGTATTCCGACGAGTCGCTCTGTAAAGACTGGAATTGGAGTGAACTTTTCCCAGGGT
+ACGAAGAGATCCGCAAGTACCTCTCCTACGCGGTCGACAAGTGGCAGCTAAACAGCCACA
+TCCGGTACAATACAACAGTGACAGGTGCACGCTTCGACGAGTCGGACCACAAATGGACCG
+TTGAAGGTATTAATGGCTCACATGGGACGATACGCATCCGGTGTCGTTGGTACATCCTTG
+CCCTAGGGTTTGCCTCAAAGCCATACATCCCCGACTTCGAGGGTCTCAACCGCTTCCAGG
+GTCCCTGCTTCCACTCCTCGGCGTGGCCACAGGAAGGAATCGACCTGAAAGGCAGACGCG
+TCGCCGTCGTTGGCACAGGTGCAAGCGCTGTGCAGATCATCCAGACCATTTCCAAAGAAG
+TCGGCCACTTGACCGTCTACCAGCGGACCCCGTGCACTGCCATGCCAATGAGACAGCAGT
+CCCTGACACCCGAATACCAGGACAATTTCAAAGCCTCCGGCGAGATGGCAGCCACGATGC
+GTCGTACAAAGTACGAGAGATTTGGTGGCCAAGACGTGCAGTTTGTTAGCCGTCGCTGGC
+ACGAAGATACCCCTGAGCAACGGCGTGCAGTGTTCGAGCAAGCCTGGCAGAAGGGCGGAT
+TCCACCTGCTCCTATCCACCTACTTCGAAGTCTTCGACGACGTGGAGGTGAACCACGCCG
+CATGGCGATTCTGGGCCGAGAAGTCGCGCGAGCGCATCCATAACACCAAATACAAGGACA
+TCCTAGCCCCGCTAGAAGCGGTCCACGCCTTCGGAGGAAAACGTACCCCGTTCGAACAAG
+ATTACTTCGAGGCCTTTAACCGTCGTAATGTAGATCTAATCGATATGAAAGCGTCACCAA
+TTCTCTCCTTTGCGGAGAAAGGCATCATAACCCAGAACGAGGGCTTGCAGGAATTTGATG
+TAATCATTCTAGCGACTGGCTTCGACACAAACACTGGCGCCCTAACATCCATCCATATCC
+AAGACACAGATGGCATCCTACTCAAGGATCGTTGGAGCTACGACGGCGTTATGACGACCT
+TCGGGATGTCGACTAGCAAGTTCCCTAACATGTTTTTCTTCTACGGGCCGCAGGCACCGA
+CAGCTTTCTCGAATGGACCCTCGTGCATTGAACTACAGGGCGAGTTCGTCGAGGAGCTGA
+TCCTCGATATGATCGGGAAAGGTGTAACGCGTGTCGACACGACTAGTGAGGCCGAGAAGA
+GGTGGAAGGAGTCGACTTTGTCGCTGTGGAACCAATTTGTGTTTTCTTCTACAAAGGGAT
+TCTATACGGGGGAGAATATCCCGGGGAAGAAAGCTGAGCCGTTGAATTGGTACGTACTAG
+TCCTTGGTTTAGGTGTCTCAAAACGCTGATGTGTGATATAGGTTTGGGGGCTTTCCTCGC
+TATAGGAAGGCGTTGACGGAGTGTCGCGACGGCGGGTATAAGGAATATTCTCTTCGGTCT
+TTGCCTAAAGTTCCGGATCCAGAGCACAGGGGGTTGATTGATAAGGTGGCCGTTGTTACA
+TCGGCCCAGCCGGTGGGAGCGTAATGCATCGTTATGGAAAGTTTCAGTGACTGGCTATCT
+CAATGAAGCGCCTTCTTTCCCCCCGCGCGCTTTGTCTTCTGTGGACGATAATTTACTCTG
+AGTTACAATTACATTCACTCGAGACTGAAGTGAGAGTAGGCCAATTGAAGGTGAAGAATT
+AGCCGGTGGACTCGAGGAGAAAGACAAAGCAGAGGAAATAAATCGACGAAACTGTATTAT
+CACGGTTTGCAATCAATAGCTTAAGATACTAACGCAGGGCTGAATATTAGCATTTCAATA
+TGAACTTGGCTTATTCGACACATAACCGCCATCCACAAGCAGAACCATCCCATTACAGTA
+CGCCCCAGCTTTTGATGCCAAATATACAATGGTTCCCCCGATCTCCTGCGCATCCCCAAA
+CCTCTCTTCTGGGGCCAGTGAAGCAGGCAGTCGACCTCCAGTTGCTTTGTAGAAGTTGCC
+CATCATGTCGGTGTTGAAATCTGCGTCCTGTCAGCACTGATGACCTTTCGATCTGGATAA
+TGGCAAACATACATCCAGGGGCAATGACATTAACTCTGATCTTCCACGGGGCCAGATATG
+TGGCTAGATTCTTCATCATATGTGTCGCGGCCGCCTTACTCGCACCGTAGATGAACGCGG
+AATTGTCTCTACCGAATCCGCCCACGCTGCTCAACGTCACCACCTGGCTTCGCACGTACT
+CGATGCGCGCTTTGCCGGCTCCATTGCGCGCCTGGGAAGCCTTTGCCTCTTCCTCGCTCT
+TCTTGTTGCCTTTGTCCAAGAGCTCCAGAAAAGCGAAAGTCACCATCAGCGAAGCGGTGG
+TGTTGGTCTCGAAGACCGCCGCGTAGTCCGCGAAGGACTTGTGAAACCAGTAGTCACGCA
+CTTTGGATATGTCCCATTTCGGCGTCGGTCGTGTGGCATGGGGTCCCAGGTTCGGTGTGG
+AGATTCCAGCGTTATTCACCAGGAGGTTGATATACCCATCCTTGGCGGCGATGTGGTCGA
+CGACGCCTTGAAGGTCAGCATGCGAGGTAACGCTGCACTGGACCGGGTGGATGTTGCCAT
+GGGTCTATCACAGATAGTCAGTACCATAGTGATTGCTTTGTGGACTTGGTTTGTCTTGAA
+CCAACCGACTGCTTGGCTGCTTCTTGTAAGGGCTCGAGCCGTCGCCCGAGTATGTAGACT
+TTAGCCCCGTTGGCCTCGAGGGCCTTCGCCATCATCAGCCCAAGCCCTGTTCCTCCGCCT
+GTTATCACGGCCACACGGCCAGTGATGTCGAACAGGCTCGATGCGGTTAATGCTTTTCCA
+TCCATGGTCTGTTCTGTTGTTGGTGCATCAAAGCAGTAAGAGAGGAAGATCCGTCCAAAC
+GGGCGCGCTTTCTTCTACTCTGGATGGCCTTCCATATCTTGCCTCCCCCTCTCTCTCTCT
+CTCTGTTCTTCCATGGTAATTGTGCAGCTCCAAATGTTTTCGGATACCGACGTGAAGCTG
+AGATGGGAGAACATGTCGCGATCAGCCACTAAGTATCTACGTAGTCCTAGATAGAAAGCA
+AACACACTTAGAACATCGGTTGTTAGCACATGCAGCCACTGTCGATCCTCACACTCTAGA
+ATGCTACCGTAGGCGACTAGGCCAACGGATGGATGACTTACTAAATAGACGACAATAAAA
+CGACCAGCCAGTCGGGTTCATCCCAGAAATGAGCGATCGATCTCTCATTGCTGCATCGGA
+CACCGAAGCTTTTCTCGTGGCTTCGATGAAGCTACCTAGGTAGCCATGGATATCGATCCT
+TGGGGGCTGAGACAGCTACAAACCCTGCACGAGAACACACGGAAGTACTAGCGACATAAA
+CACATCAGGTCTCAGATTCTTGCTAGCCCCAAGATTGCAAAAGTACCACGTATGTTGACA
+GGCATCGCGTGAGAGCAGGACAACTTGTTCCTCTCGTTTCCACCCCCACGCCATCGTGTC
+TGGCACTTCAGATCGTATTCTGCTGCTGCTCGACTTTTCTCTCAGATCCGCTTGTGCTTC
+AAAACCACCGCAAGATGGCAGACGAGCAAAAGACCCCCTTGGAGAGCGGCCAGCAACCAG
+CAGTCGCACAACACACATCGACCGCTGAGCTGCAGACAGAAAAGCCCGGCCAGATGAATG
+GTAATGGAACAGCAGACAAGCCAGGCCCTCCAGGAGGCAAACCTTTTGGGCCTGGCATGG
+GCCCGCCTATACAGTATCCCACTGGATTCAAGCTCTACTCGATCATGACCGGGCTCTACC
+TCGCGAGCTTTCTCACGGCTTTGGTAGGTTGGCGCTCGATCACCGATCTGACGGACAGTG
+AAACTTATATCGGATGACAGGATCGTACCGTGTTGGTGGTCGCCATTCCCCAAATTACCG
+ACCATTTCAACTCGATTGACGACATCGGTTGGTACGGCAGTGCATATCTGCTCACCTTTT
+GCGCCTTCCAGCTGTTGTTTGGCAAAATTTACTCGTTCTACAATCCCAAATGGGTCTTTC
+TGTCAGCTGTTTTGATCTTCGAGATCGGATCGGCCATTTGCGGTGCTGCTCCTAACTCCA
+CTGCCTTGATCATCGGTCGTGCGATCGCGGGTCTTGGGTCTTCAGGGATTTTTGGTGGAA
+GTGTCATTATCACCTTCTTCACGGTCCCCTTGCATCAGCGACCGATCTACACTGGCATCG
+CCGGCGTCATATTCGCGTTGGCCTCCTCAGTCGGACCGCTCATCGGTGGTGGATTCACCA
+ACAACGTGTCCTGGCGGTGGTGCTTCTATATCAACCTACCCGTTGGAGCCCTGACCGTGG
+TGACTATTCTGCTGTTCTTGAACCTGCCACCCGCTCGTAAGGCCGGGACACCTCTCCGTG
+AACAGTTACTGCAAATGGACCCACTGGGTAACCTTTGCTTAATTCCTGGTATCATTTGCC
+TTCTGCTTGCCATCCAATGGGGTGGCTCAACATATGCGTGGAGCAATGGTCGGATAGTTG
+CACTGTTGGTTTTGGCCGGTGTTCTCTTAATCGCCTTTGTGGGAGTCCAGCTGTGGCTGC
+AGGATAAGGGCACTATCCCTCCACGCGTGATGAAGCAGCGCAGCATTGCGGCTGGTATGG
+CATTCACGATCTGTGTGACCGCAGGCTTCATGTCTTTCAACTACTACCTCCCGATCTGGT
+TCCAAGCAATCAAAAATGCGTCATCCTTCCACTCGGGTGTGATGATGTTGCCCACAGTAA
+TCTCATCAGGAGTAGCCAGCTTGGCCTGTGGATTCATCATTCATCGAGTTGGATACTACA
+CGCCGTTTATGATCGGTGGCTCTGTGCTCATGGCGATTGGCGCAGGTTTGCTCACCACGT
+TCACGCCCACCACAGAGCACCCTAAATGGATTGGCTATCAGGTTCTATGGGCATTAGGAT
+GCGGAATGAGTACGTTTCAGCCTCCTTTCTTTGCCCGCTGTATCTTTGTCGGCGGATACT
+AACTCGACCTCTGAGCAGGCATGCAGCAAGCCTCCCTGGCCGCCCAAACAGTGCTCCCGA
+AGCCTGACGCGCCAATTGGTATCTCGCTCATCTTTTTCTCACAATCATTAGGCGGTTCAG
+TATTTCTAGCGGTGGACGATTCCATCTACAGCAACCGGCTTGCGGCCAAACTGGGCAGCA
+TTCCCAATCTGCCTCAGTCAGCGCTGACAAACACGGGAGCCACCAATATTCGCAACTTGG
+TGGCACCTCAATATTTGGGTCGTCTGCTTGGCGGCTACAATGACGCATTAATGGATGTCT
+TCCGGGTTGCGGTAGCCAGCAGTTGCGCGTGCGTAGTAGCTGCTGCCTTTATGGAATGGA
+AGAATGTCAGGGCGGCCAAGGCAGCTGGACCAGGTGGCCCAGGGGGTCCAGGGGGTCCAG
+GGGGTCCAGGGGGTCCTGAAGGCCTGAGGGGAGGAAACAAGGTATAATCGAATAAACCCC
+CAAATCCGGAAATATGCATGTATAGTAAGCAAGTAGCAAGGTTGTCTAAGATGTTATGTT
+ATGGTCTCCTGGGTAGACAGGCGTACTATCAGGCCATTTGCAGTAATACGAAATAGAGAT
+AGATTGATAGAAGACCATCTCCCACAATGGTTCTGAAAAGCCGATTGTCTAAATGATTGA
+TATTGCTATATCACTAAAGAGATTGTAAGCTAGGTATACTTTATAGATACAGGGAAGGGC
+TAGTAAAAAAAGAAATAATAATAATAATTGATGAAAAGCTTCAGATTATGTCTTGTTCTG
+GTATTATTCTAGTGATTAACAGTAATGCTGATCTAGCTGACAAGCATTAAGGAGCTCCAA
+TAAGCTCTAAGATGGCGGAGCATTCGTTAGGCGTAATAAACCTACTAACTAGTGATGCAC
+TTCTCCGTATATAGTGAGACTATACGAACTATTACAGCCACGCATCAGAGAACCCCTAAG
+CTGCCTGGGGGCTCGACCCAGGATTATATGTATATGCAAGAATTACTCGGGGGGGTTTTG
+AATCCTACATTAGCCACCACATTCGACGACGGTAGGTTAACCACTGAGGGGTAGCCTATA
+TAATCTCGTGTACAGGTGGGAGATAAGGATAGTTAGCGTGTAGCATGATTTCAAATTGCT
+TGATTCTTATCCAATTATTTGTATTAAACGTCCAATATAGTTGTAAAGTTTTGGAGGTTT
+CTTTCCATTAGGTAGGTATAATGTCGAAGGTCCAGGTAGACCAATTAGGAGTTCTGATCT
+TTGGATGGATGGTAACAAATGCTCTCCGACGTACCTAGTACCATATAGATGACTGAAGCG
+AACTCCTGCGAGGGTCGAATGATCCGACACAACGCGGAGTACACTGCGGGCTCATTGCTT
+GCATATCACTAGCAGCAGATGAATCACACGCGCTAATCCAGAGTAGGAGTTTCGTGTGAT
+CCAGGCACTAAAAGCTCAGAACAGCAAGATATGTTGGAGAACAGTCTCCTACAGGATTAC
+AGGATTTTCAATTATTTTATCAAATATACCGCGCAAAGGTAAAGATCTCAACCCTCGGTT
+ACGCTCTAGGCTCTGTGGTCCAAAGTATAGGTAGGAGCTGCAGTAATGGGCTGGGCCAGA
+AGAGGATCGTCCTATCAGAGAAATATGACCTGTAGTTAGATCTCTACAAGGCGAAATTGC
+ATCCGCTAGGGTAGTATCGCGCATGGCGGGAAGGATTTGGGGGCCATTTTAGTTCTCTTG
+AGATTGGAAGTGCCTAAGGACCGCTAGGATGTCCTATAATAGAAATAAGCATCCATCCTG
+TTAAAGCTGGTAAGGACTGAAATTGTCTATATGGCCGGTCTGTGCTGATCAACTCCAAGT
+CTCCCCTCCTCTTATCGATAGGTTCGTCTGGTTTATAGCGCTTGCTCCGCTAGTAAGTCG
+CCAGTCAAACCCTATCTCAACATGTCGTGGCAGGGTCATGAAGGCCCCAAGGTGAAGCTG
+CGGTCGGCATGCGACCGGTGCTCTGCGAACAAGGTCAAATGCACTCAGGAGAAGCCCGAA
+TGCGAGCGTTGTCGTCTTCTGAGCTTGCCTTGCAACTATAGCCGTTCGATGCGGATAGGG
+AAACCACCCAAATCCCGACAGCGTGGATTGTCCAATATTGACCCGAAGACTCTCATGGGC
+GGGACTGTTACCAAAAAGCTCAGGCCGTGTCCGAGCGCGCCAGAGTCAGCGTGTAGAGGT
+TCATTCGAAGATGGAGATGGGGGCCCGTGGACAGAAACTATGACATTCGAGGAGATGTTA
+TCCCGTCCCTCCCCACCTCCCTTTGCTGGACCGTCACACAATTCAAACCGTCCTACGAAC
+ATGGCTTCTACGAACCAAGATCAGTACTATCACGACAAAGGGAAACATGGCGAAACAATG
+GATGAGATGCTGCAGACGCTGGTCCCAGACTCAGTGCAGTTCATAGAGTTCCCCAACACA
+GCCCGGGAAGACCAGAAACAACATCCAGAACTCAGGTCGGAAGAAGAGTATAGTGATTAT
+AGATCCAAGTCTCTCTTCGAGGAAGGCTTGGCACGCATCGCACCTGATTGTGCTGGGGGT
+ATTATGGACGTCTTATATGGCGAAGAGGCCTTAGTGCAGATGCCCAATCTGCCCTCGAGT
+ACGCATGAAGGAAGCTCAAATACCCATGTTACTTCATCCCACAACTGTACGAGAGCCGTG
+ATGGAGAATCTAGCCAAGCTATACCAAGTATGCGCACCTGCTGGAGTAGAGAATGGTTCC
+CACCCCACAACCGACCAAGTACTCAAAGCAAACAGCGACGCCATGAAGGACGCTGCGGAT
+CTTTTGGCGTGCCCGTGTGCCAAGGACTTCTGCTTTCCCATCATACTTGGCATCACAGCA
+TGCAGAGTTCTGGCCTGGTATCAGGTCGTGATCGACATGTATGACCCGGAGATTCCCATG
+GCCACGATGCCAACGGCACGCGAGGACATCAAGCACTGTCCGATTGCATTTGGAGCGTAT
+CAGCTGGATGAGGAGGTGAGCCAAGCAATGACCAGTCAATTCGTTCTACGAAACCTCCGT
+GCAATGACTCGATTTGTCAAGACCTATGTGGAGAACTTCTGCTCTGATATCAATAAGAAC
+CGGCCAGGAAGCTGTAGCCTCATCTACCGCTCCCTGGGCACTTTTATGCAGACGCGGCTT
+GGAAATACCATTGAGCAACTGGAGGATCGGTTGGCTGCATTTGATGGCGAGTATACAAAG
+AACATCGGATAGCTTCGGGCACCGAAACAATATCTGCCTTTTATTGCTTCTCAATCAAGC
+AGTTAATGGATTCCTAAACCTTCCACACCAAAGTTGGGGCAACGGCTCCAGGGAGAGGAA
+CGTCAAAAGGACGATGGCAGAATATTTATACCACGTGGAAACCCTCACTGAAACTTGATT
+TAGGAGTTGTTTGGATATAATCGACCTGTACAGATCTTCAATGACAGTTTGTCTTTCGGG
+GTCTCGTCTTCCTGCGCCTCGGGAATCCTCCGGGCCTGGGAGTTTTTGAGTAGCTGCTAT
+TGACGTGGAGAAGGCACGGAGATCAGGAGAGAAAAATGCTACAGTCGAGATAAGGCAAGC
+AGGATATAGCTCACGACAAGATATAGTATATGAAAGAGCAACTAGAAACTGACTGGCAAT
+CTGTAACGTACATGAACAATGATCCTCCATATTCTGGGTCCCTCCGATCCATATATAGGA
+TTTGTATCTAAAACAATCCAAAAAGCCGATATAAAGGATTAGCAGCAGAAATAATAAGGC
+AAAAGAGGATCTATACCATCATAGCAGCCCAAGCGCATGTTCTAAACCCCTGCGGGGATA
+TCAATCTACCCGAGTTGCATGCCCTAGGCAAATAATCATGTCACGGGTTCCCCGTGACAC
+TATATAGATGACATCTGTATTCTGCTGTATCCAGACCTCTGTGCCGTAATATTAATAACT
+TGTAGATTTAATAAGAAAATAATATTTTCTTTCTCTAAGAGGGGCAAGTTACCCCTCACC
+TTAGACCCCGCTTGGTTTACTTTCTTGAGTATGAAATGCACAGACCGCGAGACGTGTCAC
+ATAATGATAACTATTAATAATTGACCACTCAATAATTTAAATCTCAAGTTTAATCTCTCT
+GAGGTTGAAGGATTTGTCTGTTCTAATCTCAAATTTATTACATTTCCCCTCCAATATGAG
+CTATTGAAAGACTCTTATAAACAAACACTATAGAAGCAAGGAGAGAATATAGAAAATAAA
+GTATAGAAAAGAAAGAAAGAAAGAAAGAAAAGAAAGAAAGAAAAGAAAGAAAAGAAAGAA
+AAGAAAGAAAAGAAAGAAAAGAAAGAAAAGAAAGAAAAGAAAGAAAAGAAAGAAAACACA
+AACTCGGAATCGTCTACACAATTCAACAGCATGTCTACTAGTGTACGTACTGCCTGCTTG
+AGAATAATCGTTCTTGATACTAAATTACTCGGTGTCTAGACTTGTGATTCATTGCCTGGT
+ACTTCTACCCCAATGGCAATCTGCGGTATAGCCGTGAGACTGCCCGGTGGAATTTCCAAC
+GATGCTCAGCTCTGGGACTTTCTCCTTGCCAAGCGCGATGCTAGATCACAAGTCCCCGGT
+AGTCGGTACAATATTTCGGGATACCACTCCGATTCGGGAAAGCACGGGACTTCAAAATCG
+AAGTATGGGTATTTTCTCGATGAATCTGTCGACCTCGGAACCCTCGATACCTCGTTTTTC
+AGCTTTACGAAGCTTGAACTGGAGTACATTGATCCCTGTCAACGTCAACTGTTGGAAGTT
+GTCCGAGAGTGTTTTGAAAGCGCCGGAGAGGTCAACTATCGTGGAAAGGATATCGGATGT
+TTTGTTGGCTCGTTTGGCGATGATTGGACCGAAAACCTTACACATGATGAACAAACATCG
+GCCAAGTATCCTCTGATGGTTGGAGGTGACTTTGCTACTCCAAATCGAGTCTCATACGAG
+TACAACCTTCATGGGCCAAGTGTGAGCATTCGAACGGCCTGCTCCTCGTCGCTCGTCGCG
+CTTCACTCTGCATGTTTATCGATCCAGAATGGAGACTGCTCAGCTGCCATTGTTGCTGGT
+TTCAACCTCATTCTCACCCCCACAATGACCATGATCATGTCGTCAAAGGGCGTACTTTCT
+GCTGATGGGTCTTCCAAATCATTTGATGCGGACGCGGACGGATATGGCCGAGGAGAGGCG
+GTCAATGCCGTATACATCAAGCCGCTCCATGACGCCATACGTGATGGAAACCCTATACGT
+GCCGTTATTAGGGGTACTGCTACAAATTCCGACGGAAAGTCGGCAGGTTTTACTGTACCC
+AGTGCTGATGCCCAGGAGGACGTCATTCGGAAAGCCTATAAAGCTGCTGGTATAAGTGAC
+CTCTCCCAGACCGCGTTTGTGGAATGCCATGGCACAGGAACCACAGTCGGTGACCCTATA
+GAGGTTGCGGCAATTGCGAATACCTTTGGCGGCGATATGTACATTGGGTCCGTCAAGCCC
+AATGTGGGCCATTCTGAAGGTGCATCGGGGCTCACCTCTTTGATCAAGGCAGTGCTGGCT
+GTTGAGAACCGCACCATTCCACCTAACATCAAATTCAATACCCCAAACCCCAAGATTCCG
+TTTGAGGCTAAGAAAATCACTGTCCCTGTGGAGGCAACCCCATGGCCTTGGAACCGCTGT
+GTAAGAGCCAGTGTGAACAGCTTTGGAATGGGGGGCGTCAATGCTCATGTTATCATCGAG
+TCTGCCGATAATTTCACTCCACCAACCTCAGAAGTGATCGAAGAGCATGATTCAACCCCA
+CAATTACTGTTATTCTCAGCGAACACGCAGGATTCCCTCGAAGCAATGATTCAACGGAAC
+CTTGCTTATCTGAGGGAGAATACTGACTCTCTGCGTGATCTTGTATATACCATGGGTGCA
+AGGCGGGAACACCTTTCGTTCCGGGCTGCATCCATTGTACACAGCGACATGTCGGTTACT
+ACAGCCTCGTTTGGAAAAGCTCCCTCAAGTCCACCGGACATCGTCATGGTCTTCGCGGGT
+CAGGGCGCTCAGTGGCCAGGCATGGGAGTTGAGCTTTTCAAGAGCAATGCCACCTTTCGA
+AGGTCAATTTTGGAAATGGATAGTGTCTTGCAGAGTCTGCCCGATGCCCCTGCCTGGTCG
+ATCGCAGATGAAATCTCAAAGGAACACCAGACAAGCATGCTTTATCTATCCTCTTACTCG
+CAACCGATATGCACTGCACTTCAGGTCGCACTGGTTAATACTCTTTTCGAGCTCAACATC
+AGGCCCTATGCCGTCATTGGTCATTCGTCAGGGGAGCTTGCCGCGGCCTACGCTGCAGGT
+AGGCTCACTGCTAGCCAGGCTGTGACGCTCGCCTACTACCGCGGCATTGTGGCAGGAAAA
+GTAGCACAGGCCGGGTGTATGGCCGCGGTTGGAATGGGAGCGTCTGAGATTATCCATTTT
+TGAGACCAGGAGTGGTTGTCGCTTGTGAGAACTCTCCATCCAGTGTCACTATATCTGGTG
+ATATCGATCAGGTGCAGTATGTGATGCAGGAAATATCTTTGGCACACCCAGAGATCCTGT
+GCCGGCAAATCAAGAGTGATACCGCCTATCATTCACATCATATGAAATCTGTCGGAGATA
+CATATCATTCATTCATCAATCCCTTTTTCCGCGGTGAGACCGAGGTCAACTGTCAACCCG
+TCCACTTTTTTTCCACGGTTACTGGCGATGAACTGAGTGATGGAGATCATGTGGGACCCA
+AATACTGGCAACAGAATCTAGAGTCCCGGGTCTTGTTCCAAGGAGCTTTGGAAAACATCA
+TTTCTCGCCAAAGAAGCCGCCATCTTTTATTTCTCGACGTTTCGCCTCATAGCACTCTCG
+CTGGCCCAATTCGTCAGACGCTGGAGCAAGCAGAAGTGGCCCATCCTTATGTCCCTTGTC
+TGATTCGTTTTAAAAACTGTGCTGAATCTTTCCTTTCTACCATTGGTCAACTGTATTCGC
+ACAGGCAACCCCTTGATTTCAATATGCTCACCAACCCGGATCGAACTGCCAAAGTGCTCA
+CCGATGTACCGACCTATCCATGGCAACATGGCTACTCCAATCTGTACACGACACGCCAAA
+ATAATGAATGGCTATTCCGCAAACAACCGAAACACGAACTTCTGGGAACTCGAGTGGTTG
+ATAGCACAGATAATGAGCCCTGCTGGAGAAATGTTCTTTATCTAGAACATGTCACATGGC
+TTCGGGATCACAAAGTCTCTGGCAATATTGTCTTTCCAGCGGCTGGCTATGTCATGATGG
+CTGGAGAGGCCGTGCGACAGATCGGCTCAACTGCGTCCGGATTCATCGTCCGCCAAATGG
+TTCTGGACACCGCAATGGTCCTGAATCAGTCAAACCCCACGGAGATCGTAACCTCCCTTC
+GGAAGCATCGACGGGATCGATGGTATAGCTTCACAATCAGTTCACACAACGGCGTGAAGT
+GGATTGAGCACTGTTACGGGGAAGTTGCCCAGGGTCATGACCTACCCTCCACGGCGAGCA
+CACACAAAGAAAACCTCTCTCGGGATATCAACGTCTCCAATTGGTACAAAACCCTCAGTC
+GTGGGGGGGTTGAATTTGGACCTGCCTTTCAATGCGTCGAAAGCCAATCCTGCTCAGTCA
+CCAGTAACACTGTGTCCGGAAGGATCGTATCTAAGGTGAATGAATCGACATACTATCCAG
+TTCACCCTACACAGTTGGATTCCGTTCTCCACATCGTATACGGTGCAATCTACAAAGGGT
+TCGACTGGCAGGTTGAATCATTGCCTGTTCCAACGAGTATAGGAGAAATTATGATTGGCG
+AATGTGTATCGGACTTAGATGTCACGATGTGGGCGGATGTCTCCAGAAATAGCAATATAC
+TCGTTAACGGGGAGGCCTTTGGGTCTGATGGCTGTCTTTTGATACGCATCAAAGACATCG
+TTCTTCGACCCCTTGGGGCAAATCAGGCTTGTTTTGAAGAAGATGAATCCCACGCTGGAG
+CAAGGCTGCTTTGGAAACCAAGTATGCAATTCCTCAATCTTGCCGATCTTATTCAAACGC
+CGGTAAATTGGACGAAGCAAACGATGCTATTGAATGATTTTACTTCTTTGTGCATTGAAC
+GAGCTCTCTGTCTTCTTCACGCGCAGGGTGTGTCTTCAAGCATCAGCCATCTCCAAAAAT
+ATCAGGACTGGCTCCAAAGACAGCCAAAACCAAGTTCTGAGCAAAGCATGGAATCATTGG
+TTGAGAAGATCCTAGCTACTTCGGCAGCACCCTGTGCCCGAGCAATGATTAAAGTATTGG
+ACAACATTGTCCCGATCTGCAAAGGTGAAATCGATGCTTTAGAAGTCCTCATGGGTGATG
+ATACACTTTACGAACTCTACAACTACCTCAATGAGGTTGACCGCACTCCACTTTTCGACT
+CTCTAGGTCACTACCAGCCGCAACAGCGAATTCTTGAGATTGGGGCTGGTACAGGAGGAA
+CAACAGCAAAGATTCTTCCTCGGACGAAGTACTCGACATATACATTCACAGATATTTCCG
+CGGCCTTTTTTCCAGCAGCAAAAGACCGTTTTCAATGTCATGCGAACGTAGTCTACCGGA
+CACTTGACATAACCAAAGACCCCTTGGACCAGGGTTTCGAACCAGAGTCGTTTGATCTGA
+TTGTTGCAGCAAATGTTTTGCACGCCACACCGAATCTGTACGAGACGCTCAGCAACGTCC
+GCAAGCTCCTCCACCCGCGGGGGAAGCTATTGTTAGAGGAACTTTGTGGTGACGCTAAGT
+TCACCAATTTCATTGTCGGGGTTCTGCCTGGCTGGTGGGCAGGCGAGTCTGATGGCAGAG
+CAGACGAGCCCTACATCTCTCCCGACCGTTGGGACAGTATCCTGAAGGCTGCTGGTTTCA
+ACCCTCTCGACGATGTTGCGTTTGATGCAGCACCGCCACTTCATAGCCTTGCATTCATGC
+TTGCAAGCCCTTCTTGTGTCCCTGAATCCCCATTGAAAAGAAACGTCACGTTGCTGTCTG
+ATGTGACATCATCAGAAATTGCAGTACGAATGCAGAAGCAATTACTTTCTAGAGGGTATA
+GTGTCGGTGTTCAGTCACTCGATCAGTCTTTGATGGATGGGGAAGACGTCATTATTCTTG
+TCGATACAGTATCTCCTTTCTTCCATAACCTGGATAGTAGAAAGCTTTCTACCTTTCAGA
+ACCTCCTACGTGAACTGCAGCGATCACATTCCGGGGCCCTTTGGGTGACCCGATCGATTC
+AGATAGACTGCAGGGATCCTCGATACTCACCTACTCTCGGCGTCGCACGCACGGTTCGCT
+CCGAATTTGGCCTGGACTTCGGGACATGTGAGGTCGACACTCTCAAGTACACGAGCATCG
+GGCTCGTCATTGATGTTTTCGAGGCGTTTCATGGCCGGCGTCATGGACAAAATGCCTACC
+CTGAATATGAATATGCCATTCGGGAAGACACAGTTCATATCGGACGATTATCTTCTTTCT
+CTGTCCAGGAGGAATTGAGGCGGATCCAAAAAGCGCATGTGGAAACCAAGGACAACAGAA
+TATCTCTTGTGGCTGGGACAAGTGGGTTTGACTCTTTAGCTTGGCAGGCCGATGCAGGGC
+AGCAAGTTCAGTTACTGGGAGATGACGAGGTTGAACTTCAGGTAGACACTGCTGGTGTGA
+ACTTCCTGGTACGTTGCTCATTTCAATTCCAAGGAGAAAGCTAATGGAACAAGGATATCT
+ATACCTGTTTGGGCAATTTGGAAATGCCTTGTACGGGAATGGGTCTCGAGGCCGCTGGTA
+TTGTGCTTCGGGTTGGTGCACAAGTTCAGGGCCTTAGTCCGGGTGACAGAATAGCTACAT
+TTGGCCCTGGGGCCTTTGCAACGACAATGATACTTCCAGAATCACGCTGCATCAAGATCC
+CTGACTGTCTAACACTCCAAGATGCAGCCGTCATGCCTCTTGCGTTTGCCACTGCCCTTC
+ACGGATTGTGTGACCTCGGAAATTTGCAAAAGAATCAGGTAATTGTGCCGTTGCCTTTGA
+TAGATACACCACTGACTGGACGTGCAGACTGTTCTTATCAATTCTGCCTCCGATGGAGTA
+GGCCTTGCTGCTATCCAAATCAGTAAAATGATTGGAGCGACCATCTACGCCACTGTCATT
+GGCGAGGATAAAGTGGAATATCTTACAGCGTCGCATGGTATCCCGCGAGATCACATTTTC
+AATTCGCGCGATTCATCGTTTCTAGATGGAATAATGAGAGTAACTAATGGCCGTGGAGTT
+GATCTGGTTCTAACCTCACTTTCTGCAGACTTCATTCAGGCTTCCTGTGACTGTGTTGCT
+AACTTTGGTAAACTGGTCAACCTTTCTAAGCCGACTGCAGCCAACCAAGGACAATTTCCC
+ATAGATTCCTTTCATCCTAACATGTCATACGCTTCCGTCGACATCATTGACTACATTAAG
+CGTAGACCAAAAGAGAGTAAACGGTACGTGATCACTTTCAGGCACTCGTACCAGCTTTGT
+CCTGCATGTAACTGACATGTGCGCGTTCCAGTCTGTTGGAGGAAATCGTGGAGCTTTATA
+AACAGGGACATATCCAACCTATTACCCCTGTGAAGACATTTACTGCGACGGATATCCGAC
+AATGCTTTGATTACATGCAAAGCGGGCAACATATTGGCCAGCTGAGGCTCTCCTTGAAAT
+CACAAGATACGTTCATTGAGGCGGTTTGCTCTCCCAAGACCATGATCTTCCAAAGTGATG
+CGTCTTACCTCCTCGTCGGGGGGCTTGGAGGGTTGGGTGCTGAGATAGCAAGGTGGATGG
+CTGAGCATGGTGCCAGAAATCTCATCTTCTTGTCTCGAAGTGCCGATGCAGAGTCTAATA
+TCAGACTATTCCGTGAACTGGAAAGTCAAGGCTGTTCCGTGCAAGCAATCAAGGGAAGCG
+TCTGCAACGCATCTGACGTTAAGAGAGCCATTTCAGCCGCCAGGATCAAATTGAAAGGCA
+TTTTTAATATGTCCATGGTGCTACAGGATGCATCGTTGCTCAAAATGTCATCTGATGAAT
+GGAACGCTGCAACCGGTCCAAAGATTCAAGGCACTTGGAATCTGCACGACGCAAGCTTGG
+ACCAGGATCTTGACTTTTTCCTCCTTTTCAGCTCAATGGGTGGTATTCTTGGAATACCTG
+GACAGGCCAATTATGCCTCTGCGAACACCTTCATGGATGCGTTTGTGCAATTCCGCCATA
+GCTCCCATTTGCCTGCTTCTGTCATTGACATTGGTGAAGTCCAAGGTATTGGACATGTCG
+CCAATAACCCGGAGATTTTGAATCGACTCAAATTGCTCGAATGTGCCCGCATGAGCCAAA
+AGGATCTGTTCCACGCCATCACAATTGCCATATCTCATAGCCTTCCTCCACAAACCCTGG
+ATTACAGCCGTTACGAAAACCCAGCGCAATTCATCACTGGGCTGAGAGACACCACTGGTA
+TGCTGGACAGCACCGGCGGAAAGTCCATGTTGCTGGATAGTCGCCTAGCTGCTTATGTTG
+GCAATTCTGCTGCTGTCACCGCACCCACGGAGACGAAAACTTCGGCGAACAAACTCAACA
+ATTTCGTGTCTTCAGCTGCCACAGACTCTGCCATTCTTTCAGAACCCTCGGCGACGCAAT
+TTGTTAGCTTGGAGATCGCAAGATGGGTATTCGACCTTCTCATGAAACCGGTGGACGATG
+ATTCAGAGATTGACCTTTCTCGCTCCCTCGTTGATGTCGGACTTGACAGTCTGGCTGCTG
+TCGAGATGCGGTCTTGGTTAAAATCCTCTCTGGGGTTGGACATTAGCGTTCTGGAAATCA
+TGGCATCCCCCAGTCTGGCTGCGATGGGTGAGCATGTTATAAGAGAGCTGGTTAGAAAGT
+TCGGGGGAGACAACAAAAATTGAGAGGGTAGATAGCGTCATGTGAAATTCTTATAAGGTA
+CTTATAATTCGAAACAAGTGGTTTCTTGATTCAATCGCTCCTAAGACATTTATGCCTAGA
+ATCCGCCTTCTACGTCAGTCAGTCATGTATTCTATCGTAACTCATCCGACAGCTAGAAAG
+TGGTTGCCAGCTCATTAGGCGACTATTTCAAAACCTTCGCAGGTCAACTTTCTGATCGGT
+CGTCCTTGAATAAAAACGTAAGTAACTTAACTAATTCTCTGCCTATAGTCGAGATCTTAA
+GCCGACCCTCTACGTGAGGATTTAGCTTTTTTTTCCCTTTCATATTACCTTATTAGTATA
+ACTTCAGGAACTTCTATATGTATTCTCCGATTAGCCTGGTCATATTTGCGGACAGCCAAG
+CAGCGCTAAAAGCTCTCCGAAGACCTCGCACGGTATAAGGGTTAAAAAATGCCGATTGTG
+CTCATGAAACATTTTCCTACATCTTGATTCAATTCTAAACGGCCAAGTCTCCCTTGACAA
+ACATCCCGATTGCTGAGATAATTATAGAGGAGTTAATCTAAAACTTTCTATATGAGATAG
+GACGATATGTCGCTGATTACTCCTCTGGGCTTCAGGCTTAAAGTCTAGAATTAGAATCTG
+CATCTCATCCCTTGAACGACGTCCGGATAAAATCCAGTCCTCGGGACATGTCCCTCGGAA
+TATCAAAGGCAGAGTGCTGATGCCAGTGTCAATTAATCCCCGACAAGCGAATAGGCCCCG
+AATGCACTATGTTTACAATGCGCGGGGTTCACAATCGGTCCTTTGTTGGAGGTGTACATC
+TATCCCGAAAGCGCCCCCGTCTTCGGACTCCGAATGCCATCCTAATACCCCGACACGGCA
+AATGACCTGCATATCAGCATGAGCTAGGCAAGTTTATAGCACGTGGCTGTTCGTATAGAT
+CCGACATGCGGATTGGCTTCATAGAATCAAGCGCCATTGTGCGTACGCCGCATGTAACAG
+TCCAAGCTTATCATGGCCGGCAAATACCCAGGGCATTGATCAATACATTGGTATTGCTAT
+GACTCGAAGAAGAATATCGGGGTAAATCTTTCCCTTCCGTACACTACTCTCTGTTTCAAC
+AACAATATTTGCCCTTTGCGTCCCAAACATGGCGCAAAAGCTTCGTTTTTACCTGTTTGG
+TGATCAGACCTATGATTACGATGAGCAGCTGAGGGCTCTTCTCACGAGCCATGACCCTGT
+TGTGAGGTCGTTTCTCGAGCGAGCCTATTATACTCTCCGAGCAGAGGTTGCCCGTATACC
+AAACGGATACCAGGCACGGATTTCGCGCTTCTCTAGCATCGCTGAACTACTTTCCCAACG
+GCGGGAACATGGAGTAGACGCCAGCTTGGAGCAGGCTTTAACTGTTGTTTACCAGCTTGC
+GTCGTTTATGCGGTGAGTAACATTCTTTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCT
+CTCTCTCTCTCTCTCTCTCTCTCTGTTTTTGTTTTTTTTTTTTTCCTCCTGTGCACAGCT
+AATTGCATGCAGACTTCATTCTGAACGAAGCCTTTCGTACCCATCAGCGGACGATGCGCA
+TTGTCTTGGTCTGTGCACTGGTGCCCTAAGTGCAGCGGCTGTGAGCAGCTCCAGGTCACT
+CAGCGAACTGCTGCCGGCCGCGATTGAGACAGTGATACTCGCCTTTCGAACCGGTCTCCA
+CGCCAGTGACTCTGGAAGACGGATTGAAGAGTCATCAGCGGCAGCAAAATGCTGGTCAAT
+ATCGTTGCAAGGTTTGGAAGGCCATGTAGCCAGGAAGCTGCTGGAGGAATGGTCCAATAA
+AAAGGTAGACCACCTTCATACACCATTACGTTGTGCGTCTCGGACAATAAAACTAAGCTT
+ATCACCAACAGAGACTCCCTCCAATGTCAAGGCCATATATAAGCGCATATGCATCTGGCG
+GAGTCACCATCAGTGGTCCGCCATCAGTTCTGGCAGAGCTTCGAAACACACCTGGGCTGT
+CTAAACTGCGTGCTAAAGACATTCCAATACACGCACCATATCACTCCTCAGCAATATTCA
+ATCAATGTGACGTAGAGACAATTCTGAGTTCTGCGTTAATCGATCTGGCCTCTCGCGCGA
+CTCATGTTCCAATTCTATCAACCGGTACCGGACGACTGGTCTGGGCAGGCACTCTTCCAG
+CCGCAATACAGTCCGCCCTGCAGGATGTACTCCTTCGTCCGATAAGCTGGGAAAATATGT
+CATGTGGAATAAGCACCTGTCTTCAGTCCATAGATCCAAGCGAGGTGGAGGTGATCCCAA
+TCGCCACCTTGGCCGGCCCACTGCTCTGTCGCTCAGTACAGGTCGCAAAAAGCCAGATTC
+CAGCTACAATCGATCCAAAGAACGATGTCATGAACGAAGCACAAAGTCAAATTGCAGAGG
+CTATGGACCGAGCCAAAATTGCCATAGTGGGCATGTCCGGTCGTTTTCCAGGCGCTGAGA
+ATGTCGATTCTCTCTGGGAGCTTTTGATGGCCGGCCGTGATATGTGCAAAGAGGTACCAC
+CCACCCGGTGGAACGTTGACACTCACGTTGATCCCACTGGTAAACGGAAGAATACCAGCA
+AGATCCGATGGGGCTGCTGGCTCGACAACCCGGATATGTTTGACGCGCGGTTTTTCAACA
+TGTCTCCGCGAGAGGCGCCGCAGGTTGATCCGGCCCAGCGAATTGCGCTCATCACTGCGT
+ACGAAGCTATTGAGCAAGCTGGTATCGTTCCAGGGAGAACGCCATCAACGCAGGAAGATC
+GAGTGGGCGTCTTCTTCGGTACGACCAGCAATGACTGGTGCGAGAGCAACAGCGGGCAAG
+ACATCGACACGTATTACATTCCGGGCGCCAACCGTGCCTTCATCCCAGGCCGAATCAACT
+ACGTGTTCAAGTTCAGCGGGCCCAGCTATAGCATCGACACTGCATGTAGTTCAAGCCTGT
+CCGCGCTTCATGTAGCATGTAATGCCCTCTGGCATGGAGATATCGACACTGCAATTGCGG
+GTGGCACTAATGTCCTCACGAACCCCGATATGACTGCCGGCCTGGACAGAGGCCACTTCC
+TTTCCGCAACTGGTAACTGCAAGACGTTTGATGACACCGCTGATGGGTACTGCCGTGGTG
+AAGGCGTAGCGACCGTCGTCCTCAAGCGCATGGATGACGCTATTGCAGACAAGGATCCAA
+TCCTAGGTGTGATCCGTGGCGTATATACCAACCACTCTGCAGAAGCTGAGTCAATCACAC
+GGCCTCATGTCGGCGCCCAAAAAGCCATTTTCCAACATGTCTTGAATCACTCGGGTATTC
+GACCTCAGGATATCAGTTACATTGAGATGCACGGAACTGGAACCCAGGCAGGAGACATGC
+GAGAGATGACCTCCGTGCTTGATACCTTTAGCCCGCAGTACCCAGGAGCAATCCAGCGAG
+AAAAGCCTTTATATCTGGGGGCCGTCAAGTCAAACATCGGACATGGAGAGTCTGTTTCGG
+GGGTTACAGCCCTTGTCAAAGTGATAATGATGATGCAGAACAACACTATACCTCCCCATC
+GCGGAGTCCACACGCGCTTAAATCGGAGGTTTCCCTCAAACCTCGATGAACGAAATGTTC
+ATATTGCATTCCAGGCGACCGAGTGGCCCCGCGGGCAGACTCCCCGACGAGCTTTTATCA
+ACAATTTCAGTGCCGCTGGGGGGAATAGCTCAGTTCTAGTAGAGGACCCACCACTGATAC
+TGAAGGAAGAGGGTGCTGATCCTAGGTCATCTCATGTTATTGCAGTGTCCGCTAAATCAC
+CTTCGGCATTGAGGAAGAACCTAGAGTCTATGCGTCGATATGCGATGTCAGAACATACAG
+AAAAATCTCTATGTGAGCTGTCTTATACCACAACAGCTCGACGCATTCACCACTCGCATC
+GGTTGATGTTTGCTGGGTCATCTCTAGAGGATATTCTGCGTGAGATGGAGAGCAAGTTAG
+CGATTAAAGAACCATTCAGTCCTTGCGCACCACTTCAATCGGTCATTTTCACCTTCACCG
+GCCAAGGCGCACAATACCCGGGAATGGGTCAAGTCTTTTTTAATAACTTCTCCGTGTTCC
+GGTCTGATCTCTGCCGCCTTGACGATTTGGCCCAAAAGCTTGGATTTCCGACATTTCTCC
+CGATTTTCTCAGCAAGTACCCATGCCAGACTGGAAGGTTTCACACCCACTGTGGTCCAGC
+TTGCCAATACGTGCATGCAGCTTGCACTCACCAGGCTCTGGGTGTCGTGGGGTATTCGTC
+CGTCGGCAGTAGTCGGTCACAGCATTGGGGAGTACGCAGCGCTGAACACGGCGGGCGTCC
+TGTCCGACGCGGACACGGTTTATTTGGTAGGCAAAAGAGCCCAGCTGCTCGAGGAGAAGT
+GCAACCGAGGGTCACACACTATGCTGGCAGCGCTGGCCTCTTTCGAAAAGGTGTCACGTC
+TACTTGATAGCGCACCGTGTGAGGTTGCGTGTATCAACGGACCCGAGGAGATCGTTCTCG
+CTGGACCGCGTTCGCACATGACAGATATCCAAAAGATCCTCGTGGCGCATTCAATTAGAT
+GCACCATGCTGCAAGTCCCATTTGCATTCCATTCGTCCCAGGTGGATCCAATTCTGCAAG
+ACTTCCAGTCTGCAATCGAAGGCGTTACCTTCCATAAACCAACTATCCCGGTCATTAGTC
+CACTCCTTGGTGATTTTGTGACAGAAACTGGGACCTTCAACCCAAACTATCTGGCACGCC
+ATTGCCGGGAACCAGTGAACATACTACAAGCACTTCGCCAAGCCAGCACAATGAATCTTG
+TCCATGACAGCAGCGTAGTCATGGAGTTTGGACCACATCCTGTCGTATCAGGCATGGTGA
+AATCAACGCTGGGGAACAGCATCAAGGCACTTCCCACTCTGCAACGGAACCGAAACACCT
+GGGAAGTACTCACGGAGAGCGTGTCAACACTATACTGTATGGGATTCGACATCAACTGGA
+CCGAGTACCATCGAGATTTTCCATCATCGCAGCGTGTCTTGCGACTCCCATCGTACTCCT
+GGGATCTGAAGTCGTACTGGATTCCGTACCGGAATGATTGGACTCTGTACAAGGGCGATA
+TTGTGCCTGAATCAAGCATCGCGCTGCCAACCCACCAAAACAAGCCACACAGTACATCGC
+CGAAACAGCAAGCACCGACACCAATCCTGGAGACGACAACATTACACCGGATTGTGGACG
+AGAAGTCCACCGAAGGGACGTTTTCAATCACATGCGAGTCAGATGTATCCCGACCAGACC
+TCAGCCCTCTGGTTCAGGGCCATAAGGTCGAAGGGATCGGACTTTGTACACCGGTATGAA
+TCTCCACACTCATGTTCGCTGCGCAGCATAATCACTGACTCCTTCTGCAGTCCGTTTATG
+CCGATATAGGATTCACGCTGGGAAATTACCTTCTAGATCGTTTCCCAACTCGATTCGGAC
+CGGATACTAAAGTTGTGGATGTCACGGACATGGTGATTGAAAAGGCTCTTATGCCGTTGA
+ATGCGGGACCACAATTACTGCGAGTCACGGCTTCATTAATCTGGTCCGAGAAAGAGGCTT
+CTGTCCGGTTCTACAGCGTGGATGTAAGACGTCCCTCTTCTAAATCTCAGATGAATACTA
+ATATTCATAATTCCCAGGAAAATCACACCGAAACAGTACAACATTCCCACTGCCGCATTA
+AATTCAGCGACCGTTCAACGTACCAAGCCTATCAAGAGCAAATCTCCGCCGTTAAGGCTC
+GTATGTTTGAGATGAAGACCAACTCCTCATCGGGTAGAACCTACCGATTCAACGGACCAA
+TGGCATACAATATGGTGCAGGCGTTGGCGGAATTCCACCCGGATTACCGGTGTATTGACG
+AGACGATTCTCGACAACGAGACACTCGAAGCAGCCTGTACAGTCAGCTTCGGGAATGTCA
+AGAAGGAGGGTGTATTCCACACACATCCTGGCTATATAGATGGACTCACGCAGTCGGGCG
+GGTTTGTGATGAACGCTAACGACAAGACTAATCTCGGAGTAGAAGTGTTCGTTAATCATG
+GGTGGGACTCGTTCCAGTTGTACGAGCCTGTCACTGATGATCGTTCGTATCAGACTCATG
+TTCGGATGAGGCCGGCGGAGTCGAATCAGTGGAAGGGTGATGTGGTCGTTCTAAGTGGGG
+AGAATTTGGTCGCTTGTGTTCGAGGATTGACGGTAAGTCGAGAGACCTAAGTAACAATCT
+CCTGTTTAGAGGAGAAAAAAGAAAGAGAAAGCGGATTTGCTGACTACCTTCCAGATCCAA
+GGAGTACCCAGGCGAGTCCTGCGGTATATCCTGCAAAGCAGTGCAAAAACCACACAGACA
+GCCACTTCGAGCGTGCCTGCCCCGTCTCAAGCTCCGGTGATGGTGCCACAGATTGTCCAA
+GTACCAAAAGCTAAGCCTATCTCCCAAATTTCCGGGACCCTGACAGAGGCTCTCCGGATT
+ATTTGTGAACAAAGTGGTGTGCCTCTAGCAGAGCTCACGGATGATGCAACTTTCGCGAAC
+ATCGGCGTAGACTCTCTCCTAGCGCTGACTATCACAAGTGCATTTGTTGAGGAGCTGGAT
+CTAGACGTCGATTCTTCCTTGTTCATGGACTATCCTACTGTGGCGGACCTGAAGCGGTTC
+TTCGACAAGATCAACACGCAGCATGCTCCGGCACCAGCCCCGGTATCAGACGCGCCAAAG
+CAATTACAACCAAGCAGTAGCCCAGTTGCATCTGCTACTCCGTCTGCACCCATCCATGGC
+AGATCGAAATTTGAATCAGTTCTTAACATCCTTACCGAGGAAAGTGGTGTTGAAATGGCA
+GGTCTTCCGGACTCTACTGCGCTTGCAGACATAGGTATCGATTCGCTCTTGTCCCTGGTA
+GTCACGAGCCGGCTGAACGATGAGTTAGAGCTAGATGTGTCGTCTGAAGACTTCAATGAC
+TGTCTGACTATCCGGGATCTCAAGGCACATTTCATGTCCAAGAACTCCGACAATGGTTCG
+TCTGCGGTTCTTACTCCTCAGCCATCTCGGGACTCCGCACTCCCTGAGCGCACGAGACCT
+AGGGTCGCTGATACAAGCGATGAGGAGGATGCACCGGTTTCAGCAAATGAATTCACAACC
+AGTGCCCGCTCTACATCTAAGTATATGGCTGTGCTCAACATAATTTCCGAAGAAAGCGGC
+ATGGCAATCGAAGACTTCACCGACAATGTAATGTTCGCAGATATCGGAATAGACTCGCTG
+CTGTCCTTGGTCATTGGAGGTAGAATACGGGAAGAGCTATCTTTCGACCTCGAGGTGGAC
+TCTCTTTTCGTGGACTACCCAGATGTCAAGGGACTGAGGTCATTTTTCGGATTTGAGAGC
+AACAAGACGGCGACAAATCCAACTGCGAGTCAATCGTCTTCGTCCATTTCAAGCGGCACT
+TCGGTCTTCGATACATCACCTTCTCCCACAGACTTAGACATCCTAACTCCAGAATCCAGC
+CTCTCACAAGAGGAGTTCGAGCAACCGCTCACAATAGCAACAAAGCCACTTCCACCCGCA
+ACTTCAGTCACTCTGCAGGGTTTACCCTCCAAGGCACACAAGATACTTTTCCTTTTCCCA
+GATGGCTCTGGCTCAGCAACATCATACGCGAAACTCCCCCGACTCGGTGCGGACGTAGCC
+ATTATCGGCCTGAACTCACCCTACCTGATGGACGGCGCCAACATGACCTGCACCTTCGAC
+GAGCTCGTTACACTGTACCTCACAGAAATCCAGCGACGTCAACCCGCAGGCCCATACCAC
+TTGGGCGGCTGGTCCGCCGGTGGCATTCTCGCTTACCGCGCTGCGCAAATCCTCCAAAAA
+GCCGCCGCCAACCCCCAGAAACCAGTAGTAGAATCCCTGCTCCTCCTCGACTCTCCACCA
+CCAACAGGGCTCGGCAAGCTCCCCAAACATTTCTTTGACTACTGTGACCAAATTGGCATT
+TTCGGGCAAGGGACAGCCAAGGCCCCGGAGTGGCTGATCACCCATTTCCAGGGCACGAAC
+TCCGTTCTGCACGAATACCACGCCACGCCGTTCTCATTCGGTACAGCACCCAGAACTGGG
+ATCATCTGGGCTTCGCAGACAGTGTTCGAGACGAGGGCCGTGGCGCCCCCACCTGTACGT
+CCTGACGATACGGAGGACATGAAGTTTTTGACGGAGCGACGGACAGATTTCTCGGCCGGG
+TCTTGGGGACATATGTTTCCTGGTACAGAGGTATTGATTGAGACGGCCTATGGGGCGGAT
+CATTTTAGTTTGCTGGTGAGTCTTCTCTTCCGTGATTAAGTTGCGAATACTAATAGAGGC
+TATAGCAGGAGGAACCCTATAAGGGTGCCGTCAGGGCGTTCATGTCTCGAGTTTTGCAGT
+TATAAGGGCTAGAGGAGCAGAGGTTGGTGGCAATAAAGTCGTCCTCACTGCTGGGTACAT
+TCATTTGGATGAATTCTTCTTTTTTCGTCGTGTTTTCATTACTGTATGTATTTTGATGTT
+GGGTTATACCTCTAGGTCGGGATAACGCTTTTCGGCTGTGGCATGACAACCGGAATATAT
+ATAATAGAACAATCCTATGTACATCTTTGCTGTGCTTACACGACGCACAG
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/clusters.tsv	Sun Nov 21 16:53:12 2021 +0000
@@ -0,0 +1,2 @@
+sequence_id	bgc_id	start	end	average_p	max_p	type	alkaloid_probability	polyketide_probability	ripp_probability	saccharide_probability	terpene_probability	nrp_probability	other_probability	proteins	domains
+BGC0001866.1	BGC0001866.1_cluster_1	347	32979	0.9969495815733557	0.9999999447224028	Polyketide	0.0	0.98	0.0	0.0	0.0	0.14	0.0	BGC0001866.1_1;BGC0001866.1_2;BGC0001866.1_3;BGC0001866.1_4;BGC0001866.1_5;BGC0001866.1_6;BGC0001866.1_7;BGC0001866.1_8;BGC0001866.1_9;BGC0001866.1_10;BGC0001866.1_11;BGC0001866.1_12;BGC0001866.1_13;BGC0001866.1_14;BGC0001866.1_15;BGC0001866.1_16;BGC0001866.1_17;BGC0001866.1_18;BGC0001866.1_19;BGC0001866.1_20;BGC0001866.1_21;BGC0001866.1_22;BGC0001866.1_23	PF00106;PF00107;PF00109;PF00135;PF00394;PF00550;PF00698;PF00743;PF00891;PF00975;PF02801;PF06609;PF07690;PF07731;PF08241;PF08242;PF08493;PF08659;PF13434;PF13489;PF13649;PF13847;PF14765;PF16073;PF16197
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/features.tsv	Sun Nov 21 16:53:12 2021 +0000
@@ -0,0 +1,38 @@
+sequence_id	protein_id	start	end	strand	domain	hmm	i_evalue	pvalue	domain_start	domain_end	bgc_probability
+BGC0001866.1	BGC0001866.1_1	347	1489	-	PF00394	Pfam	2.1941888078432915e-08	8.178117062405111e-12	1	63	0.9852038761627908
+BGC0001866.1	BGC0001866.1_1	347	1489	-	PF07731	Pfam	3.9374169295176556e-23	1.467542649838858e-26	150	281	0.9852038761627908
+BGC0001866.1	BGC0001866.1_6	3946	4389	+	PF00891	Pfam	4.743887678074703e-16	1.7681280946979883e-19	17	121	0.9910535094227727
+BGC0001866.1	BGC0001866.1_7	4683	5138	+	PF00135	Pfam	4.674605664377319e-21	1.7423055029360116e-24	48	140	0.9913598896683397
+BGC0001866.1	BGC0001866.1_8	5384	5812	+	PF00135	Pfam	3.9706994470948554e-30	1.4799476135277136e-33	2	114	0.9925093258822111
+BGC0001866.1	BGC0001866.1_9	5823	6599	+	PF00135	Pfam	1.4185801852307574e-15	5.287291037013632e-19	2	209	0.9946019708257335
+BGC0001866.1	BGC0001866.1_10	7758	9029	+	PF13434	Pfam	5.777178703900199e-08	2.153253337271785e-11	13	124	0.9978201609931655
+BGC0001866.1	BGC0001866.1_10	7758	9029	+	PF00743	Pfam	5.089108077410868e-07	1.8967976434628658e-10	36	102	0.9978201609931655
+BGC0001866.1	BGC0001866.1_13	11550	12662	+	PF07690	Pfam	5.839871260376694e-37	2.1766199255969786e-40	1	362	0.9990971143689635
+BGC0001866.1	BGC0001866.1_13	11550	12662	+	PF06609	Pfam	9.543170598318239e-09	3.55690294383833e-12	17	244	0.9990971143689635
+BGC0001866.1	BGC0001866.1_15	14920	15912	+	PF08493	Pfam	2.6165794251055913e-17	9.752439154325723e-21	139	224	0.9999977987864139
+BGC0001866.1	BGC0001866.1_16	17173	19143	+	PF00109	Pfam	9.025888536170949e-60	3.364103069761815e-63	2	248	0.9999994272691842
+BGC0001866.1	BGC0001866.1_16	17173	19143	+	PF02801	Pfam	2.2171445990751238e-35	8.263677223537547e-39	257	368	0.9999994272691842
+BGC0001866.1	BGC0001866.1_16	17173	19143	+	PF16197	Pfam	3.8698172759236842e-25	1.4423471024687604e-28	371	487	0.9999994272691842
+BGC0001866.1	BGC0001866.1_16	17173	19143	+	PF00698	Pfam	1.0799913424517567e-26	4.025312495161225e-30	512	648	0.9999994272691842
+BGC0001866.1	BGC0001866.1_17	19152	22424	+	PF00698	Pfam	2.639223271303753e-16	9.836836642950999e-20	2	151	0.9999940983719267
+BGC0001866.1	BGC0001866.1_17	19152	22424	+	PF14765	Pfam	2.520598829779557e-60	9.394703055458656e-64	228	504	0.9999940983719267
+BGC0001866.1	BGC0001866.1_17	19152	22424	+	PF13489	Pfam	1.0131254482174088e-12	3.776091868123029e-16	661	817	0.9999940983719267
+BGC0001866.1	BGC0001866.1_17	19152	22424	+	PF13847	Pfam	8.939870258494623e-11	3.332042586095648e-14	666	776	0.9999940983719267
+BGC0001866.1	BGC0001866.1_17	19152	22424	+	PF13649	Pfam	2.319131521369124e-13	8.643799930559537e-17	667	764	0.9999940983719267
+BGC0001866.1	BGC0001866.1_17	19152	22424	+	PF08242	Pfam	3.6288099491186147e-22	1.3525195486837923e-25	668	766	0.9999940983719267
+BGC0001866.1	BGC0001866.1_17	19152	22424	+	PF08241	Pfam	5.245291385894328e-12	1.9550098344742185e-15	668	767	0.9999940983719267
+BGC0001866.1	BGC0001866.1_18	22762	23235	+	PF00107	Pfam	1.0960342036668699e-15	4.085106983476965e-19	12	117	0.9999176675645223
+BGC0001866.1	BGC0001866.1_19	23268	24623	+	PF08659	Pfam	1.5141662612831146e-61	5.643556695054471e-65	65	239	0.9999724741067139
+BGC0001866.1	BGC0001866.1_19	23268	24623	+	PF00106	Pfam	1.1379002942545491e-07	4.2411490654288077e-11	68	221	0.9999724741067139
+BGC0001866.1	BGC0001866.1_19	23268	24623	+	PF00550	Pfam	3.359618716013185e-10	1.2521873708584363e-13	384	437	0.9999724741067139
+BGC0001866.1	BGC0001866.1_20	25769	26056	+	PF16073	Pfam	1.3071857188363548e-23	4.872104803713585e-27	8	94	0.999988513111687
+BGC0001866.1	BGC0001866.1_21	26544	29999	+	PF16073	Pfam	8.208876065249628e-11	3.059588544632735e-14	2	47	0.9999999447224028
+BGC0001866.1	BGC0001866.1_21	26544	29999	+	PF00109	Pfam	2.667462237983852e-82	9.942088102809735e-86	178	426	0.9999999447224028
+BGC0001866.1	BGC0001866.1_21	26544	29999	+	PF02801	Pfam	2.4031043351141288e-34	8.956780973217029e-38	434	555	0.9999999447224028
+BGC0001866.1	BGC0001866.1_21	26544	29999	+	PF16197	Pfam	2.535893425129411e-07	9.451708628883381e-11	567	673	0.9999999447224028
+BGC0001866.1	BGC0001866.1_21	26544	29999	+	PF00698	Pfam	4.597134671955754e-38	1.7134307387088164e-41	709	1012	0.9999999447224028
+BGC0001866.1	BGC0001866.1_22	30150	30890	+	PF14765	Pfam	7.778696660229127e-11	2.8992533209948296e-14	39	244	0.9999460955852995
+BGC0001866.1	BGC0001866.1_23	30937	32979	+	PF00550	Pfam	5.884377030377924e-14	2.193207987468477e-17	67	128	0.9997314383315643
+BGC0001866.1	BGC0001866.1_23	30937	32979	+	PF00550	Pfam	3.9212317886052276e-10	1.461510170930014e-13	174	238	0.9997314383315643
+BGC0001866.1	BGC0001866.1_23	30937	32979	+	PF00550	Pfam	1.367829688372301e-08	5.098135252971677e-12	299	360	0.9997314383315643
+BGC0001866.1	BGC0001866.1_23	30937	32979	+	PF00975	Pfam	6.711355516947163e-24	2.5014370171252933e-27	443	550	0.9997314383315643