diff README.rst @ 0:f33e9e6a6c88 draft default tip
planemo upload for repository https://github.com/mvdbeek/dedup_hash commit 367da560c5924d56c39f91ef9c731e523825424b-dirty
author | mvdbeek |
---|---|
date | Wed, 23 Nov 2016 07:49:05 -0500 |
parents | |
children |
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/README.rst Wed Nov 23 07:49:05 2016 -0500
@@ -0,0 +1,26 @@
+.. image:: https://travis-ci.org/mvdbeek/dedup_hash.svg?branch=master
+    :target: https://travis-ci.org/mvdbeek/dedup_hash
+
+dedup_hash
+----------------------------
+
+
+This is a command-line utility that removes exact duplicate reads
+from paired-end FASTQ files. Reads are assumed to be in two separate
+files. The sequences of each read pair are concatenated and a short hash is
+calculated on the concatenated sequence. If the hash has been seen previously,
+the read pair is dropped from the output. This means that reads with the
+same start and end coordinates but different lengths will not be removed
+(but each of those will be "flattened" to at most one occurrence).
+
+This algorithm is very simple and fast, and it saves memory compared to
+reading the whole FASTQ file into memory, as fastuniq does.
+
+Installation
+------------
+
+dedup_hash relies on the cityhash Python package,
+which supports Python 2.7 exclusively.
+
+``pip install dedup_hash``
+
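
The hashing approach described in the README above can be illustrated with a minimal sketch. This is not the tool's actual code: the function names ``fastq_records``, ``short_hash`` and ``dedup_pairs`` are hypothetical, and an 8-byte BLAKE2b digest from the standard library stands in for cityhash so that the example runs without extra dependencies. It assumes both inputs are well-formed four-line FASTQ records in the same order.

```python
import hashlib
from itertools import islice


def fastq_records(lines):
    """Yield (header, sequence, separator, quality) tuples from an iterable of lines."""
    it = iter(lines)
    while True:
        record = tuple(line.rstrip("\n") for line in islice(it, 4))
        if len(record) < 4:
            # End of input (or a truncated trailing record): stop iterating.
            return
        yield record


def short_hash(seq1, seq2):
    # Assumption: BLAKE2b with an 8-byte digest is used here in place of cityhash.
    return hashlib.blake2b((seq1 + seq2).encode("ascii"), digest_size=8).digest()


def dedup_pairs(reads1, reads2):
    """Yield (record1, record2) pairs whose concatenated sequence was not seen before."""
    seen = set()
    for rec1, rec2 in zip(fastq_records(reads1), fastq_records(reads2)):
        key = short_hash(rec1[1], rec2[1])
        if key in seen:
            # Exact duplicate of an earlier pair: drop it.
            continue
        seen.add(key)
        yield rec1, rec2


if __name__ == "__main__":
    r1 = ["@a/1", "ACGT", "+", "IIII", "@b/1", "ACGT", "+", "IIII"]
    r2 = ["@a/2", "TTTT", "+", "IIII", "@b/2", "TTTT", "+", "IIII"]
    for rec1, rec2 in dedup_pairs(r1, r2):
        print(rec1[0], rec2[0])
```

In this sketch only one short digest per unique read pair is kept in memory, rather than the reads themselves, which is where the memory saving mentioned in the README would come from. Passing open file handles instead of lists works the same way, since the records are consumed lazily.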