README.rst @ 0:f33e9e6a6c88 (default tip)

:commit: planemo upload for repository https://github.com/mvdbeek/dedup_hash commit 367da560c5924d56c39f91ef9c731e523825424b-dirty
:author: mvdbeek
:date: Wed, 23 Nov 2016 07:49:05 -0500
.. image:: https://travis-ci.org/mvdbeek/dedup_hash.svg?branch=master
   :target: https://travis-ci.org/mvdbeek/dedup_hash

dedup_hash
----------

This is a command-line utility to remove exact duplicate reads from
paired-end fastq files. Reads are assumed to be in two separate files.
For each read pair, the sequences are concatenated and a short hash is
calculated on the concatenated sequence. If the hash has been previously
seen, the read pair is dropped from the output files. This means that
reads that have the same start and end coordinates but differ in length
will not be removed (although those will be "flattened" to at most one
occurrence).

This algorithm is very simple and fast, and saves memory compared to
tools such as fastuniq that read the whole fastq file into memory.

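The approach above can be sketched in a few lines of Python. This is a
minimal illustration, not the tool's actual implementation: it uses
``hashlib.md5`` as a stand-in for cityhash, assumes uncompressed fastq
input, and the function name ``dedup_pairs`` is made up for the example.

.. code-block:: python

    from hashlib import md5  # stand-in for cityhash; any fast hash works

    def dedup_pairs(r1_lines, r2_lines):
        """Yield deduplicated fastq records from two parallel mate files.

        r1_lines / r2_lines iterate over the lines of the two fastq
        files; each record is 4 lines (header, sequence, '+', quality).
        """
        seen = set()
        # Grouper idiom: consume 4 lines at a time from each iterator.
        rec1 = [iter(r1_lines)] * 4
        rec2 = [iter(r2_lines)] * 4
        for fwd, rev in zip(zip(*rec1), zip(*rec2)):
            # Hash only the concatenated forward + reverse sequences;
            # storing the short digest is what keeps memory low.
            key = md5((fwd[1] + rev[1]).encode()).digest()
            if key not in seen:
                seen.add(key)
                yield fwd, rev

Note that only the 16-byte digest of each pair is kept in memory, rather
than the read sequences themselves.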
Installation
------------

dedup_hash relies on the cityhash python package, which supports
python 2.7 exclusively.

``pip install dedup_hash``