annotate README.rst @ 0:f33e9e6a6c88 draft default tip
planemo upload for repository https://github.com/mvdbeek/dedup_hash commit 367da560c5924d56c39f91ef9c731e523825424b-dirty
author | mvdbeek |
---|---|
date | Wed, 23 Nov 2016 07:49:05 -0500 |
.. image:: https://travis-ci.org/mvdbeek/dedup_hash.svg?branch=master
    :target: https://travis-ci.org/mvdbeek/dedup_hash

dedup_hash
----------

This is a command-line utility to remove exact duplicate reads
from paired-end FASTQ files. Reads are assumed to be in two separate
files. The sequences of each read pair are concatenated and a short hash is
calculated on the concatenated sequence. If the hash has been seen previously,
the read pair is dropped from the output. This means that reads that have the
same start and end coordinates but differ in length will not be removed (but
identical copies of each will be "flattened" to at most one occurrence).

This algorithm is very simple and fast, and saves memory compared to
reading the whole FASTQ file into memory, as fastuniq does.
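
The approach described above can be sketched in a few lines of Python. This is
an illustrative sketch, not the code shipped in the package: the function names
(``read_fastq``, ``short_hash``, ``dedup_pairs``) are hypothetical, and
``hashlib.blake2b`` with an 8-byte digest stands in for the cityhash function so
that the example needs only the standard library (Python 3.6 or later).

```python
import hashlib
from itertools import islice


def read_fastq(handle):
    """Yield FASTQ records as lists of 4 lines (header, sequence, '+', quality)."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        yield record


def short_hash(seq):
    """Return a 64-bit integer hash; blake2b is a stand-in for cityhash here."""
    digest = hashlib.blake2b(seq.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def dedup_pairs(in1, in2, out1, out2):
    """Write each read pair whose concatenated sequence has not been seen yet.

    Only the hashes are kept in memory, never the reads themselves.
    Returns the number of pairs written.
    """
    seen = set()
    kept = 0
    for rec1, rec2 in zip(read_fastq(in1), read_fastq(in2)):
        key = short_hash(rec1[1].strip() + rec2[1].strip())
        if key in seen:
            continue  # exact duplicate pair: drop it from both outputs
        seen.add(key)
        out1.writelines(rec1)
        out2.writelines(rec2)
        kept += 1
    return kept
```

The arguments are open text file handles for the two input and two output
files. Because only a short hash is stored per pair, two different pairs could
in principle collide and one of them would be dropped; with a 64-bit hash this
is unlikely but not impossible.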

Installation
------------

dedup_hash relies on the cityhash Python package,
which supports Python 2.7 exclusively.

``pip install dedup_hash``