annotate nxn_clustering.py @ 9:97899048dfa1 draft
"planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/chemicaltoolbox/chemfp commit 621ba9193927fb8454e915303169276cac764f69"
| author | bgruening | 
|---|---|
| date | Tue, 10 Sep 2019 09:35:23 -0400 | 
| parents | 198b1e30c739 | 
| children | 3b14765c22ee | 
| rev | changeset | author | parents | commit message | lines last changed |
|---|---|---|---|---|---|
| 2 | 70b071de9bee | bgruening | | planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/chemicaltoolbox/chemfp commit 01da22e4184a5a6f6a3dd4631a7b9c31d1b6d502 | all lines not listed for rev 7 or rev 8 |
| 7 | 0d88631bb7de | bgruening | 2 | planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/chemicaltoolbox/chemfp commit ed9b6859de648aa5f7cde483732f5df20aaff90e | the `numpy.savetxt`, `hcluster.linkage`, `hcluster.dendrogram` and `pylab.savefig` calls |
| 8 | 198b1e30c739 | bgruening | 7 | "planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/chemicaltoolbox/chemfp commit e78367b77f2294891914151f642685644d43a5b7" | `from matplotlib import rcParams` and `rcParams.update(...)` |

#!/usr/bin/env python
"""
Modified version of code examples from the chemfp project.
http://code.google.com/p/chem-fingerprints/
Thanks to Andrew Dalke of Andrew Dalke Scientific!
"""
import matplotlib
matplotlib.use('Agg')
from matplotlib import rcParams
rcParams.update({'figure.autolayout': True})
import argparse
import os
import chemfp
import scipy.cluster.hierarchy as hcluster
import pylab
import numpy

def distance_matrix(arena, tanimoto_threshold=0.0):
    """Return the NxN matrix of Tanimoto distances (1 - similarity) for an arena."""
    n = len(arena)
    # Start off a similarity matrix with 1.0s along the diagonal
    try:
        similarities = numpy.identity(n, "d")
    except (MemoryError, ValueError):
        raise Exception('Input dataset is too large!')
    # NOTE: `args` is the module-level namespace parsed in the __main__ block,
    # so this function only works when the file is run as a script.
    chemfp.set_num_threads(args.processors)

    ## Compute the full similarity matrix.
    # The implementation computes the upper-triangle then copies
    # the upper-triangle into lower-triangle. It does not include
    # terms for the diagonal.
    results = chemfp.search.threshold_tanimoto_search_symmetric(arena, threshold=tanimoto_threshold)

    # Copy the results into the NumPy array.
    for row_index, row in enumerate(results.iter_indices_and_scores()):
        for target_index, target_score in row:
            similarities[row_index, target_index] = target_score

    # Return the distance matrix using the similarity matrix
    return 1.0 - similarities


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="""NxN clustering for fps files.
For more details please see the chemfp documentation:
https://chemfp.readthedocs.org
""")

    parser.add_argument("-i", "--input", dest="input_path",
                        required=True,
                        help="Path to the input file.")

    parser.add_argument("-c", "--cluster", dest="cluster_image",
                        help="Path to the output cluster image.")

    parser.add_argument("-s", "--smatrix", dest="similarity_matrix",
                        help="Path to the similarity matrix output file.")

    parser.add_argument("-t", "--threshold", dest="tanimoto_threshold",
                        type=float, default=0.0,
                        help="Tanimoto threshold [0.0]")

    parser.add_argument("--oformat", default='png', help="Output format (png, svg)")

    parser.add_argument('-p', '--processors', type=int,
                        default=4)

    args = parser.parse_args()

    targets = chemfp.open(args.input_path, format='fps')
    arena = chemfp.load_fingerprints(targets)
    distances = distance_matrix(arena, args.tanimoto_threshold)

    if args.similarity_matrix:
        # Note: the values written here are distances (1 - Tanimoto similarity).
        numpy.savetxt(args.similarity_matrix, distances)

    if args.cluster_image:
        # linkage() receives the square matrix as a 2-D array, so each row is
        # treated as an observation vector and rows are compared with the
        # Euclidean metric.
        linkage = hcluster.linkage(distances, method="single", metric="euclidean")
        hcluster.dendrogram(linkage, labels=arena.ids, leaf_rotation=90.)
        pylab.savefig(args.cluster_image, format=args.oformat)
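
The `hcluster.linkage` call in the script is handed the square NxN matrix directly. As a hedged sketch of an alternative (not the tool's own method), a precomputed distance matrix can first be converted to SciPy's condensed form with `scipy.spatial.distance.squareform`, so that `linkage` clusters on the stored distances themselves. The 4x4 matrix, the 0.5 cut-off and the variable names below are made-up illustration values.

```python
import numpy as np
import scipy.cluster.hierarchy as hcluster
from scipy.spatial.distance import squareform

# Hypothetical distance matrix (1 - Tanimoto): symmetric, zero diagonal.
distances = np.array([
    [0.0, 0.1, 0.8, 0.9],
    [0.1, 0.0, 0.7, 0.8],
    [0.8, 0.7, 0.0, 0.2],
    [0.9, 0.8, 0.2, 0.0],
])

# Condensed form: the upper triangle flattened row by row, n*(n-1)/2 entries.
condensed = squareform(distances, checks=True)

# With a 1-D condensed input, linkage() uses the values as pairwise distances.
tree = hcluster.linkage(condensed, method="single")

# Cut the tree at an (arbitrary) distance of 0.5 to obtain flat cluster labels.
labels = hcluster.fcluster(tree, t=0.5, criterion="distance")
```

Here the first two and the last two items end up in separate flat clusters, because single linkage only joins the two pairs at the smallest cross-pair distance of 0.7.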
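
To make the contents of the matrix returned by `distance_matrix` concrete, here is a minimal pure-Python sketch with no chemfp dependency. The helper names `tanimoto` and `distance_matrix_sketch` and the set-based fingerprints are hypothetical. It assumes that pairs scoring below the threshold are simply left at similarity 0.0 (distance 1.0), which mirrors how the script only fills in the hits returned by the threshold search.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def distance_matrix_sketch(fingerprints, threshold=0.0):
    """Illustrative stand-in for distance_matrix(); not the chemfp implementation."""
    n = len(fingerprints)
    # Similarity matrix with 1.0 on the diagonal and 0.0 elsewhere.
    sim = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            score = tanimoto(fingerprints[i], fingerprints[j])
            # Assumption: only pairs at or above the threshold count as hits.
            if score >= threshold:
                sim[i][j] = sim[j][i] = score
    # Distance is 1 - similarity, so the diagonal becomes 0.0.
    return [[1.0 - s for s in row] for row in sim]


fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {7, 8}]
full = distance_matrix_sketch(fps)                # every pair kept
cut = distance_matrix_sketch(fps, threshold=0.7)  # the 3/5 pair is dropped
```

With a non-zero threshold, any pair below it becomes indistinguishable from a pair with no shared bits, which is worth keeping in mind when reading the resulting dendrogram.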
