Mercurial > repos > guerler > springsuite
annotate planemo/lib/python3.7/site-packages/boltons/statsutils.py @ 0:d30785e31577 draft
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
| author | guerler | 
|---|---|
| date | Fri, 31 Jul 2020 00:18:57 -0400 | 
| parents | |
| children | 
| rev | line source | 
|---|---|
| 
0
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
1 # -*- coding: utf-8 -*- | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
2 """``statsutils`` provides tools aimed primarily at descriptive | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
3 statistics for data analysis, such as :func:`mean` (average), | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
4 :func:`median`, :func:`variance`, and many others, | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
5 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
6 The :class:`Stats` type provides all the main functionality of the | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
7 ``statsutils`` module. A :class:`Stats` object wraps a given dataset, | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
8 providing all statistical measures as property attributes. These | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
9 attributes cache their results, which allows efficient computation of | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
10 multiple measures, as many measures rely on other measures. For | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
11 example, relative standard deviation (:attr:`Stats.rel_std_dev`) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
12 relies on both the mean and standard deviation. The Stats object | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
13 caches those results so no rework is done. | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
14 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
15 The :class:`Stats` type's attributes have module-level counterparts for | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
16 convenience when the computation reuse advantages do not apply. | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
17 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
18 >>> stats = Stats(range(42)) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
19 >>> stats.mean | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
20 20.5 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
21 >>> mean(range(42)) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
22 20.5 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
23 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
24 Statistics is a large field, and ``statsutils`` is focused on a few | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
25 basic techniques that are useful in software. The following is a brief | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
26 introduction to those techniques. For a more in-depth introduction, | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
27 `Statistics for Software | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
28 <https://www.paypal-engineering.com/2016/04/11/statistics-for-software/>`_, | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
29 an article I wrote on the topic. It introduces key terminology vital | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
30 to effective usage of statistics. | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
31 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
32 Statistical moments | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
33 ------------------- | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
34 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
35 Python programmers are probably familiar with the concept of the | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
36 *mean* or *average*, which gives a rough quantitiative middle value by | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
37 which a sample can be can be generalized. However, the mean is just | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
38 the first of four `moment`_-based measures by which a sample or | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
39 distribution can be measured. | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
40 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
41 The four `Standardized moments`_ are: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
42 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
43 1. `Mean`_ - :func:`mean` - theoretical middle value | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
44 2. `Variance`_ - :func:`variance` - width of value dispersion | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
45 3. `Skewness`_ - :func:`skewness` - symmetry of distribution | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
46 4. `Kurtosis`_ - :func:`kurtosis` - "peakiness" or "long-tailed"-ness | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
47 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
48 For more information check out `the Moment article on Wikipedia`_. | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
49 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
50 .. _moment: https://en.wikipedia.org/wiki/Moment_(mathematics) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
51 .. _Standardized moments: https://en.wikipedia.org/wiki/Standardized_moment | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
52 .. _Mean: https://en.wikipedia.org/wiki/Mean | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
53 .. _Variance: https://en.wikipedia.org/wiki/Variance | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
54 .. _Skewness: https://en.wikipedia.org/wiki/Skewness | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
55 .. _Kurtosis: https://en.wikipedia.org/wiki/Kurtosis | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
56 .. _the Moment article on Wikipedia: https://en.wikipedia.org/wiki/Moment_(mathematics) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
57 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
58 Keep in mind that while these moments can give a bit more insight into | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
59 the shape and distribution of data, they do not guarantee a complete | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
60 picture. Wildly different datasets can have the same values for all | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
61 four moments, so generalize wisely. | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
62 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
63 Robust statistics | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
64 ----------------- | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
65 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
66 Moment-based statistics are notorious for being easily skewed by | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
67 outliers. The whole field of robust statistics aims to mitigate this | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
68 dilemma. ``statsutils`` also includes several robust statistical methods: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
69 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
70 * `Median`_ - The middle value of a sorted dataset | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
71 * `Trimean`_ - Another robust measure of the data's central tendency | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
72 * `Median Absolute Deviation`_ (MAD) - A robust measure of | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
73 variability, a natural counterpart to :func:`variance`. | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
74 * `Trimming`_ - Reducing a dataset to only the middle majority of | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
75 data is a simple way of making other estimators more robust. | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
76 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
77 .. _Median: https://en.wikipedia.org/wiki/Median | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
78 .. _Trimean: https://en.wikipedia.org/wiki/Trimean | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
79 .. _Median Absolute Deviation: https://en.wikipedia.org/wiki/Median_absolute_deviation | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
80 .. _Trimming: https://en.wikipedia.org/wiki/Trimmed_estimator | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
81 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
82 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
83 Online and Offline Statistics | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
84 ----------------------------- | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
85 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
86 Unrelated to computer networking, `online`_ statistics involve | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
87 calculating statistics in a `streaming`_ fashion, without all the data | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
88 being available. The :class:`Stats` type is meant for the more | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
89 traditional offline statistics when all the data is available. For | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
90 pure-Python online statistics accumulators, look at the `Lithoxyl`_ | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
91 system instrumentation package. | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
92 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
93 .. _Online: https://en.wikipedia.org/wiki/Online_algorithm | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
94 .. _streaming: https://en.wikipedia.org/wiki/Streaming_algorithm | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
95 .. _Lithoxyl: https://github.com/mahmoud/lithoxyl | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
96 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
97 """ | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
98 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
99 from __future__ import print_function | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
100 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
101 import bisect | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
102 from math import floor, ceil | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
103 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
104 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
105 class _StatsProperty(object): | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
106 def __init__(self, name, func): | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
107 self.name = name | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
108 self.func = func | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
109 self.internal_name = '_' + name | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
110 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
111 doc = func.__doc__ or '' | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
112 pre_doctest_doc, _, _ = doc.partition('>>>') | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
113 self.__doc__ = pre_doctest_doc | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
114 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
115 def __get__(self, obj, objtype=None): | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
116 if obj is None: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
117 return self | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
118 if not obj.data: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
119 return obj.default | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
120 try: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
121 return getattr(obj, self.internal_name) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
122 except AttributeError: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
123 setattr(obj, self.internal_name, self.func(obj)) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
124 return getattr(obj, self.internal_name) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
125 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
126 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
127 class Stats(object): | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
128 """The ``Stats`` type is used to represent a group of unordered | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
129 statistical datapoints for calculations such as mean, median, and | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
130 variance. | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
131 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
132 Args: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
133 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
134 data (list): List or other iterable containing numeric values. | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
135 default (float): A value to be returned when a given | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
136 statistical measure is not defined. 0.0 by default, but | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
137 ``float('nan')`` is appropriate for stricter applications. | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
138 use_copy (bool): By default Stats objects copy the initial | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
139 data into a new list to avoid issues with | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
140 modifications. Pass ``False`` to disable this behavior. | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
141 is_sorted (bool): Presorted data can skip an extra sorting | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
142 step for a little speed boost. Defaults to False. | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
143 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
144 """ | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
145 def __init__(self, data, default=0.0, use_copy=True, is_sorted=False): | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
146 self._use_copy = use_copy | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
147 self._is_sorted = is_sorted | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
148 if use_copy: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
149 self.data = list(data) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
150 else: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
151 self.data = data | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
152 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
153 self.default = default | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
154 cls = self.__class__ | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
155 self._prop_attr_names = [a for a in dir(self) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
156 if isinstance(getattr(cls, a, None), | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
157 _StatsProperty)] | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
158 self._pearson_precision = 0 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
159 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
160 def __len__(self): | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
161 return len(self.data) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
162 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
163 def __iter__(self): | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
164 return iter(self.data) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
165 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
166 def _get_sorted_data(self): | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
167 """When using a copy of the data, it's better to have that copy be | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
168 sorted, but we do it lazily using this method, in case no | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
169 sorted measures are used. I.e., if median is never called, | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
170 sorting would be a waste. | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
171 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
172 When not using a copy, it's presumed that all optimizations | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
173 are on the user. | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
174 """ | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
175 if not self._use_copy: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
176 return sorted(self.data) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
177 elif not self._is_sorted: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
178 self.data.sort() | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
179 return self.data | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
180 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
181 def clear_cache(self): | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
182 """``Stats`` objects automatically cache intermediary calculations | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
183 that can be reused. For instance, accessing the ``std_dev`` | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
184 attribute after the ``variance`` attribute will be | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
185 significantly faster for medium-to-large datasets. | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
186 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
187 If you modify the object by adding additional data points, | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
188 call this function to have the cached statistics recomputed. | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
189 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
190 """ | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
191 for attr_name in self._prop_attr_names: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
192 attr_name = getattr(self.__class__, attr_name).internal_name | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
193 if not hasattr(self, attr_name): | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
194 continue | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
195 delattr(self, attr_name) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
196 return | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
197 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
198 def _calc_count(self): | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
199 """The number of items in this Stats object. Returns the same as | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
200 :func:`len` on a Stats object, but provided for pandas terminology | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
201 parallelism. | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
202 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
203 >>> Stats(range(20)).count | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
204 20 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
205 """ | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
206 return len(self.data) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
207 count = _StatsProperty('count', _calc_count) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
208 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
209 def _calc_mean(self): | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
210 """ | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
211 The arithmetic mean, or "average". Sum of the values divided by | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
212 the number of values. | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
213 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
214 >>> mean(range(20)) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
215 9.5 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
216 >>> mean(list(range(19)) + [949]) # 949 is an arbitrary outlier | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
217 56.0 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
218 """ | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
219 return sum(self.data, 0.0) / len(self.data) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
220 mean = _StatsProperty('mean', _calc_mean) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
221 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
222 def _calc_max(self): | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
223 """ | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
224 The maximum value present in the data. | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
225 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
226 >>> Stats([2, 1, 3]).max | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
227 3 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
228 """ | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
229 if self._is_sorted: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
230 return self.data[-1] | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
231 return max(self.data) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
232 max = _StatsProperty('max', _calc_max) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
233 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
234 def _calc_min(self): | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
235 """ | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
236 The minimum value present in the data. | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
237 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
238 >>> Stats([2, 1, 3]).min | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
239 1 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
240 """ | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
241 if self._is_sorted: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
242 return self.data[0] | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
243 return min(self.data) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
244 min = _StatsProperty('min', _calc_min) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
245 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
246 def _calc_median(self): | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
247 """ | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
248 The median is either the middle value or the average of the two | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
249 middle values of a sample. Compared to the mean, it's generally | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
250 more resilient to the presence of outliers in the sample. | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
251 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
252 >>> median([2, 1, 3]) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
253 2 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
254 >>> median(range(97)) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
255 48 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
256 >>> median(list(range(96)) + [1066]) # 1066 is an arbitrary outlier | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
257 48 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
258 """ | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
259 return self._get_quantile(self._get_sorted_data(), 0.5) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
260 median = _StatsProperty('median', _calc_median) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
261 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
262 def _calc_iqr(self): | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
263 """Inter-quartile range (IQR) is the difference between the 75th | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
264 percentile and 25th percentile. IQR is a robust measure of | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
265 dispersion, like standard deviation, but safer to compare | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
266 between datasets, as it is less influenced by outliers. | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
267 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
268 >>> iqr([1, 2, 3, 4, 5]) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
269 2 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
270 >>> iqr(range(1001)) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
271 500 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
272 """ | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
273 return self.get_quantile(0.75) - self.get_quantile(0.25) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
274 iqr = _StatsProperty('iqr', _calc_iqr) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
275 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
276 def _calc_trimean(self): | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
277 """The trimean is a robust measure of central tendency, like the | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
278 median, that takes the weighted average of the median and the | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
279 upper and lower quartiles. | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
280 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
281 >>> trimean([2, 1, 3]) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
282 2.0 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
283 >>> trimean(range(97)) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
284 48.0 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
285 >>> trimean(list(range(96)) + [1066]) # 1066 is an arbitrary outlier | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
286 48.0 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
287 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
288 """ | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
289 sorted_data = self._get_sorted_data() | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
290 gq = lambda q: self._get_quantile(sorted_data, q) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
291 return (gq(0.25) + (2 * gq(0.5)) + gq(0.75)) / 4.0 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
292 trimean = _StatsProperty('trimean', _calc_trimean) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
293 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
294 def _calc_variance(self): | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
295 """\ | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
296 Variance is the average of the squares of the difference between | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
297 each value and the mean. | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
298 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
299 >>> variance(range(97)) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
300 784.0 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
301 """ | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
302 global mean # defined elsewhere in this file | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
303 return mean(self._get_pow_diffs(2)) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
304 variance = _StatsProperty('variance', _calc_variance) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
305 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
306 def _calc_std_dev(self): | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
307 """\ | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
308 Standard deviation. Square root of the variance. | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
309 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
310 >>> std_dev(range(97)) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
311 28.0 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
312 """ | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
313 return self.variance ** 0.5 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
314 std_dev = _StatsProperty('std_dev', _calc_std_dev) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
315 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
316 def _calc_median_abs_dev(self): | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
317 """\ | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
318 Median Absolute Deviation is a robust measure of statistical | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
319 dispersion: http://en.wikipedia.org/wiki/Median_absolute_deviation | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
320 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
321 >>> median_abs_dev(range(97)) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
322 24.0 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
323 """ | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
324 global median # defined elsewhere in this file | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
325 sorted_vals = sorted(self.data) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
326 x = float(median(sorted_vals)) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
327 return median([abs(x - v) for v in sorted_vals]) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
328 median_abs_dev = _StatsProperty('median_abs_dev', _calc_median_abs_dev) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
329 mad = median_abs_dev # convenience | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
330 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
331 def _calc_rel_std_dev(self): | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
332 """\ | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
333 Standard deviation divided by the absolute value of the average. | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
334 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
335 http://en.wikipedia.org/wiki/Relative_standard_deviation | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
336 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
337 >>> print('%1.3f' % rel_std_dev(range(97))) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
338 0.583 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
339 """ | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
340 abs_mean = abs(self.mean) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
341 if abs_mean: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
342 return self.std_dev / abs_mean | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
343 else: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
344 return self.default | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
345 rel_std_dev = _StatsProperty('rel_std_dev', _calc_rel_std_dev) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
346 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
347 def _calc_skewness(self): | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
348 """\ | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
349 Indicates the asymmetry of a curve. Positive values mean the bulk | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
350 of the values are on the left side of the average and vice versa. | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
351 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
352 http://en.wikipedia.org/wiki/Skewness | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
353 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
354 See the module docstring for more about statistical moments. | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
355 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
356 >>> skewness(range(97)) # symmetrical around 48.0 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
357 0.0 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
358 >>> left_skewed = skewness(list(range(97)) + list(range(10))) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
359 >>> right_skewed = skewness(list(range(97)) + list(range(87, 97))) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
360 >>> round(left_skewed, 3), round(right_skewed, 3) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
361 (0.114, -0.114) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
362 """ | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
363 data, s_dev = self.data, self.std_dev | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
364 if len(data) > 1 and s_dev > 0: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
365 return (sum(self._get_pow_diffs(3)) / | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
366 float((len(data) - 1) * (s_dev ** 3))) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
367 else: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
368 return self.default | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
369 skewness = _StatsProperty('skewness', _calc_skewness) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
370 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
371 def _calc_kurtosis(self): | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
372 """\ | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
373 Indicates how much data is in the tails of the distribution. The | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
374 result is always positive, with the normal "bell-curve" | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
375 distribution having a kurtosis of 3. | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
376 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
377 http://en.wikipedia.org/wiki/Kurtosis | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
378 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
379 See the module docstring for more about statistical moments. | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
380 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
381 >>> kurtosis(range(9)) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
382 1.99125 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
383 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
384 With a kurtosis of 1.99125, [0, 1, 2, 3, 4, 5, 6, 7, 8] is more | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
385 centrally distributed than the normal curve. | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
386 """ | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
387 data, s_dev = self.data, self.std_dev | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
388 if len(data) > 1 and s_dev > 0: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
389 return (sum(self._get_pow_diffs(4)) / | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
390 float((len(data) - 1) * (s_dev ** 4))) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
391 else: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
392 return 0.0 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
393 kurtosis = _StatsProperty('kurtosis', _calc_kurtosis) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
394 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
395 def _calc_pearson_type(self): | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
396 precision = self._pearson_precision | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
397 skewness = self.skewness | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
398 kurtosis = self.kurtosis | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
399 beta1 = skewness ** 2.0 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
400 beta2 = kurtosis * 1.0 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
401 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
402 # TODO: range checks? | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
403 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
404 c0 = (4 * beta2) - (3 * beta1) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
405 c1 = skewness * (beta2 + 3) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
406 c2 = (2 * beta2) - (3 * beta1) - 6 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
407 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
408 if round(c1, precision) == 0: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
409 if round(beta2, precision) == 3: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
410 return 0 # Normal | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
411 else: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
412 if beta2 < 3: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
413 return 2 # Symmetric Beta | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
414 elif beta2 > 3: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
415 return 7 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
416 elif round(c2, precision) == 0: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
417 return 3 # Gamma | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
418 else: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
419 k = c1 ** 2 / (4 * c0 * c2) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
420 if k < 0: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
421 return 1 # Beta | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
422 raise RuntimeError('missed a spot') | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
423 pearson_type = _StatsProperty('pearson_type', _calc_pearson_type) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
424 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
425 @staticmethod | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
426 def _get_quantile(sorted_data, q): | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
427 data, n = sorted_data, len(sorted_data) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
428 idx = q / 1.0 * (n - 1) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
429 idx_f, idx_c = int(floor(idx)), int(ceil(idx)) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
430 if idx_f == idx_c: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
431 return data[idx_f] | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
432 return (data[idx_f] * (idx_c - idx)) + (data[idx_c] * (idx - idx_f)) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
433 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
434 def get_quantile(self, q): | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
435 """Get a quantile from the dataset. Quantiles are floating point | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
436 values between ``0.0`` and ``1.0``, with ``0.0`` representing | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
437 the minimum value in the dataset and ``1.0`` representing the | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
438 maximum. ``0.5`` represents the median: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
439 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
440 >>> Stats(range(100)).get_quantile(0.5) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
441 49.5 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
442 """ | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
443 q = float(q) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
444 if not 0.0 <= q <= 1.0: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
445 raise ValueError('expected q between 0.0 and 1.0, not %r' % q) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
446 elif not self.data: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
447 return self.default | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
448 return self._get_quantile(self._get_sorted_data(), q) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
449 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
450 def get_zscore(self, value): | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
451 """Get the z-score for *value* in the group. If the standard deviation | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
452 is 0, 0 inf or -inf will be returned to indicate whether the value is | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
453 equal to, greater than or below the group's mean. | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
454 """ | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
455 mean = self.mean | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
456 if self.std_dev == 0: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
457 if value == mean: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
458 return 0 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
459 if value > mean: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
460 return float('inf') | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
461 if value < mean: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
462 return float('-inf') | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
463 return (float(value) - mean) / self.std_dev | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
464 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
465 def trim_relative(self, amount=0.15): | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
466 """A utility function used to cut a proportion of values off each end | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
467 of a list of values. This has the effect of limiting the | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
468 effect of outliers. | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
469 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
470 Args: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
471 amount (float): A value between 0.0 and 0.5 to trim off of | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
472 each side of the data. | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
473 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
474 .. note: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
475 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
476 This operation modifies the data in-place. It does not | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
477 make or return a copy. | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
478 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
479 """ | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
480 trim = float(amount) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
481 if not 0.0 <= trim < 0.5: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
482 raise ValueError('expected amount between 0.0 and 0.5, not %r' | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
483 % trim) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
484 size = len(self.data) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
485 size_diff = int(size * trim) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
486 if size_diff == 0.0: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
487 return | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
488 self.data = self._get_sorted_data()[size_diff:-size_diff] | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
489 self.clear_cache() | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
490 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
491 def _get_pow_diffs(self, power): | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
492 """ | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
493 A utility function used for calculating statistical moments. | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
494 """ | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
495 m = self.mean | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
496 return [(v - m) ** power for v in self.data] | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
497 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
498 def _get_bin_bounds(self, count=None, with_max=False): | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
499 if not self.data: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
500 return [0.0] # TODO: raise? | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
501 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
502 data = self.data | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
503 len_data, min_data, max_data = len(data), min(data), max(data) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
504 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
505 if len_data < 4: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
506 if not count: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
507 count = len_data | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
508 dx = (max_data - min_data) / float(count) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
509 bins = [min_data + (dx * i) for i in range(count)] | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
510 elif count is None: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
511 # freedman algorithm for fixed-width bin selection | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
512 q25, q75 = self.get_quantile(0.25), self.get_quantile(0.75) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
513 dx = 2 * (q75 - q25) / (len_data ** (1 / 3.0)) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
514 bin_count = max(1, int(ceil((max_data - min_data) / dx))) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
515 bins = [min_data + (dx * i) for i in range(bin_count + 1)] | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
516 bins = [b for b in bins if b < max_data] | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
517 else: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
518 dx = (max_data - min_data) / float(count) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
519 bins = [min_data + (dx * i) for i in range(count)] | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
520 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
521 if with_max: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
522 bins.append(float(max_data)) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
523 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
524 return bins | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
525 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
526 def get_histogram_counts(self, bins=None, **kw): | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
527 """Produces a list of ``(bin, count)`` pairs comprising a histogram of | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
528 the Stats object's data, using fixed-width bins. See | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
529 :meth:`Stats.format_histogram` for more details. | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
530 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
531 Args: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
532 bins (int): maximum number of bins, or list of | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
533 floating-point bin boundaries. Defaults to the output of | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
534 Freedman's algorithm. | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
535 bin_digits (int): Number of digits used to round down the | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
536 bin boundaries. Defaults to 1. | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
537 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
538 The output of this method can be stored and/or modified, and | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
539 then passed to :func:`statsutils.format_histogram_counts` to | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
540 achieve the same text formatting as the | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
541 :meth:`~Stats.format_histogram` method. This can be useful for | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
542 snapshotting over time. | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
543 """ | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
544 bin_digits = int(kw.pop('bin_digits', 1)) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
545 if kw: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
546 raise TypeError('unexpected keyword arguments: %r' % kw.keys()) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
547 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
548 if not bins: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
549 bins = self._get_bin_bounds() | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
550 else: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
551 try: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
552 bin_count = int(bins) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
553 except TypeError: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
554 try: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
555 bins = [float(x) for x in bins] | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
556 except Exception: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
557 raise ValueError('bins expected integer bin count or list' | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
558 ' of float bin boundaries, not %r' % bins) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
559 if self.min < bins[0]: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
560 bins = [self.min] + bins | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
561 else: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
562 bins = self._get_bin_bounds(bin_count) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
563 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
564 # floor and ceil really should have taken ndigits, like round() | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
565 round_factor = 10.0 ** bin_digits | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
566 bins = [floor(b * round_factor) / round_factor for b in bins] | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
567 bins = sorted(set(bins)) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
568 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
569 idxs = [bisect.bisect(bins, d) - 1 for d in self.data] | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
570 count_map = {} # would have used Counter, but py26 support | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
571 for idx in idxs: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
572 try: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
573 count_map[idx] += 1 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
574 except KeyError: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
575 count_map[idx] = 1 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
576 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
577 bin_counts = [(b, count_map.get(i, 0)) for i, b in enumerate(bins)] | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
578 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
579 return bin_counts | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
580 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
581 def format_histogram(self, bins=None, **kw): | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
582 """Produces a textual histogram of the data, using fixed-width bins, | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
583 allowing for simple visualization, even in console environments. | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
584 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
585 >>> data = list(range(20)) + list(range(5, 15)) + [10] | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
586 >>> print(Stats(data).format_histogram(width=30)) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
587 0.0: 5 ######### | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
588 4.4: 8 ############### | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
589 8.9: 11 #################### | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
590 13.3: 5 ######### | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
591 17.8: 2 #### | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
592 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
593 In this histogram, five values are between 0.0 and 4.4, eight | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
594 are between 4.4 and 8.9, and two values lie between 17.8 and | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
595 the max. | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
596 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
597 You can specify the number of bins, or provide a list of | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
598 bin boundaries themselves. If no bins are provided, as in the | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
599 example above, `Freedman's algorithm`_ for bin selection is | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
600 used. | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
601 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
602 Args: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
603 bins (int): Maximum number of bins for the | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
604 histogram. Also accepts a list of floating-point | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
605 bin boundaries. If the minimum boundary is still | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
606 greater than the minimum value in the data, that | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
607 boundary will be implicitly added. Defaults to the bin | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
608 boundaries returned by `Freedman's algorithm`_. | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
609 bin_digits (int): Number of digits to round each bin | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
610 to. Note that bins are always rounded down to avoid | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
611 clipping any data. Defaults to 1. | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
612 width (int): integer number of columns in the longest line | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
613 in the histogram. Defaults to console width on Python | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
614 3.3+, or 80 if that is not available. | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
615 format_bin (callable): Called on each bin to create a | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
616 label for the final output. Use this function to add | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
617 units, such as "ms" for milliseconds. | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
618 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
619 Should you want something more programmatically reusable, see | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
620 the :meth:`~Stats.get_histogram_counts` method, the output of | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
621 is used by format_histogram. The :meth:`~Stats.describe` | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
622 method is another useful summarization method, albeit less | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
623 visual. | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
624 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
625 .. _Freedman's algorithm: https://en.wikipedia.org/wiki/Freedman%E2%80%93Diaconis_rule | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
626 """ | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
627 width = kw.pop('width', None) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
628 format_bin = kw.pop('format_bin', None) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
629 bin_counts = self.get_histogram_counts(bins=bins, **kw) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
630 return format_histogram_counts(bin_counts, | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
631 width=width, | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
632 format_bin=format_bin) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
633 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
634 def describe(self, quantiles=None, format=None): | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
635 """Provides standard summary statistics for the data in the Stats | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
636 object, in one of several convenient formats. | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
637 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
638 Args: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
639 quantiles (list): A list of numeric values to use as | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
640 quantiles in the resulting summary. All values must be | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
641 0.0-1.0, with 0.5 representing the median. Defaults to | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
642 ``[0.25, 0.5, 0.75]``, representing the standard | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
643 quartiles. | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
644 format (str): Controls the return type of the function, | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
645 with one of three valid values: ``"dict"`` gives back | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
646 a :class:`dict` with the appropriate keys and | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
647 values. ``"list"`` is a list of key-value pairs in an | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
648 order suitable to pass to an OrderedDict or HTML | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
649 table. ``"text"`` converts the values to text suitable | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
650 for printing, as seen below. | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
651 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
652 Here is the information returned by a default ``describe``, as | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
653 presented in the ``"text"`` format: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
654 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
655 >>> stats = Stats(range(1, 8)) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
656 >>> print(stats.describe(format='text')) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
657 count: 7 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
658 mean: 4.0 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
659 std_dev: 2.0 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
660 mad: 2.0 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
661 min: 1 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
662 0.25: 2.5 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
663 0.5: 4 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
664 0.75: 5.5 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
665 max: 7 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
666 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
667 For more advanced descriptive statistics, check out my blog | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
668 post on the topic `Statistics for Software | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
669 <https://www.paypal-engineering.com/2016/04/11/statistics-for-software/>`_. | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
670 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
671 """ | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
672 if format is None: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
673 format = 'dict' | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
674 elif format not in ('dict', 'list', 'text'): | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
675 raise ValueError('invalid format for describe,' | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
676 ' expected one of "dict"/"list"/"text", not %r' | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
677 % format) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
678 quantiles = quantiles or [0.25, 0.5, 0.75] | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
679 q_items = [] | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
680 for q in quantiles: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
681 q_val = self.get_quantile(q) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
682 q_items.append((str(q), q_val)) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
683 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
684 items = [('count', self.count), | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
685 ('mean', self.mean), | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
686 ('std_dev', self.std_dev), | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
687 ('mad', self.mad), | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
688 ('min', self.min)] | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
689 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
690 items.extend(q_items) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
691 items.append(('max', self.max)) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
692 if format == 'dict': | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
693 ret = dict(items) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
694 elif format == 'list': | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
695 ret = items | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
696 elif format == 'text': | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
697 ret = '\n'.join(['%s%s' % ((label + ':').ljust(10), val) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
698 for label, val in items]) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
699 return ret | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
700 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
701 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
702 def describe(data, quantiles=None, format=None): | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
703 """A convenience function to get standard summary statistics useful | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
704 for describing most data. See :meth:`Stats.describe` for more | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
705 details. | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
706 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
707 >>> print(describe(range(7), format='text')) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
708 count: 7 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
709 mean: 3.0 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
710 std_dev: 2.0 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
711 mad: 2.0 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
712 min: 0 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
713 0.25: 1.5 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
714 0.5: 3 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
715 0.75: 4.5 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
716 max: 6 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
717 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
718 See :meth:`Stats.format_histogram` for another very useful | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
719 summarization that uses textual visualization. | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
720 """ | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
721 return Stats(data).describe(quantiles=quantiles, format=format) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
722 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
723 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
724 def _get_conv_func(attr_name): | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
725 def stats_helper(data, default=0.0): | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
726 return getattr(Stats(data, default=default, use_copy=False), | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
727 attr_name) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
728 return stats_helper | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
729 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
730 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
731 for attr_name, attr in list(Stats.__dict__.items()): | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
732 if isinstance(attr, _StatsProperty): | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
733 if attr_name in ('max', 'min', 'count'): # don't shadow builtins | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
734 continue | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
735 if attr_name in ('mad',): # convenience aliases | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
736 continue | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
737 func = _get_conv_func(attr_name) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
738 func.__doc__ = attr.func.__doc__ | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
739 globals()[attr_name] = func | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
740 delattr(Stats, '_calc_' + attr_name) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
741 # cleanup | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
742 del attr | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
743 del attr_name | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
744 del func | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
745 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
746 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
747 def format_histogram_counts(bin_counts, width=None, format_bin=None): | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
748 """The formatting logic behind :meth:`Stats.format_histogram`, which | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
749 takes the output of :meth:`Stats.get_histogram_counts`, and passes | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
750 them to this function. | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
751 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
752 Args: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
753 bin_counts (list): A list of bin values to counts. | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
754 width (int): Number of character columns in the text output, | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
755 defaults to 80 or console width in Python 3.3+. | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
756 format_bin (callable): Used to convert bin values into string | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
757 labels. | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
758 """ | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
759 lines = [] | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
760 if not format_bin: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
761 format_bin = lambda v: v | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
762 if not width: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
763 try: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
764 import shutil # python 3 convenience | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
765 width = shutil.get_terminal_size()[0] | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
766 except Exception: | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
767 width = 80 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
768 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
769 bins = [b for b, _ in bin_counts] | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
770 count_max = max([count for _, count in bin_counts]) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
771 count_cols = len(str(count_max)) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
772 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
773 labels = ['%s' % format_bin(b) for b in bins] | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
774 label_cols = max([len(l) for l in labels]) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
775 tmp_line = '%s: %s #' % ('x' * label_cols, count_max) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
776 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
777 bar_cols = max(width - len(tmp_line), 3) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
778 line_k = float(bar_cols) / count_max | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
779 tmpl = "{label:>{label_cols}}: {count:>{count_cols}} {bar}" | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
780 for label, (bin_val, count) in zip(labels, bin_counts): | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
781 bar_len = int(round(count * line_k)) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
782 bar = ('#' * bar_len) or '|' | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
783 line = tmpl.format(label=label, | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
784 label_cols=label_cols, | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
785 count=count, | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
786 count_cols=count_cols, | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
787 bar=bar) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
788 lines.append(line) | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
789 | 
| 
 
d30785e31577
"planemo upload commit 6eee67778febed82ddd413c3ca40b3183a3898f1"
 
guerler 
parents:  
diff
changeset
 | 
790 return '\n'.join(lines) | 
