esda.mapclassify — Choropleth map classification

New in version 1.0.

A module of classification schemes for choropleth mapping.

class pysal.esda.mapclassify.Map_Classifier(y)

Abstract class for all map classifications [Slocum2008]

For an array y of n values, a map classifier places each value y_i into one of k mutually exclusive and exhaustive classes. Each classifier defines the classes based on different criteria, but in all cases the following holds for the classifiers in PySAL:

C_j^l < y_i \le C_j^u  \ \ \forall  i \in C_j

where C_j denotes class j, which has lower bound C_j^l and upper bound C_j^u.
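Because each class is described only by its upper bound, the lower bound of class j is the upper bound of class j-1, and the first class is unbounded below. The short NumPy sketch below assigns class ids under that assumption and checks the inequality above. It uses a hypothetical helper `class_of` and made-up data, and it does not handle values larger than the top bound.

```python
import numpy as np

def class_of(y, bins):
    """Return class ids j such that bins[j-1] < y <= bins[j] (hypothetical helper)."""
    # side="left" returns the first index whose bound is >= y,
    # which is exactly the half-open rule C_j^l < y <= C_j^u.
    return np.searchsorted(bins, y, side="left")

y = np.array([0.5, 1.0, 1.5, 3.0, 4.0])
bins = np.array([1.0, 2.0, 4.0])  # sorted upper bounds
yb = class_of(y, bins)

# Lower bounds: -inf for the first class, otherwise the previous upper bound.
lower = np.concatenate(([-np.inf], bins[:-1]))[yb]
upper = bins[yb]
assert np.all((lower < y) & (y <= upper))
print(yb)  # [0 0 1 2 2]
```

Note that a value equal to an upper bound (1.0 and 4.0 here) stays in that class, as the inequality requires.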

Map Classifiers Supported

Utilities:

In addition to the classifiers, there are several utility functions that can be used to evaluate the properties of a specific classifier for different parameter values, or for automatic selection of a classifier and number of classes.

find_bin(x)

Assign the input value or values to bins according to the current bin estimate.

Parameters:x (array or numeric) – a value or array of values to fit within the estimated bins
Returns:a bin index or array of bin indices that classify the input into one of the classifier’s bins
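The following NumPy sketch shows one way to perform this lookup. It assumes the bins are the sorted upper bounds. It also assumes that values above the largest bound go into the last class; that clipping rule is not stated here. `find_bin_sketch` is a hypothetical name, not part of PySAL.

```python
import numpy as np

def find_bin_sketch(x, bins):
    """Hypothetical stand-in for Map_Classifier.find_bin."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    bins = np.asarray(bins, dtype=float)
    # right=True returns i such that bins[i-1] < x <= bins[i]
    idx = np.digitize(x, bins, right=True)
    # Assumption: values above the largest bound are placed in the last class.
    return np.minimum(idx, len(bins) - 1)

bins = [10.0, 20.0, 40.0]
print(find_bin_sketch(15, bins))                # [1]
print(find_bin_sketch([5, 20, 25, 99], bins))   # [0 1 2 2]
```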
get_adcm()

Absolute deviation around class median (ADCM).

Calculates the absolute deviations of each observation about its class median as a measure of fit for the classification method.

Returns the sum of ADCM over all classes

get_gadf()

Goodness of absolute deviation of fit
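The sketch below computes ADCM as described above, using made-up data. For GADF it assumes the usual definition, GADF = 1 - ADCM/ADAM, where ADAM is the absolute deviation of all observations around the overall median. This document does not give that formula, so treat it as an assumption. Both function names are hypothetical.

```python
import numpy as np

def adcm(y, yb):
    """Sum over classes of absolute deviations about each class median."""
    y, yb = np.asarray(y, dtype=float), np.asarray(yb)
    return sum(np.abs(y[yb == j] - np.median(y[yb == j])).sum()
               for j in np.unique(yb))

def gadf(y, yb):
    """Assumed definition: 1 - ADCM / ADAM (ADAM = deviations about the global median)."""
    y = np.asarray(y, dtype=float)
    adam = np.abs(y - np.median(y)).sum()
    return 1.0 - adcm(y, yb) / adam

y = [1, 2, 3, 10, 11, 12]
yb = [0, 0, 0, 1, 1, 1]
print(adcm(y, yb))            # 4.0
print(round(gadf(y, yb), 4))  # 0.8519
```

Under this definition, GADF approaches 1 as the classes fit the data more tightly.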

get_tss()

Total sum of squares around class means

Returns the sum of squared deviations of each observation from its class mean, summed over all classes
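The following sketch computes this quantity directly, using made-up data and a hypothetical function name.

```python
import numpy as np

def tss(y, yb):
    """Sum of squared deviations of each observation about its class mean."""
    y, yb = np.asarray(y, dtype=float), np.asarray(yb)
    return sum(((y[yb == j] - y[yb == j].mean()) ** 2).sum()
               for j in np.unique(yb))

print(tss([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1]))  # 4.0
```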

classmethod make(*args, **kwargs)

Configure and create a classifier that will consume data and produce classifications, given the configuration options specified by this function.

Note that this is like a partial application of the relevant class constructor. make creates a function that returns classifications; it does not actually do the classification.

If you want to classify data directly, use the appropriate class constructor, like Quantiles, Max_Breaks, etc.

If you have a classifier object, but want to find which bins new data falls into, use find_bin.

Parameters:
  • *args (required positional arguments) – all positional arguments required by the classifier, excluding the input data.
  • rolling (bool) – whether the returned classifier reuses a single rolling classifier rather than building a new classifier for each input. If rolling, each call adds the current data to all of the previous data in the classifier and rebalances the bins, like a running median computation.
  • return_object (bool) – whether the returned classifier also returns the classifier object
  • return_bins (bool) – whether the returned classifier also returns the bins/breaks
  • return_counts (bool) – whether the returned classifier also returns the histogram of observations falling into each bin
Returns:

A function that consumes data and returns their bins (and the object, bins/breaks, or counts, if requested).

Note

This is most useful when you want to run a classifier many times with a given configuration, such as when classifying many columns of an array or dataframe using the same configuration.
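Conceptually, make() returns a closure over the configuration. The toy factory below imitates that pattern for a quantile classifier. It is a hypothetical illustration: `make_quantile_classifier` and `quantile_bins` are not PySAL API. It assumes quantile bounds at proportions 1/k, ..., 1. When rolling=True, it assumes the bins are recomputed from all data seen so far, while only the current input is classified and returned.

```python
import numpy as np

def quantile_bins(y, k):
    # Upper bounds at proportions 1/k, ..., 1; repeated values collapse via np.unique.
    return np.unique(np.percentile(y, 100 * np.linspace(1.0 / k, 1.0, k)))

def make_quantile_classifier(k=5, rolling=False):
    """Hypothetical analogue of a make(k=..., rolling=...) call."""
    seen = []  # data accumulated across calls when rolling=True

    def classify(data):
        data = np.asarray(data, dtype=float)
        if rolling:
            seen.append(data)
            pool = np.concatenate(seen)  # all data seen so far
        else:
            pool = data                  # fresh bins for every input
        bins = quantile_bins(pool, k)
        return np.searchsorted(bins, data, side="left")

    return classify

fixed = make_quantile_classifier(k=5)
print(fixed(np.linspace(3, 8, num=10)))  # [0 0 1 1 2 2 3 3 4 4]
```

For the first column of the linspace data, this sketch gives the same classes as column 0 of the rolling example. It is not guaranteed to reproduce the later rolling columns.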

Examples

>>> import pysal as ps
>>> df = ps.pdio.read_files(ps.examples.get_path('columbus.dbf'))
>>> classifier = ps.Quantiles.make(k=9)
>>> classifications = df[['HOVAL', 'CRIME', 'INC']].apply(classifier)
>>> classifications.head()
    HOVAL  CRIME   INC
0       8      0     7
1       7      1     8
2       2      3     5
3       4      4     0
4       1      6     3
>>> import pandas as pd; from numpy import linspace as lsp
>>> data = [lsp(3,8,num=10), lsp(10, 0, num=10), lsp(-5, 15, num=10)]
>>> data = pd.DataFrame(data).T
>>> data
         0          1          2
0 3.000000  10.000000  -5.000000
1 3.555556   8.888889  -2.777778
2 4.111111   7.777778  -0.555556
3 4.666667   6.666667   1.666667
4 5.222222   5.555556   3.888889
5 5.777778   4.444444   6.111111
6 6.333333   3.333333   8.333333
7 6.888889   2.222222  10.555556
8 7.444444   1.111111  12.777778
9 8.000000   0.000000  15.000000
>>> data.apply(ps.Quantiles.make(rolling=True))
    0   1   2
0   0   4   0
1   0   4   0
2   1   4   0
3   1   3   0
4   2   2   1
5   2   1   2
6   3   0   4
7   3   0   4
8   4   0   4
9   4   0   4
>>> dbf = ps.open(ps.examples.get_path('baltim.dbf'))
>>> data = dbf.by_col_array('PRICE', 'LOTSZ', 'SQFT')
>>> my_bins = [1, 10, 20, 40, 80]
>>> classifications = [ps.User_Defined.make(bins=my_bins)(a) for a in data.T]
>>> len(classifications)
3
>>> print(classifications)
[array([4, 5, 5, 5, 4, 4, 5, 4, 4, 5, 4, 4, 4, 4, 4, 1, 2, 2, 3, 4, 4, 3, 3,
    ...
    2, 2, 2, 2])]
update(y=None, inplace=False, **kwargs)

Add data or change classification parameters.

Parameters:
  • y (array) – (n,1) array of data to add to the classification
  • inplace (bool) – whether to conduct the update in place or to return a copy estimated from the additional specifications.
  • **kwargs – additional parameters passed to the __init__ function of the class. For documentation, check the class constructor.
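The copy-versus-in-place contract can be illustrated with a small hypothetical class; `ToyQuantiles` is not part of PySAL. It assumes, as described above, that new data is appended to the data already held and that keyword arguments are passed on to the constructor.

```python
import numpy as np

class ToyQuantiles:
    """Hypothetical classifier used only to illustrate update()."""

    def __init__(self, y, k=5):
        self.y = np.asarray(y, dtype=float)
        self.k = k
        p = np.linspace(1.0 / k, 1.0, k)
        self.bins = np.unique(np.percentile(self.y, 100 * p))

    def update(self, y=None, inplace=False, **kwargs):
        # Keep the current k unless a new one is supplied.
        kwargs.setdefault("k", self.k)
        # New data, if any, is appended to the data already held.
        data = self.y if y is None else np.append(self.y, y)
        if inplace:
            self.__init__(data, **kwargs)  # re-estimate this object
            return None
        return type(self)(data, **kwargs)  # leave self untouched

q = ToyQuantiles(np.arange(10), k=5)
q2 = q.update(np.arange(10, 20))  # returns a new object
print(len(q.y), len(q2.y))        # 10 20
q.update(k=2, inplace=True)       # re-estimates q itself with two classes
```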
pysal.esda.mapclassify.quantile(y, k=4)

Calculates the quantiles for an array

Parameters:
  • y (array) – (n,1), values to classify
  • k (int) – number of quantiles
Returns:

implicit – (k,1), quantile values (fewer than k when ties make quantile values repeat)

Return type:

array

Examples

>>> import numpy as np
>>> x = np.arange(1000)
>>> quantile(x)
array([ 249.75,  499.5 ,  749.25,  999.  ])
>>> quantile(x, k = 3)
array([ 333.,  666.,  999.])

Note that if there are enough ties that the quantile values repeat, we collapse to pseudo-quantiles, in which case the number of classes will be less than k:

>>> x = [1.0] * 100
>>> x.extend([3.0] * 40)
>>> len(x)
140
>>> y = np.array(x)
>>> quantile(y)
array([ 1.,  3.])
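The values in these examples can be reproduced with a short NumPy sketch. It assumes quantiles are taken at cumulative proportions 1/k, 2/k, ..., 1 with linear interpolation, and that duplicate values are dropped. This is an illustration, not necessarily the module's code, and `quantile_sketch` is a hypothetical name.

```python
import numpy as np

def quantile_sketch(y, k=4):
    """Upper bounds of k quantile classes; ties collapse to fewer values."""
    p = np.linspace(1.0 / k, 1.0, k)  # 1/k, 2/k, ..., 1
    # np.percentile uses linear interpolation by default
    q = np.percentile(np.asarray(y, dtype=float), 100 * p)
    return np.unique(q)  # repeated quantiles collapse into pseudo-quantiles

print(quantile_sketch(np.arange(1000)))            # 249.75, 499.5, 749.25, 999.0
print(quantile_sketch(np.arange(1000), k=3))       # 333.0, 666.0, 999.0
print(quantile_sketch([1.0] * 100 + [3.0] * 40))   # 1.0, 3.0
```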
class pysal.esda.mapclassify.Box_Plot(y, hinge=1.5)

Box_Plot Map Classification

Parameters:
  • y (array) – attribute to classify
  • hinge (float) – multiplier for IQR
yb

array – (n,1), bin ids for observations

bins

array – (k,1), the upper bounds of each class (monotonic)

k

int – the number of classes

counts

array – (k,1), the number of observations falling in each class

low_outlier_ids

array – indices of observations that are low outliers

high_outlier_ids

array – indices of observations that are high outliers

Notes

The bins are set as follows:

bins[0] = q[0]-hinge*IQR
bins[1] = q[0]
bins[2] = q[1]
bins[3] = q[2]
bins[4] = q[2]+hinge*IQR
bins[5] = max(y)  (only when there are high outliers; see below)

where q is an array of the first three quartiles of y and IQR=q[2]-q[0]

If q[2]+hinge*IQR > max(y), there will only be 5 classes and no high outliers; otherwise, there will be 6 classes and at least one high outlier.
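Following the bin definitions above, here is a NumPy sketch of the Box_Plot breaks. It makes two assumptions. First, quartiles use NumPy's default linear interpolation. Second, when high outliers exist, the sixth bound is max(y), which matches the last value of bp.bins in the example below. `box_plot_bins` is a hypothetical helper.

```python
import numpy as np

def box_plot_bins(y, hinge=1.5):
    y = np.asarray(y, dtype=float)
    q1, q2, q3 = np.percentile(y, [25, 50, 75])  # first three quartiles
    iqr = q3 - q1
    low_fence = q1 - hinge * iqr
    high_fence = q3 + hinge * iqr
    bins = [low_fence, q1, q2, q3, high_fence]
    if high_fence < y.max():
        # Some values exceed the upper fence: add a sixth class for the
        # high outliers, bounded above by max(y) (assumption, see lead-in).
        bins.append(y.max())
    return np.array(bins)

print(box_plot_bins(np.arange(100)))
```

For np.arange(100) this yields the five bounds -49.5, 24.75, 49.5, 74.25 and 148.5, the same as bx.bins below.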

Examples

>>> cal = load_example()
>>> bp = Box_Plot(cal)
>>> bp.bins
array([ -5.28762500e+01,   2.56750000e+00,   9.36500000e+00,
         3.95300000e+01,   9.49737500e+01,   4.11145000e+03])
>>> bp.counts
array([ 0, 15, 14, 14,  6,  9])
>>> bp.high_outlier_ids
array([ 0,  6, 18, 29, 33, 36, 37, 40, 42])
>>> cal[bp.high_outlier_ids]
array([  329.92,   181.27,   370.5 ,   722.85,   192.05,   110.74,
        4111.45,   317.11,   264.93])
>>> bx = Box_Plot(np.arange(100))
>>> bx.bins
array([ -49.5 ,   24.75,   49.5 ,   74.25,  148.5 ])
Methods find_bin(), get_adcm(), get_gadf(), get_tss(), make() and update() have the same signatures and documentation as under Map_Classifier above.
class pysal.esda.mapclassify.Equal_Interval(y, k=5)

Equal Interval Classification

Parameters:
  • y (array) – (n,1), values to classify
  • k (int) – number of classes required
yb

array – (n,1), bin ids for observations, each value is the id of the class the observation belongs to yb[i] = j for j>=1 if bins[j-1] < y[i] <= bins[j], yb[i] = 0 otherwise

bins

array – (k,1), the upper bounds of each class

k

int – the number of classes

counts

array – (k,1), the number of observations falling in each class

Examples

>>> cal = load_example()
>>> ei = Equal_Interval(cal, k = 5)
>>> ei.k
5
>>> ei.counts
array([57,  0,  0,  0,  1])
>>> ei.bins
array([  822.394,  1644.658,  2466.922,  3289.186,  4111.45 ])

Notes

Intervals defined to have equal width:

\text{bins}_j = \min(y) + w\,(j+1), \quad j = 0, \dots, k-1

with w = \frac{\max(y) - \min(y)}{k}
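Here is a direct NumPy translation of these formulas, with made-up data. `equal_interval_bins` is a hypothetical name. Pinning the last bound to max(y) is an added guard against floating-point rounding and is not part of the formula.

```python
import numpy as np

def equal_interval_bins(y, k=5):
    y = np.asarray(y, dtype=float)
    w = (y.max() - y.min()) / k                 # common class width
    bins = y.min() + w * np.arange(1, k + 1)    # bins_j = min(y) + w (j + 1)
    bins[-1] = y.max()                          # guard against rounding drift
    return bins

y = [0.0, 3.0, 10.0]
bins = equal_interval_bins(y, k=5)
print(bins)                                     # bounds 2, 4, 6, 8, 10
print(np.searchsorted(bins, y, side="left"))    # [0 1 4]
```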

Methods find_bin(), get_adcm(), get_gadf(), get_tss(), make() and update() have the same signatures and documentation as under Map_Classifier above.
class pysal.esda.mapclassify.Fisher_Jenks(y, k=5)

Fisher Jenks optimal classifier - mean based

Parameters:
  • y (array) – (n,1), values to classify
  • k (int) – number of classes required
yb

array – (n,1), bin ids for observations

bins

array – (k,1), the upper bounds of each class

k

int – the number of classes

counts

array – (k,1), the number of observations falling in each class

Examples

>>> cal = load_example()
>>> fj = Fisher_Jenks(cal)
>>> fj.adcm
799.24000000000001
>>> fj.bins
array([   75.29,   192.05,   370.5 ,   722.85,  4111.45])
>>> fj.counts
array([49,  3,  4,  1,  1])
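As a mean-based optimal classifier, Fisher Jenks chooses, among all ways of cutting the sorted values into k contiguous classes, the one that minimizes the total squared deviation of values from their class means. Below is a textbook dynamic-programming sketch of that objective. It runs in O(k n^2) time, so it is only suitable for small n. It is not the module's implementation, and `fisher_jenks_bins` is a hypothetical name.

```python
import numpy as np

def fisher_jenks_bins(y, k):
    """Upper bounds of the k classes that minimize total within-class squared error."""
    x = np.sort(np.asarray(y, dtype=float))
    n = x.size
    # Prefix sums give the squared error of any run x[i:j] in O(1).
    s1 = np.concatenate(([0.0], np.cumsum(x)))
    s2 = np.concatenate(([0.0], np.cumsum(x * x)))

    def ssd(i, j):
        """Sum of squared deviations from the mean of x[i:j] (requires j > i)."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    # cost[c, j]: smallest error when x[:j] is split into c + 1 classes.
    cost = np.full((k, n + 1), np.inf)
    start = np.zeros((k, n + 1), dtype=int)  # where the last class begins
    for j in range(1, n + 1):
        cost[0, j] = ssd(0, j)
    for c in range(1, k):
        for j in range(c + 1, n + 1):
            for i in range(c, j):  # last class is x[i:j]
                v = cost[c - 1, i] + ssd(i, j)
                if v < cost[c, j]:
                    cost[c, j] = v
                    start[c, j] = i

    # Walk back through the stored split points, recording each class maximum.
    bounds, j = [], n
    for c in range(k - 1, 0, -1):
        bounds.append(x[j - 1])
        j = start[c, j]
    bounds.append(x[j - 1])
    return np.array(bounds[::-1])

print(fisher_jenks_bins([1, 2, 3, 10, 11, 12, 50, 51], k=3))
```

On this toy input, the sketch returns the upper bounds 3, 12 and 51, one per natural cluster. Like fj.bins above, each bound is an observed value: the maximum of its class.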
Methods find_bin(), get_adcm(), get_gadf(), get_tss(), make() and update() have the same signatures and documentation as under Map_Classifier above.
class pysal.esda.mapclassify.Fisher_Jenks_Sampled(y, k=5, pct=0.1, truncate=True)

Fisher Jenks optimal classifier - mean based using random sample

Parameters:
  • y (array) – (n,1), values to classify
  • k (int) – number of classes required
  • pct (float) – the proportion of n that should form the sample (e.g. 0.1 for 10%). If pct is specified such that n*pct > 1000, then pct = 1000./n, unless truncate is False.
  • truncate (bool) – truncate pct in cases where pct*n > 1000 (default True)
yb

array – (n,1), bin ids for observations

bins

array – (k,1), the upper bounds of each class

k

int – the number of classes

counts

array – (k,1), the number of observations falling in each class

Examples

(The doctest example is turned off because its timing differs across hardware.)
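Because the doctest is disabled, here is a sketch of just the sampling step implied by the pct and truncate parameters. The sample is capped at 1000 observations unless truncate is False. A random subsample is then drawn, and the Fisher Jenks classes would be computed on that subsample. The name `sample_for_fisher_jenks`, the fixed seed and sampling without replacement are assumptions for illustration.

```python
import numpy as np

def sample_for_fisher_jenks(y, pct=0.1, truncate=True, seed=0):
    """Hypothetical helper: return (subsample, effective pct)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    if truncate and n * pct > 1000:
        pct = 1000.0 / n  # cap the sample at 1000 observations
    size = max(1, int(round(n * pct)))
    rng = np.random.default_rng(seed)
    return rng.choice(y, size=size, replace=False), pct

sample, pct = sample_for_fisher_jenks(np.arange(50000), pct=0.1)
print(sample.size, pct)  # 1000 0.02
```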

Methods find_bin(), get_adcm(), get_gadf(), get_tss(), make() and update() have the same signatures and documentation as under Map_Classifier above.
class pysal.esda.mapclassify.Jenks_Caspall(y, k=5)

Jenks Caspall Map Classification

Parameters:
  • y (array) – (n,1), values to classify
  • k (int) – number of classes required
yb

array – (n,1), bin ids for observations

bins

array – (k,1), the upper bounds of each class

k

int – the number of classes

counts

array – (k,1), the number of observations falling in each class

Examples

>>> cal = load_example()
>>> jc = Jenks_Caspall(cal, k = 5)
>>> jc.bins
array([  1.81000000e+00,   7.60000000e+00,   2.98200000e+01,
         1.81270000e+02,   4.11145000e+03])
>>> jc.counts
array([14, 13, 14, 10,  7])
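This document does not describe the Jenks Caspall algorithm. The following is a rough, hedged sketch of the iterative-reallocation idea usually associated with it. It starts from k quantile classes. It then repeatedly moves each value to the class whose centre is nearest, until no assignment changes. Using class medians as centres, starting from quantiles and capping the number of iterations are all assumptions; the module's implementation may differ. `jenks_caspall_sketch` is hypothetical.

```python
import numpy as np

def jenks_caspall_sketch(y, k=5, max_iter=100):
    """Hypothetical iterative reallocation; returns (upper bounds, class ids)."""
    x = np.asarray(y, dtype=float)
    # Start from k quantile classes.
    q = np.percentile(x, 100 * np.linspace(1.0 / k, 1.0, k))
    yb = np.searchsorted(q, x, side="left")
    for _ in range(max_iter):
        # Centre of each non-empty class (median here, an assumption).
        centres = np.array([np.median(x[yb == j]) for j in np.unique(yb)])
        # Move every value to the class with the nearest centre.
        new_yb = np.abs(x[:, None] - centres[None, :]).argmin(axis=1)
        if np.array_equal(new_yb, yb):
            break  # no value changed class: converged
        yb = new_yb
    bins = np.array([x[yb == j].max() for j in np.unique(yb)])
    return bins, yb

bins, yb = jenks_caspall_sketch([1, 2, 3, 10, 11, 12, 50, 51, 52], k=3)
print(bins)  # class maxima 3, 12, 52
```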
Methods find_bin(), get_adcm(), get_gadf(), get_tss(), make() and update() have the same signatures and documentation as under Map_Classifier above.
class pysal.esda.mapclassify.Jenks_Caspall_Forced(y, k=5)

Jenks Caspall Map Classification with forced movements

Parameters:
  • y (array) – (n,1), values to classify
  • k (int) – number of classes required
yb

array – (n,1), bin ids for observations

bins

array – (k,1), the upper bounds of each class

k

int – the number of classes

counts

array – (k,1), the number of observations falling in each class

Examples

>>> cal = load_example()
>>> jcf = Jenks_Caspall_Forced(cal, k = 5)
>>> jcf.k
5
>>> jcf.bins
array([[  1.34000000e+00],
       [  5.90000000e+00],
       [  1.67000000e+01],
       [  5.06500000e+01],
       [  4.11145000e+03]])
>>> jcf.counts
array([12, 12, 13,  9, 12])
>>> jcf4 = Jenks_Caspall_Forced(cal, k = 4)
>>> jcf4.k
4
>>> jcf4.bins
array([[  2.51000000e+00],
       [  8.70000000e+00],
       [  3.66800000e+01],
       [  4.11145000e+03]])
>>> jcf4.counts
array([15, 14, 14, 15])
find_bin(x)

Sort input or inputs according to the current bin estimate

Parameters:x (array or numeric) – a value or array of values to fit within the estimated bins
Returns: a bin index or array of bin indices that classify the input into one of the classifier's bins
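As an illustration of the bin lookup, here is a minimal sketch in plain Python, assuming `bins` holds sorted class upper bounds (class j covers bins[j-1] < x <= bins[j]). `find_bin_sketch` is a hypothetical helper, not PySAL's method, and clamping values above the top bound into the last class is an assumption.

```python
from bisect import bisect_left


def find_bin_sketch(bins, x):
    """Hypothetical helper: map a value (or list of values) to class ids.

    `bins` holds the sorted class upper bounds, so class j covers
    bins[j-1] < x <= bins[j] and class 0 covers x <= bins[0].
    """
    last = len(bins) - 1

    def one(value):
        # bisect_left returns the first j with bins[j] >= value;
        # values above the top bound are clamped to the last class (assumption).
        return min(bisect_left(bins, value), last)

    if isinstance(x, (list, tuple)):
        return [one(v) for v in x]
    return one(x)


print(find_bin_sketch([1, 10, 20, 40, 80], [0.5, 10, 10.5, 100]))  # [0, 1, 2, 4]
```

Using `bisect_left` (rather than `bisect_right`) is what makes a value equal to an upper bound fall into that bound's class.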
get_adcm()

Absolute deviation around class median (ADCM).

Calculates the absolute deviations of each observation about its class median as a measure of fit for the classification method.

Returns sum of ADCM over all classes
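The ADCM computation can be sketched in plain Python as follows, assuming the data `y` and the bin ids `yb` are given as parallel lists. `adcm_sketch` is a hypothetical helper for illustration, not PySAL's implementation.

```python
from collections import defaultdict
from statistics import median


def adcm_sketch(y, yb):
    """Sum of absolute deviations of each value about its class median."""
    classes = defaultdict(list)
    for value, label in zip(y, yb):
        classes[label].append(value)
    total = 0.0
    for members in classes.values():
        class_median = median(members)
        total += sum(abs(v - class_median) for v in members)
    return total


# Two classes: medians 2 and 20, deviations 1+0+1 and 10+0+10.
print(adcm_sketch([1, 2, 3, 10, 20, 30], [0, 0, 0, 1, 1, 1]))  # 22.0
```

Smaller values indicate classes whose members sit closer to their class medians.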

get_gadf()

Goodness of absolute deviation of fit

get_tss()

Total sum of squares around class means

Returns sum of squares over all class means
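A minimal sketch of the total sum of squares around class means, again assuming parallel lists `y` and `yb`; `tss_sketch` is a hypothetical helper, not PySAL's code.

```python
from collections import defaultdict
from statistics import mean


def tss_sketch(y, yb):
    """Sum over classes of squared deviations about each class mean."""
    classes = defaultdict(list)
    for value, label in zip(y, yb):
        classes[label].append(value)
    total = 0.0
    for members in classes.values():
        class_mean = mean(members)
        total += sum((v - class_mean) ** 2 for v in members)
    return total


# Class means 2 and 20: (1 + 0 + 1) + (100 + 0 + 100).
print(tss_sketch([1, 2, 3, 10, 20, 30], [0, 0, 0, 1, 1, 1]))  # 202.0
```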

make(*args, **kwargs)

Configure and create a classifier that will consume data and produce classifications, given the configuration options specified by this function.

Note that this is like a partial application of the relevant class constructor. make creates a function that returns classifications; it does not actually do the classification.
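To illustrate the partial-application idea, here is a minimal sketch using `functools.partial`. The function `classify_user_defined` is a hypothetical stand-in for a classifier constructor, not PySAL's API; appending the data maximum as a final upper bound when it exceeds the user's bins is an assumption suggested by the User_Defined example output, where class 5 appears with only five user bins.

```python
from bisect import bisect_left
from functools import partial


def classify_user_defined(y, bins):
    """Hypothetical stand-in for a classifier constructor: returns bin ids."""
    upper = list(bins)
    # Assumption: if the data exceed the user bins, max(y) closes a final class.
    if max(y) > upper[-1]:
        upper.append(max(y))
    return [bisect_left(upper, v) for v in y]


# Configure once (in the spirit of User_Defined.make(bins=...)), apply many times.
classifier = partial(classify_user_defined, bins=[1, 10, 20, 40, 80])
columns = [[0.5, 15, 90], [5, 25, 60]]
classifications = [classifier(col) for col in columns]
print(classifications)  # [[0, 2, 5], [1, 3, 4]]
```

The configured callable carries its options with it, which is why the same configuration can be mapped over many columns.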

If you want to classify data directly, use the appropriate class constructor, like Quantiles, Max_Breaks, etc.

If you have a classifier object, but want to find which bins new data falls into, use find_bin.

Parameters:
  • *args (required positional arguments) – all positional arguments required by the classifier, excluding the input data.
  • rolling (bool) – a boolean configuring the outputted classifier to use a rolling classifier rather than a new classifier for each input. If rolling, this adds the current data to all of the previous data in the classifier, and rebalances the bins, like a running median computation.
  • return_object (bool) – a boolean configuring the outputted classifier to return the classifier object or not
  • return_bins (bool) – a boolean configuring the outputted classifier to return the bins/breaks or not
  • return_counts (bool) – a boolean configuring the outputted classifier to return the histogram of objects falling into each bin or not
Returns:

  • A function that consumes data and returns their bins (and the classifier object, bins/breaks, or counts, if requested).

Note

This is most useful when you want to run a classifier many times with a given configuration, such as when classifying many columns of an array or dataframe using the same configuration.

Examples

>>> import pysal as ps
>>> df = ps.pdio.read_files(ps.examples.get_path('columbus.dbf'))
>>> classifier = ps.Quantiles.make(k=9)
>>> classifier
>>> classifications = df[['HOVAL', 'CRIME', 'INC']].apply(ps.Quantiles.make(k=9))
>>> classifications.head()
    HOVAL  CRIME   INC
0       8      0     7
1       7      1     8
2       2      3     5
3       4      4     0
4       1      6     3
>>> import pandas as pd; from numpy import linspace as lsp
>>> data = [lsp(3,8,num=10), lsp(10, 0, num=10), lsp(-5, 15, num=10)]
>>> data = pd.DataFrame(data).T
>>> data
         0          1          2
0 3.000000  10.000000  -5.000000
1 3.555556   8.888889  -2.777778
2 4.111111   7.777778  -0.555556
3 4.666667   6.666667   1.666667
4 5.222222   5.555556   3.888889
5 5.777778   4.444444   6.111111
6 6.333333   3.333333   8.333333
7 6.888889   2.222222  10.555556
8 7.444444   1.111111  12.777778
9 8.000000   0.000000  15.000000
>>> data.apply(ps.Quantiles.make(rolling=True))
    0   1   2
0   0   4   0
1   0   4   0
2   1   4   0
3   1   3   0
4   2   2   1
5   2   1   2
6   3   0   4
7   3   0   4
8   4   0   4
9   4   0   4
>>> dbf = ps.open(ps.examples.get_path('baltim.dbf'))
>>> data = dbf.by_col_array('PRICE', 'LOTSZ', 'SQFT')
>>> my_bins = [1, 10, 20, 40, 80]
>>> classifications = [ps.User_Defined.make(bins=my_bins)(a) for a in data.T]
>>> len(classifications)
3
>>> print(classifications)
[array([4, 5, 5, 5, 4, 4, 5, 4, 4, 5, 4, 4, 4, 4, 4, 1, 2, 2, 3, 4, 4, 3, 3,
    ...
    2, 2, 2, 2])]
update(y=None, inplace=False, **kwargs)

Add data or change classification parameters.

Parameters:
  • y (array) – (n,1) array of data to classify
  • inplace (bool) – whether to conduct the update in place or to return a copy estimated from the additional specifications.
  • **kwargs – additional parameters passed to the init function of the class. For documentation, check the class constructor.
class pysal.esda.mapclassify.Jenks_Caspall_Sampled(y, k=5, pct=0.1)

Jenks Caspall Map Classification using a random sample

Parameters:
  • y (array) – (n,1), values to classify
  • k (int) – number of classes required
  • pct (float) – the proportion of n that should form the sample (e.g. 0.1 for 10%). If pct is specified such that n*pct > 1000, then pct = 1000./n.
yb

array – (n,1), bin ids for observations

bins

array – (k,1), the upper bounds of each class

k

int – the number of classes

counts

array – (k,1), the number of observations falling in each class

Examples

>>> import numpy as np
>>> x = np.random.random(100000)
>>> jc = Jenks_Caspall(x)
>>> jcs = Jenks_Caspall_Sampled(x)
>>> jc.bins
array([ 0.19770952,  0.39695769,  0.59588617,  0.79716865,  0.99999425])
>>> jcs.bins
array([ 0.18877882,  0.39341638,  0.6028286 ,  0.80070925,  0.99999425])
>>> jc.counts
array([19804, 20005, 19925, 20178, 20088])
>>> jcs.counts
array([18922, 20521, 20980, 19826, 19751])

Timing comparison, not run as a test because timings differ across hardware; it is included only to document the likely speed gain:

#>>> t1 = time.time(); jc = Jenks_Caspall(x); t2 = time.time()
#>>> t1s = time.time(); jcs = Jenks_Caspall_Sampled(x); t2s = time.time()
#>>> t2 - t1; t2s - t1s
#1.8292930126190186
#0.061631917953491211

Notes

This is intended for large n problems. The logic is to apply Jenks_Caspall to a random subset of the y space and then bin the complete vector y on the bins obtained from the subset. This trades off some accuracy for a gain in speed.
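The sample-then-bin logic can be sketched as follows. To keep the sketch short it uses quantile breaks on the sample as a stand-in for Jenks_Caspall, so it is not PySAL's implementation; `sampled_classify` and its `seed` argument are hypothetical. The 1000-observation cap follows the pct description.

```python
import random
from bisect import bisect_left
from statistics import quantiles


def sampled_classify(y, k=5, pct=0.1, seed=0):
    """Hypothetical helper: estimate bins on a sample, then bin all of y."""
    n = len(y)
    # Cap the sample at 1000 observations, as the pct parameter describes.
    if n * pct > 1000:
        pct = 1000.0 / n
    rng = random.Random(seed)
    sample = rng.sample(y, max(k, int(n * pct)))
    # Stand-in for Jenks_Caspall: quantile breaks estimated on the sample only.
    bins = quantiles(sample, n=k, method="inclusive")
    # Close the last class at the maximum of the full data, not the sample.
    bins.append(max(y))
    yb = [bisect_left(bins, v) for v in y]
    counts = [yb.count(j) for j in range(len(bins))]
    return bins, yb, counts


y = [i / 1000 for i in range(10000)]
bins, yb, counts = sampled_classify(y)
print(len(bins), sum(counts))  # 5 10000
```

Only the break estimation touches the sample; the final binning pass still visits every observation once.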

find_bin(x)

Sort input or inputs according to the current bin estimate

Parameters:x (array or numeric) – a value or array of values to fit within the estimated bins
Returns: a bin index or array of bin indices that classify the input into one of the classifier's bins
get_adcm()

Absolute deviation around class median (ADCM).

Calculates the absolute deviations of each observation about its class median as a measure of fit for the classification method.

Returns sum of ADCM over all classes

get_gadf()

Goodness of absolute deviation of fit
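GADF is commonly defined as 1 - ADCM/ADAM, where ADAM is the absolute deviation of all values about the overall median. The sketch below assumes that definition; `gadf_sketch` is a hypothetical helper, and it does not guard against ADAM being zero (all values equal).

```python
from collections import defaultdict
from statistics import median


def gadf_sketch(y, yb):
    """Goodness of absolute deviation of fit, assuming 1 - ADCM / ADAM."""
    # ADAM: absolute deviations of all values about the overall median.
    overall_median = median(y)
    adam = sum(abs(v - overall_median) for v in y)
    # ADCM: absolute deviations about each class's own median.
    classes = defaultdict(list)
    for value, label in zip(y, yb):
        classes[label].append(value)
    adcm = 0.0
    for members in classes.values():
        class_median = median(members)
        adcm += sum(abs(v - class_median) for v in members)
    # 1 means a perfect fit (every class is internally constant).
    return 1 - adcm / adam


print(round(gadf_sketch([1, 2, 3, 10, 20, 30], [0, 0, 0, 1, 1, 1]), 4))  # 0.5926
```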

get_tss()

Total sum of squares around class means

Returns sum of squares over all class means

make(*args, **kwargs)

Configure and create a classifier that will consume data and produce classifications, given the configuration options specified by this function.

Note that this is like a partial application of the relevant class constructor. make creates a function that returns classifications; it does not actually do the classification.

If you want to classify data directly, use the appropriate class constructor, like Quantiles, Max_Breaks, etc.

If you have a classifier object, but want to find which bins new data falls into, use find_bin.

Parameters:
  • *args (required positional arguments) – all positional arguments required by the classifier, excluding the input data.
  • rolling (bool) – a boolean configuring the outputted classifier to use a rolling classifier rather than a new classifier for each input. If rolling, this adds the current data to all of the previous data in the classifier, and rebalances the bins, like a running median computation.
  • return_object (bool) – a boolean configuring the outputted classifier to return the classifier object or not
  • return_bins (bool) – a boolean configuring the outputted classifier to return the bins/breaks or not
  • return_counts (bool) – a boolean configuring the outputted classifier to return the histogram of objects falling into each bin or not
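The rolling option can be pictured as a closure that accumulates every column it sees and re-estimates the bins each time. This is a minimal sketch with quantile bins only; `make_rolling_quantiles` is hypothetical, and `statistics.quantiles(method="inclusive")` is used as an assumed stand-in for PySAL's own quantile computation, so later columns are not guaranteed to match PySAL's output.

```python
from bisect import bisect_left
from statistics import quantiles


def make_rolling_quantiles(k=5):
    """Hypothetical factory: each call classifies a new column against
    quantile bins re-estimated from every value seen so far."""
    history = []  # all data passed in across calls

    def classify(column):
        history.extend(column)
        # k - 1 interior cut points plus the running maximum as the top bound.
        bins = quantiles(history, n=k, method="inclusive") + [max(history)]
        return [bisect_left(bins, v) for v in column]

    return classify


classify = make_rolling_quantiles(k=5)
first = classify([3 + i * 5 / 9 for i in range(10)])    # like lsp(3, 8, num=10)
second = classify([10 - i * 10 / 9 for i in range(10)])  # like lsp(10, 0, num=10)
print(first)  # [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
```

The first call sees only its own data, so it behaves like an ordinary classifier; from the second call on, the bins reflect the pooled history.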
Returns:

  • A function that consumes data and returns their bins (and the classifier object, bins/breaks, or counts, if requested).

Note

This is most useful when you want to run a classifier many times with a given configuration, such as when classifying many columns of an array or dataframe using the same configuration.

Examples

>>> import pysal as ps
>>> df = ps.pdio.read_files(ps.examples.get_path('columbus.dbf'))
>>> classifier = ps.Quantiles.make(k=9)
>>> classifier
>>> classifications = df[['HOVAL', 'CRIME', 'INC']].apply(ps.Quantiles.make(k=9))
>>> classifications.head()
    HOVAL  CRIME   INC
0       8      0     7
1       7      1     8
2       2      3     5
3       4      4     0
4       1      6     3
>>> import pandas as pd; from numpy import linspace as lsp
>>> data = [lsp(3,8,num=10), lsp(10, 0, num=10), lsp(-5, 15, num=10)]
>>> data = pd.DataFrame(data).T
>>> data
         0          1          2
0 3.000000  10.000000  -5.000000
1 3.555556   8.888889  -2.777778
2 4.111111   7.777778  -0.555556
3 4.666667   6.666667   1.666667
4 5.222222   5.555556   3.888889
5 5.777778   4.444444   6.111111
6 6.333333   3.333333   8.333333
7 6.888889   2.222222  10.555556
8 7.444444   1.111111  12.777778
9 8.000000   0.000000  15.000000
>>> data.apply(ps.Quantiles.make(rolling=True))
    0   1   2
0   0   4   0
1   0   4   0
2   1   4   0
3   1   3   0
4   2   2   1
5   2   1   2
6   3   0   4
7   3   0   4
8   4   0   4
9   4   0   4
>>> dbf = ps.open(ps.examples.get_path('baltim.dbf'))
>>> data = dbf.by_col_array('PRICE', 'LOTSZ', 'SQFT')
>>> my_bins = [1, 10, 20, 40, 80]
>>> classifications = [ps.User_Defined.make(bins=my_bins)(a) for a in data.T]
>>> len(classifications)
3
>>> print(classifications)
[array([4, 5, 5, 5, 4, 4, 5, 4, 4, 5, 4, 4, 4, 4, 4, 1, 2, 2, 3, 4, 4, 3, 3,
    ...
    2, 2, 2, 2])]
update(y=None, inplace=False, **kwargs)

Add data or change classification parameters.

Parameters:
  • y (array) – (n,1) array of data to classify
  • inplace (bool) – whether to conduct the update in place or to return a copy estimated from the additional specifications.
  • **kwargs – additional parameters passed to the init function of the class. For documentation, check the class constructor.
class pysal.esda.mapclassify.Max_P_Classifier(y, k=5, initial=1000)

Max_P Map Classification

Based on the Max-p regionalization algorithm.

Parameters:
  • y (array) – (n,1), values to classify
  • k (int) – number of classes required
  • initial (int) – number of initial solutions to use prior to swapping
yb

array – (n,1), bin ids for observations

bins

array – (k,1), the upper bounds of each class

k

int – the number of classes

counts

array – (k,1), the number of observations falling in each class

Examples

>>> import pysal
>>> cal = pysal.esda.mapclassify.load_example()
>>> mp = pysal.Max_P_Classifier(cal)
>>> mp.bins
array([    8.7 ,    16.7 ,    20.47,    66.26,  4111.45])
>>> mp.counts
array([29,  8,  1, 10, 10])
find_bin(x)

Sort input or inputs according to the current bin estimate

Parameters:x (array or numeric) – a value or array of values to fit within the estimated bins
Returns: a bin index or array of bin indices that classify the input into one of the classifier's bins
get_adcm()

Absolute deviation around class median (ADCM).

Calculates the absolute deviations of each observation about its class median as a measure of fit for the classification method.

Returns sum of ADCM over all classes

get_gadf()

Goodness of absolute deviation of fit

get_tss()

Total sum of squares around class means

Returns sum of squares over all class means

make(*args, **kwargs)

Configure and create a classifier that will consume data and produce classifications, given the configuration options specified by this function.

Note that this is like a partial application of the relevant class constructor. make creates a function that returns classifications; it does not actually do the classification.

If you want to classify data directly, use the appropriate class constructor, like Quantiles, Max_Breaks, etc.

If you have a classifier object, but want to find which bins new data falls into, use find_bin.

Parameters:
  • *args (required positional arguments) – all positional arguments required by the classifier, excluding the input data.
  • rolling (bool) – a boolean configuring the outputted classifier to use a rolling classifier rather than a new classifier for each input. If rolling, this adds the current data to all of the previous data in the classifier, and rebalances the bins, like a running median computation.
  • return_object (bool) – a boolean configuring the outputted classifier to return the classifier object or not
  • return_bins (bool) – a boolean configuring the outputted classifier to return the bins/breaks or not
  • return_counts (bool) – a boolean configuring the outputted classifier to return the histogram of objects falling into each bin or not
Returns:

  • A function that consumes data and returns their bins (and the classifier object, bins/breaks, or counts, if requested).

Note

This is most useful when you want to run a classifier many times with a given configuration, such as when classifying many columns of an array or dataframe using the same configuration.

Examples

>>> import pysal as ps
>>> df = ps.pdio.read_files(ps.examples.get_path('columbus.dbf'))
>>> classifier = ps.Quantiles.make(k=9)
>>> classifier
>>> classifications = df[['HOVAL', 'CRIME', 'INC']].apply(ps.Quantiles.make(k=9))
>>> classifications.head()
    HOVAL  CRIME   INC
0       8      0     7
1       7      1     8
2       2      3     5
3       4      4     0
4       1      6     3
>>> import pandas as pd; from numpy import linspace as lsp
>>> data = [lsp(3,8,num=10), lsp(10, 0, num=10), lsp(-5, 15, num=10)]
>>> data = pd.DataFrame(data).T
>>> data
         0          1          2
0 3.000000  10.000000  -5.000000
1 3.555556   8.888889  -2.777778
2 4.111111   7.777778  -0.555556
3 4.666667   6.666667   1.666667
4 5.222222   5.555556   3.888889
5 5.777778   4.444444   6.111111
6 6.333333   3.333333   8.333333
7 6.888889   2.222222  10.555556
8 7.444444   1.111111  12.777778
9 8.000000   0.000000  15.000000
>>> data.apply(ps.Quantiles.make(rolling=True))
    0   1   2
0   0   4   0
1   0   4   0
2   1   4   0
3   1   3   0
4   2   2   1
5   2   1   2
6   3   0   4
7   3   0   4
8   4   0   4
9   4   0   4
>>> dbf = ps.open(ps.examples.get_path('baltim.dbf'))
>>> data = dbf.by_col_array('PRICE', 'LOTSZ', 'SQFT')
>>> my_bins = [1, 10, 20, 40, 80]
>>> classifications = [ps.User_Defined.make(bins=my_bins)(a) for a in data.T]
>>> len(classifications)
3
>>> print(classifications)
[array([4, 5, 5, 5, 4, 4, 5, 4, 4, 5, 4, 4, 4, 4, 4, 1, 2, 2, 3, 4, 4, 3, 3,
    ...
    2, 2, 2, 2])]
update(y=None, inplace=False, **kwargs)

Add data or change classification parameters.

Parameters:
  • y (array) – (n,1) array of data to classify
  • inplace (bool) – whether to conduct the update in place or to return a copy estimated from the additional specifications.
  • **kwargs – additional parameters passed to the init function of the class. For documentation, check the class constructor.
class pysal.esda.mapclassify.Maximum_Breaks(y, k=5, mindiff=0)

Maximum Breaks Map Classification

Parameters:
  • y (array) – (n, 1), values to classify
  • k (int) – number of classes required
  • mindiff (float) – The minimum difference between class breaks
yb

array – (n, 1), bin ids for observations

bins

array – (k, 1), the upper bounds of each class

k

int – the number of classes

counts

array – (k, 1), the number of observations falling in each class

Examples

>>> cal = load_example()
>>> mb = Maximum_Breaks(cal, k = 5)
>>> mb.k
5
>>> mb.bins
array([  146.005,   228.49 ,   546.675,  2417.15 ,  4111.45 ])
>>> mb.counts
array([50,  2,  4,  1,  1])
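Maximum breaks places class boundaries in the widest gaps of the sorted data. The sketch below assumes breaks at the midpoints of the k-1 widest gaps between neighbouring sorted values, with gaps no wider than `mindiff` ignored; `maximum_breaks_sketch` is hypothetical and may handle tied gap widths differently from PySAL.

```python
from bisect import bisect_left


def maximum_breaks_sketch(y, k=5, mindiff=0):
    """Hypothetical helper: break at the midpoints of the k-1 widest gaps."""
    xs = sorted(y)
    # (gap width, index of the left value) for each pair of neighbours.
    gaps = [(xs[i + 1] - xs[i], i) for i in range(len(xs) - 1)]
    # Ignore gaps no wider than mindiff (assumed meaning of the parameter).
    gaps = [g for g in gaps if g[0] > mindiff]
    widest = sorted(gaps, reverse=True)[: k - 1]
    bins = sorted((xs[i] + xs[i + 1]) / 2 for _, i in widest)
    bins.append(xs[-1])  # the last class is closed at the maximum
    yb = [bisect_left(bins, v) for v in y]
    return bins, yb


bins, yb = maximum_breaks_sketch([1, 2, 3, 10, 11, 30, 31, 100], k=4)
print(bins)  # [6.5, 20.5, 65.5, 100]
print(yb)    # [0, 0, 0, 1, 1, 2, 2, 3]
```

Because only gap widths matter, a single extreme value tends to get a class of its own, which is consistent with the small trailing counts in the example.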
find_bin(x)

Sort input or inputs according to the current bin estimate

Parameters:x (array or numeric) – a value or array of values to fit within the estimated bins
Returns: a bin index or array of bin indices that classify the input into one of the classifier's bins
get_adcm()

Absolute deviation around class median (ADCM).

Calculates the absolute deviations of each observation about its class median as a measure of fit for the classification method.

Returns sum of ADCM over all classes

get_gadf()

Goodness of absolute deviation of fit

get_tss()

Total sum of squares around class means

Returns sum of squares over all class means

make(*args, **kwargs)

Configure and create a classifier that will consume data and produce classifications, given the configuration options specified by this function.

Note that this is like a partial application of the relevant class constructor. make creates a function that returns classifications; it does not actually do the classification.

If you want to classify data directly, use the appropriate class constructor, like Quantiles, Max_Breaks, etc.

If you have a classifier object, but want to find which bins new data falls into, use find_bin.

Parameters:
  • *args (required positional arguments) – all positional arguments required by the classifier, excluding the input data.
  • rolling (bool) – a boolean configuring the outputted classifier to use a rolling classifier rather than a new classifier for each input. If rolling, this adds the current data to all of the previous data in the classifier, and rebalances the bins, like a running median computation.
  • return_object (bool) – a boolean configuring the outputted classifier to return the classifier object or not
  • return_bins (bool) – a boolean configuring the outputted classifier to return the bins/breaks or not
  • return_counts (bool) – a boolean configuring the outputted classifier to return the histogram of objects falling into each bin or not
Returns:

  • A function that consumes data and returns their bins (and the classifier object, bins/breaks, or counts, if requested).

Note

This is most useful when you want to run a classifier many times with a given configuration, such as when classifying many columns of an array or dataframe using the same configuration.

Examples

>>> import pysal as ps
>>> df = ps.pdio.read_files(ps.examples.get_path('columbus.dbf'))
>>> classifier = ps.Quantiles.make(k=9)
>>> classifier
>>> classifications = df[['HOVAL', 'CRIME', 'INC']].apply(ps.Quantiles.make(k=9))
>>> classifications.head()
    HOVAL  CRIME   INC
0       8      0     7
1       7      1     8
2       2      3     5
3       4      4     0
4       1      6     3
>>> import pandas as pd; from numpy import linspace as lsp
>>> data = [lsp(3,8,num=10), lsp(10, 0, num=10), lsp(-5, 15, num=10)]
>>> data = pd.DataFrame(data).T
>>> data
         0          1          2
0 3.000000  10.000000  -5.000000
1 3.555556   8.888889  -2.777778
2 4.111111   7.777778  -0.555556
3 4.666667   6.666667   1.666667
4 5.222222   5.555556   3.888889
5 5.777778   4.444444   6.111111
6 6.333333   3.333333   8.333333
7 6.888889   2.222222  10.555556
8 7.444444   1.111111  12.777778
9 8.000000   0.000000  15.000000
>>> data.apply(ps.Quantiles.make(rolling=True))
    0   1   2
0   0   4   0
1   0   4   0
2   1   4   0
3   1   3   0
4   2   2   1
5   2   1   2
6   3   0   4
7   3   0   4
8   4   0   4
9   4   0   4
>>> dbf = ps.open(ps.examples.get_path('baltim.dbf'))
>>> data = dbf.by_col_array('PRICE', 'LOTSZ', 'SQFT')
>>> my_bins = [1, 10, 20, 40, 80]
>>> classifications = [ps.User_Defined.make(bins=my_bins)(a) for a in data.T]
>>> len(classifications)
3
>>> print(classifications)
[array([4, 5, 5, 5, 4, 4, 5, 4, 4, 5, 4, 4, 4, 4, 4, 1, 2, 2, 3, 4, 4, 3, 3,
    ...
    2, 2, 2, 2])]
update(y=None, inplace=False, **kwargs)

Add data or change classification parameters.

Parameters:
  • y (array) – (n,1) array of data to classify
  • inplace (bool) – whether to conduct the update in place or to return a copy estimated from the additional specifications.
  • **kwargs – additional parameters passed to the init function of the class. For documentation, check the class constructor.
class pysal.esda.mapclassify.Natural_Breaks(y, k=5, initial=100)

Natural Breaks Map Classification

Parameters:
  • y (array) – (n,1), values to classify
  • k (int) – number of classes required
  • initial (int) – number of initial solutions to generate (default=100)
yb

array – (n,1), bin ids for observations

bins

array – (k,1), the upper bounds of each class

k

int – the number of classes

counts

array – (k,1), the number of observations falling in each class

Examples

>>> import numpy
>>> import pysal
>>> numpy.random.seed(123456)
>>> cal = pysal.esda.mapclassify.load_example()
>>> nb = pysal.Natural_Breaks(cal, k=5)
>>> nb.k
5
>>> nb.counts
array([41,  9,  6,  1,  1])
>>> nb.bins
array([   29.82,   110.74,   370.5 ,   722.85,  4111.45])
>>> x = numpy.array([1] * 50)
>>> x[-1] = 20
>>> nb = pysal.Natural_Breaks(x, k = 5, initial = 0)
Warning: Not enough unique values in array to form k classes
Warning: setting k to 2
>>> nb.bins
array([ 1, 20])
>>> nb.counts
array([49,  1])

Notes

There is a tradeoff here between speed and consistency of the classification. If you want more speed, set initial to a smaller value (0 gives the best speed). If you want more consistent classes across multiple runs of Natural_Breaks on the same data, set initial to a higher value.
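The speed/consistency tradeoff follows from restarting an iterative search from several random initial solutions and keeping the best one. The sketch below illustrates that general idea with one-dimensional k-means (Lloyd's algorithm) scored by within-class sum of squares; it is an assumed illustration, not PySAL's implementation, and all function names are hypothetical.

```python
import random


def _kmeans_1d(values, k, rng, max_iter=100):
    """One run of Lloyd's k-means in 1-D from a random start."""
    centers = sorted(rng.sample(sorted(set(values)), k))
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for v in values:
            # Assign each value to its nearest center.
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            groups[nearest].append(v)
        # Move each center to its group mean (keep it if the group is empty).
        new_centers = sorted(
            sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)
        )
        if new_centers == centers:
            break
        centers = new_centers
    return [g for g in groups if g]


def _wss(groups):
    """Within-class sum of squared deviations about each class mean."""
    total = 0.0
    for g in groups:
        m = sum(g) / len(g)
        total += sum((v - m) ** 2 for v in g)
    return total


def natural_breaks_sketch(values, k=5, initial=100, seed=None):
    rng = random.Random(seed)
    # initial=0 still needs one run; more runs give more consistent results.
    runs = [_kmeans_1d(values, k, rng) for _ in range(max(1, initial))]
    best = min(runs, key=_wss)
    return sorted(max(g) for g in best)  # class upper bounds


print(natural_breaks_sketch([1, 1, 2, 2, 3, 10, 11, 12, 50, 51], k=3, initial=10, seed=0))
# [3, 12, 51]
```

A single run can stall in a poor local optimum (for example, with both 50 and 51 chosen as starting centers), which is why more initial solutions trade time for stability.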

find_bin(x)

Sort input or inputs according to the current bin estimate

Parameters:x (array or numeric) – a value or array of values to fit within the estimated bins
Returns: a bin index or array of bin indices that classify the input into one of the classifier's bins
get_adcm()

Absolute deviation around class median (ADCM).

Calculates the absolute deviations of each observation about its class median as a measure of fit for the classification method.

Returns sum of ADCM over all classes

get_gadf()

Goodness of absolute deviation of fit

get_tss()

Total sum of squares around class means

Returns sum of squares over all class means

make(*args, **kwargs)

Configure and create a classifier that will consume data and produce classifications, given the configuration options specified by this function.

Note that this is like a partial application of the relevant class constructor. make creates a function that returns classifications; it does not actually do the classification.

If you want to classify data directly, use the appropriate class constructor, like Quantiles, Max_Breaks, etc.

If you have a classifier object, but want to find which bins new data falls into, use find_bin.

Parameters:
  • *args (required positional arguments) – all positional arguments required by the classifier, excluding the input data.
  • rolling (bool) – a boolean configuring the outputted classifier to use a rolling classifier rather than a new classifier for each input. If rolling, this adds the current data to all of the previous data in the classifier, and rebalances the bins, like a running median computation.
  • return_object (bool) – a boolean configuring the outputted classifier to return the classifier object or not
  • return_bins (bool) – a boolean configuring the outputted classifier to return the bins/breaks or not
  • return_counts (bool) – a boolean configuring the outputted classifier to return the histogram of objects falling into each bin or not
Returns:

  • A function that consumes data and returns their bins (and the classifier object, bins/breaks, or counts, if requested).

Note

This is most useful when you want to run a classifier many times with a given configuration, such as when classifying many columns of an array or dataframe using the same configuration.

Examples

>>> import pysal as ps
>>> df = ps.pdio.read_files(ps.examples.get_path('columbus.dbf'))
>>> classifier = ps.Quantiles.make(k=9)
>>> classifier
>>> classifications = df[['HOVAL', 'CRIME', 'INC']].apply(ps.Quantiles.make(k=9))
>>> classifications.head()
    HOVAL  CRIME   INC
0       8      0     7
1       7      1     8
2       2      3     5
3       4      4     0
4       1      6     3
>>> import pandas as pd; from numpy import linspace as lsp
>>> data = [lsp(3,8,num=10), lsp(10, 0, num=10), lsp(-5, 15, num=10)]
>>> data = pd.DataFrame(data).T
>>> data
         0          1          2
0 3.000000  10.000000  -5.000000
1 3.555556   8.888889  -2.777778
2 4.111111   7.777778  -0.555556
3 4.666667   6.666667   1.666667
4 5.222222   5.555556   3.888889
5 5.777778   4.444444   6.111111
6 6.333333   3.333333   8.333333
7 6.888889   2.222222  10.555556
8 7.444444   1.111111  12.777778
9 8.000000   0.000000  15.000000
>>> data.apply(ps.Quantiles.make(rolling=True))
    0   1   2
0   0   4   0
1   0   4   0
2   1   4   0
3   1   3   0
4   2   2   1
5   2   1   2
6   3   0   4
7   3   0   4
8   4   0   4
9   4   0   4
>>> dbf = ps.open(ps.examples.get_path('baltim.dbf'))
>>> data = dbf.by_col_array('PRICE', 'LOTSZ', 'SQFT')
>>> my_bins = [1, 10, 20, 40, 80]
>>> classifications = [ps.User_Defined.make(bins=my_bins)(a) for a in data.T]
>>> len(classifications)
3
>>> print(classifications)
[array([4, 5, 5, 5, 4, 4, 5, 4, 4, 5, 4, 4, 4, 4, 4, 1, 2, 2, 3, 4, 4, 3, 3,
    ...
    2, 2, 2, 2])]
update(y=None, inplace=False, **kwargs)

Add data or change classification parameters.

Parameters:
  • y (array) – (n,1) array of data to classify
  • inplace (bool) – whether to conduct the update in place or to return a copy estimated from the additional specifications.
  • **kwargs – additional parameters passed to the init function of the class. For documentation, check the class constructor.
class pysal.esda.mapclassify.Quantiles(y, k=5)

Quantile Map Classification

Parameters:
  • y (array) – (n,1), values to classify
  • k (int) – number of classes required
yb

array – (n,1), bin ids for observations; each value is the id of the class the observation belongs to: yb[i] = j for j >= 1 if bins[j-1] < y[i] <= bins[j], and yb[i] = 0 otherwise.

bins

array – (k,1), the upper bounds of each class

k

int – the number of classes

counts

array – (k,1), the number of observations falling in each class
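A minimal sketch of quantile classification in plain Python: k-1 interior cut points by linear interpolation (`statistics.quantiles` with `method="inclusive"`), the maximum as the last upper bound, and the yb rule above via `bisect_left`. `quantiles_sketch` is hypothetical; PySAL may additionally drop duplicate bin values, which this sketch does not do.

```python
from bisect import bisect_left
from statistics import quantiles


def quantiles_sketch(y, k=5):
    """Hypothetical helper: quantile bins, bin ids and class counts."""
    # k - 1 interior cut points, then the maximum closes the last class.
    bins = quantiles(y, n=k, method="inclusive") + [max(y)]
    # yb[i] = j when bins[j-1] < y[i] <= bins[j]; yb[i] = 0 when y[i] <= bins[0].
    yb = [bisect_left(bins, v) for v in y]
    counts = [yb.count(j) for j in range(len(bins))]
    return bins, yb, counts


bins, yb, counts = quantiles_sketch(list(range(1, 11)), k=5)
print(counts)  # [2, 2, 2, 2, 2]
```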

Examples

>>> cal = load_example()
>>> q = Quantiles(cal, k = 5)
>>> q.bins
array([  1.46400000e+00,   5.79800000e+00,   1.32780000e+01,
         5.46160000e+01,   4.11145000e+03])
>>> q.counts
array([12, 11, 12, 11, 12])
find_bin(x)

Sort input or inputs according to the current bin estimate

Parameters:x (array or numeric) – a value or array of values to fit within the estimated bins
Returns: a bin index or array of bin indices that classify the input into one of the classifier's bins
get_adcm()

Absolute deviation around class median (ADCM).

Calculates the absolute deviations of each observation about its class median as a measure of fit for the classification method.

Returns sum of ADCM over all classes

get_gadf()

Goodness of absolute deviation of fit

get_tss()

Total sum of squares around class means

Returns sum of squares over all class means

make(*args, **kwargs)

Configure and create a classifier that will consume data and produce classifications, given the configuration options specified by this function.

Note that this is like a partial application of the relevant class constructor. make creates a function that returns classifications; it does not actually do the classification.

If you want to classify data directly, use the appropriate class constructor, like Quantiles, Max_Breaks, etc.

If you have a classifier object, but want to find which bins new data falls into, use find_bin.

Parameters:
  • *args (required positional arguments) – all positional arguments required by the classifier, excluding the input data.
  • rolling (bool) – a boolean configuring the outputted classifier to use a rolling classifier rather than a new classifier for each input. If rolling, this adds the current data to all of the previous data in the classifier, and rebalances the bins, like a running median computation.
  • return_object (bool) – a boolean configuring the outputted classifier to return the classifier object or not
  • return_bins (bool) – a boolean configuring the outputted classifier to return the bins/breaks or not
  • return_counts (bool) – a boolean configuring the outputted classifier to return the histogram of objects falling into each bin or not
Returns:

  • A function that consumes data and returns their bins (and the classifier object, bins/breaks, or counts, if requested).

Note

This is most useful when you want to run a classifier many times with a given configuration, such as when classifying many columns of an array or dataframe using the same configuration.

Examples

>>> import pysal as ps
>>> df = ps.pdio.read_files(ps.examples.get_path('columbus.dbf'))
>>> classifier = ps.Quantiles.make(k=9)
>>> classifier
>>> classifications = df[['HOVAL', 'CRIME', 'INC']].apply(ps.Quantiles.make(k=9))
>>> classifications.head()
    HOVAL  CRIME   INC
0       8      0     7
1       7      1     8
2       2      3     5
3       4      4     0
4       1      6     3
>>> import pandas as pd; from numpy import linspace as lsp
>>> data = [lsp(3,8,num=10), lsp(10, 0, num=10), lsp(-5, 15, num=10)]
>>> data = pd.DataFrame(data).T
>>> data
         0          1          2
0 3.000000  10.000000  -5.000000
1 3.555556   8.888889  -2.777778
2 4.111111   7.777778  -0.555556
3 4.666667   6.666667   1.666667
4 5.222222   5.555556   3.888889
5 5.777778   4.444444   6.111111
6 6.333333   3.333333   8.333333
7 6.888889   2.222222  10.555556
8 7.444444   1.111111  12.777778
9 8.000000   0.000000  15.000000
>>> data.apply(ps.Quantiles.make(rolling=True))
    0   1   2
0   0   4   0
1   0   4   0
2   1   4   0
3   1   3   0
4   2   2   1
5   2   1   2
6   3   0   4
7   3   0   4
8   4   0   4
9   4   0   4
>>> dbf = ps.open(ps.examples.get_path('baltim.dbf'))
>>> data = dbf.by_col_array('PRICE', 'LOTSZ', 'SQFT')
>>> my_bins = [1, 10, 20, 40, 80]
>>> classifications = [ps.User_Defined.make(bins=my_bins)(a) for a in data.T]
>>> len(classifications)
3
>>> print(classifications)
[array([4, 5, 5, 5, 4, 4, 5, 4, 4, 5, 4, 4, 4, 4, 4, 1, 2, 2, 3, 4, 4, 3, 3,
    ...
    2, 2, 2, 2])]
update(y=None, inplace=False, **kwargs)

Add data or change classification parameters.

Parameters:
  • y (array) – (n,1) array of data to classify
  • inplace (bool) – whether to conduct the update in place or to return a copy estimated from the additional specifications.
  • **kwargs – additional parameters passed to the init function of the class. For documentation, check the class constructor.
class pysal.esda.mapclassify.Percentiles(y, pct=[1, 10, 50, 90, 99, 100])

Percentiles Map Classification

Parameters:
  • y (array) – (n,1), values to classify
  • pct (array) – percentiles (default=[1,10,50,90,99,100])
yb

array – (n,1), bin ids for observations

bins

array – (k,1), the upper bounds of each class

k

int – the number of classes

counts

array – (k,1), the number of observations falling in each class
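A minimal sketch of percentile classification: each entry of `pct` becomes an upper bound computed by linear interpolation between neighbouring sorted values (the same rule as NumPy's default percentile), and observations are binned with the same yb rule as the other classifiers. `percentile` and `percentiles_sketch` are hypothetical helpers, not PySAL's code.

```python
from bisect import bisect_left


def percentile(sorted_y, p):
    """Linear-interpolation percentile (p in [0, 100]) of already-sorted data."""
    pos = (len(sorted_y) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_y) - 1)
    return sorted_y[lo] + (sorted_y[hi] - sorted_y[lo]) * (pos - lo)


def percentiles_sketch(y, pct=(1, 10, 50, 90, 99, 100)):
    ys = sorted(y)
    bins = [percentile(ys, p) for p in pct]
    # Class j covers bins[j-1] < y <= bins[j]; class 0 covers y <= bins[0].
    yb = [bisect_left(bins, v) for v in y]
    counts = [yb.count(j) for j in range(len(bins))]
    return bins, yb, counts


bins, yb, counts = percentiles_sketch(list(range(1, 11)), pct=(50, 100))
print(counts)  # [5, 5]
```

With pct=(50, 100) this reproduces the median split shown in the p2 example: two classes of equal size.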

Examples

>>> cal = load_example()
>>> p = Percentiles(cal)
>>> p.bins
array([  1.35700000e-01,   5.53000000e-01,   9.36500000e+00,
         2.13914000e+02,   2.17994800e+03,   4.11145000e+03])
>>> p.counts
array([ 1,  5, 23, 23,  5,  1])
>>> p2 = Percentiles(cal, pct = [50, 100])
>>> p2.bins
array([    9.365,  4111.45 ])
>>> p2.counts
array([29, 29])
>>> p2.k
2
find_bin(x)

Sort input or inputs according to the current bin estimate

Parameters:x (array or numeric) – a value or array of values to fit within the estimated bins
Returns: a bin index or array of bin indices that classify the input into one of the classifier's bins
get_adcm()

Absolute deviation around class median (ADCM).

Calculates the absolute deviations of each observation about its class median as a measure of fit for the classification method.

Returns the sum of ADCM over all classes.
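ADCM sums, class by class, the absolute deviations of the members from their class median, so smaller values indicate tighter classes. A sketch computed from the data y and the bin ids yb (the classifier's yb attribute); `adcm_sketch` is a hypothetical name, not the library method.

```python
import numpy as np


def adcm_sketch(y, yb):
    """Sum over classes of |y_i - median of the class containing i|."""
    y = np.asarray(y, dtype=float)
    yb = np.asarray(yb)
    total = 0.0
    for c in np.unique(yb):
        members = y[yb == c]
        total += np.abs(members - np.median(members)).sum()
    return total


value = adcm_sketch([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1])
```

Each class here has median 2 or 11 and contributes 1 + 0 + 1, so the total is 4.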

get_gadf()

Goodness of absolute deviation fit (GADF)

get_tss()

Total sum of squares around class means

Returns the sum of squared deviations of each observation from its class mean, summed over all classes.
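The same class-by-class loop used for ADCM gives this measure when squared deviations from the class mean replace absolute deviations from the class median. A sketch over data y and bin ids yb; `tss_sketch` is a hypothetical helper, not the library method.

```python
import numpy as np


def tss_sketch(y, yb):
    """Within-class sum of squared deviations from each class mean."""
    y = np.asarray(y, dtype=float)
    yb = np.asarray(yb)
    total = 0.0
    for c in np.unique(yb):
        members = y[yb == c]
        total += ((members - members.mean()) ** 2).sum()
    return total


value = tss_sketch([1, 3, 10, 14], [0, 0, 1, 1])
```

Class 0 (mean 2) contributes 1 + 1 and class 1 (mean 12) contributes 4 + 4, giving 10.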

make(*args, **kwargs)

Configure and create a classifier that will consume data and produce classifications, given the configuration options specified by this function.

Note that this is like a partial application of the relevant class constructor: make creates a function that returns classifications; it does not actually do the classification.

If you want to classify data directly, use the appropriate class constructor, like Quantiles, Max_Breaks, etc.

If you have a classifier object, but want to find which bins new data falls into, use find_bin.

Parameters:
  • *args (required positional arguments) – all positional arguments required by the classifier, excluding the input data.
  • rolling (bool) – whether the returned function uses a rolling classifier rather than a new classifier for each input. If rolling, each call adds the current data to all of the previous data in the classifier and rebalances the bins, like a running median computation.
  • return_object (bool) – whether the returned function also returns the classifier object
  • return_bins (bool) – whether the returned function also returns the bins/breaks
  • return_counts (bool) – whether the returned function also returns the histogram of observations falling into each bin
Returns:

  • A function that consumes data and returns their bins (and the classifier object, bins/breaks, or counts, if requested).

Note

This is most useful when you want to run a classifier many times with a given configuration, such as when classifying many columns of an array or dataframe using the same configuration.
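The "configure once, apply many times" pattern behind make resembles functools.partial. The sketch below illustrates the pattern only: `equal_interval_yb` is a hypothetical stand-in classifier (k equal-width classes), not a PySAL function, and make itself does more (rolling mode, optional return values).

```python
import functools

import numpy as np


def equal_interval_yb(y, k=5):
    """Hypothetical stand-in classifier: k equal-width classes, returns bin ids."""
    y = np.asarray(y, dtype=float)
    step = (y.max() - y.min()) / k
    bins = y.min() + step * np.arange(1, k + 1)  # upper bounds
    return np.minimum(np.digitize(y, bins, right=True), k - 1)


# Configure once (analogous in spirit to Classifier.make(k=3)) ...
classify_k3 = functools.partial(equal_interval_yb, k=3)

# ... then apply the same configuration to many columns
columns = [np.linspace(0, 9, 10), np.linspace(9, 0, 10)]
results = [classify_k3(col) for col in columns]
```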

Examples

>>> import pysal as ps
>>> df = ps.pdio.read_files(ps.examples.get_path('columbus.dbf'))
>>> classifier = ps.Quantiles.make(k=9)
>>> classifications = df[['HOVAL', 'CRIME', 'INC']].apply(classifier)
>>> classifications.head()
    HOVAL  CRIME   INC
0       8      0     7
1       7      1     8
2       2      3     5
3       4      4     0
4       1      6     3
>>> import pandas as pd; from numpy import linspace as lsp
>>> data = [lsp(3,8,num=10), lsp(10, 0, num=10), lsp(-5, 15, num=10)]
>>> data = pd.DataFrame(data).T
>>> data
         0          1          2
0 3.000000  10.000000  -5.000000
1 3.555556   8.888889  -2.777778
2 4.111111   7.777778  -0.555556
3 4.666667   6.666667   1.666667
4 5.222222   5.555556   3.888889
5 5.777778   4.444444   6.111111
6 6.333333   3.333333   8.333333
7 6.888889   2.222222  10.555556
8 7.444444   1.111111  12.777778
9 8.000000   0.000000  15.000000
>>> data.apply(ps.Quantiles.make(rolling=True))
    0   1   2
0   0   4   0
1   0   4   0
2   1   4   0
3   1   3   0
4   2   2   1
5   2   1   2
6   3   0   4
7   3   0   4
8   4   0   4
9   4   0   4
>>> dbf = ps.open(ps.examples.get_path('baltim.dbf'))
>>> data = dbf.by_col_array('PRICE', 'LOTSZ', 'SQFT')
>>> my_bins = [1, 10, 20, 40, 80]
>>> classifications = [ps.User_Defined.make(bins=my_bins)(a) for a in data.T]
>>> len(classifications)
3
>>> print(classifications)
[array([4, 5, 5, 5, 4, 4, 5, 4, 4, 5, 4, 4, 4, 4, 4, 1, 2, 2, 3, 4, 4, 3, 3,
    ...
    2, 2, 2, 2])]
update(y=None, inplace=False, **kwargs)[source]

Add data or change classification parameters.

Parameters:
  • y (array) – (n,1) array of data to classify
  • inplace (bool) – whether to conduct the update in place or to return a copy estimated from the additional specifications.
  • **kwargs – additional parameters passed to the __init__ function of the class. For documentation, check the class constructor.
class pysal.esda.mapclassify.Std_Mean(y, multiples=[-2, -1, 1, 2])[source]

Standard Deviation and Mean Map Classification

Parameters:
  • y (array) – (n,1), values to classify
  • multiples (array) – the multiples of the standard deviation to add/subtract from the sample mean to define the bins, default=[-2,-1,1,2]
yb

array – (n,1), bin ids for observations.

bins

array – (k,1), the upper bounds of each class

k

int – the number of classes

counts

array – (k,1), the number of observations falling in each class

Examples

>>> cal = load_example()
>>> st = Std_Mean(cal)
>>> st.k
5
>>> st.bins
array([ -967.36235382,  -420.71712519,   672.57333208,  1219.21856072,
        4111.45      ])
>>> st.counts
array([ 0,  0, 56,  1,  1])
>>>
>>> st3 = Std_Mean(cal, multiples = [-3, -1.5, 1.5, 3])
>>> st3.bins
array([-1514.00758246,  -694.03973951,   945.8959464 ,  1765.86378936,
        4111.45      ])
>>> st3.counts
array([ 0,  0, 57,  0,  1])
>>>
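The Std_Mean bounds are the sample mean shifted by multiples of the standard deviation. Both st.bins and st3.bins above end at 4111.45 (the maximum of cal), which suggests max(y) is appended when the largest cut falls below it. A minimal sketch under that reading; `std_mean_bins` is a hypothetical helper, and using the sample standard deviation (ddof=1) is an assumption.

```python
import numpy as np


def std_mean_bins(y, multiples=(-2, -1, 1, 2)):
    """Sketch: upper bounds at mean + m * sd, plus max(y) if needed."""
    y = np.asarray(y, dtype=float)
    m, s = y.mean(), y.std(ddof=1)  # ddof=1 (sample sd) is an assumption
    cuts = [m + s * w for w in multiples]
    if cuts[-1] < y.max():
        cuts.append(y.max())  # make sure every value falls in some class
    return np.array(cuts)


bins = std_mean_bins([1, 2, 3, 4, 5], multiples=(-1, 1))
```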
find_bin(x)

Classify an input value or array of values according to the current bin estimate

Parameters:x (array or numeric) – a value or array of values to fit within the estimated bins
Returns: a bin index or array of bin indices that classify the input into one of the classifier’s bins
get_adcm()

Absolute deviation around class median (ADCM).

Calculates the absolute deviations of each observation about its class median as a measure of fit for the classification method.

Returns the sum of ADCM over all classes.

get_gadf()

Goodness of absolute deviation fit (GADF)

get_tss()

Total sum of squares around class means

Returns the sum of squared deviations of each observation from its class mean, summed over all classes.

make(*args, **kwargs)

Configure and create a classifier that will consume data and produce classifications, given the configuration options specified by this function.

Note that this is like a partial application of the relevant class constructor: make creates a function that returns classifications; it does not actually do the classification.

If you want to classify data directly, use the appropriate class constructor, like Quantiles, Max_Breaks, etc.

If you have a classifier object, but want to find which bins new data falls into, use find_bin.

Parameters:
  • *args (required positional arguments) – all positional arguments required by the classifier, excluding the input data.
  • rolling (bool) – whether the returned function uses a rolling classifier rather than a new classifier for each input. If rolling, each call adds the current data to all of the previous data in the classifier and rebalances the bins, like a running median computation.
  • return_object (bool) – whether the returned function also returns the classifier object
  • return_bins (bool) – whether the returned function also returns the bins/breaks
  • return_counts (bool) – whether the returned function also returns the histogram of observations falling into each bin
Returns:

  • A function that consumes data and returns their bins (and the classifier object, bins/breaks, or counts, if requested).

Note

This is most useful when you want to run a classifier many times with a given configuration, such as when classifying many columns of an array or dataframe using the same configuration.

Examples

>>> import pysal as ps
>>> df = ps.pdio.read_files(ps.examples.get_path('columbus.dbf'))
>>> classifier = ps.Quantiles.make(k=9)
>>> classifications = df[['HOVAL', 'CRIME', 'INC']].apply(classifier)
>>> classifications.head()
    HOVAL  CRIME   INC
0       8      0     7
1       7      1     8
2       2      3     5
3       4      4     0
4       1      6     3
>>> import pandas as pd; from numpy import linspace as lsp
>>> data = [lsp(3,8,num=10), lsp(10, 0, num=10), lsp(-5, 15, num=10)]
>>> data = pd.DataFrame(data).T
>>> data
         0          1          2
0 3.000000  10.000000  -5.000000
1 3.555556   8.888889  -2.777778
2 4.111111   7.777778  -0.555556
3 4.666667   6.666667   1.666667
4 5.222222   5.555556   3.888889
5 5.777778   4.444444   6.111111
6 6.333333   3.333333   8.333333
7 6.888889   2.222222  10.555556
8 7.444444   1.111111  12.777778
9 8.000000   0.000000  15.000000
>>> data.apply(ps.Quantiles.make(rolling=True))
    0   1   2
0   0   4   0
1   0   4   0
2   1   4   0
3   1   3   0
4   2   2   1
5   2   1   2
6   3   0   4
7   3   0   4
8   4   0   4
9   4   0   4
>>> dbf = ps.open(ps.examples.get_path('baltim.dbf'))
>>> data = dbf.by_col_array('PRICE', 'LOTSZ', 'SQFT')
>>> my_bins = [1, 10, 20, 40, 80]
>>> classifications = [ps.User_Defined.make(bins=my_bins)(a) for a in data.T]
>>> len(classifications)
3
>>> print(classifications)
[array([4, 5, 5, 5, 4, 4, 5, 4, 4, 5, 4, 4, 4, 4, 4, 1, 2, 2, 3, 4, 4, 3, 3,
    ...
    2, 2, 2, 2])]
update(y=None, inplace=False, **kwargs)[source]

Add data or change classification parameters.

Parameters:
  • y (array) – (n,1) array of data to classify
  • inplace (bool) – whether to conduct the update in place or to return a copy estimated from the additional specifications.
  • **kwargs – additional parameters passed to the __init__ function of the class. For documentation, check the class constructor.
class pysal.esda.mapclassify.User_Defined(y, bins)[source]

User Specified Binning

Parameters:
  • y (array) – (n,1), values to classify
  • bins (array) – (k,1), upper bounds of classes (must be monotonically increasing)
yb

array – (n,1), bin ids for observations.

bins

array – (k,1), the upper bounds of each class

k

int – the number of classes

counts

array – (k,1), the number of observations falling in each class

Examples

>>> cal = load_example()
>>> bins = [20, max(cal)]
>>> bins
[20, 4111.4499999999998]
>>> ud = User_Defined(cal, bins)
>>> ud.bins
array([   20.  ,  4111.45])
>>> ud.counts
array([37, 21])
>>> bins = [20, 30]
>>> ud = User_Defined(cal, bins)
>>> ud.bins
array([   20.  ,    30.  ,  4111.45])
>>> ud.counts
array([37,  4, 17])
>>>

Notes

If the largest user-supplied upper bound is less than max(y), max(y) is appended as an additional bin.
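The note above, together with the requirement that bins increase monotonically, can be sketched as follows. `user_defined_classify` is a hypothetical helper rather than the library's code, and raising ValueError for non-increasing bins is an assumption about how invalid input might be rejected.

```python
import numpy as np


def user_defined_classify(y, bins):
    """Sketch: classify y with user bins, appending max(y) if needed."""
    y = np.asarray(y, dtype=float)
    bins = np.asarray(bins, dtype=float)
    if np.any(np.diff(bins) <= 0):
        raise ValueError("bins must be monotonically increasing")
    if bins[-1] < y.max():
        bins = np.append(bins, y.max())  # extra top class covers the rest
    yb = np.searchsorted(bins, y, side="left")  # C_j^l < y_i <= C_j^u
    counts = np.bincount(yb, minlength=len(bins))
    return bins, yb, counts


bins, yb, counts = user_defined_classify([5, 15, 25, 35], [10, 20])
```

As in the second User_Defined example, two user bounds below the maximum produce three classes.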

find_bin(x)

Classify an input value or array of values according to the current bin estimate

Parameters:x (array or numeric) – a value or array of values to fit within the estimated bins
Returns: a bin index or array of bin indices that classify the input into one of the classifier’s bins
get_adcm()

Absolute deviation around class median (ADCM).

Calculates the absolute deviations of each observation about its class median as a measure of fit for the classification method.

Returns the sum of ADCM over all classes.

get_gadf()

Goodness of absolute deviation fit (GADF)

get_tss()

Total sum of squares around class means

Returns the sum of squared deviations of each observation from its class mean, summed over all classes.

make(*args, **kwargs)

Configure and create a classifier that will consume data and produce classifications, given the configuration options specified by this function.

Note that this is like a partial application of the relevant class constructor: make creates a function that returns classifications; it does not actually do the classification.

If you want to classify data directly, use the appropriate class constructor, like Quantiles, Max_Breaks, etc.

If you have a classifier object, but want to find which bins new data falls into, use find_bin.

Parameters:
  • *args (required positional arguments) – all positional arguments required by the classifier, excluding the input data.
  • rolling (bool) – whether the returned function uses a rolling classifier rather than a new classifier for each input. If rolling, each call adds the current data to all of the previous data in the classifier and rebalances the bins, like a running median computation.
  • return_object (bool) – whether the returned function also returns the classifier object
  • return_bins (bool) – whether the returned function also returns the bins/breaks
  • return_counts (bool) – whether the returned function also returns the histogram of observations falling into each bin
Returns:

  • A function that consumes data and returns their bins (and the classifier object, bins/breaks, or counts, if requested).

Note

This is most useful when you want to run a classifier many times with a given configuration, such as when classifying many columns of an array or dataframe using the same configuration.

Examples

>>> import pysal as ps
>>> df = ps.pdio.read_files(ps.examples.get_path('columbus.dbf'))
>>> classifier = ps.Quantiles.make(k=9)
>>> classifications = df[['HOVAL', 'CRIME', 'INC']].apply(classifier)
>>> classifications.head()
    HOVAL  CRIME   INC
0       8      0     7
1       7      1     8
2       2      3     5
3       4      4     0
4       1      6     3
>>> import pandas as pd; from numpy import linspace as lsp
>>> data = [lsp(3,8,num=10), lsp(10, 0, num=10), lsp(-5, 15, num=10)]
>>> data = pd.DataFrame(data).T
>>> data
         0          1          2
0 3.000000  10.000000  -5.000000
1 3.555556   8.888889  -2.777778
2 4.111111   7.777778  -0.555556
3 4.666667   6.666667   1.666667
4 5.222222   5.555556   3.888889
5 5.777778   4.444444   6.111111
6 6.333333   3.333333   8.333333
7 6.888889   2.222222  10.555556
8 7.444444   1.111111  12.777778
9 8.000000   0.000000  15.000000
>>> data.apply(ps.Quantiles.make(rolling=True))
    0   1   2
0   0   4   0
1   0   4   0
2   1   4   0
3   1   3   0
4   2   2   1
5   2   1   2
6   3   0   4
7   3   0   4
8   4   0   4
9   4   0   4
>>> dbf = ps.open(ps.examples.get_path('baltim.dbf'))
>>> data = dbf.by_col_array('PRICE', 'LOTSZ', 'SQFT')
>>> my_bins = [1, 10, 20, 40, 80]
>>> classifications = [ps.User_Defined.make(bins=my_bins)(a) for a in data.T]
>>> len(classifications)
3
>>> print(classifications)
[array([4, 5, 5, 5, 4, 4, 5, 4, 4, 5, 4, 4, 4, 4, 4, 1, 2, 2, 3, 4, 4, 3, 3,
    ...
    2, 2, 2, 2])]
update(y=None, inplace=False, **kwargs)[source]

Add data or change classification parameters.

Parameters:
  • y (array) – (n,1) array of data to classify
  • inplace (bool) – whether to conduct the update in place or to return a copy estimated from the additional specifications.
  • **kwargs – additional parameters passed to the __init__ function of the class. For documentation, check the class constructor.
pysal.esda.mapclassify.gadf(y, method='Quantiles', maxk=15, pct=0.8)[source]

Evaluate the Goodness of Absolute Deviation Fit (GADF) of a classifier and find the minimum value of k for which gadf > pct.

Parameters:
  • y (array) – (n, 1) values to be classified
  • method ({'Quantiles', 'Fisher_Jenks', 'Maximum_Breaks', 'Natural_Breaks'}) – name of the classification method to evaluate
  • maxk (int) – maximum value of k to evaluate
  • pct (float) – the GADF threshold to exceed, given as a proportion (e.g. 0.8)
Returns:

  • k (int) – number of classes
  • cl (object) – instance of the classifier at k
  • gadf (float) – goodness of absolute deviation fit

Examples

>>> cal = load_example()
>>> qgadf = gadf(cal)
>>> qgadf[0]
15
>>> qgadf[-1]
0.37402575909092828

With Quantiles, the GADF does not exceed 0.80 for any k up to maxk=15, so the search stops at k=15 with a GADF of only 0.374. If we lower the bar to 0.2, we get quintiles (k=5):

>>> qgadf2 = gadf(cal, pct = 0.2)
>>> qgadf2[0]
5
>>> qgadf2[-1]
0.21710231966462412
>>>

Notes

The GADF is defined as:

GADF = 1 - \frac{\sum_c \sum_{i \in c} |y_i - y_{c,med}|}{\sum_i |y_i - y_{med}|}

where y_{med} is the global median and y_{c,med} is the median for class c.
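The formula can be evaluated directly from data y and bin ids yb: the numerator is the ADCM of the classification and the denominator is the absolute deviation around the global median. A sketch; `gadf_value` is a hypothetical helper that scores one given classification and does not perform the search over k that gadf carries out.

```python
import numpy as np


def gadf_value(y, yb):
    """GADF = 1 - ADCM / (sum of absolute deviations from the global median)."""
    y = np.asarray(y, dtype=float)
    yb = np.asarray(yb)
    adcm = sum(
        np.abs(y[yb == c] - np.median(y[yb == c])).sum() for c in np.unique(yb)
    )
    adam = np.abs(y - np.median(y)).sum()
    return 1.0 - adcm / adam


g = gadf_value([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1])
```

Here ADCM is 4 and the global median is 6.5, giving a denominator of 27 and GADF = 1 - 4/27 ≈ 0.852.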

See also

K_classifiers

class pysal.esda.mapclassify.K_classifiers(y, pct=0.8)[source]

Evaluate all k-classifiers and pick the optimal one based on k and GADF

Parameters:
  • y (array) – (n,1), values to be classified
  • pct (float) – the GADF threshold to exceed, given as a proportion (e.g. 0.8)
best

object – instance of the optimal Map_Classifier

results

dictionary – keys are classifier names; values are the Map_Classifier instances with the best pct for each classifier

Examples

>>> cal = load_example()
>>> ks = K_classifiers(cal)
>>> ks.best.name
'Fisher_Jenks'
>>> ks.best.k
4
>>> ks.best.gadf
0.84810327199081048
>>>

Notes

This can be used to suggest a classification scheme.
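The exact selection rule is not spelled out above. The sketch assumes "fewest classes wins, ties broken by the higher GADF", which fits the stated aim of choosing the optimum based on k and GADF but is an assumption; `pick_best` is a hypothetical helper and the candidate values are made up for illustration.

```python
def pick_best(results):
    """results maps a method name to (k, gadf); smallest k wins, ties go to larger GADF."""
    return min(results, key=lambda name: (results[name][0], -results[name][1]))


# Made-up (k, gadf) pairs for three hypothetical methods
candidates = {"A": (5, 0.81), "B": (4, 0.82), "C": (4, 0.90)}
best = pick_best(candidates)
```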

See also

gadf

class pysal.esda.mapclassify.HeadTail_Breaks(y)[source]

Head/tail Breaks Map Classification for Heavy-tailed Distributions

Parameters:y (array) – (n,1), values to classify
yb

array – (n,1), bin ids for observations.

bins

array – (k,1), the upper bounds of each class

k

int – the number of classes

counts

array – (k,1), the number of observations falling in each class

Examples

>>> import numpy as np
>>> np.random.seed(10)
>>> cal = load_example()
>>> htb = HeadTail_Breaks(cal)
>>> htb.k
3
>>> htb.counts
array([50,  7,  1])
>>> htb.bins
array([  125.92810345,   811.26      ,  4111.45      ])
>>> np.random.seed(123456)
>>> x = np.random.lognormal(3, 1, 1000)
>>> htb = HeadTail_Breaks(x)
>>> htb.bins
array([  32.26204423,   72.50205622,  128.07150107,  190.2899093 ,
        264.82847377,  457.88157946,  576.76046949])
>>> htb.counts
array([695, 209,  62,  22,  10,   1,   1])

Notes

Head/tail Breaks is a relatively new classification method developed and introduced by [Jiang2013] for data with a heavy-tailed distribution.

Based on contributions by Alessandra Sozzi <alessandra.sozzi@gmail.com>.
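Head/tail breaks repeatedly splits the data at the mean and continues with the "head" (the values above the mean). The sketch below is an iterative version of that idea; `head_tail_bins` is a hypothetical helper, and stopping once the head holds a single distinct value is an assumption. It is consistent with the first example, where the first bound (125.93) equals the mean of cal implied by the symmetric Std_Mean bins and the last bound (4111.45) equals max(cal).

```python
import numpy as np


def head_tail_bins(values):
    """Sketch: cut at the mean, keep the head (values > mean), repeat."""
    values = np.asarray(values, dtype=float)
    cuts = []
    while True:
        mean = values.mean()
        cuts.append(mean)
        if len(np.unique(values)) <= 1:  # stopping rule: assumption
            break
        values = values[values > mean]  # the head is never empty here
    return np.array(cuts)


bins = head_tail_bins([1, 2, 3, 4, 10])
rng = np.random.default_rng(0)
heavy = rng.lognormal(3, 1, 1000)  # heavy-tailed sample
heavy_bins = head_tail_bins(heavy)
```

Each successive cut is the mean of a strictly smaller, higher-valued head, so the bounds increase and, under this stopping rule, end at the maximum.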

find_bin(x)

Classify an input value or array of values according to the current bin estimate

Parameters:x (array or numeric) – a value or array of values to fit within the estimated bins
Returns: a bin index or array of bin indices that classify the input into one of the classifier’s bins
get_adcm()

Absolute deviation around class median (ADCM).

Calculates the absolute deviations of each observation about its class median as a measure of fit for the classification method.

Returns the sum of ADCM over all classes.

get_gadf()

Goodness of absolute deviation fit (GADF)

get_tss()

Total sum of squares around class means

Returns the sum of squared deviations of each observation from its class mean, summed over all classes.

make(*args, **kwargs)

Configure and create a classifier that will consume data and produce classifications, given the configuration options specified by this function.

Note that this is like a partial application of the relevant class constructor: make creates a function that returns classifications; it does not actually do the classification.

If you want to classify data directly, use the appropriate class constructor, like Quantiles, Max_Breaks, etc.

If you have a classifier object, but want to find which bins new data falls into, use find_bin.

Parameters:
  • *args (required positional arguments) – all positional arguments required by the classifier, excluding the input data.
  • rolling (bool) – whether the returned function uses a rolling classifier rather than a new classifier for each input. If rolling, each call adds the current data to all of the previous data in the classifier and rebalances the bins, like a running median computation.
  • return_object (bool) – whether the returned function also returns the classifier object
  • return_bins (bool) – whether the returned function also returns the bins/breaks
  • return_counts (bool) – whether the returned function also returns the histogram of observations falling into each bin
Returns:

  • A function that consumes data and returns their bins (and the classifier object, bins/breaks, or counts, if requested).

Note

This is most useful when you want to run a classifier many times with a given configuration, such as when classifying many columns of an array or dataframe using the same configuration.

Examples

>>> import pysal as ps
>>> df = ps.pdio.read_files(ps.examples.get_path('columbus.dbf'))
>>> classifier = ps.Quantiles.make(k=9)
>>> classifications = df[['HOVAL', 'CRIME', 'INC']].apply(classifier)
>>> classifications.head()
    HOVAL  CRIME   INC
0       8      0     7
1       7      1     8
2       2      3     5
3       4      4     0
4       1      6     3
>>> import pandas as pd; from numpy import linspace as lsp
>>> data = [lsp(3,8,num=10), lsp(10, 0, num=10), lsp(-5, 15, num=10)]
>>> data = pd.DataFrame(data).T
>>> data
         0          1          2
0 3.000000  10.000000  -5.000000
1 3.555556   8.888889  -2.777778
2 4.111111   7.777778  -0.555556
3 4.666667   6.666667   1.666667
4 5.222222   5.555556   3.888889
5 5.777778   4.444444   6.111111
6 6.333333   3.333333   8.333333
7 6.888889   2.222222  10.555556
8 7.444444   1.111111  12.777778
9 8.000000   0.000000  15.000000
>>> data.apply(ps.Quantiles.make(rolling=True))
    0   1   2
0   0   4   0
1   0   4   0
2   1   4   0
3   1   3   0
4   2   2   1
5   2   1   2
6   3   0   4
7   3   0   4
8   4   0   4
9   4   0   4
>>> dbf = ps.open(ps.examples.get_path('baltim.dbf'))
>>> data = dbf.by_col_array('PRICE', 'LOTSZ', 'SQFT')
>>> my_bins = [1, 10, 20, 40, 80]
>>> classifications = [ps.User_Defined.make(bins=my_bins)(a) for a in data.T]
>>> len(classifications)
3
>>> print(classifications)
[array([4, 5, 5, 5, 4, 4, 5, 4, 4, 5, 4, 4, 4, 4, 4, 1, 2, 2, 3, 4, 4, 3, 3,
    ...
    2, 2, 2, 2])]
update(y=None, inplace=False, **kwargs)

Add data or change classification parameters.

Parameters:
  • y (array) – (n,1) array of data to classify
  • inplace (bool) – whether to conduct the update in place or to return a copy estimated from the additional specifications.
  • **kwargs – additional parameters passed to the __init__ function of the class. For documentation, check the class constructor.