# pysal.viz.mapclassify.Jenks_Caspall_Sampled

class pysal.viz.mapclassify.Jenks_Caspall_Sampled(y, k=5, pct=0.1)

Jenks Caspall Map Classification using a random sample

Parameters:

- y : array (n,1), values to classify
- k : int, number of classes required
- pct : float, the fraction of n that should form the sample (the default 0.1 means 10% of the observations). If pct is specified such that n*pct > 1000, then pct = 1000./n
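The cap on pct can be illustrated with a small sketch. The helper name `effective_sample` and the `cap` argument are hypothetical, and truncating n*pct to an integer is an assumption about how the sample size is derived; only the rule "if n*pct > 1000, then pct = 1000./n" comes from the description above.

```python
def effective_sample(n, pct=0.1, cap=1000):
    """Return (pct, sample_size) after applying the documented cap.

    Hypothetical helper, not part of pysal. The integer truncation of
    n * pct is an assumption made for illustration.
    """
    if n * pct > cap:
        # Documented rule: shrink pct so that the sample holds about `cap` values.
        pct = float(cap) / n
    return pct, int(n * pct)


if __name__ == "__main__":
    for n in (5000, 100000):
        print(n, effective_sample(n))
```

With the default pct=0.1, any y with more than 10,000 observations is therefore sampled at roughly 1000 values, regardless of n.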

Notes

This classifier is intended for problems with large n. Jenks_Caspall is applied to a random subset of y, and the complete vector y is then binned using the class bounds obtained from that subset. This trades some “accuracy” for a gain in speed.
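The sample-then-bin logic described in these notes can be sketched in plain Python. This is a minimal illustration, not the library's implementation: quantile breaks stand in for the Jenks Caspall optimisation, the function names are hypothetical, and forcing the maximum of y into the sample (so the top class bound covers every observation) is an assumption.

```python
import bisect
import random


def sample_breaks(y, k=5, pct=0.1, cap=1000, seed=None):
    """Estimate k upper class bounds from a random sample of y.

    Stand-in technique: quantile breaks on the sample, NOT Jenks Caspall.
    """
    rng = random.Random(seed)
    n = len(y)
    if n * pct > cap:
        pct = float(cap) / n              # documented cap on the sample size
    m = max(k, int(n * pct))              # keep at least k sampled values
    sample = [y[rng.randrange(n)] for _ in range(m)]
    sample[0] = max(y)                    # assumption: make sure the top bound is max(y)
    sample.sort()
    # Upper bound of each of the k quantile groups in the sorted sample.
    return [sample[(i * m) // k - 1] for i in range(1, k + 1)]


def bin_all(y, bins):
    """Assign every value in y to the first class whose upper bound is >= value."""
    yb = [bisect.bisect_left(bins, v) for v in y]
    counts = [0] * len(bins)
    for b in yb:
        counts[b] += 1
    return yb, counts


if __name__ == "__main__":
    data_rng = random.Random(0)
    y = [data_rng.random() for _ in range(100000)]
    bins = sample_breaks(y, k=5, seed=1)
    yb, counts = bin_all(y, bins)
    print(bins)
    print(counts)
```

Only the sampled values enter the (potentially expensive) break-finding step; the full vector is touched once, in the cheap binning pass.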

Examples

>>> import numpy as np
>>> import pysal.viz.mapclassify as mc
>>> x = np.random.random(100000)
>>> jc = mc.Jenks_Caspall(x)
>>> jcs = mc.Jenks_Caspall_Sampled(x)
>>> jc.bins
array([0.1988721 , 0.39624334, 0.59441487, 0.79624357, 0.99999251])
>>> jcs.bins
array([0.20998558, 0.42112792, 0.62752937, 0.80543819, 0.99999251])
>>> jc.counts
array([19943, 19510, 19547, 20297, 20703])
>>> jcs.counts
array([21039, 20908, 20425, 17813, 19815])


The following timing comparison is not run as a test, since times differ across hardware. It is included only to document the likely speed gain:

>>> import time
>>> t1 = time.time(); jc = mc.Jenks_Caspall(x); t2 = time.time()
>>> t1s = time.time(); jcs = mc.Jenks_Caspall_Sampled(x); t2s = time.time()
>>> t2 - t1; t2s - t1s
1.8292930126190186
0.061631917953491211

Attributes:

- yb : array (n,1), bin ids for observations
- bins : array (k,1), the upper bounds of each class
- k : int, the number of classes
- counts : array (k,1), the number of observations falling in each class

Methods

| Method | Description |
| --- | --- |
| `__call__(*args, **kwargs)` | This will allow the classifier to be called like it’s a function. |
| `find_bin(x)` | Sort input or inputs according to the current bin estimate |
| `get_adcm()` | Absolute deviation around class median (ADCM). |
| `get_gadf()` | Goodness of absolute deviation of fit |
| `get_tss()` | Total sum of squares around class means |
| `make(*args, **kwargs)` | Configure and create a classifier that will consume data and produce classifications, given the configuration options specified by this function. |
| `update([y, inplace])` | Add data or change classification parameters. |

__init__(y, k=5, pct=0.1)

Initialize self. See help(type(self)) for accurate signature.

Methods

| Method | Description |
| --- | --- |
| `__init__(y[, k, pct])` | Initialize self. |
| `find_bin(x)` | Sort input or inputs according to the current bin estimate |
| `get_adcm()` | Absolute deviation around class median (ADCM). |
| `get_gadf()` | Goodness of absolute deviation of fit |
| `get_tss()` | Total sum of squares around class means |
| `make(*args, **kwargs)` | Configure and create a classifier that will consume data and produce classifications, given the configuration options specified by this function. |
| `update([y, inplace])` | Add data or change classification parameters. |
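The fit measures named by get_adcm(), get_tss() and get_gadf() can be sketched from their one-line descriptions. The functions below are hypothetical stand-ins written with the standard library, not the pysal methods. In particular, computing GADF as one minus the ratio of ADCM to the absolute deviation around the overall median is an assumption about the definition, not something stated on this page.

```python
from statistics import mean, median


def group_by_class(y, yb, k):
    """Collect the values of y into k lists according to their bin ids yb."""
    groups = [[] for _ in range(k)]
    for value, bin_id in zip(y, yb):
        groups[bin_id].append(value)
    return groups


def adcm(y, yb, k):
    """Sum of absolute deviations of each value from its class median."""
    total = 0.0
    for group in group_by_class(y, yb, k):
        if group:
            centre = median(group)
            total += sum(abs(v - centre) for v in group)
    return total


def tss(y, yb, k):
    """Sum of squared deviations of each value from its class mean."""
    total = 0.0
    for group in group_by_class(y, yb, k):
        if group:
            centre = mean(group)
            total += sum((v - centre) ** 2 for v in group)
    return total


def gadf(y, yb, k):
    """Assumed definition: 1 - ADCM / (absolute deviation around the overall median)."""
    centre = median(y)
    adam = sum(abs(v - centre) for v in y)
    return 1.0 - adcm(y, yb, k) / adam


if __name__ == "__main__":
    y = [1, 2, 3, 10, 11, 12]
    yb = [0, 0, 0, 1, 1, 1]
    print(adcm(y, yb, 2), tss(y, yb, 2), gadf(y, yb, 2))
```

Under these definitions, lower ADCM and TSS and a GADF closer to 1 indicate classes that are more internally homogeneous, which gives one way to compare a sampled classification against the full one.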