pysal.viz.mapclassify.Jenks_Caspall_Sampled

class pysal.viz.mapclassify.Jenks_Caspall_Sampled(y, k=5, pct=0.1)

Jenks-Caspall map classification using a random sample.

Parameters:
y : array

(n,1), values to classify

k : int

number of classes required

pct : float

The fraction of n (e.g. 0.1 for 10%) that should form the sample. If pct is specified such that n*pct > 1000, then pct is reset to 1000./n, so that roughly 1000 observations are sampled at most.
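Written out, the cap on the sampling fraction is a restatement of the rule for pct. The sample size m is only approximately n·pct′ because it must be a whole number:

```latex
\mathrm{pct}' = \min\!\left(\mathrm{pct},\ \frac{1000}{n}\right),
\qquad
m \approx n \cdot \mathrm{pct}' = \min\left(n \cdot \mathrm{pct},\ 1000\right)
```

For example, take n = 100000 with the default pct = 0.1. Then n·pct = 10000 > 1000, so pct′ = 1000/100000 = 0.01 and about 1000 observations are sampled.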

Notes

This is intended for problems with large n. The logic is to apply Jenks_Caspall to a random subset of y and then bin the complete vector y using the bins obtained from that subset. This trades some accuracy for a gain in speed.
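The strategy above can be illustrated with a minimal pure-Python sketch. It is not the library's code, and every helper name in it is hypothetical. The sketch makes several assumptions that are not stated above:

* The sample is drawn with replacement.
* The minimum and maximum of y are forced into the sample so the bins span all of y. This is consistent with jc.bins and jcs.bins sharing their last value in the Examples.
* The Jenks-Caspall step is approximated by a simplified loop. It starts from quantile classes and repeatedly moves each value to the class with the nearest median.

```python
import bisect
import random
import statistics


def assign(values, bins):
    """Class id per value: index of the first upper bound >= value.

    `bins` must be sorted ascending; values above the last bound are
    clamped into the last class.
    """
    last = len(bins) - 1
    return [min(bisect.bisect_left(bins, v), last) for v in values]


def _group(values, labels, k):
    """Collect values into k lists by class label."""
    groups = [[] for _ in range(k)]
    for v, c in zip(values, labels):
        groups[c].append(v)
    return groups


def jenks_caspall_like(values, k, max_iter=100):
    """Simplified Jenks-Caspall-style refinement (hypothetical helper).

    Starts from quantile classes, then moves each value to the class whose
    median is nearest until no value changes class (or max_iter is hit).
    Returns the sorted upper bound (maximum) of each non-empty class.
    """
    s = sorted(values)
    n = len(s)
    # Initial upper bounds at the k quantile ranks.
    bins = [s[max(0, (i + 1) * n // k - 1)] for i in range(k)]
    labels = assign(values, bins)
    for _ in range(max_iter):
        medians = [statistics.median(g) for g in _group(values, labels, k) if g]
        # Reassign each value to the nearest class median.
        new_labels = [
            min(range(len(medians)), key=lambda j: abs(v - medians[j]))
            for v in values
        ]
        if new_labels == labels:
            break
        labels = new_labels
    return sorted(max(g) for g in _group(values, labels, k) if g)


def jenks_caspall_sampled(y, k=5, pct=0.1, seed=None):
    """Classify all of y using bins estimated from a random sample of y."""
    n = len(y)
    if n * pct > 1000:  # the cap described under `pct`
        pct = 1000.0 / n
    rng = random.Random(seed)
    m = max(k, int(n * pct))  # sketch assumption: sample at least k values
    sample = [y[rng.randrange(n)] for _ in range(m)]  # with replacement
    # Sketch assumption: force the extremes in so the bins span all of y.
    sample[0], sample[-1] = min(y), max(y)
    bins = jenks_caspall_like(sample, k)
    yb = assign(y, bins)  # bin the complete vector on the sample's bins
    counts = [0] * len(bins)
    for c in yb:
        counts[c] += 1
    return bins, yb, counts


if __name__ == "__main__":
    gen = random.Random(0)
    y = [gen.random() for _ in range(100_000)]
    bins, yb, counts = jenks_caspall_sampled(y, k=5, seed=1)
    print(bins)
    print(counts, sum(counts))
```

The iterative step only ever sees about 1000 values, so its cost does not grow with n. Only the final binning pass, a binary search per observation, touches every value of y.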

Examples

>>> import numpy as np
>>> import pysal.viz.mapclassify as mc
>>> x = np.random.random(100000)  # unseeded, so the values below vary between runs
>>> jc = mc.Jenks_Caspall(x)
>>> jcs = mc.Jenks_Caspall_Sampled(x)
>>> jc.bins
array([0.1988721 , 0.39624334, 0.59441487, 0.79624357, 0.99999251])
>>> jcs.bins
array([0.20998558, 0.42112792, 0.62752937, 0.80543819, 0.99999251])
>>> jc.counts
array([19943, 19510, 19547, 20297, 20703])
>>> jcs.counts
array([21039, 20908, 20425, 17813, 19815])

# Not run as a test, since timings differ across hardware; included only
# to document the likely speed gain:
#>>> import time
#>>> t1 = time.time(); jc = mc.Jenks_Caspall(x); t2 = time.time()
#>>> t1s = time.time(); jcs = mc.Jenks_Caspall_Sampled(x); t2s = time.time()
#>>> t2 - t1; t2s - t1s
#1.8292930126190186
#0.061631917953491211

Attributes:
yb : array

(n,1), bin ids for observations.

bins : array

(k,1), the upper bounds of each class

k : int

the number of classes

counts : array

(k,1), the number of observations falling in each class
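The attributes can be stated as formulas. This restatement assumes zero-based class ids and that bins is sorted ascending, so class j covers bins_{j-1} < y ≤ bins_j. It also assumes the last upper bound is at least max(y):

```latex
\mathrm{yb}_i = \min\{\, j : y_i \le \mathrm{bins}_j \,\},
\qquad
\mathrm{counts}_j = \#\{\, i : \mathrm{yb}_i = j \,\},
\qquad
\sum_{j=0}^{k-1} \mathrm{counts}_j = n
```

In the sampled classifier, bins is estimated from the sample, but yb and counts cover all n observations. For this reason jcs.counts in the Examples sums to 100000.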

Methods

__call__(*args, **kwargs) Allow the classifier to be called as if it were a function.
find_bin(x) Assign input or inputs to bins according to the current bin estimate (returns bin indices)
get_adcm() Absolute deviation around class median (ADCM).
get_gadf() Goodness of absolute deviation of fit
get_tss() Total sum of squares around class means
make(*args, **kwargs) Configure and create a classifier that will consume data and produce classifications, given the configuration options specified by this function.
update([y, inplace]) Add data or change classification parameters.
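The three fit measures can be sketched in plain Python under some assumed definitions. Class memberships are taken from yb. GADF is taken to be 1 − ADCM/ADAM, where ADAM is the absolute deviation around the median of all observations. This restates the usual definitions rather than reproducing the library's get_tss, get_adcm, and get_gadf, and the function names below are hypothetical.

```python
import statistics


def class_groups(y, yb):
    """Group the observations in y by their class id in yb."""
    groups = {}
    for v, c in zip(y, yb):
        groups.setdefault(c, []).append(v)
    return list(groups.values())


def tss(y, yb):
    """Total sum of squares around class means."""
    total = 0.0
    for g in class_groups(y, yb):
        mean = sum(g) / len(g)
        total += sum((v - mean) ** 2 for v in g)
    return total


def adcm(y, yb):
    """Absolute deviation around class medians."""
    total = 0.0
    for g in class_groups(y, yb):
        med = statistics.median(g)
        total += sum(abs(v - med) for v in g)
    return total


def gadf(y, yb):
    """Goodness of absolute deviation of fit: 1 - ADCM / ADAM.

    ADAM (assumed definition) is the absolute deviation around the
    median of all of y.
    """
    med = statistics.median(y)
    adam = sum(abs(v - med) for v in y)
    return 1.0 - adcm(y, yb) / adam


if __name__ == "__main__":
    y = [1, 2, 3, 10, 11, 12]
    yb = [0, 0, 0, 1, 1, 1]
    print(tss(y, yb), adcm(y, yb), round(gadf(y, yb), 4))  # prints: 4.0 4.0 0.8519
```

A GADF closer to 1 means the classes leave less absolute deviation unexplained than a single class around the overall median would.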