esda.smoothing — Smoothing of spatial rates

New in version 1.0.

class pysal.esda.smoothing.Excess_Risk(e, b)[source]

Excess Risk

Parameters:
  • e (array (n, 1)) – event variable measured across n spatial units
  • b (array (n, 1)) – population at risk variable measured across n spatial units
r

array (n, 1) – excess risk values

Examples

Reading data in stl_hom.csv into stl to extract values for event and population-at-risk variables

>>> stl = pysal.open(pysal.examples.get_path('stl_hom.csv'), 'r')

The 11th and 14th columns in stl_hom.csv include the number of homicides and the population. Creating two arrays from these columns.

>>> stl_e, stl_b = np.array(stl[:,10]), np.array(stl[:,13])

Creating an instance of Excess_Risk class using stl_e and stl_b

>>> er = Excess_Risk(stl_e, stl_b)

Extracting the excess risk values through the property r of the Excess_Risk instance, er

>>> er.r[:10]
array([ 0.20665681,  0.43613787,  0.42078261,  0.22066928,  0.57981596,
        0.35301709,  0.56407549,  0.17020994,  0.3052372 ,  0.25821905])
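The computation behind these values can be sketched in plain NumPy. This is a sketch assuming the standard definition of excess risk as observed events over the events expected under the global rate, not PySAL's own implementation; the small e and b arrays here are hypothetical, not the stl data:

```python
import numpy as np

def excess_risk_sketch(e, b):
    """Excess risk: observed events divided by the events expected if
    the global rate sum(e)/sum(b) applied uniformly to every unit."""
    e, b = np.asarray(e, dtype=float), np.asarray(b, dtype=float)
    expected = b * (e.sum() / b.sum())  # expected events per unit
    return e / expected

e = np.array([10, 1, 3, 4, 2, 5])
b = np.array([100, 15, 20, 20, 80, 90])
r = excess_risk_sketch(e, b)
# by construction the population-weighted mean of r is 1
```

A value above 1 marks a unit with more events than its population share would predict.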
class pysal.esda.smoothing.Empirical_Bayes(e, b)[source]

Aspatial Empirical Bayes Smoothing

Parameters:
  • e (array (n, 1)) – event variable measured across n spatial units
  • b (array (n, 1)) – population at risk variable measured across n spatial units
r

array (n, 1) – rate values from Empirical Bayes Smoothing

Examples

Reading data in stl_hom.csv into stl to extract values for event and population-at-risk variables

>>> stl = pysal.open(pysal.examples.get_path('stl_hom.csv'), 'r')

The 11th and 14th columns in stl_hom.csv include the number of homicides and the population. Creating two arrays from these columns.

>>> stl_e, stl_b = np.array(stl[:,10]), np.array(stl[:,13])

Creating an instance of Empirical_Bayes class using stl_e and stl_b

>>> eb = Empirical_Bayes(stl_e, stl_b)

Extracting the risk values through the property r of the Empirical_Bayes instance, eb

>>> eb.r[:10]
array([  2.36718950e-05,   4.54539167e-05,   4.78114019e-05,
         2.76907146e-05,   6.58989323e-05,   3.66494122e-05,
         5.79952721e-05,   2.03064590e-05,   3.31152999e-05,
         3.02748380e-05])
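Aspatial EB smoothing shrinks each raw rate toward the global rate, with less shrinkage for units with larger populations at risk. The sketch below uses a method-of-moments prior, one common formulation; it is illustrative and not necessarily PySAL's exact estimator, and the data are hypothetical:

```python
import numpy as np

def eb_smooth_sketch(e, b):
    """Shrink raw rates e/b toward the global rate using a
    method-of-moments prior (illustrative formulation)."""
    e, b = np.asarray(e, dtype=float), np.asarray(b, dtype=float)
    raw = e / b
    m = e.sum() / b.sum()  # global (prior) mean rate
    # method-of-moments prior variance, clamped at zero
    v = max(np.average((raw - m) ** 2, weights=b) - m / b.mean(), 0.0)
    w = v / (v + m / b)    # per-unit shrinkage weight
    return w * raw + (1.0 - w) * m

e = np.array([10, 1, 3, 4, 2, 5])
b = np.array([100, 15, 20, 20, 80, 90])
r = eb_smooth_sketch(e, b)
```

Each smoothed rate lies between the unit's raw rate and the global mean rate, with small-population units pulled hardest toward the mean.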
class pysal.esda.smoothing.Spatial_Empirical_Bayes(e, b, w)[source]

Spatial Empirical Bayes Smoothing

Parameters:
  • e (array (n, 1)) – event variable measured across n spatial units
  • b (array (n, 1)) – population at risk variable measured across n spatial units
  • w (spatial weights instance) –
r

array (n, 1) – rate values from Empirical Bayes Smoothing

Examples

Reading data in stl_hom.csv into stl to extract values for event and population-at-risk variables

>>> stl = pysal.open(pysal.examples.get_path('stl_hom.csv'), 'r')

The 11th and 14th columns in stl_hom.csv include the number of homicides and the population. Creating two arrays from these columns.

>>> stl_e, stl_b = np.array(stl[:,10]), np.array(stl[:,13])

Creating a spatial weights instance by reading in stl.gal file.

>>> stl_w = pysal.open(pysal.examples.get_path('stl.gal'), 'r').read()

Ensuring that the elements in the spatial weights instance are ordered by the given sequential numbers from 1 to the number of observations in stl_hom.csv

>>> if not stl_w.id_order_set: stl_w.id_order = range(1,len(stl) + 1)

Creating an instance of Spatial_Empirical_Bayes class using stl_e, stl_b, and stl_w

>>> s_eb = Spatial_Empirical_Bayes(stl_e, stl_b, stl_w)

Extracting the risk values through the property r of s_eb

>>> s_eb.r[:10]
array([  4.01485749e-05,   3.62437513e-05,   4.93034844e-05,
         5.09387329e-05,   3.72735210e-05,   3.69333797e-05,
         5.40245456e-05,   2.99806055e-05,   3.73034109e-05,
         3.47270722e-05])
class pysal.esda.smoothing.Spatial_Rate(e, b, w)[source]

Spatial Rate Smoothing

Parameters:
  • e (array (n, 1)) – event variable measured across n spatial units
  • b (array (n, 1)) – population at risk variable measured across n spatial units
  • w (spatial weights instance) –
r

array (n, 1) – rate values from spatial rate smoothing

Examples

Reading data in stl_hom.csv into stl to extract values for event and population-at-risk variables

>>> stl = pysal.open(pysal.examples.get_path('stl_hom.csv'), 'r')

The 11th and 14th columns in stl_hom.csv include the number of homicides and the population. Creating two arrays from these columns.

>>> stl_e, stl_b = np.array(stl[:,10]), np.array(stl[:,13])

Creating a spatial weights instance by reading in stl.gal file.

>>> stl_w = pysal.open(pysal.examples.get_path('stl.gal'), 'r').read()

Ensuring that the elements in the spatial weights instance are ordered by the given sequential numbers from 1 to the number of observations in stl_hom.csv

>>> if not stl_w.id_order_set: stl_w.id_order = range(1,len(stl) + 1)

Creating an instance of Spatial_Rate class using stl_e, stl_b, and stl_w

>>> sr = Spatial_Rate(stl_e,stl_b,stl_w)

Extracting the risk values through the property r of sr

>>> sr.r[:10]
array([  4.59326407e-05,   3.62437513e-05,   4.98677081e-05,
         5.09387329e-05,   3.72735210e-05,   4.01073093e-05,
         3.79372794e-05,   3.27019246e-05,   4.26204928e-05,
         3.47270722e-05])
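Spatial rate smoothing pools each unit's events and population with those of its neighbors before taking the ratio: r_i = (e_i + Σ_j w_ij e_j) / (b_i + Σ_j w_ij b_j). A minimal sketch with a hypothetical binary contiguity matrix (not the stl.gal weights):

```python
import numpy as np

# hypothetical binary contiguity matrix for 4 units (1 = neighbor)
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
e = np.array([10.0, 2.0, 4.0, 6.0])
b = np.array([100.0, 50.0, 80.0, 60.0])

# pool each unit's events/population with its neighbors', self included
W_self = W + np.eye(4)
r = (W_self @ e) / (W_self @ b)
```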
class pysal.esda.smoothing.Kernel_Smoother(e, b, w)[source]

Kernel smoothing

Parameters:
  • e (array (n, 1)) – event variable measured across n spatial units
  • b (array (n, 1)) – population at risk variable measured across n spatial units
  • w (Kernel weights instance) –
r

array (n, 1) – rate values from spatial rate smoothing

Examples

Creating an array including event values for 6 regions

>>> e = np.array([10, 1, 3, 4, 2, 5])

Creating another array including population-at-risk values for the 6 regions

>>> b = np.array([100, 15, 20, 20, 80, 90])

Creating a list containing geographic coordinates of the 6 regions’ centroids

>>> points=[(10, 10), (20, 10), (40, 10), (15, 20), (30, 20), (30, 30)]

Creating a kernel-based spatial weights instance by using the above points

>>> kw=Kernel(points)

Ensuring that the elements in the kernel-based weights are ordered by the given sequential numbers from 0 to 5

>>> if not kw.id_order_set: kw.id_order = range(0,len(points))

Applying kernel smoothing to e and b

>>> kr = Kernel_Smoother(e, b, kw)

Extracting the smoothed rates through the property r of the Kernel_Smoother instance

>>> kr.r
array([ 0.10543301,  0.0858573 ,  0.08256196,  0.09884584,  0.04756872,
        0.04845298])
class pysal.esda.smoothing.Age_Adjusted_Smoother(e, b, w, s, alpha=0.05)[source]

Age-adjusted rate smoothing

Parameters:
  • e (array (n*h, 1)) – event variable measured for each age group across n spatial units
  • b (array (n*h, 1)) – population at risk variable measured for each age group across n spatial units
  • w (spatial weights instance) –
  • s (array (n*h, 1)) – standard population for each age group across n spatial units
r

array (n, 1) – rate values from spatial rate smoothing

Notes

Weights used to smooth age-specific events and populations are simple binary weights

Examples

Creating an array including 12 values for the 6 regions with 2 age groups

>>> e = np.array([10, 8, 1, 4, 3, 5, 4, 3, 2, 1, 5, 3])

Creating another array including 12 population-at-risk values for the 6 regions

>>> b = np.array([100, 90, 15, 30, 25, 20, 30, 20, 80, 80, 90, 60])

For age adjustment, we need another array, s, containing standard population data for the 6 regions

>>> s = np.array([98, 88, 15, 29, 20, 23, 33, 25, 76, 80, 89, 66])

Creating a list containing geographic coordinates of the 6 regions’ centroids

>>> points=[(10, 10), (20, 10), (40, 10), (15, 20), (30, 20), (30, 30)]

Creating a kernel-based spatial weights instance by using the above points

>>> kw=Kernel(points)

Ensuring that the elements in the kernel-based weights are ordered by the given sequential numbers from 0 to 5

>>> if not kw.id_order_set: kw.id_order = range(0,len(points))

Applying age-adjusted smoothing to e and b

>>> ar = Age_Adjusted_Smoother(e, b, kw, s)

Extracting the smoothed rates through the property r of the Age_Adjusted_Smoother instance

>>> ar.r
array([ 0.10519625,  0.08494318,  0.06440072,  0.06898604,  0.06952076,
        0.05020968])
classmethod by_col(df, e, b, w=None, s=None, **kwargs)[source]

Compute smoothing by columns in a dataframe.

Parameters:
  • df (pandas.DataFrame) – a dataframe containing the data to be smoothed
  • e (string or list of strings) – the name or names of columns containing event variables to be smoothed
  • b (string or list of strings) – the name or names of columns containing the population variables to be smoothed
  • w (pysal.weights.W or list of pysal.weights.W) – the spatial weights object or objects to use with the event-population pairs. If not provided and a weights object is in the dataframe’s metadata, that weights object will be used.
  • s (string or list of strings) – the name or names of columns to use as a standard population variable for the events e and at-risk populations b.
  • inplace (bool) – a flag denoting whether to output a copy of df with the relevant smoothed columns appended, or to append the columns directly to df itself.
  • **kwargs (optional keyword arguments) – optional keyword options that are passed directly to the smoother.
Returns:

  • a copy of df containing the smoothed columns; or, if inplace, None, with the columns appended directly to df.
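The by_col pattern can be sketched with pandas: iterate over event/population column pairs and append one smoothed column per pair. The wrapper and its output column naming below are hypothetical illustrations, not PySAL's actual convention:

```python
import numpy as np
import pandas as pd

def excess_risk_by_col(df, e_cols, b_cols):
    """Sketch of a by_col-style wrapper: compute a rate for each
    event/population column pair and append it to a copy of df."""
    out = df.copy()
    for e, b in zip(e_cols, b_cols):
        ev = out[e].to_numpy(dtype=float)
        pop = out[b].to_numpy(dtype=float)
        # excess risk: observed over expected under the global rate
        out[e + '_excess_risk'] = ev / (pop * ev.sum() / pop.sum())
    return out

df = pd.DataFrame({'events': [10, 1, 3], 'pop': [100, 15, 20]})
res = excess_risk_by_col(df, ['events'], ['pop'])
```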

class pysal.esda.smoothing.Disk_Smoother(e, b, w)[source]

Locally weighted averages or disk smoothing

Parameters:
  • e (array (n, 1)) – event variable measured across n spatial units
  • b (array (n, 1)) – population at risk variable measured across n spatial units
  • w (spatial weights matrix) –
r

array (n, 1) – rate values from disk smoothing

Examples

Reading data in stl_hom.csv into stl to extract values for event and population-at-risk variables

>>> stl = pysal.open(pysal.examples.get_path('stl_hom.csv'), 'r')

The 11th and 14th columns in stl_hom.csv include the number of homicides and the population. Creating two arrays from these columns.

>>> stl_e, stl_b = np.array(stl[:,10]), np.array(stl[:,13])

Creating a spatial weights instance by reading in stl.gal file.

>>> stl_w = pysal.open(pysal.examples.get_path('stl.gal'), 'r').read()

Ensuring that the elements in the spatial weights instance are ordered by the given sequential numbers from 1 to the number of observations in stl_hom.csv

>>> if not stl_w.id_order_set: stl_w.id_order = range(1,len(stl) + 1)

Applying disk smoothing to stl_e and stl_b

>>> sr = Disk_Smoother(stl_e,stl_b,stl_w)

Extracting the risk values through the property r of sr

>>> sr.r[:10]
array([  4.56502262e-05,   3.44027685e-05,   3.38280487e-05,
         4.78530468e-05,   3.12278573e-05,   2.22596997e-05,
         2.67074856e-05,   2.36924573e-05,   3.48801587e-05,
         3.09511832e-05])
class pysal.esda.smoothing.Spatial_Median_Rate(e, b, w, aw=None, iteration=1)[source]

Spatial Median Rate Smoothing

Parameters:
  • e (array (n, 1)) – event variable measured across n spatial units
  • b (array (n, 1)) – population at risk variable measured across n spatial units
  • w (spatial weights instance) –
  • aw (array (n, 1)) – auxiliary weight variable measured across n spatial units
  • iteration (integer) – the number of iterations
r

array (n, 1) – rate values from spatial median rate smoothing

w

spatial weights instance

aw

array (n, 1) – auxiliary weight variable measured across n spatial units

Examples

Reading data in stl_hom.csv into stl to extract values for event and population-at-risk variables

>>> stl = pysal.open(pysal.examples.get_path('stl_hom.csv'), 'r')

The 11th and 14th columns in stl_hom.csv include the number of homicides and the population. Creating two arrays from these columns.

>>> stl_e, stl_b = np.array(stl[:,10]), np.array(stl[:,13])

Creating a spatial weights instance by reading in stl.gal file.

>>> stl_w = pysal.open(pysal.examples.get_path('stl.gal'), 'r').read()

Ensuring that the elements in the spatial weights instance are ordered by the given sequential numbers from 1 to the number of observations in stl_hom.csv

>>> if not stl_w.id_order_set: stl_w.id_order = range(1,len(stl) + 1)

Computing spatial median rates without iteration

>>> smr0 = Spatial_Median_Rate(stl_e,stl_b,stl_w)

Extracting the computed rates through the property r of the Spatial_Median_Rate instance

>>> smr0.r[:10]
array([  3.96047383e-05,   3.55386859e-05,   3.28308921e-05,
         4.30731238e-05,   3.12453969e-05,   1.97300409e-05,
         3.10159267e-05,   2.19279204e-05,   2.93763432e-05,
         2.93763432e-05])

Recomputing spatial median rates with 5 iterations

>>> smr1 = Spatial_Median_Rate(stl_e,stl_b,stl_w,iteration=5)

Extracting the computed rates through the property r of the Spatial_Median_Rate instance

>>> smr1.r[:10]
array([  3.11293620e-05,   2.95956330e-05,   3.11293620e-05,
         3.10159267e-05,   2.98436066e-05,   2.76406686e-05,
         3.10159267e-05,   2.94788171e-05,   2.99460806e-05,
         2.96981070e-05])

Computing spatial median rates by using the base variable as auxiliary weights without iteration

>>> smr2 = Spatial_Median_Rate(stl_e,stl_b,stl_w,aw=stl_b)

Extracting the computed rates through the property r of the Spatial_Median_Rate instance

>>> smr2.r[:10]
array([  5.77412020e-05,   4.46449551e-05,   5.77412020e-05,
         5.77412020e-05,   4.46449551e-05,   3.61363528e-05,
         3.61363528e-05,   4.46449551e-05,   5.77412020e-05,
         4.03987355e-05])

Recomputing spatial median rates by using the base variable as auxiliary weights with 5 iterations

>>> smr3 = Spatial_Median_Rate(stl_e,stl_b,stl_w,aw=stl_b,iteration=5)

Extracting the computed rates through the property r of the Spatial_Median_Rate instance

>>> smr3.r[:10]
array([  3.61363528e-05,   4.46449551e-05,   3.61363528e-05,
         3.61363528e-05,   4.46449551e-05,   3.61363528e-05,
         3.61363528e-05,   4.46449551e-05,   3.61363528e-05,
         4.46449551e-05])
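Spatial median rate smoothing replaces each raw rate with the median of the raw rates over the unit and its neighbors; aw switches to a weighted median and iteration repeats the smoothing on the previous pass's output. An unweighted, single-pass sketch with hypothetical binary weights:

```python
import numpy as np

# hypothetical binary contiguity matrix for 4 units
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
e = np.array([10.0, 2.0, 4.0, 6.0])
b = np.array([100.0, 50.0, 80.0, 60.0])

raw = e / b
# median of each unit's own rate together with its neighbors' rates
n = len(raw)
r = np.array([np.median(raw[(W[i] > 0) | (np.arange(n) == i)])
              for i in range(n)])
```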
class pysal.esda.smoothing.Spatial_Filtering(bbox, data, e, b, x_grid, y_grid, r=None, pop=None)[source]

Spatial Filtering

Parameters:
  • bbox (a list of two lists where each list is a pair of coordinates) – a bounding box for the entire n spatial units
  • data (array (n, 2)) – x, y coordinates
  • e (array (n, 1)) – event variable measured across n spatial units
  • b (array (n, 1)) – population at risk variable measured across n spatial units
  • x_grid (integer) – the number of cells on x axis
  • y_grid (integer) – the number of cells on y axis
  • r (float) – fixed radius of a moving window
  • pop (integer) – population threshold to create adaptive moving windows
grid

array (x_grid*y_grid, 2) – x, y coordinates for grid points

r

array (x_grid*y_grid, 1) – rate values for grid points

Notes

No tool is provided to find an optimal value for r or pop.

Examples

Reading data in stl_hom.csv into stl to extract values for event and population-at-risk variables

>>> stl = pysal.open(pysal.examples.get_path('stl_hom.csv'), 'r')

Reading the stl data in the WKT format so that we can easily extract polygon centroids

>>> fromWKT = pysal.core.util.WKTParser()
>>> stl.cast('WKT',fromWKT)

Extracting polygon centroids through iteration

>>> d = np.array([i.centroid for i in stl[:,0]])

Specifying the bounding box for the stl_hom data. The bbox should include two points for the bottom-left and top-right corners

>>> bbox = [[-92.700676, 36.881809], [-87.916573, 40.3295669]]

The 11th and 14th columns in stl_hom.csv include the number of homicides and the population. Creating two arrays from these columns.

>>> stl_e, stl_b = np.array(stl[:,10]), np.array(stl[:,13])

Applying spatial filtering by using a 10*10 mesh grid and a moving window of radius 2

>>> sf_0 = Spatial_Filtering(bbox,d,stl_e,stl_b,10,10,r=2)

Extracting the resulting rates through the property r of the Spatial_Filtering instance

>>> sf_0.r[:10]
array([  4.23561763e-05,   4.45290850e-05,   4.56456221e-05,
         4.49133384e-05,   4.39671835e-05,   4.44903042e-05,
         4.19845497e-05,   4.11936548e-05,   3.93463504e-05,
         4.04376345e-05])

Applying another spatial filtering by allowing the moving window to grow until 600000 people are found in the window

>>> sf = Spatial_Filtering(bbox,d,stl_e,stl_b,10,10,pop=600000)

Checking the size of the resulting array containing the rates

>>> sf.r.shape
(100,)

Extracting the resulting rates through the property r of the Spatial_Filtering instance

>>> sf.r[:10]
array([  3.73728738e-05,   4.04456300e-05,   4.04456300e-05,
         3.81035327e-05,   4.54831940e-05,   4.54831940e-05,
         3.75658628e-05,   3.75658628e-05,   3.75658628e-05,
         3.75658628e-05])
classmethod by_col(df, e, b, x_grid, y_grid, geom_col='geometry', **kwargs)[source]

Compute smoothing by columns in a dataframe. The bounding box and point information is computed from the geometry column.

Parameters:
  • df (pandas.DataFrame) – a dataframe containing the data to be smoothed
  • e (string or list of strings) – the name or names of columns containing event variables to be smoothed
  • b (string or list of strings) – the name or names of columns containing the population variables to be smoothed
  • x_grid (integer) – number of grid cells to use along the x-axis
  • y_grid (integer) – number of grid cells to use along the y-axis
  • geom_col (string) – the name of the column in the dataframe containing the geometry information.
  • **kwargs (optional keyword arguments) – optional keyword options that are passed directly to the smoother.
Returns:

  • a new dataframe of dimension (x_grid*y_grid, 3), containing the coordinates of the grid cells and the rates associated with those grid cells.

class pysal.esda.smoothing.Headbanging_Triples(data, w, k=5, t=3, angle=135.0, edgecor=False)[source]

Generate a pseudo spatial weights instance that contains headbanging triples

Parameters:
  • data (array (n, 2)) – numpy array of x, y coordinates
  • w (spatial weights instance) –
  • k (integer) – the number of nearest neighbors
  • t (integer) – the number of triples
  • angle (integer between 0 and 180) – the angle criterion for a set of triples
  • edgecor (boolean) – whether or not correction for edge points is made
triples

dictionary – key is observation record id, value is a list of lists of triple ids

extra

dictionary – key is observation record id, value is a list of the following: tuple of original triple observations distance between original triple observations distance between an original triple observation and its extrapolated point

Examples

importing k-nearest neighbor weights creator

>>> from pysal import knnW_from_array

Reading data in stl_hom.csv into stl_db to extract values for event and population-at-risk variables

>>> stl_db = pysal.open(pysal.examples.get_path('stl_hom.csv'),'r')

Reading the stl data in the WKT format so that we can easily extract polygon centroids

>>> fromWKT = pysal.core.util.WKTParser()
>>> stl_db.cast('WKT',fromWKT)

Extracting polygon centroids through iteration

>>> d = np.array([i.centroid for i in stl_db[:,0]])

Using the centroids, we create 5-nearest neighbor weights

>>> w = knnW_from_array(d,k=5)

Ensuring that the elements in the spatial weights instance are ordered by the order of stl_db’s IDs

>>> if not w.id_order_set: w.id_order = w.id_order

Finding headbanging triples by using 5 nearest neighbors

>>> ht = Headbanging_Triples(d,w,k=5)

Checking the members of triples

>>> for k, item in ht.triples.items()[:5]: print k, item
0 [(5, 6), (10, 6)]
1 [(4, 7), (4, 14), (9, 7)]
2 [(0, 8), (10, 3), (0, 6)]
3 [(4, 2), (2, 12), (8, 4)]
4 [(8, 1), (12, 1), (8, 9)]

Opening sids2.shp file

>>> sids = pysal.open(pysal.examples.get_path('sids2.shp'),'r')

Extracting the centroids of polygons in the sids data

>>> sids_d = np.array([i.centroid for i in sids])

Creating a 5-nearest neighbors weights from the sids centroids

>>> sids_w = knnW_from_array(sids_d,k=5)

Ensuring that the members in sids_w are ordered by the order of sids_d’s ID

>>> if not sids_w.id_order_set: sids_w.id_order = sids_w.id_order

Finding headbanging triples by using 5 nearest neighbors

>>> s_ht = Headbanging_Triples(sids_d,sids_w,k=5)

Checking the members of the found triples

>>> for k, item in s_ht.triples.items()[:5]: print k, item
0 [(1, 18), (1, 21), (1, 33)]
1 [(2, 40), (2, 22), (22, 40)]
2 [(39, 22), (1, 9), (39, 17)]
3 [(16, 6), (19, 6), (20, 6)]
4 [(5, 15), (27, 15), (35, 15)]

Finding headbanging triples by using 5 nearest neighbors with edge correction

>>> s_ht2 = Headbanging_Triples(sids_d,sids_w,k=5,edgecor=True)

Checking the members of the found triples

>>> for k, item in s_ht2.triples.items()[:5]: print k, item
0 [(1, 18), (1, 21), (1, 33)]
1 [(2, 40), (2, 22), (22, 40)]
2 [(39, 22), (1, 9), (39, 17)]
3 [(16, 6), (19, 6), (20, 6)]
4 [(5, 15), (27, 15), (35, 15)]

Checking the extrapolated point that is introduced into the triples during edge correction

>>> extrapolated = s_ht2.extra[72]

Checking the observation IDs constituting the extrapolated triple

>>> extrapolated[0]
(89, 77)

Checking the distances between the extrapolated point and observations 89 and 77

>>> round(extrapolated[1],5), round(extrapolated[2],6)
(0.33753, 0.302707)
class pysal.esda.smoothing.Headbanging_Median_Rate(e, b, t, aw=None, iteration=1)[source]

Headbanging Median Rate Smoothing

Parameters:
  • e (array (n, 1)) – event variable measured across n spatial units
  • b (array (n, 1)) – population at risk variable measured across n spatial units
  • t (Headbanging_Triples instance) –
  • aw (array (n, 1)) – auxiliary weight variable measured across n spatial units
  • iteration (integer) – the number of iterations
r

array (n, 1) – rate values from headbanging median smoothing

Examples

importing k-nearest neighbor weights creator

>>> from pysal import knnW_from_array

opening the sids2 shapefile

>>> sids = pysal.open(pysal.examples.get_path('sids2.shp'), 'r')

extracting the centroids of polygons in the sids2 data

>>> sids_d = np.array([i.centroid for i in sids])

creating a 5-nearest neighbors weights from the centroids

>>> sids_w = knnW_from_array(sids_d,k=5)

ensuring that the members in sids_w are ordered

>>> if not sids_w.id_order_set: sids_w.id_order = sids_w.id_order

finding headbanging triples by using 5 nearest neighbors

>>> s_ht = Headbanging_Triples(sids_d,sids_w,k=5)

reading in the sids2 data table

>>> sids_db = pysal.open(pysal.examples.get_path('sids2.dbf'), 'r')

extracting the 10th and 9th columns in sids2.dbf and using their values as event and population-at-risk variables

>>> s_e, s_b = np.array(sids_db[:,9]), np.array(sids_db[:,8])

computing headbanging median rates from s_e, s_b, and s_ht

>>> sids_hb_r = Headbanging_Median_Rate(s_e,s_b,s_ht)

extracting the computed rates through the property r of the Headbanging_Median_Rate instance

>>> sids_hb_r.r[:5]
array([ 0.00075586,  0.        ,  0.0008285 ,  0.0018315 ,  0.00498891])

recomputing headbanging median rates with 5 iterations

>>> sids_hb_r2 = Headbanging_Median_Rate(s_e,s_b,s_ht,iteration=5)

extracting the computed rates through the property r of the Headbanging_Median_Rate instance

>>> sids_hb_r2.r[:5]
array([ 0.0008285 ,  0.00084331,  0.00086896,  0.0018315 ,  0.00498891])

recomputing headbanging median rates by considering a set of auxiliary weights

>>> sids_hb_r3 = Headbanging_Median_Rate(s_e,s_b,s_ht,aw=s_b)

extracting the computed rates through the property r of the Headbanging_Median_Rate instance

>>> sids_hb_r3.r[:5]
array([ 0.00091659,  0.        ,  0.00156838,  0.0018315 ,  0.00498891])
classmethod by_col(df, e, b, t=None, geom_col='geometry', inplace=False, **kwargs)[source]

Compute smoothing by columns in a dataframe. The bounding box and point information is computed from the geometry column.

Parameters:
  • df (pandas.DataFrame) – a dataframe containing the data to be smoothed
  • e (string or list of strings) – the name or names of columns containing event variables to be smoothed
  • b (string or list of strings) – the name or names of columns containing the population variables to be smoothed
  • t (Headbanging_Triples instance or list of Headbanging_Triples) – list of headbanging triples instances. If not provided, this is computed from the geometry column of the dataframe.
  • geom_col (string) – the name of the column in the dataframe containing the geometry information.
  • inplace (bool) – a flag denoting whether to output a copy of df with the relevant smoothed columns appended, or to append the columns directly to df itself.
  • **kwargs (optional keyword arguments) – optional keyword options that are passed directly to the smoother.
Returns:

  • a new dataframe containing the smoothed Headbanging Median Rates for the event/population pairs. If done inplace, there is no return value and df is modified in place.

pysal.esda.smoothing.flatten(l, unique=True)[source]

flatten a list of lists

Parameters:
  • l (list) – of lists
  • unique (boolean) – whether or not only unique items are wanted (default=True)
Returns:

a list of single items

Return type:

list

Examples

Creating a sample list whose elements are lists of integers

>>> l = [[1, 2], [3, 4, ], [5, 6]]

Applying flatten function

>>> flatten(l)
[1, 2, 3, 4, 5, 6]
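A sketch of the documented behavior (with unique=True, only the first occurrence of each item is kept):

```python
from itertools import chain

def flatten_sketch(l, unique=True):
    """Flatten a list of lists into a list of single items,
    optionally dropping duplicates while preserving order."""
    flat = list(chain.from_iterable(l))
    if unique:
        seen = set()
        flat = [x for x in flat if not (x in seen or seen.add(x))]
    return flat
```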
pysal.esda.smoothing.weighted_median(d, w)[source]

A utility function to find a median of d based on w

Parameters:
  • d (array) – (n, 1), variable for which median will be found
  • w (array) – (n, 1), weight values used to decide the median of d

Notes

d and w are arranged in the same order

Returns:median of d
Return type:float

Examples

Creating an array including five integers. We will get the median of these integers.

>>> d = np.array([5,4,3,1,2])

Creating another array including weight values for the above integers. The median of d will be decided with a consideration to these weight values.

>>> w = np.array([10, 22, 9, 2, 5])

Applying weighted_median function

>>> weighted_median(d, w)
4
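One common definition, which reproduces the example above, takes the smallest value of d whose cumulative weight reaches half of the total weight (a sketch, not PySAL's implementation):

```python
import numpy as np

def weighted_median_sketch(d, w):
    """Smallest value of d whose cumulative weight (after sorting
    by d) reaches half of the total weight."""
    order = np.argsort(d)
    d_sorted = np.asarray(d)[order]
    cum_w = np.cumsum(np.asarray(w)[order])
    return d_sorted[np.searchsorted(cum_w, cum_w[-1] / 2.0)]
```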
pysal.esda.smoothing.sum_by_n(d, w, n)[source]

A utility function to summarize a data array into n values after weighting the array with another weight array w

Parameters:
  • d (array) – (t, 1), numerical values
  • w (array) – (t, 1), numerical values for weighting
  • n (integer) – the number of groups (t = c*n for an integer constant c)
Returns:

(n, 1), an array with summarized values

Return type:

array

Examples

Creating an array including four integers. We will compute weighted means for every two elements.

>>> d = np.array([10, 9, 20, 30])

Here is another array with the weight values for d’s elements.

>>> w = np.array([0.5, 0.1, 0.3, 0.8])

We specify the number of groups for which the weighted mean is computed.

>>> n = 2

Applying sum_by_n function

>>> sum_by_n(d, w, n)
array([  5.9,  30. ])
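The grouping logic reproduces the example above: weight d by w elementwise, then sum consecutive runs of t/n elements (a sketch of the documented behavior):

```python
import numpy as np

def sum_by_n_sketch(d, w, n):
    """Weight d by w, then sum the t values in n equal-sized
    consecutive groups, returning an (n,) array."""
    d = np.asarray(d, dtype=float)
    w = np.asarray(w, dtype=float)
    return (d * w).reshape(n, -1).sum(axis=1)
```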
pysal.esda.smoothing.crude_age_standardization(e, b, n)[source]

A utility function to compute rate through crude age standardization

Parameters:
  • e (array) – (n*h, 1), event variable measured for each age group across n spatial units
  • b (array) – (n*h, 1), population at risk variable measured for each age group across n spatial units
  • n (integer) – the number of spatial units

Notes

e and b are arranged in the same order

Returns:(n, 1), age standardized rate
Return type:array

Examples

Creating an array of an event variable (e.g., the number of cancer patients) for 2 regions in each of which 4 age groups are available. The first 4 values are event values for 4 age groups in the region 1, and the next 4 values are for 4 age groups in the region 2.

>>> e = np.array([30, 25, 25, 15, 33, 21, 30, 20])

Creating another array of a population-at-risk variable (e.g., total population) for the same two regions. The order for entering values is the same as the case of e.

>>> b = np.array([100, 100, 110, 90, 100, 90, 110, 90])

Specifying the number of regions.

>>> n = 2

Applying crude_age_standardization function to e and b

>>> crude_age_standardization(e, b, n)
array([ 0.2375    ,  0.26666667])
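Crude standardization simply collapses the h age groups within each region: total events over total population. A sketch that reproduces the example above:

```python
import numpy as np

def crude_std_sketch(e, b, n):
    """Crude age standardization: per-region total events divided
    by per-region total population across the h age groups."""
    e = np.asarray(e, dtype=float).reshape(n, -1)
    b = np.asarray(b, dtype=float).reshape(n, -1)
    return e.sum(axis=1) / b.sum(axis=1)

e = np.array([30, 25, 25, 15, 33, 21, 30, 20])
b = np.array([100, 100, 110, 90, 100, 90, 110, 90])
rates = crude_std_sketch(e, b, 2)
```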
pysal.esda.smoothing.direct_age_standardization(e, b, s, n, alpha=0.05)[source]

A utility function to compute rate through direct age standardization

Parameters:
  • e (array) – (n*h, 1), event variable measured for each age group across n spatial units
  • b (array) – (n*h, 1), population at risk variable measured for each age group across n spatial units
  • s (array) – (n*h, 1), standard population for each age group across n spatial units
  • n (integer) – the number of spatial units
  • alpha (float) – significance level for confidence interval

Notes

e, b, and s are arranged in the same order

Returns:a list of n tuples; each tuple contains an age-standardized rate and its lower and upper confidence limits
Return type:list

Examples

Creating an array of an event variable (e.g., the number of cancer patients) for 2 regions in each of which 4 age groups are available. The first 4 values are event values for 4 age groups in the region 1, and the next 4 values are for 4 age groups in the region 2.

>>> e = np.array([30, 25, 25, 15, 33, 21, 30, 20])

Creating another array of a population-at-risk variable (e.g., total population) for the same two regions. The order for entering values is the same as the case of e.

>>> b = np.array([1000, 1000, 1100, 900, 1000, 900, 1100, 900])

For direct age standardization, we also need the data for standard population. Standard population is a reference population-at-risk (e.g., population distribution for the U.S.) whose age distribution can be used as a benchmarking point for comparing age distributions across regions (e.g., population distribution for Arizona and California). Another array including standard population is created.

>>> s = np.array([1000, 900, 1000, 900, 1000, 900, 1000, 900])

Specifying the number of regions.

>>> n = 2

Applying direct_age_standardization function to e, b, and s

>>> [i[0] for i in direct_age_standardization(e, b, s, n)]
[0.023744019138755977, 0.026650717703349279]
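The point estimates (without the confidence intervals) follow from averaging each region's age-specific rates e/b with the standard population s as weights. A sketch that reproduces the rates in the example:

```python
import numpy as np

def direct_std_rate_sketch(e, b, s, n):
    """Direct age standardization, point estimate only: the
    s-weighted average of the age-specific rates e/b per region."""
    e = np.asarray(e, dtype=float).reshape(n, -1)
    b = np.asarray(b, dtype=float).reshape(n, -1)
    s = np.asarray(s, dtype=float).reshape(n, -1)
    return (s * (e / b)).sum(axis=1) / s.sum(axis=1)

e = np.array([30, 25, 25, 15, 33, 21, 30, 20])
b = np.array([1000, 1000, 1100, 900, 1000, 900, 1100, 900])
s = np.array([1000, 900, 1000, 900, 1000, 900, 1000, 900])
rates = direct_std_rate_sketch(e, b, s, 2)
```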
pysal.esda.smoothing.indirect_age_standardization(e, b, s_e, s_b, n, alpha=0.05)[source]

A utility function to compute rate through indirect age standardization

Parameters:
  • e (array) – (n*h, 1), event variable measured for each age group across n spatial units
  • b (array) – (n*h, 1), population at risk variable measured for each age group across n spatial units
  • s_e (array) – (n*h, 1), event variable measured for each age group across n spatial units in a standard population
  • s_b (array) – (n*h, 1), population variable measured for each age group across n spatial units in a standard population
  • n (integer) – the number of spatial units
  • alpha (float) – significance level for confidence interval

Notes

e, b, s_e, and s_b are arranged in the same order

Returns:a list of n tuples; each tuple contains an age-standardized rate and its lower and upper confidence limits
Return type:list

Examples

Creating an array of an event variable (e.g., the number of cancer patients) for 2 regions in each of which 4 age groups are available. The first 4 values are event values for 4 age groups in the region 1, and the next 4 values are for 4 age groups in the region 2.

>>> e = np.array([30, 25, 25, 15, 33, 21, 30, 20])

Creating another array of a population-at-risk variable (e.g., total population) for the same two regions. The order for entering values is the same as the case of e.

>>> b = np.array([100, 100, 110, 90, 100, 90, 110, 90])

For indirect age standardization, we also need the data for standard population and event. Standard population is a reference population-at-risk (e.g., population distribution for the U.S.) whose age distribution can be used as a benchmarking point for comparing age distributions across regions (e.g., population distribution for Arizona and California). When the same concept is applied to the event variable, we call it standard event (e.g., the number of cancer patients in the U.S.). Two additional arrays including standard population and event are created.

>>> s_e = np.array([100, 45, 120, 100, 50, 30, 200, 80])
>>> s_b = np.array([1000, 900, 1000, 900, 1000, 900, 1000, 900])

Specifying the number of regions.

>>> n = 2

Applying the indirect_age_standardization function to e and b

>>> [i[0] for i in indirect_age_standardization(e, b, s_e, s_b, n)]
[0.23723821989528798, 0.2610803324099723]
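The result above can be sketched in plain NumPy, assuming the standard two-step procedure for indirect standardization: the standard population's age-specific rates give each region an expected event count, the observed total divided by that expectation is the SMR, and the SMR is rescaled by the crude rate of the standard population. Variable names are illustrative, not part of the PySAL API, and the confidence limits returned by the real function are omitted.

```python
import numpy as np

# Hedged sketch of indirect age standardization (point estimates only).
e = np.array([30, 25, 25, 15, 33, 21, 30, 20], dtype=float)
b = np.array([100, 100, 110, 90, 100, 90, 110, 90], dtype=float)
s_e = np.array([100, 45, 120, 100, 50, 30, 200, 80], dtype=float)
s_b = np.array([1000, 900, 1000, 900, 1000, 900, 1000, 900], dtype=float)
n = 2  # number of regions; len(e) // n age groups each

# Expected events per region: the standard population's age-specific
# rates applied to the region's population at risk, summed over ages.
age_rates = s_e / s_b
expected = (b * age_rates).reshape(n, -1).sum(axis=1)
observed = e.reshape(n, -1).sum(axis=1)
smr = observed / expected

# Rescale the SMR by the crude rate of the standard population.
crude_standard_rate = s_e.sum() / s_b.sum()
rates = smr * crude_standard_rate
print(rates)  # approximately [0.23723822, 0.26108033]
```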
pysal.esda.smoothing.standardized_mortality_ratio(e, b, s_e, s_b, n)[source]

A utility function to compute standardized mortality ratio (SMR).

Parameters:
  • e (array) – (n*h, 1), event variable measured for each age group across n spatial units
  • b (array) – (n*h, 1), population at risk variable measured for each age group across n spatial units
  • s_e (array) – (n*h, 1), event variable measured for each age group across n spatial units in a standard population
  • s_b (array) – (n*h, 1), population variable measured for each age group across n spatial units in a standard population
  • n (integer) – the number of spatial units

Notes

e, b, s_e, and s_b are arranged in the same order

Returns: standardized mortality ratios for the n spatial units
Return type: array (nx1)

Examples

Creating an array of an event variable (e.g., the number of cancer patients) for 2 regions in each of which 4 age groups are available. The first 4 values are event values for 4 age groups in the region 1, and the next 4 values are for 4 age groups in the region 2.

>>> e = np.array([30, 25, 25, 15, 33, 21, 30, 20])

Creating another array of a population-at-risk variable (e.g., total population) for the same two regions. The order for entering values is the same as the case of e.

>>> b = np.array([100, 100, 110, 90, 100, 90, 110, 90])

To compute standardized mortality ratio (SMR), we need two additional arrays for standard population and event. Creating s_e and s_b for standard event and population, respectively.

>>> s_e = np.array([100, 45, 120, 100, 50, 30, 200, 80])
>>> s_b = np.array([1000, 900, 1000, 900, 1000, 900, 1000, 900])

Specifying the number of regions.

>>> n = 2

Applying the standardized_mortality_ratio function to e and b

>>> standardized_mortality_ratio(e, b, s_e, s_b, n)
array([ 2.48691099,  2.73684211])
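The SMR is simply the observed event count divided by the count expected if each region experienced the standard population's age-specific rates. A minimal NumPy sketch of that ratio (variable names are illustrative, not the PySAL API):

```python
import numpy as np

# Hedged sketch of the SMR: observed events over expected events.
e = np.array([30, 25, 25, 15, 33, 21, 30, 20], dtype=float)
b = np.array([100, 100, 110, 90, 100, 90, 110, 90], dtype=float)
s_e = np.array([100, 45, 120, 100, 50, 30, 200, 80], dtype=float)
s_b = np.array([1000, 900, 1000, 900, 1000, 900, 1000, 900], dtype=float)
n = 2

observed = e.reshape(n, -1).sum(axis=1)
expected = (b * s_e / s_b).reshape(n, -1).sum(axis=1)
smr = observed / expected
print(smr)  # approximately [2.48691099, 2.73684211]
```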
pysal.esda.smoothing.choynowski(e, b, n, threshold=None)[source]

Choynowski map probabilities [Choynowski1959].

Parameters:
  • e (array(n*h, 1)) – event variable measured for each age group across n spatial units
  • b (array(n*h, 1)) – population at risk variable measured for each age group across n spatial units
  • n (integer) – the number of spatial units
  • threshold (float) – Returns zero for any p-value greater than threshold

Notes

e and b are arranged in the same order

Returns: Choynowski map probabilities
Return type: array (nx1)

Examples

Creating an array of an event variable (e.g., the number of cancer patients) for 2 regions in each of which 4 age groups are available. The first 4 values are event values for 4 age groups in the region 1, and the next 4 values are for 4 age groups in the region 2.

>>> e = np.array([30, 25, 25, 15, 33, 21, 30, 20])

Creating another array of a population-at-risk variable (e.g., total population) for the same two regions. The order for entering values is the same as the case of e.

>>> b = np.array([100, 100, 110, 90, 100, 90, 110, 90])

Specifying the number of regions.

>>> n = 2

Applying the choynowski function to e and b

>>> print choynowski(e, b, n)
[ 0.30437751  0.29367033]
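Choynowski's probabilities follow directly from a Poisson model: each region's expected count is its population at risk times the overall (map-wide) rate, and the reported value is the lower-tail probability when the observed count is at or below expectation, the upper-tail probability otherwise. The sketch below assumes SciPy is available (PySAL itself depends on it); names are illustrative.

```python
import numpy as np
from scipy.stats import poisson

e = np.array([30, 25, 25, 15, 33, 21, 30, 20], dtype=float)
b = np.array([100, 100, 110, 90, 100, 90, 110, 90], dtype=float)
n = 2

# Aggregate the age groups within each region.
e_n = e.reshape(n, -1).sum(axis=1)
b_n = b.reshape(n, -1).sum(axis=1)

# Expected counts under the overall rate.
expected = b_n * e_n.sum() / b_n.sum()

# Lower-tail probability for counts at or below expectation,
# upper-tail probability otherwise.
p = np.where(e_n <= expected,
             poisson.cdf(e_n, expected),
             1 - poisson.cdf(e_n - 1, expected))
print(p)  # approximately [0.30437751, 0.29367033]
```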
pysal.esda.smoothing.assuncao_rate(e, b)[source]

Standardized rates in which the mean and standard deviation used for the standardization are those of the Empirical Bayes rate estimates. The standardized rates resulting from this function are used to compute Moran's I corrected for rate variables [Assuncao1999].

Parameters:
  • e (array(n, 1)) – event variable measured at n spatial units
  • b (array(n, 1)) – population at risk variable measured at n spatial units

Notes

e and b are arranged in the same order

Returns: standardized rates
Return type: array (nx1)

Examples

Creating an array of an event variable (e.g., the number of cancer patients) for 8 regions.

>>> e = np.array([30, 25, 25, 15, 33, 21, 30, 20])

Creating another array of a population-at-risk variable (e.g., total population) for the same 8 regions. The order for entering values is the same as the case of e.

>>> b = np.array([100, 100, 110, 90, 100, 90, 110, 90])

Computing the rates

>>> print assuncao_rate(e, b)[:4]
[ 1.04319254 -0.04117865 -0.56539054 -1.73762547]
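The Assunção–Reis standardization can be sketched from the method-of-moments Empirical Bayes estimates: the prior mean is the overall rate, the prior variance comes from the weighted moment estimator, and each raw rate is centered and scaled by its region-specific standard deviation. This is a sketch of the published formulas, not the library source, so small numerical differences from the doctest output above are possible.

```python
import numpy as np

# Hedged sketch of the Assuncao-Reis standardized rates
# (names are illustrative, not the PySAL API).
e = np.array([30, 25, 25, 15, 33, 21, 30, 20], dtype=float)
b = np.array([100, 100, 110, 90, 100, 90, 110, 90], dtype=float)

rate = e / b
m = e.sum() / b.sum()                       # prior mean: overall rate
s2 = (b * (rate - m) ** 2).sum() / b.sum()  # weighted moment variance
a = s2 - m / b.mean()                       # moment estimate of prior variance
var = a + m / b                             # region-specific rate variance
z = (rate - m) / np.sqrt(var)               # standardized rates
print(z[:4])  # close to the doctest values above
```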