Spatial Dynamics

Introduction

PySAL implements a number of exploratory approaches to analyze the dynamics of longitudinal spatial data, or observations on fixed areal units over multiple time periods. Examples could include time series of voting patterns in US Presidential elections, time series of remote sensing images, labor market dynamics, regional business cycles, among many others. Two broad sets of spatial dynamics methods are implemented to analyze these data types. The first are Markov based methods, while the second are based on Rank dynamics.

Additionally, methods are included in this module to analyze patterns of individual events which have spatial and temporal coordinates associated with them. Examples include locations and times of individual cases of disease or crimes. Methods are included here to determine if these event patterns exhibit space-time interaction.

Markov Based Methods

The Markov based methods include classic Markov chains and extensions of these approaches to deal with spatially referenced data. In what follows we illustrate the functionality of these Markov methods. Readers interested in the methodological foundations of these approaches are directed to [1].

Classic Markov

We start with a look at a simple example of classic Markov methods implemented in PySAL. A Markov chain may be in one of k different states at any point in time. These states are exhaustive and mutually exclusive. For example, if one had a time series of remote sensing images used to develop land use classifications, then the states could be defined as the specific land use classes and interest would center on the transitions in and out of different classes for each pixel.

For example, let’s construct a small artificial chain consisting of 3 states (a,b,c) and 5 different pixels at three different points in time:

>>> import pysal
>>> import numpy as np
>>> c = np.array([['b','a','c'],['c','c','a'],['c','b','c'],['a','a','b'],['a','b','c']])
>>> c
array([['b', 'a', 'c'],
       ['c', 'c', 'a'],
       ['c', 'b', 'c'],
       ['a', 'a', 'b'],
       ['a', 'b', 'c']],
      dtype='|S1')

So the first pixel was in class ‘b’ in period 1, class ‘a’ in period 2, and class ‘c’ in period 3. We can summarize the overall transition dynamics for the set of pixels by treating it as a Markov chain:

>>> m = pysal.Markov(c)
>>> m.classes
array(['a', 'b', 'c'],
      dtype='|S1')

The Markov instance m has a classes attribute extracted from the chain - the assumption is that the observations are on the rows of the input and the different points in time on the columns. In addition to extracting the classes as an attribute, our Markov instance will also have a transitions matrix:

>>> m.transitions
array([[ 1.,  2.,  1.],
       [ 1.,  0.,  2.],
       [ 1.,  1.,  1.]])

indicating that of the four pixels that began a transition interval in class ‘a’, 1 remained in that class, 2 transitioned to class ‘b’ and 1 transitioned to class ‘c’.
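
A minimal sketch of how these counts could be tallied directly from the chain c (for illustration only, not the internals of the Markov class):

>>> classes = ['a', 'b', 'c']
>>> counts = np.zeros((3, 3))
>>> for row in c:
...     for t in range(len(row) - 1):
...         i = classes.index(row[t])
...         j = classes.index(row[t + 1])
...         counts[i, j] += 1
...
>>> # counts now holds the same tallies as m.transitions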

This simple example illustrates the basic creation of a Markov instance, but the small sample size makes it unrealistic for the more advanced features of this approach. For a larger example, we will look at an application of Markov methods to understanding regional income dynamics in the US. Here we will load in data on per capita income observed annually from 1929 to 2009 for the lower 48 US states:

>>> f = pysal.open("../pysal/examples/usjoin.csv")
>>> pci = np.array([f.by_col[str(y)] for y in range(1929,2010)])
>>> pci.shape
(81, 48)

The first row of the array is the per capita income for the first year:

>>> pci[0, :]
array([ 323,  600,  310,  991,  634, 1024, 1032,  518,  347,  507,  948,
        607,  581,  532,  393,  414,  601,  768,  906,  790,  599,  286,
        621,  592,  596,  868,  686,  918,  410, 1152,  332,  382,  771,
        455,  668,  772,  874,  271,  426,  378,  479,  551,  634,  434,
        741,  460,  673,  675])

In order to apply the classic Markov approach to this series, we first have to discretize the distribution by defining our classes. There are many ways to do this, but here we will use the quintiles for each annual income distribution to define the classes:

>>> q5 = np.array([pysal.Quantiles(y).yb for y in pci]).transpose()
>>> q5.shape
(48, 81)
>>> q5[:, 0]
array([0, 2, 0, 4, 2, 4, 4, 1, 0, 1, 4, 2, 2, 1, 0, 1, 2, 3, 4, 4, 2, 0, 2,
       2, 2, 4, 3, 4, 0, 4, 0, 0, 3, 1, 3, 3, 4, 0, 1, 0, 1, 2, 2, 1, 3, 1,
       3, 3])

A number of things need to be noted here. First, we are relying on the classification methods in PySAL for defining our quintiles. The class Quantiles uses quintiles as the default and will create an instance of this class that has multiple attributes; the one we are extracting in the first line is yb, the class id for each observation. The second thing to note is the transpose operator, which gets our resulting array q5 into the proper structure required by Markov. Thus we see that the first spatial unit (Alabama with an income of 323) fell in the first quintile in 1929, while the last unit (Wyoming with an income of 675) fell in the fourth quintile [2].

So now we have a time series for each state of its quintile membership. For example, Colorado’s quintile time series is:

>>> q5[4, :]
array([2, 3, 2, 2, 3, 2, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2, 3, 2, 3, 2, 3, 2, 3,
       3, 3, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 2, 2, 3, 3, 3, 3, 3, 3,
       3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4,
       4, 4, 4, 4, 4, 3, 3, 3, 4, 3, 3, 3])

indicating that it has occupied the 3rd, 4th and 5th quintiles in the distribution at different points in time. To summarize the transition dynamics for all units, we instantiate a Markov object:

>>> m5 = pysal.Markov(q5)
>>> m5.transitions
array([[ 729.,   71.,    1.,    0.,    0.],
       [  72.,  567.,   80.,    3.,    0.],
       [   0.,   81.,  631.,   86.,    2.],
       [   0.,    3.,   86.,  573.,   56.],
       [   0.,    0.,    1.,   57.,  741.]])

Assuming we can treat these transitions as a first order Markov chain, we can estimate the transition probabilities:

>>> m5.p
matrix([[ 0.91011236,  0.0886392 ,  0.00124844,  0.        ,  0.        ],
        [ 0.09972299,  0.78531856,  0.11080332,  0.00415512,  0.        ],
        [ 0.        ,  0.10125   ,  0.78875   ,  0.1075    ,  0.0025    ],
        [ 0.        ,  0.00417827,  0.11977716,  0.79805014,  0.07799443],
        [ 0.        ,  0.        ,  0.00125156,  0.07133917,  0.92740926]])
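
The estimate is simply the row-normalized count matrix, the maximum likelihood estimator for a first order chain. A sketch of the relationship:

>>> row_totals = m5.transitions.sum(axis=1)
>>> p_hat = m5.transitions / row_totals[:, np.newaxis]  # should match m5.p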

We can also obtain the long run steady state distribution:

>>> m5.steady_state
matrix([[ 0.20774716],
        [ 0.18725774],
        [ 0.20740537],
        [ 0.18821787],
        [ 0.20937187]])
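
For an ergodic chain, the steady state is the left eigenvector of the transition matrix associated with the unit eigenvalue, rescaled to sum to one. A sketch, assuming ergodicity:

>>> evals, evecs = np.linalg.eig(np.asarray(m5.p).T)
>>> pi = np.real(evecs[:, np.argmax(np.real(evals))])
>>> pi = pi / pi.sum()  # should match m5.steady_state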

With the transition probability matrix in hand, we can estimate the first mean passage times:

>>> pysal.ergodic.fmpt(m5.p)
matrix([[   4.81354357,   11.50292712,   29.60921231,   53.38594954,
          103.59816743],
        [  42.04774505,    5.34023324,   18.74455332,   42.50023268,
           92.71316899],
        [  69.25849753,   27.21075248,    4.82147603,   25.27184624,
           75.43305672],
        [  84.90689329,   42.85914824,   17.18082642,    5.31299186,
           51.60953369],
        [  98.41295543,   56.36521038,   30.66046735,   14.21158356,
            4.77619083]])

Thus, for a state with income in the first quintile, it takes on average 11.5 years for it to first enter the second quintile, 29.6 to get to the third quintile, 53.4 years to enter the fourth, and 103.6 years to reach the richest quintile.
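
The mean first passage times can be derived from the fundamental matrix of the ergodic chain. A sketch, assuming the classic formulation of Kemeny and Snell (not necessarily the exact internals of pysal.ergodic.fmpt):

>>> P = np.asarray(m5.p)
>>> k = P.shape[0]
>>> pi = np.asarray(m5.steady_state).flatten()
>>> W = np.tile(pi, (k, 1))                    # each row is the steady state
>>> Z = np.linalg.inv(np.eye(k) - P + W)       # fundamental matrix
>>> M = (np.tile(np.diag(Z), (k, 1)) - Z) / W  # m_ij = (z_jj - z_ij) / pi_j
>>> M = M + np.diag(1.0 / pi)                  # diagonal: mean recurrence times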

Spatial Markov

Thus far we have treated all the spatial units as independent to estimate the transition probabilities. This rests on a number of implicit assumptions. First, the transition dynamics are assumed to hold for all units and for all time periods. Second, interactions between the transitions of individual units are ignored. In other words, regional context may be important to understanding regional income dynamics, but the classic Markov approach is silent on this issue.

PySAL includes a number of spatially explicit extensions to the Markov framework. The first is the Spatial Markov class that we illustrate here. We are first going to transform the income series to relative incomes (standardizing each period by its mean):

>>> import pysal
>>> f = pysal.open("../pysal/examples/usjoin.csv")
>>> pci = np.array([f.by_col[str(y)] for y in range(1929, 2010)])
>>> pci = pci.transpose()
>>> rpci = pci / (pci.mean(axis = 0))

Next, we require a spatial weights object, and here we will create one from an external GAL file:

>>> w = pysal.open("../pysal/examples/states48.gal").read()
>>> w.transform = 'r'

Finally, we create an instance of the Spatial Markov class using 5 states for the chain:

>>> sm = pysal.Spatial_Markov(rpci, w, fixed = True, k = 5)

Here we are keeping the quintiles fixed, meaning the data are pooled over space and time and the quintiles calculated for the pooled data. This is why we first transformed the data to relative incomes. We can next examine the global transition probability matrix for relative incomes:

>>> sm.p
matrix([[ 0.91461837,  0.07503234,  0.00905563,  0.00129366,  0.        ],
        [ 0.06570302,  0.82654402,  0.10512484,  0.00131406,  0.00131406],
        [ 0.00520833,  0.10286458,  0.79427083,  0.09505208,  0.00260417],
        [ 0.        ,  0.00913838,  0.09399478,  0.84856397,  0.04830287],
        [ 0.        ,  0.        ,  0.        ,  0.06217617,  0.93782383]])

The Spatial Markov allows us to compare the global transition dynamics to those conditioned on regional context. More specifically, the transition dynamics are split across economies whose spatial lags are in different quintiles at the beginning of the year. In our example we have 5 classes, so 5 different conditioned transition probability matrices are estimated:

>>> for p in sm.P:
...     print p
...
[[ 0.96341463  0.0304878   0.00609756  0.          0.        ]
 [ 0.06040268  0.83221477  0.10738255  0.          0.        ]
 [ 0.          0.14        0.74        0.12        0.        ]
 [ 0.          0.03571429  0.32142857  0.57142857  0.07142857]
 [ 0.          0.          0.          0.16666667  0.83333333]]
[[ 0.79831933  0.16806723  0.03361345  0.          0.        ]
 [ 0.0754717   0.88207547  0.04245283  0.          0.        ]
 [ 0.00537634  0.06989247  0.8655914   0.05913978  0.        ]
 [ 0.          0.          0.06372549  0.90196078  0.03431373]
 [ 0.          0.          0.          0.19444444  0.80555556]]
[[ 0.84693878  0.15306122  0.          0.          0.        ]
 [ 0.08133971  0.78947368  0.1291866   0.          0.        ]
 [ 0.00518135  0.0984456   0.79274611  0.0984456   0.00518135]
 [ 0.          0.          0.09411765  0.87058824  0.03529412]
 [ 0.          0.          0.          0.10204082  0.89795918]]
[[ 0.8852459   0.09836066  0.          0.01639344  0.        ]
 [ 0.03875969  0.81395349  0.13953488  0.          0.00775194]
 [ 0.0049505   0.09405941  0.77722772  0.11881188  0.0049505 ]
 [ 0.          0.02339181  0.12865497  0.75438596  0.09356725]
 [ 0.          0.          0.          0.09661836  0.90338164]]
[[ 0.33333333  0.66666667  0.          0.          0.        ]
 [ 0.0483871   0.77419355  0.16129032  0.01612903  0.        ]
 [ 0.01149425  0.16091954  0.74712644  0.08045977  0.        ]
 [ 0.          0.01036269  0.06217617  0.89637306  0.03108808]
 [ 0.          0.          0.          0.02352941  0.97647059]]

The probability of a poor state remaining poor is 0.963 if its neighbors are in the 1st quintile and 0.798 if its neighbors are in the 2nd quintile. The probability of a rich economy remaining rich is 0.977 if its neighbors are in the 5th quintile, but if its neighbors are in the 4th quintile this drops to 0.903.
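
The conditioning variable is the spatial lag of relative income at the beginning of each transition interval. As a rough sketch of the idea (not the internals of Spatial_Markov), the lag for the first year and its quintile class could be formed as:

>>> lag0 = pysal.lag_spatial(w, rpci[:, 0])    # neighbors' mean relative income, 1929
>>> lag_class = pysal.Quantiles(lag0, k=5).yb  # quintile class of each lag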

We can also explore the different steady state distributions implied by these different transition probabilities:

>>> sm.S
array([[ 0.43509425,  0.2635327 ,  0.20363044,  0.06841983,  0.02932278],
       [ 0.13391287,  0.33993305,  0.25153036,  0.23343016,  0.04119356],
       [ 0.12124869,  0.21137444,  0.2635101 ,  0.29013417,  0.1137326 ],
       [ 0.0776413 ,  0.19748806,  0.25352636,  0.22480415,  0.24654013],
       [ 0.01776781,  0.19964349,  0.19009833,  0.25524697,  0.3372434 ]])
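
Each row of sm.S is the steady state implied by the corresponding conditional matrix in sm.P. A sketch of the relationship, assuming each conditional chain is ergodic:

>>> S = np.array([np.asarray(pysal.ergodic.steady_state(p)).flatten()
...               for p in sm.P])  # should match sm.S row by row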

The long run distribution for states with poor (rich) neighbors has 0.435 (0.018) of the values in the first quintile, 0.263 (0.200) in the second quintile, 0.204 (0.190) in the third, 0.068 (0.255) in the fourth and 0.029 (0.337) in the fifth quintile. And, finally, the first mean passage times:

>>> for f in sm.F:
...     print f
...
[[   2.29835259   28.95614035   46.14285714   80.80952381  279.42857143]
 [  33.86549708    3.79459555   22.57142857   57.23809524  255.85714286]
 [  43.60233918    9.73684211    4.91085714   34.66666667  233.28571429]
 [  46.62865497   12.76315789    6.25714286   14.61564626  198.61904762]
 [  52.62865497   18.76315789   12.25714286    6.           34.1031746 ]]
[[   7.46754205    9.70574606   25.76785714   74.53116883  194.23446197]
 [  27.76691978    2.94175577   24.97142857   73.73474026  193.4380334 ]
 [  53.57477715   28.48447637    3.97566318   48.76331169  168.46660482]
 [  72.03631562   46.94601483   18.46153846    4.28393653  119.70329314]
 [  77.17917276   52.08887197   23.6043956     5.14285714   24.27564033]]
[[   8.24751154    6.53333333   18.38765432   40.70864198  112.76732026]
 [  47.35040872    4.73094099   11.85432099   34.17530864  106.23398693]
 [  69.42288828   24.76666667    3.794921     22.32098765   94.37966594]
 [  83.72288828   39.06666667   14.3           3.44668119   76.36702977]
 [  93.52288828   48.86666667   24.1           9.8           8.79255406]]
[[  12.87974382   13.34847151   19.83446328   28.47257282   55.82395142]
 [  99.46114206    5.06359731   10.54545198   23.05133495   49.68944423]
 [ 117.76777159   23.03735526    3.94436301   15.0843986    43.57927247]
 [ 127.89752089   32.4393006    14.56853107    4.44831643   31.63099455]
 [ 138.24752089   42.7893006    24.91853107   10.35          4.05613474]]
[[  56.2815534     1.5          10.57236842   27.02173913  110.54347826]
 [  82.9223301     5.00892857    9.07236842   25.52173913  109.04347826]
 [  97.17718447   19.53125       5.26043557   21.42391304  104.94565217]
 [ 127.1407767    48.74107143   33.29605263    3.91777427   83.52173913]
 [ 169.6407767    91.24107143   75.79605263   42.5           2.96521739]]

States with incomes in the first quintile whose neighbors are also in the first quintile return to the first quintile, after leaving it, in 2.298 years on average. They enter the fourth quintile 80.810 years after leaving the first quintile, on average. Poor states with neighbors in the fourth quintile return to the first quintile, on average, after 12.88 years, and would enter the fourth quintile after 28.473 years.

LISA Markov

The Spatial Markov conditions the transitions on the value of the spatial lag for an observation at the beginning of the transition period. An alternative approach to spatial dynamics is to consider the joint transitions of an observation and its spatial lag in the distribution. By exploiting the form of the static LISA and embedding it in a dynamic context we develop the LISA Markov in which the states of the chain are defined as the four quadrants in the Moran scatter plot. Continuing on with our US example:

>>> import numpy as np
>>> f = pysal.open("../pysal/examples/usjoin.csv")
>>> pci = np.array([f.by_col[str(y)] for y in range(1929, 2010)]).transpose()
>>> w = pysal.open("../pysal/examples/states48.gal").read()
>>> lm = pysal.LISA_Markov(pci, w)
>>> lm.classes
array([1, 2, 3, 4])
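
The four states correspond to the quadrants of the Moran scatter plot (1 = HH, 2 = LH, 3 = LL, 4 = HL). As a rough sketch of the idea (hypothetical, not the internals of LISA_Markov), quadrant membership for a single year could be assigned as:

>>> z = pci[:, 0] - pci[:, 0].mean()    # deviations from the 1929 mean
>>> wz = pysal.lag_spatial(w, z)        # spatial lag of the deviations
>>> quad = np.zeros(len(z), int)
>>> quad[(z > 0) & (wz > 0)] = 1        # high income, high-income neighbors
>>> quad[(z <= 0) & (wz > 0)] = 2       # low income, high-income neighbors
>>> quad[(z <= 0) & (wz <= 0)] = 3      # low income, low-income neighbors
>>> quad[(z > 0) & (wz <= 0)] = 4       # high income, low-income neighbors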

The LISA transitions are:

>>> lm.transitions
array([[  1.08700000e+03,   4.40000000e+01,   4.00000000e+00,
          3.40000000e+01],
       [  4.10000000e+01,   4.70000000e+02,   3.60000000e+01,
          1.00000000e+00],
       [  5.00000000e+00,   3.40000000e+01,   1.42200000e+03,
          3.90000000e+01],
       [  3.00000000e+01,   1.00000000e+00,   4.00000000e+01,
          5.52000000e+02]])

and the estimated transition probability matrix is:

>>> lm.p
matrix([[ 0.92985458,  0.03763901,  0.00342173,  0.02908469],
        [ 0.07481752,  0.85766423,  0.06569343,  0.00182482],
        [ 0.00333333,  0.02266667,  0.948     ,  0.026     ],
        [ 0.04815409,  0.00160514,  0.06420546,  0.88603531]])

The diagonal elements indicate the staying probabilities and we see that there is greater mobility for observations in quadrants 2 and 4 than in quadrants 1 and 3.

The implied long run steady state distribution of the chain is

>>> lm.steady_state
matrix([[ 0.28561505],
        [ 0.14190226],
        [ 0.40493672],
        [ 0.16754598]])

again reflecting the dominance of quadrants 1 and 3 (positive autocorrelation). [3] Finally, the first mean passage times for the LISAs are:

>>> pysal.ergodic.fmpt(lm.p)
matrix([[  3.50121609,  37.93025465,  40.55772829,  43.17412009],
        [ 31.72800152,   7.04710419,  28.68182751,  49.91485137],
        [ 52.44489385,  47.42097495,   2.46952168,  43.75609676],
        [ 38.76794022,  51.51755827,  26.31568558,   5.96851095]])

Rank Based Methods

The second set of spatial dynamic methods in PySAL are based on rank correlations and spatial extensions of the classic rank statistics.

Spatial Rank Correlation

Kendall’s \tau is based on a comparison of the number of pairs of n observations that have concordant ranks between two variables. For spatial dynamics in PySAL, the two variables in question are the values of an attribute measured at two points in time over n spatial units. This classic measure of rank correlation indicates how much relative stability there has been in the map pattern over the two periods.
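
Formally, if c of the n(n-1)/2 distinct pairs are concordant and d are discordant (ignoring ties), then:

\tau = \frac{c - d}{n(n-1)/2}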

The spatial \tau decomposes these pairs into those that are spatial neighbors and those that are not, and examines whether the rank correlation is different between the two sets. [4] To illustrate this we turn to the case of regional incomes in Mexico over the 1940 to 2010 period:

>>> import pysal
>>> f = pysal.open("../pysal/examples/mexico.csv")
>>> vnames = ["pcgdp%d"%dec for dec in range(1940, 2010, 10)]
>>> y = np.transpose(np.array([f.by_col[v] for v in vnames]))

We also introduce the concept of regime weights, which define the neighbor set as those spatial units belonging to the same region. In this example the variable “esquivel99” represents a categorical classification of Mexican states into regions:

>>> regime = np.array(f.by_col['esquivel99'])
>>> w = pysal.weights.block_weights(regime)
>>> np.random.seed(12345)

Now we will calculate the spatial tau for decade transitions from 1940 through 2000 and report the observed spatial tau against that expected if the rank changes were randomly distributed in space by using 99 permutations:

>>> res = [pysal.SpatialTau(y[:, i], y[:, i+1], w, 99) for i in range(6)]
>>> for r in res:
...     ev = r.taus.mean()
...     "%8.3f %8.3f %8.3f" % (r.tau_spatial, ev, r.tau_spatial_psim)
...
'   0.397    0.659    0.010'
'   0.492    0.706    0.010'
'   0.651    0.772    0.020'
'   0.714    0.752    0.210'
'   0.683    0.705    0.270'
'   0.810    0.819    0.280'

The observed level of spatial concordance during the 1940-50 transition was 0.397, which is significantly lower (p=0.010) than the average level of spatial concordance (0.659) from randomly permuted incomes in Mexico. Similar patterns are found for the next two transition periods as well. In other words, the amount of rank concordance differs significantly between pairs of observations that are geographical neighbors and pairs that are not in these first three transition periods. This reflects the greater degree of spatial similarity within, rather than between, the regimes, which leaves the discordant pairs dominated by neighboring pairs.

Rank Decomposition

For a sequence of time periods, \theta measures the extent to which rank changes for a variable measured over n locations are in the same direction within mutually exclusive and exhaustive partitions (regimes) of the n locations.

Theta is defined as the sum of the absolute sum of rank changes within the regimes over the sum of all absolute rank changes. [4]
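
Writing \Delta_{i,t} for the rank change of location i between periods t-1 and t, and letting r index the regimes, this definition can be written as:

\theta_t = \frac{\sum_{r} \left| \sum_{i \in r} \Delta_{i,t} \right|}{\sum_{i=1}^{n} \left| \Delta_{i,t} \right|}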

>>> import pysal
>>> f = pysal.open("../pysal/examples/mexico.csv")
>>> vnames = ["pcgdp%d"%dec for dec in range(1940, 2010, 10)]
>>> y = np.transpose(np.array([f.by_col[v] for v in vnames]))
>>> regime = np.array(f.by_col['esquivel99'])
>>> np.random.seed(10)
>>> t = pysal.Theta(y, regime, 999)
>>> t.theta
array([[ 0.41538462,  0.28070175,  0.61363636,  0.62222222,  0.33333333,
         0.47222222]])
>>> t.pvalue_left
array([ 0.307,  0.077,  0.823,  0.552,  0.045,  0.735])

Space-Time Interaction Tests

The third set of spatial dynamic methods in PySAL are global tests of space-time interaction. The purpose of these tests is to detect clustering within space-time event patterns. These patterns are composed of unique events that are labeled with spatial and temporal coordinates. The tests are designed to detect clustering of events in both space and time beyond “any purely spatial or purely temporal clustering” [5], that is, to determine if the events are “interacting.” Essentially, the tests examine the dataset to determine if pairs of events closest to each other in space are also those closest to each other in time. The null hypothesis of these tests is that the examined events are distributed randomly in space and time, i.e. the distance between pairs of events in space is independent of the distance in time. Three tests are currently implemented in PySAL: the Knox test, the Mantel test and the Jacquez k Nearest Neighbors test. These tests have been widely applied in epidemiology, criminology and biology. A more in-depth technical review of these methods is available in [6].

Knox Test

The Knox test for space-time interaction employs user-defined critical thresholds in space and time to define proximity between events. All pairs of events are examined to determine if the distance between them in space and time is within the respective thresholds. The Knox statistic is calculated as the total number of event pairs where the spatial and temporal distances separating the pair are within the specified thresholds [7]. If interaction is present, the test statistic will be large. Significance is traditionally established using a Monte Carlo permutation method in which event timestamps are permuted and the statistic is recalculated. This procedure is repeated to generate a distribution of statistics, which is used to establish the pseudo-significance of the observed test statistic. This approach assumes a static underlying population from which events are drawn. If this is not the case, the results may be biased [8].

Formally, the specification of the Knox test is given as:

X=\sum_{i}^{n}\sum_{j}^{n}a_{ij}^{s}a_{ij}^{t}\\

\begin{align} \nonumber
a_{ij}^{s} &=
\begin{cases}
1, & \text{if $d^s_{ij}<\delta$}\\
0, & \text{otherwise}
\end{cases}
\end{align}

\begin{align} \nonumber
a_{ij}^{t} &=
\begin{cases}
1, & \text{if $d^t_{ij}<\tau$}\\
0, & \text{otherwise}
\end{cases}
\end{align}

Where n = number of events, a^{s} = adjacency in space, a^{t} = adjacency in time, d^{s} = distance in space, and d^{t} = distance in time. Critical space and time distance thresholds are defined as \delta and \tau, respectively.
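
As a brute force illustration of the count, here is a minimal sketch assuming hypothetical arrays x, y and t of length n holding the event coordinates and timestamps, with each unordered pair counted once:

>>> import numpy as np
>>> # x, y, t and the thresholds delta, tau are assumed to be defined
>>> stat = 0
>>> for i in range(n):
...     for j in range(i + 1, n):
...         ds = np.hypot(x[i] - x[j], y[i] - y[j])   # distance in space
...         dt = abs(t[i] - t[j])                     # distance in time
...         if ds < delta and dt < tau:
...             stat += 1
...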

We illustrate the use of the Knox test using data from a study of Burkitt’s Lymphoma in Uganda during the period 1961-75 [9]. We start by importing Numpy, PySAL and the interaction module:

>>> import numpy as np
>>> import pysal
>>> import pysal.spatial_dynamics.interaction as interaction
>>> np.random.seed(100)

The example data are then read in and used to create an instance of SpaceTimeEvents, which reformats the data so the test can be run by PySAL. The class requires a path to a point shapefile that contains a column holding a timestamp for each point, along with the name of that column. In this example, the appropriate column name is ‘T’.

>>> path = "../pysal/examples/burkitt"
>>> events = interaction.SpaceTimeEvents(path,'T')

Next, we run the Knox test with distance and time thresholds of 20 and 5, respectively. This counts the event pairs that are closer than 20 units in space and 5 units in time.

>>> result = interaction.knox(events.space, events.t, delta=20, tau=5, permutations=99)

Finally we examine the results. We call the statistic from the results dictionary. This reports that there are 13 event pairs that are close in both space and time, based on our threshold definitions.

>>> print(result['stat'])
13

Then we look at the pseudo-significance of this value, calculated by permuting the timestamps and recomputing the statistic. Here, 99 permutations were used, but an alternative number can be specified by the user. In this case, the results indicate that we fail to reject the null hypothesis of no space-time interaction using an alpha value of 0.05.

>>> print("%2.2f"%result['pvalue'])
0.17

Modified Knox Test

A modification to the Knox test was proposed by Baker [10]. Baker’s modification measures the difference between the original observed Knox statistic and its expected value. This difference serves as the test statistic. Again, the significance of this statistic is assessed using a Monte Carlo permutation procedure.

T=\frac{1}{2}\bigg(\sum_{i=1}^{n}\sum_{j=1}^{n}f_{ij}g_{ij} - \frac{1}{n-1}\sum_{k=1}^{n}\sum_{l=1}^{n}\sum_{j=1}^{n}f_{kj}g_{lj}\bigg)\\

Where n = number of events, f = adjacency in space, g = adjacency in time (calculated in a manner equivalent to a^{s} and a^{t} above in the Knox test). The first part of this statistic is equivalent to the original Knox test, while the second part is the expected value under spatio-temporal randomness.

Here we illustrate the use of the modified Knox test using the data on Burkitt’s Lymphoma cases in Uganda from above. We start by importing Numpy, PySAL and the interaction module. The example data are then read in and used to create an instance of SpaceTimeEvents.

>>> import numpy as np
>>> import pysal
>>> import pysal.spatial_dynamics.interaction as interaction
>>> np.random.seed(100)
>>> path = "../pysal/examples/burkitt"
>>> events = interaction.SpaceTimeEvents(path,'T')

Next, we run the modified Knox test with distance and time thresholds of 20 and 5, respectively. This counts the event pairs that are closer than 20 units in space and 5 units in time.

>>> result = interaction.modified_knox(events.space, events.t, delta=20, tau=5, permutations=99)

Finally we examine the results. We call the statistic from the results dictionary. This reports a statistic value of 2.81016043.

>>> print("%2.8f"%result['stat'])
2.81016043

Next we look at the pseudo-significance of this value, calculated by permuting the timestamps and recomputing the statistic. Here, 99 permutations were used, but an alternative number can be specified by the user. In this case, the results indicate that we fail to reject the null hypothesis of no space-time interaction using an alpha value of 0.05.

>>> print("%2.2f"%result['pvalue'])
0.11

Mantel Test

Akin to the Knox test in its simplicity, the Mantel test keeps the distance information discarded by the Knox test. The unstandardized Mantel statistic is calculated by summing the product of the spatial and temporal distances between all event pairs [11]. To prevent multiplication by 0 in instances of colocated or simultaneous events, Mantel proposed adding a constant to the distance measurements. Additionally, he suggested a reciprocal transform of the resulting distance measurement to lessen the effect of the larger distances on the product sum. The test is defined formally below:

Z=\sum_{i}^{n}\sum_{j}^{n}(d_{ij}^{s}+c)^{p}(d_{ij}^{t}+c)^{p}

Where, again, d^{s} and d^{t} denote distance in space and time, respectively. The constant, c, and the power, p, are parameters set by the user. The default values are 0 and 1, respectively. A standardized version of the Mantel test is implemented here in PySAL, however. The standardized statistic (r) is a measure of correlation between the spatial and temporal distance matrices. This is expressed formally as:

r=\frac{1}{n^2-n-1}\sum_{i}^{n}\sum_{j}^{n}\Bigg[\frac{d_{ij}^{s}-\bar{d^{s}}}{\sigma_{d^{s}}}\Bigg] \Bigg[\frac{d_{ij}^{t}-\bar{d^{t}}}{\sigma_{d^{t}}}\Bigg]

Where \bar{d^{s}} refers to the average distance in space, and \bar{d^{t}} the average distance in time. For notational convenience, \sigma_{d^{s}} and \sigma_{d^{t}} refer to the sample (not population) standard deviations of distance in space and time, respectively. The same constant and power transformations may also be applied to the spatial and temporal distance matrices employed by the standardized Mantel. Significance is determined through a Monte Carlo permutation approach similar to that employed in the Knox test.

Again, we use the Burkitt’s Lymphoma data to illustrate the test. We start with the usual imports and read in the example data.

>>> import numpy as np
>>> import pysal
>>> import pysal.spatial_dynamics.interaction as interaction
>>> np.random.seed(100)
>>> path = "../pysal/examples/burkitt"
>>> events = interaction.SpaceTimeEvents(path,'T')

The following example runs the standardized Mantel test with constants of 0 and transformations of 1, meaning the distance matrices will remain unchanged; however, as recommended by Mantel, a small constant should be added and an inverse transformation (i.e. -1) specified.

>>> result = interaction.mantel(events.space, events.t, 99, scon=0.0, spow=1.0, tcon=0.0, tpow=1.0)

Next, we examine the result of the test.

>>> print("%6.6f"%result['stat'])
0.014154

Finally, we look at the pseudo-significance of this value, calculated by permuting the timestamps and recomputing the statistic for each of the 99 permutations. Again, note that the number of permutations can be changed by the user. According to these parameters, the results fail to reject the null hypothesis of no space-time interaction between the events.

>>> print("%2.2f"%result['pvalue'])
0.27

Jacquez Test

Instead of using a set distance in space and time to determine proximity (like the Knox test), the Jacquez test employs a nearest neighbor distance approach. This allows the test to account for changes in underlying population density. The statistic is calculated as the number of event pairs that are within the set of k nearest neighbors for each other in both space and time [12]. Significance of this count is established using a Monte Carlo permutation method. The test is expressed formally as:

J_{k}=\sum_{i=1}^{n} \sum_{j=1}^{n} a_{ijk}^{s}a_{ijk}^{t}\\

\begin{align} \nonumber
a_{ijk}^{s} =
\begin{cases}
1, & \text{if event \emph{j} is a \emph{k} nearest neighbor of event \emph{i} in space}\\
0, & \text{otherwise}
\end{cases}
\end{align}

\begin{align} \nonumber
a_{ijk}^{t} =
\begin{cases}
1, & \text{if event \emph{j} is a \emph{k} nearest neighbor of event \emph{i} in time}\\
0, & \text{otherwise}
\end{cases}
\end{align}

Where n = number of cases; a^{s} = adjacency in space; a^{t} = adjacency in time. To illustrate the test, the Burkitt’s Lymphoma data are employed again. We start with the usual imports and read in the example data.

>>> import numpy as np
>>> import pysal
>>> import pysal.spatial_dynamics.interaction as interaction
>>> np.random.seed(100)
>>> path = "../pysal/examples/burkitt"
>>> events = interaction.SpaceTimeEvents(path,'T')

The following runs the Jacquez test on the example data for a value of k = 3 and reports the resulting statistic. In this case, there are 13 instances where events are nearest neighbors in both space and time. The significance of this can be assessed by calling the p-value from the results dictionary. Again, there is not enough evidence to reject the null hypothesis of no space-time interaction.

>>> result = interaction.jacquez(events.space, events.t, k=3, permutations=99)
>>> print result['stat']
13
>>> print "%3.1f" % result['pvalue']
0.2

Spatial Dynamics API

For further details see the Spatial Dynamics API.

Footnotes

[1]Rey, S.J. (2001). “Spatial empirics for economic growth and convergence”. Geographical Analysis, 33, 195-214.
[2]The states are ordered alphabetically.
[3]The complex values of the steady state distribution arise from complex eigenvalues in the transition probability matrix which may indicate cyclicality in the chain.
[4]Rey, S.J. (2004) “Spatial dependence in the evolution of regional income distributions,” in A. Getis, J. Mur and H.Zoeller (eds). Spatial Econometrics and Spatial Statistics. Palgrave, London, pp. 194-213.
[5]Kulldorff, M. (1998). Statistical methods for spatial epidemiology: tests for randomness. In Gatrell, A. and Loytonen, M., editors, GIS and Health, pages 49–62. Taylor & Francis, London.
[6]Tango, T. (2010). Statistical Methods for Disease Clustering. Springer, New York.
[7]Knox, E. (1964). The detection of space-time interactions. Journal of the Royal Statistical Society. Series C (Applied Statistics), 13(1):25–30.
[8]Kulldorff, M. and Hjalmars, U. (1999). The Knox method and other tests for space-time interaction. Biometrics, 55(2):544–552.
[9]Williams, E., Smith, P., Day, N., Geser, A., Ellice, J., and Tukei, P. (1978). Space-time clustering of Burkitt’s lymphoma in the West Nile district of Uganda: 1961-1975. British Journal of Cancer, 37(1):109.
[10]Baker, R.D. (2004). Identifying space-time disease clusters. Acta Tropica, 91(3):291-299.
[11]Mantel, N. (1967). The detection of disease clustering and a generalized regression approach. Cancer Research, 27(2):209–220.
[12]Jacquez, G. (1996). A k nearest neighbour test for space-time interaction. Statistics in Medicine, 15(18):1935–1949.