# pysal.model.spreg.OLS

class pysal.model.spreg.OLS(y, x, w=None, robust=None, gwk=None, sig2n_k=True, nonspat_diag=True, spat_diag=False, moran=False, white_test=False, vm=False, name_y=None, name_x=None, name_w=None, name_gwk=None, name_ds=None)

Ordinary least squares with results and diagnostics.

Parameters:

- y (array): nx1 array for the dependent variable.
- x (array): two-dimensional array with n rows and one column for each independent (exogenous) variable, excluding the constant.
- w (pysal W object): spatial weights object (required if running spatial diagnostics).
- robust (string): if 'white', a White consistent estimator of the variance-covariance matrix is given; if 'hac', a HAC consistent estimator of the variance-covariance matrix is given. Default is None.
- gwk (pysal W object): kernel spatial weights needed for HAC estimation. Note: the matrix must have ones along the main diagonal.
- sig2n_k (boolean): if True, use n-k to estimate sigma^2; if False, use n.
- nonspat_diag (boolean): if True, compute non-spatial diagnostics on the regression.
- spat_diag (boolean): if True, compute Lagrange multiplier tests (requires w). Note: see moran for further tests.
- moran (boolean): if True, compute Moran's I on the residuals (requires spat_diag=True).
- white_test (boolean): if True, compute White's specification robust test (requires nonspat_diag=True).
- vm (boolean): if True, include the variance-covariance matrix in the summary results.
- name_y (string): name of the dependent variable for use in output.
- name_x (list of strings): names of the independent variables for use in output.
- name_w (string): name of the weights matrix for use in output.
- name_gwk (string): name of the kernel weights matrix for use in output.
- name_ds (string): name of the dataset for use in output.
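The robust='white' option corresponds to the classical White (HC0) heteroskedasticity-consistent "sandwich" estimator of the variance-covariance matrix. A minimal NumPy sketch of that estimator on synthetic data (a textbook illustration, not PySAL's internal implementation):

```python
import numpy as np

# Synthetic data: a constant plus two regressors.
rng = np.random.default_rng(0)
n = 100
Xw = np.hstack([np.ones((n, 1)), rng.normal(size=(n, 2))])
yw = Xw @ np.array([[1.0], [2.0], [-0.5]]) + rng.normal(size=(n, 1))

xtxi = np.linalg.inv(Xw.T @ Xw)   # (X'X)^-1
betas = xtxi @ Xw.T @ yw          # OLS coefficients
u = yw - Xw @ betas               # residuals

# White sandwich: (X'X)^-1 X' diag(u^2) X (X'X)^-1
meat = Xw.T @ (u**2 * Xw)
vm_white = xtxi @ meat @ xtxi
se_white = np.sqrt(np.diag(vm_white))  # robust standard errors, one per coefficient
print(se_white)
```

Unlike the classical estimator, this form does not assume a constant error variance, which is why it is the appropriate choice when heteroskedasticity is suspected.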

Examples

>>> import numpy as np
>>> import pysal.lib
>>> from pysal.model.spreg import OLS


Open data on Columbus neighborhood crime (49 areas) using pysal.lib.io.open(). This is the DBF associated with the Columbus shapefile. Note that pysal.lib.io.open() also reads data in CSV format; moreover, the OLS class itself only requires data passed in as numpy arrays, so users can read in their data with any method they prefer.

>>> db = pysal.lib.io.open(pysal.lib.examples.get_path('columbus.dbf'),'r')


Extract the HOVAL column (home values) from the DBF file and make it the dependent variable for the regression. Note that PySAL requires this to be an nx1 numpy array.

>>> hoval = db.by_col("HOVAL")
>>> y = np.array(hoval)
>>> y.shape = (len(hoval), 1)


Extract CRIME (crime) and INC (income) vectors from the DBF to be used as independent variables in the regression. Note that PySAL requires this to be an nxj numpy array, where j is the number of independent variables (not including a constant). pysal.model.spreg.OLS adds a vector of ones to the independent variables passed in.

>>> X = []
>>> X.append(db.by_col("INC"))
>>> X.append(db.by_col("CRIME"))
>>> X = np.array(X).T
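Because OLS only needs numpy arrays, the same y and X matrices could equally be built from a CSV file with the standard library. A sketch using made-up values and hypothetical column names mirroring the example (the in-memory buffer stands in for an on-disk file):

```python
import csv
import io
import numpy as np

# Stand-in for a CSV file on disk; the numeric values are made up for illustration.
data = io.StringIO("HOVAL,INC,CRIME\n80.5,19.5,15.7\n44.6,21.2,18.8\n26.4,16.0,30.6\n")
rows = list(csv.DictReader(data))
y_csv = np.array([[float(r["HOVAL"])] for r in rows])                   # nx1 dependent variable
X_csv = np.array([[float(r["INC"]), float(r["CRIME"])] for r in rows])  # nxj, no constant
print(y_csv.shape, X_csv.shape)
```

The resulting arrays have exactly the shapes the regression expects: nx1 for y and nxj (without a constant column) for X.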


The minimum parameters needed to run an ordinary least squares regression are the two numpy arrays containing the independent variable and dependent variables respectively. To make the printed results more meaningful, the user can pass in explicit names for the variables used; this is optional.

>>> ols = OLS(y, X, name_y='home value', name_x=['income','crime'], name_ds='columbus', white_test=True)


pysal.model.spreg.OLS computes the regression coefficients along with their standard errors, t-statistics and p-values, plus a large battery of diagnostics on the regression. In this example the White test, which is not run by default, is requested by passing white_test=True. All of these results can be accessed individually as attributes of the regression object returned by pysal.model.spreg.OLS, or all at once by printing the object's summary attribute. In the example below, the parameter on crime is -0.4849, with a t-statistic of -2.6544 and a p-value of 0.01087.

>>> ols.betas
array([[ 46.42818268],
       [  0.62898397],
       [ -0.48488854]])
>>> print(round(ols.t_stat[2][0], 3))
-2.654
>>> print(round(ols.t_stat[2][1], 3))
0.011
>>> print(round(ols.r2, 3))
0.35
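The betas, std_err and t_stat attributes reported above follow the standard OLS formulas: betas = (X'X)^-1 X'y, with sigma^2 = u'u/(n-k) when sig2n_k=True. A self-contained NumPy sketch of those formulas on synthetic data (not the Columbus set):

```python
import numpy as np

# Synthetic data with the same shape as the example: n=49, constant + 2 regressors.
rng = np.random.default_rng(42)
n = 49
X_sim = np.hstack([np.ones((n, 1)), rng.normal(size=(n, 2))])
y_sim = X_sim @ np.array([[46.0], [0.6], [-0.5]]) + rng.normal(size=(n, 1))

k = X_sim.shape[1]
xtxi = np.linalg.inv(X_sim.T @ X_sim)      # (X'X)^-1
betas = xtxi @ X_sim.T @ y_sim             # kx1 coefficient vector
u = y_sim - X_sim @ betas                  # residuals
sig2 = float(u.T @ u) / (n - k)            # sig2n_k=True: divide by n-k
std_err = np.sqrt(sig2 * np.diag(xtxi))    # classical standard errors
t_stat = betas.ravel() / std_err           # one t-statistic per coefficient
print(betas.ravel())
```

The normal-equations solution here matches what a least-squares solver such as numpy.linalg.lstsq would return for the same data.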


Or we can easily obtain a full summary of all the results nicely formatted and ready to be printed:

>>> print(ols.summary)
REGRESSION
----------
SUMMARY OF OUTPUT: ORDINARY LEAST SQUARES
-----------------------------------------
Data set            :    columbus
Dependent Variable  :  home value                Number of Observations:          49
Mean dependent var  :     38.4362                Number of Variables   :           3
S.D. dependent var  :     18.4661                Degrees of Freedom    :          46
R-squared           :      0.3495
Adjusted R-squared  :      0.3212
Sum squared residual:   10647.015                F-statistic           :     12.3582
Sigma-square        :     231.457                Prob(F-statistic)     :   5.064e-05
S.E. of regression  :      15.214                Log likelihood        :    -201.368
Sigma-square ML     :     217.286                Akaike info criterion :     408.735
S.E of regression ML:     14.7406                Schwarz criterion     :     414.411
<BLANKLINE>
------------------------------------------------------------------------------------
Variable     Coefficient       Std.Error     t-Statistic     Probability
------------------------------------------------------------------------------------
CONSTANT      46.4281827      13.1917570       3.5194844       0.0009867
  income       0.6289840       0.5359104       1.1736736       0.2465669
   crime      -0.4848885       0.1826729      -2.6544086       0.0108745
------------------------------------------------------------------------------------
<BLANKLINE>
REGRESSION DIAGNOSTICS
MULTICOLLINEARITY CONDITION NUMBER           12.538
<BLANKLINE>
TEST ON NORMALITY OF ERRORS
TEST                             DF        VALUE           PROB
Jarque-Bera                       2          39.706           0.0000
<BLANKLINE>
DIAGNOSTICS FOR HETEROSKEDASTICITY
RANDOM COEFFICIENTS
TEST                             DF        VALUE           PROB
Breusch-Pagan test                2           5.767           0.0559
Koenker-Bassett test              2           2.270           0.3214
<BLANKLINE>
SPECIFICATION ROBUST TEST
TEST                             DF        VALUE           PROB
White                             5           2.906           0.7145
================================ END OF REPORT =====================================


If the optional parameters w and spat_diag are passed to pysal.model.spreg.OLS, spatial diagnostics will also be computed for the regression. These include Lagrange multiplier tests and Moran’s I of the residuals. The w parameter is a PySAL spatial weights matrix. In this example, w is built directly from the shapefile columbus.shp, but w can also be read in from a GAL or GWT file. In this case a rook contiguity weights matrix is built, but PySAL also offers queen contiguity, distance weights and k nearest neighbor weights among others. In the example, the Moran’s I of the residuals is 0.204 with a standardized value of 2.592 and a p-value of 0.0095.

>>> w = pysal.lib.weights.Rook.from_shapefile(pysal.lib.examples.get_path("columbus.shp"))
>>> ols = OLS(y, X, w, spat_diag=True, moran=True, name_y='home value', name_x=['income','crime'], name_ds='columbus')
>>> ols.betas
array([[ 46.42818268],
       [  0.62898397],
       [ -0.48488854]])
>>> print(round(ols.moran_res[0], 3))
0.204
>>> print(round(ols.moran_res[1], 3))
2.592
>>> print(round(ols.moran_res[2], 4))
0.0095
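The statistic in moran_res[0] is ordinary Moran's I applied to the residuals, I = (n/S0) * (u'Wu)/(u'u), where S0 is the sum of all weights. A minimal NumPy sketch with a hand-built rook contiguity matrix on a 2x2 grid (stand-in residuals, not the Columbus values):

```python
import numpy as np

# Rook contiguity on a 2x2 grid: each cell neighbors the cells sharing an edge.
W = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
u = np.array([1.0, -0.5, 0.3, -0.8])        # stand-in regression residuals

n = len(u)
s0 = W.sum()                                 # S0: sum of all weights
moran_i = (n / s0) * (u @ W @ u) / (u @ u)   # Moran's I of the residuals
print(round(moran_i, 3))                     # prints -0.02
```

PySAL additionally standardizes the statistic and reports a p-value (the second and third elements of moran_res), which requires the moments of I under the null; the sketch above only covers the statistic itself.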

Attributes:

- summary (string): summary of regression results and diagnostics (note: use in conjunction with the print command).
- betas (array): kx1 array of estimated coefficients.
- u (array): nx1 array of residuals.
- predy (array): nx1 array of predicted y values.
- n (integer): number of observations.
- k (integer): number of variables for which coefficients are estimated (including the constant).
- y (array): nx1 array for the dependent variable.
- x (array): two-dimensional array with n rows and one column for each independent (exogenous) variable, including the constant.
- robust (string): adjustment for robust standard errors.
- mean_y (float): mean of the dependent variable.
- std_y (float): standard deviation of the dependent variable.
- vm (array): variance-covariance matrix (kxk).
- r2 (float): R squared.
- ar2 (float): adjusted R squared.
- utu (float): sum of squared residuals.
- sig2 (float): sigma squared used in computations.
- sig2ML (float): sigma squared (maximum likelihood).
- f_stat (tuple): statistic (float), p-value (float).
- logll (float): log likelihood.
- aic (float): Akaike information criterion.
- schwarz (float): Schwarz information criterion.
- std_err (array): 1xk array of standard errors of the betas.
- t_stat (list of tuples): t statistic; each tuple contains the pair (statistic, p-value), where each is a float.
- mulColli (float): multicollinearity condition number.
- jarque_bera (dictionary): 'jb': Jarque-Bera statistic (float); 'pvalue': p-value (float); 'df': degrees of freedom (int).
- breusch_pagan (dictionary): 'bp': Breusch-Pagan statistic (float); 'pvalue': p-value (float); 'df': degrees of freedom (int).
- koenker_bassett (dictionary): 'kb': Koenker-Bassett statistic (float); 'pvalue': p-value (float); 'df': degrees of freedom (int).
- white (dictionary): 'wh': White statistic (float); 'pvalue': p-value (float); 'df': degrees of freedom (int).
- lm_error (tuple): Lagrange multiplier test for the spatial error model; the pair (statistic, p-value), where each is a float.
- lm_lag (tuple): Lagrange multiplier test for the spatial lag model; the pair (statistic, p-value), where each is a float.
- rlm_error (tuple): robust Lagrange multiplier test for the spatial error model; the pair (statistic, p-value), where each is a float.
- rlm_lag (tuple): robust Lagrange multiplier test for the spatial lag model; the pair (statistic, p-value), where each is a float.
- lm_sarma (tuple): Lagrange multiplier test for the spatial SARMA model; the pair (statistic, p-value), where each is a float.
- moran_res (tuple): Moran's I for the residuals; the triple (Moran's I, standardized Moran's I, p-value).
- name_y (string): name of the dependent variable for use in output.
- name_x (list of strings): names of the independent variables for use in output.
- name_w (string): name of the weights matrix for use in output.
- name_gwk (string): name of the kernel weights matrix for use in output.
- name_ds (string): name of the dataset for use in output.
- title (string): name of the regression method used.
- sig2n (float): sigma squared (computed with n in the denominator).
- sig2n_k (float): sigma squared (computed with n-k in the denominator).
- xtx (float): X'X.
- xtxi (float): (X'X)^-1.
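The sig2n and sig2n_k attributes hold the same residual variance computed under the two denominators controlled by the sig2n_k parameter. A quick NumPy illustration with stand-in residuals:

```python
import numpy as np

u = np.array([[1.5], [-0.7], [0.2], [-1.1], [0.9]])  # stand-in nx1 residuals
n, k = len(u), 2                                     # pretend 2 coefficients were estimated
utu = float(u.T @ u)                                 # sum of squared residuals (the utu attribute)
sig2n = utu / n                                      # denominator n
sig2n_k = utu / (n - k)                              # denominator n-k (the default, sig2n_k=True)
print(utu, sig2n, sig2n_k)
```

The n-k version is the unbiased estimator of the error variance; dividing by n instead gives the maximum-likelihood version, which is always smaller.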
__init__(y, x, w=None, robust=None, gwk=None, sig2n_k=True, nonspat_diag=True, spat_diag=False, moran=False, white_test=False, vm=False, name_y=None, name_x=None, name_w=None, name_gwk=None, name_ds=None)

Initialize self. See help(type(self)) for accurate signature.

Methods

- __init__(y, x[, w, robust, gwk, sig2n_k, …]): Initialize self.

Attributes

- mean_y
- sig2n
- sig2n_k
- std_y
- utu
- vm