Nonparametric Model Estimation

Given a general, unspecified functional relationship y = f(x), a typical regression model estimates E(y|x) as follows:

y = m(x) + e

where m(x) = E(y|x), which is assumed to be sufficiently differentiable (i.e. smooth) but may or may not have a specific parametric form. The purpose of nonparametric estimation of m(x) is to approximate E(y|x) arbitrarily closely, given a large enough sample. In addition to m(x), we are also interested in the derivatives of m(x).

For a sample of N observations of y and x, $(y_i, x_i)$, $i = 1, 2, \ldots, N$, the empirical model for nonparametric estimation is

$$y_i = m(x_i) + e_i$$

We assume $E(e_i|x_i) = 0$ and $\mathrm{Var}(e_i|x_i) = \sigma^2(x_i)$. Then the sample estimate of E(y|x) is the weighted average of sample observations of y as follows:

$$m(x) = \frac{\sum_{i=1}^{N} w_i(x)\, y_i}{\sum_{i=1}^{N} w_i(x)}$$

where $w_i(x) = w(x - x_i)$ is a weighting scheme for the i-th observation $y_i$. Typically, $w_i(x) \ge 0$ measures how close $x_i$ is to x: it is high when the distance is small and low when the distance is large. $w_i(x)$ may be normalized so that $w_i(x) = 1$ if $x = x_i$ (i.e. w(0) = 1) or, alternatively, so that $\sum_{i=1}^{N} w_i(x) = 1$.

Local Approximation

The unknown functional relationship y = f(x) may be approximated by a Taylor expansion at $x = x_i$:

$$y \approx f(x)\big|_{x=x_i} = f(x_i) + f'(x_i)(x - x_i)' + \tfrac{1}{2}(x - x_i) f''(x_i) (x - x_i)' + \cdots$$

where $f'(x_i) = \partial f(x_i)/\partial x$ is the slope and $f''(x_i) = \partial^2 f(x_i)/\partial x\,\partial x'$ is the curvature of f(x) at $x_i$. The local approximation can be a constant, linear, quadratic, or even higher-order function of x.

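With the Taylor approximation of y at $x_i$, we can write the estimate of m(x) for three models, obtained by truncating the expansion after the constant, linear, and quadratic terms:

Constant: $m(x) = f(x_i)$
Linear: $m(x) = f(x_i) + f'(x_i)(x - x_i)'$
Quadratic: $m(x) = f(x_i) + f'(x_i)(x - x_i)' + \tfrac{1}{2}(x - x_i) f''(x_i) (x - x_i)'$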

For each model, the estimates of $f(x_i)$, $f'(x_i)$, and $f''(x_i)$ are obtained by minimizing the weighted sum of squares:

$$S = \sum_{i=1}^{N} w_i(x)\,(y_i - m(x))^2$$
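
As a worked case, take the constant model and write the local constant as a (a symbol introduced here only for illustration). Setting the derivative of the weighted sum of squares to zero recovers the weighted average given earlier as the sample estimate of E(y|x):

```latex
\begin{aligned}
S(a) &= \sum_{i=1}^{N} w_i(x)\,(y_i - a)^2, \\
\frac{\partial S}{\partial a} &= -2 \sum_{i=1}^{N} w_i(x)\,(y_i - a) = 0
\quad\Longrightarrow\quad
\hat{a} = \frac{\sum_{i=1}^{N} w_i(x)\, y_i}{\sum_{i=1}^{N} w_i(x)} .
\end{aligned}
```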

Kernel Weights

A kernel is a continuous, bounded function K such that $K(u) \ge 0$ and $\int K(u)\,du = 1$. Rescaling the kernel by a parameter h > 0 gives $K_h(u) = (1/h)K(u/h)$. By the change of variable v = u/h, $\int K_h(u)\,du = 1$, so $K_h$ is itself a kernel; h is called the bandwidth or window size of the kernel. The bandwidth h controls the smoothness of the resulting estimate: a small h concentrates the weight on observations close to x, while a large h spreads it over more of the sample.

The weights for nonparametric model estimation can be derived from the scaled kernel as follows:

$$w_i(x) = \frac{K_h(x - x_i)}{\sum_{j=1}^{N} K_h(x - x_j)} = \frac{K((x - x_i)/h)}{\sum_{j=1}^{N} K((x - x_j)/h)}$$

where $K_h(x - x_i) = (1/h)K((x - x_i)/h)$ and h > 0. The factor 1/h cancels in the ratio. Furthermore, $w_i(x)$ may be normalized in several convenient ways.
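
The following is a minimal Python sketch of these weights, assuming a univariate x and the standard normal density as K (the text does not fix a particular kernel at this point). The function names `gaussian_kernel`, `scaled_kernel`, and `kernel_weights` are illustrative and not taken from the original program.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal density: K(u) >= 0 and it integrates to one."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def scaled_kernel(u, h, kernel=gaussian_kernel):
    """Rescaled kernel K_h(u) = (1/h) K(u/h) for bandwidth h > 0."""
    return kernel(u / h) / h

def kernel_weights(x, xs, h, kernel=gaussian_kernel):
    """Normalized weights w_i(x) = K((x - x_i)/h) / sum_j K((x - x_j)/h)."""
    k = kernel((x - xs) / h)  # the 1/h factor cancels in the ratio
    return k / k.sum()

xs = np.array([0.0, 0.5, 1.0, 2.0, 4.0])
w = kernel_weights(1.0, xs, h=0.8)

# Riemann-sum check that the rescaled kernel still integrates to one
grid = np.linspace(-20.0, 20.0, 40001)
area = scaled_kernel(grid, h=0.8).sum() * (grid[1] - grid[0])
```

The weights sum to one by construction, and the largest weight falls on the observation closest to the evaluation point.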

Nadaraya-Watson Estimator

Based on the constant model, the estimate of m(x) is

$$\sum_{i=1}^{N} w_i(x)\, y_i = \frac{\sum_{i=1}^{N} K((x - x_i)/h)\, y_i}{\sum_{i=1}^{N} K((x - x_i)/h)}$$
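
A short Python sketch of this estimator follows, assuming a univariate x, a Gaussian kernel, and simulated data (a noisy sine curve chosen purely for illustration). The function name `nadaraya_watson` and the bandwidth value are assumptions of the sketch.

```python
import numpy as np

def nadaraya_watson(x0, xs, ys, h):
    """Local constant estimate of m(x0) using a Gaussian kernel."""
    k = np.exp(-0.5 * ((x0 - xs) / h) ** 2)  # kernel constants cancel in the ratio
    return np.sum(k * ys) / np.sum(k)

# Simulated sample: y = sin(x) + noise
rng = np.random.default_rng(0)
xs = np.sort(rng.uniform(0.0, 2.0 * np.pi, 200))
ys = np.sin(xs) + 0.1 * rng.standard_normal(200)

# Fitted values at every sample point
m_hat = np.array([nadaraya_watson(x0, xs, ys, h=0.3) for x0 in xs])
```

Because the estimate is a weighted average with nonnegative weights, every fitted value lies between the smallest and largest observed y.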

Gaussian Kernel Weights

From a sample of N observations, compute the following sample statistics for the data matrix of x:

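$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i \quad \text{(mean)}$$
$$\Sigma = \frac{1}{N}(x - \mu)'(x - \mu) \quad \text{(covariance)}$$
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2 \quad \text{(if x is univariate)}$$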

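If x is univariate, let $h = \lambda\sigma$ with $\lambda > 0$. Then the scaled Gaussian kernel is

$$K_h(x - x_i) = \frac{1}{(2\pi)^{1/2}\,\lambda\sigma} \exp\left(-\frac{1}{2\lambda^2}\left(\frac{x - x_i}{\sigma}\right)^2\right)$$

The bandwidth h of the Gaussian kernel is thus proportional to the standard deviation of the explanatory variable x. Similarly, if x is multivariate, let $h = (\lambda^2\Sigma)^{1/2}$. Then

$$K_h(x - x_i) = \frac{1}{(2\pi)^{K/2}\,|\det(\lambda^2\Sigma)|^{1/2}} \exp\left(-\frac{1}{2\lambda^2}(x - x_i)\Sigma^{-1}(x - x_i)'\right)$$

where the exponent K in $(2\pi)^{K/2}$ is the number of variables in x. In both cases the leading constant cancels when the weights are normalized.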
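
The Gaussian weights can be sketched in Python as below, assuming normalized weights and an N x K data matrix (a one-dimensional array is treated as K = 1). The function name `gaussian_weights` and the simulated data are illustrative assumptions.

```python
import numpy as np

def gaussian_weights(x0, X, lam):
    """Normalized Gaussian kernel weights with bandwidth matrix lam^2 * Sigma.

    X is the N x K data matrix, x0 a length-K evaluation point, lam > 0.
    """
    X = np.asarray(X, dtype=float).reshape(len(X), -1)
    mu = X.mean(axis=0)
    Sigma = (X - mu).T @ (X - mu) / len(X)   # covariance with divisor N
    diff = X - np.asarray(x0, dtype=float).reshape(1, -1)
    # Quadratic form (x - x_i) Sigma^{-1} (x - x_i)' for each observation i
    q = np.einsum("ij,ij->i", diff @ np.linalg.inv(Sigma), diff)
    k = np.exp(-0.5 * q / lam**2)            # leading constant cancels below
    return k / k.sum()

# Two regressors on very different scales
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 2)) * np.array([1.0, 10.0])
w = gaussian_weights(X[0], X, lam=0.5)
```

Scaling the distance by the sample covariance makes the weights invariant to the units of each variable, so a regressor measured on a large scale does not dominate the distance.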

K-Nearest Kernel Weights

Let $K = \mathrm{int}(\lambda N)$, with $0 \le \lambda \le 1$.

Compute the Euclidean distance between $x_i$ and a given x for $i = 1, 2, \ldots, N$:

$$d_i(x) = d(x - x_i) = \left((x - x_i)(x - x_i)'\right)^{1/2}$$

Sort the distances $d_i(x)$ in ascending order and let $d_K(x)$ be the K-th smallest, i.e. the largest distance among the K nearest neighbors of x.

Define $u_i(x) = d_i(x)/d_K(x)$. One of the following weights may be used:

Triangular Kernel Weight

$$w_i(x) = \begin{cases} 1 - u_i(x) & \text{if } u_i(x) < 1 \\ 0 & \text{otherwise} \end{cases}$$

Quartic Kernel Weight

$$w_i(x) = \begin{cases} \tfrac{15}{16}\left(1 - u_i(x)^2\right)^2 & \text{if } u_i(x) < 1 \\ 0 & \text{otherwise} \end{cases}$$

Tricube Kernel Weight

$$w_i(x) = \begin{cases} \left(1 - u_i(x)^3\right)^3 & \text{if } u_i(x) < 1 \\ 0 & \text{otherwise} \end{cases}$$
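
The steps above can be sketched in Python as follows. This is an illustration under stated assumptions: the function name `knn_weights` is hypothetical, the weights are left unnormalized, and the tricube weight is taken in its standard form (1 - u^3)^3.

```python
import numpy as np

def knn_weights(x0, X, lam, kind="tricube"):
    """Kernel weights based on the K = int(lam * N) nearest neighbors of x0."""
    X = np.asarray(X, dtype=float).reshape(len(X), -1)
    x0 = np.asarray(x0, dtype=float).reshape(1, -1)
    k = int(lam * len(X))
    if k < 1:
        raise ValueError("lam * N must be at least 1")
    d = np.sqrt(((X - x0) ** 2).sum(axis=1))  # Euclidean distances d_i(x)
    d_k = np.sort(d)[k - 1]                   # distance to the K-th nearest point
    if d_k == 0.0:
        raise ValueError("K-th nearest distance is zero")
    u = d / d_k
    if kind == "triangular":
        w = 1.0 - u
    elif kind == "quartic":
        w = (15.0 / 16.0) * (1.0 - u**2) ** 2
    elif kind == "tricube":
        w = (1.0 - u**3) ** 3
    else:
        raise ValueError(f"unknown kernel: {kind}")
    return np.where(u < 1.0, w, 0.0)          # zero weight outside the window

X = np.arange(10.0)                           # x_i = 0, 1, ..., 9
w = knn_weights(4.0, X, lam=0.5, kind="triangular")
```

With the strict inequality $u_i(x) < 1$, the K-th nearest neighbor itself receives zero weight, so at most K - 1 observations contribute. Unlike a fixed bandwidth, the window adapts to the local density of the data: it is narrow where observations are dense and wide where they are sparse.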

Kernel Weighted Local Regression

Consider the first-order or linear local approximation of y = f(x) at xi,

$$y = a_i + b_i (x - x_i)' + e$$

where $a_i = f(x_i)$ and $b_i = f'(x_i)$. Let $d_i = [a_i, b_i]'$ and $Z_i(x) = [1 \;\; x - x_i]$. For each data observation i, the kernel weighted least squares estimator of $d_i$ is obtained by minimizing the weighted sum of squares:

$$S_i = \sum_{j=1}^{N} w_i(x_j)\,\left(y_j - Z_i(x_j)\, d_i\right)^2$$

As the local regression is performed for each data observation i, we drop the subscript i in the following summary of least squares results:

$$d = (Z'WZ)^{-1}(Z'Wy)$$
$$\mathrm{Var}(d) = \sigma^2 (Z'WZ)^{-1}(Z'W^2Z)(Z'WZ)^{-1}$$

where W is the diagonal weighting matrix with diagonal elements $w_i(x)$, $\sigma^2 = e'e/N$ is the estimated regression variance, and $e = y - Zd$ is the vector of regression residuals.

Returning to the estimated model for each data observation i, the estimated intercept $a_i$ is exactly the fitted $y_i$, and the estimated $b_i$ is the slope or response coefficient of y with respect to x. Their estimated standard errors are the square roots of the diagonal elements of $\mathrm{Var}(d_i)$. The sample pattern of the response coefficients may be of interest for structural analysis of the model.
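
The least squares results above translate into a short Python sketch. The function name `local_linear` and the example data are assumptions; the weights are passed in, so any of the kernels discussed earlier can be used.

```python
import numpy as np

def local_linear(x0, X, y, w):
    """Kernel weighted least squares at x0 for the local linear model.

    Returns (d, var_d, sigma2) with d = [a, b_1, ..., b_K].
    """
    X = np.asarray(X, dtype=float).reshape(len(X), -1)
    y = np.asarray(y, dtype=float)
    x0 = np.asarray(x0, dtype=float).reshape(1, -1)
    Z = np.column_stack([np.ones(len(X)), X - x0])  # rows are [1, x_j - x0]
    ZW = Z.T * w                                     # Z'W without forming diag(w)
    A_inv = np.linalg.inv(ZW @ Z)                    # (Z'WZ)^{-1}
    d = A_inv @ (ZW @ y)
    e = y - Z @ d                                    # regression residuals
    sigma2 = e @ e / len(y)                          # e'e / N
    var_d = sigma2 * A_inv @ ((ZW * w) @ Z) @ A_inv  # sandwich with Z'W^2 Z
    return d, var_d, sigma2

# Exactly linear data: the local fit should recover a = 2 + 3*0.4 and b = 3
X = np.linspace(0.0, 1.0, 50)
y = 2.0 + 3.0 * X
w = np.exp(-0.5 * ((X - 0.4) / 0.2) ** 2)            # Gaussian weights around 0.4
d, var_d, sigma2 = local_linear(0.4, X, y, w)
```

Repeating the call with `x0` set to each sample point in turn gives the fitted value (the first element of `d`) and the response coefficients (the remaining elements) for every observation.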

Model Cross Validation

The estimated results of kernel weighted least squares depend on the choice of kernel and, in particular, on the selection of the kernel bandwidth. It is important to decide which type of kernel weights best suits a particular model and sample. Given a chosen kernel weight, its bandwidth controls the precision of the model estimation. The method of cross validation for selecting the optimal bandwidth is outlined below:

For each data observation i, we first delete this observation from the sample. With the chosen kernel weight, we then carry out the weighted local regression using the remaining N-1 observations. The predicted value of $y_i$, which was eliminated from the sample, is obtained by setting x to $x_i$ in the estimated regression equation:

$$y = a_i + b_i (x - x_i)'$$

That is, the estimated $a_i$ is the predicted $y_i$ with the prediction variance $\mathrm{Var}(a_i)$. Finally, the collection of estimated $a_i$ is compared with the observed $y_i$ for $i = 1, 2, \ldots, N$. The calculated goodness-of-fit statistics, such as R-square and MSE, may be used to compare and select the optimal value of the bandwidth corresponding to the chosen kernel in weighting the data observations. Of course, the method of cross validation is model (constant, linear, quadratic, etc.) and kernel (Gaussian or KNN) dependent.
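
The leave-one-out procedure can be sketched in Python as below. The sketch assumes a univariate x, the local linear model, Gaussian weights with bandwidth equal to a multiple of the sample standard deviation of x, and MSE as the selection criterion. The function name `loo_cv_mse`, the simulated data, and the grid of multipliers are illustrative choices.

```python
import numpy as np

def loo_cv_mse(x, y, h):
    """Leave-one-out MSE of the local linear fit with a Gaussian kernel."""
    n = len(x)
    pred = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                 # delete observation i
        xk, yk = x[keep], y[keep]
        w = np.exp(-0.5 * ((xk - x[i]) / h) ** 2)
        Z = np.column_stack([np.ones(n - 1), xk - x[i]])
        ZW = Z.T * w
        d = np.linalg.solve(ZW @ Z, ZW @ yk)     # weighted least squares
        pred[i] = d[0]                           # intercept a_i predicts y_i
    return np.mean((y - pred) ** 2)

# Simulated sample: y = sin(x) + noise
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 2.0 * np.pi, 150))
y = np.sin(x) + 0.2 * rng.standard_normal(150)

# Candidate bandwidths as multiples of the standard deviation of x
lams = [0.05, 0.1, 0.2, 0.4, 0.8, 1.6]
scores = {lam: loo_cv_mse(x, y, h=lam * x.std()) for lam in lams}
best_lam = min(scores, key=scores.get)
```

A very small multiplier tends to overfit the noise and a very large one oversmooths the curve, so the cross-validated MSE is typically smallest at an intermediate value.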

Model Prediction

Since the model has no fixed structure, prediction is a straightforward reinterpretation of the kernel weighted local regression, applied to out-of-sample observations.

Given an out-of-sample observation $x_{N+1}$, the kernel weighted local regression based on the sample of N observations $(y_i, x_i)$, $i = 1, 2, \ldots, N$, is

$$y = a_{N+1} + b_{N+1}(x - x_{N+1})' + e$$

The estimated intercept $a_{N+1}$ is the predicted value of $y_{N+1}$ (corresponding to $x = x_{N+1}$), and $\mathrm{Var}(a_{N+1})$ is the estimated prediction variance, provided $x_{N+1}$ is known without error.
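
A brief Python sketch of out-of-sample prediction follows, assuming a univariate x and Gaussian kernel weights. The function name `predict_local_linear`, the simulated data, and the bandwidth are assumptions made for illustration.

```python
import numpy as np

def predict_local_linear(x_new, x, y, h):
    """Predict y at an out-of-sample point x_new; returns (a, var_a)."""
    w = np.exp(-0.5 * ((x - x_new) / h) ** 2)      # weights centered at x_new
    Z = np.column_stack([np.ones(len(x)), x - x_new])
    ZW = Z.T * w
    A_inv = np.linalg.inv(ZW @ Z)
    d = A_inv @ (ZW @ y)
    e = y - Z @ d
    sigma2 = e @ e / len(y)
    var_d = sigma2 * A_inv @ ((ZW * w) @ Z) @ A_inv
    return d[0], var_d[0, 0]                       # intercept and its variance

# Sample on [0, 1]; predict just outside the sample range at x = 1.1
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 40)
y = 1.0 + 2.0 * x + 0.1 * rng.standard_normal(40)
a_hat, var_a = predict_local_linear(1.1, x, y, h=0.3)
se_a = np.sqrt(var_a)                              # prediction standard error
```

The only change from the in-sample regression is that the weights and regressors are centered at the new point rather than at a sample observation.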

Extensions and Future Research

Example: Aggregate Consumption and Income in China

Variables

Consumption-Income Relationship

C = f(Y)

A specification based on the Houthakker-Taylor savings function is used here. This is also consistent with an error-correction form of the consumption equation (Davidson, Hendry, Srba, and Yeo [1978]).

$$C = f(\Delta Y, Y_{-1}, S_{-1})$$

The linear approximation of $C = f(\Delta Y, Y_{-1}, S_{-1})$ at $(\Delta Y_i, Y_{i-1}, S_{i-1})$ is

$$C = a_i + b_{i1}(\Delta Y - \Delta Y_i) + b_{i2}(Y_{-1} - Y_{i-1}) + b_{i3}(S_{-1} - S_{i-1}) + e$$

Empirical Results

For each observation i (see Program and Data),

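Estimated $C_i = f(\Delta Y_i, Y_{i-1}, S_{i-1}) = a_i$

Estimated $\partial C_i/\partial \Delta Y_i = \partial f(\Delta Y_i, Y_{i-1}, S_{i-1})/\partial \Delta Y_i = b_{i1}$

Estimated $\partial C_i/\partial Y_{i-1} = \partial f(\Delta Y_i, Y_{i-1}, S_{i-1})/\partial Y_{i-1} = b_{i2}$

Estimated $\partial C_i/\partial S_{i-1} = \partial f(\Delta Y_i, Y_{i-1}, S_{i-1})/\partial S_{i-1} = b_{i3}$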


Copyright © Kuan-Pin Lin
Last updated: April 14, 2005