y = m(x) + e
where m(x) = E(y|x), which is assumed to be sufficiently differentiable (i.e. smooth) but may or may not have a specific parametric form. The purpose of nonparametric estimation of m(x) is to approximate E(y|x) arbitrarily closely, given a large enough sample. In addition to m(x), we are also interested in the derivatives of m(x).
For a sample of $N$ observations of $y$ and $x$, $(y_i, x_i)$, $i = 1, 2, \ldots, N$, the empirical model for nonparametric estimation is
$$y_i = m(x_i) + e_i$$
We assume $E(e_i|x_i) = 0$ and $\mathrm{Var}(e_i|x_i) = \sigma^2(x_i)$. Then the sample estimate of $E(y|x)$ is the weighted average of the sample observations of $y$:
$$m(x) = \frac{\sum_{i=1}^{N} w_i(x)\,y_i}{\sum_{i=1}^{N} w_i(x)}$$
where $w_i(x) = w(x - x_i)$ is a weighting scheme for the $i$-th observation $y_i$. Typically, $w_i(x) \ge 0$ reflects how close $x_i$ is to $x$: the weight is high when the distance is small and low when the distance is large. The weights may be normalized so that $w_i(x) = 1$ if $x = x_i$ (i.e. $w(0) = 1$) or, alternatively, so that $\sum_{i=1}^{N} w_i(x) = 1$.
$$y \approx f(x)\big|_{x = x_i} = f(x_i) + f'(x_i)(x - x_i)' + \tfrac{1}{2}(x - x_i)\,f''(x_i)\,(x - x_i)' + \cdots$$
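where $f'(x_i) = \partial f(x_i)/\partial x$ is the slope and $f''(x_i) = \partial^2 f(x_i)/\partial x\,\partial x'$ is the curvature of $f(x)$ at $x_i$. The local approximation can be a constant, linear, quadratic, or even higher-order function of $x$.
With the Taylor approximation of $y$ at $x_i$, we can write the estimate of $m(x)$ for three models:
Constant: $m(x) = f(x_i)$
Linear: $m(x) = f(x_i) + f'(x_i)(x - x_i)'$
Quadratic: $m(x) = f(x_i) + f'(x_i)(x - x_i)' + \tfrac{1}{2}(x - x_i)\,f''(x_i)\,(x - x_i)'$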
For the various models, the estimates of $f(x_i)$, $f'(x_i)$, and $f''(x_i)$ are obtained by minimizing the weighted sum of squares:
$$S = \sum_{i=1}^{N} w_i(x)\,(y_i - m(x))^2$$
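For the constant model the minimization has a closed form. The short derivation below is a sketch that treats $m(x)$ as a single unknown constant $m$ at the evaluation point $x$ and assumes $\sum_i w_i(x) > 0$; the hat notation for the minimizer is introduced here for illustration only.

```latex
\begin{aligned}
S(m) &= \sum_{i=1}^{N} w_i(x)\,(y_i - m)^2, \\
\frac{\partial S}{\partial m} &= -2\sum_{i=1}^{N} w_i(x)\,(y_i - m) = 0
\quad\Longrightarrow\quad
\hat{m}(x) = \frac{\sum_{i=1}^{N} w_i(x)\,y_i}{\sum_{i=1}^{N} w_i(x)} .
\end{aligned}
```

The minimizer is the weighted average of the $y_i$, which matches the weighted-average estimate of $E(y|x)$ stated earlier.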
The weights for nonparametric model estimation can be derived from the scaled kernel as follows:
$$w_i(x) = \frac{K_h(x - x_i)}{\sum_{j=1}^{N} K_h(x - x_j)} = \frac{K((x - x_i)/h)}{\sum_{j=1}^{N} K((x - x_j)/h)}$$
where $K_h(x - x_i) = (1/h)\,K((x - x_i)/h)$ and $h > 0$ is the bandwidth. Furthermore, $w_i(x)$ may be normalized in several convenient ways.
Nadaraya-Watson Estimator
Based on the constant model, the estimate of $m(x)$ is
$$\sum_{i=1}^{N} w_i(x)\,y_i = \frac{\sum_{i=1}^{N} K((x - x_i)/h)\,y_i}{\sum_{i=1}^{N} K((x - x_i)/h)}$$
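The estimator can be sketched in a few lines of Python for univariate x. This is a minimal illustration, not a reference implementation: the function names `gaussian_kernel` and `nadaraya_watson` are hypothetical, and the standard normal density is assumed as the kernel K simply to have something concrete to evaluate.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, assumed here as an illustrative kernel K(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def nadaraya_watson(x0, xs, ys, h, kernel=gaussian_kernel):
    """Kernel-weighted average of ys at the point x0 with bandwidth h > 0."""
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    k = [kernel((x0 - xi) / h) for xi in xs]   # unnormalized weights K((x - x_i)/h)
    total = sum(k)
    if total == 0.0:
        raise ZeroDivisionError("all kernel weights are zero at x0")
    return sum(ki * yi for ki, yi in zip(k, ys)) / total

# Hypothetical noise-free data on the line y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
print(nadaraya_watson(2.0, xs, ys, h=1.0))
```

Because the sample points are symmetric around x = 2, the estimate there equals 5 up to rounding. Near the edges of the sample the constant model is pulled toward the interior values, which is one motivation for the local linear model.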
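$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i \quad \text{(Mean)}$$
$$\Sigma = \frac{1}{N}(x - \mu)'(x - \mu) \quad \text{(Covariance)}$$
($\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$, if $x$ is univariate)
If $x$ is univariate, let $h = \lambda\sigma$ with $\lambda > 0$. Then the Gaussian kernel is defined by
$$K\!\left(\frac{x - x_i}{h}\right) = \frac{1}{(2\pi)^{1/2}\,\lambda\sigma}\exp\!\left(-\frac{1}{2\lambda^2}\left(\frac{x - x_i}{\sigma}\right)^2\right)$$
The bandwidth $h$ of the Gaussian kernel is thus based on the size of the standard deviation of the explanatory variable $x$. Similarly, if $x$ is multivariate, let $h = (\lambda^2\Sigma)^{1/2}$. Then
$$K\!\left(\frac{x - x_i}{h}\right) = \frac{1}{(2\pi)^{K/2}\,|\det(\lambda^2\Sigma)|^{1/2}}\exp\!\left(-\frac{1}{2\lambda^2}(x - x_i)\,\Sigma^{-1}(x - x_i)'\right)$$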
where K is the number of variables in x.
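For the univariate case, the Gaussian weights with the bandwidth tied to the sample standard deviation can be sketched as follows. The function name `gaussian_weights` and the argument `lam` (standing for the scale factor multiplying the standard deviation) are hypothetical, and the weights are normalized to sum to one, which is only one of the possible normalizations.

```python
import math

def gaussian_weights(x0, xs, lam):
    """Normalized Gaussian kernel weights at x0, bandwidth h = lam * sigma."""
    n = len(xs)
    mu = sum(xs) / n
    # Standard deviation with the 1/N convention used in the text.
    sigma = math.sqrt(sum((xi - mu) ** 2 for xi in xs) / n)
    h = lam * sigma
    if h <= 0:
        raise ValueError("lam and the sample standard deviation must be positive")
    norm = math.sqrt(2.0 * math.pi) * h
    k = [math.exp(-0.5 * ((x0 - xi) / h) ** 2) / norm for xi in xs]
    total = sum(k)
    return [ki / total for ki in k]

# Hypothetical sample; weights are evaluated at the sample mean.
sample = [1.0, 2.0, 3.0, 4.0, 5.0]
print(gaussian_weights(3.0, sample, lam=0.5))
```

A smaller scale factor concentrates the weight on the observations closest to the evaluation point, while a larger one spreads it more evenly across the sample.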
Compute the Euclidean distance between $x_i$ and a given $x$ for $i = 1, 2, \ldots, N$:
$$d_i(x) = d(x - x_i) = ((x - x_i)(x - x_i)')^{1/2}$$
Order the $d_i(x)$ so that $d_K(x)$ is the distance from $x$ to its $K$-th nearest neighbor, that is, the smallest radius that includes the $K$ nearest neighbors of $x$.
Define $u_i(x) = d_i(x)/d_K(x)$. One of the following weights may then be used:
Triangular Kernel Weight
$$w_i(x) = \begin{cases} 1 - u_i(x) & \text{if } u_i(x) < 1 \\ 0 & \text{otherwise} \end{cases}$$
Quartic Kernel Weight
$$w_i(x) = \begin{cases} \tfrac{15}{16}\,(1 - u_i(x)^2)^2 & \text{if } u_i(x) < 1 \\ 0 & \text{otherwise} \end{cases}$$
Tricube Kernel Weight
$$w_i(x) = \begin{cases} (1 - u_i(x)^3)^3 & \text{if } u_i(x) < 1 \\ 0 & \text{otherwise} \end{cases}$$
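The nearest-neighbor weights can be sketched as below. The function name `knn_weights` and the `KERNELS` table are hypothetical. Points are passed as coordinate tuples so that `math.dist` (Python 3.8 or later) gives the Euclidean distance, and the tricube weight is coded in its standard form (1 - u^3)^3.

```python
import math

# Weight functions of u = d_i / d_K, applied only when u < 1.
KERNELS = {
    "triangular": lambda u: 1.0 - u,
    "quartic": lambda u: (15.0 / 16.0) * (1.0 - u * u) ** 2,
    "tricube": lambda u: (1.0 - u ** 3) ** 3,   # standard tricube form
}

def knn_weights(x0, xs, k, kind="triangular"):
    """KNN kernel weights at x0; x0 and each element of xs are coordinate tuples."""
    if kind not in KERNELS:
        raise ValueError("unknown kernel: " + kind)
    if not 1 <= k <= len(xs):
        raise ValueError("k must be between 1 and the sample size")
    d = [math.dist(x0, xi) for xi in xs]   # Euclidean distances d_i(x)
    d_k = sorted(d)[k - 1]                 # distance to the K-th nearest neighbor
    if d_k == 0.0:
        raise ValueError("K-th nearest distance is zero, so u_i is undefined")
    weight = KERNELS[kind]
    return [weight(di / d_k) if di / d_k < 1.0 else 0.0 for di in d]

# Hypothetical one-dimensional sample, evaluated at x = 1 with K = 4.
pts = [(0.0,), (1.0,), (2.0,), (3.0,), (10.0,)]
for name in KERNELS:
    print(name, knn_weights((1.0,), pts, 4, name))
```

Because the condition is a strict inequality, the K-th neighbor itself has u = 1 and receives zero weight, so only the points strictly closer than the K-th neighbor enter the local fit.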
$$y = a_i + b_i(x - x_i)' + e$$
where $a_i = f(x_i)$ and $b_i = f'(x_i)$. Let $d_i = [a_i, b_i]'$ and $Z_i(x) = [1 \;\; x - x_i]$. For each data observation $i$, the kernel weighted least squares estimator of $d_i$ is obtained by minimizing the weighted sum of squares:
$$S_i = \sum_{j=1}^{N} w_i(x_j)\,(y_j - Z_i(x_j)\,d_i)^2$$
As the local regression is performed for each data observation i, we drop the subscript i in the following summary of least squares results:
$$d = (Z'WZ)^{-1}(Z'Wy)$$
$$\mathrm{Var}(d) = s^2\,(Z'WZ)^{-1}(Z'W^2Z)(Z'WZ)^{-1}$$
where $W$ is the diagonal weighting matrix with diagonal elements $w_i(x)$, $s^2 = e'e/N$ is the estimated regression variance, and $e = y - Zd$ is the vector of regression residuals.
Returning to the estimated model for each data observation $i$, the estimated intercept $a_i$ is exactly the fitted $y_i$, and the estimated $b_i$ is the slope or response coefficient of $y$ with respect to $x$. Their estimated standard errors can be derived from $\mathrm{Var}(d_i)$. The sample pattern of the response coefficients may be of interest for structural analysis of the model.
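For a single regressor, the weighted least squares formulas reduce to a 2-by-2 system that can be solved by hand. The sketch below is illustrative only: the name `local_linear` is hypothetical, x is assumed univariate, and unnormalized Gaussian weights with a user-supplied bandwidth `h` are assumed (any common scaling of the weights cancels in both the estimates and the sandwich variance).

```python
import math

def local_linear(x0, xs, ys, h):
    """Weighted least squares of y on [1, x - x0]; returns (a, b, var_a, var_b)."""
    w = [math.exp(-0.5 * ((xj - x0) / h) ** 2) for xj in xs]
    dx = [xj - x0 for xj in xs]
    # Elements of Z'WZ, Z'Wy and Z'W^2Z for Z = [1, x - x0].
    s0 = sum(w)
    s1 = sum(wj * dj for wj, dj in zip(w, dx))
    s2 = sum(wj * dj * dj for wj, dj in zip(w, dx))
    t0 = sum(wj * yj for wj, yj in zip(w, ys))
    t1 = sum(wj * dj * yj for wj, dj, yj in zip(w, dx, ys))
    q0 = sum(wj * wj for wj in w)
    q1 = sum(wj * wj * dj for wj, dj in zip(w, dx))
    q2 = sum(wj * wj * dj * dj for wj, dj in zip(w, dx))
    det = s0 * s2 - s1 * s1
    if det <= 0.0:
        raise ValueError("Z'WZ is singular; try a larger bandwidth")
    a = (s2 * t0 - s1 * t1) / det          # intercept: fitted y at x0
    b = (s0 * t1 - s1 * t0) / det          # slope: response coefficient at x0
    resid = [yj - a - b * dj for yj, dj in zip(ys, dx)]
    s_sq = sum(e * e for e in resid) / len(ys)   # s^2 = e'e / N
    # Diagonal of s^2 (Z'WZ)^-1 (Z'W^2 Z) (Z'WZ)^-1.
    var_a = s_sq * (s2 * s2 * q0 - 2.0 * s2 * s1 * q1 + s1 * s1 * q2) / det ** 2
    var_b = s_sq * (s1 * s1 * q0 - 2.0 * s1 * s0 * q1 + s0 * s0 * q2) / det ** 2
    return a, b, var_a, var_b

# Hypothetical noise-free data on the line y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0 + 2.0 * x for x in xs]
print(local_linear(2.5, xs, ys, h=1.0))
```

On exactly linear data the local fit reproduces the line, so the intercept equals the value of the line at the evaluation point and the slope equals 2, with residual variance near zero. Calling the function at each sample point in turn gives the sample pattern of response coefficients.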
For each data observation $i$, we first delete this observation from the sample. With the chosen kernel weight, we then carry out the weighted local regression on the remaining $N-1$ observations. The predicted value of $y_i$, which was removed from the sample, is obtained by setting $x$ to $x_i$ in the estimated regression equation:
$$y = a_i + b_i(x - x_i)'$$
That is, the estimated $a_i$ is the predicted $y_i$, with prediction variance $\mathrm{Var}(a_i)$. Finally, the collection of estimated $a_i$ is compared with the observed $y_i$ for $i = 1, 2, \ldots, N$. Calculated goodness-of-fit statistics such as R-square and MSE may be used to compare and select the optimal bandwidth for the chosen kernel. Of course, the results of cross validation depend on the model (constant, linear, quadratic, etc.) and on the kernel (Gaussian or KNN).
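The leave-one-out procedure can be sketched as a loop over observations followed by a search over candidate bandwidths. The names `loo_prediction`, `loo_mse`, and `select_bandwidth` are hypothetical, and the sketch assumes univariate x, the local linear model, unnormalized Gaussian weights, and MSE as the selection criterion.

```python
import math

def loo_prediction(i, xs, ys, h):
    """Predict ys[i] from a local linear fit at xs[i] that excludes observation i."""
    x0 = xs[i]
    s0 = s1 = s2 = t0 = t1 = 0.0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        if j == i:
            continue                        # delete observation i from the sample
        dx = xj - x0
        w = math.exp(-0.5 * (dx / h) ** 2)  # Gaussian weight, assumed for illustration
        s0 += w
        s1 += w * dx
        s2 += w * dx * dx
        t0 += w * yj
        t1 += w * dx * yj
    det = s0 * s2 - s1 * s1
    if det <= 0.0:
        raise ValueError("bandwidth too small for a local linear fit")
    return (s2 * t0 - s1 * t1) / det        # estimated intercept = predicted y_i

def loo_mse(xs, ys, h):
    """Mean squared leave-one-out prediction error for bandwidth h."""
    n = len(xs)
    return sum((ys[i] - loo_prediction(i, xs, ys, h)) ** 2 for i in range(n)) / n

def select_bandwidth(xs, ys, candidates):
    """Return the candidate bandwidth with the smallest leave-one-out MSE."""
    return min(candidates, key=lambda h: loo_mse(xs, ys, h))

# Hypothetical noise-free data on a sine curve.
xs = [0.5 * i for i in range(21)]
ys = [math.sin(x) for x in xs]
best = select_bandwidth(xs, ys, [0.3, 1.0, 3.0])
print(best, loo_mse(xs, ys, best))
```

With noise-free data the smallest candidate wins because only smoothing bias matters. With noisy data the criterion trades that bias against variance, so the selected bandwidth is typically larger.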
Given an out-of-sample observation $x_{N+1}$, the kernel weighted local regression based on the sample of $N$ observations $(y_i, x_i)$, $i = 1, 2, \ldots, N$, is
$$y = a_{N+1} + b_{N+1}(x - x_{N+1})' + e$$
The estimated intercept $a_{N+1}$ is the predicted value of $y_{N+1}$ (corresponding to $x = x_{N+1}$), and $\mathrm{Var}(a_{N+1})$ is the estimated prediction variance, provided $x_{N+1}$ is known without error.
A specification based on the Houthakker-Taylor savings function is used here. It is also consistent with an error-correction form of the consumption equation (Davidson, Hendry, Srba, and Yeo [1978]).
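$$C = f(\Delta Y, Y_{-1}, S_{-1})$$
The linear approximation of $C = f(\Delta Y, Y_{-1}, S_{-1})$ at $(\Delta Y_i, Y_{i-1}, S_{i-1})$ is
$$C = a_i + b_{i1}(\Delta Y - \Delta Y_i) + b_{i2}(Y_{-1} - Y_{i-1}) + b_{i3}(S_{-1} - S_{i-1}) + e$$
Estimated $C_i = f(\Delta Y_i, Y_{i-1}, S_{i-1}) = a_i$
Estimated $\partial C_i/\partial \Delta Y_i = \partial f(\Delta Y_i, Y_{i-1}, S_{i-1})/\partial \Delta Y_i = b_{i1}$
Estimated $\partial C_i/\partial Y_{i-1} = \partial f(\Delta Y_i, Y_{i-1}, S_{i-1})/\partial Y_{i-1} = b_{i2}$
Estimated $\partial C_i/\partial S_{i-1} = \partial f(\Delta Y_i, Y_{i-1}, S_{i-1})/\partial S_{i-1} = b_{i3}$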
Gaussian Kernel Weights
[Figures: Estimated Model: Actual vs Fitted; Estimated Response (Slope) Coefficients]
KNN Kernel Weights
[Figures: Estimated Model: Actual vs Fitted; Estimated Response (Slope) Coefficients]