Nonlinear Econometric Models

Table of Contents


Box-Cox Variable Transformation

Readings and References:

The Box-Cox transformation of a data variable X is defined by

X(l) = (Xl-1)/l

Although the range of l can cover the whole real line, -2 £ l £ 2 is the area of interest in many econometric applications. If l = 2, it is the quadratic transformation. If l = 0.5, it is a square-root. A linear model corresponds to l = 1, and the logarithmic transformation is the limiting case that l -> 0 (by L'Hôspital's rule, liml->0(Xl-1)/l = ln(X)).

The value of power l may not be the same for each of the variables in the model. In particular, the dependent variable and independent variables as a group may take different Box-Cox transformations. Let a = (b,q,l)' be the vector of unknown parameters for a regression model:

e = F(Y,X,a) = Y(q) - X(l)b

Or, Equivalently,

Y(q) = X(l)b + e

where e ~ normal(0,s2I). The log-likelihood function is

ll(a,s2|Y,X) = -½N [ln(2p)+ln(s2)] -½ (e'e/s2) + (q-1)Si=1,2,...,Nln(|Yi|)

Note that for each data observation i, the Jacobian term is derived as Ji(q) = |ei/Yi| = |Yiq-1|. By substituting out s2 = e'e/N, the concentrated log-likelihood function is

ll*(a|Y,X) = -½N [1+ln(2p)-ln(N)] - Si=1,2,...,Nln(|Yi|) -½N ln(e'e) + q Si=1,2,...,Nln(|Yi|)
= -½N [1+ln(2p)-ln(N)] - Si=1,2,...,Nln(|Yi|) -½N ln(e*'e*)
where e* = e / [(|Y1||Y2|...|YN|)q/N]

Given the values of Box-Cox tranformation parematers q and l, a wide range of model specifications are possible. Of course, q and l should be estimated simultaneously with b. The efficient estimator of a = (b,q,l)' is obtained by maximizing the above concentrated log-likelihood function. It is equivalent to minimizing the sum of squared weighted errors:

S*(b|Y,X) = e*'e*,

where e* = we, and w = 1/[(|Y1||Y2|...|YN|)q/N].

Based on the estimated parameter vector a = (b,q,l)', a Box-Cox model is typically interpreted in terms of the elasticity. That is,

ln(Y)/ln(X) = (X/Y)(y/X) = (Xl/Yq)b

Example 1

Based on the money demand data given in Greene's Table 10.1 (p. 443 and 451), formulate and estimate the following functional forms of money demand equations (Program and Data, see also Greene's Example 10.11 for comparison):

M(q) = b0 + b1 R(l) + b2 Y(l) + e

As described in Greene's Example 10.5 and Table 10.3, M is the real stock of M2, R is the discount interest rate, and Y is the real GNP. Several variations of the Box-Cox transformation parameters may be estimated and tested for the most appropriate functional form of money demand equation:

  1. q = l, i.e.
    M(l) = b0 + b1 R(l) + b2 Y(l) + e
  2. q -> 0, i.e.
    ln(M) = b0 + b1 R(l) + b2 Y(l) + e
  3. l -> 0, i.e.
    M(q) = b0 + b1 ln(R) + b2 ln(Y) + e
  4. q -> 0 and l -> 0, i.e.
    ln(M) = b0 + b1 ln(R) + b2 ln(Y) + e

Homework

Continuing from the previous homework on estimating the generalized production functions of Zellner and Revankar [1970], the Box-Cox functional form is given as:

Q(q) = b1 + b2K(l) + b3L(l) + e

Estimate and interpret the Box-Cox production function based on the estimate (variable) rate of returns to scale and elsticiticy of subsitituion, respectively.

Many classical functional forms are possible by setting the Box-Cox transformation parameters. For example, if q, l -> 0, then it is the Cobb-Douglas form. Formulate and test for the most appropriate functional forms for this study.


Heteroscedastic Regression Models

Readings and References:

Consider a general regression model F(Y,X,b) = e ~ normal(0,S). Let the covariance matrix S = s2W(a), then the corresponding log-likelihood function is

ll(b,a,s2|Y,X) = -½N (ln(2p)+ln(s2)) -½ ln(|W|) -½ (1/s2)(e'W-1e) + Si=1,2,...,N ln(|¶ei/Yi|)

The last term is the sum of log-Jacobians from ei to Yi over the entire sample. Since the variance term can be solved as s2 = e'W-1e / N for a given W, the concentrated log-likelihood function is

ll*(b,a,s2|Y,X) = -½N (1+ln(2p)-ln(N)) -½N ln(e'W-1e) -½ ln(|W|) + Si=1,2,...,N ln(|¶ei/Yi|)

It is clear that the maximum likelihood estimation is in general not equivalent to the nonlinear least squares unless W = I, the identity matrix, and ¶ei/Yi = 1 for each i=1,2,...,N. If W is known and the log-Jacobians vanish, it is the GLS (Generalized Least Squares) problem that minimizes e'W-1e.

Unfortunately, W = W(a) is not known and must be parametrized with a lower dimention a, which in turn is estimated together with the vector of model parameters b. The models with heteroscedastic and/or autocorrelated errors are the special cases of the general regression model in which W(a) is defined more specifically.

For simplicity, consider a regression model e = F(Y,X,b) = Y - f(X,b). Then for each data observation i, ¶ei/Yi = 1. We assume further that the heteroscedastic error ei ~ normal(0,si2). The log-likelihood function is

ll(b,si2|Yi,Xi) = -½ [ln(2p) + ln(si2) + ei2/si2]

Summing over a sample of N observations, the total log-likelihood function is written as

ll(b,s12,s22,...,sN2|Y,X) = -½N ln(2p) -½ Si=1,2,...,Nln(si2) -½ Si=1,2,...,N(ei2/si2)

Given the general form of heteroscedasticity, there are too many unknown parameters. For practical purpose, some hypotheses of heteroscedasticity must be assumed:

si2 = s2 hi(a)

where s2 > 0 and hi(a) is indexed by i to indicate that it is a function of Zi. That is hi(a) = h(a|Zi), where Z is a set of independent variables that may or may not be coincide with X. Depending on the form of heteroscedasticity hi(a), denoted by hi for brevity, the log-likelihood function is written as

ll(b,a,s2|Y,X) = -½N (ln(2p) + ln(s2)) -½ Si=1,2,...,Nln(hi) -½(1/s2)Si=1,2,...,N(ei2/hi)

Let ei* = ei / Öhi and substitute out the maximum likelihood estimator of s2 with e*'e*/N, then the concentrated log-likelihood function is

ll*(b,a|Y,X) = -½N (1+ln(2p)-ln(N)) -½ Si=1,2,...,Nln(hi) -½N ln(e*'e*)

Consider the following special cases of hi = h(a|Zi) = h(Zia):

  1. si2 = s2(Zia), Zia > 0
  2. si2 = s2(Zia)2
  3. Multiplicative Heteroscedasticity: si2 = s2exp(Zia)

    The corresponding concentrated log-likelihood function for estimation is

    ll*(b,a|Y,X) = -½N (1+ln(2p)-ln(N)) -½ Si=1,2,...,Nzia -½N ln(e*'e*)

    where ei* = ei / exp(Zia)½ for each observation i=1,2,...,N.

  4. Exponential Heteroscedasticity: si2 = s2Pm=1,2,...,M Zimam, where M is the number of variables in Zi. This is equivalent to the multiplicative case if the variables in Z are logs. A special case, with a single variable, is

    si2 = s2 Zia

    If a = 0, the model is homoscedastic; If a = 2, it is the case (ii).

Example 2

Given the data of per capita expendiure on public schools and per capita income from Greene's Table 12.1 (1997, p. 541), consider the following somewhat heteroscedastic relationship of public school spending (Y) and income (X):

Y = b0 + b1 X + b2 X2 + e

Find and compare the maximum likelihood estimates based on the following hypotheses of heteroscedasticity (Program and Data):

  1. si2 = s2 Xi2
  2. si2 = s2 Xia
  3. si2 = s2 exp(aXi)
Note that (1) is a special case of (2) in which a = 2; and (2) is equivalent to (3) if X is expressed in log form.

Homework

A cross section model is prone to the problem of heterocedasticity. The generalized Cobb-Douglas production function of Zellner and Revankar [1970] investigated in the previous homeworks considers the output dependent scale elasticity. However, heteroscedasticity has not been tested and corrected. Recall that the production function is:

ln(Q) + q Q = b1 + b2ln(L) + b3ln(K) + e

where Q = VA/NFirm, L=Labor/NFirm, K=Capital/NFirm are per establishment data of value-added, labor, and capital, respectively. We assume that the heteroscedastic variances take the following multiplicative form:

si2 = s2 hi(a) and
hi(a) = exp(Zia)

where Zi = [ln(Li),ln(Ki)] and a = [a1,a2]'. Or, equivalently,

si2 = s2 Lia1 Kia2

Formulate and estimate the log-likelihood function with multiplicative heteroscedasticity. Does the incorporation of heteroscedastic variances improve the model estimates of the generalized Cobb-Douglas production function?


Binary Qualitative Choice Models

Readings and References:

Consider a linear regression model Y = Xb + e, where

Yi = 1 with probability Pi
0 with probability 1-Pi

It is clear that Xi explains the probability of Yi to be 1 or 0. Let

Pi = Prob(Yi=1|Xi) = F(Xib)
1-Pi = Prob(Yi=0|Xi) = 1-F(Xib)

Since E(Yi|Xi) = (1)F(Xib) + (0)(1-F(Xib)) = F(Xib), the estimated model may be interpreted with the marginal effects defined by

E(Yi|Xi)/Xi = [F(Xib)/(Xib)] b

Given a sample of N independent observations, the likelihood function is

L(b) = Õi=1,2,...,N PiYi (1-Pi)1-Yi = Õi=1,2,...,N F(Xib)Yi (1-F(Xib))1-Yi

Then the log-likelihood function is

ll(b) = ln(L(b)) = åi=1,2,...,N (Yi lnF(Xib) + (1-Yi) ln(1-F(Xib)))

To maximize ll(b) with respect to b, we solve from the first order condition:

ll(b)/¶b = åi=1,2,...,N (Yi/Fi-(1-Yi)/(1-Fi)) fiXi
= åi=1,2,...,N (Yi-Fi)/(Fi(1-Fi)) fiXi = 0

where Fi = F(Xib) and fi = f(Xib) = Fi/(Xib). Note that fiXi = Fi/¶b. Finally, the Hessian ll2(b)/¶b¶b' must be negative definite, and the estimated variance-covariance matrix of b is Var(b) = [-E(ll2(b)/¶b¶b')]-1.

Linear Probability Model

Pi = F(Xib) = Xib

It is immediately that E(Yi|Xi) = Xib. In particular,

E(ei) = (1-Xib)Pi + (-Xib)(1-Pi) = Pi - Xib
Var(ei) = E(ei2) = Pi(1-Xib)2 + (1-Pi)(-Xib)2
= Pi(1-Pi)2 + (1-Pi)(-Pi)2 = (1-Pi)Pi = (1-Xib)(Xib)

The range of Var(ei) is between 0 and 0.25 and it is clearly heteroscedastic. Furthermore, since E(Yi|Xi) = F(Xib) = Xib, a linear function, there is no guarantee that the estimated probability will lie within the unit interval.

Probit Model

Pi = F(Xib) = òXib 1/(2p)½ exp(-z2/2) dz

Pi, the cumulative normal distribution, is called Probit for the i-th observation. The model Yi = F-1(Pi) + ei is called the Probit Model, where F-1(Pi) = Xib is the inverse of cumulative distribution F(Xib). The probit model can be derived from a model involving an unobserved, or latent, variable Yi* such that Yi* = Xib + ei where ei ~ normal(0,1). Suppose the value of the observed binary variable Yi depends on the sign of Yi*:

Yi = 1 if Yi* > 0
0 if Yi* £ 0

Therefore,

Pi = Prob(Yi=1|Xi) = Prob(Yi*>0|Xi) = Prob(ei>-Xib)
= ò¥-Xib 1/(2p)½ exp(-z2/2) dz
= òXib 1/(2p)½ exp(-z2/2) dz

For maximum likelihood estimation, we solve the following first order condition:

åi=1,2,...,N (Yi-Fi)/(Fi(1-Fi)) fiXi = 0

where Fi = F(Xib) = òXib 1/(2p)½ exp(-z2/2) dz, and
fi = F(Xib)/(Xib) = 1/(2p)½ exp(-(Xib)2/2)

This is exactly the first order conditions for weighted least squares estimation of the nonlinear regression model: Yi = F(Xib) + ei with weights given by (F(Xib)(1-F(Xib)).

Furthermore, it can be shown that for the maximum likelihood estimates b

E([2ll(b)/¶b¶b']) = -åi=1,2,...,N(fi2XiXi')/(Fi(1-Fi))

which is negative definite. The estimated variance-covariance matrix of b is computed as

Var(b) = (-E[2ll(b)/¶b¶b'])-1

For model interpretation, the marginal effects of Xi is defined as

E(Yi|Xi)/Xi = [F(Xib)/¶(Xib)] b = f(Xib)b = fib

Logit Model

Pi = F(Xib) = 1/(1+exp(-Xib))

Pi as defined is the logistic curve. The model Yi = F-1(Pi) + ei is called the Logit Model. The logit model is most easily derived by assuming the logarithm of the odds is equal to Xib, or the odd ratio model: ln(Pi/(1-Pi)) = Xib Solving for Pi, we find that

Pi = exp(Xib)/(1+exp(Xib)) = 1/(1+exp(-Xib))

For maximum likelihood estimation, we solve the following first order condition:

åi=1,2,...,N (Yi-Fi)/(Fi(1-Fi)) fiXi = 0

Because of the logistic functional form,

Fi = F(Xib) = 1/(1+exp(-Xib)) and
fi = F(Xib)/¶(Xib) = exp(-Xib)/(1+exp(-Xib)) = Fi(1-Fi)

it amounts to solve the following simple expression:

åi=1,2,...,N (Yi-Fi)Xi = 0

with the negative definite Hessian:

2ll(b)/¶b¶b' = - åi=1,2,...,NFi(1-Fi)Xi'Xi

Therefore, the estimate of variance-covariance matrix of b is

Var(b) = [-2ll(b)/¶b¶b']-1

For model interpretation, the marginal effects of Xi is defined as

E(Yi|Xi)/Xi = [F(Xib)/¶(Xib)] b = fib = Fi(1-Fi)b

Example 3

This example (see also, Greene [1999], Example 19.1) examines the effect of a new teaching method on students' grades. The following variables are used (Data):

The qualitative equation is formulated as follows:

GRADE = b0 + b1GPA + b2TUCE + b3PSI + e

Estimate and interpret the logit (Program) and probit (Program) probablity model specifications of the above equation, respectively. Explain the estimated marginal effects of new teaching method on students' grade performance.

Homework

To consider the potential problem of heteroscedasticity in the above cross section study, we assume a form of multiplicative heterscedasticity associated with the important variable PSI in the qualitative regression equation:

GRADE = b0 + b1GPA + b2TUCE + b3PSI + e

where for each observation i, the heteroscedastic variance is defined by

si2 = exp(aPSIi)2

That is, when PSIi = 0, si2 = 1; when PSIi equals 1, si2 = exp(a)2, where a is the unknown parameter in addition to bs for model estimation.


Switching Regression Models

Readings and References:

Techniques of swtiching regression are useful to study the random variable with mixture of probability distributions, sturctural change, and disequilibrium. The recent development of Markov swtiching can be applied to dynamic time series models.

Consider the M-state (regime or structure) linear regression equation:

Yt = Xtb(i) + et(i)
et(i) ~ normal(0,s2(i))

where [Yt,Xt] is the observed sample data, t=1,2,...,N, and (i) is the state (regime) indicator, i=1,2,...,M. In particular, assuming normal distribution, the conditional probability density function of Yt occured in the regime i is written as:

f(Yt;b(i),s2(i)) = 1/Ö(2ps2(i)) exp(-½(Yt-Xtb(i))2/s2(i))

In general, the regression parameters could change several times (that is, switching backward and forward) within M possible regimes. Typically, M=2.

Deterministic Switching

If the sample separation is known at a given break point, then dummy variable approach and classical Chow test may be used to study the structural switch in the model. For examples, the well-known break points in time series are the World War II (end in 1945), OPEC Oil Embargo (began in October 1973), Black Monday (stock market crashed on Monday, October 19, 1987).

Dummy Variable Approach

Suppose the entire sample can be divided into M non-overlapping sub-samples: N(i), i=1,2,...,M. åi=1,2,...,M N(i) = N. Let Dit be the dummy variable defined by:

Dit = 1, if t in N(i)
0 otherwise

Since Dit is observed, given the conditional probability density function f(Yt;b(i),s2(i)) of Yt, the log-likelihood function for model estimation is:

ll(q) = åt=1,2,...,Nln åi=1,2,...,M Ditf(Yt;b(i),s2(i))

where q = (b(i),s2(i), i=1,2,...,M)' is the parameter vector of the model.

A Simple Disequilibrium Econometric Model

An interesting application of deterministic switching regression is the disequilibrium econometric model. The simplest model of maket disequilibrium includes a demand equation, a supply equation, and a transcation equation. Both demand and supply are unobservable. The short-side transaction rule determines that the quantity traded (or consumed) Q is the smaller of demand D and supply S. This rule is also called the minimum condition: Q = min(D,S). For notational convenience, variables and parameters associated with demand and supply equations are subscripted with D and S, respectively. The common (shared) variables are not subscripted (e.g. P for price and Y for income, etc.). The model may be written as follows:

Demand Equation: D = fD(P,XD,bD)
Supply Equation: S = fS(P,XS,bS)
Transaction Equation:
(Minimum Condition)
Q = min(D,S)

The simplest form of disequilibrium model has been applied to estimate the macroeconomic structure of a centrally planned economy (CPE) during periods of chronic shortage. For a market or transitional economy, a price machanism or equation may be added to describe the adjustment of disequilibrium toward the clearence of market. The simplest description of the price tatonnement adjustment is:

DP = d(D-S)

where DP denotes the change of price level, and d > 0 is the adjustment coefficient.

The Model

D = fD(P,XD,bD) + eD
S = fS(P,XS,bS) + eS
Q = min(D,S)

Assume both eD and eS are normally independently distributed with zero means and variances sD2, sS2 respectively. We write

g1(D) = 1/Ö(2psD) exp[-½(D-fD(P,XD,bD))2/sD2]
g2(S) = 1/Ö(2psS) exp[-½(S-fS(P,XS,bS))2/sS2]

The joint probabililty density of D and S (assuming that D and S are independent) is

g(D,S) = g1(D)g2(S) = 1/Ö(2psDsS) exp[-½(D-fD(P,XD,bD))2/sD2] exp[-½(S-fS(P,XS,bS))2/sS2]

The two conditional probability density functions of the observable Q when Q=D<S and Q=S<D are:

h(Q|Q=D<S) = òQ¥g(Q,S)dS = g1(Q)òQ¥g2(S)dS
h(Q|Q=S<D) = òQ¥g(D,Q)dD = g2(Q)òQ¥g1(D)dD

Then the unconditional probability density function of the variable Q is:

h(Q) = h(Q|Q=D<S) + h(Q|Q=S<D) = g1(Q)òQ¥g2(S)dS + g2(Q)òQ¥g1(D)dD

Given a sample of N observations of Q, P, XD, and XS, to estimate the model parameters q = (bD,bS,sD,sS) the following log-likelihood function is maximized:

ll(q) = åt=1,...,Nln h(Qt)

Example 5

This example is taken from Portes, et. al. [1987] on estimating the aggregate consumer goods market of Poland for the pre-reform period from 1959 to 1980. We shall consider only the simpliest case of discrete switching. The model is presented as follows (Program, Data):

Demand Equation CD = a0 + a1Y-1 + a2(Y-Y-1) + a3(Y-C)-1
Supply Equation CS = b0 + b1C* + b2Z* + b3F
Transaction Function
(Minimum Condition)
C = min(CD,CS)

where CD = Desired consumption
C = Actual consumption
Y = Disposable income
CS = Supply of consumption goods
C* = Plan target for consumption
Z* = (C*/NMP*)(NMP-NMP*)
NMP = Net material product
NMP* = Target for net material product
F = Trend deviation of net financial assets of households

Stochastic Switching

If there is no a prior information can be based upon to make structure or regime classification, it is completely up to the state of nature to decide the regression structure. Let St be the discrete state variable which takes an index for the regime at t. That is, St = 1,2,...,M. Using the dummy variable notation:

Dit = 1, if St = i (the regime is i at time t)
0 otherwise

Dit is a random discrete variable taking value either 1 or 0, which is not directly observed. Let Dit* be the latent variable for Dit such that Dit* > 0 if Dit = 1, and Dit* £ 0 otherwise (Dit = 0). We write:

Dit* = a(i) + ut

Or, in a more general framework, Dit* = Zta(i) + ut where Zt is a set of exogenous or predertermined variables determining the regime structure.

When Dit* > 0, ut > - a(i). Assuming ut ~ normal(0,1), the probability of Dit = 1 is represented by the cumulative normal density evaluated from -¥ to a(i). Then the probability that St=i is expressed as:

Pr(St=i;a(i)) = Pr(Dit=1) = Pr(Dit*>0) = Pr(ut>-a(i))
= ò-¥ a(i) 1/Ö(2p) exp(-½ z2)dz

Of course, åi=1,2,...,M Pr(St=i;a(i)) = 1. Then the conditional joint probability density function of Yt and St=i is

f(Yt,St=i;b(i),s2(i),a(i))
= Pr(St=i;a(i)) f(Yt|St=i;b(i),s2(i))
= ò-¥ a(i)1/Ö(2p) exp(-½ z2)dz [1/Ö(2ps2(i)) exp(-½(Yt-Xtb(i))2/s2(i))]

Let q(i) = (b(i),s2(i),a(i))' and q =(q(i), i=1,2,...,M)'. Then, the unconditional probability density function of Yt is

f(Yt;q) = åi=1,2,...,M f(Yt,St=i;q(i))

The resulting log-likelihood function is:

ll(q) = åt=1,2,...,N ln f(Yt;q)
= åt=1,2,...,N ln åi=1,2,...,M f(Yt,St=i;q(i))

As a by-product of the maximum likelihood estimation of the parameter vector q, the estimated conditional probability for the observation t occured in regime i is computed as follows:

Pr(St=i|Yt;q) = f(Yt,St=i;q(i)) / f(Yt;q)
= f(Yt,St=i;q(i)) / åi=1,2,...,M f(Yt,St=i;q(i))

Mixture of Probability Distributions Revisited

Recall the previous study of INCOME variable drawn from the mixture of two normal probability distributions (Program, Data). That is, for each observation of INCOME variable Yt, it is the mixture of two normal distributions:

f(Yt|St=i;m(i),s2(i)) = 1/Ö(2ps2(i)) exp(-½(Yt-m(i))2/s2(i)) with probability Pr(St=i), i=1,2.

Let Pr(St=1)=p, then Pr(St=2) = 1-p. Here p>0 must be estimated togeter with the parameters m(i) and s2(i) for i=1,2. The log-likelihood function is:

ll(p,m(1),m(2),s2(1),s2(2)) = åt=1,2,...,N ln [p f(Yt|St=1;m(1),s2(1)) + (1-p) f(Yt|St=2;m(2),s2(2))]

Finally, the conditional probability of INCOME variable Yt drawn from regime 1 is

p f(Yt|St=1;m(1),s2(1)) / [p f(Yt|St=1;m(1),s2(1)) + (1-p) f(Yt|St=2;m(2),s2(2))]

Markov Switching

If the switching is state-dependent (as in most cases of dynamic models), we assume the simplest case of first-order Markov process for the state indicator St with the following transition probability:

Pr(St=i|St-1=j) = pij > 0
åi=1,2,...,M pij = 1

Then the joint probability of St=i and St-1=j is defined by

Pr(St=i,St-1=j) = Pr(St=i|St-1=j)Pr(St-1=j)
= pijPr(St-1=j)

For a two-state Markov switching process (M=2), we write:

p = p11 = Pr(St=1|St-1=1)     1-p = p21 = Pr(St=2|St-1=1)
q = p22 = Pr(St=2|St-1=2)     1-q = p12 = Pr(St=1|St-1=2)

Let Pr(St=1) = Pr(St*£0) and Pr(St=2) = Pr(St*>0), where St* is a latent variable for the discrete state indicator St defined by:

St* = a0 + a1 (St-1-1) + ut, ut ~ normal(0,1)

Therefore,

p = Pr(St=1|St-1=1) = Pr(St*£0|St-1=1) = Pr(ut£-a0)
= ò-¥ -a01/Ö(2p) exp(-½ z2)dz
q = Pr(St=2|St-1=2) = Pr(St*>0) = Pr(ut>-a0-a1)
= ò-a0-a1 ¥1/Ö(2p) exp(-½ z2)dz
= 1 - ò-¥ -a0-a11/Ö(2p) exp(-½ z2)dz

It clear that through probability transformation, 0 < p < 1 and 0 < q < 1. An alternative approach to ensure that 0 < p < 1 and 0 < q < 1 is to use logistic transformation as follows:

p = exp(p0)/(1+exp(p0))
q = exp(q0)/(1+exp(q0))

where p0 and q0 are unrestricted real numbers.

We are interested in a dynamic model of Yt resulting from a combination of M states amd L lags. Then there are r cases or events (r = ML+1) must be considered: (St,St-1,...,St-r+1). For simplicity, we consider a 2-state 1-lag dynamic model. There are four cases: (St=1,St-1=1), (St=2,St-1=1), (St=1,St-1=2), (St=2,St-1=2).

Let Ht-1 = {Yt-1,Yt-2,...} be the historical information up to time t-1. Suppose the conditional probability density function of Yt (conditional to St=i, St-1=j and Ht-1, ignoring the parameter vector q for the moment) is f(Yt|St=i,St-1=j,Ht-1). Then the conditional joint probability density function of Yt, St, and St-1 is:

f(Yt,St=i,St-1=j|Ht-1) = Pr(St=i,St-1=j|Ht-1) f(Yt|St=i,St-1=j,Ht-1)

and

f(Yt|Ht-1;q) = åi=1,2åj=1,2 f(Yt,St=i,St-1=j|Ht-1)

Finally, the conditional log-likelihood function is:

ll(q) = åt=1,2,...,N ln f(Yt|Ht-1;q)

Hamilton's Algorithm

To evaluate the log-likelihood function ll(q), first we need to compute:

(1)     Pr(St=i,St-1=j|Ht-1) = Pr(St=i|St-1=j) Pr(St-1=j|Ht-1)

Starting t=1, let Pr(S0=1|H0) = (1-p)/(2-p-q) and Pr(S0=2|H0) = (1-q)/(2-p-q). These are the steady-state or unconditional probability of St=1 and St=2, respectively.

From the computed Pr(St=i,St-1=j|Ht-1) and given f(Yt|St=i,St-1=j,Ht-1), compute the joint pdfs f(Yt,St=i,St-1=j|Ht-1) for all combination of i,j=1,2, and therefore the conditional likelihood at t is:

(2)     f(Yt|Ht-1) = åi=1,2åj=1,2 f(Yt,St=i,St-1=j|Ht-1)
= åi=1,2åj=1,2 Pr(St=i,St-1=j|Ht-1) f(Yt|St=i,St-1=j,Ht-1)

By definition, Pr(St=i,St-1=j|Ht) = Pr(St=i,St-1=j|Yt,Ht-1)
= f(Yt,St=i,St-1=j|Ht-1) / f(Yt|Ht-1)
= f(Yt|St=i,St-1=j,Ht-1)Pr(St=i,St-1=j|Ht-1) / f(Yt|Ht-1)
Pr(St=i|Ht)= åj=1,2 Pr(St=i,St-1=j|Ht)

To iterate from t=1 to N, use Pr(St=i|Ht) to compute Pr(St+1=k,St=i|Ht) as (1) and the conditional likelihood f(Yt+1|Ht) as (2). Finally, we sum all the log of component log-lieklihoods for maximization with repect to the parameter vector.

The above algorithm may be generalized straightforwardly to consider M cases (M>2) and L lags (L>1), in which the probabilities and pdfs are evaluated over a large number of ML+1 cases.

For parameter estimation, we apply standard numercial maximization of the log-likelihood function for the entire sample. The alternative method is the EM (Expectation-Maximization) Algorithm.

To make inference about the state variable St, we can use filtered probability: Pr(St=i|Ht), or smoothed probability: Pr(St=i|HN). The later refers to the probability of St=i conditional on all the information in the sample. The smoothed probability is obtained by updating the filtered probability using all information.

Example: Hamilton's Model of Business Fluctuations

Define the growth in real GDP (Data) as follows:

Yt = 100*(ln(GDPt)-ln(GDPt-1))

Then 2-state ("boom" and "burst", or expansion and recession) growth in real GDP is modeled as an AR(4) process:

Yt = m(i) + et, i=1,2
et = r1et-1 + r2et-2 + r3et-3 + r4et-4 + ut

where ut ~ normal(0,s2). By assuming the first-order Markov switching process, the transition probabilities between two consecutive states St and St-1 are Pr(St=1|St-1=1)=p and Pr(St=2|St-1=2)=q. By definition, Pr(St=1|St-1=2)=1-q and Pr(St=2|St-1=1)=1-p.

The model can be represented equivalently as:

(Yt-m(i)) = r1(Yt-1-m(j)) + r2(Yt-2-m(k)) + r3(Yt-3-m(l)) + r4(Yt-4-m(m)) + ut

with i,j,k,l,m = 1,2. In total, there are 32 (=24+1) cases of probabilities and pdfs forming the likelihood function for maximization with respect to the parameter vector. In addition to the model estimation, both filtered and sommothed probabilities are computed for inference about the state variable St: Pr(St|Ht) and Pr(St|HN) for each observation t. (See Program [draft version])

Example: Three-State Markov Switching Model of the Real Interest Rate

Garcia and Perron [1996] switching regression model of real interest rate is a 3-state AR(2) process defined as follows:

(Yt-m(i)) = r1(Yt-1-m(j)) + r2(Yt-2-m(k)) + ut
ut ~ normal(0,s2(i))

with i,j,k = 1,2,3, where Yt is the ex-post real interest rate calculated by substracting the CPI-based inflation rate from the nominal interest rate (three-month treasury bill rate). See Data.

Transition probability for the 3-state 1st-order Markov process is defined by pij = Pr(St=i|St-1=j) > 0, where åi=1,2,3 pij = 1 for j=1,2,3.

In total, there are 27(=32+1) cases of probabilities and density functions forming the likelihood function for maximization with respect to 14 parameters in the model:

m(1), m(2), m(3)
r1, r2
s2(1), s2(2), s2(3)
p11, p12, p13, p21, p22, p23

Homework

Formulate and estimate the log-likelihood function for Garcia-Perron Model of real interest rate.


Copyright © Kuan-Pin Lin
Last updated: June 10, 2002