X(l) = (Xl-1)/l
Although the range of l can cover the whole real line, -2 £ l £ 2 is the area of interest in many econometric applications. If l = 2, it is the quadratic transformation. If l = 0.5, it is a square-root. A linear model corresponds to l = 1, and the logarithmic transformation is the limiting case that l -> 0 (by L'Hôspital's rule, liml->0(Xl-1)/l = ln(X)).
The value of power l may not be the same for each of the variables in the model. In particular, the dependent variable and independent variables as a group may take different Box-Cox transformations. Let a = (b,q,l)' be the vector of unknown parameters for a regression model:
e = F(Y,X,a) = Y(q) - X(l)b
Or, Equivalently,
Y(q) = X(l)b + e
where e ~ normal(0,s2I). The log-likelihood function is
ll(a,s2|Y,X) = -½N [ln(2p)+ln(s2)] -½ (e'e/s2) + (q-1)Si=1,2,...,Nln(|Yi|)
Note that for each data observation i, the Jacobian term is derived as Ji(q) = |¶ei/¶Yi| = |Yiq-1|. By substituting out s2 = e'e/N, the concentrated log-likelihood function is
| ll*(a|Y,X) | = -½N [1+ln(2p)-ln(N)] - Si=1,2,...,Nln(|Yi|) -½N ln(e'e) + q Si=1,2,...,Nln(|Yi|) |
| = -½N [1+ln(2p)-ln(N)]
- Si=1,2,...,Nln(|Yi|)
-½N ln(e*'e*) where e* = e / [(|Y1||Y2|...|YN|)q/N] |
Given the values of Box-Cox tranformation parematers q and l, a wide range of model specifications are possible. Of course, q and l should be estimated simultaneously with b. The efficient estimator of a = (b,q,l)' is obtained by maximizing the above concentrated log-likelihood function. It is equivalent to minimizing the sum of squared weighted errors:
S*(b|Y,X) = e*'e*,
where e* = we, and w = 1/[(|Y1||Y2|...|YN|)q/N].
Based on the estimated parameter vector a = (b,q,l)', a Box-Cox model is typically interpreted in terms of the elasticity. That is,
¶ln(Y)/¶ln(X) = (X/Y)(¶y/¶X) = (Xl/Yq)b
M(q) = b0 + b1 R(l) + b2 Y(l) + e
As described in Greene's Example 10.5 and Table 10.3, M is the real stock of M2, R is the discount interest rate, and Y is the real GNP. Several variations of the Box-Cox transformation parameters may be estimated and tested for the most appropriate functional form of money demand equation:
Q(q) = b1 + b2K(l) + b3L(l) + e
Estimate and interpret the Box-Cox production function based on the estimate (variable) rate of returns to scale and elsticiticy of subsitituion, respectively.
Many classical functional forms are possible by setting the Box-Cox transformation parameters. For example, if q, l -> 0, then it is the Cobb-Douglas form. Formulate and test for the most appropriate functional forms for this study.
ll(b,a,s2|Y,X) = -½N (ln(2p)+ln(s2)) -½ ln(|W|) -½ (1/s2)(e'W-1e) + Si=1,2,...,N ln(|¶ei/¶Yi|)
The last term is the sum of log-Jacobians from ei to Yi over the entire sample. Since the variance term can be solved as s2 = e'W-1e / N for a given W, the concentrated log-likelihood function is
ll*(b,a,s2|Y,X) = -½N (1+ln(2p)-ln(N)) -½N ln(e'W-1e) -½ ln(|W|) + Si=1,2,...,N ln(|¶ei/¶Yi|)
It is clear that the maximum likelihood estimation is in general not equivalent to the nonlinear least squares unless W = I, the identity matrix, and ¶ei/¶Yi = 1 for each i=1,2,...,N. If W is known and the log-Jacobians vanish, it is the GLS (Generalized Least Squares) problem that minimizes e'W-1e.
Unfortunately, W = W(a) is not known and must be parametrized with a lower dimention a, which in turn is estimated together with the vector of model parameters b. The models with heteroscedastic and/or autocorrelated errors are the special cases of the general regression model in which W(a) is defined more specifically.
For simplicity, consider a regression model e = F(Y,X,b) = Y - f(X,b). Then for each data observation i, ¶ei/¶Yi = 1. We assume further that the heteroscedastic error ei ~ normal(0,si2). The log-likelihood function is
ll(b,si2|Yi,Xi) = -½ [ln(2p) + ln(si2) + ei2/si2]
Summing over a sample of N observations, the total log-likelihood function is written as
ll(b,s12,s22,...,sN2|Y,X) = -½N ln(2p) -½ Si=1,2,...,Nln(si2) -½ Si=1,2,...,N(ei2/si2)
Given the general form of heteroscedasticity, there are too many unknown parameters. For practical purpose, some hypotheses of heteroscedasticity must be assumed:
si2 = s2 hi(a)
where s2 > 0 and hi(a) is indexed by i to indicate that it is a function of Zi. That is hi(a) = h(a|Zi), where Z is a set of independent variables that may or may not be coincide with X. Depending on the form of heteroscedasticity hi(a), denoted by hi for brevity, the log-likelihood function is written as
ll(b,a,s2|Y,X) = -½N (ln(2p) + ln(s2)) -½ Si=1,2,...,Nln(hi) -½(1/s2)Si=1,2,...,N(ei2/hi)
Let ei* = ei / Öhi and substitute out the maximum likelihood estimator of s2 with e*'e*/N, then the concentrated log-likelihood function is
ll*(b,a|Y,X) = -½N (1+ln(2p)-ln(N)) -½ Si=1,2,...,Nln(hi) -½N ln(e*'e*)
Consider the following special cases of hi = h(a|Zi) = h(Zia):
The corresponding concentrated log-likelihood function for estimation is
ll*(b,a|Y,X) = -½N (1+ln(2p)-ln(N)) -½ Si=1,2,...,Nzia -½N ln(e*'e*)
where ei* = ei / exp(Zia)½ for each observation i=1,2,...,N.
si2 = s2 Zia
If a = 0, the model is homoscedastic; If a = 2, it is the case (ii).
Y = b0 + b1 X + b2 X2 + e
Find and compare the maximum likelihood estimates based on the following hypotheses of heteroscedasticity (Program and Data):
ln(Q) + q Q = b1 + b2ln(L) + b3ln(K) + e
where Q = VA/NFirm, L=Labor/NFirm, K=Capital/NFirm are per establishment data of value-added, labor, and capital, respectively. We assume that the heteroscedastic variances take the following multiplicative form:
si2 = s2
hi(a) and
hi(a) = exp(Zia)
where Zi = [ln(Li),ln(Ki)] and a = [a1,a2]'. Or, equivalently,
si2 = s2 Lia1 Kia2
Formulate and estimate the log-likelihood function with multiplicative heteroscedasticity. Does the incorporation of heteroscedastic variances improve the model estimates of the generalized Cobb-Douglas production function?
| Yi = | 1 with probability Pi |
| 0 with probability 1-Pi |
It is clear that Xi explains the probability of Yi to be 1 or 0. Let
Pi = Prob(Yi=1|Xi) = F(Xib)
1-Pi = Prob(Yi=0|Xi) = 1-F(Xib)
Since E(Yi|Xi) = (1)F(Xib) + (0)(1-F(Xib)) = F(Xib), the estimated model may be interpreted with the marginal effects defined by
¶E(Yi|Xi)/¶Xi = [¶F(Xib)/¶(Xib)] b
Given a sample of N independent observations, the likelihood function is
L(b) = Õi=1,2,...,N PiYi (1-Pi)1-Yi = Õi=1,2,...,N F(Xib)Yi (1-F(Xib))1-Yi
Then the log-likelihood function is
ll(b) = ln(L(b)) = åi=1,2,...,N (Yi lnF(Xib) + (1-Yi) ln(1-F(Xib)))
To maximize ll(b) with respect to b, we solve from the first order condition:
| ¶ll(b)/¶b | = åi=1,2,...,N (Yi/Fi-(1-Yi)/(1-Fi)) fiXi |
| = åi=1,2,...,N (Yi-Fi)/(Fi(1-Fi)) fiXi = 0 |
where Fi = F(Xib) and fi = f(Xib) = ¶Fi/¶(Xib). Note that fiXi = ¶Fi/¶b. Finally, the Hessian ¶ll2(b)/¶b¶b' must be negative definite, and the estimated variance-covariance matrix of b is Var(b) = [-E(¶ll2(b)/¶b¶b')]-1.
It is immediately that E(Yi|Xi) = Xib. In particular,
| E(ei) | = (1-Xib)Pi + (-Xib)(1-Pi) = Pi - Xib |
| Var(ei) | = E(ei2) = Pi(1-Xib)2 + (1-Pi)(-Xib)2 |
| = Pi(1-Pi)2 + (1-Pi)(-Pi)2 = (1-Pi)Pi = (1-Xib)(Xib) |
The range of Var(ei) is between 0 and 0.25 and it is clearly heteroscedastic. Furthermore, since E(Yi|Xi) = F(Xib) = Xib, a linear function, there is no guarantee that the estimated probability will lie within the unit interval.
Pi, the cumulative normal distribution, is called Probit for the i-th observation. The model Yi = F-1(Pi) + ei is called the Probit Model, where F-1(Pi) = Xib is the inverse of cumulative distribution F(Xib). The probit model can be derived from a model involving an unobserved, or latent, variable Yi* such that Yi* = Xib + ei where ei ~ normal(0,1). Suppose the value of the observed binary variable Yi depends on the sign of Yi*:
| Yi = | 1 if Yi* > 0 |
| 0 if Yi* £ 0 |
Therefore,
Pi = Prob(Yi=1|Xi) = Prob(Yi*>0|Xi)
= Prob(ei>-Xib)
= ò¥-Xib
1/(2p)½ exp(-z2/2) dz
= ò-¥Xib
1/(2p)½ exp(-z2/2) dz
For maximum likelihood estimation, we solve the following first order condition:
åi=1,2,...,N (Yi-Fi)/(Fi(1-Fi)) fiXi = 0
where Fi = F(Xib) =
ò-¥Xib
1/(2p)½ exp(-z2/2) dz, and
fi =
¶F(Xib)/¶(Xib) =
1/(2p)½
exp(-(Xib)2/2)
This is exactly the first order conditions for weighted least squares estimation of the nonlinear regression model: Yi = F(Xib) + ei with weights given by (F(Xib)(1-F(Xib))-½.
Furthermore, it can be shown that for the maximum likelihood estimates b
E([¶2ll(b)/¶b¶b']) = -åi=1,2,...,N(fi2XiXi')/(Fi(1-Fi))
which is negative definite. The estimated variance-covariance matrix of b is computed as
Var(b) = (-E[¶2ll(b)/¶b¶b'])-1
For model interpretation, the marginal effects of Xi is defined as
¶E(Yi|Xi)/¶Xi = [¶F(Xib)/¶(Xib)] b = f(Xib)b = fib
Pi as defined is the logistic curve. The model Yi = F-1(Pi) + ei is called the Logit Model. The logit model is most easily derived by assuming the logarithm of the odds is equal to Xib, or the odd ratio model: ln(Pi/(1-Pi)) = Xib Solving for Pi, we find that
Pi = exp(Xib)/(1+exp(Xib)) = 1/(1+exp(-Xib))
For maximum likelihood estimation, we solve the following first order condition:
åi=1,2,...,N (Yi-Fi)/(Fi(1-Fi)) fiXi = 0
Because of the logistic functional form,
Fi = F(Xib) =
1/(1+exp(-Xib)) and
fi = ¶F(Xib)/¶(Xib) =
exp(-Xib)/(1+exp(-Xib)) =
Fi(1-Fi)
it amounts to solve the following simple expression:
åi=1,2,...,N (Yi-Fi)Xi = 0
with the negative definite Hessian:
¶2ll(b)/¶b¶b' = - åi=1,2,...,NFi(1-Fi)Xi'Xi
Therefore, the estimate of variance-covariance matrix of b is
Var(b) = [-¶2ll(b)/¶b¶b']-1
For model interpretation, the marginal effects of Xi is defined as
¶E(Yi|Xi)/¶Xi = [¶F(Xib)/¶(Xib)] b = fib = Fi(1-Fi)b
The qualitative equation is formulated as follows:
GRADE = b0 + b1GPA + b2TUCE + b3PSI + e
Estimate and interpret the logit (Program) and probit (Program) probablity model specifications of the above equation, respectively. Explain the estimated marginal effects of new teaching method on students' grade performance.
GRADE = b0 + b1GPA + b2TUCE + b3PSI + e
where for each observation i, the heteroscedastic variance is defined by
si2 = exp(aPSIi)2
That is, when PSIi = 0, si2 = 1; when PSIi equals 1, si2 = exp(a)2, where a is the unknown parameter in addition to bs for model estimation.
Consider the M-state (regime or structure) linear regression equation:
Yt = Xtb(i) +
et(i)
et(i) ~
normal(0,s2(i))
where [Yt,Xt] is the observed sample data, t=1,2,...,N, and (i) is the state (regime) indicator, i=1,2,...,M. In particular, assuming normal distribution, the conditional probability density function of Yt occured in the regime i is written as:
f(Yt;b(i),s2(i)) = 1/Ö(2ps2(i)) exp(-½(Yt-Xtb(i))2/s2(i))
In general, the regression parameters could change several times (that is, switching backward and forward) within M possible regimes. Typically, M=2.
| Dit = | 1, if t in N(i) |
| 0 otherwise |
Since Dit is observed, given the conditional probability density function f(Yt;b(i),s2(i)) of Yt, the log-likelihood function for model estimation is:
ll(q) = åt=1,2,...,Nln åi=1,2,...,M Ditf(Yt;b(i),s2(i))
where q = (b(i),s2(i), i=1,2,...,M)' is the parameter vector of the model.
| Demand Equation: | D = fD(P,XD,bD) |
| Supply Equation: | S = fS(P,XS,bS) |
| Transaction Equation: (Minimum Condition) | Q = min(D,S) |
The simplest form of disequilibrium model has been applied to estimate the macroeconomic structure of a centrally planned economy (CPE) during periods of chronic shortage. For a market or transitional economy, a price machanism or equation may be added to describe the adjustment of disequilibrium toward the clearence of market. The simplest description of the price tatonnement adjustment is:
DP = d(D-S)
where DP denotes the change of price level, and d > 0 is the adjustment coefficient.
The Model
| D = fD(P,XD,bD) + eD |
| S = fS(P,XS,bS) + eS |
| Q = min(D,S) |
Assume both eD and eS are normally independently distributed with zero means and variances sD2, sS2 respectively. We write
g1(D) = 1/Ö(2psD)
exp[-½(D-fD(P,XD,bD))2/sD2]
g2(S) = 1/Ö(2psS)
exp[-½(S-fS(P,XS,bS))2/sS2]
The joint probabililty density of D and S (assuming that D and S are independent) is
g(D,S) = g1(D)g2(S) = 1/Ö(2psDsS) exp[-½(D-fD(P,XD,bD))2/sD2] exp[-½(S-fS(P,XS,bS))2/sS2]
The two conditional probability density functions of the observable Q when Q=D<S and Q=S<D are:
h(Q|Q=D<S) = òQ¥g(Q,S)dS
= g1(Q)òQ¥g2(S)dS
h(Q|Q=S<D) = òQ¥g(D,Q)dD
= g2(Q)òQ¥g1(D)dD
Then the unconditional probability density function of the variable Q is:
h(Q) = h(Q|Q=D<S) + h(Q|Q=S<D) = g1(Q)òQ¥g2(S)dS + g2(Q)òQ¥g1(D)dD
Given a sample of N observations of Q, P, XD, and XS, to estimate the model parameters q = (bD,bS,sD,sS) the following log-likelihood function is maximized:
ll(q) = åt=1,...,Nln h(Qt)
Example 5
This example is taken from Portes, et. al. [1987] on estimating the aggregate consumer goods market of Poland for the pre-reform period from 1959 to 1980. We shall consider only the simpliest case of discrete switching. The model is presented as follows (Program, Data):
| Demand Equation | CD = a0 + a1Y-1 + a2(Y-Y-1) + a3(Y-C)-1 |
| Supply Equation | CS = b0 + b1C* + b2Z* + b3F |
| Transaction Function (Minimum Condition) | C = min(CD,CS) |
| where | CD = Desired consumption |
| C = Actual consumption | |
| Y = Disposable income | |
| CS = Supply of consumption goods | |
| C* = Plan target for consumption | |
| Z* = (C*/NMP*)(NMP-NMP*) | |
| NMP = Net material product | |
| NMP* = Target for net material product | |
| F = Trend deviation of net financial assets of households |
| Dit = | 1, if St = i (the regime is i at time t) |
| 0 otherwise |
Dit is a random discrete variable taking value either 1 or 0, which is not directly observed. Let Dit* be the latent variable for Dit such that Dit* > 0 if Dit = 1, and Dit* £ 0 otherwise (Dit = 0). We write:
Dit* = a(i) + ut
Or, in a more general framework, Dit* = Zta(i) + ut where Zt is a set of exogenous or predertermined variables determining the regime structure.
When Dit* > 0, ut > - a(i). Assuming ut ~ normal(0,1), the probability of Dit = 1 is represented by the cumulative normal density evaluated from -¥ to a(i). Then the probability that St=i is expressed as:
| Pr(St=i;a(i)) | = Pr(Dit=1) = Pr(Dit*>0) = Pr(ut>-a(i)) |
| = ò-¥ a(i) 1/Ö(2p) exp(-½ z2)dz |
Of course, åi=1,2,...,M Pr(St=i;a(i)) = 1. Then the conditional joint probability density function of Yt and St=i is
f(Yt,St=i;b(i),s2(i),a(i))
= Pr(St=i;a(i))
f(Yt|St=i;b(i),s2(i))
= ò-¥
a(i)1/Ö(2p)
exp(-½ z2)dz
[1/Ö(2ps2(i))
exp(-½(Yt-Xtb(i))2/s2(i))]
Let q(i) = (b(i),s2(i),a(i))' and q =(q(i), i=1,2,...,M)'. Then, the unconditional probability density function of Yt is
f(Yt;q) = åi=1,2,...,M f(Yt,St=i;q(i))
The resulting log-likelihood function is:
| ll(q) | = åt=1,2,...,N ln f(Yt;q) |
| = åt=1,2,...,N ln åi=1,2,...,M f(Yt,St=i;q(i)) |
As a by-product of the maximum likelihood estimation of the parameter vector q, the estimated conditional probability for the observation t occured in regime i is computed as follows:
| Pr(St=i|Yt;q) | = f(Yt,St=i;q(i)) / f(Yt;q) |
| = f(Yt,St=i;q(i)) / åi=1,2,...,M f(Yt,St=i;q(i)) |
f(Yt|St=i;m(i),s2(i)) = 1/Ö(2ps2(i)) exp(-½(Yt-m(i))2/s2(i)) with probability Pr(St=i), i=1,2.
Let Pr(St=1)=p, then Pr(St=2) = 1-p. Here p>0 must be estimated togeter with the parameters m(i) and s2(i) for i=1,2. The log-likelihood function is:
ll(p,m(1),m(2),s2(1),s2(2)) = åt=1,2,...,N ln [p f(Yt|St=1;m(1),s2(1)) + (1-p) f(Yt|St=2;m(2),s2(2))]
Finally, the conditional probability of INCOME variable Yt drawn from regime 1 is
p f(Yt|St=1;m(1),s2(1)) / [p f(Yt|St=1;m(1),s2(1)) + (1-p) f(Yt|St=2;m(2),s2(2))]
Pr(St=i|St-1=j) = pij > 0
åi=1,2,...,M pij = 1
Then the joint probability of St=i and St-1=j is defined by
| Pr(St=i,St-1=j) | = Pr(St=i|St-1=j)Pr(St-1=j) |
| = pijPr(St-1=j) |
For a two-state Markov switching process (M=2), we write:
| p = p11 = Pr(St=1|St-1=1) | 1-p = p21 = Pr(St=2|St-1=1) |
| q = p22 = Pr(St=2|St-1=2) | 1-q = p12 = Pr(St=1|St-1=2) |
Let Pr(St=1) = Pr(St*£0) and Pr(St=2) = Pr(St*>0), where St* is a latent variable for the discrete state indicator St defined by:
St* = a0 + a1 (St-1-1) + ut, ut ~ normal(0,1)
Therefore,
| p = Pr(St=1|St-1=1) | = Pr(St*£0|St-1=1) = Pr(ut£-a0) |
| = ò-¥ -a01/Ö(2p) exp(-½ z2)dz | |
| q = Pr(St=2|St-1=2) | = Pr(St*>0) = Pr(ut>-a0-a1) |
| = ò-a0-a1 ¥1/Ö(2p) exp(-½ z2)dz | |
| = 1 - ò-¥ -a0-a11/Ö(2p) exp(-½ z2)dz |
It clear that through probability transformation, 0 < p < 1 and 0 < q < 1. An alternative approach to ensure that 0 < p < 1 and 0 < q < 1 is to use logistic transformation as follows:
p = exp(p0)/(1+exp(p0))
q = exp(q0)/(1+exp(q0))
where p0 and q0 are unrestricted real numbers.
We are interested in a dynamic model of Yt resulting from a combination of M states amd L lags. Then there are r cases or events (r = ML+1) must be considered: (St,St-1,...,St-r+1). For simplicity, we consider a 2-state 1-lag dynamic model. There are four cases: (St=1,St-1=1), (St=2,St-1=1), (St=1,St-1=2), (St=2,St-1=2).
Let Ht-1 = {Yt-1,Yt-2,...} be the historical information up to time t-1. Suppose the conditional probability density function of Yt (conditional to St=i, St-1=j and Ht-1, ignoring the parameter vector q for the moment) is f(Yt|St=i,St-1=j,Ht-1). Then the conditional joint probability density function of Yt, St, and St-1 is:
f(Yt,St=i,St-1=j|Ht-1) = Pr(St=i,St-1=j|Ht-1) f(Yt|St=i,St-1=j,Ht-1)
and
f(Yt|Ht-1;q) = åi=1,2åj=1,2 f(Yt,St=i,St-1=j|Ht-1)
Finally, the conditional log-likelihood function is:
ll(q) = åt=1,2,...,N ln f(Yt|Ht-1;q)
| (1) | Pr(St=i,St-1=j|Ht-1) = Pr(St=i|St-1=j) Pr(St-1=j|Ht-1) |
Starting t=1, let Pr(S0=1|H0) = (1-p)/(2-p-q) and Pr(S0=2|H0) = (1-q)/(2-p-q). These are the steady-state or unconditional probability of St=1 and St=2, respectively.
From the computed Pr(St=i,St-1=j|Ht-1) and given f(Yt|St=i,St-1=j,Ht-1), compute the joint pdfs f(Yt,St=i,St-1=j|Ht-1) for all combination of i,j=1,2, and therefore the conditional likelihood at t is:
| (2) | f(Yt|Ht-1) | = åi=1,2åj=1,2 f(Yt,St=i,St-1=j|Ht-1) |
| = åi=1,2åj=1,2 Pr(St=i,St-1=j|Ht-1) f(Yt|St=i,St-1=j,Ht-1) |
| By definition, Pr(St=i,St-1=j|Ht) | = Pr(St=i,St-1=j|Yt,Ht-1) |
| = f(Yt,St=i,St-1=j|Ht-1) / f(Yt|Ht-1) | |
| = f(Yt|St=i,St-1=j,Ht-1)Pr(St=i,St-1=j|Ht-1) / f(Yt|Ht-1) | |
| Pr(St=i|Ht) | = åj=1,2 Pr(St=i,St-1=j|Ht) |
To iterate from t=1 to N, use Pr(St=i|Ht) to compute Pr(St+1=k,St=i|Ht) as (1) and the conditional likelihood f(Yt+1|Ht) as (2). Finally, we sum all the log of component log-lieklihoods for maximization with repect to the parameter vector.
The above algorithm may be generalized straightforwardly to consider M cases (M>2) and L lags (L>1), in which the probabilities and pdfs are evaluated over a large number of ML+1 cases.
For parameter estimation, we apply standard numercial maximization of the log-likelihood function for the entire sample. The alternative method is the EM (Expectation-Maximization) Algorithm.
To make inference about the state variable St, we can use filtered probability: Pr(St=i|Ht), or smoothed probability: Pr(St=i|HN). The later refers to the probability of St=i conditional on all the information in the sample. The smoothed probability is obtained by updating the filtered probability using all information.
Yt = 100*(ln(GDPt)-ln(GDPt-1))
Then 2-state ("boom" and "burst", or expansion and recession) growth in real GDP is modeled as an AR(4) process:
Yt = m(i) +
et, i=1,2
et =
r1et-1 +
r2et-2 +
r3et-3 +
r4et-4 + ut
where ut ~ normal(0,s2). By assuming the first-order Markov switching process, the transition probabilities between two consecutive states St and St-1 are Pr(St=1|St-1=1)=p and Pr(St=2|St-1=2)=q. By definition, Pr(St=1|St-1=2)=1-q and Pr(St=2|St-1=1)=1-p.
The model can be represented equivalently as:
(Yt-m(i)) = r1(Yt-1-m(j)) + r2(Yt-2-m(k)) + r3(Yt-3-m(l)) + r4(Yt-4-m(m)) + ut
with i,j,k,l,m = 1,2. In total, there are 32 (=24+1) cases of probabilities and pdfs forming the likelihood function for maximization with respect to the parameter vector. In addition to the model estimation, both filtered and sommothed probabilities are computed for inference about the state variable St: Pr(St|Ht) and Pr(St|HN) for each observation t. (See Program [draft version])
(Yt-m(i)) =
r1(Yt-1-m(j)) +
r2(Yt-2-m(k)) + ut
ut ~ normal(0,s2(i))
with i,j,k = 1,2,3, where Yt is the ex-post real interest rate calculated by substracting the CPI-based inflation rate from the nominal interest rate (three-month treasury bill rate). See Data.
Transition probability for the 3-state 1st-order Markov process is defined by pij = Pr(St=i|St-1=j) > 0, where åi=1,2,3 pij = 1 for j=1,2,3.
In total, there are 27(=32+1) cases of probabilities and density functions forming the likelihood function for maximization with respect to 14 parameters in the model:
m(1), m(2), m(3)
r1, r2
s2(1), s2(2), s2(3)
p11, p12, p13, p21, p22, p23