or
r(B)Yt = d + q(B)et,
where
r(B) = 1 - r1B -
r2B2 - ... -
rpBp,
q(B) = 1 - q1B -
q2B2 - ... -
qqBq,
and
et ~ ii(0,s2), t = 1,2,...,N
The general model ARMA(p,q) may be written as a linear stochastic process:
Yt = m + et + y1et-1 + y2et-2 + ... (y0 = 1)
Stationarity requirement for the process:
| Mean | m = E(Yt) < ¥ |
| Variance | g0 = s2åi=0,...,¥yi2 < ¥ |
| Autocovariance | gj = s2åi=0,...,¥yiyj+i < ¥ |
| Autocorrelation | fj = gj / g0 |
Bartlett Test
Testing H0: fj = 0 for each j>q based on the Bartlett distribution of the estimated fj. That is,
the estimated fj ~ normal(0,ÖVar(fj)) approximately, where
Var(fj) = 1/N
(1 + 2 åj=1,...,qfj2).
Box-Pierece Test and Ljung-Box Test
Testing H0: f1 =... = fk =0 based on Box-Pierece Q or Ljung-Box Q' Statistic defined as follows:
Q = N åj=1,...,kfj2
Q' = N(N+2) åj=1,...,kfj2/(N-j)
Both Q and Q' ~ Chi-Square(k).
Let fjj be the partial autocorrelation coefficient at the j-th lag. Testing H0: fjj = 0 for each j>p based on the approximated distribution for the estimated fjj ~ normal(0,1/ÖN).
In order to use all N data observations, initialization may be needed for the following:
The model may be written in the "inverted" form as
q(B)-1(-d+r(B)Yt) = et
where et ~ ii(0,s2). Conditional to the historical information (YN, ..., Y1), and data initialization (Y0, ..., Y-p+1), (e0, ..., e-q+1), the sum-of-squares is defined by
S = åt=1,2,...,Net2
Assuming et ~ nii(0,s2) for each observation i, the concentrated log-likelihood function is
ll = -N/2 (1+ln(2p)-ln(N)+ln(S))
The conditional maximum likelihood estimators of rs, qs, and d are obtained by maximizing the function ll.
Further identification on the estimated residuals with the degrees of freedom N - (p+q+1).
Yt = g0 + g1 Yt-1 + g2 Yt-2 + ut
The alternative is to express the model (mean-deviation) residuals in terms of AR(2) structure:
Yt = m + et
et =
r1 et-1 +
r2 et-2 + ut
where ut ~ nii(0,s2).
We investigate the data series Yt for autocorrelations and partial autocorrelations. For univariate analysis, the ACF and PACF of the time series is the same as those of residuals obtained from the mean-deviation regression. Up to the maximum number of lags of ACF and PACF specified, Box-Pierece and Ljung-Box test statistics will be useful for identifying the proper order of AR(p), MA(q), or ARMA(p,q) process. In addition, Breusch-Pagan test may be used to verify the higher order of autocorrelation (Program).
or
r(B)Yt = Xtb + q(B)et,
where Xt may include distributed lags as well, and et ~ nii(0,s2).
Model analysis including model identification, estimation, and forecasting is the same as univariate ARMA analysis. Regression parameters b and ARMA parameters rs and qs must be simultaneously estimated through iterations of nonlinear functional (sum-of-squares or log-likelihood) optimization. For statistical reference, the degrees of freedom may be adjusted.
or
Yt = Xtb + r(B)-1q(B)ut
where ut ~ nii(0,s2).
We assume |r| < 1 for model stability. It is clear that
s2 = Var(ut) = (1-r2) Var(et).
Denote the variable transformations Yt* = Yt - r Yt-1 and Xt* = Xt - r Xt-1. Since u1 = (1-r2)½ e1, the otherwise lost first observation is kept with the transformations Y1* = (1-r2)½Y1 and X1* = (1-r2)½X1.
Thus model for estimation is
ut = Yt* - Xt*b
with the following Jacobian transformation from ut to Yt (depending on r only):
| Jt(r) = |¶ut / ¶Yt| = | (1-r2)½| for t=1 | |
| 1 | for t>1 |
Therefore, the (exact) concentrated log-likelihood function is:
ll*(b,r|Y,X) = -½N (1+ln(2p)-ln(N)) +½ ln(1-r2) -½N ln(u'u)
Extension: AR(2)
The model is defined as et = r1et-1 + r2et-2 + ut with the following proper data transformation (Z is referenced as either X or Y below):
Again, we assume |q| < 1 for model stability. The model is
ut = Yt - Xtb - qut-1
Notice that the one-period lag of error terms, ut-1, is used to define the model error ut. A recursive calculation is needed with proper initialization of u0. For example, set the initial value u0 = E(ut) = 0 (or alternatively the sample mean of ut), then u1 = Y1-X1b and ut = Yt-Xtb + ut-1 for t=2,...,N.
Since each log-jacobian terms vanish in this case, the (conditional) concentrated log-likelihood function is simply
ll*(b,q|Y,X) = -½N (1+ln(2p)-ln(N)) -½N ln(u'u)
This is the mixed process of AR(1) and MA(1). Using the variable transformations as of AR(1) and data initialization as of MA(1), the model is written as
ut = Yt* - Xt*b - q ut-1
and the (conditional) concentrated log-likelihood function for parameter estimation is
ll*(b,r,q|Y,X) = -½N (1+ln(2p)-ln(N)) +½ ln(1-r2) -½N ln(u'u)
This example demonstrates the nonlinear maximum likelihood estimation for three basic autocorrelated regression models: AR(1), MA(1), and ARMA(1,1). Based on the U. S. investment data from Greene's Table 13.1, formulate and estimate the three models of autocorrelation for a linear real investment relationship with real GNP and real interest rate (Program and Data):
Invest = b0 + b1 Rate + b2 GNP + e
Consider the time series regression model:
Yt = Xtb + et
At time t, conditional to the available historical information Ht, we assume that the error structure follows a normal distribution:
et|Ht ~ n(0,s2t)
| where s2t | = a0 + d1s2t-1 + ... + dps2t-p + a1e2t-1 + ... + aqe2t-q |
| = a0 + Si=1,2,...pdis2t-i + Sj=1,2,...qaje2t-j |
Let ut = e2t-s2t, ai = 0 for i > q, dj = 0 for j > p, and m = max(p,q), the above GARCH(p,q) process may be conveniently re-written as an ARMA(m,p) model for e2t. That is,
e2t = a0 + Si=1,2,...m (ai+di)e2t-i - Sj=1,2,...pdjut-j + ut
This is the general specification of autoregressive conditional heteroscedasticity, or GARCH(p,q), according to Bollerslev [1986]. If p = 0, then it is the GARCH(0,q) or simply ARCH(q) process:
s2t = a0 + Sj=1,2,...qaje2t-j
ARCH(1) Process
The simplest case is q = 1, or ARCH(1), originated in Engle [1982] as follows:
s2t = a0 + a1e2t-1
ARCH(1) model can be summarized as follows:
Yt = Xtb + et
et = ut(a0 + a1e2t-1)½
where ut ~ nii(0,1)
Then, the conditional means E(et|et-1) = 0 and the conditional variances s2t = E(e2t|et-1) = a0 + a1e2t-1
Note that the unconditional variance of et is
E(e2t) = E(E(e2t|et-1)) = a0 + a1E(e2t-1).
If s2 = E(e2t) = E(e2t-1), then s2 = a0/(1-a1) provided that |a1| < 1. Therefore, the model may be free of general heteroscedasticity although the conditional heteroscedasticity is assumed.
The ARCH(1) process can be generalized (therefore the name Generalized AutoRegressive Conditional Heteroscedasticity) to:
GARCH(1,1) Process
s2t = a0 + a1 e2t-1 + d1 s2t-1
This resembles the mixed autoregressive moving-average process ARMA(1,1) as described in autocorrelation. Presample variances and squared error terms can be initialized with St=1,2,...,N e2t/N. The following parameter restrictions are necessary to preserve stationaity of the error process:
Another extension is ARCH or GARCH in mean (ARCH-M or GARCH-M model) which adds the heteroscedastic variance term directly into the regression equation (assuming linear model):
ARCH-M(1) or GARCH-M(1,1) Model
et = Yt - Xtb - gs2t
s2t = a0 + a1 e2t-1 (or s2t = a0 + a1 e2t-1 + d1 s2t-1)
The last variance term of the regression may be expressed in log form or in standard error st. For example, Yt = Xtb + gln(s2t) + et. Moreover, constraints on the parameters in the conditional variance equation may be required to ensure the positivity of variances: a0 > 0, 0 £ a1 < 1 (or a1 + d1 < 1, d1 ³ 0).
ll = -½N ln(2p) - ½ åt=1,2,...,Nln(s2t) - ½ åt=1,2,...,N(e2t / s2t)
with the general conditional heteroscedastic variance GARCH(p,q) process:
s2t = a0 + a1e2t-1 + a2e2t-2 + ... + aqe2t-q + d1s2t-1 + d2s2t-2 + ... + dps2t-p
The parameter vector (a, d) is estimated together with the regression parameters (e.g., et = Yt - Xtb) by maximizing the log-likelihood, conditional to the starting values e02, e2-1, ..., e2-q, s20, s2-1, ..., s2-p and satisfying the nonnegativity requirement for the estimated variances: s2t > 0, t=1,2,...,N.
We note that the presample series: e02, e2-1, ..., e2-q, s20, s2-1, ..., s2-p may be initialized by the estimated (homoschedastic) unconditional variance:
1 - (åi=1,2,...,qai + åj=1,2,...,pdj)
or by the estimated sample variance of residuals:
åt=1,2,...,Ne2t/N,
The model of interest is
Yt = 100 [ln(Pt - ln(Pt-1)] = m + et
where Pt is the bilateral spot Deutschemark-British pound exchange rate. Thus Yt is the daily percentage nominal returns of BM/BP exchange. Test, identify, and estimate the appropriate GARCH(p,q) variance structure (Program).
dPt = 100 [ln(Pt) - ln(Pt-1)]
is believed to be effected by the previous excess monetary growth (faster than the growth of real output) and by the external shocks. Excess monetary growth is defined as dM - dY, where
dMt = 100 [ln(M1t) - ln(M1t-1)]
dYt = 100 [ln(GNPt) - ln(GNPt-1)]
The basic model is represented by the following:
dPt = b0 + b1(dMt-dYt) + et
In addition, the lagged values of the inflation rate (or the disturbance) will carry the effects of external shocks to the economy.
The data file USINF.TXT consistes of 136 quarterly observations (from 1950 Q1 to 1984 Q4) of data series for price (implicit deflator for GNP) Pt, money stock M1t, and output (GNP) Yt. Identify and estimate the best model of U. S. inflation rate in which serial correlations may exist in the means or in the variances or in both (see Greene [1999], Example 18.11) (Program).
Yt = Xtb + et,
et = stut
st2 = a0 +
a1et-12 +
d1st-12
Student t-Distribution (Bollerslev, 1987)
ut ~ t(d), d > 2 is the degree of freedom of the underlying Student t distribution. The p.d.f of Student t distribution (normalized with zero mean and unit variance) is written as:
| f(ut) = |
|
| 1 + |
|
Therefore,
| f(et) = f(Yt) = |
|
| 1 + |
|
The component log-likelihood function for each observation is:
| llt = | ln(G((d+1)/2)) - ln(G(d/2)) - ½ln(p) - ½ln(d-2) |
| - ½((d+1)/2)ln[1+(Yt-Xtb)2/((d-2)st)2] - ½ln(st2) |
Generalized Exponential Distribution (GED) (Nelson, 1991)
ut ~ GED(v) with zero mean and unit variance, v is the thickness of tails for the underlying GED. If v > 2 the distribution has thinner tails than normal. If v < 2 the distribution has thicker tails than normal.
The p.d.f of GED(v) is written as:
| f(ut) = |
|
|
| where l = |
|
Therefore,
| f(et) = f(Yt) = |
|
|
The component log-likelihood function for each observation is:
| llt = | ln(v/l) - (1+1/v)ln(2) - ln(G(1/v)) |
| - ½|(Yt-Xtb)/(lst)|v - ½ln(st2) |
Yt = Xtb + et
et = stut
GJR Specification (Glosten-Jagannathan-Runkle, 1993)
st2 = a0 + a1et-12 + d1st-12 + g1(et-12Dt-1)
| where Dt-1 = | 1 if et-1 > 0 |
| 0 otherwise |
The parameter g1 < 0 is sometimes referred as the Leverage Effect. The non-negativity of st2 is satisfied provided that a0 > 0, d1 ³ 0 a1+g1 ³ 0.
The asymmetric consquences of positive and negative innovations in the GARCH models can be studied based on various distributional assumptions (e.g., normal, t, GED) as described above.
EGARCH Specification (Nelson, 1991)
ln(st2) = a0 + d1ln(st-12) + a1[g1ut-1 + |ut-1| - E(ut-1)]
where ut = et/st is independently distributed with zero mean and unit variance. The parameter of ut-1, or a1g1 < 0, is interpreted as the Leverage Effect. The advantage of the Nelson's specification of the variance equation is that log of st2 is used, then the estimated st2 is positive no matter what is the sign of the estimated parameters.
Nelson's EGARCH(1,1) model assumes ut ~ GED(v) in which E(ut) = 0 and Var(ut) = 1. Furthermore,
| E(|ut|) = |
| -> (2/p)1/2 as v -> 2 (normal distribution) |
| We note that l = |
|
Yt = Htbt + at + ut
where Ht is an nxk matrix and at is an nx1 vector, which may be either data on exogenous variables or constant paramters. That is, given the exogenous or predetermined observed variables Xt, we may define Ht = H(Xt) and at = a(Xt).
We assume ut ~ nii(0nx1,Rnxn). Note that the covariance matrix R may also depend on Xt.
bt = ct + Ftbt-1 + vt
where Ft is an kxk matrix and ct is an kx1 vector.
We assume vt ~ nii(0kx1,Qkxk) and Cov(ut,vs) = E(utvs') = 0nxk. Note that ct = c(Xt), Ft = F(Xt), and the covariance matrix Q may depend on Xt.
Conditional to the information available at time t-1, the expected value of bt is Et-1(bt) = ct + FtEt-1(bt-1). Similarly, the conditional covariance is Vart-1(bt) = FtVart-1(bt-1)Ft' + Q. For notational convenience, let bt|t-1 = Et-1(bt) and Wt|t-1 = Vart-1(bt). Then,
bt|t-1 = ct + Ftbt-1|t-1
Wt|t-1 = FtWt-1|t-1Ft' + Q
Combining the measurement and transition equations, we have
Yt = (HtFt)bt-1 + (Htct+at) + (Htvt+ut)
Given the information at time t-1, the conditional expectation and covariance of Yt are:
Yt|t-1 = Et-1(Yt) =
Htbt|t-1 + at
St|t-1 = Vart-1(Yt) =
HtWt|t-1Ht' + R
Since Yt is distributed according to normal(Yt|t-1,St|t-1), the log-likelihood is evaluated as:
llt = - ½ ln(2pSt|t-1) - ½ (Yt-Yt|t-1)'St|t-1-1(Yt-Yt|t-1)
bt|t-1 = ct + Ftbt-1|t-1
Wt|t-1 = FtWt-1|t-1Ft' + Q
Define the prediction error et|t-1 = Yt - Yt|t-1. Then
et|t-1 = Yt -
Htbt|t-1 - at
St|t-1 =
HtWt|t-1Ht' + R
Then the log-likelihood is defined by
llt = - ½ ln(2pSt|t-1) - ½ et|t-1'St|t-1-1et|t-1
bt|t = bt|t-1 +
Ktet|t-1
Wt|t = Wt|t-1 -
KtHtWt|t-1
where Kt = Wt|t-1Ht'St|t-1-1 is the Kalman gain.
b0|0 = (I-F)-1c
vec(W0|0) = (I-FÄF)-1vec(Q)
If bt is nonstationary, then we can use a wild guess of b0|0 (e.g. zeros vector) with large diagonal elements in the covariance matrix W0|0. In this case, the evaluation of log-likelihood and inference should not include the first few observations of the guess values.
As a by product of maximum likelihood estimation, we obtain the estimated (updated) parameter vector and the corresponding covariance matrix at time t: bt|t and Wt|t, for t=1,...,T. For a better inference, the smoothed parameter vector and the corresponding covariance matrix based on all information in the sample are:
bt|T = bt|t +
K*t+1(bt+1|T-ct+1-Ft+1bt|t)
Wt|T = Wt|t +
K*t+1(Wt+1|T-Wt+1|t)K*t+1'
where K*t+1 = Wt|tFt+1'Wt+1|t-1. The smoothing is performed from t=T-1 down to t=1 with the initial values bT|T and WT|T obtained from the last iteration of the basic filter.
Yt = d +
r1Yt-1 + ... +
rpYt-p + et
et ~ nii(0,s2)
| Yt = [1 0 ... 0] |
|
where a = 0, ut = 0, and R = 0
| = |
|
| + |
| + |
|
| where Q = |
|
Yt = m + et -
q1et-1 - ... -
qqet-q
et ~ nii(0,s2)
| Yt = [1 -q1 ... -qq] |
| + m |
where ut = 0, and R = 0
| = |
|
| + |
| + |
|
| where Q = |
|
Yt = Xtbt + et
et ~ nii(0,s2)
where Ht = Xt, a = 0, ut = et, R = s2.
where F, c and Q may be defined according to a model specification.
The State-Space Model Representation
DMt = b0t + b1tDRt-1 + b2tDPt-1 + b3tSURPt-1 + b4tDMt-1 + ut
ut ~ nii(0,s2)
bit = bit-1 + vit
vit ~ nii(0,si2) i = 0,1,...,4.
Data Description (Data)
DM = Quarterly M1 growth rate
DR = Change in 3-month T-bill interest rate
DP = Inflation rate as measured by the CPI
SURP = Detrended full employment budget surplus
Fixed Parameters
s2, s02, s12, s22, s32, s42.
Time-Varying Parameters
b0t, b1t, b2t, b3t, b4t.
(Program)