Time Series Analysis III

Advanced Topics

Table of Contents

Fractional Difference of Time Series

ARMA Analysis for Regression Residuals

Auto-Regressive Conditional Heteroscedasticity

Multi-Equation Time Series Models

State-Space Models

Readings


Fractional Difference of Time Series

For a nonstationary time series, a properly integrated (differenced) series is required for ARMA analysis. By viewing the plot of autocorrelation and partial autocorrelation functions, it may be evident that they decay hyperbolically instead of the exponential damping. To improve the model performance, fractional difference is useful for a time series exhibited long memory process. Consider Yt is obtained by partially differencing the time series Xt:

Yt = (1-B)dXt, -1 < d < 1

where B is the backshift operator, and
(1-B)d = åj=0,1,...,¥
ædö
èjø
(-B)j

We write Xt = (1-B)-dYt. If d > 0, it is indicative of long memory, and d < 0 a short memory process. When d = 0, the process is memoryless. It can be shown that Xt is stationary if -1/2 < d < 1/2. For d ³ 1/2, Xt is nonstationary!

Example 1

Based on the U. S. investment data from Greene's Table 13.1, estimate and test the fractional difference in the error term for a linear real investment relationship with real GNP and real interest rate:

Investt = b0 + b1 Ratet + b2 GNPt + (1-B)det

where B is the backshift operator and -1 < d < 1 is the fractional difference (Program and Data).


ARMA Analysis for Regression Residuals

Yt = Xtb + et
et = r1et-1 + r2et-2 + ... + rpet-p - q1ut-1 - q2ut-2 - ... - qqut-q + ut

or

Yt = Xtb + r(B)-1q(B)ut

where ut ~ nii(0,s2).

AR(1) Process

et = r et-1 + ut

We assume |r| < 1 for model stability. It is clear that

s2 = Var(ut) = (1-r2) Var(et).

Denote the variable transformations Yt* = Yt - r Yt-1 and Xt* = Xt - r Xt-1. Since u1 = (1-r2)½ e1, the otherwise lost first observation is kept with the transformations Y1* = (1-r2)½Y1 and X1* = (1-r2)½X1.

Thus model for estimation is

ut = Yt* - Xt*b

with the following Jacobian transformation from ut to Yt (depending on r only):

Jt(r) = |ut / Yt| = (1-r2)½ for t=1
1 for t>1

Therefore, the (exact) concentrated log-likelihood function is:

ll*(b,r|Y,X) = -½N (1+ln(2p)-ln(N)) +½ ln(1-r2) -½N ln(u'u)

Extension: AR(2)

The model is defined as et = r1et-1 + r2et-2 + ut with the following proper data transformation (Z is referenced as either X or Y below):

Pre-Sample Data Initialization

The alternative pre-sample data initialization may be used to transform the time series:

The resulting maximum likelihood estimation is conditional to the pre-sample data initialization.

MA(1) Process

et = ut - qut-1

Again, we assume |q| < 1 for model stability. The model is

ut = Yt - Xtb + qut-1

Notice that the one-period lag of error terms, ut-1, is used to define the model error ut. A recursive calculation is needed with proper initialization of u0. For example, set the initial value u0 = E(ut) = 0 (or alternatively the sample mean of ut), then u1 = Y1-X1b and ut = Yt-Xtb + qut-1 for t=2,...,N.

Since each log-jacobian term vanishes in this case, the (conditional) concentrated log-likelihood function is simply

ll*(b,q|Y,X) = -½N (1+ln(2p)-ln(N)) -½N ln(u'u)

ARMA(1,1) Process

et = r et-1 + ut - q ut-1

This is the mixed process of AR(1) and MA(1). Using the variable transformations as of AR(1) and data initialization as of MA(1), the model is written as

ut = Yt* - Xt*b + q ut-1

and the (conditional) concentrated log-likelihood function for parameter estimation is

ll*(b,r,q|Y,X) = -½N (1+ln(2p)-ln(N)) +½ ln(1-r2) -½N ln(u'u)

Example 2

This example demonstrates the nonlinear maximum likelihood estimation for three basic autocorrelated regression models: AR(1), MA(1), and ARMA(1,1). Based on the U. S. investment data from Greene's Table 13.1, formulate and estimate the three models of autocorrelation for a linear real investment relationship with real GNP and real interest rate (Program and Data):

Invest = b0 + b1 Rate + b2 GNP + e


Auto-Regressive Conditional Heteroscedasticity

In many financial and monetary economic applications, serial correlations over time are characterized not only in the means but also in the variances. The latter is the so-called Auto-Regressive Conditional Heteroscedasticy or ARCH models. It is possible that the variance is unconditionally homogenous.

The Model

Consider the time series regression model:

Yt = Xtb + et

At time t, conditional to the available historical information Ht, we assume that the error structure follows a normal distribution:

et|Ht ~ n(0,s2t)

where s2t = a0 + d1s2t-1 + ... + dps2t-p + a1e2t-1 + ... + aqe2t-q
= a0 + Si=1,2,...pdis2t-i + Sj=1,2,...qaje2t-j

Let ut = e2t-s2t, ai = 0 for i > q, dj = 0 for j > p, and m = max(p,q), the above GARCH(p,q) process may be conveniently re-written as an ARMA(m,p) model for e2t. That is,

e2t = a0 + Si=1,2,...m (ai+di)e2t-i - Sj=1,2,...pdjut-j + ut

This is the general specification of auto-regressive conditional heteroscedasticity, or GARCH(p,q), according to Bollerslev [1986]. If p = 0, then it is the GARCH(0,q) or simply ARCH(q) process:

s2t = a0 + Sj=1,2,...qaje2t-j

ARCH(1) Process

The simplest case is q = 1, or ARCH(1), originated in Engle [1982] as follows:

s2t = a0 + a1e2t-1

ARCH(1) model can be summarized as follows:

Yt = Xtb + et
et = ut(a0 + a1e2t-1)½   where ut ~ nii(0,1)

Then, the conditional means E(et|et-1) = 0 and the conditional variances s2t = E(e2t|et-1) = a0 + a1e2t-1

Note that the unconditional variance of et is

E(e2t) = E(E(e2t|et-1)) = a0 + a1E(e2t-1).

If s2 = E(e2t) = E(e2t-1), then s2 = a0/(1-a1) provided that |a1| < 1. Therefore, the model may be free of general heteroscedasticity although the conditional heteroscedasticity is assumed.

The ARCH(1) process can be generalized (therefore the name Generalized Auto-Regressive Conditional Heteroscedasticity) to:

GARCH(1,1) Process

s2t = a0 + a1 e2t-1 + d1 s2t-1

This resembles the mixed auto-regressive moving-average process ARMA(1,1) as described in autocorrelation. Presample variances and squared error terms can be initialized with St=1,2,...,N e2t/N. The following parameter restrictions are necessary to preserve stationarity of the error process:

Another extension is ARCH or GARCH in mean (ARCH-M or GARCH-M model) which adds the heteroscedastic variance term directly into the regression equation (assuming linear model):

ARCH-M(1) or GARCH-M(1,1) Model

et = Yt - Xtb - gs2t

s2t = a0 + a1 e2t-1 (or s2t = a0 + a1 e2t-1 + d1 s2t-1)

The last variance term of the regression may be expressed in log form or in standard error st. For example, Yt = Xtb + gln(s2t) + et. Moreover, constraints on the parameters in the conditional variance equation may be required to ensure the positivity of variances: a0 > 0, 0 £ a1 < 1 (or a1 + d1 < 1, d1 ³ 0).

Model Identification for ARCH and GARCH Processes

Model Estimation

Recall the normal log-likelihood of a heteroscedastic regression model:

ll = -½N ln(2p) - ½ åt=1,2,...,Nln(s2t) - ½ åt=1,2,...,N(e2t / s2t)

with the general conditional heteroscedastic variance GARCH(p,q) process:

s2t = a0 + a1e2t-1 + a2e2t-2 + ... + aqe2t-q + d1s2t-1 + d2s2t-2 + ... + dps2t-p

The parameter vector (a, d) is estimated together with the regression parameters (e.g., et = Yt - Xtb) by maximizing the log-likelihood, conditional to the starting values e02, e2-1, ..., e2-q, s20, s2-1, ..., s2-p and satisfying the nonnegativity requirement for the estimated variances: s2t > 0, t=1,2,...,N.

We note that the presample series: e02, e2-1, ..., e2-q, s20, s2-1, ..., s2-p may be initialized by the estimated (homoschedastic) unconditional variance:

1 / [1 - (åi=1,2,...,qai + åj=1,2,...,pdj)]

or by the estimated sample variance of residuals:

åt=1,2,...,Ne2t/N,

Example 3

This example investigates the "long-run volatility" persistence of Deutschemark-British pound exchange rate (Bollerslev and Ghysels [1986]). Data of daily exchange rates from January 3, 1984 to December 31, 1991 (1974 observations) are used (see DMBP.TXT).

The model of interest is

Yt = 100 [ln(Pt - ln(Pt-1)] = m + et

where Pt is the bilateral spot Deutschemark-British pound exchange rate. Thus Yt is the daily percentage nominal returns of BM/BP exchange. Test, identify, and estimate the appropriate GARCH(p,q) variance structure (Program).

Example 4

U. S. inflation measured as the quarterly rate of change in the log of the price:

dPt = 100 [ln(Pt) - ln(Pt-1)]

is believed to be effected by the previous excess monetary growth (faster than the growth of real output) and by the external shocks. Excess monetary growth is defined as dM - dY, where

dMt = 100 [ln(M1t) - ln(M1t-1)]
dYt = 100 [ln(GNPt) - ln(GNPt-1)]

The basic model is represented by the following:

dPt = b0 + b1(dMt-1-dYt-1) + et

In addition, the lagged values of the inflation rate (or the disturbance) will carry the effects of external shocks to the economy.

The data file USINF.TXT consists of 136 quarterly observations (from 1950 Q1 to 1984 Q4) of data series for price (implicit deflator for GNP) Pt, money stock M1t, and output (GNP) Yt. Identify and estimate the best model of U. S. inflation rate in which serial correlations may exist in the means or in the variances or in both (see Greene [1999], Example 18.11) (Program).

Example 5

To demonstrate the ARCH innovation process for the U. S. inflation rate defined in the above Example 4, dPt may be specified with a a combination of distributed lags, ARMA, and GARCH models (see Greene [1999], Example 18.12) (Program).

GARCH(1,1) Models Based on Non-Normal Distributions

Consider the standard GARCH(1,1) model represented by:

Yt = Xtb + et, et = stut
st2 = a0 + a1et-12 + d1st-12

Generalized Exponential Distribution (GED) (Nelson, 1991)

ut ~ GED(v) with zero mean and unit variance, v is the thickness of tails for the underlying GED. If v > 2 the distribution has thinner tails than normal. If v < 2 the distribution has thicker tails than normal.

The p.d.f of GED(v) is written as:

f(ut) =
v

l
exp[-½|ut/l|v]

2(1+1/v)G(1/v)

where l =
æ  G(1/2)ö1/2
ç
÷
è22/v G(3/v)ø

Therefore,

f(et) = f(Yt) =
v

l
exp[-½|(Yt-Xtb)/(lst)|v]

    2(1+1/v)G(1/v)st

The component log-likelihood function for each observation is:

llt = ln(v/l) - (1+1/v)ln(2) - ln(G(1/v))
- ½|(Yt-Xtb)/(lst)|v - ½ln(st2)

Student t-Distribution (Bollerslev, 1987)

ut ~ t(d), d > 2 is the degree of freedom of the underlying Student t distribution. The p.d.f of Student t distribution (normalized with zero mean and unit variance) is written as:

f(ut) = C
ì
í
î
1 +
ut2ü-(d+1)/2

ý
d-2þ

Therefore,

f(et) = f(Yt) =
C

st
ì
í
î
1 +
1

d-2
æ
ç
è
Yt-Xtb

st
ö2
÷
ø
ü-(d+1)/2
ý
þ

where

C =
G((d+1)/2)

G(d/2)[(d-2)p]1/2

The component log-likelihood function for each observation is:

llt = ln(C) - ½ln(st2) - ½(d+1)ln[1+(1/(d-2))((Yt-Xtb)/st))2]

where ln(C) = ln(G((d+1)/2)) - ln(G(d/2)) - ½ln(d-2) - ½ln(p)

Skewed Student t-Distribution (Hansen, 1994)

ut ~ t(d,s), d > 2 is the degree of freedom and -1< s < 1 is the skewedness of the underlying Skewed Student t-distribution. It specializes to the Student t-distribution by setting s = 0.

The p.d.f of Skewed Student t distribution (normalized with zero mean and unit variance) is written as:

f(ut) =
BC
æ
ç 1 +
è
1æ

ç
d-2è
A+Butö2

÷
1+s-2s*Itø
ö-(d+1)/2
÷
ø

where

It = 1 if A+But < 0
0 if A+But ³ 0

or, equivalently

vt = 1+s-2sIt = 1-s if A+But < 0
1+s if A+But ³ 0

Therefore,

f(et) = f(Yt) =
(BC/st)
æ
ç 1 +
è
1æ

ç
d-2è
A+B((Yt-Xtb)/st)ö2

÷
1+s-2sItø
ö-(d+1)/2
÷
ø

where

It = 1 if A+B((Yt-Xtb)/st) < 0
0 if A+B((Yt-Xtb)/st) ³ 0

C =
   G((d+1)/2)

G(d/2)[(d-2)p]1/2
   A = 4sC[(d-2)/(d-1)]    B2 = 1 + 3s2 - A2

The component log-likelihood function for each observation is:

llt = ½ln(1+3s2-A2) + ln(C) - ½ln(st2)
- ½(d+1) ln{1+(1/(d-2))[(A+B((Yt-Xtb)/st))/(1+s-2sIt)]2}

Example 6

Based on Example 3 above on the GARCH(1,1) model of Deutschemark-British pound exchange rate, estimate and compare the results by assuming the following distributions for the model error (Program):

GARCH(1,1) Models with Asymmetry Behavior (Leverage Effect)

There are many evidences in the financial markets that a negative surprise (change in asset returns) tends to increase volatility (variance or risk) more than positive surprise. Therefore, not only the size of the return but also the sign (negative or positive) are important in describing the characteristics of the variance of the asset returns. Consider the following simple model:

Yt = Xtb + et
et = stut

GJR Specification (Glosten-Jagannathan-Runkle, 1993)

st2 = a0 + a1et-12 + d1st-12 + g1(et-12Dt-1)

where Dt-1 = 1     if et-1 > 0
0     otherwise

The parameter g1 < 0 is sometimes referred as the Leverage Effect. The non-negativity of st2 is satisfied provided that a0 > 0, d1 ³ 0 a1+g1 ³ 0.

The asymmetric consequences of positive and negative innovations in the GARCH models can be studied based on various distributional assumptions (e.g., normal, t, GED) as described above.

EGARCH Specification (Nelson, 1991)

ln(st2) = a0 + d1ln(st-12) + a1[g1ut-1 + |ut-1| - E(ut-1)]

where ut = et/st is independently distributed with zero mean and unit variance. The parameter of ut-1, or a1g1 < 0, is interpreted as the Leverage Effect. The advantage of the Nelson's specification of the variance equation is that log of st2 is used, then the estimated st2 is positive no matter what is the sign of the estimated parameters.

Nelson's EGARCH(1,1) model assumes ut ~ GED(v) in which E(ut) = 0 and Var(ut) = 1. Furthermore,

E(|ut|) =
l 21/v G(2/v)

   G(1/v)
-> (2/p)1/2 as v -> 2 (normal distribution)

We note that l =
æ  G(1/2)ö1/2
ç
÷
è22/v G(3/v)ø
and the parameter v measures the thickness of the underlying GED distribution.


Multi-Equation Time Series Models

Generalizing from the univariate time series AR(1) model:

Yt = m + rYt-1 + et

the multivariate system of G variables can be written as follows:

Yit = mi + åj=1,2,...,G rijYj,t-1 + eit (i=1,2,...,G)

This is called Vector Autocorrelation of order 1, or VAR(1). The matrix representation of the model as a simultaneous linear equations system looks like this:

[Y1t,Y2t,...,YGt] = [m1,m2,...,mG] + [Y1,t-1,Y2,t-1,...,YG,t-1]
é
ê
ê
ë
r11r21..rG1
r12r22..rG2
::::
r1Gr2G..rGG
ù
ú
ú
û
+ [e1,e2,...,eG]
The alternative is the stacked form suitable for estimation as a system of regression equations:

é
ê
ê
ë
Y1t
Y2t
..
YGt
ù
ú
ú
û
=
é
ê
ê
ë
m1
m2
..
mG
ù
ú
ú
û
+
é
ê
ê
ë
r11r12..r1G
r21r22..r2G
::::
rG1rG2..rGG
ù
ú
ú
û
é
ê
ê
ë
Y1,t-1
Y2,t-1
..
YG,t-1
ù
ú
ú
û
+
é
ê
ê
ë
e1t
e2t
:
eGt
ù
ú
ú
û

In a shorthand notation,

Yt = m + r Yt-1 + et

Extension: VAR(p)

First, we can write the univariate AR(p) model as the system:

Yt = m + r1Yt-1 + r2Yt-2 + ... +rpYt-p + et
Yt-1 = Yt-1
Yt-2 = Yt-2
:
Yt-p+1 = Yt-p+1

Or,

é
ê
ê
ë
Yt
Yt-1
:
Yt-p+1
ù
ú
ú
û
=
é
ê
ê
ë
m
0
:
0
ù
ú
ú
û
+
é
ê
ê
ë
r1 r2.. rp
10..0
::::
0..10
ù
ú
ú
û
é
ê
ê
ë
Yt-1
Yt-2
:
Yt-p
ù
ú
ú
û
+
é
ê
ê
ë
et
0
:
0
ù
ú
ú
û

That is,

Yt = m + r Yt-1 + et

This is a system of p equations with restricted parameters matrix. The usable time series observations are from p+1 to N (N-p in total).

Similarly, for the multivariate VAR(p) system, the model can be expressed in terms of the stacked G endogenous variables. Therefore, Yt, Yt-1, ..., and Yt-p are Gx1 vectors. The size of the problem is (N-p)Gp. Then the parameter matrix r of the lag variable Yt-1 is

r =
é
ê
ê
ë
r1r2....rp
I0....0
0I::0
00..I0
ù
ú
ú
û

where, for each k = 1,2,...,p, rk = [rij,k (i,j=1,2...,G)]. Furthermore, I is GxG identity matrix, and 0 is GxG zeros matrix.

Impulse Response Functions

Deriving from a general VAR(1) system, Yt = m + r Yt-1 + et, we write:

[I-r(B)]Yt = m + et

where B is the backshift operator. Then,

Yt = [I-r]-1m + åi=0,2...,¥ riet-i

= Y* + (et + r1et-1 + r2et-2 + ...)

Y* is the equilibrium and et is the innovation. By shocking one element of et, says ejt, Yt will move away from the equilibrium Y*. Note that the effect of Yt due to change of ejt is not just on the jth variable alone but also on other variables in the system. The path whereby the variables returns to equilibrium is called the Impulse Responses of a stable VAR system. The Impulse Response Function traces the effects of a one-time innovation ejt on the k-th variable over time (i=0,1,2,...) as rikj (k,j = 1,2,...,G).

Examples

  1. Formulate and estimate a VAR system of three variables: M (Money Aggregate), Y (Output or GDP), R (Interest Rate). Compute and interpret the Impulse Response Functions. (VAR(1), VAR(2), Data)
  2. A strutural VAR(1) model: Dynamic Multiplier Analysis of Klein Model I


State-Space Models

State-space analysis deals with dynamic time series models that involve unobserved state variables such as inflation expectation, permanent income, time-varying parameters, etc.. The basic tool used to study the state-space model is the Kalman Filter, which is a recursive algorithm for estimating the unobserved component or state vector at time t, based on available information through time t-1.

Model Representation

A state-space model consists of two equations:

Conditional to the information available at time t-1, the expected value of bt is Et-1(bt) = ct + FtEt-1(bt-1). Similarly, the conditional covariance is Vart-1(bt) = FtVart-1(bt-1)Ft' + Q. For notational convenience, let bt|t-1 = Et-1(bt) and Wt|t-1 = Vart-1(bt). Then,

bt|t-1 = ct + Ftbt-1|t-1
Wt|t-1 = FtWt-1|t-1Ft' + Q

Combining the measurement and transition equations, we have

Yt = (HtFt)bt-1 + (Htct+at) + (Htvt+ut)

Given the information at time t-1, the conditional expectation and covariance of Yt are:

Yt|t-1 = Et-1(Yt) = Htbt|t-1 + at
St|t-1 = Vart-1(Yt) = HtWt|t-1Ht' + R

Since Yt is distributed according to normal(Yt|t-1,St|t-1), the log-likelihood is evaluated as:

llt = - ½ ln(2pSt|t-1) - ½ (Yt-Yt|t-1)'St|t-1-1(Yt-Yt|t-1)

Kalman Filter

The computation of log-likelihood function for parameter estimation is based on the algorithm of Kalman Filter as follows:

The above basic filter (prediction and updating) is carried out iteratively from t=1 to t=T. At the end, the sum of log-likelihoods is maximized with respect to the model parameters. To begin at time t=1, the initial values b0|0 and W0|0 must be given. If bt is stationary, then the unconditional expectation and covariance may be used:

b0|0 = (I-F)-1c
vec(W0|0) = (I-FÄF)-1vec(Q)

If bt is nonstationary, then we can use a wild guess of b0|0 (e.g. zeros vector) with large diagonal elements in the covariance matrix W0|0. In this case, the evaluation of log-likelihood and inference should not include the first few observations of the guess values.

As a by product of maximum likelihood estimation, we obtain the estimated (updated) parameter vector and the corresponding covariance matrix at time t: bt|t and Wt|t, for t=1,...,T. For a better inference, the smoothed parameter vector and the corresponding covariance matrix based on all information in the sample are:

bt|T = bt|t + K*t+1(bt+1|T-ct+1-Ft+1bt|t)
Wt|T = Wt|t + K*t+1(Wt+1|T-Wt+1|t)K*t+1'

where K*t+1 = Wt|tFt+1'Wt+1|t-1. The smoothing is performed from t=T-1 down to t=1 with the initial values bT|T and WT|T obtained from the last iteration of the basic filter.

Applications

Example 7

C-J. Kim and C. R. Nelson, "The Time-Varying-Parameter Model for Modeling Changing Conditional Variance: The case of the Lucas Hypothesis," Journal of Business and Economic Statistics, 1989, 433-440.

The State-Space Model Representation

Data Description (Data)

DM = Quarterly M1 growth rate
DR = Change in 3-month T-bill interest rate
DP = Inflation rate as measured by the CPI
SURP = Detrended full employment budget surplus

Fixed Parameters

s2, s02, s12, s22, s32, s42.

Time-Varying Parameters

b0t, b1t, b2t, b3t, b4t.

(Program)


Last updated: 03/21/2005