Monday, March 7, 2016

Got some results for my reserve paper.

The paper was presented at MPSA2014 and ISA2014, but I had just been sitting on it until recently. Tenure anxiety got me working on it again. After a number of hiccups (such as finding nonstationarity in the data because I forgot to account for the trend; yeah, shame on me), I now have a pretty robust result (translation: robust to FE :)).

Here's the marginal effect graph:

I will come back to this when the theory is "calibrated".

[method ramble #3] zinb and TSCS

Recently, I've been working on a project on the relationship between inequality and social unrest (riots and demonstrations). The whole paper rests upon a conditional hypothesis:

y*_#riots = b1*Gini + b2*d.Unemployment + b3*Gini*d.Unemployment + e ...... (1)
y*_#demos = b1*Gini + b2*d.Unemployment + b3*Gini*d.Unemployment + e ...... (2),

where y* is a latent variable for riots (and demonstrations). Maximum likelihood estimation is, of course, necessary because the latent continuous y* is not observed; instead, we have data that count the NUMBER of riots (say, y_#riots) and demonstrations (say, y_#demos) in a given country-year. So far, very straightforward.
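Since the hypothesis is conditional, the quantity of interest is the effect of d.Unemployment at a given level of Gini, that is, b2 + b3*Gini, along with its standard error. Here is a minimal sketch of that calculation. The function name and all the numbers are made up for illustration (they are not estimates from the paper), and the effect is on the linear predictor, not on the expected count.

```python
import math

def conditional_effect(b2, b3, var_b2, var_b3, cov_b2_b3, gini):
    """Effect of d.Unemployment on the linear predictor at a given Gini.

    effect   = b2 + b3 * Gini
    variance = Var(b2) + Gini^2 * Var(b3) + 2 * Gini * Cov(b2, b3)
    """
    effect = b2 + b3 * gini
    variance = var_b2 + gini ** 2 * var_b3 + 2 * gini * cov_b2_b3
    return effect, math.sqrt(variance)

# Hypothetical coefficients and (co)variances, for illustration only.
b2, b3 = -0.40, 0.012
var_b2, var_b3, cov_b2_b3 = 0.04, 0.00002, -0.0008

for gini in (25, 35, 45, 55):
    effect, se = conditional_effect(b2, b3, var_b2, var_b3, cov_b2_b3, gini)
    lower, upper = effect - 1.96 * se, effect + 1.96 * se
    print(f"Gini={gini}: effect={effect:.3f}, 95% CI=({lower:.3f}, {upper:.3f})")
```

Plotting the effect and its interval over the observed range of Gini gives the usual marginal effect graph for an interaction term.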

Negative binomial regression is the answer. The current Stata estimator (xtnbreg) handles time-series cross-section (TSCS) data pretty efficiently.

A problem arises when there are too many zeros.

The error term e may not be iid, however, when there's a systematic reason why y_#riots (and y_#demos) has so many zeros. In other words,

if equations (1) and (2) are affected by logit functions:

y(riots | p=1) = b1*Gini + b2*d.Unemployment + b3*Gini*d.Unemployment + e ...... (3)
y(demos | p=1) = b1*Gini + b2*d.Unemployment + b3*Gini*d.Unemployment + e ...... (4),

then the results of (1) and (2) are likely biased.

Zero-inflated negative binomial (zinb) is the way to go, but the current estimators do not take into account the TSCS structure of the data. The results might very well be biased.
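To make the zero-inflation logic concrete, here is a rough sketch of the zinb probability mass function as a two-part mixture: with probability pi_zero an observation is an "always zero" (the logit part), and otherwise the count comes from a negative binomial. The function names, the NB2 parameterization (mean mu, dispersion alpha), and the example numbers are my own assumptions for illustration. Note that pi_zero here is the probability of a structural zero, i.e. 1 - Pr(p=1) in the notation of (3) and (4).

```python
import math

def nb_pmf(k, mu, alpha):
    """NB2 pmf with mean mu and dispersion alpha (variance = mu + alpha * mu**2)."""
    r = 1.0 / alpha          # "size" parameter
    p = r / (r + mu)         # success probability
    log_pmf = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
               + r * math.log(p) + k * math.log(1.0 - p))
    return math.exp(log_pmf)

def zinb_pmf(k, mu, alpha, pi_zero):
    """Mixture: structural zero with probability pi_zero, else an NB2 count."""
    base = (1.0 - pi_zero) * nb_pmf(k, mu, alpha)
    return pi_zero + base if k == 0 else base

def logistic(z):
    """Inverse logit, used for the zero-inflation probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical linear predictors for one country-year (illustration only).
mu = math.exp(0.7)           # count equation: exp(x'b)
pi_zero = logistic(-0.5)     # inflate equation: logit^-1(z'g)
alpha = 1.5

print("P(y=0), plain NB:", nb_pmf(0, mu, alpha))
print("P(y=0), zinb:    ", zinb_pmf(0, mu, alpha, pi_zero))
```

The zinb probability of a zero is always at least as large as the plain NB one, which is exactly the "too many zeros" story. What this likelihood does not do is link observations from the same country over time; each country-year enters as if independent.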

So far, a reasonable solution I've found would be something along the lines of a bunch of pair-wise comparisons like this. It makes a lot of computational sense, but I don't think it's compelling enough to convince reviewers.

I googled quite a bit in search of a new estimator and found this one. It seems reasonable, but its stability hasn't been established.

A more practical solution I can think of, particularly for fixed effects, would be to include country-year dummies in the 'inflate' equation. Whether or not the event occurs at all is, I think, driven much more by country-year heterogeneity than how often it occurs (for most political events, that is).

Of course, zinb fits the data without 'recognizing' that they are time series. Nonetheless, a number of papers using zinb on TSCS data have been published.