Sure. You can use all the observations pre- and post-treatment, either by averaging over multiple time periods (before or after) or by including all the observations in your regression with a flag for all post-treatment observations. As I recall, the key issue when including more than two time periods in your regression is that the outcomes will be autocorrelated, which tends to bias the standard errors downward and so overstate the significance of the estimated impact (though you should double-check me on this, as it’s been a while since I read any academic research on this aspect). Cheers,

I wonder how to specify a D-in-D model when you have outcome data collected at three time points: one before implementation of a policy and two after. I have a mind to just lump together the data from the two post-intervention time points and estimate the model you show above (y = b0 + b1*treatment + b2*time + b3*(treatment*time)). Is there a way of explicitly modelling the different time points, for instance using time dummies?

Thanks
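One way to write down the time-dummy version, offered as a sketch rather than a definitive answer: replace the single "time" flag with one dummy per post-policy wave. The names post1 and post2 are assumptions introduced here (1 for observations from the first and second post-policy time points respectively, 0 otherwise); the pre-policy wave is the omitted baseline.

```latex
y = b_0 + b_1\,\mathrm{treatment} + b_2\,\mathrm{post1} + b_3\,\mathrm{post2}
    + b_4\,(\mathrm{treatment} \times \mathrm{post1})
    + b_5\,(\mathrm{treatment} \times \mathrm{post2}) + e
```

Under this specification, b_4 and b_5 are the difference-in-differences estimates for each post-policy wave relative to the pre-policy wave. Lumping the two waves together yields a single coefficient that blends them (the simple average of b_4 and b_5 when the two post-policy cells are the same size in each group).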

Thanks for your blog. I have a question for you: how should I set up the data for running the diff-in-diff? Can you give an example? I am running a simple DID estimation with two periods, before 1993 and after 2003. Is this an acceptable estimation?
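For what it's worth, here is a minimal sketch of one common data layout for a two-period diff-in-diff: one row per observation, with a 0/1 flag for the treatment group and a 0/1 flag for the post-policy period. The numbers and the column names (y, treated, post) are made up for illustration and are not taken from the blog's dataset.

```python
# Hypothetical long-format data: one row per observation, with two 0/1 flags.
# 'treated' marks the treatment group; 'post' marks the period after the policy.
rows = [
    {"y": 10.0, "treated": 0, "post": 0},
    {"y": 12.0, "treated": 0, "post": 0},
    {"y": 13.0, "treated": 0, "post": 1},
    {"y": 15.0, "treated": 0, "post": 1},
    {"y": 20.0, "treated": 1, "post": 0},
    {"y": 22.0, "treated": 1, "post": 0},
    {"y": 28.0, "treated": 1, "post": 1},
    {"y": 30.0, "treated": 1, "post": 1},
]


def cell_mean(rows, treated, post):
    """Average outcome for one of the four group-by-period cells."""
    vals = [r["y"] for r in rows if r["treated"] == treated and r["post"] == post]
    return sum(vals) / len(vals)


def did_estimate(rows):
    """Change in the treated group minus change in the control group."""
    change_treated = cell_mean(rows, 1, 1) - cell_mean(rows, 1, 0)
    change_control = cell_mean(rows, 0, 1) - cell_mean(rows, 0, 0)
    return change_treated - change_control


print(did_estimate(rows))
```

With the data in this shape, the same two flags (and their product) are what you would pass to a regression routine.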

Awesome, thanks for sharing this.

Thank you very much for your blog. I have a question regarding the Stata command “diff” (please see “help diff” in Stata) for pooled/repeated cross-section data (specifically, whether the arguments for this command change for such data). I will write my code here; please confirm whether I have it right.

I have data for two countries (let’s say, countryA and countryB) for 5 years, from 2001 until 2005. The treatment happens to countryB in the year 2002. So the Stata commands I am using for the difference-in-difference estimator are:

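* code starts here

* period = 1 from the treatment year onward, 0 before (missing years stay missing)
gen period = (year >= 2002) if !missing(year)

* treated = 1 for countryB, 0 for countryA
gen treated = (country == "countryB")

* covariates are given as a space-separated varlist
diff outcomevar, period(period) treated(treated) cov(a b c d)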

I am aware that this coding is pretty much standard for this command. However, I would like to confirm whether it also holds for repeated/pooled cross-sectional data, and whether I got everything right.

Thanks and regards,

Sumit

I have just noticed that the link actually works. Sorry!

Hello Kevin,

I am unable to obtain the EITC data set you used in your example. The link no longer seems to be working.

Could you please provide another link?

Thanks.

-Raul

@GJ: “select = work” is used to select the “work” variable from the dataset. Try it out yourself by loading the dataset and running “subset(eitc, post93 == 0 & anykids == 0, select=nonwhite)”, or use any other variable. Kevin does this because he wants to take the average of the “work” variable for the 4 different groups.

@Ica: I don’t use Stata, but how about you use Stata (or Excel) to separate your dataset into two datasets, female and male? Then the code Kevin wrote should apply.

@Nicholas: the explanatory variables, “post” and “anykids”, only take the values 0 and 1, so there is no regression “line”. Try writing out the formula for a regression and you’ll see that, with only 0 and 1 values, it simplifies to Kevin’s statement. (This is also verified by the fact that running a “regression” returns the same values!)
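To make that concrete, here is a small sketch with made-up numbers (not the EITC data). It builds the four coefficients of a regression on post, anykids and post*anykids directly from the four group means, then checks that they satisfy the least-squares normal equations, i.e. that the residuals are orthogonal to every regressor. The function names are hypothetical, chosen just for this illustration.

```python
# Hypothetical observations: (work, post, anykids), all 0/1.
data = [
    (1, 0, 0), (0, 0, 0), (1, 0, 0),
    (0, 0, 1), (0, 0, 1), (1, 0, 1),
    (1, 1, 0), (0, 1, 0),
    (1, 1, 1), (1, 1, 1), (0, 1, 1),
]


def cell_means(data):
    """Mean of the outcome for each (post, anykids) group."""
    sums, counts = {}, {}
    for y, post, kids in data:
        key = (post, kids)
        sums[key] = sums.get(key, 0.0) + y
        counts[key] = counts.get(key, 0) + 1
    return {k: sums[k] / counts[k] for k in sums}


def coefs_from_means(m):
    """Coefficients on [1, post, anykids, post*anykids] written as differences of means."""
    b0 = m[(0, 0)]
    b1 = m[(1, 0)] - m[(0, 0)]
    b2 = m[(0, 1)] - m[(0, 0)]
    b3 = (m[(1, 1)] - m[(0, 1)]) - (m[(1, 0)] - m[(0, 0)])  # the diff-in-diff
    return [b0, b1, b2, b3]


def normal_equation_residuals(data, b):
    """Sum of regressor * residual for each regressor; all zero at the OLS solution."""
    totals = [0.0, 0.0, 0.0, 0.0]
    for y, post, kids in data:
        x = [1, post, kids, post * kids]
        resid = y - sum(bi * xi for bi, xi in zip(b, x))
        for j in range(4):
            totals[j] += x[j] * resid
    return totals


b = coefs_from_means(cell_means(data))
print(b[3])
print(normal_equation_residuals(data, b))
```

The second printed list should be zero up to floating-point rounding, which is the sense in which the regression and the difference of group means are the same calculation.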
