This exercise makes use of the data set E2-BanerjeeEtAl-data.dta, a subset of the data used in the paper “A multifaceted program causes lasting progress for the very poor: Evidence from six countries” by Abhijit Banerjee, Esther Duflo, Nathanael Goldberg, Dean Karlan, William Pariente, Jeremy Shapiro, Bram Thuysbaert, and Chris Udry, published in Science in 2015.
The authors examine the impacts of a “graduation” program first designed by the Bangladeshi NGO BRAC. The program offers extremely poor households an asset transfer, temporary consumption support, skills training, home visits, and access to savings technologies. The program was evaluated through a randomized trial in six countries.
In this exercise, we use data on the program’s impacts on food security to explore the mechanics of fixed effects.
You can access the in-class activity (below) as a do file or pdf.
You can also access the main empirical exercise (also below) as a do file or pdf.
Create a do file that contains the following preliminaries:
** preliminaries
clear all
set more off
set seed 12345
** load the data from the course website
webuse set https://pjakiela.github.io/ECON523/exercises
webuse E2-BanerjeeEtAl-data.dta
Extend your do file as you answer the following questions, so that you can run the code from start to finish and re-generate all your answers.
Familiarize yourself with the data set. How many countries are included in the study, and how many observations are there in each country? What fraction of the observations from each country were treated?
Take a look at the outcome variable e_foodsec. What is the mean value in each country? What is the mean value in the treatment group in each country? What does a histogram of the food security index look like?
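One possible way to start exploring the data is sketched below. It assumes the preliminaries above have been run and uses only the variables named in this exercise (country, treatment, e_foodsec); the choice of commands is just one option among several.

```stata
* sample size in each country
tab country

* share treated in each country (row percentages)
tab country treatment, row

* mean food security by country: everyone, then the treatment group only
tabstat e_foodsec, by(country) statistics(mean n)
tabstat e_foodsec if treatment==1, by(country) statistics(mean n)

* distribution of the food security index
histogram e_foodsec
```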
Regress food security on treatment. What do you find? How should we interpret this coefficient?
Now regress food security on treatment controlling for country fixed effects (by adding i.country to the regression). How do the results change?
What if we regress food security on treatment separately for each country? In how many of the six countries do we see a positive and statistically significant treatment effect?
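The three specifications described above could be run as follows. This is a minimal sketch that assumes the data are loaded and uses default (homoskedastic) standard errors; the exercise does not say which standard errors to use.

```stata
* pooled regression, no fixed effects
reg e_foodsec treatment

* same regression with country fixed effects
reg e_foodsec treatment i.country

* one regression per country
bysort country: reg e_foodsec treatment
```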
The regression including country fixed effects is equivalent to a regression where we first subtract off country-specific means and then regress de-meaned (or normalized) food security on normalized treatment. Show that this is the case. (hint: use egen)
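A sketch of the de-meaning approach follows. The variable names foodsec_cmean, treat_cmean, foodsec_dm, and treat_dm and the scalar b_fe are made up for illustration. The means are computed only on observations with non-missing outcome and treatment so that both regressions use the same sample.

```stata
* benchmark: coefficient from the fixed effects regression
reg e_foodsec treatment i.country
scalar b_fe = _b[treatment]

* country-specific means of the outcome and of treatment
bysort country: egen double foodsec_cmean = mean(e_foodsec) if !missing(e_foodsec, treatment)
bysort country: egen double treat_cmean = mean(treatment) if !missing(e_foodsec, treatment)

* subtract off the country means
gen double foodsec_dm = e_foodsec - foodsec_cmean
gen double treat_dm = treatment - treat_cmean

* regress de-meaned outcome on de-meaned treatment
reg foodsec_dm treat_dm
display "FE: " b_fe "   de-meaned: " _b[treat_dm]
```

The coefficients should agree. The reported standard errors can differ slightly, because the second regression does not account for the degrees of freedom used up by the country means.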
The regression including country fixed effects is also equivalent to a regression of residualized food security (predicted from a regression of food security on country fixed effects) on residualized treatment (predicted the same way). Show that this is the case. (hint: use predict)
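The residualization approach might look like the sketch below. The names foodsec_resid, treat_resid, and b_fe2 are hypothetical, and the second first-stage regression is restricted to observations with a non-missing outcome so that the samples line up.

```stata
* benchmark: coefficient from the fixed effects regression
reg e_foodsec treatment i.country
scalar b_fe2 = _b[treatment]

* residualize the outcome on country fixed effects
reg e_foodsec i.country if !missing(treatment)
predict double foodsec_resid if e(sample), residuals

* residualize treatment the same way
reg treatment i.country if !missing(e_foodsec)
predict double treat_resid if e(sample), residuals

* regress residualized outcome on residualized treatment
reg foodsec_resid treat_resid
display "FE: " b_fe2 "   residualized: " _b[treat_resid]
```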
The regression including country fixed effects is also equivalent to a weighted average of the country-specific treatment effects. The weights are proportional to N*p*(1-p), where N is the number of observations in a country and p is the proportion treated in that country. The weights are normalized by dividing by the sum of all the weights. Extend the program below to calculate the treatment effect that you would get from a regression controlling for fixed effects.
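** placeholders: row i will hold the statistics for country i
gen T_mean = .
gen C_mean = .
gen p = .
gen N = .
forvalues i = 1/6 {
    ** treatment and control group means in country i
    sum e_foodsec if treatment==1 & country==`i'
    replace T_mean = r(mean) in `i'
    sum e_foodsec if treatment==0 & country==`i'
    replace C_mean = r(mean) in `i'
    ** proportion treated and number of observations in country i
    sum treatment if country==`i'
    replace p = r(mean) in `i'
    count if country==`i'
    replace N = r(N) in `i'
}
** normalized weights, stored in rows 1 through 6
gen weight = N*p*(1-p)
egen sum_weights = total(weight)
replace weight = weight / sum_weights
drop sum_weights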
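One way the program could be extended is sketched here. The names effect, weighted_effect, and fe_by_hand are illustrative, and the sketch assumes no missing values in e_foodsec or treatment (otherwise N and p would be computed on a different sample than the regression uses).

```stata
* country-specific treatment effect: difference in means (rows 1 to 6)
gen effect = T_mean - C_mean

* weighted average of the six country-specific effects
gen weighted_effect = weight * effect
egen fe_by_hand = total(weighted_effect)
display "weighted average: " fe_by_hand[1]

* compare with the coefficient from the fixed effects regression
reg e_foodsec treatment i.country
display "FE coefficient:   " _b[treatment]
```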
For this part of the exercise, we’re going to drop all the observations in the treatment group, and then simulate alternative scenarios to better understand how fixed effects work. Create a new do file that begins with the code below, and then extend your do file as you answer the questions.
** preliminaries
clear all
set more off
set seed 12345
** load the data from the course website
webuse set https://pjakiela.github.io/ECON523/exercises
webuse E2-BanerjeeEtAl-data.dta
** drop observations in the treatment group
drop if treatment==1
drop treatment
** randomly assign observations to four equally-sized groups
gen randnum = runiform()
sort country randnum
by country: gen within_id = _n
gen group = mod(within_id,4)
replace group = 4 if group==0
sort country within_id
p is constant across countries
Create a treatment variable t1 and assign observations in groups 1 and 2 to treatment. Then, create a variable impact1 that is equal to 2 for observations in the treatment group and 0 otherwise. This is the treatment effect for the purposes of this (first) simulation. Generate an outcome variable y1 that is endline food security (e_foodsec) plus impact1. Now regress y1 on t1 with and without country fixed effects. How do the estimated treatment effects and the levels of statistical significance compare across the two specifications?
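The first simulation could be set up along these lines. This is a sketch that assumes the simulation preliminaries above have been run, so that group takes the values 1 through 4.

```stata
* treatment: groups 1 and 2
gen t1 = (group==1 | group==2)

* simulated treatment effect of 2 for treated observations
gen impact1 = 2*t1

* outcome: observed food security plus the simulated effect
gen y1 = e_foodsec + impact1

* without and with country fixed effects
reg y1 t1
reg y1 t1 i.country
```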
When the probability of treatment does not vary across countries, including country fixed effects is not necessary, but it may increase statistical power. In the example above, fixed effects did not improve statistical power much because the mean of the outcome does not vary across countries (it is normalized to zero in the control group in every country). Change this by increasing y1 by 10 in two countries and decreasing y1 by 20 in two other countries. Now rerun your two regressions (with and without fixed effects). You should see that including fixed effects now changes the standard error on your estimated treatment effect substantially (though it still should not change your estimated coefficient much).
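A sketch of this step is below. It assumes y1 and t1 already exist as defined in the previous question. The exercise does not say which countries to shift, so the choice of countries 1 and 2 (up) and 3 and 4 (down) is arbitrary, as are the scalar names se_pooled and se_fe.

```stata
* assumption: shift countries 1 and 2 up, countries 3 and 4 down
replace y1 = y1 + 10 if inlist(country, 1, 2)
replace y1 = y1 - 20 if inlist(country, 3, 4)

* without fixed effects
reg y1 t1
scalar se_pooled = _se[t1]

* with country fixed effects
reg y1 t1 i.country
scalar se_fe = _se[t1]

display "SE without FE: " se_pooled "   SE with FE: " se_fe
```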
When p is fixed, the estimated coefficient from a regression with fixed effects is a weighted average of the estimated country-specific treatment effects (i.e. the within-country differences in means). The weights are each country's share of the total sample size. Given this, if you increased the treatment effect in Peru from 2 to 11, what would you expect the estimated treatment effect to be? See whether this is true in practice (by changing the treatment effect in Peru and then re-running your fixed effects regression).
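The check could be sketched as follows, assuming t1 and impact1 exist from the first simulation. The sketch assumes Peru is coded as country 6 (worth confirming with tab country), and the names impact1_peru, y1_peru, N_all, and share_peru are hypothetical. With an effect of 2 everywhere except Peru, the weighted average is 2 plus 9 times Peru's share of the sample.

```stata
* assumption: Peru is country 6
count
scalar N_all = r(N)
count if country==6
scalar share_peru = r(N)/N_all
display "expected coefficient: " 2+9*share_peru

* raise the simulated effect in Peru from 2 to 11
gen impact1_peru = impact1
replace impact1_peru = 11 if t1==1 & country==6
gen y1_peru = e_foodsec + impact1_peru

* fixed effects regression with the new outcome
reg y1_peru t1 i.country
```

The estimate will match the expected value only approximately, since the randomly assigned groups differ somewhat in their underlying food security.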
Fixed effects are needed when treatment probabilities vary across countries and the mean of the outcome variable also varies across countries (because then treatment is correlated with the outcome, even in the absence of a treatment effect). To see this, generate a variable t2 that is equal to 1 for all observations in group 1 plus the observations in group 2 in Ethiopia, Ghana, and Honduras (countries 1, 2, and 3). In this simulation, we are not going to add any treatment effect. Generate an outcome variable y2 that is equal to food security, and then add 5 to it in Ethiopia, Ghana, and Honduras (for observations in the treatment and control groups in those countries). How do the results of regressions with and without country fixed effects compare?
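The second simulation might be coded as in this sketch, which follows the description above directly and adds no treatment effect.

```stata
* treatment: group 1 everywhere, plus group 2 in countries 1, 2, and 3
gen t2 = (group==1) | (group==2 & inlist(country, 1, 2, 3))

* outcome: food security, shifted up by 5 in countries 1, 2, and 3
gen y2 = e_foodsec
replace y2 = y2 + 5 if inlist(country, 1, 2, 3)

* proportion treated by country
tabstat t2, by(country)

* without and with country fixed effects
reg y2 t2
reg y2 t2 i.country
```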
For the last question, we need to have the same number of observations in each country. The code below does this. You can see that we now have equal numbers of observations from groups 1, 2, 3, and 4 in each country as well.
keep if within_id<=360 // 360 obs per country
tab country group
Now generate a treatment variable t3. t3 should be equal to 1 for observations in group 1 in Ethiopia and Ghana. t3 should be equal to 1 for observations in groups 1 and 2 in Honduras and India. t3 should be equal to 1 for observations in groups 1, 2, and 3 in Pakistan and Peru. Given this, what is the proportion treated in each country?
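A possible way to build t3 is sketched below. It assumes the sample has already been trimmed to 360 observations per country and that India, Pakistan, and Peru are coded as countries 4, 5, and 6; the text only confirms the codes for countries 1 through 3, so check with tab country.

```stata
* group 1 is treated in every country
gen t3 = (group==1)

* group 2 is also treated in Honduras, India, Pakistan, and Peru
replace t3 = 1 if group==2 & inlist(country, 3, 4, 5, 6)

* group 3 is also treated in Pakistan and Peru
replace t3 = 1 if group==3 & inlist(country, 5, 6)

* proportion treated in each country
tabstat t3, by(country)
```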
First, consider what happens when we only have a treatment effect in the countries with the lowest proportion treated. Create a variable impact3a that is equal to 10 for treated observations in Ethiopia and Ghana, and equal to zero for everybody else. Then, create an outcome variable y3a that is the sum of e_foodsec and impact3a. You can see the average treatment effect across all the treated observations in the sample by summarizing impact3a among all treated individuals. How does that compare to the results of regressions with and without fixed effects, or to the results from a regression that only includes data from Ethiopia and Ghana?
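This first scenario could be sketched as follows, assuming t3 already exists as defined in the previous question.

```stata
* effect of 10 for treated observations in Ethiopia and Ghana only
gen impact3a = 10*(t3==1 & inlist(country, 1, 2))
gen y3a = e_foodsec + impact3a

* average treatment effect among all treated observations
sum impact3a if t3==1

* pooled, fixed effects, and Ethiopia/Ghana-only regressions
reg y3a t3
reg y3a t3 i.country
reg y3a t3 if inlist(country, 1, 2)
```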
Now replicate the exercise above, but have the treatment effect occur in Honduras and India (where the proportion treated is one half) rather than in Ethiopia and Ghana (where the proportion treated is one quarter). Generate new variables impact3b and y3b and repeat your analysis.
Now replicate the exercise again, but have the treatment effect occur in Pakistan and Peru (where the proportion treated is three quarters) rather than in Honduras and India (where the proportion treated is one half). Generate new variables impact3c and y3c and repeat your analysis.
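As a rough guide to what the three scenarios should show, the fixed effects weights can be worked out by hand from the N*p*(1-p) formula given earlier. The arithmetic below assumes exactly 360 observations per country and ignores sampling noise in e_foodsec, so the regression estimates will only be close to these values; the subscript c indexing countries is notation introduced here.

```latex
\begin{align*}
w_c &= \frac{N_c\,p_c(1-p_c)}{\sum_{k} N_k\,p_k(1-p_k)},
      \qquad N_c = 360 \text{ for all } c \\[4pt]
p_c(1-p_c) &=
  \begin{cases}
    0.1875 & \text{if } p_c = 0.25 \text{ (Ethiopia, Ghana)} \\
    0.25   & \text{if } p_c = 0.50 \text{ (Honduras, India)} \\
    0.1875 & \text{if } p_c = 0.75 \text{ (Pakistan, Peru)}
  \end{cases} \\[4pt]
\sum_k p_k(1-p_k) &= 4(0.1875) + 2(0.25) = 1.25 \\[4pt]
w_c &= 0.15,\ 0.15,\ 0.20,\ 0.20,\ 0.15,\ 0.15 \\[4pt]
\text{FE estimate} &\approx 10 \times (0.15 + 0.15) = 3
  && \text{(effect in Ethiopia and Ghana)} \\
\text{FE estimate} &\approx 10 \times (0.20 + 0.20) = 4
  && \text{(effect in Honduras and India)} \\
\text{FE estimate} &\approx 10 \times (0.15 + 0.15) = 3
  && \text{(effect in Pakistan and Peru)}
\end{align*}
```

By contrast, the average effect among the 1,080 treated observations is 10 × 180/1080 ≈ 1.67 in the first scenario, 10 × 360/1080 ≈ 3.33 in the second, and 10 × 540/1080 = 5 in the third.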
Based on the above, which countries received relatively low weight in the analysis of Banerjee et al. because the proportion treated was relatively low? How do you think that might have impacted their results?
This exercise is part of the module Revisiting Regression.