In this exercise, we’ll be using data from the paper The Miracle of Microfinance? Evidence from a Randomized Evaluation by Abhijit Banerjee, Esther Duflo, Rachel Glennerster, and Cynthia Kinnan. The paper reports the results of one of the first randomized evaluations of a microcredit intervention. The authors worked with an Indian MFI (microfinance institution) called Spandana that was expanding into the city of Hyderabad. Spandana identified 104 neighborhoods where it would be willing to open branches. They couldn’t open branches in all the neighborhoods simultaneously, so they worked with the researchers to assign half of them to a treatment group where branches would be opened immediately. Spandana held off on opening branches in the control neighborhoods until after the study.
Before getting started, take a look at this J-PAL policy brief on the impacts of microfinance. We’ll be using a small slice of the data from the paper by Banerjee, Duflo, Glennerster, and Kinnan to explore the use of instrumental variables techniques to estimate impacts of treatment on the treated - and to think about when such methods are appropriate.
Our first step is to review the mechanics of treatment-on-the-treated estimation. There are four ways to arrive at an estimate of the impact of treatment (access to loans from Spandana) on individuals who take it up (by taking out a Spandana microloan):
feols
to implement two-stage least squares (as in 2, except using a single step)The data that we will use in this exercise is available here. The data set
contains information on 6,863 households in 104 neighborhoods in Hyderabad; these households were randomly sampled form
the local population, so not all of them will have chosen to take out loans from an MFI. Half of the neighborhoods (52 of 104) were randomly assigned
to treatment (and the rest to control). The variable treatment
indicates treatment status, and the variable areaid
is a neighborhood identifier.
We will be using the following outcome variables:
spandana_1
is an indicator for taking out a loan from Spandanabizprofit_1
is a measure of microenterprise profitsbizrev_1
is a measure of microenterprise revenuesbizassets_1
is a measure of assets owned by one’s microenterpriseany_biz_1
is an indicator for operating a microenterpriseTo get started, create a script that reads the data into R directly from the web:
## ECON 523: In-Class Activity 6
## A. Student
library(tidyverse)
library(haven)
library(fixest)
mypath <- "C:\myfilepath"
urlfile <- 'https://pjakiela.github.io/ECON523/exercises/E6-BanerjeeEtAl-data.dta'
e6data <- read_dta(urlfile)
We are going to make use of the variables treatment
, spandana_1
, and bizprofit_1
. Before you begin,
add a line to your code that drops any observations with one of these variables missing.
Hint: the code
filter(df, !if_any(c(x1, x2), is.na))
drops any rows with either x1
or x2
missing from the data frame df
.
Estimate the impact of treatment
on the likelihood of taking a loan from Spandana (the variable spandana_1
). What is the estimated coefficient on treatment
? Because treatment is randomly assigned at the neighborhood level, we need to cluster our standard errors by neighborhood (areaid
indexes neighborhoods). Do this. This is the first stage regression.
Save the coefficient on treatment
as beta_fs
.
Now extend your code so that you also run the reduced form regression of microenterprise profits (the variable bizprofit_1
) on treatment
. What is the estimated impact of being randomly assigned to a treatment (at the neighborhood level) on business profits?
Save the coefficient on treatment
as beta_rf
.
Based on your answers to Questions 1 and 2, what is the treatment-on-the-treated impact of random assignment to Spandana access on business profits? Use beta_fs
and beta_rf
to calculate this quantity (in your script).
Now we want to output our results to excel. To do this, we will create a data frame called results
that contains the results that we wish to export, formatted appropriately.
The code below defines a function reshape_regs()
that formats the results from a regression in a column, with the term
column indicating the variable whose coefficient is being reported and the type
column indicating whether the row contains a coefficient estimate or a standard error in parentheses. Review the code below carefully to make sure that you understand each line. Then extend the code to include the p-value in the row below the standard error. Put the p-value in square brackets rather than parentheses.
reshape_regs <- function(myresults){
olsresults <- tidy(myresults)
olsresults$index <- 1:nrow(olsresults)
olsresults <- olsresults %>%
filter(!term == "(Intercept)") %>%
mutate(across(c(estimate, std.error), ~ sprintf("%.3f", .))) %>%
mutate(across(c(estimate, std.error), ~ as.character(.))) %>%
mutate(std.error = str_c("(", std.error, ")")) %>%
select(term, estimate, std.error, index) %>%
arrange(index) %>%
pivot_longer(c(estimate, std.error),
names_to = "type",
values_to = "est") %>%
select(term, type, est)
return(olsresults)
}
Use the reshape_regs()
function to store the results from the first stage and reduced form regressions. You can also adapt the code below to store the number of observations and the R-squared.
N_fs <- as.character(fs$nobs)
R2_fs <- as.character(round(r2(fs)[2],3))
Now you need to merge the results from your first stage and reduced form regressions into a single data frame that you can export to excel. The code below does this. Make sure that you understand what every line is doing, and then use the code to generate the data frame results
.
results <- left_join(fs_results, rf_results, by = c("term", "type")) %>%
mutate(term = if_else(type == "estimate", term, "")) %>%
mutate(term = if_else(term == "treatment" & type == "estimate", "Treatment", "")) %>%
add_row(term = "Observations", C1 = N_fs, C2 = N_rf) %>%
rename(" " = term,
"Borrowed" = C1,
"Profits" = C2) %>%
select(!type)
print(results)
The last step is to export your results to excel. Here, we adapt the code from Empirical Exercise 5 to export our results to an excel file called R6-in-class.xlsx
. Before doing this, make sure you have loaded the openxlsx
library and defined your filepath correctly (so that you will be able to locate your results).
wbname <- "In-Class"
wb <- createWorkbook()
addWorksheet(wb, wbname)
# create a header style
hs1 <- createStyle(halign = "CENTER", textDecoration = "Bold", border = "BottomTop")
# write the results to the table
writeData(wb, wbname, results, headerStyle = hs1)
# adjust column widths
my_widths <- c(20, 16, 16)
setColWidths(wb, wbname, cols = 1:3, widths = my_widths)
my_heights <- c(rep(16, 1), rep(20, 4))
setRowHeights(wb, wbname, rows = 1:5, heights = my_heights)
# center the content of the table
center_style <- createStyle(halign = "CENTER")
addStyle(wb, wbname, center_style, cols = 2:3, rows = 1:5, gridExpand = TRUE, stack = TRUE)
# create a bottom border
bottom_style <- createStyle(border = "Bottom")
addStyle(wb, wbname, bottom_style, cols = 1:3, rows = 5, gridExpand = TRUE, stack = TRUE)
# save workbook
saveWorkbook(wb, file = paste0(mypath, "R6-in-class.xlsx"), overwrite = TRUE)
Start a new script for the main part of the empirical exercise. We are going to make use of the variables treatment
, spandana_1
, bizprofit_1
, bizrev_1
, bizassets_1
, and any_biz_1
. Before you begin, add a line to your code that drops any observations with one of these variables missing.
Use two-stage least squares (2SLS) to estimate an instrumental variables (IV) regression of bizprofit_1
on spandana_1
, instrumenting for spandana_1
with the treatment dummy. Cluster your standard errors at the neighborhood level. Your estimated coefficient should be identical to your answer from the In-Class Activity.
Hint: the following code illustrates how to implement two-stage least squares using feols()
(given outcome y
, endogenous regressor x
, and instrument z
):
feols(y ~ 1 | x ~ z, data = df, vcov = ~clustvar)
Now make a table that reports TOT estimates of the impact of Spandana loans on microenterprise profits (the variable bizprofit_1
), microenterprise revenues (the variable bizrev_1
), microenterprise assets (the variable bizassets_1
), and the likelihood of operating a microenterprise (the variable any_biz_1
). Modify the code from the In-Class Activity to store your results in a data frame and export them to excel as a nicely formatted table.
Now make another table that replicates the treatment-on-the treated estimation from Question 2 using the control function approach.
Hint: the following code reviews the process of generating a new variable reflecting the residuals from a regression:
model1 <- feols(y ~ x, data = df)
df$myresid <- model1$residuals
Print each of your tables to pdf so that you can upload your finished product(s) to gradescope.
Using instrumental variables to estimate treatment effects on the treated makes sense when random assignment to treatment (i.e. inviting someone to participate in a program) has no impact on those who choose not to take up treatment. Does this approach make sense in the context of microfinance? Why or why not?
The relatively low take-up rates for microfinance loans can be interpreted as evidence that not everyone wants to be an entrepreneur, and several studies have found that access to credit is more effective at helping people expand their businesses than at encouraging non-entrepreneurs to start new businesses. The variable any_old_biz
is an indicator for operating a microenterprise prior to the start of the study. Restrict your sample to those who were already operating microenterprises before Spandana’s expansion, and estimate the impact of Spandana loans on microenterprise profits, revenues, and assets in this restricted sample. Store your results in an excel table (but don’t over-write your earlier work). What do these results suggest about the impacts of microfinance?
This exercise is part of the module Impacts of Treatment on the Treated.