R sandbox

photo: World Bank/Peter Kapuscinski (2015)

Instructor:
Pamela Jakiela



Empirical Exercise 1

This exercise makes use of the data set E1-CohenEtAl-data.dta, a subset of the data used in the paper Price Subsidies, Diagnostic Tests, and Targeting of Malaria Treatment: Evidence from a Randomized Controlled Trial by Jessica Cohen, Pascaline Dupas, and Simone Schaner, published in the American Economic Review in 2015. The authors examine behavioral responses to various discounts (“subsidies”) for malaria treatment, called “artemisinin combination therapy” or “ACT.” An overview of the randomized evaluation is available here.

The aim of this empirical exercise is to review key R commands. Please upload your answers to Gradescope after completing the exercise. You can also download the activity as an R script.


Getting Started

There are two ways to get started. One option is to download the R file linked above, save it to a folder on your computer, and open it in RStudio. Alternatively, you can open a new R script file in RStudio and add the necessary commands yourself.

If you choose the latter option, you’ll want to include a command to change the working directory so that any outputs are saved where you can find them later.

# change the working directory to the folder where you
# want to save your output
setwd("C:\\Users\\examplePerson\\OneDrive\\Documents\\R")
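
The following is a minimal sketch of how to check and change the working directory. It uses a temporary folder (an assumption made only so that the example runs on any computer); substitute the path to your own folder. Forward slashes also work in R paths on Windows, so doubled backslashes are only one way to write them.

```r
# Illustration only: a temporary folder stands in for your own folder
old_wd <- getwd()     # remember the current working directory
target <- tempdir()   # hypothetical destination; replace with your own path
setwd(target)         # change the working directory
print(getwd())        # confirm where outputs will now be saved
setwd(old_wd)         # restore the original working directory
```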

Next, we’ll have to load the data into RStudio. There are two ways of doing this, but you only have to choose one. Firstly, you could just run these three lines of code in RStudio:

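library(readr)  # provides read_csv()
# web address of the CSV version of the data set
fileUrl <- "https://raw.githubusercontent.com/pjakiela/IE-in-R/gh-pages/E1-CohenEtAl-data.csv"
# read the CSV file directly from the web address into a data frame
E1data <- read_csv(fileUrl)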

If you were successful, you’ll now see an object called ‘E1data’ in the ‘Data’ section of RStudio, which should be in the top right corner of your screen.

Alternatively, you could first click this link: E1-CohenEtAl-data.dta. The file will be saved to your ‘Downloads’ folder as ‘E1-CohenEtAl-data.dta’. Next, in RStudio, open the ‘File’ menu and select ‘Import Dataset’ and then ‘From Stata’. A new window should appear. Click the ‘Browse’ button in the top right corner of this window, find ‘E1-CohenEtAl-data.dta’ in your ‘Downloads’ folder, and select it. This returns you to the import window. Finally, click ‘Import’ in the bottom corner of that window to import the data. You can verify that the data were loaded by checking that ‘E1_CohenEtAl_data’ appears in the ‘Data’ section of RStudio, which should be in the top right corner of your screen.


R Functions

In this exercise, we’ll use the R functions dim(), summary(), table(), lm() (which stands for “linear model” and is used for regression), and sd(). If you are unfamiliar with any of these, type help("x") into the console or your script, replacing ‘x’ with the name of the function you’d like to learn more about (for example, help("summary")), to open the relevant help page.
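
To see what each of these functions returns, here is a minimal sketch on a tiny made-up data frame. The object toy and its columns treated and outcome are hypothetical and are not part of the exercise data; the same calls should work on the exercise data once it is loaded.

```r
# Made-up toy data for illustration only (NOT the exercise data)
toy <- data.frame(
  treated = c(0, 0, 0, 1, 1, 1, 1, 1),
  outcome = c(0, 0, 1, 0, 1, 1, 1, 1)
)

dim(toy)                       # number of rows (observations) and columns (variables)
summary(toy$treated)           # minimum, quartiles, median, and mean
round(mean(toy$treated), 3)    # mean rounded to three decimal places
table(toy$treated)             # number of observations at each value
sd(toy$outcome)                # sample standard deviation

fit <- lm(outcome ~ treated, data = toy)  # regress outcome on treated
summary(fit)                   # coefficients, standard errors, t values, R-squared
```

A column of a data frame is accessed with the $ operator, as in toy$treated.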

You may also want to do simple math in R. To do this, just type in an expression, say:

24/8

and R will return the number 3 in answer to your question.


Empirical Exercise

Question 1

How many observations are in the data set?

Question 2

What is the mean of the variable act_any (to three decimal places)?

Question 3

The variable act_any is a dummy for assignment to any treatment (positive subsidy). How many people received a positive subsidy?

Question 4

What is the standard deviation of the variable c_act?

Question 5

The variable c_act is a dummy for using ACT treatment during a malaria episode. How many respondents report using ACT treatment for malaria?

Question 6

Regress c_act on act_any. What is the R-squared?

Question 7

What is the coefficient associated with the act_any variable?

Question 8

What is the associated standard error?

Question 9

What do you get when you divide the coefficient by the standard error?

Question 10

What is the t-statistic associated with the act_any variable?
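
For Questions 6 through 10, it can help to pull individual numbers out of the regression output rather than reading them off the printed table. The sketch below shows one way to do this on made-up data; toy, treated, and outcome are hypothetical names, and for the exercise you would replace them with the data and variables described above.

```r
# Made-up toy data for illustration only (NOT the exercise data)
toy <- data.frame(
  treated = c(0, 0, 0, 1, 1, 1, 1, 1),
  outcome = c(0, 0, 1, 0, 1, 1, 1, 1)
)

fit <- lm(outcome ~ treated, data = toy)
fit_summary <- summary(fit)

fit_summary$r.squared               # the R-squared of the regression

# Matrix with columns "Estimate", "Std. Error", "t value", "Pr(>|t|)"
coefs <- fit_summary$coefficients
estimate  <- coefs["treated", "Estimate"]    # estimated coefficient
std_error <- coefs["treated", "Std. Error"]  # its standard error
t_value   <- coefs["treated", "t value"]     # reported t-statistic

estimate / std_error                # compare this ratio with t_value
```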


Even More Fun with R

Use the summary() and table() functions to familiarize yourself with the other variables in the data set. What are the mean and median levels of education among household heads? What proportion of households live more than 2 km from the nearest chemist? For how many observations is information on household size, the education level of the household head, and distance to the nearest chemist missing?
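
Missing values are stored as NA in R and need some care. The sketch below, on made-up data, shows one way to compute means, medians, proportions, and counts of missing values. The column names educ and dist_km are hypothetical; check the actual variable names in the exercise data with names().

```r
# Made-up toy data for illustration only; column names are hypothetical
toy <- data.frame(
  educ    = c(4, 8, NA, 12, 8, 0),
  dist_km = c(0.5, 3.1, 2.4, NA, 1.0, 5.2)
)

summary(toy$educ)                    # mean, median, and the number of NA's
mean(toy$educ, na.rm = TRUE)         # mean among non-missing observations
median(toy$educ, na.rm = TRUE)       # median among non-missing observations

# A logical comparison is TRUE/FALSE, so its mean is a proportion
mean(toy$dist_km > 2, na.rm = TRUE)  # share above 2 among non-missing observations

colSums(is.na(toy))                  # number of missing values in each column
table(toy$educ, useNA = "ifany")     # tabulation that also counts NA's
```

Without na.rm = TRUE, mean() and median() return NA whenever any value is missing.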

Calculate the mean of c_act in the treatment group (observations with act_any==1) and in the comparison group (observations with act_any==0). How do these means relate to your regression results above, when you regressed c_act on act_any?

Now use the t.test() function to test the hypothesis that the mean of c_act is the same in the treatment and comparison groups. How do these results compare to your regression results?
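
A minimal sketch of the group means and the t-test on made-up data follows; toy, treated, and outcome are hypothetical names. Setting var.equal = TRUE is a choice made here so that the test pools the variance across the two groups, as an OLS regression with default standard errors does; by default, t.test() runs the Welch test, which does not assume equal variances.

```r
# Made-up toy data for illustration only (NOT the exercise data)
toy <- data.frame(
  treated = c(0, 0, 0, 1, 1, 1, 1, 1),
  outcome = c(0, 0, 1, 0, 1, 1, 1, 1)
)

# Mean of the outcome within each group
group_means <- tapply(toy$outcome, toy$treated, mean)
group_means

# Two-sample t-test of equal means, assuming equal variances
tt <- t.test(outcome ~ treated, data = toy, var.equal = TRUE)
tt

# Regression output to compare with the group means and the t-test
fit <- lm(outcome ~ treated, data = toy)
summary(fit)$coefficients
```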

The variable coartemprice indicates the randomly-assigned ACT price (and, implicitly, the associated level of price subsidy). What price/subsidy levels are included in the experiment?

The variables act40, act60, act100, and act500 are dummies for individual randomly assigned prices/treatments. Summarize the mean level of c_act in each of the treatment arms (you already summarized the mean of c_act in the treatment group above).

Regress c_act on the dummies for the three subsidy levels (act40, act60, act100). How do the regression results compare to the means that you calculated for each treatment group?
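
The sketch below shows the mechanics on made-up data with a comparison group and three treatment arms. The arm labels and the dummy names d_low, d_mid, and d_high are hypothetical stand-ins for the treatment dummies in the exercise data.

```r
# Made-up toy data for illustration only (NOT the exercise data)
toy <- data.frame(
  arm = c(rep("control", 4), rep("low", 3), rep("mid", 4), rep("high", 3)),
  outcome = c(0, 0, 1, 0,  1, 0, 1,  1, 1, 0, 1,  1, 1, 1)
)

# One dummy per treatment arm; the comparison group is the omitted category
toy$d_low  <- as.numeric(toy$arm == "low")
toy$d_mid  <- as.numeric(toy$arm == "mid")
toy$d_high <- as.numeric(toy$arm == "high")

# Mean outcome in each arm (tapply() orders the arms alphabetically)
arm_means <- tapply(toy$outcome, toy$arm, mean)
arm_means

# Regression of the outcome on the three arm dummies
fit_arms <- lm(outcome ~ d_low + d_mid + d_high, data = toy)
coef(fit_arms)

# Differences between each arm mean and the comparison-group mean
arm_means - arm_means["control"]
```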

Convince yourself that the OLS coefficient from Question 7 is the weighted average of the coefficients from the regression you just estimated. What are the weights?
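
One way to check a candidate set of weights numerically is sketched below on made-up data. All object and column names (toy, any_treat, d_low, d_mid, d_high) are hypothetical, and the weights tried here, each arm's share of all treated observations, are a guess to be tested rather than a result taken from the paper.

```r
# Made-up toy data for illustration only (NOT the exercise data)
toy <- data.frame(
  arm = c(rep("control", 4), rep("low", 3), rep("mid", 4), rep("high", 3)),
  outcome = c(0, 0, 1, 0,  1, 0, 1,  1, 1, 0, 1,  1, 1, 1)
)
toy$d_low     <- as.numeric(toy$arm == "low")
toy$d_mid     <- as.numeric(toy$arm == "mid")
toy$d_high    <- as.numeric(toy$arm == "high")
toy$any_treat <- as.numeric(toy$arm != "control")  # dummy for any treatment

# Pooled coefficient: one dummy for assignment to any treatment arm
pooled <- coef(lm(outcome ~ any_treat, data = toy))[["any_treat"]]

# Arm-specific coefficients from the regression on separate dummies
fit_arms  <- lm(outcome ~ d_low + d_mid + d_high, data = toy)
arm_coefs <- coef(fit_arms)[c("d_low", "d_mid", "d_high")]

# Candidate weights: each arm's share of the treated observations
n_arm   <- c(sum(toy$d_low), sum(toy$d_mid), sum(toy$d_high))
weights <- n_arm / sum(n_arm)

sum(weights * arm_coefs)  # weighted average of the arm coefficients
pooled                    # compare with the pooled coefficient
```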



This exercise is part of Module 1: Why Evaluate?