ECON 370


Instructor:
Pamela Jakiela



EDA Project

Objective: to explore cross-country differences in outcomes, or changes within countries over time, using data from a range of sources.

Topic and Scope

The goal of this project is to use cross-country data to explore a research question, either about how countries differ from one another or about how they are changing over time. Your focus can be on any economic, political, social, or business topic: trade policy, support for democracy, child health, gender norms, price stability, and so on. The choice of topic is entirely up to you (and your group).

You are free to analyze data from around the world or to focus on a particular region (e.g., Eastern Europe, Latin America) or set of countries (e.g., low-income countries or former French colonies). You can also choose to analyze data from a single point in time or to look at changes over time.

Data

You need to use at least two sources of data.

One of these should be the World Development Indicators. You should select at least five WDI indicators, including at least two that we did not analyze in Lab 2. Download the WDI data that you are going to use and save it as a CSV file. You will upload this raw data file together with your project.
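A minimal Python sketch of turning a WDI download into a country-year panel, assuming the file is a DataBank-style export with one row per country and series, year columns labelled like `2015 [YR2015]`, and `..` marking missing values. The function name `wdi_to_panel`, the two placeholder countries, and their values are hypothetical, not real WDI data; the series codes `NY.GDP.PCAP.CD` (GDP per capita, current US$) and `SP.DYN.TFRT.IN` (total fertility rate) are real WDI codes used here only as labels. The raw CSV stays untouched, and the reshaping lives in replication code.

```python
import io

import pandas as pd

# Hypothetical excerpt in DataBank layout; countries and values are placeholders.
RAW = """Country Name,Country Code,Series Name,Series Code,2015 [YR2015],2016 [YR2016]
Country A,AAA,GDP per capita (current US$),NY.GDP.PCAP.CD,1000,1100
Country A,AAA,"Fertility rate, total (births per woman)",SP.DYN.TFRT.IN,4.0,..
Country B,BBB,GDP per capita (current US$),NY.GDP.PCAP.CD,2000,2100
Country B,BBB,"Fertility rate, total (births per woman)",SP.DYN.TFRT.IN,3.0,2.9
"""


def wdi_to_panel(source):
    """Reshape a wide WDI export (one row per country-series) into a
    country-year panel with one column per series code."""
    df = pd.read_csv(source, na_values="..")  # ".." marks missing values
    # Exports may end with footer lines (source, update date) that have no series code.
    df = df.dropna(subset=["Series Code"])
    year_cols = [c for c in df.columns if c[:4].isdigit()]
    long = df.melt(id_vars=["Country Code", "Series Code"], value_vars=year_cols,
                   var_name="year", value_name="value")
    long["year"] = long["year"].str[:4].astype(int)  # "2015 [YR2015]" -> 2015
    panel = long.pivot(index=["Country Code", "year"], columns="Series Code",
                       values="value").reset_index()
    panel.columns.name = None
    return panel


panel = wdi_to_panel(io.StringIO(RAW))
print(panel)
```

With a real download you would pass the file path instead, e.g. `wdi_to_panel("my_wdi_download.csv")` (a hypothetical file name). `pivot` raises an error if a country-year-series combination appears twice, which is a useful early warning that the export was not what you expected.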

Your second source of data can be anything, as long as the data are publicly available. A few possibilities are:

For many of these sources, you will want to collapse the raw data into a country-level or country-year-level data set.
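As a hedged illustration of the collapse step, the sketch below averages a hypothetical respondent-level survey (one row per person, with a 0/1 `supports_democracy` variable) to one row per country-year and then merges it with a WDI indicator. All column names, function names, and values are made up for illustration, and the sketch assumes both sources already use the same three-letter country codes; in practice, harmonizing country identifiers is often the fiddliest part of combining sources.

```python
import pandas as pd


def collapse_to_country_year(micro):
    """Average respondent-level rows up to one row per country-year."""
    return (
        micro.groupby(["country_code", "year"])
        .agg(share_support=("supports_democracy", "mean"),
             n_resp=("supports_democracy", "count"))
        .reset_index()
    )


def merge_with_wdi(collapsed, wdi):
    """Left-join WDI indicators onto the collapsed data by country and year.

    validate="one_to_one" raises an error if either side has duplicate
    country-years; indicator=True adds a _merge column flagging non-matches.
    """
    return collapsed.merge(wdi, on=["country_code", "year"], how="left",
                           validate="one_to_one", indicator=True)


# Hypothetical inputs: five survey respondents in two countries, plus one WDI indicator.
survey = pd.DataFrame({
    "country_code": ["AAA", "AAA", "AAA", "BBB", "BBB"],
    "year": [2016] * 5,
    "supports_democracy": [1, 0, 1, 1, 1],
})
wdi = pd.DataFrame({
    "country_code": ["AAA", "BBB"],
    "year": [2016, 2016],
    "NY.GDP.PCAP.CD": [1100.0, 2100.0],
})
panel = merge_with_wdi(collapse_to_country_year(survey), wdi)
print(panel[panel["_merge"] != "both"])  # any country-years that failed to match
```

Keeping the respondent count (`n_resp`) alongside the mean makes it easy to spot country-years whose averages rest on very few observations.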

Elements of the Project

Your finished project should include:

  1. A statement of your research question(s) and your country sample, together with a brief motivation
  2. A description of your data sources including a (nicely formatted) list of the variables you analyze
  3. A histogram or kernel density plot
  4. A scatter plot or bar graph showing the relationship between two variables
  5. A visualization representing either principal components analysis or k-means clustering
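For item 5, one way to see what PCA and k-means actually compute is a bare-bones NumPy version, assuming your indicators are already collected in a numeric array with one row per country and no missing values. PCA is computed here from the SVD of the standardized data, and k-means is plain Lloyd's algorithm with a deterministic farthest-point initialization (a simplified, non-random cousin of k-means++ seeding). The helper names and the toy data are hypothetical; for the project itself, a library implementation (for example scikit-learn's `PCA` and `KMeans` in Python) would be the usual choice.

```python
import numpy as np


def standardize(X):
    """Center each column and scale it to unit variance (z-scores).

    Assumes no column is constant; a zero standard deviation would divide by zero.
    """
    return (X - X.mean(axis=0)) / X.std(axis=0)


def pca_scores(X, k=2):
    """Return each row's scores on the first k principal components and
    the share of total variance each of those components explains."""
    Z = standardize(X)
    # Rows of Vt are the principal directions (loadings), ordered by variance.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    return Z @ Vt[:k].T, explained[:k]


def kmeans(X, k, n_iter=100):
    """Lloyd's algorithm on standardized data; returns labels and centroids."""
    Z = standardize(X)
    # Farthest-point initialization: start from the first row, then repeatedly
    # add the row farthest from all centroids chosen so far.
    centroids = [Z[0]]
    for _ in range(1, k):
        d = np.min([((Z - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(Z[d.argmax()])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        # Assign each row (country) to its nearest centroid.
        dists = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned rows (keep it if empty).
        new = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids


# Toy data: two obvious groups of three "countries" measured on two indicators.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [10.0, 10.0], [10.1, 10.0], [10.0, 10.1]])
scores, share = pca_scores(X, k=2)
labels, _ = kmeans(X, k=2)
print(labels)  # [0 0 0 1 1 1]
```

Standardizing first matters because WDI indicators are measured on very different scales (dollars, percentages, births per woman); without it, the variable with the largest variance dominates both methods. The sign of each principal component is arbitrary, so interpret the loadings rather than which end of an axis a country lands on.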

Your goal is to articulate a clear research question and provide a compelling answer to it using data. I will be evaluating both the quality of your question and the quality of your answer. I am also looking for a set of slides that is well formatted, polished, and complete, supported by replication files that transform the raw data into all of the final outputs that you present.