## ECON 370 LAB 4:  OLS 
## NAME:  
## DATE:  


# preliminaries ----------------------------------------------------------------

## set the seed (set.seed() in R and np.random.seed() in Python)
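## Python: a minimal sketch of seeding, assuming NumPy's legacy global generator.
## The seed value 370 is an arbitrary choice for illustration, not part of the lab.

```python
import numpy as np

# Arbitrary illustrative seed (assumption, not specified by the lab)
np.random.seed(370)
first = np.random.normal(size=3)

# Resetting the same seed reproduces exactly the same draws
np.random.seed(370)
second = np.random.normal(size=3)

print(np.array_equal(first, second))
```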


# bivariate OLS, no constant ---------------------------------------------------

## Step 1:  generate a data set
##   - set a datasize parameter to 200
##   - generate X = a vector of 200 draws from a standard normal
##   - generate Y = 2X + a normal error with mean = 0 and sd = 8 (so no constant)
##   - put X and Y in a data frame or tibble called data

## R: rnorm(N, mean, sd) generates a vector of N draws from the normal distribution
## R: define X as a matrix rather than a data frame so that we can matrix multiply it later
## Python:  np.random.normal() generates a vector of N draws from the normal distribution
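## Python: one possible sketch of Step 1, assuming NumPy and pandas are installed.
## The seed value and the name `error` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

np.random.seed(370)  # arbitrary illustrative seed

datasize = 200
X = np.random.normal(loc=0, scale=1, size=datasize)      # standard normal draws
error = np.random.normal(loc=0, scale=8, size=datasize)  # mean 0, sd 8
Y = 2 * X + error                                        # no constant term

# Collect both series in a data frame called data
data = pd.DataFrame({"X": X, "Y": Y})
print(data.head())
```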


## Step 2:  regress Y on X and save the results
## No need to worry about robust standard errors
## Use lm() in R and sm.OLS() in Python


## Step 3: calculate the OLS coefficient betahat "by hand" 
##    using the formula on slide 10
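## Python: a sketch of Step 3. The slide is not reproduced here, so this assumes
## the standard no-constant OLS formula betahat = (X'X)^(-1) X'Y, which for a
## single regressor equals sum(X*Y) / sum(X^2). Names below are illustrative.

```python
import numpy as np

np.random.seed(370)  # arbitrary illustrative seed
datasize = 200
X = np.random.normal(size=(datasize, 1))                   # 200x1 column matrix
Y = 2 * X + np.random.normal(scale=8, size=(datasize, 1))  # 200x1 column matrix

# Matrix form: (X'X)^(-1) X'Y, which is a 1x1 matrix here
betahat_matrix = np.linalg.inv(X.T @ X) @ X.T @ Y

# Scalar form for the single-regressor case
betahat_scalar = np.sum(X * Y) / np.sum(X ** 2)

print(betahat_matrix, betahat_scalar)
```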


## Step 4: find betahat by finding the candidate beta that minimizes the RSS
##   - Define beta_min and beta_max as the limits of your search window from -10 to 10
##   - set beta_steps = 100000
##   - Define trial_betas as a sequence of beta_steps evenly spaced values from beta_min to beta_max
## R: you can use seq(); make trial_betas a matrix for later matrix multiplication
## Python: you can use np.linspace().reshape(-1, 1)
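## Python: a minimal sketch of the Step 4 grid, using the names given above.

```python
import numpy as np

beta_min, beta_max = -10, 10
beta_steps = 100000

# Evenly spaced candidates, reshaped into a column (beta_steps x 1)
trial_betas = np.linspace(beta_min, beta_max, beta_steps).reshape(-1, 1)
print(trial_betas.shape)
```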


## Step 5: define a function RSS(beta) that (1) multiplies X times a candidate beta 
##    and then (2) calculates, squares, and sums the residuals
## R: use sapply() to apply the RSS function to each element in the vector trial_betas
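## Python: a sketch of one way to write RSS(beta), assuming X and Y are stored as
## column matrices. A list comprehension plays the role of sapply() here. The three
## candidate values are an illustrative assumption, not the full search grid.

```python
import numpy as np

np.random.seed(370)  # arbitrary illustrative seed
datasize = 200
X = np.random.normal(size=(datasize, 1))
Y = 2 * X + np.random.normal(scale=8, size=(datasize, 1))

def RSS(beta):
    """Residual sum of squares for a candidate beta (scalar or 1-element array)."""
    fitted = X @ np.atleast_2d(beta)   # (1) multiply X by the candidate beta
    residuals = Y - fitted             # (2) calculate the residuals,
    return float(np.sum(residuals ** 2))  # then square and sum them

# Evaluate the function at a few hypothetical candidates
candidates = [0.0, 2.0, 4.0]
rss_values = [RSS(b) for b in candidates]
print(rss_values)
```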


## Step 6: create a data frame or tibble of the trial beta values and associated RSSs,
##   then find the value of beta that minimizes the RSS
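## Python: a hedged sketch of Step 6, assuming pandas. The data, grid, and RSS()
## are rebuilt here only so the sketch runs by itself; the column names "beta"
## and "rss" and the name best_beta are illustrative assumptions.

```python
import numpy as np
import pandas as pd

np.random.seed(370)  # arbitrary illustrative seed
datasize = 200
X = np.random.normal(size=(datasize, 1))
Y = 2 * X + np.random.normal(scale=8, size=(datasize, 1))

def RSS(beta):
    residuals = Y - X @ np.atleast_2d(beta)
    return float(np.sum(residuals ** 2))

trial_betas = np.linspace(-10, 10, 100000).reshape(-1, 1)

# One row per candidate beta with its RSS
grid = pd.DataFrame({
    "beta": trial_betas.ravel(),
    "rss": [RSS(b) for b in trial_betas],
})

# The row with the smallest RSS holds the grid-search estimate
best_beta = grid.loc[grid["rss"].idxmin(), "beta"]
print(best_beta)
```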


## Step 7: find the optimal beta using numerical optimization
##  - first, define a 1x1 vector of starting values of 0
##  - R: use optim() and your RSS function to find the minimizing beta
##  - Python: use scipy.optimize's minimize() 
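## Python: a sketch of Step 7 using scipy.optimize.minimize() with its default
## method. The data and RSS() are rebuilt so the sketch runs by itself; the seed
## and the names start and opt are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

np.random.seed(370)  # arbitrary illustrative seed
datasize = 200
X = np.random.normal(size=(datasize, 1))
Y = 2 * X + np.random.normal(scale=8, size=(datasize, 1))

def RSS(beta):
    residuals = Y - X @ np.atleast_2d(beta)
    return float(np.sum(residuals ** 2))

start = np.zeros(1)        # one starting value of 0
opt = minimize(RSS, start)  # opt.x holds the minimizing beta
print(opt.x)
```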


# multivariate regression ---------------------------------------------------

## Replicate Steps 1, 2, 4, 5, and 7 to find the OLS coefficients when 
##    datasize = 2000
##    X has six columns (X1 through X6) all of which are standard normals
##    Y = 2*X1 + 3*X2 + a standard normal error term
##    You want to run an OLS regression of Y on X including a constant
## Define a vector called check that indicates whether each parameter estimate from 
##    numerical optimization is within 0.001 of the corresponding OLS coefficient
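## Python: a hedged sketch of the multivariate case covering data generation, the
## regression, the RSS function, and numerical optimization. To keep the sketch
## light, np.linalg.lstsq() stands in for sm.OLS() as the source of the OLS
## coefficients; the seed and the names X_const, ols_coefs, start, and opt are
## illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

np.random.seed(370)  # arbitrary illustrative seed
datasize = 2000
X = np.random.normal(size=(datasize, 6))                        # columns X1..X6
Y = 2 * X[:, 0] + 3 * X[:, 1] + np.random.normal(size=datasize)

# Prepend a column of ones so the regression includes a constant
X_const = np.column_stack([np.ones(datasize), X])

# Least-squares coefficients (stand-in for sm.OLS(Y, X_const).fit().params)
ols_coefs, *_ = np.linalg.lstsq(X_const, Y, rcond=None)

def RSS(beta):
    residuals = Y - X_const @ beta
    return float(np.sum(residuals ** 2))

start = np.zeros(X_const.shape[1])  # seven starting values of 0
opt = minimize(RSS, start)

# True where the optimizer's estimate is within 0.001 of the OLS coefficient
check = np.abs(opt.x - ols_coefs) < 0.001
print(check)
```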