The following code will attempt to replicate the results of the ` lm()`

function in R. For this exercise, we will be using a cross sectional data set provided by R called “women”, that has height and weight data for 15 individuals.

The OLS regression equation:

where a white noise error term. For this example weight, and height. the marginal impact a one unit change in height has on weight.

## This is the OLS regression we will manually calculate: reg = lm(weight ~ height, data=women) summary(reg)

Recall that the following matrix equation is used to calculate the vector of estimated coefficients of an OLS regression:

where the matrix of regressor data (the first column is all 1’s for the intercept), and the vector of the dependent variable data.

## Matrix operators in R

`as.matrix()`

coerces an object into the matrix class.`t()`

transposes a matrix.`%*%`

is the operator for matrix multiplication.`solve()`

takes the inverse of a matrix. Note, the matrix must be invertible.

For a more complete introduction to doing matrix operations in R, check out this page.

## Back to OLS

The following code calculates the 2 x 1 matrix of coefficients, :

## Create X and Y matrices for this specific regression X = as.matrix(cbind(1,women$height)) Y = as.matrix(women$weight) ## Choose beta-hat to minimize the sum of squared residuals ## resulting in matrix of estimated coefficients: bh = round(solve(t(X)%*%X)%*%t(X)%*%Y, digits=2) ## Label and organize results into a data frame beta.hat = as.data.frame(cbind(c("Intercept","Height"),bh)) names(beta.hat) = c("Coeff.","Est") beta.hat

## Calculating Standard Errors

To calculate the standard errors, you must first calculate the variance-covariance (VCV) matrix, as follows:

The VCV matrix will be a square k x k matrix. Standard errors for the estimated coefficients are found by taking the square root of the diagonal elements of the VCV matrix.

## Calculate vector of residuals res = as.matrix(women$weight-bh[1]-bh[2]*women$height) ## Define n and k parameters n = nrow(women) k = ncol(X) ## Calculate Variance-Covariance Matrix VCV = 1/(n-k) * as.numeric(t(res)%*%res) * solve(t(X)%*%X) ## Standard errors of the estimated coefficients StdErr = sqrt(diag(VCV)) ## Calculate p-value for a t-test of coefficient significance P.Value = rbind(2*pt(abs(bh[1]/StdErr[1]), df=n-k,lower.tail= FALSE), 2*pt(abs(bh[2]/StdErr[2]), df=n-k,lower.tail= FALSE)) ## concatenate into a single data.frame beta.hat = cbind(beta.hat,StdErr,P.Value) beta.hat

## A Scatterplot with OLS line

## Plot results plot(women$height,women$weight, xlab = "Height", ylab = "Weight", main = "OLS: Height and Weight") abline(a = bh[1], b = bh[2], col = 'red', lwd = 2, lty="dashed")

Now you can check the results above using the canned `lm()`

function:

summary(lm(weight ~ height, data = women))