Archive for June 11th, 2011

June 11, 2011

TikZ diagrams for economists: A price ceiling

I have been dabbling with the TikZ package to create some diagrams relevant to a first year microeconomics course. The following diagram of a price ceiling may be useful to others wishing to integrate similar diagrams into their LaTeX documents or Beamer presentations. To use, insert the following code anywhere you like within a .tex document (you must include \usepackage{tikz} in your header):

This diagram was created with the TikZ package in LaTeX.


\usetikzlibrary{calc}	%allows coordinate calculations.

%Define linear parameters for supply and demand
\def\dint{4.5}			%Y-intercept for DEMAND.
\def\dslp{-0.5}			%Slope for DEMAND.
\def\sint{1.2}			%Y-intercept for SUPPLY.
\def\sslp{0.8}  		%Slope for SUPPLY.

\def\pfc{2.5}			%Price floor or ceiling


% Define coordinates.
    \coordinate (ints) at ({(\sint-\dint)/(\dslp-\sslp)},{(\sint-\dint)/(\dslp-\sslp)*\sslp+\sint});
    \coordinate (ep) at  (0,{(\sint-\dint)/(\dslp-\sslp)*\sslp+\sint});
    \coordinate (eq) at  ({(\sint-\dint)/(\dslp-\sslp)},0);
    \coordinate (dint) at (0,{\dint});
    \coordinate (sint) at (0,{\sint});
    \coordinate (pfq) at  ({(\pfc-\dint)/(\dslp)},0);
    \coordinate (pfp) at  ({(\pfc-\dint)/(\dslp)},{\pfc});
    \coordinate (sfq) at  ({(\pfc-\sint)/(\sslp)},0);
    \coordinate (sfp) at  ({(\pfc-\sint)/(\sslp)},{\pfc});

    \draw[thick,color=blue] plot (\demand) node[right] {$P(q) = -\frac{1}{2}q+\frac{9}{2}$};
    \draw[thick,color=purple] plot (\supply) node[right] {Supply};

% Draw axes, and dotted equilibrium lines.
    \draw[->] (0,0) -- (6.2,0) node[right] {$Q$};
    \draw[->] (0,0) -- (0,6.2) node[above] {$P$};
    %Price floor and ceiling lines
    \draw[dashed,color=black] plot (\x,{\pfc}) node[right] {$P_c$};
    \draw[dashed] (pfp) -- (pfq) node[below] {$Q_d$};
    \draw[dashed] (sfp) -- (sfq) node[below] {$Q_s$};
\draw[->,baseline=5] ($(0,{\pfc})+(-1.5,0.7)$) node[label= left:Price Ceiling] {} -- ($(0,{\pfc})+(-.1,0.1)$);

Tags: , ,
June 11, 2011

Clustered Standard Errors in R

So, you want to calculate clustered standard errors in R (a.k.a. cluster-robust, huber-white, White’s) for the estimated coefficients of your OLS regression? This post shows how to do this in both Stata and R:


Let’s say that you want to relax the Gauss-Markov homoskedasticity assumption, and account for the fact that there may be several different covariance structures within your data sample that vary by a certain characteristic – a “cluster” – but are homoskedastic within each cluster.

For example, say you have a panel data set with a bunch of different test scores from different schools around the country. You may want to cluster your sample by state, by school district, or even by town. Economic theory and intuition will guide you in this decision.

So, similar to heteroskedasticity-robust standard errors, you want to allow more flexibility in your variance-covariance (VCV) matrix (Recall that the diagonal elements of the VCV matrix are the squared standard errors of your estimated coefficients). The way to accomplish this is by using clustered standard errors. The formulation is as follows:

Variance_{cluster} =

\bigl (\frac{M}{M-1}\frac{N-1}{N-K} \bigr )(X'X)^{-1}\sum_{j=1}^M \{X_M\hat{\varepsilon}_M\hat{\varepsilon}'_MX_M\}(X'X)^{-1}

where M = number of unique clusters (e.g. number of school districts) N = number of observations, and K = the number of regressors (including the intercept). (See pages 312-313 of Angrist and Pischke’s Mostly Harmless Econometrics (Princeton University Press, 2009) for a better explanation of the summation notation.) This returns a Variance-covariance (VCV) matrix where the diagonal elements are the estimated cluster-robust coefficient variances — the ones of interest. Estimated coefficient standard errors are the square root of these diagonal elements.

Stata makes it very easy to calculate, by simply adding ,cluster(state) to the end of your regression command.


use wr-nevermar.dta
reg nevermar impdum, cluster(state)


In R, you first must run a function here called cl() written by Mahmood Ara in Stockholm University – the backup can be found here and here.

   cl   <- function(dat,fm, cluster){
              require(sandwich, quietly = TRUE)
              require(lmtest, quietly = TRUE)
              M <- length(unique(cluster))
              N <- length(cluster)
              K <- fm$rank
              dfc <- (M/(M-1))*((N-1)/(N-K))
              uj  <- apply(estfun(fm),2, function(x) tapply(x, cluster, sum));
              vcovCL <- dfc*sandwich(fm, meat=crossprod(uj)/N)
              coeftest(fm, vcovCL) }

After running the code above, you can run your regression with clustered standard errors as follows:

nmar = read.dta("")

# Run a plain linear regression
regt = lm(nevermar ~ impdum, data = nmar)

# apply the 'cl' function by choosing a variable to cluster on.
# here, we are clustering on state.
cl(nmar, regt, nmar$state)

Thoughts, comments, ideas?  Let me know; I’m always appreciative of feedback. You can contact me via e-mail at kevingoulding {at} gmail [dot] com.