Resources

Data from my projects

I’m working to organize some of the publicly available datasets that I use more frequently (mainly Medicare Advantage and HCRIS) and associated code files. These are available as part of separate GitHub repositories. For code that is more specific to a given project, please see the repositories under those individual projects/papers on my research page.

Stata Commands

Through some work with excellent mentors and co-authors Daniel Millimet and Rusty Tchernis, I’ve had an opportunity to help write a couple of Stata commands. Links to and descriptions of these commands are below.

bmte. This stands for bias-minimizing treatment effects. The goal of this estimator is to provide an estimate of the average treatment effect in cross-sectional settings where we know selection on observables is violated and where we do not have some other exclusion restriction or alternative identification strategy. The command implements the estimator proposed in Millimet and Tchernis (2013).
tebounds. This command implements several versions of Manski-type bounds for the average treatment effect in the case of a binary outcome and binary treatment. A particular advantage of these estimators is that they allow for potentially misreported and endogenous treatment status.
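As a rough illustration of what Manski-type bounds look like in the simplest case, here is a minimal Python sketch of the worst-case (no-assumption) bounds on the average treatment effect with a binary outcome and a binary treatment. This is not the tebounds implementation, and it ignores misreporting of treatment entirely. The function name manski_bounds and the toy data are made up for illustration.

```python
def manski_bounds(y, d):
    """Worst-case bounds on the ATE for binary outcome y and binary treatment d.

    Hypothetical helper for illustration only; assumes treatment is
    correctly reported and imposes no other identifying assumptions.
    """
    n = len(y)
    if n == 0 or n != len(d):
        raise ValueError("y and d must be non-empty and of equal length")

    # Sample analogues of P(Y=1, D=1), P(Y=1, D=0), and P(D=1)
    p11 = sum(1 for yi, di in zip(y, d) if yi == 1 and di == 1) / n
    p10 = sum(1 for yi, di in zip(y, d) if yi == 1 and di == 0) / n
    p_d1 = sum(1 for di in d if di == 1) / n

    # E[Y(1)] lies in [p11, p11 + P(D=0)]; E[Y(0)] lies in [p10, p10 + P(D=1)],
    # because the unobserved counterfactual means can be anywhere in [0, 1].
    lower = p11 - (p10 + p_d1)
    upper = (p11 + (1 - p_d1)) - p10
    return lower, upper


# Toy data (made up): eight observations
y = [1, 1, 0, 0, 1, 0, 1, 0]
d = [1, 1, 1, 0, 0, 0, 0, 1]
lower, upper = manski_bounds(y, d)
print(lower, upper)
```

Without further assumptions these bounds always have width one, so they always include zero. That is why the versions that add monotonicity-type assumptions are usually the more informative ones in practice.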

External data sources

Here I’ve collected some handy data resources that I’ve come across over time. Most of these are data search engines or specific websites that I find valuable and that I reference semi-regularly. Hat tip to D. Sebastian Tello-Trillo for organizing many of these data sources on his own resources page.

Try Google’s Dataset Search. It’s officially out of beta mode as of January 2020.
Consider joining Data is Plural. This is a newsletter that sends out a few lesser-known datasets every week. You can also peruse their archive.
ICPSR is a data search engine provided by the Institute for Social Research at the University of Michigan. Lots of good info.
Dataverse is a repository for papers, often with links to specific data sources and code files where available.
FRED is a common data source for macroeconomic variables maintained by the Federal Reserve Bank of St. Louis.
Long list of valuable datasets from the NBER.
IPUMS provides several datasets drawn from different underlying surveys.
List of datasets sponsored by the CDC at CDC Wonder.
Data from the Bureau of Economic Analysis (BEA).
Historic data can be hard to find. These lists from Historical Statistics can help.
A short list of datasets for economics compiled by the AEA.

Learning and methods

It’s always nice to learn new things, brush up on things we thought we learned before, or just hear things again from a different perspective. Below are a few resources that I’ve found valuable specifically for econometric methods.

NBER Lectures are very well done and presented by the best in their respective fields.
Like the NBER Lectures, the AEA Continuing Education webcasts are great for learning new methods, even if you couldn’t attend the pre-conference workshops.
- Cross Section Econometrics
- Mastering Mostly Harmless Econometrics
Want to learn Python like a secret government spy? Take a look at the NSA’s Python Course, made available by a Freedom of Information Act request from Chris Swenson.
Some free econometrics textbooks!
- Causal Inference: The Mixtape by Scott Cunningham
- Causal Inference Book by Jamie Robins and Miguel Hernán
- Econometrics by Bruce Hansen
- Introductory Econometrics class notes from Nick Huntington-Klein
- Causal Inference for the Brave and True
- Causal Inference: What If
Resources for coding:
- Coding for Economists
- For a great resource on R and lots of important data science topics, see Grant McDermott’s Data science for economists GitHub repository
Ever wish you had a central resource for lots of little commands and how to do things in different stats packages? Nick Huntington-Klein thought the same thing in his Library of Statistical Techniques (LOST).
Resources for specific estimators and research designs:
- The DiD Project. An up-to-date compilation of code, literature, and accessible blog/video posts from Asjad Naqvi
- RD Designs and Packages. Comprehensive set of RD tools and links to the most recent literature from Matias Cattaneo and team.
- Nonparametric and Semiparametric Methods. Another great set of tools for implementing nonparametric and semiparametric methods in program evaluation from Matias Cattaneo and team.
- Clustering and inference in DD. Nice flow chart suggesting the appropriate method for inference with different numbers of treated or control groups. Thanks to Patrick Button for putting this together!

Writing and presenting like an academic

Academics aren’t known for their patience, and economists are probably worse than most. Learning how to write concisely and present effectively for your audience is critical. Here are some helpful links (hat tip to Christoph Kronenberg and Amanda Agan for gathering many of these on their websites first):

Formulae for an Introduction by Keith Head, Body by Marc Bellemare, and Conclusion by Marc Bellemare
Ten Most Important Rules of Writing Your Job Market Paper by Claudia Goldin and Lawrence Katz
Paper writing gone Hollywood
Four steps to an applied micro paper
How to give an applied micro talk
Public speaking for academic economists by Rachel Meager
Beamer tips for presentations by Paul Goldsmith-Pinkham

Discussing other people’s work

Academics are good at offering criticism, sometimes not so constructively, sometimes just wrong, sometimes mean, and sometimes wrong and mean at the same time. Follow some of these guides to make sure you don’t fall into that last category.

The discussant’s art by Chris Blattman
Writing referee reports: one by Marc Bellemare, another by Tatyana Deryugina, another by Elisabeth Sadoulet, and another by Berk, Harvey, and Hirshleifer in the JEP
How to be a great conference participant by Art Carden

Organizing code

Stata coding guide by Julian Reif
Code and data for the social sciences by Matthew Gentzkow and Jesse Shapiro

Some fun packages

There are so many incredible things that people are doing in applied econometrics right now. This material seems to come out fast, and I often find myself with dozens of bookmarks that I forget or quickly lose track of. So, here’s where I’ve decided to list some of the new programs and packages that I think are particularly useful for applied researchers. This is not comprehensive by any means…just a way to keep track of new programs that I either actively use in my work or that I want to start using.

R package for the Goodman-Bacon decomposition, bacondecomp. It also exists for Stata.
Randomization Inference “Sandbox” from the World Bank.
panelView is a great way to visualize treatment timing in your data
Present your regression results and summary statistics neatly with modelsummary in R
Simple summary statistics in R with vtable::sumtable() from Nick Huntington-Klein, package code available here
An R package for specification curves, rdfanalysis
Really fast regression with fixed effects in R, fixest
R package for sensitivity analysis of OLS and IV (sensitivity meaning sensitivity to unobserved confounders), sensemakr

For graduate students

Getting started with research can be challenging. How do you develop an initial question? How do you convert this question into a paper? How do you know when to stop pursuing a question and move on to another idea? How do you take care of yourself given the stresses of the research process and the inherent rejection of it all? There are lots of resources that can help in these areas, and here’s a site that has many of these resources in one place:

Advice for PhD Students in Economics