Data from my projects

I’m working to organize some of the publicly available datasets that I use most frequently (mainly Medicare Advantage and HCRIS) and the associated code files. These are available in separate GitHub repositories. For code that is more specific to a given project, please see the repositories under the individual projects/papers on my research page.

Stata Commands

Through some work with excellent mentors and co-authors Daniel Millimet and Rusty Tchernis, I’ve had an opportunity to help write a couple of Stata commands. Links to and descriptions of these commands are below.

  • bmte. This stands for bias-minimizing treatment effects. The goal of this estimator is to provide an estimate of the average treatment effect in cross-sectional settings where we know selection on observables is violated and where we do not have some other exclusion restriction or alternative identification strategy. The command implements the estimator proposed in Millimet and Tchernis (2013).

  • tebounds. This command implements several versions of Manski-type bounds for the average treatment effect in the case of a binary outcome and binary treatment. A particular advantage of these estimators is that they allow for potentially misreported and endogenous treatment status.

Applied Empirical Micro Project

I’m actively working on an “interactive” econometrics project, the goal of which is to organize a handful of common econometric issues in applied empirical micro. The working title of this project is Navigating Empirical Methods. This resource is far from a comprehensive econometrics book, and the focus really isn’t on covering the details of any specific estimator or method. Rather, this will hopefully serve as a reference for the key things to keep in mind when implementing a given research design or estimation procedure, standard tests to consider, and alternative estimators (when relevant). I’m updating these things constantly as I find new information and correct my own misunderstandings. If you see something awry, please let me know!

External data sources

Here I’ve collected some handy data resources that I’ve come across over time. Most of these are data search engines or specific websites that I find valuable and that I reference semi-regularly. Hat tip to D. Sebastian Tello-Trillo for organizing many of these data sources on his own resources page.

  • Try Google’s Dataset Search. It’s officially out of beta as of January 2020.
  • Consider joining Data is Plural. This is a newsletter that sends out a few lesser-known datasets every week. You can also peruse their archive.
  • ICPSR is a data archive and search engine provided by the Institute for Social Research at the University of Michigan. Lots of good info.
  • Dataverse is a repository for papers, often with links to specific data sources and code files where available.
  • FRED is a common data source for macroeconomic variables maintained by the Federal Reserve Bank of St. Louis.
  • Long list of valuable datasets from the NBER.
  • IPUMS provides several harmonized datasets drawn from different underlying surveys.
  • List of datasets sponsored by the CDC at CDC Wonder.
  • Data from the Bureau of Economic Analysis (BEA).
  • Historic data can be hard to find. These lists from Historical Statistics can help.
  • A short list of datasets for economics compiled by the AEA.

Learning and methods

It’s always nice to learn new things, freshen up on things we thought we learned before, or just hear things again from a different perspective. Below are a few resources that I’ve found valuable specifically for econometric methods.

Writing and presenting like an academic

Academics aren’t known for their patience, and economists are probably worse than most. Learning how to write concisely and present effectively for your audience is critical. Here are some helpful links (hat tip to Christoph Kronenberg and Amanda Agan for gathering many of these on their websites first).

Discussing other people’s work

Academics are good at offering criticism, sometimes not so constructively, sometimes just wrong, sometimes mean, and sometimes wrong and mean at the same time. Follow some of these guides to make sure you don’t fall into those latter categories.

Organizing code

Some fun packages

There are so many incredible things that people are doing in applied econometrics right now. This material seems to come out fast, and I often find myself with dozens of bookmarks that I forget or quickly lose track of. So, here’s where I’ve decided to list some of the new programs and packages that I think are particularly useful for applied researchers. This is not comprehensive by any means…just a way to keep track of new programs that I either actively use in my work or that I want to start using.

  • R package for the Goodman-Bacon decomposition, bacondecomp. It also exists for Stata.
  • Randomization Inference “Sandbox” from the World Bank.
  • panelView is a great way to visualize treatment timing in your data.
  • Present your regression results and summary statistics neatly with modelsummary in R.
  • Simple summary statistics in R with vtable::sumtable() from Nick Huntington-Klein, package code available here.
  • An R package for specification curves, rdfanalysis.
  • Really fast regression with fixed effects in R, fixest.
  • R package for sensitivity analysis of OLS and IV (sensitivity meaning sensitivity to unobserved confounders), sensemakr.
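
To give a flavor of how a few of these fit together, here’s a minimal sketch in R. The model and variables use R’s built-in mtcars data purely for illustration, not anything from my own projects:

```r
# Illustrative only: fast fixed-effects OLS with fixest, then a clean
# table with modelsummary, using R's built-in mtcars data
library(fixest)
library(modelsummary)

# feols() estimates OLS with (optionally high-dimensional) fixed effects;
# here, mpg on weight with cylinder-count fixed effects
m <- feols(mpg ~ wt | cyl, data = mtcars)

# modelsummary() renders one or more fitted models as a regression table
modelsummary(list("FE model" = m))

# vtable::sumtable() gives quick summary statistics for a data frame
vtable::sumtable(mtcars)
```

The same feols() fit drops straight into modelsummary(), which is a big part of why this pair shows up together so often.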