Data from my projects

I’m working to organize some of the publicly available datasets that I use most frequently (mainly Medicare Advantage and HCRIS data), along with the associated code files. These are available in separate GitHub repositories. For code that is specific to a given project, please see the repositories for the individual projects and papers on my research page.

Stata Commands

Through work with excellent mentors and co-authors Daniel Millimet and Rusty Tchernis, I’ve had the opportunity to help write a couple of Stata commands. Links to and descriptions of these commands are below.

  • bmte. This stands for bias-minimizing treatment effects. The goal of this estimator is to estimate the average treatment effect in cross-sectional settings where we know the selection-on-observables assumption is violated and where we do not have a valid exclusion restriction or alternative identification strategy. The command implements the estimator proposed in Millimet and Tchernis (2013).

  • tebounds. This command implements several versions of Manski-type bounds for the average treatment effect in the case of a binary outcome and binary treatment. A particular advantage of these estimators is that they allow for potentially misreported and endogenous treatment status.
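For intuition about what tebounds builds on, the simplest Manski-type bounds (the no-assumption, or worst-case, bounds) for a binary outcome and binary treatment can be computed directly. The Python sketch below is illustrative only: it is not the tebounds code, it assumes treatment status is measured without error (so it ignores the misreporting that tebounds is designed to handle), and the function name manski_worst_case_bounds is hypothetical.

```python
from typing import Sequence, Tuple


def manski_worst_case_bounds(y: Sequence[int], d: Sequence[int]) -> Tuple[float, float]:
    """Worst-case (no-assumption) Manski bounds on the ATE.

    Illustrative sketch: assumes y and d are 0/1 and that d is measured
    without error. This is NOT the tebounds implementation.
    """
    n = len(y)
    if n == 0 or n != len(d):
        raise ValueError("y and d must be non-empty and the same length")
    n1 = sum(d)
    n0 = n - n1
    if n1 == 0 or n0 == 0:
        raise ValueError("need both treated and untreated observations")

    p1 = n1 / n  # P(D = 1)
    p0 = n0 / n  # P(D = 0)
    m1 = sum(yi for yi, di in zip(y, d) if di == 1) / n1  # E[Y | D = 1]
    m0 = sum(yi for yi, di in zip(y, d) if di == 0) / n0  # E[Y | D = 0]

    # E[Y(1)]: observed for the treated; the unobserved E[Y(1) | D = 0] lies in [0, 1]
    lb1, ub1 = m1 * p1, m1 * p1 + p0
    # E[Y(0)]: observed for the untreated; the unobserved E[Y(0) | D = 1] lies in [0, 1]
    lb0, ub0 = m0 * p0, m0 * p0 + p1

    # ATE = E[Y(1)] - E[Y(0)]
    return lb1 - ub0, ub1 - lb0


lower, upper = manski_worst_case_bounds([1, 0, 1, 1, 0, 0], [1, 1, 1, 0, 0, 0])
print(lower, upper)
```

With this toy data the bounds are roughly (-1/3, 2/3). Without further assumptions these bounds always have width one and always contain zero, which is why more informative versions of Manski-type bounds add assumptions about how treatment is selected or how outcomes respond to it.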

External data sources

Here I’ve collected some handy data resources that I’ve come across over time. Most of these are data search engines or specific websites that I find valuable and that I reference semi-regularly. Hat tip to D. Sebastian Tello-Trillo for organizing many of these data sources on his own resources page.

  • Try Google’s Dataset Search. It officially came out of beta in January 2020.
  • Consider joining Data is Plural. This is a newsletter that sends out a few lesser-known datasets every week. You can also peruse their archive.
  • ICPSR is a data search engine provided by the Institute for Social Research at the University of Michigan. Lots of good info.
  • Dataverse is a repository for papers, often with links to specific data sources and code files where available.
  • FRED is a common data source for macroeconomic variables maintained by the Federal Reserve Bank of St. Louis.
  • Long list of valuable datasets from the NBER.
  • IPUMS provides several datasets drawn from different underlying surveys.
  • List of datasets sponsored by the CDC at CDC Wonder.
  • Data from the Bureau of Economic Analysis (BEA).
  • Historic data can be hard to find. These lists from Historical Statistics can help.
  • A short list of datasets for economics compiled by the AEA.

Learning and methods

It’s always nice to learn new things, freshen up on things we thought we learned before, or just hear things again from a different perspective. Below are a few resources that I’ve found valuable specifically for econometric methods.

Writing and presenting like an academic

Academics aren’t known for their patience, and economists are probably worse than most. Learning how to write concisely and present effectively for your audience is critical. Here are some helpful links (hat tip to Christoph Kronenberg and Amanda Agan for gathering many of these on their websites first):

Discussing other people’s work

Academics are good at offering criticism, but sometimes it isn’t constructive, sometimes it’s just wrong, sometimes it’s mean, and sometimes it’s wrong and mean at the same time. Follow some of these guides to make sure you don’t fall into that last category.

Organizing code

Some fun packages

There are so many incredible things that people are doing in applied econometrics right now. New material comes out fast, and I often find myself with dozens of bookmarks that I forget about or quickly lose track of. So, here’s where I’ve decided to list some of the new programs and packages that I think are particularly useful for applied researchers. This is not comprehensive by any means; it’s just a way to keep track of new programs that I either actively use in my work or want to start using.

  • R package for the Goodman-Bacon decomposition, bacondecomp. It is also available for Stata.
  • Randomization Inference “Sandbox” from the World Bank.
  • panelView is a great way to visualize treatment timing in your data.
  • Present your regression results and summary statistics neatly with modelsummary in R.
  • Simple summary statistics in R with vtable::sumtable() from Nick Huntington-Klein (package code available here).
  • An R package for specification curves, rdfanalysis.
  • Really fast regression with fixed effects in R, fixest.
  • R package for analyzing the sensitivity of OLS and IV estimates to unobserved confounders, sensemakr.

For graduate students

Getting started with research can be challenging. How do you develop an initial question? How do you convert this question into a paper? How do you know when to stop pursuing a question and move on to another idea? How do you take care of yourself given the stresses of the research process and the inherent rejection of it all? There are lots of resources that can help in these areas, and here’s a site that has many of these resources in one place: