Data from my projects
I’m working to organize some of the publicly available datasets that I use most frequently (mainly Medicare Advantage and HCRIS) and associated code files. These are available as part of separate GitHub repositories. For code that is more specific to a given project, please see the repositories under those individual projects/papers on my research page.
Through some work with excellent mentors and co-authors Daniel Millimet and Rusty Tchernis, I’ve had an opportunity to help write a couple of Stata commands. Links to and descriptions of these commands are below.
bmte. This stands for bias-minimizing treatment effects. The goal of this estimator is to provide an estimate of the average treatment effect in cross-sectional settings where we know selection on observables is violated and where we do not have some other exclusion restriction or alternative identification strategy. The command implements the estimator proposed in Millimet and Tchernis (2013).
tebounds. This command implements several versions of Manski-type bounds for the average treatment effect in the case of a binary outcome and binary treatment. A particular advantage of these estimators is that they allow for potentially misreported and endogenous treatment status.
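To give a rough sense of what bounding an effect looks like, here is a minimal pure-Python sketch of the simplest no-assumptions (worst-case) Manski bounds for a binary outcome and binary treatment. This is only an illustration of the underlying idea, not the tebounds command itself, which implements refined versions that also handle misreported and endogenous treatment; the function and variable names are mine.

```python
# Worst-case (no-assumptions) Manski bounds on the ATE with a binary
# outcome Y and binary treatment D. Illustrative sketch only.

def manski_bounds(y, d):
    """Return (lower, upper) bounds on the ATE, E[Y(1)] - E[Y(0)].

    y, d: equal-length sequences of 0/1 outcomes and treatment indicators.
    """
    n = len(y)
    n1 = sum(d)
    pi = n1 / n                                               # P(D = 1)
    p1 = sum(yi for yi, di in zip(y, d) if di == 1) / max(n1, 1)      # P(Y=1 | D=1)
    p0 = sum(yi for yi, di in zip(y, d) if di == 0) / max(n - n1, 1)  # P(Y=1 | D=0)

    # E[Y(1)] is observed only for the treated; the unobserved piece for
    # the untreated can be anywhere in [0, 1]. Symmetrically for E[Y(0)].
    ey1_lo, ey1_hi = p1 * pi, p1 * pi + (1 - pi)
    ey0_lo, ey0_hi = p0 * (1 - pi), p0 * (1 - pi) + pi

    return ey1_lo - ey0_hi, ey1_hi - ey0_lo
```

Note that these no-assumptions bounds always have width one, so they always include zero; that is exactly why the tighter assumptions implemented in commands like tebounds are valuable in practice.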
Applied Empirical Micro Project
I’m actively working on an “interactive” econometrics project, the goal of which is to organize a handful of common econometric issues in applied empirical micro. The working title of this project is Navigating Empirical Methods. This resource is far from a comprehensive econometrics book, and the focus really isn’t on covering the details of any specific estimator or method. Rather, this will hopefully serve as a reference for the key things to keep in mind when implementing a given research design or estimation procedure, standard tests to consider, and alternative estimators (when relevant). I’m updating this material constantly as I find new information and correct my own misunderstandings. If you see something awry, please let me know!
External data sources
Here I’ve collected some handy data resources that I’ve come across over time. Most of these are data search engines or specific websites that I find valuable and that I reference semi-regularly. Hat tip to D. Sebastian Tello-Trillo for organizing many of these data sources on his own resources page.
- Try Google’s Dataset Search. It’s officially out of beta as of January 2020.
- Consider joining Data is Plural. This is a newsletter that sends out a few lesser-known datasets every week. You can also peruse their archive.
- ICPSR is a data search engine provided by the Institute for Social Research at the University of Michigan. Lots of good info.
- Dataverse is a repository for papers, often with links to specific data sources and code files where available.
- FRED is a common data source for macroeconomic variables maintained by the Federal Reserve Bank of St. Louis.
- Long list of valuable datasets from the NBER.
- IPUMS provides harmonized versions of several datasets drawn from different underlying surveys.
- List of datasets sponsored by the CDC at CDC Wonder.
- Data from the Bureau of Economic Analysis (BEA).
- Historic data can be hard to find. These lists from Historical Statistics can help.
- A short list of datasets for economics compiled by the AEA.
Learning and methods
It’s always nice to learn new things, freshen up on things we thought we learned before, or just hear things again from a different perspective. Below are a few resources that I’ve found valuable specifically for econometric methods.
- NBER Lectures are very well done and presented by the best in their respective fields.
- Like the NBER Lectures, the AEA Continuing Education webcasts are great for learning new methods, even if you couldn’t attend the pre-conference workshops.
- Want to learn Python like a secret government spy? Take a look at the NSA’s Python Course, made available by a Freedom of Information Act request from Chris Swenson.
- Some free econometrics textbooks!
- For a great resource on R and lots of important data science topics, see Grant McDermott’s Data science for economists GitHub repository.
- Ever wish you had a central resource for lots of little commands and how to do things in different stats packages? Nick Huntington-Klein thought the same thing in his Library of Statistical Techniques (LOST).
Writing and presenting like an academic
Academics aren’t known for their patience, and economists are probably worse than most. Learning how to write concisely and present effectively for your audience is critical. Here are some helpful links (hat tip to Christoph Kronenberg and Amanda Agan for gathering many of these on their websites first):
- Formulae for an Introduction by Keith Head, Body by Marc Bellemare, and Conclusion by Marc Bellemare
- Ten Most Important Rules of Writing Your Job Market Paper by Claudia Goldin and Lawrence Katz
- Paper writing gone Hollywood
- Four steps to an applied micro paper
- How to give an applied micro talk
- Public speaking for academic economists by Rachel Meager
- Beamer tips for presentations by Paul Goldsmith-Pinkham
Discussing other people’s work
Academics are good at offering criticism, sometimes not so constructively, sometimes just wrong, sometimes mean, and sometimes wrong and mean at the same time. Follow some of these guides to make sure you don’t fall into those latter categories.
- The discussant’s art by Chris Blattman
- Writing referee reports: one guide by Marc Bellemare, another by Tatyana Deryugina, another by Elisabeth Sadoulet, and another by Berk, Harvey, and Hirshleifer in the JEP
- How to be a great conference participant by Art Carden
- Stata coding guide by Julian Reif
- Code and data for the social sciences by Matthew Gentzkow and Jesse Shapiro
Some fun packages
There are so many incredible things that people are doing in applied econometrics right now. This material seems to come out fast, and I often find myself with dozens of bookmarks that I forget or quickly lose track of. So, here’s where I’ve decided to list some of the new programs and packages that I think are particularly useful for applied researchers. This is not comprehensive by any means…just a way to keep track of new programs that I either actively use in my work or that I want to start using.
- R package for the Goodman-Bacon decomposition, bacondecomp. It also exists for Stata.
- Randomization Inference “Sandbox” from the World Bank.
- panelView is a great way to visualize treatment timing in your data.
- Present your regression results and summary statistics neatly with modelsummary in R.
- Simple summary statistics in R with vtable::sumtable() from Nick Huntington-Klein, package code available here.
- R package for specification curves, rdfanalysis.
- Really fast regression with fixed effects in R.
- R package for sensitivity analysis of OLS and IV (sensitivity meaning sensitivity to unobserved confounders), sensemakr.
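The randomization inference idea behind the World Bank “Sandbox” above can be sketched in a few lines. This is my own minimal pure-Python illustration of the basic permutation logic, with made-up names and data, not code from that resource (which works in Stata/R):

```python
import random

def ri_pvalue(y, d, n_perm=2000, seed=42):
    """Two-sided randomization-inference p-value for a difference in means.

    Re-randomizes the treatment labels many times and asks how often the
    placebo difference in means is at least as large as the observed one.
    y: outcomes; d: 0/1 treatment indicators (same length).
    """
    rng = random.Random(seed)

    def diff_means(treat):
        t = [yi for yi, di in zip(y, treat) if di == 1]
        c = [yi for yi, di in zip(y, treat) if di == 0]
        return sum(t) / len(t) - sum(c) / len(c)

    observed = abs(diff_means(d))
    labels = list(d)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # permute labels, preserving the number treated
        if abs(diff_means(labels)) >= observed:
            extreme += 1
    return extreme / n_perm
```

Because the permutation distribution is built from the actual assignment mechanism rather than an asymptotic approximation, this kind of test is especially handy with small samples or few treated units, which is much of the point of the Sandbox exercises.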