Fifty States Data Descriptor

A detailed description of the 50-State Redistricting Simulations and new software to help you use them.

Cory McCartan (Department of Statistics, Harvard University), Christopher T. Kenny (Department of Government, Harvard University), Tyler Simko (Department of Government, Harvard University), George Garcia III (Department of Economics, Massachusetts Institute of Technology), Kevin Wang (Harvard College), Melissa Wu (Harvard College), Shiro Kuriwaki (Department of Political Science, Yale University), Kosuke Imai (Departments of Government and Statistics, Harvard University)

It’s been a long redistricting year. We’ve been tracking passed maps while conducting simulations in the 44 states with more than one congressional district. We are now finalizing re-runs for some states, adding validation steps based on additional diagnostics, to ensure a high-quality, accurate data product. In the meantime, we’ve written up a more detailed draft of what we did, how we did it, and how we checked our work. Most importantly, the draft introduces tools so that you can use the data we’ve generated. Everything is open source, and the generated redistricting plans are in the public domain.

Read the detailed description of our process and the data: Simulated redistricting plans for the analysis and evaluation of redistricting in the United States: 50stateSimulations. The abstract is reproduced below.

This article introduces the 50stateSimulations, a collection of simulated congressional districting plans and underlying code developed by the Algorithm-Assisted Redistricting Methodology (ALARM) Project. The 50stateSimulations allow for the evaluation of enacted and other congressional redistricting plans in the United States. While the use of redistricting simulation algorithms has become standard in academic research and court cases, any simulation analysis requires non-trivial efforts to combine multiple data sets, identify state-specific redistricting criteria, implement complex simulation algorithms, and summarize and visualize simulation outputs. We have developed a complete workflow that facilitates this entire process of simulation-based redistricting analysis for the congressional districts of all 50 states. The resulting 50stateSimulations include ensembles of simulated 2020 congressional redistricting plans and necessary replication data. We also provide the underlying code, which serves as a template for customized analyses. All data and code are free and publicly available. This article details the design, creation, and validation of the data.

To help make things more usable for those who don’t simulate redistricting plans in their free time, we are also excited to (soft) launch a new R package, alarmdata. This package provides a simplified interface to download the underlying geographic data, generated plans, all sorts of summary statistics, and state-by-state documentation.

The package can be installed with:
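
As a minimal sketch rather than an authoritative reference, the R snippet below shows one way installation and a first download might look. It assumes that `alarmdata` is available from CRAN (a development version may instead need to be installed from the ALARM Project's GitHub repository), and that the download helpers are named `alarm_50state_map()`, `alarm_50state_plans()`, `alarm_50state_stats()`, and `alarm_50state_doc()`. The state code `"WA"` is an arbitrary example. Check the package documentation for the current interface before relying on any of these names.

```r
# Assumption: alarmdata is published on CRAN. If it is not, a development
# version can usually be installed from GitHub with the remotes package,
# e.g. remotes::install_github("alarm-redist/alarmdata")  # repository path assumed
install.packages("alarmdata")

library(alarmdata)

# "WA" (Washington) is an arbitrary example; any simulated state should work.
# Function names below are assumed from the package documentation.
map_wa   <- alarm_50state_map("WA")    # underlying geographic data for the state
plans_wa <- alarm_50state_plans("WA")  # simulated plans with summary statistics
stats_wa <- alarm_50state_stats("WA")  # summary statistics only (smaller download)

# Fetch the state-by-state documentation describing how the state was simulated
alarm_50state_doc("WA")
```

Each call downloads data over the network, so the first run for a given state may take a moment.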


Thank you to the Harvard Data Science Initiative and Microsoft for computational support.


For attribution, please cite this work as

McCartan, et al. (2022, July 28). ALARM Project: Fifty States Data Descriptor. Retrieved from

BibTeX citation

  author = {McCartan, Cory and Kenny, Christopher T. and Simko, Tyler and Garcia, III, George and Wang, Kevin and Wu, Melissa and Kuriwaki, Shiro and Imai, Kosuke},
  title = {ALARM Project: Fifty States Data Descriptor},
  url = {},
  year = {2022}