redist 4.0

A major release with big changes to constraints and diagnostics.

Christopher T. Kenny https://www.christophertkenny.com/ (Department of Government, Harvard University), Cory McCartan https://www.corymccartan.com (Department of Statistics, Harvard University)
2022-06-20

We are excited to announce the arrival of redist 4.0.1 on CRAN. This update focuses on increasing constraint consistency and diagnostic usability. The new tools here have been thoroughly tested as part of the 50-State Redistricting Simulations project.

To install the new version, run install.packages('redist').

New Features

Updated Features: A Brief Demo

The first thing you’ll notice upon loading redist is that it also loads redistmetrics. redistmetrics used to live within redist but has been separated to keep the package size reasonable and to make the individual compile times shorter.
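
In practice, this just means a single library() call makes both sets of functions available:

# attaching redist also makes the redistmetrics functions used below available
library(redist)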

We can pull in some data from the ALARM Project, which combines 2020 Census data with VEST’s election data, retabulated to 2020 voting districts. For this example, we can use data from New Mexico.

nm <- geomander::get_alarm('NM')

We can then make a redist_map for New Mexico.1

map_nm <- redist_map(nm, ndists = 3, pop_tol = 0.005)

And we can begin with a basic run of redist_smc to sample 1000 plans using the sampler from Sequential Monte Carlo for Sampling Balanced and Compact Redistricting Plans by Cory McCartan and Kosuke Imai. Most importantly, redist_smc now offers a runs argument for the number of independent sampling runs. For now, we can break those 1000 plans into two runs of 500 each.

set.seed(2022)
plans <- redist_smc(map = map_nm, nsims = 500, runs = 2, counties = county)

The messages printed during sampling are now created with cli, which makes message printing cleaner and more consistent.

We can add some basic summary statistics to the output using redistmetrics functions, which are automatically loaded by redist.

plans <- plans %>% 
    mutate(
        frac_kept = comp_frac_kept(plans = ., shp = map_nm),
        dvs_gov_18 = part_dvs(plans = ., shp = map_nm, dvote = gov_18_dem_luj, rvote = gov_18_rep_pea),
        county_spl = splits_admin(plans = ., shp = map_nm, admin = county)
    )

In order, this adds the Fraction Kept compactness score, the Democratic two-party vote share in the 2018 Governor’s race, and the number of counties split.

Now, the plans object has a few new columns:

head(plans)
With plans resampled from weights
Plans matrix: int [1:1977, 1:1000] 3 3 3 3 3 3 3 3 3 3 ...
# A tibble: 6 × 7
  draw  chain district total_pop frac_kept dvs_gov_18 county_spl
  <fct> <int>    <int>     <dbl>     <dbl>      <dbl>      <int>
1 1         1        1    706220     0.984      0.474          2
2 1         1        2    707900     0.984      0.602          2
3 1         1        3    703402     0.984      0.619          2
4 2         1        1    707157     0.990      0.508          2
5 2         1        2    706801     0.990      0.582          2
6 2         1        3    703564     0.990      0.616          2

The draw, chain, and district columns identify each district of each plan; chain is new to 4.0 for SMC and indicates which independent run a plan came from, much as redist_mergesplit_parallel uses it to indicate the merge-split chain. We can still use the normal plotting functions on the redist_plans object. If we load patchwork here to get a nice row of ggplots, we see the following:

library(patchwork)
hist(plans, frac_kept) + 
    plot(plans, dvs_gov_18) + 
    hist(plans, county_spl)

These plots are fairly standard. The exciting thing is that we can now call summary() to get diagnostic information about the runs of SMC. We can call this on any redist_plans object and it will adjust the output information depending on what algorithm generated the plans.

summary(plans)


R-hat values for summary statistics:
 frac_kept dvs_gov_18 county_spl 
   1.00259    1.00197    0.99955 

         Eff. samples (%) Acc. rate Log wgt. sd  Max. unique Est. k 
Split 1       485 (97.0%)      9.9%        0.38   469 ( 94%)     10 
Split 2       474 (94.9%)      5.3%        0.48   397 ( 79%)      6 
Resample      411 (82.2%)       NA%        0.48   417 ( 83%)     NA 

         Eff. samples (%) Acc. rate Log wgt. sd  Max. unique Est. k 
Split 1       482 (96.5%)     12.2%        0.41   471 ( 94%)      8 
Split 2       476 (95.3%)      6.7%        0.46   405 ( 81%)      5 
Resample      417 (83.4%)       NA%        0.45   419 ( 84%)     NA 

Each R-hat value is below 1.05, so we do not get any warnings. At a high level, this means that both runs of SMC are sampling from regions comparable by these three summary statistics. That isn’t always the case though. If you do get a warning, you should increase the number of simulations or decrease the constraint strengths.
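
If the diagnostics had flagged a problem here, one simple remedy is to rerun with more samples per run. The call below is only a sketch: the larger nsims value and the plans_more name are arbitrary choices for illustration.

# hypothetical re-run with more samples per run; not needed for this example
set.seed(2022)
plans_more <- redist_smc(map = map_nm, nsims = 2000, runs = 2, counties = county)
summary(plans_more)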

We next introduce the new constraint interface. To initialize a constraint object, we call redist_constr(), which takes a redist_map as input.

constr <- redist_constr(map = map_nm)

We can add any of the many constraints listed in ?constraints. Many of these will be new to people who have only used redist_smc/redist_mergesplit or redist_flip before, as all constraints are now available to all algorithms. Additionally, using the new custom constraint, we can write pretty much any constraint that we can map to the positive reals.

For our custom constraint, suppose we simply want to discourage the 100th row of map_nm from being assigned to district 3. We can do the following:

constr <- constr %>% 
    add_constr_custom(
        strength = 10,
        fn = function(plan, distr) {
            # return 1 (a penalty scaled by strength) when unit 100 is assigned to district 3
            as.numeric(plan[100] == 3)
        }
    )

This takes an R function fn and a strength value (how much to multiply the output of fn by); the scaled output acts as a penalty on the plan. The fn input should always take the form function(plan, distr) { ... }, where plan will be an integer vector of precinct-district assignments and distr will be the district currently being evaluated.
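
To make the penalty direction concrete, here is a toy check outside the sampler (the toy_plan vector below is made up purely for illustration and is not a valid plan):

# a made-up assignment vector in which unit 100 happens to fall in district 3
toy_plan <- rep(1:3, length.out = nrow(map_nm))
toy_plan[100] <- 3
# the custom fn returns 1 here, so such a plan incurs a penalty of strength * 1
as.numeric(toy_plan[100] == 3)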

We can then pass constr to the constraints argument in redist_smc().

set.seed(2022)
plans <- redist_smc(map = map_nm, nsims = 500, runs = 2, counties = county, 
                    constraints = constr)

Again, we add some summary statistics.

plans <- plans %>% 
    mutate(
        frac_kept = comp_frac_kept(plans = ., shp = map_nm),
        dvs_gov_18 = part_dvs(plans = ., shp = map_nm, dvote = gov_18_dem_luj, rvote = gov_18_rep_pea),
        county_spl = splits_admin(plans = ., shp = map_nm, admin = county)
    )

Then run the diagnostics:

summary(plans)


R-hat values for summary statistics:
 frac_kept dvs_gov_18 county_spl 
   0.99916    1.00000    1.00845 

         Eff. samples (%) Acc. rate Log wgt. sd  Max. unique Est. k 
Split 1       484 (96.8%)     10.9%        0.38   475 ( 95%)     10 
Split 2       476 (95.1%)      5.7%        0.46   399 ( 80%)      6 
Resample      413 (82.6%)       NA%        0.46   407 ( 81%)     NA 

         Eff. samples (%) Acc. rate Log wgt. sd  Max. unique Est. k 
Split 1       483 (96.6%)     12.8%        0.40   478 ( 96%)      8 
Split 2       477 (95.4%)      7.3%        0.45   388 ( 78%)      5 
Resample      419 (83.8%)       NA%        0.45   420 ( 84%)     NA 

And everything looks good. Despite adding a constraint, the sample still looks fine under these summary statistics.
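
As a quick, informal check on the custom constraint, we can tabulate where unit 100 ends up across the sampled plans (a sketch; get_plans_matrix() returns the units-by-draws assignment matrix):

# count how often unit 100 is assigned to each district across the sampled plans
m <- get_plans_matrix(plans)
table(m[100, ])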

For more information on diagnostics, take a look at McCartan and Imai (2022).


  1. For a very brief intro to redist_maps, see the 3.0 release post at https://alarm-redist.github.io/posts/2021-04-02-redist-300/.

Citation

For attribution, please cite this work as

Kenny & McCartan (2022, June 20). ALARM Project: redist 4.0. Retrieved from https://alarm-redist.github.io/posts/2022-06-20-redist-40/

BibTeX citation

@misc{kenny2022redist,
  author = {Kenny, Christopher T. and McCartan, Cory},
  title = {ALARM Project: redist 4.0},
  url = {https://alarm-redist.github.io/posts/2022-06-20-redist-40/},
  year = {2022}
}