A major release with big changes to constraints and diagnostics.
We are excited to announce the arrival of redist
4.0.1
on CRAN. This
update focuses on increasing constraint consistency and diagnostic
usability. The new tools here have been thoroughly tested as part of the
50-State
Redistricting Simulations project.
To install the new version, run
install.packages('redist')
.
redist_constr()
and
?constraints
). For the first time, user-defined custom
constraints are supported and integrated within all three
algorithms.summary.redist_plans()
redistmetrics
package This will speed up compilation time
and also provides a cleaner, more extensible interface for the
implementation of additional metrics.doRNG
match_numbers()
using the Hungarian
methodmin_move_parity()
calculates how much population needs
to be moved between districts in order to completely balance a
redistricting plan.cli
errors and warnings throughout the packageThe first thing you’ll notice upon loading redist
is
that it also loads redistmetrics
.
redistmetrics
used to live within redist
but
has been separated to keep the package size reasonable and to make the
individual compile times shorter.
We can pull in some data from the ALARM Project, which combines 2020 Census data with VEST’s election data, retabulated to 2020 voting districts. For this example, we can use data from New Mexico.
nm <- geomander::get_alarm('NM')
We can then make a redist_map
for New Mexico.1
map_nm <- redist_map(nm, ndists = 3, pop_tol = 0.005)
And we can begin with a basic run of redist_smc
to
sample 1000 plans using the sampler from Sequential Monte Carlo for
Sampling Balanced and Compact Redistricting Plans by Cory McCartan and Kosuke Imai. Most importantly,
redist_smc
now offers an argument for the number of
independent sampling runs. For now, we can break that 1000 plans into 2
runs of 500.
set.seed(2022)
plans <- redist_smc(map = map_nm, nsims = 500, runs = 2, counties = county)
The new messages above are created with cli
to more make
message printing cleaner and more consistent.
To the output, we can add some basic summary information using
available redistmetrics
functions, automatically loaded by
redist
.
plans <- plans %>%
mutate(
frac_kept = comp_frac_kept(plans = ., shp = map_nm),
dvs_gov_18 = part_dvs(plans = ., shp = map_nm, dvote = gov_18_dem_luj, rvote = gov_18_rep_pea),
county_spl = splits_admin(plans = ., shp = map_nm, admin = county)
)
In order, this adds the Fraction Kept compactness score, the Democratic two-party vote share in the 2018 Governor’s race, and the number of counties split.
Now, the plans
object has a few new columns:
head(plans)
With plans resampled from weights
Plans matrix: int [1:1977, 1:1000] 3 3 3 3 3 3 3 3 3 3 ...
# A tibble: 6 × 7
draw chain district total_pop frac_kept dvs_gov_18 county_spl
<fct> <int> <int> <dbl> <dbl> <dbl> <int>
1 1 1 1 706220 0.984 0.474 2
2 1 1 2 707900 0.984 0.602 2
3 1 1 3 703402 0.984 0.619 2
4 2 1 1 707157 0.990 0.508 2
5 2 1 2 706801 0.990 0.582 2
6 2 1 3 703564 0.990 0.616 2
Draw, chain, and district identify each plan, where
chain
is new to 4.0 for SMC. It signifies the SMC run,
similar to how redist_mergesplit_parallel
indicates the
chain from merge-split. Despite this, we can use the normal plotting
functions on the redist_plans
object. If we load
patchwork
here to get a nice row of ggplot
s,
we see the following:
These plots are fairly standard. The exciting thing is that we can
now call summary()
to get diagnostic information about the
runs of SMC. We can call this on any redist_plans
object
and it will adjust the output information depending on what algorithm
generated the plans.
summary(plans)
R-hat values for summary statistics:
frac_kept dvs_gov_18 county_spl
1.00259 1.00197 0.99955
Eff. samples (%) Acc. rate Log wgt. sd Max. unique Est. k
Split 1 485 (97.0%) 9.9% 0.38 469 ( 94%) 10
Split 2 474 (94.9%) 5.3% 0.48 397 ( 79%) 6
Resample 411 (82.2%) NA% 0.48 417 ( 83%) NA
Eff. samples (%) Acc. rate Log wgt. sd Max. unique Est. k
Split 1 482 (96.5%) 12.2% 0.41 471 ( 94%) 8
Split 2 476 (95.3%) 6.7% 0.46 405 ( 81%) 5
Resample 417 (83.4%) NA% 0.45 419 ( 84%) NA
Each R-hat value is below 1.05, so we do not get any warnings. At a high level, this means that both runs of SMC are sampling from regions comparable by these three summary statistics. That isn’t always the case though. If you do get a warning, you should increase the number of simulations or decrease the constraint strengths.
We next introduce the new constraint interface. To initialize a
constraint, we call redist_constr
, which takes a
redist_map
input.
constr <- redist_constr(map = map_nm)
We can add any of the many constraints available with
?constraints
. There are many new constraints to people who
have only used redist_smc/redist_mergesplit
or
redist_flip
before. Now all constraints are available to
all algorithms. Additionally, we can write pretty much any constraint
that we can map to the positive reals, using the new custom
constraint.
For our custom constraint, we just care that the 100th row of
map_nm
won’t be assigned to district 3. We can do the
following
constr <- constr %>%
add_constr_custom(
strength = 10,
fn = function(plan, distr) {
as.numeric(plan[100] != 3)
}
)
This takes an R function fn
and a strength value (how
much to multiply the output of fn
by). The fn
input should always take the form
function(plan, distr) { ... }
, where plan
will
be an integer matrix of precinct-district assignments and
distr
will be the current district.
We can then pass constr
to the constraints
argument in redist_smc()
.
set.seed(2022)
plans <- redist_smc(map = map_nm, nsims = 500, runs = 2, counties = county,
constraints = constr)
Again, we add some summary statistics.
plans <- plans %>%
mutate(
frac_kept = comp_frac_kept(plans = ., shp = map_nm),
dvs_gov_18 = part_dvs(plans = ., shp = map_nm, dvote = gov_18_dem_luj, rvote = gov_18_rep_pea),
county_spl = splits_admin(plans = ., shp = map_nm, admin = county)
)
Then run the diagnostics:
summary(plans)
R-hat values for summary statistics:
frac_kept dvs_gov_18 county_spl
0.99916 1.00000 1.00845
Eff. samples (%) Acc. rate Log wgt. sd Max. unique Est. k
Split 1 484 (96.8%) 10.9% 0.38 475 ( 95%) 10
Split 2 476 (95.1%) 5.7% 0.46 399 ( 80%) 6
Resample 413 (82.6%) NA% 0.46 407 ( 81%) NA
Eff. samples (%) Acc. rate Log wgt. sd Max. unique Est. k
Split 1 483 (96.6%) 12.8% 0.40 478 ( 96%) 8
Split 2 477 (95.4%) 7.3% 0.45 388 ( 78%) 5
Resample 419 (83.8%) NA% 0.45 420 ( 84%) NA
And everything looks good. Despite adding a constraint, the sample still looks fine under these summary statistics.
For more information on diagnostics, take a look at McCartan and Imai (2022).
For a very brief intro to
redist_map
s, see the 3.0 release post at https://alarm-redist.github.io/posts/2021-04-02-redist-300/.↩︎
For attribution, please cite this work as
Kenny & McCartan (2022, June 20). ALARM Project: redist 4.0. Retrieved from https://alarm-redist.github.io/posts/2022-06-20-redist-40/
BibTeX citation
@misc{kenny2022redist, author = {Kenny, Christopher T. and McCartan, Cory}, title = {ALARM Project: redist 4.0}, url = {https://alarm-redist.github.io/posts/2022-06-20-redist-40/}, year = {2022} }