Code to score locality (e.g. county, community-of-interest) splitting in political districting plans

jacobwachspress/locality-splitting

Metrics of locality splitting/preservation in district maps

Description

This code accompanies the Center for Democracy & Technology report, Split Decisions: Guidance for Measuring Locality Preservation in District Maps, by Jacob Wachspress and William T. Adler.

This repository contains Python code that implements a number of metrics for quantifying locality (e.g. county, community of interest) splitting in districting plans. The metrics implemented are:

  • Geography-based
    • Number of localities split
    • Number of locality-district intersections
  • Population-based
    • Effective splits [1]
    • Conditional entropy [2]
    • Square root entropy [3]
    • Split pairs [4]

Options are provided to ignore zero-population regions and to calculate symmetric splitting scores.

A description of the metrics (with formulas) can be found in the report, linked above.
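To give a feel for what a population-based score measures, here is a minimal sketch of the textbook conditional entropy of district given locality, weighted by population. It assumes a base-2 logarithm and a hypothetical input of `(population, district, locality)` tuples; the function name `conditional_entropy` and the toy data are illustrative only, and the package's own implementation and normalization may differ, so treat the report as the authoritative definition.

```python
import math
from collections import defaultdict


def conditional_entropy(rows):
    """Population-weighted entropy of district given locality, in bits.

    rows: iterable of (population, district, locality) tuples, one per unit.
    Hypothetical helper for illustration; not part of locality_splitting.
    """
    locality_pop = defaultdict(float)
    cell_pop = defaultdict(float)  # keyed by (locality, district)
    for pop, district, locality in rows:
        locality_pop[locality] += pop
        cell_pop[(locality, district)] += pop

    total = sum(locality_pop.values())
    if total == 0:
        return 0.0

    entropy = 0.0
    for (locality, _district), pop in cell_pop.items():
        if pop > 0:
            # (share of total population) * log2(locality pop / intersection pop)
            entropy += (pop / total) * math.log2(locality_pop[locality] / pop)
    return entropy


# Toy plan: locality A sits wholly in district 1, locality B is split evenly.
toy_units = [
    (100, "1", "A"),
    (50, "1", "B"),
    (50, "2", "B"),
]
score = conditional_entropy(toy_units)
print(score)
```

An unsplit locality contributes nothing to the score, while a locality divided evenly between two districts contributes one bit, weighted by its share of the total population.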

Installation

If using pip, do pip install locality-splitting

Example use

The required input is a pandas DataFrame with one row for each unit (usually a census block or precinct) used to build the districts. The DataFrame must have columns denoting each unit's population, district, and locality. The U.S. Census Bureau provides tables that map census blocks to their corresponding districts, called "block equivalency files." We provide code to download block equivalency files from the U.S. Census website for the congressional and state legislative (upper and lower chamber) plans used in the 2012, 2014, 2016, and 2018 elections.

from locality_splitting import block_equivalency_file as bef
year = 2018
plan_type = 'cd'
df = bef.get_block_equivalency_file(year, plan_type)

df.head(10)
BLOCKID cd_2018
0 011290440001080 01
1 011290440001010 01
2 011290440001092 01
3 011290440001091 01
4 011290440001090 01
5 011290440001089 01
6 011290440001088 01
7 011290440001087 01
8 011290440001086 01
9 011290440001085 01

Next, pick a state and merge in populations from the Census API. We use Pennsylvania, which has FIPS code 42, as an example. State FIPS codes can be looked up here.

fips_code = '42'
df_pop = bef.merge_state_census_block_pops(fips_code, df)
df_pop.head(10)
BLOCKID pop cd_2018
0 420010301011000 6 13
1 420010301011001 30 13
2 420010301011002 15 13
3 420010301011003 77 13
4 420010301011004 27 13
5 420010301011005 25 13
6 420010301011006 12 13
7 420010301011007 0 13
8 420010301011008 4 13
9 420010301011009 62 13

To calculate these metrics for county splitting, we need a column for the county. Conveniently, the first two digits of the census BLOCKID correspond to the state FIPS code, and the next three digits correspond to the county FIPS code.

df_pop['county'] = df_pop['BLOCKID'].str[2:5]
df_pop.head(10)
BLOCKID pop cd_2018 county
0 420010301011000 6 13 001
1 420010301011001 30 13 001
2 420010301011002 15 13 001
3 420010301011003 77 13 001
4 420010301011004 27 13 001
5 420010301011005 25 13 001
6 420010301011006 12 13 001
7 420010301011007 0 13 001
8 420010301011008 4 13 001
9 420010301011009 62 13 001
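The slicing above depends on the fixed-width layout of the 15-digit census block identifier: 2 digits for the state, 3 for the county, 6 for the tract, and 4 for the block. The sketch below splits an identifier into those parts and rejects values of the wrong length, which is what happens if the column is read as an integer and leading zeros are lost. The names `BlockParts` and `split_block_id` are hypothetical helpers for illustration, not part of the package.

```python
from typing import NamedTuple


class BlockParts(NamedTuple):
    state: str   # 2-digit state FIPS code
    county: str  # 3-digit county FIPS code
    tract: str   # 6-digit census tract code
    block: str   # 4-digit census block code


def split_block_id(block_id: str) -> BlockParts:
    """Split a 15-digit census block identifier into its components."""
    if len(block_id) != 15 or not block_id.isdigit():
        raise ValueError(f"expected a 15-digit string, got {block_id!r}")
    return BlockParts(
        state=block_id[:2],
        county=block_id[2:5],
        tract=block_id[5:11],
        block=block_id[11:],
    )


parts = split_block_id("420010301011000")
print(parts)

# An identifier stored as an integer loses its leading zero (14 digits here),
# so it has to be padded back to 15 characters before slicing.
as_int = int("011290440001080")
restored = str(as_int).zfill(15)
print(split_block_id(restored).state)
```

Keeping BLOCKID as a string column avoids the padding step entirely.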

Then, if you run the following Python code:

from locality_splitting import metrics

metrics.calculate_all_metrics(df_pop, 'cd_2018', lclty_col='county')

you will get an output like this:

{'plan': 'cd_2018',
 'splits_all': 14.0,
 'splits_pop': 13.0,
 'intersections_all': 85.0,
 'intersections_pop': 84.0,
 'effective_splits': 10.160339912460943,
 'conditional_entropy': 0.47256386411416673,
 'sqrt_entropy': 1.22572584704072,
 'split_pairs': 0.21090396242846743,
 'splits_pop_sym': 14.0,
 'intersections_pop_sym': 84.0,
 'effective_splits_sym': 6.3402186767789255,
 'conditional_entropy_sym': 0.9622343161303942,
 'sqrt_entropy_sym': 1.5503698835379716,
 'split_pairs_sym': 0.34663230810650736}
and can choose which metric(s) to use. The suffix "_all" means that zero-population regions are included, whereas "_pop" means they are ignored. (This distinction is only relevant for the geography-based metrics.) The suffix "_sym" indicates a symmetric splitting score. [4]
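The "_all" versus "_pop" distinction can be illustrated on a hand-built table. This is a rough sketch, not the package's code: it assumes a locality counts as split when its units fall in more than one district, and that an intersection is any distinct locality-district pair. The function `geography_counts`, its parameters, and the toy data are hypothetical.

```python
import pandas as pd

# Toy input with the three required columns. The zero-population unit is the
# only part of county A that lies in district 2.
toy = pd.DataFrame({
    "pop": [10, 20, 0, 30, 5],
    "district": ["1", "1", "2", "2", "2"],
    "county": ["A", "A", "A", "B", "B"],
})


def geography_counts(df, district_col, lclty_col, pop_col=None):
    """Count split localities and locality-district intersections.

    If pop_col is given, units with zero population are dropped first.
    Hypothetical helper for illustration; not part of locality_splitting.
    """
    if pop_col is not None:
        df = df[df[pop_col] > 0]
    # One row per distinct (locality, district) pair.
    pairs = df[[lclty_col, district_col]].drop_duplicates()
    districts_per_locality = pairs.groupby(lclty_col)[district_col].nunique()
    return {
        "splits": int((districts_per_locality > 1).sum()),
        "intersections": len(pairs),
    }


counts_all = geography_counts(toy, "district", "county")
counts_pop = geography_counts(toy, "district", "county", pop_col="pop")
print(counts_all, counts_pop)
```

With zero-population units included, county A counts as split; once they are ignored, neither county is split and one intersection disappears.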

References

  1. Samuel Wang, Sandra J. Chen, Richard Ober, Bernard Grofman, Kyle Barnes, and Jonathan Cervas. (2021). Turning Communities Of Interest Into A Rigorous Standard For Fair Districting. Stanford Journal of Civil Rights and Civil Liberties, Forthcoming.
  2. Larry Guth, Ari Nieh, and Thomas Weighill. (2020). Three Applications of Entropy to Gerrymandering. arXiv.
  3. Moon Duchin. (2018). Outlier analysis for Pennsylvania congressional redistricting.
  4. Jacob Wachspress and William T. Adler. (2021). Split Decisions: Guidance for Measuring Locality Preservation in District Maps. Center for Democracy and Technology.
