This code accompanies the Center for Democracy & Technology report, Split Decisions: Guidance for Measuring Locality Preservation in District Maps, by Jacob Wachspress and William T. Adler.
This repository contains Python code implementing several metrics that quantify how districting plans split localities (e.g., counties or communities of interest). The metrics implemented are:
- Geography-based
  - Number of localities split
  - Number of locality-district intersections
- Population-based
  - Effective splits [1]
  - Conditional entropy [2]
  - Square root entropy [3]
  - Split pairs [4]
Options are provided to ignore zero-population regions and to calculate symmetric splitting scores.
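The geography-based metrics are simple counts, so they can be sketched directly in pandas. The function `geography_splits` and its `ignore_zero_pop` flag below are hypothetical illustrations, not the package's API; the flag mimics the option to ignore zero-population units, and the toy data are made up.

```python
import pandas as pd


def geography_splits(df, dist_col, lclty_col, pop_col="pop", ignore_zero_pop=False):
    """Hypothetical helper: count split localities and locality-district intersections."""
    if ignore_zero_pop:
        # Drop units with no population so empty blocks cannot create splits.
        df = df[df[pop_col] > 0]
    # Each distinct (locality, district) pair is one intersection.
    pairs = df[[lclty_col, dist_col]].drop_duplicates()
    intersections = len(pairs)
    # A locality is split if it intersects more than one district.
    districts_per_locality = pairs.groupby(lclty_col)[dist_col].nunique()
    splits = int((districts_per_locality > 1).sum())
    return splits, intersections


# Toy plan: county A touches districts 1 and 2, but its district-2 part is empty.
toy = pd.DataFrame({
    "pop":      [100, 50, 0, 80, 20],
    "district": ["1", "1", "2", "2", "2"],
    "county":   ["A", "A", "A", "B", "C"],
})
print(geography_splits(toy, "district", "county"))                        # (1, 4)
print(geography_splits(toy, "district", "county", ignore_zero_pop=True))  # (0, 3)
```

The toy example shows why the zero-population option matters: a single empty block is enough to register a split and an extra intersection.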
A description of the metrics (with formulas) can be found in the report, linked above.
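As a rough guide to the population-based metrics, here is a minimal sketch of three of them. It assumes the following commonly stated forms, where $p_{cd}$ is the population of the intersection of locality $c$ and district $d$, $p_c$ is the population of locality $c$, and $P$ is the total population: effective splits $\sum_c \big(1 / \sum_d (p_{cd}/p_c)^2 - 1\big)$; conditional entropy $\sum_c \frac{p_c}{P} \sum_d \frac{p_{cd}}{p_c} \log \frac{p_c}{p_{cd}}$; and square root entropy $\sum_c \frac{p_c}{P} \sum_d \sqrt{p_{cd}/p_c}$. The helper name `population_metrics` and the default base-2 logarithm are assumptions, so the numbers may not match the package's output exactly; the report gives the definitions the package uses.

```python
import numpy as np
import pandas as pd


def population_metrics(df, dist_col, lclty_col, pop_col="pop", log_base=2):
    """Hypothetical helper: three population-based splitting scores (sketch)."""
    # p_cd: population of each locality-district intersection.
    p_cd = df.groupby([lclty_col, dist_col])[pop_col].sum()
    p_cd = p_cd[p_cd > 0]  # empty intersections contribute nothing
    # p_c: population of the locality each intersection belongs to.
    p_c = p_cd.groupby(level=0).transform("sum")
    total = p_cd.sum()
    share = p_cd / p_c  # fraction of locality c that lies in district d

    # Effective splits: sum over localities of (1 / sum_d share^2) - 1.
    effective_splits = (1 / (share ** 2).groupby(level=0).sum() - 1).sum()
    # Conditional entropy of district given locality (log base is an assumption).
    conditional_entropy = (p_cd / total * np.log(1 / share) / np.log(log_base)).sum()
    # Square root entropy: population-weighted sum of sqrt(share).
    sqrt_entropy = (p_c / total * np.sqrt(share)).sum()
    return {
        "effective_splits": float(effective_splits),
        "conditional_entropy": float(conditional_entropy),
        "sqrt_entropy": float(sqrt_entropy),
    }


# Toy plan: county A is split evenly between districts 1 and 2; county B is whole.
toy = pd.DataFrame({
    "pop":      [30, 20, 50, 100],
    "district": ["1", "1", "2", "2"],
    "county":   ["A", "A", "A", "B"],
})
# effective_splits 1.0, conditional_entropy 0.5, sqrt_entropy about 1.207
print(population_metrics(toy, "district", "county"))
```

Under these forms, a plan that splits no locality scores 0 effective splits, 0 conditional entropy, and a square root entropy of 1.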
To install with pip, run `pip install locality-splitting`.
The required input is a pandas DataFrame with a row for each unit (usually a census block or precinct) used to build the districts. The DataFrame must have columns giving each unit's population, district, and locality. For U.S. districting plans, the U.S. Census Bureau publishes tables that assign each census block to its district, called "block equivalency files." We have provided code to download block equivalency files from the U.S. Census website for the congressional and state legislative (upper and lower chamber) plans used in the 2012, 2014, 2016, and 2018 elections.
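For plans that are not covered by the block equivalency files, the input can be built by hand. The sketch below creates a small made-up DataFrame and checks that it has the three required columns; `check_input` and the column names are hypothetical choices for illustration, not part of the package.

```python
import pandas as pd

# Made-up units for illustration: one row per unit (e.g., census block or precinct).
units = pd.DataFrame({
    "unit_id":  ["u1", "u2", "u3", "u4"],
    "pop":      [120, 0, 75, 300],
    "district": ["1", "1", "2", "2"],
    "county":   ["Adams", "Adams", "Adams", "Berks"],
})


def check_input(df, pop_col, dist_col, lclty_col):
    """Hypothetical helper: verify the population, district, and locality columns."""
    missing = [c for c in (pop_col, dist_col, lclty_col) if c not in df.columns]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    if not pd.api.types.is_numeric_dtype(df[pop_col]):
        raise TypeError(f"{pop_col!r} must be numeric")
    if (df[pop_col] < 0).any():
        raise ValueError(f"{pop_col!r} must be non-negative")


check_input(units, "pop", "district", "county")  # passes silently
```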
```python
from locality_splitting import block_equivalency_file as bef

# Download the block equivalency file for the 2018 congressional ('cd') plans
year = 2018
plan_type = 'cd'
df = bef.get_block_equivalency_file(year, plan_type)
df.head(10)
```
| | BLOCKID | cd_2018 |
|---|---|---|
| 0 | 011290440001080 | 01 |
| 1 | 011290440001010 | 01 |
| 2 | 011290440001092 | 01 |
| 3 | 011290440001091 | 01 |
| 4 | 011290440001090 | 01 |
| 5 | 011290440001089 | 01 |
| 6 | 011290440001088 | 01 |
| 7 | 011290440001087 | 01 |
| 8 | 011290440001086 | 01 |
| 9 | 011290440001085 | 01 |
Next, pick a state and merge in block populations from the Census API. We use Pennsylvania, whose state FIPS code is 42, as an example. State FIPS codes can be looked up in the U.S. Census Bureau's list of FIPS codes.
```python
# Pennsylvania's state FIPS code
fips_code = '42'
df_pop = bef.merge_state_census_block_pops(fips_code, df)
df_pop.head(10)
```
| | BLOCKID | pop | cd_2018 |
|---|---|---|---|
| 0 | 420010301011000 | 6 | 13 |
| 1 | 420010301011001 | 30 | 13 |
| 2 | 420010301011002 | 15 | 13 |
| 3 | 420010301011003 | 77 | 13 |
| 4 | 420010301011004 | 27 | 13 |
| 5 | 420010301011005 | 25 | 13 |
| 6 | 420010301011006 | 12 | 13 |
| 7 | 420010301011007 | 0 | 13 |
| 8 | 420010301011008 | 4 | 13 |
| 9 | 420010301011009 | 62 | 13 |
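If you already have block populations from another source (for example, a downloaded census file) rather than the Census API, a similar table can be assembled with a plain pandas merge. This is a sketch under assumptions: the frames `blocks` and `pops` are illustrative, and this is not how `merge_state_census_block_pops` is implemented.

```python
import pandas as pd

# Hypothetical inputs: a multistate block-to-district table and block populations
# from your own source. Values are illustrative.
blocks = pd.DataFrame({
    "BLOCKID": ["420010301011000", "420010301011001", "011290440001080"],
    "cd_2018": ["13", "13", "01"],
})
pops = pd.DataFrame({
    "BLOCKID": ["420010301011000", "420010301011001"],
    "pop": [6, 30],
})

fips_code = "42"
# The first two digits of BLOCKID are the state FIPS code, so filter on that prefix.
state_blocks = blocks[blocks["BLOCKID"].str[:2] == fips_code]
# validate="one_to_one" raises MergeError if a BLOCKID is duplicated on either side.
df_pop = state_blocks.merge(pops, on="BLOCKID", how="left", validate="one_to_one")
# A left merge leaves NaN for blocks with no population record; fail loudly instead.
missing = df_pop["pop"].isna()
if missing.any():
    raise ValueError(f"{int(missing.sum())} blocks have no population record")
print(df_pop)
```

The left merge keeps every block in the chosen state, so a gap in the population source surfaces as an error rather than as silently dropped blocks.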
To calculate these metrics for county splitting, we need a column for the county. Conveniently, the first two digits of the census BLOCKID correspond to the state FIPS code, and the next three digits correspond to the county FIPS code.
```python
df_pop['county'] = df_pop['BLOCKID'].str[2:5]
df_pop.head(10)
```
| | BLOCKID | pop | cd_2018 | county |
|---|---|---|---|---|
| 0 | 420010301011000 | 6 | 13 | 001 |
| 1 | 420010301011001 | 30 | 13 | 001 |
| 2 | 420010301011002 | 15 | 13 | 001 |
| 3 | 420010301011003 | 77 | 13 | 001 |
| 4 | 420010301011004 | 27 | 13 | 001 |
| 5 | 420010301011005 | 25 | 13 | 001 |
| 6 | 420010301011006 | 12 | 13 | 001 |
| 7 | 420010301011007 | 0 | 13 | 001 |
| 8 | 420010301011008 | 4 | 13 | 001 |
| 9 | 420010301011009 | 62 | 13 | 001 |
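Because the county code is taken by position, `BLOCKID` must be stored as a 15-character string. If it is read as an integer (for instance by `pd.read_csv` without `dtype={'BLOCKID': str}`), leading zeros such as Alabama's `01` are lost and the slice picks up the wrong digits. The sketch below is a hypothetical helper, `add_geo_columns`, that checks the length and extracts the state, county, and tract codes, assuming the standard census block GEOID layout of state (2 digits), county (3), tract (6), and block (4). It also builds a combined state+county key, since county FIPS codes are only unique within a state.

```python
import pandas as pd


def add_geo_columns(df, id_col="BLOCKID"):
    """Hypothetical helper: split 15-digit block IDs into geographic codes."""
    # Layout assumed: state FIPS (2) + county FIPS (3) + tract (6) + block (4).
    ids = df[id_col].astype(str)
    if not ids.str.len().eq(15).all():
        raise ValueError("expected 15-character block IDs; were leading zeros lost?")
    out = df.copy()
    out["state"] = ids.str[:2]
    out["county"] = ids.str[2:5]
    out["tract"] = ids.str[5:11]
    # County codes repeat across states, so use this key for multistate data.
    out["state_county"] = ids.str[:5]
    return out


blocks = pd.DataFrame({"BLOCKID": ["420010301011000", "011290440001080"]})
print(add_geo_columns(blocks)[["state", "county", "tract", "state_county"]])
```

Any of these columns can then be passed as the locality column, for example `tract` if tract splitting is of interest.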
Finally, calculate all of the metrics for the `cd_2018` plan, using the `county` column as the locality:

```python
from locality_splitting import metrics

metrics.calculate_all_metrics(df_pop, 'cd_2018', lclty_col='county')
```

This returns a dictionary like the following:
```
{'plan': 'cd_2018',
 'splits_all': 14.0,
 'splits_pop': 13.0,
 'intersections_all': 85.0,
 'intersections_pop': 84.0,
 'effective_splits': 10.160339912460943,
 'conditional_entropy': 0.47256386411416673,
 'sqrt_entropy': 1.22572584704072,
 'split_pairs': 0.21090396242846743,
 'splits_pop_sym': 14.0,
 'intersections_pop_sym': 84.0,
 'effective_splits_sym': 6.3402186767789255,
 'conditional_entropy_sym': 0.9622343161303942,
 'sqrt_entropy_sym': 1.5503698835379716,
 'split_pairs_sym': 0.34663230810650736}
```
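When comparing several plans, collecting the returned dictionaries into a DataFrame makes the gap between the `_all` and `_pop` variants easy to read. A small sketch, assuming the `_pop` metrics are the ones that ignore zero-population units; only the `cd_2018` numbers below are copied from the example output, and the derived column names are hypothetical. Under that assumption, one Pennsylvania county counts as split, and one county-district intersection exists, only because of zero-population blocks.

```python
import pandas as pd

# One result dict per plan, in the shape returned by metrics.calculate_all_metrics.
# The cd_2018 values are copied from the example output (subset of keys).
results = [
    {"plan": "cd_2018", "splits_all": 14.0, "splits_pop": 13.0,
     "intersections_all": 85.0, "intersections_pop": 84.0},
]

table = pd.DataFrame(results).set_index("plan")
# Localities that count as split only when zero-population units are included.
table["zero_pop_only_splits"] = table["splits_all"] - table["splits_pop"]
# Locality-district intersections that contain no population.
table["zero_pop_intersections"] = table["intersections_all"] - table["intersections_pop"]
print(table)
```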
- [1] Samuel Wang, Sandra J. Chen, Richard Ober, Bernard Grofman, Kyle Barnes, and Jonathan Cervas. (2021). Turning Communities Of Interest Into A Rigorous Standard For Fair Districting. Stanford Journal of Civil Rights and Civil Liberties, Forthcoming.
- [2] Larry Guth, Ari Nieh, and Thomas Weighill. (2020). Three Applications of Entropy to Gerrymandering. arXiv.
- [3] Moon Duchin. (2018). Outlier analysis for Pennsylvania congressional redistricting.
- [4] Jacob Wachspress and William T. Adler. (2021). Split Decisions: Guidance for Measuring Locality Preservation in District Maps. Center for Democracy and Technology.