Redistricting in Pennsylvania

Introduction

Pennsylvania’s Congressional district map has been a source of contention between Republicans and Democrats for years. In the early 2000s, members of the Democratic Party challenged the Republican-drawn district map on the grounds that it violated the principle of one-man, one-vote and thus denied Democratic voters representation in Congress. The case reached the Supreme Court as Vieth v. Jubelirer (2004), but a plurality found the issue nonjusticiable and the map was allowed to stand. In his concurring opinion, Justice Kennedy noted that while no judicial standard for assessing the partisanship of a given map yet existed, the Court should remain open to the possibility that one might emerge in the coming years.

In 2011, again in control of state government, Pennsylvania Republicans implemented a new gerrymander which further solidified their advantage in Congress. The map helped protect Republican candidates in 2012, 2014, and 2016, but was struck down in early 2018 by the state Supreme Court on the grounds that it was “clearly, plainly, and palpably” in violation of the state constitution. A new non-partisan map, drawn by the court with the help of Stanford Law professor Nathaniel Persily, was implemented in time for the May 15th state primaries and will remain in effect until the decennial redistricting following the 2020 census.

Elkanah Tisdale (1771-1835) (often falsely attributed to Gilbert Stuart)[1] / Public domain

Step one: Labelling precincts with the correct Congressional districts

I downloaded election data from Nathaniel Kelso and Michal Migurski’s GitHub repo. We’re going to focus on the 2016 election results from Pennsylvania.

Here are the maps involved in the latest suit:

To start things off, we’ll just focus on the gerrymandered map. Let’s begin by cleaning the data.

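library(tidyverse) # dplyr verbs, ggplot2, tibble
library(sf)        # st_read(), st_intersection(), st_area()

# file_penn, file_dist, file_remedial and file_simulated are paths to the
# downloaded shapefiles (defined elsewhere)

# Precinct-level election data from 2016
penn <- 
  st_read(file_penn) %>% 
  select(
    precinct = OBJECTID,
    pres_d = T16PRESD, # Clinton vote total
    pres_r = T16PRESR, # Trump vote total
    cong_d = T16CONGD, # Dem HOR vote total
    cong_r = T16CONGR # GOP HOR vote total
  )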
## Reading layer `VTDs_Oct17' from data source `/Users/Benjamin/Downloads/PA Redistricting/VTDs_Oct17/VTDs_Oct17.shp' using driver `ESRI Shapefile'
## Simple feature collection with 9152 features and 46 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -80.5195 ymin: 39.7198 xmax: -74.6895 ymax: 42.26933
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
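# 115th Congress district shapefiles (national file)
dist_115 <- 
  st_read(file_dist) %>% 
  filter(STATEFP == 42) %>%      # keep Pennsylvania only (state FIPS code 42)
  st_transform(crs = 4326) %>%   # reproject from NAD83 to WGS84 to match penn
  select(cd = CD115FP)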
## Reading layer `cb_2017_us_cd115_500k' from data source `/Users/Benjamin/Downloads/PA Redistricting/cb_2017_us_cd115_500k/cb_2017_us_cd115_500k.shp' using driver `ESRI Shapefile'
## Simple feature collection with 441 features and 8 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -179.1489 ymin: -14.5487 xmax: 179.7785 ymax: 71.36516
## epsg (SRID):    4269
## proj4string:    +proj=longlat +datum=NAD83 +no_defs

The penn object contains precinct-level election and census data, which is organized by county but not by district. The dist_115 object contains the geographic boundaries of each district. Both contain geometry columns, which allow them to be mapped with the sf package.

Using just the geometry data for the precincts and the districts, we can use the sf package to generate a tibble that contains a unique guess for each precinct’s district. We do this by using st_intersection() to create new geometries that represent each precinct’s intersection with one or more of the congressional district shapes, and then selecting the district that produces the greatest overlap, measured by st_area().

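penn_115 <-
  st_intersection(dist_115, penn) %>%  # one row per precinct-district overlap
  as_tibble() %>%
  mutate(
    area = st_area(geometry)           # area of each overlap piece
  ) %>%
  group_by(precinct) %>% 
  filter(area == max(area)) %>%        # keep the district with the largest overlap
  ungroup() %>% 
  select(precinct, cd, everything(), -area)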

glimpse(penn_115)
## Observations: 9,130
## Variables: 7
## $ precinct <dbl> 0, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,…
## $ cd       <fct> 09, 05, 09, 07, 07, 07, 16, 16, 16, 07, 07, 16, 16, 07,…
## $ pres_d   <dbl> 0, 0, 0, 334, 241, 760, 587, 1727, 580, 318, 1155, 452,…
## $ pres_r   <dbl> 0, 0, 0, 562, 293, 1186, 56, 996, 66, 306, 952, 120, 68…
## $ cong_d   <dbl> 0, 0, 0, 289, 206, 677, 579, 1657, 569, 248, 901, 426, …
## $ cong_r   <dbl> 0, 0, 0, 622, 348, 1312, 44, 1042, 68, 398, 1282, 168, …
## $ geometry <GEOMETRY [°]> POLYGON ((-79.29632 40.0365..., MULTIPOLYGON (…
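To make the largest-overlap rule concrete, here is a minimal sketch on made-up geometry: two square "districts" and two square "precincts", one of which straddles the district boundary. The helper name assign_districts() and the toy shapes are my own illustration, not part of the original analysis. Unlike filter(area == max(area)), this version uses slice_max(with_ties = FALSE), so an exact tie yields a single row rather than several.

```r
library(sf)
library(dplyr)

# Build an axis-aligned square polygon from its corner coordinates
square <- function(xmin, ymin, xmax, ymax) {
  st_polygon(list(rbind(
    c(xmin, ymin), c(xmax, ymin), c(xmax, ymax), c(xmin, ymax), c(xmin, ymin)
  )))
}

# Toy data (hypothetical): districts A and B meet at x = 2
toy_districts <- st_sf(
  cd = c("A", "B"),
  geometry = st_sfc(square(0, 0, 2, 2), square(2, 0, 4, 2))
)
# Precinct 1 lies entirely in A; precinct 2 straddles the boundary, mostly in B
toy_precincts <- st_sf(
  precinct = c(1, 2),
  geometry = st_sfc(square(0.5, 0.5, 1.5, 1.5), square(1.5, 0.5, 3.5, 1.5))
)

# Hypothetical helper: label each precinct with the district it overlaps most
assign_districts <- function(districts, precincts) {
  # One row per non-empty precinct-district overlap
  pieces <- suppressWarnings(st_intersection(districts, precincts))
  pieces$area <- as.numeric(st_area(pieces))
  pieces %>%
    st_drop_geometry() %>%
    group_by(precinct) %>%
    slice_max(area, n = 1, with_ties = FALSE) %>%  # at most one district per precinct
    ungroup() %>%
    select(precinct, cd)
}

toy_labels <- assign_districts(toy_districts, toy_precincts)
print(toy_labels)
```

Precinct 2 overlaps district A by 0.5 square units and district B by 1.5, so it is assigned to B.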

Here’s what the map looks like – each Congressional District now consists of several constituent precinct shapes.

# Results
penn_115 %>% 
  ggplot() + 
  geom_sf(aes(fill = cd), size = .05, alpha = .5, show.legend = FALSE) +
  geom_sf(data = dist_115, color = "black", size = .15, fill = NA) +
  theme_void() +
  coord_sf(datum = NA) 

Step two: Comparing election results under different district maps

Now that we’ve labelled each precinct with its correct district, we can use the voting data from 2016 to compile election results by district.

penn_115 %>% 
  group_by(cd) %>% 
  summarise(
    rep_margin = (sum(cong_r) - sum(cong_d)) /
      (sum(cong_r) + sum(cong_d))
  ) %>% 
  ungroup() %>% 
  left_join(dist_115, by = "cd") %>% 
  ggplot() +
  geom_sf(aes(fill = rep_margin), size = 0) +
  geom_sf(data = dist_115, color = "black", size = .1, fill = NA) +
  scale_fill_gradient2(
    low = "#1A80C4",
    high = "#CC3D41",
    labels = scales::percent,
    breaks = c(-1, -.5, 0, .5, 1),
    limits = c(-1, 1)
  ) +
  guides(
    fill = guide_colorbar(
      nbin = 10, 
      barheight = .25,
      barwidth = 9,
      raster = FALSE,
      ticks = FALSE,
      title.position = "top"
    )
  ) + 
  theme_void() +
  theme(legend.position = "bottom") + 
  labs(fill = "GOP Margin", title = "2016 PA Congressional Election Results") +
  coord_sf(datum = NA)

penn_115 %>% 
  group_by(cd) %>% 
  summarise(
    rep_margin = (sum(cong_r) - sum(cong_d)) /
      (sum(cong_r) + sum(cong_d))
  ) %>% 
  ungroup() %>% 
  transmute(
    `District` = paste0("PA-", cd),
    `GOP Margin` = rep_margin %>% scales::percent(accuracy = .1),
    `Winner` = if_else(rep_margin > 0, "GOP", "Dems")
  ) %>% 
  knitr::kable()
|District |GOP Margin |Winner |
|:--------|:----------|:------|
|PA-01    |-75.0%     |Dems   |
|PA-02    |-77.9%     |Dems   |
|PA-03    |100.0%     |GOP    |
|PA-04    |32.1%      |GOP    |
|PA-05    |34.4%      |GOP    |
|PA-06    |14.4%      |GOP    |
|PA-07    |18.8%      |GOP    |
|PA-08    |8.9%       |GOP    |
|PA-09    |26.7%      |GOP    |
|PA-10    |40.3%      |GOP    |
|PA-11    |27.4%      |GOP    |
|PA-12    |23.5%      |GOP    |
|PA-13    |-89.2%     |Dems   |
|PA-14    |-48.7%     |Dems   |
|PA-15    |21.2%      |GOP    |
|PA-16    |11.2%      |GOP    |
|PA-17    |-7.8%      |Dems   |
|PA-18    |99.8%      |GOP    |

Now let’s do the same for the remedial map. We’ll use the presidential vote totals because otherwise the voting dynamics imposed by the old map will skew the projected results. In PA-18, for instance, former GOP Rep. Tim Murphy ran unopposed in 2016 (before resigning in disgrace in 2017), so we’ll substitute Clinton’s votes to stand in for a Democratic opponent in those precincts.
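As a rough illustration of why uncontested races distort the House numbers, the sketch below computes the GOP margin for three made-up districts and flags any whose margin is close to ±100%. The vote counts are invented, and the 0.95 cutoff is an arbitrary assumption of mine, not something taken from the analysis.

```r
# Invented vote totals for three hypothetical districts
toy_results <- data.frame(
  cd     = c("X1", "X2", "X3"),
  cong_d = c(0, 5400, 10),
  cong_r = c(4200, 4600, 5100)
)

# Same margin definition as in the tables above: (R - D) / (R + D)
toy_results$rep_margin <- with(toy_results, (cong_r - cong_d) / (cong_r + cong_d))

# Assumed rule of thumb: a margin beyond +/-95% suggests no real opponent
toy_results$uncontested <- abs(toy_results$rep_margin) > 0.95

print(toy_results)
```

A district flagged this way tells us nothing about how a contested race would have gone, which is the motivation for falling back on the presidential totals.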

# Read in remedial shapefile
dist_remedial <- 
  st_read(file_remedial) %>% 
  st_transform(crs = 4326) %>% 
  select(cd = DISTRICT)
## Reading layer `Remedial Plan Shapefile' from data source `/Users/Benjamin/Downloads/PA Redistricting/Remedial Plan Shape Files - 006845/Remedial Plan Shapefile.shp' using driver `ESRI Shapefile'
## Simple feature collection with 18 features and 15 fields
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: -80.51985 ymin: 39.7198 xmax: -74.6895 ymax: 42.51607
## epsg (SRID):    NA
## proj4string:    +proj=longlat +ellps=GRS80 +no_defs
# Create tibble of guesses
penn_remedial <- 
  st_intersection(dist_remedial, penn) %>%
  as_tibble() %>% 
  mutate(
    area = st_area(geometry)
  ) %>% 
  group_by(precinct) %>% 
  filter(area == max(area)) %>% 
  ungroup() %>% 
  select(precinct, cd, everything(), -area)
# Visualize the election results by district under the remedial plan
penn_remedial %>% 
  group_by(cd) %>% 
  summarise(
    trump_margin = (sum(pres_r) - sum(pres_d)) /
      (sum(pres_r) + sum(pres_d))
  ) %>% 
  ungroup() %>% 
  left_join(dist_remedial, by = "cd") %>% 
  ggplot() +
  geom_sf(aes(fill = trump_margin), size = 0) +
  geom_sf(data = dist_remedial, color = "black", size = .1, fill = NA) +
  scale_fill_gradient2(
    low = "#1A80C4",
    high = "#CC3D41",
    labels = scales::percent,
    breaks = c(-1, -.5, 0, .5, 1),
    limits = c(-1, 1)
  ) +
  guides(
    fill = guide_colorbar(
      nbin = 10, 
      barheight = .25,
      barwidth = 9,
      raster = FALSE,
      ticks = FALSE,
      title.position = "top"
    )
  ) + 
  theme_void() +
  theme(legend.position = "bottom") +
  coord_sf(datum = NA) + 
  labs(
    fill = "Trump Margin"
  )

penn_remedial %>% 
  group_by(cd) %>% 
  summarise(
    trump_margin = (sum(pres_r) - sum(pres_d)) /
      (sum(pres_r) + sum(pres_d))
  ) %>% 
  ungroup() %>% 
  transmute(
    `District` = paste0("PA-", cd),
    `GOP Margin` = trump_margin %>% scales::percent(accuracy = .1),
    `Winner` = if_else(trump_margin > 0, "GOP", "Dems")
  ) %>% 
  knitr::kable()
|District |GOP Margin |Winner |
|:--------|:----------|:------|
|PA-01    |-2.0%      |Dems   |
|PA-02    |-49.1%     |Dems   |
|PA-03    |-85.7%     |Dems   |
|PA-04    |-20.4%     |Dems   |
|PA-05    |-29.1%     |Dems   |
|PA-06    |-9.8%      |Dems   |
|PA-07    |-1.2%      |Dems   |
|PA-08    |9.8%       |GOP    |
|PA-09    |35.5%      |GOP    |
|PA-10    |9.4%       |GOP    |
|PA-11    |27.2%      |GOP    |
|PA-12    |37.9%      |GOP    |
|PA-13    |47.2%      |GOP    |
|PA-14    |29.9%      |GOP    |
|PA-15    |44.9%      |GOP    |
|PA-16    |20.9%      |GOP    |
|PA-17    |2.5%       |GOP    |
|PA-18    |-27.9%     |Dems   |

Using the presidential election data and the remedial plan, the Democrats would have carried eight seats, three more than the five they won under the old plan. However, we should note that the gain we’ve modeled assumes that the races are decided by the 2016 presidential electorate voting along party lines, when in reality… it’s more complicated than that. We could probably pick a more representative set of results to model, but still, the effect is clear. And the districts look much better, too!

Step three: Measuring a map’s partisanship

Let’s see if we can answer Justice Kennedy’s question: How can a court objectively determine whether a map is the result of a partisan gerrymander?

In the years since Vieth v. Jubelirer, political scientists have come up with various statistical approaches to assess the partisanship of maps. With our tools, we can replicate some of their findings. We’ll borrow from two of the most promising approaches developed by political scientists in the past few years.

  • The efficiency gap, devised by Nicholas Stephanopoulos and Eric McGhee, is a simple measure of how many votes each party “wasted” in a given election. A competitive, non-partisan map should theoretically have an efficiency gap close to zero. To calculate the estimated efficiency gap, I’m referencing this primer published by the Brennan Center.

  • Simulated district maps, pioneered by Jowei Chen and David Cottrell, are used as non-partisan counterfactuals against which we can objectively measure partisan bias in existing maps. The simulations are constructed to be geographically compact, contiguous, and equally apportioned according to population and election data, factors which should inform the creation of non-partisan maps in real life. You can find hundreds of simulated maps on Chen’s personal website; however, I’ll just select one for illustrative purposes.
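For reference, the wasted-vote accounting behind the efficiency gap can be written out as follows. The notation is my own shorthand: v_i^D and v_i^R are the Democratic and Republican votes in district i, and W_i^P is the number of votes party P wastes there. The sign convention matches the efficiency_gap() function used in this post, so a positive value means Democrats wasted more votes.

```latex
W_i^{P} =
\begin{cases}
  v_i^{P} - \tfrac{1}{2}\left(v_i^{D} + v_i^{R}\right) & \text{if party } P \text{ wins district } i, \\[4pt]
  v_i^{P} & \text{if party } P \text{ loses district } i,
\end{cases}
\qquad
\mathrm{EG} = \frac{\sum_i W_i^{D} - \sum_i W_i^{R}}{\sum_i \left(v_i^{D} + v_i^{R}\right)}
```

In words: a losing party wastes all of its votes in a district, while the winner wastes only the votes beyond half of the two-party total.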

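### Gerrymandered efficiency gap
# Wasted votes = every vote for the losing party, plus the winner's votes
# beyond half of the two-party total. Uses the 2016 House vote columns
# (cong_d, cong_r); a positive gap means Democrats wasted more votes.
efficiency_gap <- function(d) {
  d %>% 
    group_by(cd) %>% 
    summarise(
      d_votes = sum(cong_d),
      r_votes = sum(cong_r),
      d_wasted = if_else(
        d_votes >= r_votes, 
        d_votes - (d_votes + r_votes) / 2, # Dems won: surplus votes only
        d_votes                            # Dems lost: all votes wasted
      ),
      r_wasted = if_else(
        d_votes >= r_votes, 
        r_votes,                           # GOP lost: all votes wasted
        r_votes - (r_votes + d_votes) / 2  # GOP won: surplus votes only
      )
    ) %>% 
    ungroup() %>% 
    summarise_if(is.double, sum) %>%       # statewide totals
    transmute(
      `Total Votes` = (d_votes + r_votes) %>% scales::comma(),
      `Net Wasted Votes` = (d_wasted - r_wasted) %>% scales::comma(),
      `Efficiency Gap` = ((d_wasted - r_wasted) / (d_votes + r_votes)) %>% scales::percent(accuracy = .1)
    ) %>% 
    knitr::kable()
}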

efficiency_gap(penn_115)
|Total Votes |Net Wasted Votes |Efficiency Gap |
|:-----------|:----------------|:--------------|
|5,698,604   |856,446          |15.0%          |

The gerrymandered map yields a 15% efficiency gap in favor of Republicans, meaning that, compared to Democrats, they were able to convert their votes into an extra 15% of the seats, i.e. 0.15 × 18 ≈ 2.7 seats out of the 18 total in the Pennsylvania Congressional delegation. Now recall that just a few months into the first session of the 115th Congress, ACA repeal passed the House by just four votes.
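To see the arithmetic end to end, here is a small worked example in base R with three invented districts of 100 voters each. The vote counts are made up; the wasted-vote rules and the gap-times-seats conversion mirror the ones used in this post.

```r
# Invented two-party vote totals for three hypothetical districts
toy <- data.frame(
  cd      = c("1", "2", "3"),
  d_votes = c(75, 40, 45),
  r_votes = c(25, 60, 55)
)

half   <- (toy$d_votes + toy$r_votes) / 2   # votes needed to win each district
d_wins <- toy$d_votes >= toy$r_votes

# Winner wastes only its surplus over 50%; loser wastes every vote
d_wasted <- ifelse(d_wins, toy$d_votes - half, toy$d_votes)
r_wasted <- ifelse(d_wins, toy$r_votes, toy$r_votes - half)

# Positive gap = Democrats wasted more votes than Republicans
toy_gap   <- (sum(d_wasted) - sum(r_wasted)) / sum(toy$d_votes + toy$r_votes)
toy_seats <- toy_gap * nrow(toy)            # gap expressed in seats

print(c(gap = toy_gap, seats = toy_seats))
```

Democrats waste 25 + 40 + 45 = 110 votes and Republicans waste 25 + 10 + 5 = 40, so the gap is 70 / 300 ≈ 23.3%, or 0.7 of the three seats.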

### Remedial efficiency gap
efficiency_gap(penn_remedial)
|Total Votes |Net Wasted Votes |Efficiency Gap |
|:-----------|:----------------|:--------------|
|5,698,604   |464,204          |8.1%           |

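The new map still favors the Republicans, but only by about 8%. This translates to roughly a 1.5-seat advantage (0.081 × 18), and thus provides a small boost to the Democrats. It also falls within Stephanopoulos and McGhee’s suggested threshold of 2 seats for a non-partisan map. Note that efficiency_gap() sums the House vote columns (cong_d and cong_r) rather than the presidential totals used in the table above, so the uncontested 2016 races still feed into this figure.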

Now let’s compare these results to one of Chen and Cottrell’s simulated maps.

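dist_sim <- 
  st_read(file_simulated) %>% 
  # The file has no CRS metadata, so assign WGS84 rather than transform;
  # this assumes the coordinates are already longitude/latitude
  st_set_crs(4326) %>% 
  select(cd = dist)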
## Reading layer `plan_1' from data source `/Users/Benjamin/Downloads/PA Redistricting/plan_1.shp' using driver `ESRI Shapefile'
## Simple feature collection with 18 features and 17 fields
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: -80.51985 ymin: 39.7198 xmax: -74.6895 ymax: 42.51607
## epsg (SRID):    NA
## proj4string:    NA
penn_sim <- 
  st_intersection(dist_sim, penn) %>%
  as_tibble() %>% 
  mutate(
    area = st_area(geometry)
  ) %>% 
  group_by(precinct) %>% 
  filter(area == max(area)) %>% 
  ungroup() %>% 
  select(precinct, cd, everything(), -area)
efficiency_gap(penn_sim)
|Total Votes |Net Wasted Votes |Efficiency Gap |
|:-----------|:----------------|:--------------|
|5,698,604   |373,971          |6.6%           |

The simulated map results in an efficiency gap of 6.6%, or a ~1.2 seat advantage, also in favor of the Republicans. As it turns out, surprisingly few of the simulated maps from Chen and Cottrell’s study result in a Democratic advantage – it’s difficult to draw maps that favor Democrats due to the tendency of Democratic voters to cluster in densely populated and geographically confined urban areas. Along with Jonathan Rodden (from Stanford’s political science department), Chen has used these simulations to argue that the Democrats’ biggest problem is not partisan gerrymandering by Republicans, but the natural human geography of their voting base.
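The clustering argument can be illustrated with a toy calculation. In the sketch below, both layouts have an identical 50/50 statewide vote across four equal districts; the only difference is how concentrated the Democratic voters are. The numbers and the wasted_gap() helper are my own invention for illustration and are not drawn from Chen and Rodden’s work.

```r
# Hypothetical helper: efficiency gap from per-district vote vectors
# (positive = Democrats wasted more votes)
wasted_gap <- function(d, r) {
  half     <- (d + r) / 2
  d_wasted <- ifelse(d >= r, d - half, d)
  r_wasted <- ifelse(d >= r, r, r - half)
  (sum(d_wasted) - sum(r_wasted)) / sum(d + r)
}

# Layout 1: Democrats packed into one "urban" district
gap_clustered <- wasted_gap(d = c(85, 40, 40, 35), r = c(15, 60, 60, 65))

# Layout 2: the same 200 Democratic votes spread more evenly
gap_spread <- wasted_gap(d = c(55, 55, 45, 45), r = c(45, 45, 55, 55))

print(c(clustered = gap_clustered, spread = gap_spread))
```

With the clustered layout, Democrats win one seat out of four and the gap is 25% in the Republicans’ favor; with the even layout, they win two seats and the gap is zero, even though nobody changed their vote.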

This does nothing to excuse the unnatural advantages of partisan gerrymandering, of course, and as we’ve seen (and as has been proven in court), the Pennsylvania Republican Party’s map represented a blatant (and effective) power grab. In Congress, every representative’s vote counts equally, no matter where they live. In other words, every vote counts. The same should be true for Pennsylvania’s voters.