-
Notifications
You must be signed in to change notification settings - Fork 0
/
23_model-posteriors.Rmd
825 lines (714 loc) · 33.2 KB
/
23_model-posteriors.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
## Model Results {#sec:model-results}
<!-- in chapter: model -->
```{r stopper, eval = FALSE, cache = FALSE, include = FALSE}
knitr::knit_exit()
```
```{r knitr-02-3_model-results, include = FALSE, cache = FALSE}
source(here::here("assets-bookdown", "knitr-helpers.R"))
```
```{r r-02-3_model-results, cache = FALSE}
library("here")
library("magrittr")
library("tidyverse")
library("boxr"); box_auth()
library("scales")
# library("english")
library("latex2exp")
library("patchwork")
library("ggforce")
# library("rstan")
library("tidybayes")
library("broom")
```
```{r mcmc-data}
# grab most recent filename
most_recent_date <-
list.files(here("data", "mcmc", "2-dgirt", "run", "samples")) %>%
str_split(pattern = "-mcmc", simplify = TRUE) %>%
as_tibble(.name_repair = "universal") %>%
transmute(date = lubridate::ymd(`...1`)) %>%
filter(date == max(date)) %>%
pull(date) %>%
print()
# uniquely identify file (length should be 1)
stopifnot(length(most_recent_date) == 1)
# make filename
mcmc_path <- file.path("data", "mcmc", "2-dgirt", "run", "samples")
clean_path <- file.path("data", "mcmc", "2-dgirt", "run", "clean")
input_path <- file.path("data", "mcmc", "2-dgirt", "run", "input")
most_recent_static <- as.character(most_recent_date) %>%
str_glue("-mcmc-homsk-2010s.RDS") %>%
as.character() %>%
print()
# import MCMC
mcmc <-
here(mcmc_path, most_recent_static) %>%
# here(mcmc_path, "2020-01-10-mcmc-homsk-2010s.RDS") %>% # small?
readRDS()
```
```{r contextual-data}
# tidy pre-stan data
master_data <-
readRDS(here(input_path, "master-model-data.RDS")) %>%
print()
# stan data
stan_data <- readRDS(here(input_path, "stan-data-list.RDS"))
lapply(stan_data, head)
index_crosswalk <-
tibble(
# factors come from master-data
group_f = master_data$group,
item_f = master_data$item,
state_f = master_data$state,
region_f = master_data$region,
district_f = master_data$district,
party_f = master_data$party,
# integers from stan (but could be coerced from master?)
group = stan_data$group,
item = stan_data$item,
state = stan_data$state,
region = stan_data$region,
district = stan_data$district,
party = stan_data$party
) %>%
# the "ordering" of groups as per my mistake in stan prep
mutate(stan_group_item = row_number()) %>%
print()
```
```{r mcmc-params}
# model params
load(here(input_path, "mcmc-params.Rdata"))
ls()
```
```{r}
total_samples <- (n_iterations - n_warmup)*(1 / n_thin)*n_chains
```
I estimate the ideal point model using Stan's "No-U-Turn" sampling algorithm, an adaptive variant of Hamiltonian Monte Carlo.
I draw posterior samples using `r smart_number(n_chains)` Markov chains that were each run for `r comma(n_iterations)` iterations, discarding the first `r comma(n_warmup)` iterations that are used for an adaptive warmup period.^[
Given the size of the data, a chain of this length can run on a 2014 Macbook Pro in approximately three hours.
To generate more chains, however, I use an external computing cluster affiliated with the Social Science Computing Cooperative at the University of Wisconsin–Madison.
]
I initialize the algorithm with an `adapt_delta` parameter of `r adapt_delta` and a maximum proposal tree-depth of `r max_treedepth`.
Following the advice of @link-eaton:2011:thinning-mcmc, I store every post-warmup sample with no thinning of chains, resulting in a total of `r comma(total_samples)` samples per parameter across all chains.^[
The chains mix well and exhibit little autocorrelation, which is a credit to the model parameterization and the fact that the No-U-Turn algorithm is efficient at proposing transitions that explore the parameter space.
]
Just `r sum(rstan::get_divergent_iterations(mcmc))` out of `r comma(total_samples)` iterations (`r mean(rstan::get_divergent_iterations(mcmc)) %>% percent(accuracy = .01)`) encountered a divergent transition, which indicates no systematic issues with model parameterization,
and `r rstan::get_num_max_treedepth(mcmc)` iterations exceeded the maximum proposal tree-depth.
The energy metrics that monitor the model's Hamiltonian mechanics also detect no problematic model behavior.
```{r}
stopifnot(rstan::get_low_bfmi_chains(mcmc) %>% length() == 0)
```
```{r neat-thetas}
theta_tidy <- readRDS(here(clean_path, "theta-tidy.RDS")) %>% ungroup()
theta_draws <- readRDS(here(clean_path, "theta-draws.RDS")) %>% ungroup()
```
```{r tidy-all}
tidymc <- tidy(mcmc, conf.int = TRUE)
```
```{r plot-thetas}
plot_theta_rank <-
ggplot(theta_tidy) +
aes(x = rank(estimate), y = estimate, color = as.factor(party)) +
geom_hline(yintercept = 0, color = "gray") +
geom_linerange(
aes(ymin = conf.low_0.9, ymax = conf.high_0.9),
alpha = 0.1
) +
geom_linerange(
aes(ymin = conf.low_0.5, ymax = conf.high_0.5),
alpha = 0.7
) +
geom_point(size = 0.25, color = "black") +
scale_color_manual(values = party_factor_colors) +
scale_fill_manual(values = party_factor_colors) +
labs(
x = "Rank Order",
y = "Policy Conservatism",
title = "District-Party Public Ideal Points",
subtitle = "MCMC means, 50% and 90% intervals"
) +
coord_flip(ylim = c(-2, 1)) +
theme(legend.position = "none") +
# annotate(
# geom = "text",
# y = c(-1.5, 0.5), x = c(200, 500),
# label = c("Democrats", "Republicans"),
# vjust = c(-2, 3)
# ) +
geom_mark_ellipse(
aes(label = names(party_colors)[party], fill = as.factor(party)),
color = NA,
alpha = 0,
label.family = font_fam,
label.fill = "gray95",
label.fontface = "plain",
con.type = "straight",
con.cap = 0
) +
NULL
```
```{r plot-theta-rank}
plot_theta_rank
```
I present posterior summaries of the ideal point estimates in Figure \@ref(fig:plot-theta-rank).
The figure features data from `r 435 * 2` district-party groups, including a posterior mean (black dot), a 50% compatibility interval (dark band) and a 90% compatibility interval (light band).
The horizontal axis shows these estimates along the restricted ideological spectrum, which is oriented so that larger values indicate greater ideological conservatism.
The vertical axis ranks the ideal points in ascending order, so the lowest value (the most progressive district-party group) gets the rank of $1$.
```{r plot-theta-rank, include = TRUE, fig.scap = "District-party ideal points and rank-ordering.", fig.cap = "District-party ideal points and rank-ordering. Points are posterior means. Error bars are 50 and 90 percent compatibility intervals.", out.width = "80%", fig.height = 5, fig.width = 7}
```
Unlike ideal point estimates for individual citizens, which may feature a great deal of overlap between the distribution of Republican and Democratic partisans, district-party ideal point estimates have no overlap whatsoever.
This is because district-party ideal points are estimates of the _mean_ ideology for Republican and Democratic groups, so they average the ideological heterogeneity among individuals in each party.
It is also important to note that the ideological space appears "asymmetrical"—the clusters of Republican and Democratic ideal points are not equidistant from zero.
Instead, the Republican cluster is located much closer to an ideal point value of zero, with some Republican groups estimates having negative ideal points.
The appearance of asymmetry results from the way I restrict the item parameters to identify the latent ideal point space.
The item midpoint parameters are restricted to sum to be mean zero, which means that the average item presents policy alternatives that are equidistant from zero.
The fact that Republican ideal points are clustered closer to zero suggests that Republicans are more likely to offer progressive item responses than Democrats are to offer conservative item responses, conditional on the items included for scaling.
Supposing that these items are representative of policy conflict in the 2010s, this finding is consistent with earlier research showing that US citizens tend to hold progressive views on policy even if their symbolic worldview is more conservative [@ellis-stimson:2012:symbolic-ideology].
```{r cces-data}
cces_raw <- read_rds(here("data", "cces", "cumulative_2006_2018.Rds"))
```
```{r cces-estimates}
cces_raw %>% count(year)
cces_ideo <- cces_raw %>%
mutate_all(labelled::remove_labels) %>%
mutate(
ideo5_ch = as.character(ideo5),
ideo5 = as.numeric(ideo5),
party_name = case_when(
pid3 == 1 ~ "Democrats",
pid3 == 2 ~ "Republicans",
pid3 == 3 ~ "Independents",
TRUE ~ "Other/DK/NA"
),
party = case_when(
party_name == "Democrats" ~ 1,
party_name == "Republicans" ~ 2
),
district = abs(parse_number(cd))
) %>%
select(
cycle = year, state_abb = st, st_cd = cd, district_num = dist,
weight, party, party_name, ideo5, ideo5_ch
) %>%
filter(ideo5 %in% 1:5) %>%
filter(cycle >= 2012) %>%
filter(party %in% c(1, 2)) %>%
group_by(state_abb, st_cd, district_num, party, party_name) %>%
summarize(
ideo_mean = weighted.mean(x = ideo5, w = weight)
) %>%
ungroup() %>%
print()
count(cces_ideo, district_num)
count(theta_tidy, district_num)
```
```{r cces-v-draws}
joined_draws <-
master_data %>%
transmute(state_abb, district_num, group = as.numeric(group), party = as.numeric(party)) %>%
distinct() %>%
right_join(theta_draws) %>%
inner_join(
x = ., y = cces_ideo,
by = c("state_abb" = "state_abb", "party", "district_num")
) %>%
group_by(party_name, .draw) %>%
mutate(
r = cor(ideo_mean, theta)
) %>%
ungroup() %>%
print()
ggplot(joined_draws) +
aes(x = r) +
facet_wrap(~ party_name) +
geom_histogram()
```
```{r cces-v-means}
joined_ideal <-
inner_join(
x = theta_tidy, y = cces_ideo,
by = c("state" = "state_abb", "party", "district_num")
) %>%
group_by(party_name) %>%
mutate(
r = cor(ideo_mean, estimate)
) %>%
ungroup() %>%
print()
```
```{r plot-vs-cces}
ggplot(joined_ideal) +
aes(x = ideo_mean, y = estimate, color = as.factor(party)) +
geom_point() +
geom_mark_ellipse(
aes(label = str_glue("{party_name} (r = {round(r, 2)})")),
label.family = font_fam,
label.fill = "gray95",
label.fontface = "plain",
con.type = "straight",
con.cap = 0
) +
scale_color_manual(values = party_factor_colors) +
theme(legend.position = "nonw") +
coord_cartesian(ylim = c(-2, 1)) +
labs(
title = "Ideal Points vs. Ideological Self-Placement",
subtitle = "Self-placement from CCES 5-category item",
x = "CCES self-placement, district-party average (2012–2018)",
y = "District-party ideology from IRT model\n(posterior mean)"
)
```
We can feel confident that these ideal point estimates capture real ideological variation by comparing them to other survey-based measures of ideology.
Figure \@ref(fig:plot-vs-cces) compares the IRT model's district-party ideology estimates (posterior means) to a survey-based measure of ideological self-placement.
Self-placement data are drawn from the Cooperative Congressional Election Study (CCES) for years 2012 through 2018, coding the 5-category item numerically and averaging partisan responses within each district-party.
The figure shows a high degree of correlation between the IRT scores and the self-placement scores.
The overall correlation between the ideal points and self-placement scores is `r joined_ideal %$% cor(estimate, ideo_mean) %>% round(2)`, and the within-party correlations are `r joined_ideal %>% filter(party == 1) %>% sample_n(1) %>% pull(r) %>% round(2)` among Democrats and `r joined_ideal %>% filter(party == 2) %>% sample_n(1) %>% pull(r) %>% round(2)` among Republicans.
These within-party correlations are as strong as the within-party correlations between CF scores and DW-NOMINATE scores for incumbent House members [@bonica:2013:ideology-interests;@bonica:2014:mapping-ideology].
```{r plot-vs-cces, include = TRUE, fig.scap = "Comparison of district-party ideology to ideological self-placement.", fig.cap = "Comparison of district-party ideology and ideological self-placement. Average ideological self-placement in each district-party is measured using the CCES 5-category item, combining survey waves 2012 through 2018.", fig.height = 6, fig.width = 7}
```
<!-- NOT EVALUATED -->
```{r, eval = FALSE}
set.seed(5435)
master_data %>%
transmute(group = as.numeric(group), party) %>%
distinct() %>%
group_by(party) %>%
sample_n(5) %>%
left_join(theta_draws) %>%
ggplot() +
aes(x = theta, fill = party) +
geom_density(
aes(group = group),
position = "identity",
alpha = 0.5
) +
scale_fill_manual(values = party_factor_colors)
```
### Ideological variation across districts
```{r}
plot_hist_means <- ggplot(theta_tidy) +
aes(fill = as.factor(party), x = estimate) +
geom_histogram(
position = "identity",
color = NA,
binwidth = .025,
alpha = 0.8
) +
scale_fill_manual(values = party_factor_colors) +
theme(
legend.position = "none",
plot.title.position = "plot"
) +
labs(
title = "Histogram of Ideal Points",
subtitle = "MCMC Posterior Means",
x = "Policy Conservatism",
y = NULL
) +
coord_cartesian(xlim = c(-2, 1))
plot_hist_means
```
```{r plot-hist-means}
plot_hist_means
```
Recent scholarship on party coalitions has highlighted the ideological cohesion of the Republican Party compared to the Democratic Party, owing to the fact that the Democratic Party is a big-tent assemblage of social groups with policy priorities that may conflict [@grossman-hopkins:2016:asymmetric-politics; @feldman-zaller:1992:political-ambivalence; @lelkes-sniderman:2016:ideological-asymmetry].
Party differences in ideological cohesion may appear across districts as well, especially if the demographic composition of Democratic constituents is more heterogeneous from one district to the next than the demographic composition of Republican constituents.
Figure \@ref(fig:plot-hist-means) provides some aggregate evidence that Republican constituencies are more ideologically similar to one another than Democratic constituencies are.
The histograms in the figure show that the distribution of Democratic district-party ideal points is approximately symmetric around their mean, while Republican district-party groups are more tightly distributed around a modal ideology that is on the conservative edge of the ideal point scores.
```{r plot-hist-means, include = TRUE, fig.cap = "Histogram of ideal point means in both parties.", fig.scap = "Histogram of ideal point means in both parties.", fig.width = 5, out.width = "60%"}
```
```{r SDs}
sd_draws <- master_data %>%
transmute(group = as.numeric(group), party) %>%
distinct() %>%
right_join(theta_draws) %>%
group_by(.draw, party) %>%
summarize(
sd = sd(theta)
) %>%
ungroup() %>%
pivot_wider(
names_from = "party",
values_from = "sd",
names_prefix = "sd_"
) %>%
mutate(
sd_diff = sd_1 - sd_2
) %>%
pivot_longer(
cols = starts_with("sd_"),
names_to = "sd_group",
values_to = "sd_value"
) %>%
mutate(
party = suppressWarnings(parse_number(sd_group)) %>%
ifelse(is.na(.), "diff", .)
) %>%
print()
```
```{r}
sd_labels <- tribble(
~ x, ~ y, ~ party, ~ label,
.15, 200, "2", "Republicans",
.20, 200, "1", "Democrats"
)
plot_sds <-
ggplot(sd_draws) +
aes(x = sd_value, fill = party) +
geom_histogram(
position = "identity",
alpha = 0.7,
binwidth = .0025
) +
facet_wrap(
~ party == "diff",
scales = "free",
labeller = c(
"FALSE" = "Ideal point standard deviation",
"TRUE" = "Difference in std devs:\nDemocrats minus Republicans"
) %>%
as_labeller(),
strip.position = "bottom"
) +
geom_vline(
data = sd_draws %>%
filter(party == "diff") %>%
sample_n(1),
aes(xintercept = 0)
) +
scale_fill_manual(
values = c("1" = dblue, "2" = rred, "diff" = secondary)
) +
geom_text(
data = sd_labels,
aes(x = x, y = y, label = label),
hjust = c(1.2, -0.2),
vjust = -0.2
) +
geom_text(
data = sd_draws %>%
filter(sd_group == "sd_diff") %>%
group_by(sd_group, party) %>%
summarize(
p = mean(sd_value > 0) %>% round(2),
lab = "More variation among\nDemocrats"
),
aes(
x = 0.005, y = 400,
label = str_glue("{lab} (pr = {p})")
),
# fill = "white",
# alpha = 0.7,
size = 3,
hjust = 0,
vjust = -0.1
) +
coord_cartesian(ylim = c(0, 475)) +
labs(
x = NULL,
y = NULL,
title = "Geographic Variation in Ideology",
subtitle = "Histograms of MCMC samples"
) +
theme(
legend.position = "none",
strip.placement = "outside"
)
```
```{r plot-sds}
plot_sds
```
Figure \@ref(fig:plot-sds) verifies this intuition by comparing the standard deviation of district-party ideal points in each party.
The first panel shows the standard deviation of ideal points Democratic and Republican groups.
The histogram represents the distribution of estimates across all MCMC sample iterations.
Democratic groups tend to be more variable across districts, with a distribution of standard deviation estimates centered on a mean of `r sd_draws %>% filter(party ==1) %$% mean(sd_value) %>% round(2)`.
Republican groups are less variable across districts, with a distribution of standard deviations centered on a mean of `r sd_draws %>% filter(party ==2) %$% mean(sd_value) %>% round(2)`.
The second panel of Figure \@ref(fig:plot-sds) plots the difference between the Democratic and Republican standard deviation estimates, again with a distribution representing all MCMC sample iterations.
The histogram shows that almost all MCMC iterations (`r sd_draws %>% filter(sd_group == "sd_diff") %$% mean(sd_value > 0) %>% percent(acc = 1)`) contain ideal point estimates that are higher variance for Democrats than for Republicans.
```{r plot-sds, include = TRUE, fig.scap = "Geographic heterogeneity in ideal points.", fig.cap = "Geographic heterogeneity in ideal points. Left panel: standard deviation of Republican and Democratic district-party ideologies. Right panel: difference in standard deviations (Democrats minus Republicans). Distributions reflect MCMC draws.", fig.width = 8, fig.height = 4, out.width = "100%"}
```
```{r correlations}
cor_draws <- master_data %>%
transmute(district, group = as.numeric(group), party = as.numeric(party)) %>%
distinct() %>%
right_join(theta_draws) %>%
select(-group, -sigma_in_g) %>%
pivot_wider(
names_from = "party",
values_from = "theta",
names_prefix = "theta_"
) %>%
group_by(.draw) %>%
summarize(r = cor(theta_1, theta_2, method = "pearson")) %>%
print()
plot_cor_draws <- ggplot(cor_draws) +
aes(x = r) +
geom_histogram(
boundary = 0,
fill = secondary,
alpha = 0.7
) +
geom_label(
data = cor_draws %>%
summarize(
pos = mean(r > 0) %>% round(2)
),
aes(x = 0, y = 700,
label = str_glue("pr(positive correlation) = {pos}")),
hjust = -0.1,
vjust = 1
) +
geom_vline(xintercept = 0) +
labs(
title = "Correlation Between Republicans and Democrats",
subtitle = "Histogram of correlation draws",
x = "Pearson's r",
y = NULL
) +
ggeasy::easy_remove_y_axis()
plot_cor_draws
```
```{r ideal-scatter}
theta_match_district <- master_data %>%
transmute(group = as.numeric(group), party = as.numeric(party)) %>%
distinct() %>%
right_join(theta_tidy) %>%
select(district, party, estimate, ends_with("_0.9")) %>%
pivot_wider(
names_from = "party",
values_from = c("estimate", "conf.low_0.9", "conf.high_0.9")
) %>%
print()
plot_match_scatter <- ggplot(theta_match_district) +
aes(x = estimate_1, y = estimate_2) +
geom_point(color = "black", size = 3, alpha = 0.8, shape = 16) +
labs(
x = "Democratic Ideal Point",
y = "Republican Ideal Point",
title = "Republican and Democratic Ideal Points",
subtitle = "Within-district comparison"
)
plot_match_scatter
```
```{r plot-cors}
plot_match_scatter + plot_cor_draws
```
Given the geographic variation in ideology, it is natural to wonder if Republican and Democratic ideal points are related.
Do the districts containing more conservative Republicans also contain more conservative Democrats?
Perhaps the pattern is reversed, where local conditions that reinforce the conservatism of Republicans actually reinforce progressivism among Democrats?
Figure \@ref(fig:plot-cors) explores this possibility.
The left panel plots the Republican group ideal point in a district (vertical axis) against the Democratic group ideal point in the same district, with points representing posterior means.
The posterior means do not exhibit much correlation to one another.
The right panel plots a histogram of correlation estimates (Pearson's $r$) from all MCMC draws.
The distribution of correlations suggests that although there is a slight correlation between Republican and Democratic ideal points ($r > 0$ in `r cor_draws %$% mean(r > 0) %>% percent(acc = 1)` of MCMC draws), the correlation is quite small, with a posterior mean of `r cor_draws %$% mean(r) %>% round(2)` and a standard deviation of `r cor_draws %$% sd(r) %>% round(2)`.
The predominant takeaway is that most of the variation in Republican ideal points is unrelated to Democrat ideal points.
Although this project does not explore the correlates or possible causes of local ideological convergence or divergence with much detail, this project enables this research agenda by measuring local partisan ideology.
```{r plot-cors, include = TRUE, fig.height = 6, fig.width = 11, out.width = '100%', fig.scap = "Correlation between Republican and Democratic ideal points in the same district.", fig.cap = "Correlation between Republican and Democratic ideal points in the same district. Left: scatterplot of Republican versus Democratic ideal points (posterior means). Left: posterior distribution of the correlation (Pearson's $r$) between Democratic and Republican ideal points."}
```
### Aggregate correlates of district-party ideology {#sec:ideal-covariates}
```{r}
grp_feature_names <- c("Pct. White", "Pct. College", "Med. Income", "Med. Age", "Gini", "Pct. Foreign Born", "Pct. Unemployed")
state_feature_names <- c("Pct. Evangelical", "Pct. Nonwhite", "Income per cap")
tidy_coefs <- tidymc %>%
filter(
str_detect(term, "coef_grp_mean") |
str_detect(term, "coef_st_mean")
) %>%
mutate(
party =
str_split(term, pattern = ",", simplify = TRUE)[,2] %>%
parse_number(),
party_name = names(party_colors)[party],
predictor = str_split(term, pattern = ",", simplify = TRUE)[,1] %>%
parse_number(),
level = str_split(term, pattern = "\\[",simplify = TRUE)[,1],
coef_lab = case_when(
level == "coef_grp_mean" ~ grp_feature_names[predictor],
level == "coef_st_mean" ~ state_feature_names[predictor]
)
) %>%
print()
plot_coef_pts <- ggplot(tidy_coefs) +
aes(x = coef_lab, y = estimate,
color = party_name,
shape = party_name
) +
geom_hline(yintercept = 0) +
facet_wrap(
~ level,
ncol = 1, scales = "free_y",
labeller = c("coef_grp_mean" = "District-Level Terms",
"coef_st_mean" = "State-Level Terms") %>%
as_labeller()
) +
geom_pointrange(
aes(ymin = conf.low, ymax = conf.high),
position = position_dodge(width = -0.5),
fill = "white"
) +
scale_color_manual(values = party_colors) +
scale_shape_manual(values = c("Democrats" = 16, "Republicans" = 15)) +
labs(
y = "Coefficient on Policy Conservatism",
x = NULL,
color = NULL,
shape = NULL,
title = "Hierarchical Coefficients",
subtitle = "Separate parameters by party"
) +
theme(
plot.title.position = "plot",
legend.position = "none"
) +
coord_flip()
```
```{r eval = FALSE}
coef_samples <- mcmc %>%
gather_draws(coef_grp_mean[j, party], coef_st_mean[j, party]) %>%
ungroup() %>%
pivot_wider(
names_from = "party",
values_from = ".value",
names_prefix = "coef_"
) %>%
mutate(
coef_diff = coef_1 - coef_2,
coef_lab = case_when(
.variable == "coef_grp_mean" ~ grp_feature_names[j],
.variable == "coef_st_mean" ~ state_feature_names[j]
)
) %>%
print()
ggplot(coef_samples) +
aes(coef_1, coef_2) +
facet_wrap(~ .variable + j) +
geom_hline(yintercept = 0, color = "gray") +
geom_vline(xintercept = 0, color = "gray") +
geom_point(color = secondary, alpha = 0.2, shape = 16) +
geom_abline()
plot_coef_hist <- ggplot(coef_samples) +
aes(x = coef_diff, y = coef_lab) +
ggridges::geom_ridgeline(
stat = "binline",
aes(height = ..density..),
fill = secondary,
draw_baseline = FALSE,
scale = 0.1,
binwidth = .005,
boundary = 0,
position = "identity",
alpha = 0.5
) +
geom_vline(xintercept = 0) +
facet_wrap(
~ .variable,
scales = "free_y",
ncol = 1,
labeller = c("coef_grp_mean" = "District-Level Terms",
"coef_st_mean" = "State-Level Terms") %>%
as_labeller()
) +
theme(plot.title.position = "plot") +
labs(
title = "Difference in Coefficients",
subtitle = "Democrats minus Republicans",
x = "Difference in Coefficients",
y = NULL
)
```
```{r}
plot_marginal_scatter <- theta_tidy %>%
pivot_longer(
cols = c(prcntWhite, prcntUnemp, prcntForeignBorn, prcntBA, medianIncome, gini),
names_to = "predictor",
values_to = "value"
) %>%
mutate(
party_lab = names(party_colors)[party],
coef_lab = case_when(
predictor == "prcntWhite" ~ "Pct. White",
predictor == "prcntUnemp" ~ "Pct.\nUnemployed",
predictor == "prcntForeignBorn" ~ "Pct. Foreign\nBorn",
predictor == "prcntBA" ~ "Pct. College",
predictor == "medianIncome" ~ "Med. Income\n(Log)",
predictor == "gini" ~ "Gini Coef."
),
value = case_when(
coef_lab == "Med. Income\n(Log)" ~ log(value),
TRUE ~ value
)
) %>%
ggplot() +
aes(x = value, y = estimate,
color = party_lab, shape = party_lab) +
facet_wrap(~ coef_lab, scales = "free_x") +
# geom_pointrange(
# aes(ymin = conf.low_0.9, ymax = conf.high_0.9),
# position = position_dodge(width = -0.25)
# ) +
geom_point(fill = "white", alpha = 0.5) +
theme(legend.position = "bottom") +
scale_shape_manual(values = c("Democrats" = 16, "Republicans" = 15)) +
scale_color_manual(values = party_colors) +
guides(color = guide_legend(override.aes = list(alpha = 1))) +
labs(
x = NULL,
y = "District-Party Ideal Point",
color = NULL,
shape = NULL,
title = "Ideal Points and Covariates",
subtitle = "Bivariate (unadjusted) relationships"
)
```
```{r plot-party-covs}
# coef_layout <- "
# ##AAA##
# #BBBBB#
# "
coef_layout <- "
AABBBBB
"
wrap_plots(
A = plot_coef_pts, B = plot_marginal_scatter,
design = coef_layout
)
# plot_coef_pts + plot_marginal_scatter
# ggsave("~/desktop/test.png", height = 7, width = 12)
```
If Republican and Democratic ideal points are largely unrelated, it may be because district and state characteristics are related to partisan ideology in diverging ways.
One final piece of descriptive analysis is to explore the relationship between district-party ideal points and the hierarchical covariates used to smooth the ideal point estimates in the model.
Figure \@ref(fig:plot-party-covs) visualizes the relationship between ideal points and aggregate covariates in two ways.
The left side of the figure displays coefficient estimates from the IRT model for district- and state-level covariates included in the hierarchical regression.
These coefficients capture the linear relationship between the covariates and ideal points, holding other covariates fixed.
The right side of the figure plots the _bivariate_ relationship between ideal points (posterior means) and a selection of district-level covariates, with no additional statistical adjustments.
The bivariate relationships convey information about the "types" of districts that contain more conservative or progressive partisans, even if these relationships may be statistical artifacts of other confounding relationships.
```{r plot-party-covs, include = TRUE, out.width = "100%", fig.height = 7, fig.width = 12, fig.scap = "How ideal points relate to hierarchical covariates.", fig.cap = "How ideal points relate to hierarchical covariates. Left: coefficients from hierarchical regression on district covariates (top left) and state covariates (bottom left), with 90 percent compatibility intervals. Right: Bivariate relationships between ideal points and a selection of district-level covariates with no statistical controls."}
```
Larger values of the ideal point space are associated with greater conservatism, so increasing covariate values are related to increasing conversatism if the coefficient is positive and progressivism if the coefficient is negative.
Some covariates have similar "effects" in both parties: districts with higher median income levels and median ages are estimated to be more conservative on average, and districts with greater college attainment are estimated to be more progressive.
Other covariates have diverging effects for each party.
Whiter districts are associated with greater conservatism among Republicans and (with less statistical confidence) greater progressivism among Democrats.
Higher unemployment rates are related to decreased conservatism among Republicans and increased conservatism among Democrats.
Districts with greater numbers of foreign born residents contain more progressive Republicans, which could be related to the cosmopolitanism of the district culture, and districts with greater income inequality (measured by the Gini coefficient) contain more conservative Republicans.
The uncertainty of the coefficients at the state level indicate weaker relationships to state-level covariates, but it is worth noting that Republicans and Democrats in wealthier _states_ are more progressive than in less wealthy states, which is a pattern that differs from the district-level relationship of increasing conservatism greater wealth and is similar to the findings of @gelman-et-al:2007:red-state-qjps regarding state-level wealth.
Counter-intuitively, I also find that larger evangelical populations appear to be related to greater progressivism among both Republicans and Democrats, and larger White populations are weakly related to greater progressivism among Democrats.
Because these aggregate demographic features are likely to be confounded, it is important not to give these coefficients a causal interpretation.
Furthermore, the correlations themselves may not be most straightforward if the regression specification introduces collider bias among the aggregate predictors.
The bivariate relationships between ideal points and district covariates convey what types of districts are more progressive or conservative, setting aside statistical adjustments.
These relationships sometimes contrast with the model coefficients in interesting ways.
For instance, although the coefficient for median income is positive in both parties, the scatterplot shows that higher-income districts contain more progressive Democrats.
This suggests that other factors dominate the effect of income when determining the progressivism of Democratic groups in wealthy districts.
This factor could be college education, which is strongly related to progressivism among Democrats.
The scatterplot also captures relationships that appear in the coefficients: the strongest relationships to Republican ideology are the White share of the population (positively related to conservatism) and the foreign-born share of the population (negatively related to conservatism).
Among Democrats, greater unemployment is related to conservatism, and ideology is weakly related to racial composition, income inequality, and the foreign-born share of the population.
## Improving the ideal point model
The results from this ideal point model are promising and informative, but there are some modeling approaches that could improve this model and others like it.
The model's predictive ability could be improved by creating more flexible hierarchical regressions.
Because the hierarchical regression is included to smooth the ideal point estimates and not for causal interpretation, the linear specifications have no particular benefit over more flexible modeling approaches.
The use of nonlinear models and "machine learning" approaches is new to hierarchical measurement modeling in political science.
Examples include the use of regression trees multilevel regression and poststratification [@bisbee:2019:barp] and Gaussian process priors for IRT models [@duckmayr-et-al:2020:gpirt].
Work outside of political science has also explored the use of spline regression for IRT models [@woods-thissen:2006:spline-irt], which are not as flexible as Gaussian processes but are computationally less intensive.
The substance of the model could also be extended in several ways aside from its predictive capacity.
As mentioned above, @caughey-warshaw:2015:DGIRT lay out a dynamic linear model approach to bridge the ideal point space across multiple time periods.
The parameterization of the model also exposes a parameter for the variance of individual ideal points within a district, which could itself be modeled as a function of covariates [@lauderdale:2010:heteroskedastic-irt].
Lastly, because survey items vary in their response design, researchers have explored the use of ordinal response models to scale survey items that ask respondents to choose from more than two policy alternatives [e.g. @hill-tasanovitch:2015:polarization-disconnect].