library(pollster)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(knitr)
library(ggplot2)
It’s common to want to view a crosstab of two variables by a third
variable, for instance educational attainment by sex and
marital status. The function crosstab_3way accomplishes this.
Row and cell percents are both supported; column percents are not.
illinois %>%
  # filter for recent years & limited ages
  filter(year > 2009,
         age > 34) %>%
  crosstab_3way(x = sex, y = educ6, z = maritalstatus, weight = weight,
                remove = c("widow/divorced/sep"),
                n = FALSE) %>%
  kable(digits = 0, caption = "Educational attainment by sex and marital status among Illinois residents ages 35+",
        format = "html")
sex | maritalstatus | LT HS | HS | Some Col | AA | BA | Post-BA |
---|---|---|---|---|---|---|---|
Male | Married | 7 | 28 | 16 | 8 | 24 | 17 |
Male | Never Married | 13 | 35 | 19 | 11 | 15 | 8 |
Female | Married | 6 | 28 | 16 | 10 | 24 | 16 |
Female | Never Married | 11 | 27 | 21 | 8 | 17 | 15 |
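To make the meaning of row percents concrete, here is a minimal base R sketch on a made-up data frame. The toy data and its column names are assumptions for illustration only, and this is not how crosstab_3way is implemented internally. It only shows the idea that, within each combination of the two grouping variables, the weighted shares across the outcome variable sum to 100.

```r
# Toy data: hypothetical values for illustration, not drawn from illinois
toy <- data.frame(
  sex     = c("Male", "Male", "Male", "Female", "Female", "Female"),
  marital = c("Married", "Married", "Never Married",
              "Married", "Married", "Never Married"),
  educ    = c("HS", "BA", "BA", "HS", "BA", "HS"),
  weight  = c(1, 3, 2, 2, 2, 5)
)

# Weighted counts: sum of weights in each sex x marital x educ cell
wtd <- xtabs(weight ~ sex + marital + educ, data = toy)

# Row percents: each (sex, marital) combination sums to 100 across educ
row_pct <- prop.table(wtd, margin = c(1, 2)) * 100

# Weighted education shares among married men in the toy data
round(row_pct["Male", "Married", ], 0)
```

Passing margin = c(1, 2) to prop.table normalizes within each sex-by-marital-status group, which mirrors how each row of the table above sums to roughly 100.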
Three-way crosstabs plot well as small multiples using ggplot facets.
illinois %>%
  # filter for recent years & limited ages
  filter(year > 2009,
         age > 34) %>%
  crosstab_3way(x = sex, y = educ6, z = maritalstatus, weight = weight,
                remove = c("widow/divorced/sep"),
                format = "long") %>%
  ggplot(aes(educ6, pct, fill = maritalstatus)) +
  geom_bar(stat = "identity", position = position_dodge()) +
  facet_wrap(facets = vars(sex)) +
  labs(title = "Educational attainment by sex and marital status",
       subtitle = "Illinois residents ages 35+") +
  theme(legend.position = "top")
The same plot can be made with margins of error as well. (See the “crosstabs” vignette for a more detailed discussion of margins of error.)
illinois %>%
  # filter for recent years & limited ages
  filter(year > 2009,
         age > 34) %>%
  moe_crosstab_3way(x = sex, y = educ6, z = maritalstatus, weight = weight,
                    remove = c("widow/divorced/sep"), format = "long") %>%
  ggplot(aes(educ6, pct, fill = maritalstatus)) +
  geom_bar(stat = "identity", position = position_dodge(),
           alpha = 0.5) +
  geom_errorbar(aes(ymin = (pct - moe), ymax = (pct + moe),
                    color = maritalstatus),
                position = position_dodge()) +
  facet_wrap(facets = vars(sex)) +
  labs(title = "Educational attainment by sex and marital status",
       subtitle = "Illinois residents ages 35+",
       caption = "Current Population Survey, 2010-2018") +
  theme(legend.position = "top")
#> Your data includes weights equal to zero. These are removed before calculating the design effect.
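As a rough illustration of how a design effect feeds into a margin of error, the sketch below uses Kish's approximation for the design effect and the usual normal-approximation formula for a proportion. Whether this matches pollster's internal calculation exactly is an assumption, and kish_deff and moe_pct are hypothetical helper names, not functions from the package. Zero weights are dropped first, as the message above describes.

```r
# Hypothetical helper: Kish's approximate design effect for a weight vector
kish_deff <- function(w) {
  w <- w[w > 0]                      # drop zero weights before computing
  length(w) * sum(w^2) / sum(w)^2
}

# Hypothetical helper: margin of error, in percentage points, for a
# proportion p, inflated by the square root of the design effect
moe_pct <- function(p, w, z = 1.96) {
  w <- w[w > 0]
  n <- length(w)                     # unweighted sample size
  z * sqrt(kish_deff(w)) * sqrt(p * (1 - p) / n) * 100
}

# Equal weights give a design effect of 1; unequal weights inflate it
kish_deff(c(1, 1, 1, 1, 0))
kish_deff(c(1, 3))

# Margin of error for a 50% estimate from 100 equally weighted cases
moe_pct(0.5, rep(1, 100))
```

The more unequal the weights, the larger the design effect and the wider the error bars in a plot like the one above.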
If the z-variable in your crosstab uniquely identifies survey waves
for which the weights were independently generated, it is best practice
to calculate the design effect independently for each wave.
moe_wave_crosstab_3way does just that. All of the arguments
remain the same as in moe_crosstab_3way.
moe_wave_crosstab_3way(df = illinois, x = sex, y = educ6, z = year, weight = weight)
#> Your data includes weights equal to zero. These are removed before calculating the design effect.
#> Your data includes weights equal to zero. These are removed before calculating the design effect.
#> Your data includes weights equal to zero. These are removed before calculating the design effect.
#> Your data includes weights equal to zero. These are removed before calculating the design effect.
#> Your data includes weights equal to zero. These are removed before calculating the design effect.
#> Your data includes weights equal to zero. These are removed before calculating the design effect.
#> Your data includes weights equal to zero. These are removed before calculating the design effect.
#> Your data includes weights equal to zero. These are removed before calculating the design effect.
#> Your data includes weights equal to zero. These are removed before calculating the design effect.
#> Your data includes weights equal to zero. These are removed before calculating the design effect.
#> Your data includes weights equal to zero. These are removed before calculating the design effect.
#> Joining with `by = join_by(year)`
#> # A tibble: 144 × 6
#> year sex educ6 pct moe n
#> <dbl> <fct> <fct> <dbl> <dbl> <dbl>
#> 1 1996 Male LT HS 15.1 1.80 3889089.
#> 2 1996 Male HS 32.5 2.35 3889089.
#> 3 1996 Male Some Col 20.3 2.02 3889089.
#> 4 1996 Male AA 6.11 1.20 3889089.
#> 5 1996 Male BA 17.7 1.91 3889089.
#> 6 1996 Male Post-BA 8.38 1.39 3889089.
#> 7 1996 Female LT HS 14.2 1.65 4193383.
#> 8 1996 Female HS 34.8 2.25 4193383.
#> 9 1996 Female Some Col 22.8 1.98 4193383.
#> 10 1996 Female AA 6.72 1.18 4193383.
#> # … with 134 more rows
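To see why the per-wave calculation matters, the sketch below compares a pooled design effect with design effects computed separately for each wave. It uses Kish's approximation on made-up weights; the data, the waves data frame, and the kish_deff helper are all assumptions for illustration rather than part of pollster.

```r
# Hypothetical helper: Kish's approximate design effect for a weight vector
kish_deff <- function(w) {
  w <- w[w > 0]                      # drop zero weights before computing
  length(w) * sum(w^2) / sum(w)^2
}

# Toy data: two survey waves whose weights were generated independently
waves <- data.frame(
  year   = c(2001, 2001, 2001, 2002, 2002, 2002),
  weight = c(1, 1, 1, 1, 2, 3)
)

# Design effect computed separately within each wave
deff_by_wave <- sapply(split(waves$weight, waves$year), kish_deff)
deff_by_wave

# Design effect computed once across the pooled data
deff_pooled <- kish_deff(waves$weight)
deff_pooled
```

In this toy example the pooled value is larger than either per-wave value, so a single pooled design effect would overstate the margin of error for both waves.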