Yating Zeng 2022-10-23
COVID-19 has been here for around 3 years, with vaccine widely used. It would be likely that some of the people tend to not take the vaccine than the others. Thus, in this project, the question of my interest is: What is the association between age and two vaccination status (at least one dose & completed a primary series) in California state? For this project, I’ll use the dataset on Covid-19 vaccination from the Centers for Disease Control and Prevention (CDC) website, which provided data for select demographic characteristics (age, sex, and age by sex) of people receiving COVID-19 vaccinations in the United States at the national and jurisdictional levels, fitting my analysis interest well. All the data were cumulative data, which were counted since the date it started observing.
In this project, the dataset used was a public resource from CDC website, named “COVID-19 Vaccination Age and Sex Trends in the United States, National and Jurisdictional”. The link of the dataset is shown below: https://data.cdc.gov/Vaccinations/COVID-19-Vaccination-Age-and-Sex-Trends-in-the-Uni/5i5k-6cmh. The CSV file of the data was then downloaded and read into R studio for further analysis in this project.
library(readr)
## Warning: package 'readr' was built under R version 4.1.2
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.1.2
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## âś” ggplot2 3.3.5 âś” dplyr 1.0.9
## âś” tibble 3.1.6 âś” stringr 1.4.0
## âś” tidyr 1.2.0 âś” forcats 0.5.2
## âś” purrr 0.3.4
## Warning: package 'tidyr' was built under R version 4.1.2
## Warning: package 'purrr' was built under R version 4.1.2
## Warning: package 'dplyr' was built under R version 4.1.2
## Warning: package 'forcats' was built under R version 4.1.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## âś– dplyr::filter() masks stats::filter()
## âś– dplyr::lag() masks stats::lag()
library(dplyr)
library(stringr)
library(kableExtra)
## Warning: package 'kableExtra' was built under R version 4.1.2
##
## Attaching package: 'kableExtra'
##
## The following object is masked from 'package:dplyr':
##
## group_rows
#read in the dataset
if (!file.exists("COVID19_Vaccination.csv")){
library("RSocrata")
vaccination <- read.socrata(
"https://data.cdc.gov/resource/n8mc-b4w4.json",
app_token = "KS8vICWuRMDR6QzLnGP7SVO1a",
email = "[email protected]",
password = "Ttzyt119089838--"
)
}
vaccination <- read_csv("COVID19_Vaccination.csv")
## Rows: 1744080 Columns: 12
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): Date, Location, Demographic_Category
## dbl (9): census, Administered_Dose1, Series_Complete_Yes, Booster_Doses, Sec...
##
## â„ą Use `spec()` to retrieve the full column specification for this data.
## â„ą Specify the column types or set `show_col_types = FALSE` to quiet this message.
After checking the summary of the content of the dataset, the dimensions and the original properties for each variable were known. I filtered the data to create a new dataset to keep only the information of California. For simplifying the typing in analysis, 7 columns were renamed to be shorter. Then the proportion of missing values of each column and column “Demographic_Category” were checked. Considering that age and vaccination status of primary dose series were the main factors towards this analysis, the information about “Booster”, “Age_unknown” and all the “Age>65” levels of the “Demographic_Category” column, and and the missing values of “dose1”(count of people take at least one dose) and “census” (census statistics used for calculating the percentage of vaccination) were removed.
Because the information this dataset was about the information strongly rely on time series, and all the statistics were cumulative data, a new variable “date” was created for further reorder the data by the time recorded. Based on the category from “Demographic_Category” variable (now named “cat”), the original dataset was split into 4 subset for better analysis, which are 1. objects from both sex categorized only by age level; 2. objects were all females categorized by age level; 3. objects were all males categorized by age level; and 4. objects from both sex categorized only by sex level.
Totally 8 summary tables and 8 summary figures (boxplots) were planed to create by 2 vaccination status (“at least one dose” and “completed a primary series”) and 4 categorical groups (“age”; “female_age”; “male_age”; and “sex”), showing the minimum, 1st quantile, median, 3nd quantile, maximum, and the number of recorded objects of “the percentage of people” with the 2 kinds of vaccination status grouped by age, sex or age groups stratified by sex. The reason to use data stratified by sex was to remove the possible confounding effect from sex on the association between vaccination status and age level. Then to find out the association between the age and vaccination status, 8 grouped scatterplots were planed to create, by the same approach mentioned in the part of summary tables and figures.
#select only the data of CA
ca_vac <- vaccination[which(vaccination$Location == "CA"), ]
#str(ca_vac)
#reorder the dataset by Demographic_Category and then date
ca_vac <- ca_vac[order( ca_vac[,3], ca_vac[,1] ),]
## Warning in xtfrm.data.frame(x): cannot xtfrm data frames
## Warning in xtfrm.data.frame(x): cannot xtfrm data frames
#head(ca_vac)
#simplified the variable names
colnames(ca_vac)[3] <- "cat"
colnames(ca_vac)[5] <- "dose1"
colnames(ca_vac)[6] <- "series"
colnames(ca_vac)[9] <- "dose1_pct"
colnames(ca_vac)[10] <- "series_pct"
colnames(ca_vac)[11] <- "booster_pct"
colnames(ca_vac)[12] <- "secbooster_pct"
#checking the proportion of missing values
#(colMeans(is.na(ca_vac)))*100
vac <- ca_vac[which(ca_vac$cat != "Age_Unknown"), ]
vac <- vac %>%
filter(!is.na(vac$dose1),!is.na(vac$census))
#check the missing value again
#(colMeans(is.na(vac)))*100
#check about the "Demographic_Category", "Dose1_pct",and "series_pct"
#unique(vac$cat)
#summary(vac$dose1_pct)
#summary(vac$series_pct)
#create new variables about date
vac$Date <- substr(vac$Date, 0, 10)
vac$year <- substr(vac$Date, 7, 10)
vac$month <- substr(vac$Date, 0, 2)
vac$day <- substr(vac$Date, 4, 5)
#sort the data by date
vac1 <- vac[with(vac, order(year, month, day)), ]
#create a new "date" numeric variable with the time order acceptable for reoder dataset
vac1 <- mutate(vac1, date = paste(year, month, day))
vac1$date <- str_replace_all(vac1$date, fixed(" "), "")
vac1$year <- as.numeric(vac1$year)
vac1$month <- as.numeric(vac1$month)
vac1$day <- as.numeric(vac1$day)
#remove the some information of no interest: booster information; the data of level "ages 65+"
vac1 = subset(vac1, select = -c(Booster_Doses, Second_Booster, booster_pct, secbooster_pct) )
vac1 <- vac1 %>%
filter(vac1$cat != "Ages_65+_yrs",
vac1$cat != "Female_Ages_65+_yrs",
vac1$cat != "Male_Ages_65+_yrs"
)
#find that there is a unreasonable order for the level 5-11
#rename the level of 5-11 to 05-11
vac1$cat <- str_replace_all(vac1$cat, fixed("Female_Ages_5-11_yrs"), "Female_Ages_05-11_yrs")
vac1$cat <- str_replace_all(vac1$cat, fixed("Male_Ages_5-11_yrs"), "Male_Ages_05-11_yrs")
vac1$cat <- str_replace_all(vac1$cat, fixed("Ages_5-11_yrs"), "Ages_05-11_yrs")
vac1$cat <- str_replace_all(vac1$cat, fixed("Female_Ages_2-4_yrs"), "Female_Ages_02-04_yrs")
vac1$cat <- str_replace_all(vac1$cat, fixed("Male_Ages_2-4_yrs"), "Male_Ages_02-04_yrs")
vac1$cat <- str_replace_all(vac1$cat, fixed("Ages_2-4_yrs"), "Ages_02-04_yrs")
#splitting the data by "cat" level into 4 subset: "age"; "Female_age"; "Male_age"; "sex"
vac1$CAT = substr(vac1$cat, 0, 1)
#build a subset for
vac_age<- vac1 %>%
filter(vac1$CAT == "A")
vac_Fage<- vac1 %>%
filter(vac1$CAT == "F")
vac_Mage<- vac1 %>%
filter(vac1$CAT == "M")
vac_sex<- vac1 %>%
filter(vac1$CAT == "S")
From the summary table of the percent of people with at least one dose grouped by age (Table 1), we could notice that excepting a little part of the data, most part the data were showing a trend that the statistics (the minimum, 1st quantile, median, 3nd quantile, maximum) of Percent of people with at least one dose would be larger when the age level was higher. And for the major part of observed objects, who were aged from 12-17 years old to 75+ years old, the final vaccination rate will went up to around 90%, while for the low age-level(<5 years) objects, the vaccination rate were all under or around 10%, and for 5-11 years objects the vaccination rate stayed in the middle, which is around 50%. This trend were consistent with the results form all the other 2 summary tables of Percent of people with at least one dose grouped by age stratidied by sex (Table 2 & 3).
For the summary table of the percent of people completed a primary series grouped by age (Table 5), we could find that there was still a trend similar to the one above between the vaccination status and age level, but what different was that the statistics were lower than the one of people with at least one dose. For the major part of observed objects, aged from 12-17 years old to 75+ years old, the final vaccination rate will went up to 70%-90%, with group of 65-74 years age would up to 95%. But for the low age-level(<5 years) objects, the vaccination rate were all under or around 5%, and for 5-11 years objects the vaccination rate stayed in the middle, which is around 40%. This trend were consistent with the results form all the other 2 summary tables of Percent of people with at least one dose grouped by age stratidied by sex (Table 6 & 7).
From the summary tables for the percentage of 2 vaccination status grouped by sex (Table 4 & 8), we could find that the statistics of female were always larger than males; and in the one for the percent of people with at least one dose, the final vaccination would go up to 80+%, while for the one of the percent of people completed a primary series, the rate could up to 70+%.
# summary table for dose1
vac_age_tab <- vac_age %>% group_by(cat) %>%
summarise(
min = min(dose1_pct, na.rm = T),
q1 = quantile(dose1_pct, 0.25),
median = median(dose1_pct),
q3 = quantile(dose1_pct, 0.75),
max = max(dose1_pct, na.rm = T),
days_recorded = sum(!is.na(dose1_pct)),
) %>% arrange(cat) %>%
kbl(caption =
"Table 1.Summary of Percent of people with at least one dose grouped by age") %>%
kable_styling()
vac_Fage_tab <- vac_Fage %>% group_by(cat) %>%
summarise(
min = min(dose1_pct, na.rm = T),
q1 = quantile(dose1_pct, 0.25),
median = median(dose1_pct),
q3 = quantile(dose1_pct, 0.75),
max = max(dose1_pct, na.rm = T),
days_recorded = sum(!is.na(dose1_pct))
) %>% arrange(cat) %>%
kbl(caption =
"Table 2.Summary of Percent of people with at least one dose in females grouped by age") %>%
kable_styling()
vac_Mage_tab <- vac_Mage %>% group_by(cat) %>%
summarise(
min = min(dose1_pct, na.rm = T),
q1 = quantile(dose1_pct, 0.25),
median = median(dose1_pct),
q3 = quantile(dose1_pct, 0.75),
max = max(dose1_pct, na.rm = T),
days_recorded = sum(!is.na(dose1_pct))
) %>% arrange(cat) %>%
kbl(caption =
"Table 3.Summary of Percent of people with at least one dose in males grouped by age") %>%
kable_styling()
vac_sex_tab <- vac_sex %>% group_by(cat) %>%
summarise(
min = min(dose1_pct, na.rm = T),
q1 = quantile(dose1_pct, 0.25),
median = median(dose1_pct),
q3 = quantile(dose1_pct, 0.75),
max = max(dose1_pct, na.rm = T),
days_recorded = sum(!is.na(dose1_pct))
) %>% arrange(cat) %>%
kbl(caption =
"Table 4.Summary of Percent of people with at least one dose grouped by sex") %>%
kable_styling()
vac_age_tab
cat | min | q1 | median | q3 | max | days_recorded |
---|---|---|---|---|---|---|
Ages\_\<2yrs | 0 | 3.725 | 5.20 | 6.200 | 7.1 | 122 |
Ages\_\<5yrs | 0 | 4.725 | 7.00 | 8.200 | 9.1 | 122 |
Ages_02-04_yrs | 0 | 5.350 | 8.15 | 9.475 | 10.4 | 122 |
Ages_05-11_yrs | 0 | 1.500 | 16.10 | 42.700 | 46.2 | 667 |
Ages_12-17_yrs | 0 | 35.700 | 72.80 | 82.300 | 84.0 | 676 |
Ages_18-24_yrs | 0 | 54.100 | 77.65 | 88.325 | 90.7 | 676 |
Ages_25-39_yrs | 0 | 58.300 | 78.05 | 86.400 | 88.2 | 676 |
Ages_25-49_yrs | 0 | 62.300 | 81.05 | 89.000 | 90.6 | 676 |
Ages_40-49_yrs | 0 | 69.375 | 86.35 | 93.400 | 94.8 | 676 |
Ages_50-64_yrs | 0 | 78.200 | 91.35 | 95.000 | 95.0 | 676 |
Ages_65-74_yrs | 0 | 90.900 | 95.00 | 95.000 | 95.0 | 676 |
Ages_75+\_yrs | 0 | 85.700 | 93.35 | 95.000 | 95.0 | 676 |
vac_Fage_tab
cat | min | q1 | median | q3 | max | days_recorded |
---|---|---|---|---|---|---|
Female_Ages\_\<2yrs | 0 | 3.725 | 5.30 | 6.275 | 7.1 | 122 |
Female_Ages\_\<5yrs | 0 | 4.750 | 7.10 | 8.275 | 9.1 | 122 |
Female_Ages_02-04_yrs | 0 | 5.425 | 8.20 | 9.575 | 10.5 | 122 |
Female_Ages_05-11_yrs | 0 | 1.700 | 18.20 | 43.500 | 46.9 | 661 |
Female_Ages_12-17_yrs | 0 | 37.400 | 75.15 | 84.800 | 86.6 | 676 |
Female_Ages_18-24_yrs | 0 | 57.600 | 80.35 | 91.300 | 93.7 | 676 |
Female_Ages_25-39_yrs | 0 | 60.600 | 80.25 | 88.700 | 90.5 | 676 |
Female_Ages_25-49_yrs | 0 | 64.600 | 83.05 | 90.900 | 92.5 | 676 |
Female_Ages_40-49_yrs | 0 | 71.500 | 87.90 | 94.700 | 95.0 | 676 |
Female_Ages_50-64_yrs | 0 | 79.000 | 91.45 | 95.000 | 95.0 | 676 |
Female_Ages_65-74_yrs | 0 | 90.100 | 95.00 | 95.000 | 95.0 | 676 |
Female_Ages_75+\_yrs | 0 | 83.800 | 91.25 | 95.000 | 95.0 | 676 |
vac_Mage_tab
cat | min | q1 | median | q3 | max | days_recorded |
---|---|---|---|---|---|---|
Male_Ages\_\<2yrs | 0 | 3.700 | 5.20 | 6.200 | 7.0 | 122 |
Male_Ages\_\<5yrs | 0 | 4.650 | 6.90 | 8.100 | 9.0 | 122 |
Male_Ages_02-04_yrs | 0 | 5.250 | 8.05 | 9.375 | 10.2 | 122 |
Male_Ages_05-11_yrs | 0 | 1.675 | 17.90 | 42.000 | 45.4 | 660 |
Male_Ages_12-17_yrs | 0 | 33.975 | 70.35 | 79.600 | 81.3 | 676 |
Male_Ages_18-24_yrs | 0 | 50.375 | 74.55 | 84.900 | 87.2 | 676 |
Male_Ages_25-39_yrs | 0 | 55.475 | 75.15 | 83.300 | 85.0 | 676 |
Male_Ages_25-49_yrs | 0 | 59.300 | 78.25 | 86.000 | 87.6 | 676 |
Male_Ages_40-49_yrs | 0 | 66.300 | 83.75 | 91.000 | 92.3 | 676 |
Male_Ages_50-64_yrs | 0 | 76.575 | 90.15 | 95.000 | 95.0 | 676 |
Male_Ages_65-74_yrs | 0 | 91.100 | 95.00 | 95.000 | 95.0 | 676 |
Male_Ages_75+\_yrs | 0 | 87.700 | 95.00 | 95.000 | 95.0 | 676 |
vac_sex_tab
cat | min | q1 | median | q3 | max | days_recorded |
---|---|---|---|---|---|---|
Sex_Female | 0 | 59.1 | 75.0 | 84.2 | 86.6 | 676 |
Sex_Male | 0 | 54.7 | 71.3 | 80.6 | 83.0 | 676 |
# summary table for series dose
vac_age_tab2 <- vac_age %>% group_by(cat) %>%
summarise(
min = min(series_pct, na.rm = T),
q1 = quantile(series_pct, 0.25, na.rm = T),
median = median(series_pct, na.rm = T),
q3 = quantile(series_pct, 0.75, na.rm = T),
max = max(series_pct, na.rm = T),
days_recorded = sum(!is.na(series_pct))
) %>% arrange(cat) %>%
kbl(caption =
"Table 5.Summary of Percent of people completed a primary series grouped by age") %>%
kable_styling()
vac_Fage_tab2 <- vac_Fage %>% group_by(cat) %>%
summarise(
min = min(series_pct, na.rm = T),
q1 = quantile(series_pct, 0.25, na.rm = T),
median = median(series_pct, na.rm = T),
q3 = quantile(series_pct, 0.75, na.rm = T),
max = max(series_pct, na.rm = T),
days_recorded = sum(!is.na(series_pct))
) %>% arrange(cat) %>%
kbl(caption =
"Table 6.Summary of Percent of people completed a primary series in females grouped by age") %>%
kable_styling()
vac_Mage_tab2 <- vac_Mage %>% group_by(cat) %>%
summarise(
min = min(series_pct, na.rm = T),
q1 = quantile(series_pct, 0.25, na.rm = T),
median = median(series_pct, na.rm = T),
q3 = quantile(series_pct, 0.75, na.rm = T),
max = max(series_pct, na.rm = T),
days_recorded = sum(!is.na(series_pct))
) %>% arrange(cat) %>%
kbl(caption =
"Table 7.Summary of Percent of people completed a primary series in males grouped by age") %>%
kable_styling()
vac_sex_tab2 <- vac_sex %>% group_by(cat) %>%
summarise(
min = min(series_pct, na.rm = T),
q1 = quantile(series_pct, 0.25, na.rm = T),
median = median(series_pct, na.rm = T),
q3 = quantile(series_pct, 0.75, na.rm = T),
max = max(series_pct, na.rm = T),
days_recorded = sum(!is.na(series_pct))
) %>% arrange(cat) %>%
kable(caption =
"Table 8.Summary of Percent of people completed a primary series grouped by sex") %>%
kable_styling()
vac_age_tab2
cat | min | q1 | median | q3 | max | days_recorded |
---|---|---|---|---|---|---|
Ages\_\<2yrs | 0 | 0.700 | 1.40 | 2.100 | 2.8 | 99 |
Ages\_\<5yrs | 0 | 0.400 | 2.00 | 3.200 | 4.4 | 116 |
Ages_02-04_yrs | 0 | 0.725 | 2.50 | 3.975 | 5.4 | 114 |
Ages_05-11_yrs | 0 | 1.400 | 11.25 | 36.100 | 39.2 | 630 |
Ages_12-17_yrs | 0 | 20.950 | 64.20 | 73.200 | 74.7 | 667 |
Ages_18-24_yrs | 0 | 43.200 | 67.85 | 74.900 | 76.6 | 676 |
Ages_25-39_yrs | 0 | 48.575 | 69.05 | 74.700 | 75.9 | 676 |
Ages_25-49_yrs | 0 | 52.275 | 72.15 | 77.500 | 78.6 | 676 |
Ages_40-49_yrs | 0 | 58.775 | 77.55 | 82.400 | 83.4 | 676 |
Ages_50-64_yrs | 0 | 67.900 | 82.45 | 87.000 | 88.6 | 676 |
Ages_65-74_yrs | 0 | 80.200 | 89.85 | 94.100 | 95.0 | 676 |
Ages_75+\_yrs | 0 | 75.500 | 83.05 | 86.725 | 88.8 | 676 |
vac_Fage_tab2
cat | min | q1 | median | q3 | max | days_recorded |
---|---|---|---|---|---|---|
Female_Ages\_\<2yrs | 0 | 0.700 | 1.40 | 2.100 | 2.8 | 97 |
Female_Ages\_\<5yrs | 0 | 0.900 | 2.20 | 3.350 | 4.5 | 107 |
Female_Ages_02-04_yrs | 0 | 1.600 | 2.90 | 4.400 | 5.5 | 100 |
Female_Ages_05-11_yrs | 0 | 1.500 | 11.80 | 36.700 | 39.9 | 629 |
Female_Ages_12-17_yrs | 0 | 34.525 | 67.05 | 75.875 | 77.3 | 650 |
Female_Ages_18-24_yrs | 0 | 46.675 | 70.75 | 77.900 | 79.6 | 676 |
Female_Ages_25-39_yrs | 0 | 50.800 | 71.30 | 77.100 | 78.4 | 676 |
Female_Ages_25-49_yrs | 0 | 54.575 | 74.25 | 79.700 | 80.9 | 676 |
Female_Ages_40-49_yrs | 0 | 60.900 | 79.25 | 84.100 | 85.1 | 676 |
Female_Ages_50-64_yrs | 0 | 68.800 | 82.75 | 87.300 | 88.9 | 676 |
Female_Ages_65-74_yrs | 0 | 79.475 | 89.05 | 93.225 | 95.0 | 676 |
Female_Ages_75+\_yrs | 0 | 73.600 | 81.35 | 85.000 | 86.9 | 676 |
vac_Mage_tab2
cat | min | q1 | median | q3 | max | days_recorded |
---|---|---|---|---|---|---|
Male_Ages\_\<2yrs | 0 | 0.700 | 1.40 | 2.100 | 2.8 | 98 |
Male_Ages\_\<5yrs | 0 | 0.525 | 2.00 | 3.175 | 4.3 | 114 |
Male_Ages_02-04_yrs | 0 | 1.275 | 2.70 | 4.200 | 5.4 | 104 |
Male_Ages_05-11_yrs | 0 | 1.400 | 15.10 | 35.500 | 38.5 | 615 |
Male_Ages_12-17_yrs | 0 | 27.600 | 62.00 | 70.500 | 71.9 | 657 |
Male_Ages_18-24_yrs | 0 | 39.575 | 64.65 | 71.600 | 73.2 | 676 |
Male_Ages_25-39_yrs | 0 | 45.900 | 66.30 | 71.825 | 72.9 | 676 |
Male_Ages_25-49_yrs | 0 | 49.500 | 69.45 | 74.700 | 75.8 | 676 |
Male_Ages_40-49_yrs | 0 | 55.975 | 75.05 | 80.000 | 81.0 | 676 |
Male_Ages_50-64_yrs | 0 | 66.300 | 81.40 | 86.000 | 87.6 | 676 |
Male_Ages_65-74_yrs | 0 | 80.600 | 90.25 | 94.525 | 95.0 | 676 |
Male_Ages_75+\_yrs | 0 | 77.675 | 84.95 | 88.800 | 91.1 | 676 |
vac_sex_tab2
cat | min | q1 | median | q3 | max | days_recorded |
---|---|---|---|---|---|---|
Sex_Female | 0 | 49.3 | 66.20 | 73.925 | 75.8 | 676 |
Sex_Male | 0 | 45.1 | 62.55 | 70.300 | 72.1 | 676 |
All the summary figures (Figure 1-3 & 5-7) showed the similar trend mentioned above relatively. Excepting these information we gained and mentioned, we still could find that for the object age level from 5-11 years old to 65-74 years old, with the age level went up, the major part of the statistics would with higher values, which means it might take shorter time to have a relatively high vaccination rate for those people with higher age level. And the two figures (Figure 4 & 8) for the percent group by sex were still show that the females would have higher vaccination rate, making it to be reasonable that we’d better use the stratified data for analyze the association between age and vaccination rate.
#summary graphs for dose1
vac_age %>%
ggplot(aes(x=date, y=dose1_pct)) +
geom_boxplot(mapping = aes(x = cat, y = dose1_pct, fill = cat)) +
theme(axis.text.x = element_text(angle = 60, hjust = 1),
plot.caption = element_text(hjust=0.5, size=rel(1.2))) +
labs(x = "Age group", y = "Percent of people with at least one dose",
caption = "Figure 1.Percent of people with at least one dose grouped by age") +
guides(fill=guide_legend(title="Age group"))
vac_Fage %>%
ggplot(aes(x=date, y=dose1_pct)) +
geom_boxplot(mapping = aes(x = cat, y = dose1_pct, fill = cat)) +
theme(axis.text.x = element_text(angle = 60, hjust = 1),
plot.caption = element_text(hjust=0.5, size=rel(1.2))) +
labs(x = "Age group", y = "Percent of people with at least one dose",
caption = "Figure 2.Percent of people with at least one dose of females grouped by age") +
guides(fill=guide_legend(title="Age group"))
vac_Mage %>%
ggplot(aes(x=date, y=dose1_pct)) +
geom_boxplot(mapping = aes(x = cat, y = dose1_pct, fill = cat)) +
theme(axis.text.x = element_text(angle = 60, hjust = 1),
plot.caption = element_text(hjust=0.5, size=rel(1.2))) +
labs(x = "Age group", y = "Percent of people with at least one dose",
caption = "Figure 3.Percent of people with at least one dose of males grouped by age") +
guides(fill=guide_legend(title="Age group"))
vac_sex %>%
ggplot(aes(x=date, y=dose1_pct)) +
geom_boxplot(mapping = aes(x = cat, y = dose1_pct, fill = cat)) +
theme(plot.caption = element_text(hjust=0.5, size=rel(1.2))) +
labs(x = "Sex", y = "Percent of people with at least one dose",
caption = "Figure 4.Percent of people with at least one dose grouped by sex") +
guides(fill=guide_legend(title="Sex group"))
#summary graphs for series doses
vac_age %>%
ggplot(aes(x=date, y=series_pct)) +
geom_boxplot(mapping = aes(x = cat, y = dose1_pct, fill = cat)) +
theme(axis.text.x = element_text(angle = 60, hjust = 1),
plot.caption = element_text(hjust=0.5, size=rel(1.2))) +
labs(x = "Age group", y = "Percent of people completed a primary series",
caption = "Figure 5.Percent of people completed a primary series grouped by age") +
guides(fill=guide_legend(title="Age group"))
vac_Fage %>%
ggplot(aes(x=date, y=series_pct)) +
geom_boxplot(mapping = aes(x = cat, y = dose1_pct, fill = cat)) +
theme(axis.text.x = element_text(angle = 60, hjust = 1),
plot.caption = element_text(hjust=0.5, size=rel(1.2))) +
labs(x = "Age group", y = "Percent of people completed a primary series",
caption = "Figure 6.Percent of people completed a primary series of females grouped by age") +
guides(fill=guide_legend(title="Age group"))
vac_Mage %>%
ggplot(aes(x=date, y=series_pct)) +
geom_boxplot(mapping = aes(x = cat, y = dose1_pct, fill = cat)) +
theme(axis.text.x = element_text(angle = 60, hjust = 1),
plot.caption = element_text(hjust=0.5, size=rel(1.2))) +
labs(x = "Age group", y = "Percent of people completed a primary series",
caption = "Figure 7.Percent of people completed a primary series of males grouped by age") +
guides(fill=guide_legend(title="Age group"))
vac_sex %>%
ggplot(aes(x=date, y=series_pct)) +
geom_boxplot(mapping = aes(x = cat, y = dose1_pct, fill = cat)) +
theme(plot.caption = element_text(hjust=0.5, size=rel(1.2))) +
labs(x = "Sex", y = "Percent of people completed a primary seriese",
caption = "Figure 8.Percent of people completed a primary seriese grouped by sex") +
guides(fill=guide_legend(title="Sex group"))
These 4 figures (Figure 9-12) all verified the trend that for both 2 kinds of vaccination status(take at least one dose & with completed series) and both sex, the vaccination rate would be higher with the age level being higher for the same time point, and the objects with higher age might take shorter time to have a relatively high vaccination rate. Which needs to be mentioned is that, this trend was also observed in the low age-level group(<5 years), but with obviously lower vaccination rate than the major part of the sample objects.
#visualization for first dose
#vac_age %>%
# ggplot(aes(x = date, y = dose1_pct)) +
# geom_point(mapping = aes(x = date, y = dose1_pct, color = cat)) +
# scale_x_discrete(breaks=
# c("20201213","20210601","20211201","20220601","20221019")) +
# theme(plot.caption = element_text(hjust=0.5, size=rel(1.2))) +
# labs(x = "Date(yyyymmdd)", y = "Percent of people with at least one dose", col="Age group",
# caption = "Figure 9.2020-2022 Percent of people with at least one dose grouped by age") +
# guides(fill=guide_legend(title="Age group"))
#vac_sex %>%
# ggplot(aes(x = date, y = dose1_pct)) +
# geom_point(mapping = aes(x = date, y = dose1_pct, color = cat)) +
# scale_x_discrete(breaks=
# c("20201213","20210601","20211201","20220601","20221019")) +
# theme(plot.caption = element_text(hjust=0.5, size=rel(1.2))) +
# labs(x = "Date(yyyymmdd)", y = "Percent of people with at least one dose", col="Sex group",
# caption = "Figure 8.Percent of people completed a primary seriese grouped by sex") +
# guides(fill=guide_legend(title="Sex group"))
vac_Fage %>%
ggplot(aes(x = date, y = dose1_pct)) +
geom_point(mapping = aes(x = date, y = dose1_pct, color = cat)) +
scale_x_discrete(breaks=
c("20201213","20210601","20211201","20220601","20221019")) +
theme(plot.caption = element_text(hjust=0.5, size=rel(1.2))) +
labs(x = "Date(yyyymmdd)", y = "Percent of people with at least one dose", col="Age group", caption = "Figure 9.2020-2022 Percent of people with at least one dose of females grouped by age") +
guides(fill=guide_legend(title="Age group"))
vac_Mage %>%
ggplot(aes(x=date, y=dose1_pct)) +
geom_point(mapping = aes(x = date, y = dose1_pct, color = cat)) +
theme(plot.caption = element_text(hjust=0.5, size=rel(1.2))) +
scale_x_discrete(breaks=
c("20201213","20210601","20211201","20220601","20221019")) +
labs(x = "Date(yyyymmdd)", y = "Percent of people with at least one dose", col="Age group", caption = "Figure 10.2020-2022 Percent of people with at least one dose of males grouped by age") +
scale_fill_discrete(name = "Age group")
#visualization for series dose
#vac_age %>%
# ggplot(aes(x=date, y=series_pct)) +
# geom_point(mapping = aes(x = date, y = dose1_pct, color = cat)) +
# theme(plot.caption = element_text(hjust=0.5, size=rel(1.2))) +
# scale_x_discrete(breaks=
# c("20201213","20210601","20211201","20220601","20221019")) +
# labs(x = "Date(yyyymmdd)", y = "Percent of people completed a primary series", col="Age group",
# caption = "Figure 13.2020-2022 Percent of people completed a primary series grouped by age") +
# guides(fill=guide_legend(title="Age group"))
#vac_sex %>%
# ggplot(aes(x=date, y=series_pct)) +
# geom_point(mapping = aes(x = date, y = dose1_pct, color = cat)) +
# theme(plot.caption = element_text(hjust=0.5, size=rel(1.2))) +
# scale_x_discrete(breaks=
# c("20201213","20210601","20211201","20220601","20221019")) +
# labs(x = "Date(yyyymmdd)", y = "Percent of people completed a primary series", col="Sex group",
# caption = "Figure 14.2020-2022 Percent of people completed a primary series grouped by sex") +
# guides(fill=guide_legend(title="Sex group"))
vac_Fage %>%
ggplot(aes(x=date, y=series_pct)) +
geom_point(mapping = aes(x = date, y = dose1_pct, color = cat)) +
theme(plot.caption = element_text(hjust=0.5, size=rel(1.2))) +
scale_x_discrete(breaks=
c("20201213","20210601","20211201","20220601","20221019")) +
labs(x = "Date(yyyymmdd)", y = "Percent of people completed a primary series", col="Age group",
caption = "Figure 11.2020-2022 Percent of people completed a primary series of females grouped by age") +
guides(fill=guide_legend(title="Age group"))
vac_Mage %>%
ggplot(aes(x=date, y=series_pct)) +
geom_point(mapping = aes(x = date, y = dose1_pct, color = cat)) +
theme(plot.caption = element_text(hjust=0.5, size=rel(1.2))) +
scale_x_discrete(breaks=
c("20201213","20210601","20211201","20220601","20221019")) +
labs(x = "Date(yyyymmdd)", y = "Percent of people completed a primary series", col="Age group",
caption = "Figure 12.2020-2022 Percent of people completed a primary series of males grouped by age") +
guides(fill=guide_legend(title="Age group"))
We could believe that there could be an association between age and the two vaccination status (at least one dose & completed a primary series) in California state.For both 2 kinds of vaccination status(take at least one dose & with completed series) and both sex, the vaccination rate would be higher with the age level being higher for the same time point, and the objects with higher age might also take shorter time to have a relatively high vaccination rate. And the final vaccination rate would be higher with the age level being higher, but the rate for people with age less than 5 years old would keep in a low level, even though they follow the same trend mentioned above.