
PM566_Midterm

Yating Zeng 2022-10-23

Introduction

COVID-19 has been with us for about three years, and vaccines are now widely available. Some groups of people, however, may be less likely than others to get vaccinated. The question of interest in this project is therefore: what is the association between age and two vaccination statuses (at least one dose, and a completed primary series) in California? To answer it, I use a COVID-19 vaccination dataset from the Centers for Disease Control and Prevention (CDC) website. It provides data for selected demographic characteristics (age, sex, and age by sex) of people receiving COVID-19 vaccinations in the United States at the national and jurisdictional levels, which fits this question well. All values in the dataset are cumulative, counted from the date data collection began.

Methods

1. Dataset

The dataset used in this project is a public resource from the CDC website named “COVID-19 Vaccination Age and Sex Trends in the United States, National and Jurisdictional”, available at https://data.cdc.gov/Vaccinations/COVID-19-Vaccination-Age-and-Sex-Trends-in-the-Uni/5i5k-6cmh. The CSV file of the data was downloaded and read into RStudio for analysis.

library(readr)
## Warning: package 'readr' was built under R version 4.1.2
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.1.2

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.5     ✔ dplyr   1.0.9
## ✔ tibble  3.1.6     ✔ stringr 1.4.0
## ✔ tidyr   1.2.0     ✔ forcats 0.5.2
## ✔ purrr   0.3.4

## Warning: package 'tidyr' was built under R version 4.1.2

## Warning: package 'purrr' was built under R version 4.1.2

## Warning: package 'dplyr' was built under R version 4.1.2

## Warning: package 'forcats' was built under R version 4.1.2

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(dplyr)
library(stringr)
library(kableExtra)
## Warning: package 'kableExtra' was built under R version 4.1.2

## 
## Attaching package: 'kableExtra'
## 
## The following object is masked from 'package:dplyr':
## 
##     group_rows
#read in the dataset
#if the CSV is not present locally, download the CSV export of the CDC dataset linked above
#(public datasets on the CDC portal can be downloaded without credentials)

if (!file.exists("COVID19_Vaccination.csv")){
  download.file(
    "https://data.cdc.gov/api/views/5i5k-6cmh/rows.csv?accessType=DOWNLOAD",
    destfile = "COVID19_Vaccination.csv"
  )
}

vaccination <- read_csv("COVID19_Vaccination.csv")
## Rows: 1744080 Columns: 12
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): Date, Location, Demographic_Category
## dbl (9): census, Administered_Dose1, Series_Complete_Yes, Booster_Doses, Sec...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

2. Data cleaning, wrangling and EDA

After checking a summary of the dataset, I knew its dimensions and the original type of each variable. I filtered the data into a new dataset containing only California records, and renamed 7 columns to shorter names to simplify typing during the analysis. I then checked the proportion of missing values in each column and the levels of the “Demographic_Category” column. Because age and primary-series vaccination status are the main factors in this analysis, I removed the booster information, the “Age_unknown” level, and all the “65+” levels of “Demographic_Category” (these overlap the 65-74 and 75+ levels), as well as the rows with missing values in “dose1” (count of people with at least one dose) and “census” (census population used to calculate vaccination percentages).
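The filtering rules described above can be sketched in base R on a small made-up data frame. The column names mimic the renamed columns in the code below, only one booster column is included to keep it short, and the helper `clean_vaccination()` is hypothetical; this illustrates the rules under those assumptions and is not the code used for the analysis.

```r
# Made-up rows whose column names mimic the renamed CDC columns (values are invented)
toy <- data.frame(
  Location      = c("CA", "CA", "CA", "CA", "NY"),
  cat           = c("Ages_12-17_yrs", "Age_Unknown", "Ages_65+_yrs",
                    "Ages_18-24_yrs", "Ages_12-17_yrs"),
  dose1         = c(100, 5, 300, NA, 80),
  census        = c(1000, NA, 400, 900, 800),
  Booster_Doses = c(10, 1, 50, 20, 8),
  stringsAsFactors = FALSE
)

# Hypothetical helper applying the cleaning rules in the same order as the text
clean_vaccination <- function(df) {
  df <- df[df$Location == "CA", ]                   # keep California only
  df <- df[df$cat != "Age_Unknown", ]               # drop the unknown-age level
  df <- df[!grepl("65+", df$cat, fixed = TRUE), ]   # drop "65+" levels but keep "65-74"
  df <- df[!is.na(df$dose1) & !is.na(df$census), ]  # drop rows missing dose1 or census
  df[, setdiff(names(df), "Booster_Doses"), drop = FALSE]  # drop the booster column
}

clean <- clean_vaccination(toy)
print(clean)
```

Using `fixed = TRUE` matches the literal string "65+", so the 65-74 level is kept while the overlapping 65+ levels are dropped.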

Because the data form a time series and all the statistics are cumulative, a new variable “date” was created so the data could be reordered by recording date. Based on the categories of the “Demographic_Category” variable (renamed “cat”), the dataset was split into 4 subsets: 1. both sexes, categorized by age group only; 2. females, categorized by age group; 3. males, categorized by age group; and 4. both sexes, categorized by sex only.
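The code below builds the “date” key as a yyyymmdd string from substrings of `Date`. An alternative, sketched here under the assumption (inferred from the `substr()` positions in that code) that `Date` starts with `MM/DD/YYYY`, is to parse it with base R's `as.Date()`; formatting the parsed dates with `"%Y%m%d"` reproduces the same sort key. The timestamps are made up. The last lines mirror the first-letter split into the 4 subsets.

```r
# Made-up timestamps, assumed to start with MM/DD/YYYY like the dataset's Date column
raw_dates <- c("10/19/2022 12:00:00 AM", "12/13/2020 12:00:00 AM", "06/01/2021 12:00:00 AM")

# Parse the first 10 characters as real Date values, then sort chronologically
parsed  <- as.Date(substr(raw_dates, 1, 10), format = "%m/%d/%Y")
ordered <- sort(parsed)

# The yyyymmdd key used on the x-axis of the scatterplots
date_key <- format(ordered, "%Y%m%d")
print(date_key)

# Split category labels into the 4 subsets by their first letter
cats   <- c("Ages_12-17_yrs", "Female_Ages_12-17_yrs", "Male_Ages_12-17_yrs", "Sex_Female")
groups <- split(cats, substr(cats, 1, 1))
print(groups)
```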

In total, 8 summary tables and 8 summary figures (boxplots) were planned, one for each combination of the 2 vaccination statuses (“at least one dose” and “completed a primary series”) and the 4 category groups (“age”, “female_age”, “male_age”, and “sex”). Each shows the minimum, 1st quartile, median, 3rd quartile, maximum, and number of days recorded for the percentage of people with the given vaccination status, grouped by age, by sex, or by age within each sex. Sex-stratified data were used to remove possible confounding by sex in the association between vaccination status and age group. Finally, to examine the association between age and vaccination status over time, 8 grouped scatterplots were planned, using the same grouping approach as the summary tables and figures.
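Each summary table reports, for every group, the minimum, quartiles, median, maximum, and number of days recorded. A base-R sketch of that per-group summary on made-up cumulative percentages for two age groups is shown below; the helper `summarise_pct()` is hypothetical. `quantile()` uses R's default method (type 7), which is also what the `dplyr` code in the Results section calls.

```r
# Made-up cumulative percentages for two age groups (5 days each)
pct <- c(0, 5, 20, 50, 60,    0, 10, 40, 80, 90)
grp <- rep(c("Ages_18-24_yrs", "Ages_65-74_yrs"), each = 5)

# Hypothetical helper: five-number summary plus the count of non-missing days
summarise_pct <- function(x) {
  q <- quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE, names = FALSE)
  c(min = q[1], q1 = q[2], median = q[3], q3 = q[4], max = q[5],
    days_recorded = sum(!is.na(x)))
}

# One row per group, one column per statistic
tab <- t(sapply(split(pct, grp), summarise_pct))
print(tab)
```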

#select only the data of CA
ca_vac <- vaccination[which(vaccination$Location == "CA"), ]
#str(ca_vac)

#reorder the dataset by Demographic_Category and then Date
#(pass the columns as vectors: ca_vac[,3] on a tibble is a one-column data frame,
# which order() cannot sort and which triggers the "cannot xtfrm data frames" warning)
ca_vac <- ca_vac[order(ca_vac$Demographic_Category, ca_vac$Date), ]
#head(ca_vac)
#simplified the variable names
colnames(ca_vac)[3]  <- "cat"
colnames(ca_vac)[5]  <- "dose1"
colnames(ca_vac)[6]  <- "series"
colnames(ca_vac)[9]  <- "dose1_pct"
colnames(ca_vac)[10] <- "series_pct"
colnames(ca_vac)[11] <- "booster_pct"
colnames(ca_vac)[12] <- "secbooster_pct"
#checking the proportion of missing values
#(colMeans(is.na(ca_vac)))*100
#drop the unknown-age level and the rows with missing dose1 or census counts
vac <- ca_vac[which(ca_vac$cat != "Age_Unknown"), ]
vac <- vac %>%
  filter(!is.na(dose1), !is.na(census))

#check the missing value again
#(colMeans(is.na(vac)))*100
#check about the "Demographic_Category", "Dose1_pct",and "series_pct"
#unique(vac$cat)
#summary(vac$dose1_pct)
#summary(vac$series_pct)
#create new date-part variables (Date is "MM/DD/YYYY" followed by a time, so keep the first 10 characters)
vac$Date  <- substr(vac$Date, 1, 10)
vac$year  <- substr(vac$Date, 7, 10)
vac$month <- substr(vac$Date, 1, 2)
vac$day   <- substr(vac$Date, 4, 5)
#sort the data by date 
vac1 <- vac[with(vac, order(year, month, day)), ]

#create a new "date" key (character, yyyymmdd) whose alphabetical order matches the time order
vac1 <- mutate(vac1, date = paste0(year, month, day))

vac1$year  <- as.numeric(vac1$year)
vac1$month <- as.numeric(vac1$month)
vac1$day   <- as.numeric(vac1$day)
#remove information of no interest: the booster columns and the "65+" levels
vac1 <- subset(vac1, select = -c(Booster_Doses, Second_Booster, booster_pct, secbooster_pct))
vac1 <- vac1 %>%
  filter(cat != "Ages_65+_yrs",
         cat != "Female_Ages_65+_yrs",
         cat != "Male_Ages_65+_yrs")

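#levels such as "5-11" and "2-4" sort after "12-17", because labels are compared character by character

#zero-pad the single-digit age bounds ("5-11" -> "05-11", "2-4" -> "02-04") so the levels sort in age order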
vac1$cat <- str_replace_all(vac1$cat, fixed("Female_Ages_5-11_yrs"), "Female_Ages_05-11_yrs")
vac1$cat <- str_replace_all(vac1$cat, fixed("Male_Ages_5-11_yrs"),   "Male_Ages_05-11_yrs")
vac1$cat <- str_replace_all(vac1$cat, fixed("Ages_5-11_yrs"),        "Ages_05-11_yrs")
vac1$cat <- str_replace_all(vac1$cat, fixed("Female_Ages_2-4_yrs"),  "Female_Ages_02-04_yrs")
vac1$cat <- str_replace_all(vac1$cat, fixed("Male_Ages_2-4_yrs"),    "Male_Ages_02-04_yrs")
vac1$cat <- str_replace_all(vac1$cat, fixed("Ages_2-4_yrs"),         "Ages_02-04_yrs")
#split the data by "cat" level into 4 subsets: "age"; "Female_age"; "Male_age"; "sex"
#the first letter of each level identifies its subset (A = Ages, F = Female, M = Male, S = Sex)
vac1$CAT <- substr(vac1$cat, 1, 1)
vac_age  <- vac1 %>% filter(CAT == "A")
vac_Fage <- vac1 %>% filter(CAT == "F")
vac_Mage <- vac1 %>% filter(CAT == "M")
vac_sex  <- vac1 %>% filter(CAT == "S")
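The zero-padding step in the cleaning code can be illustrated on a few labels. Sorting compares strings character by character, so "2-4" and "5-11" land after "12-17"; `method = "radix"` is used here so the result does not depend on the system locale. The last lines show an alternative that orders labels by their numeric lower age bound, parsed with a regular expression; this is an illustration only, not what the analysis code does.

```r
age_labels <- c("Ages_12-17_yrs", "Ages_5-11_yrs", "Ages_2-4_yrs", "Ages_50-64_yrs")

# Character-by-character (C-locale) sort: "2-4" and "5-11" come after "12-17"
print(sort(age_labels, method = "radix"))

# Zero-pad the single-digit bounds, as the cleaning code does with str_replace_all()
padded <- sub("Ages_5-11_yrs", "Ages_05-11_yrs", age_labels, fixed = TRUE)
padded <- sub("Ages_2-4_yrs", "Ages_02-04_yrs", padded, fixed = TRUE)
print(sort(padded, method = "radix"))

# Alternative: order by the numeric lower bound of each age range
lower <- as.numeric(sub("^Ages_([0-9]+)-.*$", "\\1", age_labels))
print(age_labels[order(lower)])
```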

Results

1. Summary tables

The summary table of the percentage of people with at least one dose grouped by age (Table 1) shows that, with a few exceptions (for example, the 75+ group is slightly below the 65-74 group), the statistics (minimum, 1st quartile, median, 3rd quartile, and maximum) of the percentage of people with at least one dose are larger in older age groups. For most of the population, aged from 12-17 years to 75+ years, the final vaccination rate reached about 84%-95%, while for children younger than 5 years it stayed at or below about 10%, and for children aged 5-11 years it was in between, at about 46%. This trend is consistent with the summary tables of the percentage of people with at least one dose by age, stratified by sex (Tables 2 & 3).

The summary table of the percentage of people who completed a primary series grouped by age (Table 5) shows a similar trend between vaccination status and age group, but the statistics are lower than those for people with at least one dose. For people aged 12-17 years to 75+ years, the final rate reached about 75%-89%, with the 65-74 years group reaching 95%. For children younger than 5 years the rate stayed at or below about 5%, and for children aged 5-11 years it was in between, at about 40%. This trend is consistent with the summary tables of the percentage of people who completed a primary series by age, stratified by sex (Tables 6 & 7).
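The gap between the two statuses can be quantified directly from the table values. The sketch below copies the maximum (for cumulative data, essentially the latest value) from Table 1 and Table 5 for the non-overlapping age groups aged 12 and older, and takes the difference in percentage points. Many first-dose maxima equal 95.0, which looks like an upper cap in the data, so the gaps for the oldest groups may be understated.

```r
age_groups <- c("12-17", "18-24", "25-39", "40-49", "50-64", "65-74", "75+")

# Maximum percentages copied from Table 1 (at least one dose) and Table 5 (primary series)
max_dose1  <- c(84.0, 90.7, 88.2, 94.8, 95.0, 95.0, 95.0)
max_series <- c(74.7, 76.6, 75.9, 83.4, 88.6, 95.0, 88.8)

# Percentage-point gap between a first dose and a completed primary series
gap <- setNames(round(max_dose1 - max_series, 1), age_groups)
print(gap)
```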

The summary tables of the two vaccination statuses grouped by sex (Tables 4 & 8) show that every statistic except the minimum is larger for females than for males. The final rate of people with at least one dose exceeded 80% for both sexes (86.6% for females, 83.0% for males), while the final rate of people who completed a primary series exceeded 70% (75.8% and 72.1%).
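The female-male difference can likewise be computed from Tables 4 and 8. The vectors below copy the five summary statistics for each sex, and the difference is in percentage points.

```r
stats <- c("min", "q1", "median", "q3", "max")

# Values copied from Table 4 (at least one dose) and Table 8 (primary series)
dose1_female  <- c(0, 59.1, 75.0, 84.2, 86.6)
dose1_male    <- c(0, 54.7, 71.3, 80.6, 83.0)
series_female <- c(0, 49.3, 66.20, 73.925, 75.8)
series_male   <- c(0, 45.1, 62.55, 70.300, 72.1)

# Female minus male, per statistic
sex_gap <- rbind(
  dose1  = setNames(dose1_female - dose1_male, stats),
  series = setNames(series_female - series_male, stats)
)
print(round(sex_gap, 3))
```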

# summary tables for dose1 (na.rm = TRUE in every statistic, matching the series tables)
vac_age_tab <- vac_age %>% group_by(cat) %>%
                   summarise(
                     min           = min(dose1_pct, na.rm = TRUE),
                     q1            = quantile(dose1_pct, 0.25, na.rm = TRUE),
                     median        = median(dose1_pct, na.rm = TRUE),
                     q3            = quantile(dose1_pct, 0.75, na.rm = TRUE),
                     max           = max(dose1_pct, na.rm = TRUE),
                     days_recorded = sum(!is.na(dose1_pct))
                     ) %>% arrange(cat) %>%
                                    kbl(caption =
             "Table 1.Summary of Percent of people with at least one dose grouped by age") %>%
                                      kable_styling()

vac_Fage_tab <- vac_Fage %>% group_by(cat) %>%
                   summarise(
                     min           = min(dose1_pct, na.rm = TRUE),
                     q1            = quantile(dose1_pct, 0.25, na.rm = TRUE),
                     median        = median(dose1_pct, na.rm = TRUE),
                     q3            = quantile(dose1_pct, 0.75, na.rm = TRUE),
                     max           = max(dose1_pct, na.rm = TRUE),
                     days_recorded = sum(!is.na(dose1_pct))
                     ) %>% arrange(cat) %>%
                                    kbl(caption =
            "Table 2.Summary of Percent of people with at least one dose in females grouped by age") %>%
                                      kable_styling()

vac_Mage_tab <- vac_Mage %>% group_by(cat) %>%
                   summarise(
                     min           = min(dose1_pct, na.rm = TRUE),
                     q1            = quantile(dose1_pct, 0.25, na.rm = TRUE),
                     median        = median(dose1_pct, na.rm = TRUE),
                     q3            = quantile(dose1_pct, 0.75, na.rm = TRUE),
                     max           = max(dose1_pct, na.rm = TRUE),
                     days_recorded = sum(!is.na(dose1_pct))
                     ) %>% arrange(cat) %>%
                                    kbl(caption =
            "Table 3.Summary of Percent of people with at least one dose in males grouped by age") %>%
                                      kable_styling()

vac_sex_tab <- vac_sex %>% group_by(cat) %>%
                   summarise(
                     min           = min(dose1_pct, na.rm = TRUE),
                     q1            = quantile(dose1_pct, 0.25, na.rm = TRUE),
                     median        = median(dose1_pct, na.rm = TRUE),
                     q3            = quantile(dose1_pct, 0.75, na.rm = TRUE),
                     max           = max(dose1_pct, na.rm = TRUE),
                     days_recorded = sum(!is.na(dose1_pct))
                     ) %>% arrange(cat) %>%
                                    kbl(caption =
            "Table 4.Summary of Percent of people with at least one dose grouped by sex") %>%
                                      kable_styling()

vac_age_tab
Table 1.Summary of Percent of people with at least one dose grouped by age
| cat | min | q1 | median | q3 | max | days_recorded |
|:---|---:|---:|---:|---:|---:|---:|
| Ages_<2yrs | 0 | 3.725 | 5.20 | 6.200 | 7.1 | 122 |
| Ages_<5yrs | 0 | 4.725 | 7.00 | 8.200 | 9.1 | 122 |
| Ages_02-04_yrs | 0 | 5.350 | 8.15 | 9.475 | 10.4 | 122 |
| Ages_05-11_yrs | 0 | 1.500 | 16.10 | 42.700 | 46.2 | 667 |
| Ages_12-17_yrs | 0 | 35.700 | 72.80 | 82.300 | 84.0 | 676 |
| Ages_18-24_yrs | 0 | 54.100 | 77.65 | 88.325 | 90.7 | 676 |
| Ages_25-39_yrs | 0 | 58.300 | 78.05 | 86.400 | 88.2 | 676 |
| Ages_25-49_yrs | 0 | 62.300 | 81.05 | 89.000 | 90.6 | 676 |
| Ages_40-49_yrs | 0 | 69.375 | 86.35 | 93.400 | 94.8 | 676 |
| Ages_50-64_yrs | 0 | 78.200 | 91.35 | 95.000 | 95.0 | 676 |
| Ages_65-74_yrs | 0 | 90.900 | 95.00 | 95.000 | 95.0 | 676 |
| Ages_75+_yrs | 0 | 85.700 | 93.35 | 95.000 | 95.0 | 676 |
vac_Fage_tab
Table 2.Summary of Percent of people with at least one dose in females grouped by age
| cat | min | q1 | median | q3 | max | days_recorded |
|:---|---:|---:|---:|---:|---:|---:|
| Female_Ages_<2yrs | 0 | 3.725 | 5.30 | 6.275 | 7.1 | 122 |
| Female_Ages_<5yrs | 0 | 4.750 | 7.10 | 8.275 | 9.1 | 122 |
| Female_Ages_02-04_yrs | 0 | 5.425 | 8.20 | 9.575 | 10.5 | 122 |
| Female_Ages_05-11_yrs | 0 | 1.700 | 18.20 | 43.500 | 46.9 | 661 |
| Female_Ages_12-17_yrs | 0 | 37.400 | 75.15 | 84.800 | 86.6 | 676 |
| Female_Ages_18-24_yrs | 0 | 57.600 | 80.35 | 91.300 | 93.7 | 676 |
| Female_Ages_25-39_yrs | 0 | 60.600 | 80.25 | 88.700 | 90.5 | 676 |
| Female_Ages_25-49_yrs | 0 | 64.600 | 83.05 | 90.900 | 92.5 | 676 |
| Female_Ages_40-49_yrs | 0 | 71.500 | 87.90 | 94.700 | 95.0 | 676 |
| Female_Ages_50-64_yrs | 0 | 79.000 | 91.45 | 95.000 | 95.0 | 676 |
| Female_Ages_65-74_yrs | 0 | 90.100 | 95.00 | 95.000 | 95.0 | 676 |
| Female_Ages_75+_yrs | 0 | 83.800 | 91.25 | 95.000 | 95.0 | 676 |
vac_Mage_tab
Table 3.Summary of Percent of people with at least one dose in males grouped by age
| cat | min | q1 | median | q3 | max | days_recorded |
|:---|---:|---:|---:|---:|---:|---:|
| Male_Ages_<2yrs | 0 | 3.700 | 5.20 | 6.200 | 7.0 | 122 |
| Male_Ages_<5yrs | 0 | 4.650 | 6.90 | 8.100 | 9.0 | 122 |
| Male_Ages_02-04_yrs | 0 | 5.250 | 8.05 | 9.375 | 10.2 | 122 |
| Male_Ages_05-11_yrs | 0 | 1.675 | 17.90 | 42.000 | 45.4 | 660 |
| Male_Ages_12-17_yrs | 0 | 33.975 | 70.35 | 79.600 | 81.3 | 676 |
| Male_Ages_18-24_yrs | 0 | 50.375 | 74.55 | 84.900 | 87.2 | 676 |
| Male_Ages_25-39_yrs | 0 | 55.475 | 75.15 | 83.300 | 85.0 | 676 |
| Male_Ages_25-49_yrs | 0 | 59.300 | 78.25 | 86.000 | 87.6 | 676 |
| Male_Ages_40-49_yrs | 0 | 66.300 | 83.75 | 91.000 | 92.3 | 676 |
| Male_Ages_50-64_yrs | 0 | 76.575 | 90.15 | 95.000 | 95.0 | 676 |
| Male_Ages_65-74_yrs | 0 | 91.100 | 95.00 | 95.000 | 95.0 | 676 |
| Male_Ages_75+_yrs | 0 | 87.700 | 95.00 | 95.000 | 95.0 | 676 |
vac_sex_tab
Table 4.Summary of Percent of people with at least one dose grouped by sex
| cat | min | q1 | median | q3 | max | days_recorded |
|:---|---:|---:|---:|---:|---:|---:|
| Sex_Female | 0 | 59.1 | 75.0 | 84.2 | 86.6 | 676 |
| Sex_Male | 0 | 54.7 | 71.3 | 80.6 | 83.0 | 676 |
# summary table for series dose
vac_age_tab2 <- vac_age %>% group_by(cat) %>%
                   summarise(
                     min    = min(series_pct, na.rm = T),
                     q1     = quantile(series_pct, 0.25, na.rm = T),
                     median = median(series_pct, na.rm = T),
                     q3     = quantile(series_pct, 0.75, na.rm = T),
                     max    = max(series_pct, na.rm = T), 
                     days_recorded    = sum(!is.na(series_pct))
                     ) %>%  arrange(cat) %>% 
                                    kbl(caption = 
            "Table 5.Summary of Percent of people completed a primary series grouped by age") %>% 
                                      kable_styling()

vac_Fage_tab2 <- vac_Fage %>% group_by(cat) %>%
                   summarise(
                     min    = min(series_pct, na.rm = T),
                     q1     = quantile(series_pct, 0.25, na.rm = T),
                     median = median(series_pct, na.rm = T),
                     q3     = quantile(series_pct, 0.75, na.rm = T),
                     max    = max(series_pct, na.rm = T), 
                     days_recorded    = sum(!is.na(series_pct))
                     ) %>%  arrange(cat) %>% 
                                    kbl(caption = 
         "Table 6.Summary of Percent of people completed a primary series in females grouped by age") %>% 
                                      kable_styling()

vac_Mage_tab2 <- vac_Mage %>% group_by(cat) %>%
                   summarise(
                     min    = min(series_pct, na.rm = T),
                     q1     = quantile(series_pct, 0.25, na.rm = T),
                     median = median(series_pct, na.rm = T),
                     q3     = quantile(series_pct, 0.75, na.rm = T),
                     max    = max(series_pct, na.rm = T), 
                     days_recorded    = sum(!is.na(series_pct))
                     ) %>%  arrange(cat) %>% 
                              kbl(caption = 
           "Table 7.Summary of Percent of people completed a primary series in males grouped by age") %>% 
                                  kable_styling()

vac_sex_tab2 <- vac_sex %>% group_by(cat) %>%
                   summarise(
                     min    = min(series_pct, na.rm = T),
                     q1     = quantile(series_pct, 0.25, na.rm = T),
                     median = median(series_pct, na.rm = T),
                     q3     = quantile(series_pct, 0.75, na.rm = T),
                     max    = max(series_pct, na.rm = T), 
                     days_recorded    = sum(!is.na(series_pct))
                     ) %>%  arrange(cat) %>% 
                               kbl(caption = 
           "Table 8.Summary of Percent of people completed a primary series grouped by sex") %>%
                                  kable_styling()

vac_age_tab2
Table 5.Summary of Percent of people completed a primary series grouped by age
| cat | min | q1 | median | q3 | max | days_recorded |
|:---|---:|---:|---:|---:|---:|---:|
| Ages_<2yrs | 0 | 0.700 | 1.40 | 2.100 | 2.8 | 99 |
| Ages_<5yrs | 0 | 0.400 | 2.00 | 3.200 | 4.4 | 116 |
| Ages_02-04_yrs | 0 | 0.725 | 2.50 | 3.975 | 5.4 | 114 |
| Ages_05-11_yrs | 0 | 1.400 | 11.25 | 36.100 | 39.2 | 630 |
| Ages_12-17_yrs | 0 | 20.950 | 64.20 | 73.200 | 74.7 | 667 |
| Ages_18-24_yrs | 0 | 43.200 | 67.85 | 74.900 | 76.6 | 676 |
| Ages_25-39_yrs | 0 | 48.575 | 69.05 | 74.700 | 75.9 | 676 |
| Ages_25-49_yrs | 0 | 52.275 | 72.15 | 77.500 | 78.6 | 676 |
| Ages_40-49_yrs | 0 | 58.775 | 77.55 | 82.400 | 83.4 | 676 |
| Ages_50-64_yrs | 0 | 67.900 | 82.45 | 87.000 | 88.6 | 676 |
| Ages_65-74_yrs | 0 | 80.200 | 89.85 | 94.100 | 95.0 | 676 |
| Ages_75+_yrs | 0 | 75.500 | 83.05 | 86.725 | 88.8 | 676 |
vac_Fage_tab2
Table 6.Summary of Percent of people completed a primary series in females grouped by age
| cat | min | q1 | median | q3 | max | days_recorded |
|:---|---:|---:|---:|---:|---:|---:|
| Female_Ages_<2yrs | 0 | 0.700 | 1.40 | 2.100 | 2.8 | 97 |
| Female_Ages_<5yrs | 0 | 0.900 | 2.20 | 3.350 | 4.5 | 107 |
| Female_Ages_02-04_yrs | 0 | 1.600 | 2.90 | 4.400 | 5.5 | 100 |
| Female_Ages_05-11_yrs | 0 | 1.500 | 11.80 | 36.700 | 39.9 | 629 |
| Female_Ages_12-17_yrs | 0 | 34.525 | 67.05 | 75.875 | 77.3 | 650 |
| Female_Ages_18-24_yrs | 0 | 46.675 | 70.75 | 77.900 | 79.6 | 676 |
| Female_Ages_25-39_yrs | 0 | 50.800 | 71.30 | 77.100 | 78.4 | 676 |
| Female_Ages_25-49_yrs | 0 | 54.575 | 74.25 | 79.700 | 80.9 | 676 |
| Female_Ages_40-49_yrs | 0 | 60.900 | 79.25 | 84.100 | 85.1 | 676 |
| Female_Ages_50-64_yrs | 0 | 68.800 | 82.75 | 87.300 | 88.9 | 676 |
| Female_Ages_65-74_yrs | 0 | 79.475 | 89.05 | 93.225 | 95.0 | 676 |
| Female_Ages_75+_yrs | 0 | 73.600 | 81.35 | 85.000 | 86.9 | 676 |
vac_Mage_tab2
Table 7.Summary of Percent of people completed a primary series in males grouped by age
| cat | min | q1 | median | q3 | max | days_recorded |
|:---|---:|---:|---:|---:|---:|---:|
| Male_Ages_<2yrs | 0 | 0.700 | 1.40 | 2.100 | 2.8 | 98 |
| Male_Ages_<5yrs | 0 | 0.525 | 2.00 | 3.175 | 4.3 | 114 |
| Male_Ages_02-04_yrs | 0 | 1.275 | 2.70 | 4.200 | 5.4 | 104 |
| Male_Ages_05-11_yrs | 0 | 1.400 | 15.10 | 35.500 | 38.5 | 615 |
| Male_Ages_12-17_yrs | 0 | 27.600 | 62.00 | 70.500 | 71.9 | 657 |
| Male_Ages_18-24_yrs | 0 | 39.575 | 64.65 | 71.600 | 73.2 | 676 |
| Male_Ages_25-39_yrs | 0 | 45.900 | 66.30 | 71.825 | 72.9 | 676 |
| Male_Ages_25-49_yrs | 0 | 49.500 | 69.45 | 74.700 | 75.8 | 676 |
| Male_Ages_40-49_yrs | 0 | 55.975 | 75.05 | 80.000 | 81.0 | 676 |
| Male_Ages_50-64_yrs | 0 | 66.300 | 81.40 | 86.000 | 87.6 | 676 |
| Male_Ages_65-74_yrs | 0 | 80.600 | 90.25 | 94.525 | 95.0 | 676 |
| Male_Ages_75+_yrs | 0 | 77.675 | 84.95 | 88.800 | 91.1 | 676 |
vac_sex_tab2
Table 8.Summary of Percent of people completed a primary series grouped by sex
| cat | min | q1 | median | q3 | max | days_recorded |
|:---|---:|---:|---:|---:|---:|---:|
| Sex_Female | 0 | 49.3 | 66.20 | 73.925 | 75.8 | 676 |
| Sex_Male | 0 | 45.1 | 62.55 | 70.300 | 72.1 | 676 |

2. Summary figures

All the summary figures grouped by age (Figures 1-3 & 5-7) show the same trend described above. In addition, for age groups from 5-11 years to 65-74 years, most of the statistics take higher values as age increases; because the data are cumulative, this suggests that older groups may have reached a relatively high vaccination rate sooner. The two figures grouped by sex (Figures 4 & 8) again show that females had higher vaccination rates, which supports using the sex-stratified data to analyze the association between age and vaccination rate.
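The reasoning that a higher boxplot suggests reaching high coverage sooner relies on the data being cumulative: a group that rises earlier spends more of the observation period at high values. The sketch below uses two made-up logistic uptake curves that level off at the same value but rise 30 days apart; their maxima are nearly equal while their medians differ sharply. The curve shape is an assumption chosen for illustration only.

```r
days <- 1:100

# Hypothetical cumulative uptake curves with the same plateau (90%) but different timing
early <- 90 / (1 + exp(-(days - 30) / 5))
late  <- 90 / (1 + exp(-(days - 60) / 5))

# The maxima are almost identical...
print(c(early = max(early), late = max(late)))
# ...but the medians over the period differ because 'early' rises sooner
print(c(early = median(early), late = median(late)))
```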

#summary graphs for dose1
vac_age %>%
    ggplot(aes(x = cat, y = dose1_pct, fill = cat)) +
    geom_boxplot() +
    theme(axis.text.x  = element_text(angle = 60, hjust = 1),
          plot.caption = element_text(hjust=0.5, size=rel(1.2))) +
    labs(x = "Age group", y = "Percent of people with at least one dose", 
         caption = "Figure 1.Percent of people with at least one dose grouped by age") +
    guides(fill=guide_legend(title="Age group"))

vac_Fage %>%
    ggplot(aes(x = cat, y = dose1_pct, fill = cat)) +
    geom_boxplot() +
    theme(axis.text.x = element_text(angle = 60, hjust = 1),
          plot.caption = element_text(hjust=0.5, size=rel(1.2))) +
    labs(x = "Age group", y = "Percent of people with at least one dose", 
         caption = "Figure 2.Percent of people with at least one dose of females grouped by age") +
    guides(fill=guide_legend(title="Age group"))

vac_Mage %>%
    ggplot(aes(x = cat, y = dose1_pct, fill = cat)) +
    geom_boxplot() +
    theme(axis.text.x = element_text(angle = 60, hjust = 1),
          plot.caption = element_text(hjust=0.5, size=rel(1.2))) +
    labs(x = "Age group", y = "Percent of people with at least one dose", 
         caption = "Figure 3.Percent of people with at least one dose of males grouped by age") +
    guides(fill=guide_legend(title="Age group"))

vac_sex %>%
    ggplot(aes(x = cat, y = dose1_pct, fill = cat)) +
    geom_boxplot() +
    theme(plot.caption = element_text(hjust=0.5, size=rel(1.2))) +
    labs(x = "Sex", y = "Percent of people with at least one dose", 
         caption = "Figure 4.Percent of people with at least one dose grouped by sex") +
    guides(fill=guide_legend(title="Sex group"))

#summary graphs for series doses
vac_age %>%
    ggplot(aes(x = cat, y = series_pct, fill = cat)) +
    geom_boxplot() +
    theme(axis.text.x = element_text(angle = 60, hjust = 1),
          plot.caption = element_text(hjust=0.5, size=rel(1.2))) +
    labs(x = "Age group", y = "Percent of people completed a primary series", 
         caption = "Figure 5.Percent of people completed a primary series grouped by age") +
    guides(fill=guide_legend(title="Age group"))

vac_Fage %>%
    ggplot(aes(x = cat, y = series_pct, fill = cat)) +
    geom_boxplot() +
    theme(axis.text.x = element_text(angle = 60, hjust = 1),
          plot.caption = element_text(hjust=0.5, size=rel(1.2))) +
    labs(x = "Age group", y = "Percent of people completed a primary series", 
         caption = "Figure 6.Percent of people completed a primary series of females grouped by age") +
    guides(fill=guide_legend(title="Age group"))

vac_Mage %>%
    ggplot(aes(x = cat, y = series_pct, fill = cat)) +
    geom_boxplot() +
    theme(axis.text.x = element_text(angle = 60, hjust = 1),
          plot.caption = element_text(hjust=0.5, size=rel(1.2))) +
    labs(x = "Age group", y = "Percent of people completed a primary series", 
         caption = "Figure 7.Percent of people completed a primary series of males grouped by age") +
    guides(fill=guide_legend(title="Age group"))

vac_sex %>%
    ggplot(aes(x = cat, y = series_pct, fill = cat)) +
    geom_boxplot() +
    theme(plot.caption = element_text(hjust=0.5, size=rel(1.2))) +
    labs(x = "Sex", y = "Percent of people completed a primary series", 
         caption = "Figure 8.Percent of people completed a primary series grouped by sex") +
    guides(fill=guide_legend(title="Sex group"))

3. Visualization of the association

The 4 scatterplots (Figures 9-12) all support the trend that, for both vaccination statuses (at least one dose and a completed primary series) and for both sexes, the vaccination rate at any given time point is higher in older age groups, and older groups may take less time to reach a relatively high vaccination rate. Note that this trend was also observed among children younger than 5 years, although their vaccination rates were clearly lower than those of the other groups.
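The observation that older groups reach a high rate sooner can be made concrete by finding, for each group, the first date on which the cumulative percentage reaches a chosen threshold. The sketch below does this on made-up data in the same long format as `vac_Fage` (a yyyymmdd `date` key, a `cat` label, and a percentage); the helper name `first_date_at_or_above()` and the 50% threshold are assumptions for illustration.

```r
# Made-up long-format data: yyyymmdd date key, age group label, cumulative percentage
toy <- data.frame(
  date = rep(c("20210101", "20210201", "20210301", "20210401"), times = 2),
  cat  = rep(c("Ages_18-24_yrs", "Ages_65-74_yrs"), each = 4),
  pct  = c(5, 20, 45, 60,   30, 70, 85, 90),
  stringsAsFactors = FALSE
)

# Hypothetical helper: first date per group on which pct reaches the threshold (NA if never)
first_date_at_or_above <- function(df, threshold) {
  sapply(split(df, df$cat), function(g) {
    g   <- g[order(g$date), ]            # yyyymmdd strings sort chronologically
    hit <- g$date[g$pct >= threshold]
    if (length(hit) == 0) NA_character_ else hit[1]
  })
}

print(first_date_at_or_above(toy, 50))
```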

#visualization for first dose
#vac_age %>%
#    ggplot(aes(x = date, y = dose1_pct)) + 
#    geom_point(mapping = aes(x = date, y = dose1_pct, color = cat)) +
#    scale_x_discrete(breaks=
#          c("20201213","20210601","20211201","20220601","20221019")) +
#    theme(plot.caption = element_text(hjust=0.5, size=rel(1.2))) +
#    labs(x = "Date(yyyymmdd)", y = "Percent of people with at least one dose", col="Age group",
#         caption = "Figure 9.2020-2022 Percent of people with at least one dose grouped by age") +
#    guides(fill=guide_legend(title="Age group"))
 
#vac_sex %>%
#    ggplot(aes(x = date, y = dose1_pct)) + 
#    geom_point(mapping = aes(x = date, y = dose1_pct, color = cat)) +
#    scale_x_discrete(breaks=
#          c("20201213","20210601","20211201","20220601","20221019")) +
#    theme(plot.caption = element_text(hjust=0.5, size=rel(1.2))) +
#    labs(x = "Date(yyyymmdd)", y = "Percent of people with at least one dose", col="Sex group", 
#         caption = "2020-2022 Percent of people with at least one dose grouped by sex") +
#    guides(fill=guide_legend(title="Sex group")) 

vac_Fage %>%
    ggplot(aes(x = date, y = dose1_pct, color = cat)) + 
    geom_point() +
    scale_x_discrete(breaks=
          c("20201213","20210601","20211201","20220601","20221019")) +
    theme(plot.caption = element_text(hjust=0.5, size=rel(1.2))) +
    labs(x = "Date(yyyymmdd)", y = "Percent of people with at least one dose", col="Age group",
         caption = "Figure 9.2020-2022 Percent of people with at least one dose of females grouped by age")

vac_Mage %>%
    ggplot(aes(x = date, y = dose1_pct, color = cat)) + 
    geom_point() +
    theme(plot.caption = element_text(hjust=0.5, size=rel(1.2))) +
    scale_x_discrete(breaks=
          c("20201213","20210601","20211201","20220601","20221019")) +
    labs(x = "Date(yyyymmdd)", y = "Percent of people with at least one dose", col="Age group",
         caption = "Figure 10.2020-2022 Percent of people with at least one dose of males grouped by age")

#visualization for series dose
#vac_age %>%
#    ggplot(aes(x=date, y=series_pct)) + 
#    geom_point(mapping = aes(x = date, y = series_pct, color = cat))  +
#    theme(plot.caption = element_text(hjust=0.5, size=rel(1.2))) +
#    scale_x_discrete(breaks=
#          c("20201213","20210601","20211201","20220601","20221019")) +
#    labs(x = "Date(yyyymmdd)", y = "Percent of people completed a primary series", col="Age group",
#         caption = "Figure 13.2020-2022 Percent of people completed a primary series grouped by age") +
#    guides(fill=guide_legend(title="Age group"))
 
#vac_sex %>%
#    ggplot(aes(x=date, y=series_pct)) + 
#    geom_point(mapping = aes(x = date, y = series_pct, color = cat))  +
#    theme(plot.caption = element_text(hjust=0.5, size=rel(1.2))) +
#    scale_x_discrete(breaks=
#          c("20201213","20210601","20211201","20220601","20221019")) +
#    labs(x = "Date(yyyymmdd)", y = "Percent of people completed a primary series", col="Sex group",
#         caption = "Figure 14.2020-2022 Percent of people completed a primary series grouped by sex") +
#    guides(fill=guide_legend(title="Sex group"))

vac_Fage %>%
    ggplot(aes(x = date, y = series_pct, color = cat)) + 
    geom_point() +
    theme(plot.caption = element_text(hjust=0.5, size=rel(1.2))) +
    scale_x_discrete(breaks=
          c("20201213","20210601","20211201","20220601","20221019")) +
    labs(x = "Date(yyyymmdd)", y = "Percent of people completed a primary series", col="Age group",
         caption = "Figure 11.2020-2022 Percent of people completed a primary series of females grouped by age")

vac_Mage %>%
    ggplot(aes(x = date, y = series_pct, color = cat)) + 
    geom_point() +
    theme(plot.caption = element_text(hjust=0.5, size=rel(1.2))) +
    scale_x_discrete(breaks=
          c("20201213","20210601","20211201","20220601","20221019")) +
    labs(x = "Date(yyyymmdd)", y = "Percent of people completed a primary series", col="Age group",
         caption = "Figure 12.2020-2022 Percent of people completed a primary series of males grouped by age")

Conclusion

These results suggest an association between age and the two vaccination statuses (at least one dose and a completed primary series) in California. For both vaccination statuses and both sexes, the vaccination rate at a given time point was higher in older age groups, and older groups may also have taken less time to reach a relatively high vaccination rate. The final vaccination rate was also higher in older age groups, but the rate for children younger than 5 years remained low, even though they followed the same trend.
