diff --git a/NAMESPACE b/NAMESPACE index c1cd702..31c3118 100644 --- a/NAMESPACE +++ b/NAMESPACE @@ -67,7 +67,9 @@ export(add_predictions) export(add_transactions) export(anti_join) export(arrange) +export(as_exp_df) export(as_exposed_df) +export(as_trx_df) export(autoplot) export(autotable) export(bake) @@ -88,8 +90,10 @@ export(full_join) export(group_by) export(groups) export(inner_join) +export(is_exp_df) export(is_exposed_df) export(is_split_exposed_df) +export(is_trx_df) export(left_join) export(mutate) export(plot_actual_to_expected) diff --git a/NEWS.md b/NEWS.md index 3d1d0d0..3b7c0d7 100644 --- a/NEWS.md +++ b/NEWS.md @@ -18,6 +18,13 @@ when `exp_stats()` is passed a weighting variable. - Added a `summary()` method for `exposed_df` objects that calls `exp_stats()`. - The assumed default status in `expose()` functions was changed from the first observed status to the most common status. +- The functions `as_exp_df()` and `as_trx_df()` were added to convert +pre-aggregated experience studies to the `exp_df` and `trx_df` formats, +respectively. +- `agg_sim_dat` - a new simulated data set of pre-aggregated experience was +added for testing `as_exp_df()` and `as_trx_df()`. +- `is_exp_df()` and `is_trx_df()` were added to test for the `exp_df` and +`trx_df` classes. # actxps 1.3.0 diff --git a/R/agg_sim_dat.R b/R/agg_sim_dat.R new file mode 100644 index 0000000..1488247 --- /dev/null +++ b/R/agg_sim_dat.R @@ -0,0 +1,40 @@ +#' Aggregate simulated annuity data +#' +#' A pre-aggregated version of surrender and withdrawal experience from the +#' simulated data sets `census_dat`, `withdrawals`, and `account_vals`. This +#' data is theoretical only and does not represent the experience on any +#' specific product. +#' +#' @format A data frame containing summarized experience study results grouped +#' by policy year, income guarantee presence, tax-qualified status, and product. 
+#' +#' @details +#' +#' \describe{ +#' \item{pol_yr}{Policy year} +#' \item{inc_guar}{Indicates whether the policy was issued with an income +#' guarantee} +#' \item{qual}{Indicates whether the policy was purchased with tax-qualified +#' funds} +#' \item{product}{Product: a, b, or c} +#' \item{exposure_n}{Sum of policy year exposures by count} +#' \item{claims_n}{Sum of claim counts} +#' \item{av}{Sum of account value} +#' \item{exposure_amt}{Sum of policy year exposures weighted by account value} +#' \item{claims_amt}{Sum of claims weighted by account value} +#' \item{av_sq}{Sum of squared account values} +#' \item{n}{Number of exposure records} +#' \item{wd}{Sum of partial withdrawal transactions} +#' \item{wd_n}{Count of partial withdrawal transactions} +#' \item{wd_flag}{Count of exposure records with partial withdrawal +#' transactions} +#' \item{wd_sq}{Sum of squared partial withdrawal transactions} +#' \item{av_w_wd}{Sum of account value for exposure records with partial +#' withdrawal transactions} +#' } +#' @seealso [census_dat] +#' @name agg_sim_dat + +NULL +#' @rdname agg_sim_dat +"agg_sim_dat" diff --git a/R/exp_df_helpers.R b/R/exp_df_helpers.R new file mode 100644 index 0000000..535bb38 --- /dev/null +++ b/R/exp_df_helpers.R @@ -0,0 +1,160 @@ +#' Termination summary helper functions +#' +#' Convert aggregate termination experience studies to the `exp_df` class. +#' +#' `is_exp_df()` will return `TRUE` if `x` is an `exp_df` object. +#' +#' `as_exp_df()` will coerce a data frame to an `exp_df` object if that +#' data frame has columns for exposures and claims. +#' +#' `as_exp_df()` is most useful for working with aggregate summaries of +#' experience that were not created by actxps where individual policy +#' information is not available. After converting the data to the `exp_df` +#' class, [summary()] can be used to summarize data by any grouping variables, +#' and [autoplot()] and [autotable()] are available for reporting. 
+#' +#' If nothing is passed to `wt`, the data frame `x` must include columns +#' containing: +#' +#' - Exposures (`exposure`) +#' - Claim counts (`claims`) +#' +#' If `wt` is passed, the data must include columns containing: +#' +#' - Weighted exposures (`exposure`) +#' - Weighted claims (`claims`) +#' - Claim counts (`n_claims`) +#' - The raw sum of weights **NOT** multiplied by exposures +#' - Exposure record counts (`.weight_n`) +#' - The raw sum of squared weights (`.weight_sq`) +#' +#' The names in parentheses above are expected column names. If the data +#' frame passed to `as_exp_df()` uses different column names, these can be +#' specified using the `col_*` arguments. +#' +#' When a column name is passed to `wt`, the columns `.weight`, `.weight_n`, +#' and `.weight_sq` are used to calculate credibility and confidence intervals. +#' If credibility and confidence intervals aren't required, then it is not +#' necessary to pass anything to `wt`. The results of `as_exp_df()` and any +#' downstream summaries will still be weighted as long as the exposures and +#' claims are pre-weighted. +#' +#' `target_status`, `start_date`, and `end_date` are optional arguments that are +#' only used for printing the resulting `exp_df` object. +#' +#' @param x An object. For `as_exp_df()`, `x` must be a data frame. +#' @param expected A character vector containing column names in x with +#' expected values +#' @param wt Optional. Length 1 character vector. Name of the column in `x` +#' containing weights to use in the calculation of claims, exposures, partial +#' credibility, and confidence intervals. +#' @param col_claims Optional. Name of the column in `x` containing claims. The +#' assumed default is "claims". +#' @param col_exposure Optional. Name of the column in `x` containing exposures. +#' The assumed default is "exposure". +#' @param col_n_claims Optional and only used when `wt` is passed. Name of +#' the column in `x` containing the number of claims. 
+#' @param col_weight_sq Optional and only used when `wt` is passed. Name of +#' the column in `x` containing the sum of squared weights. +#' @param col_weight_n Optional and only used when `wt` is passed. Name of +#' the column in `x` containing exposure record counts. +#' @param credibility If `TRUE`, future calls to [summary()] will include +#' partial credibility weights and credibility-weighted termination rates. +#' @param conf_level Confidence level used for the Limited Fluctuation +#' credibility method and confidence intervals +#' @param cred_r Error tolerance under the Limited Fluctuation credibility +#' method +#' @param conf_int If `TRUE`, future calls to [summary()] will include +#' confidence intervals around the observed termination rates and any +#' actual-to-expected ratios. +#' @inheritParams expose +#' +#' @return For `is_exp_df()`, a length-1 logical vector. For `as_exp_df()`, +#' an `exp_df` object. +#' +#' @seealso [exp_stats()] for information on how `exp_df` objects are typically +#' created from individual exposure records. 
+#' +#' @examples +#' # convert pre-aggregated experience into an exp_df object +#' dat <- as_exp_df(agg_sim_dat, col_exposure = "exposure_n", +#' col_claims = "claims_n", +#' target_status = "Surrender", +#' start_date = 2005, end_date = 2019, +#' conf_int = TRUE) +#' dat +#' is_exp_df(dat) +#' +#' # summary by policy year +#' summary(dat, pol_yr) +#' +#' # repeat the prior exercise on a weighted basis +#' dat_wt <- as_exp_df(agg_sim_dat, wt = "av", +#' col_exposure = "exposure_amt", +#' col_claims = "claims_amt", +#' col_n_claims = "claims_n", +#' col_weight_sq = "av_sq", +#' col_weight_n = "n", +#' target_status = "Surrender", +#' start_date = 2005, end_date = 2019, +#' conf_int = TRUE) +#' dat_wt +#' +#' # summary by policy year +#' summary(dat_wt, pol_yr) +#' +#' +#' @export +as_exp_df <- function(x, expected = NULL, wt = NULL, + col_claims, col_exposure, + col_n_claims, col_weight_sq, col_weight_n, + target_status = NULL, + start_date = as.Date("1900-01-01"), end_date = NULL, + credibility = FALSE, + conf_level = 0.95, cred_r = 0.05, conf_int = FALSE) { + + if (is_exp_df(x)) return(x) + + if (!is.data.frame(x)) { + rlang::abort("`x` must be a data frame.") + } + + # column name alignment + if (!missing(col_exposure)) x <- x |> rename(exposure = {{col_exposure}}) + if (!missing(col_claims)) x <- x |> rename(claims = {{col_claims}}) + + if (is.null(wt)) { + req_names <- c("exposure", "claims") + } else { + req_names <- c("exposure", "claims", "n_claims", ".weight", + ".weight_sq", ".weight_n") + if (!missing(col_n_claims)) x <- x |> rename(n_claims = {{col_n_claims}}) + x <- x |> rename(.weight = {{wt}}) + if (!missing(col_weight_sq)) x <- x |> + rename(.weight_sq = {{col_weight_sq}}) + if (!missing(col_weight_n)) x <- x |> rename(.weight_n = {{col_weight_n}}) + } + + # check required columns + verify_col_names(names(x), req_names) + + if (is.null(wt)) x$n_claims <- x$claims + + new_exp_df(x, + .groups = list(), + target_status = target_status, + start_date = 
start_date, + expected = expected, + end_date = end_date, + wt = wt, + credibility = credibility, + conf_level = conf_level, cred_r = cred_r, + conf_int = conf_int) + +} + +#' @export +#' @rdname as_exp_df +is_exp_df <- function(x) { + inherits(x, "exp_df") +} diff --git a/R/exp_stats.R b/R/exp_stats.R index 6256587..472e2be 100644 --- a/R/exp_stats.R +++ b/R/exp_stats.R @@ -332,7 +332,23 @@ finish_exp_stats <- function(.data, target_status, expected, .after = dplyr::last_col()) } - tibble::new_tibble(res, + new_exp_df(res, + .groups = .groups, + target_status = target_status, + start_date = start_date, + expected = expected, + end_date = end_date, + wt = wt, + credibility = credibility, + conf_level = conf_level, cred_r = cred_r, + conf_int = conf_int) +} + +# low level class constructor +new_exp_df <- function(x, .groups, target_status, start_date, expected, + end_date, wt, credibility, conf_level, + cred_r, conf_int) { + tibble::new_tibble(x, class = "exp_df", groups = .groups, target_status = target_status, @@ -341,7 +357,8 @@ finish_exp_stats <- function(.data, target_status, expected, end_date = end_date, wt = wt, xp_params = list(credibility = credibility, - conf_level = conf_level, cred_r = cred_r, + conf_level = conf_level, + cred_r = cred_r, conf_int = conf_int)) } diff --git a/R/expose_split.R b/R/expose_split.R index 7892cc4..7cb77e1 100644 --- a/R/expose_split.R +++ b/R/expose_split.R @@ -37,7 +37,8 @@ #' @examples #' toy_census |> expose_cy("2022-12-31") |> expose_split() #' -#' @seealso [expose()] +#' @seealso [expose()] for information on creating exposure records from census +#' data. #' #' @export expose_split <- function(.data) { diff --git a/R/exposed_df_helpers.R b/R/exposed_df_helpers.R index dfce5a9..330304f 100644 --- a/R/exposed_df_helpers.R +++ b/R/exposed_df_helpers.R @@ -42,6 +42,8 @@ #' `as_exposed_df()`, an `exposed_df` object. 
#' #' @importFrom vctrs vec_ptype2 vec_cast +#' @seealso [expose()] for information on how `exposed_df` objects are typically +#' created from census data. #' #' @export is_exposed_df <- function(x) { @@ -119,16 +121,11 @@ as_exposed_df <- function(x, end_date, start_date = as.Date("1900-01-01"), # check required columns # pol_num, status, exposure, 2 date cols, policy period (policy expo only) - unmatched <- c("pol_num", "status", "exposure", + req_names <- c("pol_num", "status", "exposure", exp_col_pol_per, exp_cols_dates, exp_cols_trx) - unmatched <- setdiff(unmatched, names(x)) - - if (length(unmatched) > 0) { - rlang::abort(c(x = glue::glue("The following columns are missing from `x`: {paste(unmatched, collapse = ', ')}."), - i = "Hint: create these columns or use the `col_*` arguments to specify existing columns that should be mapped to these elements.")) - } + verify_col_names(names(x), req_names) if (missing(default_status)) { default_status <- most_common(x$status) @@ -612,3 +609,13 @@ verify_get_trx_types <- function(.data, required = TRUE) { } trx_types } + +# function to verify that required names exist and to send an error if not +verify_col_names <- function(x_names, required) { + unmatched <- setdiff(required, x_names) + + if (length(unmatched) > 0) { + rlang::abort(c(x = glue::glue("The following columns are missing: {paste(unmatched, collapse = ', ')}."), + i = "Hint: create these columns or use the `col_*` arguments to specify existing columns that should be mapped to these elements.")) + } +} diff --git a/R/sim_data.R b/R/sim_data.R index e15b61b..45f4575 100644 --- a/R/sim_data.R +++ b/R/sim_data.R @@ -43,7 +43,7 @@ #' \item{av_anniv}{Account value on the policy anniversary date} #' } #' - +#' @seealso [census_dat] #' @name sim_data NULL diff --git a/R/trx_df_helpers.R b/R/trx_df_helpers.R new file mode 100644 index 0000000..69417bc --- /dev/null +++ b/R/trx_df_helpers.R @@ -0,0 +1,153 @@ +#' Transaction summary helper functions +#' +#' Convert 
aggregate transaction experience studies to the `trx_df` class. +#' +#' `is_trx_df()` will return `TRUE` if `x` is a `trx_df` object. +#' +#' `as_trx_df()` will coerce a data frame to a `trx_df` object if that +#' data frame has the required columns for transaction studies listed below. +#' +#' `as_trx_df()` is most useful for working with aggregate summaries of +#' experience that were not created by actxps where individual policy +#' information is not available. After converting the data to the `trx_df` +#' class, [summary()] can be used to summarize data by any grouping variables, +#' and [autoplot()] and [autotable()] are available for reporting. +#' +#' At a minimum, the following columns are required: +#' +#' - Transaction amounts (`trx_amt`) +#' - Transaction counts (`trx_n`) +#' - The number of exposure records with transactions (`trx_flag`). This number +#' is not necessarily equal to transaction counts. If multiple transactions +#' are allowed per exposure period, `trx_flag` will be less than `trx_n`. +#' - Exposures (`exposure`) +#' +#' If transaction amounts should be expressed as a percentage of another +#' variable (i.e. to calculate utilization rates or actual-to-expected ratios), +#' additional columns are required: +#' +#' - A denominator "percent of" column. For example, the sum of account values. +#' - A denominator "percent of" column for exposure records with transactions. +#' For example, the sum of account values across all records with non-zero +#' transaction amounts. +#' +#' If confidence intervals are desired and "percent of" columns are passed, an +#' additional column for the sum of squared transaction amounts (`trx_amt_sq`) +#' is also required. +#' +#' The names in parentheses above are expected column names. If the data +#' frame passed to `as_trx_df()` uses different column names, these can be +#' specified using the `col_*` arguments. 
+#' +#' `start_date` and `end_date` are optional arguments that are +#' only used for printing the resulting `trx_df` object. +#' +#' Unlike [trx_stats()], `as_trx_df()` only permits a single transaction type and +#' a single `percent_of` column. +#' +#' @param x An object. For `as_trx_df()`, `x` must be a data frame. +#' @param col_trx_amt Optional. Name of the column in `x` containing transaction +#' amounts. +#' @param col_trx_n Optional. Name of the column in `x` containing transaction +#' counts. +#' @param col_trx_flag Optional. Name of the column in `x` containing the number +#' of exposure records with transactions. +#' @param col_exposure Optional. Name of the column in `x` containing exposures. +#' @param col_percent_of Optional. Name of the column in `x` containing a +#' numeric variable to use in "percent of" calculations. +#' @param col_percent_of_w_trx Optional. Name of the column in `x` containing a +#' numeric variable to use in "percent of" calculations with transactions. +#' @param col_trx_amt_sq Optional and only required when `col_percent_of` is +#' passed. Name of the column in `x` containing squared transaction amounts. +#' @param conf_int If `TRUE`, future calls to [summary()] will include +#' confidence intervals around the observed utilization rates and any +#' `percent_of` output columns. +#' @param conf_level Confidence level for confidence intervals +#' @inheritParams expose +#' +#' @return For `is_trx_df()`, a length-1 logical vector. For `as_trx_df()`, +#' a `trx_df` object. +#' +#' @seealso [trx_stats()] for information on how `trx_df` objects are typically +#' created from individual exposure records. 
+#' +#' @examples +#' # convert pre-aggregated experience into a trx_df object +#' dat <- as_trx_df(agg_sim_dat, +#' col_exposure = "n", +#' col_trx_amt = "wd", +#' col_trx_n = "wd_n", +#' col_trx_flag = "wd_flag", +#' col_percent_of = "av", +#' col_percent_of_w_trx = "av_w_wd", +#' col_trx_amt_sq = "wd_sq", +#' start_date = 2005, end_date = 2019, +#' conf_int = TRUE) +#' dat +#' is_trx_df(dat) +#' +#' # summary by policy year +#' summary(dat, pol_yr) +#' +#' @export +as_trx_df <- function(x, + col_trx_amt = "trx_amt", + col_trx_n = "trx_n", + col_trx_flag = "trx_flag", + col_exposure = "exposure", + col_percent_of = NULL, + col_percent_of_w_trx = NULL, + col_trx_amt_sq = "trx_amt_sq", + start_date = as.Date("1900-01-01"), + end_date = NULL, + conf_int = FALSE, + conf_level = 0.95) { + + if (is_trx_df(x)) return(x) + + if (!is.data.frame(x)) { + rlang::abort("`x` must be a data frame.") + } + + # column name alignment + req_names <- c("exposure", "trx_amt", "trx_n", "trx_flag") + if (!missing(col_exposure)) x <- x |> rename(exposure = {{col_exposure}}) + if (!missing(col_trx_amt)) x <- x |> rename(trx_amt = {{col_trx_amt}}) + if (!missing(col_trx_n)) x <- x |> rename(trx_n = {{col_trx_n}}) + if (!missing(col_trx_flag)) x <- x |> rename(trx_flag = {{col_trx_flag}}) + + if (conf_int && !missing(col_percent_of)) { + req_names <- c(req_names, "trx_amt_sq") + if (!missing(col_trx_amt_sq)) x <- x |> + rename(trx_amt_sq = {{col_trx_amt_sq}}) + } + + if (!missing(col_percent_of)) { + req_names <- c(req_names, col_percent_of, paste0(col_percent_of, "_w_trx")) + } + if (!missing(col_percent_of_w_trx)) { + if (missing(col_percent_of)) { + rlang::abort("`col_percent_of_w_trx` was supplied without passing anything to `col_percent_of`") + } + pct_w_trx_name <- rlang::parse_expr(paste0(col_percent_of, "_w_trx")) + x <- x |> rename(!!pct_w_trx_name := {{col_percent_of_w_trx}}) + } + + # check required columns + verify_col_names(names(x), req_names) + + new_trx_df(x |> 
mutate(trx_type = col_trx_amt), + .groups = list(), + trx_types = col_trx_amt, + start_date = start_date, + percent_of = col_percent_of, + end_date = end_date, + conf_level = conf_level, + conf_int = conf_int) +} + +#' @export +#' @rdname as_trx_df +is_trx_df <- function(x) { + inherits(x, "trx_df") +} diff --git a/R/trx_stats.R b/R/trx_stats.R index 829f0e0..043880b 100644 --- a/R/trx_stats.R +++ b/R/trx_stats.R @@ -348,14 +348,32 @@ finish_trx_stats <- function(.data, trx_types, percent_of, relocate(trx_amt_sq, .after = dplyr::last_col()) } - tibble::new_tibble(res, + new_trx_df(res, + .groups = .groups, + trx_types = trx_types, + start_date = start_date, + percent_of = percent_of, + end_date = end_date, + conf_level = conf_level, + conf_int = conf_int) + +} + +# low level class constructor +new_trx_df <- function(x, .groups, trx_types, + start_date, percent_of, end_date, + conf_level, conf_int) { + + tibble::new_tibble(x, class = "trx_df", - groups = .groups, trx_types = trx_types, + groups = .groups, + trx_types = trx_types, start_date = start_date, percent_of = percent_of, end_date = end_date, xp_params = list(conf_level = conf_level, conf_int = conf_int)) + } verify_trx_df <- function(.data) { diff --git a/README.Rmd b/README.Rmd index 4cc01c5..7fd7260 100644 --- a/README.Rmd +++ b/README.Rmd @@ -88,7 +88,6 @@ Create a summary grouped by policy year and the presence of a guaranteed income rider. ```{r stats-grouped} - exp_res <- exposed_data |> group_by(pol_yr, inc_guar) |> exp_stats() @@ -102,7 +101,6 @@ First, attach one or more columns of expected termination rates to the exposure data. Then, pass these column names to the `expected` argument of `exp_stats()`. 
```{r stats-ae} - expected_table <- c(seq(0.005, 0.03, length.out = 10), 0.2, 0.15, rep(0.05, 3)) # using 2 different expected termination rates diff --git a/README.md b/README.md index ac0193b..54185ac 100644 --- a/README.md +++ b/README.md @@ -112,7 +112,6 @@ Create a summary grouped by policy year and the presence of a guaranteed income rider. ``` r - exp_res <- exposed_data |> group_by(pol_yr, inc_guar) |> exp_stats() @@ -147,7 +146,6 @@ exposure data. Then, pass these column names to the `expected` argument of `exp_stats()`. ``` r - expected_table <- c(seq(0.005, 0.03, length.out = 10), 0.2, 0.15, rep(0.05, 3)) # using 2 different expected termination rates diff --git a/data-raw/create_data.R b/data-raw/create_data.R index 0fbb36c..452c32d 100644 --- a/data-raw/create_data.R +++ b/data-raw/create_data.R @@ -23,3 +23,24 @@ source("data-raw/simulate_data.R") usethis::use_data(census_dat, overwrite = TRUE) usethis::use_data(withdrawals, overwrite = TRUE) usethis::use_data(account_vals, overwrite = TRUE) + +agg_sim_dat <- expose_py(census_dat, "2019-12-31", + target_status = "Surrender") |> + add_transactions(withdrawals) |> + left_join(account_vals, by = c("pol_num", "pol_date_yr")) |> + group_by(pol_yr, inc_guar, qual, product) |> + summarize(exposure_n = sum(exposure), + claims_n = sum(status == "Surrender"), + av = sum(av_anniv), + exposure_amt = sum(exposure * av_anniv), + claims_amt = sum((status == "Surrender") * av_anniv), + av_sq = sum(av_anniv ^ 2), + n = n(), + wd = sum(trx_amt_Rider) + sum(trx_amt_Base), + wd_n = sum(trx_n_Rider) + sum(trx_n_Base), + wd_flag = sum(trx_amt_Rider > 0 | trx_amt_Base > 0), + wd_sq = sum(trx_amt_Rider ^ 2) + sum(trx_amt_Base ^ 2), + av_w_wd = sum(av_anniv[trx_amt_Rider > 0 | trx_amt_Base > 0]), + .groups = "drop") + +usethis::use_data(agg_sim_dat, overwrite = TRUE) diff --git a/data/agg_sim_dat.rda b/data/agg_sim_dat.rda new file mode 100644 index 0000000..77c106e Binary files /dev/null and b/data/agg_sim_dat.rda differ 
diff --git a/man/agg_sim_dat.Rd b/man/agg_sim_dat.Rd new file mode 100644 index 0000000..f221ecb --- /dev/null +++ b/man/agg_sim_dat.Rd @@ -0,0 +1,49 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/agg_sim_dat.R +\docType{data} +\name{agg_sim_dat} +\alias{agg_sim_dat} +\title{Aggregate simulated annuity data} +\format{ +A data frame containing summarized experience study results grouped +by policy year, income guarantee presence, tax-qualified status, and product. + +An object of class \code{tbl_df} (inherits from \code{tbl}, \code{data.frame}) with 180 rows and 16 columns. +} +\usage{ +agg_sim_dat +} +\description{ +A pre-aggregated version of surrender and withdrawal experience from the +simulated data sets \code{census_dat}, \code{withdrawals}, and \code{account_vals}. This +data is theoretical only and does not represent the experience on any +specific product. +} +\details{ +\describe{ +\item{pol_yr}{Policy year} +\item{inc_guar}{Indicates whether the policy was issued with an income +guarantee} +\item{qual}{Indicates whether the policy was purchased with tax-qualified +funds} +\item{product}{Product: a, b, or c} +\item{exposure_n}{Sum of policy year exposures by count} +\item{claims_n}{Sum of claim counts} +\item{av}{Sum of account value} +\item{exposure_amt}{Sum of policy year exposures weighted by account value} +\item{claims_amt}{Sum of claims weighted by account value} +\item{av_sq}{Sum of squared account values} +\item{n}{Number of exposure records} +\item{wd}{Sum of partial withdrawal transactions} +\item{wd_n}{Count of partial withdrawal transactions} +\item{wd_flag}{Count of exposure records with partial withdrawal +transactions} +\item{wd_sq}{Sum of squared partial withdrawal transactions} +\item{av_w_wd}{Sum of account value for exposure records with partial +withdrawal transactions} +} +} +\seealso{ +\link{census_dat} +} +\keyword{datasets} diff --git a/man/as_exp_df.Rd b/man/as_exp_df.Rd new file mode 100644 
index 0000000..06a6e5d --- /dev/null +++ b/man/as_exp_df.Rd @@ -0,0 +1,156 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/exp_df_helpers.R +\name{as_exp_df} +\alias{as_exp_df} +\alias{is_exp_df} +\title{Termination summary helper functions} +\usage{ +as_exp_df( + x, + expected = NULL, + wt = NULL, + col_claims, + col_exposure, + col_n_claims, + col_weight_sq, + col_weight_n, + target_status = NULL, + start_date = as.Date("1900-01-01"), + end_date = NULL, + credibility = FALSE, + conf_level = 0.95, + cred_r = 0.05, + conf_int = FALSE +) + +is_exp_df(x) +} +\arguments{ +\item{x}{An object. For \code{as_exp_df()}, \code{x} must be a data frame.} + +\item{expected}{A character vector containing column names in x with +expected values} + +\item{wt}{Optional. Length 1 character vector. Name of the column in \code{x} +containing weights to use in the calculation of claims, exposures, partial +credibility, and confidence intervals.} + +\item{col_claims}{Optional. Name of the column in \code{x} containing claims. The +assumed default is "claims".} + +\item{col_exposure}{Optional. Name of the column in \code{x} containing exposures. +The assumed default is "exposure".} + +\item{col_n_claims}{Optional and only used when \code{wt} is passed. Name of +the column in \code{x} containing the number of claims.} + +\item{col_weight_sq}{Optional and only used when \code{wt} is passed. Name of +the column in \code{x} containing the sum of squared weights.} + +\item{col_weight_n}{Optional and only used when \code{wt} is passed. Name of +the column in \code{x} containing exposure record counts.} + +\item{target_status}{Character vector of target status values. Default value += \code{NULL}.} + +\item{start_date}{Experience study start date. 
Default value = 1900-01-01.} + +\item{end_date}{Experience study end date} + +\item{credibility}{If \code{TRUE}, future calls to \code{\link[=summary]{summary()}} will include +partial credibility weights and credibility-weighted termination rates.} + +\item{conf_level}{Confidence level used for the Limited Fluctuation +credibility method and confidence intervals} + +\item{cred_r}{Error tolerance under the Limited Fluctuation credibility +method} + +\item{conf_int}{If \code{TRUE}, future calls to \code{\link[=summary]{summary()}} will include +confidence intervals around the observed termination rates and any +actual-to-expected ratios.} +} +\value{ +For \code{is_exp_df()}, a length-1 logical vector. For \code{as_exp_df()}, +an \code{exp_df} object. +} +\description{ +Convert aggregate termination experience studies to the \code{exp_df} class. +} +\details{ +\code{is_exp_df()} will return \code{TRUE} if \code{x} is an \code{exp_df} object. + +\code{as_exp_df()} will coerce a data frame to an \code{exp_df} object if that +data frame has columns for exposures and claims. + +\code{as_exp_df()} is most useful for working with aggregate summaries of +experience that were not created by actxps where individual policy +information is not available. After converting the data to the \code{exp_df} +class, \code{\link[=summary]{summary()}} can be used to summarize data by any grouping variables, +and \code{\link[=autoplot]{autoplot()}} and \code{\link[=autotable]{autotable()}} are available for reporting. 
+ +If nothing is passed to \code{wt}, the data frame \code{x} must include columns +containing: +\itemize{ +\item Exposures (\code{exposure}) +\item Claim counts (\code{claims}) +} + +If \code{wt} is passed, the data must include columns containing: +\itemize{ +\item Weighted exposures (\code{exposure}) +\item Weighted claims (\code{claims}) +\item Claim counts (\code{n_claims}) +\item The raw sum of weights \strong{NOT} multiplied by exposures +\item Exposure record counts (\code{.weight_n}) +\item The raw sum of squared weights (\code{.weight_sq}) +} + +The names in parentheses above are expected column names. If the data +frame passed to \code{as_exp_df()} uses different column names, these can be +specified using the \verb{col_*} arguments. + +When a column name is passed to \code{wt}, the columns \code{.weight}, \code{.weight_n}, +and \code{.weight_sq} are used to calculate credibility and confidence intervals. +If credibility and confidence intervals aren't required, then it is not +necessary to pass anything to \code{wt}. The results of \code{as_exp_df()} and any +downstream summaries will still be weighted as long as the exposures and +claims are pre-weighted. + +\code{target_status}, \code{start_date}, and \code{end_date} are optional arguments that are +only used for printing the resulting \code{exp_df} object. 
+} +\examples{ +# convert pre-aggregated experience into an exp_df object +dat <- as_exp_df(agg_sim_dat, col_exposure = "exposure_n", + col_claims = "claims_n", + target_status = "Surrender", + start_date = 2005, end_date = 2019, + conf_int = TRUE) +dat +is_exp_df(dat) + +# summary by policy year +summary(dat, pol_yr) + +# repeat the prior exercise on a weighted basis +dat_wt <- as_exp_df(agg_sim_dat, wt = "av", + col_exposure = "exposure_amt", + col_claims = "claims_amt", + col_n_claims = "claims_n", + col_weight_sq = "av_sq", + col_weight_n = "n", + target_status = "Surrender", + start_date = 2005, end_date = 2019, + conf_int = TRUE) +dat_wt + +# summary by policy year +summary(dat_wt, pol_yr) + + +} +\seealso{ +\code{\link[=exp_stats]{exp_stats()}} for information on how \code{exp_df} objects are typically +created from individual exposure records. +} diff --git a/man/as_trx_df.Rd b/man/as_trx_df.Rd new file mode 100644 index 0000000..c7128be --- /dev/null +++ b/man/as_trx_df.Rd @@ -0,0 +1,133 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/trx_df_helpers.R +\name{as_trx_df} +\alias{as_trx_df} +\alias{is_trx_df} +\title{Transaction summary helper functions} +\usage{ +as_trx_df( + x, + col_trx_amt = "trx_amt", + col_trx_n = "trx_n", + col_trx_flag = "trx_flag", + col_exposure = "exposure", + col_percent_of = NULL, + col_percent_of_w_trx = NULL, + col_trx_amt_sq = "trx_amt_sq", + start_date = as.Date("1900-01-01"), + end_date = NULL, + conf_int = FALSE, + conf_level = 0.95 +) + +is_trx_df(x) +} +\arguments{ +\item{x}{An object. For \code{as_trx_df()}, \code{x} must be a data frame.} + +\item{col_trx_amt}{Optional. Name of the column in \code{x} containing transaction +amounts.} + +\item{col_trx_n}{Optional. Name of the column in \code{x} containing transaction +counts.} + +\item{col_trx_flag}{Optional. Name of the column in \code{x} containing the number +of exposure records with transactions.} + +\item{col_exposure}{Optional. 
Name of the column in \code{x} containing exposures.} + +\item{col_percent_of}{Optional. Name of the column in \code{x} containing a +numeric variable to use in "percent of" calculations.} + +\item{col_percent_of_w_trx}{Optional. Name of the column in \code{x} containing a +numeric variable to use in "percent of" calculations with transactions.} + +\item{col_trx_amt_sq}{Optional and only required when \code{col_percent_of} is +passed. Name of the column in \code{x} containing squared transaction amounts.} + +\item{start_date}{Experience study start date. Default value = 1900-01-01.} + +\item{end_date}{Experience study end date} + +\item{conf_int}{If \code{TRUE}, future calls to \code{\link[=summary]{summary()}} will include +confidence intervals around the observed utilization rates and any +\code{percent_of} output columns.} + +\item{conf_level}{Confidence level for confidence intervals} +} +\value{ +For \code{is_trx_df()}, a length-1 logical vector. For \code{as_trx_df()}, +a \code{trx_df} object. +} +\description{ +Convert aggregate transaction experience studies to the \code{trx_df} class. +} +\details{ +\code{is_trx_df()} will return \code{TRUE} if \code{x} is a \code{trx_df} object. + +\code{as_trx_df()} will coerce a data frame to a \code{trx_df} object if that +data frame has the required columns for transaction studies listed below. + +\code{as_trx_df()} is most useful for working with aggregate summaries of +experience that were not created by actxps where individual policy +information is not available. After converting the data to the \code{trx_df} +class, \code{\link[=summary]{summary()}} can be used to summarize data by any grouping variables, +and \code{\link[=autoplot]{autoplot()}} and \code{\link[=autotable]{autotable()}} are available for reporting. 
+ +At a minimum, the following columns are required: +\itemize{ +\item Transaction amounts (\code{trx_amt}) +\item Transaction counts (\code{trx_n}) +\item The number of exposure records with transactions (\code{trx_flag}). This number +is not necessarily equal to transaction counts. If multiple transactions +are allowed per exposure period, \code{trx_flag} can be less than \code{trx_n}. +\item Exposures (\code{exposure}) +} + +If transaction amounts should be expressed as a percentage of another +variable (e.g. to calculate utilization rates or actual-to-expected ratios), +additional columns are required: +\itemize{ +\item A denominator "percent of" column. For example, the sum of account values. +\item A denominator "percent of" column for exposure records with transactions. +For example, the sum of account values across all records with non-zero +transaction amounts. +} + +If confidence intervals are desired and "percent of" columns are passed, an +additional column for the sum of squared transaction amounts (\code{trx_amt_sq}) +is also required. + +The names in parentheses above are expected column names. If the data +frame passed to \code{as_trx_df()} uses different column names, these can be +specified using the \verb{col_*} arguments. + +\code{start_date} and \code{end_date} are optional arguments that are +only used for printing the resulting \code{trx_df} object. + +Unlike \code{\link[=trx_stats]{trx_stats()}}, \code{as_trx_df()} only permits a single transaction type and +a single \code{percent_of} column. 
+} +\examples{ +# convert pre-aggregated experience into a trx_df object +dat <- as_trx_df(agg_sim_dat, + col_exposure = "n", + col_trx_amt = "wd", + col_trx_n = "wd_n", + col_trx_flag = "wd_flag", + col_percent_of = "av", + col_percent_of_w_trx = "av_w_wd", + col_trx_amt_sq = "wd_sq", + start_date = 2005, end_date = 2019, + conf_int = TRUE) +dat +is_trx_df(dat) + +# summary by policy year +summary(dat, pol_yr) + +} +\seealso{ +\code{\link[=trx_stats]{trx_stats()}} for information on how \code{trx_df} objects are typically +created from individual exposure records. +} diff --git a/man/expose_split.Rd b/man/expose_split.Rd index abafe87..02d0b36 100644 --- a/man/expose_split.Rd +++ b/man/expose_split.Rd @@ -54,5 +54,6 @@ toy_census |> expose_cy("2022-12-31") |> expose_split() } \seealso{ -\code{\link[=expose]{expose()}} +\code{\link[=expose]{expose()}} for information on creating exposure records from census +data. } diff --git a/man/is_exposed_df.Rd b/man/is_exposed_df.Rd index 7c34770..575dda1 100644 --- a/man/is_exposed_df.Rd +++ b/man/is_exposed_df.Rd @@ -92,3 +92,7 @@ policy periods (for policy exposures only), and exposure start / end dates. Optionally, if \code{x} has transaction counts and amounts by type, these can be specified without calling \code{\link[=add_transactions]{add_transactions()}}. } +\seealso{ +\code{\link[=expose]{expose()}} for information on how \code{exposed_df} objects are typically +created from census data. +} diff --git a/man/sim_data.Rd b/man/sim_data.Rd index ee12973..b202945 100644 --- a/man/sim_data.Rd +++ b/man/sim_data.Rd @@ -63,4 +63,7 @@ does not represent the experience on any specific product. 
} } +\seealso{ +\link{census_dat} +} \keyword{datasets} diff --git a/tests/testthat/test-exp_df_helpers.R b/tests/testthat/test-exp_df_helpers.R new file mode 100644 index 0000000..2d980da --- /dev/null +++ b/tests/testthat/test-exp_df_helpers.R @@ -0,0 +1,78 @@ +res <- expose(toy_census, "2022-12-31", target_status = "Surrender") |> + exp_stats() + +test_that("is_exp_df works", { + expect_true(is_exp_df(res)) + expect_false(is_exp_df(mtcars)) +}) + +res2 <- as.data.frame(res) + +test_that("as_exp_df works", { + + + res3 <- as_exp_df(res2) + res4 <- res2 |> + rename(expo = exposure) + res5 <- res4 |> + rename(clms = claims) + + expect_error(as_exp_df(data.frame(a = 1:3)), + regexp = "The following columns are missing") + + expect_true(is_exp_df(as_exp_df(res))) + + expect_false(is_exp_df(res2)) + + expect_true(is_exp_df(res3)) + + expect_error(as_exp_df(res4), regexp = "The following columns are missing") + expect_no_error(as_exp_df(res4, col_exposure = "expo")) + expect_no_error(as_exp_df(res5, col_exposure = "expo", col_claims = "clms")) + + expect_error(as_exp_df(1), regexp = "`x` must be a data frame.") + +}) + +# weighted tests +res_wt <- expose(census_dat, "2019-12-31", target_status = "Surrender") |> + mutate(ex = 0.05) |> + group_by(pol_yr, product) |> + exp_stats(wt = "premium", expected = "ex", + conf_int = TRUE, credibility = TRUE) + +res_wt2 <- as.data.frame(res_wt) |> + rename(premium = .weight) +res_wt3 <- as_exp_df(res_wt2, wt = "premium", expected = "ex", + conf_int = TRUE, credibility = TRUE) + +test_that("as_exp_df with weights works", { + + res_wt4 <- res_wt2 |> + rename(expo = exposure) + res_wt5 <- res_wt4 |> + rename(clms = claims, + n = n_claims, + sq = .weight_sq) + + expect_true(is_exp_df(as_exp_df(res_wt))) + expect_true(is_exp_df(res_wt3)) + + expect_error(as_exp_df(res_wt5, wt = "premium"), + regexp = "The following columns are missing") + expect_no_error(as_exp_df(res_wt4, wt = "premium", col_exposure = "expo")) + 
expect_no_error(as_exp_df(res_wt5, wt = "premium", + col_exposure = "expo", col_claims = "clms", + col_weight_sq = "sq", col_n_claims = "n")) + +}) + +test_that("as_exp_df summary matches an object created by exp_stats", { + x <- summary(res_wt, product) |> select(-product) + y <- summary(res_wt3, product) |> select(-product) + expect_true(dplyr::near(x - y, 0) |> all()) + + x <- summary(res_wt, pol_yr) |> select(-pol_yr) + y <- summary(res_wt3, pol_yr) |> select(-pol_yr) + expect_true(dplyr::near(x - y, 0) |> all()) +}) diff --git a/tests/testthat/test-exposed_df_helpers.R b/tests/testthat/test-exposed_df_helpers.R index cbfa49d..37ee7d5 100644 --- a/tests/testthat/test-exposed_df_helpers.R +++ b/tests/testthat/test-exposed_df_helpers.R @@ -20,16 +20,20 @@ test_that("as_exposed_df works", { start = pol_date_yr, end = pol_date_yr_end) - expect_error(as_exposed_df(data.frame(a = 1:3), Sys.Date())) + expect_error(as_exposed_df(data.frame(a = 1:3), Sys.Date()), + regexp = "The following columns are missing") expect_true(is_exposed_df(as_exposed_df(expo))) expect_false(is_exposed_df(expo2)) - expect_error(as_exposed_df(expo2, end_date = "2022-12-31", expo_length = "yr")) + expect_error(as_exposed_df(expo2, end_date = "2022-12-31", + expo_length = "yr"), + regexp = "`expo_length` must be one of") expect_true(is_exposed_df(expo3)) - expect_error(as_exposed_df(expo4)) + expect_error(as_exposed_df(expo4, + regexp = "The following columns are missing")) expect_no_error(as_exposed_df(expo4, end_date = "2022-12-31", col_pol_num = "pnum")) expect_no_error(as_exposed_df(expo5, end_date = "2022-12-31", @@ -39,7 +43,7 @@ test_that("as_exposed_df works", { col_pol_per = "py", cols_dates = c("start", "end"))) - expect_error(as_exposed_df(1)) + expect_error(as_exposed_df(1), regexp = "`x` must be a data frame.") }) @@ -53,12 +57,14 @@ test_that("as_exposed_df works with transactions", { trx_amt_B = 4) expect_no_error(as_exposed_df(expo6, "2022-12-31", trx_types = c("A", "B"))) - 
expect_error(as_exposed_df(expo6, "2022-12-31", trx_types = c("A", "C"))) + expect_error(as_exposed_df(expo6, "2022-12-31", trx_types = c("A", "C")), + regexp = "The following columns are missing") expo7 <- expo6 |> rename(n_A = trx_n_A, n_B = trx_n_B, amt_A = trx_amt_A, amt_B = trx_amt_B) - expect_error(as_exposed_df(expo7, "2022-12-31", trx_types = c("A", "B"))) + expect_error(as_exposed_df(expo7, "2022-12-31", trx_types = c("A", "B")), + regexp = "The following columns are missing") expect_no_error(as_exposed_df(expo7, "2022-12-31", trx_types = c("A", "B"), col_trx_n_ = "n_", col_trx_amt_ = "amt_")) diff --git a/tests/testthat/test-trx_df_helpers.R b/tests/testthat/test-trx_df_helpers.R new file mode 100644 index 0000000..6c9fd2d --- /dev/null +++ b/tests/testthat/test-trx_df_helpers.R @@ -0,0 +1,46 @@ +res <- expose(census_dat, "2019-12-31", target_status = "Surrender") |> + add_transactions(withdrawals) |> + left_join(account_vals, by = c("pol_num", "pol_date_yr")) |> + group_by(pol_yr, inc_guar) |> + trx_stats(percent_of = "av_anniv", trx_types = "Base", conf_int = TRUE) + +test_that("is_trx_df works", { + expect_true(is_trx_df(res)) + expect_false(is_trx_df(mtcars)) +}) + +res2 <- as.data.frame(res) +res3 <- as_trx_df(res2, col_percent_of = "av_anniv", conf_int = TRUE) + +test_that("as_trx_df works", { + + res4 <- res2 |> + rename(expo = exposure) + res5 <- res4 |> + rename(tamt = trx_amt, + tn = trx_n) + + expect_error(as_trx_df(data.frame(a = 1:3)), + regexp = "The following columns are missing") + + expect_true(is_trx_df(as_trx_df(res))) + + expect_false(is_trx_df(res2)) + + expect_true(is_trx_df(res3)) + + expect_error(as_trx_df(res4), regexp = "The following columns are missing") + expect_no_error(as_trx_df(res4, col_exposure = "expo")) + expect_no_error(as_trx_df(res5, col_exposure = "expo", col_trx_amt = "tamt", + col_trx_n = "tn")) + + expect_error(as_trx_df(1), regexp = "`x` must be a data frame.") + +}) + + +test_that("as_trx_df summary matches an 
object created by trx_stats", { + x <- summary(res, inc_guar) |> select(-inc_guar, -trx_type) + y <- summary(res3, inc_guar) |> select(-inc_guar, -trx_type) + expect_true(dplyr::near(x - y, 0) |> all()) +}) diff --git a/vignettes/actxps.Rmd b/vignettes/actxps.Rmd index 8b6ee9b..d17fc7f 100644 --- a/vignettes/actxps.Rmd +++ b/vignettes/actxps.Rmd @@ -108,7 +108,6 @@ exp_res To derive actual-to-expected rates, first attach one or more columns of expected termination rates to the exposure data. Then, pass these column names to the `expected` argument of `exp_stats()`. ```{r stats-ae} - expected_table <- c(seq(0.005, 0.03, length.out = 10), 0.2, 0.15, rep(0.05, 3)) # using 2 different expected termination rates diff --git a/vignettes/exp_summary.Rmd b/vignettes/exp_summary.Rmd index 121b52f..933935f 100644 --- a/vignettes/exp_summary.Rmd +++ b/vignettes/exp_summary.Rmd @@ -78,7 +78,6 @@ If the data frame passed into `exp_stats()` is grouped using `dplyr::group_by()` In the following, `exposed_data` is grouped by policy year before being passed to `exp_stats()`. This results in one row per policy year found in the data. ```{r grouped-1} - exposed_data |> group_by(pol_yr) |> exp_stats() @@ -88,7 +87,6 @@ exposed_data |> Multiple grouping variables are allowed. Below, the presence of an income guarantee (`inc_guar`) is added as a second grouping variable. ```{r grouped-2} - exposed_data |> group_by(inc_guar, pol_yr) |> exp_stats() @@ -105,7 +103,6 @@ Even if the target status exists on the input data, it can be overridden. Howeve Using the example data, a total termination rate can be estimated by including both death and surrender statuses in `target_status`. To ensure exposures are accurate, an adjustment is made to fully expose deaths prior to calling `exp_stats()`^[This adjustment is not necessary on surrenders because the `expose()` function previously did this for us.]. 
```{r targ-status} - exposed_data |> mutate(exposure = ifelse(status == "Death", 1, exposure)) |> group_by(pol_yr) |> @@ -121,7 +118,6 @@ Experience studies often weight output by key policy values. Examples include ac Our sample data contains a column called `premium` that we can weight by. When weights are supplied, the `claims`, `exposure`, and `q_obs` columns will be weighted. If expected termination rates are supplied (see below), these rates and A/E values will also be weighted.^[When weights are supplied, additional columns are created containing the sum of weights, the sum of squared weights, and the number of records. These columns are used for re-summarizing the data (see the "Summary method" section on this page).] ```{r weight-res} - exposed_data |> group_by(pol_yr) |> exp_stats(wt = 'premium') @@ -145,7 +141,6 @@ In the output, 4 new columns are created for expected rates and A/E ratios. ```{r act-exp} - expected_table <- c(seq(0.005, 0.03, length.out = 10), 0.2, 0.15, rep(0.05, 3)) # using 2 different expected termination assumption sets @@ -167,7 +162,6 @@ exp_res |> As noted above, if weights are passed to `exp_stats()` then A/E ratios will also be weighted. ```{r act-exp-wt} - exposed_data2 |> group_by(pol_yr, inc_guar) |> exp_stats(expected = c("expected_1", "expected_2"), @@ -305,7 +299,6 @@ exposed_data |> `exp_stats()` can still work when given a non-`exposed_df` data frame. However, it will be unable to infer certain attributes like the target status and the study dates. For target status, all statuses except the first level are assumed to be terminations. Since this may not be desirable, a warning message will appear informing what statuses were assumed to be terminated. 
```{r not-exposed_df} - not_exposed_df <- data.frame(exposed_data) exp_stats(not_exposed_df) diff --git a/vignettes/exposures.Rmd b/vignettes/exposures.Rmd index 6d53e0f..fd373e4 100644 --- a/vignettes/exposures.Rmd +++ b/vignettes/exposures.Rmd @@ -58,7 +58,6 @@ Let's assume we're performing an experience study as of 2022-12-31 and we're int To calculate exposures, we pass our data to the `expose()` function and we specify a study `end_date`. ```{r expose-1} - exposed_data <- expose(toy_census, end_date = "2022-12-31") ``` @@ -151,7 +150,6 @@ If `cal_expo` is set to `TRUE`, calendar year exposures will be calculated. Looking at the second policy, we can see that the first year is left-censored because the policy was issued two-fifths of the way through the year, and the last period is right-censored because the policy terminated roughly seven-tenths of the way through the year. ```{r expo-cal} - exposed_cal <- toy_census |> expose(end_date = "2022-12-31", cal_expo = TRUE, target_status = "Surrender") @@ -214,7 +212,6 @@ The two exposure bases will often not match for two reasons: Some downstream functions like `exp_stats()` expect `exposed_df` objects to have a single column for exposures. For split exposures, the exposure basis must be specified using the `col_exposure` argument. ```{r, split-stats-unclear, eval = FALSE} - exp_stats(split) ``` @@ -226,15 +223,13 @@ tryCatch(exp_stats(split), ``` -```{r, split-stats-clear, eval = FALSE} - +```{r, split-stats-clear} exp_stats(split, col_exposure = "exposure_pol") ``` `expose_split()` doesn't just work with calendar year exposures. Calendar quarters, months, or weeks can also be split. For periods shorter than a year, a record is only split into pre- and post-anniversary segments if a policy anniversary appears in the middle of the period. 
```{r, split-qtr} - expose_cq(toy_census, "2022-12-31", target_status = "Surrender") |> expose_split() |> filter(pol_num == 2) |> @@ -248,7 +243,6 @@ Note, however, that calendar period exposures will always be expressed in the or For machine learning feature engineering, the actxps package contains a function called `step_expose()` that is compatible with the recipes package from tidymodels. This function applies the `expose()` function within a recipe. ```{r rec-expose} - library(recipes) expo_rec <- recipe(status ~ ., toy_census) |> @@ -323,7 +317,6 @@ For example, below `exposed_data2` contains study start and end dates that are b When `vctrs::vec_rbind()` is used to combine `exposed_data` and `exposed_data2`, the result combines attributes across both objects. ```{r combine-1} - exposed_data2 <- expose(toy_census, end_date = "2023-12-31", start_date = "1890-01-01", diff --git a/vignettes/misc.Rmd b/vignettes/misc.Rmd index d2d54eb..5c3e53a 100644 --- a/vignettes/misc.Rmd +++ b/vignettes/misc.Rmd @@ -24,6 +24,42 @@ library(actxps) library(lubridate) ``` +## Working with aggregate experience data + +Seriatim-level policy experience data is often not available for analysis. This is almost always the case with industry studies that contain experience data submitted by multiple parties. In these cases, experience is grouped by several common policy attributes and aggregated accordingly. + +The typical workflow in actxps of `expose() |> exp_stats()` for termination studies or `expose() |> add_transactions() |> trx_stats()` for transaction studies doesn't apply if the starting data is aggregated. That is because another party has already gone through the steps of creating exposure records and performing an initial level of aggregation. + +Actxps provides two functions designed to work with aggregate experience data. 
+ +- For termination studies, `as_exp_df()` converts a data frame of aggregate experience into an `exp_df` object, which is the class returned by `exp_stats()` +- For transaction studies, `as_trx_df()` converts a data frame of aggregate experience into a `trx_df` object, which is the class returned by `trx_stats()` + +Both object classes have a `summary()` method which summarizes experience across any grouping variables passed to the function. The output of `summary()` will always be another `exp_df` (or `trx_df`) object, and will look just like the results of `exp_stats()` (or `trx_stats()`). For downstream reporting, summary results can be passed to the visualization functions `autoplot()` and `autotable()`. + +The `agg_sim_dat` data set contains aggregate experience on a theoretical block of deferred annuity contracts. Below, `as_exp_df()` is used to convert the data to an `exp_df`, and `summary()` is called using multiple grouping variables. + +```{r agg-exp-1} +agg_sim_exp_df <- agg_sim_dat |> + as_exp_df(col_exposure = "exposure_n", col_claims = "claims_n", + conf_int = TRUE, + start_date = 2005, end_date = 2019, target_status = "Surrender") +``` + +Results summarized by policy year + +```{r agg-exp-2} +summary(agg_sim_exp_df, pol_yr) +``` + +Results summarized by income guarantee presence and product + +```{r agg-exp-3} +summary(agg_sim_exp_df, inc_guar, product) +``` + +`as_exp_df()` and `as_trx_df()` contain several arguments for optional calculations like confidence intervals, expected values, weighting variables, and more. These arguments mirror the functionality in `exp_stats()` and `trx_stats()`. Both functions also contain multiple arguments for specifying column names associated with required values like exposures and claims. + ## Policy duration functions The `pol_()` family of functions calculates policy years, months, quarters, weeks, or any other arbitrary duration. Each function accepts a vector of dates and a vector of issue dates. 
@@ -31,8 +67,6 @@ The `pol_()` family of functions calculates policy years, months, quarters, week **Example**: assume a policy was issued on 2022-05-10 and we are interested in calculating various policy duration values at the end of calendar years 2022-2032. ```{r pol-dur1} - - dates <- ymd("2022-12-31") + years(0:10) # policy years @@ -53,7 +87,6 @@ pol_wk(dates, "2022-05-10") The more general `pol_interval()` function calculates any arbitrary duration. This function has a third argument where the length of the policy duration can be specified. This argument must be a period object. See `lubridate::period()` for more information. ```{r pol-dur2} - # days pol_interval(dates, "2022-05-10", days(1)) @@ -71,7 +104,6 @@ Below, a very simple logistic regression model is fit to surrender experience in The `col_expected` argument is used to rename the column(s) containing predicted values. If no names are specified, the default name is "expected". ```{r add-preds, fig.height=4, fig.width=5} - # create exposure records exposed_data <- expose(census_dat, end_date = "2019-12-31", target_status = "Surrender") |> diff --git a/vignettes/transactions.Rmd b/vignettes/transactions.Rmd index 69a0397..b868ebd 100644 --- a/vignettes/transactions.Rmd +++ b/vignettes/transactions.Rmd @@ -41,7 +41,6 @@ In this example, we'll be using the `census_dat`, `withdrawals`, and `account_va The `add_transactions()` function attaches transactions to a data frame with exposure-level records. This data frame must have the class `exposed_df`. For our example, we first need to convert `census_dat` into exposure records using the `expose()` function.^[See `vignette('exposures')` for more information on creating `exposed_df` objects.] This example will use policy year exposures. 
```{r packages} - library(actxps) library(dplyr) @@ -119,7 +118,6 @@ If the data frame passed into `trx_stats()` is grouped using `dplyr::group_by()` In the following, `exposed_trx` is grouped by the presence of an income guarantee (`inc_guar`) before being passed to `trx_stats()`. This results in four rows because we have two types of transactions and two distinct values of `inc_guar`. ```{r grouped-1} - exposed_trx |> group_by(inc_guar) |> trx_stats() @@ -129,7 +127,6 @@ exposed_trx |> Multiple grouping variables are allowed. Below, policy year (`pol_yr`) is added as a second grouping variable. ```{r grouped-2} - exposed_trx |> group_by(pol_yr, inc_guar) |> trx_stats() @@ -150,7 +147,6 @@ If column names are passed to the `percent_of` argument of `trx_stats()`, the ou For our example, let's assume we're interested in examining withdrawal transactions as a percentage of account values, which are available in the `account_vals` data frame in the column `av_anniv`. ```{r pct-of} - # attach account values data exposed_trx_w_av <- exposed_trx |> left_join(account_vals, by = c("pol_num", "pol_date_yr")) @@ -206,7 +202,6 @@ exposed_trx_w_av |> The `autoplot()` and `autotable()` functions create visualizations and summary tables from `trx_df` objects. See `vignette("visualizations")` for full details on these functions. ```{r trx-plot, warning=FALSE, message=FALSE, fig.height=5.5, fig.width=7} - library(ggplot2) trx_res |>