From 91d1c9c337d1bd911aa007d8f528063763dcb378 Mon Sep 17 00:00:00 2001 From: Oskar Laverny Date: Tue, 25 Jun 2024 11:34:32 +0200 Subject: [PATCH] [docs] full rewrite (#19) --- .gitignore | 3 +- docs/src/index.md | 71 ++++++++++++++++++++++++++++++++--------------- src/RateTable.jl | 7 +++-- src/hmd_rates.jl | 6 ++-- src/r_rates.jl | 12 ++++---- 5 files changed, 65 insertions(+), 34 deletions(-) diff --git a/.gitignore b/.gitignore index 49b901e..b7b684c 100644 --- a/.gitignore +++ b/.gitignore @@ -6,4 +6,5 @@ /docs/build/ data/death_rates.zip data/death_rates/* -data/frpop.r \ No newline at end of file +data/frpop.r +.vscode/* \ No newline at end of file diff --git a/docs/src/index.md b/docs/src/index.md index eecf99f..e8669bb 100644 --- a/docs/src/index.md +++ b/docs/src/index.md @@ -1,52 +1,67 @@ ```@meta CurrentModule = RateTables +CollapsedDocStrings = true ``` -# RateTables +# RateTables.jl -The [RateTables.jl](https://github.com/JuliaSurv/RateTables.jl) Julia package provides daily rate table objects extracted from census datasets, tailored for person-year computations. This functionality is similar to [R's `ratetable` class](https://www.rdocumentation.org/packages/survival/versions/3.2-3/topics/ratetable). - -You can install it using the command below: +The [`RateTables.jl`](https://github.com/JuliaSurv/RateTables.jl) Julia package provides daily rate table objects extracted from census datasets, tailored for person-year computations. This functionality is similar to [R's `ratetable` class](https://www.rdocumentation.org/packages/survival/versions/3.2-3/topics/ratetable). You can install and load it through: ```julia using Pkg Pkg.add("https://github.com/JuliaSurv/RateTables.jl") ``` -Loading this package exports constant `RateTable` objects. The index refering to all the available rate tables can be found at the bottom of this page. For the sake of this example, we'll use the `hmd_rates` dictionary which stores rates tables extracted from the [Human Mortality Database (HMD)](https://mortality.org). +## `RateTables` objects + +Loading this package exports several constant `RateTable` objects, which list [given below](@ref "Exported RateTables"). Since they are exported, you can simply call them after `using RateTables` as follow : ```@example 1 using RateTables hmd_rates ``` -The output of the REPL shows that we have a `RateTable` object with two covariates `:country` and `:sex`. You can query the available covariates of a given RateTable as such: +This `hmd_rates` rate table represents mortality rates extracted from the [Human Mortality Database (HMD)](https://mortality.org). The output of the REPL shows that we have a `RateTable` object with two covariates `:country` and `:sex`. You can query the available covariates of a given `RateTable` as such: ```@example 1 availlable_covariates(hmd_rates, :sex) ``` -For this specific dataset, the number of countries is huge and calling `availlable_covariates(hmd_rates, :country)` won't be very useful. Thus, for convenience, we provided details on the country codes separately in another constant object called `hmd_countries`: +For this specific dataset, the number of countries is huge and calling `availlable_covariates(hmd_rates, :country)` won't be very useful. Thus, for convenience and only for this dataset we provided details on the country codes separately in another constant object called `hmd_countries`: ```@example 1 hmd_countries ``` +You can then use these covariates to subset the Rate Table object: + +```@example 1 +brt = hmd_rates[:svn,:male] +``` + +You obtain another object of the class `BasicRateTable`, as the core of the implementation. These objects have very strict internal characteristics. They mostly hold a matrix of daily hazard rates, indexed by ages (yearly) and dates (yearly too). The show function shows you the ranges of values for both ages and dates. When we constructed the life tables, we took care of other irregularities so that they all have exactly this shape (yearly intervals on both axes). + +The most important thing that you can do with them is querying mortality rates, which is done through the `daily_hazard` function. + +## Daily hazard + + Recall that the daily hazard rate of mortality is defined as $-\log(1 - q_x)/365.241$ for an annual death rate $q_x$. We present an alternative approach to displaying mortality tables that is particularly convenient for person-year computations. To obtain daily rates from the tables, you can use the `daily_hazard` function. Its arguments need to be in the following specific format: - The `age` parameter should be provided in days, with the conversion factor being 1 year = 365.241 days. - The `date` parameter should be provided in days as well, with the same conversion factor. - The format of other covariates may vary between rate tables, but it's essential to consider that their order is significant. -The `sex` covariate typically has values such as `:male` and `:female`, and sometimes `:total`. For the `hmd_rates` table, we have previously observed two additional covariates: `country` and `sex`. +The `sex` covariate typically has values such as `:male` and `:female`, and sometimes `:total`. For the `hmd_rates` table, we have previously observed two additional covariates: `country` and `sex`. Recall that you can use the `availlable_covariates` function to obtain these informations. -Depending on the querying syntax, the order of the passed arguments can be significant. For instance, the daily hazard rate for a Slovene male, on his 20th birthday, which happens to fall on the first of january 2010, can be queried using one of the following syntaxes: +There are several querying syntax, all lowering to the same code. You are free to choose the syntax that you prefer. Depending on the querying syntax, the order of the passed arguments can be significant. For instance, the daily hazard rate for a Slovene male, on his 20th birthday, which happens to fall on the tenth of January 2010, can be queried using one of the following syntaxes: ```@example 1 c = :svn # slovenia. -a = 20*365.241 -d = 2010*365.241 s = :male +a = 20 * 365.241 # twenty years old +d = 2010 * 365.241 + 10 # tenth of january 2010 + v1 = daily_hazard(hmd_rates, a, d, c, s) v2 = daily_hazard(hmd_rates, a, d; country=c, sex=s) v3 = daily_hazard(hmd_rates, a, d; sex=s, country=c) # when using kwargs syntax, the order of additional covariates does not matter. @@ -60,9 +75,7 @@ For completeness, this package also includes datasets commonly used in R for cen daily_hazard(slopop, a, d; sex=s) ``` -Note the discrepency with the HMD data. - -Another example with additional covariates would be the `survival::survexp.usr` dataset which includes `race` as a covariate. In this case, the calling structure remains similar: +Note the discrepancy with the HMD data: the source of the information is not exactly the same and so the rates dont perfectly match. Another example with additional covariates would be the `survival::survexp.usr` dataset which includes `race` as a covariate. In this case, the calling structure remains similar: ```@example 1 r = :white @@ -73,13 +86,17 @@ v4 = daily_hazard(survexp_usr[s, r], a, d) (v1,v2,v3,v4) ``` -Please note that retrieving these daily hazards is a highly sensitive operation that should be optimized for speed, especially considering it's often used within critical loops. As such, we prioritize the performance of our fetching algorithms. +Please note that retrieving these daily hazards is a highly sensitive operation that is very optimized for speed, especially considering it's often used within critical loops. As such, we prioritized the performance of our fetching algorithms over convenience of other parts of the implementation. The core algorithm is as follows: -For a list of the available rate tables, kindly refer to the following index: +- Fetch the right `BasicRateTable` from a dictionary using the provided covariates +- Convert from days to years the provided ages and dates +- Index the rate matrix at corresponding indices. + +If you feel like you are not getting top fetching performance, please open an issue. ## Life random variables -The `Life` function is used to extract individual life profiles (as random variables complient with `Distributions.jl`'s API) from a comprehensive ratetable, by using covariates such as age, gender, and health status or others. Once these life profiles are established, they serve as foundational elements for various analytical practices such as survival probability estimations, expected lifespan calculations, and simulations involving random variables related to life expectancy. +The `Life` function is used to extract individual life profiles (as random variables compliant with `Distributions.jl`'s API) from a `RateTable`, by using covariates such as age, gender, and health status or others. Once these life profiles are established, they serve as foundational elements for various analytical practices such as survival probability estimations, expected lifespan calculations, and simulations involving random variables related to life expectancy. When applying it to a male individual aged $20$ in $1990$, we get the outcome below: @@ -87,25 +104,35 @@ When applying it to a male individual aged $20$ in $1990$, we get the outcome be L = Life(slopop[:male], 7000, 1990*365.241) ``` -Due to the constance of the hazard rates on each cell of the lifetable, the life expectation can be computed through the following formula: +Since hazard rates are constants on each cell of a rate tables, the life expectation can be computed exactly through the following formula: $$\mathbf{E}(P) = \int_0^\inf S_p (t) dt = \sum_{j=0}^\inf \frac{S_p(t_j)}{\lambda_p(t_j)(1 - exp(-\lambda_p(t_j)(t_{j+1}-t_j)))}$$ -Implemented in the function `Distributions.expectation`: +Two approximations are made when the life gets out of the life table: + +- The last line of the ratetable is assumed to last until eternity. Indeed, the last line represents persons that are already 110 years old, and thus assuming that their future death rates are constants is not that much of an issue. +- When on the other hand a life exits the ratetable from the right, i.e. into the future but at a young age, we assume the last column of the rate table to define the future for this person. + +All this is implemented as a method for the `Distributions.expectation` function, since Lifes are random variables: ```@example 1 expectation(L)/365.241 ``` -We get $57.7$ years left, implying a total life expectancy of about $77$ years for the given individual. +On this example, we get $57.7$ years left, implying a total life expectancy of about $77$ years for the given individual. These random variables comply with the [`Distributions.jl`](https://github.com/JuliaStats/Distributions.jl)'s API. -## Package contents +## Exported RateTables -```@index +```@autodocs +Modules = [RateTables] +Filter = t -> isa(t,RateTables.AbstractRateTable) ``` +## Other docstrings + ```@autodocs Modules = [RateTables] +Filter = t -> !isa(t,RateTables.AbstractRateTable) ``` diff --git a/src/RateTable.jl b/src/RateTable.jl index 7edc2b9..bd54332 100644 --- a/src/RateTable.jl +++ b/src/RateTable.jl @@ -14,7 +14,7 @@ struct BasicRateTable <: AbstractRateTable end """ - `RateTable` + RateTable This class contains daily rate tables used in person-years computation. @@ -105,10 +105,13 @@ Base.getindex(rt::RateTable; kwargs...) = getindex(rt, collect(kwargs[n] for n i dty(t,minval,maxval) = min(Int(trunc(t*RT_YEARS_IN_DAY))-minval+1,maxval-minval+1) """ - `daily_hazard(rt::BasicRateTable,age,date)` + daily_hazard(rt::BasicRateTable, age, date) + daily_hazard(rt::RateTable, age, date, args...) + daily_hazard(rt::RateTable, age, date; kwargs...) This function queries daily hazard values from a given BasicRateTable. The parameters `age` and `date` have to be in days (1 year = $(RT_DAYS_IN_YEAR) days). +Potential args and kwargs will be used to subset the ratetable. """ @inline daily_hazard(rt::BasicRateTable,a, d) = rt.values[dty(a,rt.extrema_age...),dty(d,rt.extrema_year...)] @inline daily_hazard(rt::RateTable, a, d; kwargs...) = daily_hazard(getindex(rt; kwargs...), a, d) diff --git a/src/hmd_rates.jl b/src/hmd_rates.jl index c450ecd..96f0fc1 100644 --- a/src/hmd_rates.jl +++ b/src/hmd_rates.jl @@ -1,11 +1,11 @@ """ - `hmd_rates` + hmd_rates RateTable providing daily hazard rates for both sexes for several countries. They are derived from annual death probabilities (qₓ's) from the [Human Mortality Database](https://www.mortality.org/) Segmented by `country ∈ keys(hmd_countries)` and `sex ∈ (:male, :female, :total)`. -The list of countries codes is given with details in the `hmd_countries` constant. +The list of countries codes is given with details in the [`hmd_countries`](@ref) constant. """ const hmd_rates = let @@ -49,7 +49,7 @@ const hmd_rates = let end """ - `hmd_countries` + hmd_countries Gives a list of countries codes used in the `hmd_rates` dataset. """ diff --git a/src/r_rates.jl b/src/r_rates.jl index 7e098cb..33d7c96 100644 --- a/src/r_rates.jl +++ b/src/r_rates.jl @@ -1,7 +1,7 @@ """ - `slopop` + slopop Slovene census data. Correspond to R's `relsurv::slopop` ratetable from the `relsurv` package. Segmented by `sex ∈ (:male, :female)`. """ @@ -14,7 +14,7 @@ end """ - `survexp_us` + survexp_us Census data set for the US population, drawn from R's package `survival`. RateTable `survexp_us` gives total United States population, by age and sex, 1940 to 2012. Segmented by `sex ∈ (:male, :female)` """ @@ -26,7 +26,7 @@ end """ - `survexp_usr` + survexp_usr Census data set for the US population, drawn from R's package `survival`. RateTable `survexp_usr` gives the United States population, by age, sex and race, 1940 to 2014. Race is white or black. For 1960 and 1970 the black population values were not reported separately, so the nonwhite values were used. (Over the years, the reported tables have differed wrt reporting non-white and/or black.). Segmented by `sex ∈ (:male, :female)` and `race `∈ (:white, :black)`. """ @@ -39,7 +39,7 @@ end """ - `survexp_mn` + survexp_mn Census data set for the US population, drawn from R's package `survival`. RateTable `survexp_mn` gives total Minnesota population, by age and sex, 1970 to 2013. Segmented by `sex ∈ (:male, :female)` """ @@ -52,7 +52,7 @@ end """ - `survexp_fr` + survexp_fr French census datas, drawn from R's package `survexp.fr`. Death rates are available from 1977 to 2019 for males and females aged from 0 to 99. Segmented by `sex ∈ (:male, :female)` @@ -69,7 +69,7 @@ end """ - `frpop` + frpop French census datas, sourced from the Human mortality database (not exaclty the same series as hmd_rates[:fr]).