Skip to content

Commit

Permalink
[docs] full rewrite (#19)
Browse files Browse the repository at this point in the history
  • Loading branch information
lrnv authored Jun 25, 2024
1 parent 84c1a58 commit 91d1c9c
Show file tree
Hide file tree
Showing 5 changed files with 65 additions and 34 deletions.
3 changes: 2 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -6,4 +6,5 @@
/docs/build/
data/death_rates.zip
data/death_rates/*
data/frpop.r
data/frpop.r
.vscode/*
71 changes: 49 additions & 22 deletions docs/src/index.md
Original file line number Diff line number Diff line change
@@ -1,52 +1,67 @@
```@meta
CurrentModule = RateTables
CollapsedDocStrings = true
```

# RateTables
# RateTables.jl

The [RateTables.jl](https://github.com/JuliaSurv/RateTables.jl) Julia package provides daily rate table objects extracted from census datasets, tailored for person-year computations. This functionality is similar to [R's `ratetable` class](https://www.rdocumentation.org/packages/survival/versions/3.2-3/topics/ratetable).

You can install it using the command below:
The [`RateTables.jl`](https://github.com/JuliaSurv/RateTables.jl) Julia package provides daily rate table objects extracted from census datasets, tailored for person-year computations. This functionality is similar to [R's `ratetable` class](https://www.rdocumentation.org/packages/survival/versions/3.2-3/topics/ratetable). You can install and load it through:

```julia
using Pkg
Pkg.add("https://github.com/JuliaSurv/RateTables.jl")
```

Loading this package exports constant `RateTable` objects. The index refering to all the available rate tables can be found at the bottom of this page. For the sake of this example, we'll use the `hmd_rates` dictionary which stores rates tables extracted from the [Human Mortality Database (HMD)](https://mortality.org).
## `RateTables` objects

Loading this package exports several constant `RateTable` objects, which list [given below](@ref "Exported RateTables"). Since they are exported, you can simply call them after `using RateTables` as follow :

```@example 1
using RateTables
hmd_rates
```

The output of the REPL shows that we have a `RateTable` object with two covariates `:country` and `:sex`. You can query the available covariates of a given RateTable as such:
This `hmd_rates` rate table represents mortality rates extracted from the [Human Mortality Database (HMD)](https://mortality.org). The output of the REPL shows that we have a `RateTable` object with two covariates `:country` and `:sex`. You can query the available covariates of a given `RateTable` as such:

```@example 1
availlable_covariates(hmd_rates, :sex)
```

For this specific dataset, the number of countries is huge and calling `availlable_covariates(hmd_rates, :country)` won't be very useful. Thus, for convenience, we provided details on the country codes separately in another constant object called `hmd_countries`:
For this specific dataset, the number of countries is huge and calling `availlable_covariates(hmd_rates, :country)` won't be very useful. Thus, for convenience and only for this dataset we provided details on the country codes separately in another constant object called `hmd_countries`:

```@example 1
hmd_countries
```

You can then use these covariates to subset the Rate Table object:

```@example 1
brt = hmd_rates[:svn,:male]
```

You obtain another object of the class `BasicRateTable`, as the core of the implementation. These objects have very strict internal characteristics. They mostly hold a matrix of daily hazard rates, indexed by ages (yearly) and dates (yearly too). The show function shows you the ranges of values for both ages and dates. When we constructed the life tables, we took care of other irregularities so that they all have exactly this shape (yearly intervals on both axes).

The most important thing that you can do with them is querying mortality rates, which is done through the `daily_hazard` function.

## Daily hazard


Recall that the daily hazard rate of mortality is defined as $-\log(1 - q_x)/365.241$ for an annual death rate $q_x$. We present an alternative approach to displaying mortality tables that is particularly convenient for person-year computations. To obtain daily rates from the tables, you can use the `daily_hazard` function. Its arguments need to be in the following specific format:

- The `age` parameter should be provided in days, with the conversion factor being 1 year = 365.241 days.
- The `date` parameter should be provided in days as well, with the same conversion factor.
- The format of other covariates may vary between rate tables, but it's essential to consider that their order is significant.

The `sex` covariate typically has values such as `:male` and `:female`, and sometimes `:total`. For the `hmd_rates` table, we have previously observed two additional covariates: `country` and `sex`.
The `sex` covariate typically has values such as `:male` and `:female`, and sometimes `:total`. For the `hmd_rates` table, we have previously observed two additional covariates: `country` and `sex`. Recall that you can use the `availlable_covariates` function to obtain these informations.

Depending on the querying syntax, the order of the passed arguments can be significant. For instance, the daily hazard rate for a Slovene male, on his 20th birthday, which happens to fall on the first of january 2010, can be queried using one of the following syntaxes:
There are several querying syntax, all lowering to the same code. You are free to choose the syntax that you prefer. Depending on the querying syntax, the order of the passed arguments can be significant. For instance, the daily hazard rate for a Slovene male, on his 20th birthday, which happens to fall on the tenth of January 2010, can be queried using one of the following syntaxes:

```@example 1
c = :svn # slovenia.
a = 20*365.241
d = 2010*365.241
s = :male
a = 20 * 365.241 # twenty years old
d = 2010 * 365.241 + 10 # tenth of january 2010
v1 = daily_hazard(hmd_rates, a, d, c, s)
v2 = daily_hazard(hmd_rates, a, d; country=c, sex=s)
v3 = daily_hazard(hmd_rates, a, d; sex=s, country=c) # when using kwargs syntax, the order of additional covariates does not matter.
Expand All @@ -60,9 +75,7 @@ For completeness, this package also includes datasets commonly used in R for cen
daily_hazard(slopop, a, d; sex=s)
```

Note the discrepency with the HMD data.

Another example with additional covariates would be the `survival::survexp.usr` dataset which includes `race` as a covariate. In this case, the calling structure remains similar:
Note the discrepancy with the HMD data: the source of the information is not exactly the same and so the rates dont perfectly match. Another example with additional covariates would be the `survival::survexp.usr` dataset which includes `race` as a covariate. In this case, the calling structure remains similar:

```@example 1
r = :white
Expand All @@ -73,39 +86,53 @@ v4 = daily_hazard(survexp_usr[s, r], a, d)
(v1,v2,v3,v4)
```

Please note that retrieving these daily hazards is a highly sensitive operation that should be optimized for speed, especially considering it's often used within critical loops. As such, we prioritize the performance of our fetching algorithms.
Please note that retrieving these daily hazards is a highly sensitive operation that is very optimized for speed, especially considering it's often used within critical loops. As such, we prioritized the performance of our fetching algorithms over convenience of other parts of the implementation. The core algorithm is as follows:

For a list of the available rate tables, kindly refer to the following index:
- Fetch the right `BasicRateTable` from a dictionary using the provided covariates
- Convert from days to years the provided ages and dates
- Index the rate matrix at corresponding indices.

If you feel like you are not getting top fetching performance, please open an issue.

## Life random variables

The `Life` function is used to extract individual life profiles (as random variables complient with `Distributions.jl`'s API) from a comprehensive ratetable, by using covariates such as age, gender, and health status or others. Once these life profiles are established, they serve as foundational elements for various analytical practices such as survival probability estimations, expected lifespan calculations, and simulations involving random variables related to life expectancy.
The `Life` function is used to extract individual life profiles (as random variables compliant with `Distributions.jl`'s API) from a `RateTable`, by using covariates such as age, gender, and health status or others. Once these life profiles are established, they serve as foundational elements for various analytical practices such as survival probability estimations, expected lifespan calculations, and simulations involving random variables related to life expectancy.

When applying it to a male individual aged $20$ in $1990$, we get the outcome below:

```@example 1
L = Life(slopop[:male], 7000, 1990*365.241)
```

Due to the constance of the hazard rates on each cell of the lifetable, the life expectation can be computed through the following formula:
Since hazard rates are constants on each cell of a rate tables, the life expectation can be computed exactly through the following formula:

$$\mathbf{E}(P) = \int_0^\inf S_p (t) dt = \sum_{j=0}^\inf \frac{S_p(t_j)}{\lambda_p(t_j)(1 - exp(-\lambda_p(t_j)(t_{j+1}-t_j)))}$$

Implemented in the function `Distributions.expectation`:
Two approximations are made when the life gets out of the life table:

- The last line of the ratetable is assumed to last until eternity. Indeed, the last line represents persons that are already 110 years old, and thus assuming that their future death rates are constants is not that much of an issue.
- When on the other hand a life exits the ratetable from the right, i.e. into the future but at a young age, we assume the last column of the rate table to define the future for this person.

All this is implemented as a method for the `Distributions.expectation` function, since Lifes are random variables:

```@example 1
expectation(L)/365.241
```

We get $57.7$ years left, implying a total life expectancy of about $77$ years for the given individual.
On this example, we get $57.7$ years left, implying a total life expectancy of about $77$ years for the given individual.

These random variables comply with the [`Distributions.jl`](https://github.com/JuliaStats/Distributions.jl)'s API.

## Package contents
## Exported RateTables

```@index
```@autodocs
Modules = [RateTables]
Filter = t -> isa(t,RateTables.AbstractRateTable)
```

## Other docstrings

```@autodocs
Modules = [RateTables]
Filter = t -> !isa(t,RateTables.AbstractRateTable)
```
7 changes: 5 additions & 2 deletions src/RateTable.jl
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,7 @@ struct BasicRateTable <: AbstractRateTable
end

"""
`RateTable`
RateTable
This class contains daily rate tables used in person-years computation.
Expand Down Expand Up @@ -105,10 +105,13 @@ Base.getindex(rt::RateTable; kwargs...) = getindex(rt, collect(kwargs[n] for n i
dty(t,minval,maxval) = min(Int(trunc(t*RT_YEARS_IN_DAY))-minval+1,maxval-minval+1)

"""
`daily_hazard(rt::BasicRateTable,age,date)`
daily_hazard(rt::BasicRateTable, age, date)
daily_hazard(rt::RateTable, age, date, args...)
daily_hazard(rt::RateTable, age, date; kwargs...)
This function queries daily hazard values from a given BasicRateTable.
The parameters `age` and `date` have to be in days (1 year = $(RT_DAYS_IN_YEAR) days).
Potential args and kwargs will be used to subset the ratetable.
"""
@inline daily_hazard(rt::BasicRateTable,a, d) = rt.values[dty(a,rt.extrema_age...),dty(d,rt.extrema_year...)]
@inline daily_hazard(rt::RateTable, a, d; kwargs...) = daily_hazard(getindex(rt; kwargs...), a, d)
Expand Down
6 changes: 3 additions & 3 deletions src/hmd_rates.jl
Original file line number Diff line number Diff line change
@@ -1,11 +1,11 @@
"""
`hmd_rates`
hmd_rates
RateTable providing daily hazard rates for both sexes for several countries. They are derived from annual death probabilities (qₓ's) from the [Human Mortality Database](https://www.mortality.org/)
Segmented by `country ∈ keys(hmd_countries)` and `sex ∈ (:male, :female, :total)`.
The list of countries codes is given with details in the `hmd_countries` constant.
The list of countries codes is given with details in the [`hmd_countries`](@ref) constant.
"""
const hmd_rates = let

Expand Down Expand Up @@ -49,7 +49,7 @@ const hmd_rates = let
end

"""
`hmd_countries`
hmd_countries
Gives a list of countries codes used in the `hmd_rates` dataset.
"""
Expand Down
12 changes: 6 additions & 6 deletions src/r_rates.jl
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@

"""
`slopop`
slopop
Slovene census data. Correspond to R's `relsurv::slopop` ratetable from the `relsurv` package. Segmented by `sex ∈ (:male, :female)`.
"""
Expand All @@ -14,7 +14,7 @@ end

"""
`survexp_us`
survexp_us
Census data set for the US population, drawn from R's package `survival`. RateTable `survexp_us` gives total United States population, by age and sex, 1940 to 2012. Segmented by `sex ∈ (:male, :female)`
"""
Expand All @@ -26,7 +26,7 @@ end

"""
`survexp_usr`
survexp_usr
Census data set for the US population, drawn from R's package `survival`. RateTable `survexp_usr` gives the United States population, by age, sex and race, 1940 to 2014. Race is white or black. For 1960 and 1970 the black population values were not reported separately, so the nonwhite values were used. (Over the years, the reported tables have differed wrt reporting non-white and/or black.). Segmented by `sex ∈ (:male, :female)` and `race `∈ (:white, :black)`.
"""
Expand All @@ -39,7 +39,7 @@ end

"""
`survexp_mn`
survexp_mn
Census data set for the US population, drawn from R's package `survival`. RateTable `survexp_mn` gives total Minnesota population, by age and sex, 1970 to 2013. Segmented by `sex ∈ (:male, :female)`
"""
Expand All @@ -52,7 +52,7 @@ end

"""
`survexp_fr`
survexp_fr
French census datas, drawn from R's package `survexp.fr`. Death rates are available from 1977 to 2019 for males and females aged from 0 to 99. Segmented by `sex ∈ (:male, :female)`
Expand All @@ -69,7 +69,7 @@ end

"""
`frpop`
frpop
French census datas, sourced from the Human mortality database (not exaclty the same series as hmd_rates[:fr]).
Expand Down

0 comments on commit 91d1c9c

Please sign in to comment.