Convert to reusable action #5

Merged
merged 25 commits into from
Mar 23, 2022
Commits
6206021
Ignore study-related files and QA issues
iaindillingham Mar 16, 2022
43c9ed8
Configure dependabot
iaindillingham Mar 16, 2022
0dd43bf
Add study as use case
iaindillingham Mar 16, 2022
d948f8f
Move and rename core module
iaindillingham Mar 16, 2022
cc6e644
Move cli and replace click
iaindillingham Mar 16, 2022
5eb6c63
Remove sh
iaindillingham Mar 16, 2022
d87bbf2
Downgrade Pandas and fix tests
iaindillingham Mar 16, 2022
6524f5d
Remove altair
iaindillingham Mar 17, 2022
e5b2510
Copy requirements.dev.in from cohort-joiner
iaindillingham Mar 17, 2022
0b210c6
Require ebmdatalab
iaindillingham Mar 17, 2022
8a69108
Use ebmdatalab.charts for making deciles charts
iaindillingham Mar 17, 2022
1954bfe
Add generate_deciles_charts to study
iaindillingham Mar 17, 2022
31832fa
Delete `get_deciles_table`
iaindillingham Mar 17, 2022
d714124
Delete `is_measure_table` decorator
iaindillingham Mar 17, 2022
56c3036
Rename measures_table to measure_table
iaindillingham Mar 17, 2022
82943ef
Remove remaining typings
iaindillingham Mar 17, 2022
d209ea6
Remove mocks from tests
iaindillingham Mar 17, 2022
406eb15
Split test into clearer arrange/act/assert stages
iaindillingham Mar 17, 2022
2d8eb7c
Rename deciles_chart to deciles_charts
iaindillingham Mar 17, 2022
b6ea652
Add action.yaml
iaindillingham Mar 17, 2022
67a8e77
Copy tagging new version from cohort-joiner
iaindillingham Mar 17, 2022
e125014
Update README.md
iaindillingham Mar 18, 2022
646f916
Rename module in tests
iaindillingham Mar 18, 2022
98643d3
Remove ethnicity codelist
iaindillingham Mar 22, 2022
cbd9310
Update measure ID
iaindillingham Mar 22, 2022
8 changes: 5 additions & 3 deletions .flake8
@@ -1,6 +1,8 @@
[flake8]
extend-exclude = .direnv,.venv,venv
ignore = \
    E501 \ # line too long (black fixes long lines, except for long strings which may benefit from being long (eg URLs))
    W503 # line break before binary operator (black disagrees)
ignore =
    E501
    W503
per-file-ignores =
    analysis/*:INP001
max-line-length = 88
24 changes: 18 additions & 6 deletions .github/workflows/main.yml
@@ -3,6 +3,8 @@ name: CI

on:
  push:
    branches:
      - main

jobs:
  check:
@@ -13,10 +15,10 @@ jobs:
      - uses: actions/setup-python@v2
        with:
          python-version: "3.8"
          cache: "pip"
          cache: pip
          cache-dependency-path: requirements.*.txt
      - uses: extractions/setup-just@v1
      - name: Check formatting, linting, and import sorting
      - name: Check formatting, linting and import sorting
        run: just check

  test:
@@ -28,10 +30,20 @@ jobs:
      - uses: actions/setup-python@v2
        with:
          python-version: "3.8"
          cache: "pip"
          cache: pip
          cache-dependency-path: requirements.*.txt
      - uses: extractions/setup-just@v1
      - name: Run tests
        # env: # Add environment variables required for tests
        run: |
          just test
        run: just test

  tag-new-version:
    needs: [test]
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v2
      - name: Tag new version
        uses: mathieudutour/github-tag-action@d745f2e74aaf1ee82e747b181f7a0967978abee0
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          create_annotated_tag: true
11 changes: 10 additions & 1 deletion .gitignore
@@ -96,7 +96,16 @@ htmlcov
# IDEs
.idea/


# Virtual environments
.venv/
venv/

# OpenSAFELY research template
*~
model.log
*/input.csv
__pycache__
.python-version
output/*
metadata/*
venv/
60 changes: 55 additions & 5 deletions DEVELOPERS.md
@@ -20,18 +20,68 @@ just # Shortcut for just --list

## Development

Set up a local development environment with:
Set up a local development environment with

```sh
just devenv
```

## Tests
and create a new branch.
Then, iteratively:

Run the tests with:
* Make changes to the code
* Run the tests with

```sh
just test
```

* Check the code for issues with

```sh
just check
```

* Fix any issues with

```sh
just fix
```

* Commit the changes

Finally, push the branch to GitHub and open a pull request against the `main` branch.

## Tagging a new version

This reusable action follows [Semantic Versioning, v2.0.0][2].

A new __patch__ version is automatically tagged when a group of commits is pushed to the `main` branch;
for example, when a group that comprises a pull request is merged.
Alternatively, a new patch version is tagged for each commit in the group that has a message title prefixed with `fix`.
For example, a commit with the following message title would tag a new patch version when it is pushed to the `main` branch:

```sh
just test
```
fix: a bug fix
```

A new __minor__ version is tagged for each commit in the group that has a message title prefixed with `feat`.
For example, a commit with the following message title would tag a new minor version when it is pushed to the `main` branch:

```
feat: a new feature
```

A new __major__ version is tagged for each commit in the group that has `BREAKING CHANGE` in its message body.
For example, a commit with the following message body would tag a new major version:

```
Remove a function

BREAKING CHANGE: Removing a function is not backwards-compatible.
```

Whilst there are other prefixes besides `fix` and `feat`, they do not tag new versions.

[1]: https://github.com/casey/just/
[2]: https://semver.org/spec/v2.0.0.html
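The tagging rules in DEVELOPERS.md can be modelled in a few lines of Python. This is a minimal sketch under stated assumptions, not the logic of `mathieudutour/github-tag-action`: `bump_level` and `bump_version` are hypothetical helpers, and the sketch collapses a pushed group of commits into a single bump by taking the highest level it finds.

```python
import re


def bump_level(messages):
    """Return "major", "minor" or "patch" for a pushed group of commit messages.

    Hypothetical, simplified model of the rules in DEVELOPERS.md; the group is
    collapsed into a single bump by taking the highest level found.
    """
    level = "patch"  # pushing any group of commits to main tags at least a patch
    for message in messages:
        title, _, body = message.partition("\n")
        if "BREAKING CHANGE" in body:
            return "major"  # nothing outranks a major bump
        if re.match(r"feat\b", title):
            level = "minor"
    return level


def bump_version(version, level):
    """Apply a bump level to a MAJOR.MINOR.PATCH version string."""
    major, minor, patch = (int(part) for part in version.split("."))
    if level == "major":
        return f"{major + 1}.0.0"
    if level == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"


print(bump_version("0.1.0", bump_level(["fix: a bug fix", "feat: a new feature"])))  # 0.2.0
```

Under this model, a `fix` commit needs no special handling because a patch bump is already the default for any push; a `feat` title raises the bump to minor, and `BREAKING CHANGE` in a message body raises it to major.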
79 changes: 77 additions & 2 deletions README.md
@@ -1,5 +1,80 @@
# Deciles Chart
# deciles-charts

deciles-charts generates a line chart for each [measure table][1] in an input directory.
The line chart has time on the horizontal axis (`x`) and value on the vertical axis (`y`).
Deciles are plotted as dashed lines;
outer percentiles are plotted as dotted lines;
the median is plotted as a solid line.
For example, the following deciles chart was generated from dummy data:

![A deciles chart generated from dummy data](img/deciles_chart_has_sbp_event_by_population.png)

[Using deciles to communicate variation][2] has several advantages when compared to the alternatives.
Consequently, deciles charts are used on [OpenPrescribing.net][]
and in several OpenSAFELY publications, such as [Curtis _et al._ (2021)][3].

## Usage

In summary:

* Use [cohort-extractor][] to extract several weekly or monthly cohorts.
* Use cohort-extractor to generate one or more measure tables from these cohorts.
* Use deciles-charts to generate a deciles chart for each measure table.

Let's walk through an example _project.yaml_.

The following cohort-extractor action extracts several monthly cohorts:

```yaml
generate_cohort:
  run: >
    cohortextractor:latest generate_cohort
    --study-definition study_definition
    --index-date-range "2021-01-01 to 2021-06-30 by month"
  outputs:
    highly_sensitive:
      cohort: output/input_2021-*.csv
```

The following cohort-extractor action generates one or more measure tables from these cohorts:

```yaml
generate_measures:
  run: >
    cohortextractor:latest generate_measures
    --study-definition study_definition
  needs: [generate_cohort]
  outputs:
    moderately_sensitive:
      measure: output/measure_*.csv
```

Finally, the following deciles-charts reusable action generates a deciles chart for each measure table.
Remember to replace `[version]` with [a deciles-charts version][4]:

```yaml
generate_deciles_charts:
  run: >
    deciles-charts:[version]
    --input_dir output
    --output_dir output
  needs: [generate_measures]
  outputs:
    moderately_sensitive:
      deciles_charts: output/deciles_chart_*.png
```

For each measure table, there will now be a corresponding deciles chart.
For example, given a measure table called `measure_has_sbp_event_by_stp_code.csv`,
there will now be a corresponding deciles chart called `deciles_chart_has_sbp_event_by_stp_code.png`.

## Notes for developers

Please see [DEVELOPERS.md](DEVELOPERS.md).
Please see [_DEVELOPERS.md_](DEVELOPERS.md).

[1]: https://docs.opensafely.org/measures/
[2]: https://www.thedatalab.org/blog/2019/04/communicating-variation-in-prescribing-why-we-use-deciles/
[3]: https://www.opensafely.org/research/2021/service-restoration-observatory-1/
[4]: https://github.com/opensafely-actions/deciles-charts/tags
[cohort-extractor]: https://docs.opensafely.org/actions-cohortextractor/
[OpenPrescribing.net]: https://openprescribing.net/
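The README's naming convention (`measure_<id>.csv` becomes `deciles_chart_<id>.png`) follows from `MEASURE_FNAME_REGEX` in _analysis/deciles_charts.py_. The sketch below reuses that pattern; `chart_fname` is a hypothetical helper for illustration, not part of the action.

```python
import re

# The same pattern as MEASURE_FNAME_REGEX in analysis/deciles_charts.py.
MEASURE_FNAME_REGEX = re.compile(r"measure_(?P<id>\w+)\.csv")


def chart_fname(measure_fname):
    """Return the deciles chart filename for a measure table filename, or None."""
    match = MEASURE_FNAME_REGEX.match(measure_fname)
    if match is None:
        return None  # not a measure table, e.g. an input_*.csv cohort file
    return f"deciles_chart_{match.group('id')}.png"


print(chart_fname("measure_has_sbp_event_by_stp_code.csv"))
# deciles_chart_has_sbp_event_by_stp_code.png
print(chart_fname("input_2021-01-01.csv"))  # None
```

Because `re.match` anchors only at the start of the string, a name such as `measure_x.csv.bak` also matches and would be charted as `deciles_chart_x.png`; `fullmatch` would be the stricter choice.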
1 change: 1 addition & 0 deletions action.yaml
@@ -0,0 +1 @@
run: python:latest analysis/deciles_charts.py
86 changes: 86 additions & 0 deletions analysis/deciles_charts.py
@@ -0,0 +1,86 @@
import argparse
import pathlib
import re

import pandas
from ebmdatalab import charts


MEASURE_FNAME_REGEX = re.compile(r"measure_(?P<id>\w+)\.csv")


def _get_denominator(measure_table):
    return measure_table.columns[-3]


def _get_group_by(measure_table):
    return list(measure_table.columns[:-4])


def get_measure_tables(path):
    if not path.is_dir():
        raise AttributeError()

    for sub_path in path.iterdir():
        if not sub_path.is_file():
            continue

        measure_fname_match = re.match(MEASURE_FNAME_REGEX, sub_path.name)
        if measure_fname_match is not None:
            # The `date` column is assigned by the measures framework.
            measure_table = pandas.read_csv(sub_path, parse_dates=["date"])

            # We can reconstruct the parameters passed to `Measure` without
            # the study definition.
            measure_table.attrs["id"] = measure_fname_match.group("id")
            measure_table.attrs["denominator"] = _get_denominator(measure_table)
            measure_table.attrs["group_by"] = _get_group_by(measure_table)

            yield measure_table


def drop_zero_denominator_rows(measure_table):
    mask = measure_table[measure_table.attrs["denominator"]] > 0
    return measure_table[mask].reset_index(drop=True)


def get_deciles_chart(measure_table):
    return charts.deciles_chart(measure_table, period_column="date", column="value")

@ccunningham101 commented on Mar 21, 2022:

If we have changes to the ebmdatalab decile charts code, do you think we will push updates to the library? Or have our own code here? In relation to opensafely-core/cohort-extractor#759, we might want to be able to output the intermediate deciles tables for output checking.

Reply from the PR author (Member):

I think that we'll probably end up replacing that implementation with a new implementation, in this module. Writing out intermediate deciles tables would be one reason for replacing implementations (in this case, we could revert 31832fa). I decided to fall back to charts.deciles_chart because it's the canonical implementation.
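For the intermediate deciles tables discussed in this thread, one option is to compute the percentiles directly rather than through `charts.deciles_chart`. This is a minimal stdlib sketch, assuming the input is one `(date, value)` pair per group and date; `deciles_table` is a hypothetical helper, and its interpolation (`method="inclusive"`, i.e. linear between sorted values) may not match what `ebmdatalab` uses internally.

```python
import statistics
from collections import defaultdict


def deciles_table(rows):
    """Compute the 10th to 90th percentiles of `value` for each date.

    `rows` is an iterable of (date, value) pairs, one per group (e.g. per STP)
    and date. Hypothetical helper; not part of ebmdatalab.
    """
    values_by_date = defaultdict(list)
    for date, value in rows:
        values_by_date[date].append(value)

    table = []
    for date in sorted(values_by_date):
        # n=10 returns the nine cut points between deciles. method="inclusive"
        # interpolates linearly between sorted values; each date needs at
        # least two values.
        cut_points = statistics.quantiles(values_by_date[date], n=10, method="inclusive")
        for percentile, cut_point in zip(range(10, 100, 10), cut_points):
            table.append({"date": date, "percentile": percentile, "value": cut_point})
    return table


rows = [("2021-01-01", v) for v in range(11)]
print(deciles_table(rows)[4])  # {'date': '2021-01-01', 'percentile': 50, 'value': 5.0}
```

A long table of this shape (one row per date and percentile) could be written with `csv.DictWriter` alongside each chart for output checking.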



def write_deciles_chart(deciles_chart, path):
    deciles_chart.savefig(path)


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--input_dir",
        required=True,
        type=pathlib.Path,
        help="Path to the input directory",
    )
    parser.add_argument(
        "--output_dir",
        required=True,
        type=pathlib.Path,
        help="Path to the output directory",
    )
    return parser.parse_args()


def main():
    args = parse_args()
    input_dir = args.input_dir
    output_dir = args.output_dir

    for measure_table in get_measure_tables(input_dir):
        measure_table = drop_zero_denominator_rows(measure_table)
        chart = get_deciles_chart(measure_table)
        id_ = measure_table.attrs["id"]
        fname = f"deciles_chart_{id_}.png"
        write_deciles_chart(chart, output_dir / fname)


if __name__ == "__main__":
    main()
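`_get_denominator` and `_get_group_by` recover the `Measure` parameters purely from column positions. The stdlib sketch below illustrates that convention on a made-up measure table; the column order (group_by columns, numerator, denominator, value, date) is the one implied by the indexing, and `read_measure_csv` and `drop_zero_denominators` are hypothetical analogues of the pandas functions, not part of the action.

```python
import csv
import io

# Made-up measure table for illustration. Its column order (group_by columns,
# numerator, denominator, value, date) is the one implied by the indexing in
# _get_denominator and _get_group_by; it is an assumption, not framework output.
MEASURE_CSV = """\
stp_code,has_sbp_event,population,value,date
STP0,3,30,0.1,2021-01-01
STP1,0,0,,2021-01-01
STP2,5,20,0.25,2021-01-01
"""


def read_measure_csv(text):
    """Hypothetical stdlib analogue of get_measure_tables for one CSV string."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    columns = reader.fieldnames
    attrs = {
        "denominator": columns[-3],  # mirrors _get_denominator
        "group_by": list(columns[:-4]),  # mirrors _get_group_by
    }
    return rows, attrs


def drop_zero_denominators(rows, attrs):
    """Keep rows whose denominator is positive, like drop_zero_denominator_rows."""
    return [row for row in rows if float(row[attrs["denominator"]]) > 0]


rows, attrs = read_measure_csv(MEASURE_CSV)
print(attrs)  # {'denominator': 'population', 'group_by': ['stp_code']}
print([row["stp_code"] for row in drop_zero_denominators(rows, attrs)])  # ['STP0', 'STP2']
```

The positional approach avoids needing the study definition, at the cost of silently misreading any measure table whose columns are ordered differently.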
46 changes: 46 additions & 0 deletions analysis/study_definition.py
@@ -0,0 +1,46 @@
from cohortextractor import Measure, StudyDefinition, codelist_from_csv, patients


sbp_codelist = codelist_from_csv(
    "codelists/opensafely-systolic-blood-pressure-qof.csv",
    system="snomed",
    column="code",
)

study = StudyDefinition(
    default_expectations={
        "date": {"earliest": "1921-01-01", "latest": "2021-01-01"},
        "rate": "uniform",
        "incidence": 1,
    },
    index_date="2021-01-01",
    population=patients.satisfying("is_registered AND NOT is_dead"),
    is_registered=patients.registered_as_of(reference_date="index_date"),
    is_dead=patients.died_from_any_cause(
        on_or_before="index_date",
        return_expectations={"incidence": 0.1},
    ),
    stp_code=patients.registered_practice_as_of(
        date="index_date",
        returning="stp_code",
        return_expectations={
            "category": {
                "ratios": {f"STP{x}": 1 / 50 for x in range(50)},
            },
        },
    ),
    has_sbp_event=patients.with_these_clinical_events(
        codelist=sbp_codelist,
        between=["index_date", "index_date"],
        return_expectations={"incidence": 0.1},
    ),
)

measures = [
    Measure(
        id="has_sbp_event_by_stp_code",
        numerator="has_sbp_event",
        denominator="population",
        group_by="stp_code",
    ),
]
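The `Measure` above relates `has_sbp_event` (numerator) to `population` (denominator) within each `stp_code`. A rough model, assuming that for each index date `generate_measures` sums the numerator and denominator within each group and divides them, and treating every row of the input file as one member of the population; `compute_measure` is hypothetical and is not cohort-extractor's code. It emits columns in the order that `_get_denominator` and `_get_group_by` rely on.

```python
from collections import defaultdict


def compute_measure(patients, date):
    """Hypothetical, simplified model of one date's rows of a measure table."""
    # stp_code -> [numerator total, denominator total]
    totals = defaultdict(lambda: [0, 0])
    for patient in patients:
        group = totals[patient["stp_code"]]
        group[0] += patient["has_sbp_event"]
        group[1] += 1  # assume every row is a member of the population
    return [
        {
            "stp_code": stp_code,
            "has_sbp_event": numerator,
            "population": denominator,
            "value": numerator / denominator,
            "date": date,
        }
        for stp_code, (numerator, denominator) in sorted(totals.items())
    ]


cohort = [
    {"stp_code": "STP0", "has_sbp_event": 1},
    {"stp_code": "STP0", "has_sbp_event": 0},
    {"stp_code": "STP1", "has_sbp_event": 0},
]
print(compute_measure(cohort, "2021-01-01")[0]["value"])  # 0.5
```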
10 changes: 10 additions & 0 deletions codelists/codelists.json
@@ -0,0 +1,10 @@
{
  "files": {
    "opensafely-systolic-blood-pressure-qof.csv": {
      "id": "opensafely/systolic-blood-pressure-qof/3572b5fb",
      "url": "https://codelists.opensafely.org/codelist/opensafely/systolic-blood-pressure-qof/3572b5fb/",
      "downloaded_at": "2022-03-16 09:44:36.226715Z",
      "sha": "f2bc461e351499f4e5573a6f94e760b99979491e"
    }
  }
}
1 change: 1 addition & 0 deletions codelists/codelists.txt
@@ -0,0 +1 @@
opensafely/systolic-blood-pressure-qof/3572b5fb