Draw the owl #1633

iaindillingham · 2023-09-29T11:05:54Z

As @inglesp has pointed out, the ehrQL tutorial is similar to "How to draw an owl".

Upon conclusion of the ehrQL tutorial, the reader has created a repo, created (and deleted) a codespace, interacted with the sandbox, created a minimal dataset definition, and generated a dummy dataset that is displayed in the terminal (i.e. it is not written to a file).

To become a competent user of ehrQL, however, the reader should also:

Expand the dataset definition
Write a dummy dataset to a file
Commit the dataset definition to `main`

Expand the dataset definition

I'd like to check with a couple of researchers about what "expand" most usefully means,¹ but based on this dataset definition, which @alschaffer said was written by her pilots without her help,² I think "expand" probably means:

Combining Boolean series to define the population (e.g. was born on or before a date and is alive and is either male or female; was registered with a practice on a date; was registered with a practice for a minimum of k days)
Adding some simple demographic variables, such as age and sex
Adding a complex demographic variable, such as ethnicity (codelist_from_csv)
Adding a complex socioeconomic variable, such as IMD quintile (case)
Deriving a variable, such as counting the number of medications within the last 30 days (.is_in, .is_on_or_between, days, .count_for_patient)
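To make the list above concrete, here is a minimal plain-Python model of what such an expanded definition would compute for a single patient. It is not ehrQL. In ehrQL these would be Boolean and integer series built with `case`, `.is_in`, `.is_on_or_between`, `days` and `.count_for_patient`. The `Patient` record and its field names, the medication codes, the value of k (`min_days`), and the assumption that IMD is a rank from 0 to 32,844 (the number of LSOAs in England) are all hypothetical, chosen for illustration. Ethnicity (`codelist_from_csv`) is omitted.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List, Optional, Tuple


@dataclass
class Patient:
    # Hypothetical, simplified record for one patient. Field names are
    # illustrative only; they are not ehrQL table or column names.
    date_of_birth: date
    sex: str
    date_of_death: Optional[date]
    # Practice registrations as (start, end) pairs; end=None means ongoing.
    registrations: List[Tuple[date, Optional[date]]]
    imd_rounded: Optional[int]
    # Medication records as (code, date) pairs.
    medications: List[Tuple[str, date]] = field(default_factory=list)


def age_on(date_of_birth: date, day: date) -> int:
    # Whole years completed on `day`.
    had_birthday = (day.month, day.day) >= (date_of_birth.month, date_of_birth.day)
    return day.year - date_of_birth.year - (0 if had_birthday else 1)


def in_population(p: Patient, index_date: date, min_days: int) -> bool:
    # Each condition is a Boolean; the population is their conjunction.
    born_by_index = p.date_of_birth <= index_date
    alive = p.date_of_death is None or p.date_of_death > index_date
    known_sex = p.sex in {"male", "female"}
    # One registration that started at least `min_days` before the index
    # date and has not ended before it (an end on the index date counts).
    registered_long_enough = any(
        start <= index_date - timedelta(days=min_days)
        and (end is None or end >= index_date)
        for start, end in p.registrations
    )
    return born_by_index and alive and known_sex and registered_long_enough


def imd_quintile(imd_rounded: Optional[int], n_lsoas: int = 32844) -> Optional[int]:
    # Like a case/when chain: 1 = most deprived, 5 = least deprived.
    # Missing or out-of-range ranks map to None.
    if imd_rounded is None or not 0 <= imd_rounded <= n_lsoas:
        return None
    for quintile in range(1, 5):
        if imd_rounded < n_lsoas * quintile / 5:
            return quintile
    return 5


def count_recent_medications(
    p: Patient, codelist: set, index_date: date, window_days: int = 30
) -> int:
    # Code in codelist (cf. .is_in), date in an inclusive window
    # (cf. .is_on_or_between), then count matches (cf. .count_for_patient).
    window_start = index_date - timedelta(days=window_days)
    return sum(
        1
        for code, prescribed_on in p.medications
        if code in codelist and window_start <= prescribed_on <= index_date
    )


def dataset_row(p: Patient, index_date: date, codelist: set, min_days: int):
    # One output row per patient in the population; None otherwise.
    if not in_population(p, index_date, min_days):
        return None
    return {
        "age": age_on(p.date_of_birth, index_date),
        "sex": p.sex,
        "imd_quintile": imd_quintile(p.imd_rounded),
        "num_recent_meds": count_recent_medications(p, codelist, index_date),
    }
```

For example, a woman born on 15 June 1980, registered since 2020, evaluated on 1 January 2023 with `min_days=90`, is in the population and is 42 on the index date. Only prescriptions from her codelist dated 2 December 2022 to 1 January 2023 inclusive are counted.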

Write a dummy dataset to a file

The reader should add an associated action to `project.yaml`, which they will run with `opensafely run [action]`. They should compare and contrast `run` with `exec`, noticing that `exec` is good for eyeballing the data but `run` is good for developing downstream actions, especially when the dummy dataset isn't written to a CSV file.
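What makes `run` useful for downstream work is that the action leaves a file that later actions can read. Below is a minimal plain-Python sketch of such a downstream step. It assumes the dataset action writes an uncompressed CSV with a `sex` column; the function name, column and path are hypothetical.

```python
import csv
from collections import Counter
from pathlib import Path
from typing import Union


def count_by_sex(dataset_path: Union[str, Path]) -> Counter:
    """Tally rows of a dataset CSV by its (hypothetical) "sex" column.

    A downstream action reads the file written by the dataset action,
    which only works if the dummy dataset is written to a file rather
    than just printed to the terminal.
    """
    with open(dataset_path, newline="") as f:
        return Counter(row["sex"] for row in csv.DictReader(f))
```

In a project this might be called as `count_by_sex("output/dataset.csv")`, where the path matches whatever output the dataset action declares in `project.yaml` (hypothetical here).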

Commit the dataset definition to `main`

Upon conclusion of the ehrQL tutorial, the reader will be at "Initial commit" and be ready to run the associated action on OpenSAFELY Jobs. (Creating a project and workspace, and using OpenSAFELY Jobs, are out of scope.) Also, they will have created an artefact inside the codespace that persists outside the codespace.

The reader shouldn't commit the dataset definition to a feature branch and open a pull request, because different projects and different organizations have different guidelines about feature branches and pull requests.


sebbacon · 2023-10-02T15:25:08Z

Regarding "Expand the dataset definition": this reminds me of background research I've been doing in preparation for some Great Variables Library Thinking.

I've asked around a few times (example) what the most common variables are, cross-referenced the answers with a bit of grep-fu, and come up with this tentative list:

age bands (see Andrea's docs for an example)
ethnicity (of different flavours) (see Colm’s data report work)
IMD
NHS region
sex
bmi (raw number and categories)
smoking
covid infection/hospitalisation/vaccination (at the moment at least)
date of death (patients table vs ONSDeath table)
equivalent of `patients.registered_as_of()` and `patients.registered_with_one_practice_between()`
deregistration date
for service analytics we often have practice id
care home residence (how often is the care home variable updated?)
cause of death, ICD-10

Fundamentally, a peer-reviewed, agreed common set of things like this, in the research template, is the core of a variables library. So I'm excited to see this happening!

iaindillingham · 2023-10-02T17:00:21Z

I'm putting together an extended dataset definition in this gist, with feedback in Slack.¹

¹ https://bennettoxford.slack.com/archives/C02HJTL065A/p1696001906535959

iaindillingham · 2023-10-02T17:02:23Z

Thanks, @sebbacon. At the moment, the expanded dataset definition hits several of those. I don't think it can hit them all, but hitting several suggests that it will be useful.

sebbacon · 2023-10-03T09:00:26Z

I don't think it can hit them all

Devil's advocate: why not? If nearly every study includes all of them anyway:

It's didactically useful as it covers all common cases
It's pragmatically useful for the same reason
It helps extend our "best-practice" reach deeper into people's code

iaindillingham · 2023-10-03T09:18:59Z

Because it's a tutorial and not a how-to. Hitting all of them will make the tutorial longer, which means it will take more time to complete and more time to maintain. I think a more effective use of time would be to incorporate several into the tutorial and the remainder into how-tos, or, indeed, reusable variables.

sebbacon · 2023-10-03T11:22:41Z

Fair; I think I'm conflating our tutorial with our research template.

It leads me to ask if this part of the tutorial content might also live in the research template?

The familiarity when moving on from the tutorial could be helpful.

iaindillingham · 2023-10-03T17:14:27Z

It could, but I think that's a separate issue, so I've created opensafely/research-template#108.

iaindillingham added the documentation Improvements or additions to documentation label Sep 29, 2023

iaindillingham self-assigned this Sep 29, 2023

iaindillingham added this to Data Team Sep 29, 2023

iaindillingham moved this to In Progress in Data Team Sep 29, 2023

iaindillingham changed the title ~~Draw the rest of the owl~~ Draw the owl Oct 14, 2023

iaindillingham mentioned this issue Oct 14, 2023

Add "Writing a more complex dataset definition" #1655

Merged

inglesp added this to the Deprecate cohort-extractor milestone Oct 24, 2023

iaindillingham mentioned this issue Oct 24, 2023

Draw the owl #1679

Merged

iaindillingham moved this from In Progress to Under Review in Data Team Oct 24, 2023

iaindillingham closed this as completed in #1679 Oct 31, 2023

github-project-automation bot moved this from Under Review to Done in Data Team Oct 31, 2023
