Draw the owl #1633
Comments
Regarding "Expand the dataset definition": this reminds me of background research I've been doing in preparation for some Great Variables Library Thinking. I've asked around a few times (example) what the most common variables are; and I've cross-referenced them with a bit of
Fundamentally, a peer-reviewed and agreed common set of things like this, in the research template, is the core of a variables library. So I'm excited to see this happening!
I'm putting together an extended dataset definition in this gist, with feedback in Slack.
Thanks, @sebbacon. At the moment, the expanded dataset definition hits several of those. I don't think it can hit them all, but hitting several suggests that it will be useful.
Devil's advocate: why not? If nearly every study includes all of them anyway:
Because it's a tutorial and not a how-to. Hitting all of them will make the tutorial longer, which means it will take more time to complete and more time to maintain. I think a more effective use of time would be to incorporate several into the tutorial and the remainder into how-tos, or, indeed, reusable variables.
Fair, I think I'm eliding our tutorial with our research template. It leads me to ask if this part of the tutorial content might also live in the research template? The familiarity when moving on from the tutorial could be helpful.
It could, but I think that's a separate issue, so I've created opensafely/research-template#108.
As @inglesp has pointed out, the ehrQL tutorial is similar to How to draw an owl.
Upon conclusion of the ehrQL tutorial, the reader has created a repo, created (and deleted) a codespace, interacted with the sandbox, created a minimal dataset definition, and generated a dummy dataset that is displayed in the terminal (i.e. it is not written to a file).
To become a competent user of ehrQL, however, the reader should also:
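- Expand the dataset definition
- Write a dummy dataset to a file
- Commit the dataset definition to `main`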
Expand the dataset definition
I'd like to check with a couple of researchers about what "expand" most usefully means,1 but based on this dataset definition, which @alschaffer said was written by her pilots without her help,2 I think "expand" probably means:
- `days`
- `codelist_from_csv`
- `case`
- `.is_in`, `.is_on_or_between`, `days`, `.count_for_patient`

Write a dummy dataset to a file
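One reason to write the dummy dataset to a file is that a downstream action can then read it. The following is a minimal sketch of such an action in plain Python, not taken from the tutorial. The column names (`patient_id`, `sex`, `age`) and the rows are made up for illustration, and the data is held in a string so that the example is self-contained; a real action would open whatever file the dataset action writes.

```python
import csv
import io
from collections import Counter

# Hypothetical dummy dataset. In a real project this would be the file
# written by the dataset action; the columns and values here are invented.
DUMMY_CSV = """patient_id,sex,age
1,female,34
2,male,61
3,female,47
"""


def load_rows(file_obj):
    """Read a CSV file object into a list of dicts, one per patient."""
    return list(csv.DictReader(file_obj))


def count_by(rows, column):
    """Count how many rows take each value of the given column."""
    return Counter(row[column] for row in rows)


rows = load_rows(io.StringIO(DUMMY_CSV))
counts = count_by(rows, "sex")
print(dict(counts))
```

Because the action only depends on a file with the expected columns, it can be developed against the dummy dataset before any real data is available.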
The reader should add an associated action to project.yaml, which they will run with `opensafely run [action]`. They should compare and contrast `run` with `exec`, noticing that `exec` is good for eyeballing the data but `run` is good for developing downstream actions, especially when the dummy dataset isn't written to a CSV file.

Commit the dataset definition to `main`
Upon conclusion of the ehrQL tutorial, the reader will be at "Initial commit" and be ready to run the associated action on OpenSAFELY Jobs. (Creating a project and workspace, and using OpenSAFELY Jobs, are out of scope.) Also, they will have created an artefact inside the codespace that persists outside the codespace.
The reader shouldn't commit the dataset definition to a feature branch and open a pull request, because different projects and different organizations have different guidelines about feature branches and pull requests.
Footnotes
1. https://bennettoxford.slack.com/archives/C02HJTL065A/p1696001906535959
2. https://bennettoxford.slack.com/archives/C31D62X5X/p1695306487454799