Convert database notebooks to use SQL Runner #20
Comments
If possible, can we do the schema one first?
I'm leaning towards creating a new repo for each report (starting with the schema report, as per #9 and Peter's comment). Rationale:
It looks as though pointing OpenSAFELY Reports to a new repo would be straightforward: the existing repo isn't hard-coded, and the fields on the admin page in OpenSAFELY Reports are clear.
Starting with a new repo instead of updating the existing repo is absolutely the right thing to do. Start afresh, cut the chaff. I'm not convinced of the benefit of splitting each notebook into different repos. It feels like part of the same suite of information that will likely all be updated on the same schedule, and so can be accessed and updated all in one place. Although if they're set to be updated automatically and viewed on the reports site, then maybe having them all in one place matters less. I don't have a strong preference so if there are developery reasons that I'm not fully grasping then go for it.
Thinking a bit more: the "history" and "builds" notebooks should be combined, as raised in this issue. So maybe one repo for that combined report, and one for the schema report does make sense after all!
Another thing in passing. There is currently no sync between the repo on the L2/3 server and on GitHub because
Thanks for your comments @wjchulme. I'm still leaning towards a one-to-one mapping between repos and reports. As I understand it, each report (notebook) queries the database, transforms the results, and displays the transformed results. There's one action in project.yaml for each report, so regenerating a report means rerunning that action. When we convert each report to use SQL Runner, there will be separate actions in project.yaml for querying (
Thanks @iaindillingham, that makes sense. In research repos you often want to run only a subset of actions at a time (to run a specific subgroup analysis, for example), so that feels like less of a problem to me. That is particularly true in this case, as I think all these reports will likely end up always being run at the same time (that is, after each database update). But as long as there's no inelegant duplication across reports, there is certainly a logic to a one-repo-per-report approach.
opensafely/database-notebooks contains four database notebooks; three are published on OpenSAFELY Reports (builds, history, schema). This playbook describes how we update them.
We should convert each database notebook to use SQL Runner. We should start with the schema notebook (#9).
Considerations
Create or update
Why create a new repo? If we created a new repo from opensafely-core/repo-template, then we would have a more developer-friendly setup. This would make it easier to incorporate some SQL linting (e.g. SQLFluff has a pre-commit hook).
Why update the existing repo? The existing repo, opensafely/database-notebooks, has seen nearly two years of development. It's not immediately clear which parts should be retained and which parts should be discarded, so updating it would be safer than creating a new repo. Also, opensafely-core/reports and https://reports.opensafely.org/ may have references to the existing repo.
Repos and notebooks
Should there be one repo for all notebooks or one repo for each notebook?