Implement notebook that migrates Mongo database from "legacy" to "Berkeley" schema #553
Note: This depends upon a version of nmdc-schema that hasn't been published to PyPI yet, and which doesn't have a version number yet.
I converted this back to a draft. I still want to implement the following things:
I will mark this "Ready for review" since it works as-is. I'll create a separate ticket about "optimizing" it to do the following:
Merging without formal review. This has no impact on the Runtime application (it just shares a repo). Migration squad members did a brief pair review session today, during which I presented some of the major changes.
Instead of creating a new ticket, I added the following comment to an existing ticket, which was already about the same thing: #449 (comment)
Description
Created a new migration notebook
In this branch, I implemented a Python notebook that can be used to migrate the NMDC Mongo database from conforming to `nmdc-schema` version 10 (a.k.a. the latest version of the "legacy" schema) to conforming to `nmdc-schema` version 11 (a.k.a. the "Berkeley" schema).

Here's the rendered notebook (it may be easier to review in this rendered format than as source code): migrate_10_8_0_to_11_0_0.ipynb
Introduced a new dependency
I introduced the program `mongosh` as a dependency of the migration notebooks. That program allows the notebook to perform arbitrary Mongo commands, instead of just dumping and restoring collections. For example, it allows the notebook to change people's Mongo roles (to temporarily revoke their access during the migration process).

Changed configuration file
Since `mongosh` does not support the same configuration options that `mongodump` and `mongorestore` do, I changed the notebook configuration file format to accommodate all three programs, so that they can share a single configuration.

Writing migrator log to a file
I made it so the log messages generated by the migrators themselves get written to a log file. Previously, they were not shown at all. This was particularly useful for this migration notebook, since it involves multiple partial migrators; this is our most complex migration so far.
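A minimal sketch of this approach using Python's standard `logging` module, assuming the migrators emit their messages through a common named logger. The logger name `migrator`, the helper functions, and the partial-migrator loop below are hypothetical illustrations; they are not taken from the notebook or from `nmdc-schema`.

```python
import logging
import tempfile
from pathlib import Path
from typing import Iterable


def attach_file_handler(log_path: Path, logger_name: str = "migrator") -> logging.FileHandler:
    """Write every record from `logger_name` (and its child loggers) to `log_path`.

    NOTE: "migrator" is a hypothetical logger name used for illustration.
    """
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.DEBUG)  # keep DEBUG records instead of dropping them
    handler = logging.FileHandler(log_path, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return handler


def run_partial_migrators(names: Iterable[str]) -> None:
    """Hypothetical stand-in for running several partial migrators in sequence."""
    for name in names:
        # Child loggers ("migrator.<name>") propagate their records to the
        # handlers of the parent "migrator" logger, so one file captures all of them.
        log = logging.getLogger(f"migrator.{name}")
        log.info("Starting partial migration")
        log.debug("...transforming documents...")
        log.info("Finished partial migration")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = Path(tmp_dir) / "migrator.log"
        handler = attach_file_handler(path)
        run_partial_migrators(["part_a", "part_b"])
        logging.getLogger("migrator").removeHandler(handler)
        handler.close()  # flush and release the file before reading it back
        print(path.read_text(encoding="utf-8"))
```

Attaching one handler to a parent logger, rather than one per migrator, relies on standard logger propagation: a record logged to `migrator.part_a` also reaches the handlers attached to `migrator`.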
Fixes #519
Type of change
`main`. There is already a workaround for it (which is to check out an older version of this repository), should anyone want to use the old migration notebooks before that upcoming Issue gets resolved.

As a reminder: although the migration notebooks live in this repository, they have no impact on the Runtime (and vice versa). So, the above breaking change does not affect the Runtime.
How Has This Been Tested?
The `unittest` tests that target the `Config` class all pass (as confirmed by a GitHub Actions workflow in this PR).

Checklist: