Improvements to export/import script #254
Merged
I've made a few improvements to the export/import script used to import data into a local db. The goal is better performance as the production db grows, including a more targeted approach to db dumps:
- The `anonymize-database.js` script no longer runs after the dump. This was a major source of slowdown. Instead, sanitization is implemented as a view, which is a pseudo-collection that runs an aggregation pipeline on the underlying collection and can be dumped as a collection with `mongodump`. These views are named `<collection>_sanitized` and can be added as needed per collection. A more complex example can be found in `scripts/database/sanitizations/users.js`; for simple use cases, model definitions can now also add a `sanitize` field, which will strip or obfuscate that field.
- Collections with a `ref` of `User` are limited to the specific users being dumped. If a collection does not reference a `User`, all of its documents are dumped (but it can also be excluded with the `--exclude` flag).

Just by moving the sanitization step into the mongodump pipeline we've already seen ~10x speed improvements. The ability to exclude collections and filter on specific users can further reduce dump/restore time drastically.
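
For reviewers unfamiliar with view-based sanitization, here is a minimal sketch of the idea, not the code in this PR. The helper names (`buildSanitizePipeline`, `createSanitizedView`), the `'strip'` / `'obfuscate'` modes, the `'***'` placeholder, and the `user` field name are all assumptions made for illustration; the actual implementation lives in the scripts referenced above.

```javascript
// Hypothetical helper: turn a map of { field: 'strip' | 'obfuscate' } into an
// aggregation pipeline that could back a `<collection>_sanitized` view.
function buildSanitizePipeline(fields, userIds) {
  const pipeline = [];

  // Optionally limit the view to the users being dumped.
  // Assumes the reference to User is stored in a field named `user`.
  if (userIds && userIds.length > 0) {
    pipeline.push({ $match: { user: { $in: userIds } } });
  }

  const unset = [];
  const set = {};
  for (const [field, mode] of Object.entries(fields)) {
    if (mode === 'strip') {
      // Remove the field entirely.
      unset.push(field);
    } else if (mode === 'obfuscate') {
      // Replace the value with a fixed placeholder.
      set[field] = { $literal: '***' };
    } else {
      throw new Error(`Unknown sanitize mode: ${mode}`);
    }
  }

  if (unset.length > 0) pipeline.push({ $unset: unset });
  if (Object.keys(set).length > 0) pipeline.push({ $set: set });
  return pipeline;
}

// Create the view with the MongoDB Node.js driver.
// `db` is assumed to be an already connected Db instance.
async function createSanitizedView(db, collection, pipeline) {
  await db.createCollection(`${collection}_sanitized`, {
    viewOn: collection,
    pipeline,
  });
}
```

Because the pipeline runs inside the database when the view is read, no separate pass over the dumped data is needed. `mongodump` has a `--viewsAsCollections` option for exporting a view's documents as if it were a regular collection; whether this script relies on that exact option is not stated here.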