Improvements to export/import script #254

Merged (1 commit, Dec 10, 2024)


andrewplummer
Collaborator

I've made a few improvements to the import/export script used to import production data into a local db. The goal is better performance as the production db grows, including a more targeted approach to db dumps:

  1. The anonymize-database.js script no longer runs after the dump is taken; this was a major source of slowdown. Instead, sanitization is implemented as a view: a pseudo-collection that runs an aggregation pipeline on the source collection and can be dumped like a regular collection with mongodump. These views are named <collection>_sanitized and can be added per collection as needed. A more complex example can be found in scripts/database/sanitizations/users.js; for simple use cases, model definitions can now also add a sanitize field, which will strip or obfuscate that field.
  2. The export script now accepts the following flags, which allow targeting specific users:
Usage: prepare-export [options]


  Prepares database for export. Sanitizes users table and can perform intelligent
  filtering of documents based on refs of type "User".

  Note that multiple user filters will use an $or with the exception of
  before/after dates which work together.


Options:
  -a, --created-after [date]   Limit to users created after a certain date. Can
                               be any parseable date.
  -b, --created-before [date]  Limit to users created before a certain date.
                               Can be any parseable date.
  -u, --user-id [string...]    Limit to users by ID (can be multiple).
  -m, --email [string...]      Limit to users by email (can be multiple).
  -e, --exclude [string...]    Exclude collections. (default: [])
  -r, --raw [boolean]          Skip sanitizations. Only use this when
                               necessary. (default: false)
  -o, --out [string]           The directory to export the export to. (default:
                               "export")
  -h, --help                   display help for command
  3. When specific users are targeted, they will be queried during the dump. In addition, since the script has access to the models, it knows which fields have a ref of User, and any collections with such fields will be limited to documents referencing the users being dumped. If a collection does not reference User, all of its documents will be dumped (though it can still be excluded with the --exclude flag).
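
To illustrate the sanitized views from the first item, here is a minimal sketch of how a model's sanitize option might be turned into the aggregation pipeline behind a <collection>_sanitized view. It assumes a hypothetical definition shape in which an attribute's sanitize value is either true (strip the field) or a literal replacement value; buildSanitizePipeline and sanitizedViewSpec are illustrative names, not the script's actual code. The resulting options could be passed to the Node driver's db.createCollection(name, { viewOn, pipeline }), which creates a view.

```javascript
// Sketch only: assumes a hypothetical model definition shape where each
// attribute may carry a `sanitize` option. `true` strips the field; any
// other value replaces it with that literal value.
function buildSanitizePipeline(definition) {
  const unset = [];
  const set = {};
  for (const [field, attr] of Object.entries(definition.attributes || {})) {
    if (!attr || attr.sanitize === undefined || attr.sanitize === false) {
      continue; // field is passed through the view untouched
    }
    if (attr.sanitize === true) {
      unset.push(field);
    } else {
      // $literal keeps values like "$secret" from being read as field paths.
      set[field] = { $literal: attr.sanitize };
    }
  }
  const pipeline = [];
  if (Object.keys(set).length > 0) pipeline.push({ $set: set });
  if (unset.length > 0) pipeline.push({ $unset: unset });
  return pipeline;
}

// Name and options for a `<collection>_sanitized` view over the source
// collection, in the shape accepted by db.createCollection(name, options).
function sanitizedViewSpec(collection, definition) {
  return {
    name: `${collection}_sanitized`,
    options: { viewOn: collection, pipeline: buildSanitizePipeline(definition) },
  };
}

// Hypothetical users definition used for illustration.
const userDefinition = {
  attributes: {
    name: { type: 'String' },
    phone: { type: 'String', sanitize: 'REDACTED' },
    hashedPassword: { type: 'String', sanitize: true },
  },
};

console.log(JSON.stringify(sanitizedViewSpec('users', userDefinition), null, 2));
```

Because the view is evaluated by the server while mongodump reads it, no separate anonymization pass over the dumped files is needed. Note that the $set and $unset aggregation stages require MongoDB 4.2 or later.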

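The user filter flags can be read as a single query against the users collection. Below is a minimal sketch of one reading of the help text's note: ID and email filters are combined with $or, while the two date bounds merge into one createdAt range that forms a single clause. It assumes commander's camel-cased option names (createdAfter, createdBefore, userId, email) and a createdAt field on users; buildUserQuery and parseDate are hypothetical helpers, and a real script would also convert ID strings to ObjectIds.

```javascript
// Sketch only: option names follow commander's camel-casing of the flags
// (--created-after -> createdAfter, --user-id -> userId). The `createdAt`
// field and plain string IDs are assumptions; a real query against _id
// would need ObjectId values.
function parseDate(value, flag) {
  const date = new Date(value);
  if (Number.isNaN(date.getTime())) {
    throw new Error(`Invalid date for ${flag}: ${value}`);
  }
  return date;
}

function buildUserQuery({ createdAfter, createdBefore, userId = [], email = [] } = {}) {
  const clauses = [];
  if (userId.length > 0) clauses.push({ _id: { $in: userId } });
  if (email.length > 0) clauses.push({ email: { $in: email } });

  // Before/after bounds combine into a single range instead of being OR'd.
  const range = {};
  if (createdAfter) range.$gt = parseDate(createdAfter, '--created-after');
  if (createdBefore) range.$lt = parseDate(createdBefore, '--created-before');
  if (Object.keys(range).length > 0) clauses.push({ createdAt: range });

  if (clauses.length === 0) return null; // no user filters: dump all users
  return clauses.length === 1 ? clauses[0] : { $or: clauses };
}

// Example: one (hypothetical) user ID plus a date window.
const query = buildUserQuery({
  userId: ['hypothetical-id-1'],
  createdAfter: '2024-01-01',
  createdBefore: '2024-06-30',
});
console.log(JSON.stringify(query));
```
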
Moving the sanitization step into the mongodump pipeline alone has already given us a ~10x speedup. The ability to exclude collections and filter on specific users can reduce dump/restore time even more drastically.
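
The ref-based filtering from the item on targeting specific users can be sketched as a per-collection plan. This assumes hypothetical inputs: a map of collection names to model definitions whose top-level attributes may declare ref: 'User', the IDs of the users already matched by the user filter, and a list of collections that have a _sanitized view. Collections referencing User are limited with $in on each ref field, unreferenced collections get no query, --exclude drops collections, and --raw reads the source collection instead of the view. userRefFields and planCollectionDumps are illustrative names, not the script's implementation.

```javascript
// Sketch only: `models` is a hypothetical map of collection name -> model
// definition. Only top-level attributes with ref: 'User' are considered,
// and the user collection is assumed to be named 'users'.
function userRefFields(definition) {
  return Object.entries(definition.attributes || {})
    .filter(([, attr]) => attr && attr.ref === 'User')
    .map(([field]) => field);
}

function planCollectionDumps(models, { userIds = null, exclude = [], raw = false, sanitized = [] } = {}) {
  const plan = [];
  for (const [collection, definition] of Object.entries(models)) {
    if (exclude.includes(collection)) continue; // --exclude

    // Read from the <collection>_sanitized view when one exists, unless --raw.
    const source = !raw && sanitized.includes(collection)
      ? `${collection}_sanitized`
      : collection;

    let query = null; // null means every document is dumped
    if (userIds) {
      if (collection === 'users') {
        query = { _id: { $in: userIds } };
      } else {
        const fields = userRefFields(definition);
        if (fields.length === 1) {
          query = { [fields[0]]: { $in: userIds } };
        } else if (fields.length > 1) {
          // A document is kept if any of its User refs matches.
          query = { $or: fields.map((field) => ({ [field]: { $in: userIds } })) };
        }
      }
    }
    plan.push({ collection, source, query });
  }
  return plan;
}

// Hypothetical models used for illustration.
const exampleModels = {
  users: { attributes: { email: { type: 'String' } } },
  posts: { attributes: { author: { type: 'ObjectId', ref: 'User' } } },
  categories: { attributes: { name: { type: 'String' } } },
};
console.log(planCollectionDumps(exampleModels, { userIds: ['hypothetical-id-1'], sanitized: ['users'] }));
```

Each plan entry could then drive one mongodump run per collection, for example with --collection and --query (which expects Extended JSON, so IDs and dates would need $oid/$date forms); dumping a view's documents presumably also needs --viewsAsCollections.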

andrewplummer merged commit 5a4e6e9 into master on Dec 10, 2024
1 check passed
andrewplummer deleted the export-updates branch on December 10, 2024 at 19:52