Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(backup): Support import chunking #59879

Merged
merged 2 commits into from
Nov 16, 2023

Conversation

azaslavsky
Copy link
Contributor

With this feature in place, we now atomically record which models we imported in a given import_by_model call. This will be useful in the short term for implementing the post-processing import step, and in the long term to support rollbacks and partial import recovery.

Issue: getsentry/team-ospo#203
Issue: getsentry/team-ospo#213

@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Nov 14, 2023
@azaslavsky azaslavsky force-pushed the azaslabsky/backup/improve-import-tests branch from 3ca40d8 to 2a0decf Compare November 14, 2023 01:31
@azaslavsky azaslavsky force-pushed the azaslavsky/backup/support-import-chunking branch from d0b635c to 9a74955 Compare November 14, 2023 01:37
@azaslavsky azaslavsky requested a review from markstory November 14, 2023 18:51
@azaslavsky azaslavsky marked this pull request as ready for review November 14, 2023 18:51
@azaslavsky azaslavsky requested review from a team as code owners November 14, 2023 18:51
@azaslavsky azaslavsky force-pushed the azaslabsky/backup/improve-import-tests branch from 2a0decf to b18190e Compare November 14, 2023 20:04
@azaslavsky azaslavsky requested a review from a team as a code owner November 14, 2023 20:04
@azaslavsky azaslavsky force-pushed the azaslavsky/backup/support-import-chunking branch from 9a74955 to 97cf186 Compare November 14, 2023 20:04
Copy link

codecov bot commented Nov 14, 2023

Codecov Report

Merging #59879 (d01c5b8) into master (fe3563c) will increase coverage by 0.00%.
Report is 4 commits behind head on master.
The diff coverage is 90.74%.

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #59879   +/-   ##
=======================================
  Coverage   80.76%   80.77%           
=======================================
  Files        5180     5180           
  Lines      227507   227594   +87     
  Branches    38279    38300   +21     
=======================================
+ Hits       183747   183829   +82     
- Misses      38157    38160    +3     
- Partials     5603     5605    +2     
Files Coverage Δ
src/sentry/backup/dependencies.py 90.07% <100.00%> (+0.14%) ⬆️
src/sentry/backup/helpers.py 91.79% <100.00%> (+0.04%) ⬆️
src/sentry/backup/imports.py 92.98% <100.00%> (+1.14%) ⬆️
src/sentry/models/importchunk.py 100.00% <100.00%> (ø)
...entry/services/hybrid_cloud/import_export/model.py 98.42% <100.00%> (+0.07%) ⬆️
src/sentry/tasks/relocation.py 85.23% <ø> (ø)
static/app/components/sidebar/index.tsx 91.66% <ø> (ø)
...sentry/services/hybrid_cloud/import_export/impl.py 86.60% <83.87%> (-1.53%) ⬇️

... and 7 files with indirect coverage changes

Base automatically changed from azaslabsky/backup/improve-import-tests to master November 14, 2023 21:05
@azaslavsky azaslavsky force-pushed the azaslavsky/backup/support-import-chunking branch from 97cf186 to a3b37f6 Compare November 14, 2023 23:09
Comment on lines 79 to 84
if flags.import_uuid is None:
flags = flags._replace(import_uuid=uuid4().hex)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you need a mutable structure, you could use dataclasses.dataclass instead of a NamedTuple.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember trying a dataclass, but it didn't play nice with pydantic (they have their own dataclass which its own sharp corners, and I didn't want pydantic types infecting non-RPC code), so I used NamedTuple. I've left a note here to investigate making this a bit nicer in the future.

@@ -199,14 +206,17 @@ def yield_json_models(content) -> Iterator[Tuple[NormalizedModelName, str]]:
def do_write(
pk_map: PrimaryKeyMap, model_name: NormalizedModelName, json_data: json.JSONData
) -> None:
model_relations = dependencies().get(model_name)
nonlocal scope, flags, filters, deps
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have a collection of nonlocal state could they be grouped into an 'import context' that contains the mutable state you need?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 120 to 125
# It's possible that this write has already occurred, and we are simply retrying
# because the response got lost in transit. If so, just re-use that reply. We do
# this in the transaction because, while `import_by_model` is generally called in a
# sequential manner, cases like timeouts or long queues may cause a previous call to
# still be active when the next one is made. Doing this check inside the transaction
# lock ensures that the data is globally accurate and thwarts data races.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Transactions don't take locks that would prevent concurrent writes. They will ensure that all the writes happen as one operation though, or none of them happen though.

If you need to prevent concurrent execution in multiple processes you'd need to use a redis or postgres based lock.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed this by matching on UniqueViolation(... sentry_...importchunk .... A bit hacky to string match on postgres errors like this, but I think its fine for now to do the job of sorting between import chunk races and genuine DB schema violations, per our offline discussion.

@azaslavsky azaslavsky force-pushed the azaslavsky/backup/support-import-chunking branch from a3b37f6 to 072a5ce Compare November 15, 2023 19:24
@azaslavsky azaslavsky requested a review from markstory November 15, 2023 19:26
Copy link
Contributor

This PR has a migration; here is the generated SQL for src/sentry/migrations/0601_import_chunk_unique_together.py ()

--
-- Alter unique_together for controlimportchunk (1 constraint(s))
--
CREATE UNIQUE INDEX CONCURRENTLY "sentry_controlimportchun_import_uuid_model_min_or_3b482c4c_uniq" ON "sentry_controlimportchunk" ("import_uuid", "model", "min_ordinal");
ALTER TABLE "sentry_controlimportchunk" ADD CONSTRAINT "sentry_controlimportchun_import_uuid_model_min_or_3b482c4c_uniq" UNIQUE USING INDEX "sentry_controlimportchun_import_uuid_model_min_or_3b482c4c_uniq";
--
-- Alter unique_together for controlimportchunkreplica (1 constraint(s))
--
CREATE UNIQUE INDEX CONCURRENTLY "sentry_controlimportchun_import_uuid_model_min_or_824c8b1d_uniq" ON "sentry_controlimportchunkreplica" ("import_uuid", "model", "min_ordinal");
ALTER TABLE "sentry_controlimportchunkreplica" ADD CONSTRAINT "sentry_controlimportchun_import_uuid_model_min_or_824c8b1d_uniq" UNIQUE USING INDEX "sentry_controlimportchun_import_uuid_model_min_or_824c8b1d_uniq";
--
-- Alter unique_together for regionimportchunk (1 constraint(s))
--
CREATE UNIQUE INDEX CONCURRENTLY "sentry_regionimportchunk_import_uuid_model_min_or_33b232c2_uniq" ON "sentry_regionimportchunk" ("import_uuid", "model", "min_ordinal");
ALTER TABLE "sentry_regionimportchunk" ADD CONSTRAINT "sentry_regionimportchunk_import_uuid_model_min_or_33b232c2_uniq" UNIQUE USING INDEX "sentry_regionimportchunk_import_uuid_model_min_or_33b232c2_uniq";

With this feature in place, we now atomically record which models we
imported in a given `import_by_model` call. This will be useful in the
short term for implementing the post-processing import step, and in the
long term to support rollbacks and partial import recovery.

Issue: getsentry/team-ospo#203
Issue: getsentry/team-ospo#213
@azaslavsky azaslavsky force-pushed the azaslavsky/backup/support-import-chunking branch from a69a114 to d01c5b8 Compare November 16, 2023 16:40
Copy link
Contributor

This PR has a migration; here is the generated SQL for src/sentry/migrations/0602_import_chunk_unique_together.py ()

--
-- Alter unique_together for controlimportchunk (1 constraint(s))
--
CREATE UNIQUE INDEX CONCURRENTLY "sentry_controlimportchun_import_uuid_model_min_or_3b482c4c_uniq" ON "sentry_controlimportchunk" ("import_uuid", "model", "min_ordinal");
ALTER TABLE "sentry_controlimportchunk" ADD CONSTRAINT "sentry_controlimportchun_import_uuid_model_min_or_3b482c4c_uniq" UNIQUE USING INDEX "sentry_controlimportchun_import_uuid_model_min_or_3b482c4c_uniq";
--
-- Alter unique_together for controlimportchunkreplica (1 constraint(s))
--
CREATE UNIQUE INDEX CONCURRENTLY "sentry_controlimportchun_import_uuid_model_min_or_824c8b1d_uniq" ON "sentry_controlimportchunkreplica" ("import_uuid", "model", "min_ordinal");
ALTER TABLE "sentry_controlimportchunkreplica" ADD CONSTRAINT "sentry_controlimportchun_import_uuid_model_min_or_824c8b1d_uniq" UNIQUE USING INDEX "sentry_controlimportchun_import_uuid_model_min_or_824c8b1d_uniq";
--
-- Alter unique_together for regionimportchunk (1 constraint(s))
--
CREATE UNIQUE INDEX CONCURRENTLY "sentry_regionimportchunk_import_uuid_model_min_or_33b232c2_uniq" ON "sentry_regionimportchunk" ("import_uuid", "model", "min_ordinal");
ALTER TABLE "sentry_regionimportchunk" ADD CONSTRAINT "sentry_regionimportchunk_import_uuid_model_min_or_33b232c2_uniq" UNIQUE USING INDEX "sentry_regionimportchunk_import_uuid_model_min_or_33b232c2_uniq";

@azaslavsky azaslavsky enabled auto-merge (squash) November 16, 2023 17:15
@azaslavsky azaslavsky merged commit 5e05374 into master Nov 16, 2023
48 of 49 checks passed
@azaslavsky azaslavsky deleted the azaslavsky/backup/support-import-chunking branch November 16, 2023 17:17
Copy link

sentry-io bot commented Nov 17, 2023

Suspect Issues

This pull request was deployed and Sentry observed the following issues:

  • ‼️ ConnectionError: HTTPConnectionPool(host='us.testserver', port=80): Max retries exceeded with url: /api/0/relays/p... pytest.runtest.protocol tests/sentry/backup/tes... View Issue
  • ‼️ ConnectionError: HTTPConnectionPool(host='us.testserver', port=80): Max retries exceeded with url: /api/0/relays/p... pytest.runtest.protocol tests/sentry/backup/tes... View Issue

Did you find this useful? React with a 👍 or 👎

@github-actions github-actions bot locked and limited conversation to collaborators Dec 3, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Scope: Backend Automatically applied to PRs that change backend components
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants