feat(backup): Support import chunking #59879
Codecov Report
Additional details and impacted files:

@@           Coverage Diff           @@
##           master   #59879   +/-   ##
=======================================
  Coverage   80.76%   80.77%
=======================================
  Files        5180     5180
  Lines      227507   227594     +87
  Branches    38279    38300     +21
=======================================
+ Hits       183747   183829     +82
- Misses      38157    38160      +3
- Partials     5603     5605      +2
src/sentry/backup/imports.py (outdated)
    if flags.import_uuid is None:
        flags = flags._replace(import_uuid=uuid4().hex)
If you need a mutable structure, you could use `dataclasses.dataclass` instead of a NamedTuple.
I remember trying a dataclass, but it didn't play nicely with pydantic (which has its own dataclass with its own sharp corners, and I didn't want pydantic types infecting non-RPC code), so I used NamedTuple. I've left a note here to investigate making this a bit nicer in the future.
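The trade-off in this thread comes down to immutability. The minimal sketch below contrasts the two options; `ImportFlags` here is a hypothetical, trimmed-down stand-in (only `import_uuid` appears in the snippet above), and `ensure_import_uuid` / `ensure_import_uuid_in_place` are illustrative helper names, not Sentry functions. With a NamedTuple, `_replace` returns a new object that must be reassigned; with a plain dataclass the field is assigned in place.

```python
from dataclasses import dataclass
from typing import NamedTuple, Optional
from uuid import uuid4


class ImportFlags(NamedTuple):
    # Hypothetical flags type for illustration; field set is assumed.
    merge_users: bool = False
    import_uuid: Optional[str] = None


def ensure_import_uuid(flags: ImportFlags) -> ImportFlags:
    # NamedTuple instances are immutable: `_replace` builds a new tuple,
    # so the caller must use the returned value.
    if flags.import_uuid is None:
        flags = flags._replace(import_uuid=uuid4().hex)
    return flags


@dataclass
class MutableImportFlags:
    # The reviewer's suggested alternative: a plain (non-frozen) dataclass.
    merge_users: bool = False
    import_uuid: Optional[str] = None


def ensure_import_uuid_in_place(flags: MutableImportFlags) -> None:
    # Dataclass fields can be assigned directly; the caller's object changes.
    if flags.import_uuid is None:
        flags.import_uuid = uuid4().hex
```

The NamedTuple version never mutates its input, which keeps it safe to share between callers; the cost is the easy-to-forget reassignment.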
src/sentry/backup/imports.py (outdated)
@@ -199,14 +206,17 @@ def yield_json_models(content) -> Iterator[Tuple[NormalizedModelName, str]]:
    def do_write(
        pk_map: PrimaryKeyMap, model_name: NormalizedModelName, json_data: json.JSONData
    ) -> None:
        model_relations = dependencies().get(model_name)
        nonlocal scope, flags, filters, deps
If you have a collection of nonlocal state, could it be grouped into an "import context" that holds the mutable state you need?
Done.
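A minimal sketch of the suggested grouping, assuming a hypothetical `ImportContext` dataclass and `make_do_write` factory (neither name comes from this PR; fields and types are illustrative). The key point is that `nonlocal` is only required to *rebind* a closure variable; mutating attributes of a shared context object needs no declaration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set


@dataclass
class ImportContext:
    # Hypothetical container for the state `do_write` previously captured via
    # `nonlocal scope, flags, filters, deps`; field names are illustrative.
    scope: str
    filters: Optional[Set[str]] = None  # None means "import every model"
    written: Dict[str, List[str]] = field(default_factory=dict)


def make_do_write(ctx: ImportContext) -> Callable[[str, str], None]:
    def do_write(model_name: str, json_data: str) -> None:
        # Mutating `ctx` is not a rebinding of a closure variable,
        # so no `nonlocal` declaration is needed here.
        if ctx.filters is not None and model_name not in ctx.filters:
            return
        ctx.written.setdefault(model_name, []).append(json_data)

    return do_write
```

Passing one explicit context object also makes it straightforward to hand the same state to other helpers or tests without relying on enclosing-scope variables.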
        # It's possible that this write has already occurred, and we are simply retrying
        # because the response got lost in transit. If so, just re-use that reply. We do
        # this in the transaction because, while `import_by_model` is generally called in a
        # sequential manner, cases like timeouts or long queues may cause a previous call to
        # still be active when the next one is made. Doing this check inside the transaction
        # lock ensures that the data is globally accurate and thwarts data races.
Transactions don't take locks that would prevent concurrent writes. They do ensure that all the writes happen as one operation, or that none of them happen.
If you need to prevent concurrent execution across multiple processes, you'd need to use a Redis- or Postgres-based lock.
Fixed this by matching on `UniqueViolation(... sentry_...importchunk ...`. It's a bit hacky to string-match on Postgres errors like this, but I think it's fine for now as a way to distinguish import chunk races from genuine DB schema violations, per our offline discussion.
This PR has a migration; here is the generated SQL:
--
-- Alter unique_together for controlimportchunk (1 constraint(s))
--
CREATE UNIQUE INDEX CONCURRENTLY "sentry_controlimportchun_import_uuid_model_min_or_3b482c4c_uniq" ON "sentry_controlimportchunk" ("import_uuid", "model", "min_ordinal");
ALTER TABLE "sentry_controlimportchunk" ADD CONSTRAINT "sentry_controlimportchun_import_uuid_model_min_or_3b482c4c_uniq" UNIQUE USING INDEX "sentry_controlimportchun_import_uuid_model_min_or_3b482c4c_uniq";
--
-- Alter unique_together for controlimportchunkreplica (1 constraint(s))
--
CREATE UNIQUE INDEX CONCURRENTLY "sentry_controlimportchun_import_uuid_model_min_or_824c8b1d_uniq" ON "sentry_controlimportchunkreplica" ("import_uuid", "model", "min_ordinal");
ALTER TABLE "sentry_controlimportchunkreplica" ADD CONSTRAINT "sentry_controlimportchun_import_uuid_model_min_or_824c8b1d_uniq" UNIQUE USING INDEX "sentry_controlimportchun_import_uuid_model_min_or_824c8b1d_uniq";
--
-- Alter unique_together for regionimportchunk (1 constraint(s))
--
CREATE UNIQUE INDEX CONCURRENTLY "sentry_regionimportchunk_import_uuid_model_min_or_33b232c2_uniq" ON "sentry_regionimportchunk" ("import_uuid", "model", "min_ordinal");
ALTER TABLE "sentry_regionimportchunk" ADD CONSTRAINT "sentry_regionimportchunk_import_uuid_model_min_or_33b232c2_uniq" UNIQUE USING INDEX "sentry_regionimportchunk_import_uuid_model_min_or_33b232c2_uniq"; |
With this feature in place, we now atomically record which models we imported in a given `import_by_model` call. This will be useful in the short term for implementing the post-processing import step, and in the long term to support rollbacks and partial import recovery.

Issue: getsentry/team-ospo#203
Issue: getsentry/team-ospo#213
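To illustrate how recorded chunks could support partial import recovery, here is a minimal sketch that, for each model, finds the first ordinal not covered by a contiguous run of recorded chunks. The `ChunkRecord` shape mirrors the `(model, min_ordinal, max_ordinal)` columns from the migration, but the helper `next_ordinals` and the assumption that ordinals start at 1 are hypothetical, not part of this PR.

```python
from typing import Dict, Iterable, List, NamedTuple


class ChunkRecord(NamedTuple):
    # Hypothetical in-memory view of one recorded import chunk.
    model: str
    min_ordinal: int
    max_ordinal: int


def next_ordinals(chunks: Iterable[ChunkRecord]) -> Dict[str, int]:
    """Map each model to the first ordinal that still needs importing (ordinals assumed to start at 1)."""
    by_model: Dict[str, List[ChunkRecord]] = {}
    for chunk in chunks:
        by_model.setdefault(chunk.model, []).append(chunk)

    result: Dict[str, int] = {}
    for model, model_chunks in by_model.items():
        next_needed = 1
        for chunk in sorted(model_chunks, key=lambda c: c.min_ordinal):
            if chunk.min_ordinal > next_needed:
                break  # gap: `next_needed` was never recorded, so resume from here
            next_needed = max(next_needed, chunk.max_ordinal + 1)
        result[model] = next_needed
    return result
```

Stopping at the first gap is deliberately conservative: anything after a missing chunk is re-imported, and the chunk uniqueness constraint would reject any chunk that had in fact already been recorded.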
Suspect Issues: This pull request was deployed and Sentry observed the following issues: