Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleanly support PK and FK remapping when importing #171

Closed
Tracked by #153
azaslavsky opened this issue Jul 27, 2023 · 0 comments
Closed
Tracked by #153

Cleanly support PK and FK remapping when importing #171

azaslavsky opened this issue Jul 27, 2023 · 0 comments

Comments

@azaslavsky
Copy link

No description provided.

azaslavsky added a commit to getsentry/sentry that referenced this issue Aug 3, 2023
Prior to this change, model uniqueness was established using the
concatenation of model name ("sentry.somemodel") and the pk that
particular instance contains in the database. Unfortunately, this is not
sufficient for comparing two JSON dumps for equality, as the pks may
change depending on which underlying validation database is used.

This change moves us to a model where, at ingestion time, each model
being compared is assigned a per-model-kind ordinal, such that its
unique identifier is roughly "this is the 3rd lowest pk of model
`sentry.somemodel`".

To help this transition along, we simplify some of the equality checks,
reducing them to `UnorderedInput` and `UnequalCounts` (there are more
instance of a certain model kind in one side than the other).

Issue: getsentry/team-ospo#171
azaslavsky added a commit to getsentry/sentry that referenced this issue Aug 7, 2023
Prior to this change, model uniqueness was established using the
concatenation of model name ("sentry.somemodel") and the pk that
particular instance contains in the database. Unfortunately, this is not
sufficient for comparing two JSON dumps for equality, as the pks may
change depending on which underlying validation database is used.

This change moves us to a model where, at ingestion time, each model
being compared is assigned a per-model-kind ordinal, such that its
unique identifier is roughly "this is the 3rd lowest pk of model
`sentry.somemodel`".

To help this transition along, we simplify some of the equality checks,
reducing them to `UnorderedInput` and `UnequalCounts` (there are more
instance of a certain model kind in one side than the other).

Issue: getsentry/team-ospo#171
azaslavsky added a commit to getsentry/sentry that referenced this issue Aug 8, 2023
Prior to this change, model uniqueness was established using the
concatenation of model name ("sentry.somemodel") and the pk that
particular instance contains in the database. Unfortunately, this is not
sufficient for comparing two JSON dumps for equality, as the pks may
change depending on which underlying validation database is used.

This change moves us to a model where, at ingestion time, each model
being compared is assigned a per-model-kind ordinal, such that its
unique identifier is roughly "this is the 3rd lowest pk of model
`sentry.somemodel`".

To help this transition along, we simplify some of the equality checks,
reducing them to `UnorderedInput` and `UnequalCounts` (there are more
instance of a certain model kind in one side than the other).

Issue: getsentry/team-ospo#171
spalmurray pushed a commit to getsentry/sentry that referenced this issue Aug 8, 2023
Prior to this change, model uniqueness was established using the
concatenation of model name ("sentry.somemodel") and the pk that
particular instance contains in the database. Unfortunately, this is not
sufficient for comparing two JSON dumps for equality, as the pks may
change depending on which underlying validation database is used.

This change moves us to a model where, at ingestion time, each model
being compared is assigned a per-model-kind ordinal, such that its
unique identifier is roughly "this is the 3rd lowest pk of model
`sentry.somemodel`".

To help this transition along, we simplify some of the equality checks,
reducing them to `UnorderedInput` and `UnequalCounts` (there are more
instance of a certain model kind in one side than the other).

Issue: getsentry/team-ospo#171
azaslavsky added a commit to getsentry/sentry that referenced this issue Aug 9, 2023
There are a number of properties of the export system API that we'd like
to eventually change. These include:

- Barring users from manually excluding/including specific models on export
- Only exporting `sentry...` models

To keep the original behavior export system intact until we debut these
changes, this commit hides the behind `OldExportConfig`, which allows
them to be toggled on (for tests) but left off for the actual CLI tool
(for now).

Issue: getsentry/team-ospo#171
Issue: getsentry/team-ospo#184
azaslavsky added a commit to getsentry/sentry that referenced this issue Aug 9, 2023
We will re-use this functionality in multiple places, so its best to
move it now.

Issue: getsentry/team-ospo#171
azaslavsky added a commit to getsentry/sentry that referenced this issue Aug 9, 2023
Prior to this change, the `sort_dependencies` function was the only
public dependency calculation that the import/export system used, merely
returning a list of models in dependency order. This change splits part
of that functionality out, exposing an intermediate state where we can
see all of the model relations for a given model, including which fields
they occupy on the main model. This will be useful when we add
"foreign-key remapping" capabilities on import.

To facilitate this, we have added a couple of fixtures that capture the
state of the model dependency graph in source control, in both detailed
and flattened form. These serve a few purposes:

1. It maintains a record in source control of model dependency graph
   changes.
2. It notifies team-ospo of any such changes, allowing us to ensure that
   they don't break import/export functionality.
3. It serves as a simple golden test for the dependency resolution
   mechanism.

Issue: getsentry/team-ospo#171
azaslavsky added a commit to getsentry/sentry that referenced this issue Aug 9, 2023
There are a number of properties of the import system API that we'd like
to eventually change. These include:

- Using `INSERT` rather than `INSERT_OR_UPDATE` semantics for import
  operations.
- Using natural foreign keys (currently only relevant to the
  `sentry.User` model).

To keep the original behavior export system intact until we debut these
changes, this commit hides the behind `OldImportConfig`, which allows
them to be toggled on (for tests) but left off for the actual CLI tool
(for now).

Issue: getsentry/team-ospo#171
Issue: getsentry/team-ospo#184
azaslavsky added a commit to getsentry/sentry that referenced this issue Aug 9, 2023
The `sorted_dependencies` function has, since time immemorial, tried to
track all `ManyToManyField` dependencies without a `through` model set,
since there is otherwise no way to deduce the dependency relationship
for the shadow junction tables Django creates under the hood. Except...
we actually don't have any `ManyToManyField` definitions without an
explicit `through` argument. Maybe we did when this code was written,
but today we no longer do.

The removed bits of code were thus never really executed, since they
were checking for a state of affairs that wasn't really a problem in our
actual code base. They were also buggy: the comment said they were
checking for `through`-ness, but there was no code to this effect,
resulting in double dependencies.

This change removes the check altogether, and adds an invariant test to
ensure that all `ManyToManyField`s defined in the future carry a
`through` argument.

Issue: getsentry/team-ospo#171
azaslavsky added a commit to getsentry/sentry that referenced this issue Aug 10, 2023
Prior to this change, the `sort_dependencies` function was the only
public dependency calculation that the import/export system used, merely
returning a list of models in dependency order. This change splits part
of that functionality out, exposing an intermediate state where we can
see all of the model relations for a given model, including which fields
they occupy on the main model. This will be useful when we add
"foreign-key remapping" capabilities on import.

To facilitate this, we have added a couple of fixtures that capture the
state of the model dependency graph in source control, in both detailed
and flattened form. These serve a few purposes:

1. It maintains a record in source control of model dependency graph
   changes.
2. It notifies team-ospo of any such changes, allowing us to ensure that
   they don't break import/export functionality.
3. It serves as a simple golden test for the dependency resolution
   mechanism.

Issue: getsentry/team-ospo#171
azaslavsky added a commit to getsentry/sentry that referenced this issue Aug 10, 2023
Prior to this change, the `sort_dependencies` function was the only
public dependency calculation that the import/export system used, merely
returning a list of models in dependency order. This change splits part
of that functionality out, exposing an intermediate state where we can
see all of the model relations for a given model, including which fields
they occupy on the main model. This will be useful when we add
"foreign-key remapping" capabilities on import.

To facilitate this, we have added a couple of fixtures that capture the
state of the model dependency graph in source control, in both detailed
and flattened form. These serve a few purposes:

1. It maintains a record in source control of model dependency graph
changes.
2. It notifies team-ospo of any such changes, allowing us to ensure that
they don't break import/export functionality.
3. It serves as a simple golden test for the dependency resolution
mechanism.

Issue: getsentry/team-ospo#171
azaslavsky added a commit to getsentry/sentry that referenced this issue Aug 10, 2023
There are a number of properties of the import system API that we'd like
to eventually change. These include:

- Using `INSERT` rather than `INSERT_OR_UPDATE` semantics for import
  operations.
- Using natural foreign keys (currently only relevant to the
  `sentry.User` model).

To keep the original behavior export system intact until we debut these
changes, this commit hides the behind `OldImportConfig`, which allows
them to be toggled on (for tests) but left off for the actual CLI tool
(for now).

Issue: getsentry/team-ospo#171
Issue: getsentry/team-ospo#184
azaslavsky added a commit to getsentry/sentry that referenced this issue Aug 10, 2023
The `sorted_dependencies` function has, since time immemorial, tried to
track all `ManyToManyField` dependencies without a `through` model set,
since there is otherwise no way to deduce the dependency relationship
for the shadow junction tables Django creates under the hood. Except...
we actually don't have any `ManyToManyField` definitions without an
explicit `through` argument. Maybe we did when this code was written,
but today we no longer do.

The removed bits of code were thus never really executed, since they
were checking for a state of affairs that wasn't really a problem in our
actual code base. They were also buggy: the comment said they were
checking for `through`-ness, but there was no code to this effect,
resulting in double dependencies.

This change removes the check altogether, and adds an invariant test to
ensure that all `ManyToManyField`s defined in the future carry a
`through` argument.

Issue: getsentry/team-ospo#171
azaslavsky added a commit to getsentry/sentry that referenced this issue Aug 11, 2023
There are a number of properties of the import system API that we'd like
to eventually change. These include:

- Using `INSERT` rather than `INSERT_OR_UPDATE` semantics for import
operations.
- Using natural foreign keys (currently only relevant to the
`sentry.User` model).

To keep the original behavior export system intact until we debut these
changes, this commit hides the behind `OldImportConfig`, which allows
them to be toggled on (for tests) but left off for the actual CLI tool
(for now).

Issue: getsentry/team-ospo#171
Issue: getsentry/team-ospo#184
azaslavsky added a commit to getsentry/sentry that referenced this issue Aug 11, 2023
The `sorted_dependencies` function has, since time immemorial, tried to
track all `ManyToManyField` dependencies without a `through` model set,
since there is otherwise no way to deduce the dependency relationship
for the shadow junction tables Django creates under the hood. Except...
we actually don't have any `ManyToManyField` definitions without an
explicit `through` argument. Maybe we did when this code was written,
but today we no longer do.

The removed bits of code were thus never really executed, since they
were checking for a state of affairs that wasn't really a problem in our
actual code base. They were also buggy: the comment said they were
checking for `through`-ness, but there was no code to this effect,
resulting in double dependencies.

This change removes the check altogether, and adds an invariant test to
ensure that all `ManyToManyField`s defined in the future carry a
`through` argument.

Issue: getsentry/team-ospo#171
azaslavsky added a commit to getsentry/sentry that referenced this issue Aug 14, 2023
Given two instances of a model that are being validated for equality, we
want the foreign keys to be correct relatively (ie, they point to the
same model in the respective models' JSON blobs), not absolutely (ie,
they are literally the same integer). By creating maps that store the
relations between pks and ordinals, we can easily check that the models
point to match their respective ordinals regardless of the actual pk
numbers.

Issue: getsentry/team-ospo#171
azaslavsky added a commit to getsentry/sentry that referenced this issue Aug 15, 2023
Given two instances of a model that are being validated for equality, we
want the foreign keys to be correct relatively (ie, they point to the
same model in the respective models' JSON blobs), not absolutely (ie,
they are literally the same integer). By creating maps that store the
relations between pks and ordinals, we can easily check that the models
point to match their respective ordinals regardless of the actual pk
numbers.

Issue: getsentry/team-ospo#171
azaslavsky added a commit to getsentry/sentry that referenced this issue Aug 15, 2023
When importing into a database that is not entirely flushed, sequences
and all, we run into a problem: the foriegn keys in the source JSON file
will not match the primary keys of the newly `INSERT`ed models we will
be importing. This will cause downstream imports that depend on upstream
imports to fail.

To resolve this, we maintain a `PrimaryKeyMap` of old pks to new pks for
every model. As we import models, we take note of their new pks, so that
when foreign key references to these models are encountered later on, we
can perform a simple replacement.

This generally works well enough, but because we have a circular
dependency between `Actor` and `Team`, we must take care to do the
appropriate set of dance moves to avoid writing `Actor`s with
(necessarily) non-existent `Team` references.

To test these changes, I've modified all of `test_models` to *not* reset
sequences between database uploads. This should ensure that every such
test will produce two JSON files with differing pks, which should give
us fairly thorough coverage.

Issue: getsentry/team-ospo#170
Issue: getsentry/team-ospo#171
azaslavsky added a commit to getsentry/sentry that referenced this issue Aug 15, 2023
When importing into a database that is not entirely flushed, sequences
and all, we run into a problem: the foriegn keys in the source JSON file
will not match the primary keys of the newly `INSERT`ed models we will
be importing. This will cause downstream imports that depend on upstream
imports to fail.

To resolve this, we maintain a `PrimaryKeyMap` of old pks to new pks for
every model. As we import models, we take note of their new pks, so that
when foreign key references to these models are encountered later on, we
can perform a simple replacement.

This generally works well enough, but because we have a circular
dependency between `Actor` and `Team`, we must take care to do the
appropriate set of dance moves to avoid writing `Actor`s with
(necessarily) non-existent `Team` references.

To test these changes, I've modified all of `test_models` to *not* reset
sequences between database uploads. This should ensure that every such
test will produce two JSON files with differing pks, which should give
us fairly thorough coverage.

Issue: getsentry/team-ospo#170
Issue: getsentry/team-ospo#171
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant