chore: add unique constraint to tagged_objects #26654
Conversation
Codecov Report

Additional details and impacted files:

@@            Coverage Diff             @@
##           master   #26654      +/-   ##
==========================================
- Coverage   69.56%   69.46%    -0.11%
==========================================
  Files        1892     1894        +2
  Lines       74162    74150       -12
  Branches     8263     8263
==========================================
- Hits        51593    51506       -87
- Misses      20488    20563       +75
  Partials     2081     2081
    .alias("min_ids")
)

delete_query = tagged_object_table.delete().where(
this is a little scary, needs review, but should be ok
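For context, here is a minimal, hypothetical sketch of the pattern that delete is built on (keep the lowest id per (tag_id, object_id, object_type) group and delete the rest); the table/column definitions are reconstructed for illustration, not copied from the actual revision:

```python
import sqlalchemy as sa

metadata = sa.MetaData()
tagged_object_table = sa.Table(
    "tagged_object",
    metadata,
    sa.Column("id", sa.Integer, primary_key=True),
    sa.Column("tag_id", sa.Integer),
    sa.Column("object_id", sa.Integer),
    sa.Column("object_type", sa.String(50)),
)

# One "keeper" id per unique (tag_id, object_id, object_type) combination.
min_ids = (
    sa.select(sa.func.min(tagged_object_table.c.id).label("id"))
    .group_by(
        tagged_object_table.c.tag_id,
        tagged_object_table.c.object_id,
        tagged_object_table.c.object_type,
    )
    .alias("min_ids")
)

# Delete every row whose id is not one of the keepers.
delete_query = tagged_object_table.delete().where(
    tagged_object_table.c.id.notin_(sa.select(min_ids.c.id))
)
```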
Thanks @mistercrunch for the cleanup. There's been a slew of PRs in recent months where we've tried to adopt the "shift left" approach, adding various uniqueness/foreign-key constraints and cascade deletes at the database level.
Yeah @john-bodley, it seemed hard to fix the cascading delete; shift-left / prevention seemed like the way to go. That delete in the migration could use another set of eyes. I'm pretty sure it's good, but a little scary to look at. I wouldn't want to wipe tag associations for folks out there.
Oh, also about this: I noticed while running the alembic auto-migration that there is some extra metadata around indexes/constraints/nullable in the ORM that is not in sync with the set of migrations (not implemented in the database). Maybe a few dozen or so. It could be good to create a sync migration PR eventually to bring everything in line.
@mistercrunch I wonder if that's because we're not tightly coupled with Flask-Migrate in terms of how migrations are defined, and because of the lack of auto-generation. I tried to tackle that in #26172 but hit a brick wall. There's a discussion I created in Flask-Migrate related to it as well.
force-pushed from 6ade642 to d948a5f
type_ = TagType.custom
tag_name = name.strip()
This could have led to dups: if you add "hello" and "hello " it would probably trigger an error. It wasn't the root cause, but might as well clean it up.
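A tiny illustration of the point above (the helper name is hypothetical), assuming tag names get normalized before lookup/insert:

```python
def normalize_tag_name(name: str) -> str:
    # Leading/trailing whitespace would otherwise make "hello" and "hello "
    # two distinct tags, i.e. near-duplicates.
    return name.strip()

assert normalize_tag_name("hello ") == normalize_tag_name("hello")
```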
import sys

# hack to be able to import / reuse migration_utils.py in revisions
module_dir = os.path.dirname(os.path.realpath(__file__))
This is a bit of a breakthrough. I tried before to get some sort of migration_utils.py module going to reuse code across revisions, and was never able to do it. Here's the hack that enables it. This is great because constraint handling and such gets verbose if done right, and all the dialect-specific stuff shouldn't be repeated in each revision. Hoping we can grow migration_utils in the future to simplify and improve revisions.
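Roughly, the hack looks like the sketch below (paths and the imported helper name are illustrative, not the exact revision code): each revision adds the migrations directory to sys.path so migration_utils.py becomes importable.

```python
import os
import sys

# Revision files live in superset/migrations/versions/, while migration_utils.py sits
# one level up, so add that directory to sys.path before importing from it.
module_dir = os.path.dirname(os.path.realpath(__file__))
sys.path.insert(0, os.path.join(module_dir, ".."))

from migration_utils import create_unique_constraint  # hypothetical helper name
```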
@@ -63,7 +63,7 @@ def test_create_custom_tag_command(self):
        example_dashboard = (
            db.session.query(Dashboard).filter_by(slug="world_health").one()
        )
-       example_tags = ["create custom tag example 1", "create custom tag example 2"]
+       example_tags = {"create custom tag example 1", "create custom tag example 2"}
wondering if that was flaky before, but it turned flaky with the new logic
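For what it's worth, comparing against a set is the order-insensitive check; rows can come back from the database in any order, which is where list comparisons get flaky:

```python
fetched = ["create custom tag example 2", "create custom tag example 1"]

# Order-dependent: fails whenever the query returns rows in a different order.
assert fetched != ["create custom tag example 1", "create custom tag example 2"]

# Order-insensitive: only membership matters.
assert set(fetched) == {"create custom tag example 1", "create custom tag example 2"}
```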
I ran into this issue where I had duplicates in tagged_object that prevented the deletion of a dashboard. This PR adds a unique constraint to that table, preventing this issue from happening again. This may create another error on update or insert of a dashboard object; hoping a unit test would catch it if such an error gets raised. Handling it on the cascade delete seemed harder than preventing it, and prevention is also a great thing. Note that this includes a migration that deletes eventual dups too.
We are maintaining a fork based on 3.1.0, so running ... In my understanding, that unique constraint does not stop the situation that I mentioned earlier, as the combination of ...
Of course it's possible that there exists a relevant code change/migration that was added through another ticket that I'm not aware of yet and that isn't included in 3.1 (but is in a later version and/or master). Thank you for the quick response!
Thanks for the clarifications, I'm encountering the same issue.
@mistercrunch I also encountered this problem. It seems that 4.0.0 cannot delete charts normally.
@zijiwork can you share your stack trace? From my understanding, the [bad] state that I was in prior to this particular PR, and the error I had, should not be possible after my fix. My fix not only removed the duplicates for the particular combination of fields, it also prevented the same issue from occurring again with a unique constraint on the database. Now I'm guessing that your error, while similar, must have been slightly different, especially since I see you seem to have the ... Heard @Vitor-Avila might have seen something similar too (?), if you can chime in.
@mistercrunch stacktrace logs
@mistercrunch we got a few reports of users facing errors when deleting objects on our end as well (after the fix was released to production). Here's an example stack trace:
I managed to reproduce with a certain dashboard on my end as well, but it's inconsistent (I can't reliably reproduce with every new dashboard).
Thinking about this, I think I understand the issue better now, and it's not the same issue as the one I tackled in this PR originally, though it raises the same exception. Looking simply at this:

class TaggedObject(Model, AuditMixinNullable):
    """An association between an object and a tag."""

    __tablename__ = "tagged_object"
    id = Column(Integer, primary_key=True)
    tag_id = Column(Integer, ForeignKey("tag.id"))
    object_id = Column(
        Integer,
        ForeignKey("dashboards.id"),
        ForeignKey("slices.id"),
        ForeignKey("saved_query.id"),
    )
    object_type = Column(Enum(ObjectType))
    tag = relationship("Tag", back_populates="objects", overlaps="tags")

there's nothing that clarifies that the nature of the relationship depends on object_type. For sqlalchemy to properly cascade deletes here from dashboard or another object type, it has to apply a condition on object_type, which I'm guessing it doesn't. It really can't, given that semantically we never tell it that in how we defined the model. In terms of solutions, there's either:

1. fixing the relationship definition so the object_type condition is part of the join the ORM cascades over, or
2. not relying on the ORM's cascading delete and handling the cleanup proactively in the DAO.

I think option #1 seems best AFAIC, if possible.
Actually, never mind, the relationship is defined well on the other side:

class Dashboard(AuditMixinNullable, ImportExportMixin, Model):
    # ...
    tags = relationship(
        "Tag",
        overlaps="objects,tag,tags,tags",
        secondary="tagged_object",
        primaryjoin=f"and_(Dashboard.id == TaggedObject.object_id)",
        secondaryjoin="and_(TaggedObject.tag_id == Tag.id, TaggedObject.object_type == 'dashboard')",
    )
I wasn't able to recreate the issue (I tried to associate a tag to all the objects I could, and then tried deleting dashboards after that, hoping to hit the same issue). BUT, here's one idea I'd recommend trying: promoting the object_type filter to the primaryjoin. I think it's worth giving it a shot in your environment.
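To make the suggestion concrete, here is a hedged sketch of what that could look like on the Dashboard model (an illustration of the idea, not the exact patch in #28769):

```python
from sqlalchemy.orm import relationship

# Inside class Dashboard(...): move the object_type condition into the primaryjoin so
# the ORM only matches (and cascades over) tagged_object rows of type 'dashboard'.
tags = relationship(
    "Tag",
    secondary="tagged_object",
    overlaps="objects,tag,tags,tags",
    primaryjoin="and_(Dashboard.id == TaggedObject.object_id, "
    "TaggedObject.object_type == 'dashboard')",
    secondaryjoin="TaggedObject.tag_id == Tag.id",
)
```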
I wasn't able to reproduce the issues in #26654, but wanted to submit this diff here as something to try. I'm not likely to push this through, but thought a PR would be a good way to suggest a fix and centralize the discussion.
Can someone with the issue try this DRAFT solution? #28769
I tried to use the DRAFT solution, but there still seems to be a problem
Can you run a ...? The other option I mentioned is to not rely on the ORM's cascading delete and to handle this proactively in the DAO, here: superset/daos/dashboard.py. I don't think it's as clean of a solution, and it may have to be repeated for all "taggable" object types.
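For reference, the DAO-side alternative could look roughly like this (the helper is hypothetical and module paths follow what I believe the Superset layout to be, so treat it as a sketch rather than actual project code):

```python
from superset.extensions import db
from superset.tags.models import ObjectType, TaggedObject


def delete_dashboard_tag_links(dashboard_id: int) -> None:
    """Remove tag associations explicitly instead of relying on the ORM cascade."""
    db.session.query(TaggedObject).filter(
        TaggedObject.object_id == dashboard_id,
        TaggedObject.object_type == ObjectType.dashboard,
    ).delete(synchronize_session=False)
```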
@Vitor-Avila Thank you, it seems that the issue still exists. The record with ID 71 has been deleted. I tried using this PR for the fix, but it didn't solve the problem.
@Vitor-Avila Thanks for your #29229 PR, it helped me solve the problem.
great news, @zijiwork! I'm glad it worked 🙌
@Vitor-Avila how can I test #29229 to see if it fixes my issue? I have a running instance using the 4.0.0 image tag. I'm not sure how I can use #29229 as it doesn't seem to have been released yet, but I'm having the same issue described.
You can cherry-pick the commit on top of 4.0.0: https://git-scm.com/docs/git-cherry-pick. If there's a conflict and you're in a sandbox/dev environment, you can simply run the latest ...
SUMMARY
I ran into this issue where I had duplicates in tagged_object that prevented the deletion of a dashboard. This PR adds a unique constraint to that table, preventing this issue from happening again. This may create another error on update or insert of a dashboard object; hoping a unit test would catch it if such an error gets raised.
The error I observed was:
Looking into my database and researching the issue, I found dups in my tagged_object table:
Handling it on the cascade delete seemed harder than preventing it, and prevention is also a great thing. Note that this includes a migration that does delete eventual dups too.
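As a rough, hedged sketch (the constraint name and column list are illustrative, not necessarily what the shipped revision uses), the migration boils down to deleting the dups first and then adding the unique constraint:

```python
from alembic import op


def upgrade():
    # 1) delete duplicate (tag_id, object_id, object_type) rows first (see the
    #    delete_query discussed in the review comments), then
    # 2) enforce uniqueness going forward.
    op.create_unique_constraint(
        "uq_tagged_object_tag_object",  # hypothetical constraint name
        "tagged_object",
        ["tag_id", "object_id", "object_type"],
    )


def downgrade():
    op.drop_constraint("uq_tagged_object_tag_object", "tagged_object", type_="unique")
```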
phase 2 - fixing the root cause
After setting up the unique constraint, I hit the root cause: something around auto-owner management, where we maintain special owner tags for objects. The approach was to delete-and-recreate them all systematically, and I think that logic plus the transaction handling was the root cause for the dups. I changed the logic there to go id-by-id and only remove or add individual ids if needed.
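The id-by-id reconciliation amounts to a set diff; a simplified, illustrative version of the idea (not the actual Superset code):

```python
def reconcile_owner_ids(current: set[int], desired: set[int]) -> tuple[set[int], set[int]]:
    """Return (ids to add, ids to remove) instead of deleting and recreating everything."""
    return desired - current, current - desired


# Owner 3 gets added, owner 1 removed, and owner 2 is left untouched (no delete/recreate).
assert reconcile_owner_ids({1, 2}, {2, 3}) == ({3}, {1})
```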
I also cleaned up the API to upsert the tags it gets instead of raising an error. So if you call the API now to add a tag that's already set, it won't return an error; it just adds it if needed.
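In other words, the endpoint now behaves like an idempotent upsert; conceptually (illustrative only, not the endpoint code):

```python
def upsert_tag(existing: set[str], new_tag: str) -> set[str]:
    # Adding an already-present tag is a no-op rather than an error.
    existing.add(new_tag.strip())
    return existing


tags = {"finance"}
assert upsert_tag(tags, "finance ") == {"finance"}
```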
reuse code in alembic migration
I cracked a solution to share a module across alembic migrations. Somehow that's way trickier than it seems; I failed at least a few times before at finding a solution for this. We now have superset/migrations/migration_utils.py for that purpose.