Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(eventuser): Migrate IssuesByTagProcessor away from EventUser #59672

Conversation

NisanthanNanthakumar
Copy link
Contributor

Objective:

Create the for_tags method in the EventUser dataclass. Migrates get_eventuser_callback to use the dataclass.

@NisanthanNanthakumar NisanthanNanthakumar requested a review from a team November 9, 2023 01:31
@NisanthanNanthakumar NisanthanNanthakumar marked this pull request as ready for review November 9, 2023 01:31
@NisanthanNanthakumar NisanthanNanthakumar requested a review from a team November 9, 2023 01:31
@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Nov 9, 2023
@NisanthanNanthakumar NisanthanNanthakumar force-pushed the 59394-migrate-issuesbytagprocessor-away-from-eventuser-model branch from 08ca9e4 to eba2bdb Compare November 9, 2023 01:47
@NisanthanNanthakumar NisanthanNanthakumar force-pushed the 59394-migrate-issuesbytagprocessor-away-from-eventuser-model branch from eba2bdb to 9eca754 Compare November 9, 2023 18:54
@@ -97,7 +97,7 @@ def serialize_row(item, key):
}
if key == "user":
euser = item._eventuser
result["id"] = euser.ident if euser else ""
result["id"] = euser.user_id if euser else ""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we stop using ident here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm nm, I remember your comment here:

@wedamija the dataclass's id is to mirror the EventUser model id column. user_id in the dataclass will replace the ident column in EventUser

Although now that I'm seeing it it is a little confusing to call it user_id since it implies an id from the User model. Should this at least be user_ident or something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The EventUser dataclass does not have the property ident. ident in the EventUser model is equivalent to the user_id. This is where we create the EventUser row and you can see that ident is from user.id

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep! I am ok to rename to user_ident

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated here in this PR #59590

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And updated this PR

@NisanthanNanthakumar NisanthanNanthakumar force-pushed the 59394-migrate-issuesbytagprocessor-away-from-eventuser-model branch from 9eca754 to 6bba8b7 Compare November 9, 2023 20:35
result = {}
for value in values:
keyword_filters = {value.split(":", 1)[0]: value.split(":", 1)[-1]}
eventusers = EventUser.for_projects(projects, keyword_filters)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Tbh we should probably change for_projects to just return an EventUser dataclass or None, rather than having to unwrap it like this.

src/sentry/utils/eventuser.py Show resolved Hide resolved
src/sentry/utils/eventuser.py Show resolved Hide resolved
Copy link

codecov bot commented Nov 13, 2023

Codecov Report

Merging #59672 (4a8ac6d) into master (2927744) will increase coverage by 0.14%.
Report is 135 commits behind head on master.
The diff coverage is 93.54%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #59672      +/-   ##
==========================================
+ Coverage   80.61%   80.76%   +0.14%     
==========================================
  Files        5175     5180       +5     
  Lines      227259   227578     +319     
  Branches    38243    38289      +46     
==========================================
+ Hits       183214   183795     +581     
+ Misses      38464    38187     -277     
- Partials     5581     5596      +15     
Files Coverage Δ
src/sentry/analytics/events/__init__.py 100.00% <100.00%> (ø)
...c/sentry/analytics/events/eventuser_snuba_query.py 100.00% <100.00%> (ø)
src/sentry/data_export/processors/issues_by_tag.py 100.00% <100.00%> (ø)
src/sentry/search/utils.py 84.19% <100.00%> (ø)
src/sentry/utils/eventuser.py 94.28% <92.30%> (+16.65%) ⬆️

... and 129 files with indirect coverage changes

)

if return_all:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if not return_all?

src/sentry/utils/eventuser.py Show resolved Hide resolved
Comment on lines 207 to 209
def _find_matching_items(
snuba_results: List[dict[str, Any]], filters: Mapping[str, Any], filter_boolean: str
):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to match on the keyword filters. The way to find a unique EventUser is to look at the sentry:user tag, which maps to the user column in Snuba. The value is assigned here:

set_tag(data, "sentry:user", user.tag_value)

So the user column will just be the result of

def tag_value(self):
"""
Return the identifier used with tags to link this user.
"""
for key, value in self.iter_attributes():
if value:
return f"{KEYWORD_MAP[key]}:{value}"

That tag is mapped to a column here: https://github.com/getsentry/snuba/blob/94c47010b1c6425942684b54d3ead222c0ad181a/snuba/datasets/configuration/events/storages/errors.yaml#L350

So if we had a window function, we'd partition on user and fetch the first row for each user. In this case, we have to group on all the cols and just select the most recent user in code instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh lol! that would have saved a lot of time yesterday 😭. This simplifies the logic alot.

snuba_keyword_filters[snuba_column] = value

matches = self._find_matching_items(data_results, snuba_keyword_filters, filter_boolean)
first_matching_items.extend(matches.values())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we just assign directly here? Or just use matches.values() directly to get results?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic has been removed with the simplification of just using tag_value to get the most recent unique EventUsers


columns = [
Column("project_id"),
Column("group_id"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is group_id for here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I removed it. Its a legacy column from the original snuba query for analytics.
I also removed the user column from the Snuba query.

Copy link
Member

@wedamija wedamija left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm! Tests are pretty thorough, thanks.

Make sure you monitor this deploy and be ready for a rollback in case we see anything going wrong. Ideally just at a time when we're both not in meetings.

self,
projects: List[Project],
keyword_filters: Mapping[str, List[Any]],
filter_boolean="AND",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Wondering if this should just be the BooleanOp

project_ids=[p.id for p in projects],
query=query.print(),
count_rows_returned=len(data_results),
count_rows_filtered=len(data_results) - len(results),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could also be worth including query time here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does raw_snql_query func return this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, might be in the response, I don't remember. Was thinking you could just keep your own timer here

@@ -72,7 +73,10 @@ def from_event(event: Event) -> EventUser:
project_id=event.project_id if event else None,
email=event.data.get("user", {}).get("email") if event else None,
username=event.data.get("user", {}).get("username") if event else None,
name=event.data.get("user", {}).get("name") if event else None,
name=event.data.get("user", {}).get("name")
or event.data.get("user", {}).get("username")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I'm not totally sure we should put username here, should be ok to leave it blank if not provided (unless there's an example elsewhere where we substitute in username)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good!

for keyword, value in keyword_filters.items():
if not isinstance(value, list):
raise Exception(f"{keyword} filter must be a list of values")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: ValueError?

@NisanthanNanthakumar NisanthanNanthakumar merged commit e3f4723 into master Nov 15, 2023
49 checks passed
@NisanthanNanthakumar NisanthanNanthakumar deleted the 59394-migrate-issuesbytagprocessor-away-from-eventuser-model branch November 15, 2023 23:02
@NisanthanNanthakumar NisanthanNanthakumar added the Trigger: Revert Add to a merged PR to revert it (skips CI) label Nov 16, 2023
@getsentry-bot
Copy link
Contributor

PR reverted: a84ed80

getsentry-bot added a commit that referenced this pull request Nov 16, 2023
…User (#59672)"

This reverts commit e3f4723.

Co-authored-by: NisanthanNanthakumar <[email protected]>
@NisanthanNanthakumar NisanthanNanthakumar restored the 59394-migrate-issuesbytagprocessor-away-from-eventuser-model branch November 16, 2023 03:22
@wedamija wedamija deleted the 59394-migrate-issuesbytagprocessor-away-from-eventuser-model branch November 16, 2023 18:12
wedamija pushed a commit that referenced this pull request Nov 16, 2023
…ntUser (#60071)

## Objective:
Create the for_tags method in the EventUser dataclass. Migrates
get_eventuser_callback to use the dataclass.

Unreverts #59672
@github-actions github-actions bot locked and limited conversation to collaborators Dec 2, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Scope: Backend Automatically applied to PRs that change backend components Trigger: Revert Add to a merged PR to revert it (skips CI)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants