feat(seer grouping): Add Seer-related ingest helpers #70999

lobsterkatie · 2024-05-16T07:15:39Z

This adds two helpers, should_call_seer_for_grouping and get_seer_similar_issues, to be used when we (maybe) call Seer as part of event ingestion.

should_call_seer_for_grouping does exactly what you'd think given the name, right now only basing the decision on feature flags and whether or not the event has a usable title and/or stacktrace. In the future we'll also include rate limit and killswitch checks, and any other criteria which it makes sense to add.
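A minimal sketch of the decision described above. It assumes a simplified `Event` with a `title` and a raw `data` dict, a plain set of enabled flag names instead of Sentry's feature-flag machinery, and a hypothetical `PLACEHOLDER_TITLES` set. The real `PLACEHOLDER_EVENT_TITLES` and `get_path` live in Sentry and are not reproduced here. The flag names come from the #71026 description. Rate-limit and killswitch checks are left out, as in the PR.

```python
from dataclasses import dataclass, field
from typing import Any

# Hypothetical placeholder titles; the real PLACEHOLDER_EVENT_TITLES is defined in Sentry
PLACEHOLDER_TITLES = frozenset({"<untitled>", "<unknown>"})

# Flag names from the #71026 description; checking them via a plain set is an assumption
SEER_FLAGS = frozenset({
    "projects:similarity-embeddings-metadata",
    "projects:similarity-embeddings-grouping",
})


def get_path(data: Any, *path: Any) -> Any:
    """Simplified safe lookup: walks dicts and lists, returning None on any miss."""
    for key in path:
        if isinstance(data, dict):
            data = data.get(key)
        elif isinstance(data, list) and isinstance(key, int):
            data = data[key] if -len(data) <= key < len(data) else None
        else:
            return None
    return data


@dataclass
class Event:  # hypothetical stand-in for Sentry's Event
    title: str
    data: dict[str, Any] = field(default_factory=dict)


def has_usable_title_or_stacktrace(event: Event) -> bool:
    # Usable if the title is informative or the last exception/thread has frames
    return bool(
        event.title not in PLACEHOLDER_TITLES
        or get_path(event.data, "exception", "values", -1, "stacktrace", "frames")
        or get_path(event.data, "threads", "values", -1, "stacktrace", "frames")
    )


def should_call_seer_for_grouping(event: Event, enabled_flags: set[str]) -> bool:
    # Call Seer only if at least one Seer flag is on and there is something to embed
    if not (SEER_FLAGS & enabled_flags):
        return False
    return has_usable_title_or_stacktrace(event)
```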

get_seer_similar_issues is a wrapper around get_similarity_data_from_seer (which is what actually makes the API call to Seer). It extracts request data from the given event, makes the request, pulls together metadata about the results, and if a matching group is found and the flag is on, pulls the Group record out of the database. (I chose to put the feature flag check there rather than in the code where the grouping actually happens so that we can save the trip to the database if we're not going to end up using the results for grouping.)
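A rough sketch of that flow, mainly to show where the flag check sits relative to the database lookup. Everything here is hypothetical: `call_seer` stands in for `get_similarity_data_from_seer`, `fetch_group` stands in for the `Group` query, and the request and metadata keys and the `SimilarIssue` fields are illustrative rather than the real Seer schema.

```python
from dataclasses import asdict, dataclass
from typing import Any, Callable, Optional


@dataclass
class SimilarIssue:  # hypothetical stand-in for SeerSimilarIssueData
    parent_group_id: int
    stacktrace_distance: float
    should_group: bool


@dataclass
class Group:  # hypothetical stand-in for the Group model
    id: int


def get_seer_similar_issues(
    event_title: str,
    call_seer: Callable[[dict[str, Any]], list[SimilarIssue]],
    fetch_group: Callable[[int], Group],
    grouping_flag_on: bool,
    num_neighbors: int = 1,
) -> tuple[dict[str, Any], Optional[Group]]:
    # 1. Extract request data from the event (illustrative keys only)
    request = {"title": event_title, "k": num_neighbors}
    # 2. Make the request (get_similarity_data_from_seer's job in the PR)
    results = call_seer(request)
    # 3. Pull together metadata about the results
    metadata: dict[str, Any] = {"results": [asdict(r) for r in results]}
    # 4. Only hit the database if there's a match AND it will be used for grouping
    best_match = next((r for r in results if r.should_group), None)
    group = fetch_group(best_match.parent_group_id) if best_match and grouping_flag_on else None
    return metadata, group
```

With the grouping flag off, the metadata is still returned but `fetch_group` is never called, which is the saved database trip mentioned above.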

Code to actually use these helpers is added in #71026.

codecov · 2024-05-16T07:44:03Z

Codecov Report

Attention: Patch coverage is 92.59259% with 2 lines in your changes missing coverage. Please review.

Project coverage is 77.89%. Comparing base (1e61cb7) to head (2d9484b).
Report is 1 commit behind head on master.

❗ Current head 2d9484b differs from pull request most recent head b2bc90b

Please upload reports for the commit b2bc90b to get more accurate results.

Additional details and impacted files

@@             Coverage Diff             @@
##           master   #70999       +/-   ##
===========================================
+ Coverage        0   77.89%   +77.89%     
===========================================
  Files           0     6515     +6515     
  Lines           0   290393   +290393     
  Branches        0    50252    +50252     
===========================================
+ Hits            0   226215   +226215     
- Misses          0    57936    +57936     
- Partials        0     6242     +6242

Files	Coverage Δ
src/sentry/grouping/ingest/seer.py	`92.59% <92.59%> (ø)`

... and 6514 files with indirect coverage changes


JoshFerge · 2024-05-16T16:45:48Z

src/sentry/grouping/ingest/seer.py

+    num_neighbors: int = 1,
+) -> tuple[
+    dict[
+        str, str | list[dict[str, float | bool | int | str]]


worth making a TypedDict for this type?

Sigh. This is why I say Python is the IHOP of typing. I agree this isn't great, but there's no way that I've been able to figure out (and googling suggests it's because no way exists) to create a typeddict out of a dataclass.

So I can, but it'll just mean hard coding a second copy of all of the attributes from both the SeerSimilarIssuesMetadata and SeerSimilarIssueData dataclasses, which feels worse to me (I think because a change to either one of those classes would be guaranteed to make the hypothetical typeddict need updating whereas there's a good chance they could be changed without this type needing to be any different).

WDYT?

Yeah, totally. It's good as is!

Googling is so 2010s :-P Just ask Copilot ;-)

from typing import TypedDict, get_type_hints

def dataclass_to_typeddict(cls):
    return TypedDict(cls.__name__ + 'Dict', {k: v for k, v in get_type_hints(cls).items()})

SeerSimilarIssuesMetadataDict = dataclass_to_typeddict(SeerSimilarIssuesMetadata)
SeerSimilarIssueDataDict = dataclass_to_typeddict(SeerSimilarIssueData)
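A self-contained check of what that helper does at runtime, using a hypothetical `ExampleMetadata` dataclass in place of `SeerSimilarIssuesMetadata` (its fields are made up). The interpreter happily builds the TypedDict from a computed dict. The catch, as the reply below explains, is that mypy only accepts a dict literal there, so it cannot see the derived fields statically.

```python
from dataclasses import dataclass
from typing import TypedDict, get_type_hints


@dataclass
class ExampleMetadata:  # hypothetical stand-in for SeerSimilarIssuesMetadata
    version: str
    num_results: int


def dataclass_to_typeddict(cls: type) -> type:
    # Functional TypedDict syntax with a computed (non-literal) fields dict.
    # Python accepts this at runtime; mypy rejects it.
    return TypedDict(cls.__name__ + "Dict", get_type_hints(cls))


ExampleMetadataDict = dataclass_to_typeddict(ExampleMetadata)
```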

If only, @vartec. TIL about get_type_hints, but my very first attempt to wrestle with the dataclass/typeddict divide was to do something like this:

from typing import TypedDict

fields = {"a": int, "b": str}
DerivedTypedDict = TypedDict("DerivedTypedDict", fields)

and then construct the dataclass from the same fields value. Alas, mypy will have none of it: error: TypedDict() expects a dictionary literal as the second argument.

Here's an open issue about it in the mypy repo: python/mypy#4128.
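A sketch of the "one shared `fields` mapping" idea described above, with both classes built from it. The `a`/`b` fields are the placeholders from the comment, and `dataclasses.make_dataclass` is one way (assumed here, not necessarily what was tried) to build the dataclass from the same mapping. This runs fine; the limitation is purely static, since mypy reports the quoted `TypedDict() expects a dictionary literal` error and cannot type-check either class.

```python
from dataclasses import asdict, fields, make_dataclass
from typing import TypedDict

# Single source of truth for field names and types
FIELDS = {"a": int, "b": str}

# Works at runtime, but mypy rejects the non-literal second argument
DerivedTypedDict = TypedDict("DerivedTypedDict", FIELDS)

# make_dataclass accepts (name, type) pairs, so it can share the same mapping
DerivedDataclass = make_dataclass("DerivedDataclass", list(FIELDS.items()))

field_names = [f.name for f in fields(DerivedDataclass)]

# asdict() output has exactly the keys the TypedDict describes
example = asdict(DerivedDataclass(a=1, b="x"))
```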

mypy is the worst...

vartec · 2024-05-16T18:08:12Z

src/sentry/grouping/ingest/seer.py

+    if (
+        event.title in PLACEHOLDER_EVENT_TITLES
+        and not get_path(event.data, "exception", "values", -1, "stacktrace", "frames")
+        and not get_path(event.data, "threads", "values", -1, "stacktrace", "frames")
+    ):


Can you add a comment explaining this condition, please?

Suggested change

-    if (
-        event.title in PLACEHOLDER_EVENT_TITLES
-        and not get_path(event.data, "exception", "values", -1, "stacktrace", "frames")
-        and not get_path(event.data, "threads", "values", -1, "stacktrace", "frames")
-    ):
+    if not (
+        get_path(event.data, "exception", "values", -1, "stacktrace", "frames")
+        or get_path(event.data, "threads", "values", -1, "stacktrace", "frames")
+        or event.title not in PLACEHOLDER_EVENT_TITLES
+    ):
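The suggested condition is the De Morgan rewrite of the original, so the behavior is unchanged. A quick exhaustive check, modeling `event.title in PLACEHOLDER_EVENT_TITLES` and the truthiness of each `get_path(...)` result as plain booleans (the function names are made up for the check):

```python
from itertools import product


def original_condition(title_is_placeholder: bool, has_exception_frames: bool, has_thread_frames: bool) -> bool:
    # title in PLACEHOLDER_EVENT_TITLES and not <exception frames> and not <thread frames>
    return title_is_placeholder and not has_exception_frames and not has_thread_frames


def suggested_condition(title_is_placeholder: bool, has_exception_frames: bool, has_thread_frames: bool) -> bool:
    # not (<exception frames> or <thread frames> or title not in PLACEHOLDER_EVENT_TITLES)
    return not (has_exception_frames or has_thread_frames or not title_is_placeholder)


# Compare the two on every combination of the three inputs (2**3 = 8 cases)
mismatches = [
    combo
    for combo in product([False, True], repeat=3)
    if original_condition(*combo) != suggested_condition(*combo)
]
```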

This uses the helpers added in #70999 to decide, depending on the state of the `projects:similarity-embeddings-metadata` and `projects:similarity-embeddings-grouping` flags, whether we should call Seer before creating a new group, make the API call if so, and then store the results and/or use them to actually prevent new group creation in favor of using an existing similar issue. The behavior is as follows:

| metadata flag | grouping flag | call Seer? | metadata in event? | metadata in group? | use Seer-matched group, if any? |
|---------------|---------------|------------|--------------------|--------------------|---------------------------------|
| off | off | no | - | - | - |
| on | off | yes | yes * | yes | no |
| on or off | on | yes | yes * | only if new | yes |

* For now, the only event with the data will be the event which triggers the Seer call, not subsequent events with that hash. In the long run we will probably need to store the data on the `GroupHash` record itself. See #70454.

This should be enough for us to run a POC on S4S and measure the effect on grouping.
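The flag behavior described above can be encoded as a small lookup, which makes it easy to see that the metadata flag only matters when the grouping flag is off. This is a hypothetical sketch of the table, not the code from #71026; `SeerBehavior` and `seer_behavior` are made-up names, and the table's "-" cells (Seer not called) are represented as `False` / `"no"`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeerBehavior:  # hypothetical: one row of the table
    call_seer: bool
    metadata_in_event: bool  # only on the event that triggered the Seer call (see * note)
    metadata_in_group: str   # "no", "yes", or "only if new"
    use_matched_group: bool


def seer_behavior(metadata_flag: bool, grouping_flag: bool) -> SeerBehavior:
    if grouping_flag:
        # Third row: the metadata flag doesn't matter once grouping is on
        return SeerBehavior(True, True, "only if new", True)
    if metadata_flag:
        # Second row: store the results, but don't use them for grouping
        return SeerBehavior(True, True, "yes", False)
    # First row: Seer isn't called at all
    return SeerBehavior(False, False, "no", False)
```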


github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label May 16, 2024

lobsterkatie force-pushed the kmclb-add-seer-ingest-helpers branch from 2ac0867 to 7df4e9e May 16, 2024 08:00

vercel bot deployed to Preview May 16, 2024 08:03

lobsterkatie marked this pull request as ready for review May 16, 2024 15:56

lobsterkatie requested a review from a team as a code owner May 16, 2024 15:56

lobsterkatie mentioned this pull request May 16, 2024

feat(seer grouping): Call Seer before creating a new group #71026

Merged

JoshFerge reviewed May 16, 2024

src/sentry/grouping/ingest/seer.py

JoshFerge reviewed May 16, 2024

src/sentry/grouping/ingest/seer.py

JoshFerge reviewed May 16, 2024

JoshFerge approved these changes May 16, 2024

lobsterkatie force-pushed the kmclb-add-seer-ingest-helpers branch from 7df4e9e to 49ec31c May 16, 2024 17:17

vercel bot deployed to Preview May 16, 2024 17:19

Base automatically changed from kmclb-add-SeerSimilarIssuesMetadata-type to master May 16, 2024 17:28

lobsterkatie force-pushed the kmclb-add-seer-ingest-helpers branch from 49ec31c to ebf05ed May 16, 2024 17:32

vercel bot deployed to Preview May 16, 2024 17:35

vartec reviewed May 16, 2024

lobsterkatie force-pushed the kmclb-add-seer-ingest-helpers branch from ebf05ed to 2d9484b May 16, 2024 18:27

vercel bot deployed to Preview May 16, 2024 18:30

lobsterkatie added 4 commits May 16, 2024 12:55

add should_call_seer_for_grouping helper

dd735a3

add should_call_seer_for_grouping tests

a1e14a6

add get_seer_similar_issues function

5b9bcfc

add get_seer_similar_issues tests

b2bc90b

lobsterkatie force-pushed the kmclb-add-seer-ingest-helpers branch from 2d9484b to b2bc90b May 16, 2024 19:56

vercel bot deployed to Preview May 16, 2024 19:58

lobsterkatie merged commit 08a47c9 into master May 16, 2024
48 checks passed

lobsterkatie deleted the kmclb-add-seer-ingest-helpers branch May 16, 2024 20:56

github-actions bot locked and limited conversation to collaborators Jun 1, 2024
