feat(seer grouping): Add Seer-related ingest helpers #70999
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #70999 +/- ##
===========================================
+ Coverage 0 77.89% +77.89%
===========================================
Files 0 6515 +6515
Lines 0 290393 +290393
Branches 0 50252 +50252
===========================================
+ Hits 0 226215 +226215
- Misses 0 57936 +57936
- Partials 0 6242 +6242
2ac0867 to 7df4e9e
    num_neighbors: int = 1,
) -> tuple[
    dict[
        str, str | list[dict[str, float | bool | int | str]]
Worth making a `TypedDict` for this type?
*sigh* This is why I say Python is the IHOP of typing. I agree this isn't great, but there's no way that I've been able to figure out (and googling suggests it's because no way exists) to create a `TypedDict` out of a dataclass.
So I can, but it'll just mean hard-coding a second copy of all of the attributes from both the `SeerSimilarIssuesMetadata` and `SeerSimilarIssueData` dataclasses, which feels worse to me (I think because a change to either one of those classes would be guaranteed to make the hypothetical `TypedDict` need updating, whereas there's a good chance they could be changed without this type needing to be any different).
WDYT?
Yeah, totally. It's good as is!
Googling is so 2010s :-P Just ask Copilot ;-)
from typing import TypedDict, get_type_hints

def dataclass_to_typeddict(cls):
    # Build a TypedDict at runtime from the dataclass's resolved field annotations
    return TypedDict(cls.__name__ + 'Dict', {k: v for k, v in get_type_hints(cls).items()})

SeerSimilarIssuesMetadataDict = dataclass_to_typeddict(SeerSimilarIssuesMetadata)
SeerSimilarIssueDataDict = dataclass_to_typeddict(SeerSimilarIssueData)
If only, @vartec. TIL about `get_type_hints`, but my very first attempt to wrestle with the dataclass/`TypedDict` divide was to do something like this:
fields = {"a": int, "b": str}
DerivedTypedDict = TypedDict("DerivedTypedDict", fields)
and then construct the dataclass from the same `fields` value. Alas, mypy will have none of it: `error: TypedDict() expects a dictionary literal as the second argument`.
Here's an open issue about it in the mypy repo: python/mypy#4128.
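To make the runtime-versus-type-checker gap in this thread concrete, here is a minimal sketch. `ExampleIssueData` and its fields are hypothetical stand-ins, not the real Seer dataclasses. The derived `TypedDict` is a perfectly good object at runtime; the catch, per the mypy error quoted above, is that a static checker rejects the non-literal second argument.

```python
from dataclasses import dataclass
from typing import TypedDict, get_type_hints


@dataclass
class ExampleIssueData:
    # Hypothetical fields, for illustration only
    parent_group_id: int
    stacktrace_distance: float
    should_group: bool


def dataclass_to_typeddict(cls):
    # Works at runtime, but mypy rejects it because the field mapping
    # is not a dictionary literal it can analyze statically.
    return TypedDict(cls.__name__ + "Dict", get_type_hints(cls))


ExampleIssueDataDict = dataclass_to_typeddict(ExampleIssueData)

# At runtime a TypedDict "instance" is just a plain dict
row = ExampleIssueDataDict(
    parent_group_id=1, stacktrace_distance=0.01, should_group=True
)
```

So the helper is usable for runtime introspection, but it would not give the annotation in the diff above any static checking.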
mypy is the worst...
7df4e9e to 49ec31c
49ec31c to ebf05ed
if (
    event.title in PLACEHOLDER_EVENT_TITLES
    and not get_path(event.data, "exception", "values", -1, "stacktrace", "frames")
    and not get_path(event.data, "threads", "values", -1, "stacktrace", "frames")
):
Can you add a comment explaining this condition, please?
Done.
- if (
-     event.title in PLACEHOLDER_EVENT_TITLES
-     and not get_path(event.data, "exception", "values", -1, "stacktrace", "frames")
-     and not get_path(event.data, "threads", "values", -1, "stacktrace", "frames")
- ):
+ if not (
+     get_path(event.data, "exception", "values", -1, "stacktrace", "frames")
+     or get_path(event.data, "threads", "values", -1, "stacktrace", "frames")
+     or event.title not in PLACEHOLDER_EVENT_TITLES
+ ):
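For anyone comparing the two forms of the condition, a quick brute-force check suggests they are logically equivalent (De Morgan's law), so the choice is about readability only. This is a sketch that replaces the `get_path` lookups and the title check with plain booleans; the function and parameter names are illustrative, not from the PR.

```python
from itertools import product


def original(title_is_placeholder, has_exception_frames, has_thread_frames):
    # Mirrors the condition as written in the PR
    return (
        title_is_placeholder
        and not has_exception_frames
        and not has_thread_frames
    )


def suggested(title_is_placeholder, has_exception_frames, has_thread_frames):
    # Mirrors the reviewer's suggested rewrite
    return not (
        has_exception_frames or has_thread_frames or not title_is_placeholder
    )


# Compare both forms over all eight combinations of inputs
all_cases = list(product([False, True], repeat=3))
assert all(original(*case) == suggested(*case) for case in all_cases)
```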
ebf05ed to 2d9484b
2d9484b to b2bc90b
This uses the helpers added in #70999 to - depending on the state of the `projects:similarity-embeddings-metadata` and `projects:similarity-embeddings-grouping` flags - decide whether we should call Seer before creating a new group, make the API call if so, and then store the results and/or use them to actually prevent new group creation in favor of using an existing similar issue.

The behavior is as follows:

| metadata flag | grouping flag | call Seer? | metadata in event? | metadata in group? | use Seer-matched group, if any? |
|---------------|---------------|------------|--------------------|--------------------|---------------------------------|
| off           | off           | no         | -                  | -                  | -                               |
| on            | off           | yes        | yes *              | yes                | no                              |
| on or off     | on            | yes        | yes *              | only if new        | yes                             |

* For now, the only event with the data will be the event which triggers the Seer call, not subsequent events with that hash. In the long run we will probably need to store the data on the `GroupHash` record itself. See #70454.

This should be enough for us to run a POC on S4S and measure the effect on grouping.
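One way to read the behavior table is as a pure function of the two flags. The sketch below is only an illustration of that reading; `SeerPlan` and `plan_for_flags` are hypothetical names and do not appear in either PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeerPlan:
    call_seer: bool
    metadata_in_event: bool
    metadata_in_group: str  # "never", "always", or "only_if_new"
    use_seer_matched_group: bool


def plan_for_flags(metadata_flag: bool, grouping_flag: bool) -> SeerPlan:
    # Row 3 of the table: the grouping flag wins regardless of the metadata flag
    if grouping_flag:
        return SeerPlan(True, True, "only_if_new", True)
    # Row 2: metadata only, so results are stored but never used for grouping
    if metadata_flag:
        return SeerPlan(True, True, "always", False)
    # Row 1: both flags off, so Seer is not called at all
    return SeerPlan(False, False, "never", False)
```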
This adds two helpers, `should_call_seer_for_grouping` and `get_seer_similar_issues`, to be used when we (maybe) call Seer as part of event ingestion.

`should_call_seer_for_grouping` does exactly what you'd think given the name, right now only basing the decision on feature flags and whether or not the event has a usable title and/or stacktrace. In the future we'll also include rate limit and killswitch checks, and any other criteria which it makes sense to add.

`get_seer_similar_issues` is a wrapper around `get_similarity_data_from_seer` (which is what actually makes the API call to Seer). It extracts request data from the given event, makes the request, pulls together metadata about the results, and if a matching group is found and the flag is on, pulls the `Group` record out of the database. (I chose to put the feature flag check there rather than in the code where the grouping actually happens so that we can save the trip to the database if we're not going to end up using the results for grouping.)

Code to actually use these helpers is added in #71026.
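The point about checking the flag before touching the database can be sketched roughly as follows. This is not the PR's implementation: `pick_parent_group`, the `load_group` callback, and the `should_group` / `parent_group_id` keys are assumptions made up for illustration.

```python
from typing import Any, Callable, Optional


def pick_parent_group(
    seer_results: list[dict[str, Any]],
    grouping_flag_on: bool,
    load_group: Callable[[int], Any],
) -> Optional[Any]:
    """Return the matched group, or None if it would not be used for grouping."""
    # Check the flag (and for an empty result set) first, so that no
    # database lookup happens when the result would be thrown away.
    if not grouping_flag_on or not seer_results:
        return None
    best_match = seer_results[0]
    if not best_match.get("should_group"):
        return None
    return load_group(best_match["parent_group_id"])


# Usage: a fake loader that records how often the "database" is hit
lookups: list[int] = []


def fake_load_group(group_id: int) -> str:
    lookups.append(group_id)
    return f"group-{group_id}"


results = [{"parent_group_id": 7, "should_group": True}]
skipped = pick_parent_group(results, False, fake_load_group)
found = pick_parent_group(results, True, fake_load_group)
```

With the flag off, the loader is never called, which is the saving the description refers to.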