triage: bail on certain global clusters after 30s #17643
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: spiffxp
The full list of commands accepted by this bot can be found here.
The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
triage works by clustering test failures in two stages:

- locally: create clusters of test failures for each unique test
- globally: merge each test's clusters into a global set of clusters

The clustering/merging is done by computing edit distance between the failure text of each test failure or failure cluster and accepting the first pair that has an edit distance of 10% of their combined length.

This can add up in the worst case, where edit distance is going to be computed for every existing cluster before creating a new cluster.

We've arbitrarily handled it thus far by:

- truncating failure text to ~200k~ 10k chars
- bailing out on local clustering after 60s per unique test

This PR adds:

- bailing out on global clustering of pathological / low value clusters after 30s
- more logging to see where clustering is working vs. not
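For readers unfamiliar with the approach, here is a minimal Python sketch of first-match clustering with a time budget. It is not the triage implementation: the names `edit_distance`, `find_match`, and `cluster_failures` are hypothetical, "10% of their combined length" is read here as "at most 10%", and the bail-out behaviour (skip fuzzy matching after the deadline and group only exact duplicates) is an assumption about what bailing could look like.

```python
import time


def edit_distance(a, b):
    """Levenshtein distance using a two-row dynamic programming table."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def find_match(text, clusters, ratio=0.10):
    """Return the first cluster key close enough to `text`, or None.

    Worst case: the distance is computed against every existing cluster.
    """
    for key in clusters:
        limit = int((len(text) + len(key)) * ratio)
        if edit_distance(text, key) <= limit:
            return key
    return None


def cluster_failures(texts, max_chars=10_000, budget_seconds=30.0,
                     clock=time.monotonic):
    """Cluster failure texts; stop fuzzy matching once the budget is spent.

    Returns (clusters, bailed), where clusters maps a representative
    (truncated) text to the list of original texts assigned to it.
    """
    clusters = {}
    deadline = clock() + budget_seconds
    bailed = False
    for raw in texts:
        text = raw[:max_chars]  # truncate to bound the cost of each comparison
        if text in clusters:    # exact duplicates are always cheap to group
            clusters[text].append(raw)
            continue
        if not bailed and clock() > deadline:
            bailed = True       # assumption: give up on fuzzy matching only
        key = None if bailed else find_match(text, clusters)
        if key is None:
            clusters[text] = [raw]
        else:
            clusters[key].append(raw)
    return clusters, bailed
```

Passing `clock` as a parameter keeps the deadline logic testable without real waits; in normal use the default `time.monotonic` avoids surprises from wall-clock adjustments.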
/test pull-test-infra-bazel
/cc @dims @BenTheElder
LGTM, this definitely helps with the additional logs :)
/lgtm
This should address #17625