Test result ingestion into BigQuery via Kettle is hiccuping increasingly frequently #13432
Comments
/sig testing
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/remove-lifecycle stale
@spiffxp: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/reopen We no longer have velodrome. But we do have a job that fails if kettle stops updating bigquery. Most recent example:
This suggests kettle is pretty consistently not updating bigquery from 1am - 12pm PT, which means go.k8s.io/triage is serving stale data during CET working hours
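For context, the kind of staleness check such a job could run looks roughly like the sketch below. The table reference, the `started` column, and the 12-hour threshold are assumptions for illustration, not the actual configuration of the metrics-kettle job.

```python
# Sketch of a freshness check against kettle's BigQuery output.
# Table reference, column name, and threshold are assumed, not the real job config.
import datetime

from google.cloud import bigquery

STALENESS_LIMIT = datetime.timedelta(hours=12)  # assumed threshold
TABLE = "k8s-gubernator.build.all"              # hypothetical table reference


def check_kettle_freshness():
    client = bigquery.Client()
    # `started` is assumed to be a unix-seconds timestamp on each build row.
    row = list(client.query(f"SELECT MAX(started) AS latest FROM `{TABLE}`").result())[0]
    age = datetime.datetime.utcnow() - datetime.datetime.utcfromtimestamp(row.latest)
    if age > STALENESS_LIMIT:
        raise RuntimeError(f"kettle data looks stale: newest build started {age} ago")


if __name__ == "__main__":
    check_kettle_freshness()
```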
@spiffxp: Reopened this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
FYI @MushuEE
@spiffxp thanks, will start looking into it.
The span metric is showing increasing deltas; not sure whether kettle is taking longer to process all the data or not: https://k8s-testgrid.appspot.com/sig-testing-misc#metrics-kettle It might be worth starting to publish the time each step takes to run within commands. Will add logging shortly.
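A small context manager is enough to surface per-step durations in the logs; the sketch below uses only the standard library, and the `step` helper and step names are illustrative rather than Kettle's actual code.

```python
# Sketch of per-step duration logging using only the standard library.
# The step() helper and the example step names are illustrative, not Kettle's code.
import contextlib
import logging
import time

logger = logging.getLogger(__name__)


@contextlib.contextmanager
def step(name):
    """Log how long the wrapped block takes so slow steps show up in the logs."""
    start = time.monotonic()
    try:
        yield
    finally:
        logger.info("step %s took %.1fs", name, time.monotonic() - start)


# Usage: wrap each phase of an update cycle.
# with step("pull results from GCS"):
#     pull_results()
# with step("stream rows to BigQuery"):
#     stream_rows()
```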
Seeing fairly frequent failures on build.week calls, though there is not much more context...
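If those failures turn out to be transient, one option is a bounded retry with exponential backoff around the insert. The sketch below assumes the google-cloud-bigquery streaming API (`insert_rows_json`) and a hypothetical table reference; it may not match how Kettle actually writes to build.week.

```python
# Sketch of a bounded retry with exponential backoff around a streaming insert.
# The table reference and the use of insert_rows_json are assumptions for illustration.
import time

from google.cloud import bigquery


def insert_with_retry(client, table, rows, attempts=5, base_delay=2.0):
    """Retry transient insert failures, backing off between attempts."""
    for attempt in range(attempts):
        try:
            errors = client.insert_rows_json(table, rows)
            if not errors:
                return
            raise RuntimeError(f"row-level insert errors: {errors}")
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)


# client = bigquery.Client()
# insert_with_retry(client, "k8s-gubernator.build.week", rows)
```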
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/remove-lifecycle stale
Still seeing hiccups, but the duration has slowed
Thanks, I will check on the instances
Does appear we had another lockup
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-contributor-experience at kubernetes/community.
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-contributor-experience at kubernetes/community.
Rotten issues close after 30d of inactivity. Send feedback to sig-contributor-experience at kubernetes/community. |
@k8s-triage-robot: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
What happened:
http://velodrome.k8s.io/dashboard/db/bigquery-metrics?panelId=12&fullscreen&orgId=1&from=now-6M&to=now
Lots of alerts are firing; we've only had one or two weeks in Q2 where an alert hasn't fired.
What you expected to happen:
No alerts to fire.
How to reproduce it (as minimally and precisely as possible):
Please provide links to example occurrences, if any:
Anything else we need to know?:
I suspect this isn't due to Kettle being written in Python, but rather the pattern that Kettle follows. It may require a redesign or rewrite, but I'm not convinced we're required to switch languages for performance. I suspect I/O is our problem here (due to unbounded growth of a SQLite database).
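If unbounded SQLite growth really is the I/O problem, one possible mitigation is to periodically prune rows older than the window the metrics actually need and reclaim the space with VACUUM. The table and column names below are hypothetical, not Kettle's real schema.

```python
# Sketch of pruning an ever-growing SQLite database and reclaiming disk space.
# The `build` table, `finished` column, and retention window are hypothetical.
import sqlite3
import time

RETENTION_SECONDS = 90 * 24 * 3600  # assumed retention window


def prune(db_path="build.db"):
    cutoff = time.time() - RETENTION_SECONDS
    conn = sqlite3.connect(db_path)
    with conn:  # commits the DELETE as one transaction
        conn.execute("DELETE FROM build WHERE finished < ?", (cutoff,))
    # VACUUM must run outside a transaction; it rewrites the file to free the deleted pages.
    conn.execute("VACUUM")
    conn.close()
```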
/area kettle
/area metrics