
k8s-gubernator:build dataset is stale #20599

Closed
spiffxp opened this issue Jan 25, 2021 · 12 comments
Assignees
Labels
area/kettle area/triage kind/bug Categorizes issue or PR as related to a bug. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/testing Categorizes an issue or PR as relevant to SIG Testing.
Milestone

Comments


spiffxp commented Jan 25, 2021

What happened:
https://testgrid.k8s.io/sig-testing-misc#metrics-kettle&width=20 fails if k8s-gubernator:build.all is more than 6 hours stale, and sends an alert email to kubernetes-sig-testing+alerts@

Most recent failure https://prow.k8s.io/view/gcs/kubernetes-jenkins/logs/metrics-kettle/1353794161165209600

I0125 19:58:18.941] ERROR: table k8s-gubernator:build.all is 34.7 hours old. Max allowed: 6 hours.
I0125 19:58:18.941] ERROR: table k8s-gubernator:build.week is 34.2 hours old. Max allowed: 6 hours.
I0125 19:58:18.941] ERROR: table k8s-gubernator:build.day is 33.8 hours old. Max allowed: 6 hours.
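The check above boils down to comparing each table's last-modified time against a 6-hour budget. A minimal sketch of that logic (names and structure are illustrative, not kettle's actual metrics code; in production the last-modified times would come from BigQuery table metadata):

```python
from datetime import datetime, timedelta
from typing import Optional

# Illustrative only: mirrors the shape of the metrics-kettle staleness
# check, not its actual implementation.
MAX_AGE = timedelta(hours=6)

def check_table(name: str, last_modified: datetime,
                now: datetime, max_age: timedelta = MAX_AGE) -> Optional[str]:
    """Return an ERROR line if `name` is older than `max_age`, else None."""
    age_hours = (now - last_modified).total_seconds() / 3600.0
    if age_hours > max_age.total_seconds() / 3600.0:
        return (f"ERROR: table {name} is {age_hours:.1f} hours old. "
                f"Max allowed: {int(max_age.total_seconds() // 3600)} hours.")
    return None
```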

What you expected to happen:
Kettle should be keeping these tables up to date.

How to reproduce it (as minimally and precisely as possible): n/a

Please provide links to example occurrences, if any: see above

Anything else we need to know?:
FYI @kubernetes/ci-signal: anything that queries k8s-gubernator:build will also be stale until this is resolved.

/priority important-soon
/sig testing
/area kettle
/assign @MushuEE

@spiffxp spiffxp added the kind/bug Categorizes issue or PR as related to a bug. label Jan 25, 2021
@k8s-ci-robot k8s-ci-robot added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/testing Categorizes an issue or PR as relevant to SIG Testing. area/kettle labels Jan 25, 2021

MushuEE commented Jan 26, 2021

WARNING:urllib3.connectionpool:Connection pool is full, discarding connection: www.googleapis.com
WARNING:urllib3.connectionpool:Connection pool is full, discarding connection: www.googleapis.com
WARNING:urllib3.connectionpool:Connection pool is full, discarding connection: www.googleapis.com
WARNING:urllib3.connectionpool:Connection pool is full, discarding connection: www.googleapis.com
WARNING:urllib3.connectionpool:Connection pool is full, discarding connection: www.googleapis.com
WARNING:urllib3.connectionpool:Connection pool is full, discarding connection: www.googleapis.com
WARNING:urllib3.connectionpool:Connection pool is full, discarding connection: www.googleapis.com
WARNING:urllib3.connectionpool:Connection pool is full, discarding connection: www.googleapis.com
WARNING:urllib3.connectionpool:Connection pool is full, discarding connection: www.googleapis.com
WARNING:urllib3.connectionpool:Connection pool is full, discarding connection: www.googleapis.com
WARNING:urllib3.connectionpool:Connection pool is full, discarding connection: www.googleapis.com
WARNING:urllib3.connectionpool:Connection pool is full, discarding connection: www.googleapis.com
WARNING:urllib3.connectionpool:Connection pool is full, discarding connection: www.googleapis.com
WARNING:urllib3.connectionpool:Connection pool is full, discarding connection: www.googleapis.com
WARNING:urllib3.connectionpool:Connection pool is full, discarding connection: www.googleapis.com
WARNING:urllib3.connectionpool:Connection pool is full, discarding connection: www.googleapis.com
WARNING:urllib3.connectionpool:Connection pool is full, discarding connection: www.googleapis.com
WARNING:urllib3.connectionpool:Connection pool is full, discarding connection: www.googleapis.com
WARNING:urllib3.connectionpool:Connection pool is full, discarding connection: www.googleapis.com
WARNING:urllib3.connectionpool:Connection pool is full, discarding connection: www.googleapis.com
WARNING:urllib3.connectionpool:Connection pool is full, discarding connection: www.googleapis.com
WARNING:urllib3.connectionpool:Connection pool is full, discarding connection: www.googleapis.com
16750.55user 1339.34system 3:15:33elapsed 154%CPU (0avgtext+0avgdata 6603536maxresident)k
13384inputs+57032outputs (0major+4052238minor)pagefaults 0swaps

Another freeze from CPU pinning.
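For reference, the urllib3 warning in the log above is benign in itself: it fires when more worker threads return connections to a per-host pool than the pool's `maxsize` can cache, so the surplus connections are discarded rather than reused (the requests themselves still succeed). A stdlib-only toy model of that mechanism (not kettle's code; `ToyPool` is a made-up name):

```python
import queue

class ToyPool:
    """Toy model of urllib3's per-host connection pool: it caches at most
    `maxsize` idle connections, and returning a connection to a full pool
    discards it instead of caching it for reuse."""

    def __init__(self, maxsize: int = 1):
        self.idle = queue.LifoQueue(maxsize)
        self.discarded = 0

    def put_conn(self, conn) -> None:
        try:
            self.idle.put_nowait(conn)
        except queue.Full:
            # This is the point where urllib3 would log
            # "Connection pool is full, discarding connection".
            self.discarded += 1
```

If the warnings are noisy, raising `maxsize` when constructing the `urllib3.PoolManager` (to roughly match the number of worker threads) should make them disappear, at the cost of holding more idle sockets.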


MushuEE commented Jan 26, 2021

Tasks:   8 total,   2 running,   6 sleeping,   0 stopped,   0 zombie
%Cpu(s): 13.8 us,  0.6 sy,  0.0 ni, 85.4 id,  0.0 wa,  0.0 hi,  0.1 si,  0.1 st
KiB Mem : 53588008 total,  1185340 free,  8723756 used, 43678912 buff/cache
KiB Swap:        0 total,        0 free,        0 used. 44343416 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 254477 root      20   0 6429740 5.927g  42916 R  99.3 11.6   2464:04 pypy3


MushuEE commented Jan 26, 2021

$ kubectl scale deployment kettle --replicas=0
deployment.apps/kettle scaled
$ kubectl scale deployment kettle --replicas=1

Performed a restart cycle last night; will need to monitor. Python might not be the best fit for this job.

Tables are updating; I will monitor for a while before closing.


MushuEE commented Jan 26, 2021

Seeing pinning again:

root@kettle-766bfc5b9f-pxnrf:/# ps -p 596 -o args
COMMAND
pypy3 make_json.py --reset-emitted --days 7
596 root      20   0 6553672 6.039g  42640 R  99.7 11.8 499:25.91 pypy3


spiffxp commented Jan 29, 2021

/area triage
I checked in on https://go.k8s.io/triage just now (trying to figure out whether an integration flake is hitting just me or other PRs) and saw this:
[Screenshot: Screen Shot 2021-01-28 at 8:40:45 PM]


MushuEE commented Jan 29, 2021

I saw this too; I have been trying to recover Kettle. This behavior is very difficult to debug.


MushuEE commented Jan 29, 2021

Trialing a fix for at least the streaming failure in staging now.


MushuEE commented Feb 1, 2021

Migrating to #20024 for commentary on progress.


spiffxp commented Feb 9, 2021

/milestone v1.21

@k8s-ci-robot k8s-ci-robot added this to the v1.21 milestone Feb 9, 2021

MushuEE commented Mar 4, 2021

I think we can close this as a duplicate of #13432.


spiffxp commented Mar 5, 2021

/close
SGTM

@k8s-ci-robot

@spiffxp: Closing this issue.

In response to this:

/close
SGTM

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
