
[Kettle] Process hangs when generating json.gz for 'all' table #20024

Closed
MushuEE opened this issue Nov 23, 2020 · 15 comments
Labels
area/kettle kind/bug Categorizes issue or PR as related to a bug. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. sig/testing Categorizes an issue or PR as relevant to SIG Testing.

Comments

@MushuEE
Contributor

MushuEE commented Nov 23, 2020

What would you like to be added:
Add more compute to Kettle instances, or figure out whether CPU is impacting speed or causing the lockup.

Why is this needed:

top - 10:34:45 up 2 days,  5:18,  0 users,  load average: 1.16, 1.19, 1.18
Tasks:   8 total,   2 running,   6 sleeping,   0 stopped,   0 zombie
%Cpu(s): 13.4 us,  0.3 sy,  0.0 ni, 86.2 id,  0.0 wa,  0.0 hi,  0.1 si,  0.0 st
KiB Mem : 53588024 total,  2290884 free,  7649364 used, 43647776 buff/cache
KiB Swap:        0 total,        0 free,        0 used. 45492108 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
    619 root      20   0 5952940 5.110g  42432 R 100.0 10.0   2897:56 pypy3
      1 root      20   0   18384   3096   2820 S   0.0  0.0   0:00.02 runner.sh
     40 root      20   0   26768   9056   5204 S   0.0  0.0   0:00.03 python3
    618 root      20   0    4636    836    768 S   0.0  0.0   0:00.00 sh
    620 root      20   0    4700    828    768 S   0.0  0.0   1:05.94 pv
    621 root      20   0    4792   1528   1252 S   0.0  0.0   0:01.64 gzip
    643 root      20   0   18512   3404   3028 S   0.0  0.0   0:00.00 bash
    652 root      20   0   36628   3092   2644 R   0.0  0.0   0:00.03 top

It seems that Kettle Prod is hitting CPU limits when trying to build the JSON. It takes extremely long to complete an update cycle and now seems to freeze at a certain point, not updating at all and "catching" on specific builds; the logs seem to end there.

Error while reading data, error message: JSON parsing error in row starting at position 605377307: Parser terminated before end of string

Error while reading data, error message: JSON parsing error in row starting at position 752722019: Parser terminated before end of string

Error while reading data, error message: JSON parsing error in row starting at position 1126381032: Parser terminated before end of string

ERROR:root:error on gs://pivotal-e2e-results/kubo-windows-2019/1553782223
Traceback (most recent call last):
  File "make_json.py", line 281, in make_rows
    yield rowid, row_for_build(path, started, finished, results)
  File "make_json.py", line 254, in row_for_build
    build = Build.generate(path, tests, started, finished, metadata, repos)
  File "make_json.py", line 94, in generate
    build = cls(path, tests)
  File "make_json.py", line 90, in __init__
    self.populate_path_to_job_and_number()
  File "make_json.py", line 112, in populate_path_to_job_and_number
    raise ValueError(f'unknown build path for {self.path} in known bucket paths')
ValueError: unknown build path for gs://pivotal-e2e-results/kubo-windows-2019/1553782223 in known bucket paths

/area kettle
/assign

@MushuEE MushuEE added the kind/feature Categorizes issue or PR as related to a new feature. label Nov 23, 2020
@MushuEE
Contributor Author

MushuEE commented Nov 30, 2020

The logs also seem to freeze here, which makes me worry that something is getting pinned or hitting an infinite loop.

@MushuEE
Contributor Author

MushuEE commented Dec 1, 2020

/cc @spiffxp

I am 90% sure this is a symptom of the main failure we are seeing. Previously I had thought that Kettle was failing due to tracebacks on path parsing, but I now know that the failure is being caused by the run of: pypy3 make_json.py | pv | gzip > build_all.json.gz


ERROR:root:unknown build path for gs://pivotal-e2e-results/kubo-windows-2019/1553782223 in known bucket paths                ]
^C.7MiB 0:08:05 [0.00 B/s] [

While there is an issue on some paths, it pins CPU and hangs indefinitely. This causes Kettle to simply freeze.
I ran with pypy3 make_json.py --days 90 | pv | gzip > build_all.json.gz and did not hit the same issues. I think I might temporarily just set the number of days back so it completes, until I can figure out why it is struggling so hard.

@MushuEE MushuEE changed the title [Kettle] CPU in container clipping 100% consistently [Kettle] Process hangs when generating json.gz for 'all' table Dec 1, 2020
@BenTheElder
Member

Did we figure this out?

@MushuEE
Contributor Author

MushuEE commented Jan 6, 2021

It has something to do with the size of the build_all table; the root cause was not figured out. I migrated to a new 30-day table (really just a new all table that only goes back a month from its creation). This fixed the issue, but I imagine we will hit something again in 3 years. I think the real solution is rearchitecting Kettle to primarily use streaming to BQ, and once a week run a job that checks for any job result mismatches. @amwat and I had talked about this.
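
(For context, a rough sketch of what streaming rows to BQ could look like with the google-cloud-bigquery client; this is illustrative only, not Kettle's actual stream code, and the table name and row shape are made up.)

from google.cloud import bigquery

def stream_rows(rows, table="my-project.build_data.all"):
    """Stream finished-build rows into BigQuery as they arrive,
    instead of periodically regenerating a giant build_all.json.gz."""
    client = bigquery.Client()
    errors = client.insert_rows_json(table, rows)  # streaming insert API
    if errors:
        raise RuntimeError(f"BQ streaming insert failed: {errors}")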

@MushuEE
Contributor Author

MushuEE commented Feb 1, 2021

Current command that is hanging: pypy3 make_json.py --reset_emitted --days 7

@MushuEE
Contributor Author

MushuEE commented Feb 1, 2021

This command runs json.dump() to stdout. It seems to frequently hang and then pick up again when run manually. I am not sure if this is a process constraint or a RAM issue.

@MushuEE
Contributor Author

MushuEE commented Feb 1, 2021

I have a feeling the offending lines are:

for rowid, row in make_rows(db, builds):
    json.dump(row, outfile, sort_keys=True)
    outfile.write('\n')
    rows_emitted.add(rowid)

where we are printing a TON of JSON to stdout, piping through pv into gzip. It looks to lock up after a LOT of output.

@MushuEE
Contributor Author

MushuEE commented Feb 1, 2021

I think I will try something like this to avoid the stdout issues:
https://stackoverflow.com/questions/49534901/is-there-a-way-to-use-json-dump-with-gzip
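
A minimal sketch of that approach: write gzipped JSON lines directly from Python instead of piping stdout through pv and gzip (the dump_rows_to_gzip helper and the output path are hypothetical, not the actual make_json.py change):

import gzip
import json

def dump_rows_to_gzip(rows, path='build_all.json.gz'):
    """Write one JSON object per line straight into a .json.gz file,
    so nothing huge has to go through the stdout | pv | gzip pipe."""
    emitted = set()
    with gzip.open(path, 'wt', encoding='utf-8') as outfile:
        for rowid, row in rows:
            json.dump(row, outfile, sort_keys=True)
            outfile.write('\n')
            emitted.add(rowid)
    return emitted

# e.g.: rows_emitted = dump_rows_to_gzip(make_rows(db, builds))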

@BenTheElder
Member

SGTM

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 7, 2021
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 6, 2021
@BenTheElder BenTheElder added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. labels Jun 23, 2021
@spiffxp
Member

spiffxp commented Aug 9, 2021

/sig testing

@k8s-ci-robot k8s-ci-robot added the sig/testing Categorizes an issue or PR as relevant to SIG Testing. label Aug 9, 2021
@spiffxp
Member

spiffxp commented Nov 2, 2022

/close
Closing this in favor of #25658 as the umbrella issue for kettle freezing

I didn't see CPU exhaustion when looking at kettle just now

@k8s-ci-robot
Contributor

@spiffxp: Closing this issue.

In response to this:

/close
Closing this in favor of #25658 as the umbrella issue for kettle freezing

I didn't see CPU exhaustion when looking at kettle just now

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@spiffxp
Member

spiffxp commented Nov 2, 2022

/remove-kind feature
/kind bug

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Nov 2, 2022
@k8s-ci-robot k8s-ci-robot removed the kind/feature Categorizes issue or PR as related to a new feature. label Nov 2, 2022