[CI Problem] x386 CI running out of RAM #10180

Closed
mbrookhart opened this issue Feb 7, 2022 · 6 comments

@mbrookhart
Contributor

On a recent PR that added a few extra tests to Relay, we discovered that pytest was exceeding the 4GB RAM limit on the x386 CI job. We fixed this by reducing the memory use of the failing test by ~10%, but our test suite has grown to the point where running pytest tests/python/relay seems to accumulate too much in RAM (through the tests themselves and the pytest logs) to actually run on x386. I imagine we'll hit this again in the future. Should we perhaps write a bash script to run the test files one by one for the 32-bit job?
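The one-file-at-a-time idea could look roughly like the sketch below. It is a minimal Python sketch rather than the bash script proposed above, assuming test files follow pytest's default `test_*.py` naming. The helper names (`find_test_files`, `command_for`, `run_one_by_one`) and the default root directory are hypothetical and not taken from TVM's CI scripts.

```python
import subprocess
import sys
from pathlib import Path

# pytest's documented exit code when a file collects no tests.
NO_TESTS_COLLECTED = 5


def find_test_files(root):
    """Return pytest-style test_*.py files under root, in a stable order."""
    return sorted(str(p) for p in Path(root).rglob("test_*.py"))


def command_for(path, extra_args=()):
    """Build a pytest invocation that runs a single test file."""
    return [sys.executable, "-m", "pytest", path, *extra_args]


def run_one_by_one(test_files, extra_args=(), run=subprocess.run):
    """Run each file in its own pytest process and return the files that failed.

    Each file gets a fresh interpreter, so memory held by earlier files
    (test objects, fixtures, captured output) is freed when that process exits.
    """
    failed = []
    for path in test_files:
        result = run(command_for(path, extra_args))
        # A file with no collected tests is not treated as a failure.
        if result.returncode not in (0, NO_TESTS_COLLECTED):
            failed.append(path)
    return failed


if __name__ == "__main__":
    # Hypothetical default: the directory mentioned in this issue.
    root = sys.argv[1] if len(sys.argv) > 1 else "tests/python/relay"
    failures = run_one_by_one(find_test_files(root))
    for path in failures:
        print(f"FAILED: {path}")
    if failures:
        sys.exit(1)
```

The trade-off is paying interpreter and import start-up once per file, in exchange for peak memory that is roughly bounded by the heaviest single file instead of the whole directory.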

cc @driazati @areusch

Also wondering if @leandron might have some thoughts.

Branch/PR Failing

#10026

@areusch
Contributor

areusch commented Feb 7, 2022

We could also try to investigate why pytest-forked doesn't like GPUs. Could you post any information you have about that?

@mbrookhart
Contributor Author

I attempted to fix this using pytest --forked here: #10174

But it failed many GPU-related tests across many jobs. My impression was that initializing the GPU interface in the parent process and then trying to access it from the forked child process broke an assumption somewhere in the stack, but I didn't dig very deeply into what the root cause might be.

@masahi
Member

masahi commented Feb 7, 2022

@mbrookhart Can you point to the failed log from a job in #10026?

@FranckQC
Contributor

FranckQC commented Feb 8, 2022

Hi everyone.
It looks like I have the same issue with this i386 test on the CSE PR: #9482

Let's see how the current build ends up (in theory the docs and the Windows build should both be OK this time: the docs failure was just due to a URL change, and the Windows failure was due to a GitHub maintenance).

@mbrookhart
Copy link
Contributor Author

@masahi https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/detail/PR-10026/20/pipeline

@driazati
Member

driazati commented Aug 9, 2022

Cautiously closing this since we've changed the CI infra a good bit in the meantime; please re-open if this happens again.

@driazati driazati closed this as completed Aug 9, 2022