
Intel-MLlib require more memory to run Bayes algorithm. #204

Closed
haojinIntel opened this issue Apr 9, 2022 · 3 comments · Fixed by #208

Comments

@haojinIntel (Collaborator)

I noticed that commit "401891edef8aebefb129f59d7422fb2b26b0f746" added code to cache the label and features columns. This logic causes Intel-MLlib to consume more memory when executing the Bayes algorithm. Previously we could run the Bayes algorithm on a 450 GB data scale with 1 TB of memory. After caching more data, even 1.5 TB of memory is not enough.


@haojinIntel (Collaborator, Author)

@xwu99 @minmingzhu Please track this issue. Thanks!

xwu99 (Collaborator) commented Apr 10, 2022

Performance is always a tradeoff: here we trade memory for speed. In our experiments, caching did improve some end-to-end workloads when memory was sufficient, because without caching the input RDD is recomputed again and again.
So the tradeoff would need to be shown invalid; otherwise I won't consider this a bug, even though the workload you mentioned can't run.

xwu99 (Collaborator) commented Apr 10, 2022

@minmingzhu Could you also check whether we can persist with MEMORY_AND_DISK instead of MEMORY_ONLY, so that more memory can be released?
