
Intel-MLlib require more memory to run Bayes algorithm. #204

Closed
haojinIntel opened this issue Apr 9, 2022 · 3 comments · Fixed by #208

Comments

@haojinIntel (Collaborator)

I noticed that commit "401891edef8aebefb129f59d7422fb2b26b0f746" added code to cache the label and features columns. This logic causes Intel-MLlib to consume more memory when executing the Bayes algorithm. Previously we could run the Bayes algorithm on a 450 GB data scale with 1 TB of memory. After caching more data, even 1.5 TB of memory is not enough.


@haojinIntel (Collaborator, Author)

@xwu99 @minmingzhu Please track this issue. Thanks!

xwu99 (Collaborator) commented Apr 10, 2022

Performance is always a tradeoff: here we trade memory for speed. In our experiments, caching did improve some end-to-end workloads when memory was sufficient, because without caching the input RDD is recomputed again and again.
So the tradeoff would need to be shown invalid; otherwise I won't consider this a bug, even though the workload you mentioned can't run.

xwu99 (Collaborator) commented Apr 10, 2022

@minmingzhu Could you also check whether we can persist with MEMORY_AND_DISK instead of MEMORY_ONLY, so that more memory can be released?
