Memory leaks / is better memory management possible? #4239

Closed
pplonski opened this issue Apr 28, 2021 · 8 comments

@pplonski (Contributor)

Description

I'm working on an AutoML package. My users observed increased memory usage (mljar/mljar-supervised#381), so I started to dig.

I found that LightGBM consumes a lot of RAM and doesn't release it, even after the model is deleted.

Reproducible example

import gc
import numpy as np
import pandas as pd
from sklearn import datasets
from lightgbm import LGBMRegressor


def mem(msg=""):
    """Print resident set size (VmRSS) in MB, read from /proc/self/status (Linux only)."""
    with open("/proc/self/status") as f:
        # VmRSS is reported in kB, e.g. "VmRSS:    123456 kB"; drop the trailing " kB"
        memusage = f.read().split("VmRSS:")[1].split("\n")[0][:-3]

    print(msg, "- memory:", np.round(float(memusage.strip()) / 1024.0), "MB")

mem("Start")

X, y = datasets.make_regression(
    n_samples=100000,
    n_features=1000,
    n_informative=5,
    random_state=0,
)

mem("Created data frame")

for i in range(5):
    gbm = LGBMRegressor()
    gbm.fit(X, y)
    del gbm
    gc.collect()
    mem(f"Iteration #{i}")

del X
del y
gc.collect()
mem("End of script")

Output:

Start - memory: 109.0 MB
Created data frame - memory: 875.0 MB
Iteration #0 - memory: 2270.0 MB
Iteration #1 - memory: 2515.0 MB
Iteration #2 - memory: 2600.0 MB
Iteration #3 - memory: 2713.0 MB
Iteration #4 - memory: 2643.0 MB
End of script - memory: 1880.0 MB

Environment info

LightGBM version 3.2.1
Python 3.8.5
OS Ubuntu 20.04

Additional Comments

I love using LightGBM because of its speed. It is much faster than other GBM implementations, especially on multiclass classification tasks with many classes (more than 50).

@guolinke (Collaborator) commented May 9, 2021

cc @shiyu1994

@shiyu1994 (Collaborator)

The memory leak has been present since version 3.0.0. So far I can only narrow it down to Dataset, but I've verified that the destructors of Dataset, FeatureGroup, and DenseBin are all called. I'm still trying to identify the exact source of the leak.
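
One way to test that hypothesis is a minimal sketch like the one below (not from the thread; it assumes lightgbm's public Python API, in particular lgb.Dataset(...).construct(), and reuses the Linux-only mem() helper from the reproducer above). It builds and discards only the binned Dataset without training, so any RSS growth across iterations points at Dataset construction rather than the Booster.

import gc

import lightgbm as lgb
import numpy as np
from sklearn import datasets


def mem(msg=""):
    """Print resident set size (VmRSS) in MB, read from /proc/self/status (Linux only)."""
    with open("/proc/self/status") as f:
        memusage = f.read().split("VmRSS:")[1].split("\n")[0][:-3]
    print(msg, "- memory:", np.round(float(memusage.strip()) / 1024.0), "MB")


X, y = datasets.make_regression(
    n_samples=100000, n_features=1000, n_informative=5, random_state=0
)
mem("Created data")

for i in range(5):
    ds = lgb.Dataset(X, label=y, free_raw_data=True)
    ds.construct()  # forces binning, i.e. allocation of the underlying C++ Dataset
    del ds
    gc.collect()
    mem(f"Dataset-only iteration #{i}")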

@StrikerRUS added the bug label and removed the question label on Jun 9, 2021
@StrikerRUS mentioned this issue on Jul 12, 2021
@ravehun commented Jul 21, 2021

I also see a memory leak with LGBMRegressor(device_type='cuda').

import gc
import numpy as np
import pandas as pd
from sklearn import datasets
from lightgbm import LGBMRegressor


def mem(msg=""):
    """ Memory usage in MB """
    with open("/proc/self/status") as f:
        memusage = f.read().split("VmRSS:")[1].split("\n")[0][:-3]

    print(msg, "- memory:", np.round(float(memusage.strip()) / 1024.0), "MB")

mem("Start")

X, y = datasets.make_regression(
    n_samples=1000,
    n_features=1000,
    n_informative=5,
    random_state=0,
)

mem("Created data frame")

for i in range(50000):
    gbm = LGBMRegressor(device_type='cuda')
    gbm.fit(X, y)
    del gbm
    gc.collect()
    mem(f"Iteration #{i}")

gc.collect()
mem("End of script")

Output:

Start - memory: 93.0 MB
Created data frame - memory: 112.0 MB
Iteration #0 - memory: 360.0 MB
Iteration #1 - memory: 372.0 MB
Iteration #2 - memory: 376.0 MB
Iteration #3 - memory: 380.0 MB
Iteration #4 - memory: 384.0 MB
Iteration #5 - memory: 388.0 MB
Iteration #6 - memory: 392.0 MB
Iteration #7 - memory: 395.0 MB
Iteration #8 - memory: 399.0 MB
Iteration #9 - memory: 403.0 MB

@guolinke (Collaborator) commented Mar 1, 2022

@shiyu1994 any updates on this issue?

@ennnas commented Apr 12, 2023

Are there any solutions/workarounds to this problem?
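
(Editorial note, not from the thread.) A common workaround for this kind of native-memory growth is to run each fit in a short-lived child process, so the operating system reclaims everything when that process exits. The sketch below assumes retraining from scratch each time is acceptable; the helper names fit_isolated and _fit_in_child are purely illustrative, and the model is passed back as a model string and reloaded with lgb.Booster(model_str=...).

import multiprocessing as mp

import lightgbm as lgb
from lightgbm import LGBMRegressor


def _fit_in_child(X, y, queue):
    # Train inside the child process; only the serialized model crosses back.
    gbm = LGBMRegressor(verbose=-1)
    gbm.fit(X, y)
    queue.put(gbm.booster_.model_to_string())


def fit_isolated(X, y):
    # "spawn" starts a fresh interpreter, so no native state is inherited or kept.
    ctx = mp.get_context("spawn")
    queue = ctx.Queue()
    proc = ctx.Process(target=_fit_in_child, args=(X, y, queue))
    proc.start()
    model_str = queue.get()  # read before join() so a large payload can't block the child
    proc.join()
    return lgb.Booster(model_str=model_str)

# usage: booster = fit_isolated(X, y); preds = booster.predict(X)

The trade-off is that X and y are pickled into the child process on every call, which adds overhead for large datasets; the usual if __name__ == "__main__": guard also applies when using the "spawn" start method from a script.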

@RyanShahidi

Just to add a small data point: I also experience this issue when fitting LGBMClassifier with device_type='cuda'. Like @ravehun, I tried deleting the model, with no improvement. I don't seem to have the memory leak when using the CPU, but I would greatly prefer to use the GPU if this issue can be resolved.

@jameslamb (Collaborator)

I strongly suspect that this has been fixed by changes to LightGBM, its dependencies, or Python in the 3 years since it was first reported.

I ran the following today on an M2 Mac (so arm64 architecture) to set up a Linux environment with lightgbm==4.3.0.

docker run \
    --rm \
    -it python:3.11 \
    bash

pip install 'lightgbm==4.3.0' 'pandas>=2.2.2' 'scikit-learn>=1.4.2'

I then ran a slightly modified version of the original script from this issue (only adding verbose=-1 to suppress LightGBM's logs and doing 20 consecutive runs instead of 5).

check-lgb.py
cat << EOF > check-lgb.py
import gc
import numpy as np
import pandas as pd
from sklearn import datasets
from lightgbm import LGBMRegressor


def mem(msg=""):
    """ Memory usage in MB """
    with open("/proc/self/status") as f:
        memusage = f.read().split("VmRSS:")[1].split("\n")[0][:-3]

    print(msg, "- memory:", np.round(float(memusage.strip()) / 1024.0), "MB")

mem("Start")

X, y = datasets.make_regression(
    n_samples=100000,
    n_features=1000,
    n_informative=5,
    random_state=0,
)

mem("Created data frame")

for i in range(20):
    gbm = LGBMRegressor(verbose=-1)
    gbm.fit(X, y)
    del gbm
    gc.collect()
    mem(f"Iteration #{i}")

del X
del y
gc.collect()
mem("End of script")
EOF
python ./check-lgb.py

I don't see evidence of a memory leak.

Start - memory: 144.0 MB
Created data frame - memory: 910.0 MB
Iteration #0 - memory: 1683.0 MB
Iteration #1 - memory: 2249.0 MB
Iteration #2 - memory: 2249.0 MB
Iteration #3 - memory: 2249.0 MB
Iteration #4 - memory: 2249.0 MB
Iteration #5 - memory: 1672.0 MB
Iteration #6 - memory: 1672.0 MB
Iteration #7 - memory: 2249.0 MB
Iteration #8 - memory: 2249.0 MB
Iteration #9 - memory: 2249.0 MB
Iteration #10 - memory: 2249.0 MB
Iteration #11 - memory: 2249.0 MB
Iteration #12 - memory: 2249.0 MB
Iteration #13 - memory: 2249.0 MB
Iteration #14 - memory: 1672.0 MB
Iteration #15 - memory: 1672.0 MB
Iteration #16 - memory: 1672.0 MB
Iteration #17 - memory: 1672.0 MB
Iteration #18 - memory: 1672.0 MB
Iteration #19 - memory: 1672.0 MB
End of script - memory: 909.0 MB

Some other things also make me think this may have been fixed in recent versions of LightGBM.

Anyone reporting that this is "still" a problem: please provide a reproducible example using a recent (>=4.0.0) version of LightGBM, and we'd be happy to investigate it. For issues specific to the CUDA version of the package, please open a new issue.
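
(Not from the thread:) a quick way to confirm which version is installed before reporting is the standard lightgbm.__version__ attribute:

import lightgbm
print(lightgbm.__version__)  # should be 4.0.0 or later for a new report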

I'm adding the awaiting response label, so this issue will be automatically closed in 30 days.

@pplonski (Contributor, Author)

Thank you @jameslamb and whole LightGBM team!
