Memory leaks / is better memory management possible? #4239

Closed
pplonski opened this issue Apr 28, 2021 · 8 comments

@pplonski (Contributor)

Description

I'm working on an AutoML package. My users observed increased memory usage (mljar/mljar-supervised#381), so I started to dig.

I found that LightGBM consumes a lot of RAM and doesn't release it, even after the model is deleted.

Reproducible example

import gc
import numpy as np
import pandas as pd
from sklearn import datasets
from lightgbm import LGBMRegressor


def mem(msg=""):
    """Print resident set size (VmRSS) in MB, read from /proc/self/status (Linux only)."""
    with open("/proc/self/status") as f:
        # VmRSS is reported in kB, e.g. "VmRSS:    123456 kB"; drop the trailing " kB"
        memusage = f.read().split("VmRSS:")[1].split("\n")[0][:-3]

    print(msg, "- memory:", np.round(float(memusage.strip()) / 1024.0), "MB")

mem("Start")

X, y = datasets.make_regression(
    n_samples=100000,
    n_features=1000,
    n_informative=5,
    random_state=0,
)

mem("Created data frame")

for i in range(5):
    gbm = LGBMRegressor()
    gbm.fit(X, y)
    del gbm
    gc.collect()
    mem(f"Iteration #{i}")

del X
del y
gc.collect()
mem("End of script")

Output:

Start - memory: 109.0 MB
Created data frame - memory: 875.0 MB
Iteration #0 - memory: 2270.0 MB
Iteration #1 - memory: 2515.0 MB
Iteration #2 - memory: 2600.0 MB
Iteration #3 - memory: 2713.0 MB
Iteration #4 - memory: 2643.0 MB
End of script - memory: 1880.0 MB

Environment info

LightGBM version 3.2.1
Python 3.8.5
OS Ubuntu 20.04

Additional Comments

I love using LightGBM because of its speed. It is much faster than other GBM implementations, especially on multiclass classification tasks with many classes (more than 50).

@guolinke (Collaborator) commented May 9, 2021

cc @shiyu1994

@shiyu1994 (Collaborator)

The memory leak has been present since version 3.0.0. So far I can only narrow it down to Dataset, but I've verified that the destructors of Dataset, FeatureGroup, and DenseBin are all called. I'm still trying to identify the exact source of the leak.
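
One way to test that hypothesis is a minimal sketch like the one below (not from the thread; it assumes lightgbm's public Python API, in particular lgb.Dataset(...).construct(), and reuses the Linux-only mem() helper from the reproducer above). It builds and discards only the binned Dataset without training, so any RSS growth across iterations points at Dataset construction rather than the Booster.

import gc

import lightgbm as lgb
import numpy as np
from sklearn import datasets


def mem(msg=""):
    """Print resident set size (VmRSS) in MB, read from /proc/self/status (Linux only)."""
    with open("/proc/self/status") as f:
        memusage = f.read().split("VmRSS:")[1].split("\n")[0][:-3]
    print(msg, "- memory:", np.round(float(memusage.strip()) / 1024.0), "MB")


X, y = datasets.make_regression(
    n_samples=100000, n_features=1000, n_informative=5, random_state=0
)
mem("Created data")

for i in range(5):
    ds = lgb.Dataset(X, label=y, free_raw_data=True)
    ds.construct()  # forces binning, i.e. allocation of the underlying C++ Dataset
    del ds
    gc.collect()
    mem(f"Dataset-only iteration #{i}")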

@StrikerRUS added the bug label and removed the question label on Jun 9, 2021
@StrikerRUS mentioned this issue on Jul 12, 2021
@ravehun commented Jul 21, 2021

I also see a memory leak with LGBMRegressor(device_type='cuda').

import gc
import numpy as np
import pandas as pd
from sklearn import datasets
from lightgbm import LGBMRegressor


def mem(msg=""):
    """ Memory usage in MB """
    with open("/proc/self/status") as f:
        memusage = f.read().split("VmRSS:")[1].split("\n")[0][:-3]

    print(msg, "- memory:", np.round(float(memusage.strip()) / 1024.0), "MB")

mem("Start")

X, y = datasets.make_regression(
    n_samples=1000,
    n_features=1000,
    n_informative=5,
    random_state=0,
)

mem("Created data frame")

for i in range(50000):
    gbm = LGBMRegressor(device_type='cuda')
    gbm.fit(X, y)
    del gbm
    gc.collect()
    mem(f"Iteration #{i}")

gc.collect()
mem("End of script")

Output:

Start - memory: 93.0 MB
Created data frame - memory: 112.0 MB
Iteration #0 - memory: 360.0 MB
Iteration #1 - memory: 372.0 MB
Iteration #2 - memory: 376.0 MB
Iteration #3 - memory: 380.0 MB
Iteration #4 - memory: 384.0 MB
Iteration #5 - memory: 388.0 MB
Iteration #6 - memory: 392.0 MB
Iteration #7 - memory: 395.0 MB
Iteration #8 - memory: 399.0 MB
Iteration #9 - memory: 403.0 MB

@guolinke (Collaborator) commented Mar 1, 2022

@shiyu1994 any updates on this issue?

@ennnas commented Apr 12, 2023

Are there any solutions/workarounds to this problem?
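
(Editorial note, not from the thread.) A common workaround for this kind of native-memory growth is to run each fit in a short-lived child process, so the operating system reclaims everything when that process exits. The sketch below assumes retraining from scratch each time is acceptable; the helper names fit_isolated and _fit_in_child are purely illustrative, and the model is passed back as a model string and reloaded with lgb.Booster(model_str=...).

import multiprocessing as mp

import lightgbm as lgb
from lightgbm import LGBMRegressor


def _fit_in_child(X, y, queue):
    # Train inside the child process; only the serialized model crosses back.
    gbm = LGBMRegressor(verbose=-1)
    gbm.fit(X, y)
    queue.put(gbm.booster_.model_to_string())


def fit_isolated(X, y):
    # "spawn" starts a fresh interpreter, so no native state is inherited or kept.
    ctx = mp.get_context("spawn")
    queue = ctx.Queue()
    proc = ctx.Process(target=_fit_in_child, args=(X, y, queue))
    proc.start()
    model_str = queue.get()  # read before join() so a large payload can't block the child
    proc.join()
    return lgb.Booster(model_str=model_str)

# usage: booster = fit_isolated(X, y); preds = booster.predict(X)

The trade-off is that X and y are pickled into the child process on every call, which adds overhead for large datasets; the usual if __name__ == "__main__": guard also applies when using the "spawn" start method from a script.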

@RyanShahidi

Just to add a small data point: I also experience this issue when fitting LGBMClassifier with device_type='cuda'. Like @ravehun, I tried deleting the model, with no improvement. I don't seem to have the memory leak when using the CPU, but I would greatly prefer to use the GPU if this issue can be resolved.

@jameslamb (Collaborator)

I strongly suspect that this has been fixed by changes to LightGBM, its dependencies, or Python in the 3 years since it was first reported.

I ran the following today on an M2 Mac (so arm64 architecture) to set up a Linux environment with lightgbm==4.3.0.

docker run \
    --rm \
    -it python:3.11 \
    bash

pip install 'lightgbm==4.3.0' 'pandas>=2.2.2' 'scikit-learn>=1.4.2'

I then ran a slightly modified version of the original script from this issue (only adding verbose=-1 to suppress LightGBM's logs and doing 20 consecutive runs instead of 5).

check-lgb.py
cat << EOF > check-lgb.py
import gc
import numpy as np
import pandas as pd
from sklearn import datasets
from lightgbm import LGBMRegressor


def mem(msg=""):
    """ Memory usage in MB """
    with open("/proc/self/status") as f:
        memusage = f.read().split("VmRSS:")[1].split("\n")[0][:-3]

    print(msg, "- memory:", np.round(float(memusage.strip()) / 1024.0), "MB")

mem("Start")

X, y = datasets.make_regression(
    n_samples=100000,
    n_features=1000,
    n_informative=5,
    random_state=0,
)

mem("Created data frame")

for i in range(20):
    gbm = LGBMRegressor(verbose=-1)
    gbm.fit(X, y)
    del gbm
    gc.collect()
    mem(f"Iteration #{i}")

del X
del y
gc.collect()
mem("End of script")
EOF
python ./check-lgb.py

I don't see evidence of a memory leak.

Start - memory: 144.0 MB
Created data frame - memory: 910.0 MB
Iteration #0 - memory: 1683.0 MB
Iteration #1 - memory: 2249.0 MB
Iteration #2 - memory: 2249.0 MB
Iteration #3 - memory: 2249.0 MB
Iteration #4 - memory: 2249.0 MB
Iteration #5 - memory: 1672.0 MB
Iteration #6 - memory: 1672.0 MB
Iteration #7 - memory: 2249.0 MB
Iteration #8 - memory: 2249.0 MB
Iteration #9 - memory: 2249.0 MB
Iteration #10 - memory: 2249.0 MB
Iteration #11 - memory: 2249.0 MB
Iteration #12 - memory: 2249.0 MB
Iteration #13 - memory: 2249.0 MB
Iteration #14 - memory: 1672.0 MB
Iteration #15 - memory: 1672.0 MB
Iteration #16 - memory: 1672.0 MB
Iteration #17 - memory: 1672.0 MB
Iteration #18 - memory: 1672.0 MB
Iteration #19 - memory: 1672.0 MB
End of script - memory: 909.0 MB

Some other things also make me think this may have been fixed in recent versions of LightGBM.

Anyone reporting that this is "still" a problem: please provide a reproducible example using a recent (>=4.0.0) version of LightGBM, and we'd be happy to investigate it. For issues specific to the CUDA version of the package, please open a new issue.
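
(Not from the thread:) a quick way to confirm which version is installed before reporting is the standard lightgbm.__version__ attribute:

import lightgbm
print(lightgbm.__version__)  # should be 4.0.0 or later for a new report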

I'm adding the awaiting response label, so this issue will be automatically closed in 30 days.

@pplonski (Contributor, Author)

Thank you @jameslamb and whole LightGBM team!
