
bug: Memory leak when running the example code with >1k points #404

Closed
MDCHAMP opened this issue Oct 27, 2023 · 6 comments


MDCHAMP commented Oct 27, 2023

Bug Report

GPJax version:

0.7.1

Current behavior:

Memory leak when calling gpx.fit in the "Simple example" code.

Expected behavior:

Example to run correctly.

Steps to reproduce:

Copy the "Simple example" code from the front page of the GitHub repo and set `n=1001`.
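
For reference, roughly the code in question (a sketch from memory of the v0.7.x "Simple example", so details may differ slightly from the README):

```python
import jax.numpy as jnp
import jax.random as jr
import optax as ox
import gpjax as gpx

key = jr.PRNGKey(123)

# Simulated 1D regression data -- just past the 1k threshold.
n = 1001
x = jr.uniform(key, shape=(n, 1), minval=-3.0, maxval=3.0)
y = jnp.sin(x) + 0.1 * jr.normal(key, shape=(n, 1))
D = gpx.Dataset(X=x, y=y)

# Zero-mean GP prior with an RBF kernel, times a Gaussian likelihood.
prior = gpx.gps.Prior(
    mean_function=gpx.mean_functions.Zero(), kernel=gpx.kernels.RBF()
)
posterior = prior * gpx.likelihoods.Gaussian(num_datapoints=n)

# Optimise the negative marginal log-likelihood -- this is the call that leaks.
opt_posterior, history = gpx.fit(
    model=posterior,
    objective=gpx.ConjugateMLL(negative=True),
    train_data=D,
    optim=ox.adam(learning_rate=0.01),
    num_iters=500,
)
```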

Other information:

Looking at the stack trace, it seems like the issue is coming from a side effect inside CoLA. Could this be related to a heuristic algorithm swap for operations on more than 1k points?

MDCHAMP added the bug label on Oct 27, 2023

MDCHAMP commented Oct 27, 2023

It actually seems like any standard dense GPR with more than 1k points causes a memory leak in my environment. Was there a recent change on the CoLA side that could explain this?

daniel-dodd (Member) commented Oct 27, 2023

Thanks @MDCHAMP. Yes, it is my understanding that CoLA's default (auto) algorithm switches above 1000 datapoints, so this is likely the issue. What version of CoLA do you have installed in your environment?
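
For intuition, the kind of size-based dispatch being described looks something like this (a rough sketch in plain JAX, not CoLA's actual source; the function name and threshold are illustrative):

```python
from jax.scipy.linalg import cho_factor, cho_solve
from jax.scipy.sparse.linalg import cg

def auto_solve(A, b, dense_threshold=1000):
    """Solve A x = b, picking the algorithm from the system size."""
    if A.shape[0] <= dense_threshold:
        # Small system: direct dense Cholesky factorisation.
        return cho_solve(cho_factor(A, lower=True), b)
    # Large system: matrix-free iterative conjugate gradients.
    x, _ = cg(lambda v: A @ v, b)
    return x
```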


MDCHAMP commented Oct 27, 2023

I'm using cola==0.0.1, as it came with GPJax. I tried 0.0.5, but GPJax doesn't currently seem to support that version.

daniel-dodd (Member) commented Oct 27, 2023

Thanks for the info @MDCHAMP. I have bumped CoLA to v0.0.5 in PR #405. There's certainly a memory leak with default CoLA settings on >1000 datapoints, given the docs fail to build on such examples; tests pass fine otherwise. I'll have a little dig around the traceback and open an issue over on CoLA shortly.


MDCHAMP commented Oct 27, 2023

Ah, that makes a lot of sense then. A current workaround is to not jit the objective function in the call to gpx.fit.
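
Concretely, something like the following (reusing the setup from the reproduction sketch above; illustrative, not official guidance):

```python
import jax
import optax as ox
import gpjax as gpx

objective = gpx.ConjugateMLL(negative=True)

# Leaks on >1k points: a pre-jitted objective, e.g.
#   gpx.fit(..., objective=jax.jit(objective), ...)

# Workaround: pass the objective un-jitted and let fit compile internally.
opt_posterior, history = gpx.fit(
    model=posterior,
    objective=objective,
    train_data=D,
    optim=ox.adam(learning_rate=0.01),
    num_iters=500,
)
```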

daniel-dodd (Member) commented Oct 27, 2023

Ooh, that's interesting. Maybe we are slightly at fault on that one. As briefly mentioned in #402, jitting objective functions before passing them to fit seems a little dodgy, as the objective is a class. I think removing the ability to do this, and instead only allowing objective.step to be jitted, would be a better way to go. For the fit abstraction, you could still pass in your objective as normal (just without jit now).

But I still think there's something interesting going on with CoLA: on my M1 machine, 999 datapoints is fast (a few microseconds) while 1001 datapoints is slow (3 minutes) for jit(gpx.ConjugateMLL(negative=True).step). I just need to find what the issue is. But I reckon if we used algorithm=Cholesky() for all of CoLA, we would probably have a fix for now. (Edit: benchmarks are for the #405 branch.)
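
A sketch of that benchmark (reusing the imports and key from the reproduction above, and assuming step takes the posterior followed by the dataset; the argument order is my assumption):

```python
import time
import jax

step = jax.jit(gpx.ConjugateMLL(negative=True).step)

for n in (999, 1001):
    x = jr.uniform(key, shape=(n, 1), minval=-3.0, maxval=3.0)
    y = jnp.sin(x) + 0.1 * jr.normal(key, shape=(n, 1))
    D = gpx.Dataset(X=x, y=y)
    posterior = gpx.gps.Prior(
        mean_function=gpx.mean_functions.Zero(), kernel=gpx.kernels.RBF()
    ) * gpx.likelihoods.Gaussian(num_datapoints=n)

    t0 = time.perf_counter()
    step(posterior, D).block_until_ready()  # includes compile time
    print(f"n={n}: {time.perf_counter() - t0:.3f}s")
```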
