Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Lagrange bases caching on the web #1906

Merged
merged 2 commits into from
Nov 20, 2024
Merged

Fix Lagrange bases caching on the web #1906

merged 2 commits into from
Nov 20, 2024

Conversation

mitschabaude
Copy link
Collaborator

@mitschabaude mitschabaude commented Nov 19, 2024

fixes #1891

changes are in the bindings PR: o1-labs/o1js-bindings#313

EDIT:
a frustrating aspect of this fix is that the problem with compiling recursive programs in the browser must have been present for at least a year, if I'm not mistaken -- ever since Lagrange basis caching was introduced.

Ideally, following up this PR we would add a test that catches such a regression. We do currently have web tests but they don't use recursion.

@45930
Copy link
Contributor

45930 commented Nov 19, 2024

@mitschabaude you're a hero

a frustrating aspect of this fix is that the problem with compiling recursive programs in the browser must have been present for at least a year, if I'm not mistaken -- ever since Lagrange basis caching was introduced.

Yeah, caching never worked until 1.9.0, which introduced the performance bug. Then in 1.9.1 it worked well in node but still not in web. Going forward we need to be more diligent about testing our API in every way a user might actually use it. Since you saved me a lot of time in trying to track down this bug, I can re-allocate it to adding more tests :)

@mitschabaude
Copy link
Collaborator Author

Yeah, caching never worked until 1.9.0, which introduced the performance bug. Then in 1.9.1 it worked well in node but still not in web.

This is not true AFAICT, caching always worked in node.

In node, we only had the edge case of failing verification when lagrange bases hadn't been computed before. This didn't involve caching but was a problem exposed by caching, since caching made o1js skip computation of the thing verify() had relied on.

@kadirchan
Copy link
Contributor

In node (o1js 1.9.1), I faced recursive proving stuck without using any cpu or giving any error on proving phase and only workaround I found is restarting app as I said #1890. On second run programs compile faster and I can run recursive proofs.

@mitschabaude
Copy link
Collaborator Author

mitschabaude commented Nov 19, 2024

In node (o1js 1.9.1), I faced recursive proving stuck without using any cpu or giving any error on proving phase and only workaround I found is restarting app as I said #1890. On second run programs compile faster and I can run recursive proofs.

I'm not sure why it gets stuck, but it could be memory related. My idea for why it works after restart is that after the first run, you're already done compiling and compilation artifacts are used from the cache. This uses much less memory than full compiling, so you have more memory available for proving on the second try

@kadirchan
Copy link
Contributor

Tried on 16 and 32 GB ram vps and it stuck on both. In first run docker container uses around 5-6 GB of ram and stuck on proving phase, gives no error and almost 0% ram usage. After restarting containers ram usage reduced to around 4 GB and they able to run proving.

@mitschabaude
Copy link
Collaborator Author

Tried on 16 and 32 GB ram vps and it stuck on both. In first run docker container uses around 5-6 GB of ram and stuck on proving phase, gives no error and almost 0% ram usage. After restarting containers ram usage reduced to around 4 GB and they able to run proving.

ok! no idea what's happening there. but if it's on nodejs it can't be related to this PR, since here we only change the web version

@kadirchan
Copy link
Contributor

Maybe it's something that's always been there, I just notice it.

Copy link
Contributor

@45930 45930 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I confirmed the fix locally. The tradeoff with the writable cache case seems fine to me since the current behavior is silently failing in all cases, the 80/20 version is a strict upgrade.

@45930
Copy link
Contributor

45930 commented Nov 19, 2024

@boray will follow up with the test cases PR soon!

@45930
Copy link
Contributor

45930 commented Nov 19, 2024

(Leaving this unmerged to give Florian an opportunity to veto my approval, but we will target the next o1js release)

@45930 45930 merged commit 784f02d into main Nov 20, 2024
28 checks passed
@45930 45930 deleted the fix/lagrange-caching-web branch November 20, 2024 02:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Compiling ZkProgram in the browser never ends
3 participants