Experiment: Wrap all capnp code in a context-manager to avoid segfaults #317

Merged
merged 2 commits into from
Oct 3, 2023


LasseBlaauwbroek
Contributor

@LasseBlaauwbroek LasseBlaauwbroek commented Jun 12, 2023

This is me trying some things out. I'm aware that @haata hasn't signed off on #316 yet.

The main goal of this PR is to program defenses into Pycapnp such that a segfault can never be triggered from Python code. To achieve this, I've used the following strategy:

  • Starting the event-loop has to be done through a context-manager: async with capnp.kj_loop():
  • Within the context-manager we keep track of (1) open AsyncIoStream objects, (2) open TwoPartyServer and TwoPartyClient objects and (3) ongoing capability method calls.
  • When we exit from the context-manager, we check if any of the above are still active. If so, we:
    • Close the open streams. This should cancel most of the pending promises in the KJ loop.
    • Destroy the C++ RpcSystem and VatNetwork associated with any servers and clients. The python-level server and client objects are not killed, but they do become unusable. Destroying these C++ objects schedules a task in the KJ loop, so we have to destroy them before destroying the loop.
    • Cancel any pending capability method calls, which might have promises running in the KJ loop.
  • At this point, I believe that no new tasks can be scheduled in the KJ loop (because everything has been cancelled and destroyed).
  • We then run the KJ loop until it is empty and destroy it.
  • Finally, for any python object that might cause a new task to be scheduled on the KJ loop, we add a guard to check that the loop is actually running. For example, when the context-manager is closed, one might still have CapabilityClient objects around that are backed by a closed TwoPartyClient. We need to guard any method call on those to ensure we don't segfault.
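The strategy above can be modelled in plain asyncio. The following is only a toy sketch and not the code in this PR: `ToyLoop`, `ToyStream`, `toy_kj_loop` and `LoopClosedError` are hypothetical names, `asyncio.sleep` stands in for a capability method call, and the step that destroys the C++ RpcSystem and VatNetwork is left out because it has no pure-Python equivalent.

```python
import asyncio
from contextlib import asynccontextmanager


class LoopClosedError(RuntimeError):
    """Raised instead of crashing when an object outlives its loop (hypothetical)."""


class ToyLoop:
    """Hypothetical stand-in for the KJ loop; it only holds bookkeeping."""

    def __init__(self):
        self.running = True
        self.streams = set()        # open streams
        self.pending_calls = set()  # method calls still in flight


class ToyStream:
    """Hypothetical stand-in for a stream or client bound to one loop."""

    def __init__(self, loop):
        self.loop = loop
        self.closed = False
        loop.streams.add(self)

    def close(self):
        self.closed = True
        self.loop.streams.discard(self)

    async def call(self, delay):
        # Guard: never schedule new work on a loop that is gone.
        if self.closed or not self.loop.running:
            raise LoopClosedError("object used outside of its toy_kj_loop()")
        task = asyncio.create_task(asyncio.sleep(delay, result="done"))
        self.loop.pending_calls.add(task)
        task.add_done_callback(self.loop.pending_calls.discard)
        return await task


@asynccontextmanager
async def toy_kj_loop():
    loop = ToyLoop()
    try:
        yield loop
    finally:
        # 1. Close every stream that is still open.
        for stream in list(loop.streams):
            stream.close()
        # 2. Cancel every call that is still in flight.
        pending = list(loop.pending_calls)
        for task in pending:
            task.cancel()
        # 3. Drain: wait until the cancelled calls have actually finished.
        await asyncio.gather(*pending, return_exceptions=True)
        # 4. Only now mark the loop as destroyed.
        loop.running = False


async def demo():
    async with toy_kj_loop() as loop:
        stream = ToyStream(loop)
        first = await stream.call(0)                     # completes normally
        leaked = asyncio.create_task(stream.call(60))    # still running at exit
        await asyncio.sleep(0)                           # let the call start
    await asyncio.gather(leaked, return_exceptions=True)
    try:
        await stream.call(0)                             # stream outlived the loop
        guarded = False
    except LoopClosedError:
        guarded = True
    return first, leaked.cancelled(), stream.closed, guarded


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this model the call that is still in flight at exit ends up cancelled, and the later use of the leftover stream raises a Python exception rather than touching a destroyed loop. That is the behaviour the guards in the last bullet are meant to give for objects such as CapabilityClient.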

I've added a bunch of tests that used to segfault. Most likely there are more, but my theory is that with the current approach, we can solve all of those.

Feedback from @haata and @kentonv is appreciated on the validity of this approach.

Fixes #316

@fabiorossetto

We have encountered what seems to be a related issue. We develop a C++ library that links statically to capnproto. This library can also be used from Python through Python bindings. It seems that when we import our library before pycapnp, the event loop is not created and pycapnp encounters a null pointer dereference when getting the event loop.

Having an explicit context, instead of relying on global thread variables (as seems to be the case for the event loop), would probably help us.
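To illustrate the general hazard, here is a small Python-only sketch. It does not reproduce the import-order problem described above, and all names in it (`_thread_state`, `current_loop_implicit`, `explicit_loop`, `ToyLoop`) are hypothetical. It only shows that a lookup through implicit per-thread state can silently come back empty, while a loop handed out by an explicit context is always a real object.

```python
import threading
from contextlib import contextmanager

# Hypothetical implicit, per-thread slot for "the current loop".
_thread_state = threading.local()


class ToyLoop:
    """Hypothetical stand-in for an event loop."""

    def schedule(self, name):
        return f"scheduled {name}"


def current_loop_implicit():
    # Returns None when this thread never initialised the slot.
    # Native code doing the same lookup would dereference a null pointer.
    return getattr(_thread_state, "loop", None)


@contextmanager
def explicit_loop():
    # The caller receives the loop object directly; no hidden lookup is needed.
    yield ToyLoop()


def run_in_thread(fn):
    """Run fn in a fresh thread and return its result."""
    box = []
    worker = threading.Thread(target=lambda: box.append(fn()))
    worker.start()
    worker.join()
    return box[0]


def demo():
    _thread_state.loop = ToyLoop()  # initialised in the main thread only
    implicit = run_in_thread(current_loop_implicit)
    with explicit_loop() as loop:
        explicit = run_in_thread(lambda: loop.schedule("ping"))
    return implicit, explicit


if __name__ == "__main__":
    print(demo())
```

The implicit lookup in the worker thread finds nothing because the slot was only filled in the main thread, whereas the explicitly passed loop works wherever it is handed to.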

@LasseBlaauwbroek
Contributor Author

After getting a ping from @tobiasah: I believe that this is ready for review/merging. After this is merged, and #323 is fixed, I'd suggest making a 2.0~beta release.

@haata haata merged commit e13a0c9 into capnproto:master Oct 3, 2023
Development

Successfully merging this pull request may close these issues.

Proposal: Wrap the kj event loop in a context manager
3 participants