Experiment: Wrap all capnp code in a context-manager to avoid segfaults #317

LasseBlaauwbroek · 2023-06-12T22:31:52Z

This is me trying some things out. I'm aware that @haata hasn't signed off on #316 yet.

The main goal of this PR is to program defenses into Pycapnp such that a segfault can never be triggered from Python code. To achieve this, I've used the following strategy:

1. Starting the event-loop has to be done through a context-manager: async with capnp.kj_loop():
2. Within the context-manager we keep track of (1) open AsyncIoStream objects, (2) open TwoPartyServer and TwoPartyClient objects, and (3) ongoing capability method calls.
3. When we exit from the context-manager, we check whether any of the above are still active. If so, we:
   - Close the open streams. This should cancel most of the pending promises in the KJ loop.
   - Destroy the C++ RpcSystem and VatNetwork associated with any servers and clients (without actually killing the python-level server and client objects, although they do become unusable). Destroying these C++ objects schedules a task in the KJ loop, so we have to destroy them before destroying the loop.
   - Cancel any pending capability method calls, which might have promises running in the KJ loop.
4. At this point, I believe that no new tasks can be scheduled in the KJ loop (because everything has been cancelled and destroyed).
5. We then run the KJ loop until it is empty and destroy it.
6. Finally, for any python object that might cause a new task to be scheduled on the KJ loop, we add a guard to check that the loop is actually running. For example, after the context-manager is closed, one might still have CapabilityClient objects around that are backed by a closed TwoPartyClient. We need to guard any method call on those to ensure we don't segfault.
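
To make the shape of this strategy concrete, here is a minimal pure-asyncio sketch of the same pattern: track resources inside a context-manager, tear them down on exit, and guard later calls. It is a toy model and not the pycapnp implementation. Every name in it (ToyLoop, ToyStream, ToyClient, toy_kj_loop, LoopClosedError) is hypothetical, and asyncio tasks stand in for KJ promises.

```python
import asyncio
from contextlib import asynccontextmanager


class LoopClosedError(RuntimeError):
    """Hypothetical error raised when an object outlives its loop."""


class ToyLoop:
    """Stand-in for the KJ loop: it only records what is still active."""

    def __init__(self):
        self.running = True
        self.streams = set()        # open streams
        self.pending_calls = set()  # in-flight method calls


class ToyStream:
    def __init__(self, loop):
        self.loop = loop
        self.closed = False
        loop.streams.add(self)

    def close(self):
        self.closed = True
        self.loop.streams.discard(self)


class ToyClient:
    def __init__(self, loop, stream):
        self.loop = loop
        self.stream = stream

    async def call(self, delay):
        # Guard: refuse to schedule work once the loop or stream is gone.
        if not self.loop.running or self.stream.closed:
            raise LoopClosedError("event loop is no longer running")
        task = asyncio.ensure_future(asyncio.sleep(delay, result="done"))
        self.loop.pending_calls.add(task)
        task.add_done_callback(self.loop.pending_calls.discard)
        return await task


@asynccontextmanager
async def toy_kj_loop():
    loop = ToyLoop()
    try:
        yield loop
    finally:
        # Close whatever is still open, then cancel in-flight calls.
        for stream in list(loop.streams):
            stream.close()
        calls = list(loop.pending_calls)
        for task in calls:
            task.cancel()
        # Drain: wait until every cancelled call has actually finished.
        await asyncio.gather(*calls, return_exceptions=True)
        loop.running = False


async def demo():
    async with toy_kj_loop() as loop:
        client = ToyClient(loop, ToyStream(loop))
        first = await client.call(0)
        leaked = asyncio.ensure_future(client.call(60))
        await asyncio.sleep(0)  # let the leaked call start
    await asyncio.wait([leaked])  # the leaked call was cancelled on exit
    try:
        await client.call(0)  # client outlived the context-manager
    except LoopClosedError:
        guarded = True
    else:
        guarded = False
    return first, leaked.cancelled(), guarded
```

Running asyncio.run(demo()) exercises all three situations: a call that completes inside the context-manager, a call that is still pending at exit and gets cancelled, and a call made afterwards that hits the guard and raises a Python exception instead of touching a destroyed loop.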

I've added a bunch of tests that used to segfault. Most likely there are more, but my theory is that with the current approach, we can solve all of those.

Feedback from @haata and @kentonv is appreciated on the validity of this approach.

Fixes #316

fabiorossetto · 2023-07-17T09:51:27Z

We have encountered what seems to be a related issue. We develop a C++ library that links statically to capnproto. This library can also be used from Python through Python bindings. It seems that when we import our library before pycapnp, the event loop is not created and pycapnp hits a null pointer dereference when getting the event loop.

Having an explicit context instead of relying on global thread variables (as seems to be the case for the event loop) would probably help us.

LasseBlaauwbroek · 2023-10-03T15:57:18Z

After getting a ping from @tobiasah: I believe that this is ready for review/merging. After this is merged, and #323 is fixed, I'd suggest making a 2.0~beta release.

LasseBlaauwbroek added 2 commits June 13, 2023 00:10

Experiment: Wrap all capnp code in a context-manager

5b9fb19

Fix segfault in on_disconnect

fd9f8ac

LasseBlaauwbroek force-pushed the kj-context-manager branch from 7f745f7 to fd9f8ac Compare June 12, 2023 23:37

fabiorossetto mentioned this pull request Jul 17, 2023

Proposal: Wrap the kj event loop in a context manager #316

Closed

haata merged commit e13a0c9 into capnproto:master Oct 3, 2023

LasseBlaauwbroek mentioned this pull request Oct 3, 2023

Fix broken test in test_load #329

Merged
