
Document memory model for channel capability #89

Open
lthibault opened this issue Feb 24, 2023 · 4 comments
@lthibault
Collaborator

Overview

Follow-up thoughts on some of the notes in the original HackMD file. I think you can make a convoluted case that this is kinda-sorta two-phase commit (2PC), but it's more an exercise in mental gymnastics than something truly useful.

Instead, here is what I think the channel invariants are. As we progress on this work, I want us to keep a close eye on these, and be very deliberate about any changes we make to this list. It's essential that we have a clear understanding of the precise guarantees offered by channels, or they're going to make distributed systems harder to write, not easier!

Invariants

Ordering

To begin, channels should inherit the ordering invariants from Go's chan type. There are four, quoted from the Go memory model's section on channel communication:

  1. A send on a channel is synchronized before the completion of the corresponding receive from that channel.
  2. The closing of a channel is synchronized before a receive that returns a zero value because the channel is closed.
  3. A receive from an unbuffered channel is synchronized before the completion of the corresponding send on that channel.
  4. The kth receive on a channel with capacity C is synchronized before the completion of the k+Cth send from that channel.

I recommend reading the entire section (it's short), as you'll ultimately be writing unit tests that directly assert these invariants.
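For concreteness, here is the canonical plain-Go illustration of invariant 1. This is just a minimal local sketch; our unit tests will need to assert the analogous property across the RPC boundary:

```go
package main

import "fmt"

var msg string
var done = make(chan struct{})

func setup() {
	msg = "hello, channels"
	done <- struct{}{} // the send happens after the write to msg...
}

func main() {
	go setup()
	<-done           // ...and is synchronized before this receive completes,
	fmt.Println(msg) // so this is guaranteed to print "hello, channels".
}
```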

We also have a Peek operation, so let's define ordering invariants on that:

  1. For the purposes of synchronization, a peek on a channel is treated as a receive.

Error Handling

Additionally, and because operations in distributed systems can fail, we have a set of error-handling invariants:

  1. Memory allocation errors on one side of a send-recv pair are not observed by the other.
  2. A send, receive, peek or close is synchronized before the RPC results are written.
  3. A canceled receive is not observed by a send.

N.B.: despite outward appearances, item 2 is an error-handling invariant because it implies that network errors are not considered when synchronizing channel operations. Put plainly: if you're sending a value and your connection suddenly drops, a consumer MAY (or may not) have consumed your value.

Note also that the ordering invariants apply to cancelations. Thus, canceled sends are synchronized before any receives (canceled or otherwise), and canceled receives are synchronized before the corresponding send returns. Put plainly: either the sender and receiver both succeed, or they both fail (barring network errors).
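For intuition, here is a plain-Go analogue of that pairing. This is not the channel capability's API, just a sketch of the local property we want to preserve over RPC:

```go
package channel

import "context"

// send and recv sketch the intended pairing: on an unbuffered ch, either
// both sides complete or both observe cancelation; a canceled receive is
// never observed by a send.
func send(ctx context.Context, ch chan<- int, v int) error {
	select {
	case ch <- v: // completes only if a corresponding receive completes
		return nil
	case <-ctx.Done():
		return ctx.Err() // no receiver observed v
	}
}

func recv(ctx context.Context, ch <-chan int) (int, error) {
	select {
	case v := <-ch: // completes only if a corresponding send completes
		return v, nil
	case <-ctx.Done():
		return 0, ctx.Err() // a blocked sender does not observe this receive
	}
}
```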

Conclusion

I want to stress that we need to actually think about these invariants, and not leave anything to chance. I encourage everyone (or indeed anyone) to raise any questions they may have, and to challenge any part of the above proposition that seems dubious.

@lthibault lthibault added the documentation Improvements or additions to documentation label Feb 24, 2023
@lthibault lthibault added this to the 0.1.0 Public Beta Release milestone Feb 24, 2023
@lthibault
Collaborator Author

lthibault commented Feb 24, 2023

NOTE: As per discussion on Matrix, the server/receiver-side context of an RPC call expires in response to a Finish message. Thus, the caller has already moved on by the time we are notified.

@lthibault
Collaborator Author

lthibault commented Feb 24, 2023

⚠️ TODO ⚠️

At the time this comment was written, #88 does not properly synchronize cancelations. Any cancelation that takes place after the sender.recved chan is closed will be ignored, leading to dropped values. And even if we return from Recv before the caller cancels, they may cancel while the results are in-flight, so I don't see any obvious way to solve this.

I think we need to decide what we want to guarantee, and then ask whether the current design can satisfy those guarantees. From the perspective of the sender, I think the guarantee should look something like this:

A nil error returned from Send indicates that the receiver observed my value.

Again, this currently isn't the case, and I'm not sure the current design can support it.

@lthibault
Collaborator Author

lthibault commented Feb 25, 2023

I think we need to decide what we want to guarantee [...]

A nil error returned from Send indicates that the receiver observed my value.

Upon further reflection, a better way to phrase this is that we want to provide "at most once" delivery at the channel level, and leave it to the application to perform any retries/deduping.

One way to get this behavior is to implement SyncChan as a channel that transports channels, analogous to chan chan<- capnp.Ptr:

  1. Receivers send the inner chan<- capnp.Ptr through the outer channel
  2. Senders (a) receive the chan<- capnp.Ptr, and then (b) send the pointer.

If 2(b) succeeds, we know the receiver observed the value. If it fails, it may or may not have observed the value, and re-sending introduces the possibility of duplicates. Cancelations behave similarly, which is nice since we'd like to consider them as a special kind of error.
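Roughly, a local sketch of the idea might look like the following. The names (SyncChan, Send, Recv) and signatures are illustrative only, and capnp.Ptr comes from capnproto.org/go/capnp/v3; the real implementation has to achieve the same synchronization over RPC:

```go
package channel

import (
	"context"

	"capnproto.org/go/capnp/v3"
)

// SyncChan transports per-receive "slots" (inner send-only channels)
// rather than values.
type SyncChan chan chan<- capnp.Ptr

// Recv offers a fresh, unbuffered slot to a sender (step 1), then blocks
// until that sender delivers a value through it.
func (ch SyncChan) Recv(ctx context.Context) (capnp.Ptr, error) {
	slot := make(chan capnp.Ptr) // unbuffered, so delivery is synchronized

	select {
	case ch <- slot:
	case <-ctx.Done():
		return capnp.Ptr{}, ctx.Err() // no sender ever saw the slot
	}

	select {
	case v := <-slot:
		return v, nil
	case <-ctx.Done():
		return capnp.Ptr{}, ctx.Err() // slot abandoned; no sender completes 2(b) through it
	}
}

// Send receives a slot from a waiting receiver (2a), then delivers the
// value through it (2b). A nil return means the receiver completed the
// corresponding receive, i.e. it observed the value.
func (ch SyncChan) Send(ctx context.Context, v capnp.Ptr) error {
	var slot chan<- capnp.Ptr

	select {
	case slot = <-ch:
	case <-ctx.Done():
		return ctx.Err()
	}

	select {
	case slot <- v: // 2(b): synchronized with the receiver's <-slot
		return nil
	case <-ctx.Done():
		return ctx.Err() // locally we know v was dropped; over RPC this case is ambiguous
	}
}
```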

Note to future self: at first glance, it might seem like the existing SyncChan implementation provides the desired semantics, since errors introduce the possibility of duplicates. But remember, it is possible to lose messages and still observe a nil error! To remedy this, the channel-of-senders approach is needed.

The problem now is how to support peek 🤔. I expect we'll need to formalize the channel-of-senders approach into something more abstract, and figure out an implementation that doesn't rely on actual channels.

@lthibault
Collaborator Author

lthibault commented Feb 25, 2023

Ok, I think I have it. The problem is actually with peek. It completely violates the assumption that send operations are atomic. In a synchronous channel, peek would allow you to observe the value of a sender that is blocked. In other words, it would allow you to observe intermediate state! By definition, atomic operations don't allow this.

A trivial (but ultimately unsatisfactory) workaround is to require that synchronous channels always return a zero-value capnp.Ptr from peeks. On the surface, this can be justified by defining Peek to return the first value in the channel's buffer, after which we would smugly add, "Synchronous channels don't have buffers!" Unfortunately, there are two problems:

  1. It's confusing to users. It leaks implementation details, namely that asynchronous channels are implemented with a queue and synchronous channels are not. The API should define behavior, not implementation.
  2. It's confusing to developers. At least one future implementation is sure to get this wrong, and the consequences are rather severe.

In the end, applications can use groups of channels to create a "peekable" queue where this makes sense (see the sketch below). We don't need to provide it as a first-class object. It's worth noting that neither Go nor CSP/π-calculus defines a peek operation, probably for this very reason.
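For illustration, here is one minimal way an application could layer peeking on top of an ordinary channel; the names are hypothetical and this is not something we would ship. The key point is that the peek only inspects state the consumer already owns, so no in-flight sender state is exposed:

```go
package main

import "fmt"

// PeekQueue drains src lazily and holds the head element so it can be
// inspected repeatedly before being popped. Not safe for concurrent use.
type PeekQueue[T any] struct {
	src  <-chan T
	head *T
}

func NewPeekQueue[T any](src <-chan T) *PeekQueue[T] {
	return &PeekQueue[T]{src: src}
}

// Peek returns the head element without consuming it, blocking until one
// is available. It reports false if src is closed and drained.
func (q *PeekQueue[T]) Peek() (T, bool) {
	if q.head == nil {
		v, ok := <-q.src
		if !ok {
			var zero T
			return zero, false
		}
		q.head = &v
	}
	return *q.head, true
}

// Pop consumes and returns the head element.
func (q *PeekQueue[T]) Pop() (T, bool) {
	v, ok := q.Peek()
	q.head = nil
	return v, ok
}

func main() {
	ch := make(chan string, 2)
	ch <- "a"
	ch <- "b"
	close(ch)

	q := NewPeekQueue(ch)
	v, _ := q.Peek()
	fmt.Println(v) // "a" (still at the head)
	v, _ = q.Pop()
	fmt.Println(v) // "a"
	v, _ = q.Pop()
	fmt.Println(v) // "b"
}
```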

tl;dr: nuke Peek. Nobody is using it anyway.

@evan-schott Being able to walk your advisor through this thought process is sure to score points. It allows you to explain the "why" behind our API choice.
