Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add is_closed and closed operations to Lwt_stream #223

Closed
wants to merge 10 commits into from
Closed

Add is_closed and closed operations to Lwt_stream #223

wants to merge 10 commits into from

Conversation

seliopou
Copy link
Contributor

@seliopou seliopou commented Mar 13, 2016

Previously, the Lwt_stream API provided no reliable way to detect if a stream had been closed. While on_termination notifies the user if the stream is closed in the future, it does not tell the user if the stream was closed in the past.

Introducing the is_closed function addresses this problem, allowing the user to detect if a stream has been closed at any point in time using code like the following:

if Lwt_stream.is_closed stream then
  rest_of_code ()
else
  Lwt_stream.on_termination rest_of_code

Alternative to this pattern, one could use the Lwt_stream.close operation to get a thread that will sleep until the stream is closed:

Lwt_stream.closed stream >|= rest_of_code

Before this commit the Lwt_stream API provided no reliable way to detect
if a stream had been closed. While on_termination notifies the user if
the stream is closed in the future, it does not tell the user if the
stream was closed in the past.

Introducing the is_closed function addresses this problem, allowing the
user to detect if a stream has been closed at any point in time.
This function returns a thread that will sleep until the stream is
closed. The thread can be used in place of if_closed and on_terminate to
detect when a stream is closed at any point in time.
@rneswold
Copy link
Contributor

Shouldn't Lwt_stream.on_termination call your handler if the stream is already closed? I know LWT is cooperatively multithreaded but, when SMP OCaml becomes available, your construct becomes a bug since it isn't atomic.

@aantron
Copy link
Collaborator

aantron commented Apr 29, 2016

I would expect on_termination to have this behavior (same as the proposed close), but it doesn't. @seliopou, mind changing the PR to update on_termination instead of adding new functions?

@diml As original author of on_termination, can you say if we're overlooking anything?

I checked depending OPAM packages. It seems that only cohttp uses this function, and it doesn't depend on this behavior being either way.

hooks = ref [];
let closed, closed_w = Lwt.wait () in
let mark_closed () =
if not (Lwt.is_sleeping closed) then Lwt.wakeup closed_w () in
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be clearer to explicitly wakeup the closed thread at the end of the stream rather than rely on the execution of this hook. Plus with the current code, I think you get Lwt_stream.is_closed stream = false inside on_termination callbacks given that this hook is executed last. Maybe that was the indented behavior? In any case it'd be good to have tests for this in tests/core/test_lwt_stream.ml.

If you want to avoid storing both the Lwt.t and the Lwt.u inside the Lst_stream.t, you can store only the latter and define closed as follow, which is a no-op:

let closed s = Lwt.waiter_of_wakener s.closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How I respond to this comment depends on how—if at all—the semantics of on_termination will change in this pull request, so I'll hold off for now.

@ghost
Copy link

ghost commented May 2, 2016

The proposed change in behavior for Lwt_stream.on_termination seems right to me, it's more in line with what Lwt.on_termination does. I would still add the closed and is_closed functions though; we use these two functions everywhere in async and they have proven to be useful.

@rneswold, when you write Lwt code, you implicitly make the assumption that the code is running on a single system thread. You should definitely not try to use the same Lwt_stream.t from multiple system threads

The hooks will be called at the appropriate time once `f` finally
returns `None`.
@seliopou
Copy link
Contributor Author

seliopou commented May 2, 2016

The reason I did not modify on_termination to have this behavior is because I would consider that an API change that justifies a major version bump. If that's not the view of the maintainers, then I can change this pull request to add closed and is_closed, and in addition, change the behavior of on_termination.

@aantron
Copy link
Collaborator

aantron commented May 2, 2016

depends on how—if at all—the semantics of on_termination will change in this pull request

I think you should change the semantics. If we need to do a major version because of it, we will. There are also several other changes that can be included at that point.

This could also be considered a bug fix. The documentation makes no claims about the behavior of on_termination on streams that have already been closed, and at least some users seem to have assumed that it would have the "new" behavior. And, I did a search for whether at least OPAM packages depend on the "old' behavior, and it doesn't seem that they do. That doesn't cover private code, but it does give some idea of the likelihood that this change will break code.

@seliopou
Copy link
Contributor Author

seliopou commented May 2, 2016

If it's the case that there's no coyness about releasing new major versions (which would be great!), and that there's agreement that closed and is_closed should be included alongside on_termination, then on_termination should just be dropped entirely in favor of closed and is_closed. Several reasons for this. First, parsimony. Second, closed and is_closed is enough to reproduce both the "broken" and "non-broken" on_termination behavior. Third, using closed gives the user much more control over blocking and error recovery.

@ghost
Copy link

ghost commented May 2, 2016

I agree with @seliopou. I can't remember why I added Lwt_stream.on_terminate in the first place, but closed is clearly better

@aantron
Copy link
Collaborator

aantron commented May 2, 2016

I also agree, on_termination doesn't seem to fit in with the rest of Lwt well (it should have produced a thread). But this will affect at least Cohttp, so cc @rgrinberg.

@seliopou
Copy link
Contributor Author

seliopou commented May 2, 2016

Then barring any new issues that have yet to surface, I will commit a patch to this branch that includes closed, is_closed, and on_termination, with on_termination's implementation being in terms of closed. I don't know what the release plans are, so I'll leave it to the maintainers to deprecate and remove on_termination at the appropriate time. After that commit, it's my expectation that this PR will be ready to be merged.

I am interested in this feature solely for use in cohttp, so @rgrinberg if you would like any assistance, let me know in the usual places.

@rgrinberg
Copy link
Contributor

I agree with is_closed being better than on_{terminate,terminatinon}.

Why remove on_termination rather than on_terminate? on_terminate is the one that's named inconsistently

@aantron
Copy link
Collaborator

aantron commented May 2, 2016

I think we mean to remove (first deprecate) both of them.

@seliopou
Copy link
Contributor Author

seliopou commented May 2, 2016

I was speaking of them collectively. Both of them should get the boot. I'll edit my post to refer to on_termination only (currently, the non-deprecated one), to make that a bit clearer.

@rgrinberg
Copy link
Contributor

OK cool. Kill 'em both.

As per the discussion on the issue tracker, Lwt_stream.on_termination
will be deprecated and ultimately removed in favor for
Lwt_stream.closed. To facilitate that, this commit introduces a new
implementation of Lwt_stream.closed that does not rely on termination
hooks, and implementation Lwt_stream.on_termination in terms of
Lwt_Stream.closed.
The Lwt_stream module had multiple copies of low-level enqueueing code
of two varieties copied in serveral places.This commit moves one variety
of enqueing code into a single function called enqueue', and implements
the second variety as the function enqueue, in terms of enqueue'.
Previously, the of_list, of_array, and of_string constructors each would
produce a stream s for which

  is_closed s = false

This is counter-intuitive as all the elements of the stream s have been
generated and should be consumable without blocking. This commit changes
the behavior of this constructors such that

  is_closed s = true

immediately upon creation.
@seliopou
Copy link
Contributor Author

seliopou commented May 3, 2016

This should be ready to go. The last commit may be controversial, but from my perspective is the sensible way to treat those sorts of steams. So please review and discuss if necessary.

@aantron
Copy link
Collaborator

aantron commented May 4, 2016

@seliopou, AFAICT your last commit in this PR notionally "restores" semantics to of_list, of_array, and of_string that you removed in 398b953 in this PR, by removing them from from_direct. Granted, that was a bit different, because hooks hadn't been replaced yet, but the meaning was analogous nonetheless. It seems to me that from_direct also creates a stream that doesn't block in the Lwt sense, so it would be better to wake the "closed" thread there.

@seliopou
Copy link
Contributor Author

seliopou commented May 4, 2016

You are correct, however in general streams constructed by from_direct should not close the stream. If the generator function never returns None, doing so would result in an infinite loop.

@aantron
Copy link
Collaborator

aantron commented May 4, 2016

Indeed, the difference is in which side effects you can observe after marking a stream "closed" early. With of_list, etc., you can observe page faults, page dirty bits, execution time, breakpoints, out-of-memory errors, among others. With from_direct, you can additionally observe non-termination of the stream, or even a single call to the function, or any other side effect that can be caused by an OCaml function.

It's not clear to me that either from_direct or of_list, etc., should mark a stream closed early in the first place. But, if they should have different behavior from each other, which side effects do we care about, that we use to decide this? Perhaps @diml can comment on why from_direct originally called the hooks (even though it seems that could not have had any effect, due to the way it was implemented).

The documentation for is_closed states that reads are guaranteed not to block, perhaps we should change it to "not to block and to terminate?"

It also seems odd to me that streams created with of_list and friends would get this special treatment, but if I had some data type in a custom library, that could be turned into a stream using from_direct, there would be no way to give it the same special treatment with respect to closed (except of course by converting to an intermediate array or list).

@seliopou
Copy link
Contributor Author

seliopou commented May 4, 2016

Point of clarification: from_direct never called the termination hooks before this pull request. That was an erroneous attempt to introduce the closing behavior into of_list, etc. that overreached and that @diml picked up on while reviewing. from_direct should definitely not close the stream immediately upon construction.

@aantron
Copy link
Collaborator

aantron commented May 4, 2016

Oops, understood. Perhaps we can do this, but why these special cases? Streams may contain a finite number of elements and not block readers, without being closed, in general.

@seliopou
Copy link
Contributor Author

seliopou commented May 4, 2016

It's not just that the stream length is finite, but that length, and each element that contributes to that length, is known by the constructor. No computation needs to be done in order to generate the streams, whereas using from_direct implies that generating each element of the stream will possibly require work (and in the case of non-termination, a lot of work).

You could get the same behavior for some type in a module M that exports a to_generator function and some notion of length with a function along the lines of this:

let to_stream t =
  let st = Lwt_stream.from_direct (M.to_generator t) in
  ignore (Lwt_stream.npeek (M.length t) st);
  assert (Lwt_stream.is_closed st);
  st

The code doesn't take this approach in of_list, etc., for the sake of efficiency, but that is essentially what it's doing. All the elements are ready to go, so might as well queue them up let the list spine, etc. get collected.

It's also worth noting that this is the same behavior that Pipe.t's have in async_kernel.

@aantron
Copy link
Collaborator

aantron commented May 4, 2016

But computation does need to be done to generate the streams, at the least memory allocation, because Lwt's of_list, as currently implemented, is lazy. AFAICT, as a result, if you take an enormous list, convert it to a stream, clone the stream, and then try to read only one of the clones to the end, you can run out of memory. I think this side effect also matters.

For comparison, Async's of_list is, again AFAICT, eager, so you will see a memory problem during construction. There, it makes sense to mark the stream closed. Indeed, AFAICT it is implemented the way you suggest.

I'm open to proposals for changing the implementation of_list and friends, providing new interfaces for more sensible interactions with is_closed, or others, but I think we should move this to a separate issue. So far, I think the proposal in this PR doesn't really address the issue, but complicates the meaning of is_closed. Right now, it means "the stream was actually built up all the way to None, or prepared with one of three special functions in Lwt_stream," and there are observable non-trivial differences between the two.

TL;DR I don't think this is suitable for a lazy of_list (and friends).

@seliopou
Copy link
Contributor Author

seliopou commented May 4, 2016

Clones share queues, but walk them independently, so I don't think your observation is correct. It's the same behavior.

@aantron
Copy link
Collaborator

aantron commented May 4, 2016

Precisely because they share queues and walk them independently, the entire queue will have to be retained in memory as a "copy" of the large list. Is that correct?

let feed_direct f last close =
let x = f () in
(* Push the element to the end of the queue. *)
enqueue' x last;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aantron this is pushing elements onto the queue. feed_direct is called in a loop in from_fixed.

@aantron
Copy link
Collaborator

aantron commented May 4, 2016

Thank you, saw feed_direct. However, why not instead provide a function that can close a stream, and keep of_list lazy for performance, for users who don't need to have the stream pre-allocated (and closed)?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants