Add is_closed and closed operations to Lwt_stream #223

seliopou · 2016-03-13T01:04:52Z

Previously, the Lwt_stream API provided no reliable way to detect if a stream had been closed. While on_termination notifies the user if the stream is closed in the future, it does not tell the user if the stream was closed in the past.

Introducing the is_closed function addresses this problem, allowing the user to detect if a stream has been closed at any point in time using code like the following:

if Lwt_stream.is_closed stream then
  rest_of_code ()
else
  Lwt_stream.on_termination rest_of_code

Alternative to this pattern, one could use the Lwt_stream.close operation to get a thread that will sleep until the stream is closed:

Lwt_stream.closed stream >|= rest_of_code

Before this commit the Lwt_stream API provided no reliable way to detect if a stream had been closed. While on_termination notifies the user if the stream is closed in the future, it does not tell the user if the stream was closed in the past. Introducing the is_closed function addresses this problem, allowing the user to detect if a stream has been closed at any point in time.

This function returns a thread that will sleep until the stream is closed. The thread can be used in place of if_closed and on_terminate to detect when a stream is closed at any point in time.

rneswold · 2016-03-14T18:55:43Z

Shouldn't Lwt_stream.on_termination call your handler if the stream is already closed? I know LWT is cooperatively multithreaded but, when SMP OCaml becomes available, your construct becomes a bug since it isn't atomic.

aantron · 2016-04-29T20:46:09Z

I would expect on_termination to have this behavior (same as the proposed close), but it doesn't. @seliopou, mind changing the PR to update on_termination instead of adding new functions?

@diml As original author of on_termination, can you say if we're overlooking anything?

I checked depending OPAM packages. It seems that only cohttp uses this function, and it doesn't depend on this behavior being either way.

ghost · 2016-05-02T09:49:29Z

src/core/lwt_stream.ml

-    hooks = ref [];
+  let closed, closed_w = Lwt.wait () in
+  let mark_closed () =
+    if not (Lwt.is_sleeping closed) then Lwt.wakeup closed_w () in


It would be clearer to explicitly wakeup the closed thread at the end of the stream rather than rely on the execution of this hook. Plus with the current code, I think you get Lwt_stream.is_closed stream = false inside on_termination callbacks given that this hook is executed last. Maybe that was the indented behavior? In any case it'd be good to have tests for this in tests/core/test_lwt_stream.ml.

If you want to avoid storing both the Lwt.t and the Lwt.u inside the Lst_stream.t, you can store only the latter and define closed as follow, which is a no-op:

let closed s = Lwt.waiter_of_wakener s.closed

How I respond to this comment depends on how—if at all—the semantics of on_termination will change in this pull request, so I'll hold off for now.

ghost · 2016-05-02T10:03:41Z

The proposed change in behavior for Lwt_stream.on_termination seems right to me, it's more in line with what Lwt.on_termination does. I would still add the closed and is_closed functions though; we use these two functions everywhere in async and they have proven to be useful.

@rneswold, when you write Lwt code, you implicitly make the assumption that the code is running on a single system thread. You should definitely not try to use the same Lwt_stream.t from multiple system threads

The hooks will be called at the appropriate time once `f` finally returns `None`.

seliopou · 2016-05-02T14:46:00Z

The reason I did not modify on_termination to have this behavior is because I would consider that an API change that justifies a major version bump. If that's not the view of the maintainers, then I can change this pull request to add closed and is_closed, and in addition, change the behavior of on_termination.

aantron · 2016-05-02T14:58:38Z

depends on how—if at all—the semantics of on_termination will change in this pull request

I think you should change the semantics. If we need to do a major version because of it, we will. There are also several other changes that can be included at that point.

This could also be considered a bug fix. The documentation makes no claims about the behavior of on_termination on streams that have already been closed, and at least some users seem to have assumed that it would have the "new" behavior. And, I did a search for whether at least OPAM packages depend on the "old' behavior, and it doesn't seem that they do. That doesn't cover private code, but it does give some idea of the likelihood that this change will break code.

seliopou · 2016-05-02T16:37:49Z

If it's the case that there's no coyness about releasing new major versions (which would be great!), and that there's agreement that closed and is_closed should be included alongside on_termination, then on_termination should just be dropped entirely in favor of closed and is_closed. Several reasons for this. First, parsimony. Second, closed and is_closed is enough to reproduce both the "broken" and "non-broken" on_termination behavior. Third, using closed gives the user much more control over blocking and error recovery.

ghost · 2016-05-02T17:24:44Z

I agree with @seliopou. I can't remember why I added Lwt_stream.on_terminate in the first place, but closed is clearly better

aantron · 2016-05-02T17:27:25Z

I also agree, on_termination doesn't seem to fit in with the rest of Lwt well (it should have produced a thread). But this will affect at least Cohttp, so cc @rgrinberg.

seliopou · 2016-05-02T17:44:30Z

Then barring any new issues that have yet to surface, I will commit a patch to this branch that includes closed, is_closed, and on_termination, with on_termination's implementation being in terms of closed. I don't know what the release plans are, so I'll leave it to the maintainers to deprecate and remove on_termination at the appropriate time. After that commit, it's my expectation that this PR will be ready to be merged.

I am interested in this feature solely for use in cohttp, so @rgrinberg if you would like any assistance, let me know in the usual places.

rgrinberg · 2016-05-02T17:55:37Z

I agree with is_closed being better than on_{terminate,terminatinon}.

Why remove on_termination rather than on_terminate? on_terminate is the one that's named inconsistently

aantron · 2016-05-02T17:57:04Z

I think we mean to remove (first deprecate) both of them.

seliopou · 2016-05-02T17:57:58Z

I was speaking of them collectively. Both of them should get the boot. I'll edit my post to refer to on_termination only (currently, the non-deprecated one), to make that a bit clearer.

rgrinberg · 2016-05-02T17:58:47Z

OK cool. Kill 'em both.

As per the discussion on the issue tracker, Lwt_stream.on_termination will be deprecated and ultimately removed in favor for Lwt_stream.closed. To facilitate that, this commit introduces a new implementation of Lwt_stream.closed that does not rely on termination hooks, and implementation Lwt_stream.on_termination in terms of Lwt_Stream.closed.

The Lwt_stream module had multiple copies of low-level enqueueing code of two varieties copied in serveral places.This commit moves one variety of enqueing code into a single function called enqueue', and implements the second variety as the function enqueue, in terms of enqueue'.

Previously, the of_list, of_array, and of_string constructors each would produce a stream s for which is_closed s = false This is counter-intuitive as all the elements of the stream s have been generated and should be consumable without blocking. This commit changes the behavior of this constructors such that is_closed s = true immediately upon creation.

seliopou · 2016-05-03T16:59:33Z

This should be ready to go. The last commit may be controversial, but from my perspective is the sensible way to treat those sorts of steams. So please review and discuss if necessary.

aantron · 2016-05-04T16:14:45Z

@seliopou, AFAICT your last commit in this PR notionally "restores" semantics to of_list, of_array, and of_string that you removed in 398b953 in this PR, by removing them from from_direct. Granted, that was a bit different, because hooks hadn't been replaced yet, but the meaning was analogous nonetheless. It seems to me that from_direct also creates a stream that doesn't block in the Lwt sense, so it would be better to wake the "closed" thread there.

seliopou · 2016-05-04T16:32:05Z

You are correct, however in general streams constructed by from_direct should not close the stream. If the generator function never returns None, doing so would result in an infinite loop.

aantron · 2016-05-04T20:52:29Z

Indeed, the difference is in which side effects you can observe after marking a stream "closed" early. With of_list, etc., you can observe page faults, page dirty bits, execution time, breakpoints, out-of-memory errors, among others. With from_direct, you can additionally observe non-termination of the stream, or even a single call to the function, or any other side effect that can be caused by an OCaml function.

It's not clear to me that either from_direct or of_list, etc., should mark a stream closed early in the first place. But, if they should have different behavior from each other, which side effects do we care about, that we use to decide this? Perhaps @diml can comment on why from_direct originally called the hooks (even though it seems that could not have had any effect, due to the way it was implemented).

The documentation for is_closed states that reads are guaranteed not to block, perhaps we should change it to "not to block and to terminate?"

It also seems odd to me that streams created with of_list and friends would get this special treatment, but if I had some data type in a custom library, that could be turned into a stream using from_direct, there would be no way to give it the same special treatment with respect to closed (except of course by converting to an intermediate array or list).

seliopou · 2016-05-04T21:32:50Z

Point of clarification: from_direct never called the termination hooks before this pull request. That was an erroneous attempt to introduce the closing behavior into of_list, etc. that overreached and that @diml picked up on while reviewing. from_direct should definitely not close the stream immediately upon construction.

aantron · 2016-05-04T21:40:21Z

Oops, understood. Perhaps we can do this, but why these special cases? Streams may contain a finite number of elements and not block readers, without being closed, in general.

seliopou · 2016-05-04T21:59:47Z

It's not just that the stream length is finite, but that length, and each element that contributes to that length, is known by the constructor. No computation needs to be done in order to generate the streams, whereas using from_direct implies that generating each element of the stream will possibly require work (and in the case of non-termination, a lot of work).

You could get the same behavior for some type in a module M that exports a to_generator function and some notion of length with a function along the lines of this:

let to_stream t =
  let st = Lwt_stream.from_direct (M.to_generator t) in
  ignore (Lwt_stream.npeek (M.length t) st);
  assert (Lwt_stream.is_closed st);
  st

The code doesn't take this approach in of_list, etc., for the sake of efficiency, but that is essentially what it's doing. All the elements are ready to go, so might as well queue them up let the list spine, etc. get collected.

It's also worth noting that this is the same behavior that Pipe.t's have in async_kernel.

aantron · 2016-05-04T22:27:32Z

But computation does need to be done to generate the streams, at the least memory allocation, because Lwt's of_list, as currently implemented, is lazy. AFAICT, as a result, if you take an enormous list, convert it to a stream, clone the stream, and then try to read only one of the clones to the end, you can run out of memory. I think this side effect also matters.

For comparison, Async's of_list is, again AFAICT, eager, so you will see a memory problem during construction. There, it makes sense to mark the stream closed. Indeed, AFAICT it is implemented the way you suggest.

I'm open to proposals for changing the implementation of_list and friends, providing new interfaces for more sensible interactions with is_closed, or others, but I think we should move this to a separate issue. So far, I think the proposal in this PR doesn't really address the issue, but complicates the meaning of is_closed. Right now, it means "the stream was actually built up all the way to None, or prepared with one of three special functions in Lwt_stream," and there are observable non-trivial differences between the two.

TL;DR I don't think this is suitable for a lazy of_list (and friends).

seliopou · 2016-05-04T22:35:18Z

Clones share queues, but walk them independently, so I don't think your observation is correct. It's the same behavior.

aantron · 2016-05-04T22:36:58Z

Precisely because they share queues and walk them independently, the entire queue will have to be retained in memory as a "copy" of the large list. Is that correct?

seliopou · 2016-05-04T23:03:23Z

src/core/lwt_stream.ml

+let feed_direct f last close  =
+  let x = f () in
+  (* Push the element to the end of the queue. *)
+  enqueue' x last;


@aantron this is pushing elements onto the queue. feed_direct is called in a loop in from_fixed.

aantron · 2016-05-04T23:06:41Z

Thank you, saw feed_direct. However, why not instead provide a function that can close a stream, and keep of_list lazy for performance, for users who don't need to have the stream pre-allocated (and closed)?

This reverts commit 47cd91f.

seliopou added 3 commits March 12, 2016 19:47

add closed to Lwt_stream.t

c3e5aa3

This function returns a thread that will sleep until the stream is closed. The thread can be used in place of if_closed and on_terminate to detect when a stream is closed at any point in time.

fix typos and punctuation in Lwt_stream docstrings

b1fbc72

ghost reviewed May 2, 2016
View reviewed changes

do not call termination hooks in from_direct constructor

398b953

The hooks will be called at the appropriate time once `f` finally returns `None`.

seliopou added 4 commits May 2, 2016 14:43

improve Lwt_stream.is_closed documentation

b9aed80

aantron mentioned this pull request May 4, 2016

Rebased and reorganized #223 for Jenkins #238

Merged

clarify Lwt_stream.is_closed documentation

6aec9a6

seliopou reviewed May 4, 2016
View reviewed changes

Revert "change close semantics for fixed-length sources"

15001ef

This reverts commit 47cd91f.

seliopou mentioned this pull request May 5, 2016

change close semantics for fixed-length sources #239

Closed

aantron closed this in #238 May 5, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add is_closed and closed operations to Lwt_stream #223

Add is_closed and closed operations to Lwt_stream #223

seliopou commented Mar 13, 2016 •

edited by ghost

Loading

rneswold commented Mar 14, 2016

aantron commented Apr 29, 2016

ghost May 2, 2016

seliopou May 2, 2016

ghost commented May 2, 2016

seliopou commented May 2, 2016

aantron commented May 2, 2016

seliopou commented May 2, 2016

ghost commented May 2, 2016

aantron commented May 2, 2016

seliopou commented May 2, 2016 •

edited

Loading

rgrinberg commented May 2, 2016

aantron commented May 2, 2016

seliopou commented May 2, 2016

rgrinberg commented May 2, 2016

seliopou commented May 3, 2016 •

edited

Loading

aantron commented May 4, 2016

seliopou commented May 4, 2016

aantron commented May 4, 2016 •

edited

Loading

seliopou commented May 4, 2016

aantron commented May 4, 2016

seliopou commented May 4, 2016

aantron commented May 4, 2016

seliopou commented May 4, 2016

aantron commented May 4, 2016

seliopou May 4, 2016

aantron commented May 4, 2016

Add is_closed and closed operations to Lwt_stream #223

Add is_closed and closed operations to Lwt_stream #223

Conversation

seliopou commented Mar 13, 2016 • edited by ghost Loading

rneswold commented Mar 14, 2016

aantron commented Apr 29, 2016

ghost May 2, 2016

Choose a reason for hiding this comment

seliopou May 2, 2016

Choose a reason for hiding this comment

ghost commented May 2, 2016

seliopou commented May 2, 2016

aantron commented May 2, 2016

seliopou commented May 2, 2016

ghost commented May 2, 2016

aantron commented May 2, 2016

seliopou commented May 2, 2016 • edited Loading

rgrinberg commented May 2, 2016

aantron commented May 2, 2016

seliopou commented May 2, 2016

rgrinberg commented May 2, 2016

seliopou commented May 3, 2016 • edited Loading

aantron commented May 4, 2016

seliopou commented May 4, 2016

aantron commented May 4, 2016 • edited Loading

seliopou commented May 4, 2016

aantron commented May 4, 2016

seliopou commented May 4, 2016

aantron commented May 4, 2016

seliopou commented May 4, 2016

aantron commented May 4, 2016

seliopou May 4, 2016

Choose a reason for hiding this comment

aantron commented May 4, 2016

seliopou commented Mar 13, 2016 •

edited by ghost

Loading

seliopou commented May 2, 2016 •

edited

Loading

seliopou commented May 3, 2016 •

edited

Loading

aantron commented May 4, 2016 •

edited

Loading