-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Specifying (async) generator behavior #18
Comments
My initial instinct is option 1,
I'd suggest that in the default case, it should be option 1 but we should have something defined that allows for option 2. |
I think Option 1 is the correct choice, because it matches the sync calls-stack behavior that
It's not too hard to define a wrapper that does this without you having to change your generator function: const it = wrapGen(gen());
it.next();
it.next();
function wrapGen(it) {
const getNext = AsyncContext.wrap(() => it.next());
if (it[Symbol.iterator]) {
return wrapSync(it, getNext);
}
return wrapAsync(it, getNext);
}
function* wrapSync(getNext) {
while (true) {
const next = getNext();
if (next.done) return next;
yield next;
}
}
async function* wrapAsync(getNext) {
while (true) {
const next = await getNext();
if (next.done) return next;
yield next;
}
} |
Actually, we'll need a bit of spec test to handle parallel calls to an async generator: const ctx = new AsyncContext();
async function* foo() {
// Make it so the gen doesn't complete sync
await sleep(100);
console.log(ctx.get());
yield;
console.log(ctx.get());
}
const it = ctx.run(1, () => foo());
await ctx.run(2, () => it.next());
await ctx.run(3, () => it.next()); Which combination of
Well, currently it logs const it = ctx.run(1, () => foo());
await Promise.all([
ctx.run(2, () => it.next()),
ctx.run(3, () => it.next()),
]); This one logs The queuing behavior of async generators is a little weird. It maintains its own queue of promises, but doesn't use |
During today's meeting, we're aligned with having generators record the context at the initialization time, and having every |
To clarify, this would apply to both generators and async generators, or only the latter? |
Both |
While working on the spec text for this, I noticed there are various spec-created iterators which use the generators machinery under the hood through This can only be observed if there's some user code running inside the generator, but in the spec as it is right now, that is already the case for array iterators (and I think also regexp iterators) if some property has a getter: const asyncVar = new AsyncContext.Variable();
const array = [23, 34, 45];
Object.defineProperty(array, 1, {
get() {
return asyncVar.get();
}
});
const iter = asyncVar.run("foo", () => array.values());
array[0] = 12;
asyncVar.run("bar", () => {
console.log([...iter]); // [12, "foo", 45]
}); And with the iterator helper proposal, any iterator helper callbacks would run in the generator context: const asyncVar = new AsyncContext.Variable();
const array = [23, 34, 45];
const iter1 = array.values();
const iter2 = asyncVar.run("foo", () => {
return iter1.map(v => asyncVar.get());
});
asyncVar.run("bar", () => {
console.log(iter2.next()); // { value: "foo", done: false }
}); Is this expected? Or should we somehow exclude |
Would it be difficult to exclude? I don't think the behavior is problematic, but I would prefer consistency for the language's iterators. |
For sync iterators, it would need a flag on the generator so it doesn't switch to the snapshot active when the execution was last suspended. For async iterators, I don't know what would be needed yet. |
Wouldn't the alternative be to make language iterators that don't use internal generators to manually carry the async context? |
* Make sync and async generators restore the initialization context Closes #18. * Don't store the snapshots at yield points * Remove the .vscode folder * Assert that the snapshot hasn't changed inside the generator * Fix wrong assertions
Is it reasonable to reopen this issue in light of discussion on #75? To make this concrete, consider the following snippet: const v = new AsyncContext.Variable();
async function* foo() {
for (let i = 0;; i++) {
await 0;
console.log(v.get());
yield;
}
}
const it = v.run(1, () => foo());
v.run(2, () => it.next());
v.run(3, async () => {
await it.next();
it.next();
}); I posit two reasonable options for what this should log (independent of removing any combinations of
I understand from discussion on the recent issue (#75) that @littledan and @bakkot were in favor of option 1, and indeed I initially preferred it. On the other hand, @jridgewell and @Qard have argued for option 2. As I see it, some benefits of option 1 include
Benefits of option 2 include
I wasn't in the earlier discussions last year, but from what I can glean from the various meeting notes and matrix archives, it looks like "Committee prefers init time capture and restore" (from Jun 27 meeting) was in actuality a pretty weak preference? Given the trade-offs above (and feel free to expand on any that I missed), I suspect it might make sense to pivot and go with option 2 here? Or are there considerations I missed? Note about |
I hadn't seen @bakkot's example before, but I think it's flawed. Turning this into a "library takes a promise" example, we can see a similar context escape if the library ever needs to call back into user code: async function libraryTakingAPromise(promise) {
// Only the promise passed to us was wrapped, not the actual call to libraryTakingAPromise.
// So, by chaining on the promise (which escaped its context), we've lost that context.
await promise;
// No way for user code to recover the context id, because this runs outside the `ctx.run()`.
rerender();
}
libraryTakingAPromise(context.run('id', async () => {
await 0;
// Any chains on the async-fn's return promise will not propagate context id.
})); The appropriate way to write that would work for both promises and generators: context.run('id', () => {
libraryTakingAPromise((async () => {
await 0;
// The library is capable of chaining on this promise just fine.
})());
});
context.run('id', () => {
libraryTakingAnIterator(context.run('id', function* () {
// The library is capable of capturing context just fine.
// If it were sync calling `.next()` nothing would be needed.
// And if it was used `await`, then context would propagate already.
yield 0;
}));
}); |
For more context, I tried to do a GitHub search for code that's using |
I'm confused by this transformation. I don't know what |
IMHO, as a library vendor, I wouldn't expect the context to be bound through the flow of a generator - the same way I would not expect a variable in a try...finally (a sync context) to do so. Isn't the proposal about async context? Why not keep the rule that simple? Is there any convincing use case for alternative 1 that can't be solved in userland? |
I don't know what that means. What does try/finally have to do with anything? |
With try...finally, I mean a pattern for maintaining a sync context via a variable export let currentContext;
export function run(v, callback) {
const parentContext = currentContext;
try {
currentContext = v;
return callback();
} finally {
currentContext = parentContext;
}
} With this simplified implementation of sync context, a consumer can read the currentContext from anywhere, just like with the purpose of AsyncContext.run(). |
Sorry, my example wasn't great. But the point is that the library could reenter into user code, and due to the nesting the // This is Library code.
function render(ui: Promise<JSX>) {
// We wait for the ui to be ready.
const vdom = await ui;
// just imagine this does the normal calls to to a rerender a component tree…
}
// - - -
// This is User code
// Imagine we want to capture RAIL metric from click to rerender.
const rail = new AsyncContext.Variable();
function Button() {
// This measure would probably wouldn't be in Button, but this is just an example.
const span = rail.get();
if (span) {
// If we're rerendering, then we want to end the span now that we've painted
span.end();
}
const onclick = () => {
// We start the span during the user interation. But notice that `render` isn't wrapped,
// only the UI generation. That means the `await` in `render` will not inherit our
// context's span, and the rerender won't have a `span` variable to end.
render(rail.run(new Span(), async () => {
// Some work to generate a new UI… maybe a network request or whatever.
return <Button />;
}));
};
return <button onClick={onclick}>click me</button>;
} My point is that it's unlikely anyone intends to wrap only an async/generator function when they pass it to a library function. Instead, you'd wrap the entire call to the library function. In both async and generator functions, it'd be easy to propagate the context with that.
I don't see this as different. If I call an async function in the function* gen() {
console.log('before yield');
yield;
console.log('after yield');
}
const it = ctx.run('id', () => {
const it = gen();
it.next();
return it;
});
ctx.run('other', () => {
it.next();
});
async function asyncFn() {
console.log('before yeild');
}
const p = ctx.run('id', asyncFn);
ctx.run('other', async () => {
p.then(() => {
// "resuming" the generator in a different context.
console.log('after yield');
});
}); If you want a generator to keep the same context, then consume the generator within that context. Eg, call
I'm not sure how this relates to the generator discussion? |
I think this is our core disagreement. To me the behavior you've written for async functions is totally expected: it has |
Sorry for a bad example. The only thing I am trying to exemplify is that it would be easier to understand the expected behaviour if it was stick to propagating contexts around async calls only and not involve generators at all to have a simpler principle. I'm afraid having special context wrapping around generators will complicate things. I haven't seen a use case where this would be motivated, so if there is such, it would be nice to get an understanding of it. |
I do want to point out something: from an implementation perspective, having async/await but not generators snapshot state means you have to conditionally do snapshots in both initialization and
Option 1 would be much simpler to implement and wouldn't require any type-specific code (or patching async generator requests to add any fields):
I also expect it to be noticeably faster despite the memory load, due to less branching and the fact it's only a few instructions in both suspend and resume. |
Also, nodejs/node#42237 (comment) suggests Node at least still regards their current behavior (option 2 but IIUC with the Zalgo issue in #18 (comment)) as a bug, and that option 1 is the "intended" behavior. So there's a couple data points to consider. |
I suggest that when thinking about the correct behavior, we should consider async functions and sync generators as the two fundamental cases. The behavior of async generators should fall out from merging the above two cases (though it's of course worth validating that the derived behavior is still reasonable). Async functions is clear, so that leaves the question of what makes the most sense for sync generators. If I write function* gen() {
a();
yield;
b();
}
const it = v.run(1, gen);
v.run(2, () => it.next());
v.run(3, () => it.next()); which contexts do I expect within the body of the generator, when In considering this, it seems like the question becomes whether it's more important to make option 2 possible at all, or whether it's more important to make option 1 an easy and fast default. If we go with option 2, then option 1 is quite possible with a wrapper around the generator (as demonstrated in the nodejs issue). If we go with option 1, then option 2 becomes completely impossible. So, what are the relevant trade-offs to consider here? To @dead-claudia's point about performance - it's a little surprising to me (though still understandable) that the performance would be noticeably worse for async generators when it seems to naturally be doing less work. Snapshotting and restoring context for async functions is additional work on top of what would happen without Finally, a brief further word about async generators. When I searched for generators declared in the same file as |
@shicks To be clear, the perf impact, if it does exist, would most impact Also, this overhead can be easily avoided if you duplicate the code (say, via three different template instantiations), and JITs can of course generate specialized code for each one - this trades some minor CPU cost for a significant growth in binary size. Size-optimized runtimes running on lower-spec devices like low-end phones and (especially) microcontrollers won't be able to specialize away the overhead like this, though, especially if JIT compiling isn't an option. |
I looked into nodejs/node#42237, and the linked nice-grpc fix linked in open-telemetry/opentelemetry-js#2951 (comment). It seems people really are going to return the generator's iterator without consuming it themselves. 🫤 It also has code using I expect the performance of either choice to be trivial, I don't think it's something that we should consider. If we were to do init-time, how could we support call-time? Given that the generator may already expect an argument when calling function* gen(snapshot) {
snapshot.run(…);
}
let snapshot = AsyncContext.Snapshot();
const it = gen({
run(...args) {
return snapshot.run(...args);
}
});
it.next();
ctx.run(1, () => {
snapshot = AsyncContext.Snapshot();
});
it.next(); It's not great to support this way. If we really want to support it, maybe we could expose another way to start the generator (maybe |
I think it's important to consider this use case. One big problem with a I feel like it also moves the decision to the wrong place - generally I would expect the author of the function know what context it expects, rather than the caller - so I'd rather see the decision cemented at definition time, rather than call time (i.e. by wrapping the function somehow - this could be done with a decorator, maybe - though again, I don't see a great way to get the right default). Probably if the default were to snapshot at init and I wanted a generator that captured the snapshot, I'd end up writing something along the lines of function wrapNoContext(fn) {
return function() {
let snapshot;
const fauxSnapshot = {run: (...args) => snapshot.run(...args)};
const gen = fn.apply(this, [fauxSnapshot].concat(arguments));
return {
next: arg => (snapshot = new AsyncContext.Snapshot(), gen.next(arg)),
return: arg => (snapshot = new AsyncContext.Snapshot(), gen.return(arg)),
throw: arg => (snapshot = new AsyncContext.Snapshot(), gen.throw(arg)),
[Symbol.iterator]() { return this; },
};
};
} and the usage would look something like this: const gen = wrapNoContext(function*(snapshot, arg1, ...) {
...
yield;
... snapshot.run(var1.get()) ...
})
gen(arg1, ...) It's basically the same ideas as yours, but you can at least wrap up the usage sites to "just work". Unfortunately there's no way to have it just work inside the generator definition, and you still end up needing the extra parameter, and to bend over backwards to actually extract anything out of the context. @TerribleIdeaBot said:
|
Note this Node.js bug report where people seem to be asking for the restore-after-yield generator behavior: nodejs/node#52317 |
Do we have a conclusion for this thread? @jridgewell seems to have written for our April 2024 update slides that we're going to save/restore across |
Our agreement from last group meeting was that landing #77 was an acceptable compromise so that generators can use init-time by default. |
Form library developer perspective, I will return string or iterator of tokens for a AI tools. And I use async local storage to track which llm is being called and will do logging. It’s very to trick to handle iterator and I had to wrap every response (greatly there is function decorator in standard) So I will go with option 2. but another side this behavior looks like v8 inherited, there could have more breaking changes I think |
Related: nodejs/node#42237, open-telemetry/opentelemetry-js#2951
Should a generator or an async generator capture the async context when being created? Given:
The output options could be:
'first iter'
/'second iter'
, which is the context when the generator'snext()
is invoked.'init'
/'init'
, which is the context when the generator is being constructed.The text was updated successfully, but these errors were encountered: