Proposal: Chaining middleware using Promises #353
Thank you for your great proposal here. I'm also excited to seek ways to move this framework forward to cover broader use cases! I may still be misunderstanding some points, but the proposed changes seem effective in addressing the known middleware issues mentioned. I've got a few questions below, and I would like to know your further thoughts.
The first one is a concern from the perspective of Bolt users. The second is about the effectiveness as a solution for FaaS (or, as some say, serverless) support.

Avoiding listener breaking changes

tl;dr - I'd like to seek ways to avoid breaking changes affecting all users.

If I understand correctly, the primary motivation of the changes is to address the current issues around middleware (#248 #239). I agree that it may be inevitable to bring breaking changes to fix them. Changing how to write middleware affects some users (I'm sorry for those people), but not all of them. I know a listener is a kind of specialized middleware, but I'm still thinking of ways to avoid introducing incompatibilities to listeners. Have you already verified that it's impossible to keep the current listener interface unchanged? Even if Bolt decides to change the communication between the Receiver and the App, listeners may not need to change with it. I haven't tried to implement your idea yet, but I think there are certainly some ways to achieve the goal without bringing changes to listeners.

Safe way to run remote functions

tl;dr - For FaaS (e.g., AWS Lambda, Google's Cloud Functions, etc.) support, providing ways to run functions remotely after request acknowledgment is a required piece. "Remotely" here means running functions on another node/container with no shared memory. Also, the mechanism should be beneficial not only for FaaS but for others, as Slack apps have to avoid 3-second timeouts in some way.

I believe we can assume that functions given to such a deferral mechanism would be executed remotely. On that premise, supporting closure functions may be challenging. Let's imagine the following situation. I have no idea how to safely serialize the code along with deep-copied function/value references in its lexical scope.

app.command("/hello", async ({ body, context, ack }) => {
// Bolt needs to serialize the following code to run in another Lambda (or other equivalent)
const localValue = await calculateSomething(body);
context.defer(() => {
return doSomethingWith(localValue)
.then(successHandler)
.catch(errorHandler);
});
return ack();
})

The above example may look a bit arbitrary to some, so let me show you another example. Deferred functions will probably tend to use some of the listener arguments, and those arguments can be shadowed in the same lexical scope. That may induce more mistakes, like directly referring to a value from the outer scope instead of one handed to the deferred function:

app.command("/hello", async ({ body, context, ack }) => {
context.defer(({ context }) => {
// body here is a reference to the outer lexical scope
return doSomethingWith(context, body.text);
});
return ack();
})

Warnings in the documentation like "You can not pass any closures to context.defer()" may not be enough to prevent such mistakes.

My idea for FaaS (possibly applicable to other use cases as well)

Let me share another option for FaaS support. I've already briefly shared this idea with @aoberoi and others before. Take a look at the following code example. If Bolt introduces this new way to define listeners, dealing with FaaS limitations or designing Slack apps free from 3-second timeouts can be much simpler. I believe the design looks straightforward for Slack apps.

const app = new TwoPhaseApp({
token: process.env.SLACK_BOT_TOKEN,
receiver: new AwsLambdaReceiver({
signingSecret: process.env.SLACK_SIGNING_SECRET
})
});
app.command('/lambda')
.ack(({ ack }) => {
// inside this method, no breaking changes
ack('this text will be posted immediately');
})
.then(async ({ body, say }) => {
// this function will be executed in another `Event` type invocation using the same lambda function
return say('How are you?').then(() => say("I'm good!"));
});
// convert to AWS Lambda handler
export const main = app.receiver().toHandler();

This approach doesn't have the lexical scope issue. Even if the listener has references to values/functions (global ones, imported ones, etc.), it's safe to access them. The downsides of introducing yet another way to define listeners would need to be weighed, of course.
I may be missing something about Ankur's idea, and I would like to know others' comments on it. Also, I would love to hear others' impressions/feedback on my prototype as well. If many are interested in the idea, I can create another GitHub issue to discuss it.
Personally, I like the idea of everything returning promises and chaining them together. That makes sense to me coming from a JS background. The migration steps @aoberoi described seem simple at first glance and potentially allow devs to leave their code as is and still have it work, though I'm sure we will recommend everyone update their apps. It will definitely be a good amount of work updating our own sample code and docs to this new model.
Great job @aoberoi synthesizing these ideas. I'm generally less afraid of breaking changes than others; I think using them can actually be fairly effective. (Prefacing this: only an opinion. I care significantly more about giving devs control over error handling and simpler, more obvious testing approaches, which is currently quite hard.) I would lean towards switching the middleware function signature to be koa-styled for two reasons:
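For reference, the koa-style signature being referred to takes a context object plus a next function and awaits it; a generic sketch (not Bolt's current API) looks like:

app.use(async (ctx, next) => {
  const start = Date.now();      // pre-processing before the rest of the chain
  await next();                  // run downstream middleware and wait for them to finish
  console.log(`handled in ${Date.now() - start}ms`); // post-processing afterwards
});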
(Still an opinion; obviously the maintainers' choice.) Attempting to make these changes without breaking the API is, I think, possible, but it would add a significant amount of code to Bolt to maintain that backwards compatibility. If this were my project, I would opt for simpler code and breaking changes so that it would be easier to update and fix issues, and so external folks could collaborate easily.

Safe Error Handling

This is one of my key concerns: in the current architecture, it's easy to create unhandled promise rejections. I think this proposal does a good job of addressing them, but there are a few things in contention that might leave problems. The big thing is Node's warning that unhandled promise rejections will, in a future release, terminate the process with a non-zero exit code.
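As a concrete illustration of that failure mode under the current callback-style middleware (lookUpBotToken is a hypothetical database helper, not a Bolt API):

app.use(({ payload, context, next }) => {
  // async work with no way to surface a failure to Bolt
  lookUpBotToken(payload)          // hypothetical database call
    .then((botToken) => {
      context.botToken = botToken;
      next();
    });
  // If lookUpBotToken rejects (say the database connection is down), nothing catches it:
  // the rejection goes unhandled, next() is never called, and the event silently dies
  // while the process keeps listening on its port.
});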
Unhandled errors will prevent graceful shutdown, and if the runtime doesn't reboot (which might not be a real thing now?) the app would just be down. Complex systems fail in complex ways. A (very real) hypothetical that would bring down a Bolt app: my datacenter's network has a glitch, my database connection breaks, and that library can't reconnect, either by design choice (Node libraries do this a lot) or because my Docker container simply needs a restart on a new node without the networking issue. If my database connection throws an error in the middleware stage (think OAuth, or fetching the correct bot token for a workspace), that exception will go unhandled and the app will stay up (on the port), so health checks pass, but it would be functionally dead, causing an outage.

@seratch This leads me to your point here:
Putting …

There are also some benefits too: the …

Lastly, if a developer doesn't make this change, it actually wouldn't affect anything; Bolt would behave exactly as it does now. The HTTP response will be sent as soon as the event loop frees up, any errors will go unhandled, and other processing will still happen, assuming the response was sent correctly.

Improved testing ability

From this proposal, I read an assumption of a "testing receiver"? Possibly Bolt would publish a receiver that's designed to send events programmatically and then resolve for assertions after processing completes. If so, this comment is possibly concerning:
I agree that this seems right from a normal receiver's perspective. But how do I test my error handling? It would be great to be able to expect various errors. Slightly off-topic for the heading, but wouldn't it also make sense for a …

Performance concerns

@aoberoi I'm curious to dig more into the performance issues. I think there is possibly something I'm missing?
Possibly this is different from my current expectations of how Bolt works, but apps can do this now, right? Middleware currently can do asynchronous processing and let the chain continue by calling next(). I wouldn't expect this change to affect performance noticeably, or at all. Possibly the only change would be how V8 now has to process a Promise vs a callback, which I assume takes a few extra instructions, but even that could be incorrect as V8 is quite good at optimizing common patterns.

Dispatch algorithm

I think the description was correct. I'm going to write a code sample just to specify clearly what I'd expect to happen, taken largely from #337:

// console logs indicate call order,
// asterisks indicate parallel processing, and therefore call order is unpredictable
app.use(async ({ next }) => {
try {
console.log('1');
await next();
console.log('8 if `next` resolves');
} catch (error) {
// This block would be functionally equivalent to `app.error`
console.error(error);
// Still throw, but this way we could have error handlers at every step of the middleware chain
if (!(error instanceof MyErrorType)) {
throw error;
}
}
});
app.use(async ({ next }) => {
console.log('2');
await next();
console.log('7');
});
app.command(
'/command',
async ({ next }) => {
// Some interesting middleware
console.log('3*');
await next();
console.log('6*');
},
async ({ say, ack }) => {
// Normal handler
console.log('4*');
await ack();
await say("Command response");
console.log('5*');
},
);
app.command(
'/another-command',
async ({ next }) => {
// Some interesting middleware
console.log('3*');
await next();
console.log('6*');
},
async ({ say, ack }) => {
// Normal handler
console.log('4*');
await ack();
await say("Command response");
console.log('5*');
},
);
app.error(async error => {
// For those using transports, have the ability to log the failure,
// flush any non-sent logs, then shutdown gracefully
log.error(error);
await log.flush();
process.exit(1);
});

☝️ Does this align with how you see the code working?
@seratch thanks for the detailed review of the proposal! I'll address some of your points here:
You're right that we could choose not to break the listener API, but it seems prudent to me that we comprehensively "correct" the leaky abstraction we've built. We've chosen to hide the fact that an asynchronous operation is occurring when the listener invokes ack(). I don't agree that there is no benefit. The benefit is that errors during acknowledgement can finally be handled by the listener. I think the biggest cost will be in the documentation effort and support effort of the maintainers. Let's be a little more specific about these costs:
I think we should continue to gather more input here. Let's keep this point open as a variation of the proposal. I'll update the description above to capture the variation, and let's work together to build a list of pros/cons for it. This should make it easier to answer the question "are the benefits worth the cost?".
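To make the error-handling benefit mentioned above concrete, here's a sketch of a listener under this proposal; startDeployment is a hypothetical app function, not a Bolt API:

app.command('/deploy', async ({ ack }) => {
  try {
    await ack();
  } catch (error) {
    // e.g., the HTTP response could not be written in time
    console.error('Failed to acknowledge /deploy', error);
    return; // no point continuing if Slack never received the acknowledgement
  }
  await startDeployment(); // hypothetical follow-up work, runs only after a successful ack
});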
You are totally right here. I did not think as carefully about the lexical scope problem. Maybe remote function execution, with all the benefit of lexical scope capture, is not feasible within this proposal. I think this proposal still offers a useful improvement for the problem of running Bolt apps on FaaS providers. The signal that would be gained regarding when the event is completely processed by middleware/listeners is valuable in the FaaS use case. But this proposal does not offer a comprehensive solution, because it doesn't add clarity to how an app should deal with asynchronous work within an FaaS context. There are likely a few ways to fill this gap, including the TwoPhaseApp idea you described. I think there's a substantial amount of added complexity to deliver that, though.
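To illustrate why that completion signal helps on FaaS, here's a rough sketch of a Lambda-flavored receiver built on the proposed App.processEvent(); the class name, the init() method, and the event shape are assumptions for illustration, not part of the proposal:

class LambdaReceiver {
  init(app) {
    this.app = app; // the App instance handed to the receiver at initialization
  }

  toHandler() {
    // returns an AWS Lambda (API Gateway proxy) handler
    return async (awsEvent) => {
      // signature verification and body parsing omitted for brevity
      const parsedBody = parseRequestBody(awsEvent); // hypothetical helper
      let ackBody = '';
      const event = {
        body: parsedBody,
        ack: async (response) => {
          ackBody = typeof response === 'string' ? response : JSON.stringify(response || '');
        },
      };
      try {
        // resolves only once all middleware and listeners are done with the event
        await this.app.processEvent(event);
        return { statusCode: 200, body: ackBody };
      } catch (error) {
        return { statusCode: 500, body: '' };
      }
    };
  }
}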
@barlock thanks for the comments as well! I'll try to answer some of your questions.
I'd suggest you do something like this:

const assert = require('assert');
// Example system under test.
// This listener throws an error when the action type is not 'button'
async function myListener({ action }) {
if (action && action.type !== 'button') {
throw new Error('myListener should only work with buttons');
}
// ...
}
// Example test suite
describe('myListener', () => {
it('should fail when action type is not button', async () => {
// Arrange
let caughtError;
const actionFixture = {
body: {
type: 'block_actions',
actions: [{
action_id: 'block_action_id'
}],
channel: {},
user: {},
team: {},
},
respond: noop,
ack: noop,
};
const app = new App({
token: '',
receiver: {}, // dummy receiver, could replace with a TestReceiver impl
});
app.error(e => caughtError = e);
// Act
app.action('block_action_id', myListener);
await app.processEvent(actionFixture);
// Assert
assert(caughtError instanceof Error);
});
});
async function noop() { }

But if you trust that the …
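Another option, if you trust App's dispatching, is to test the listener as a plain async function; a sketch (assuming myListener is exported from its own module):

const assert = require('assert');
const { myListener } = require('./listeners'); // hypothetical module exporting the listener above

describe('myListener (as a plain function)', () => {
  it('rejects when the action type is not button', async () => {
    await assert.rejects(
      myListener({ action: { type: 'static_select' }, ack: async () => {} }),
      /should only work with buttons/
    );
  });
});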
I don't think so. The value passed to ack() …
While listener middleware can perform async work before it decides to call next(), …
I didn't propose that …
Good catch. In my other prototype, I was thinking of changing …
Change to …
Your approach works; the spec would simply need to allow for multiple error receivers. 👍 Context for my thinking: I agree this is generally a good approach, and for error handling it's possibly the right approach. However, I find that it's useful to include as many happy-path tests through as much of the code as possible. It helps a ton for catching bugs in libraries, quickly testing for compatibility of library upgrades (too many modules don't semver correctly), and ensuring that you're catching hard-to-find bugs at integration points between library and application code. Also, mocking code is a pain; it usually requires me to understand the library in order to mock it, which takes a bunch of time. Given how tightly coupled listener functions are to Bolt, most pure-function test cases end up "testing that the code is the code" (e.g., I called ack once, and say with these args), rather than the business logic.
What's next for making this proposal a reality? It would be great to start working on it. To summarize the points of agreement I'm seeing:
Things still needing discussion: (My commentary as subpoints)
Did I miss anything? If there are parts that are good to go already, what are the steps to getting code changes?
Thanks for putting this all into a proposal @aoberoi! All of the proposals around changes to make listeners and middleware async sound great to me. I was also particularly excited about this side effect that I didn't even consider
Thanks for all the input, everyone! At this point, this proposal is being accepted and will move on to implementation. I'll update the top comment to describe the final state, but here's a summary for the historical record:

Updates:
Non-changes:
Thoughts on 3:
I agree that the error handlers shouldn't be called at the same time; they should have an order. App developers should have a safe place to call process.exit(). It would, though, if …

(Forgive the pseudo-code.)

async processEvent(event: ReceiverEvent) {
try {
verifySignature(event.body); // Throws if fails validation
composeMiddleware(
...this.globalMiddleware,
Promise.all(
this.listeners.map(listener => composeMiddleware(listener)),
),
);
} catch (err) {
// This needs a way to signal to `ack` that it should send a 500, not attached to this
// Also, the receiver would need to be able to handle multiple `ack` calls, only doing anything on the first call.
await event.ack(err);
this.globalErrorHandler(err);
throw err;
  }
}

This would mean that the receiver would probably only want to rethrow in the event of an error, and my global error handler would probably look like:

app.error((err) => {
if (err instanceof SomeErrorICanHandle) {
// handle it
} else if (receiver instanceof JestReceiver) { // or some way to know it's test time
throw err; // lets my test Receiver reject and I can inspect it
} else {
process.exit(1); // Reboot the things
}
})

I don't love it, and I'm not sure why, but I can't think of anything else that works. Thoughts?
An interesting note about discoveries made while implementing this: #375 (comment). Not sure where to discuss, here or in the PR.
From my understanding, this is considered a bad practice and should not be the recommended way to signal the process to terminate (and your process manager to possibly restart). From the Node.js Docs:
I think we might be talking about two different scenarios here:
For a graceful exit involving a Receiver that implements an HTTP server, the Receiver would need to be aware of the external signal. It needs to tell its own HTTP server that it should no longer accept any incoming connections, and ensure existing open connections are closed in a timely manner. Closing them doesn't necessarily mean forcing the HTTP response to indicate success. For example, if I want my server to gracefully exit within 1 second, then I'd likely want to terminate connections which take longer by using a non-200 response, since the work was not complete and the server is closing. That would allow Slack to either retry the event or show the user an error. I think this is a good enhancement for the default ExpressReceiver.

For an unexpected exit, the advice about throwing an uncaught error instead of calling process.exit() applies. For example:

app.error(async (error) => {
// when my handler wants to signal that the entire program is in a bad state and should terminate,
// it will (re)throw an error with the globalFailure property set to true
if (error.globalFailure) {
// TODO: clean up global resources/handles
process.nextTick(() => {
// By indirecting through nextTick the call stack is outside the async function.
// The following throw will cause process termination.
throw new Error('Global Failure');
});
return;
}
// all other failures should only impact the processing of this event (more common)
// TODO: clean up any event-specific resources/handles. middleware has already gotten a chance to clean up any resources it may have allocated
// The following throw will reject the returned promise and allow the Receiver to determine what to do
// NOTE: the default ExpressReceiver will respond with a 5xx level HTTP status
throw new Error('Event Failure');
});
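For the graceful-exit case described above, a minimal sketch of what a Receiver's shutdown path could look like; the stop() method, the pendingResponses bookkeeping, and the 1-second budget are assumptions for illustration, not an existing Bolt API:

// sketch of the shutdown path on a custom Receiver that wraps a Node.js http.Server
class GracefulHttpReceiver {
  // assumes this.server is the http.Server and this.pendingResponses tracks in-flight ServerResponses
  stop() {
    return new Promise((resolve, reject) => {
      // stop accepting new connections; in-flight requests keep running
      this.server.close((error) => (error ? reject(error) : resolve()));

      // after the budget expires, end remaining responses with a non-2xx status
      // so Slack can retry the event or show the user an error instead of hanging
      setTimeout(() => {
        for (const res of this.pendingResponses) {
          if (!res.headersSent) {
            res.statusCode = 503;
            res.end();
          }
        }
      }, 1000).unref();
    });
  }
}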
I've gotta be honest, I can't think of a use case for this. I think you're talking about an unexpected exit, except, regardless of whether it's a global failure or a failure for a single event, I can't see why you'd want the Receiver to complete normally. The last example seems to indicate that you want a way for a custom Receiver to emit errors that can be inspected by the developer after the receiver's done its job. I don't see any use cases where this is a general need (but I'd like to hear them). Would you be able to handle that with an API on your custom Receiver? Then your receiver could choose to use the rejection from processEvent() however it wants:

const appOptions = { ... };
if (process.env.NODE_ENV === 'test') {
const receiver = new JestReceiver(...);
receiver.onFailure((error) => {
// some last resort processing and/or inspection
});
appOptions.receiver = receiver;
}
const app = new App(appOptions);
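A rough sketch of what such a test receiver could look like internally; JestReceiver and onFailure are the hypothetical names from the snippet above, and init()/processEvent() are the Receiver/App interactions this proposal describes:

class JestReceiver {
  constructor() {
    this.failureListeners = [];
  }

  init(app) {
    this.app = app;
  }

  onFailure(listener) {
    this.failureListeners.push(listener);
  }

  // tests call this directly instead of sending real HTTP requests
  async send(body) {
    let acknowledged;
    const event = { body, ack: async (response) => { acknowledged = response; } };
    try {
      await this.app.processEvent(event);
    } catch (error) {
      this.failureListeners.forEach((listener) => listener(error));
      throw error; // let the awaiting test inspect the rejection
    }
    return acknowledged;
  }
}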
Sorry that …
What's interesting to me is that, from the perspective of my runtime (currently k8s, but I think the same is true for Lambda and Heroku), restarts of my application are basically free. It's fast, and I don't lose any uptime by doing it. So I actually view … Really, all I'm looking for is that developers using Bolt, not Bolt the framework, should have the last say in error handling. In terms of why I'd want my HTTP requests to complete normally, maybe …
Testing is the only one I know of; I think it's a very important one, however. Related to the above, I want to shut down on any unrecognized error. What about the case of a recognized error? I'd like to test that I'm handling it correctly.
I think so? I'm not sure exactly what you're trying to discuss here. Could you leave comments on my pull request to implement this so we can discuss over actual code? It's much easier to follow along and more concrete. Possibly the best thing would be to implement failing test cases to show points of contention; I'm not following everything. Relevant lines: https://github.com/slackapi/bolt/pull/380/files#diff-0286657a0789ea9446fa3de979ff3e09R569
I'm also still interested in a discussion over this comment (pasted below): #375 (comment)
There's some unavoidable unexpected behavior when you embed …
We've gone ahead and landed the implementations we've converged on in the …
This proposal is meant to address the following use cases:

Receiver implementations (those which need to know when an incoming event is fully processed).

The changes

Adjust ack() and next() to return Promises

All listeners (with the exception of those listening using app.event()) and all middleware will become async functions. The consequence of using a normal function where an async one was expected would be to interpret it the same way as Promise.resolve(). A listener's or middleware's returned promise should resolve when processing the event is complete. If it rejects, the middleware preceding it in the chain may catch, but if it doesn't catch, that middleware should also reject, and so on. Rejections that bubble to the first middleware are handled through the global error handler.

ack() returns Promise<void>. The promise resolves when the receiver is done acknowledging the event (typically when the HTTP response is done being written). By allowing the promise to reject, listeners can handle errors that occur during acknowledgement (like trying to send the HTTP response). Currently, receivers are expected to handle these sorts of errors, but they typically have no ability to do anything intelligent.

Code for work that should be performed after the incoming event is acknowledged should be placed after calling await ack() in the listener or middleware function. The consequence of forgetting the await keyword will likely be invisible. The returned promise would be unhandled and might reject, but that kind of error is very rare.

next() returns Promise<void>. The promise resolves when the next middleware in the chain's returned promise resolves. This builds a chain through middleware where early middleware gain a signal for when processing has completed in all later middleware. That signal can then be passed on to the receiver. It also means middleware use await next() and follow with more code to post-process the event (such as logging middleware). This replaces the usage of next() where a callback function is passed in.

The global error handler can return a Promise

If a rejection bubbles through the first middleware, the global error handler is triggered. This is not new. But instead of only returning void, it can optionally return a Promise. If the returned promise rejects, that rejection is exposed to the receiver as described below. The default global error handler will log the error and then reject, which means by default all unhandled errors in listeners and middleware make their way back to the receiver.

Receivers call App.processEvent() instead of emitting

Receivers are no longer EventEmitters. Instead, they are provided with an App instance upon initialization. When the receiver has an event for the app to process, it calls the App.processEvent() method, which returns Promise<void>. The promise resolves when all middleware and listeners have finished processing the event. The promise rejects when the global error handler's returned promise rejects.

There's no resolved value for the returned promise. The receiver is expected to remember the value passed to the ack() function (which the receiver created) and associate that value with the returned promise if it chooses to delay the HTTP response until all processing is complete. This allows us to build synchronous receivers that only respond once all processing is complete.

Dispatch algorithm changes

Events are dispatched through 2 separate phases: the global middleware chain, and the listener middleware chains. Between these two phases, a special middleware manages dispatching to all the listener middleware chains in parallel. This will not change. However, the return values of the individual listener middleware chains will start to become meaningful, and need to be bubbled up through the global middleware chain. This special middleware between the phases will aggregate the returned promises and wait for them all to complete (a la Promise.all()). If any promise(s) reject, then the middleware will return a rejected promise (whose error is a wrapper of all the rejection reasons) to bubble through the global middleware chain. Conversely, if all of them succeed, the middleware will return a resolved promise to bubble through the global middleware chain.

Middleware which choose not to handle an event should return a resolved promise without ever calling next().

Align say() and respond() to return Promises

Now that listeners and middleware are expected to be async functions, it follows that all the utility functions given to them which perform asynchronous work should also return Promises. This aligns with ack() and next() by similarly allowing listeners to handle errors when calling say() or when calling respond(). The consequence of forgetting the await keyword will likely be invisible. The returned promise would be unhandled and might reject, but that kind of error is very rare.

Disadvantages

This is a breaking change. The scope of the backwards incompatibility is pretty limited, and here are some thoughts about migration: listeners and middleware become async functions, and calls to ack() (and next()/say()/respond()) should be awaited; forgetting to do so may surface as an UnhandledPromiseRejection error. Middleware should use await next() instead of calling next() directly with a callback; here too, forgetting the await may surface as an UnhandledPromiseRejection error.

The performance might suffer. The change in the dispatch algorithm requires all the listener middleware chains' returned promises to resolve. Most should resolve rather quickly (since the first middleware in most of the chains will filter out the event and immediately return). However, if some middleware needs to process the event asynchronously before it can decide to call next() or not, it could slow down the event from being fully processed by another, much quicker listener middleware chain. Once again, we haven't heard from many users who use listener middleware, especially for asynchronous tasks, so we think this impact is relatively small. We should measure the performance of our example code (and new examples) to understand whether or not this has a significant impact.

There will be less compatibility with Hubot and Slapp apps that wish to migrate and continue to use middleware they wrote. A design similar to this one was considered before releasing Bolt v1. It was deliberately rejected in an effort to maximize compatibility with Hubot and Slapp. However, at this point there's no indication that many Hubot or Slapp developers are actively migrating their code, and no indication that this specific part of migration is a problem worth holding back this design's advantages for.

Other benefits

The old way could be implemented in terms of the new way (but not the other way around). This could be a useful way to assist developers in migrating. In fact, this could pave the way for how "routers" within Bolt are composed together to make a larger application. The special middleware between the two phases can be thought of as a router that you get by default, and could be instantiated separately in other files. They could all then be composed back into the main app.

Credit to @selfcontained, @seratch, @barlock, and @tteltrab. I've simply synthesized their ideas into a proposal.
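A sketch of the special middleware's aggregation step described under "Dispatch algorithm changes", using Promise.allSettled() so every chain is waited on; the function and wrapper names are illustrative, not actual Bolt identifiers:

async function dispatchToListeners(listenerChains, args) {
  // run every listener middleware chain in parallel and collect the outcomes
  const results = await Promise.allSettled(listenerChains.map((chain) => chain(args)));

  const rejections = results
    .filter((result) => result.status === 'rejected')
    .map((result) => result.reason);

  if (rejections.length > 0) {
    // wrap all rejection reasons into one error that bubbles up the global chain
    const wrapper = new Error(`${rejections.length} listener chain(s) failed`);
    wrapper.originals = rejections;
    throw wrapper;
  }
  // all chains succeeded: resolve so the global chain (and the receiver) can continue
}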