-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Kong plugins cannot log read and connection timeouts #3193
Comments
Yeah seeing some more of these kinda errors in nginx logs, will be super helpful when Kong figures out a way to expose these through plugins so we can index them into splunk/kibana or any other kinda external record keepers for transactions:
|
Look forward to this feature,it's important,upstream timeout should be recorded. |
Agreed, when working with APIs there is always opportunity for these issues to occur and currently the logging plugins are blind to these scenarios. In our use of testing around with Kong for 8+ months I have found this problem to be one of the most troublesome with no easy workaround. I don't want to tail my stdout(or the nginx log files if you leave it there by default) and write some wonky code to REST transact it to our http sinks. Hoping Kong or OpenResty come up with an elegant solution. |
Wondering if this is still on the roadmap or any progress being made here. I am up for testing in a dev environment if anyone has any crazy ideas. |
I am also concerned about the progress of this issue. We are using kong as an light and high performance esb,but the timeout requet could not be recored now,some important info was lost. Hoping for your reply,Thanks a lot. @jeremyjpj0916 @thibaultcha |
@deepshrift slightly off topic, but as an ESB are you using Kong as a Gateway for SOAP services as well? Where you implement SOAP WS Security at the gateway layer via a Kong plugin? That is a feature I have wanted for awhile since many companies have deprecated SOAP services and claim they are to expensive to migrate off of. If you have anything opensource plugins of that style would love to check it out. EDIT- |
Adding new notes here(taken from forum post), this should be the solution Kong goes with (probably should live in CORE upon my initial guess): " EDIT - Ohhh 3scale api gateway who uses nginx similarly to Kong also wrote a module for it(you can see it tied to the issue I linked, Makes sense I think that this would integrate into Kong core rather than a plugin/template I would think (it is Apache 2.0 Licensed) - |
@jeremyjpj0916 Thanks for helping in investigating our options here. As we discussed in the past, copying As of today, the reason why As of today, copying However, more work is required to catch HTTP 4xx errors produced by Nginx (e.g. "414 Request URI Too Large"). That is because when Nginx responds with one of these status codes, it does so before the router (i.e. Kong's proxy), is given a chance to run at all. In order to catch these, we would need to run a dummy router and construct the plugins runloop from scratch from the error handler's log phase. One could think that if the latter is done, then copying the Another question we need to answer is whether or not such responses produced by the error handlers should trigger the Anyway, I have pushed a WIP branch with the suggested |
This is certainly a step in the right direction and really appreciate you looking into this on a Sunday! Capturing L4 errors such as 502/504 is a fairly big game changer on the logging front IMO since many applications and networks under duress will throw those fairly frequently I have been finding. 414 and some of the other obscure errors are definitely a new topic of consideration but I am fairly convinced this little WIP will solve 90% of the logging woes Kongers face (or probably don't know they would face if their apps/network are just always flawless hah) . I will be digesting what you have done here and will test as well and report back. Thanks again! |
@jeremyjpj0916 Realized I didn't push the branch before I left the laptop earlier. Done now. |
@thibaultcha 👍 thanks, at first I looked at it for 10 minutes earlier this afternoon thinking I must be doing something wrong and just was not familiar enough with comparing branches lol. Seeing as I generally pull down the Kong tarball in my image build I am hoping I can just overwrite the default files prior to the Kong prepare like-
And ofc apply the template changes. If it will take anything more such as recompiling nginx or openresty I may need a line or two snippit that do that but I am hoping its just as easy as when I overwrite the default Kong plugin code with a modified lua files and just drop it right into their default locations prior to startup(probably not best practice considering yall may change those but so far can't say it has bitten me :) ). Do you think you will roll this stuff out incrementally into the community edition and subsequently enterprise as well with just these basic l4 errors being handled logged first, or do you intend to wait until you can mature the logic a bit further to cover all possible situations this new code is applicable too(such as 4xx's)? Regardless I am reading over the code and its starting to make sense to me how you integrated it, impressive and exciting times for Kong! |
@jeremyjpj0916 I am of the opinion that if we can offer incremental improvements, we should. So handling 502/504 will be better than not. That said, it will have to go into a major release at least because of the changes to the Nginx template. Future work to handle other 4xx responses doesn’t seem like it would be breaking once we have the Nginx template changes in place. So my take would be 0.14. We’ll schedule this in one of our next sprints. |
@thibaultcha I had a chance to test the code today. Does not seem to be doing what your testing resulted in. My testing of it resulted in all plugins stopped functioning all together that were enabled globally or on routes/services/api's . I was able to call oauth2/jwt protected API's as if they had no plugins enabled at all, statsD stopped logging. HTTP log plugin stopped logging. No plugins seem to be running and I did a diff between this branches changes vs our existing code and cant see any difference. To give you a taste of how I attempted to recreate your code I did the following, this is needed bit from our Dockerfile:
Other changes were changing runloop to core in the init.lua (your tarball/deployed kong does not have a runloop, seems yall differ in that respect from packaged source to the repo source) Added a zip of our custom template as well as the init.lua and ctx.lua(which is identical to the one sitting in repo). Is there some other dependency I should be aware of when attempting to deploy these changes with the "release" codebase? Based on my Docker sample above and nginx custom template I would think I seemingly covered the bases. Of course I could wait till .14 comes out but I was going to give you confirmation it works on my end if you wanted further testing and performance analysis done with it off/on to see if any impact. EDIT - Also logs do not reveal any lua runtime errors. Build's fine. --vv verbose mode revealed nothing either as well. |
@jeremyjpj0916 I can make this as a patch on top of 0.13.1 if you want. FYI the unit and integration tests do not seem to have any worrying failures, so something probably went wrong in your environment because of discrepancies between the 0.13.1 and master branches. |
Yeah but I can't imagine 0.13.1 vs master have any breaking changes in the init.lua? You did intend to have this new resty ctx module live in the : /usr/local/share/lua/5.1/kong/resty/ctx.lua location correct? Tis a head scratcher because not that much really changed. Will keep poking/prodding it. Ideally if it was on a personal bintray alpine tarball then its identical to how I have been building. Stumped until further notice, will update here if I hit a break through of seeing where the problem lies otherwise can wait a till .14 . |
It does actually: f20432f. I just pushed a tmp/0.13.1-log-plugins-on-error-page branch that you can safely apply to 0.13.1 (it should just be a commit on top of it). It has a successful CI build: https://travis-ci.org/Kong/kong/builds/378979503. |
@thibaultcha I def see one difference off top of my head I missed between master vs 0.13.1 changes:
If missing lines in the rockspec like that will result in a lack of plugins running at all then that probably is it, there is a rock_manifest in the directory alongside the rockspec but reading into it I assume that gets generated at build time by kong based on what I saw within it. Already applied the runloop back to core change from your original master wip to my init.lua and the change in custom kong template is minimal and I still believe it to be correct, but will certainly diff on all the files once again later today between this new 0.13.1 wip branch and our current build out. Will get back to this thread sometime later today with results. Thanks again for your stellar support! Edit - welp thought:
Would do the trick. Strangely enough the rock_manifest produced still had the same hash as an unmodified 0.13.1 baseline branch.
And the resty/ctx.lua was no where to be found in the manifest and I assume if the rockspec did its thing correctly the produced manifest would have included something additional like:
Op.. jeesh 6:45am, time for nap and will resume tackling this later. |
Did try getting this to work some more today, beats me the difference in src vs 0.13.1 vs tarball docker method w custom template(which does work on baseline tarball 0.13.x that comes down from bintray). Might be best just to wait until this releases into v14 (don't wanna waste any more of your time trying to support me testing it) and then I will post the results here in 3-4 weeks or whatever it takes for next major release candidate. Will do a performance benchmark of an environment with identical specs one on Kong 14 with the additional ctx saving functionality for internal redirect handling and one test on 0.13.1 without it. Will see if there is an increase in latency / mem consumption. So will post here in weeks to follow, really looking forward to seeing how far Kong can harden its logging functionalities now with this newer implementation falling into place. |
Are you using this template: You do have that |
@bungle yeah the zip is our latest form of it. When I attempted to use these changes(and may have had some bad/old code in the core files too at times from not changing enough of the core files to match thibaultcha changes) it seemed to cause our db to stop recognizing all plugins tied to api definitions and even after reverting to older code it stayed broken where the plugins were not getting recognized even though they were in the db bringing kong up/down. I have resolved to once .14 is officially out with a tarball to pull I will come back here and to a test comparison between 13 and 14 for performance against an arbitrary API. I figure these changes will not increase latency really just use more ram due to duplicating the ctx essentially. + IMO the most important things a gateway can have is proper logging so people on both ends client / back-end know what the heck happened(especially in times of L4 errors and TLS related woes). |
@jeremyjpj0916 For now we are going to experiment a few other possibilities, and @bungle volunteered to take a look at some of them in the next few days. I am sure he will keep this issue updated with his latest findings 👍 |
Very interested to see what is produced, I thought your first crack at it was great as its what other gateways seem to do with nginx, but maybe Kong can ascend above 💯. May the best implementation win, best of luck bungle! #kongStrong |
Hi @jeremyjpj0916, There is @thibaultcha's solution: There is my solution: I learned a lot in this process. In general both solutions do seem to work. My solution is a little bit bigger change fundamentally, though it might be good in a long run. I am not sure which one of these I prefer, even if I am biased to say anything here :-). @thibaultcha's solution is (simplified):
Plus Points:
Minus Points:
My solution is (simplified)
Plus Points:
Minus Points:
Third option is to use |
@bungle When I was pondering this problem originally I was wondering the reason why Kong chose to use the error_page directive in the first place(less "handle with code" I guess?). I have to say I prefer your follow up impl simply because it requires no copying of the ctx, and in lua land we only have so much memory to work with per worker thread. I also tend to not be super reserved when making changes, even if its a large one if it drives a better implementation going forward I am all for them and I feel like ridding yourself of the internal redirect puts Kong more in control of its Impl (but I am just a newbie scrub to most of this hah so its just my initial impression), I'm gonna read through the changes some more later tonight probably, |
There is one ugly thing: It is really a flaky hack. The thing is that with The remaining idea is to add
All non-standard error codes. I think we could even change them in some cases like |
And one option is to is to still have |
Just to be clear here: there is no "copying" per say, only passing a reference from one request to another. Thanks for the write-up @bungle! |
Glad for the clarification, passing as a reference makes the first impl not seem as bad on mem as I was assuming then. |
Yes, when I wrote that, and got away of computer. I also started wondering, whether I used word |
pass by value vs pass by reference | copy by value vs copy by reference are different phrasings I have always followed in Java. So pass/copy by reference is okay so I don't think you you were wrong. I just was thinking it was value and not reference :) so the clarification helped me out. |
Reviewed the code best I could, following what I know now based on thibaultcha comments + bungle insights I agree its a toss up, both have their merits and are primarily grounded on "minimal change" vs "do more with code" and they both have minus points that seem to have similar impact downsides. Interested to see which one Kong goes with. I am still EXTREMELY excited for this functionality regardless and I am sure Kong is too, will really help community/enterprise teams have better logs to diagnose the real uptime/health of their apis. My list of major Kong improvement needs will thin by 1 and is in the single digit count anyways :) . Onward and upward! EDIT - On the third option idk how I feel about that, incorporating both would add confusion to a newbie developer(like meee 😆) as you now have error handling taking two different approaches. Functionally it may yield the best coverage/combination though? Regardless the end game should be to log as many error types as Kong possibly can to its various log sinks and maybe this method takes it there. The gateway ultimately serves as the primary proof of transaction in the API space and we need these transactions documented. |
Context ------- Before this patch, several scenarios would make Kong/Nginx produce errors that did not execute plugins: 1. Nginx-produced client errors (HTTP 4xx) on invalid requests per the Nginx settings (e.g. headers too large, URI too large, etc...) 2. Proxy TCP/HTTP errors on the last balancer try (e.g. connection refused, connection timeout, read/write timeout, etc...) 3. Any other Nginx-produced server errors (HTTP 5xx), if any Because plugins are not executed in such cases, logging plugins in particular do not get a chance to report such errors, leaving Kong administrator blind when relying on bundled or homegrown logging plugins. Additionally, other plugins, e.g. transformation, do not get a chance to run either and could lead to scenarios where those errors produce headers or response bodies that are undesired by the administrator. This issue is similar to that fixed by #3079, which handled the following case: 4. Requests short-circuited by Kong itself (vs Nginx in 1, 2, and 3.) See also #490 & #892 for issues related to 4. This patches addresses 1, 2, and 3. Problem ------- Nginx-produced errors (4xx and 5xx) are redirected to the `/kong_error_handler` location by way of the `error_page` directive, but the runloop is not given a chance to run, since the error handler is only instructed to override the Nginx-defined HTML responses, and produces Kong-friendly ones. In short, a patch solving this issue must enable the runloop within the error handler. Proposed solution ----------------- Several (many) possibilities were evaluated and explored to fix this. The proposed patch is the result of many attempts, observations, and adjustments to initial proposals. In a gist, it proposes: * Executing the response part of the runloop in the error_page location. * Ensure that if the request part of the runloop did not have a chance to run, we do build the list of plugins to execute (thus, global plugins only since the request was not processed). * Ensure that if the request part of the runloop did run, we preserve its `ngx.ctx` values. A few catches: * Currently, `ngx.req.get_method()` in the `error_page` location block will return `GET` even if the request method was different (only `HEAD` is preserved). A follow up patch for Nginx is being prepared. As such, several tests are currently marked as "Pending". * When the `rewrite` phase did not get a chance to run, some parts of the request may not have been parsed by Nginx (e.g. on HTTP 414), thus some variables, such as `$request_uri` or request headers may not have been parsed. Code should be written defensively against such possibilities, and logging plugins may report incomplete request data (but will still report the produced error). * The `uninitialized_variable_warn` directive must be disabled in the `error_page` location block for cases when the `rewrite` phase did not get a chance to run (Nginx-produced 4xx clients errors). * The `ngx.ctx` variables must be copied by reference for plugins to be able to access it in the response phases when the request was short-circuited by Nginx (e.g. HTTP 502/504 from the upstream module). * Using a named location for the `error_page` handler is not possible, since in some cases the URI might not be parsed by Nginx, and thus Nginx refuses to call named locations in such cases (e.g. on HTTP 414). Resulting behavior ------------------ The `03-plugins_triggering_spec.lua` test suite is now a good reference for the behavior observed upon Nginx-produced errors. This behavior can be classified in 2 categories: 1. When the request is not parsed by Nginx (HTTP 4xx) a. The `error_page` directive is called (`rewrite` and `access` are skipped) b. The runloop retrieves the list of **global** plugins to run (it cannot retrieve more specific plugins since the request could not be parsed) c. The error content handler overrides the Nginx body with a Kong-friendly one (unchanged from previous behavior) d. It executes the `header_filter` phase of each global plugin. e. It executes the `body_filter` phase of each global plugin. f. It executes the `log` phase of each global plugin. 2. When the error was produced during upstream module execution (HTTP 5xx) a. The `rewrite` and `access` phases ran normally (we stashed the `ngx.ctx` reference) b. The `error_page` directive is called c. The error content handler applies the original `ngx.ctx` by reference d. It does **not** run the plugins loop, since `ngx.ctx` contains the list of plugins to run (global **and** specific plugins in this case) e. The error content handler overrides the Nginx body with a Kong-friendly one (unchanged from previous behavior) f. It executes the `header_filter` phase of each global plugin. g. It executes the `body_filter` phase of each global plugin. h. It executes the `log` phase of each global plugin. Fix #3193
@jeremyjpj0916 Just opened #3533 |
@bungle @thibaultcha Exceptional engineering with great focus on detail and analysis. The resulting patch is really above and beyond my original expectation of just solving 5xx level L4 logging. Glad this is making it into the upcoming release. Puts Kong logging on a whole new level. Thanks! |
Context ------- Before this patch, several scenarios would make Kong/Nginx produce errors that did not execute plugins: 1. Nginx-produced client errors (HTTP 4xx) on invalid requests per the Nginx settings (e.g. headers too large, URI too large, etc...) 2. Proxy TCP/HTTP errors on the last balancer try (e.g. connection refused, connection timeout, read/write timeout, etc...) 3. Any other Nginx-produced server errors (HTTP 5xx), if any Because plugins are not executed in such cases, logging plugins in particular do not get a chance to report such errors, leaving Kong administrator blind when relying on bundled or homegrown logging plugins. Additionally, other plugins, e.g. transformation, do not get a chance to run either and could lead to scenarios where those errors produce headers or response bodies that are undesired by the administrator. This issue is similar to that fixed by #3079, which handled the following case: 4. Requests short-circuited by Kong itself (vs Nginx in 1, 2, and 3.) See also #490 & #892 for issues related to 4. This patches addresses 1, 2, and 3. Problem ------- Nginx-produced errors (4xx and 5xx) are redirected to the `/kong_error_handler` location by way of the `error_page` directive, but the runloop is not given a chance to run, since the error handler is only instructed to override the Nginx-defined HTML responses, and produces Kong-friendly ones. In short, a patch solving this issue must enable the runloop within the error handler. Proposed solution ----------------- Several (many) possibilities were evaluated and explored to fix this. The proposed patch is the result of many attempts, observations, and adjustments to initial proposals. In a gist, it proposes: * Executing the response part of the runloop in the error_page location. * Ensure that if the request part of the runloop did not have a chance to run, we do build the list of plugins to execute (thus, global plugins only since the request was not processed). * Ensure that if the request part of the runloop did run, we preserve its `ngx.ctx` values. A few catches: * Currently, `ngx.req.get_method()` in the `error_page` location block will return `GET` even if the request method was different (only `HEAD` is preserved). A follow up patch for Nginx is being prepared. As such, several tests are currently marked as "Pending". * When the `rewrite` phase did not get a chance to run, some parts of the request may not have been parsed by Nginx (e.g. on HTTP 414), thus some variables, such as `$request_uri` or request headers may not have been parsed. Code should be written defensively against such possibilities, and logging plugins may report incomplete request data (but will still report the produced error). * The `uninitialized_variable_warn` directive must be disabled in the `error_page` location block for cases when the `rewrite` phase did not get a chance to run (Nginx-produced 4xx clients errors). * The `ngx.ctx` variables must be copied by reference for plugins to be able to access it in the response phases when the request was short-circuited by Nginx (e.g. HTTP 502/504 from the upstream module). * Using a named location for the `error_page` handler is not possible, since in some cases the URI might not be parsed by Nginx, and thus Nginx refuses to call named locations in such cases (e.g. on HTTP 414). Resulting behavior ------------------ The `03-plugins_triggering_spec.lua` test suite is now a good reference for the behavior observed upon Nginx-produced errors. This behavior can be classified in 2 categories: 1. When the request is not parsed by Nginx (HTTP 4xx) a. The `error_page` directive is called (`rewrite` and `access` are skipped) b. The runloop retrieves the list of **global** plugins to run (it cannot retrieve more specific plugins since the request could not be parsed) c. The error content handler overrides the Nginx body with a Kong-friendly one (unchanged from previous behavior) d. It executes the `header_filter` phase of each global plugin. e. It executes the `body_filter` phase of each global plugin. f. It executes the `log` phase of each global plugin. 2. When the error was produced during upstream module execution (HTTP 5xx) a. The `rewrite` and `access` phases ran normally (we stashed the `ngx.ctx` reference) b. The `error_page` directive is called c. The error content handler applies the original `ngx.ctx` by reference d. It does **not** run the plugins loop, since `ngx.ctx` contains the list of plugins to run (global **and** specific plugins in this case) e. The error content handler overrides the Nginx body with a Kong-friendly one (unchanged from previous behavior) f. It executes the `header_filter` phase of each global plugin. g. It executes the `body_filter` phase of each global plugin. h. It executes the `log` phase of each global plugin. Fix #3193
Context ------- Before this patch, several scenarios would make Kong/Nginx produce errors that did not execute plugins: 1. Nginx-produced client errors (HTTP 4xx) on invalid requests per the Nginx settings (e.g. headers too large, URI too large, etc...) 2. Proxy TCP/HTTP errors on the last balancer try (e.g. connection refused, connection timeout, read/write timeout, etc...) 3. Any other Nginx-produced server errors (HTTP 5xx), if any Because plugins are not executed in such cases, logging plugins in particular do not get a chance to report such errors, leaving Kong administrator blind when relying on bundled or homegrown logging plugins. Additionally, other plugins, e.g. transformation, do not get a chance to run either and could lead to scenarios where those errors produce headers or response bodies that are undesired by the administrator. This issue is similar to that fixed by #3079, which handled the following case: 4. Requests short-circuited by Kong itself (vs Nginx in 1, 2, and 3.) See also #490 & #892 for issues related to 4. This patches addresses 1, 2, and 3. Problem ------- Nginx-produced errors (4xx and 5xx) are redirected to the `/kong_error_handler` location by way of the `error_page` directive, but the runloop is not given a chance to run, since the error handler is only instructed to override the Nginx-defined HTML responses, and produces Kong-friendly ones. In short, a patch solving this issue must enable the runloop within the error handler. Proposed solution ----------------- Several (many) possibilities were evaluated and explored to fix this. The proposed patch is the result of many attempts, observations, and adjustments to initial proposals. In a gist, it proposes: * Executing the response part of the runloop in the error_page location. * Ensure that if the request part of the runloop did not have a chance to run, we do build the list of plugins to execute (thus, global plugins only since the request was not processed). * Ensure that if the request part of the runloop did run, we preserve its `ngx.ctx` values. A few catches: * Currently, `ngx.req.get_method()` in the `error_page` location block will return `GET` even if the request method was different (only `HEAD` is preserved). A follow up patch for Nginx is being prepared. As such, several tests are currently marked as "Pending". * When the `rewrite` phase did not get a chance to run, some parts of the request may not have been parsed by Nginx (e.g. on HTTP 414), thus some variables, such as `$request_uri` or request headers may not have been parsed. Code should be written defensively against such possibilities, and logging plugins may report incomplete request data (but will still report the produced error). * The `uninitialized_variable_warn` directive must be disabled in the `error_page` location block for cases when the `rewrite` phase did not get a chance to run (Nginx-produced 4xx clients errors). * The `ngx.ctx` variables must be copied by reference for plugins to be able to access it in the response phases when the request was short-circuited by Nginx (e.g. HTTP 502/504 from the upstream module). * Using a named location for the `error_page` handler is not possible, since in some cases the URI might not be parsed by Nginx, and thus Nginx refuses to call named locations in such cases (e.g. on HTTP 414). Resulting behavior ------------------ The `03-plugins_triggering_spec.lua` test suite is now a good reference for the behavior observed upon Nginx-produced errors. This behavior can be classified in 2 categories: 1. When the request is not parsed by Nginx (HTTP 4xx) a. The `error_page` directive is called (`rewrite` and `access` are skipped) b. The runloop retrieves the list of **global** plugins to run (it cannot retrieve more specific plugins since the request could not be parsed) c. The error content handler overrides the Nginx body with a Kong-friendly one (unchanged from previous behavior) d. It executes the `header_filter` phase of each global plugin. e. It executes the `body_filter` phase of each global plugin. f. It executes the `log` phase of each global plugin. 2. When the error was produced during upstream module execution (HTTP 5xx) a. The `rewrite` and `access` phases ran normally (we stashed the `ngx.ctx` reference) b. The `error_page` directive is called c. The error content handler applies the original `ngx.ctx` by reference d. It does **not** run the plugins loop, since `ngx.ctx` contains the list of plugins to run (global **and** specific plugins in this case) e. The error content handler overrides the Nginx body with a Kong-friendly one (unchanged from previous behavior) f. It executes the `header_filter` phase of each global plugin. g. It executes the `body_filter` phase of each global plugin. h. It executes the `log` phase of each global plugin. Fix #3193
Context ------- Before this patch, several scenarios would make Kong/Nginx produce errors that did not execute plugins: 1. Nginx-produced client errors (HTTP 4xx) on invalid requests per the Nginx settings (e.g. headers too large, URI too large, etc...) 2. Proxy TCP/HTTP errors on the last balancer try (e.g. connection refused, connection timeout, read/write timeout, etc...) 3. Any other Nginx-produced server errors (HTTP 5xx), if any Because plugins are not executed in such cases, logging plugins in particular do not get a chance to report such errors, leaving Kong administrator blind when relying on bundled or homegrown logging plugins. Additionally, other plugins, e.g. transformation, do not get a chance to run either and could lead to scenarios where those errors produce headers or response bodies that are undesired by the administrator. This issue is similar to that fixed by #3079, which handled the following case: 4. Requests short-circuited by Kong itself (vs Nginx in 1, 2, and 3.) See also #490 & #892 for issues related to 4. This patches addresses 1, 2, and 3. Problem ------- Nginx-produced errors (4xx and 5xx) are redirected to the `/kong_error_handler` location by way of the `error_page` directive, but the runloop is not given a chance to run, since the error handler is only instructed to override the Nginx-defined HTML responses, and produces Kong-friendly ones. In short, a patch solving this issue must enable the runloop within the error handler. Proposed solution ----------------- Several (many) possibilities were evaluated and explored to fix this. The proposed patch is the result of many attempts, observations, and adjustments to initial proposals. In a gist, it proposes: * Executing the response part of the runloop in the error_page location. * Ensure that if the request part of the runloop did not have a chance to run, we do build the list of plugins to execute (thus, global plugins only since the request was not processed). * Ensure that if the request part of the runloop did run, we preserve its `ngx.ctx` values. A few catches: * Currently, `ngx.req.get_method()` in the `error_page` location block will return `GET` even if the request method was different (only `HEAD` is preserved). A follow up patch for Nginx is being prepared. As such, several tests are currently marked as "Pending". * When the `rewrite` phase did not get a chance to run, some parts of the request may not have been parsed by Nginx (e.g. on HTTP 414), thus some variables, such as `$request_uri` or request headers may not have been parsed. Code should be written defensively against such possibilities, and logging plugins may report incomplete request data (but will still report the produced error). * The `uninitialized_variable_warn` directive must be disabled in the `error_page` location block for cases when the `rewrite` phase did not get a chance to run (Nginx-produced 4xx clients errors). * The `ngx.ctx` variables must be copied by reference for plugins to be able to access it in the response phases when the request was short-circuited by Nginx (e.g. HTTP 502/504 from the upstream module). * Using a named location for the `error_page` handler is not possible, since in some cases the URI might not be parsed by Nginx, and thus Nginx refuses to call named locations in such cases (e.g. on HTTP 414). Resulting behavior ------------------ The `03-plugins_triggering_spec.lua` test suite is now a good reference for the behavior observed upon Nginx-produced errors. This behavior can be classified in 2 categories: 1. When the request is not parsed by Nginx (HTTP 4xx) a. The `error_page` directive is called (`rewrite` and `access` are skipped) b. The runloop retrieves the list of **global** plugins to run (it cannot retrieve more specific plugins since the request could not be parsed) c. The error content handler overrides the Nginx body with a Kong-friendly one (unchanged from previous behavior) d. It executes the `header_filter` phase of each global plugin. e. It executes the `body_filter` phase of each global plugin. f. It executes the `log` phase of each global plugin. 2. When the error was produced during upstream module execution (HTTP 5xx) a. The `rewrite` and `access` phases ran normally (we stashed the `ngx.ctx` reference) b. The `error_page` directive is called c. The error content handler applies the original `ngx.ctx` by reference d. It does **not** run the plugins loop, since `ngx.ctx` contains the list of plugins to run (global **and** specific plugins in this case) e. The error content handler overrides the Nginx body with a Kong-friendly one (unchanged from previous behavior) f. It executes the `header_filter` phase of each global plugin. g. It executes the `body_filter` phase of each global plugin. h. It executes the `log` phase of each global plugin. Fix #3193 From #3533
@jeremyjpj0916 Thank you for the kind words. Now closing this since #3533 has been merged - expect a 0.14.0rc1 in a few days. Thanks again! |
Summary
Has been discussed in the forum but since its a bug figured it would be good to make a git issue since that is easier to track long term:
https://discuss.konghq.com/t/kong-http-log-plugin-errors/445/10
Basically when it comes to read and connection timeouts Kong can't invoke any of the logging plugins on transactions.
Steps To Reproduce
Enable any of the logging plugins globally/on an api/on a consumer
Create an API, set the upstream read timeout property to 1ms or something crazy to force the timeout.
Watch as you will see in proxy error logs the read timeout with nginx, but no plugins at the Kong layer can trap this problem currently, thibaultcha informed us:
"
kong.log() call will not execute plugins indeed, for another reason: NGINX’s error_page does an internal redirect, which causes ngx.ctx (the value storing the API and plugins to execute for this request) to be reset. Thus, no plugins will be run (Kong even lost the information relating to which API was matched in that particular request at this point).
There seems to be no easy fix for this, we’ll have to add this to our backlog and think of a workaround.
"
Additional Details & Logs
Thanks,
-Jeremy
The text was updated successfully, but these errors were encountered: