Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(plugins/proxy-cache): add wildcard and parameter match support for content_type #10055

Merged
merged 4 commits into from
Mar 7, 2023

Conversation

vm-001
Copy link
Contributor

@vm-001 vm-001 commented Jan 4, 2023

Summary

Add wildcard and parameter match support for content_type

Checklist

Issue reference

FTI-1131

@vm-001 vm-001 added this to the 3.2 milestone Jan 4, 2023
@vm-001
Copy link
Contributor Author

vm-001 commented Jan 4, 2023

sister PR: #10030

@vm-001 vm-001 marked this pull request as ready for review January 4, 2023 08:55
@vm-001 vm-001 changed the base branch from master to feat/mime-type January 13, 2023 09:04
@vm-001 vm-001 changed the base branch from feat/mime-type to master January 13, 2023 09:05
@windmgc windmgc changed the base branch from master to feat/mime-type January 13, 2023 10:01
@windmgc windmgc force-pushed the feat/proxy-caching-content-type-wildcard branch from 47f3f08 to 4683619 Compare January 13, 2023 10:04
@github-actions github-actions bot added changelog chore Not part of the core functionality of kong, but still needed core/pdk core/templates plugins/acme labels Jan 13, 2023
@windmgc windmgc force-pushed the feat/proxy-caching-content-type-wildcard branch from 4683619 to 0042da5 Compare January 16, 2023 03:53
@github-actions github-actions bot removed chore Not part of the core functionality of kong, but still needed plugins/acme changelog core/pdk core/templates labels Jan 16, 2023
@windmgc
Copy link
Member

windmgc commented Jan 16, 2023

LGTM on the code wise, this needs to get merged after #10030

@vm-001 vm-001 force-pushed the feat/proxy-caching-content-type-wildcard branch from 1ec0b59 to 39b66ad Compare February 2, 2023 11:43
@vm-001 vm-001 force-pushed the feat/proxy-caching-content-type-wildcard branch from 5b1f384 to 0883866 Compare February 6, 2023 06:04
@vm-001 vm-001 removed this from the 3.2 milestone Feb 6, 2023
@hbagdi
Copy link
Member

hbagdi commented Feb 22, 2023

@vm-001 Could you please rebase?

@vm-001 vm-001 force-pushed the feat/proxy-caching-content-type-wildcard branch from 0883866 to 0fe260e Compare February 23, 2023 07:40
@hanshuebner
Copy link
Contributor

Does this PR overlap with #10374, @samugi ?

@samugi
Copy link
Member

samugi commented Mar 2, 2023

Does this PR overlap with #10374, @samugi ?

it will not create conflicts @hanshuebner because this PR is creating a new dedicated kong.tools.mime_type module and is only using it in the proxy cache plugin, it does not affect what we currently have in the pdk (moved in utils with #10374). However... we will want to unify things and only do this one way, it should be simple once we have this mime_type module to point to it from whatever function needs it in utils etc. Perhaps that should be a separate PR once these 2 are merged. What do you think?

EDIT: tracked in KAG-789

@hanshuebner
Copy link
Contributor

Unification would be good. We don't want to have to implementations of basically the same thing. Please create a KAG ticket for visibility.

fffonion pushed a commit that referenced this pull request Mar 2, 2023
It has been confirmed that the lua-protobuf under version 0.4.2 causes the Segmentation fault(core dump) while used with lpeg. The issue was found in another PR(#10055).
@vm-001 vm-001 force-pushed the feat/proxy-caching-content-type-wildcard branch from 0fe260e to 298e5b0 Compare March 2, 2023 08:50
@vm-001 vm-001 added this to the 3.3 milestone Mar 2, 2023
@fffonion
Copy link
Contributor

fffonion commented Mar 7, 2023

put a block on this PR as the original perf (#10030 (comment)) doesn't seem to prove that lpeg is faster than ngx.re

@vm-001
Copy link
Contributor Author

vm-001 commented Mar 7, 2023

put a block on this PR as the original perf (#10030 (comment)) doesn't seem to prove that lpeg is faster than ngx.re

@fffonion The log of that test is reversed(due to the typo), I've just corrected that.

Here is the output from my computer:

case ngxreg: time used  : 519.99998092651 ms
case lpeg: time used  : 273.00000190735 ms

@fffonion fffonion merged commit 2504033 into master Mar 7, 2023
@fffonion fffonion deleted the feat/proxy-caching-content-type-wildcard branch March 7, 2023 05:30
@hanshuebner
Copy link
Contributor

put a block on this PR as the original perf (#10030 (comment)) doesn't seem to prove that lpeg is faster than ngx.re

@fffonion The log of that test is reversed(due to the typo), I've just corrected that.

Here is the output from my computer:

case ngxreg: time used  : 519.99998092651 ms
case lpeg: time used  : 273.00000190735 ms

The reason why lpeg is faster in this example program is that the regular expression to parse the parameters is evaluated multiple times, once for each parameter provided. If the whole matching is done by one regular expression evaluation, it is faster than lpeg. Let's not try to be too cheap when making arguments about performance.

The regular expression based version of the code also is easier to understand and reason about than the lpeg based version because it uses less named local state. Also, regular expressions are much more widely known than lpeg, which is a plus in terms of review- and maintainability.

@vm-001
Copy link
Contributor Author

vm-001 commented Mar 7, 2023

put a block on this PR as the original perf (#10030 (comment)) doesn't seem to prove that lpeg is faster than ngx.re

@fffonion The log of that test is reversed(due to the typo), I've just corrected that.
Here is the output from my computer:

case ngxreg: time used  : 519.99998092651 ms
case lpeg: time used  : 273.00000190735 ms

The reason why lpeg is faster in this example program is that the regular expression to parse the parameters is evaluated multiple times, once for each parameter provided. If the whole matching is done by one regular expression evaluation, it is faster than lpeg. Let's not try to be too cheap when making arguments about performance.

The regular expression based version of the code also is easier to understand and reason about than the lpeg based version because it uses less named local state. Also, regular expressions are much more widely known than lpeg, which is a plus in terms of review- and maintainability.

Writing a single static regular expression able to match a content type(with multiple parameters) is almost impossible. If an example content type with parameters here is to be considered cheap. Then Feel free to remove (parameters/merge_params) and do another test.

- local media_type = (types/format_types) * (parameters/merge_params) * P(-1)
+ local media_type = (types/format_types) * P(-1)

The Regular expression is indeed easy to maintain compared to Lpeg, but the media type is something RFC and hasn't been changed for a decade. That's why I think using Lpeg for this scenario is acceptable.

@hanshuebner
Copy link
Contributor

hanshuebner commented Mar 7, 2023

Writing a single static regular expression able to match a content type(with multiple parameters) is almost impossible.

I'm not sure what you mean by that. How about

(?<type>.+) \/ (?<subtype>[^ ;]+) (?:;\s*([^= ]+)=([^; ]+))*?

The performance benefit of the lpeg version is too small to warrant an implementation that is opaque and can't be reviewed unless one learns a non-standard parsing library. There is much more to be gained spending time learning how to use regular expressions.

@vm-001
Copy link
Contributor Author

vm-001 commented Mar 7, 2023

Writing a single static regular expression able to match a content type(with multiple parameters) is almost impossible.

I'm not sure what you mean by that. How about

(?<type>.+) \/ (?<subtype>[^ ;]+) (?:;\s*([^= ]+)=([^; ]+))*?

The performance benefit of the lpeg version is too small to warrant an implementation that is opaque and can't be reviewed unless one learns a non-standard parsing library. There is much more to be gained spending time learning how to use regular expressions.

It's not working. Missing "charset = UTF-8"

local inspect = require("inspect")
local r = ngx.re.match("application/json;charset=UTF-8;Foo=Bar", [[(?<type>.+)\/(?<subtype>[^ ;]+)(?:;\s*([^= ]+)=([^; ]+))*]])
print(inspect(r))

--[[
{ "application", "json", "Foo", "Bar",
  [0] = "application/json;charset=UTF-8;Foo=Bar",
  subtype = "json",
  type = "application"
}
--]]

@hanshuebner
Copy link
Contributor

That's true, and you're probably right in claiming that it is difficult to make it work in one regular expression. However, the overhead lies completely in the invocation of the regular expression parser, so reducing the number of regular expression evaluations is the key to improving performance. Regular expressions are easier to review and more universal than lpeg, and they're also faster to write, so the use of an obscure parsing library does not seem to be warranted, in particular given that the performance argument can be made both ways.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants