Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(core) integrate new events library #8890

Merged
merged 241 commits into from
Jun 8, 2022
Merged

Conversation

chronolaw
Copy link
Contributor

@chronolaw chronolaw commented Jun 2, 2022

Summary

Now we have a new event libaray, let's try to introduce it to replace lua-resty-worker-events.

#8715 has too many changes, and it is hard to rebase to master, so I create this new PR.

See also:

https://github.com/Kong/lua-resty-events
Kong/kong-build-tools#460

Full changelog

  • Modify init_worker_events in global.lua, switch to new events library.
  • Modify templates/nginx_kong.lua, add unix listening
  • Modify templates/kong_defaults.lua, add config legacy_worker_events
  • Add a temporary kong/resty/healthcheck.lua to use resty.events.compat
  • Modify test cases to pass ci, wrap them with wait_until()
  • events library version is 0.1.0

Notice

  • Temporary kong/resty/healthcheck.lua should be replaced by a formal version.
  • Flaky test 02-integration/05-proxy/10-balancer/01-healthchecks_spec.lua now is fixed.

@@ -20,7 +20,9 @@ local healthcheckers_M = {}
local healthcheck_subscribers = {}

function healthcheckers_M.init()
healthcheck = require("resty.healthcheck") -- delayed initialization
-- XXX: test only
healthcheck = require("kong.resty.healthcheck") -- delayed initialization
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change it before merge to master.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will remove this temporary file after feature freeze.

@chronolaw chronolaw added this to the 3.0 milestone Jun 2, 2022
.requirements Outdated Show resolved Hide resolved
Copy link
Member

@bungle bungle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have approved this with the comments. I still wish we could:

  1. drop the old events library completely
  2. remove the compatibility layer and use the new apis

kong/global.lua Show resolved Hide resolved
.requirements Outdated Show resolved Hide resolved
.github/workflows/build_and_test.yml Outdated Show resolved Hide resolved
@chronolaw
Copy link
Contributor Author

I have approved this with the comments. I still wish we could:

  1. drop the old events library completely
  2. remove the compatibility layer and use the new apis

It is a bit risky to switch to the new events library, so we have to keep the old one.

If the new events library is proven working well, we can remove old one.

@bungle
Copy link
Member

bungle commented Jun 2, 2022

It is a bit risky to switch to the new events library, so we have to keep the old one.

I know. It is risky to update events library, router and timer. But if on all of these we carry to old code with us, it means we have huge amount of different ways to configure Kong and each bug gets harder to resolve as you need to find out:

  1. what router were you using
  2. what events library were you using
  3. was the timer library being used

Plus, we already have done a lot of deprecation removals. I do know that this is not the right place to decide on it, but I still would love to have the last conversation on it. Having wrpc and legacy protocol feels similar, now we need to maintain two implementations. This will definitely slow us down and create unique bugs.

This is a very big risk too. And it possibly means we need to carry the code around. For events, I would be more than willing to take that risk. If people report issues on 3.0.0 we will fix them in 3.0.1, etc. What I mean is that I don't want this to become a theme, where we reimplement something, and then don't remove the older one.

If the new events library is proven working well, we can remove old one.

Is there any reason to believe it would not? I mean we definitely have bugs in current events library too.

@chronolaw
Copy link
Contributor Author

I understood your feeling.

The problem is that there are many places to use events library with old apis, we will cost a lot of time to find/change/test them . I think it can not be done in this PR.

If this PR is merged maybe we can start a new PR to clean old apis.

@bungle
Copy link
Member

bungle commented Jun 2, 2022

The problem is that there are many places to use events library with old apis, we will cost a lot of time to find/change/test them . I think it can not be done in this PR.

Yes, we can make separate PR to remove compatibility mode, if decided so. But perhaps we need to first decide. I try to write something.

}))

local res
helpers.wait_until(function()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seeing this pattern added in so many integration tests is concerning to me, because:

  • helpers.wait_until is generic and doesn't really provide meaningful output when it fails, so it's harder to debug failing tests
  • it makes me worry that there are many more test cases which may be flaky, and we could be chasing after them for a long time

Is there no alternative to this? Is there any function within the new events library to block until all workers have handled/seen events up until some known event? If so, we could expose it via the admin API and incorporate it into spec.helpers. Alternatively, we could add an optional, (maybe private) query arg to the CRUD APIs for traditional mode, so that we can do things like POST /services?sync=true, and it would block until all workers have caught up with events.

Additionally, something that might help would be to set nginx_main_worker_processes to 1 in most cases, since this will cut down on how much time is spent ensuring all workers are in sync.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new events library affects too many test cases, It is a hard work to make all of them green.
I think if we could improve these test cases after this PR is merged.

@github-actions github-actions bot removed the chore Not part of the core functionality of kong, but still needed label Jun 6, 2022
.requirements Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants