Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix script loading behavior on script flush with pipelines #1497

Merged

Conversation

marcbachmann
Copy link
Collaborator

@marcbachmann marcbachmann commented Jan 31, 2022

Lua scripts get loaded once and not retried unless a client side interval expires: https://github.com/luin/ioredis/blob/master/lib/pipeline.ts#L376

This causes issues with redis instances restarting or failing over.
There's no way of knowing whether a script is still present in the new instance.
We should reset the local state and just load them again when a reconnect happens and rely on the existing code that checks whether the scripts are present on the server.

I ran into that issue while testing the reliability of a service, but it might also fix #1405 See #1499 for the cluster fixes.

There's still a small time window where NOSCRIPT errors occur as messages are already queued for execution. Previously the script handler only executed EVAL in the non-pipeline mode. This now changed and always gets executed.

Changelog

  • 🐛 Reset loaded script hashes to force a reload of scripts after reconnect of redis

@marcbachmann marcbachmann force-pushed the fix-redis-reconnect-NOSCRIPT-errors branch from 020bac9 to b7b213c Compare January 31, 2022 00:41
@marcbachmann marcbachmann force-pushed the fix-redis-reconnect-NOSCRIPT-errors branch from b7b213c to bdf88bd Compare January 31, 2022 10:19
@marcbachmann
Copy link
Collaborator Author

Regarding the tests, maybe listing the redis version during the docker run could be useful.

Somehow the node 6 tests failed before some small fix.
Probably because it uses redis 3.2.6 https://packages.debian.org/stretch/redis-server

But that should have worked with the CLIENT KILL ID clientId and CLIENT KILL addr:port commands.

@artur-ma
Copy link

Isnt this case is handled by this code?
https://github.com/luin/ioredis/blob/b8177479c348aa4bbd467fa944d61fe9b35aec19/lib/script.ts#L42

If the script is missing then we will get the NOSCRIPT error, which will force ioredis to load it.

@marcbachmann
Copy link
Collaborator Author

Not in pipeline mode. That gets skipped as the result of sendCommand is no promise.
await redis.pipeline(['something']).exec()

    const result = container.sendCommand(evalsha);
    // This is false, result is an instance of Pipeline
    if (isPromise(result)) {

@marcbachmann
Copy link
Collaborator Author

marcbachmann commented Jan 31, 2022

I did another commit to handle the NOSCRIPT errors in 53472a7

Resetting the scripts is still useful with this as without it, it would execute EVAL all the time, which is quite expensive.

I'm totally not sure whether those changes are fine. It might have some side effects in the ordering (in the tests it was fine). Not sure about the cluster support.
I'd prefer to just have the first commits in.

@marcbachmann
Copy link
Collaborator Author

marcbachmann commented Jan 31, 2022

I'm still getting NOSCRIPT errors even with the changes above. 🤔
It's able to recover for new queries, but queued ones fail despite the promise wrapping for the script. I guess it's because pipeline.sendCommand hooks up the promise earlier:
Just didn't run npm run build

Now everything's working properly.

@marcbachmann marcbachmann force-pushed the fix-redis-reconnect-NOSCRIPT-errors branch 2 times, most recently from 731bbb4 to f36a582 Compare January 31, 2022 14:01
lib/script.ts Outdated
if (err.toString().indexOf("NOSCRIPT") === -1) {
throw err;
}
const command = new Command("eval", [this.lua].concat(args), options);
Copy link
Collaborator Author

@marcbachmann marcbachmann Jan 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replacing that with a SCRIPT LOAD would be another task.

redis._addedScriptHashes[this.sha] = true
return redis.pipeline(['script', 'load', this.lua], ['evalsha', this.sha]])
  .exec()
  .then(([,res]) => res)

I might refactor the script load behavior once I have some time. Scripts themselves are loaded synchronously instead of using another pipeline or the same pipeline.

Copy link
Collaborator Author

@marcbachmann marcbachmann Feb 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd have other fixes that uses pipeline-based script loading: https://github.com/marcbachmann/ioredis/compare/fix-redis-reconnect-NOSCRIPT-errors...marcbachmann:pipelined-evalsha?expand=1
As it's a bit bigger, and I don't like to merge it into this pr.

The only disadvantage is that it doesn't use script exits checks anymore and just uses script load after the interval expired. I guess redis already hashes by sha internally and only loads the script if it's not present. So that should be fine.
But as there's no roundtrip anymore and script caching time can be set much higher with it. NOSCRIPT errors are also retried in there (should also work better for the cluster support).

edit: Ok. it looks like I can simplify the code there. An eval also automatically caches the script in redis.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now ready in #1499
@artur-ma want to try? 😁

@marcbachmann marcbachmann changed the title Reset loaded script hashes to force a reload of scripts after reconnect of redis Fix script loading behavior on script flush with pipelines Jan 31, 2022
@artur-ma
Copy link

Resetting the scripts is still useful with this as without it, it would execute EVAL all the time, which is quite expensive.

IMO this is a serious statement, we have really high load on our services with a lot of lua scripts(we use redis with redisLabs as our main persistent DB), from time to time we loose connection so we have to reconnect. if it will cause sending the script string each time and run EVAL, it may be a disaster :)
also it will be hard to understand where this extra latency comes from after reconnect.

@marcbachmann
Copy link
Collaborator Author

Yes. This PR now fixes this behavior. New commands coming in will first load the script again. And at the moment it always does an exists check: https://github.com/luin/ioredis/blob/f275bc24de3825f80415a69ff227a45251dd1a3b/lib/pipeline.ts#L282

With the update, we only have the issue that already queued messages will do an eval. Somewhat like #1497 (comment) would fix this.

@luin
Copy link
Collaborator

luin commented Feb 5, 2022

Hey @marcbachmann 👋

Thanks for the contribution. I didn't realize this issue so it's a great fix!

The reason we don't retry on NOSCRIPT error in pipeline mode is we want to ensure the command order, so for example, in .pipeline().mycustom('foo').set('foo', 'bar').exec(), mycustom shouldn't be executed after the set command. Breaking this rule results in a breaking change, and I don't think we should include this change in a bug fix PR.

Resetting the scripts is still useful with this as without it, it would execute EVAL all the time, which is quite expensive.

Not sure if I understand this, why not resetting the scripts results in EVAL being executed all the time? Won't it be always EVALSHA first and then EVAL when failed? So only first several commands will execute EVAL in a small time window.

#1499 looks promising, as I still need some time reviewing it, for this PR, should we just delete the _addedScriptHashes flag for the sha in case of NOSCRIPT error so following custom commands in pipeline mode should work?

@marcbachmann
Copy link
Collaborator Author

You're welcome. Yes, then let's remove the fallback logic again and simplify that PR. #1499 might be a better approach to handle the script caches.

The reason we don't retry on NOSCRIPT error in pipeline mode is we want to ensure the command order, so for example, in .pipeline().mycustom('foo').set('foo', 'bar').exec(), mycustom shouldn't be executed after the set command. Breaking this rule results in a breaking change, and I don't think we should include this change in a bug fix PR.

Good to know, not sure this was mentioned anywhere.
That could make sense in some cases. But redis doesn't cancel commands followed after a NOSCRIPT of the evalsha command. Therefore multiple commands that belong together will always be an issue, even when they are in the same pipeline or within a MULTI transaction.
With that in mind, I'm not sure that is the better approach. I guess it completely depends on the use-case.
E.g. I'm doing an xack and some other script in the same pipeline. If the second script never executes, I'll have a worse state as the exactly-once delivery isn't guaranteed for those commands that failed.

Won't it be always EVALSHA first and then EVAL when failed? So only first several commands will execute EVAL in a small time window.

Yes, but it will execute the EVAL for all commands in the offline queue. Mainly because EVAL is queued after all the other commands already the queue. With #1499 we'll have some trade off of loading the script once on every reconnect, but will resolve those issues.

for this PR, should we just delete the _addedScriptHashes flag for the sha in case of NOSCRIPT error so following custom commands in pipeline mode should work?

👍

@marcbachmann marcbachmann force-pushed the fix-redis-reconnect-NOSCRIPT-errors branch from f36a582 to 6439c20 Compare February 5, 2022 17:46
@marcbachmann
Copy link
Collaborator Author

This is now ready to merge

@luin
Copy link
Collaborator

luin commented Feb 6, 2022

With that in mind, I'm not sure that is the better approach. I guess it completely depends on the use-case.

Fair point. I 💯 agree with that. I think we should then just priority code maintainability over the behavior choice, though introducing breaking changes means we have to wait for the schedule of the next major version.

@luin luin merged commit f357a31 into redis:master Feb 6, 2022
ioredis-robot pushed a commit that referenced this pull request Feb 6, 2022
## [4.28.5](v4.28.4...v4.28.5) (2022-02-06)

### Bug Fixes

* Reset loaded script hashes to force a reload of scripts after reconnect of redis ([#1497](#1497)) ([f357a31](f357a31))
@ioredis-robot
Copy link
Collaborator

🎉 This PR is included in version 4.28.5 🎉

The release is available on:

Your semantic-release bot 📦🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Custom Lua scripts are not loaded in cluster mode
4 participants