
[BUG] Background saving not working with 6.2.1 #378

Closed
Talkabout opened this issue Nov 24, 2021 · 34 comments

Comments

@Talkabout

Describe the bug

Background saving no longer seems to work after updating from 6.0.18 to 6.2.1.

To reproduce

Not sure how to reproduce it. I updated to the new version and restarted both servers (active-active replicas). After the initial sync there has been no background saving at all (for 3 days now). Before the update this task was executed every 5 minutes...

image

image

Expected behavior

Background saving should work :)

Additional information

Using 2 Raspberry Pis with active-active replication.

@Talkabout
Author

As additional information, these are the settings in the conf file I am using for background saving:

image

@VivekSainiEQ
Contributor

Hi @Talkabout,

Unfortunately I cannot seem to replicate this on my end. What do your config files look like?

@Talkabout
Author

Hi @VivekSainiEQ,

these are my settings:

bind 192.168.XX.XX
protected-mode no
port 6379
timeout 0
tcp-keepalive 0
daemonize no
supervised systemd
pidfile /var/run/redis/redis-server.pid
loglevel notice
syslog-enabled yes
syslog-ident keydb
databases 4
always-show-logo no
save 900 1
save 300 100
save 60 10000
stop-writes-on-bgsave-error no
rdbcompression yes
rdbchecksum no
dbfilename dump.rdb
dir /var/lib/redis
repl-diskless-sync no
replica-priority 100
maxmemory 512M
maxmemory-policy allkeys-lru
lazyfree-lazy-eviction no
lazyfree-lazy-expire no
lazyfree-lazy-server-del no
replica-lazy-flush no
appendonly no
appendfilename "appendonly.aof"
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
aof-load-truncated yes
aof-use-rdb-preamble yes
lua-time-limit 5000
slowlog-log-slower-than 10000
slowlog-max-len 16
latency-monitor-threshold 0
notify-keyspace-events ""
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
list-max-ziplist-size -2
list-compress-depth 0
set-max-intset-entries 512
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
hll-sparse-max-bytes 3000
stream-node-max-bytes 4096
stream-node-max-entries 100
activerehashing yes
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit replica 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60
hz 10
dynamic-hz yes
aof-rewrite-incremental-fsync yes
rdb-save-incremental-fsync yes
active-replica yes
replicaof 192.168.XX.XX 6379
server-threads 2

As you can see, I have migrated from Redis and am using the same config file with specific KeyDB options added.

thanks for your help!

Bye

@sts

sts commented Dec 14, 2021

@VivekSainiEQ

We've got the very same issue with KeyDB. I reported it to the community forum two weeks ago: https://community.keydb.dev/t/keydb-rdb-bgsave-never-competes/189

Is there anything else we can provide to debug the issue?

@Talkabout
Author

Thanks @sts. I killed the running processes and the background save took place immediately. Subsequent background saves are also working as of now, but I assume that at some point it will stop again (the background process will hang). It would be nice to have somebody look into that.

Thanks!

@Talkabout
Author

And now one of the KeyDB servers has a hanging background save process again:

image

The other one is still executing background saves...

@Talkabout
Author

And here we have the second hanging process:

image

There is surely an issue somewhere in the new version of KeyDB. Do you need any other information to analyze it?

@VivekSainiEQ
Contributor

Hi @Talkabout and @sts,

Do BGSAVEs always hang, or can they run multiple times before hanging? If so, how many runs does it take before one hangs? I suspect there is a bug in how systemd supervision interacts with the BGSAVE mechanism.

@Talkabout
Author

Talkabout commented Dec 17, 2021

Hi @VivekSainiEQ,

The first server executed 7 background saves before it got stuck; the second one executed 18. I have now disabled systemd supervision in my config file and restarted both KeyDB instances. Let's see if it helps.
Thanks for taking a look at the issue!

@Talkabout
Author

At the moment background saving runs without problems:

image

Will keep you guys posted about status.

@Talkabout
Author

Unfortunately:

image

even after removing supervised systemd from the config. Any other ideas?

@Talkabout
Author

Another update, both servers are stuck now:

image

@esatterwhite

@Talkabout For the sake of clarity and to narrow down the problem, does this happen on 6.2.0?

@Talkabout
Author

Hi @esatterwhite,

I have not used 6.2.0 because it was causing major memory issues on my system; I switched directly from 6.0.16 to 6.2.1...

@MalavanEQAlpha
Contributor

Hi @Talkabout, @sts, @esatterwhite, it turns out this issue has always been present, but 6.2.1 made it dramatically more likely (so 6.2.0 is as safe as 6.0.18).

The localtime_r function internally requires a lock, so any multithreaded program that forks when another thread is in the middle of a call to localtime_r will hang when the forked process calls localtime_r.

Prior to 6.2.1 that was extremely unlikely to happen, but in 6.2.1 we added a thread dedicated to checking the time. This makes repeated calls to localtime_r, massively increasing the chance that one is in flight when we fork for a background save.

The only call to localtime_r in the background save is within the syslog() call, so disabling syslog (by setting syslog-enabled no in the config) should solve your issue for now while we work on a more complete fix.
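
For illustration only, here is a minimal standalone C sketch of the hazard described above (not KeyDB code; the program and helper names are made up). One thread calls localtime_r in a loop, mimicking the dedicated time-checking thread, while the main thread forks; if the fork lands while that thread holds glibc's internal time-zone lock, the child inherits the lock in its locked state and its own localtime_r call can block forever:

/* Hypothetical repro sketch; build with: gcc -pthread repro.c -o repro */
#include <pthread.h>
#include <stdio.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static void *time_worker(void *arg)
{
    (void)arg;
    struct tm tm;
    for (;;) {                       /* mimics a thread that checks the time constantly */
        time_t now = time(NULL);
        localtime_r(&now, &tm);      /* grabs an internal glibc lock on each call */
    }
    return NULL;
}

int main(void)
{
    pthread_t tid;
    pthread_create(&tid, NULL, time_worker, NULL);

    for (int i = 0; i < 1000; i++) {
        pid_t pid = fork();          /* analogous to the fork done for a background save */
        if (pid == 0) {
            struct tm tm;
            time_t now = time(NULL);
            localtime_r(&now, &tm);  /* may never return if the lock was held at fork time */
            _exit(0);
        }
        int status;
        waitpid(pid, &status, 0);    /* hangs here whenever a child got stuck */
    }
    puts("no hang observed in this run");
    return 0;
}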

@Talkabout
Author

Hi @MalavanEQAlpha,

I have disabled syslog on my 2 servers and restarted KeyDB. At the moment background saving works; I will report again tomorrow on whether the issue is fixed.
Thanks for looking into this!

@Talkabout
Author

Good news!

image

It seems you hit the nail on the head, @MalavanEQAlpha. Thanks again, waiting for the fix!

@benschermel
Collaborator

Thanks @MalavanEQAlpha, @Talkabout, and all on this thread. This issue should be resolved with the 6.2.2 release (PR #384). Closing this issue.

@Talkabout
Author

Hi,

thanks for the information and the fix!

image

Looks good so far. Will report back if anything changes.

Bye

@Talkabout
Author

Hi,

unfortunately the issue does not seem to be fixed yet:

image

Anything I can provide for further analysis?

@esatterwhite

Even with syslog disabled?

@Talkabout
Author

I had syslog enabled during the above test. I have now disabled it again and will report tomorrow whether the problem also occurs with it disabled.

@esatterwhite

What is this UI you are using?

@esatterwhite

Is there anything in the replication logic that needs a quorum? I wonder if it's related to the fact that you have an even number of servers. Total shot in the dark.

@Talkabout
Author

What is this UI you are using?

This is the phpRedisAdmin tool.

Is there anything in the replication logic that needs a quorum? I wonder if it's related to the fact that you have an even number of servers. Total shot in the dark.

Not sure what to say here. I have 2 servers to ensure a fallback if one of them crashes. As the issue was not there in 6.0.18, I don't think it has anything to do with my setup, which didn't change.

@Talkabout
Author

image

Without syslog enabled, the saving seems to work again. So there is still a bug somewhere in the syslog handling.

@MalavanEQAlpha
Contributor

Hi @Talkabout,
I am unable to reproduce the issue on version 6.2.2. Can you please provide more details about how you are running KeyDB?
What operating system, how you acquired KeyDB (binary/docker/git), details of the machine/VM/docker image running KeyDB, etc.
If possible, a stack trace of the hanging process would be helpful as well; you can find instructions on how to do that with gdb here: https://sourceware.org/gdb/onlinedocs/gdb/Attach.html
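
In case it helps, a rough sketch of how such a trace could be collected (the process name and PID below are placeholders; the stuck fork usually shows up as a separate keydb-server child process):

# find the PID of the stuck background-save child (names/output may differ)
ps aux | grep keydb

# attach gdb, dump backtraces of all threads, then detach
sudo gdb -p <PID>
(gdb) thread apply all bt
(gdb) detach
(gdb) quit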

@Talkabout
Author

Hi,

Operating System: Debian 10 (Buster) on a Raspberry Pi
KeyDB: built from source acquired from GitHub
Build steps:

sudo apt-get install -y build-essential nasm autotools-dev autoconf libjemalloc-dev tcl tcl-dev uuid-dev libcurl4-openssl-dev pkg-config && \
make distclean && \
make clean && \
make -j4 && \
checkinstall --install=no

Does that help already? I have never worked with stack traces of running processes, but I can try if it is really required.
Bye

@MalavanEQAlpha
Contributor

Hi @Talkabout,
Thanks for the info, I was able to reproduce it. I believe this should be fixed with PR #391; can you try building from that branch (titled complete_fix_rdb_hang) for now?

@Talkabout
Author

Hi @MalavanEQAlpha,
Great that you were able to reproduce it. I have built this branch and am running that version now:
image
Will report tomorrow about results of that test.
Thanks!

@Talkabout
Author

Hi @MalavanEQAlpha,

Seems to be fixed :)

image

Will keep observing it over the next days, but I guess the problem is solved.

Thanks!

@Talkabout
Author

Hi @MalavanEQAlpha,

Still looks good:
image
Bye

@marcocapetta

Hi @MalavanEQAlpha,

We are having the same issue with KeyDB version 6.2.2 in an active-active replica.
The keydb-rdb-bgsave process is stuck, and in our case this is also preventing the AOF rewrite process from running, resulting in huge AOF files.
Disabling syslog as you suggested looks like it solves the issue.

I see you created a pull request about one month ago (PR #391), but it is still not merged. Is there any update on an official fix for the issue?

Thanks

@kitobelix

Hey everyone, I know this may be marked as fixed, but I stumbled on this bug when using the Docker image. I'm using 6.3.3, no replication, official image. Background saving works for a day or so, and after that I have to either kill the save process or restart the container to avoid ending up with a huge AOF.

I only realized this today because I thought it was happening due to a misconfigured installation. But after bashing my head against a wall, I came here and saw this might still be a live bug.
