Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid expiration and eviction during data syncing #1185

Open
wants to merge 8 commits into
base: unstable
Choose a base branch
from

Conversation

lyq2333
Copy link

@lyq2333 lyq2333 commented Oct 17, 2024

When we sync data from the source Valkey to the destination Valkey using some sync tools like redis-shake, the destination Valkey think it's a primary and can perform expiration and eviction, which may cause data corruption. This problem has been discussed in redis/redis#9760 (reply in thread) and Redis already have a solution. But in Valkey we haven't fixed it by now.

i.e. we call set key 1 ex 1 on the source server and transfer this command to the destination server. Then we call incr key on the source server before the key expired, we will have a key on the source server with a value of 2. But when the command arrived at the destination server, the key may be expired and has deleted. So we will have a key on the destination server with a value of 1, which is inconsistent with the source server.

In standalone mode, we can use writable replica to simplify the sync process. However, in cluster mode, we still need a sync tool to help us transfer the source data to the destination. The sync tool usually work as a normal client and the destination works as a primary which keep expiration and eviction.

In this PR, we add a new mode named 'import-mode'. In this mode, server stop expiration and eviction just like a replica. Notice that this mode exists only in sync state to avoid data inconsistency caused by expiration and eviction. The server in import mode can't turn to a real replica by replicaof or cluster replicate and vice versa. Sync tools can mark their clients as an import source by CLIENT IMPORT-SOURCE, which work like a client from primary and can visit expired keys in lookupkey.

Any better idea?

Copy link

codecov bot commented Oct 17, 2024

Codecov Report

Attention: Patch coverage is 65.71429% with 12 lines in your changes missing coverage. Please review.

Project coverage is 70.69%. Comparing base (a62d1f1) to head (7883e9f).
Report is 4 commits behind head on unstable.

Files with missing lines Patch % Lines
src/config.c 42.85% 4 Missing ⚠️
src/networking.c 66.66% 4 Missing ⚠️
src/cluster_legacy.c 33.33% 2 Missing ⚠️
src/replication.c 33.33% 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #1185      +/-   ##
============================================
+ Coverage     70.65%   70.69%   +0.03%     
============================================
  Files           114      114              
  Lines         61799    63119    +1320     
============================================
+ Hits          43664    44621     +957     
- Misses        18135    18498     +363     
Files with missing lines Coverage Δ
src/commands.def 100.00% <ø> (ø)
src/db.c 88.79% <100.00%> (+0.28%) ⬆️
src/evict.c 97.75% <100.00%> (+0.02%) ⬆️
src/expire.c 96.57% <100.00%> (+0.14%) ⬆️
src/server.c 88.75% <100.00%> (+0.10%) ⬆️
src/server.h 100.00% <ø> (ø)
src/cluster_legacy.c 86.19% <33.33%> (-0.13%) ⬇️
src/replication.c 87.24% <33.33%> (-0.17%) ⬇️
src/config.c 78.68% <42.85%> (-0.02%) ⬇️
src/networking.c 88.49% <66.66%> (-0.01%) ⬇️

... and 84 files with indirect coverage changes

@soloestoy
Copy link
Member

good work! @valkey-io/core-team please take a look

@lyq2333 lyq2333 requested a review from a team October 17, 2024 11:34
Copy link
Contributor

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it. I think it's a little bit strange, but with some better documentation it's not too strange.

We need to have good support for migration tools, especially for users who want to migrate from proprietary software to open source software. 😄

Redis already have a solution.

I remember this discussion from the Redis times. Do you know what Redis solution is (the same REPLCONF PSEUDO-MASTER?) or is it secret?

Any better idea?

Have you considered the idea to let RediShake act as a primary and let the target database replicate from RediShake? It can act as a replication proxy?

+-----------+   PSYNC  +-----------+   PSYNC  +--------+
| Source DB |<---------| RediShake |<---------| Valkey |
+-----------+          +-----------+          +--------+

* - pseudo-master <0|1>
* Set this connection behaving like a master if server.pseudo_replica is true.
* Sync tools can set their connections into 'pseudo-master' state to visit expired keys.
* */
void replconfCommand(client *c) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sync tool sends REPLCONF PSEUDO-MASTER to say "I'm a pseudo-master, so ignore expire for this connection"?

Almost all other REPLCONF commands are sent by the replica to the primary before doing the PSYNC. This one is different. Can we add a more explicit comment about this difference, similar to REPLCONF GETACK which is also different:

 * - getack <dummy>
 * Unlike other subcommands, this is used by primary to get the replication
 * offset from a replica.

Copy link
Author

@lyq2333 lyq2333 Oct 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sync tool sends REPLCONF PSEUDO-MASTER to say "I'm a pseudo-master, so ignore expire for this connection"?

Yes. It does look a bit strange. I add a new command CLIENT IMPORT-SOURCE <on|off> like what @soloestoy say. Maybe it'll look better this way?

valkey.conf Outdated
Comment on lines 804 to 810
# Make the master behave like a replica, which forbids expiration and evcition.
# This is useful for sync tools, because expiration and evcition may cause the data corruption.
# Sync tools can set their connections into 'pseudo-master' state by REPLCONF PSEUDO-MASTER to
# behave like a master(i.e. visit expired keys).
#
# pseudo-replica no

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think "pseudo-master" is a little bit confusing. :) I understand how it works now, but only after I read the test cases. Let's try to improve this documentation later.

In Valkey we are no longer not using "master", so new commands and configs should use "primary".

Maybe we can use master if Redis has exactly the same REPLCONF or config, but otherwise let's use primary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you, the new mode server is master node, but it is named as pseudo-replica, it is confused by most people. Let us first give it a better name first.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

import-mode?

Should this node forbid writes from all other clients? It makes it behave even more like a replica.

Btw, I'm thinking now that if we want a better implementation of slot migration, maybe it can use the same or a similar feature. The slot replication is also similar to replication but initiated from the source node. @enjoy-binbin how is the implementation you want to upstream?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because clients may sync data to a working server, I think this is not very friendly to the 24/7 businesses if we simply forbid writes from all other clients. But if we allow writes from other clients, we may face the same dual-write problem as writable replica, I'll document it in the valkey.conf.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the slot replication is also similar the replication in the implementation, it is something like slot RDB + slot replication propagate + slot failover, something like that

@hwware hwware added the major-decision-pending Major decision pending by TSC team label Oct 17, 2024
@soloestoy
Copy link
Member

+-----------+   PSYNC  +-----------+   PSYNC  +--------+
| Source DB |<---------| RediShake |<---------| Valkey |
+-----------+          +-----------+          +--------+

interesting, and we have also considered such a method, but there are several issues.

First, as a cloud provider, we do not allow an instance to become a replica for external instances. This is a very risky operation, especially since the primary connection uses a super user, which has excessive permissions. Additionally, the replica needs to establish an outbound connection, this is also not allowed. These restrictions, I believe, are not unique to cloud providers, many users' security control policies also prohibit such actions.

Another point is that, in a cluster mode, the source and target instances for migration typically have different slot distributions, and redisShake can help with correctly routing the data.

Copy link
Member

@soloestoy soloestoy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moreover, I think we should only allow write commands for pseudo-master client when server is in pseudo-replica mode.

src/evict.c Outdated
@@ -546,8 +546,8 @@ int performEvictions(void) {
goto update_metrics;
}

if (server.maxmemory_policy == MAXMEMORY_NO_EVICTION) {
result = EVICT_FAIL; /* We need to free memory, but policy forbids. */
if (server.maxmemory_policy == MAXMEMORY_NO_EVICTION || server.pseudo_replica) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's better to place it in isSafeToPerformEvictions together with the server.primary_host check.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If place it in isSafeToPerformEvictions , the server will ignore maxmemory. I think it's better to return an OOM error to stop the syncing process.

src/replication.c Outdated Show resolved Hide resolved
@lyq2333
Copy link
Author

lyq2333 commented Oct 18, 2024

Redis already have a solution.

I remember this discussion from the Redis times. Do you know what Redis solution is (the same REPLCONF PSEUDO-MASTER?) or is it secret?

It's secret. I used to push this PR to Redis Community(#13077) and they say they will PR their implementation. But no news after that.

Any better idea?

Have you considered the idea to let RediShake act as a primary and let the target database replicate from RediShake? It can act as a replication proxy?

+-----------+   PSYNC  +-----------+   PSYNC  +--------+
| Source DB |<---------| RediShake |<---------| Valkey |
+-----------+          +-----------+          +--------+

What @soloestoy said is a major limitation for us. BTW, sometimes the destination already has some data, PSYNC will delete them.

@enjoy-binbin
Copy link
Member

i did not read it carefully, internally we simply pause the expiration on both side i guess.
Redis has an issue that seems to be discussing this: redis/redis#13478

@lyq2333
Copy link
Author

lyq2333 commented Oct 18, 2024

i did not read it carefully, internally we simply pause the expiration on both side i guess. Redis has an issue that seems to be discussing this: redis/redis#13478

Interesting. I think this PR can solve that problem if the server has no other read. If the server has other read, one possible solution is to disable all expiration in lookupKey for all clients. But I don't want this PR to affect read/write from normal client.

src/networking.c Outdated Show resolved Hide resolved
src/server.h Outdated Show resolved Hide resolved
valkey.conf Outdated Show resolved Hide resolved
valkey.conf Outdated Show resolved Hide resolved
@zuiderkwast
Copy link
Contributor

@lyq2333 Can you commit the changes to commands.def?

When you run make locally and you have python3 installed, make updates commands.def if there are any new or changed commands.

@lyq2333
Copy link
Author

lyq2333 commented Oct 21, 2024

@lyq2333 Can you commit the changes to commands.def?

When you run make locally and you have python3 installed, make updates commands.def if there are any new or changed commands.

@zuiderkwast Thanks. Sorry forgot it before. I also forgot to sign off and have to force push. No code is modified.

@madolson
Copy link
Member

Weekly core meeting. No specific comments other than we should review this offline and make progress. Directionally seems like a good idea.

@lyq2333
Copy link
Author

lyq2333 commented Oct 22, 2024

The scenario we encountered is that some users want to migrate data from Server A to Server B. Both A and B work as primary and B may have some data before the migration. We find expiration and eviction may lead to data inconsistency. So we came up with a simple method, but it still has a few places to discuss.

We plan to introduce a new config(import-mode in this PR) to mark that this server is importing data. As for expiration, in import mode, active expiration in cron is prohibited and passive expiration in commands is limited depend on the client state. Clients marked as import source work like server.primary, which will not trigger passive expiration for read and write. But there are many ways to handle other normal clients. We think of some ways as follow.

  1. Normal clients work as usual, which means they can perform passive expiration whether for read or write. The drawback is that normal clients accessing the migrating data will also trigger passive expiration and may affect the migration.
  2. Normal clients are prohibited from write commands, meanwhile don't trigger passive expiration when read and don't read expired keys. The server works like a read-only replica. The drawback is that clients need to stop writing during the data migration and may affect users business.
  3. Normal clients don't trigger passive expiration when read and don't read expired keys, but still trigger passive expiration when write. We can't prohibit passive expiration in write commands, because server will crash when we call incr/decr commands on expired keys. The reason is that these commands must delete expired keys first, otherwise it will hit the assert in dbAdd. The drawback is that normal clients calling write commands on the migrating data will trigger passive expiration. This is a trade-off between the above two ways and I think it's better because it won't affect the normal clients and have less impact on the migration process.

As for eviction, should server disable eviction automatically to ensure data consistency in import mode?Or let users choose the maxmeory-policy, I guess no one will choose an option other than noeviction.

@valkey-io/core-team WDYT? Any suggestions would be greatly appreciated.

@soloestoy
Copy link
Member

The scenario we encountered is that some users want to migrate data from Server A to Server B. Both A and B work as primary and B may have some data before the migration. We find expiration and eviction may lead to data inconsistency. So we came up with a simple method, but it still has a few places to discuss.

We plan to introduce a new config(import-mode in this PR) to mark that this server is importing data. As for expiration, in import mode, active expiration in cron is prohibited and passive expiration in commands is limited depend on the client state. Clients marked as import source work like server.primary, which will not trigger passive expiration for read and write. But there are many ways to handle other normal clients. We think of some ways as follow.

  1. Normal clients work as usual, which means they can perform passive expiration whether for read or write. The drawback is that normal clients accessing the migrating data will also trigger passive expiration and may affect the migration.
  2. Normal clients are prohibited from write commands, meanwhile don't trigger passive expiration when read and don't read expired keys. The server works like a read-only replica. The drawback is that clients need to stop writing during the data migration and may affect users business.
  3. Normal clients don't trigger passive expiration when read and don't read expired keys, but still trigger passive expiration when write. We can't prohibit passive expiration in write commands, because server will crash when we call incr/decr commands on expired keys. The reason is that these commands must delete expired keys first, otherwise it will hit the assert in dbAdd. The drawback is that normal clients calling write commands on the migrating data will trigger passive expiration. This is a trade-off between the above two ways and I think it's better because it won't affect the normal clients and have less impact on the migration process.

As for eviction, should server disable eviction automatically to ensure data consistency in import mode?Or let users choose the maxmeory-policy, I guess no one will choose an option other than noeviction.

@valkey-io/core-team WDYT? Any suggestions would be greatly appreciated.

The core issue is that for the target node of data migration that is active, we need to provide normal read and write services. However, we don't want to affect the data being migrated and can't identify which data is being migrated.

Method 3 seems to be a relatively suitable option, but of course, it requires users to ensure they do not perform write operations on the data being migrated.

@zuiderkwast
Copy link
Contributor

I agree method 3 seems to be the most suitable. The node behaves like when reading from a replica, a writable replica if write commands are used.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
major-decision-pending Major decision pending by TSC team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants