Avoid expiration and eviction during data syncing #1185
52be593 to b23b09b
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## unstable #1185 +/- ##
============================================
+ Coverage 70.65% 70.69% +0.03%
============================================
Files 114 114
Lines 61799 63119 +1320
============================================
+ Hits 43664 44621 +957
- Misses 18135 18498 +363
Good work! @valkey-io/core-team please take a look.
I like it. I think it's a little bit strange, but with some better documentation it's not too strange.
We need to have good support for migration tools, especially for users who want to migrate from proprietary software to open source software. 😄
Redis already has a solution.
I remember this discussion from the Redis times. Do you know what the Redis solution is (the same REPLCONF PSEUDO-MASTER?) or is it secret?
Any better idea?
Have you considered the idea to let RediShake act as a primary and let the target database replicate from RediShake? It can act as a replication proxy?
+-----------+ PSYNC +-----------+ PSYNC +--------+
| Source DB |<---------| RediShake |<---------| Valkey |
+-----------+ +-----------+ +--------+
src/replication.c
Outdated
 * - pseudo-master <0|1>
 *   Set this connection behaving like a master if server.pseudo_replica is true.
 *   Sync tools can set their connections into 'pseudo-master' state to visit expired keys.
 * */
void replconfCommand(client *c) {
The sync tool sends REPLCONF PSEUDO-MASTER to say "I'm a pseudo-master, so ignore expire for this connection"?
Almost all other REPLCONF commands are sent by the replica to the primary before doing the PSYNC. This one is different. Can we add a more explicit comment about this difference, similar to REPLCONF GETACK, which is also different:
* - getack <dummy>
* Unlike other subcommands, this is used by primary to get the replication
* offset from a replica.
The sync tool sends REPLCONF PSEUDO-MASTER to say "I'm a pseudo-master, so ignore expire for this connection"?
Yes. It does look a bit strange. I have added a new command, CLIENT IMPORT-SOURCE <on|off>, as @soloestoy suggested. Maybe it'll look better this way?
valkey.conf
Outdated
# Make the master behave like a replica, which forbids expiration and eviction.
# This is useful for sync tools, because expiration and eviction may cause data corruption.
# Sync tools can set their connections into 'pseudo-master' state by REPLCONF PSEUDO-MASTER to
# behave like a master (i.e. visit expired keys).
#
# pseudo-replica no
I think "pseudo-master" is a little bit confusing. :) I understand how it works now, but only after I read the test cases. Let's try to improve this documentation later.
In Valkey we are no longer using "master", so new commands and configs should use "primary".
Maybe we can use master if Redis has exactly the same REPLCONF or config, but otherwise let's use primary.
I agree with you. In the new mode the server is a master node, but the mode is named pseudo-replica, which will confuse most people. Let us give it a better name first.
import-mode?
Should this node forbid writes from all other clients? It makes it behave even more like a replica.
Btw, I'm thinking now that if we want a better implementation of slot migration, maybe it can use the same or a similar feature. The slot replication is also similar to replication but initiated from the source node. @enjoy-binbin how is the implementation you want to upstream?
Because clients may sync data to a working server, I think simply forbidding writes from all other clients is not very friendly to 24/7 businesses. But if we allow writes from other clients, we may face the same dual-write problem as with a writable replica. I'll document it in valkey.conf.
Yes, slot replication is also similar to replication in its implementation. It is something like slot RDB + slot replication propagation + slot failover.
Interesting. We have also considered such a method, but there are several issues. First, as a cloud provider, we do not allow an instance to become a replica of an external instance. This is a very risky operation, especially since the primary connection uses a superuser, which has excessive permissions. Additionally, the replica needs to establish an outbound connection, which is also not allowed. I believe these restrictions are not unique to cloud providers; many users' security control policies also prohibit such actions. Another point is that, in cluster mode, the source and target instances for migration typically have different slot distributions, and redisShake can help with correctly routing the data.
Moreover, I think we should only allow write commands for the pseudo-master client when the server is in pseudo-replica mode.
src/evict.c
Outdated
@@ -546,8 +546,8 @@ int performEvictions(void) {
        goto update_metrics;
    }

    if (server.maxmemory_policy == MAXMEMORY_NO_EVICTION) {
        result = EVICT_FAIL; /* We need to free memory, but policy forbids. */
    if (server.maxmemory_policy == MAXMEMORY_NO_EVICTION || server.pseudo_replica) {
It's better to place it in isSafeToPerformEvictions together with the server.primary_host check.
If we place it in isSafeToPerformEvictions, the server will ignore maxmemory. I think it's better to return an OOM error to stop the syncing process.
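To make the trade-off in this thread concrete, here is a minimal sketch of the two placements. It is not the code in src/evict.c: toy_server, toy_evict_result and every toy_* function are hypothetical stand-ins, and "eviction" is modeled as simply dropping used_memory back to the limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of the server state discussed above. */
typedef enum { TOY_EVICT_OK, TOY_EVICT_FAIL } toy_evict_result;

typedef struct {
    size_t used_memory; /* bytes currently in use */
    size_t maxmemory;   /* configured limit, 0 means unlimited */
    bool import_mode;   /* the mode proposed in this PR */
} toy_server;

static bool toy_over_limit(const toy_server *s) {
    return s->maxmemory != 0 && s->used_memory > s->maxmemory;
}

/* Stand-in for real eviction: pretend enough keys were freed. */
static toy_evict_result toy_evict_normally(toy_server *s) {
    if (toy_over_limit(s)) s->used_memory = s->maxmemory;
    return TOY_EVICT_OK;
}

/* Placement 1: handle import mode in the "is it safe to evict" check.
 * Eviction is skipped and the caller sees OK, so writes keep being
 * accepted even though used_memory is above maxmemory. */
toy_evict_result toy_evict_skip_in_import_mode(toy_server *s) {
    assert(s != NULL);
    if (s->import_mode) return TOY_EVICT_OK;
    return toy_evict_normally(s);
}

/* Placement 2: treat import mode like a no-eviction policy.
 * Nothing is evicted, but the caller sees FAIL once the limit is
 * exceeded and can reply with an OOM error. */
toy_evict_result toy_evict_fail_in_import_mode(toy_server *s) {
    assert(s != NULL);
    if (!toy_over_limit(s)) return TOY_EVICT_OK;
    if (s->import_mode) return TOY_EVICT_FAIL;
    return toy_evict_normally(s);
}
```

With used_memory above maxmemory and import_mode set, the first function reports OK and leaves memory untouched, while the second reports a failure that the sync tool can observe.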
It's secret. I previously pushed this PR to the Redis community (#13077) and they said they would PR their own implementation, but there has been no news since.
What @soloestoy said is a major limitation for us. BTW, sometimes the destination already has some data, |
I did not read it carefully; internally we simply pause expiration on both sides, I guess.
Interesting. I think this PR can solve that problem if the server has no other read. If the server has other read, one possible solution is to disable all expiration in |
@lyq2333 Can you commit the changes to When you run make locally and you have python3 installed, make updates commands.def if there are any new or changed commands. |
Signed-off-by: lvyanqi.lyq <[email protected]>
…mand Signed-off-by: lvyanqi.lyq <[email protected]>
0655bbc to 7883e9f
@zuiderkwast Thanks. Sorry, I forgot it before. I also forgot to sign off and had to force push. No code was modified.
Weekly core meeting. No specific comments other than we should review this offline and make progress. Directionally seems like a good idea. |
The scenario we encountered is that some users want to migrate data from Server A to Server B. Both A and B work as primaries, and B may have some data before the migration. We find that expiration and eviction may lead to data inconsistency. So we came up with a simple method, but a few points are still open for discussion. We plan to introduce a new config (import-mode in this PR) to mark that this server is importing data. As for expiration, in import mode, active expiration in cron is prohibited and passive expiration in commands is limited depending on the client state. Clients marked as import source work like
As for eviction, should the server disable eviction automatically to ensure data consistency in import mode? Or should we let users choose the maxmemory-policy? I guess no one will choose an option other than
@valkey-io/core-team WDYT? Any suggestions would be greatly appreciated.
The core issue is that for the target node of data migration that is active, we need to provide normal read and write services. However, we don't want to affect the data being migrated and can't identify which data is being migrated. Method 3 seems to be a relatively suitable option, but of course, it requires users to ensure they do not perform write operations on the data being migrated. |
I agree method 3 seems to be the most suitable. The node behaves like when reading from a replica, a writable replica if write commands are used. |
When we sync data from the source Valkey to the destination Valkey using a sync tool like redis-shake, the destination Valkey thinks it is a primary and can perform expiration and eviction, which may cause data corruption. This problem has been discussed in redis/redis#9760 (reply in thread) and Redis already has a solution. But in Valkey we haven't fixed it yet.
For example, we call set key 1 ex 1 on the source server and transfer this command to the destination server. Then we call incr key on the source server before the key expires, so we have a key on the source server with a value of 2. But when the command arrives at the destination server, the key may have expired and been deleted. So we end up with a key on the destination server with a value of 1, which is inconsistent with the source server.
In standalone mode, we can use a writable replica to simplify the sync process. However, in cluster mode, we still need a sync tool to help us transfer the source data to the destination. The sync tool usually works as a normal client and the destination works as a primary, which keeps performing expiration and eviction.
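The set/incr race described above can be reproduced with a small model. This is only a sketch under stated assumptions: toy_key, toy_set_ex and toy_incr are hypothetical names, time is passed in explicitly as milliseconds, and the expire_on_access flag stands in for "the node behaves like a normal primary" (true) versus "passive expiration is suppressed for this client" (false).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-key store; not Valkey's data structures. */
typedef struct {
    bool exists;
    long value;
    long expire_at_ms; /* absolute expiry time, -1 means no TTL */
} toy_key;

/* Models: SET key <value> EX <ttl_seconds>, executed at now_ms. */
void toy_set_ex(toy_key *k, long value, long ttl_seconds, long now_ms) {
    assert(k != 0 && ttl_seconds > 0);
    k->exists = true;
    k->value = value;
    k->expire_at_ms = now_ms + ttl_seconds * 1000;
}

/* Models: INCR key, executed at now_ms.
 * With expire_on_access set, an expired key is deleted before the
 * increment, so INCR recreates it from 0 and the result is 1. */
long toy_incr(toy_key *k, long now_ms, bool expire_on_access) {
    assert(k != 0);
    if (k->exists && expire_on_access && k->expire_at_ms != -1 &&
        now_ms > k->expire_at_ms) {
        k->exists = false; /* passive expiration deletes the key */
    }
    if (!k->exists) {
        k->exists = true;
        k->value = 0;
        k->expire_at_ms = -1;
    }
    k->value += 1;
    return k->value;
}
```

For example, if the source runs set at 0 ms and incr at 900 ms, it ends with 2. If the destination applies set at 50 ms and the incr only arrives at 1200 ms, it ends with 1 when expire_on_access is true, and with 2 when it is false.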
In this PR, we add a new mode named 'import-mode'. In this mode, the server stops expiration and eviction just like a replica. Notice that this mode exists only in the sync state to avoid data inconsistency caused by expiration and eviction. A server in import mode can't turn into a real replica by replicaof or cluster replicate, and vice versa. Sync tools can mark their clients as an import source with CLIENT IMPORT-SOURCE; such clients work like a client from a primary and can visit expired keys in lookupkey.
Any better idea?
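As a rough illustration of the rules in the description, the decision for a logically expired key could look like the sketch below. All names are hypothetical, and one branch is an assumption rather than something the description states: for ordinary clients in import mode, the sketch hides the expired key without deleting it, the way a replica does for reads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flags; the real server and client structs are far larger. */
typedef struct { bool import_mode; } toy_server_cfg;
typedef struct { bool import_source; } toy_client;

typedef enum {
    TOY_KEY_DELETED,           /* normal primary: passive expiry removes it */
    TOY_KEY_VISIBLE,           /* import-source client still sees the key */
    TOY_KEY_HIDDEN_NOT_DELETED /* assumption: replica-like read behavior */
} toy_lookup_outcome;

/* What happens when a command touches a key whose TTL has passed. */
toy_lookup_outcome toy_lookup_expired_key(const toy_server_cfg *srv,
                                          const toy_client *c) {
    assert(srv != NULL && c != NULL);
    if (!srv->import_mode) return TOY_KEY_DELETED;
    if (c->import_source) return TOY_KEY_VISIBLE;
    return TOY_KEY_HIDDEN_NOT_DELETED;
}

/* Active expiration in cron is prohibited while importing. */
bool toy_active_expire_enabled(const toy_server_cfg *srv) {
    assert(srv != NULL);
    return !srv->import_mode;
}
```

Keeping the decision keyed on both the server mode and the client flag is what lets the sync tool's writes land on the original key while other clients are never served stale data.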