-
Notifications
You must be signed in to change notification settings - Fork 637
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor return and goto statements #945
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## unstable #945 +/- ##
============================================
- Coverage 70.69% 70.64% -0.05%
============================================
Files 114 114
Lines 61693 61735 +42
============================================
+ Hits 43611 43613 +2
- Misses 18082 18122 +40
|
src/replication.c
Outdated
@@ -3541,7 +3531,7 @@ void syncWithPrimary(connection *conn) { | |||
clearFailoverState(); | |||
} else { | |||
abortFailover("Failover target rejected psync request"); | |||
return; | |||
goto cleanup; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no cleanup needed in line 3524?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should consider refactoring this code further by breaking out the different if
paths so each function, including fullSyncWithPrimary
, is dealing with a single concern. What do you think about a skeleton function like this?
static void fullSyncWithPrimary(connection *conn) {
char *err = NULL;
serverAssert(conn == server.repl_rdb_transfer_s);
if (server.repl_state == REPL_STATE_NONE) goto error;
if (connGetState(conn) != CONN_STATE_CONNECTED) {
serverLog(LL_WARNING, "Error condition on socket for dual channel replication: %s", connGetLastError(conn));
goto error;
}
switch (server.repl_rdb_channel_state) {
case REPL_DUAL_CHANNEL_SEND_HANDSHAKE:
handleHandshake(conn, &err); /* TO BE FILLED OUT */
break;
case REPL_DUAL_CHANNEL_RECEIVE_AUTH_REPLY:
if (server.primary_auth) {
handleAuthReply(conn, &err); /* TO BE FILLED OUT */
} else {
server.repl_rdb_channel_state = REPL_DUAL_CHANNEL_RECEIVE_REPLCONF_REPLY;
}
break;
case REPL_DUAL_CHANNEL_RECEIVE_REPLCONF_REPLY:
handleReplConfReply(conn, &err); /* TO BE FILLED OUT */
break;
case REPL_DUAL_CHANNEL_RECEIVE_ENDOFF:
handleEndOffsetResponse(conn, &err); /* TO BE FILLED OUT */
break;
default:
break;
}
if (err) goto error;
return;
error:
if (err) sdsfree(err);
connClose(conn);
server.repl_transfer_s = NULL;
if (server.repl_rdb_transfer_s) {
connClose(server.repl_rdb_transfer_s);
server.repl_rdb_transfer_s = NULL;
}
if (server.repl_transfer_fd != -1) close(server.repl_transfer_fd);
server.repl_transfer_fd = -1;
server.repl_state = REPL_STATE_CONNECT;
replicationAbortDualChannelSyncTransfer();
}
Started with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall. My feedback is all on fit-n-finish.
src/replication.c
Outdated
int ret; | ||
|
||
switch (server.repl_state) { | ||
case REPL_STATE_SEND_HANDSHAKE: ret = dualChannelReplicationMainConnSendHandshake(conn, &err); break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this naming scheme but can we shorten it a bit? It is a bit mouthful :)
case REPL_STATE_SEND_HANDSHAKE: ret = dualChannelReplicationMainConnSendHandshake(conn, &err); break; | |
case REPL_STATE_SEND_HANDSHAKE: ret = dualChanReplMainConnSendHandshake(conn, &err); break; |
src/replication.c
Outdated
/* Replication: Replica side. | ||
* This connection handler fires after rdb-connection was initialized. We use it | ||
* to adjust the replica main for loading incremental changes into the local buffer. */ | ||
void setupMainConnForPsync(connection *conn) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about prefixing this function with dualChan
as well?
void setupMainConnForPsync(connection *conn) { | |
void dualChanSetupMainConnForPsync(connection *conn) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I just don't like the abbreviation Channel => Chan, I don't think it is very clean.
src/replication.c
Outdated
switch (server.repl_state) { | ||
case REPL_STATE_SEND_HANDSHAKE: ret = dualChannelReplicationMainConnSendHandshake(conn, &err); break; | ||
case REPL_STATE_RECEIVE_CAPA_REPLY: | ||
ret = dualChannelReplicationMainConnReceiveCapaReply(conn, &err); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ret = dualChannelReplicationMainConnReceiveCapaReply(conn, &err); | |
ret = dualChanReplMainConnRecvCapaReply(conn, &err); |
src/replication.c
Outdated
sdsfree(err); | ||
err = NULL; | ||
/* fall through */ | ||
case REPL_STATE_SEND_PSYNC: ret = dualChannelReplicationMainConnSendPsync(conn, &err); break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
case REPL_STATE_SEND_PSYNC: ret = dualChannelReplicationMainConnSendPsync(conn, &err); break; | |
case REPL_STATE_SEND_PSYNC: ret = dualChanReplMainConnSendPsync(conn, &err); break; |
src/replication.c
Outdated
err = NULL; | ||
/* fall through */ | ||
case REPL_STATE_SEND_PSYNC: ret = dualChannelReplicationMainConnSendPsync(conn, &err); break; | ||
case REPL_STATE_RECEIVE_PSYNC_REPLY: ret = dualChannelReplicationMainConnReceivePsyncReply(conn, &err); break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
case REPL_STATE_RECEIVE_PSYNC_REPLY: ret = dualChannelReplicationMainConnReceivePsyncReply(conn, &err); break; | |
case REPL_STATE_RECEIVE_PSYNC_REPLY: ret = dualChanReplMainConnRecvPsyncReply(conn, &err); break; |
src/replication.c
Outdated
@@ -55,6 +55,7 @@ int cancelReplicationHandshake(int reconnect); | |||
void replicationSteadyStateInit(void); | |||
void setupMainConnForPsync(connection *conn); | |||
void dualChannelSyncHandleRdbLoadCompletion(void); | |||
static void fullSyncWithPrimary(connection *conn); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is used in the dual-chan replication path only so I would think it makes more sense to prefix it too
static void fullSyncWithPrimary(connection *conn); | |
static void dualChanFullSyncWithPrimary(connection *conn); |
UNUSED(err); | ||
if (server.debug_pause_after_fork) debugPauseProcess(); | ||
if (replicaTryPartialResynchronization(conn, 0) == PSYNC_WRITE_ERROR) { | ||
serverLog(LL_WARNING, "Aborting dual channel sync. Write error."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
connection
has last_errno
. can we return an error string here to caller?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the context of this specific function you are right, but generally while error strings can be useful for logging purposes, relying solely on them may not be the best approach. Based on my observations, there's no guarantee that the error string will always be non-NULL when error accures. Moreover last_errno
could potentially be overridden by unrelated code changes, making it less reliable over time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the context of this specific function you are right, but generally while error strings can be useful for logging purposes, relying solely on them may not be the best approach.
Right. Logging is my intent here too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's no guarantee that the error string will always be non-NULL when error accures
I think dualChannelSetupMainConnForPsync
requires a non-NULL err
when C_ERR
is returned - https://github.com/valkey-io/valkey/pull/945/files#diff-955c0fee32b574a1d9adb4540b8ff1564b15db9695d88a8dabc66b3c19c825daR3298
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, fixing now
src/replication.c
Outdated
} | ||
|
||
int dualChannelReplicationMainConnReceivePsyncReply(connection *conn, char **err) { | ||
UNUSED(err); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this function can't fail, I would suggest we null out err
so we stick to a more intuitive contract, which is null err
means success. Let's not leave the err
state undefined (or decided by the caller).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like the idea of relying on the error string. From my observation, it is not maintained for every error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this is another inconsistency in the existing code. Functions like sendCommand
return a non-NULL error string to indicate failures already. I am fine with treating an error string returned via an outgoing parameter as "informational" and for "logging purpose only" hence optional. This should help ensure consistency.
+1 on refactoring |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @naglera! I think we are very close.
src/replication.c
Outdated
serverLog(LL_WARNING, "Sending command to primary in dual channel replication handshake: %s", *err); | ||
return C_ERR; | ||
} | ||
server.repl_rdb_channel_state = REPL_DUAL_CHANNEL_RECEIVE_AUTH_REPLY; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to see us pulling all state changes into the driver function, i.e., dualChannelFullSyncWithPrimary
in this case. This way it is very easy to see all the state transitions together. I will suggest some changes in dualChannelFullSyncWithPrimary
next
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok nice
src/replication.c
Outdated
int ret; | ||
|
||
switch (server.repl_state) { | ||
case REPL_STATE_SEND_HANDSHAKE: ret = dualChannelReplMainConnSendHandshake(conn, &err); break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what I meant by consolidating state manipulations in the "driver" function and then all the "worker" functions (such as dualChannelReplMainConnRecvCapaReply
) become "stateless".
case REPL_STATE_SEND_HANDSHAKE: ret = dualChannelReplMainConnSendHandshake(conn, &err); break; | |
case REPL_STATE_SEND_HANDSHAKE: | |
ret = dualChannelReplMainConnSendHandshake(conn, &err); | |
if (ret != C_ERR) server.repl_state = REPL_STATE_RECEIVE_CAPA_REPLY; | |
/* else server.repl_state = XXX; // if it makes sense */ | |
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good
src/replication.c
Outdated
@@ -3415,7 +3447,7 @@ void syncWithPrimary(connection *conn) { | |||
if (err) goto write_error; | |||
|
|||
server.repl_state = REPL_STATE_RECEIVE_AUTH_REPLY; | |||
return; | |||
goto cleanup; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want to move these changes to the PR that refactors syncWithPrimary
too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I will revert it
UNUSED(err); | ||
if (server.debug_pause_after_fork) debugPauseProcess(); | ||
if (replicaTryPartialResynchronization(conn, 0) == PSYNC_WRITE_ERROR) { | ||
serverLog(LL_WARNING, "Aborting dual channel sync. Write error."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the context of this specific function you are right, but generally while error strings can be useful for logging purposes, relying solely on them may not be the best approach.
Right. Logging is my intent here too.
src/replication.c
Outdated
} | ||
|
||
int dualChannelReplicationMainConnReceivePsyncReply(connection *conn, char **err) { | ||
UNUSED(err); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this is another inconsistency in the existing code. Functions like sendCommand
return a non-NULL error string to indicate failures already. I am fine with treating an error string returned via an outgoing parameter as "informational" and for "logging purpose only" hence optional. This should help ensure consistency.
src/replication.c
Outdated
return C_OK; | ||
} | ||
|
||
#define C_RETRY -2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we move this to where C_OK/C_ERR is defined?
src/replication.c
Outdated
|
||
error: | ||
sdsfree(err); | ||
connClose(conn); | ||
server.repl_transfer_s = NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a bug to me. It seems the two connection objects are getting mixed up. We should zero out the RDB channel connection here, i.e., server.repl_rdb_transfer_s = NULL
. Then on L2726/L2747, we should be testing the normal channel connection instead, i.e., if (server.repl_transfer_s)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah we are trying to close repl_transfer_s
twice 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we fix #1088 in a separate PR first? I think we will need to backport it to Valkey 8.0
src/replication.c
Outdated
} | ||
switch (server.repl_rdb_channel_state) { | ||
case REPL_DUAL_CHANNEL_SEND_HANDSHAKE: | ||
ret = dualChannelReplHandleHandshake(conn, &err); | ||
server.repl_rdb_channel_state = REPL_DUAL_CHANNEL_RECEIVE_AUTH_REPLY; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should only update the state upon successful returns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It shouldn't matter because in case of C_ERR, we will reset the state, but I agree that it makes more sense for the reader. I'll fix it.
003291f
to
5dfb259
Compare
src/replication.c
Outdated
|
||
if (connSetReadHandler(conn, dualChannelFullSyncWithPrimary) == C_ERR) { | ||
char conninfo[CONN_INFO_LEN]; | ||
serverLog(LL_WARNING, "Can't create readable event for SYNC: %s (%s)", *err, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems incorrect. *err
is not assigned at this point. we should revert back to strerror(errno)
I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right err was not assigned in this case. will fix
sdsfree(err); | ||
connClose(conn); | ||
server.repl_transfer_s = NULL; | ||
if (server.repl_transfer_s) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider adding sdsfree(err)
after the error:
label as well so that if we ever goto error
in the future with a non-NULL err
, we won't leak memory accidentally.
/* AUTH with the primary if required. */ | ||
switch (server.repl_rdb_channel_state) { | ||
case REPL_DUAL_CHANNEL_SEND_HANDSHAKE: | ||
ret = dualChannelReplHandleHandshake(conn, &err); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just realize that we never consume err
in dualChannelFullSyncWithPrimary
, whose only job when it comes to err
is to free the memory. Was a logging statement missed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're correct, we used to log it in case of an error, but after the refactor we're already logging it inside the helper functions, so there's no need to log it inside dualChannelFullSyncWithPrimary
anymore. That being said, I believe we should retain the error deallocation logic within dualChannelFullSyncWithPrimary
for the sake of maintaining simplicity in our code structure.
UNUSED(err); | ||
if (server.debug_pause_after_fork) debugPauseProcess(); | ||
if (replicaTryPartialResynchronization(conn, 0) == PSYNC_WRITE_ERROR) { | ||
serverLog(LL_WARNING, "Aborting dual channel sync. Write error."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's no guarantee that the error string will always be non-NULL when error accures
I think dualChannelSetupMainConnForPsync
requires a non-NULL err
when C_ERR
is returned - https://github.com/valkey-io/valkey/pull/945/files#diff-955c0fee32b574a1d9adb4540b8ff1564b15db9695d88a8dabc66b3c19c825daR3298
src/replication.c
Outdated
break; | ||
default: | ||
serverLog(LL_WARNING, "Unexpected replication state: %d", server.repl_state); | ||
ret = C_ERR; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing an assignment to err
? This branch if hit will break the assumption on L3298.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, it will break the log at L3298, we should be able to handle an empty err
.
Consolidate the cleanup of local variables to a single point within the method, ensuring proper resource management and preventing memory leaks or double-free issues. Signed-off-by: naglera <[email protected]>
Signed-off-by: naglera <[email protected]>
Signed-off-by: naglera <[email protected]>
Signed-off-by: naglera <[email protected]>
renaming and refactor Signed-off-by: naglera <[email protected]>
Co-authored-by: Ping Xie <[email protected]> Signed-off-by: Amit Nagler <[email protected]> Signed-off-by: naglera <[email protected]>
all worker functions should be stateless err should be type sds endoffset response can be empty Signed-off-by: naglera <[email protected]>
This reverts commit 3c0933497fea5d84860d24f2864a9c14ffa991bf. Signed-off-by: naglera <[email protected]>
Co-authored-by: Ping Xie <[email protected]> Signed-off-by: Amit Nagler <[email protected]> Signed-off-by: naglera <[email protected]>
Signed-off-by: naglera <[email protected]>
Signed-off-by: naglera <[email protected]>
Signed-off-by: naglera <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I am running out of comments ... if you could resolve the last few comments, let's merge this PR. It is great refactoring.
src/replication.c
Outdated
ret = dualChannelReplHandleEndOffsetResponse(conn, &err); | ||
if (ret == C_OK) server.repl_rdb_channel_state = REPL_DUAL_CHANNEL_RDB_LOAD; | ||
break; | ||
default: break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this is unreachable, we should serverPanic
server.repl_state = REPL_STATE_TRANSFER; | ||
/* In case the RDB is already loaded, the repl_state will be set during establishPrimaryConnection. */ | ||
break; | ||
default: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is a logic bug if we ever get here. should we serverPanic
instead?
src/replication.c
Outdated
sdsfree(err); | ||
if (ret == C_ERR) goto error; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should give the error handling logic a chance to examine the err string. whether or not the error handling logic needs to consume the string is a separate concern.
sdsfree(err); | |
if (ret == C_ERR) goto error; | |
if (ret == C_ERR) goto error; | |
sdsfree(err); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right so we should change the order to
if (ret == C_ERR) goto error;
sdsfree(err);
Also maybe we better log the error string here as well just in case we didn't log it earlier (better log twice then miss it).
Signed-off-by: Amit Nagler <[email protected]>
Co-authored-by: Ping Xie <[email protected]> Signed-off-by: Amit Nagler <[email protected]> Signed-off-by: naglera <[email protected]>
Signed-off-by: naglera <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks @naglera!
Consolidate the cleanup of local variables to a single point within the method, ensuring proper resource management and p reventing memory leaks or double-free issues. Previoslly descused here: - valkey-io#60 (comment) - valkey-io#60 (comment) --------- Signed-off-by: naglera <[email protected]> Signed-off-by: Amit Nagler <[email protected]> Co-authored-by: Ping Xie <[email protected]>
Consolidate the cleanup of local variables to a single point within the method, ensuring proper resource management and p
reventing memory leaks or double-free issues.
Previoslly descused here: