-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add crc32c-checksum verification on message header-payload #43
Conversation
*/ | ||
protected boolean updateChecksumIfRequire(OpSendMsg op) { | ||
|
||
if (op.cmd instanceof DoubleByteBuf) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, to avoid creation of new ByteBuf we modify same DoubleByteBuf of the message with newly computed checksum.
However, while message creation if we see memory-leak then we create SimpleLeakAwareByteBuf
or AdvancedLeakAwareByteBuf
(based on ResourceLeak Level
) instead DoubleByteBuf
. So, should we keep this check.
CLA is valid! |
@@ -51,7 +51,7 @@ message MessageMetadata { | |||
optional CompressionType compression = 8 [default = NONE]; | |||
optional uint32 uncompressed_size = 9 [default = 0]; | |||
// XXHash64 checksum of the original message payload | |||
optional sfixed64 checksum = 10; | |||
//optional sfixed64 checksum = 10; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add comment to mention this field was removed in favor of header+payload checksum
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure.. mentioned in comment.
@@ -406,18 +428,23 @@ private static ByteBuf serializeWithSize(BaseCommand.Builder cmdBuilder) { | |||
return buf; | |||
} | |||
|
|||
private static ByteBuf serializeCommandSendWithSize(BaseCommand.Builder cmdBuilder, MessageMetadata msgMetadata, | |||
private static ByteBuf serializeCommandSendWithSize(BaseCommand.Builder cmdBuilder, boolean includeChecksum, MessageMetadata msgMetadata, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prefer an enum value instead of boolean to make it easier to read where the function is called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure. added enum ChecksumType
instead boolean
if (includeChecksum) { | ||
headers.writeShort(magicCrc32c); | ||
checksumReaderIndex = headers.writerIndex(); | ||
headers.writeZero(4); // write dummy checksum int to skip 4 bytes in write index |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can just move the writerIndex 4 bytes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes.. made change to skip 4 bytes in writerIndex
CommandSendError sendError = sendErrorBuilder.build(); | ||
ByteBuf res = serializeWithSize(BaseCommand.newBuilder().setType(Type.SEND_ERROR).setSendError(sendError)); | ||
sendErrorBuilder.recycle(); | ||
sendError.recycle(); | ||
return res; | ||
} | ||
|
||
|
||
public static Long readChecksum(ByteBuf buffer) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to have 2 methods :
boolean hasChecksum(ByteBuf headersAndPayload);
int readChecksum(ByteBuf headersAndPayload);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
introduced hasChecksum()
again.
headers.retain(); | ||
payload.retain(); | ||
headers.readerIndex(metadataReaderIndex); | ||
ByteBuf msgMetadataBuf = DoubleByteBuf.get(headers, payload); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of using a DoubleByteBuf
, wouldn't it be easier to compute the checksum incrementally?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually there are two things:
- Circe java implementation doesn't have incremental checksum support.
public int resume(int current, long address, long length) {
throw new UnsupportedOperationException();
}
- and client side: payload is type of
UnpooledHeapByteBuf
. So, we can't usememoryAddress
so, java implementation also doesn't have incremental checksum using array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After #44 gets merged I guess we should make it work with incremental checksum for all kinds of buffers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
refactored Code which avoids creation of new ByteBuf and computes checksum on same DoubleByteBuf
which doesn't need incremental checksum
as well.
* @param op | ||
* @return isUpdated: returns true only if checksum is updated in {@link OpSendMsg} | ||
*/ | ||
protected boolean updateChecksumIfRequire(OpSendMsg op) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Method name is a bit awkward. What about verifyLocalBufferIsNotCorrupted()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uhm, in any case it should not "update" the checksum.. If the checksum doesn't match we just need to give error to the application
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
renamed method name as verifyLocalBufferIsNotCorrupted
and it just verifies checksum, if it is different then fails the callback else we will retry send-message again.
msg.resetReaderIndex(); | ||
} | ||
} else { | ||
log.warn("[{}] [{}] Memory leak detected while creating message with id {}", topic, producerName, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When can this happen? And why would that be a memory leak?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have seen it before in past. I will try to see if I can reproduce.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems the very first object of DoubleByteBuf is created as SimpleLeakAwareByteBuf
or AdvancedLeakAwareByteBuf
based on ResourceLeak Level
. And it can be reproduced by starting any test-case and put the break-point at Commands where we create DoubleByteBuf
. and at very first time it will create LeakAwareByteBuf
.
It is because of following code which netty has:
if ((leakCheckCnt ++ & mask) == 0) {
reportLeak(level);
return new DefaultResourceLeak(obj);
} else {
return null;
}
At very first time leakCheckCnt
will be 0 and it reports leak and it cause to create LeakAwareByteBuf
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
merged fixed at netty
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the leak is a false positive. The commit in netty doesn't change the substance of the leak detector. By default, when leak detection level is simple, it will pick 1 out 100 allocated buffers and instrument it for leak detection.
Whether you start picking the 1st buffer or the 101st, only changes when you're going to detect the leak.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that you've fixed the incremental crc part, we should get rid of the DoubleByteBuf here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes.. with netty leak-detection my only concern was that there could be a possibility where we create LeakAwareByteBuf
and that can fail DoubleByteBuf
casting.
- even with use of incremental-checksum, we need two buffers (b1, b2) that present into created
DoubleByteBuf
and to retrieve those buffers, we might have to cast intoDoubleByteBuf
. so, we may not be able to get rid ofDoubleByteBuf
right.?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, yes.. we can avoid DoubleByteBuf
* | ||
* @param op | ||
*/ | ||
private static void removeChecksum(OpSendMsg op) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stripChecksum()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure. Renamed method name as stripChecksum()
temp.skipBytes(cmdSize); | ||
boolean hasChecksum = Commands.readChecksum(temp) != null; | ||
|
||
if (hasChecksum) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (!hasChecksum) {
return;
}
// strip the checksum
....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure. addressed this.
int msgBufSize = op.cmd.readableBytes(); | ||
// ByteBuf can't use readBytes() to same buffer as it always requires readerIndex < writerIndex while writing | ||
// into buffer. So, creating new temp ByteBuf to copy data without checksum. | ||
ByteBuf temp = op.cmd.alloc().buffer(msgBufSize, msgBufSize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be able to avoid the copy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
modifying same buf to strip checksum without creating new buf.
fe5bdb2
to
dfe114e
Compare
ffdec7c
to
65388ee
Compare
f858b05
to
3b64d60
Compare
int checksum = readChecksum(headerFrame).intValue(); | ||
// msg.readerIndex is already at header-payload index, Recompute checksum for headers-payload | ||
int metadataChecksum = computeChecksum(headerFrame); | ||
long computedChecksum = resumeChecksum(metadataChecksum, msg.getSecond()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
merged change with incremental-checksum computation.
if (sequenceId == expectedSequenceId) { | ||
boolean corrupted = !verifyLocalBufferIsNotCorrupted(op); | ||
if (corrupted) { | ||
op.callback.sendComplete( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the op
was just peeked from the queue but not actually removed here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes.. removing and cleaning op from queue after failing callback.
0dc4f28
to
02c6693
Compare
@@ -81,6 +88,8 @@ | |||
|
|||
private static final AtomicLongFieldUpdater<ProducerImpl> msgIdGeneratorUpdater = AtomicLongFieldUpdater | |||
.newUpdater(ProducerImpl.class, "msgIdGenerator"); | |||
// it prevents client to compute checksum and adding into payload | |||
private static boolean checksumEnabled = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this flag? Shouldn't we enabled the checksum always?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it was part of out rollout plan.. we wanted to enable this feature in two phases: rollout broker first and later on enable at client-side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Client will already check for the v6
protocol version, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addressed it..
if (checksum == computedChecksum) { | ||
return true; | ||
} else { | ||
log.error("[{}] [{}] Failed to verify checksum", topic, producerName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be able to include message id as well at this point
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, at this point we haven't persisted the message so, we don't have message-id so, we are not logging message-id.
private final static IncrementalIntHash CRC32C_HASH; | ||
|
||
static { | ||
if (Sse42Crc32C.isSupported()) { | ||
CRC32C_HASH = new Crc32cSse42Provider().getIncrementalInt(CRC32C); | ||
log.info("SSE4.2 CRC32C provider initialized"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this to debug level
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, as it logs only once when broker starts. So, can we keep it "INFO" initially to get confirmation about broker loaded library successfully and it is not computing checksum using slower-software-algo .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though, this will all print in client lib logs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure. changed log-level to debug
.
However, client-lib should also print it only once, and I think, it would be great if user has transparency to know which version (hw/sw) of checksum is being used by app.
} | ||
} | ||
// close connection and let producer resend pending-messages | ||
cnx.ctx().close(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we resend messages without closing the connection?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes.. actually we can do resendMessages(cnx);
to resend without closing/disturbing connection.
cfd3480
to
e78c8b9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@rdhabalia This looks good to go. Can you rebase to resolve the conflict in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
statsIntervalInSeconds can be 0 to disable log
Signed-off-by: xiaolong.ran [email protected] * Support batch logic for project * add unit test case of event time * add some unit tests case for producer * fix error result type * add unit test case of producer flush * add receiver queue size test logic * support partition consumer receive async * add unit test case of ack timeout * Fix consumer receiving message out of order
) Signed-off-by: tison <[email protected]>
Motivation
XXHashChecksum
algorithm to compute checksum andSSE4.2CRC32C checksum
uses machine-instruction which is faster thanXXHashChecksum
.Modifications
XXHashChecksum
withSSE4.2CRC32c checksum
.checksum: magicByte + checksum-value
fields with in message-command.Result
Client and Broker can do
SSE4.2CRC32c checksum
to identify message corruption.