Protobuf and etcd upgrade #2997
Force-pushed from 5a16af1 to 2f6c72d
There are a few regressions from the etcd upgrade to look into, and they are very interesting issues. Don't mind me, I'll keep digging. |
Force-pushed from 7d8a93c to f6fa718
@atoulme can you update your branch and update the PR? |
Done - tests running. |
There are failures related to the protobuf changes: https://dev.azure.com/Hyperledger/Fabric/_build/results?buildId=46926&view=logs&j=6b58850f-3858-5a05-33e2-5e41cbf03c4e&t=bddec1cf-ba37-5883-9c3e-fd1e8608f9a1&l=3726 This test case fails with a panic. I am seeing the same issue in my local environment and am currently debugging it. |
@atoulme, the following change should address the panic mentioned in the last comment. Please check by updating this PR.
|
Thanks Param, I have applied your changes. |
Looks like we have 2 errors left, but both pass locally on my laptop. Any ideas? |
I reran the unit-tests job (on the assumption that the problem is related to timing), and the failed test cases passed in the second run. What went wrong in the first run needs a bit of investigation. There is a separate failure that seems to be related to the gRPC message update; we need to update the test case to match the latest message. https://dev.azure.com/Hyperledger/Fabric/_build/results?buildId=47086&view=logs&j=e306c17a-d139-54bf-a475-f5a11259cee7&t=1e3023a5-584f-52f3-49bc-66bd27d27b6d&l=130 |
Yes, I was wondering about this. I fixed the problem now by increasing the caller skip level when zap looks for the caller. |
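For readers unfamiliar with the "caller skip" idea mentioned above, here is a minimal sketch. It does not use zap or any Fabric code; it relies only on the standard library's `runtime.Callers`, and the names `callerName`, `logWrapper`, and `businessLogic` are hypothetical. The point it illustrates: when a logging call goes through an extra wrapper layer, the reported caller is the wrapper unless the skip level is raised by one.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
)

// callerName returns the name of the function found `skip` frames above
// the function that called callerName.
//
//go:noinline
func callerName(skip int) string {
	pcs := make([]uintptr, 1)
	// +2 skips runtime.Callers itself and callerName.
	if runtime.Callers(skip+2, pcs) == 0 {
		return "unknown"
	}
	frame, _ := runtime.CallersFrames(pcs).Next()
	// frame.Function looks like "main.logWrapper"; keep the part after the last dot.
	return frame.Function[strings.LastIndex(frame.Function, ".")+1:]
}

// logWrapper stands in for a hypothetical logging helper that sits between
// application code and the code that looks up the caller.
//
//go:noinline
func logWrapper(skip int) string {
	return callerName(skip)
}

// businessLogic stands in for the application code whose name should
// appear in the log line.
//
//go:noinline
func businessLogic(skip int) string {
	return logWrapper(skip)
}

func main() {
	fmt.Println(businessLogic(0)) // logWrapper: the wrapper is reported
	fmt.Println(businessLogic(1)) // businessLogic: one extra skip finds the real caller
}
```

If a dependency upgrade adds or removes a wrapper layer, the skip level has to move with it, which is presumably why the test expectation changed here.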
What is also missing is an explanation of the nature of the changes. Can you also explain the following?
|
@yacovm at the risk of disappointing you, this is just a straight-up upgrade of etcd and protobuf to newer versions. There are some API changes, especially in the way a node starts and is configured with existing peers. There are some differences in the errors thrown by protobuf. There are no functional changes besides this. I have no clue at all as to whether those changes make this code incompatible with previous revisions, i.e. I have not tried to form a cluster with the code before and after, and I am certainly not the best equipped for that. |
Sure, I am not blaming or implying the work you did is not valuable.
I understood that etcd made a functional change in how it handles snapshots and reconfiguration (specifically, removal of nodes). As per @guoger's comment:
Now, clearly this means that the PR already contains a functional change (in the dependencies). It is up to Fabric to make sure that the functional change in dependencies doesn't translate to a functional change in operations. I'm not sure that the trick that Jay pointed out that etcd is doing (which you also attempt to perform) is done correctly, because:
What if this node was restarted and then it means the
I understand, but I think we need to test it before merging this PR so we will know how to advise users. |
@Param-S I think it works for you just because the latest config block was still relatively "fresh", so it is still in the WAL and has not been garbage collected by a snapshot. If you put many transactions between the last config and the restart of the node, you will see that ApplyConfChange is not called. I made a small test where I submit 100 transactions to force a snapshot that prunes the WAL:
and added a print to ApplyConfChange:
and it doesn't print it after a restart. |
Signed-off-by: Parameswaran Selvam <[email protected]>
@yacovm The current implementation of NewChain reads the ConfState of the latest snapshot and stores it in the chain object (fabric/orderer/consensus/etcdraft/chain.go, line 245 in ccfa8a4).
I have now updated the same flow to also set that value on the node's confstate attribute, so it can be used later in the flow. |
Signed-off-by: Parameswaran Selvam <[email protected]>
Signed-off-by: Parameswaran Selvam <[email protected]>
@yacovm The ConfState is now picked up from the latest snapshot at node initialization time itself, which should address the issue. Could you check and confirm? |
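The restart problem discussed in the last few comments can be sketched with a toy model. This is not the etcd raft API or the Fabric etcdraft chain; every type and function below (`entry`, `snapshot`, `node`, `buildHistory`, `restart`) is hypothetical and exists only to show the reasoning. Once a snapshot prunes the log, replaying the remaining entries never reaches the old membership changes, so the membership has to be seeded from the snapshot's stored state instead.

```go
package main

import "fmt"

// entry is a toy log record. addNode is non-zero only for membership changes.
type entry struct {
	index   uint64
	addNode uint64
}

// snapshot is a toy snapshot: the index it covers and the voters at that point.
type snapshot struct {
	index  uint64
	voters []uint64
}

// node is a toy consensus node that tracks its membership view and how
// often applyConfChange ran.
type node struct {
	voters     []uint64
	applyCalls int
}

func (n *node) applyConfChange(id uint64) {
	n.voters = append(n.voters, id)
	n.applyCalls++
}

// buildHistory adds three nodes, appends normalTxs ordinary entries, takes a
// snapshot at the last index, and returns the snapshot together with the
// pruned log (only entries newer than the snapshot survive).
func buildHistory(normalTxs int) (snapshot, []entry) {
	var log []entry
	var voters []uint64
	for id := uint64(1); id <= 3; id++ {
		log = append(log, entry{index: uint64(len(log) + 1), addNode: id})
		voters = append(voters, id)
	}
	for i := 0; i < normalTxs; i++ {
		log = append(log, entry{index: uint64(len(log) + 1)})
	}
	snap := snapshot{index: uint64(len(log)), voters: voters}
	var pruned []entry
	for _, e := range log {
		if e.index > snap.index {
			pruned = append(pruned, e)
		}
	}
	return snap, pruned
}

// restart rebuilds a node from a snapshot and the surviving log entries.
// With seedFromSnapshot set, membership starts from the snapshot's voters.
func restart(snap snapshot, wal []entry, seedFromSnapshot bool) *node {
	n := &node{}
	if seedFromSnapshot {
		n.voters = append(n.voters, snap.voters...)
	}
	for _, e := range wal {
		if e.addNode != 0 {
			n.applyConfChange(e.addNode)
		}
	}
	return n
}

func main() {
	snap, wal := buildHistory(100)
	replayOnly := restart(snap, wal, false)
	seeded := restart(snap, wal, true)
	fmt.Println(len(replayOnly.voters), replayOnly.applyCalls) // 0 0
	fmt.Println(len(seeded.voters), seeded.applyCalls)         // 3 0
}
```

In both runs `applyConfChange` is never called after the restart, matching the observation above; only the seeded node ends up with the correct membership.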
Looks good, let's get some more eyes on this. |
How many eyes are needed? Can this be merged please? |
At least one more pair of eyes besides mine :-) |
Do you have someone in mind? |
Probably @guoger is best, afterwards maybe @C0rWin or @manish-sethi |
I can do it over the weekend |
@C0rWin any update? |
Sure, any ETA? |
@C0rWin any update please? Are the tests not covering this change well enough? What can we do here? |
@atoulme sorry, it took me some time to review. I ran it locally and the PR seems fine, though it is a bit bigger than I would normally be comfortable merging :)
but given there is no other way, LGTM and thanks
Great, please merge? |
Hi guys, does merging this PR mean the protobuf upgrade is complete, and that PRs which were blocked by protobuf can be reopened for review? Thanks and regards |
@SamYuan1990 Yes, please proceed now! |
|
Type of change
Description
Update protobuf and etcd to their latest versions.
Additional details
This is a reprise of the work on updating protobuf in #2185
Related issues
FAB-18363