[vrf]: Fix freezing during interface binding (#6209) #1325
This pull request introduces 1 alert when merging 7d6cf0466cb756fd9428935655120ba2c624d76f into becf5b5 - view on LGTM.com new alerts:
|
@tylerlinp, can you take a look at the PR? Is it caused by the change in get_all introduced as part of #1119? |
@prsunny, I think it should be the return of |
Dear @prsunny, @tylerlinp, I have already investigated the get_all function, and there is no difference between new and old Python. In #6209 I commented on the actual state of the records in STATE_DB that I observed while executing the bind command. It seems the "INTERFACE_TABLE|EthernetXX" record is deleted completely during binding, which causes the issue. I think the handling of config_db.set_entry(table_name, interface_name, None) has changed, so the "INTERFACE_TABLE|EthernetXX" record is now deleted. |
@maksymbelei95 I do not fully understand. Do you mean that set_entry with None now adds the entry instead of deleting it? Here is the workflow of the bind operation; please help by getting the status of config_db and state_db.
|
Dear @tylerlinp, Currently, the situation with config_db and state_db while performing
I am sorry, but I am not sure which consumer/producer handles those changes and executes the Redis scripts. A lot of time has passed since I investigated the issue. Previously, the scripts from step 2 were not performed by the system, so I thought that the while loop had been added as a workaround to check that the system handled Maybe, for example, the intfmgrd daemon now handles changes in config_db sequentially, so it no longer loses the DEL operation? |
I think you should read the DB status via redis-cli before the bind command. Previously, interface Ethernet8 did not exist in config_db and state_db. What about now?
Yes, as you said, there are two changes.
I cannot download the log files at the moment, but I think it may be the DEL operations performed by intfmgrd.
No, that goes beyond what it was for. The while loop waits until the real DEL operation has finished. It handles case b) mentioned before, so it has nothing to do with this case, or case a). The system never loses real DEL operations. The workaround handles rebind (unbind->bind) in the bind command: it deletes the existing entry, just like unbind. The loss mentioned in the notes occurred because two consecutive operations were merged in addToSync.
I think intfmgrd has not been changed, while SyncMap/addToSync was changed to hold both DEL and SET, so your conclusion should be right. |
Dear @tylerlinp Thank you for the clarification. |
@maksymbelei95 Yes, we expect To verify it, more testing is needed:
|
@tylerlinp,
Not completely so. exists checks whether the key exists or not, but
Me too. I think the reason why the bind operation was working is that the record in
Yes, Anyway, I will update the PR with |
* Replace 'get_all' with 'exists' in the port state check inside the bind function to avoid freezing in the while loop, which was caused by the absence of the related record in STATE_DB. Signed-off-by: Maksym Belei <[email protected]>
I have updated the PR according to the result of the discussion with @tylerlinp. Could you review it? |
retest this please |
It is clearer now, just like my first guess: the result of |
@tylerlinp, thank you for the review. I will check the execution stack of the get_all function and try to restore the ability to catch exceptions. @prsunny, as @tylerlinp has approved the PR, could I ask you to merge it? |
- What I did
Resolves sonic-net/sonic-buildimage#6209
Freezing of the "sudo config interface vrf bind EthernetXX Vrf_XX" operation has been resolved.
- How I did it
Because the system now deletes the "INTERFACE_TABLE|EthernetXX" record in STATE_DB completely on unbinding or during the preparation stage of bind, there is no need to check for the presence of any keys within the record.
The check on the keys of the "INTERFACE_TABLE|EthernetXX" record in STATE_DB, which caused the freeze when the record was absent, has been replaced with a check for the existence of the entire record.
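
The difference between the two checks can be illustrated with a small, self-contained sketch. It does not use the real SONiC connector: `FakeStateDb` and both wait helpers are hypothetical names, and the sketch assumes that `get_all` returns an empty dict (not `None`) for a missing record, which is one reading of the discussion above. The timeout is also an addition for the demo, so that the old-style loop terminates instead of freezing.

```python
import time


class FakeStateDb:
    """Hypothetical in-memory stand-in for STATE_DB (not the SONiC API)."""

    def __init__(self):
        self._data = {}

    def set(self, key, fields):
        self._data[key] = dict(fields)

    def delete(self, key):
        self._data.pop(key, None)

    def get_all(self, key):
        # Assumption: a missing record yields an empty dict, never None.
        return dict(self._data.get(key, {}))

    def exists(self, key):
        return key in self._data


def wait_with_get_all(db, key, timeout=0.5, poll=0.01):
    """Old-style check: loop while get_all() is not None."""
    deadline = time.monotonic() + timeout
    while db.get_all(key) is not None:
        if time.monotonic() >= deadline:
            return False  # would spin forever without the demo timeout
        time.sleep(poll)
    return True


def wait_with_exists(db, key, timeout=0.5, poll=0.01):
    """New-style check: loop while the whole record still exists."""
    deadline = time.monotonic() + timeout
    while db.exists(key):
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
    return True


if __name__ == "__main__":
    db = FakeStateDb()
    key = "INTERFACE_TABLE|Ethernet36"
    # Record already absent: only the exists-based wait returns promptly.
    print("get_all wait finished:", wait_with_get_all(db, key, timeout=0.05))
    print("exists wait finished:", wait_with_exists(db, key))
```

Under that assumption, the `get_all` comparison can never become false once the record is gone, while the `exists` check leaves the loop as soon as the record is absent.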
- How to verify it
Perform the next steps:
After the steps above, the "show vrf" command should show that Ethernet36 is bound to Vrf_custom.