-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sysready #8889
sysready #8889
Conversation
This pull request introduces 12 alerts when merging 3068490182bd487229aa451538852924019e1815 into 3e397ce - view on LGTM.com new alerts:
|
This pull request introduces 2 alerts when merging 5c28c6d4c91ce8c4a76ce3bd5b8d6f8fd9a453ba into df6361f - view on LGTM.com new alerts:
|
This pull request introduces 1 alert when merging e5bd0e9ee56db3a56abf110651a3c3b81d202606 into df6361f - view on LGTM.com new alerts:
|
This pull request introduces 1 alert when merging d820b985ef04b9922eaae828595743cef224e015 into df6361f - view on LGTM.com new alerts:
|
d820b98
to
4053ebb
Compare
This pull request introduces 1 alert when merging 4053ebbe5e6a4d1887723762ddd22675a1cb7640 into 7d40384 - view on LGTM.com new alerts:
|
4053ebb
to
36f9aad
Compare
This pull request introduces 1 alert when merging 36f9aad2a24d409f9dfc743b55b039a9c81777b2 into b9366f3 - view on LGTM.com new alerts:
|
36f9aad
to
c5ed18d
Compare
c5ed18d
to
0286c09
Compare
retest this please |
/azpw run |
0286c09
to
46cbef9
Compare
This pull request introduces 1 alert when merging 46cbef94a90d3e19214ff13f75ab6d8d33608f40 into 3788294 - view on LGTM.com new alerts:
|
46cbef9
to
fd11a1e
Compare
This pull request introduces 1 alert when merging fd11a1e4804cb8d49a2f854b6c40ae54c8349ff6 into 51c9c98 - view on LGTM.com new alerts:
|
key "name"; | ||
|
||
leaf name { | ||
description "host feature name in host Feature table"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you give some examples of host features like hostcfgd, caclmgrd...etc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Included the example in description now.
"high_mem_alert": "disabled" | ||
"high_mem_alert": "disabled", | ||
{%- if feature in ["bgp", "swss", "pmon", "nat", "teamd", "dhcp_relay", "sflow", "l2mcd", "udld", "stp", "snmp", "lldp", "radv", "iccpd", "syncd", "vrrp", "mgmt-framework", "tam"] %} | ||
"check_up_status" : "false" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this initialization, cant we assume if check_up_status field is not present, it's false?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is part of framework provision for dockers to set "true" against "check_up_status" flag.
For now, all these dockers check_up_status is set to 'false'. Later once these dockers implement the logic of marking the up_status flag in STATE_DB, can set it to 'true' eventually.
sysready_lock.unlock() | ||
except Exception as e: | ||
logger.log_error( str(e)) | ||
time.sleep(2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to periodically check the status every 2 secs? what's the CPU usage for this service?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sleep(2) is invoked only in case of exception.
There is no periodic check.
As the design is on event based framework, the logic is to check the status of a service unit that was received in the queue. It will not hog the cpu.
System Ready UT log file uploaded. |
fd11a1e
to
3135d67
Compare
This pull request introduces 1 alert when merging 3135d67 into 0290207 - view on LGTM.com new alerts:
|
Venkat approved and all build check passed. Merge it |
to note, this pr has been reverted. per our discussion, i think this pr need mellanox folks to review and approve. |
@zhangyanzhao as we have comments on the HLD itself i suggest to revert this PR as well. |
Add yang label since it has yang related change |
@@ -0,0 +1,691 @@ | |||
#!/usr/bin/python3 | |||
|
|||
from datetime import datetime |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to package it into a proper python package, need unit test to validate the service.
spl_srv_list= ['database-chassis', 'gbsyncd'] | ||
core_srv_list = [ | ||
'swss.service', | ||
'bgp.service', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i am not sure bgp.service is a core service, we need to check how this interack with application extension framework.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will probably need to define whether a docker is a core or not in the docker's manifest.
import argparse | ||
import multiprocessing as mp | ||
from swsssdk import ConfigDBConnector | ||
from swsssdk import SonicV2Connector |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
swsssdk is on the deprecation path, it cannot be used any more.
|
||
def db_connect(): | ||
try: | ||
st_db = SonicV2Connector() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should all use the swsscommon library.
} | ||
} | ||
} | ||
container HOST_FEATURE { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i am not sure we should introduce such yang model, why not in the feature table, we add a flag to indicate if it is host or container feature.
@@ -51,7 +51,27 @@ | |||
{%- if feature in ["lldp", "pmon", "radv", "snmp", "telemetry"] %} | |||
"set_owner": "kube", {% else %} | |||
"set_owner": "local", {% endif %} {% endif %} | |||
"high_mem_alert": "disabled" | |||
"high_mem_alert": "disabled", | |||
{%- if feature in ["bgp", "swss", "pmon", "nat", "teamd", "dhcp_relay", "sflow", "l2mcd", "udld", "stp", "snmp", "lldp", "radv", "iccpd", "syncd", "vrrp", "mgmt-framework", "tam"] %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dhcp_relay is now an application extension package. We should not have it hardcoded as we may have images with dhcp_relay and without it.
We need to add this new field here - https://github.com/Azure/sonic-utilities/blob/master/sonic_package_manager/service_creator/feature.py#L34.
And we need to add a new manifest variable to control whether "check_up_status" should be up true or false which will also be an indication whether docker implements marking the up_status flag in STATE_DB.
spl_srv_list= ['database-chassis', 'gbsyncd'] | ||
core_srv_list = [ | ||
'swss.service', | ||
'bgp.service', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will probably need to define whether a docker is a core or not in the docker's manifest.
Why I did it
System Ready Feature Implementation as per https://github.com/Azure/SONiC/pull/875/files
At present, there is no mechanism to know that the system is up with all the essential sonic services and also, all the docker apps are ready along with port ready status to start the network traffic. With the asynchronous architecture of SONiC, we will not be able to verify if the config has been applied all the way down to the HW. But we can get the closest up status of each app and arrive at the system readiness.
How I did it
A new python based System monitor framework is introduced to monitor all the essential system host services including docker wrapper services on an event based model and declare the system is ready. This framework gives provision for docker apps to notify its closest up status. CLIs are provided to fetch the current system status and also service running status and its app ready status along with failure reason if any.
How to verify it
"show system status core" click CLI
"show system status all" click CLI
Syslogs for system ready
Which release branch to backport (provide reason below if selected)
Description for the changelog
A picture of a cute animal (not mandatory but encouraged)