[services] Fix Delay Start of SNMP And Telemetry #5211
SNMP and Telemetry services are not critical to switch startup. They also cause fast-reboot to miss its timing requirements. To delay their start, these services are associated with systemd timer units. However, when hostcfgd initiates a service start, it starts the service and not the timer. This PR fixes the issue by starting the timer associated with the systemd unit. signed-off-by: Tamer Ahmed <[email protected]>
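The fix described above can be sketched as follows. This is a minimal illustration, not the actual hostcfgd code: the helper name `build_start_cmds` and the exact command sequence (unmask, enable, start) are assumptions; only the `has_timer` flag and the `feature_suffix` selection come from the diff in this PR.

```python
def build_start_cmds(feature_name, has_timer=False):
    """Return the systemctl commands that bring a feature up.

    Hypothetical helper: when the feature has a timer unit, the commands
    target <feature>.timer so that systemd delays the start; otherwise
    they target <feature>.service directly.
    """
    feature_suffix = "timer" if has_timer else "service"
    unit = "{}.{}".format(feature_name, feature_suffix)
    return [
        "sudo systemctl unmask {}".format(unit),
        "sudo systemctl enable {}".format(unit),
        "sudo systemctl start {}".format(unit),
    ]
```

With `has_timer=True`, the last command becomes `sudo systemctl start snmp.timer` for the `snmp` feature, so the service itself is started later by the timer rather than immediately.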
Just wanted to add a note that we should probably now also check the |
Good point! I was thinking about it last night. However, the rationale is to delay non-critical services during boot in order to meet fast/warm boot time requirements. During config load/reload, we don't have such urgency. What do you think?
If that is the only rationale, and there is no other reason for the delay (race conditions, etc.), then I agree we don't need to be concerned about config load/reload.
@@ -41,12 +41,13 @@ def obfuscate(data):
     return data


-def update_feature_state(feature_name, state):
+def update_feature_state(feature_name, state, has_timer=False):
+    feature_suffix = "timer" if has_timer else "service"
@tahmed-dev I'm wondering whether it would be better to check if a timer unit exists for any service and start it if present. That way it would be dynamic and we wouldn't need to pre-define it in init_cfg.json.j2, which can always break if a new service is added but the init_cfg file is not updated accordingly.
@abdosi that would work as well. The same argument applies to features, though: a feature that is not listed would also break, as it would not be started. I think it is simpler for hostcfgd not to assume any knowledge of systemd internals or of where .service/.timer files live on disk, in case systemd ever relocates those files. After all, this is a one-time configuration and it should be well defined during development.
That said, if you feel strongly about it, please go ahead and put out a PR to that effect.
@abdosi: I made the same suggestion above. I think the current solution (relying on init_cfg.json) is better than explicitly specifying the names of the services which have a .timer file. I'm still open to checking for the presence of a .timer file. The more foolproof and maintenance-free we can make the codebase, the better.
Thanks @tahmed-dev and @jleveque.
I was thinking we could just check the return value of the command below and, based on that, use either .service or .timer:
"sudo systemctl list-unit-files | grep {}.timer".format(feature_name)
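A rough sketch of the check suggested above, assuming Python 3 and a host with systemd. The function names are hypothetical. It matches the unit name exactly instead of using `grep`, which would also match substrings such as `mysnmp.timer`.

```python
import subprocess


def timer_unit_listed(list_unit_files_output, feature_name):
    """Return True if '<feature_name>.timer' appears as a unit file name."""
    wanted = "{}.timer".format(feature_name)
    for line in list_unit_files_output.splitlines():
        fields = line.split()
        # The first column of `systemctl list-unit-files` is the unit file name.
        if fields and fields[0] == wanted:
            return True
    return False


def pick_feature_suffix(feature_name):
    """Pick 'timer' or 'service' by asking systemd (requires a systemd host)."""
    output = subprocess.check_output(
        ["systemctl", "list-unit-files", "--type=timer", "--no-legend", "--no-pager"],
        universal_newlines=True,
    )
    return "timer" if timer_unit_listed(output, feature_name) else "service"
```

Keeping the parsing separate from the `systemctl` call makes the matching logic testable without a running systemd.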
@abdosi Thanks! I did not know about this command.
The only thing holding me back is that it comes with a cost at boot time: running such a check for every service would consume precious CPU cycles in the boot path.
@tahmed-dev: I was also concerned about that downside to checking for the unit files. I guess we could check the runtime of the sudo systemctl list-unit-files | grep ... command to understand how intensive it is. But as above, I'm OK with expecting this information to be added to init_cfg.json -- now, all new services should be added there. It's one location and it's a data file. What I really wanted to avoid (and this implementation does that) is the need to add new service names into various code files if they are exceptions to the norm (e.g., they have a .timer file).
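One way to get a rough number for that runtime, sketched under the assumption of Python 3; `time_command` is a hypothetical helper. A single wall-clock measurement on an idle system will likely understate the cost during boot, when many units compete for CPU.

```python
import subprocess
import time


def time_command(argv):
    """Run argv once and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    # Output is discarded; only the duration matters here.
    subprocess.run(
        argv,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return time.perf_counter() - start
```

On a switch, this could be called as `time_command(["systemctl", "list-unit-files", "--no-pager"])` and repeated a few times to smooth out noise.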
@jleveque As I was discussing with Tamer, one concern I had:
- If in the future we add a timer to an existing service, it is not intuitive to go and update init_cfg.json accordingly.
Also, regarding boot-time performance, we can run this command only once, rather than once per service, and save the state/output.
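The run-once idea could look roughly like this, assuming Python 3 and a systemd host. `parse_timer_units` and `timer_features` are hypothetical names; `functools.lru_cache` stands in for whatever state-saving mechanism hostcfgd would actually use.

```python
import functools
import subprocess


def parse_timer_units(list_unit_files_output):
    """Return the set of names that have a '<name>.timer' unit file."""
    names = set()
    for line in list_unit_files_output.splitlines():
        fields = line.split()
        if fields and fields[0].endswith(".timer"):
            # Strip the '.timer' suffix to recover the feature name.
            names.add(fields[0][: -len(".timer")])
    return names


@functools.lru_cache(maxsize=1)
def timer_features():
    """Query systemd once; later calls reuse the cached result."""
    output = subprocess.check_output(
        ["systemctl", "list-unit-files", "--type=timer", "--no-legend", "--no-pager"],
        universal_newlines=True,
    )
    return frozenset(parse_timer_units(output))
```

Each feature lookup then becomes a set membership test (`feature_name in timer_features()`), so the `systemctl` cost is paid once per hostcfgd process rather than once per service.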
Had an offline chat with @tahmed-dev. Since this approach can have a boot-time impact, we can park this discussion for now.
OK. Definitely something to reconsider in the future.
@tahmed-dev Adding to @jleveque's point: I am currently seeing this when doing a config reload. Maybe we need to check whether we want to add .timer to the config commands as well: sudo config reload -y
-        stop_cmds.append("sudo systemctl stop {}.service".format(feature_name))
-        stop_cmds.append("sudo systemctl disable {}.service".format(feature_name))
-        stop_cmds.append("sudo systemctl mask {}.service".format(feature_name))
+        stop_cmds.append("sudo systemctl stop {}.{}".format(feature_name, feature_suffix))
When stopping the service, if the service has a .timer file, I believe we need to stop both the timer AND the service. If the timer has already started the service, we need to stop the service. If the timer is currently running and hasn't started the service, we need to stop the timer. Thus, we should always stop both to be safe.
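The stop-both behaviour described in this comment might be sketched as below. This is an illustration only, not the code that was eventually merged; `build_stop_cmds` is a hypothetical helper and the stop/disable/mask sequence simply mirrors the commands in the diff above.

```python
def build_stop_cmds(feature_name, has_timer=False):
    """Return systemctl commands that stop a feature.

    Hypothetical helper: the timer (if any) is handled first so that it
    cannot fire afterwards, then the service is stopped in case the timer
    has already started it.
    """
    suffixes = ["timer", "service"] if has_timer else ["service"]
    cmds = []
    for suffix in suffixes:
        unit = "{}.{}".format(feature_name, suffix)
        cmds.append("sudo systemctl stop {}".format(unit))
        cmds.append("sudo systemctl disable {}".format(unit))
        cmds.append("sudo systemctl mask {}".format(unit))
    return cmds
```

Stopping a unit that is not running is harmless with `systemctl stop`, which is why always issuing both sets of commands is the safe choice.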
* [BFN] Add support pcied daemon for Montara and Newport (sonic-net#5199) Signed-off-by: Petro Bratash <[email protected]> * [cfggen] Allow Write To Redis DB With Template/Batch Mode (sonic-net#5203) Argument to write to config-db is not allowed when using template. This PR allows cfggen to write to redis db when using template mode. signed-off-by: Tamer Ahmed <[email protected]> * [submodule]: Advance sonic-snmpagent. (sonic-net#5213) Update sonic-snmpagent submodule to include below commits: 1a2b62a [Namespace]: Fix SAI_ID key used in cpfcIfTable and csqIfQosGroupStatsTable implementation (sonic-net#138) d06f00c [pytest/coverage]: add coverage support (sonic-net#156) 90e9f2e [Namespace]: Simplify sync_d functions to use higher order (sonic-net#154) b5815d9 [LLDP]: Modify OID index of LLDPRemTableUpdater MIB (sonic-net#155) d5f2b92 [Multiasic]: Provide namespace support for ipNetToMediaPhysAddress (sonic-net#129) 166c221 [Namespace]: Fix interface counters in RFC 1213 (sonic-net#145) Signed-off-by: SuvarnaMeenakshi <[email protected]> * [cfggen] Conform With Python 3 Syntax (sonic-net#5154) Preparing sonic-cfggen for migration to Python 3. signed-off-by: Tamer Ahmed <[email protected]> * [redis-dump-load] Update submodule (sonic-net#5215) * src/redis-dump-load 832a645...7585497 (2): > Merge pull request sonic-net#63 from jleveque/update_gitignore > Merge pull request sonic-net#59 from breser/redis-load-empty * [services] Fix Delay Start of SNMP And Telemetry (sonic-net#5211) SNMP and Telemetry services are not critical to switch startup. They also cause fast-reboot not to meet timing requirements. In order to delay start those service are associated with systemd timer units, however when hostcfgd initiate service start, it start the service and not the timer. This PR fixes this issue by starting the timer associated with systemd unit. 
signed-off-by: Tamer Ahmed <[email protected]> * [sonic-py-common][multi ASIC] API to get a list of frontend ports (sonic-net#5221) * [sonic-py-common][multi ASIC] utility to get a list of frontend ports from a given list of ports * [sonic-config-engine] Update .gitignore (sonic-net#5223) - Ignore directories generated by building Python wheel package - Move all sonic-config-engine ignores from the root .gitignore to src/sonic-config-engine/.gitignore * Advance swss-common submodule. (sonic-net#5222) 9a7c9d Dbconnector namespace support (sonic-net#376) c32f0b5 add state db entry for fgnhg route entry (sonic-net#374) * [caclmgrd] Add support for multi-ASIC platforms (sonic-net#5022) * Support for Control Plane ACL's for Multi-asic Platforms. Following changes were done: 1) Moved from using blocking listen() on Config DB to the select() model via python-swsscommon since we have to wait on event from multiple config db's 2) Since python-swsscommon is not available on host added libswsscommon and python-swsscommon and dependent packages in the base image (host enviroment) 3) Made iptables programmed in all namespace using ip netns exec Signed-off-by: Abhishek Dosi <[email protected]> * Address Review Comments Signed-off-by: Abhishek Dosi <[email protected]> * Fix Review Comments * Fix Comments * Added Change for Multi-asic to have iptables rules to accept internal docker tcp/udp traffic needed for syslog and redis-tcp connection. Signed-off-by: Abhishek Dosi <[email protected]> * Fix Review Comments * Added more comments on logic. * Fixed all warning/errors reported by http://pep8online.com/ other than line > 80 characters. * Fix Comment Signed-off-by: Abhishek Dosi <[email protected]> * Verified with swsscommon package. Fix issue for single asic platforms. * Moved to new python package * Address Review Comments. Signed-off-by: Abhishek Dosi <[email protected]> * Address Review Comments. 
* Add support to VS platform for platform.json and DPB CLI Tests (sonic-net#5192) - Reverts commit 457674c - Creates "platform.json" for vs docker - Adds test case for port breakout CLI - Explicitly sets admin status of all the VS interfaces to down to be compatible with SWSS test cases, specifically vnet tests and sflow tests Signed-off-by: Sangita Maity <[email protected]> * [iccpd] Fix uninitialized variable. (sonic-net#5112) To declare *tb[] but do not initialize it, it might be very risky. We get iccpd exception during processing arp/nd event. Initialize it to {0}; * Fix unwanted python exception in syslog during database container (sonic-net#5227) startup when doing redis PING since database_config.json getting generated from jinja2 template is still not ready. Signed-off-by: Abhishek Dosi <[email protected]> * [hostcfgd] Handle Both Service And Timer Units (sonic-net#5228) Commit e484ae9 introduced systemd .timer unit to hostcfgd. However, when stopping service that has timer, there is possibility that timer is not running and the service would not be stopped. This PR address this situation by handling both .timer and .service units. signed-off-by: Tamer Ahmed <[email protected]> * [arista] Update driver submodules (sonic-net#5147) - fix watchdog timeout units - fix import path for thermal_manager - remove arista bind mounts for docker-snmp - improve arista bind mounts for pmon * [docker-radv] Fix startup issues (sonic-net#5230) **- Why I did it** PR sonic-net#4599 introduced two bugs in the startup of the router advertiser container: 1. References to the `wait_for_intf.sh` script were changed to `wait_for_link.sh`, but the actual script was not renamed 2. The `ipv6_found` Jinja2 variable added to the supervisor config file goes out of scope before it is read. **- How I did it** 1. Rename the `wait_for_intf.sh` script to `wait_for_link.sh` 2. 
Use the Jinja2 "namespace" construct to fix the scope issue **- How to verify it** Ensure all processes in the radv container start properly under the correct conditions (i.e., whether or not there is at least one VLAN with an IPv6 address assigned). * [sonic-utilities] Update submodule (sonic-net#5233) * src/sonic-utilities d5fdd74...17fb378 (7): > [sonic-installer] Import re module (sonic-net#1061) > [fast-reboot]: Fix fail to execute fast-reboot problem (sonic-net#1047) > [config] Reduce Calls to SONiC Cfggen (sonic-net#1052) > [filter-fdb] Call Filter FDB Main From Within Test Code (sonic-net#1051) > [sflow_test.py]: Fix show sflow display. (sonic-net#1054) > Change fast-reboot script to use swss and radv service script (sonic-net#1036) > Common functions for show CLI support on multi ASIC (sonic-net#999) * [sonic-host-service]: Add SONiC Host Services infrastructure (sonic-net#4840) - Why I did it When SONiC is configured with the management framework and/or telemetry services, the applications running inside those containers need to access some functionality on the host system. The following is a non-exhaustive list of such functionality: Image management Configuration save and load ZTP enable/disable and status Show tech support - How I did it The host service is a Python process that listens for requests via D-Bus. It will then service those requests and send a response back to the requestor. This PR only introduces the host service infrastructure. Applications that need access to the host services must add applets that will register on D-Bus endpoints to service the appropriate functionality. 
- How to verify it - Description for the changelog Add SONiC Host Service for container to execute select commands in host Signed-off-by: Nirenjan Krishnan <[email protected]> * Add common functions applicable to single/multi asic platforms (sonic-net#5224) * Add common functions applicable to single/multi asic platforms * Raise exception if invalid namespace is given as input. * [sonic-swss] Update submodule (sonic-net#5231) * src/sonic-swss d2bab10...c4949a2 (34): > [dvs] Add new common issues and TOC to DVS README (sonic-net#1405) > Avoid adding loopback interface (ip link add) when setting nat zone on loopback interface (sonic-net#1411) > [portsorch] add buffer drop FC group (sonic-net#1368) > [dvs/chassis] Bring up SONiC interfaces in virtual chassis (sonic-net#1410) > [chassis/dvs] Add support for virtual chassis to DVS testbed (sonic-net#1345) > [sonic-swsss] Fix the issue of field "next_hop_ip" not getting updated in state DB in ERSPAN Mirror (sonic-net#1375) > [intfmgr] Fix OA crash issue due to link local configurations (sonic-net#1195) > Fix the issue when persistent DVS is used to run pytest which has number of front-panel ports < 32 (sonic-net#1373) > [dvs] Refactor AsicDbValidator (sonic-net#1402) > [fec] Get FEC mode when port is already admin down (sonic-net#1403) > [fec] added logic that put port down before applying fec onfiguration (sonic-net#1399) > [dvs] Add performance test for adding and deleting routes (sonic-net#1392) > Ignore IPv6 link-local and multicast entries as Vnet routes (sonic-net#1401) > [vlanmgr] Support Jumbo Frame By Default (sonic-net#1393) > Fix log/syslog not being correct when last test fails for given module (sonic-net#1395) > Get initial speed from ASIC DB (sonic-net#1390) > [dvs] Add options to limit CPU usage (sonic-net#1394) > [intfsorch] Retrieve Port object before setting NAT zone on router interfaces. 
(sonic-net#1372) > [.gitignore] Ignore gearsyncd binary (sonic-net#1381) > Added Max Nexthopgroup/ECMP Count supported by device into State DB. (sonic-net#1383) > [dvs] Upload logs even if failure occurs during startup (sonic-net#1389) > [rates] fix issue with rates init (sonic-net#1387) > [dvs] Validate that SWSS is ready to receive input before starting tests (sonic-net#1385) > [dvs] Convert sflow and speed tests to use dvslib (sonic-net#1382) > [dvs_acl] Refactor and document dvs_acl library (sonic-net#1378) > [dvs] Fix install instructions in README (sonic-net#1379) > [dvs] Update README with new flags, options, and known issues (sonic-net#1380) > swss: gearsyncd should return 0 on exit (sonic-net#1376) > Remove 00-copp.config.json from swss debian package. (sonic-net#1366) > fix undefined var in rates lua scripts (sonic-net#1365) > [fdborch] Fixed Orchagent crash in FDB flush on port disable. (sonic-net#1369) > [tlm_teamd]: Try to add LAG again, when teamd is not ready first time (sonic-net#1347) > [vs] Incorporate python3 best practices into DVSLib (sonic-net#1357) > [dvs] Mark unstable tests as xfail (sonic-net#1356) * [arista/aboot]: Zero out 1st MB before repartitioning (sonic-net#5220) The first partition starting point was changed to be 1M as part of this commit: 6ba2f97. On systems that are misaligned before conversion (partition start is the first sector), the relica partition that is left in the first MB can cause problems in Aboot and result in corruption of the filesystem on the new aligned partition. Zeroing this old relica makes sure that there is nothing left of the old partition lying around. There won't be any risk of having Aboot corrupt the new filesystem because of the old relica. Signed-off-by: Baptiste Covolato <[email protected]> * [sonic-py-common] Add unit test framework (sonic-net#5238) **- Why I did it** To install the framework for adding unit tests to the sonic-py-common package and report coverage. 
** How I did it ** - Incorporate pytest and pytest-cov into sonic-py-common package build - Updgrade version of 'mock' installed to version 3.0.5, the last version which supports Python 2. This fixes a bug where the file object returned from `mock_open()` was not iterable (see https://bugs.python.org/issue32933) - Add support for Python 3 setuptools and pytest in sonic-slave-buster environment - Add tests for `device_info.get_machine_info()` and `device_info.get_platform()` functions - Also add a .gitignore in the root of the sonic-py-common directory, move all related ignores from main .gitignore file, and add ignores for files and dirs generated by pytest-cov * Add switch for synchronous mode (sonic-net#5237) Add a master switch so that the sync/async mode can be configured. Example usage of the switch: 1. Configure mode while building an image `make ENABLE_SYNCHRONOUS_MODE=y <target>` 2. Configure when the device is running Change CONFIG_DB with `sonic-cfggen -a '{"DEVICE_METADATA":{"localhost": {"synchronous_mode": "enable"}}}' --write-to-db` Restart swss with `systemctl restart swss` * [enable counters] Enable port buffer drops by default and update MLNX SAI submodule (sonic-net#5059) * Enable port buffer drops by default * [Mellanox] Update SAI_Implementation Signed-off-by: Mykola Faryma <[email protected]> * Platform monitor changes in daemon_base for multi_asic (sonic-net#4932) Adding namespace support for db connect API. 
Co-authored-by: Petro Bratash <[email protected]> Co-authored-by: Tamer Ahmed <[email protected]> Co-authored-by: SuvarnaMeenakshi <[email protected]> Co-authored-by: Joe LeVeque <[email protected]> Co-authored-by: Mahesh Maddikayala <[email protected]> Co-authored-by: judyjoseph <[email protected]> Co-authored-by: abdosi <[email protected]> Co-authored-by: Sangita Maity <[email protected]> Co-authored-by: Kelly Chen <[email protected]> Co-authored-by: Samuel Angebault <[email protected]> Co-authored-by: nirenjan <[email protected]> Co-authored-by: Baptiste Covolato <[email protected]> Co-authored-by: shi-su <[email protected]> Co-authored-by: Mykola F <[email protected]>
SNMP and Telemetry services are not critical to switch startup.
They also cause fast-reboot to miss its timing requirements.
To delay their start, these services are associated with systemd
timer units. However, when hostcfgd initiates a service start, it starts
the service and not the timer. This PR fixes the issue by
starting the timer associated with the systemd unit.
Fixes #5172
closes #5172
signed-off-by: Tamer Ahmed [email protected]
- Why I did it
Mellanox reports that fast-reboot is failing because the snmp/telemetry services are started early
- How I did it
Enabled systemd timer unit instead of systemd service unit
- How to verify it
fast-reboot :
syslog file shows delayed start succeeds: