Load config after subscribe #5740

Merged 3 commits on Oct 31, 2020
files/image_config/hostcfgd/hostcfgd: 26 changes (19 additions, 7 deletions)
```diff
@@ -233,17 +233,24 @@ class HostConfigDaemon:
         self.config_db = ConfigDBConnector()
         self.config_db.connect(wait_for_init=True, retry_on=True)
         syslog.syslog(syslog.LOG_INFO, 'ConfigDB connect success')
+
+        self.aaacfg = AaaCfg()
+        self.iptables = Iptables()
+        # Cache the values of 'state' field in 'FEATURE' table of each container
+        self.cached_feature_states = {}
+
+        self.is_multi_npu = device_info.is_multi_npu()
+
+    def load(self):
         aaa = self.config_db.get_table('AAA')
         tacacs_global = self.config_db.get_table('TACPLUS')
         tacacs_server = self.config_db.get_table('TACPLUS_SERVER')
-        self.aaacfg = AaaCfg()
         self.aaacfg.load(aaa, tacacs_global, tacacs_server)
 
         lpbk_table = self.config_db.get_table('LOOPBACK_INTERFACE')
-        self.iptables = Iptables()
         self.iptables.load(lpbk_table)
-        self.is_multi_npu = device_info.is_multi_npu()
-        # Cache the values of 'state' field in 'FEATURE' table of each container
-        self.cached_feature_states = {}
 
     def update_feature_state(self, feature_name, state, feature_table):
         has_timer = ast.literal_eval(feature_table[feature_name].get('has_timer', 'False'))
@@ -367,14 +374,19 @@ class HostConfigDaemon:
         self.update_feature_state(feature_name, state, feature_table)
 
     def start(self):
-        # Update all feature states once upon starting
-        self.update_all_feature_states()
-
         self.config_db.subscribe('AAA', lambda table, key, data: self.aaa_handler(key, data))
         self.config_db.subscribe('TACPLUS_SERVER', lambda table, key, data: self.tacacs_server_handler(key, data))
         self.config_db.subscribe('TACPLUS', lambda table, key, data: self.tacacs_global_handler(key, data))
         self.config_db.subscribe('LOOPBACK_INTERFACE', lambda table, key, data: self.lpbk_handler(key, data))
         self.config_db.subscribe('FEATURE', lambda table, key, data: self.feature_state_handler(key, data))
 
+        # Update all feature states once upon starting
+        self.update_all_feature_states()
+
+        # Defer load until subscribe
+        self.load()
+
         self.config_db.listen()
```
Contributor
Looking at the code, I find that self.config_db.subscribe() is a misnomer: it doesn't actually do a pubsub.psubscribe(). It just assigns a handler and returns. It is self.config_db.listen() that actually performs the pubsub.psubscribe().

So could we still miss events, since self.update_all_feature_states() runs before the listen() call? Just wondering whether this fix still covers all cases.
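
A minimal sketch of the behavior described above; ConfigDBConnectorSketch is a hypothetical stand-in, not the actual ConfigDBConnector implementation, and the keyspace-notification pattern is only illustrative:

```python
import redis

# Hypothetical stand-in illustrating the reviewer's point: subscribe() only
# records a callback, and nothing reaches Redis until listen() runs.
class ConfigDBConnectorSketch:
    def __init__(self, db=4):
        self.db = db
        self.client = redis.Redis(db=db)
        self.handlers = {}

    def subscribe(self, table, handler):
        # No Redis traffic here: the callback is only remembered, so any
        # event published before listen() runs is silently lost.
        self.handlers[table] = handler

    def listen(self):
        pubsub = self.client.pubsub()
        # Only now is the pattern subscription (psubscribe) actually issued.
        pubsub.psubscribe('__keyspace@{}__:*'.format(self.db))
        for msg in pubsub.listen():
            if msg['type'] != 'pmessage':
                continue
            key = msg['channel'].decode().split(':', 1)[1]
            table = key.split('|', 1)[0]
            if table in self.handlers:
                self.handlers[table](table, key, None)
```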

Contributor
@judyjoseph self.load() should take care of that, at least for TACACS/AAA.
I think the one remaining window is a config-db write happening between self.load() and self.config_db.listen(). That would be a very narrow corner case, though.
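
A sketch of that window as an interleaving; names mirror the patched start(), and "writer" stands for any other process updating CONFIG_DB:

```python
# Illustrative interleaving only, not actual hostcfgd behavior or output.
#
#   daemon: config_db.subscribe('TACPLUS', handler)  # callback recorded, no psubscribe yet
#   daemon: update_all_feature_states()
#   daemon: load()                                   # reads the tables as they are now
#   writer: updates the TACPLUS table                # <-- lands in this window
#   daemon: config_db.listen()                       # psubscribe issued only here
#
# The write is invisible to load(), which has already run, and is not replayed
# by listen(), because Redis pub/sub delivers only events published after the
# psubscribe takes effect.
```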

Contributor Author
Still, this 23-second call to start all the services, even when they are all already running, is way too expensive. We need to take a second look at it. It would be better to run it on a different thread using a timer.
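
A minimal sketch of that suggestion, assuming update_all_feature_states() is safe to run off the main thread; threading.Timer and the zero-second delay are illustrative choices, not part of this PR:

```python
import threading

class HostConfigDaemon:
    # Hypothetical variant of start() sketching the suggestion above;
    # only the relevant lines are shown.
    def start(self):
        self.config_db.subscribe('AAA', lambda table, key, data: self.aaa_handler(key, data))
        # ... the remaining subscribe() calls from the patch ...

        # Run the expensive feature-state refresh on its own timer thread so
        # start() reaches listen() (the real psubscribe) without the ~23 s stall.
        threading.Timer(0, self.update_all_feature_states).start()

        self.load()
        self.config_db.listen()
```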

Contributor
Yes! It may be acceptable for the user to wait 23 seconds for login access in the single-ASIC case. On a multi-ASIC system the wait will be longer, since we have more processes per ASIC, and the total wait could exceed one minute.


