Added changes to handle dependency check in FdbSyncd and FpmSyncd for warm-boot #1556

Merged
merged 14 commits into from
Mar 3, 2021
Changes from 12 commits
39 changes: 37 additions & 2 deletions fdbsyncd/fdbsync.cpp
@@ -43,6 +43,36 @@ FdbSync::~FdbSync()
}
}


// Check if interface entries are restored in kernel
bool FdbSync::isIntfRestoreDone()
Collaborator

FdbSync::isIntfRestoreDone(), FdbSync::isReadyToReconcile() and RouteSync::isReadyToReconcile() do similar tasks. It seems we could turn them into library calls with different input parameters. A later PR is probably fine.

Contributor Author

Sure, will consider this in future updates to the code.

{
vector<string> required_modules = {
"vxlanmgrd",
"intfmgrd",
"vlanmgrd",
"vrfmgrd"
};

for (string& module : required_modules)
{
WarmStart::WarmStartState state;

WarmStart::getWarmStartState(module, state);
if (state == WarmStart::REPLAYED || state == WarmStart::RECONCILED)
{
SWSS_LOG_INFO("Module %s Replayed or Reconciled %d",module.c_str(), (int) state);
}
else
{
SWSS_LOG_INFO("Module %s NOT Replayed or Reconciled %d",module.c_str(), (int) state);
return false;
}
}

return true;
}
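
The reviewer's suggestion above, folding the similar readiness checks into one call that takes different input parameters, could look roughly like the sketch below. It is not part of this PR. `State`, `StateLookup` and `allModulesInState` are hypothetical names; only `REPLAYED` and `RECONCILED` come from the diff, and the lookup callback stands in for `WarmStart::getWarmStartState(module, state)`.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for WarmStart::WarmStartState. Only REPLAYED and
// RECONCILED appear in the diff; OTHER covers every remaining state.
enum class State { OTHER, REPLAYED, RECONCILED };

// Hypothetical callback that returns the warm-start state of a module.
using StateLookup = std::function<State(const std::string&)>;

// Returns true only if every module in `modules` is in one of the
// `accepted` states. An empty module list is trivially ready.
bool allModulesInState(const std::vector<std::string>& modules,
                       const std::vector<State>& accepted,
                       const StateLookup& lookup)
{
    for (const auto& module : modules)
    {
        const State current = lookup(module);
        bool ok = false;
        for (State a : accepted)
        {
            if (a == current)
            {
                ok = true;
                break;
            }
        }
        if (!ok)
        {
            return false;  // first module that is not ready decides the result
        }
    }
    return true;
}
```

Under these assumptions, each daemon would pass its own module list and accepted states, so the loop lives in one place instead of three.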

void FdbSync::processCfgEvpnNvo()
{
std::deque<KeyOpFieldsValuesTuple> entries;
@@ -447,14 +477,17 @@ void FdbSync::macDelVxlanDB(string key)
fvVector.push_back(t);
fvVector.push_back(v);

SWSS_LOG_NOTICE("%sVXLAN_FDB_TABLE: DEL_KEY %s vtep:%s type:%s",
m_AppRestartAssist->isWarmStartInProgress() ? "WARM-RESTART:" : "" ,
key.c_str(), vtep.c_str(), type.c_str());

// If warmstart is in progress, we take all netlink changes into the cache map
if (m_AppRestartAssist->isWarmStartInProgress())
{
m_AppRestartAssist->insertToMap(APP_VXLAN_FDB_TABLE_NAME, key, fvVector, true);
return;
}

SWSS_LOG_INFO("VXLAN_FDB_TABLE: DEL_KEY %s vtep:%s type:%s", key.c_str(), vtep.c_str(), type.c_str());
m_fdbTable.del(key);
return;

@@ -476,14 +509,16 @@ void FdbSync::macAddVxlan(string key, struct in_addr vtep, string type, uint32_t
fvVector.push_back(t);
fvVector.push_back(v);

SWSS_LOG_INFO("%sVXLAN_FDB_TABLE: ADD_KEY %s vtep:%s type:%s",
m_AppRestartAssist->isWarmStartInProgress() ? "WARM-RESTART:" : "" ,
key.c_str(), svtep.c_str(), type.c_str());
// If warmstart is in progress, we take all netlink changes into the cache map
if (m_AppRestartAssist->isWarmStartInProgress())
{
m_AppRestartAssist->insertToMap(APP_VXLAN_FDB_TABLE_NAME, key, fvVector, false);
return;
}

SWSS_LOG_INFO("VXLAN_FDB_TABLE: ADD_KEY %s vtep:%s type:%s", key.c_str(), svtep.c_str(), type.c_str());
m_fdbTable.set(key, fvVector);

return;
15 changes: 12 additions & 3 deletions fdbsyncd/fdbsync.h
@@ -9,8 +9,17 @@
#include "netmsg.h"
#include "warmRestartAssist.h"

// The timeout value (in seconds) for fdbsyncd reconcilation logic
#define DEFAULT_FDBSYNC_WARMSTART_TIMER 30
/*
* Default warm-restart timer interval for routing-stack app
Collaborator

Change the comment to say this is the default timer for FDB reconciliation.

Contributor Author

Sure, will do.

*/
#define DEFAULT_FDBSYNC_WARMSTART_TIMER 120
Collaborator

As mentioned earlier, how does FDB depend on the routing stack? If the user configures the routing-stack warm-restart timer to a bigger value and the routing stack actually takes that long to reconcile, what is the consequence?
If the dependency is a must, we should probably also read the routing-stack reconciliation status before we reconcile FDB here.

Contributor Author

fdbsyncd reconciliation is dependent on BGP convergence time. I will change it to use the BGP warm-restart timer config value instead of a hardcoded value, so that reconciliation is tied to control-plane convergence, the same way it is done in fpmsyncd. To further optimise reconciliation time, the EOIU feature is implemented for fpmsyncd to check for actual protocol convergence. This is not yet validated for fdbsyncd and will be implemented later. For now, fdbsyncd will only use the BGP warm-restart timer config value, as in fpmsyncd.
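
The timer selection described in this reply (prefer the configured BGP warm-restart timer, fall back to the default) can be sketched as follows. This is an illustration only: `reconcileTimerSeconds` is a hypothetical helper, the `configured` argument stands for whatever value is read from configuration, and treating 0 as "not set" is an assumption rather than something the PR states.

```cpp
#include <cstdint>
#include <optional>

// Same value as the macro in fdbsync.h after this change.
constexpr uint32_t DEFAULT_FDBSYNC_WARMSTART_TIMER = 120;

// Hypothetical helper: `configured` is the BGP warm-restart timer from
// configuration, or empty when the user has not set one.
uint32_t reconcileTimerSeconds(std::optional<uint32_t> configured)
{
    // Assumption: a configured value of 0 is treated the same as "not set".
    if (configured && *configured > 0)
    {
        return *configured;
    }
    return DEFAULT_FDBSYNC_WARMSTART_TIMER;
}
```

With this shape, raising the BGP timer in configuration lengthens the FDB reconcile window as well, which is the coupling the reviewer asks about.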


/*
* This is the MAX time in seconds, fdbsyncd will wait after warm-reboot
* for the interface entries to be recreated in kernel before attempting to
* write the FDB data to kernel
*/
#define INTF_RESTORE_MAX_WAIT_TIME 180

namespace swss {

@@ -43,7 +52,7 @@ class FdbSync : public NetMsg

virtual void onMsg(int nlmsg_type, struct nl_object *obj);

bool isFdbRestoreDone();
bool isIntfRestoreDone();

AppRestartAssist *getRestartAssist()
{
43 changes: 41 additions & 2 deletions fdbsyncd/fdbsyncd.cpp
@@ -35,6 +35,7 @@ int main(int argc, char **argv)
Selectable *temps;
int ret;
Select s;
SelectableTimer replayCheckTimer(timespec{0, 0});

using namespace std::chrono;

@@ -45,7 +46,31 @@ int main(int argc, char **argv)
if (sync.getRestartAssist()->isWarmStartInProgress())
{
sync.getRestartAssist()->readTablesToMap();

steady_clock::time_point starttime = steady_clock::now();
while (!sync.isIntfRestoreDone())
Contributor @qiluo-msft, Dec 30, 2020

CPU is wasted on waiting. Could you subscribe to Redis instead? #WontFix

Contributor Author

As this is not a continuous busy wait (a sleep is present), it should not keep the CPU continuously busy. Also, there is nothing for fdbsyncd to do until the interface info is populated to the kernel after system warm-reboot, so it has to wait until then.

{
duration<double> time_span =
duration_cast<duration<double>>(steady_clock::now() - starttime);
int pasttime = int(time_span.count());

if (pasttime > INTF_RESTORE_MAX_WAIT_TIME)
{
SWSS_LOG_INFO("timed-out before all interface data was replayed to kernel!!!");
Collaborator


What happens if intf is not restored after max_wait_time? Shouldn't we abort to avoid more issues?

Contributor Author

The system will proceed further. Some MAC programming to the kernel might fail because the underlying interface is not yet created.

Collaborator

If we could not restore the interfaces, why should we proceed further and get into a limbo state that may or may not have critical issues? I would suggest we abort to bring it to the user's attention.

Contributor Author

Interfaces will eventually be restored. The only impact is that warm-reboot might not be hitless and some traffic loss will be seen. I am not sure we need to go for a full abort and impact all traffic. Requesting @prsunny to comment on this too.

Collaborator

"Some mac programming to kernel might fail because underlying interface is not yet created." So this condition will be recovered by someone later? Again, if it is a critical condition, we should raise/abort so we don't get into a limbo state.

throw runtime_error("fdbsyncd: timedout on interface data replay");
}
sleep(1);
}
SWSS_LOG_NOTICE("Starting ReconcileTimer");
sync.getRestartAssist()->startReconcileTimer(s);
replayCheckTimer.setInterval(timespec{1, 0});
replayCheckTimer.start();
s.addSelectable(&replayCheckTimer);
}
Contributor @qiluo-msft, Dec 30, 2020


Remove extra blank line #Closed

Contributor Author


will do

else
{
sync.getRestartAssist()->warmStartDisabled();
sync.m_reconcileDone = true;
}
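
The bounded wait debated in the review thread above (poll, sleep between polls, give up after a maximum wait) can be sketched in isolation as below. `waitUntilReady` is a hypothetical helper, not code from this PR. In the diff the predicate is `sync.isIntfRestoreDone()`, the limit is `INTF_RESTORE_MAX_WAIT_TIME`, and the sleep is one second. The Redis-subscription alternative the reviewer raised is not shown.

```cpp
#include <chrono>
#include <functional>
#include <stdexcept>
#include <thread>

// Hypothetical helper: polls `ready` until it returns true. Throws
// std::runtime_error once `max_wait` has elapsed without success.
void waitUntilReady(const std::function<bool()>& ready,
                    std::chrono::seconds max_wait,
                    std::chrono::milliseconds poll_interval)
{
    using clock = std::chrono::steady_clock;  // monotonic, immune to wall-clock jumps
    const auto deadline = clock::now() + max_wait;

    while (!ready())
    {
        if (clock::now() > deadline)
        {
            throw std::runtime_error("timed out waiting for readiness");
        }
        // Sleeping between polls keeps the loop from spinning the CPU.
        std::this_thread::sleep_for(poll_interval);
    }
}
```

Throwing on timeout matches the abort behaviour the reviewers asked for. Logging and continuing instead would match the author's earlier position in the thread.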

netlink.registerGroup(RTNLGRP_LINK);
@@ -67,14 +92,28 @@ int main(int argc, char **argv)
{
s.select(&temps);

if(temps == (Selectable *)sync.getFdbStateTable())
if (temps == (Selectable *)sync.getFdbStateTable())
{
sync.processStateFdb();
}
else if (temps == (Selectable *)sync.getCfgEvpnNvoTable())
{
sync.processCfgEvpnNvo();
}
else if (temps == &replayCheckTimer)
{
if (sync.getFdbStateTable()->empty() && sync.getCfgEvpnNvoTable()->empty())
{
sync.getRestartAssist()->appDataReplayed();
SWSS_LOG_NOTICE("FDB Replay Complete");
Collaborator


removeSelectable for replayCheckTimer?

Collaborator

Also, since the replay-check timer and the reconciliation timer run in parallel, what is the consequence if the reconciliation timer expires but we haven't replayed yet? If replay is a must but is not yet done when the reconciliation timer expires, we should log the error and raise.

Contributor Author @nkelapur, Feb 25, 2021

Will removeSelectable replayCheckTimer and start the reconciliation timer after replay is done.

}
else
{
replayCheckTimer.setInterval(timespec{1, 0});
// re-start replay check timer
replayCheckTimer.start();
}
}
else
{
/*
@@ -88,7 +127,7 @@ int main(int argc, char **argv)
sync.m_reconcileDone = true;
sync.getRestartAssist()->stopReconcileTimer(s);
sync.getRestartAssist()->reconcile();
SWSS_LOG_NOTICE("VXLAN FDB VNI Reconcillation Complete (Timer)");
SWSS_LOG_NOTICE("VXLAN FDB VNI Reconcillation Complete");
}
}
}
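
The ordering the author commits to in the review reply (stop the replay-check timer once replay is done, and only then start the reconciliation timer) can be modelled as a small state machine. This sketch reflects that stated intent, not the code shown in this 12-commit view, where both timers are started together. `Phase`, `WarmStartFlow` and its members are hypothetical names.

```cpp
// Hypothetical phases of the fdbsyncd warm-start sequence.
enum class Phase { WaitingForReplay, WaitingForReconcile, Done };

struct WarmStartFlow
{
    Phase phase = Phase::WaitingForReplay;
    bool replayTimerArmed = true;      // models replayCheckTimer in the Select set
    bool reconcileTimerArmed = false;  // models the reconcile timer

    // Called on every replay-check tick. `tablesDrained` is true when no
    // pending entries remain to be replayed.
    void onReplayCheck(bool tablesDrained)
    {
        if (phase != Phase::WaitingForReplay || !tablesDrained)
        {
            return;  // timer stays armed and the check runs again next tick
        }
        replayTimerArmed = false;    // corresponds to removeSelectable
        reconcileTimerArmed = true;  // reconcile timer starts only after replay
        phase = Phase::WaitingForReconcile;
    }

    // Called when the reconcile timer fires. Ignored before replay completes,
    // so reconciliation can never overtake replay.
    void onReconcileTimerExpired()
    {
        if (phase != Phase::WaitingForReconcile)
        {
            return;
        }
        reconcileTimerArmed = false;
        phase = Phase::Done;
    }
};
```

Serialising the two timers this way removes the case the reviewer worries about, where the reconciliation timer expires while replay is still pending.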
7 changes: 7 additions & 0 deletions fpmsyncd/fpmsyncd.cpp
@@ -18,6 +18,7 @@ using namespace swss;
*/
const uint32_t DEFAULT_ROUTING_RESTART_INTERVAL = 120;


// Wait 3 seconds after detecting EOIU reached state
// TODO: support eoiu hold interval config
const uint32_t DEFAULT_EOIU_HOLD_INTERVAL = 3;
@@ -67,6 +68,7 @@ int main(int argc, char **argv)
SelectableTimer eoiuCheckTimer(timespec{0, 0});
// After eoiu flags are detected, start a hold timer before starting reconciliation.
SelectableTimer eoiuHoldTimer(timespec{0, 0});

/*
* Pipeline should be flushed right away to deal with state pending
* from previous try/catch iterations.
@@ -108,6 +110,10 @@ int main(int argc, char **argv)
s.addSelectable(&eoiuCheckTimer);
SWSS_LOG_NOTICE("Warm-Restart eoiuCheckTimer timer started.");
}
else
{
sync.m_warmStartHelper.setState(WarmStart::WSDISABLED);
}

while (true)
{
@@ -132,6 +138,7 @@ int main(int argc, char **argv)
{
SWSS_LOG_NOTICE("Warm-Restart EOIU hold timer expired.");
}

if (sync.m_warmStartHelper.inProgress())
{
sync.m_warmStartHelper.reconcile();
4 changes: 2 additions & 2 deletions tests/test_warm_reboot.py
@@ -76,7 +76,7 @@ def swss_app_check_RestoreCount_single(state_db, restore_count, name):
if fv[0] == "restore_count":
assert int(fv[1]) == restore_count[key] + 1
elif fv[0] == "state":
assert fv[1] == "reconciled" or fv[1] == "disabled"
assert fv[1] == "reconciled" or fv[1] == "disabled"

def swss_app_check_warmstart_state(state_db, name, state):
warmtbl = swsscommon.Table(state_db, swsscommon.STATE_WARM_RESTART_TABLE_NAME)
@@ -1150,7 +1150,7 @@ def test_routing_WarmRestart(self, dvs, testlog):
time.sleep(5)

# Verify FSM
swss_app_check_warmstart_state(state_db, "bgp", "")
swss_app_check_warmstart_state(state_db, "bgp", "disabled")

# Verify that multiple changes are seen in swss and sairedis logs as there's
# no warm-reboot logic in place.
10 changes: 10 additions & 0 deletions warmrestart/warmRestartAssist.cpp
@@ -117,6 +117,16 @@ AppRestartAssist::cache_state_t AppRestartAssist::getCacheEntryState(const std::
throw std::logic_error("cache entry state is invalid");
}

void AppRestartAssist::appDataReplayed()
{
WarmStart::setWarmStartState(m_appName, WarmStart::REPLAYED);
}

void AppRestartAssist::warmStartDisabled()
{
WarmStart::setWarmStartState(m_appName, WarmStart::WSDISABLED);
}

// Read table(s) from APPDB and append stale flag then insert to cachemap
void AppRestartAssist::readTablesToMap()
{
2 changes: 2 additions & 0 deletions warmrestart/warmRestartAssist.h
@@ -79,6 +79,8 @@ class AppRestartAssist
void stopReconcileTimer(Select &s);
bool checkReconcileTimer(Selectable *s);
void readTablesToMap(void);
void appDataReplayed(void);
void warmStartDisabled(void);
void insertToMap(std::string tableName, std::string key, std::vector<FieldValueTuple> fvVector, bool delete_key);
void reconcile(void);
bool isWarmStartInProgress(void)