Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data path healing should be disabled #447

Merged

Conversation

ljkiraly
Copy link
Contributor

Related issue: #446

Signed-off-by: Laszlo Kiraly [email protected]

@ljkiraly ljkiraly marked this pull request as draft May 31, 2022 12:00
@ljkiraly ljkiraly marked this pull request as ready for review May 31, 2022 12:02
@ljkiraly ljkiraly requested a review from uablrek May 31, 2022 12:03
internal/config/config.go Outdated Show resolved Hide resolved
@edwarnicke
Copy link
Member

edwarnicke commented May 31, 2022

@ljkiraly I'm completely fine with an option to disable DataPath healing check... I'm curious though why :) I suspect there's an interesting issue/usecase there :)

@ljkiraly ljkiraly force-pushed the fix-en-dis-data-path-healing branch from 4ac756f to a5f42ad Compare May 31, 2022 14:09
@ljkiraly
Copy link
Contributor Author

@edwarnicke The new data path healing in NSC check the connectivity to NSE by ping. In case of nse-remote-vlan the NSE doesn't play any role in data path. The check always fails and the connection is reset by healing (by "Reconnect with reselect") in NSC

We considered the following solutions for this issue with nse-remote-vlan:

  • disable the data path healing and/or
  • set a connection endpoint address which can be used for data path monitoring.

1/ Healing should be disabled in NSC by environment parameter.
2/ Healing should be disabled in nse-remote-vlan by setting only 'src' address in IP context (no 'dst' address or same 'dst' as 'src')
3/ Healing should be supported by setting one (or more) remote address in nse-remote-vlan as environment parameter and passing by IP context. Adds data path supervision (maybe a bit expensive).

We prefer to monitor this connection by BGP and/or BFD, so the first option was chosen. The first implementation proposal seemed cleaner compared to 2nd.
Also got a comment that would be preferable that the status on the user (BFD) layer could be signaled to the NSC to trigger some kind of remedy of the lower (NSC) layer, if possible - this might be discussed in detail in community meeting.

@ljkiraly ljkiraly requested a review from edwarnicke May 31, 2022 14:29
@denis-tingaikin
Copy link
Member

@ljkiraly Should we consider about disabling that in something where in sdk-kernel https://github.com/networkservicemesh/sdk-kernel/blob/main/pkg/kernel/tools/heal/liveness_check.go?

@ljkiraly
Copy link
Contributor Author

ljkiraly commented May 31, 2022

@denis-tingaikin @edwarnicke I also tried to eliminate the dst from IP context, but seems to me that this division by zero causes problems: https://github.com/networkservicemesh/sdk-kernel/blob/33c5467b712900bd71fb99c022149383ce04539e/pkg/kernel/tools/heal/liveness_check.go#L50
But I have to double-check.

@denis-tingaikin
Copy link
Member

@ljkiraly I think we can add something like return true immediately if no addresses in the context

@ljkiraly
Copy link
Contributor Author

ljkiraly commented Jun 1, 2022

@denis-tingaikin Based on your suggestion I added two more PR regarding to data path monitoring towards nse-remote-vlan:
networkservicemesh/sdk#1304
networkservicemesh/sdk-kernel#486

These above are specific in relation to nse-remote-vlan.
Please also consider the possibility of disabling the feature in NSC by this current fix.

@LionelJouin LionelJouin mentioned this pull request Jun 1, 2022
12 tasks
Copy link
Member

@denis-tingaikin denis-tingaikin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that makes sense. Added few comments.

internal/config/config.go Outdated Show resolved Hide resolved
main.go Outdated Show resolved Hide resolved
@ljkiraly ljkiraly force-pushed the fix-en-dis-data-path-healing branch 2 times, most recently from 487e8af to 93377a5 Compare June 1, 2022 14:51
Copy link
Member

@denis-tingaikin denis-tingaikin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@edwarnicke Can we merge this?

Copy link
Member

@denis-tingaikin denis-tingaikin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found a problem. Control plane heal should be enabled in all cases.

main.go Outdated Show resolved Hide resolved
main.go Outdated Show resolved Hide resolved
main.go Outdated Show resolved Hide resolved
@ljkiraly
Copy link
Contributor Author

ljkiraly commented Jun 1, 2022

@denis-tingaikin Thanks for the suggestions. I updated the branch.
Could you take a look at the remaining PR related to this: networkservicemesh/sdk#1304 ?

Related issue: cmd-nsc-446

Signed-off-by: Laszlo Kiraly <[email protected]>
@ljkiraly ljkiraly force-pushed the fix-en-dis-data-path-healing branch from 93377a5 to e707205 Compare June 1, 2022 15:56
@edwarnicke edwarnicke merged commit 16acded into networkservicemesh:main Jun 2, 2022
nsmbot pushed a commit to networkservicemesh/deployments-k8s that referenced this pull request Jun 2, 2022
…d-nsc@main

PR link: networkservicemesh/cmd-nsc#447

Commit: 16acded
Author: Laszlo Kiraly
Date: 2022-06-02 13:33:34 +0200
Message:
  - Data path healing should be disabled (#447)
Related issue: cmd-nsc-446

Signed-off-by: NSMBot <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants