Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat: Stateless cni migration #2363

Closed
wants to merge 15 commits into from
Closed

Conversation

behzad-mir
Copy link
Contributor

This PR contains code changes related to Stateless CNI Migration.
1- There has been code change to use cnireconciler (pointing to statefull CNI binary) when the managedEndpoint and Migration flag on CNS is set.
2- There has been code changes on CNI GetEndpoint function to fetch the HNS Endpoint corresponded to an IP address to cover the cases for the missing HNSEndpointID in statefile.

Reason for Change:

Issue Fixed:

Requirements:

Notes:

@behzad-mir behzad-mir requested review from a team as code owners November 7, 2023 22:19
@behzad-mir behzad-mir requested review from vipul-21 and rbtr November 7, 2023 22:19
@behzad-mir behzad-mir added cni Related to CNI. do-not-merge labels Nov 7, 2023
@behzad-mir behzad-mir changed the base branch from master to Dev-Stateless-CNI November 7, 2023 22:22
@behzad-mir behzad-mir requested a review from vakalapa November 7, 2023 22:34
@behzad-mir behzad-mir force-pushed the SatetlessCNI-Migration branch from 38c9751 to b1812e6 Compare November 7, 2023 23:23
Copy link
Contributor

@rbtr rbtr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused about the state of the world at the point that this would run. This still seems to assume some amount of stateless CNI has happened - in that the existing CNI is expected to be renamed to azure-vnet-stateful, at least.

We need to stop thinking about this as part of the stateless CNI change. This is a CNI-managed state to CNS-managed state migration. It is happening entirely independently of any other feature and should not assume anything about the progress of other features. It shouldn't be part of the stateless CNI changeset (except as a prereq) or branch.

}

func NewStatefullCNIPodInfoProvider() (cns.PodInfoByIPProvider, error) {
return newCNIPodInfoProvider(exec.New(), platform.CNIStateFilePath)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this the statefile path? shouldn't it be the executable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OOps, Was a mistake


// getHNSEndpointIdByIP returns an HNS Endpoint IP that matches an specific IPAddress.
func (nm *networkManager) GetEndpointInfoByIPImpl(ipAddresses []net.IPNet, _ string) (string, error) {
return "", errors.New("No HostVethName matches the IPAddress: " + ipAddresses[0].IP.String())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use Errorf with formatter instead of concat and don't sentence case errors

@behzad-mir
Copy link
Contributor Author

behzad-mir commented Nov 9, 2023

I'm confused about the state of the world at the point that this would run. This still seems to assume some amount of stateless CNI has happened - in that the existing CNI is expected to be renamed to azure-vnet-stateful, at least.

We need to stop thinking about this as part of the stateless CNI change. This is a CNI-managed state to CNS-managed state migration. It is happening entirely independently of any other feature and should not assume anything about the progress of other features. It shouldn't be part of the stateless CNI changeset (except as a prereq) or branch.

At the moment that we enter into migration, statless CNI has already been deployed. There are two ways to handles this: 1) We should either duplicate the whole GetEndpointState procedure with store lock initialization from the statefull CNI into the stateless code, or 2) just invoke the statefull CNI binary.
I went with the second option since it will avoid the code duplication and having the whole statefile recovery on stateless CNI will simply spoil the whole point of "stateless" CNI. In my opinion, the migration should call into statefull CNI from design perspective since the statefile is part of that binary.

If you're also advocating for a migration code regardless of having Stateless CNI, I belive we have to add another field to the CNS configmap and based on that condition we can simply call to the azure-vnet.exe binary (which is statefull) to do the migration.

behzad-mir and others added 8 commits November 28, 2023 15:37
#2102)

* Adding getEndpoint and UpdateEndpoint API to CNS with the respective clients in support of stateless CNI.

* Updating the unit tests and address the comments.

* Addressing the comments.

* Addressing the coments regarding CNS support for Stateless CNI

* Adddressing the PR comments
…package. (#2197)

* Apllying stateless CNI mode in network package.

* Addresing the commetns.
@behzad-mir behzad-mir force-pushed the Dev-Stateless-CNI branch 2 times, most recently from 35227c8 to 54aacbd Compare November 28, 2023 23:45
@behzad-mir behzad-mir force-pushed the SatetlessCNI-Migration branch from b1812e6 to c127777 Compare November 29, 2023 22:42
@behzad-mir behzad-mir requested a review from rbtr November 29, 2023 22:43
@behzad-mir behzad-mir force-pushed the SatetlessCNI-Migration branch 2 times, most recently from 458ec6d to 671e982 Compare November 29, 2023 23:38
@behzad-mir behzad-mir force-pushed the SatetlessCNI-Migration branch from 671e982 to 82df49e Compare November 29, 2023 23:58
Copy link
Member

@tamilmani1989 tamilmani1989 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this change include linux support.? If you intend to create separate PR for it, update the title to reflect its for windows.

for _, ipConfiguration := range hcnEndpoint.IpConfigurations {
for _, ipAddress := range ipAddresses {
prefixLength, _ := ipAddress.Mask.Size()
if ipConfiguration.IpAddress == ipAddress.IP.String() && ipConfiguration.PrefixLength == uint8(prefixLength) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is matching prefix length required? isn't that be always /32 ip?


// IsEndpointStateComplete returns true if both HNSEndpointID and HostVethName are missing.
func (epInfo *EndpointInfo) IsEndpointStateIncomplete() bool {
if epInfo.HNSEndpointID == "" && epInfo.IfName == "" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i guess epInfo.Ifname is not hostvethname. there would be field in epInfo explictly saying hostveth interface name

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do have a HostIfName on ednpoint. DO you want me to add it to endpointInfo struct too?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would always return false for linux ? As HNSendpointId is empty for linux

@@ -1191,10 +1191,19 @@ func InitializeCRDState(ctx context.Context, httpRestService cns.HTTPService, cn
if err != nil {
if errors.Is(err, store.ErrKeyNotFound) {
logger.Printf("[Azure CNS] No endpoint state found, skipping initializing CNS state")
if os.Getenv("StatelessCNIMigration") == "true" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this come from the config instead of the env? and to align with the convention there, maybeEnableManagedStateMigration

@@ -492,3 +493,28 @@ func (ep *endpoint) getInfoImpl(epInfo *EndpointInfo) {
func (nm *networkManager) updateEndpointImpl(nw *network, existingEpInfo *EndpointInfo, targetEpInfo *EndpointInfo) (*endpoint, error) {
return nil, nil
}

// GetEndpointInfoByIPImpl returns an endpointInfo with the corrsponding HNS Endpoint ID that matches an specific IP Address.
func (epInfo *EndpointInfo) GetEndpointInfoByIPImpl(ipAddresses []net.IPNet, networkID string) (*EndpointInfo, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add logs to the functions when it fails/succeeds while fetching Network/Endpoints etc.


// IsEndpointStateComplete returns true if both HNSEndpointID and HostVethName are missing.
func (epInfo *EndpointInfo) IsEndpointStateIncomplete() bool {
if epInfo.HNSEndpointID == "" && epInfo.IfName == "" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would always return false for linux ? As HNSendpointId is empty for linux

@behzad-mir behzad-mir force-pushed the Dev-Stateless-CNI branch 2 times, most recently from ae68d79 to 9344500 Compare December 8, 2023 21:47
Base automatically changed from Dev-Stateless-CNI to master December 12, 2023 00:15
Copy link

This pull request is stale because it has been open for 2 weeks with no activity. Remove stale label or comment or this will be closed in 7 days

@github-actions github-actions bot added the stale Stale due to inactivity. label Dec 27, 2023
Copy link

github-actions bot commented Jan 4, 2024

Pull request closed due to inactivity.

@github-actions github-actions bot closed this Jan 4, 2024
@github-actions github-actions bot deleted the SatetlessCNI-Migration branch January 4, 2024 00:01
@behzad-mir behzad-mir restored the SatetlessCNI-Migration branch January 4, 2024 00:03
@behzad-mir behzad-mir removed the stale Stale due to inactivity. label Jan 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cni Related to CNI. do-not-merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants