Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat/Check that just bootstrapped SN is available #2542

Merged

Conversation

carpawell
Copy link
Member

No description provided.

@carpawell carpawell self-assigned this Sep 1, 2023
@carpawell carpawell force-pushed the feat/validate-SN-availability-by-IR branch 2 times, most recently from 2bea24e to 7fa4a5a Compare September 1, 2023 16:43
return
}

_, err = c.EndpointInfo(context.Background(), client.PrmEndpointInfo{})
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@roman-khimov, @cthulhu-rider, @AliceInHunterland, ping me if you think we need to check returned values (IMO, that is useless)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd check them against the data for the map.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added some checks

@carpawell carpawell marked this pull request as ready for review September 1, 2023 16:44
@codecov
Copy link

codecov bot commented Sep 1, 2023

Codecov Report

Merging #2542 (ccf8b5c) into master (9224e9f) will decrease coverage by 0.01%.
Report is 2 commits behind head on master.
The diff coverage is 0.00%.

❗ Current head ccf8b5c differs from pull request most recent head 757b90e. Consider uploading reports for the commit 757b90e to get more accurate results

@@            Coverage Diff             @@
##           master    #2542      +/-   ##
==========================================
- Coverage   29.67%   29.66%   -0.01%     
==========================================
  Files         405      405              
  Lines       30756    30758       +2     
==========================================
- Hits         9126     9124       -2     
- Misses      20860    20864       +4     
  Partials      770      770              
Files Changed Coverage Δ
pkg/innerring/innerring.go 0.00% <0.00%> (ø)
pkg/innerring/processors/netmap/process_peers.go 0.00% <0.00%> (ø)

... and 1 file with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

📢 Have feedback on the report? Share it here.

Copy link
Member

@roman-khimov roman-khimov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we have some temporary networking error/inconsistent behavior across IR nodes? Can this be retried once in a while until the notary request expires?

pkg/innerring/processors/netmap/process_peers.go Outdated Show resolved Hide resolved
@carpawell
Copy link
Member Author

What if we have some temporary networking error/inconsistent behavior across IR nodes?

Every IR node? More than 5/7?

Can this be retried once in a while until the notary request expires?

Do you mean some cache of "loser" SN nodes that will be retried some times after some period? Need to think about it. Maybe it would be better to do it on the SN side? It may not find itself as a candidate for the next epoch and retry request (but to whom if the IR (local morph consensus) is down?).

@carpawell carpawell force-pushed the feat/validate-SN-availability-by-IR branch 2 times, most recently from acb6354 to 30fba81 Compare September 4, 2023 22:51
@roman-khimov
Copy link
Member

What if we have some temporary networking error/inconsistent behavior across IR nodes?

Every IR node? More than 5/7?

Can be anything.

Maybe it would be better to do it on the SN side?

That depends on notary request expiration parameters mostly, although nodes can retry even without waiting for request to expire.

var info clientcore.NodeInfo
err = clientcore.NodeInfoFromRawNetmapElement(&info, netmapCore.Node(nodeInfo))
if err != nil {
np.log.Warn("could not convert node info: %w", zap.Error(err))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
np.log.Warn("could not convert node info: %w", zap.Error(err))
np.log.Warn("could not convert node info", zap.Error(err))

same below

@@ -37,6 +44,37 @@ func (np *Processor) processAddPeer(ev netmapEvent.AddPeer) {
return
}

var info clientcore.NodeInfo
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why don't we extend np.nodeValidator.VerifyAndUpdate procedure below with new check?

Copy link
Member Author

@carpawell carpawell Sep 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have not thought about that. it does not seem to me like a node info structure validator. do we need a separate pkg per every validation, a separate interface for it (and a separate composite interface implementation) for checks that should be done in only one place and be done always?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, moved to a separate validator

return
}

res, err := c.EndpointInfo(context.Background(), client.PrmEndpointInfo{})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'd add some sane deadline

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added 15 sec

@@ -116,3 +154,60 @@ func (np *Processor) processUpdatePeer(ev netmapEvent.UpdatePeer) {
np.log.Error("can't invoke netmap.UpdatePeer", zap.Error(err))
}
}

var errUnknownDifference = errors.New("announced and received information differ but not critical")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in current code makes no sense

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dont mind dropping it

func compareNodeInfos(niExp, niGot netmap.NodeInfo) error {
// if a node bootstraps the first time it is OK
// to be OFFLINE but send ONLINE bootstrap requests
niGot.SetOnline()
Copy link
Contributor

@cthulhu-rider cthulhu-rider Sep 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

states will be almost definitely different, maybe SetOnline() for both instances?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

states will be almost definitely different

do not agree, health check pings should be the majority

maybe SetOnline() for both instances?

agree that it should allow maintenance state transfer, fixed

return fmt.Errorf("hash: got %d, expect %d", got, exp)
}

niExp.SortAttributes()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

really needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, strange, not needed at all, dropped

return fmt.Errorf("attr number: got %d, expect %d", got, exp)
}

niExp.IterateAttributes(func(key, value string) {
Copy link
Contributor

@cthulhu-rider cthulhu-rider Sep 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw do we check the uniqueness of attributes and endpoints somewhere? if so, i'd add a comment here, otherwise, lets cover this too

https://github.com/nspcc-dev/neofs-node/pull/2542/files#r1315818528 perhaps reduce this ambiguity

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw do we check the uniqueness of attributes and endpoints somewhere?

not found that check on the IR side. but our SN will not accept such config

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a validator for it (extended the existing one) but found out that it is impossible to parse such node info so i can't even add a test for it

@carpawell carpawell force-pushed the feat/validate-SN-availability-by-IR branch from 30fba81 to b2e5c32 Compare September 5, 2023 19:26
@carpawell
Copy link
Member Author

That depends on notary request expiration parameters mostly, although nodes can retry even without waiting for request to expire.

So we may do smth, I am not sure I have a strict opinion on that. @cthulhu-rider, @AliceInHunterland

@carpawell carpawell force-pushed the feat/validate-SN-availability-by-IR branch from b2e5c32 to 85ca87e Compare September 5, 2023 19:46
Improves debug process.

Signed-off-by: Pavel Karpy <[email protected]>
That is the API restriction. Add this check to the `maddress` validator and
rename in to `structure` validator.

Signed-off-by: Pavel Karpy <[email protected]>
@carpawell carpawell force-pushed the feat/validate-SN-availability-by-IR branch from 85ca87e to 757b90e Compare September 7, 2023 20:07

c, err := client.New(prmInit)
if err != nil {
return nil, fmt.Errorf("can't create SDK client: %w", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

general: SDK client always seems stange to me, it's actully NeoFS API clent

@roman-khimov roman-khimov merged commit 0a13794 into nspcc-dev:master Sep 8, 2023
@carpawell carpawell deleted the feat/validate-SN-availability-by-IR branch September 8, 2023 09:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants