Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose memberlist label configs #187

Merged
merged 11 commits into from
Jul 8, 2022
Merged

Conversation

replay
Copy link
Contributor

@replay replay commented Jul 7, 2022

This makes three changes

  1. Switches the vendored version of memberlist to a branch in our fork where we fixed the skip-inbound-label-check (PR)
  2. Exposes the relevant options to allow users of dskit to configure the memberlist label feature
  3. Adds unit tests which test 4 configurations of memberlist clusters:
    1. TestMultipleClients Cluster with no labels and skip-inbound-label-check disabled, expected to succeed in joining
    2. TestMultipleClientsWithMixedLabelsAndExpectFailure Cluster where some members have labels and some don't and skip-inbound-label-check is disabled, this cluster is expected to fail because the members can't join each other
    3. TestMultipleClientsWithMixedLabelsAndSkipLabelCheck Cluster where some members have labels and some don't and skip-inbound-label-check is enabled, this cluster is expected to succeed in joining
    4. TestMultipleClientsWithSameLabelWithoutSkipLabelCheck Cluster where all members have the same label and skip-inbound-label-check is disabled, this cluster is expected to succeed in joining

The above unit tests basically test a migration scenario where we migrate an existing cluster which currently doesn't use labels to use labels by going through the following steps:

  1. Enable skip-inbound-label-check so that members with different labels (some without any label) can join each other
  2. Roll out a label to all processes, since this change gets rolled out slowly across the pods it is important that pods with different labels can join each other into one cluster
  3. Disable skip-inbound-label-check

Furthermore the unit tests also verify that if skip-inbound-label-check is disabled, processes which have different labels cannot join each other, providing isolation between the processes.

@replay replay marked this pull request as draft July 7, 2022 19:43
Signed-off-by: Mauro Stettler <[email protected]>
Signed-off-by: Mauro Stettler <[email protected]>
@replay replay marked this pull request as ready for review July 7, 2022 23:59
Copy link
Collaborator

@aknuds1 aknuds1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see comments.

In general, I'm not sure why you modify test helpers into returning errors instead of failing on them directly? Personally I prefer for test helpers to catch errors, so their callers don't have to check for them (which increases test verbosity).

if err := casWithErr(context.Background(), t, kv, key, updateFn); err != nil {
t.Fatal(err)
}
func cas(kv *Client, key string, updateFn func(*data) (*data, bool, error)) error {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the change to return an error instead of failing the test on error (as a t.Helper())? I think normally it's preferable to catch test errors in test helpers instead of returning them.

func TestMultipleClients(t *testing.T) {
c := dataCodec{}
members := 10
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Nit] Might as well be a constant:

Suggested change
members := 10
const members = 10

BindAddrs: []string{"localhost"},
BindPort: 0, // randomize ports
func TestMultipleClientsWithMixedLabelsAndSkipLabelCheck(t *testing.T) {
members := 3
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Nit] Might as well be a constant:

Suggested change
members := 3
const members = 3


var clients []*Client
func TestMultipleClientsWithMixedLabelsAndExpectFailure(t *testing.T) {
members := 3
Copy link
Collaborator

@aknuds1 aknuds1 Jul 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Nit] Might as well be a constant:

Suggested change
members := 3
const members = 3

}

func TestMultipleClientsWithSameLabelWithoutSkipLabelCheck(t *testing.T) {
members := 10
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Nit] Might as well be a constant:

Suggested change
members := 10
const members = 10

configGen := func(i int) KVConfig {
cfg := defaultKVConfig(i)

cfg.Label = label
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency, should we set SkipInboundLabelCheck also here?

Suggested change
cfg.Label = label
cfg.SkipInboundLabelCheck = false
cfg.Label = label

require.NoError(t, err)
}

func testMultipleClientsWithConfigGenerator(t *testing.T, members int, configGen func(memberId int) KVConfig) error {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not make this into a t.Helper() that catches errors instead of returning them, it would only make the tests simpler?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of TestMultipleClientsWithMixedLabelsAndExpectFailure.

@aknuds1 aknuds1 requested a review from a team July 8, 2022 07:38
@pracucci pracucci self-requested a review July 8, 2022 07:55
Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes LGTM (modulo few nits).

kv/memberlist/memberlist_client.go Outdated Show resolved Hide resolved
kv/memberlist/memberlist_client.go Outdated Show resolved Hide resolved
@@ -594,7 +705,11 @@ func TestMultipleClients(t *testing.T) {
if updates < members {
// in general, at least one update from each node. (although that's not necessarily true...
// but typically we get more updates than that anyway)
t.Errorf("expected to see updates, got %d", updates)
return fmt.Errorf("expected to see %d updates, got %d", members, updates)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return fmt.Errorf("expected to see %d updates, got %d", members, updates)
return fmt.Errorf("expected to see at least %d updates, got %d", members, updates)

require.NoError(t, err)
}

func testMultipleClientsWithConfigGenerator(t *testing.T, members int, configGen func(memberId int) KVConfig) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of TestMultipleClientsWithMixedLabelsAndExpectFailure.

cfg.GossipInterval = 100 * time.Millisecond
cfg.GossipNodes = 3
cfg.PushPullInterval = 5 * time.Second
err := testMultipleClientsWithConfigGenerator(t, members, configGen)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] We can remove members variable at all.

Suggested change
err := testMultipleClientsWithConfigGenerator(t, members, configGen)
err := testMultipleClientsWithConfigGenerator(t, len(labels), configGen)

return cfg
}

err := testMultipleClientsWithConfigGenerator(t, members, configGen)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here.

@@ -399,6 +404,9 @@ func (m *KV) buildMemberlistConfig() (*memberlist.Config, error) {
mlCfg.AdvertiseAddr = m.cfg.AdvertiseAddr
mlCfg.AdvertisePort = m.cfg.AdvertisePort

mlCfg.Label = m.cfg.Label
mlCfg.SkipInboundLabelCheck = m.cfg.SkipInboundLabelCheck
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Below we log:

level.Info(m.logger).Log("msg", "Using memberlist cluster node name", "name", mlCfg.Name)

I would move that log at the end of the function, and change it to something like:

level.Info(m.logger).Log("msg", "Using memberlist cluster label %q and node name %q", mlCfg.Label, mlCfg.Name)

When manually testing this, I found it useful seeing the actual label.

@@ -193,6 +196,8 @@ func (cfg *KVConfig) RegisterFlagsWithPrefix(f *flag.FlagSet, prefix string) {
f.BoolVar(&cfg.EnableCompression, prefix+"memberlist.compression-enabled", mlDefaults.EnableCompression, "Enable message compression. This can be used to reduce bandwidth usage at the cost of slightly more CPU utilization.")
f.StringVar(&cfg.AdvertiseAddr, prefix+"memberlist.advertise-addr", mlDefaults.AdvertiseAddr, "Gossip address to advertise to other members in the cluster. Used for NAT traversal.")
f.IntVar(&cfg.AdvertisePort, prefix+"memberlist.advertise-port", mlDefaults.AdvertisePort, "Gossip port to advertise to other members in the cluster. Used for NAT traversal.")
f.StringVar(&cfg.Label, prefix+"memberlist.label", mlDefaults.Label, "Label is an optional set of bytes to include on the outside of each packet and stream.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "label" CLI flag and YAML config option name is very generic. I'm wondering if we should call it something like "cluster-label" or "cluster-id" given we use it to authenticate the cluster ID. If so, then I would change the "skip" CLI/YAML option accordingly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cluster-label has a nice sound to it

I think the config option (which inverts the behavior to memberlist unfortunately) which I would prefer would be

enforce-cluster-label

Although we might want to get more opinions on enforce i'm not sure how well that would be understood for non native english speakers and if there is something more clear?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Also enforce-cluster-label would default to true IMO)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like "cluster-label" too. While working on the jsonnet changes to configure it, it sounds nice in the config.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

enforce-cluster-label

WDYT about "cluster-label-verification-enabled" instead of "enforce-cluster-label"? While doing change I've the feeling "enforce-cluster-label" may not be that clear

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From Ed on Slack:

sounds good to me! I like that better

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From Ed on Slack:

also you could do it then as cluster-label-verification-disabled
and the default could be false and it would then match the same semantics as the existing config in memberlist

@pracucci
Copy link
Contributor

pracucci commented Jul 8, 2022

@replay I pick up the work to do changes to config.

@pracucci
Copy link
Contributor

pracucci commented Jul 8, 2022

Switches the vendored version of memberlist to a branch in our fork where we fixed the skip-inbound-label-check (grafana/memberlist#3)

I merged that PR and updated go.mod in this PR.

I pick up the work to do changes to config.

I pushed few commits to apply my own comments and config renamed as discussed with Ed.

Could you review it again, please?

Copy link
Contributor

@slim-bean slim-bean left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with a couple fixes to the explanation of the variables (based on our recent renaming)

@pracucci
Copy link
Contributor

pracucci commented Jul 8, 2022

Thanks @slim-bean for the edits 🤦 Applied.

@replay
Copy link
Contributor Author

replay commented Jul 8, 2022

In general, I'm not sure why you modify test helpers into returning errors instead of failing on them directly? Personally I prefer for test helpers to catch errors, so their callers don't have to check for them (which increases test verbosity).

I agree with you that it is nicer to just pass testing.T into the helpers and then fail directly in the helpers. But in this case I wanted to add a unit test which brings up a cluster with a configuration that is supposed to fail, so I needed to be able to check whether it returns an error without failing the whole test in the helper. That's the test TestMultipleClientsWithMixedLabelsAndExpectFailure

@replay
Copy link
Contributor Author

replay commented Jul 8, 2022

Thanks for your changes Marco, they make sense to me

@pracucci pracucci merged commit 99f3d00 into main Jul 8, 2022
@pracucci pracucci deleted the expose_memberlist_label_configs branch July 8, 2022 14:10
pracucci added a commit that referenced this pull request Jul 8, 2022
replay pushed a commit that referenced this pull request Jul 8, 2022
@@ -140,6 +140,9 @@ type KVConfig struct {
AdvertiseAddr string `yaml:"advertise_addr"`
AdvertisePort int `yaml:"advertise_port"`

ClusterLabel string `yaml:"cluster_label" category:"experimental"`
ClusterLabelVerificationDisabled bool `yaml:"cluster_label_verification_disabled" category:"experimental"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we call this ClusterLabelVerificationDisabled and make it default to true? Most of our other options are -enabled, not -disabled. Reverse logic is confusing to users.

@pracucci pracucci mentioned this pull request Jul 13, 2022
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants