jja725 edited this page Jan 30, 2024 · 17 revisions

Membership Module

The membership module was introduced on 08/01/2023 to replace worker registration on the master. It provides the capability to either:

  1. use a static file to provide a preset worker list for an Alluxio cluster, or
  2. use an etcd cluster as a distributed membership coordinator.

Code structure

MembershipManager is the module interface for the different membership-management implementations. The main implementations are:

  1. NOOP - NoOpMembershipManager: falls back to the old approach of registering workers with the master; it is kept for regression/testing purposes.
  2. STATIC - StaticMembershipManager: uses a static config file (by default $ALLUXIO_HOME/conf/workers) listing the hostnames of the workers that form the Alluxio cluster. It does not provide real membership tracking: it does not detect members joining or leaving, nor member liveness. It is merely a simple quick-start way to spin up a DORA Alluxio cluster.
  3. ETCD - EtcdMembershipManager: uses a pre-configured standalone etcd cluster to manage worker membership. On first startup, a worker registers itself in etcd and then keeps its liveness in etcd throughout its process lifetime. Through the EtcdMembershipManager module, either a client or a worker can get information about:

a. What are the currently registered workers?

b. What are the currently alive workers?
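To make the difference between "registered" and "alive" concrete, here is a minimal Java sketch under stated assumptions. SimpleMembershipManager, its methods, and InMemoryEtcdLikeMembership are hypothetical names, not Alluxio's actual MembershipManager API, and an in-memory map stands in for etcd. Registration is modeled as permanent, while liveness is modeled as a lease that expires unless the worker renews it within a TTL.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical, simplified interface; the real MembershipManager API may differ.
interface SimpleMembershipManager {
  void join(String workerId, Instant now);      // register a worker and start its liveness lease
  void heartbeat(String workerId, Instant now); // renew the liveness lease
  Set<String> getAllMembers();                  // a. currently registered workers
  Set<String> getLiveMembers(Instant now);      // b. currently alive workers
}

// Toy in-memory model of ETCD-style semantics (assumption): registration is
// permanent, liveness expires unless renewed within the TTL.
class InMemoryEtcdLikeMembership implements SimpleMembershipManager {
  private final Duration ttl;
  private final Set<String> registered = new TreeSet<>();
  private final Map<String, Instant> leaseExpiry = new TreeMap<>();

  InMemoryEtcdLikeMembership(Duration ttl) {
    this.ttl = ttl;
  }

  @Override
  public void join(String workerId, Instant now) {
    registered.add(workerId);
    leaseExpiry.put(workerId, now.plus(ttl));
  }

  @Override
  public void heartbeat(String workerId, Instant now) {
    // Only registered workers can renew their lease.
    if (registered.contains(workerId)) {
      leaseExpiry.put(workerId, now.plus(ttl));
    }
  }

  @Override
  public Set<String> getAllMembers() {
    return new TreeSet<>(registered);
  }

  @Override
  public Set<String> getLiveMembers(Instant now) {
    Set<String> live = new TreeSet<>();
    for (Map.Entry<String, Instant> e : leaseExpiry.entrySet()) {
      if (e.getValue().isAfter(now)) {
        live.add(e.getKey());
      }
    }
    return live;
  }
}

public class MembershipDemo {
  public static void main(String[] args) {
    Instant t0 = Instant.EPOCH;
    InMemoryEtcdLikeMembership m = new InMemoryEtcdLikeMembership(Duration.ofSeconds(10));
    m.join("worker-a", t0);
    m.join("worker-b", t0);
    m.heartbeat("worker-a", t0.plusSeconds(8)); // only worker-a renews its lease
    Instant t1 = t0.plusSeconds(12);
    System.out.println("registered: " + m.getAllMembers()); // [worker-a, worker-b]
    System.out.println("live: " + m.getLiveMembers(t1));    // [worker-a]
  }
}
```

In this sketch worker-b stays registered after its lease lapses but drops out of the live set, which is exactly the distinction between questions a. and b.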

Deployment

- NOOP

Nothing needs to be configured; it does not use any MembershipManager module at all.

- STATIC

Use a static file that follows the format of conf/workers (refer to: https://docs.alluxio.io/os/user/stable/en/deploy/Running-Alluxio-On-a-Cluster.html?q=conf%2Fworkers#basic-setup), with the hostname of each of ALL the workers on its own line. Then configure alluxio-site.properties with:

alluxio.worker.membership.manager.type=STATIC
alluxio.worker.static.config.file=<absolute_path_to_static_config_workerlist_file>

or just

alluxio.worker.membership.manager.type=STATIC

in which case conf/workers is used. For example, to configure an Alluxio cluster with 2 workers, conf/workers contains:

# List of Worker started on each of the machines listed below.
ec2-1-111-11-111.compute-1.amazonaws.com
ec2-2-222-22-222.compute-2.amazonaws.com           
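The file format above can be illustrated with a small Java parser. This is a sketch, not Alluxio's own loader: StaticWorkerList is a hypothetical name, and it assumes that lines starting with '#' are comments, blank lines are ignored, and surrounding whitespace (such as the trailing spaces in the example) is trimmed.

```java
import java.util.ArrayList;
import java.util.List;

public class StaticWorkerList {
  // Parses text in the conf/workers format: one worker hostname per line.
  // Assumptions: '#' starts a comment line, blank lines are skipped,
  // and leading/trailing whitespace is trimmed from each hostname.
  static List<String> parse(String content) {
    List<String> hosts = new ArrayList<>();
    for (String line : content.split("\\R")) { // \R matches any line break
      String trimmed = line.trim();
      if (trimmed.isEmpty() || trimmed.startsWith("#")) {
        continue;
      }
      hosts.add(trimmed);
    }
    return hosts;
  }

  public static void main(String[] args) {
    String workers = String.join("\n",
        "# List of Worker started on each of the machines listed below.",
        "ec2-1-111-11-111.compute-1.amazonaws.com",
        "ec2-2-222-22-222.compute-2.amazonaws.com           ");
    // Prints the two hostnames, without the comment line or trailing spaces.
    System.out.println(parse(workers));
  }
}
```

To read an actual file, the content could come from java.nio.file.Files.readString(path) (Java 11+).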

- ETCD

Depending on the deployment environment (bare metal or K8s), users can set up the etcd cluster and the Alluxio cluster separately, or use helm with Alluxio's k8s operator for a one-click install of both.

1) Bare Metal

Set up an etcd cluster by following the etcd clustering guide (https://etcd.io/docs/v3.4/op-guide/clustering/). We recommend etcd v3, since v2 is not supported; there is currently no requirement on a specific v3 version.

For example, say we have a 3-node etcd setup:

Name     Address     Hostname
infra0   10.0.1.10   infra0.example.com
infra1   10.0.1.11   infra1.example.com
infra2   10.0.1.12   infra2.example.com

Configure alluxio-site.properties:

alluxio.worker.membership.manager.type=ETCD
alluxio.etcd.endpoints=http://infra0.example.com:2379,http://infra1.example.com:2379,http://infra2.example.com:2379

[NOTICE] Because the etcd membership module relies on etcd's high availability to provide the membership service, include ALL the etcd cluster nodes in the configuration (or at least all the initial ones, if new nodes have been bootstrapped into etcd later) so that the module can automatically redirect connections to the etcd leader.
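Since the endpoint list is a single comma-separated property, a quick sanity check can catch typos before startup. The Java sketch below splits an alluxio.etcd.endpoints value into URIs; EtcdEndpoints is a hypothetical helper, and its checks (http or https scheme, explicit host and port) are illustrative assumptions rather than documented Alluxio validation rules.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class EtcdEndpoints {
  // Splits an alluxio.etcd.endpoints value into URIs and sanity-checks each one.
  // The scheme/host/port checks are assumptions made for illustration.
  static List<URI> parse(String value) {
    List<URI> endpoints = new ArrayList<>();
    for (String raw : value.split(",")) {
      String s = raw.trim();
      if (s.isEmpty()) {
        continue; // tolerate stray commas
      }
      URI uri = URI.create(s); // throws IllegalArgumentException on bad syntax
      String scheme = uri.getScheme();
      if (!"http".equals(scheme) && !"https".equals(scheme)) {
        throw new IllegalArgumentException("unexpected scheme in " + s);
      }
      if (uri.getHost() == null || uri.getPort() == -1) {
        throw new IllegalArgumentException("missing host or port in " + s);
      }
      endpoints.add(uri);
    }
    if (endpoints.isEmpty()) {
      throw new IllegalArgumentException("no etcd endpoints configured");
    }
    return endpoints;
  }

  public static void main(String[] args) {
    String value = "http://infra0.example.com:2379,http://infra1.example.com:2379,"
        + "http://infra2.example.com:2379";
    for (URI uri : parse(value)) {
      System.out.println(uri.getHost() + " port " + uri.getPort());
    }
  }
}
```

Comparing the resulting host list against the etcd member list is one way to confirm that every node from the NOTICE above is included.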

After spinning up the Alluxio workers, use bin/alluxio info nodes to check the status of worker registration:

WorkerId	Address	Status
6e715648b6f308cd8c90df531c76a028	127.0.0.1:29999	ONLINE
Etcd with authentication enabled

If your etcd cluster has authentication enabled, you need to create a user whose role grants full read-write permission on all keys with prefix '/'. The official etcd guidance is here: https://etcd.io/docs/v3.2/op-guide/authentication/. Below is a simple guide to setting up a user/role for the Alluxio etcd membership module:

# To enable etcd authentication, the user 'root' must exist first.
# Check the user list of your etcd cluster with: etcdctl user list
# Skip this step if you already have user 'root'.
# Enter a password at the prompt (this example uses 'root' as the password).
$ etcdctl user add root

# Enable authentication with root user
$ etcdctl --user root:root auth enable
Authentication Enabled  

# Create a role, grant permission on prefix '/'
$ etcdctl --user root:root role add alluxioreadwrite
$ etcdctl --user root:root role grant-permission alluxioreadwrite --prefix=true readwrite /

# Create a user for alluxio, enter password on prompt.
$ etcdctl --user root:root user add alluxio

# Grant the user with the role.
$ etcdctl --user root:root user grant-role alluxio alluxioreadwrite

# Check that the newly created user 'alluxio' can access keys under prefix '/'
$ etcdctl --user alluxio:<password> get --prefix /

Set the user/password in alluxio-site.properties:

alluxio.etcd.username=alluxio
alluxio.etcd.password=<password_for_alluxio>

2) K8s

Using the k8s operator, we can spin up a DORA Alluxio cluster along with etcd cluster pod(s) via helm. (For prerequisites, refer to https://docs.google.com/document/d/1iiDZDNBTJWQ1WAJ-31aKDo9pL1DeTrvrvYUdd-YrTpI/edit#heading=h.1rc792noj716)

To pull the etcd dependency for the helm chart, run:

helm dependency update 

To configure Alluxio with a single-pod etcd cluster, enable the etcd component in k8s-operator/deploy/charts/alluxio/config.yaml:

image: <docker_username>/<image-name>
imageTag: <tag>
dataset:
  path: <ufs path>
  credentials: # s3 as example. Leave it empty if not needed.
    aws.accessKeyId: xxxxxxxxxx
    aws.secretKey: xxxxxxxxxxxxxxx
etcd:
  enabled: true

Then, under k8s-operator/deploy/charts/alluxio/, run:

$ helm install <cluster name> -f config.yaml .

Running kubectl get pods then shows:

[root@ip-172-31-24-66 alluxio]# kubectl get pods                                          
NAME                                    READY   STATUS     RESTARTS   AGE
dora0802-alluxio-master-0               0/1     Running    0          3s
dora0802-alluxio-worker-6577bc9-s6njq   0/1     Running    0          3s
dora0802-etcd-0                         0/1     Running    0          3s

  • To spin up a 3-node etcd cluster

Simply add a replicaCount field to set the number of etcd instances:

etcd:
  enabled: true
  replicaCount: 3

This now gives a 3-pod etcd cluster:

NAME                                        READY   STATUS    RESTARTS   AGE
dora0802-1-alluxio-master-0                 1/1     Running   0          111m
dora0802-1-alluxio-worker-5fc8bd885-jk6pn   1/1     Running   0          111m
dora0802-1-etcd-0                           1/1     Running   0          111m
dora0802-1-etcd-1                           1/1     Running   0          111m
dora0802-1-etcd-2                           1/1     Running   0          111m
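The replicaCount choice follows etcd's majority-quorum rule: a cluster of n members needs floor(n/2) + 1 members available to keep serving writes. The small Java calculation below (EtcdQuorum is a hypothetical helper name) shows why a single etcd pod tolerates no member failure, while the 3-pod setup tolerates one.

```java
public class EtcdQuorum {
  // etcd requires a majority of members to agree: floor(n / 2) + 1.
  static int quorum(int members) {
    return members / 2 + 1;
  }

  // How many members can fail while a quorum still remains.
  static int faultTolerance(int members) {
    return members - quorum(members);
  }

  public static void main(String[] args) {
    for (int n : new int[] {1, 3, 5}) {
      System.out.printf("members=%d quorum=%d tolerated failures=%d%n",
          n, quorum(n), faultTolerance(n));
    }
  }
}
```

Running it prints quorum 1, 2, 3 and tolerated failures 0, 1, 2 for 1, 3 and 5 members respectively; an even count such as 4 tolerates no more failures than 3, which is why odd replica counts are the usual choice.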

If you would like to use etcdctl in a k8s environment, spin up an etcd client pod via:

$ kubectl run lucyetcd-client --restart='Never' --image docker.io/bitnami/etcd:3.5.9-debian-11-r24 --env ETCDCTL_ENDPOINTS="dora0802-1-etcd:2379" --namespace default --command -- sleep infinity

For a detailed introduction to how registration/service discovery is done with etcd, see: https://github.com/Alluxio/alluxio/wiki/Etcd-backed-membership

- SERVICE_REGISTRY

Uses etcd for worker registration, but records only active workers (instead of also recording permanent worker entries). That is, key-values exist only under /ServiceDiscovery.

Configure alluxio-site.properties:

alluxio.worker.membership.manager.type=SERVICE_REGISTRY
alluxio.etcd.endpoints=http://infra0.example.com:2379,http://infra1.example.com:2379,http://infra2.example.com:2379
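The practical effect of keeping only live entries can be sketched in Java. This is a toy model under stated assumptions: ServiceRegistrySketch is a hypothetical class, the "/ServiceDiscovery/<workerId>" key layout is assumed for illustration, and a TreeMap with expiry times stands in for etcd keys attached to TTL leases.

```java
import java.time.Instant;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model of SERVICE_REGISTRY semantics: each worker is one key under
// /ServiceDiscovery/ tied to a TTL lease. There is no permanent registration
// record, so an expired key simply disappears. Names and key layout are hypothetical.
public class ServiceRegistrySketch {
  private static final String PREFIX = "/ServiceDiscovery/";
  private final long ttlSeconds;
  // key -> time at which the key's lease expires
  private final Map<String, Instant> kv = new TreeMap<>();

  ServiceRegistrySketch(long ttlSeconds) {
    this.ttlSeconds = ttlSeconds;
  }

  // Registers a worker, or renews its lease if it is already present.
  void register(String workerId, Instant now) {
    kv.put(PREFIX + workerId, now.plusSeconds(ttlSeconds));
  }

  // Deletes every key whose lease has expired by 'now'.
  private void expire(Instant now) {
    kv.values().removeIf(expiry -> !expiry.isAfter(now));
  }

  // The only question this mode can answer: which workers are alive now?
  Set<String> liveWorkers(Instant now) {
    expire(now);
    Set<String> live = new TreeSet<>();
    for (String key : kv.keySet()) {
      live.add(key.substring(PREFIX.length()));
    }
    return live;
  }

  int keyCount() {
    return kv.size();
  }

  public static void main(String[] args) {
    ServiceRegistrySketch registry = new ServiceRegistrySketch(10);
    registry.register("worker-a", Instant.EPOCH);
    registry.register("worker-b", Instant.EPOCH);
    registry.register("worker-a", Instant.EPOCH.plusSeconds(8)); // only worker-a renews
    System.out.println(registry.liveWorkers(Instant.EPOCH.plusSeconds(12))); // [worker-a]
    System.out.println(registry.keyCount()); // 1: worker-b's key is gone entirely
  }
}
```

Unlike the ETCD mode, where a worker that stops heartbeating remains registered, here nothing about worker-b survives once its lease expires.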