feat(storage tiers): add design proposal #268
base: develop
Conversation
> REST endpoint so it can be returned in the `NodeGetInfo` response. This would
> require adding communication from the node to the API, which is not ideal.
>
> Instead, a separate controller is proposed. It will run alongside the diskpool
At the moment we use csi-node topology to control where the diskpools are placed, which as you've pointed out is not quite right (once we remove the locality constraint the csi-node will be able to run on non-"openebs/engine: mayastor" nodes).
What is the actual intent from CSI for the accessible topology?
// Specifies the list of topologies the provisioned volume MUST be
// accessible from
Since our volumes are accessed via nvme-tcp, maybe this is not intended to control the placement of data replicas?
And in that case, should we separate the application topology and data topology completely?
If so, we could, for example, pass topology information through the storage class's generic parameters, and that way we wouldn't need to sync between diskpools and CSINode?
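For context, a minimal sketch of the K8s-side construct being discussed. Whether the csi-node actually reports `openebs/engine` as a topology key is an assumption here; the shape of `allowedTopologies` is the point:

```yaml
# Sketch only: today the accessibility topology (allowedTopologies) effectively
# doubles as the replica-placement constraint via the csi-node topology labels.
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: mayastor-example
provisioner: io.openebs.csi-mayastor
allowedTopologies:
  - matchLabelExpressions:
      - key: openebs/engine
        values:
          - mayastor
```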
This is a good point to consider. We can have two different topologies: one used for placement of replicas, and the other being the K8s `allowedTopologies` (where data can be accessible from).
If we have two different topologies for app and replica placement, it serves both use cases:
- In the disaggregated storage use case (where a set of nodes in the cluster is dedicated to storage and the other nodes to applications), `storageclass.parameters.dataPlacementTopology` can provision replicas based on topology, and `AllowedTopology` can be used for accessibility of the volume (where the app can schedule).
- In a normal cluster where all nodes are capable of hosting both storage and apps, one can place replicas based on `storageclass.parameters.dataPlacementTopology`, and the application can consume the volume from any node.
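A minimal sketch of the disaggregated case under this split; the `dataPlacementTopology` parameter and the `openebs.io/storage-node` label are illustrative names, not an agreed API:

```yaml
# Sketch only: hypothetical split between data placement and accessibility.
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: mayastor-disaggregated
provisioner: io.openebs.csi-mayastor
parameters:
  repl: '2'
  protocol: 'nvmf'
  # Hypothetical parameter: constrains where replicas (data) are placed.
  dataPlacementTopology: 'openebs.io/storage-node=true'
# No allowedTopologies: the volume stays accessible from any node over nvme-tcp,
# so application pods are free to schedule anywhere.
volumeBindingMode: WaitForFirstConsumer
```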
Good points:
- once the locality requirement is removed, the volume will be available on any node
- separate topologies for apps & replica placement
So in effect, we shouldn't re-use the CSI placement topology. As long as the replica topology is passed on the `CreateVolume` call, it can still be stored on the Volume's internal topology and placement works as intended.
This should just mean converting the StorageClass's `allowedTopologies` to KV pairs in the SC parameters. That should be ok since we don't need any complicated logic.
I'll try this out and update the doc.
I tried it out, and I agree this is better:
DiskPool:
```yaml
apiVersion: openebs.io/v1alpha1
kind: DiskPool
metadata:
  labels:
    openebs.io/classification: premium
  name: diskpool-jr4nm
spec:
  disks:
    - /dev/sdc
  node: aks-storagepool-25014071-vmss000000
```
StorageClass:
```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: premium
parameters:
  repl: '1'
  protocol: 'nvmf'
  ioTimeout: '60'
  classification: premium
provisioner: io.openebs.csi-mayastor
volumeBindingMode: WaitForFirstConsumer
```
A small code change in the csi controller's `create_volume` is needed to pass the parameter into the volume's `labelledTopology`:
```json
{
  "uuid": "7bb6dbc2-c170-46cd-b627-c0a04d613a24",
  "size": 1073741824,
  "labels": null,
  "num_replicas": 1,
  "status": {
    "Created": "Online"
  },
  "target": {
    "node": "aks-storagepool-25014071-vmss000000",
    "nexus": "7c14d3ff-459e-4aea-967a-c3b4dd47ea12",
    "protocol": "nvmf"
  },
  "policy": {
    "self_heal": true
  },
  "topology": {
    "node": {
      "Explicit": {
        "allowed_nodes": [
          "aks-nodepool1-14021096-vmss000000",
          "aks-nodepool1-14021096-vmss000001",
          "aks-nodepool1-14021096-vmss000002",
          "aks-storagepool-25014071-vmss000000",
          "aks-storagepool-25014071-vmss000001",
          "aks-storagepool-25014071-vmss000002"
        ],
        "preferred_nodes": [
          "aks-storagepool-25014071-vmss000000",
          "aks-storagepool-25014071-vmss000001",
          "aks-storagepool-25014071-vmss000002",
          "aks-nodepool1-14021096-vmss000000",
          "aks-nodepool1-14021096-vmss000001",
          "aks-nodepool1-14021096-vmss000002"
        ]
      }
    },
    "pool": {
      "Labelled": {
        "exclusion": {},
        "inclusion": {
          "openebs.io/classification": "premium",
          "openebs.io/created-by": "operator-diskpool"
        }
      }
    }
  },
  "last_nexus_id": "7c14d3ff-459e-4aea-967a-c3b4dd47ea12",
  "operation": null
}
```
Is this more what you were thinking?
That's great.
Do you see `classification` as a reserved keyword? I think it'd be good to keep it generic so you can use whatever labels you want?
I was thinking of it as a reserved keyword, but it doesn't need to be. The issue is knowing which SC parameter to copy from the `CreateVolume` request to the volume.
How about having a default of `openebs.io/classification`, but with an optional SC param `openebs.io/classification_key` that can override it? I think it's fine to have this configurable on the SC. It could also be a flag on the CSI controller, but that seems much less flexible.
And since naming is hard... I don't know whether classification is the right term, but I think it would be better to use the label prefix (`openebs.io/`) everywhere to avoid confusion. We definitely want to use the prefix for the labels.
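To make this concrete, a rough sketch of how the override could look on a StorageClass; `openebs.io/classification_key` is the proposed (not yet implemented) parameter and `example.com/tier` is just a made-up custom label key:

```yaml
# Sketch only: overriding the default classification key on a StorageClass.
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: fast-tier
provisioner: io.openebs.csi-mayastor
parameters:
  repl: '1'
  protocol: 'nvmf'
  # Proposed (hypothetical) override: tells the CSI controller which parameter
  # to copy into the volume's labelled pool topology instead of the default
  # openebs.io/classification.
  openebs.io/classification_key: example.com/tier
  example.com/tier: fast
volumeBindingMode: WaitForFirstConsumer
```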
Since we have agreed on keeping separate topologies, we will not be using the allowed topologies (i.e. the data-accessibility topology) for data placement. The data-placement topology can be a labelled topology built from the storage-class parameters. This also gives us the benefit of using pools on nodes newly added to the cluster, since we allow scaling up a volume after creation; that would otherwise be a problem if we used an explicit topology filled from the allowed topology at creation time.
#275 -> PR to separate the accessibility from data placement.
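For illustration, a sketch of a pool on a newly added node (the node name is hypothetical). Because the volume's pool topology is label-based rather than an explicit node list, carrying the same classification label is enough for the new pool to become a placement candidate:

```yaml
# Sketch only: a pool created later on a freshly added node. The matching
# classification label is all the labelled pool topology needs to consider it.
apiVersion: openebs.io/v1alpha1
kind: DiskPool
metadata:
  labels:
    openebs.io/classification: premium
  name: diskpool-new
spec:
  disks:
    - /dev/sdc
  node: aks-storagepool-25014071-vmss000003
```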
We'd like the ability to use multiple classes of backend storage and to allow users to select which type they want.
Submitting this as a proposal for review prior to commencing implementation. Other ideas or suggestions welcome.
It was great to see that most of the work is already done 💯