
feat(storage tiers): add design proposal #268

Open · wants to merge 1 commit into develop
Conversation

@croomes (Contributor) commented Jul 4, 2022

We'd like the ability to use multiple classes of backend storage and to allow users to select which type they want.

Submitting this as a proposal for review prior to commencing implementation. Other ideas or suggestions welcome.

It was great to see that most of the work is already done 💯

REST endpoint so it can be returned in the `NodeGetInfo` response. This would
require adding communication from the node to the API, which is not ideal.

Instead, a separate controller is proposed. It will run alongside the diskpool
Contributor commented:

At the moment we use csi-node topology to control where the diskpools are placed, which, as you've pointed out, is not quite right (once we remove the locality constraint, the csi-node will be able to run on nodes without the "openebs/engine: mayastor" label).

What is the actual intent from CSI for the accessible topology?

  // Specifies the list of topologies the provisioned volume MUST be
  // accessible from

Since our volumes are accessed via nvme-tcp, maybe this is not intended to control the placement of data replicas?
And in that case, should we separate the application topology and data topology completely?
If so, we could, for example, pass topology information through the storage class's generic parameters and this way we wouldn't need to sync between diskpools and CSINode?

@mittachaitu commented Jul 11, 2022

This is a good point to consider: we can have two different topologies, one used for the placement of replicas and the other the K8s allowedTopologies (where the data can be accessed from).

  • If we have two different topologies for app & replica placement, it will serve both use cases (see the sketch after this list):
    • In the disaggregated storage use case (where a set of nodes in the cluster is dedicated to storage and the other nodes to applications), storageclass.parameters.dataPlacementTopology can drive replica provisioning based on topology, and AllowedTopology can be used for volume accessibility (where the app can be scheduled).
    • In a normal cluster, where all the nodes are capable of hosting both storage & apps, replicas can be placed based on storageclass.parameters.dataPlacementTopology and the application can consume the volume from any node.
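A minimal sketch of what that separation might look like on a StorageClass. The dataPlacementTopology key and its value format are hypothetical (only storageclass.parameters.dataPlacementTopology is referenced above), and the node names are placeholders:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: premium
provisioner: io.openebs.csi-mayastor
parameters:
  # Hypothetical parameter: constrains where data replicas are placed.
  dataPlacementTopology: 'openebs.io/classification=premium'
# Standard Kubernetes field: constrains where the volume is accessible,
# i.e. where the consuming pod can be scheduled.
allowedTopologies:
- matchLabelExpressions:
  - key: kubernetes.io/hostname
    values:
    - node-1
    - node-2
volumeBindingMode: WaitForFirstConsumer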

@croomes (Contributor Author) commented:

Good points:

  • once the locality requirement is removed, the volume will be available on any node
  • separate topologies for apps & replica placement

So in effect, we shouldn't re-use the CSI placement topology: as long as the replica topology is passed on the CreateVolume call, it can still be stored in the volume's internal topology, and placement works as intended.

This should just mean converting the StorageClass's allowedTopologies into KV pairs in the SC parameters. That should be OK since we don't need any complicated logic.

I'll try this out and update the doc.

@croomes (Contributor Author) commented Jul 13, 2022

I tried it out, and I agree this is better:

DiskPool:

apiVersion: openebs.io/v1alpha1
kind: DiskPool
metadata:
  labels:
    openebs.io/classification: premium
  name: diskpool-jr4nm
spec:
  disks:
  - /dev/sdc
  node: aks-storagepool-25014071-vmss000000

StorageClass:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: premium
parameters:
  repl: '1'
  protocol: 'nvmf'
  ioTimeout: '60'
  classification: premium
provisioner: io.openebs.csi-mayastor
volumeBindingMode: WaitForFirstConsumer

A small code change in the CSI controller's create_volume passes the parameter into the volume's labelled topology:

{
    "uuid": "7bb6dbc2-c170-46cd-b627-c0a04d613a24",
    "size": 1073741824,
    "labels": null,
    "num_replicas": 1,
    "status": {
        "Created": "Online"
    },
    "target": {
        "node": "aks-storagepool-25014071-vmss000000",
        "nexus": "7c14d3ff-459e-4aea-967a-c3b4dd47ea12",
        "protocol": "nvmf"
    },
    "policy": {
        "self_heal": true
    },
    "topology": {
        "node": {
            "Explicit": {
                "allowed_nodes": [
                    "aks-nodepool1-14021096-vmss000000",
                    "aks-nodepool1-14021096-vmss000001",
                    "aks-nodepool1-14021096-vmss000002",
                    "aks-storagepool-25014071-vmss000000",
                    "aks-storagepool-25014071-vmss000001",
                    "aks-storagepool-25014071-vmss000002"
                ],
                "preferred_nodes": [
                    "aks-storagepool-25014071-vmss000000",
                    "aks-storagepool-25014071-vmss000001",
                    "aks-storagepool-25014071-vmss000002",
                    "aks-nodepool1-14021096-vmss000000",
                    "aks-nodepool1-14021096-vmss000001",
                    "aks-nodepool1-14021096-vmss000002"
                ]
            }
        },
        "pool": {
            "Labelled": {
                "exclusion": {},
                "inclusion": {
                    "openebs.io/classification": "premium",
                    "openebs.io/created-by": "operator-diskpool"
                }
            }
        }
    },
    "last_nexus_id": "7c14d3ff-459e-4aea-967a-c3b4dd47ea12",
    "operation": null
}

Is this more what you were thinking?

Contributor commented:

That's great.
Do you see classification as a reserved keyword? I think it'd be good to keep it generic so you can use whatever labels you want?

@croomes (Contributor Author) commented:

I was thinking of it as a reserved keyword, but it doesn't need to be. The issue is knowing which SC parameter to copy from the CreateVolume request to the volume.

How about having a default, openebs.io/classification, but with an optional SC param, openebs.io/classification_key, that can override it? I think it's fine to have this configurable on the SC. It could also be a flag on the CSI controller, but that seems much less flexible.

And since naming is hard... I don't know whether classification is the right term, but I think it would be better to use the label prefix (openebs.io/) everywhere to avoid confusion. We definitely want to use the prefix for the labels.
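For illustration, a sketch of how the override might look on a StorageClass, assuming openebs.io/classification_key is the override parameter and openebs.io/tier is a user-chosen label key (both names hypothetical at this point):

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: premium-tier
provisioner: io.openebs.csi-mayastor
parameters:
  # Hypothetical override: tells the CSI controller which SC parameter to
  # copy into the volume's labelled pool topology
  # (default: openebs.io/classification).
  openebs.io/classification_key: openebs.io/tier
  # The label that must match the DiskPool's labels.
  openebs.io/tier: premium
volumeBindingMode: WaitForFirstConsumer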

Member commented:

This is a good point to consider: we can have two different topologies, one used for the placement of replicas and the other the K8s allowedTopologies (where the data can be accessed from).

  • If we have two different topologies for app & replica placement, it will serve both use cases:
    • In the disaggregated storage use case (where a set of nodes in the cluster is dedicated to storage and the other nodes to applications), storageclass.parameters.dataPlacementTopology can drive replica provisioning based on topology, and AllowedTopology can be used for volume accessibility (where the app can be scheduled).
    • In a normal cluster, where all the nodes are capable of hosting both storage & apps, replicas can be placed based on storageclass.parameters.dataPlacementTopology and the application can consume the volume from any node.

Since we have agreed on this, we will not be using allowed topologies (i.e. the data-accessibility topology) for data placement. The data-placement topology can be a labelled topology incorporated from the storage-class parameters. This also gives us the benefit of using pools on nodes newly added to the cluster, since we allow scaling up a volume after creation; that would otherwise be a problem if we used an explicit topology filled from the allowed topologies at creation time.

#275 -> PR to separate the accessibility from data placement.
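To illustrate the scale-up benefit using the structures from the volume JSON above (shown here as YAML; the Explicit pool variant and its field names are assumed for contrast): a labelled pool topology keeps matching pools created after the volume, whereas an explicit list frozen at create time would not.

# Labelled (proposed): any pool carrying these labels matches, including
# pools created later on newly added nodes, so replica scale-up can use them.
pool:
  Labelled:
    exclusion: {}
    inclusion:
      openebs.io/classification: premium
# Explicit (hypothetical alternative): the candidate set is fixed when the
# volume is created, so pools added afterwards are never considered.
# pool:
#   Explicit:
#     allowed_pools:
#     - diskpool-jr4nm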
