feat(storage tiers): add design proposal #268
base: develop
Conversation
> REST endpoint so it can be returned in the `NodeGetInfo` response. This would
> require adding communication from the node to the API, which is not ideal.
>
> Instead, a separate controller is proposed. It will run alongside the diskpool
At the moment we use csi-node topology to control where the diskpools are placed, which as you've pointed out is not quite right (once we remove the locality constraint the csi-node will be able to run on non-"openebs/engine: mayastor" nodes).
What is the actual intent from CSI for the accessible topology?
// Specifies the list of topologies the provisioned volume MUST be
// accessible from
Since our volumes are accessed via nvme-tcp, maybe this is not intended to control the placement of data replicas?
And in that case, should we separate the application topology and data topology completely?
If so, we could, for example, pass topology information through the storage class's generic parameters, and that way we wouldn't need to sync between diskpools and CSINode?
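For context, a minimal sketch of the K8s-side construct being discussed. Whether the csi-node actually reports `openebs/engine` as a topology key is an assumption here; the shape of `allowedTopologies` is the point:

```yaml
# Sketch only: today the accessibility topology (allowedTopologies) effectively
# doubles as the replica-placement constraint via the csi-node topology labels.
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: mayastor-example
provisioner: io.openebs.csi-mayastor
allowedTopologies:
  - matchLabelExpressions:
      - key: openebs/engine
        values:
          - mayastor
```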
This is a good point to consider. We can have two different topologies: one used for placement of replicas, and the other being the K8s `allowedTopologies` (where data can be accessible from).
If we have two different topologies for app and replica placement, it serves both use cases:
- In the disaggregated storage use case (where a set of nodes in the cluster is dedicated to storage and the other nodes to applications), `storageclass.parameters.dataPlacementTopology` can provision replicas based on topology, and `AllowedTopology` can be used for accessibility of the volume (where the app can schedule).
- In a normal cluster where all nodes are capable of hosting both storage and apps, one can place replicas based on `storageclass.parameters.dataPlacementTopology`, and the application can consume the volume from any node.
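A minimal sketch of the disaggregated case under this split; the `dataPlacementTopology` parameter and the `openebs.io/storage-node` label are illustrative names, not an agreed API:

```yaml
# Sketch only: hypothetical split between data placement and accessibility.
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: mayastor-disaggregated
provisioner: io.openebs.csi-mayastor
parameters:
  repl: '2'
  protocol: 'nvmf'
  # Hypothetical parameter: constrains where replicas (data) are placed.
  dataPlacementTopology: 'openebs.io/storage-node=true'
# No allowedTopologies: the volume stays accessible from any node over nvme-tcp,
# so application pods are free to schedule anywhere.
volumeBindingMode: WaitForFirstConsumer
```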
Good points:
- once the locality requirement is removed, the volume will be available on any node
- separate topologies for apps & replica placement
So in effect, we shouldn't re-use the CSI placement topology. As long as the replica topology is passed on the `CreateVolume` call, it can still be stored on the Volume's internal topology and placement works as intended.
This should just mean converting the StorageClass's `allowedTopologies` to KV pairs in the SC parameters. That should be ok since we don't need any complicated logic.
I'll try this out and update the doc.
I tried it out, and I agree this is better:
DiskPool:
```yaml
apiVersion: openebs.io/v1alpha1
kind: DiskPool
metadata:
  labels:
    openebs.io/classification: premium
  name: diskpool-jr4nm
spec:
  disks:
    - /dev/sdc
  node: aks-storagepool-25014071-vmss000000
```
StorageClass:
```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: premium
parameters:
  repl: '1'
  protocol: 'nvmf'
  ioTimeout: '60'
  classification: premium
provisioner: io.openebs.csi-mayastor
volumeBindingMode: WaitForFirstConsumer
```
A small code change in the csi controller's `create_volume` is needed to pass the parameter into the volume's `labelledTopology`:
```json
{
  "uuid": "7bb6dbc2-c170-46cd-b627-c0a04d613a24",
  "size": 1073741824,
  "labels": null,
  "num_replicas": 1,
  "status": {
    "Created": "Online"
  },
  "target": {
    "node": "aks-storagepool-25014071-vmss000000",
    "nexus": "7c14d3ff-459e-4aea-967a-c3b4dd47ea12",
    "protocol": "nvmf"
  },
  "policy": {
    "self_heal": true
  },
  "topology": {
    "node": {
      "Explicit": {
        "allowed_nodes": [
          "aks-nodepool1-14021096-vmss000000",
          "aks-nodepool1-14021096-vmss000001",
          "aks-nodepool1-14021096-vmss000002",
          "aks-storagepool-25014071-vmss000000",
          "aks-storagepool-25014071-vmss000001",
          "aks-storagepool-25014071-vmss000002"
        ],
        "preferred_nodes": [
          "aks-storagepool-25014071-vmss000000",
          "aks-storagepool-25014071-vmss000001",
          "aks-storagepool-25014071-vmss000002",
          "aks-nodepool1-14021096-vmss000000",
          "aks-nodepool1-14021096-vmss000001",
          "aks-nodepool1-14021096-vmss000002"
        ]
      }
    },
    "pool": {
      "Labelled": {
        "exclusion": {},
        "inclusion": {
          "openebs.io/classification": "premium",
          "openebs.io/created-by": "operator-diskpool"
        }
      }
    }
  },
  "last_nexus_id": "7c14d3ff-459e-4aea-967a-c3b4dd47ea12",
  "operation": null
}
```
Is this more what you were thinking?
That's great.
Do you see `classification` as a reserved keyword? I think it'd be good to keep it generic so you can use whatever labels you want?
I was thinking of it as a reserved keyword, but it doesn't need to be. The issue is knowing which SC parameter to copy from the `CreateVolume` request to the volume.
How about having a default of `openebs.io/classification`, but with an optional SC param `openebs.io/classification_key` that can override it? I think it's fine to have this configurable on the SC. It could also be a flag on the CSI controller, but that seems much less flexible.
And since naming is hard... I don't know whether classification is the right term, but I think it would be better to use the label prefix (`openebs.io/`) everywhere to avoid confusion. We definitely want to use the prefix for the labels.
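To make this concrete, a rough sketch of how the override could look on a StorageClass; `openebs.io/classification_key` is the proposed (not yet implemented) parameter and `example.com/tier` is just a made-up custom label key:

```yaml
# Sketch only: overriding the default classification key on a StorageClass.
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: fast-tier
provisioner: io.openebs.csi-mayastor
parameters:
  repl: '1'
  protocol: 'nvmf'
  # Proposed (hypothetical) override: tells the CSI controller which parameter
  # to copy into the volume's labelled pool topology instead of the default
  # openebs.io/classification.
  openebs.io/classification_key: example.com/tier
  example.com/tier: fast
volumeBindingMode: WaitForFirstConsumer
```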
Since we have agreed on keeping separate topologies, we will not be using the allowed topologies (i.e. the data-accessibility topology) for data placement. The data-placement topology can be a labelled topology built from the storage-class parameters. This also gives us the benefit of using pools on nodes newly added to the cluster, since we allow scaling up a volume after creation; that would otherwise be a problem if we used an explicit topology filled from the allowed topology at creation time.
#275 -> PR to separate the accessibility from data placement.
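For illustration, a sketch of a pool on a newly added node (the node name is hypothetical). Because the volume's pool topology is label-based rather than an explicit node list, carrying the same classification label is enough for the new pool to become a placement candidate:

```yaml
# Sketch only: a pool created later on a freshly added node. The matching
# classification label is all the labelled pool topology needs to consider it.
apiVersion: openebs.io/v1alpha1
kind: DiskPool
metadata:
  labels:
    openebs.io/classification: premium
  name: diskpool-new
spec:
  disks:
    - /dev/sdc
  node: aks-storagepool-25014071-vmss000003
```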
We'd like the ability to use multiple classes of backend storage and to allow users to select which type they want.
Submitting this as a proposal for review prior to commencing implementation. Other ideas or suggestions welcome.
It was great to see that most of the work is already done 💯