Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move topologyConstrainedPools to Beta status #2823

Closed
Tracked by #2997
sbskas opened this issue Jan 25, 2022 · 13 comments
Closed
Tracked by #2997

Move topologyConstrainedPools to Beta status #2823

sbskas opened this issue Jan 25, 2022 · 13 comments
Assignees
Labels
keepalive This label can be used to disable stale bot activiity in the repo

Comments

@sbskas
Copy link
Contributor

sbskas commented Jan 25, 2022

Describe the feature you'd like to have

At this time, topologyContrainedPools has been in alpha stage for 2 years with very low noise.
The feature is not documented but in the helm chart and its status might scares away lots of potential users.

What is the value to the end user? (why is it a priority?)

We're implementing a multi-zone cluster on on-premise bare metal. This cluster will be home of a few large elasticsearch clusters.
We have 21 nodes in the cluster :
1- 3 masters on vms
2- 6 workers for applications
3- 12 workers for storage

All those nodes are spread on 3 datacenters connected with evpn fabrics.
This feature will allow us to build statefulset and scale them with storage and pods evenly spread on the different datacenters using the topologySpreadConstraints.

We're doing rook deployment for our ceph cluster.
We created 3 cephblockpools, one for each zone by changing the crushRoot to the zone root.
Then we create a storageclass with corresponding topologyContrainedPools with matching zones.
And finally we added the corresponding topologySpreadContrain to our statefulSet.

How will we know we have a good solution? (acceptance criteria)

Right now, the feature is ok but has to be enabled on the nodeplugin daemonset for it to work.
Moving to beta then GA will make this activation optional and will have proper documentation for it.
After discussion on Slack, it seems that this feature is to be completed but I failed to fully understand what's missing.

Additional context

One this feature in beta stage, the rook operator needs to have a way to enable it on its ressources.

@sbskas
Copy link
Contributor Author

sbskas commented Jan 26, 2022

csi-provisionner doesn't seems to need '--feature-gate=Topology=True' anymore.
According to the log :
csi-provisioner W0126 10:05:23.447685 1 feature_gate.go:237] Setting GA feature gate Topology=true. It will be removed in a future release.

@Madhu-1 Madhu-1 added this to the release-3.6 milestone Jan 27, 2022
@humblec
Copy link
Collaborator

humblec commented Feb 2, 2022

Lets try to move this to Beta or supported state , we were tracking this effort under this #2199 and most of it has been addressed, this is one of the last pending items... @sbskas

@humblec
Copy link
Collaborator

humblec commented Feb 8, 2022

@sbskas would you be able to help us to move this feature to BETA in next/upcoming release ? What that basically means is that, more testing , adding tests if its missing and also updating the documentation ? please let us know .. Really appreciate if we can lift the support to BETA in next release. we welcome contributions.

@humblec
Copy link
Collaborator

humblec commented Feb 16, 2022

as discussed in triage call, we could target this for 3.6 if we have a volunteer, otherwise we will move this out of the 3.6 rleease.

@sbskas
Copy link
Contributor Author

sbskas commented Feb 17, 2022

@humblec I volonteer.

@github-actions
Copy link

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the wontfix This will not be worked on label Mar 19, 2022
@github-actions
Copy link

This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.

@Rakshith-R Rakshith-R added keepalive This label can be used to disable stale bot activiity in the repo and removed wontfix This will not be worked on labels Mar 31, 2022
@Rakshith-R Rakshith-R reopened this Mar 31, 2022
@humblec
Copy link
Collaborator

humblec commented Apr 1, 2022

@sbskas as mentioned in the release issue, we are planning to release 3.6 early next week..

As per your validation of this feature , is it good to qualify as Beta ? or we need more time to declare it as "beta" . please share your thought, accordingly we can reach consensus on support state

@sbskas
Copy link
Contributor Author

sbskas commented Apr 4, 2022

@humblec we tested all the case we met on test platform.
We fixed the obvious problem encountered (#2828 and #2925).
According to me, the feature is ready to go to beta.
I would have like something like #2962 for easier integration with rook however, I'll live with it.

@humblec
Copy link
Collaborator

humblec commented Apr 4, 2022

Thanks @sbskas for all the effort and conclusion on where we stand.

I am fine to call this Beta, however let me get atleast an ack from @ceph/ceph-csi-maintainers

@humblec
Copy link
Collaborator

humblec commented Apr 5, 2022

@Madhu-1 @nixpanic can you share your thought about moving this feature to Beta in this release (3.6) ?

humblec added a commit to humblec/ceph-csi that referenced this issue Apr 5, 2022
In this release we are promoting topology aware provisioning feature
to Beta. It has been in alpha state for last 5 releases and more testing
has been performed on this feature which is recorded at:

ceph#2823

Signed-off-by: Humble Chirammal <[email protected]>
@Madhu-1
Copy link
Collaborator

Madhu-1 commented Apr 5, 2022

@Madhu-1 @nixpanic can you share your thought about moving this feature to Beta in this release (3.6) ?

IMO As we are at end of the release, Let's not hurry we can wait for #2962. based on that we can move to beta? if we are still in alpha we have a good chance to do some breaking changes if really needed.

Even if the feature is alpha we can work on the integration part.

@humblec humblec removed this from the release-3.6 milestone Apr 5, 2022
@humblec humblec mentioned this issue Apr 21, 2022
4 tasks
@nixpanic
Copy link
Member

All of this looks to have made is in release-3.7 quite a while ago. Please open a new issue if something isn't working as planned/expected.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
keepalive This label can be used to disable stale bot activiity in the repo
Projects
None yet
Development

No branches or pull requests

5 participants