rbd: do not execute rbd sparsify when volume is in use #3985
Conversation
Should this also be backported to 3.8?
LGTM
@Mergifyio queue
✅ The pull request has been merged automatically at 98fdadf
This commit makes sure sparsify() is not run when the rbd image is in use. Running rbd sparsify while a workload is doing I/O, or running it too frequently, is not desirable. When an image is in use, fstrim is run instead; sparsify is run only when the image is not mapped. Signed-off-by: Rakshith R <[email protected]>
/test ci/centos/k8s-e2e-external-storage/1.25
/test ci/centos/k8s-e2e-external-storage/1.26
/test ci/centos/k8s-e2e-external-storage/1.27
/test ci/centos/mini-e2e-helm/k8s-1.25
/test ci/centos/mini-e2e-helm/k8s-1.26
/test ci/centos/mini-e2e-helm/k8s-1.27
/test ci/centos/mini-e2e/k8s-1.25
/test ci/centos/mini-e2e/k8s-1.26
/test ci/centos/mini-e2e/k8s-1.27
/test ci/centos/upgrade-tests-cephfs
/test ci/centos/upgrade-tests-rbd
@@ -23,7 +23,18 @@ import (
// Sparsify checks the size of the objects in the RBD image and calls
// rbd_sparify() to free zero-filled blocks and reduce the storage consumption
// of the image.
// This function will return ErrImageInUse if the image is in use, since
// sparsifying an image on which i/o is in progress is not optimal.
I don't want to sound like a broken record, but I have to ask again: do you have any data that would support automating `rbd sparsify` runs at all? In particular:
- Have you done any research into how likely they are to be encountered?
- Do you have `ceph df` outputs captured before and after `rbd sparsify` is run for typical customer scenarios?

If the answer is no, I don't understand why `rbd sparsify` is being conditioned on `isInUse()` instead of just being dropped entirely (to be resurrected as an optional step when the relevant CSI addon spec allows taking options from the user). Sparsifying the image the way Ceph CSI does it is likely not optimal in general, not just when there is competing I/O.
On one heavily used cluster, I suspect I've seen 33% to 100% extra usage without sparsify. I hadn't had the chance to run it offline and haven't had the guts to run it online. This makes it sound like it was good for me to be hesitant.
Yeah -- I would suggest starting with `fstrim` (assuming you aren't using raw block PVs).
Describe what this PR does
This commit makes sure sparsify() is not run when the rbd image is in use.
Running rbd sparsify while a workload is doing I/O, or running it too frequently, is not desirable.
When an image is in use, fstrim is run instead; sparsify is run only when the image is not mapped.
Frequent rbd sparsify calls have been observed to cause mounted pods to get stuck.
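The fallback flow described above (sparsify when the image is idle, fstrim when it is mapped) can be sketched as follows. This is an illustrative sketch, not ceph-csi's actual API: `reclaimSpace`, both callbacks, and the `ErrImageInUse` sentinel are hypothetical names:

```go
package main

import (
	"errors"
	"fmt"
)

// ErrImageInUse is the assumed sentinel returned by the sparsify step when
// the rbd image is mapped somewhere.
var ErrImageInUse = errors.New("rbd image is in use")

// reclaimSpace tries sparsify first; if the image turns out to be in use,
// it falls back to an fstrim-style trim of the mounted filesystem, which
// is safe while I/O is in progress. It returns which operation ran.
func reclaimSpace(sparsify, fstrim func() error) (string, error) {
	err := sparsify()
	switch {
	case err == nil:
		return "sparsify", nil
	case errors.Is(err, ErrImageInUse):
		// Image is mapped: trim the filesystem instead of sparsifying.
		if terr := fstrim(); terr != nil {
			return "", fmt.Errorf("fstrim fallback failed: %w", terr)
		}
		return "fstrim", nil
	default:
		return "", err
	}
}
```

Keeping the two operations behind one entry point means the reclaim schedule stays the same; only the mechanism changes based on whether the image is mapped.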
Logs when PVC is not mounted to a Pod:
Logs when PVC is mounted to a Pod:
cc @idryomov
Show available bot commands
These commands are normally not required, but in case of issues, leave any of
the following bot commands in an otherwise empty comment in this PR:
`/retest ci/centos/<job-name>`: retest the `<job-name>` after unrelated failure (please report the failure too!)