transposefs: Only autosave-xfs for much larger filesystems #2565
Draft since this could use some more testing.
Downstream: https://issues.redhat.com/browse/OCPBUGS-16724
Here's a script I wrote to analyze xfs AG counts as the system is grown:
Thanks for working on this. A few comments
overlay.d/05core/usr/lib/dracut/modules.d/40ignition-ostree/ignition-ostree-transposefs.sh
I added the whitespace fixes over in #2566 so you can ignore that CI failure here.
LGTM
The change in coreos#2320 has been very problematic for OpenShift because our default node configuration is *always* over the threshold, and that causes significant latency on instance provisioning.

Experimentally bump the threshold to 400 allocation groups, which corresponds to a filesystem of about 700GiB. This is comfortably above the default OpenShift node root disk sizes, and returns us to the prior status quo.

While we're here, rework the logging a bit so that we *always* log the `agcount` for debugging purposes.

Also:
- Only log to stdout for normal conditions
- Include the name of the systemd unit in the test description so we can cross-reference
- tests: Hoist the expected agcount of 4 to a common variable
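For anyone wanting to sanity-check the "400 allocation groups is about 700GiB" figure, here is a rough sketch of the arithmetic. It is not the analysis script mentioned earlier in this PR. `estimate_agcount` is a hypothetical helper, and it assumes that growing an XFS filesystem keeps the AG size fixed and only appends AGs. The 7168 MiB starting size is inferred from the numbers above (4 AGs at 700GiB / 400 each), not measured.

```shell
#!/bin/bash
# Illustrative sketch only; estimate_agcount is a hypothetical helper,
# not part of ignition-ostree-transposefs.sh.
#
# Assumption: growing XFS keeps agsize fixed and appends allocation
# groups, so agcount after growth is roughly ceil(new_size / agsize).
# Edge cases around a very small trailing AG are ignored.
estimate_agcount() {
    local orig_size_mib orig_agcount new_size_mib agsize_mib
    orig_size_mib=$1
    orig_agcount=$2
    new_size_mib=$3
    agsize_mib=$(( orig_size_mib / orig_agcount ))
    # Integer ceiling division.
    echo $(( (new_size_mib + agsize_mib - 1) / agsize_mib ))
}

# Assumed starting point: 4 AGs of 1792 MiB each (7168 MiB total),
# which is what "400 AGs is about 700GiB" implies.
for size_gib in 120 700 1024; do
    echo "${size_gib}GiB -> agcount=$(estimate_agcount 7168 4 $(( size_gib * 1024 )))"
done
```

Under these assumptions a 700GiB root filesystem lands exactly on the new threshold of 400, while a 120GiB disk stays well below it.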
Blah, missed a case of output redirection. Fixed now and cleaned up.
    local threshold
    threshold=400
    if [ "$agcount" -lt "${threshold}" ]; then
        echo "autosave-xfs: ${root_part} agcount=$agcount is lower than threshold=${threshold}" >&2
        echo 0
        return
totally optional: this `return` can be dropped now.
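To illustrate the suggestion, here is a minimal sketch of how the check could read without the early `return`, using an `if`/`else` instead. The function names `parse_agcount` and `should_autosave_xfs`, the `echo 1` branch, and the wording of the second log message are assumptions for illustration and may not match the actual script. Only the threshold of 400 and the "lower than threshold" message come from the hunk above.

```shell
#!/bin/bash
# Hypothetical sketch; function names and the else branch are
# illustrative assumptions, not the code in this PR.

# Extract the agcount value from xfs_info-style text on stdin.
parse_agcount() {
    sed -n 's/.*agcount=\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Print 0 on stdout if the filesystem is under the threshold, 1 otherwise.
# Log lines go to stderr so that callers capturing stdout only see the
# result value.
should_autosave_xfs() {
    local root_part agcount threshold
    root_part=$1
    agcount=$2
    threshold=400
    if [ "$agcount" -lt "${threshold}" ]; then
        echo "autosave-xfs: ${root_part} agcount=$agcount is lower than threshold=${threshold}" >&2
        echo 0
    else
        echo "autosave-xfs: ${root_part} agcount=$agcount meets threshold=${threshold}" >&2
        echo 1
    fi
}
```

A caller could then feed it the parsed value, for example `should_autosave_xfs "$root_part" "$(xfs_info "$root_part" | parse_agcount)"`, assuming `xfs_info` reports `agcount=N` on its first line as it normally does.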
LGTM