-
Notifications
You must be signed in to change notification settings - Fork 467
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
vfs: include size of write in DiskSlowInfo #2281
Conversation
35ff3ca
to
9787bb1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 5 files reviewed, 7 unresolved discussions (waiting on @bananabrick and @joshimhoff)
vfs/disk_health.go
line 27 at r1 (raw file):
preallocatedSlotCount = 16 // nOffsetBits is the number of bits in the packed 64-bit integer used for // identifying an offset from the file creation time (in nanoseconds).
In milliseconds now, no?
Also, NYC but "offset" is such a confusing word here, should be "elapsed" or "delta"
vfs/disk_health.go
line 31 at r1 (raw file):
// nWriteSizeBits is the number of bits in the packed 64-bit integer used for // identifying the size of the write operation, if the operation is sized. nWriteSizeBits = 20
[nit] Mention that the value is in writeSizePrecision
units. Or just hardcode it to 1KB
vfs/disk_health.go
line 35 at r1 (raw file):
// Track size of writes at kilobyte precision. See comment above lastWritePacked for more. var writeSizePrecision = 1000
Kilobyte is usually 1024 in our code
vfs/disk_health.go
line 97 at r1 (raw file):
File onSlowDisk func(OpType, int, time.Duration)
This needs documentation for the int argument. Or better yet, maybe we should have a SlowDiskOpInfo
struct where we can document them (and where we can add more stuff in the future with fewer code changing)
vfs/disk_health.go
line 168 at r1 (raw file):
} offsetMillis, writeSize, op := unpack(packed) lastWrite := d.createTime.Add(time.Duration(offsetMillis))
Need conversion to nanos here.. Or, have all that happen inside pack
/unpack
vfs/disk_health.go
line 173 at r1 (raw file):
// diskSlowThreshold was exceeded. Call the passed-in // listener. d.onSlowDisk(op, writeSize, now.Sub(lastWrite))
We definitely don't want the callback to care about writeSizePrecision
.. we need to pass bytes here.
vfs/disk_health.go
line 250 at r1 (raw file):
// See writeSizePrecision to get the unit of writeSize. As of 1/26/2023, the unit is KBs. func unpack(packed uint64) (offsetMillis int64, writeSize int, opType OpType) {
This should do the inverse of pack
: return the write size in bytes.
Eeek. That is a bad regression. No tests fail either. I'll adjust the tests. Thanks for noticing this, @RaduBerinde. |
2784f96
to
14d0ac7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for my delay on responding. Got too excited about disaggregated storage experimentation. I have addressed your comments. PTAL, @RaduBerinde!
Reviewable status: 0 of 5 files reviewed, 5 unresolved discussions (waiting on @bananabrick and @RaduBerinde)
vfs/disk_health.go
line 27 at r1 (raw file):
Previously, RaduBerinde wrote…
In milliseconds now, no?
Also, NYC but "offset" is such a confusing word here, should be "elapsed" or "delta"
Updated comment & changed to delta
.
vfs/disk_health.go
line 31 at r1 (raw file):
Previously, RaduBerinde wrote…
[nit] Mention that the value is in
writeSizePrecision
units. Or just hardcode it to 1KB
Updated comment.
vfs/disk_health.go
line 35 at r1 (raw file):
Previously, RaduBerinde wrote…
Kilobyte is usually 1024 in our code
Changed to 1024.
This is a basic thing, but I don't understand it really. Can you explain in what sense 1024 bytes are in a kilobyte? Is it just a convention that is often followed, since 1024 is a power of 2, unlike 1000?
vfs/disk_health.go
line 97 at r1 (raw file):
Previously, RaduBerinde wrote…
This needs documentation for the int argument. Or better yet, maybe we should have a
SlowDiskOpInfo
struct where we can document them (and where we can add more stuff in the future with fewer code changing)
Added a DiskSlowInfo
struct to vfs
.
Note that I left the existing DiskSlowInfo
struct in pebble
alone. The caller of vfs.WithDiskHealthChecks
in pebble
converts between the two structs. Alternatively, since the structs are copies of each other, we could include vfs.DiskSlowInfo
within pebble.DiskSlowInfo
. That might couple pebble
& vfs
a bit unnecessarily tightly. Let me know if you have any thoughts.
vfs/disk_health.go
line 168 at r1 (raw file):
Previously, RaduBerinde wrote…
Need conversion to nanos here.. Or, have all that happen inside
pack
/unpack
Fixed by putting inside pack
/ unpack
& using the time.Duration
type to get more safety. Also, added a test that failed without the fix. Previously, there was no coverage asserting lack of false positives, & there was also no coverage of creating a file then waiting some time then writing to it.
vfs/disk_health.go
line 173 at r1 (raw file):
Previously, RaduBerinde wrote…
We definitely don't want the callback to care about
writeSizePrecision
.. we need to pass bytes here.
Done. Good pooint.
vfs/disk_health.go
line 250 at r1 (raw file):
Previously, RaduBerinde wrote…
This should do the inverse of
pack
: return the write size in bytes.
Done. Good point.
Friendly poke, @RaduBerinde. If you feel tho that better to only review this post branch cut since o11y, that works for me. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, sorry, I missed the previous update.
Reviewable status: 0 of 5 files reviewed, 14 unresolved discussions (waiting on @bananabrick and @joshimhoff)
event.go
line 131 at r2 (raw file):
// than the max that can fit in an int. If not set / if zero, there // is no ceiling. WriteSizeCeiling int
[nit] I think including this is overkill.. We should never have 1GB IOs anywhere. If we do, showing them as 1GB is the least of our problems.
event.go
line 142 at r2 (raw file):
// SafeFormat implements redact.SafeFormatter. func (i DiskSlowInfo) SafeFormat(w redact.SafePrinter, _ rune) { w.Printf("disk slowness detected: %s on file %s (%d bytes (ceiling %d bytes)) has been ongoing for %0.1fs",
I would hide the ceiling when its inconsequential (i.e. WriteSize < WriteSizeCeiling
) and maybe just add a +
when they are equal (e.g. "1024 bytes" and "1073741824+ bytes")
options.go
line 1038 at r2 (raw file):
o.FS, o.private.fsCloser = vfs.WithDiskHealthChecks(o.FS, 5*time.Second, func(info vfs.DiskSlowInfo) { o.EventListener.DiskSlow(DiskSlowInfo{
Once we alias the types, we can just pass info
here.
vfs/disk_health.go
line 27 at r1 (raw file):
Previously, joshimhoff (Josh Imhoff) wrote…
Updated comment & changed to
delta
.
Need to mention the units (it's milliseconds now right?)
vfs/disk_health.go
line 31 at r1 (raw file):
Previously, joshimhoff (Josh Imhoff) wrote…
Updated comment.
Ceiling
vfs/disk_health.go
line 35 at r1 (raw file):
Previously, joshimhoff (Josh Imhoff) wrote…
Changed to 1024.
This is a basic thing, but I don't understand it really. Can you explain in what sense 1024 bytes are in a kilobyte? Is it just a convention that is often followed, since 1024 is a power of 2, unlike 1000?
Yes.. For example a memory page is 4KB = 4096. Pretty much anything that has to do with memory size is in powers of 2. A cache line is 64 bytes, so a 1000-byte block of memory would "straddle" a cacheline; the "round" value is 1024 not 1000. Filesystem and storage device block size is also usually 4KB = 4096. Most things in computers/programming follow this convention (one notable exception is when you buy a storage device, though IMO that's just a cheap marketing ploy to make it sound bigger than it is - their internal pages/sectors are also powers of 2 like 512 or 4096).
The underlying reason is pretty clear - since we deal in bits and not in base 10 numbers, you want the "round" numbers to be those that correspond to the full range of a certain number of bits.
There have been some attempts to move to "kibi / mebi /gibi" but it didn't really catch on (maybe because they sound ridiculous).
vfs/disk_health.go
line 97 at r1 (raw file):
Previously, joshimhoff (Josh Imhoff) wrote…
Added a
DiskSlowInfo
struct tovfs
.Note that I left the existing
DiskSlowInfo
struct inpebble
alone. The caller ofvfs.WithDiskHealthChecks
inpebble
converts between the two structs. Alternatively, since the structs are copies of each other, we could includevfs.DiskSlowInfo
withinpebble.DiskSlowInfo
. That might couplepebble
&vfs
a bit unnecessarily tightly. Let me know if you have any thoughts.
We should just alias them (do type DiskSlowInfo = vfs.DiskSlowInfo
in pebble). We do this with many types.
vfs/disk_health.go
line 33 at r2 (raw file):
nWriteSizeBits = 20 // Maximum write size that is representable. See nWriteSizeBits for more. writeSizeCeling = 1<<nWriteSizeBits - 1
It's confusing that this is in writeSizePrecision
units, we should just multiply it by that
vfs/disk_health.go
line 42 at r2 (raw file):
defaultTickInterval = 2 * time.Second // Track size of writes at kilobyte precision. See comment above lastWritePacked for more. writeSizePrecision = int64(1024)
[nit] this belongs right next to nWriteSizeBits
. Also either use the nXxx
convention in all of them or in none of them
vfs/disk_health.go
line 228 at r2 (raw file):
} // Prefetch implements (vfs.File).Preallocate.
[nit] Prefetch
vfs/disk_health.go
line 312 at r2 (raw file):
deltaMillis = 0 } if deltaMillis > 1<<nDeltaBits-1 {
[nit] maybe add a comment here about what duration would trigger this (with current values)
vfs/disk_health.go
line 629 at r2 (raw file):
OpType: opType, WriteSize: writeSizeInBytes, WriteSizeCeiling: writeSizeCeling,
The right hand side is in KB, the left-hand is in bytes
vfs/disk_health_test.go
line 355 at r2 (raw file):
opType: OpTypeWrite, wantDelta: 231 * time.Millisecond, wantWriteSize: 1073740800, // (2^20-1) * writeSizePrecision ~= 1.07 GBs
[nit] this is a bit under 1GB
14d0ac7
to
845db3b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No sorry!
PTAL.
Reviewable status: 0 of 5 files reviewed, 9 unresolved discussions (waiting on @bananabrick and @RaduBerinde)
event.go
line 131 at r2 (raw file):
Previously, RaduBerinde wrote…
[nit] I think including this is overkill.. We should never have 1GB IOs anywhere. If we do, showing them as 1GB is the least of our problems.
I shall remove the ceiling from here.
event.go
line 142 at r2 (raw file):
Previously, RaduBerinde wrote…
I would hide the ceiling when its inconsequential (i.e.
WriteSize < WriteSizeCeiling
) and maybe just add a+
when they are equal (e.g. "1024 bytes" and "1073741824+ bytes")
I've removed the ceiling totally, as per your other comment.
options.go
line 1038 at r2 (raw file):
Previously, RaduBerinde wrote…
Once we alias the types, we can just pass
info
here.
Done.
vfs/disk_health.go
line 27 at r1 (raw file):
Previously, RaduBerinde wrote…
Need to mention the units (it's milliseconds now right?)
Done.
vfs/disk_health.go
line 31 at r1 (raw file):
Previously, RaduBerinde wrote…
Ceiling
Thanks, fixed.
vfs/disk_health.go
line 35 at r1 (raw file):
Previously, RaduBerinde wrote…
Yes.. For example a memory page is 4KB = 4096. Pretty much anything that has to do with memory size is in powers of 2. A cache line is 64 bytes, so a 1000-byte block of memory would "straddle" a cacheline; the "round" value is 1024 not 1000. Filesystem and storage device block size is also usually 4KB = 4096. Most things in computers/programming follow this convention (one notable exception is when you buy a storage device, though IMO that's just a cheap marketing ploy to make it sound bigger than it is - their internal pages/sectors are also powers of 2 like 512 or 4096).
The underlying reason is pretty clear - since we deal in bits and not in base 10 numbers, you want the "round" numbers to be those that correspond to the full range of a certain number of bits.
There have been some attempts to move to "kibi / mebi /gibi" but it didn't really catch on (maybe because they sound ridiculous).
Thanks for explaining!
vfs/disk_health.go
line 97 at r1 (raw file):
Previously, RaduBerinde wrote…
We should just alias them (do
type DiskSlowInfo = vfs.DiskSlowInfo
in pebble). We do this with many types.
Done.
vfs/disk_health.go
line 33 at r2 (raw file):
Previously, RaduBerinde wrote…
It's confusing that this is in
writeSizePrecision
units, we should just multiply it by that
Now that I ripped out passing ceiling to pebble
, there is no need for a constant up here. I have defined it right next to the only place it is used now, like I had it initially. Within the execution context of its usage, I think it makes sense that it would be in writeSizePrecision
units, so I suggest we leave it as is. I actually think doing that extra math would make it harder to grok.
vfs/disk_health.go
line 42 at r2 (raw file):
Previously, RaduBerinde wrote…
[nit] this belongs right next to
nWriteSizeBits
. Also either use thenXxx
convention in all of them or in none of them
Moved it up a bit. It is a var so can't put it in same stanza. I have stopped using the nXxx
convention.
vfs/disk_health.go
line 228 at r2 (raw file):
Previously, RaduBerinde wrote…
[nit] Prefetch
Fixed.
vfs/disk_health.go
line 312 at r2 (raw file):
Previously, RaduBerinde wrote…
[nit] maybe add a comment here about what duration would trigger this (with current values)
Done!
vfs/disk_health.go
line 629 at r2 (raw file):
Previously, RaduBerinde wrote…
The right hand side is in KB, the left-hand is in bytes
Ugh, this is the same sort of bug as the earlier one. Need to be more careful with units. I've removed the ceiling totally, as per your other comment.
vfs/disk_health_test.go
line 355 at r2 (raw file):
Previously, RaduBerinde wrote…
[nit] this is a bit under 1GB
Fixed.
845db3b
to
177cf81
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
modulo one minor comment about making the precision a constant
Reviewable status: 0 of 5 files reviewed, 2 unresolved discussions (waiting on @bananabrick and @joshimhoff)
vfs/disk_health.go
line 33 at r2 (raw file):
Previously, joshimhoff (Josh Imhoff) wrote…
Now that I ripped out passing ceiling to
pebble
, there is no need for a constant up here. I have defined it right next to the only place it is used now, like I had it initially. Within the execution context of its usage, I think it makes sense that it would be inwriteSizePrecision
units, so I suggest we leave it as is. I actually think doing that extra math would make it harder to grok.
Great!
vfs/disk_health.go
line 42 at r2 (raw file):
Previously, joshimhoff (Josh Imhoff) wrote…
Moved it up a bit. It is a var so can't put it in same stanza. I have stopped using the
nXxx
convention.
Oh.. why is it a variable? A constant yields better code (division becomes a shift). We should make it a constant and just have multiples of it in the test (or have the test verify the value is rounded to the nearest KB
This commit adds the size of a write to DiskSlowInfo, in cases where a write is sized. A small write stalling out points at file system / disk issues, while a large write taking time to complete may indicate CRDB issues with a certain workload, etc.
177cf81
to
ed34241
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TTFR!
Reviewable status: 0 of 5 files reviewed, 2 unresolved discussions (waiting on @bananabrick and @RaduBerinde)
vfs/disk_health.go
line 42 at r2 (raw file):
Previously, RaduBerinde wrote…
Oh.. why is it a variable? A constant yields better code (division becomes a shift). We should make it a constant and just have multiples of it in the test (or have the test verify the value is rounded to the nearest KB
Changed to constant!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 5 files reviewed, 2 unresolved discussions (waiting on @bananabrick and @joshimhoff)
Closes cockroachdb/cockroach#67856. Previous work at cockroachdb/cockroach#95436 & #2255.
vfs: include size of write in DiskSlowInfo
This commit adds the size of a write to DiskSlowInfo, in cases where a write is sized. A small write stalling out points at file system / disk issues, while a large write taking time to complete may indicate CRDB issues with a certain workload, etc.