-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
hlc: use PTP user space API as HLC clock #46942
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only commenting on the CLI/build aspects of this. I'll let andrei or the other team members comment on the HLC changes.
pkg/base/config.go
Outdated
@@ -213,6 +213,9 @@ type Config struct { | |||
// connections to determine connection health and update the local view | |||
// of remote clocks. | |||
RPCHeartbeatInterval time.Duration | |||
|
|||
// Enables the use of an IEEE 1588 PTP clock device for HLC current time | |||
ClockDevice string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be reset in InitDefaults()
below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
pkg/cli/flags.go
Outdated
@@ -51,6 +51,7 @@ var serverSQLAddr, serverSQLPort string | |||
var serverSQLAdvertiseAddr, serverSQLAdvertisePort string | |||
var serverHTTPAddr, serverHTTPPort string | |||
var localityAdvertiseHosts localityList | |||
var clockDevice string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be reset to its default value in context.go
. However see my next comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
pkg/cli/flags.go
Outdated
@@ -358,6 +359,7 @@ func init() { | |||
VarFlag(f, &serverCfg.Stores, cliflags.Store) | |||
VarFlag(f, &serverCfg.StorageEngine, cliflags.StorageEngine) | |||
VarFlag(f, &serverCfg.MaxOffset, cliflags.MaxOffset) | |||
StringFlag(f, &clockDevice, cliflags.ClockDevice, "") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was there a reason not to do StringFlag(f, &serverCfg.ClockDevice, ...)
and avoid the separate global var?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed it as you are suggesting.
@@ -39,7 +39,7 @@ case "${1-}" in | |||
XGOARCH=amd64 | |||
XCMAKE_SYSTEM_NAME=Linux | |||
TARGET_TRIPLE=x86_64-unknown-linux-gnu | |||
LDFLAGS="-static-libgcc -static-libstdc++" | |||
LDFLAGS="-static-libgcc -static-libstdc++ -lrt" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does your patch work for the other targets supported below? What does the build/builder.sh mkrelease
command says for the various targets?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does work for darwin
, windows
and linux-musl
too.
linux-gnueabi
and linux-msan
don't work for me for the unmodified master branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: are you sure you want to mention IEEE 1588 in the commit title and release notes? I think it's misleading. We're technically using the Linux "PTP hardware clock user space API" (as per here https://www.kernel.org/doc/Documentation/ptp/ptp.txt), but the time source doesn't need to have anything to do with the Precision Time Protocol. I'd just make reference to this API, and otherwise downplay the PTP part.
nit: please make the release note more of a full sentence, and mention the new flag. See this for some guidelines: https://wiki.crdb.io/wiki/spaces/CRDB/pages/186548364/Release+notes
Also, the Release note()
wants to have one of a predefined categories. The "low risk new functionality" part belongs in the Release justification
stanza.
pls see comment about the "global state"
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @darinpp)
build/builder/mkrelease.sh, line 42 at r1 (raw file):
XCMAKE_SYSTEM_NAME=Linux TARGET_TRIPLE=x86_64-unknown-linux-gnu LDFLAGS="-static-libgcc -static-libstdc++ -lrt"
pls add a comment somewhere about why linking this is necessary? Do we really not get all of time.h
without it?
pkg/base/config.go, line 217 at r1 (raw file):
RPCHeartbeatInterval time.Duration // Enables the use of an IEEE 1588 PTP clock device for HLC current time
nit: missing period.
pkg/base/config.go, line 218 at r1 (raw file):
// Enables the use of an IEEE 1588 PTP clock device for HLC current time ClockDevice string
please put Path
in the name, and otherwise comment that this is a path. Also please make reference to the API we're using. Something to include the fd<<3 | 3
convention. In the worst case, a commit in the linux kernel that introduced it. I found some stuff on it in the past with some googling.
pkg/cli/cliflags/flags.go, line 679 at r1 (raw file):
Name: "clock-device", Description: ` Use IEEE 1588 PTP clock device for HLC current time.
I'd get rid of IEEE 1588 and make reference to some description of the API the device needs to implement.
Also please put more words in the description. Say something about this overriding the "regular" clock.
pkg/cli/cliflags/flags.go, line 680 at r1 (raw file):
Description: ` Use IEEE 1588 PTP clock device for HLC current time. <PRE>
Please say that this is only available on Linux (right?).
Also consider adding some validation to error out when the flag is used on other platforms (besides the validation based on the version of UseClockDevice
that gets compiled in - where +build !windows
is not strictly restricted to linux).
pkg/server/status/runtime.go, line 507 at r1 (raw file):
log.Infof(ctx, "runtime stats: %s RSS, %d goroutines, %s/%s/%s GO alloc/idle/total%s, "+ "%s/%s CGO alloc/total, %.1f CGO/sec, %.1f/%.1f %%(u/s)time, %.1f %%gc (%dx), "+ "%s/%s (r/w)net, timeutil.Now calls (all: %d, hlc: %d)",
these lines are already hard to read; you sure you want this new information? I think it's useful only when developing. I'd put it in a separate VEventf(2)
message.
Or, rather, I'd get rid of the counters. I don't think it's worth the code. And I'd rather not worry about the performance impact of the atomic increment either. For such a common increment we might want to look at per-cpu counters (which can be hacked in go with sync.Pool
I think).
pkg/util/hlc/hlc.go, line 122 at r1 (raw file):
// c := hlc.NewClock(hlc.UnixNano, ...). func UnixNano() int64 { return timeutil.HlcNow().UnixNano()
nit: s/Hlc/HLC everywhere
pkg/util/timeutil/now_unix.go, line 28 at r1 (raw file):
"time" errors2 "github.com/pkg/errors"
just one errors
pkg pls. The right one to use is cockroachdb/errors
pkg/util/timeutil/now_unix.go, line 33 at r1 (raw file):
var ( clockDevice *os.File clockDeviceId uintptr
as the linter says, make it clockDeviceID
pkg/util/timeutil/now_unix.go, line 47 at r1 (raw file):
// Use the given device as an HLC clock source. func UseClockDevice(clockDeviceName string) error {
Name? Path...
pkg/util/timeutil/now_unix.go, line 47 at r1 (raw file):
// Use the given device as an HLC clock source. func UseClockDevice(clockDeviceName string) error {
I don't like how this clock device works because it changes global state. And we really don't like globals. They always cause troubles - for example you can't run two servers in a process easily any more.
Everything should be scoped to a server. I think the clock device should be used to initialize an hlc clock - either make hlc.NewClock()
take the device directly, or pass in a closure that uses the respective device as the existing physicalClock
argument.
Also, I'd try to keep the new per-OS code out of now.go
, in favor of keeping it in the hlc package. In fact, it seems to me like now.go can be un-split; there no longer seems to be a need for now_windows.go
according to the comments in it. I think this would be a great time to see if we can clean that up.
pkg/util/timeutil/now_unix.go, line 54 at r1 (raw file):
clockDevice, err = os.Open(clockDeviceName) if err == nil { clockDeviceFd := clockDevice.Fd()
clockDeviceFD
pkg/util/timeutil/now_unix.go, line 55 at r1 (raw file):
if err == nil { clockDeviceFd := clockDevice.Fd() clockId := (^clockDeviceFd << 3) | 3
clockID
pkg/util/timeutil/now_unix.go, line 55 at r1 (raw file):
if err == nil { clockDeviceFd := clockDevice.Fd() clockId := (^clockDeviceFd << 3) | 3
please include a link explaining this magic here too
pkg/util/timeutil/now_unix.go, line 68 at r1 (raw file):
clockDeviceId = clockId } else { return fmt.Errorf("Can't open %s, err %s\n", clockDeviceName, err)
errors.Errorf()
pls. And don't capitalize the message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @darinpp)
pkg/base/config.go, line 218 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
please put
Path
in the name, and otherwise comment that this is a path. Also please make reference to the API we're using. Something to include thefd<<3 | 3
convention. In the worst case, a commit in the linux kernel that introduced it. I found some stuff on it in the past with some googling.
This is what I had found. Don't know what happened to man page the patch touches, as it doesn't seem to have this content.
https://lore.kernel.org/patchwork/patch/868609/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @knz)
build/builder/mkrelease.sh, line 42 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
pls add a comment somewhere about why linking this is necessary? Do we really not get all of
time.h
without it?
It seems like that with the library versions that we are using it has to be explicit
pkg/base/config.go, line 217 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
nit: missing period.
fixed
pkg/base/config.go, line 218 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
This is what I had found. Don't know what happened to man page the patch touches, as it doesn't seem to have this content.
https://lore.kernel.org/patchwork/patch/868609/
I added the path and also the reference to the API in ClockDevice
pkg/cli/cliflags/flags.go, line 679 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
I'd get rid of IEEE 1588 and make reference to some description of the API the device needs to implement.
Also please put more words in the description. Say something about this overriding the "regular" clock.
removed IEEE 1588
pkg/cli/cliflags/flags.go, line 680 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
Please say that this is only available on Linux (right?).
Also consider adding some validation to error out when the flag is used on other platforms (besides the validation based on the version ofUseClockDevice
that gets compiled in - where+build !windows
is not strictly restricted to linux).
Also mentioned that it is only for Linux. Also fixed the +build
to be linux
vs !linux
. The validation is done already in clock_gettime
. If you are using an incorrect device or you don't have such driver - it gives you back an error.
pkg/server/status/runtime.go, line 507 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
these lines are already hard to read; you sure you want this new information? I think it's useful only when developing. I'd put it in a separate
VEventf(2)
message.Or, rather, I'd get rid of the counters. I don't think it's worth the code. And I'd rather not worry about the performance impact of the atomic increment either. For such a common increment we might want to look at per-cpu counters (which can be hacked in go with
sync.Pool
I think).
I removed them.
pkg/util/hlc/hlc.go, line 122 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
nit: s/Hlc/HLC everywhere
changed everywhere.
pkg/util/timeutil/now_unix.go, line 28 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
just one
errors
pkg pls. The right one to use iscockroachdb/errors
fixed
pkg/util/timeutil/now_unix.go, line 33 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
as the linter says, make it
clockDeviceID
done
pkg/util/timeutil/now_unix.go, line 47 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
Name? Path...
added Path
pkg/util/timeutil/now_unix.go, line 47 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
I don't like how this clock device works because it changes global state. And we really don't like globals. They always cause troubles - for example you can't run two servers in a process easily any more.
Everything should be scoped to a server. I think the clock device should be used to initialize an hlc clock - either makehlc.NewClock()
take the device directly, or pass in a closure that uses the respective device as the existingphysicalClock
argument.Also, I'd try to keep the new per-OS code out of
now.go
, in favor of keeping it in the hlc package. In fact, it seems to me like now.go can be un-split; there no longer seems to be a need fornow_windows.go
according to the comments in it. I think this would be a great time to see if we can clean that up.
I changed to a closure.
pkg/util/timeutil/now_unix.go, line 54 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
clockDeviceFD
also capitalized
pkg/util/timeutil/now_unix.go, line 55 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
clockID
capitalized
pkg/util/timeutil/now_unix.go, line 55 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
please include a link explaining this magic here too
done
pkg/util/timeutil/now_unix.go, line 68 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
errors.Errorf()
pls. And don't capitalize the message.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 10 files at r1, 7 of 11 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @darinpp, and @knz)
build/builder/mkrelease.sh, line 42 at r1 (raw file):
Previously, darinpp wrote…
It seems like that with the library versions that we are using it has to be explicit
To reply to andrei's comment, clock_gettime
was not part of the standard C library prior to glibc 2.17, and that's what our release builds use.
@darinpp I still think that Andrei's request is valid: please add a comment to explain. Your future self will thank you.
pkg/cli/cliflags/flags.go, line 680 at r1 (raw file):
Previously, darinpp wrote…
Also mentioned that it is only for Linux. Also fixed the
+build
to belinux
vs!linux
. The validation is done already inclock_gettime
. If you are using an incorrect device or you don't have such driver - it gives you back an error.
I'm ok with this behavior - Andrei if I am running a non-linux system and I happens to have the device available, I don't want my software to tell me "you're not using linux, go away". Dependency checks should be fine grained.
pkg/util/hlc/hlc_clock_device_linix.go, line 28 at r2 (raw file):
) // ClockDevice contains the handle of the clock device as well as the
I was trying the code out on my computer (which does not have a PTP device) and I got disappointed by the error message:
$ cockroach start --clock-device /dev/ptp0
ERROR: creating clock device: file does not exist: /dev/ptp0
The problem here is that the error messages gives me the impression that CockroachDB is the program creating /dev/ptp0 and the device in my OS.
I think the wording creating clock device
in the error is poor.
This inconvenience then brought me to this file and now as a programmer I am confused: this struct ClockDevice
is not a device - it is merely a handle for some abstraction provided by the operating system.
Consider: unix does not give you files, it gives you file descriptors. It does not give you IP connections, it gives you sockets. It does not give you memory, it gives you pointers.
You need a word here and in the error message that better explains what is going on here.
What about: ClockSource
for the name of the struct:
- the fields inside the struct can remain unchanged:
ClockSource
abstracts a device from the OS into a source inside CockroachDB - the command-line flag
--clock-device
can remain unchanged - the contructour would be
MakeClockSource
- the error message would be
instantiating clock source
what do you think?
pkg/util/hlc/hlc_clock_device_linix.go, line 48 at r2 (raw file):
log.Infof( context.TODO(), "Opened clock device %s with fd %x, mod_fd %x\n",
nit: log messages start with lowercase
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with nits.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @darinpp, and @knz)
build/builder/mkrelease.sh, line 42 at r1 (raw file):
Previously, knz (kena) wrote…
To reply to andrei's comment,
clock_gettime
was not part of the standard C library prior to glibc 2.17, and that's what our release builds use.@darinpp I still think that Andrei's request is valid: please add a comment to explain. Your future self will thank you.
If that's true (and I guess it must be, cause a simple make build
on Linux works without it, right?), consider adding a comment saying that we won't need to link this realtime library when we move to a newer libc.
pkg/cli/cliflags/flags.go, line 680 at r1 (raw file):
Previously, knz (kena) wrote…
I'm ok with this behavior - Andrei if I am running a non-linux system and I happens to have the device available, I don't want my software to tell me "you're not using linux, go away". Dependency checks should be fine grained.
I'm pretty sure the API we're using is Linux specific - it comes from here: http://linuxptp.sourceforge.net/
I haven't found any references to BSD supporting it.
pkg/util/hlc/hlc_clock_device_linix.go, line 2 at r2 (raw file):
// Copyright 2017 The Cockroach Authors. //
s/linix/linux in file name
pkg/util/hlc/hlc_clock_device_linix.go, line 31 at r2 (raw file):
// clock id. type ClockDevice struct { clockDevice *os.File
is anybody using this field? We never close the file, do we? If nobody's using it, let's get rid of it.
And then we can just type ClockDevice uintptr
.
pkg/util/hlc/hlc_clock_device_linix.go, line 44 at r2 (raw file):
// For clarification of how the clock id is computed: // https://lore.kernel.org/patchwork/patch/868609/ // https://github.com/torvalds/linux/blob/master/tools/testing/selftests/ptp/testptp.c#L87
links to a branch decay as the branch moves forward. Please put a a commit hash instead of master
. You can easily get the url with the current one in Github by pressing y
when you have the file open.
pkg/util/hlc/hlc_clock_device_linix.go, line 47 at r2 (raw file):
clockID := (^clockDeviceFD << 3) | 3 log.Infof( context.TODO(),
no context.TODO()
in new functions please. Take a ctx
.
pkg/util/hlc/hlc_clock_device_linix.go, line 50 at r2 (raw file):
"Opened clock device %s with fd %x, mod_fd %x\n", clockDevicePath, clockDeviceFD,
nit: I know that File.Fd()
returns a uintptr
, but I think it's weird to print it as hex. I believe I've only seen file descriptors printed as decimal - for example, in /proc//fd.
pkg/util/hlc/hlc_clock_device_linix.go, line 50 at r2 (raw file):
"Opened clock device %s with fd %x, mod_fd %x\n", clockDevicePath, clockDeviceFD,
nit: consider getting rid of the two last arguments. They seem too in the weeds for an Infof()
, and generally not very useful even at higher verbosity in my opinion.
pkg/util/hlc/hlc_clock_device_linix.go, line 56 at r2 (raw file):
_, err := C.clock_gettime(C.clockid_t(clockID), &ts) if err != nil { return result, errors.Wrap(err, "UseClockDevice: error calling clock_gettime")
say something about "invalid clock device"? That's why we're doing this reading here, isn't it?
pkg/util/hlc/hlc_clock_device_linix.go, line 60 at r2 (raw file):
result.clockDeviceID = clockID } else { return result, errors.Errorf("can't open %s, err %s\n", clockDevicePath, err)
invert the if err == nil
condition, move this early return up, and unindent the main code path
pkg/util/hlc/hlc_clock_device_stub.go, line 24 at r2 (raw file):
// clock id. type ClockDevice struct { clockDevice *os.File
anybody using these fields?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @darinpp, and @knz)
build/builder/mkrelease.sh, line 42 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
If that's true (and I guess it must be, cause a simple
make build
on Linux works without it, right?), consider adding a comment saying that we won't need to link this realtime library when we move to a newer libc.
I added a comment that we need -lrt
this until we upgrade.
pkg/util/hlc/hlc_clock_device_stub.go, line 24 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
anybody using these fields?
Not explicitly. But this is needed to keep the device open and to prevent the GC from closing it.
pkg/util/hlc/hlc_clock_device_linix.go, line 2 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
s/linix/linux in file name
done
pkg/util/hlc/hlc_clock_device_linix.go, line 28 at r2 (raw file):
Previously, knz (kena) wrote…
I was trying the code out on my computer (which does not have a PTP device) and I got disappointed by the error message:
$ cockroach start --clock-device /dev/ptp0 ERROR: creating clock device: file does not exist: /dev/ptp0
The problem here is that the error messages gives me the impression that CockroachDB is the program creating /dev/ptp0 and the device in my OS.
I think the wording
creating clock device
in the error is poor.This inconvenience then brought me to this file and now as a programmer I am confused: this struct
ClockDevice
is not a device - it is merely a handle for some abstraction provided by the operating system.Consider: unix does not give you files, it gives you file descriptors. It does not give you IP connections, it gives you sockets. It does not give you memory, it gives you pointers.
You need a word here and in the error message that better explains what is going on here.
What about:
ClockSource
for the name of the struct:
- the fields inside the struct can remain unchanged:
ClockSource
abstracts a device from the OS into a source inside CockroachDB- the command-line flag
--clock-device
can remain unchanged- the contructour would be
MakeClockSource
- the error message would be
instantiating clock source
what do you think?
changed the name to ClockSource
and updated the error message
pkg/util/hlc/hlc_clock_device_linix.go, line 31 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
is anybody using this field? We never close the file, do we? If nobody's using it, let's get rid of it.
And then we can justtype ClockDevice uintptr
.
The field is needed to keep the file open. Otherwise GC will close the file.
pkg/util/hlc/hlc_clock_device_linix.go, line 44 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
links to a branch decay as the branch moves forward. Please put a a commit hash instead of
master
. You can easily get the url with the current one in Github by pressingy
when you have the file open.
done
pkg/util/hlc/hlc_clock_device_linix.go, line 47 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
no
context.TODO()
in new functions please. Take actx
.
fixed
pkg/util/hlc/hlc_clock_device_linix.go, line 48 at r2 (raw file):
Previously, knz (kena) wrote…
nit: log messages start with lowercase
changed to lowercase
pkg/util/hlc/hlc_clock_device_linix.go, line 50 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
nit: I know that
File.Fd()
returns auintptr
, but I think it's weird to print it as hex. I believe I've only seen file descriptors printed as decimal - for example, in /proc//fd.
fixed it so the FD is printed as decimal.
pkg/util/hlc/hlc_clock_device_linix.go, line 50 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
nit: consider getting rid of the two last arguments. They seem too in the weeds for an
Infof()
, and generally not very useful even at higher verbosity in my opinion.
I disagree here. There was a situation in which the fd variable was shadowed and was defaulting to zero. The clock_gettime
works fine in this case but uses the wrong source. You won't be able to determine that this is what happened unless you have this info.
pkg/util/hlc/hlc_clock_device_linix.go, line 56 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
say something about "invalid clock device"? That's why we're doing this reading here, isn't it?
There could be a number of possible issues here. The device may not be invalid but there still may be a problem with getting the time. We are doing the call here to verify that getting the time works at least once. It is a lot less likely that it won't work after that first call succeeds. It will weed out the problems with missing kernel driver, incorrect device, permissions etc.
pkg/util/hlc/hlc_clock_device_linix.go, line 60 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
invert the
if err == nil
condition, move this early return up, and unindent the main code path
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM but please simplify the !linux
ClockDevice
struct.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @darinpp, and @knz)
pkg/util/hlc/hlc_clock_device_stub.go, line 24 at r2 (raw file):
Previously, darinpp wrote…
Not explicitly. But this is needed to keep the device open and to prevent the GC from closing it.
if we're never setting this field, what GC? Note that here I'm talking strictly about this !linux
ClockDevice
. I think it can be an empty struct.
pkg/util/hlc/hlc_clock_device_linix.go, line 31 at r2 (raw file):
Previously, darinpp wrote…
The field is needed to keep the file open. Otherwise GC will close the file.
Interesting. Please add a comment about it.
pkg/util/hlc/hlc_clock_device_linix.go, line 50 at r2 (raw file):
Previously, darinpp wrote…
I disagree here. There was a situation in which the fd variable was shadowed and was defaulting to zero. The
clock_gettime
works fine in this case but uses the wrong source. You won't be able to determine that this is what happened unless you have this info.
well it's not just any sort of programming errors that logs are meant to help you debug. Log are for dynamic conditions.
Anyway, as you wish.
pkg/util/hlc/hlc_clock_device_linix.go, line 56 at r2 (raw file):
Previously, darinpp wrote…
There could be a number of possible issues here. The device may not be invalid but there still may be a problem with getting the time. We are doing the call here to verify that getting the time works at least once. It is a lot less likely that it won't work after that first call succeeds. It will weed out the problems with missing kernel driver, incorrect device, permissions etc.
If the device is "valid", why would the call fail? A missing driver means an "invalid" device in my book. Permission errors are checked by the open call above, I guess. But even if they weren't, I'd still find an error talking about an invalid clock device and its path more useful than something telling me about clock_gettime
.
But as you wish.
pkg/util/hlc/hlc_clock_device_linux.go, line 41 at r3 (raw file):
result.clockDevice, err = os.Open(clockDevicePath) if err != nil { return result, errors.Errorf("can't open %s, err %s\n", clockDevicePath, err)
I think you want errors.Wrapf()
. And no \n
pls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also with the same nits as andrei
Reviewed 1 of 11 files at r2, 4 of 4 files at r3.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @andreimatei, @darinpp, and @knz)
pkg/util/hlc/hlc_clock_device_linix.go, line 31 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
Interesting. Please add a comment about it.
I concur a comment would be useful here.
pkg/util/hlc/hlc_clock_device_linux.go, line 41 at r3 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
I think you want
errors.Wrapf()
. And no\n
pls.
that's errors.Wrapf(err, "cannot open %s", clockDevicePath)
❌ The GitHub CI (Cockroach) build has failed on e13a3ad0. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. I am experimental - my owner is otan. |
Adds support for Linux for using PTP user space API clock device for HLC. This is needed in case that the host clock is prone to large jumps (as is the case when using vMotion). Release note (cli change): added new start option --clock-device that alows HLC to use PTP user space API clock
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei, @darinpp, and @knz)
pkg/util/hlc/hlc_clock_device_stub.go, line 24 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
if we're never setting this field, what GC? Note that here I'm talking strictly about this
!linux
ClockDevice
. I think it can be an empty struct.
Ah. Got it. Removed both fields in the !linux
case.
pkg/util/hlc/hlc_clock_device_linix.go, line 31 at r2 (raw file):
Previously, knz (kena) wrote…
I concur a comment would be useful here.
done
pkg/util/hlc/hlc_clock_device_linix.go, line 50 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
well it's not just any sort of programming errors that logs are meant to help you debug. Log are for dynamic conditions.
Anyway, as you wish.
It isn't worth adding a separate VEventf
just for these two when they are printed only once and the message makes sense to be there w/o them already.
pkg/util/hlc/hlc_clock_device_linix.go, line 56 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
If the device is "valid", why would the call fail? A missing driver means an "invalid" device in my book. Permission errors are checked by the open call above, I guess. But even if they weren't, I'd still find an error talking about an invalid clock device and its path more useful than something telling me about
clock_gettime
.
But as you wish.
My point is that I shouldn't make the determination that the device is invalid purely on the basis of it returning an error. The possible errors are not documented and may change over time. I'd rather log what the code tried doing - calling clock_gettime
and what was the error.
pkg/util/hlc/hlc_clock_device_linux.go, line 41 at r3 (raw file):
Previously, knz (kena) wrote…
that's
errors.Wrapf(err, "cannot open %s", clockDevicePath)
done
bors r+ |
1 similar comment
bors r+ |
Build succeeded |
Adds support for Linux for using PTP user space API clock
device for HLC. This is needed in case that the host clock is
prone to large jumps (as is the case when using vMotion).
Release note (cli change): added new start option --clock-device that
allows HLC to use PTP user space API clock