Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hlc: use PTP user space API as HLC clock #46942

Merged
merged 1 commit into from
Apr 7, 2020

Conversation

darinpp
Copy link
Contributor

@darinpp darinpp commented Apr 2, 2020

Adds support for Linux for using PTP user space API clock
device for HLC. This is needed in case that the host clock is
prone to large jumps (as is the case when using vMotion).

Release note (cli change): added new start option --clock-device that
allows HLC to use PTP user space API clock

@darinpp darinpp requested a review from andreimatei April 2, 2020 21:03
@darinpp darinpp requested a review from a team as a code owner April 2, 2020 21:03
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only commenting on the CLI/build aspects of this. I'll let andrei or the other team members comment on the HLC changes.

@@ -213,6 +213,9 @@ type Config struct {
// connections to determine connection health and update the local view
// of remote clocks.
RPCHeartbeatInterval time.Duration

// Enables the use of an IEEE 1588 PTP clock device for HLC current time
ClockDevice string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be reset in InitDefaults() below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

pkg/cli/flags.go Outdated
@@ -51,6 +51,7 @@ var serverSQLAddr, serverSQLPort string
var serverSQLAdvertiseAddr, serverSQLAdvertisePort string
var serverHTTPAddr, serverHTTPPort string
var localityAdvertiseHosts localityList
var clockDevice string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be reset to its default value in context.go. However see my next comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

pkg/cli/flags.go Outdated
@@ -358,6 +359,7 @@ func init() {
VarFlag(f, &serverCfg.Stores, cliflags.Store)
VarFlag(f, &serverCfg.StorageEngine, cliflags.StorageEngine)
VarFlag(f, &serverCfg.MaxOffset, cliflags.MaxOffset)
StringFlag(f, &clockDevice, cliflags.ClockDevice, "")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was there a reason not to do StringFlag(f, &serverCfg.ClockDevice, ...) and avoid the separate global var?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it as you are suggesting.

@@ -39,7 +39,7 @@ case "${1-}" in
XGOARCH=amd64
XCMAKE_SYSTEM_NAME=Linux
TARGET_TRIPLE=x86_64-unknown-linux-gnu
LDFLAGS="-static-libgcc -static-libstdc++"
LDFLAGS="-static-libgcc -static-libstdc++ -lrt"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does your patch work for the other targets supported below? What does the build/builder.sh mkrelease command says for the various targets?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does work for darwin, windows and linux-musl too.
linux-gnueabi and linux-msan don't work for me for the unmodified master branch.

Copy link
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: are you sure you want to mention IEEE 1588 in the commit title and release notes? I think it's misleading. We're technically using the Linux "PTP hardware clock user space API" (as per here https://www.kernel.org/doc/Documentation/ptp/ptp.txt), but the time source doesn't need to have anything to do with the Precision Time Protocol. I'd just make reference to this API, and otherwise downplay the PTP part.

nit: please make the release note more of a full sentence, and mention the new flag. See this for some guidelines: https://wiki.crdb.io/wiki/spaces/CRDB/pages/186548364/Release+notes
Also, the Release note() wants to have one of a predefined categories. The "low risk new functionality" part belongs in the Release justification stanza.

pls see comment about the "global state"

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @darinpp)


build/builder/mkrelease.sh, line 42 at r1 (raw file):

      XCMAKE_SYSTEM_NAME=Linux
      TARGET_TRIPLE=x86_64-unknown-linux-gnu
      LDFLAGS="-static-libgcc -static-libstdc++ -lrt"

pls add a comment somewhere about why linking this is necessary? Do we really not get all of time.h without it?


pkg/base/config.go, line 217 at r1 (raw file):

	RPCHeartbeatInterval time.Duration

	// Enables the use of an IEEE 1588 PTP clock device for HLC current time

nit: missing period.


pkg/base/config.go, line 218 at r1 (raw file):

	// Enables the use of an IEEE 1588 PTP clock device for HLC current time
	ClockDevice        string

please put Path in the name, and otherwise comment that this is a path. Also please make reference to the API we're using. Something to include the fd<<3 | 3 convention. In the worst case, a commit in the linux kernel that introduced it. I found some stuff on it in the past with some googling.


pkg/cli/cliflags/flags.go, line 679 at r1 (raw file):

		Name: "clock-device",
		Description: `
Use IEEE 1588 PTP clock device for HLC current time. 

I'd get rid of IEEE 1588 and make reference to some description of the API the device needs to implement.
Also please put more words in the description. Say something about this overriding the "regular" clock.


pkg/cli/cliflags/flags.go, line 680 at r1 (raw file):

		Description: `
Use IEEE 1588 PTP clock device for HLC current time. 
<PRE>

Please say that this is only available on Linux (right?).
Also consider adding some validation to error out when the flag is used on other platforms (besides the validation based on the version of UseClockDevice that gets compiled in - where +build !windows is not strictly restricted to linux).


pkg/server/status/runtime.go, line 507 at r1 (raw file):

	log.Infof(ctx, "runtime stats: %s RSS, %d goroutines, %s/%s/%s GO alloc/idle/total%s, "+
		"%s/%s CGO alloc/total, %.1f CGO/sec, %.1f/%.1f %%(u/s)time, %.1f %%gc (%dx), "+
		"%s/%s (r/w)net, timeutil.Now calls (all: %d, hlc: %d)",

these lines are already hard to read; you sure you want this new information? I think it's useful only when developing. I'd put it in a separate VEventf(2) message.

Or, rather, I'd get rid of the counters. I don't think it's worth the code. And I'd rather not worry about the performance impact of the atomic increment either. For such a common increment we might want to look at per-cpu counters (which can be hacked in go with sync.Pool I think).


pkg/util/hlc/hlc.go, line 122 at r1 (raw file):

// c := hlc.NewClock(hlc.UnixNano, ...).
func UnixNano() int64 {
	return timeutil.HlcNow().UnixNano()

nit: s/Hlc/HLC everywhere


pkg/util/timeutil/now_unix.go, line 28 at r1 (raw file):

	"time"

	errors2 "github.com/pkg/errors"

just one errors pkg pls. The right one to use is cockroachdb/errors


pkg/util/timeutil/now_unix.go, line 33 at r1 (raw file):

var (
	clockDevice   *os.File
	clockDeviceId uintptr

as the linter says, make it clockDeviceID


pkg/util/timeutil/now_unix.go, line 47 at r1 (raw file):

// Use the given device as an HLC clock source.
func UseClockDevice(clockDeviceName string) error {

Name? Path...


pkg/util/timeutil/now_unix.go, line 47 at r1 (raw file):

// Use the given device as an HLC clock source.
func UseClockDevice(clockDeviceName string) error {

I don't like how this clock device works because it changes global state. And we really don't like globals. They always cause troubles - for example you can't run two servers in a process easily any more.
Everything should be scoped to a server. I think the clock device should be used to initialize an hlc clock - either make hlc.NewClock() take the device directly, or pass in a closure that uses the respective device as the existing physicalClock argument.

Also, I'd try to keep the new per-OS code out of now.go, in favor of keeping it in the hlc package. In fact, it seems to me like now.go can be un-split; there no longer seems to be a need for now_windows.go according to the comments in it. I think this would be a great time to see if we can clean that up.


pkg/util/timeutil/now_unix.go, line 54 at r1 (raw file):

	clockDevice, err = os.Open(clockDeviceName)
	if err == nil {
		clockDeviceFd := clockDevice.Fd()

clockDeviceFD


pkg/util/timeutil/now_unix.go, line 55 at r1 (raw file):

	if err == nil {
		clockDeviceFd := clockDevice.Fd()
		clockId := (^clockDeviceFd << 3) | 3

clockID


pkg/util/timeutil/now_unix.go, line 55 at r1 (raw file):

	if err == nil {
		clockDeviceFd := clockDevice.Fd()
		clockId := (^clockDeviceFd << 3) | 3

please include a link explaining this magic here too


pkg/util/timeutil/now_unix.go, line 68 at r1 (raw file):

		clockDeviceId = clockId
	} else {
		return fmt.Errorf("Can't open %s, err %s\n", clockDeviceName, err)

errors.Errorf() pls. And don't capitalize the message.

Copy link
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @darinpp)


pkg/base/config.go, line 218 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

please put Path in the name, and otherwise comment that this is a path. Also please make reference to the API we're using. Something to include the fd<<3 | 3 convention. In the worst case, a commit in the linux kernel that introduced it. I found some stuff on it in the past with some googling.

This is what I had found. Don't know what happened to man page the patch touches, as it doesn't seem to have this content.
https://lore.kernel.org/patchwork/patch/868609/

Copy link
Contributor Author

@darinpp darinpp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @knz)


build/builder/mkrelease.sh, line 42 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

pls add a comment somewhere about why linking this is necessary? Do we really not get all of time.h without it?

It seems like that with the library versions that we are using it has to be explicit


pkg/base/config.go, line 217 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

nit: missing period.

fixed


pkg/base/config.go, line 218 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

This is what I had found. Don't know what happened to man page the patch touches, as it doesn't seem to have this content.
https://lore.kernel.org/patchwork/patch/868609/

I added the path and also the reference to the API in ClockDevice


pkg/cli/cliflags/flags.go, line 679 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I'd get rid of IEEE 1588 and make reference to some description of the API the device needs to implement.
Also please put more words in the description. Say something about this overriding the "regular" clock.

removed IEEE 1588


pkg/cli/cliflags/flags.go, line 680 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

Please say that this is only available on Linux (right?).
Also consider adding some validation to error out when the flag is used on other platforms (besides the validation based on the version of UseClockDevice that gets compiled in - where +build !windows is not strictly restricted to linux).

Also mentioned that it is only for Linux. Also fixed the +build to be linux vs !linux. The validation is done already in clock_gettime. If you are using an incorrect device or you don't have such driver - it gives you back an error.


pkg/server/status/runtime.go, line 507 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

these lines are already hard to read; you sure you want this new information? I think it's useful only when developing. I'd put it in a separate VEventf(2) message.

Or, rather, I'd get rid of the counters. I don't think it's worth the code. And I'd rather not worry about the performance impact of the atomic increment either. For such a common increment we might want to look at per-cpu counters (which can be hacked in go with sync.Pool I think).

I removed them.


pkg/util/hlc/hlc.go, line 122 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

nit: s/Hlc/HLC everywhere

changed everywhere.


pkg/util/timeutil/now_unix.go, line 28 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

just one errors pkg pls. The right one to use is cockroachdb/errors

fixed


pkg/util/timeutil/now_unix.go, line 33 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

as the linter says, make it clockDeviceID

done


pkg/util/timeutil/now_unix.go, line 47 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

Name? Path...

added Path


pkg/util/timeutil/now_unix.go, line 47 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I don't like how this clock device works because it changes global state. And we really don't like globals. They always cause troubles - for example you can't run two servers in a process easily any more.
Everything should be scoped to a server. I think the clock device should be used to initialize an hlc clock - either make hlc.NewClock() take the device directly, or pass in a closure that uses the respective device as the existing physicalClock argument.

Also, I'd try to keep the new per-OS code out of now.go, in favor of keeping it in the hlc package. In fact, it seems to me like now.go can be un-split; there no longer seems to be a need for now_windows.go according to the comments in it. I think this would be a great time to see if we can clean that up.

I changed to a closure.


pkg/util/timeutil/now_unix.go, line 54 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

clockDeviceFD

also capitalized


pkg/util/timeutil/now_unix.go, line 55 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

clockID

capitalized


pkg/util/timeutil/now_unix.go, line 55 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

please include a link explaining this magic here too

done


pkg/util/timeutil/now_unix.go, line 68 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

errors.Errorf() pls. And don't capitalize the message.

done

@darinpp darinpp changed the title hlc: add support for IEEE 1588 PTP clock device hlc: add support for PTP user space API clock device Apr 6, 2020
@darinpp darinpp changed the title hlc: add support for PTP user space API clock device hlc: use PTP user space API as HLC clock Apr 6, 2020
Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 10 files at r1, 7 of 11 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @darinpp, and @knz)


build/builder/mkrelease.sh, line 42 at r1 (raw file):

Previously, darinpp wrote…

It seems like that with the library versions that we are using it has to be explicit

To reply to andrei's comment, clock_gettime was not part of the standard C library prior to glibc 2.17, and that's what our release builds use.

@darinpp I still think that Andrei's request is valid: please add a comment to explain. Your future self will thank you.


pkg/cli/cliflags/flags.go, line 680 at r1 (raw file):

Previously, darinpp wrote…

Also mentioned that it is only for Linux. Also fixed the +build to be linux vs !linux. The validation is done already in clock_gettime. If you are using an incorrect device or you don't have such driver - it gives you back an error.

I'm ok with this behavior - Andrei if I am running a non-linux system and I happens to have the device available, I don't want my software to tell me "you're not using linux, go away". Dependency checks should be fine grained.


pkg/util/hlc/hlc_clock_device_linix.go, line 28 at r2 (raw file):

)

// ClockDevice contains the handle of the clock device as well as the

I was trying the code out on my computer (which does not have a PTP device) and I got disappointed by the error message:

$ cockroach start --clock-device /dev/ptp0
ERROR: creating clock device: file does not exist: /dev/ptp0

The problem here is that the error messages gives me the impression that CockroachDB is the program creating /dev/ptp0 and the device in my OS.

I think the wording creating clock device in the error is poor.

This inconvenience then brought me to this file and now as a programmer I am confused: this struct ClockDevice is not a device - it is merely a handle for some abstraction provided by the operating system.

Consider: unix does not give you files, it gives you file descriptors. It does not give you IP connections, it gives you sockets. It does not give you memory, it gives you pointers.

You need a word here and in the error message that better explains what is going on here.

What about: ClockSource for the name of the struct:

  • the fields inside the struct can remain unchanged: ClockSource abstracts a device from the OS into a source inside CockroachDB
  • the command-line flag --clock-device can remain unchanged
  • the contructour would be MakeClockSource
  • the error message would be instantiating clock source

what do you think?


pkg/util/hlc/hlc_clock_device_linix.go, line 48 at r2 (raw file):

		log.Infof(
			context.TODO(),
			"Opened clock device %s with fd %x, mod_fd %x\n",

nit: log messages start with lowercase

Copy link
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with nits.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @darinpp, and @knz)


build/builder/mkrelease.sh, line 42 at r1 (raw file):

Previously, knz (kena) wrote…

To reply to andrei's comment, clock_gettime was not part of the standard C library prior to glibc 2.17, and that's what our release builds use.

@darinpp I still think that Andrei's request is valid: please add a comment to explain. Your future self will thank you.

If that's true (and I guess it must be, cause a simple make build on Linux works without it, right?), consider adding a comment saying that we won't need to link this realtime library when we move to a newer libc.


pkg/cli/cliflags/flags.go, line 680 at r1 (raw file):

Previously, knz (kena) wrote…

I'm ok with this behavior - Andrei if I am running a non-linux system and I happens to have the device available, I don't want my software to tell me "you're not using linux, go away". Dependency checks should be fine grained.

I'm pretty sure the API we're using is Linux specific - it comes from here: http://linuxptp.sourceforge.net/
I haven't found any references to BSD supporting it.


pkg/util/hlc/hlc_clock_device_linix.go, line 2 at r2 (raw file):

// Copyright 2017 The Cockroach Authors.
//

s/linix/linux in file name


pkg/util/hlc/hlc_clock_device_linix.go, line 31 at r2 (raw file):

// clock id.
type ClockDevice struct {
	clockDevice   *os.File

is anybody using this field? We never close the file, do we? If nobody's using it, let's get rid of it.
And then we can just type ClockDevice uintptr.


pkg/util/hlc/hlc_clock_device_linix.go, line 44 at r2 (raw file):

		// For clarification of how the clock id is computed:
		// https://lore.kernel.org/patchwork/patch/868609/
		// https://github.com/torvalds/linux/blob/master/tools/testing/selftests/ptp/testptp.c#L87

links to a branch decay as the branch moves forward. Please put a a commit hash instead of master. You can easily get the url with the current one in Github by pressing y when you have the file open.


pkg/util/hlc/hlc_clock_device_linix.go, line 47 at r2 (raw file):

		clockID := (^clockDeviceFD << 3) | 3
		log.Infof(
			context.TODO(),

no context.TODO() in new functions please. Take a ctx.


pkg/util/hlc/hlc_clock_device_linix.go, line 50 at r2 (raw file):

			"Opened clock device %s with fd %x, mod_fd %x\n",
			clockDevicePath,
			clockDeviceFD,

nit: I know that File.Fd() returns a uintptr, but I think it's weird to print it as hex. I believe I've only seen file descriptors printed as decimal - for example, in /proc//fd.


pkg/util/hlc/hlc_clock_device_linix.go, line 50 at r2 (raw file):

			"Opened clock device %s with fd %x, mod_fd %x\n",
			clockDevicePath,
			clockDeviceFD,

nit: consider getting rid of the two last arguments. They seem too in the weeds for an Infof(), and generally not very useful even at higher verbosity in my opinion.


pkg/util/hlc/hlc_clock_device_linix.go, line 56 at r2 (raw file):

		_, err := C.clock_gettime(C.clockid_t(clockID), &ts)
		if err != nil {
			return result, errors.Wrap(err, "UseClockDevice: error calling clock_gettime")

say something about "invalid clock device"? That's why we're doing this reading here, isn't it?


pkg/util/hlc/hlc_clock_device_linix.go, line 60 at r2 (raw file):

		result.clockDeviceID = clockID
	} else {
		return result, errors.Errorf("can't open %s, err %s\n", clockDevicePath, err)

invert the if err == nil condition, move this early return up, and unindent the main code path


pkg/util/hlc/hlc_clock_device_stub.go, line 24 at r2 (raw file):

// clock id.
type ClockDevice struct {
	clockDevice   *os.File

anybody using these fields?

Copy link
Contributor Author

@darinpp darinpp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @darinpp, and @knz)


build/builder/mkrelease.sh, line 42 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

If that's true (and I guess it must be, cause a simple make build on Linux works without it, right?), consider adding a comment saying that we won't need to link this realtime library when we move to a newer libc.

I added a comment that we need -lrt this until we upgrade.


pkg/util/hlc/hlc_clock_device_stub.go, line 24 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

anybody using these fields?

Not explicitly. But this is needed to keep the device open and to prevent the GC from closing it.


pkg/util/hlc/hlc_clock_device_linix.go, line 2 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

s/linix/linux in file name

done


pkg/util/hlc/hlc_clock_device_linix.go, line 28 at r2 (raw file):

Previously, knz (kena) wrote…

I was trying the code out on my computer (which does not have a PTP device) and I got disappointed by the error message:

$ cockroach start --clock-device /dev/ptp0
ERROR: creating clock device: file does not exist: /dev/ptp0

The problem here is that the error messages gives me the impression that CockroachDB is the program creating /dev/ptp0 and the device in my OS.

I think the wording creating clock device in the error is poor.

This inconvenience then brought me to this file and now as a programmer I am confused: this struct ClockDevice is not a device - it is merely a handle for some abstraction provided by the operating system.

Consider: unix does not give you files, it gives you file descriptors. It does not give you IP connections, it gives you sockets. It does not give you memory, it gives you pointers.

You need a word here and in the error message that better explains what is going on here.

What about: ClockSource for the name of the struct:

  • the fields inside the struct can remain unchanged: ClockSource abstracts a device from the OS into a source inside CockroachDB
  • the command-line flag --clock-device can remain unchanged
  • the contructour would be MakeClockSource
  • the error message would be instantiating clock source

what do you think?

changed the name to ClockSource and updated the error message


pkg/util/hlc/hlc_clock_device_linix.go, line 31 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

is anybody using this field? We never close the file, do we? If nobody's using it, let's get rid of it.
And then we can just type ClockDevice uintptr.

The field is needed to keep the file open. Otherwise GC will close the file.


pkg/util/hlc/hlc_clock_device_linix.go, line 44 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

links to a branch decay as the branch moves forward. Please put a a commit hash instead of master. You can easily get the url with the current one in Github by pressing y when you have the file open.

done


pkg/util/hlc/hlc_clock_device_linix.go, line 47 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

no context.TODO() in new functions please. Take a ctx.

fixed


pkg/util/hlc/hlc_clock_device_linix.go, line 48 at r2 (raw file):

Previously, knz (kena) wrote…

nit: log messages start with lowercase

changed to lowercase


pkg/util/hlc/hlc_clock_device_linix.go, line 50 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

nit: I know that File.Fd() returns a uintptr, but I think it's weird to print it as hex. I believe I've only seen file descriptors printed as decimal - for example, in /proc//fd.

fixed it so the FD is printed as decimal.


pkg/util/hlc/hlc_clock_device_linix.go, line 50 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

nit: consider getting rid of the two last arguments. They seem too in the weeds for an Infof(), and generally not very useful even at higher verbosity in my opinion.

I disagree here. There was a situation in which the fd variable was shadowed and was defaulting to zero. The clock_gettime works fine in this case but uses the wrong source. You won't be able to determine that this is what happened unless you have this info.


pkg/util/hlc/hlc_clock_device_linix.go, line 56 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

say something about "invalid clock device"? That's why we're doing this reading here, isn't it?

There could be a number of possible issues here. The device may not be invalid but there still may be a problem with getting the time. We are doing the call here to verify that getting the time works at least once. It is a lot less likely that it won't work after that first call succeeds. It will weed out the problems with missing kernel driver, incorrect device, permissions etc.


pkg/util/hlc/hlc_clock_device_linix.go, line 60 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

invert the if err == nil condition, move this early return up, and unindent the main code path

done

Copy link
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but please simplify the !linux ClockDevice struct.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @darinpp, and @knz)


pkg/util/hlc/hlc_clock_device_stub.go, line 24 at r2 (raw file):

Previously, darinpp wrote…

Not explicitly. But this is needed to keep the device open and to prevent the GC from closing it.

if we're never setting this field, what GC? Note that here I'm talking strictly about this !linux ClockDevice. I think it can be an empty struct.


pkg/util/hlc/hlc_clock_device_linix.go, line 31 at r2 (raw file):

Previously, darinpp wrote…

The field is needed to keep the file open. Otherwise GC will close the file.

Interesting. Please add a comment about it.


pkg/util/hlc/hlc_clock_device_linix.go, line 50 at r2 (raw file):

Previously, darinpp wrote…

I disagree here. There was a situation in which the fd variable was shadowed and was defaulting to zero. The clock_gettime works fine in this case but uses the wrong source. You won't be able to determine that this is what happened unless you have this info.

well it's not just any sort of programming errors that logs are meant to help you debug. Log are for dynamic conditions.
Anyway, as you wish.


pkg/util/hlc/hlc_clock_device_linix.go, line 56 at r2 (raw file):

Previously, darinpp wrote…

There could be a number of possible issues here. The device may not be invalid but there still may be a problem with getting the time. We are doing the call here to verify that getting the time works at least once. It is a lot less likely that it won't work after that first call succeeds. It will weed out the problems with missing kernel driver, incorrect device, permissions etc.

If the device is "valid", why would the call fail? A missing driver means an "invalid" device in my book. Permission errors are checked by the open call above, I guess. But even if they weren't, I'd still find an error talking about an invalid clock device and its path more useful than something telling me about clock_gettime.
But as you wish.


pkg/util/hlc/hlc_clock_device_linux.go, line 41 at r3 (raw file):

	result.clockDevice, err = os.Open(clockDevicePath)
	if err != nil {
		return result, errors.Errorf("can't open %s, err %s\n", clockDevicePath, err)

I think you want errors.Wrapf(). And no \n pls.

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: also with the same nits as andrei

Reviewed 1 of 11 files at r2, 4 of 4 files at r3.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andreimatei, @darinpp, and @knz)


pkg/util/hlc/hlc_clock_device_linix.go, line 31 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

Interesting. Please add a comment about it.

I concur a comment would be useful here.


pkg/util/hlc/hlc_clock_device_linux.go, line 41 at r3 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I think you want errors.Wrapf(). And no \n pls.

that's errors.Wrapf(err, "cannot open %s", clockDevicePath)

@blathers-crl
Copy link

blathers-crl bot commented Apr 6, 2020

❌ The GitHub CI (Cockroach) build has failed on e13a3ad0.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. I am experimental - my owner is otan.

Adds support for Linux for using PTP user space API clock
device for HLC. This is needed in case that the host clock is
prone to large jumps (as is the case when using vMotion).

Release note (cli change): added new start option --clock-device that
alows HLC to use PTP user space API clock
Copy link
Contributor Author

@darinpp darinpp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei, @darinpp, and @knz)


pkg/util/hlc/hlc_clock_device_stub.go, line 24 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

if we're never setting this field, what GC? Note that here I'm talking strictly about this !linux ClockDevice. I think it can be an empty struct.

Ah. Got it. Removed both fields in the !linux case.


pkg/util/hlc/hlc_clock_device_linix.go, line 31 at r2 (raw file):

Previously, knz (kena) wrote…

I concur a comment would be useful here.

done


pkg/util/hlc/hlc_clock_device_linix.go, line 50 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

well it's not just any sort of programming errors that logs are meant to help you debug. Log are for dynamic conditions.
Anyway, as you wish.

It isn't worth adding a separate VEventf just for these two when they are printed only once and the message makes sense to be there w/o them already.


pkg/util/hlc/hlc_clock_device_linix.go, line 56 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

If the device is "valid", why would the call fail? A missing driver means an "invalid" device in my book. Permission errors are checked by the open call above, I guess. But even if they weren't, I'd still find an error talking about an invalid clock device and its path more useful than something telling me about clock_gettime.
But as you wish.

My point is that I shouldn't make the determination that the device is invalid purely on the basis of it returning an error. The possible errors are not documented and may change over time. I'd rather log what the code tried doing - calling clock_gettime and what was the error.


pkg/util/hlc/hlc_clock_device_linux.go, line 41 at r3 (raw file):

Previously, knz (kena) wrote…

that's errors.Wrapf(err, "cannot open %s", clockDevicePath)

done

@darinpp
Copy link
Contributor Author

darinpp commented Apr 6, 2020

bors r+

1 similar comment
@darinpp
Copy link
Contributor Author

darinpp commented Apr 7, 2020

bors r+

@craig
Copy link
Contributor

craig bot commented Apr 7, 2020

Build succeeded

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants