Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reported validation error: start time after end time #84

Closed
jmacd opened this issue Jan 29, 2021 · 2 comments
Closed

Reported validation error: start time after end time #84

jmacd opened this issue Jan 29, 2021 · 2 comments
Assignees

Comments

@jmacd
Copy link
Contributor

jmacd commented Jan 29, 2021

We have an unconfirmed report that data points send from the sidecar have start time after end time. As printed in the sidecar log sending to Lightstep:

level=warn ts=2021-01-28T21:02:58.422Z caller=queue_manager.go:506 component=queue_manager msg="unrecoverable write error" err="rpc error: code = InvalidArgument desc = [metric name]: must be positive for kind=delta|cumulative]"

This should be impossible, we should check this in the sidecar to be absolutely sure.

@jmacd jmacd self-assigned this Jan 29, 2021
@jmacd
Copy link
Contributor Author

jmacd commented Jan 29, 2021

I have evidence that this is caused by a race condition. First, the code responsible for determining resets does not actually check for a negative duration:

// getResetAdjusted takes a sample for a referenced series and returns
// its reset timestamp and adjusted value.
// If the last return argument is false, the sample should be dropped.
func (c *seriesCache) getResetAdjusted(ref walRef, t promTime, v float64) (promTime, float64, bool) {
	c.mtx.Lock()
	e, ok := c.entries[ref]
	c.mtx.Unlock()
	if !ok {
		return 0, 0, false
	}
	hasReset := e.hasReset
	e.hasReset = true
	if !hasReset {
		e.resetTimestamp = t
		e.resetValue = v
		// If we just initialized the reset timestamp, this sample should be skipped.
		// We don't know the window over which the current cumulative value was built up over.
		// The next sample for will be considered from this point onwards.
		return 0, 0, false
	}
	if v < e.resetValue {
		// If the series was reset, set the reset timestamp to be one millisecond
		// before the timestamp of the current sample.
		// We don't know the true reset time but this ensures the range is non-zero
		// while unlikely to conflict with any previous sample.
		e.resetValue = 0
		e.resetTimestamp = t - 1
	}
	return e.resetTimestamp, v - e.resetValue, true
}

(This is similar to the upstream Stackdriver code base.)

The cause of the incident simply requires that samples from a timeseries appear out-of-order to the seriesCache?

@jmacd
Copy link
Contributor Author

jmacd commented Mar 5, 2021

On further investigation, this report is inconsistent. The log message quoted above does not mention time. #145 may fix a related problem, closing this until we see it again.

@jmacd jmacd closed this as completed Mar 5, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant