Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[coordinator] Influxdb write endpoint tag copy fix #2126

Merged
merged 3 commits into from
Feb 13, 2020

Conversation

fingon
Copy link
Contributor

@fingon fingon commented Jan 30, 2020

Reusing Tags objects directly over the iteration did not work. Second
best performing option was to reuse the Tag objects within array, but
replace the Tag with the name with new one.

No existing test but fixed panics in our dev env. Not tested in our prod.

Fixes #2125.

Reusing Tags objects directly over the iteration did not work.  Second
best performing option was to reuse the Tag objects within array, but
replace the Tag with the __name__ with new one.

Fixes issue m3db#2125.
func (ii *ingestIterator) Current() (models.Tags, ts.Datapoints, xtime.Unit, []byte) {
if ii.pointIndex < len(ii.points) && ii.nextFieldIndex > 0 && len(ii.fields) > (ii.nextFieldIndex-1) {
point := ii.points[ii.pointIndex]
field := ii.fields[ii.nextFieldIndex-1]
tags := ii.tags.SetName(field.name)
tags := copyTagsWithNewName(ii.tags, field.name)

return tags, []ts.Datapoint{ts.Datapoint{Timestamp: point.Time(),
Value: field.value}}, xtime.Nanosecond, nil
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: I missed this before, it might be worth trying to choose the most granular precision to improve the M3TSZ compression ratio.

Otherwise there could be high variability in the delta-of-delta encoding when not required (i.e. a delta-of-delta encoding 4000000 nanoseconds takes many more bits than a delta-to-delta encoding of 4 milliseconds).

You could basically divide each time unit by the nanoseconds and see which one falls within exactly each unit then choose the highest occuring one that equally divides (i.e. try seconds first, then millisecond, then so on). If they exactly fall into a unit, then emit with that unit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually wondered about that, because we get pretty lousy M3TSZ compression ratio. That would explain it, I will address that too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added it in commit, but it's only ~10% space savings in our test load (1/N scale of real thing). However, better than nothing.

Even with fully per-second data points, this seems to shave off ~10%
in my test env; not as much as I had hoped.
Copy link
Collaborator

@robskillington robskillington left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@robskillington robskillington added the ci Triggers CI (useful for external contributors) label Feb 12, 2020
@schallert schallert removed the ci Triggers CI (useful for external contributors) label Feb 12, 2020
@robskillington robskillington merged commit 4c6ea78 into m3db:master Feb 13, 2020
@fingon fingon deleted the mst-influxdb-fix branch April 8, 2020 04:27
fingon added a commit to aiven/m3 that referenced this pull request May 11, 2020
Consists of three commits

(master)

[coordinator] Influxdb importer endpoint (at /api/v1/influxdb/write) (m3db#2083)

[coordinator] Influxdb write endpoint tag copy fix (m3db#2126)

(our own, for now)

influxdb: Return telegraf-compatible errors
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Panic when ingesting
3 participants