Improve performance of generating IDs from tags in M3Coordinator #1000

Merged (14 commits) on Oct 2, 2018
2 changes: 1 addition & 1 deletion src/query/functions/temporal/base.go
@@ -248,7 +248,7 @@ func (c *baseNode) processSingleRequest(request processRequest) error {
 	for i, m := range seriesMeta {
 		tags := m.Tags.WithoutName()
 		resultSeriesMeta[i].Tags = tags
-		resultSeriesMeta[i].Name = tags.ID()
+		resultSeriesMeta[i].Name = tags.StringID()
 	}

 	builder, err := c.controller.BlockBuilder(seriesIter.Meta(), resultSeriesMeta)
38 changes: 34 additions & 4 deletions src/query/models/tag.go
@@ -25,6 +25,7 @@ import (
 	"hash/fnv"
 	"regexp"
 	"sort"
+	"strings"
 )

 const (
@@ -145,17 +146,46 @@ func (m Matchers) ToTags() (Tags, error) {
 	return Normalize(tags), nil
 }

-// ID returns a string representation of the tags
-func (t Tags) ID() string {
-	b := make([]byte, 0, len(t))
+// StringID returns a string representation of the tags
+func (t Tags) StringID() string {
Contributor

I think this is used in a few more places than just write, so you would have to change all of them.

Collaborator

(unrelated to the pr)

Tag keys/values are allowed to contain '=' and ',', so this encoding isn't invertible: given a string representation of some tags, you can't guarantee that it came from a single set of tags.

How is this function used?
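
To make the invertibility concern concrete, here is a minimal sketch. The `tag` type and `stringID` helper are illustrative stand-ins, not the M3 types, and the sketch assumes `eq` is '=' and `sep` is ',' (which the test expectation `t1=v1,t2=v2,` in this PR suggests). Two different tag sets end up with the same ID because nothing is escaped.

```go
package main

import (
	"fmt"
	"strings"
)

// tag is a hypothetical stand-in for a name/value pair.
type tag struct {
	Name, Value string
}

// stringID concatenates tags as name=value, with no escaping
// of '=' or ',' inside names or values.
func stringID(tags []tag) string {
	var sb strings.Builder
	for _, t := range tags {
		sb.WriteString(t.Name)
		sb.WriteByte('=')
		sb.WriteString(t.Value)
		sb.WriteByte(',')
	}
	return sb.String()
}

func main() {
	// One tag whose value happens to contain the separator and '='.
	one := []tag{{Name: "a", Value: "b,c=d"}}
	// Two ordinary tags.
	two := []tag{{Name: "a", Value: "b"}, {Name: "c", Value: "d"}}

	// Both encode to "a=b,c=d,", so the ID cannot be decoded unambiguously.
	fmt.Println(stringID(one) == stringID(two))
}
```

Whether this matters in practice depends on whether anything ever needs to recover tags from an ID, which is the question being asked here.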

Contributor Author

This determines what the ID of the metric will be in M3DB

Collaborator

we should talk about this offline.

Collaborator

why do you need to change the method name?

Contributor Author

because I added a BytesID(), so seems good to distinguish the two methods so the caller is forced to think about what they are doing

Collaborator

Fair. The goish (go-y?) way of doing this is to call WriteBytesID something like MarshalTo()/MarshalInto().

maybe call the two methods ID() and IDMarshalTo()?

+	var (
+		idLen      = t.IDLen()
+		strBuilder = strings.Builder{}
Contributor

Is this a 1.10+ only feature? If so, maybe we should figure out whether we are still planning to support 1.9.

Collaborator

1.9 is over a year old at this point, 1.10 is over 6 months old. 1.11 just came out.

Any reason you want to support 1.9?

Collaborator

Probably ok to drop support, since we're only supporting built binaries really. We would like most people to use all this software as docker images or as binaries built from the tagged releases.

Contributor

Nah, I agree that we should drop 1.9 support. Even Prometheus only supports 1.10+ (prometheus/prometheus#3856). We should call it out in our README though.

Collaborator

Ah yup, @richardartoul want to update the README.md (in a minimal way, we can update formatting/etc of it later) that we only support 1.10 at this time?
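
For context on the version question: `strings.Builder` was added in Go 1.10. The pattern used in this diff is to compute the exact output length, call `Grow` once, and then write each piece. A minimal sketch of that pattern follows; the `tag` type and the `idLen` and `stringID` helpers are illustrative names, not the M3 code, and the '=' and ',' delimiters are assumed from the test in this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// tag is a hypothetical stand-in for a name/value pair.
type tag struct {
	Name, Value string
}

// idLen returns the exact byte length of the encoded ID:
// every tag contributes its name, its value, and two delimiters.
func idLen(tags []tag) int {
	n := 2 * len(tags)
	for _, t := range tags {
		n += len(t.Name) + len(t.Value)
	}
	return n
}

// stringID sizes the builder once up front, then appends each tag.
func stringID(tags []tag) string {
	var sb strings.Builder
	sb.Grow(idLen(tags))
	for _, t := range tags {
		sb.WriteString(t.Name)
		sb.WriteByte('=')
		sb.WriteString(t.Value)
		sb.WriteByte(',')
	}
	return sb.String()
}

func main() {
	tags := []tag{{"t1", "v1"}, {"t2", "v2"}}
	id := stringID(tags)
	fmt.Println(id, len(id) == idLen(tags))
}
```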

+	)
+
+	strBuilder.Grow(idLen)
+	for _, tag := range t {
+		strBuilder.WriteString(tag.Name)
+		strBuilder.WriteByte(eq)
+		strBuilder.WriteString(tag.Value)
+		strBuilder.WriteByte(sep)
+	}
+
+	return strBuilder.String()
+}
+
+// WriteBytesID writes out the ID representation
+// of the tags into the provided buffer.
+func (t Tags) WriteBytesID(b []byte) []byte {
Collaborator

nit: why does this one need a buffer, rather than creating one like the StringID function?

Contributor Author

The idea is that the caller controls the buffer which enables us to add pooling later without making an API change

Contributor

yeah I like the caller providing a buffer
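
As a rough illustration of why a caller-supplied buffer leaves room for pooling later, here is a sketch using `sync.Pool` from the standard library. Everything in it (`tag`, `writeBytesID`, `bufPool`, `withID`) is a hypothetical stand-in rather than code from this PR, and the '=' and ',' delimiters are assumed from the test expectation.

```go
package main

import (
	"fmt"
	"sync"
)

// tag is a hypothetical stand-in for a name/value pair.
type tag struct {
	Name, Value string
}

// writeBytesID appends the ID encoding of tags to b and returns the
// extended slice, so the caller decides where the memory comes from.
func writeBytesID(b []byte, tags []tag) []byte {
	for _, t := range tags {
		b = append(b, t.Name...)
		b = append(b, '=')
		b = append(b, t.Value...)
		b = append(b, ',')
	}
	return b
}

// bufPool is a hypothetical pool of reusable scratch buffers.
var bufPool = sync.Pool{
	New: func() interface{} {
		b := make([]byte, 0, 256)
		return &b
	},
}

// withID builds the ID in a pooled buffer and passes it to fn.
// fn must not retain the slice after it returns.
func withID(tags []tag, fn func(id []byte)) {
	bp := bufPool.Get().(*[]byte)
	buf := writeBytesID((*bp)[:0], tags)
	fn(buf)
	*bp = buf // keep any growth for the next user
	bufPool.Put(bp)
}

func main() {
	withID([]tag{{"t1", "v1"}, {"t2", "v2"}}, func(id []byte) {
		fmt.Printf("%s\n", id)
	})
}
```

The encoding function itself does not change between the pooled and unpooled versions; only the call site does. The catch is lifetime: a pooled buffer can only be returned once nothing downstream still references the ID.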

 	for _, tag := range t {
 		b = append(b, tag.Name...)
 		b = append(b, eq)
 		b = append(b, tag.Value...)
 		b = append(b, sep)
 	}

-	return string(b)
+	return b
 }
+
+// IDLen returns the length of the ID that would be
+// generated from the tags.
+func (t Tags) IDLen() int {
+	idLen := 2 * len(t) // account for eq and sep
+	for _, tag := range t {
+		idLen += len(tag.Name)
+		idLen += len(tag.Value)
+	}
+	return idLen
+}

 // IDWithExcludes returns a string representation of the tags excluding some tag keys
12 changes: 10 additions & 2 deletions src/query/models/tag_test.go
@@ -113,9 +113,17 @@ func createTags(withName bool) Tags {
 	return tags
 }

-func TestTagID(t *testing.T) {
+func TestTagStringID(t *testing.T) {
 	tags := createTags(false)
-	assert.Equal(t, tags.ID(), "t1=v1,t2=v2,")
+	assert.Equal(t, "t1=v1,t2=v2,", tags.StringID())
 }
+
+func TestTagByteID(t *testing.T) {
+	var (
+		tags = createTags(false)
+		b    = tags.WriteBytesID([]byte{})
+	)
+	assert.Equal(t, []byte("t1=v1,t2=v2,"), b)
+}

 func TestWithoutName(t *testing.T) {
2 changes: 1 addition & 1 deletion src/query/policy/resolver/static.go
@@ -51,7 +51,7 @@ func (r *staticResolver) Resolve(
 		return nil, err
 	}
 	requests[0] = tsdb.FetchRequest{
-		ID:     tags.ID(),
+		ID:     tags.StringID(),
 		Ranges: ranges,
 	}

14 changes: 9 additions & 5 deletions src/query/storage/m3/storage.go
@@ -269,11 +269,15 @@ func (s *m3storage) Write(
 		return errors.ErrNilWriteQuery
 	}

-	id := query.Tags.ID()
-	// TODO: Consider caching id -> identID
-	identID := ident.StringID(id)
+	// TODO: Consider caching id -> identID (requires setting NoFinalize()).
+	var (
+		// TODO: Pool this once an ident pool is setup.
Collaborator

We'd probably want to instead read this directly as bytes once we eventually have the .proto changes for them to come directly as bytes from the wire.

Collaborator

would we even use t Tags at that point? might as well implement the interface over the grpc generated type

Collaborator

Hm that would only be for tags though I guess, you still need a buffer to write out the ID concat'd together.

Contributor Author

I don't think that will work: Prometheus will send over the tags as []byte, but the ID is a concatenation of all the tags, so you would still need to copy all the tags into a new buffer (which would be the ID).

Collaborator

Can you update this comment to say we'd have to remove NoFinalize() if we started pooling this?

Contributor Author

Yeah, although if we do that it's going to reintroduce the issue of copying, which is annoying. Should we make the session API assume ownership of the ID? Then it won't have to do a copy internally.

+		buf   = make([]byte, 0, query.Tags.IDLen())
+		idBuf = query.Tags.WriteBytesID(buf)
+		id    = ident.BytesID(idBuf)
+	)
 	// Set id to NoFinalize to avoid cloning it in write operations
-	identID.NoFinalize()
+	id.NoFinalize()
 	tagIterator := storage.TagsToIdentTagIterator(query.Tags)

 	var (
@@ -287,7 +291,7 @@ func (s *m3storage) Write(
 		datapoint := datapoint
 		wg.Add(1)
 		s.writeWorkerPool.Go(func() {
-			if err := s.writeSingle(ctx, query, datapoint, identID, tagIter); err != nil {
+			if err := s.writeSingle(ctx, query, datapoint, id, tagIter); err != nil {
Collaborator

Could we add a special case here for when len(query.Datapoints) == 1 to just return s.writeSingle(...) as discussed?

Contributor Author

done

 				multiErr.add(err)
 			}

2 changes: 1 addition & 1 deletion src/query/test/util.go
@@ -109,7 +109,7 @@ type match struct {
 type matches []match

 func (m matches) Len() int           { return len(m) }
-func (m matches) Less(i, j int) bool { return m[i].metas.Tags.ID() > m[j].metas.Tags.ID() }
+func (m matches) Less(i, j int) bool { return m[i].metas.Tags.StringID() > m[j].metas.Tags.StringID() }
 func (m matches) Swap(i, j int)      { m[i], m[j] = m[j], m[i] }

 // CompareLists compares series meta / index pairs