-
Notifications
You must be signed in to change notification settings - Fork 807
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BigTable storage backend for Cortex #468
Conversation
Thanks @tomwilkie. Can you please rebase this before we review it? |
Rebased and squashed, thanks @jml |
@@ -122,6 +122,10 @@ update-gazelle: | |||
gazelle -go_prefix github.com/weaveworks/cortex -external vendored \ | |||
-build_file_name BUILD.bazel | |||
|
|||
update-vendor: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aaron7 here the update-vendor
target we discussed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wonderful, thanks. Minor niggles.
I've only done a "can I understand the code?" level review, rather than a review of how we are actually using BigTable. If you want the latter, it'd probably be easier to review a design doc than the code.
pkg/chunk/gcp/instrumentation.go
Outdated
|
||
// DynamoDB latency seems to range from a few ms to a few sec and is | ||
// important. So use 8 buckets from 64us to 8s. | ||
Buckets: prometheus.ExponentialBuckets(0.000128, 4, 8), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either comment or call to ExponentialBuckets
is wrong. When I evaluate, I get:
[0.000128, 0.000512, 0.002048, 0.008192, 0.032768, 0.131072, 0.524288, 2.097152]
So:
- not 8s
- not from 64us
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just copied from DynamoDB. Have fixed comment in both cases.
pkg/chunk/gcp/storage_client.go
Outdated
b.tables[tableName] = rows | ||
} | ||
|
||
// TODO the hashValue should actually be hashed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just do this now? Is it more than just picking a hash function & calling it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently have data written in this format, so I need to do a proper migration for it. Will add this to the comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its also not clear this will affect load balancing on BigTable, as it dynamically splits ranges (completely different to how DynamoDB load balances data).
pkg/chunk/gcp/storage_client.go
Outdated
}, bigtable.RowFilter(bigtable.FamilyFilter(columnFamily))) | ||
} | ||
|
||
type bigtableReadBatch bigtable.Row |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe put a comment here explaining the purpose of this, giving some hint as to why Len() always is 1 etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done - its just a side effect of smashing these two interfaces together. BigTable returns rows one-by-one.
} | ||
|
||
// NewTableClient returns a new TableClient. | ||
func NewTableClient(ctx context.Context, cfg Config) (chunk.TableClient, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought you believed in returning structs, not interfaces?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found a blog about this recently! I thought of you. The reason they gave is that it makes it easier to find the implementation. Meh.
The struct is private anyway, so exposing it would give a lint error.
- Currently stores chunks in BigTable only. - Minimal amount of refactoring, had to: - break out the storage client factory into its own package to avoid import loops. - export a few methods on Chunk (ExternalKey and Decode). - Add stack traces to some of the chunk store errors, and log them on queries. - Instrument the BigTable gRPC client. Also, fix some logic in the previous BigTable storage engine: - Return correct part of the row key as the range key - Correctly stop reading bigtable results when trying to read to the end of a 'row'. - Log the range key we read from bigtable. - Chunks are being written before calculating their checksum, which shouldn't happen. - Don't return empty chunks for BigTable backend.
Thanks for the review @jml! Hold on merging just yet, I'm still tracking down one last bug (I hope). |
Okay fixed the last bug. All good to go now! |
@jml ping? |
Thanks. Sorry for the delay—it's been a very full week. |
No worries; thank you! |
@tomwilkie Whoa, this is awesome. I work on the Bigtable team so please let me know if there's anything we can do to help with this. |
Will do Gary! I must say after working with DynamoDB, BigTable is a breath
of fresh air. So much easier to use and better performance. Will write a
blog post about it soon. And using gRPC makes it so much easier to
instrument. Good work.
…On Fri, 28 Jul 2017 at 14:25, Gary Elliott ***@***.***> wrote:
@tomwilkie <https://github.com/tomwilkie> Whoa, this is awesome. I work
on the Bigtable team so please let me know if there's anything we can do to
help with this.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#468 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAbGhbWJHGYb8PEhpm9k92mTYs4NljBiks5sSeFQgaJpZM4N-m7_>
.
|
Well that's awesome to hear and I look forward to the blog post! |
@tomwilkie – thank you, that's music to my ears! I'm the PM for Bigtable at Google, and am also eagerly awaiting your blog post. |
@tomwilkie – how does one configure Cortex to use Bigtable as a storage backend? Are there docs or a howto one can follow? |
Hi Misha - I'll write up some instructions next week for you. Right now
it's all in my head.
…On Sat, 29 Jul 2017 at 17:02, Misha Brukman ***@***.***> wrote:
@tomwilkie <https://github.com/tomwilkie> – how does one configure Cortex
to use Bigtable as a storage backend? Are there docs or a howto one can
follow?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#468 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAbGhf5hd8I_sqyx2Ml0tthFV-Mkjz5Wks5sS1eAgaJpZM4N-m7_>
.
|
Includes #463