-
Notifications
You must be signed in to change notification settings - Fork 810
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Chunk Store docs for Bigtable #274
Comments
Hi Michal! Great news. I have a local branch which makes the underlying storage interface much more abstract (I was playing with putting this on Cassandra). I'll tidy this up and push this.
Chunks are all 1Kb. This can be changed, but increasing it can make compression less effective, so theres a tradeoff to be had here.
S3 is sloooow for small random reads. There is a memcache layer, and we ensure its sized big enough to have virtually 100% hit rate. Only queries for very old data will hit S3, and we deem it acceptable for these queries acceptable to be slightly slower. S3 is massively parallelisable (and the chunk store does indeed fetch in parallel) so the net effect is queries that hit s3 take ~1s at 99%-ile.
DynamoDB is expensive. If we put the chunks in there, it would make the system cost too much to run (for us to sell). So its a trade off. They would have fit in DynamoDB easily... I'm planning on introducing the idea of "super chunks", where 10 or so chunks worth of data is written to S3 in a single object. I need to experiment with sub-object get first, but this should reduce the cost of running the system further without any perceptible performance impact. (#141)
This was literally merged yesterday! We're not running v4 in prod yet - its in dev, and it seems to have held up for ~9:45hrs...
Yes - v3 and all the priors hot spotted DynamoDB too much. See #254. Initial results are promising with v4. You should use this, I expect you'll have similar hotspotting issues with BigTable.
I'll have to read up on this, and how Cloud BigTable differs from internal BigTable, but if it doesn't differ, you'll need to be careful with the row (range) keys as it'll hotspot the tabletservers. Let me get back to you on this.
I wouldn't cut it at this interface - there is a whole lot of encoded knowledge in the chunk store about how to match selectors (for instance, see #220 for an instance). I would pick the interface below the chunk store, between it and dynamo/s3 - this is the branch I was referring to above. This way you can reuse all the chunk store logic (and monitoring) without too much reimplementation. Let me tidy up and push this branch today and then get back to you - do you want to come over to the weaveworks office for a coffee and chat at some point?
https://drive.google.com/file/d/0BwqTw528sZRINXlRZmNQN2M2RnM/view?usp=sharing |
Hi Michal - here the PR I was referring to: #275 This is the interface I'd start with: https://github.com/weaveworks/cortex/pull/275/files#diff-1f696bedb83075f4980d18328df839a7R8 Its not 100% finished yet, and still need to abstract out the chunk storage, but its a start. I'll probably have it mostly done today. |
Thanks for responding quickly and doing the interfaces. Apologies for the slow response, we're a wee-bit snowed under :) The Cloud Bigtable is very similar to the internal one. Thanks for sharing the knowledge about hot spotting, it will definitely be interesting to see how it plays out. We'd be deploying to K8S, and we're try it on our AWS clusters first. It'd be great to have som YAMLs, if you got them :) Later, if we find that it fits our existing multi-tenant, multi-stage monitoring pipeline, we'll do the Bigtable port for the primary clusters we use on GCE. |
If thats the case you'll definitely want to use the v4 schema. Experiments show its pretty good for load balancing, we're getting good results. I'd hash the hashKey on the client side, stick that in the row key, and use the rangeKey as the columnKey. Cell contents should be empty. Doesn't appear you can do prefix matching on the row key, so you might want to do that client side. Shame the index doesn't make it more efficient.
Here you go, with some instructions: #284 |
A Bigtable back-end was added in #468 |
Hey Tom,
We'll trying out Cortex as our long-term Prometheus storage. However, we probably will consider moving it to GCE's Bigtable.
A couple of questions from the
chunk_store
implementation, before we start the spike:Chunk storage:
What's the size of the ChunkData - we're considering storing Chunks inside Bigtable along with the index. What's the latency impact of reading from blob storage (S3, GCS) of Chunks? Why did you chose to store them in S3? Because of DynamoDB write sizes?
Schemas:
I see that
compositeSchema
switches between different versions. Is v4 the one currently used? Any particular tips on why you chose v4 layout? We're thinking about a tall-but-narrow scheme for bigtable that is a concatenation of your v3 hash key and range key.I expect the only real thing that would need implementing is the
chunk.Store
interface? Is there anything else we're missing?Thanks,
Michal
The text was updated successfully, but these errors were encountered: