Refactor, test and document eviction mechanisms #9
On the topic of your eviction mechanisms, I wanted to ask about a particular eviction policy. I want to look into upgrading a particular LRU cache node in my application to be distributed instead of single-node. But currently it relies on an LRU which supports value types that provide a Size() method, and the eviction policy can enforce a MaxSize. Have you considered this policy as a feature? |
Hi @justinfx. I understand your case. Implementing that feature is not hard and it's definitely useful. Olric evicts keys when a DMap object exceeds the given size limit. Is that useful for you? Would you like to explain the details of your use case? Design note: every node stores some part of a DMap, so every node tracks the size of its own DMap part and evicts keys when the given size is exceeded. There is no central node to decide to evict the keys. |
Thanks for the reply! Well, the item count of the cache is actually not what I need, since my concern is the total "cost", which in this case is RAM. So I am caching images and I only want to cache up to, say, 4GB of the most recently accessed items, and I want it to evict the least recently used items to bring the cost under the target. |
It would be completely acceptable to have the cost apply per node as opposed to the entire cache. I can't speak for other uses of a "cost", but specifically for RAM, it would make sense per node. |
I'll review that package and implement the feature in a convenient way. |
Hi, I have implemented the feature you requested. I'm still iterating over the eviction code and the documentation is not up-to-date. However, it works fine most of the time. If you want to give it a try, I would be glad to assist you. You may help me improve the implementation and fix possible design flaws at an early stage. Thanks in advance. |
Thanks! I'm going to have some time over this holiday to have a play. Are there any links to the added api and test examples? I did a quick search and found this: How does one specify the size cost when putting an item? The comment on that test link suggests that the configured max size is split across the number of partitions? With that implementation, would I be able to accurately limit a node in the cluster to 4G, or is it going to depend on how many partitions are distributed? |
Olric uses
Yes, it is. DMaps are evenly distributed among partitions. So if you set If you have a cluster which consists of 6 nodes, every node should host ~45 partitions, and every partition has its own DMap chunk which consumes ~15M in RAM. So every node consumes about 675M of RAM. You should know that Olric uses an append-only file with a hash table as its storage engine. It allocates ~30M to store your ~15M DMap instance, so the actual memory cost is quite different. You can use |
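To make the arithmetic above concrete, here is a minimal Go sketch of the per-node estimate, using only the figures quoted in the comment (default 271 partitions, 6 nodes, ~15M of data per chunk, ~30M actually allocated by the storage engine); it is an illustration, not measured behaviour:

package main

import "fmt"

func main() {
	const (
		partitionCount = 271      // Olric's default partition count
		nodeCount      = 6        // example cluster size from the comment
		chunkData      = 15 << 20 // ~15M of user data per partition chunk
		chunkAllocated = 30 << 20 // ~30M allocated by the append-only storage engine
	)

	partitionsPerNode := partitionCount / nodeCount // ~45
	fmt.Printf("partitions per node: %d\n", partitionsPerNode)
	fmt.Printf("data per node: ~%dM\n", partitionsPerNode*chunkData>>20)                  // ~675M
	fmt.Printf("allocated memory per node: ~%dM\n", partitionsPerNode*chunkAllocated>>20) // ~1350M
}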
With the current state of the implementation, you can limit I prepared the configuration files to test the algorithm with 2 nodes. The configuration assumes that you have a DMap named Debug mode is enabled at tracing level, so you can see the algorithm in action while Here are the config files: https://gist.github.com/buraksezer/83e1a43f972c1c5bab3aa8b774bcb16c There are two config files in this gist. Run the Olric nodes on your localhost like the following: |
Then, you can run
This command calls
This scenario assumes that your preferred deployment way is client-server mode. If you want to use embedded-member mode, please let me know. memberlist configuration can be tricky, but the source code of olricd can be used to seed your application. If you have any further questions, please don't hesitate to ask. Thanks for your contribution. |
Thanks for the clarification. |
Good catch! I reviewed the algorithm. Here is my proposal: User's perspective: I may have any number of Olric nodes and I want to ensure that a DMap on the cluster does not consume more than 3G of in-use memory on every host. So my distributed map instance consumes n*3G at maximum on the cluster. The cluster may scale up or down; that doesn't matter. How to implement that feature: keys in a DMap are evenly distributed among partitions, and partitions manage their own resources. So before deciding to evict a key, the algorithm needs to know the currently hosted partition count on the node. If you have 2 nodes with 271 partitions, each node hosts ~135 partitions and every partition is responsible for its own DMap chunk. So when we want to evict a key, we need to calculate It's easy to implement. Could you please review the proposal for your case? Thank you! |
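As an illustration of this proposal, here is a minimal sketch of the per-partition threshold calculation; the function and variable names are mine, not Olric internals:

package main

import "fmt"

// perPartitionLimit divides the per-host memory budget by the number of
// partitions this node currently hosts, as described in the proposal above.
func perPartitionLimit(maxInusePerHost, hostedPartitionCount int) int {
	if hostedPartitionCount == 0 {
		return maxInusePerHost
	}
	return maxInusePerHost / hostedPartitionCount
}

func main() {
	const maxInusePerHost = 3 << 30 // 3G per-host budget from the example
	// With 2 nodes and 271 partitions, each node hosts ~135 partitions.
	limit := perPartitionLimit(maxInusePerHost, 135)
	fmt.Printf("evict from a partition's DMap chunk above ~%dM\n", limit>>20) // ~22M
}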
I wanted to first say thank you for your level of responsiveness and willingness to consider my feedback on these user-facing design decisions, even when I am only a prospective user at this point. I really appreciate your open source contribution and your positive and welcoming attitude on your project. From the user perspective, the proposal does sound like it matches my expectation. I will be a bit redundant in reiterating some of the proposal points, just to reinforce that we are on the same page. I think this proposal is good because, from the user's point of view, they likely care much less about the combined cluster size and the individual partition count at any point in time; really the concern is how much of the current node's resources can be applied to the caching. Setting a Thanks! Once this feature is available, I will have a go at embedding it into my service. For more context about my service, there is a single api frontend which contains the embedded LRU cache, and then scalable distributed workers which do the cache-filling process as needed. My goal had been to keep the cache embedded, which is why it is still a single frontend. But once I can use olric, it means I can embed it into the api frontend, run more than one behind a load balancer, and gain the ability to do rolling restarts/upgrades without losing the cache, since ideally it would replicate again. |
You're welcome. I have been building Olric for almost 2 years and it's a pleasure to see that someone intends to use it in production.
At this point, I should share more information about the replication algorithm. Olric implements a quorum-based approach to replicate key/value pairs across the cluster. Let's take a look at the following sample. You have When you want to read a key, Olric finds the replication owners and tries to read the given key on those hosts. It then sorts the responses and selects the most recent one. This is called the Last Write Wins policy. Only one successful read operation is enough to assume the whole operation is successful. In the case of node departure, Olric doesn't try to create new replicas for missing partitions. If a replica node is still available, your keys are still available. If you set Read repair logic runs for every read request, if it's enabled. So possible inconsistencies in partitions are repaired by the read repair mechanism. It currently works synchronously. We may improve it in a future milestone. For an embedded LRU cache deployment, it's good to set I recommend set to Here is some benchmark data I remember that I had set I may have missed some of the details; please don't hesitate to ask about design and implementation details. Keep in touch! |
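For reference, a minimal sketch of setting these replication knobs in embedded mode; the field names (ReplicaCount, ReadQuorum, WriteQuorum, ReadRepair) are my reading of the config package and should be verified against the Olric version in use:

package main

import (
	"log"

	"github.com/buraksezer/olric"
	"github.com/buraksezer/olric/config"
)

func main() {
	c := config.New("local") // single-machine-friendly defaults

	// Quorum-based replication settings described in the comment above.
	// These field names are assumptions; check them against your version.
	c.ReplicaCount = 2  // keep one extra copy of every partition
	c.WriteQuorum = 1   // one successful write is enough
	c.ReadQuorum = 1    // one successful read is enough
	c.ReadRepair = true // repair inconsistent replicas on reads

	db, err := olric.New(c)
	if err != nil {
		log.Fatalf("failed to create Olric instance: %v", err)
	}
	_ = db // call db.Start() to begin serving, db.Shutdown(ctx) to stop
}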
I have implemented the new I'm also planning to implement |
Hey @justinfx. I wonder whether you used Olric in your project. Thanks! |
Hi. Sorry for not providing any feedback. It has been a busy start of the year after I returned from holidays. I would really like to be able to try integrating this work within the next 2 weeks. Thanks! |
I've started trying to integrate olric to replace my local LRU cache. I still don't have it working yet, but here is my initial feedback... If I set
There is no
As mentioned in the README, the memberlist config is a bit complicated and I don't fully understand if I need to establish 2 ports per node, or if any of it can be ephemeral (:0). I have to do something like this for my single-node test: |
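The struct the comment refers to is not preserved above; purely as an illustration of pinning both ports for an embedded single-node test (rather than using ephemeral :0), a sketch might look like this — the BindAddr/BindPort and MemberlistConfig fields are my assumptions about the config API, not the original snippet:

package main

import (
	"log"

	"github.com/buraksezer/olric"
	"github.com/buraksezer/olric/config"
)

func main() {
	c := config.New("local")

	// Olric's client/server port.
	c.BindAddr = "127.0.0.1"
	c.BindPort = 3320

	// memberlist (gossip) uses a second port; config.New is assumed to
	// pre-populate MemberlistConfig with hashicorp/memberlist defaults.
	c.MemberlistConfig.BindAddr = "127.0.0.1"
	c.MemberlistConfig.BindPort = 3322
	c.MemberlistConfig.AdvertisePort = 3322

	db, err := olric.New(c)
	if err != nil {
		log.Fatalf("failed to create Olric instance: %v", err)
	}
	if err := db.Start(); err != nil { // blocks while serving
		log.Fatalf("olric.Start returned an error: %v", err)
	}
}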
And where I am currently stuck is that I don't seem to be getting olric to save my keys. When I do a
I then have some logging to ensure the Last few notes are things I currently can't do with olric that I was able to do in my local LRU. Thanks! |
Hi @justinfx, I have just created a gist for you to test Olric with two members in a single process. I hope it will help to fix the problem. Here is the code: https://gist.github.com/buraksezer/66f6e02bce8cebd9cfba8b5aaf2976a1 You should note that Olric currently needs to know at least one member of the cluster to join. So I added the following line to your config struct for the second node: |
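The exact line is not preserved above; as an illustration only, joining a second member via a known peer would typically go through a peer list — the Peers field (and the ports) below are assumptions continuing the earlier single-node sketch, not the author's snippet:

// Second member in the same process: give it its own ports and point it
// at the first member's memberlist address. Peers is an assumed field name.
c2 := config.New("local")
c2.BindAddr = "127.0.0.1"
c2.BindPort = 3330
c2.MemberlistConfig.BindPort = 3332
c2.MemberlistConfig.AdvertisePort = 3332
c2.Peers = []string{"127.0.0.1:3322"} // memberlist address of the first node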
You should see the following lines on the console:
An automatic discovery mode may be implemented in 0.3.x milestone by using service discovery components of Kubernetes, AWS, or Consul.
I'm going to prepare a detailed answer for the rest of the comment. Thank you very much for the feedback! |
Good idea. I'll investigate the ways of implementing this in a clear way.
Prefix scan is impossible to implement efficiently, because Olric uses the builtin map to index hashed keys onto log files (they are actually byte slices in the process heap). A prefix scan would require O(n) time to run. In the future, we may implement a pluggable storage engine for that kind of feature. Would you like to explain your cache invalidation logic? Olric doesn't need a prefix scan to evict keys.
Could you please give more details for this? Cache stats are absolutely valuable for me. I can implement that feature before the release, if it's urgent to you.
The Stats command is not distributed; the nodes return their own stats. If the A distributed version is possible. We also need to add a new command to return the current list of members in the cluster. |
Thanks for the gist example. It confirms that I need to be explicit with the ports.
I don't need olric to perform the prefix scan. I format my cache keys as a combination of properties. When a certain asset has changed in my external system, I want to evict all variations of the cached data for that asset. The asset id in this case is the prefix of the key. So I was able to just iterate the keys in the cache, string match, and optionally evict. An example would be if I were caching "foo::green" and "foo::red". Then I find I need to invalidate "foo", so there are 2 keys that need to be evicted.
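To illustrate the invalidation pattern being described, here is a small self-contained sketch; a plain map stands in for the local LRU, and the "assetID::variant" key scheme follows the example above:

package main

import (
	"fmt"
	"strings"
)

func main() {
	// A plain map stands in for the local LRU cache in this sketch.
	cache := map[string][]byte{
		"foo::green": []byte("preview data"),
		"foo::red":   []byte("preview data"),
		"bar::blue":  []byte("preview data"),
	}

	// Evict every cached variation of an asset by matching the key prefix.
	invalidate := func(assetID string) {
		prefix := assetID + "::"
		for key := range cache {
			if strings.HasPrefix(key, prefix) {
				delete(cache, key)
			}
		}
	}

	invalidate("foo")
	fmt.Println(len(cache)) // 1 -> only "bar::blue" remains
}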
They actually weren't critical stats. Just reporting something I used to be able to report on. They were just timestamps tracking the age of cached items. So I could see how old the oldest item in the cache was.
Good to know! It wasn't obvious because of how the stats report members, partitions, and dmaps. But now I understand that they only reflect the data on the given reporting node. So I would have to iterate the member list with a client connection to see how much of the total distributed cache is being used. However, this doesn't help with my blocking problem. I'm not even testing multiple nodes yet, so I haven't gotten to configuring two nodes. I simply can't even get it to store the key on the single node, so I figured I must have configured something wrong. The stats on the same node don't report a length. |
As far as I understand, you are trying to run a single node but it fails constantly. Could you please share the logs? I tried to run a single node with the config you provided and it runs well: https://gist.github.com/buraksezer/3da18131736336685399e6391c50bd14 The output of
I just noticed a bug. If you create an Olric node with |
Olric can handle many different DMaps on the same cluster. Does creating different DMaps for the asset ids solve the problem? Then, you can call |
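The method name is cut off above; as an illustration of the per-asset-DMap idea, a fragment like the one below would drop every cached variation of an asset at once (db is an embedded *olric.Olric started elsewhere, and the DMap Destroy method is my assumption about the API — verify it against your version):

// One DMap per asset, named by asset id (the naming scheme is illustrative).
dm, err := db.NewDMap("previews:" + assetID)
if err != nil {
	// handle error
}
// ... Put/Get the cached variations for this asset on dm ...

// When the asset changes, drop the whole DMap instead of scanning keys.
if err := dm.Destroy(); err != nil {
	// handle error
}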
I could likely use multiple dmaps. Is it ok to have 10k+ dmaps in the distributed system? |
I've managed to resolve this. The problem was that I hadn't been properly printing the error that was being generated by the |
It's too big. I haven't tried this scenario before, but background workers may fail and every DMap preallocates some memory. So you would need too much memory to handle 10k+ DMaps in an Olric cluster.
I'm going to work on the possibility of implementing an iterator for DMaps. I'm not sure that it's a good idea. Is it ok to change the cache design to handle the prefix problem? |
Thanks for clarifying about the dmaps. Specifically I am caching image previews for assets. There could be different size transformations and options for the same asset, hence I may have multiple cached variations for one asset. And then I may be caching many thousands of assets. When I find out an asset has changed, all of its variations have to be evicted. I'm totally fine to try out changes to the cache api if you are putting in the time to try and add iterators. Thanks! |
The iterator implementation can only provide an eventually consistent view of a DMap. It's also an O(n) operation like Redis' |
Hey @justinfx, what is the average size of the values? And what is the average RAM of your nodes? As you know, DMaps are distributed among partitions. Chunks on the partitions should be relatively small (between 50MB-100MB) for smooth operation in the case of node join or departure. The default partition count is 271 in Olric. I think that it is a good choice for clusters of up to 50 nodes and ~25–30 GB of data: 271 * 100MB = 27.1GB. If you need to store more than 27GB, please increase the partition count. Prime numbers are preferred for uniform distribution of keys. Here is some math about this: |
Eventually consistent key iteration is likely fine. I imagine that each of the nodes will equally be receiving external notifications, so each of them will end up seeing at least their local keys, which means everything should still get evicted. My service converts and caches smaller preview representations of source assets/images. So they are definitely less than 1MB per value and usually less than 200KB. Because it has been a simple LRU, I have only been running a single 8G in-memory cache node. But I would like to scale that to 3 nodes and have the ability to roll out upgrades without losing the cache. That is, I want to upgrade one at a time and let the cache rebalance. Currently when I upgrade I just lose the cache and let it start cold again. |
Could you please share the key count in your cache? I assume that you have 100M keys in the cache and need to iterate over that DMap. I need to do some calculations to design a prototype. I'm considering adding a regex field to the iterator function to reduce network traffic. What do you think? |
I'm willing to build something like that:

dm, err := db.NewDMap("mydmap")
if err != nil {
	// handle error
}
q, err := dm.Query(olric.Q{"$onKey": olric.Q{"$regexMatch": "/foo.?"}})
if err != nil {
	// handle error
}
q.Range(func(key string, value []byte) bool {
	err = dm.Delete(key)
	if err != nil {
		// handle error
	}
	return true // continue iterating
})

Currently there is no secondary index to query keys ($onKeys) or fields in values ($onValues). In this preliminary version, it will do a full DMap scan in a distributed manner. Later on we can implement a secondary index to query keys and values. |
It would currently be less than 50k keys with my low available total RAM. I would also be fine to set a max keys value as an upper limit once I am able to expand further. A regex system and the query options seem really cool. I'm sure it would save a lot of network chatter by filtering on the remote side. One suggestion after looking at your prototype query api: it might be a good idea to provide key-only iterators, because in my case it would be a waste to transmit all those values when I only want to delete the keys. |
Hey @justinfx I have just implemented the initial version of the distributed query subsystem. It works fine. I'm still iterating over the code. Here is a sample for you. Sample output:
A snippet:

// Find even numbers and ignore the values
q3, err := dm.Query(
	query.M{
		"$onKey": query.M{
			"$regexMatch": "even:",
			"$options": query.M{
				"$onValue": query.M{
					"$ignore": true,
				},
			},
		},
	},
)
if err != nil {
	log.Printf("Failed to run query on even numbers %v", err)
}
defer q3.Close()
fmt.Println("EVEN NUMBERS WITHOUT VALUES:")
fmt.Println("============================")
err = q3.Range(func(key string, value interface{}) bool {
	fmt.Printf("KEY: %s, VALUE: %v\n", key, value)
	return true
})

You can call standard DMap API functions in It's on feature/issue-18 and this is the first commit. Please share your ideas about the query language and its implementation. I hope it's good enough to handle your case. Thanks! |
I think this ticket has accomplished the original request, which was the improvements to the eviction mechanisms. I don't want to take this ticket off topic with all of the other great features and fixes you have started, so I will just comment on them in their specific issues. Just to clarify, I have been testing the max size eviction, the Start function callback, and the query key range iterator. All working great so far. Thanks! |
Thank you very much for your feedback. It was very valuable. Please don't hesitate to create an issue on GitHub for bugs and feature requests. |
We need to refactor, test and document the existing eviction mechanisms.
Fix #7 and #8 first.