Refactor, test and document eviction mechanisms #9

Closed
buraksezer opened this issue Jul 18, 2019 · 34 comments

@buraksezer
Owner

We need to refactor, test, and document the existing eviction mechanisms.

Fix #7 and #8 first.

@buraksezer buraksezer added the enhancement and quality assurance labels Jul 18, 2019
@buraksezer buraksezer self-assigned this Jul 18, 2019
@buraksezer buraksezer added this to the FirstBeta milestone Jul 18, 2019
@justinfx
Contributor

justinfx commented Aug 8, 2019

On the topic of your eviction mechanisms, I wanted to ask about a particular eviction policy. I want to look into upgrading a particular LRU cache node in my application to be distributed instead of single-node. But currently it relies on an LRU which supports value types that provide a Size() method, and the eviction policy can enforce a MaxSize. Have you considered this policy as a feature?

@buraksezer
Owner Author

buraksezer commented Aug 9, 2019

Hi @justinfx. I understand your case. Implementing that feature is not hard and it's definitely useful. Olric evicts keys when a DMap object exceeds the given size limit. Is that useful for you? Would you like to explain the details of your use case?

Design note: Every node stores some part of a DMap. So every node tracks the size of its own DMap part and evicts keys when the given size is exceeded. There is no central node that decides to evict keys.

@justinfx
Contributor

justinfx commented Aug 9, 2019

Thanks for the reply! Well the item count of the cache is actually not what I need, since my concern is the total "cost" which in this case is RAM. So I am caching images and I only want to cache up to, say, 4GB of the most recently accessed items, and I want it to evict least recently used to bring the cost under the target.
This is what I am using:
https://godoc.org/github.com/vitessio/vitess/go/cache
You can see that the Value that goes into the cache defines a Size() and then the cache can have a max size.
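
Roughly the pattern I'm after (just a sketch from memory, not the exact vitess API):

type Value interface {
	Size() int // cost of the entry; bytes of decoded image data in my case
}

type cachedImage struct {
	pixels []byte
}

func (c *cachedImage) Size() int { return len(c.pixels) }

// The cache is then constructed with a byte budget, e.g. 4GB, and it evicts
// the least recently used entries until the sum of Size() over all entries fits.

So the eviction trigger is total cost rather than item count.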

@justinfx
Contributor

justinfx commented Aug 9, 2019

Design note: Every node stores some part of a DMap. So every node tracks the size of its own DMap part and evicts keys when the given size is exceeded. There is no central node that decides to evict keys.

It would be completely acceptable to have the cost apply per node as opposed to the entire cache. I can't speak for other uses of a "cost", but specifically for RAM, it would make sense for a node.

@buraksezer
Owner Author

Thanks for the reply! Well the item count of the cache is actually not what I need, since my concern is the total "cost" which in this case is RAM. So I am caching images and I only want to cache up to, say, 4GB of the most recently accessed items, and I want it to evict least recently used to bring the cost under the target.
This is what I am using:
https://godoc.org/github.com/vitessio/vitess/go/cache
You can see that the Value that goes into the cache defines a Size() and then the cache can have a max size.

I'll review that package and implement the feature in a convenient way.

@buraksezer
Owner Author

buraksezer commented Dec 10, 2019

@justinfx

Hi, I have implemented the feature you requested. I'm still iterating over the eviction code and the documentation is not up-to-date. However, it works fine most of the time. If you want to give it a try, I would like to assist you. You may help me improve the implementation and fix possible design flaws at an early stage.

Thanks in advance.

@justinfx
Contributor

Thanks! I'm going to have some time over this holiday to have a play. Are there any links to the added api and test examples? I did a quick search and found this:
https://github.com/buraksezer/olric/blob/feature/issue-9/dmap_eviction_test.go#L175

How does one specify the size cost when putting an item?

The comment on that test link suggests that the configured max size is split across the number of partitions? With that implementation would I be able to accurately limit a node in the cluster to 4G or is it going to depend on how many partitions are distributed?

@buraksezer
Owner Author

buraksezer commented Dec 23, 2019

Olric uses the MaxInuse property to limit the memory usage of a DMap instance. MaxKeys controls the maximum key count of a DMap.

The comment on that test link suggests that the configured max size is split across the number of partitions?

Yes, it is. DMaps are evenly distributed among partitions. So if you set MaxInuse to 4G for a DMap instance while your partition count is 271, the memory used (inuse) by a chunk of the DMap should be around 15M. So the MaxInuse property is global.

If you have a cluster consisting of 6 nodes, every node should host ~45 partitions, and every partition has its own DMap chunk which consumes ~15M of RAM. So every node consumes ~675M of RAM.
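
To make the arithmetic concrete (an illustrative calculation, not actual Olric code):

maxInuse := 4 << 30 // 4G, configured once for the whole DMap
partitionCount := 271

chunkLimit := maxInuse / partitionCount // ≈ 15M per partition chunk

// With 6 nodes, each node hosts ~45 partitions, so it holds ~45 chunks,
// i.e. roughly 45 * 15M ≈ 675M of DMap data.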

You should know that Olric uses an append-only file with a hash table as its storage engine. It allocates ~30M to store your ~15M DMap instance, so the actual memory cost is quite different. You can use the olric-stats command to see statistics about DMaps on different partitions.

olric-stats -p 69
PartID: 69
  Owner: olric.node:3320
  Previous Owners: not found
  Backups: not found
  DMap count: 1
  DMaps:
    Name: olric-load-test
    Length: 1374
    Allocated: 1048576
    Inuse: 47946
    Garbage: 0

With that implementation would I be able to accurately limit a node in the cluster to 4G or is it going to depend on how many partitions are distributed?

With the current state of the implementation, you can limit the Inuse property of partitions and nodes. As you can see in the output of olric-stats, the Allocated space is quite different from Inuse. The memory is controlled by the Go runtime, and there are other parts of the storage engine which consume memory. So the actual memory used by the process can differ significantly from the sum of the Allocated properties of the partitions.

I prepared the configuration files to test the algorithm with 2 nodes. The configuration assumes that you have a DMap named olric-load-test on the cluster and that its MaxInuse value is 10M. With a partition count of 271, every partition will host ~40K of data.

Debug mode is enabled in tracing level, so you can see the algorithm in action while olric-load is running.

Here are the config files: https://gist.github.com/buraksezer/83e1a43f972c1c5bab3aa8b774bcb16c

There are two config files in this gist. Run the Olric nodes on your localhost like the following:

olricd -c olricd1.yaml

# on a different terminal

olricd -c olricd2.yaml

Then, you can run olric-load to test your setup:

olric-load -a localhost:3320,localhost:3321 -k 1000000 -c put -s msgpack

This command calls the Put command on the cluster 1M times. olric-stats can be used to analyze the internal status of Olric nodes:

olric-stats -a localhost:3320

This scenario assumes that your preferred deployment model is client-server mode. If you want to use embedded-member mode, please let me know. The memberlist configuration can be tricky, but the source code of olricd can be used to seed your application.

If you have any further questions, please don't hesitate to ask.

Thanks for your contribution.

@justinfx
Contributor

Thanks for the clarification.
My goal is to use it embedded in my distributed caching service and be able to scale up from 1-2 nodes to more depending on load. My concern with the current implementation, where the max cache size is a factor of the number of partitions across the available nodes, is that it would be difficult to effectively utilise the available max RAM on a node, since it would always have to be configured based on the total node count. If I have 2x 4G nodes and I set a max inuse that makes sure to use 3G of each node, and then I scale to 3 nodes, do I have to redo the math and restart the embedded caches?

@buraksezer
Owner Author

Good catch! I reviewed the algorithm. Here is my proposal:

User's perspective: I may have any number of Olric nodes and I want to ensure that a DMap on the cluster does not consume more than 3G of in-use memory on any host. So my distributed map instance consumes at most n*3G across the cluster. The cluster may scale up or down; that doesn't matter.

How to implement that feature: Keys in a DMap are evenly distributed among partitions, and partitions manage their own resources. So before deciding to evict a key, the algorithm needs to know the partition count currently hosted on the node. If you have 2 nodes with 271 partitions, each node hosts ~135 partitions and every partition is responsible for its own DMap chunk. So when we want to evict a key, we need to calculate MaxInuse/currentPartitionCount to determine the maximum memory limit for a chunk. In the case of MaxInuse=3G, the chunks on partitions cannot exceed ~23M.
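
Roughly, the check would look like this (just a sketch; the names are illustrative, not the actual implementation):

// Per-chunk limit derived from the per-host MaxInuse and the number of
// partitions this node currently hosts.
func maxChunkInuse(maxInuse, ownedPartitionCount int) int {
	return maxInuse / ownedPartitionCount
}

// Evict from a chunk only when it exceeds its share of the per-host budget.
func needsEviction(chunkInuse, maxInuse, ownedPartitionCount int) bool {
	return chunkInuse > maxChunkInuse(maxInuse, ownedPartitionCount)
}

// Example: MaxInuse=3G on a 2-node cluster with 271 partitions means
// ~135 partitions per node, so a chunk may grow to about 3G/135 ≈ 23M.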

It's easy to implement. Could you please review the proposal for your case?

Thank you!

@justinfx
Contributor

justinfx commented Dec 25, 2019

I wanted to first say thank you for your level of responsiveness and willingness to consider my feedback on these user-facing design decisions, even when I am only a prospective user at this point. I really appreciate your open source contribution and your positive and welcoming attitude on your project.

From the user perspective, the proposal does sound like it matches my expectation. I will be a bit redundant in reiterating some of the proposal points, just to reinforce that we are on the same page. I think this proposal is good because, from the user's point of view, they likely care much less about the combined cluster size and the individual partition count at any point in time; the real concern is how much of the current node's resources can be applied to the caching. Setting MaxInuse=3G for one specific node instance configuration is the ideal way to go, since it would allow me to scale the cluster up and down while preserving the desired resource target limits.
At this stage I am still running a particular service on a pair of VMs, each with a fixed amount of ram, so I set my configuration for the single caching front-end to a specific fraction of the available ram. Even when I finally get this service switched to a kubernetes deployment with dynamic scaling, it will be beneficial to be able to associate the MaxInuse value with a single pod instance, so that the container scheduler can consider that metric as part of the scaling factor.

Thanks! Once this feature is available, I will have a go at embedding it into my service. For more context about my service, there is a single api frontend which contains the embedded LRU cache, and then scalable distributed workers which do the cache-filling process as needed. My goal had been to keep the cache embedded, which is why it is still a single frontend. But once I can use olric, it means I can embed it into the api frontend and run more than one behind a load balancer and gain the ability to do rolling restarts/upgrades without losing the cache, since ideally it would replicate again.

@buraksezer
Owner Author

You're welcome. I have been building Olric for almost 2 years and it's a pleasure to see that someone intends to use it in production.

it means I can embed it into the api frontend and run more than one behind a load balancer and gain the ability to do rolling restarts/upgrades without losing the cache, since ideally it would replicate again.

At this point, I should share more information about the replication algorithm. Olric implements a quorum-based approach to replicate key/value pairs across the cluster. Let's take a look at the following example.

You have ReplicationFactor=2, WriteQuorum=1 and ReadQuorum=1. With this configuration, every partition has 2 replicas on different hosts. When you want to write a key, Olric finds the replication owners. One of them is the partition owner, which manages the whole process. It tries to write the given key/value pair to its own storage and to the backup host. Only one successful write operation is enough to assume the whole operation is successful. In a write operation, a timestamp is stored along with the key/value pair. This timestamp is used to reconcile different versions of keys and it's provided by the clients.

When you want to read a key, Olric finds the replication owners and tries to read the given key from those hosts. Then it sorts the responses and selects the most recent one. This is called the Last Write Wins policy. Only one successful read operation is enough to assume the whole operation is successful.

In the case of node departure, Olric doesn't try to create new replicas for missing partitions. If a replica node is still available, your keys are still available. If you set ReadRepair=true, Olric creates new copies to satisfy ReplicationFactor=2 when you call read command on the keys.

Read repair logic runs for every read request, if it's enabled. So possible inconsistencies in partitions are repaired by read repair mechanism. It currently works synchronously. We may improve it in a future milestone.

For an embedded LRU cache deployment, it's good to set ReplicationFactor=1, WriteQuorum=1 and ReadQuorum=1. It means that when you restart a node, some of the data will be lost. But this setup will be very fast.

I recommend setting ReplicationFactor=2, WriteQuorum=1 and ReadQuorum=1 if you want redundancy.
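
In terms of the config struct, that recommendation looks roughly like this (just a sketch; ReplicaCount corresponds to the replication factor here, and I'm assuming ReadRepair is exposed as a plain bool field):

cfg := &config.Config{
	ReplicaCount: 2,    // every partition gets a backup copy on another host
	WriteQuorum:  1,    // a single successful write is enough
	ReadQuorum:   1,    // a single successful read is enough
	ReadRepair:   true, // recreate missing replicas lazily on reads
}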

Here is some benchmark data. I remember that I had set ReplicationFactor=2, WriteQuorum=1 and ReadQuorum=1 for this test. It tries to set 1M keys, using msgpack as the serialization format.

I may miss some of the details, please don't hesitate to ask about design and implementation details.

Keep in touch!

@buraksezer
Owner Author

I have implemented the new MaxInuse algorithm. You can test it by pulling the feature/issue-9 branch. As I explained previously, olric-stats or the raw Stats command can be used to monitor the memory in use by partitions and DMap instances.

I'm also planning to implement MaxAllocated logic to prevent out-of-memory panics. Please see golang/go#14162 and golang/go#16843.

@buraksezer
Owner Author

Hey @justinfx. I wonder whether you used Olric in your project. Thanks!

@justinfx
Contributor

Hi. Sorry for not providing any feedback. It has been a busy start of the year after I returned from holidays. I would really like to be able to try integrating this work within the next 2 weeks. Thanks!

@justinfx
Contributor

I've started trying to integrate olric to replace my local LRU cache. I still don't have it working yet, but here is my initial feedback...

If I set CacheConfig.MaxInuse value it seems that I also have to set all the required default fields like Read/Write Quorum as well, or olric will fail saying:

failed to construct LRU cache: 3 errors occurred:
        * cannot specify ReadQuorum less than or equal to zero
        * cannot specify WriteQuorum less than or equal to zero
        * cannot specify MemberCountQuorum smaller than MinimumMemberCountQuorum

There is no config.NewDefaultConfig(), where I can adjust the settings I want without having to know which other ones also need to be set.

Olric.Start() is a blocking call, so I need to start it in a goroutine. But then I don't really have a nice way to know when it has started. If I try to create my db.NewDMap too quickly I get "cannot be reached cluster quorum to operate". So I have to sleep for a second first. I suppose I could loop on calling Ping() until it succeeds, but that is also not so great. Might be nice if there were an optional Started func() callback field. I could optionally set a callback function to know when it is ready. I think the nats.io server offered something similar for when you are embedding their gnatsd, starting it, and need to block until it's ready to proceed.
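
Right now my workaround looks roughly like this (a sketch; the retry loop just polls until NewDMap stops returning the quorum error, and the DMap name is made up):

db, err := olric.New(ocfg)
if err != nil {
	// handle error
}

go func() {
	// Start blocks until the node shuts down, so run it in the background.
	if err := db.Start(); err != nil {
		log.Printf("olric node stopped: %v", err)
	}
}()

// Poll until the node is ready instead of sleeping for a fixed second.
var dm *olric.DMap
for i := 0; i < 50; i++ {
	if dm, err = db.NewDMap("previews"); err == nil {
		break
	}
	time.Sleep(100 * time.Millisecond)
}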

As mentioned in the README, the memberlist config is a bit complicated and I don't fully understand if I need to establish 2 ports per node, or if any of it can be ephemeral (:0). I have to do something like this for my single node test:

	m1, _ := olricconfig.NewMemberlistConfig("local")
	m1.BindAddr = "127.0.0.1"
	m1.BindPort = 5555

	ocfg := &olricconfig.Config{
		Cache: &olricconfig.CacheConfig{
			TTLDuration: 3 * 24 * time.Hour,
			MaxInuse:    conf.Int("maxCacheMB") * 1024 * 1024,
		},

		MaxJoinAttempts: 10,
		ReplicationMode: olricconfig.AsyncReplicationMode,

		// TODO: olric doesn't allow for creating a default config.
		// Have to fill in required field if setting some fields.
		Name:              "127.0.0.1:5556",
		ReplicaCount:      1,
		WriteQuorum:       1,
		ReadQuorum:        1,
		MemberCountQuorum: 1,
		MemberlistConfig:  m1,
	}

And where I am currently stuck is that I don't seem to be getting olric to save my keys. When I do a Get for the first time, the cache miss causes this log output:

2020/02/20 14:37:53 [ERROR] Failed to get key from local storage: key not found => dmap_get.go:61

I then have some logging to ensure the Put is being called, but it never seems to increment the length counter in the partition stats, and future Gets keep reporting that the key is not found. I feel like I might have gotten something wrong in the configuration.

Last few notes are things I currently can't do with olric that I was able to do in my local LRU.
For my cache invalidation logic, I was able to take a key prefix and iterate all the keys in the cache, to evict the ones that match the prefix. Olric doesn't provide a way to get the list of keys. Does this have any value to you as a feature?
And for stats collection I used to be able to get some time stats on the oldest cached item (and I would track the newest manually) so I could report the cache ages. Not sure if this is valuable to you.

Thanks!

@buraksezer
Owner Author

Hi @justinfx I have just created a gist for you to test Olric with two members in a single process. I hope that it will help to fix the problem.

Here is the code: https://gist.github.com/buraksezer/66f6e02bce8cebd9cfba8b5aaf2976a1

You should note that Olric currently needs to know at least one member of the cluster to join. So I added the following line to your config struct for the second node:

Peers: []string{"127.0.0.1:5555"},

You should see the following lines on the console:

tardis:future buraksezer$ go run olric.go
Awaiting for background goroutines
2020/02/20 08:17:12 [INFO] Join completed. Synced with 0 initial nodes: 1 => olric.go:345
2020/02/20 08:17:12 [DEBUG] memberlist: Stream connection from=127.0.0.1:55539
2020/02/20 08:17:12 [DEBUG] memberlist: Initiating push/pull sync with: 127.0.0.1:5555
2020/02/20 08:17:12 [INFO] Join completed. Synced with 1 initial nodes: 2 => olric.go:345
2020/02/20 08:17:12 [INFO] Node joined: 127.0.0.1:6667 => routing.go:368
2020/02/20 08:17:12 [INFO] AdvertiseAddr: , AdvertisePort: 7946 => olric.go:409
2020/02/20 08:17:12 [INFO] Cluster coordinator: 127.0.0.1:5556 => olric.go:412
2020/02/20 08:17:12 [INFO] Node joined: 127.0.0.1:5556 => routing.go:368
2020/02/20 08:17:12 [INFO] Routing table has been pushed by 127.0.0.1:5556 => routing.go:489
2020/02/20 08:17:12 [INFO] Routing table has been pushed by 127.0.0.1:5556 => routing.go:489
2020/02/20 08:17:12 [INFO] The cluster coordinator has been bootstrapped => olric.go:322
2020/02/20 08:17:12 [INFO] AdvertiseAddr: , AdvertisePort: 7946 => olric.go:409
2020/02/20 08:17:12 [INFO] Cluster coordinator: 127.0.0.1:5556 => olric.go:412
2020/02/20 08:17:12 [INFO] Routing table has been pushed by 127.0.0.1:5556 => routing.go:489
2020/02/20 08:17:12 [INFO] Routing table has been pushed by 127.0.0.1:5556 => routing.go:489
{num: 0 0} main.customType
{num: 1 1} main.customType
{num: 2 2} main.customType
{num: 3 3} main.customType
{num: 4 4} main.customType
{num: 5 5} main.customType

An automatic discovery mode may be implemented in the 0.3.x milestone by using the service discovery components of Kubernetes, AWS, or Consul.

NewDefaultConfig is a great idea. I'm going to implement it along with a config validation feature.

I'm going to prepare a detailed answer for the rest of the comment.

Thank you very much for the feedback!

@buraksezer
Owner Author

Might be nice if there were an optional Started func() callback field.

Good idea. I'll investigate the ways of implementing this in a clear way.

Last few notes are things I currently can't do with olric that I was able to do in my local LRU.
For my cache invalidation logic, I was able to take a key prefix and iterate all the keys in the cache, to evict the ones that match the prefix. Olric doesn't provide a way to get the list of keys. Does this have any value to you as a feature?

A prefix scan is not practical to implement, because Olric uses the builtin map to index hashed keys onto log files (they are actually byte slices in the process heap); a prefix scan requires O(n) time to run. In the future, we may implement a pluggable storage engine for that kind of feature.

Would you like to explain your cache invalidation logic? Olric doesn't need a prefix scan to evict keys.

And for stats collection I used to be able to get some time stats on the oldest cached item (and I would track the newest manually) so I could report the cache ages. Not sure if this is valuable to you.

Could you please give more details for this? Cache stats are absolutely valuable for me. I can implement that feature before the release, if it's urgent to you.

I then have some logging to ensure the Put is being called, but it never seems to increment the length counter in the partition stats, and future Gets keep reporting that the key is not found. I feel like I might have gotten something wrong in the configuration.

The Stats command is not distributed; the nodes return their own stats. If the Put call hits node1 and you call the Stats command on node2, you will not see the length counter increase in the partition stats.

A distributed version is possible. We also need to add a new command to return the current list of members in the cluster.

@justinfx
Contributor

Thanks for the gist example. It confirms that I need to be explicit with the ports.

Would you like to explain your cache invalidation logic? Olric doesn't need a prefix scan to evict keys.

I don't need olric to perform the prefix scan. I format my cache keys as a combination of properties. When a certain asset has changed in my external system, I want to evict all variations of the cached data for that asset. The asset id in this case is the prefix of the key. So I was able to just iterate the keys in the cache, string match, and optionally evict.

An example would be if I were caching "foo::green" and "foo::red". Then I find I need to invalidate "foo", so there are 2 keys that need to be evicted.
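
In the local LRU it's just a linear scan over the keys (an illustrative sketch of my current code; Keys/Delete here are my local cache's methods, not Olric's):

func evictByPrefix(c *localLRU, prefix string) {
	for _, key := range c.Keys() {
		if strings.HasPrefix(key, prefix) {
			c.Delete(key)
		}
	}
}

// evictByPrefix(cache, "foo::") removes both "foo::green" and "foo::red".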

Cache stats are absolutely valuable for me. I can implement that feature before the release, if it's urgent to you.

They actually weren't critical stats. Just reporting something I used to be able to report on. They were just timestamps tracking the age of cached items. So I could see how old the oldest item in the cache was.

Stats command is not distributed.

Good to know! It wasn't obvious because of how the stats report members and partitions and dmaps. But now I understand that they only reflect the data on the given reporting node. So I would have to iterate the member list with a client connection to see how much of the total distributed cache is being used.

However this doesn't help with my blocking problem. I'm not even testing multi nodes yet. So I haven't gotten to configuring two nodes. I simply can't even get it to store the key on the single node so I figured I must have configured something wrong. The stats on the same node don't report a length.

@buraksezer
Owner Author

buraksezer commented Feb 20, 2020

However this doesn't help with my blocking problem. I'm not even testing multi nodes yet. So I haven't gotten to configuring two nodes. I simply can't even get it to store the key on the single node so I figured I must have configured something wrong. The stats on the same node don't report a length.

As far as I understand, you are trying to run a single node but it fails constantly. Could you please share the logs? I tried to run a single node with the config you provided, and it runs well:

https://gist.github.com/buraksezer/3da18131736336685399e6391c50bd14

The output of olric-stats -a 127.0.0.1:5556:

Total length of partitions: 10
Total partition count: 271
Total Allocated: 10485760
Total Inuse: 1018
Total Garbage: 0

I just noticed a bug. If you create an Olric node with a nil config, it tries to bind an ephemeral port and propagates 127.0.0.1:0 as its address. Then the TCP client fails to dial it. I'll fix it as soon as possible.

@buraksezer
Owner Author

I don't need olric to perform the prefix scan. I format my cache keys as a combination of properties. When a certain asset has changed in my external system, I want to evict all variations of the cached data for that asset. The asset id in this case is the prefix of the key. So I was able to just iterate the keys in the cache, string match, and optionally evict.

Olric can handle many different DMaps on the same cluster. Does creating different DMaps for the asset ids solve the problem? Then you can call Destroy on the DMap to wipe its content.

https://github.com/buraksezer/olric#destroy
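
Roughly like this (a sketch; the DMap naming scheme is up to you, and I'm assuming Destroy returns an error like the other DMap calls):

// One DMap per asset; all size/option variations for the asset live in it.
dm, err := db.NewDMap("previews:" + assetID)
if err != nil {
	// handle error
}

// ... Put/Get the cached variations on dm ...

// When the asset changes, drop every variation at once:
if err := dm.Destroy(); err != nil {
	// handle error
}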

@justinfx
Contributor

I could likely use multiple dmaps. Is it ok to have 10k+ dmaps in the distributed system?

@justinfx
Contributor

I simply can't even get it to store the key on the single node so I figured I must have configured something wrong

I've managed to resolve this. The problem was that I hadn't been properly printing the error that was being generated by the Put. The error was actually that the gob encoding failed because my cache value did not have any exported fields. This was because my previous LRU cache was local, in memory, and would simply store whatever `interface{}` you gave it. I needed to go and refactor my cache types to use exported fields.
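
For anyone else hitting this: encoding/gob only serializes exported fields, so a value like my old one fails to encode (illustrative types):

// Before: gob fails with an error along the lines of "type has no exported fields".
type cachedImage struct {
	pixels []byte
}

// After: exported fields, works with the gob serializer olric was using.
type CachedImage struct {
	Pixels []byte
}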

@buraksezer
Owner Author

I could likely use multiple dmaps. Is it ok to have 10k+ dmaps in the distributed system?

That's too many. I haven't tried this scenario before, but background workers may fail and every DMap preallocates some memory. So you would need too much memory to handle 10k+ DMaps in an Olric cluster.

For my cache invalidation logic, I was able to take a key prefix and iterate all the keys in the cache, to evict the ones that match the prefix. Olric doesn't provide a way to get the list of keys. Does this have any value to you as a feature?

I'm going to work on the possibility of implementing an iterator for DMaps. I'm not sure that it's a good idea.

Is it ok to change the cache design to handle the prefix problem?

@justinfx
Contributor

Thanks for clarifying about the dmaps. Specifically I am caching image previews for assets. There could be different size transformations and options for the same asset, hence I may have multiple cached variations for one asset. And then I may be caching many thousands of assets. When I find out an asset has changed, all of its variations have to be evicted.

I'm totally fine to try out changes to the cache api if you are putting in the time to try and add iterators. Thanks!

@buraksezer
Owner Author

I'm totally fine to try out changes to the cache api if you are putting in the time to try and add iterators. Thanks!

The iterator implementation can only provide an eventually consistent view of a DMap. It's also an O(n) operation, like Redis' KEYS *. Is that OK for you?

@buraksezer
Owner Author

Hey @justinfx, what is the average size of the values? and what is the average RAM of your nodes?

As you know, DMaps are distributed among partitions. Chunks on the partitions should be relatively small (between 50MB and 100MB) for smooth operation in the case of node joins or departures. The default partition count is 271 in Olric. I think that it is a good choice for clusters of up to 50 nodes and ~25-30 GB of data: 271 * 100MB = 27.1GB. If you need to store more than 27GB, please increase the partition count.

Prime numbers are preferred for uniform distribution of keys. Here is some math about this:

@justinfx
Contributor

Eventually consistent key iteration is likely fine. I imagine that each of the nodes will be receiving external notifications equally, so each of them will end up seeing at least its local keys, which means everything should still get evicted.

My service converts and caches smaller preview representations of source assets/images, so they are definitely less than 1MB per value and usually less than 200KB. Because it has been a simple LRU, I have only been running a single 8G in-memory cache node. But I would like to scale that to 3 nodes and have the ability to roll out upgrades without losing the cache. That is, I want to upgrade one at a time and let the cache rebalance. Currently when I upgrade, I just lose the cache and let it start cold again.

@buraksezer
Owner Author

Could you please share the key count in your cache? I assume that you have 100M keys in the cache and need to iterate over that DMap. I need to do some calculations to design a prototype.

I'm considering adding a regex field to the iterator function to reduce network traffic. What do you think?

@buraksezer
Owner Author

I'm thinking of building something like this:

dm, err := db.NewDMap("mydmap")
if err != nil {
	// handle error
}

q, err := dm.Query(olric.Q{"$onKey": {"$regexMatch": "/foo.?"}})
if err != nil {
	// handle error
}

q.Range(func(key string, value []byte) bool {
	err = dm.Delete(key)
	if err != nil {
		// handle error
	}
	return true
})

Currently there is no secondary index to query keys ($onKey) or fields in values ($onValue). In this preliminary version, it will do a full DMap scan in a distributed manner. Later on, we can implement a secondary index to query keys and values.

@justinfx
Contributor

Could you please share the key count in your cache?

It would currently be less than 50k keys with my low available total ram. I would also be fine to set a max keys as an upper limit once I am able to expand further.

A regex system and the query options seem really cool. I'm sure it would save a lot of network chatter by filtering on the remote side.

One suggestion after looking at your prototype query API: it might be a good idea to provide key-only iterators, because in my case it would be a waste to transmit all those values when I only want to delete the keys.

@buraksezer
Owner Author

Hey @justinfx I have just implemented the initial version of the distributed query subsystem. It works fine. I'm still iterating over the code. Here is a sample for you.

Sample output:

EVEN NUMBERS:
=============
KEY: even:000000006, VALUE: 6
KEY: even:000000002, VALUE: 2
KEY: even:000000000, VALUE: 0
KEY: even:000000008, VALUE: 8
KEY: even:000000004, VALUE: 4

ODD NUMBERS:
=============
KEY: odd:000000005, VALUE: 5
KEY: odd:000000009, VALUE: 9
KEY: odd:000000003, VALUE: 3
KEY: odd:000000001, VALUE: 1
KEY: odd:000000007, VALUE: 7

EVEN NUMBERS WITHOUT VALUES:
============================
KEY: even:000000006, VALUE: <nil>
KEY: even:000000002, VALUE: <nil>
KEY: even:000000000, VALUE: <nil>
KEY: even:000000008, VALUE: <nil>
KEY: even:000000004, VALUE: <nil>

ITERATOR ON DMAP:
=================
KEY: odd:000000005, VALUE: 5
KEY: odd:000000009, VALUE: 9
KEY: odd:000000003, VALUE: 3
KEY: even:000000006, VALUE: 6
KEY: even:000000002, VALUE: 2
KEY: even:000000000, VALUE: 0
KEY: even:000000008, VALUE: 8
KEY: odd:000000001, VALUE: 1
KEY: even:000000004, VALUE: 4
KEY: odd:000000007, VALUE: 7

A snippet:

	// Find even numbers and ignore the values
	q3, err := dm.Query(
		query.M{
			"$onKey": query.M{
				"$regexMatch": "even:",
				"$options": query.M{
					"$onValue": query.M{
						"$ignore": true,
					},
				},
			},
		},
	)
	if err != nil {
		log.Printf("Failed to run query on even numbers %v", err)
	}
	defer q3.Close()

	fmt.Println("EVEN NUMBERS WITHOUT VALUES:")
	fmt.Println("============================")
	err = q3.Range(func(key string, value interface{}) bool {
		fmt.Printf("KEY: %s, VALUE: %v\n", key, value)
		return true
	})

You can call standard DMap API functions in Range.
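
For your invalidation case, a key-only scan plus Delete would look roughly like this (a sketch based on the sample above; $regexMatch is assumed to take standard Go regexp syntax):

	// Evict every variation of one asset, e.g. "foo::green" and "foo::red".
	q, err := dm.Query(
		query.M{
			"$onKey": query.M{
				"$regexMatch": "^foo::",
				"$options": query.M{
					"$onValue": query.M{
						"$ignore": true, // key-only: don't transmit the values
					},
				},
			},
		},
	)
	if err != nil {
		log.Printf("Failed to run query: %v", err)
		return
	}
	defer q.Close()

	err = q.Range(func(key string, value interface{}) bool {
		if err := dm.Delete(key); err != nil {
			log.Printf("Failed to delete %s: %v", key, err)
		}
		return true
	})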

It's on feature/issue-18 and this is the first commit. Please share your ideas about the query language and its implementation. I hope it's good enough to handle your case.

Thanks!

@justinfx
Contributor

I think this ticket has achieved the original request, which was the improvements to the eviction mechanisms. I don't want to take this ticket off topic with all of the other great features and fixes you have started, so I will just comment on them in their specific issues.

Just to clarify, I have been testing the max size eviction, the Start function callback, and the query key range iterator. All working great so far.

Thanks!

@buraksezer
Owner Author

Just to clarify, I have been testing the max size eviction, the Start function callback, and the query key range iterator. All working great so far.

Thank you very much for your feedback. It was very valuable. Please don't hesitate to create an issue on GitHub for bugs and feature requests.
