Eviction? #28

Open
piegamesde opened this issue Jan 1, 2021 · 5 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

@piegamesde

I'm looking at different caching libraries right now for my project and this one looks really cool! However, I cannot find any information on cache eviction. Do I need to implement it manually on top of cacache? Have others already done it? (Is this even possible?)

What I need is simple time-based eviction (by access time), but I think most use cases require a bounded size and some LRU/LFU algorithm.

@piegamesde
Author

Normally, I'd suggest making this an option and having all methods automatically enforce the eviction invariants. But because the API design suggests that a "cache" is no more than a path to a directory, I instead propose adding an evict method that takes a path and an evictor object and has to be called manually when desired.

A naive implementation would simply iterate over the cache contents and remove keys based on the configured metric. The bigger problem is that Metadata does not store enough information to implement most eviction algorithms: LRU needs a last-access time and LFU needs an access count, for example.
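To make the proposal concrete, here is a minimal sketch of what such an evictor object could look like. Everything in it is hypothetical: `EntryInfo`, `Evictor`, and `LruBySize` are illustrative names, not part of cacache, and the `last_access_ms` field is exactly the kind of data the existing metadata does not record.

```rust
/// Hypothetical per-entry view that an evictor would need.
/// `last_access_ms` is an assumption; it is not a field cacache provides.
#[derive(Debug, Clone)]
pub struct EntryInfo {
    pub key: String,
    pub size: u64,
    pub last_access_ms: u128,
}

/// Hypothetical policy object: given all entries, decide which keys to drop.
pub trait Evictor {
    fn select_victims(&self, entries: &[EntryInfo]) -> Vec<String>;
}

/// Drops least-recently-used entries until the total size fits `max_bytes`.
pub struct LruBySize {
    pub max_bytes: u64,
}

impl Evictor for LruBySize {
    fn select_victims(&self, entries: &[EntryInfo]) -> Vec<String> {
        let mut total: u64 = entries.iter().map(|e| e.size).sum();

        // Visit entries from oldest access to newest.
        let mut by_age: Vec<&EntryInfo> = entries.iter().collect();
        by_age.sort_by_key(|e| e.last_access_ms);

        let mut victims = Vec::new();
        for entry in by_age {
            if total <= self.max_bytes {
                break;
            }
            total -= entry.size;
            victims.push(entry.key.clone());
        }
        victims
    }
}
```

An `evict` function along the proposed lines would build the `EntryInfo` list from the index, call `select_victims`, and then remove each returned key. Keeping the policy free of I/O makes it easy to test on its own.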

@zkat
Owner

zkat commented Feb 25, 2022

Original cacache actually has a mark-and-sweep garbage collector built in that's extensible enough to build this feature into, but no one ever did it, so I figured it wasn't that important? Eviction is a very application-specific feature, imo.

@zkat zkat added enhancement New feature or request help wanted Extra attention is needed labels Feb 25, 2022
@tarka

tarka commented May 27, 2023

Not exactly eviction, but you can implement a TTL using metadata:

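use std::time::{SystemTime, UNIX_EPOCH};

use anyhow::Result; // assumed: the original snippet does not show which Result alias it uses
use log::{debug, info}; // assumed: info!/debug! come from the `log` crate

/// Returns the cached string for `key`, or `None` if the entry is missing
/// or older than `ttl` (in milliseconds).
pub fn get_cached(cache: &str, key: &str, ttl: u128) -> Result<Option<String>> {
    let md = match cacache::metadata_sync(cache, key)? {
        Some(m) => m,
        None => return Ok(None),
    };

    let now = SystemTime::now()
        .duration_since(UNIX_EPOCH)?
        .as_millis();
    // saturating_sub avoids an underflow panic if the entry's timestamp is ahead of the clock.
    let age = now.saturating_sub(md.time);
    if age > ttl {
        info!("Cached value expired: {} - {} = {} > {}", now, md.time, age, ttl);
        return Ok(None);
    }

    let bval = cacache::read_sync(cache, key)?;
    let uval = String::from_utf8(bval)?;
    debug!("Cache hit: {} -> {}", key, uval);
    Ok(Some(uval))
}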

A slightly nicer variant of this would be to store the desired TTL in the metadata at put time, but there doesn't seem to be a way to set arbitrary metadata from the user's point of view.
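If per-entry metadata really cannot be set, one possible workaround is to carry the deadline inside the stored bytes. The sketch below is an assumption, not something the thread or cacache prescribes: `wrap_with_expiry` and `unwrap_if_fresh` are hypothetical helpers that prepend a 16-byte big-endian expiry timestamp (Unix milliseconds) to the payload and check it on read.

```rust
/// Length of the expiry header: a u128 millisecond timestamp, big-endian.
const HEADER_LEN: usize = 16;

/// Hypothetical helper: prefixes `payload` with its expiry deadline.
pub fn wrap_with_expiry(payload: &[u8], expires_at_ms: u128) -> Vec<u8> {
    let mut out = Vec::with_capacity(HEADER_LEN + payload.len());
    out.extend_from_slice(&expires_at_ms.to_be_bytes());
    out.extend_from_slice(payload);
    out
}

/// Hypothetical helper: returns the payload if the deadline has not passed.
/// Returns `None` for expired entries and for data too short to hold a header.
pub fn unwrap_if_fresh(stored: &[u8], now_ms: u128) -> Option<&[u8]> {
    if stored.len() < HEADER_LEN {
        return None;
    }
    let (header, payload) = stored.split_at(HEADER_LEN);
    let mut buf = [0u8; HEADER_LEN];
    buf.copy_from_slice(header);
    let expires_at_ms = u128::from_be_bytes(buf);
    if now_ms >= expires_at_ms {
        None
    } else {
        Some(payload)
    }
}
```

The wrapped bytes would be what gets written to the cache, and the reader would pass the bytes it reads back through `unwrap_if_fresh`. The trade-off is that the stored content, and therefore its hash, now depends on the deadline, so identical payloads with different TTLs no longer share content.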

@fiag
Contributor

fiag commented Jun 21, 2023

Read all index entries, and filter by TTL, then remove index and content.

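use std::time::{Duration, SystemTime, UNIX_EPOCH};

fn main() {
    // Note: Rust does not expand "~"; this is a literal directory name.
    let cache = "~/.my-cache";
    let ttl = Duration::from_secs(60);
    // Take the timestamp once so every entry is compared against the same instant.
    let now = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .unwrap()
        .as_millis();
    for md in cacache::list_sync(cache).flatten() {
        // md.time is in milliseconds (u128), so convert the TTL before comparing.
        if now.saturating_sub(md.time) > ttl.as_millis() {
            cacache::remove_hash_sync(cache, &md.integrity).unwrap();
            cacache::remove_sync(cache, &md.key).unwrap();
        }
    }
}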

@matt-phylum

> Read all index entries, and filter by TTL, then remove index and content.

This almost works, but there are two problems related to remove_hash_sync:

  1. In a content-addressable cache, multiple keys may share the same integrity value, so calling remove_hash_sync while removing an old key may also delete the data for newer keys that are not being removed.
  2. If a previous operation was interrupted, there may be cached data with no associated key. list_sync returns only keys and their associated information, so if writes of unique content keep getting interrupted, the cache will eventually fill up with orphaned data. That data needs remove_hash_sync called on it, but it cannot be reached through the output of list_sync.
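The first problem can be avoided by deciding what to delete before deleting anything. The sketch below is one way to do that, under stated assumptions: `IndexEntry`, `SweepPlan`, and `plan_sweep` are hypothetical names, and the integrity value is simplified to a plain string rather than cacache's own type. It only marks a hash for removal when no unexpired key still refers to it.

```rust
use std::collections::HashSet;

/// Simplified, hypothetical stand-in for one index entry.
pub struct IndexEntry {
    pub key: String,
    pub integrity: String,
    pub time_ms: u128,
}

impl IndexEntry {
    fn is_expired(&self, now_ms: u128, ttl_ms: u128) -> bool {
        now_ms.saturating_sub(self.time_ms) > ttl_ms
    }
}

/// What a sweep should delete: expired keys, plus content that no live key uses.
pub struct SweepPlan {
    pub expired_keys: Vec<String>,
    pub unreferenced_hashes: Vec<String>,
}

pub fn plan_sweep(entries: &[IndexEntry], now_ms: u128, ttl_ms: u128) -> SweepPlan {
    // Hashes still referenced by at least one live key must be kept.
    let live: HashSet<&str> = entries
        .iter()
        .filter(|e| !e.is_expired(now_ms, ttl_ms))
        .map(|e| e.integrity.as_str())
        .collect();

    let mut expired_keys = Vec::new();
    let mut unreferenced_hashes = Vec::new();
    let mut seen: HashSet<&str> = HashSet::new();

    for e in entries.iter().filter(|e| e.is_expired(now_ms, ttl_ms)) {
        expired_keys.push(e.key.clone());
        let hash = e.integrity.as_str();
        // Report each removable hash once, and only if no live key shares it.
        if !live.contains(hash) && seen.insert(hash) {
            unreferenced_hashes.push(e.integrity.clone());
        }
    }

    SweepPlan { expired_keys, unreferenced_hashes }
}
```

A caller would fill the entries from the index listing, then remove each expired key and each unreferenced hash. This does nothing for the second problem: orphaned content never appears in the index, so finding it would need a separate pass over the content store itself.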
