
Repeat generation of aggregation keys #1479

Closed
Stebalien opened this issue Jul 8, 2021 · 14 comments · Fixed by #1481

@Stebalien
Member

Description

Currently, if multiple threads try to verify aggregate proofs with new parameters at the same time, we'll try to compute the keys multiple times (last one wins).

Ideally, we'd compute the keys once.

Acceptance criteria

I can try to verify a thousand aggregate proofs at once without waiting a day or two ((100 s × 1000) / 3600 ≈ 28 hours).

Risks + pitfalls

Extra complexity in the caches could introduce bugs. We also probably don't want to rely on nightly, which means either rolling our own "once" or using an external crate.

Where to begin

Use something like https://doc.rust-lang.org/std/lazy/struct.SyncOnceCell.html (or https://crates.io/crates/conquer-once), or even just a mutex, to define per-key locked "cells", one for every entry in our caches.
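
For illustration, a minimal sketch of the per-key approach (not the project's code; the helper name and map layout are invented), assuming the `once_cell` crate: the map lock is held only long enough to fetch the cell, and the expensive generator runs under that key's cell alone, at most once.

use std::collections::HashMap;
use std::sync::{Arc, Mutex};

use once_cell::sync::OnceCell;

// Hypothetical helper: returns the cached value for `key`, generating it at
// most once even when many threads ask for the same key concurrently.
pub fn get_or_generate<G, F>(
    cells: &Mutex<HashMap<String, Arc<OnceCell<Arc<G>>>>>,
    key: &str,
    generate: F,
) -> Arc<G>
where
    F: FnOnce() -> G,
{
    // Hold the map lock only long enough to fetch (or create) this key's cell.
    let cell = cells
        .lock()
        .expect("poisoned cache")
        .entry(key.to_string())
        .or_default()
        .clone();
    // Concurrent callers for the same key block on the cell, not on the whole
    // map; `generate` runs exactly once per key.
    cell.get_or_init(|| Arc::new(generate())).clone()
}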

@Stebalien
Member Author

pub fn cache_lookup<F, G>(
    cache_ref: &Mutex<Cache<G>>,
    identifier: String,
    generator: F,
) -> Result<Arc<G>>
where
    F: FnOnce() -> Result<G>,
    G: Send + Sync,
{
    info!("trying parameters memory cache for: {}", &identifier);
    {
        let cache = cache_ref.lock().expect("poisoned cache");
        if let Some(entry) = cache.get(&identifier) {
            info!("found params in memory cache for {}", &identifier);
            return Ok(entry.clone());
        }
    }
    info!("no params in memory cache for {}", &identifier);
    // The lock was released above, so threads that all miss will each run the
    // expensive generator concurrently...
    let new_entry = Arc::new(generator()?);
    let res = new_entry.clone();
    {
        // ...and only the insertion is guarded: the last writer wins.
        let mut cache = cache_ref.lock().expect("poisoned cache");
        cache.insert(identifier, new_entry);
    }
    Ok(res)
}

@cryptonemo
Collaborator

cc: @nikkolasg This refers to the specialize call run per pow2 proof aggregation count

@cryptonemo
Collaborator

Hrm, @Stebalien not sure that's the case, let me look, but that access is locked AFAIK

@cryptonemo
Collaborator

Ah, I think I see -- the cache insertion is guarded, but not the generation

@Stebalien
Member Author

Yep. And we probably don't want to hold a global lock here (maybe not terrible?).

@cryptonemo
Collaborator

They aren't global locks, they are specific to the type (see lines above that method)

@cryptonemo
Collaborator

@nikkolasg I still haven't gotten an answer on whether there's any way to re-use the calculations across specialize calls (since we always calculate out to the max we support). If so, we should chat about ways to do that before doing anything here, since it could more generally speed things up.

@cryptonemo self-assigned this Jul 8, 2021
@cryptonemo
Collaborator

Noting also that the reason we've never run into this as an issue is that we have real parameter files that are never generated (so the generation path hasn't been exercised since mainnet params became available). While we don't generate fake SRS/IPP data either, we do take the time to run a compute-intensive specialize call, which exposes this kind of cache-usage breakage. Unfortunately this was overlooked, since specialize performance wasn't expected to be a significant issue.

@Stebalien
Member Author

> They aren't global locks, they are specific to the type (see lines above that method)

Global as in global for the entire cache (versus a lock for the specific key).

@cryptonemo
Collaborator

I ran a similar bench, but of course I warmed it up by running once before spawning threads. Is this an option in the short term if the first load is a significant issue (i.e. verify once before parallelizing)? There are two things we should discuss on our side: 1) is it possible to lock just once no matter how many proofs are being aggregated, since the long-running part of the computation is always the same, or 2) do we leave it as is and come up with some hacky method of locking once per number of proofs being aggregated?
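
A tiny sketch of that short-term workaround, for concreteness (everything here is hypothetical; `verify_aggregate` stands in for the real verifier, whose first call per size pays the key-generation cost and warms the cache):

use std::thread;

// Hypothetical stand-in for the real aggregate-proof verifier.
fn verify_aggregate(batch: usize) -> bool {
    let _ = batch;
    true
}

fn verify_many(batches: Vec<usize>) -> bool {
    match batches.split_first() {
        None => true,
        Some((first, rest)) => {
            // Warm up: verify one batch serially so the keys get cached...
            if !verify_aggregate(*first) {
                return false;
            }
            // ...then fan out; the remaining verifications hit the cache.
            let handles: Vec<_> = rest
                .iter()
                .map(|&b| thread::spawn(move || verify_aggregate(b)))
                .collect();
            handles.into_iter().all(|h| h.join().unwrap())
        }
    }
}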

@cryptonemo
Collaborator

If the answer to 1 is yes (cc: @nikkolasg), it could potentially be solved completely in bellman (where that data is re-used across calls); if it's 2, it could potentially be solved either in proofs or in lotus. For 1, it could be too memory intensive, or it adds another hacky cache file on disk, which is problematic since, like params, it can be corrupted and so needs verification, etc. For 2, it effectively serializes first-time parallel verifies for each pow2 number of proofs, but keeps computation potentially high on slower machines for some time (i.e. the original problem we had with specialize, where it could take 100s several times over). I personally prefer 1, with a memory-only cache, but I don't know how much that would affect things (i.e. how big that would be).

@Stebalien
Member Author

> Is it possible to lock just once no matter how many proofs are being aggregated, since the long-running part of the computation is always the same

This should be fine. Under normal circumstances, lotus will only verify one aggregate at a time.

However, we should compute these ahead-of-time, at compile time, or maybe even when initializing the repo (storing them on disk). We'd expect to compute each one at least once anyways. At the moment, a miner could get caught off guard having to compute multiple of these things on-demand and fail to produce a block in time.

Basically, we save nothing by computing them on-demand.
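
For instance, a rough, self-contained sketch of that idea (all names hypothetical; `specialize` stands in for the ~100s per-size key computation):

use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Hypothetical stand-ins: `SrsKey` for a specialized verifier key, and
// `specialize` for the expensive per-size computation.
struct SrsKey(usize);
fn specialize(proof_count: usize) -> SrsKey {
    SrsKey(proof_count)
}

// Warm every supported power-of-two size once, at startup or repo init, so a
// miner never pays the specialize cost on the block-production path. The
// bounds here are made up; the real ones depend on what the network supports.
fn warm_key_cache(cache: &Mutex<HashMap<usize, Arc<SrsKey>>>) {
    for log2 in 3..=13 {
        let count = 1usize << log2;
        let key = Arc::new(specialize(count));
        cache.lock().expect("poisoned cache").insert(count, key);
    }
}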

@dignifiedquire
Contributor

dignifiedquire commented Jul 9, 2021

I wonder if we might want to expose a function to the node, allowing it to trigger generation for specific sizes/all sizes. This would allow better control from that side.

For the cache itself I would recommend OnceCell, which I usually use in cases like this, so that the cache becomes something like

use lazy_static::lazy_static;
use once_cell::sync::OnceCell;
use std::collections::HashMap;
use std::sync::Arc;

#[derive(Debug, Default)]
pub struct Cache<V> {
    data: HashMap<String, OnceCell<Arc<V>>>,
}

impl<V> Cache<V> {
    pub fn with_defaults(keys: &[&str]) -> Self {
        let mut data = HashMap::new();
        for key in keys {
            data.insert(key.to_string(), OnceCell::new());
        }

        Self { data }
    }

    /// Returns `None` for non existent entries, `Some(v)` for existing ones, where `v` is either
    /// the result of running `f` or already existing one.
    pub fn get_or_init<F>(&self, key: &str, f: F) -> Option<Arc<V>>
    where
        F: FnOnce() -> V,
    {
        self.data
            .get(key)
            .map(|cell| cell.get_or_init(|| Arc::new(f())))
            .cloned()
    }
}

lazy_static! {
    // can only cache "a" and "b", but avoids having a global mutex on the cache.
    static ref MY_CACHE: Cache<u32> = Cache::with_defaults(&["a", "b"]);
}

pub fn cache_lookup<V, F>(cache_ref: &Cache<V>, identifier: &str, generator: F) -> Arc<V>
where
    F: FnOnce() -> V,
    V: Send + Sync,
{
    if let Some(v) = cache_ref.get_or_init(identifier, generator) {
        return v;
    }
    panic!("unknown identifier {}", identifier);
}
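
A quick usage sketch of the above (values invented): because every cell is created up front in `with_defaults`, the map itself is never mutated after initialization, so lookups take no lock at all and each generator runs exactly once per key.

fn main() {
    // First lookup for "a" runs the closure and caches the result.
    let v = cache_lookup(&MY_CACHE, "a", || 42);
    assert_eq!(*v, 42);

    // A later lookup (from this or any other thread) reuses the cached
    // value; its closure is never run.
    let w = cache_lookup(&MY_CACHE, "a", || unreachable!());
    assert_eq!(*w, 42);
}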

@Stebalien
Member Author

> For the cache itself I would recommend OnceCell, which I usually use in cases like this, so that the cache becomes something like

(ok, I swear I tried to find that exact library and failed)
