Description
The slasher stores all the attestations provided to it individually, even if some of them contain duplicate information. The most common instance of this seems to be unaggregated attestations stored alongside their aggregate, which wastes space. I suspected this was an issue, but hadn't measured how bad it was in practice. The use of `--subscribe-all-subnets` on some of the SigP nodes revealed the extent of the problem: a 70GB database using `--subscribe-all-subnets` vs a 14GB database without.
Version
Lighthouse v1.0.4
Steps to resolve
I think one change that's straightforward to implement would be the following:
Deduplicate the attestations in memory, when they are hashed and stored in the attestation queue prior to being processed as part of a batch. A mapping from `(validator_index, attestation_data_root) => indexed_attestation` could be used, where on insert we keep only the indexed attestation with the most attesters. Some `Arc` magic could gracefully handle the sharing and garbage collection.
This will be close to optimal so long as attestations and their aggregates arrive in the same batch. If that assumption turns out to be too strong, a more sophisticated (and likely more costly) method of deduplicating them upon writing to disk could be used (perhaps in addition to the in-memory deduplication).
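A minimal sketch of that in-memory approach, using hypothetical simplified types (the real `IndexedAttestation`, hashing, and queueing in Lighthouse are more involved): each `(validator_index, attestation_data_root)` key holds an `Arc` to the largest attestation seen so far, so a superseded unaggregated attestation is freed as soon as nothing references it.

```rust
use std::collections::HashMap;
use std::sync::Arc;

type ValidatorIndex = u64;
type Hash256 = [u8; 32];

// Simplified stand-in for Lighthouse's IndexedAttestation.
struct IndexedAttestation {
    attesting_indices: Vec<ValidatorIndex>,
    data_root: Hash256,
}

// In-memory queue keyed by (validator_index, attestation_data_root). Each key
// maps to the largest attestation (by attester count) seen so far, and the Arc
// lets many keys share a single allocation.
#[derive(Default)]
struct AttestationQueue {
    queue: HashMap<(ValidatorIndex, Hash256), Arc<IndexedAttestation>>,
}

impl AttestationQueue {
    fn insert(&mut self, attestation: Arc<IndexedAttestation>) {
        for &validator_index in &attestation.attesting_indices {
            self.queue
                .entry((validator_index, attestation.data_root))
                .and_modify(|existing| {
                    // Keep whichever attestation covers more attesters; the
                    // superseded Arc is dropped and freed once unreferenced.
                    if attestation.attesting_indices.len() > existing.attesting_indices.len() {
                        *existing = attestation.clone();
                    }
                })
                .or_insert_with(|| attestation.clone());
        }
    }
}

fn main() {
    let data_root = [0u8; 32];
    let unaggregated = Arc::new(IndexedAttestation {
        attesting_indices: vec![7],
        data_root,
    });
    let aggregate = Arc::new(IndexedAttestation {
        attesting_indices: vec![3, 7, 11],
        data_root,
    });

    let mut queue = AttestationQueue::default();
    queue.insert(unaggregated);
    queue.insert(aggregate);

    // Validator 7's entry now points at the aggregate; the unaggregated copy
    // is no longer referenced and has been freed.
    assert_eq!(queue.queue[&(7u64, data_root)].attesting_indices.len(), 3);
}
```

In the common case where an aggregate arrives in the same batch as its unaggregated components, the aggregate displaces all of them and only one copy remains to be written out.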
## Issue Addressed
Closes #2112, closes #1861
## Proposed Changes
Collect attestations by validator index in the slasher, and use the magic of reference counting to automatically discard redundant attestations. This results in us storing only 1-2% of the attestations observed when subscribed to all subnets, which carries over to a 50-100x reduction in data stored 🎉
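As a toy illustration of why this works (not the actual slasher code; attestations are reduced here to plain lists of attesting indices), 64 unaggregated attestations plus their aggregate collapse to a single stored allocation once every validator's entry points at the shared aggregate:

```rust
use std::collections::{HashMap, HashSet};
use std::sync::Arc;

fn main() {
    // One aggregate covering 64 validators, plus 64 redundant unaggregated copies.
    let aggregate: Arc<Vec<u64>> = Arc::new((0..64).collect());
    let singles: Vec<Arc<Vec<u64>>> = (0..64).map(|i| Arc::new(vec![i])).collect();

    // Per-validator map; every validator's entry ends up pointing at the aggregate.
    let mut by_validator: HashMap<u64, Arc<Vec<u64>>> = HashMap::new();
    for att in singles.iter().chain(std::iter::once(&aggregate)) {
        for &validator in att.iter() {
            let entry = by_validator.entry(validator).or_insert_with(|| att.clone());
            if att.len() > entry.len() {
                *entry = att.clone();
            }
        }
    }
    drop(singles); // the unaggregated copies are no longer referenced and are freed

    // Count the distinct allocations that remain: 1 out of the 65 observed.
    let distinct: HashSet<*const Vec<u64>> =
        by_validator.values().map(|a| Arc::as_ptr(a)).collect();
    println!("observed = 65, stored = {}", distinct.len()); // stored = 1
}
```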
## Additional Info
There's some nuance to the configuration of the `slot-offset`. It has a profound effect on the effectiveness of de-duplication; see the docs added to the book for an explanation: https://github.com/michaelsproul/lighthouse/blob/5442e695e5256046b91d4b4f45b7d244b0d8ad12/book/src/slasher.md#slot-offset