Description
The slasher stores all the attestations provided to it individually, even if some of them contain duplicate information. The most common instance of this seems to be unaggregated attestations stored alongside their aggregate, which wastes space. I suspected this was an issue, but hadn't measured how bad it was in practice. The use of --subscribe-all-subnets on some of the SigP nodes revealed the extent of the problem: a 70GB database using --subscribe-all-subnets vs a 14GB database without.
Version
Lighthouse v1.0.4
Steps to resolve
I think one change that would be straightforward to implement is the following:
Deduplicate the attestations in memory, when they are hashed and stored in the attestation queue prior to being processed as part of a batch. A mapping from (validator_index, attestation_data_root) => indexed_attestation could be used, where on insert, for each key, we keep only the indexed attestation with the most attesters. Some Arc magic could gracefully handle the sharing and garbage collection.
This will be close to optimal so long as attestations and their aggregate arrive in the same batch. If that assumption turns out to be too strong, some more sophisticated (and likely costly) method to deduplicate them upon writing to disk could be used (perhaps in addition to the in-memory deduplication).
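To make the in-memory proposal concrete, here is a rough sketch of what the mapping could look like. It is not Lighthouse's actual slasher code: `DedupQueue`, `DataRoot`, and the cut-down `IndexedAttestation` are hypothetical names invented for illustration, and the real indexed attestation carries more fields than shown. The idea is that each (validator_index, attestation_data_root) key points at a shared `Arc`, and an attestation that is no longer referenced by any key is freed automatically.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::Arc;

/// Hypothetical stand-in for the 32-byte attestation data root.
pub type DataRoot = [u8; 32];

/// Hypothetical, cut-down indexed attestation holding only what the sketch needs.
#[derive(Debug)]
pub struct IndexedAttestation {
    pub attesting_indices: Vec<u64>,
    pub data_root: DataRoot,
}

/// Hypothetical in-memory queue that deduplicates before a batch is processed.
#[derive(Default)]
pub struct DedupQueue {
    by_validator: HashMap<(u64, DataRoot), Arc<IndexedAttestation>>,
}

impl DedupQueue {
    /// Insert an attestation, keeping for each (validator, data root) key
    /// only the attestation with the most attesters seen so far.
    pub fn insert(&mut self, attestation: IndexedAttestation) {
        let attestation = Arc::new(attestation);
        let new_len = attestation.attesting_indices.len();
        for &validator_index in &attestation.attesting_indices {
            let key = (validator_index, attestation.data_root);
            let replace = match self.by_validator.get(&key) {
                Some(existing) => existing.attesting_indices.len() < new_len,
                None => true,
            };
            if replace {
                // The previous Arc for this key (if any) is dropped here; once no
                // key references it, the old attestation is freed.
                self.by_validator.insert(key, Arc::clone(&attestation));
            }
        }
    }

    /// The distinct attestations still referenced by at least one key,
    /// i.e. what would be handed on to batch processing.
    pub fn unique_attestations(&self) -> Vec<Arc<IndexedAttestation>> {
        let mut seen = HashSet::new();
        let mut unique = Vec::new();
        for attestation in self.by_validator.values() {
            // Compare by pointer so a shared aggregate is only emitted once.
            if seen.insert(Arc::as_ptr(attestation)) {
                unique.push(Arc::clone(attestation));
            }
        }
        unique
    }
}
```

With this sketch, inserting two single-validator attestations followed by an aggregate covering both leaves only the aggregate in the queue. Partially overlapping aggregates are not merged, so both can survive; that is one reason this would be close to optimal rather than optimal.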