Skip to content

Bucket-based API for HashTable #613

@orlp

Description

@orlp

I want to propose the following API for HashTable. On OccupiedEntry I would like to add

/// Return in which bucket this entry is located.
///
/// Note: this index is not stable if the hash table is rehashed.
pub fn get_bucket(&self) -> usize;

On HashTable itself I would like to add the following four functions:

/// Returns the bucket index an entry in the hash table is located at, or `None` otherwise.
///
/// Note: this index is not stable if the hash table is rehashed.
pub fn find_bucket(&self, hash: u64, eq: impl FnMut(&T) -> bool) -> Option<usize>;

/// Return a reference to the value located at `bucket_idx`, or `None` otherwise.
pub fn at_bucket(&self, bucket_idx: usize) -> Option<&T>;

/// Return an `OccupiedEntry` for the entry located at `bucket_idx`, or `None` otherwise.
pub fn at_bucket_mut(&mut self, bucket_idx: usize) -> Option<OccupiedEntry<T>>;

/// Returns the total number of buckets in this `HashTable`.
pub fn num_buckets(&self) -> usize;

There are multiple reasons why I'd like this.

  1. It allows incredibly flexible iteration over the hash table. I am aware that this iteration would be suboptimal compared to a SIMD-based iteration which checks the tags of 16 elements in one go, but the flexibility it provides is well worth it. For example: it makes parallel iteration over the hash table trivial, in whatever permutation you'd like.

  2. It allows intrusive data structures to interweave with the hash table, so long as you are careful to avoid rehashing. An example I ran into recently was the writing of a fixed-capacity LRU cache. When the LRU cache wants to evict the least recently used key it has to do a lookup for that key. It could instead simply store the bucket index and use that to directly remove the key, as the hash table never rehashes in this example. (This example did not in fact avoid rehashing, churn leads to rehashing, even with the same capacity.)

  3. It allows additional algorithms to run safely and soundly (e.g. no pointers into the hash table which requires unsafe code and which may or may not get invalidated with the latest flavor of stacked/tree borrows) while referring into the hash table. For example: sorting the keys of a hash table while not cloning the keys or taking them out of the table.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions