-
Notifications
You must be signed in to change notification settings - Fork 5.2k
Description
Background and Motivation
We already have ConcurrentDictionary which takes great advantages at concurrent environment comparing to Dictionary. However, HashSet doesn't have a Concurrent version.
Sometimes(maybe very often in fact), we hope to use ConcurrentHashSet easily which its elements are never duplicated, with an O(1) operation for contains or remove methods at concurrent environment. It's quite different from ConcurrentBag and provides another choice for suitable usage.
You can also see details at issue #16443 . Almost 5 years passes, nobody gives an API proposal, so I'm here to finish it.
Proposed API
public partial class ConcurrentHashSet<T> : System.Collections.Generic.ICollection<T>, System.Collections.Generic.IEnumerable<T>, System.Collections.Generic.IReadOnlyCollection<T>, System.Collections.ICollection,System.Collections.IEnumerable where T : notnull
{
public ConcurrentHashSet() { }
public ConcurrentHashSet(System.Collections.Generic.IEnumerable<T> collection) { }
public ConcurrentHashSet(System.Collections.Generic.IEnumerable<T> collection, System.Collections.Generic.IEqualityComparer<T>? comparer) { }
public ConcurrentHashSet(System.Collections.Generic.IEqualityComparer<T>? comparer) { }
public ConcurrentHashSet(int concurrencyLevel, System.Collections.Generic.IEnumerable<T> collection, System.Collections.Generic.IEqualityComparer<T>? comparer) { }
public ConcurrentHashSet(int concurrencyLevel, int capacity) { }
public ConcurrentHashSet(int concurrencyLevel, int capacity, System.Collections.Generic.IEqualityComparer<T>? comparer) { }
public int Count { get { throw null; } }
public bool IsEmpty { get { throw null; } }
bool System.Collections.Generic.ICollection<T>.IsReadOnly { get { throw null; } }
bool System.Collections.ICollection.IsSynchronized { get { throw null; } }
object System.Collections.ICollection.SyncRoot { get { throw null; } }
public void Clear() { }
public bool Contains(T item) { throw null; }
public System.Collections.Generic.IEnumerator<T> GetEnumerator() { throw null; }
public T GetOrAdd(T item) { throw null; }
void System.Collections.Generic.ICollection<T>.Add(T item) { }
bool System.Collections.Generic.ICollection<T>.Contains(T item) { throw null; }
void System.Collections.Generic.ICollection<T>.CopyTo(T[] array, int index) { }
bool System.Collections.Generic.ICollection<T>.Remove(T item) { throw null; }
void System.Collections.ICollection.CopyTo(System.Array array, int index) { }
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator() { throw null; }
public T[] ToArray() { throw null; }
public bool TryAdd(T item) { throw null; }
public bool TryRemove(T item) { throw null; }
public System.Collections.Generic.HashSet<T> ToHashSet() { } //optional
}Usage Examples
The main goal of ConcurrentHashSet is to improve performance and reduce cost, taking replace of HashSet-With-Lock pattern for concurrent environment. If we have it, the code can be written like below,
public class Sample()
{
// private readonly object _syncRoot = new object();
private readonly ConcurrentHashSet<Guid> _concurrentGuidSet = new ConcurrentHashSet<Guid>();
// private readonly HashSet<Guid> _guidSet = new Hash<Guid>();
public async Task Generate()
{
for(int i = 0; i < 100; i++)
{
Task.Run( () =>
{
// The goal is to improve the performance, and ease the way of programming here.
_concurrentGuidSet.TryAdd(Guid.NewGuid());
//lock(_syncRoot)
//{
// _guidSet.Add(Guid.NewGuid());
//}
})
}
}
}The difference could be very similiar to the one between Dictionary and ConcurrentDictionary.
Alternative Designs
-
Many developers tell that they use
ConcurrentDictionary<T, byte>as a replace when meeting the requirement. They don't want theValueactually which is meaningless and totally a waste of memory. However, someones suggest that the first version of ConcurrentHashSet can be implemented by underlyingConcurrentDictionary<T, byte>, in order to add the APIs as soon as possible. Individually, I prefer to an independent implementation which is necessary to be discussed between excellent engineers of Microsoft & our community. -
Do we need it to inherit
ISet<T>/IReadOnlySet<T>? Though it's the concurrent version of HashSet, I'm not sure whether each API of HashSet needs a copy. In my opinion, not all APIs make sense while running in a concurrent environment. Of course, the discussion is required. -
Do we need this API
public HashSet<T> ToHashSet()to get a synchronous instance for easy usage?
Risks
It might not own concurrent version of the whole APIs of HashSet. Because it's a new collection, no break changes will occur. It just needs a pretty design :).