-
Notifications
You must be signed in to change notification settings - Fork 103
Make size run in O(1)
#170
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Make size run in O(1)
#170
Conversation
5c6b2a2
to
7fad271
Compare
b11af0e
to
52cab04
Compare
Aside from my inline comments, I also have a general concern about this. Making |
f652e27
to
21312f1
Compare
067c891
to
de6b771
Compare
The UPDATE: It was just a bug with 'unsafeInsertWithInternal', fixed. |
7d190bd
to
a4f9bc6
Compare
@treeowl regarding benchmarks, if someone has an |
Regarding the already existing benchmarks, because the data used in them is generated with the same seed, the maps never change, so I cloned this repository elsewhere and ran the benchmarks a few times for both this branch and Current implementation is on the left, this PR's code on the right:
|
|
@treeowl I added some benchmarks for the situation you describe. I'll run them with care later (and add a few more cases, i.e. |
Well, imagine someone has a generalized trie with |
719a2e0
to
ae3c3b4
Compare
@treeowl that's a point I cannot argue against, unfortunately speeding up Opting out does sound like an alternative, but aside from creating two different hashmap types, I don't think it can be neatly done. I'll do some benchmarks before speaking more definitely about this. |
c104c5d
to
f344e35
Compare
I added some benchmarks to operations to series of The current code is on the left, this PR's code on the right.
Doesn't look too bad @treeowl |
826a32f
to
f8463ca
Compare
Plan for this PR:
I know the maintainer said here that storing intermediate structure sizes inside the |
@chshersh Hey! I might have time to come back to this in the next few months, but I can't guarantee it: if anyone else wants to take over, feel free to! |
What's the current status here? Can we already say whether the added overhead is prohibitive or not? I'll tentatively mark this as a draft for now. |
@sjakobi ok with this being moved to draft. Tentative status is that there is overhead in some places, and improvements in others, but overall the PR can be improved. |
What is this performance problem? I'm not sure I can help but I'd gladly take a look. And there are other folks here who might be able to help too. |
Unions are going to get rather seriously pricier with this scheme. The alternative approach is to slow down everything a little by bloating the structure with an extra size field per internal node. |
9a40921
to
8c7b2aa
Compare
I finally had some time to resume work on this.
In my fork of this repo a while back I started a branch doing just that, https://github.com/rockbmb/unordered-containers/tree/make-size-const-2.
Here are the benchmarks: https://gist.github.com/rockbmb/c3d7e1d1a252028802db7a8dac63655a Let's consider foldlWithKey' :: (a -> k -> v -> a) -> a -> HashMap k v -> a
foldlWithKey' f = go
where
go !z Empty = z
go z (Leaf _ (L k v)) = f z k v
go z (BitmapIndexed _ ary) = A.foldl' go z ary
go z (Full ary) = A.foldl' go z ary
go z (Collision _ ary) = A.foldl' (\ z' (L k v) -> f z' k v) z ary In this PR it changes to foldlWithKey' :: (a -> k -> v -> a) -> a -> HashMap k v -> a
foldlWithKey' f acc (HashMap _ m) = go acc m
where
go !z Empty = z
go z (Leaf _ (L k v)) = f z k v
go z (BitmapIndexed _ ary) = A.foldl' go z ary
go z (Full ary) = A.foldl' go z ary
go z (Collision _ ary) = A.foldl' (\ z' (L k v) -> f z' k v) z ary The only real change, aside from foldlWithKey' f = go to foldlWithKey' f acc (HashMap _ m) = go acc m and this alone causes an almost 3x slowdown in benchmarks, see
This is also the case for other relatively simple functions like Do you believe this situation can be improved, or is the above alternative of expanding the |
@rockbmb It's great to see some progress on this! :) The apparent slowdown of Also, to get a better overview of the perf differences, could you produce |
Reasonable assumption. It's been a few years since I've worked with these benchmarks, so I went and took a look at the relevant parts: data Env = Env {
...
hmi :: !(HM.HashMap Int Int),
...
} setupEnv = do
...
hmi = HM.fromList elemsI
...
return Env{..} and the benchmark , bench "foldl'" $ whnf (HM.foldl' (+) 0) hmi so I do not believe this is the problem. I've attached a It is not possible to share |
|
Thanks! So the most extreme slowdowns with this branch are
Before we decide to scrap this branch and focus on the other one, it would be very good to understand why these slowdowns are so extreme. I don't have a better idea for doing that than comparing the generated Core. For the other branch, I think the slowdowns in |
I'm really not optimistic about this approach from the |
True, the numbers for either branch don't look great at the moment, especially this one.
I don't mind doing this, but I don't know how proceed - I'm not quite sure what to be looking out for. |
Unfortunately, I don't have any references. Maybe just produce a diff of the Core for |
* because 'HashMap' is now a wrapper and may/may not get unboxed during a program's execution, benchmarks to operations on sets of hashmaps inside different kinds of containers were added;
8c7b2aa
to
a98d0b6
Compare
It's been 5 years since I started this, and 2 since I last worked on it. Over the last 2 years this PR bitrot so for now I have rebased it to master. I'll rebase that branch next. Afterwards I'll resume benchmarking, which will probably involve looking at the Core. EDIT: also, for a reason I cannot understand the job for GHC 8.10.7 fails, and it doesn't seem to be over tests or compilation. It hangs while fetching |
Good luck! |
To fix #138, this PR modifies the
HashMap
datatype and its associated functions so thatsize :: (Eq k, Hashable k) => HashMap k v -> Int
runs in constant time, and not linear.To summarize what was done (not in this order):
HashMap
type was renamedTree
and is no longer visible to the end-user;HashMap = HashMap !Int !(Tree k v)
was created, is now the user-facingHashMap
datatype. Here, the!Int
represents the hashmap's sizeInt
field inHashMap
after calling its internal variantsize
is now defined assize (HashMap sz _) = sz
, which runs inO(1)
The most important parts of the code are here, but I plan on updating the PR with some commits adding better comments throughout. The reason I submit this PR now is that it will take a while for it to be reviewed and merged, and the sooner that process can begin the better.
As an added bonus,
(==)/(/=)
will now shortcircuit whenever the two hashmaps being compared don't have the same size.UPDATE (15/03/2018): another plus is that
intersection{With,WithKey}, union{With,WithKey}
now compare their argument hashmaps' sizes, and iterate over the smaller one.cc @tibbe @treeowl