
Conversation

@krynju (Member) commented Sep 11, 2021

Since the UID was generated with rand(), weird things like KeyErrors would very rarely happen, especially when rerunning tests or using @btime.

I noticed this by running those use cases and by inspecting EAGER_ID_MAP and thunk_dict.

I'm about 50% sure it fixes the occasional hangs as well, but I'll try to confirm that next week.

fixes #267
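
As a minimal sketch of the idea (hypothetical names, not necessarily the exact code in this PR): instead of drawing UIDs from rand(), hand them out from a process-local atomic counter so they are unique by construction.

```julia
# Hypothetical sketch: eager-thunk UIDs from an atomic counter instead of
# rand(), so repeated test runs or @btime loops never produce a duplicate ID.
const NEXT_UID = Threads.Atomic{UInt64}(1)

# atomic_add! returns the previous value and increments in one atomic step,
# so concurrent callers always receive distinct IDs.
next_uid() = Threads.atomic_add!(NEXT_UID, UInt64(1))
```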

@codecov-commenter commented Sep 11, 2021

Codecov Report

Merging #281 (b9a4d41) into master (6a1aa1f) will not change coverage.
The diff coverage is 0.00%.


@@           Coverage Diff           @@
##           master    #281    +/-   ##
=======================================
  Coverage    0.00%   0.00%            
=======================================
  Files          34      35     +1     
  Lines        2754    2913   +159     
=======================================
- Misses       2754    2913   +159     
Impacted Files            | Coverage Δ
src/thunk.jl              | 0.00% <0.00%> (ø)
src/Dagger.jl             | 0.00% <0.00%> (ø)
src/sch/Sch.jl            | 0.00% <0.00%> (ø)
src/lib/util.jl           | 0.00% <0.00%> (ø)
src/processor.jl          | 0.00% <0.00%> (ø)
src/sch/eager.jl          | 0.00% <0.00%> (ø)
src/lib/logging.jl        | 0.00% <0.00%> (ø)
src/lib/logging-extras.jl | 0.00% <0.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6a1aa1f...b9a4d41.

@krynju (Member, Author) commented Sep 12, 2021

I checked for hangs and overall it no longer hangs, but the scheduler slows down dramatically once roughly 100k thunks have been processed, which can feel like a hang. That will need a separate issue for tracking.

@jpsamaroo (Member)

I don't understand how this can fix the hangs, since the RNG is thread-safe. Is there some insight that I'm missing?

@krynju (Member, Author) commented Sep 12, 2021

I'm not sure myself, and I wouldn't consider the hangs 100% gone yet, but they definitely happen less often now; usually I just get severe slowdowns instead.

Whenever I Ctrl-C'd out of a hang, the stack trace would usually point at a wait on a future. My initial theory was that if EAGER_ID_MAP sometimes pointed at the wrong tid, the wait would be called on a future that never notifies the waiting task (maybe the task behind that future hadn't started yet, producing a deadlock). I'm not sure, though, so I'd have to investigate more.

But as I said, my synthetic hang scenarios would usually lock up after roughly 150 iterations; with this change I can easily go above 1k iterations and usually hit a slowdown after 2k iterations instead of a hang.
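
To make the suspected failure mode concrete (hypothetical names only, not Dagger's actual internals): if two thunks ever draw the same random UID, the second registration silently overwrites the first one in the id map, so later lookups for the first thunk resolve to the wrong thunk id and the caller ends up waiting on the wrong future.

```julia
# Hypothetical illustration of what a UID collision does to an id map.
id_map = Dict{UInt64,Int}()   # uid => thunk id (a stand-in for EAGER_ID_MAP)

uid = UInt64(42)              # suppose both thunks draw the same random UID
id_map[uid] = 1               # first thunk registered
id_map[uid] = 2               # second registration silently overwrites it

id_map[uid] == 2              # true: lookups for the first thunk now hit thunk 2
```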

@krynju (Member, Author) commented Sep 24, 2021

Hey, so do we merge this at some point, or is there a reason not to?
I've been running it since I opened this PR and it's all good: no KeyErrors.
My groupby branch has this merged as well and has been passing consistently.

@jpsamaroo (Member)

Ok, I'll merge this now, although I still don't really understand why it fixes it.

@jpsamaroo merged commit 343f74a into JuliaParallel:master on Sep 24, 2021
@jpsamaroo (Member)

Thanks again!

@krynju (Member, Author) commented Sep 25, 2021

It fixes the KeyError, which should have been rare in the first place since the UID was a rand over UInt64, yet it wasn't rare, which is still weird to me.
As for the hangs, now that we've found their actual cause, they don't seem to be affected by this change at all, so my initial comment about them is invalid at this point, I guess.
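
For a rough sense of scale (a back-of-the-envelope estimate, not a measurement): by the birthday bound, the probability of any collision among n IDs drawn uniformly from UInt64 is roughly n^2 / 2^65, which is vanishingly small even at a million thunks, so chance collisions alone don't obviously explain how often the KeyErrors showed up.

```julia
# Back-of-the-envelope birthday bound: probability of any collision among
# n IDs drawn uniformly at random from UInt64 (2^64 values) is ≈ n^2 / 2^65.
collision_prob(n) = n^2 / 2.0^65

collision_prob(1_000_000)   # ≈ 2.7e-8, i.e. effectively never by pure chance
```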



Development: successfully merging this pull request may close the issue "Eager scheduler error when @spawn inside a thunk".
