[SPARK-17556] [WIP] executor side broadcast #1

jl982 · 2020-09-18T20:49:49Z

This PR has 2 commits. The first backports Liang-Chi's old prototype (apache#15178) to the master branch; the second contains my modifications. Aside from logging and naming changes, my main changes were to 1) refactor TorrentBroadcast into a superclass with TorrentDriverBroadcast and TorrentExecutorBroadcast as subclasses, and 2) change BroadcastHashJoinExec so that the broadcasted value is not unintentionally sent back to the driver.
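For readers skimming the thread, the shape of the first change can be modeled roughly as below. This is a toy Python sketch, not the Scala code in this PR: only the three class names come from the description above, while `publish`, `read_value`, `_put_block`, the dict-based `block_store`, and the block size are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod


class TorrentBroadcast(ABC):
    """Shared logic: store blocks under (broadcast_id, index) and reassemble them."""

    def __init__(self, broadcast_id, block_store):
        self.broadcast_id = broadcast_id
        self.block_store = block_store  # hypothetical stand-in for a block manager
        self.num_blocks = 0

    def _put_block(self, data):
        # Register one block under the next free index.
        self.block_store[(self.broadcast_id, self.num_blocks)] = data
        self.num_blocks += 1

    @abstractmethod
    def publish(self, source):
        """Turn the source into blocks; subclasses differ in where the data lives."""

    def read_value(self):
        # Any reader reassembles the value by concatenating blocks in order.
        return b"".join(
            self.block_store[(self.broadcast_id, i)] for i in range(self.num_blocks)
        )


class TorrentDriverBroadcast(TorrentBroadcast):
    """Assumed behavior: the driver holds the whole value and chunks it."""

    def publish(self, source, block_size=4):
        for start in range(0, len(source), block_size):
            self._put_block(source[start:start + block_size])


class TorrentExecutorBroadcast(TorrentBroadcast):
    """Assumed behavior: each executor-side partition becomes a block,
    so the full value is never collected on the driver."""

    def publish(self, source):
        for partition in source:
            self._put_block(partition)
```

In this model both subclasses share the read path and differ only in how blocks are produced, which is one plausible reading of why a common superclass is useful here.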

Outstanding items I can think of right now:

Implement size estimation for executor side broadcast (ebc); it currently returns Long.MaxValue
Determine whether the size limit for driver side broadcast (dbc), the lesser of 512 million rows or 8GB, also needs to be enforced for ebc
Handle ebc when executors are added or lost, especially if all executors are dead
Verify that canceling works for ebc
Run more performance tests comparing dbc and ebc (e.g. ebc seems to be slower when there are more executors)
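To make the size-limit item concrete, a check of that kind could look roughly like the following. This is a minimal Python sketch under stated assumptions: the two thresholds are the figures quoted in the list above, "8GB" is read as 8 GiB, and `check_broadcast_limits` is a hypothetical helper, not a function in Spark.

```python
MAX_BROADCAST_ROWS = 512_000_000      # row limit quoted in this thread
MAX_BROADCAST_BYTES = 8 * 1024 ** 3   # "8GB" from this thread, assumed to mean 8 GiB


def check_broadcast_limits(num_rows, size_in_bytes):
    """Raise ValueError as soon as either quoted limit is reached.

    Hypothetical helper: whichever limit is hit first rejects the broadcast.
    """
    if num_rows >= MAX_BROADCAST_ROWS:
        raise ValueError(
            f"cannot broadcast {num_rows} rows (limit {MAX_BROADCAST_ROWS})"
        )
    if size_in_bytes >= MAX_BROADCAST_BYTES:
        raise ValueError(
            f"cannot broadcast {size_in_bytes} bytes (limit {MAX_BROADCAST_BYTES})"
        )
```

Note that a check like this needs a real row count and byte size, so for ebc it would depend on the size estimation item being done first: a placeholder estimate of Long.MaxValue would always trip the byte limit.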


PavithraRamachandran · 2022-02-22T11:44:29Z

Hi @jl982, we are currently working on executor side broadcast. Would it be possible to discuss this with you in more detail? Could you share your email address?

jl982 · 2022-02-23T16:42:04Z

@PavithraRamachandran Great to hear that. I haven’t looked at this code in a while, but I’m happy to help out however I can in this thread

iRakson · 2022-02-25T11:04:06Z

@jl982 When we incorporated this code into our fork, we got the desired results with one executor. With multiple executors, however, executor side broadcast performs worse than sort merge join itself.
You also mentioned a slight degradation with multiple executors. Did you find a solution for that?

jl982 · 2022-02-27T05:01:18Z

@iRakson Right, I do remember seeing performance degradation with more executors. Unfortunately, I never had time to investigate the cause. I recommend that you look into the logs and work out how much time was taken by the broadcast, receive, and hashmap construction phases versus the actual join. I suspect that in the multi-executor case, the current code has some additional overhead in the broadcast phase that needs to be addressed.
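One way to get that breakdown is to accumulate wall-clock time per phase and compare the shares. The sketch below is generic Python rather than anything in this PR: `PhaseTimer` and the phase names are hypothetical, and in practice the same numbers might come from log timestamps instead.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class PhaseTimer:
    """Accumulates wall-clock seconds per named phase (hypothetical helper)."""

    def __init__(self):
        self.totals = defaultdict(float)

    @contextmanager
    def phase(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Add the elapsed time even if the timed block raised.
            self.totals[name] += time.perf_counter() - start

    def report(self):
        # Map each phase to (seconds, share of the total measured time).
        total = sum(self.totals.values())
        return {
            name: (secs, secs / total if total else 0.0)
            for name, secs in self.totals.items()
        }


timer = PhaseTimer()
with timer.phase("broadcast"):
    sum(range(100_000))   # placeholder for the broadcast/receive work
with timer.phase("hashmap"):
    dict.fromkeys(range(10_000))  # placeholder for hashmap construction
with timer.phase("join"):
    sorted(range(10_000))  # placeholder for the actual join

for name, (secs, share) in timer.report().items():
    print(f"{name}: {secs:.6f}s ({share:.1%})")
```

Running the same measurement with one executor and with several would show whether the broadcast share is what grows.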

jl982 added 2 commits September 17, 2020 16:39

[SPARK-17556] backport PR 15178 for executor side broadcast to spark …

06a6fed

…master

[SPARK-17556] refactor TorrentBroadcast as superclass, do not uninten…

ca2e60a

…tionally fetch broadcasted value to driver, and add more tests
