-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-19537] Move pendingPartitions to ShuffleMapStage. #16876
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The pendingPartitions instance variable should be moved to ShuffleMapStage, because it is only used by ShuffleMapStages. This change is purely refactoring and does not change functionality.
|
Makes good sense to me. |
|
Thanks for the quick look @markhamstra ! |
|
You're welcome -- but do be aware that I'm going to be extremely busy with non-Spark stuff for at least the next week, so for awhile my Spark code reviews are likely to be more cursory than they should be. |
|
It's great to have pendingPartitions in ShuffleMapStage. |
|
Test build #72662 has finished for PR 16876 at commit
|
| stage.pendingPartitions ++= tasks.map(_.partitionId) | ||
| logDebug("New pending partitions: " + stage.pendingPartitions) | ||
| logInfo(s"Submitting ${tasks.size} missing tasks (for partitions " + | ||
| s"${tasks.map(_.partitionId)}) from $stage (${stage.rdd})") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think including all the partitions could make this log line a little too long for an info message (what if there are 1k or 10k tasks). I understand not wanting to stick it in a debug message (you often dont' realize you want debug logging on until after something has failed), but I think this will be too much for regular runs.
Would it still be useful if you truncate to the first 25 tasks? tasks.take(25).map(...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh good point. 25 seemed like maybe too many for an info-level message so I split the difference with your original comment and did 15. I could also just leave the original logging if you think that's better.
squito
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
other than the logging issue, looks good. definitely a useful cleanup.
|
lgtm when tests pass |
|
Test build #72684 has finished for PR 16876 at commit
|
|
@kayousterhout
Is my understanding correct ? It'd great if you can help rectify my wrong points. |
|
@jinxing64 I wrote a long comment on your other PR to attempt to explain (my current understanding of) these issues! |
|
Thanks all for the review; merged this into master |
The pendingPartitions instance variable should be moved to ShuffleMapStage, because it is only used by ShuffleMapStages. This change is purely refactoring and does not change functionality. I fixed this in an attempt to clarify some of the discussion around apache#16620, which I was having trouble reasoning about. I stole the helpful comment Imran wrote for pendingPartitions and used it here. cc squito markhamstra jinxing64 Author: Kay Ousterhout <[email protected]> Closes apache#16876 from kayousterhout/SPARK-19537.
The pendingPartitions instance variable should be moved to ShuffleMapStage, because it is only used by ShuffleMapStages. This change is purely refactoring and does not change functionality. I fixed this in an attempt to clarify some of the discussion around apache#16620, which I was having trouble reasoning about. I stole the helpful comment Imran wrote for pendingPartitions and used it here. cc squito markhamstra jinxing64 Author: Kay Ousterhout <[email protected]> Closes apache#16876 from kayousterhout/SPARK-19537.
The pendingPartitions instance variable should be moved to ShuffleMapStage,
because it is only used by ShuffleMapStages. This change is purely refactoring
and does not change functionality.
I fixed this in an attempt to clarify some of the discussion around #16620, which I was having trouble reasoning about. I stole the helpful comment Imran wrote for pendingPartitions and used it here.
cc @squito @markhamstra @jinxing64