[SPARK-8406] [SQL] Adding UUID to output file name to avoid accidental overwriting #6864
```diff
@@ -53,9 +53,10 @@ class AppendingTextOutputFormat(outputFile: Path) extends TextOutputFormat[NullW
   numberFormat.setGroupingUsed(false)

   override def getDefaultWorkFile(context: TaskAttemptContext, extension: String): Path = {
+    val uniqueWriteJobId = context.getConfiguration.get("spark.sql.sources.writeJobUUID")
     val split = context.getTaskAttemptID.getTaskID.getId
     val name = FileOutputFormat.getOutputName(context)
-    new Path(outputFile, s"$name-${numberFormat.format(split)}-${UUID.randomUUID()}")
+    new Path(outputFile, s"$name-${numberFormat.format(split)}-$uniqueWriteJobId")
   }
 }
```
```diff
@@ -156,6 +157,7 @@ class CommitFailureTestRelation(
     context: TaskAttemptContext): OutputWriter = {
   new SimpleTextOutputWriter(path, context) {
     override def close(): Unit = {
+      super.close()
       sys.error("Intentional task commitment failure for testing purpose.")
     }
   }
```

Review discussion on the added `super.close()` call:

- "Do we need this?"
- "I was thinking about S3, where a file is not actually created before the output stream is closed (the […]"
- "I decided to leave it there. The writer should be closed anyway. Otherwise it's leaked."
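The reviewers' point is that the writer must be closed *before* the simulated commit failure is raised, otherwise the underlying stream leaks. An illustrative Python sketch of that ordering (hypothetical class names, not the Scala test code):

```python
class SimpleWriter:
    """Stand-in for the test's output writer."""
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True  # releases the underlying stream


class CommitFailureWriter(SimpleWriter):
    def close(self):
        super().close()  # mirrors the added super.close() call
        # The failure is raised only after the writer is closed,
        # so the resource is not leaked even though the task fails.
        raise RuntimeError(
            "Intentional task commitment failure for testing purpose.")


w = CommitFailureWriter()
try:
    w.close()
except RuntimeError:
    pass
assert w.closed  # the underlying writer was still closed
```

Without the `super().close()` (resp. `super.close()` in the Scala patch), the error would fire first and the stream would stay open for the lifetime of the process.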
Review discussion on the test's local-mode parallelism:

- "Maybe we should still use `local[*]`?"
- "I think we'd better use a fixed number here to improve determinism (if we use 32 from the beginning, the ORC bug would be much easier to reproduce)."
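The determinism argument can be sketched concretely: `local[*]` sizes the thread pool from the machine's core count, so the task layout (and hence whether a concurrency bug like the ORC one reproduces) varies from machine to machine, while `local[32]` pins it everywhere. A rough Python illustration of how the two master strings resolve (assumed simplification of Spark's behavior, not its actual parser):

```python
import os
import re

def resolve_local_threads(master: str) -> int:
    # "local[*]"  -> one thread per available core (machine-dependent)
    # "local[N]"  -> exactly N threads (identical on every machine)
    m = re.fullmatch(r"local\[(\*|\d+)\]", master)
    if m is None:
        raise ValueError(f"not a local master string: {master}")
    return os.cpu_count() if m.group(1) == "*" else int(m.group(1))

assert resolve_local_threads("local[32]") == 32  # same on every box
```

A fixed `local[32]` thus keeps the test's scheduling environment constant across CI machines and developer laptops.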