Skip to content

Conversation

@Victsm
Copy link
Contributor

@Victsm Victsm commented Aug 1, 2017

What changes were proposed in this pull request?

When multiple clients attempt to resolve artifacts via the "--packages" parameter, they could run into race condition when they each attempt to modify the dummy "org.apache.spark-spark-submit-parent-default.xml" file created in the default ivy cache dir.
This patch changes the behavior to encode timestamp in the dummy module descriptor so each client will operate on a different resolution file in the ivy cache dir. In addition, this patch changes the behavior of when and which resolution files are cleaned to prevent accumulation of resolution files in the default ivy cache dir.

How was this patch tested?

Unit test added in SparkSubmitUtilsSuite

@jiangxb1987
Copy link
Contributor

Update the title:

[SPARK-10878][Core] Fix race condition when multiple clients resolves artifacts at the same time

// Include timestamp in module name, so multiple clients resolving maven coordinates at the
// same time do not modify the same resolution file concurrently.
ModuleRevisionId.newInstance("org.apache.spark",
"spark-submit-parent-" + System.currentTimeMillis().toString,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "spark-submit-parent-" + System.currentTimeMillis().toString -> s"spark-submit-parent-${System.currentTimeMillis}"

}
}

test("test resolution files cleaned after resolving artifact") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test case doesn't test the senario we describe in the PR, that is, having multiple clients trying to resolve artifacts at the same time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually triggering the original race in a test could be pretty hard, so I'm not sure that we necessarily should block this fix on having an end-to-end integration test which could reproduce the original race. I think that the use of the UUID should be sufficient and therefore the only important thing to test is that we're still cleaning up the files properly (as is being done here).

Therefore I would welcome a re-submitted and cleaned-up version of this PR which addresses the other review comments (which I hope to merge soon if it's in good shape).

@jiangxb1987
Copy link
Contributor

cc @JoshRosen @cloud-fan

/** A nice function to use in tests as well. Values are dummy strings. */
def getModuleDescriptor: DefaultModuleDescriptor = DefaultModuleDescriptor.newDefaultInstance(
ModuleRevisionId.newInstance("org.apache.spark", "spark-submit-parent", "1.0"))
// Include timestamp in module name, so multiple clients resolving maven coordinates at the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: coordinates -> coordinate ?

@cloud-fan
Copy link
Contributor

ok to test

@SparkQA
Copy link

SparkQA commented Aug 7, 2017

Test build #80324 has finished for PR 18801 at commit 1ace5cc.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

retest this please

@SparkQA
Copy link

SparkQA commented Aug 7, 2017

Test build #80351 has finished for PR 18801 at commit 1ace5cc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jiangxb1987
Copy link
Contributor

Your test case don't reflect the change you made to support multiple clients resolves artifacts at the same time, could you add a new test case or manually test that? Thanks!

@Victsm
Copy link
Contributor Author

Victsm commented Aug 8, 2017

@jiangxb1987 @kiszk
Thanks for the review.
Will update the PR shortly.

@kiszk
Copy link
Member

kiszk commented Sep 8, 2017

@Victsm kindly ping

@kiszk
Copy link
Member

kiszk commented Oct 4, 2017

gentle ping @Victsm

1 similar comment
@kiszk
Copy link
Member

kiszk commented Nov 6, 2017

gentle ping @Victsm

// Include timestamp in module name, so multiple clients resolving maven coordinates at the
// same time do not modify the same resolution file concurrently.
ModuleRevisionId.newInstance("org.apache.spark",
"spark-submit-parent-" + System.currentTimeMillis().toString,
Copy link
Contributor

@vanzin vanzin Dec 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

System.currentTimeMillis() is not very unique. If you want uniqueness, use a UUID, or better yet, java.nio.file.Files.createTempFile.

EDIT: this is not a file, so go with UUID.

s"resolved-${mdId.getOrganisation}-${mdId.getName}-${mdId.getRevision}.properties")
)
currentResolutionFiles.foreach{ file =>
if (file.exists) file.delete
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.exists() is not needed before .delete(). Also, add parentheses.

new File(ivySettings.getDefaultCache,
s"resolved-${mdId.getOrganisation}-${mdId.getName}-${mdId.getRevision}.properties")
)
currentResolutionFiles.foreach{ file =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

space before {

@squito
Copy link
Contributor

squito commented Dec 29, 2017

+1 on using UUID and having a better test case

@kiszk
Copy link
Member

kiszk commented Jan 3, 2018

+1 for UUID

@kiszk
Copy link
Member

kiszk commented Jan 3, 2018

gentle ping @Victsm

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@gatorsmile
Copy link
Member

Anyone can take over this PR and submit it again?

@kiszk
Copy link
Member

kiszk commented May 4, 2018

I can take over this PR.

@gatorsmile
Copy link
Member

@kiszk Thank you!

asfgit pushed a commit that referenced this pull request May 10, 2018
… artifacts at the same time

## What changes were proposed in this pull request?

When multiple clients attempt to resolve artifacts via the `--packages` parameter, they could run into race condition when they each attempt to modify the dummy `org.apache.spark-spark-submit-parent-default.xml` file created in the default ivy cache dir.
This PR changes the behavior to encode UUID in the dummy module descriptor so each client will operate on a different resolution file in the ivy cache dir. In addition, this patch changes the behavior of when and which resolution files are cleaned to prevent accumulation of resolution files in the default ivy cache dir.

Since this PR is a successor of #18801, close #18801. Many codes were ported from #18801. **Many efforts were put here. I think this PR should credit to Victsm .**

## How was this patch tested?

added UT into `SparkSubmitUtilsSuite`

Author: Kazuaki Ishizaki <[email protected]>

Closes #21251 from kiszk/SPARK-10878.

(cherry picked from commit d3c426a)
Signed-off-by: Marcelo Vanzin <[email protected]>
@vanzin
Copy link
Contributor

vanzin commented May 10, 2018

@Victsm you can close this PR now.

@asfgit asfgit closed this in d3c426a May 10, 2018
robert3005 pushed a commit to palantir/spark that referenced this pull request Jun 24, 2018
… artifacts at the same time

## What changes were proposed in this pull request?

When multiple clients attempt to resolve artifacts via the `--packages` parameter, they could run into race condition when they each attempt to modify the dummy `org.apache.spark-spark-submit-parent-default.xml` file created in the default ivy cache dir.
This PR changes the behavior to encode UUID in the dummy module descriptor so each client will operate on a different resolution file in the ivy cache dir. In addition, this patch changes the behavior of when and which resolution files are cleaned to prevent accumulation of resolution files in the default ivy cache dir.

Since this PR is a successor of apache#18801, close apache#18801. Many codes were ported from apache#18801. **Many efforts were put here. I think this PR should credit to Victsm .**

## How was this patch tested?

added UT into `SparkSubmitUtilsSuite`

Author: Kazuaki Ishizaki <[email protected]>

Closes apache#21251 from kiszk/SPARK-10878.
otterc pushed a commit to linkedin/spark that referenced this pull request Mar 22, 2023
… artifacts at the same time

When multiple clients attempt to resolve artifacts via the `--packages` parameter, they could run into race condition when they each attempt to modify the dummy `org.apache.spark-spark-submit-parent-default.xml` file created in the default ivy cache dir.
This PR changes the behavior to encode UUID in the dummy module descriptor so each client will operate on a different resolution file in the ivy cache dir. In addition, this patch changes the behavior of when and which resolution files are cleaned to prevent accumulation of resolution files in the default ivy cache dir.

Since this PR is a successor of apache#18801, close apache#18801. Many codes were ported from apache#18801. **Many efforts were put here. I think this PR should credit to Victsm .**

added UT into `SparkSubmitUtilsSuite`

Author: Kazuaki Ishizaki <[email protected]>

Closes apache#21251 from kiszk/SPARK-10878.

(cherry picked from commit d3c426a)

RB=1313943
G=superfriends-reviewers
R=fli,mshen,yezhou,edlu
A=fli
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants