Skip to content

Conversation

@nchammas
Copy link
Contributor

This issue was uncovered after this discussion.

Don't change the working directory on the user. This breaks relative paths the user may pass in, e.g., for the SSH identity file.

./ec2/spark-ec2 -i ../my.pem

This patch will preserve the user's current working directory and allow calls like the one above to work.

@SparkQA
Copy link

SparkQA commented Oct 29, 2014

Test build #22420 has started for PR 2988 at commit ce071fc.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Oct 29, 2014

Test build #22420 has finished for PR 2988 at commit ce071fc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class DeferredObjectAdapter(oi: ObjectInspector) extends DeferredObject

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22420/
Test PASSed.

@nchammas
Copy link
Contributor Author

Phantom new classes again? 😠 I'll look into this tomorrow. Thought it was a resolved issue...

@SparkQA
Copy link

SparkQA commented Oct 29, 2014

Test build #22464 has started for PR 2988 at commit 77871a2.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Oct 29, 2014

Test build #22465 has started for PR 2988 at commit bcdf6a5.

  • This patch merges cleanly.

@nchammas nchammas changed the title [EC2] Don't change working dir on user [SPARK-4137] [EC2] Don't change working dir on user Oct 29, 2014
@SparkQA
Copy link

SparkQA commented Oct 29, 2014

Test build #22464 has finished for PR 2988 at commit 77871a2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22464/
Test PASSed.

@nchammas
Copy link
Contributor Author

cc @shivaram @JoshRosen

@shivaram
Copy link
Contributor

@nchammas does the template replacement code still work correctly ? I am referring to the "deploy.generic" dir that we pass from https://github.com/apache/spark/blob/master/ec2/spark_ec2.py#L589 to https://github.com/apache/spark/blob/master/ec2/spark_ec2.py#L726

@SparkQA
Copy link

SparkQA commented Oct 29, 2014

Test build #22465 has finished for PR 2988 at commit bcdf6a5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22465/
Test PASSed.

@nchammas
Copy link
Contributor Author

@shivaram Oh. Glad I pinged you. :)

In my brief testing I didn't allow launch to go all the way through. This is obviously broken.

Will update this PR with an alternate approach that lets the CWD be ec2/ as it was before.

@nchammas
Copy link
Contributor Author

Actually, we have a few options here for spark-ec2 and the underlying spark_ec2.py script.

  1. We leave the script as-is and just document the fact that relative paths won't work.
  2. We update the script to preserve the user's working directory and qualify any paths relative to that directory.
  3. We update the script so that all relative paths passed in are translated to absolute paths.

Comments:

  • Option 1 is the simplest but is a bit user-unfriendly.
  • Option 3 is tricky since we'd have to do the translation before the Python script is invoked, and this is annoying to do in a cross-platform manner; alternately, we can pass in both the user's CWD and the untranslated paths so that Python can do the translation in a cross-platform manner.
  • Option 2 seems feasible, but we'd have to track down all the places where we handle files.

So far, I see that any code that handles the following files would be affected if we go with either Option 2 or 3:

This turned out to be a bit more involved than I expected...

@shivaram What would you recommend?

@shivaram
Copy link
Contributor

Thanks for taking a closer look ! I don't know much python, but can't we get the directory that the script is in using something like __file__ and prefix that to "deploy.generic" ?

From what I can see there are two kinds of paths we have -- the first are paths supplied by the user (like the SSH identity file) where we can preserve the working directory and not change the path

The second are files relative to spark_ec2.py 's location (like deploy.generic). Making these absolute paths seems like a good idea to me.

@nchammas
Copy link
Contributor Author

So basically you're saying go with option 2, right?

From what I can see, deploy.generic may be the only file we need to fix the path for. Is that right? Maybe it's not such a big deal then.

In any case, I'll give it a shot and report back.

@SparkQA
Copy link

SparkQA commented Oct 30, 2014

Test build #22501 has started for PR 2988 at commit 752f958.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Oct 30, 2014

Test build #22501 has finished for PR 2988 at commit 752f958.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22501/
Test PASSed.

@nchammas
Copy link
Contributor Author

@shivaram I took your suggestion and tested to make sure spark-ec2 still creates a functioning EC2 cluster.

This is ready for another review.

ec2/spark_ec2.py Outdated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we change this style given that other functions in this file have comments on top ? Any thoughts @JoshRosen ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I thought I'd make this the first change toward having all the function descriptions be in docstrings, but for consistency's sake you're right--it should be a comment on top.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JoshRosen Any comments on this? I'd be more than happy to convert it back to being a comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pwendell Perhaps you can comment on this. It's a minor issue, and I think we should be able to get this PR merged in for 1.2.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My suggestion would be to revert to the existing style for now and open a separate patch for the style fix (similar to the PEP-8 patches ?).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem. Will do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shivaram Done.

@shivaram
Copy link
Contributor

Functionality LGTM. I left a minor style question for @JoshRosen

@SparkQA
Copy link

SparkQA commented Nov 6, 2014

Test build #22977 has started for PR 2988 at commit fbc20c7.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Nov 6, 2014

Test build #22977 has finished for PR 2988 at commit fbc20c7.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22977/
Test FAILed.

@SparkQA
Copy link

SparkQA commented Nov 6, 2014

Test build #22981 has started for PR 2988 at commit f3850b5.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Nov 6, 2014

Test build #22981 has finished for PR 2988 at commit f3850b5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22981/
Test PASSed.

@shivaram
Copy link
Contributor

shivaram commented Nov 6, 2014

Thanks @nchammas -- Merged this and backported to 1.2

@asfgit asfgit closed this in db45f5a Nov 6, 2014
asfgit pushed a commit that referenced this pull request Nov 6, 2014
This issue was uncovered after [this discussion](https://issues.apache.org/jira/browse/SPARK-3398?focusedCommentId=14187471&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14187471).

Don't change the working directory on the user. This breaks relative paths the user may pass in, e.g., for the SSH identity file.

```
./ec2/spark-ec2 -i ../my.pem
```

This patch will preserve the user's current working directory and allow calls like the one above to work.

Author: Nicholas Chammas <[email protected]>

Closes #2988 from nchammas/spark-ec2-cwd and squashes the following commits:

f3850b5 [Nicholas Chammas] pep8 fix
fbc20c7 [Nicholas Chammas] revert to old commenting style
752f958 [Nicholas Chammas] specify deploy.generic path absolutely
bcdf6a5 [Nicholas Chammas] fix typo
77871a2 [Nicholas Chammas] add clarifying comment
ce071fc [Nicholas Chammas] don't change working dir

(cherry picked from commit db45f5a)
Signed-off-by: Shivaram Venkataraman <[email protected]>
@nchammas nchammas deleted the spark-ec2-cwd branch November 6, 2014 21:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants