-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-16992][PYSPARK] Virtualenv for Pylint and pep8 in lint-python #14963
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
python/pylintrc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have actually added a lot of exclusion here to make the code pass:
- bad-super-call
- consider-iterating-dictionary
- consider-using-enumerate
- eval-used
- exec-used
- invalid-length-returned
- misplaced-comparison-constant
- raising-bad-type
- redefined-variable-type
- trailing-newlines
- trailing-whitespace
- ungrouped-imports
- unnecessary-pass
- unneeded-not
- wrong-import-order
- wrong-import-position
e0ad40a
to
bd82d8c
Compare
dev/lint-python
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: here I leave the virtualenv of the current environment. tell me if which solution you prefere:
- jump into a custom venv? (this proposal)
- fail if in a virtualenv
- only warn the user if already in a virtualenv
- no warning at all, the user knows what he/she does
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if we do jump to a custom virtual env for syntax checking we should at least restore the users existing virtual env after.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean with "after"? For the Sphinx build? I think we can do both in this virtual env, this would allow to build without having to do sudo pip install Sphinx-build.
And we can control precisely the version of sphinx in this requirements.txt
In anyway, this virtualenv will be left automatically at the end of the execution of this script.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using the virtualenv for the sphinx build sounds reasonable if we are already doing it for pep8. I was thinking that since we explicitly leave the current virtual env if the user is in one we should try and restore it after the script has been run - but if that already happens automatically thats great.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I'm thinking it may be better not to leave virtualenv if we are inside one. So we don't have this copy paste of the code for desactivating it.
What do you think?
Test build #64949 has finished for PR 14963 at commit
|
Test build #64950 has finished for PR 14963 at commit
|
dev/lint-python
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait even simpler[ish] - do we need to deactivate the old env or will activating the new env be enough (provided we store the old VIRTUAL_ENV path to restore to after we deactivate).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am open to suggestion. If we are already inside a venv we can expect the user knows what it does and don't desactivate it.
Thanks for working to reenable pylint (super important IMHO). Perhaps @davies could take a look at this approach? |
dev/requirements.txt
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to add the sphinx version here
Oh wait I've been a little confused by the PR title - pylint is already reenabled and was only temporary disabled awhile back - maybe less important but would make sense in the context of the other PRs you've put together. Could you maybe give this PR a clearer name like "Use virtualenv for pep8" rather than "Reenable Pylint"? |
Hum I don't see how it was reenabled... Where is it called? And I had many errors to ignore once I have reenabled it on the execution of lint-python. I'll update the title. At least, being inside a venv allow to precisely control the version of pep and pylint, this is kind of critical since errors changes from one version to another |
I mean we already have the pep8 version locked since we download a specific version with curl rather than depending on the system version. |
Yes but for pylint you have many dependencies to update as well (astroid,...). At least with a virtualenv, pip does it for us :) |
I also think virtualenv or similar is probably the right way to go for PySpark developer dependencies in the long term :) |
Great ! By the way, I am also working on virtualenv and wheel support for PySpark job deployment (see #14180) |
Great ! |
994bdfd
to
3f7db49
Compare
Test build #65145 has finished for PR 14963 at commit
|
Test build #65146 has finished for PR 14963 at commit
|
cc @davies ? |
Rebased + updated PR summary Hello, sorry to bother you, but if this patch gets merged, I can work on the pylint errors and submit new PR I had to add in the ignore list of pylint. If I reenable most of them, here is an extract of errors I get:
These are not big errors, but it would be quite easy to fix them and have pylint runing at each pullrequest. Thanks |
3f7db49
to
dff7b23
Compare
Test build #65481 has finished for PR 14963 at commit
|
So even though this passes tests - its showing up in the PR dashboard as not for me. It might be easier to get reviewers to take a look if it was listed correctly, if you merge in the latest master the tests will re-run or we can explicitly ask jenkins to retest this. |
What is the PR dashboard ? I usually rebase this patch one or twice a week, I'll do it tomorrow |
oh actually it seems like the PR dashboard updated its self (it lives at https://spark-prs.appspot.com/ and is sometimes used by reviewers to decide which PRs to look at). |
dff7b23
to
ca2dcb2
Compare
You and I agree, actually:
For development, PySpark will still use the anaconda environment if the user wishes so. Just for the |
I really think using Spark with Anaconda is a must have. Deploying jobs that runs inside a Conda environment is so fast and efficient. I really want to push for this pull request #14180 that also add anaconda support for executors (thanks to jeff's work). Actually before this PR I never tried Anaconda, and now I use it everyday on jupyter. Anaconda on the driver and executor make totally sense! |
Test build #66835 has finished for PR 14963 at commit
|
Test build #66838 has finished for PR 14963 at commit
|
93395fd
to
cd98471
Compare
Ho, I have these pylint errors on my ubuntu! Probably I did not rebased correctly. Fixed with new check ignore:
of course, once merged. i can work on fixing this pylint errors ! |
cd98471
to
132fec8
Compare
Test build #66836 has finished for PR 14963 at commit
|
Test build #66837 has finished for PR 14963 at commit
|
Test build #66848 has finished for PR 14963 at commit
|
Test build #66846 has finished for PR 14963 at commit
|
@stibbons however the problem is the virtualenv that gets created when a user is in a conda enviroment is partially broken. |
Also if it would be useful to have a quick chat off-line and then circle back to the PR with the result of our chat let me know - I'd really like to see us get something like this in so we can have better/more consistent linting. |
I agree, just email me :) |
Use a virtualenv for isolation and easy installation. This basically reverts 85a50a6 Might have been a solution to SPARK-9385. Lot of new test disabled. I propose to fix issues in various pull requests (obviously most 'import' order errors should be fixed by my other pull requests such as apache#14830 for documentation examples, which is part of the effort on code style described in apache#14567). Each subsequent pull request will fix one or more error and reenable the according pylint check. List of new disabled checks: - bad-super-call - consider-iterating-dictionary - consider-using-enumerate - eval-used - exec-used - invalid-length-returned - misplaced-comparison-constant - raising-bad-type - redefined-variable-type - trailing-newlines - trailing-whitespace - ungrouped-imports - unnecessary-pass - unneeded-not - wrong-import-order - wrong-import-position Signed-off-by: Gaetan Semet <[email protected]>
132fec8
to
4e27af2
Compare
Test build #70518 has finished for PR 14963 at commit
|
Any hope this patch might be integrated ? |
That's a reasonable question, I'll try and look at this some next week but if you've got a chance to look at/address the current comments on it that would be great. |
Test build #73451 has finished for PR 14963 at commit
|
So under the current jenkins set up this isn't running the Python style checks in Jenkins so it would be good to make sure these work in jenkins (can you update a python file)? |
Test build #76550 has finished for PR 14963 at commit
|
Hi, are you still working on this? |
Test build #79181 has finished for PR 14963 at commit
|
Use a virtualenv for isolation and easy installation of
pep8
andpylint
by thelint-python
linter script.This involves 2 changes in this script:
requirements.txt
. This will "freeze" the version of the tools used. We don't want the HEAD starting breaking because a new version of pylint has been posted on pypi which add new checks. sphinx version is also "frozen".This basically reverts 85a50a6
Might have been a solution to SPARK-9385 (see #7704)
Lot of new tests introduced by the newer version of pylint have been disabled. I propose to fix issues in various pull requests (obviously most 'import' order errors should be fixed by my other pull requests such as #14830 for documentation examples, which is part of the effort on code style described in #14567). Each subsequent pull request will fix one or more error and reenable the according pylint check.
List of new disabled tests:
How was this patch tested?
Manual execution of
lint-python
on my machine.