-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-30879][DOCS] Refine workflow for building docs #27534
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…0731-building-docs
|
|
||
| # Image for building Spark releases. Based on Ubuntu 18.04. | ||
| # | ||
| # Includes: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dongjoon-hyun and @wangyum might better to review.
|
Test build #118187 has finished for PR 27534 at commit
|
|
Should I break up this PR to make it easier to review? (Edit: Done.) e.g.
Asking just in case the silence from other reviewers is because this PR is too big. If y'all just haven't gotten around to taking a look, no worries. I'm being proactive. |
|
I've split off the Makefile and Mkdocs changes and updated my previous comment with links to the new PRs. |
|
Test build #118652 has finished for PR 27534 at commit
|
|
Retest this please. |
|
Is it possible for contributors to re-trigger the GitHub workflows for a certain commit? I'm not seeing how to do that. |
|
FWIW, I don't think Affected Version really matters much for an improvement. You can set it to say it would be a valid improvement as of a version -- maybe it improves a thing that's only present from 2.4.0 onwards or something. But for most cases it probably 'affects' (could be applied to) lots of versions, and that isn't as interesting as in the case of a bug. Not required unless it's really important clarify, doesn't hurt in most cases either. |
|
Just to be clear, any objections to merging? |
|
I don't think any committers have tried working with the proposed build instructions and/or Docker definition. Perhaps a committer could try things out to make sure it works for them? |
|
Jenkins retest this please |
|
Test build #119425 has finished for PR 27534 at commit
|
docs/README.md
Outdated
| You need to have [Ruby](https://www.ruby-lang.org/en/documentation/installation/) and | ||
| [Python](https://docs.python.org/2/using/unix.html#getting-and-installing-the-latest-version-of-python) | ||
| installed. Also install the following libraries: | ||
| You need to have Ruby 2 and Python 3 installed. A handy way to install and manage various versions of Ruby and Python is with [`rbenv`] and [`pyenv`]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we revert these doc changes in the documentation? Otherwise, I am good with other changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or alternatively, we can have one section to describe use rbenv and pyenv separately .. that works to me too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved them to their own sub-section. If we think it's distracting / not helpful, I can remove it.
|
Test build #119478 has finished for PR 27534 at commit
|
| pandoc pandoc-citeproc libssl-dev libcurl4-openssl-dev libxml2-dev | ||
|
|
||
| ENV PATH "$PATH:/root/.pyenv/bin:/root/.pyenv/shims" | ||
| RUN curl -L https://github.com/pyenv/pyenv-installer/raw/dd3f7d0914c5b4a416ca71ffabdf2954f2021596/bin/pyenv-installer | bash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I suspect it was not tested due to the limitation described in the PR description:
Generating SQL API Markdown files.
20/03/31 06:41:42 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Traceback (most recent call last):
File "/opt/spark-rm/output/spark/sql/gen-sql-api-docs.py", line 21, in <module>
from pyspark.java_gateway import launch_gateway
File "/opt/spark-rm/output/spark/python/lib/pyspark.zip/pyspark/__init__.py", line 51, in <module>
File "/opt/spark-rm/output/spark/python/lib/pyspark.zip/pyspark/context.py", line 22, in <module>
ImportError: No module named threading
log4j:WARN No appenders could be found for logger (org.apache.spark.util.ShutdownHookManager).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.Seems the installed Python is weird. threading is the standard Python library that has existed from Python 2 to Python 3, but seems not existent with the Python installed here.
Let me revert this to make RC preparation easier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, that's strange, because I was able to build enough of the Dockerfile to get past this point, and I did not hit this problem.
What I couldn't test was to run everything via do-release-docker.sh.
|
We can manually test it by changing the The release script doesn't work anymore after this patch because we use a non-standard way to install python/ruby (download a bash script and run it):
|
|
Reverted at 4d4c3e7. |
|
BTW, are we using the release script in cc @rxin since he is a release manager. |
|
Yea we should use the script in branch-3.0 so this doesn't block the RC. But we should still revert this PR as it doesn't work. |
|
Sure~ Of course for reverting this PR. |
|
I think my wording made some miscommunication here:
I simply meant future RCs :-). |
This is strange since I tested this successfully via What I couldn't do was test building the image via I think the reason my test didn't see these issues is because the I guess the moral of the story is that I couldn't test I should have more carefully tried to duplicate the Sorry. |
|
I can try taking another stab at this change for the future, now that I have a better idea of how to test it. However, given that this change was met with some resistance to begin with, I won't try to fix it unless there is explicit committer interest. |
|
This PR does have a good point to fix the dependency versions so that the script is more robust. I'm happy to see a working version of it. |
|
We can pin the version for release related ones; however, I doubt if we should do that for others e.g., CI, documentation. It's being discussed at #27928 |
Split from apache#27534. ### What changes were proposed in this pull request? This PR updates a deprecated Mkdocs option to use the new name. ### Why are the changes needed? This change will prevent the docs from failing to build when we update to a version of Mkdocs that no longer supports the deprecated option. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? I built the docs locally and reviewed them in my browser. Closes apache#27626 from nchammas/SPARK-30731-mkdocs-dep-opt. Authored-by: Nicholas Chammas <[email protected]> Signed-off-by: HyukjinKwon <[email protected]>
Split off from apache#27534. ### What changes were proposed in this pull request? This PR deletes some unused cruft from the Sphinx Makefiles. ### Why are the changes needed? To remove dead code and the associated maintenance burden. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Tested locally by building the Python docs and reviewing them in my browser. Closes apache#27625 from nchammas/SPARK-30731-makefile-cleanup. Lead-authored-by: Nicholas Chammas <[email protected]> Co-authored-by: Nicholas Chammas <[email protected]> Signed-off-by: HyukjinKwon <[email protected]>
### What changes were proposed in this pull request? This PR makes the following refinements to the workflow for building docs: * Install Python and Ruby consistently using pyenv and rbenv across both the docs README and the release Dockerfile. * Pin the Python and Ruby versions we use. * Pin all direct Python and Ruby dependency versions. * Eliminate any use of `sudo pip`, which the Python community discourages, or `sudo gem`. ### Why are the changes needed? This PR should increase the consistency and reproducibility of the doc-building process by managing Python and Ruby in a more consistent way, and by eliminating unused or outdated code. Here's a possible example of an issue building the docs that would be addressed by the changes in this PR: apache#27459 (comment) ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Manual tests: * I was able to build the Docker image successfully, minus the final part about `RUN useradd`. * I am unable to run `do-release-docker.sh` because I am not a committer and don't have the required GPG key. * I built the docs locally and viewed them in the browser. I think I need a committer to more fully test out these changes. Closes apache#27534 from nchammas/SPARK-30731-building-docs. Authored-by: Nicholas Chammas <[email protected]> Signed-off-by: Sean Owen <[email protected]>
What changes were proposed in this pull request?
This PR makes the following refinements to the workflow for building docs:
sudo pip, which the Python community discourages, orsudo gem.Why are the changes needed?
This PR should increase the consistency and reproducibility of the doc-building process by managing Python and Ruby in a more consistent way, and by eliminating unused or outdated code.
Here's a possible example of an issue building the docs that would be addressed by the changes in this PR: #27459 (comment)
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Manual tests:
RUN useradd.do-release-docker.shbecause I am not a committer and don't have the required GPG key.I think I need a committer to more fully test out these changes.