Commit 7892f88

nchammas authored and srowen committed
[SPARK-30879][DOCS] Refine workflow for building docs
### What changes were proposed in this pull request?

This PR makes the following refinements to the workflow for building docs:

* Install Python and Ruby consistently using pyenv and rbenv across both the docs README and the release Dockerfile.
* Pin the Python and Ruby versions we use.
* Pin all direct Python and Ruby dependency versions.
* Eliminate any use of `sudo pip`, which the Python community discourages, or `sudo gem`.

### Why are the changes needed?

This PR should increase the consistency and reproducibility of the doc-building process by managing Python and Ruby in a more consistent way, and by eliminating unused or outdated code.

Here's a possible example of an issue building the docs that would be addressed by the changes in this PR: #27459 (comment)

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Manual tests:

* I was able to build the Docker image successfully, minus the final part about `RUN useradd`.
* I am unable to run `do-release-docker.sh` because I am not a committer and don't have the required GPG key.
* I built the docs locally and viewed them in the browser.

I think I need a committer to more fully test out these changes.

Closes #27534 from nchammas/SPARK-30731-building-docs.

Authored-by: Nicholas Chammas <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
1 parent 4a64901 commit 7892f88

4 files changed: +72 -37 lines changed

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -18,6 +18,8 @@
 .idea_modules/
 .project
 .pydevproject
+.python-version
+.ruby-version
 .scala_dependencies
 .settings
 /lib/

dev/create-release/do-release-docker.sh

Lines changed: 1 addition & 1 deletion
@@ -96,7 +96,7 @@ fcreate_secure "$GPG_KEY_FILE"
 $GPG --export-secret-key --armor "$GPG_KEY" > "$GPG_KEY_FILE"

 run_silent "Building spark-rm image with tag $IMGTAG..." "docker-build.log" \
-  docker build -t "spark-rm:$IMGTAG" --build-arg UID=$UID "$SELF/spark-rm"
+  docker build --no-cache -t "spark-rm:$IMGTAG" --build-arg UID=$UID "$SELF/spark-rm"

 # Write the release information to a file with environment variables to be used when running the
 # image.

dev/create-release/spark-rm/Dockerfile

Lines changed: 32 additions & 29 deletions
@@ -20,9 +20,9 @@
 # Includes:
 # * Java 8
 # * Ivy
-# * Python (2.7.15/3.6.7)
+# * Python 3.7
+# * Ruby 2.7
 # * R-base/R-base-dev (3.6.1)
-# * Ruby 2.3 build utilities

 FROM ubuntu:18.04

@@ -33,15 +33,11 @@ ENV DEBCONF_NONINTERACTIVE_SEEN true
 # These arguments are just for reuse and not really meant to be customized.
 ARG APT_INSTALL="apt-get install --no-install-recommends -y"

-ARG BASE_PIP_PKGS="setuptools wheel"
-ARG PIP_PKGS="pyopenssl numpy sphinx"
+ARG PIP_PKGS="sphinx==2.3.1 mkdocs==1.0.4 numpy==1.18.1"
+ARG GEM_PKGS="jekyll:4.0.0 jekyll-redirect-from:0.16.0 rouge:3.15.0"

 # Install extra needed repos and refresh.
 # - CRAN repo
-# - Ruby repo (for doc generation)
-#
-# This is all in a single "RUN" command so that if anything changes, "apt update" is run to fetch
-# the most current package versions (instead of potentially using old versions cached by docker).
 RUN apt-get clean && apt-get update && $APT_INSTALL gnupg ca-certificates && \
   echo 'deb https://cloud.r-project.org/bin/linux/ubuntu bionic-cran35/' >> /etc/apt/sources.list && \
   gpg --keyserver keyserver.ubuntu.com --recv-key E298A3A825C0D65DFD57CBB651716619E084DAB9 && \
@@ -50,36 +46,43 @@ RUN apt-get clean && apt-get update && $APT_INSTALL gnupg ca-certificates && \
   rm -rf /var/lib/apt/lists/* && \
   apt-get clean && \
   apt-get update && \
-  $APT_INSTALL software-properties-common && \
-  apt-add-repository -y ppa:brightbox/ruby-ng && \
-  apt-get update && \
   # Install openjdk 8.
   $APT_INSTALL openjdk-8-jdk && \
   update-alternatives --set java /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java && \
   # Install build / source control tools
   $APT_INSTALL curl wget git maven ivy subversion make gcc lsof libffi-dev \
-    pandoc pandoc-citeproc libssl-dev libcurl4-openssl-dev libxml2-dev && \
+    pandoc pandoc-citeproc libssl-dev libcurl4-openssl-dev libxml2-dev
+
+ENV PATH "$PATH:/root/.pyenv/bin:/root/.pyenv/shims"
+RUN curl -L https://github.com/pyenv/pyenv-installer/raw/dd3f7d0914c5b4a416ca71ffabdf2954f2021596/bin/pyenv-installer | bash
+RUN $APT_INSTALL libbz2-dev libreadline-dev libsqlite3-dev
+RUN pyenv install 3.7.6
+RUN pyenv global 3.7.6
+RUN python --version
+RUN pip install --upgrade pip
+RUN pip --version
+RUN pip install $PIP_PKGS
+
+ENV PATH "$PATH:/root/.rbenv/bin:/root/.rbenv/shims"
+RUN curl -fsSL https://github.com/rbenv/rbenv-installer/raw/108c12307621a0aa06f19799641848dde1987deb/bin/rbenv-installer | bash
+RUN rbenv install 2.7.0
+RUN rbenv global 2.7.0
+RUN ruby --version
+RUN $APT_INSTALL g++
+RUN gem --version
+RUN gem install --no-document $GEM_PKGS
+
+RUN \
   curl -sL https://deb.nodesource.com/setup_11.x | bash && \
-  $APT_INSTALL nodejs && \
-  # Install needed python packages. Use pip for installing packages (for consistency).
-  $APT_INSTALL libpython3-dev python3-pip && \
-  # Change default python version to python3.
-  update-alternatives --install /usr/bin/python python /usr/bin/python2.7 1 && \
-  update-alternatives --install /usr/bin/python python /usr/bin/python3.6 2 && \
-  update-alternatives --set python /usr/bin/python3.6 && \
-  pip3 install $BASE_PIP_PKGS && \
-  pip3 install $PIP_PKGS && \
-  # Install R packages and dependencies used when building.
-  # R depends on pandoc*, libssl (which are installed above).
+  $APT_INSTALL nodejs
+
+# Install R packages and dependencies used when building.
+# R depends on pandoc*, libssl (which are installed above).
+RUN \
   $APT_INSTALL r-base r-base-dev && \
   $APT_INSTALL texlive-latex-base texlive texlive-fonts-extra texinfo qpdf && \
   Rscript -e "install.packages(c('curl', 'xml2', 'httr', 'devtools', 'testthat', 'knitr', 'rmarkdown', 'roxygen2', 'e1071', 'survival'), repos='https://cloud.r-project.org/')" && \
-  Rscript -e "devtools::install_github('jimhester/lintr')" && \
-  # Install tools needed to build the documentation.
-  $APT_INSTALL ruby2.3 ruby2.3-dev mkdocs && \
-  gem install jekyll --no-rdoc --no-ri -v 3.8.6 && \
-  gem install jekyll-redirect-from -v 0.15.0 && \
-  gem install rouge
+  Rscript -e "devtools::install_github('jimhester/lintr')"

 WORKDIR /opt/spark-rm/output
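
The new Dockerfile steps such as `RUN python --version` and `RUN ruby --version` only print the installed versions into the build log. A minimal sketch, assuming one wanted the build to fail on a mismatch with the pinned versions instead; `version_matches` is a hypothetical helper and not part of this commit:

```sh
# Hypothetical helper (not part of the commit): succeed if the version string
# $1 appears as a complete version token in the tool output $2.
version_matches() {
  expected=$1
  actual=$2
  case " $actual" in
    # The version ends the output, e.g. "Python 3.7.6".
    *" $expected") return 0 ;;
    # The version is followed by a non-version character, e.g. "ruby 2.7.0p0 (...)".
    *" $expected"[!0-9.]*) return 0 ;;
  esac
  return 1
}

# Possible use inside the image (assumes python and ruby are on PATH):
#   version_matches 3.7.6 "$(python --version 2>&1)" || exit 1
#   version_matches 2.7.0 "$(ruby --version)" || exit 1
```

The match is deliberately strict about what follows the version, so that a pin of `3.7.6` does not accept `3.7.60` and a pin of `3.7` does not accept `3.7.6`.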

docs/README.md

Lines changed: 37 additions & 7 deletions
@@ -31,19 +31,49 @@ whichever version of Spark you currently have checked out of revision control.
 The Spark documentation build uses a number of tools to build HTML docs and API docs in Scala, Java,
 Python, R and SQL.

-You need to have [Ruby](https://www.ruby-lang.org/en/documentation/installation/) and
-[Python](https://docs.python.org/2/using/unix.html#getting-and-installing-the-latest-version-of-python)
-installed. Also install the following libraries:
+You need to have Ruby 2 (preferably Ruby 2.6+) and Python 3 (preferably Python 3.7+) installed.
+
+You'll also need to install the following libraries:
+
+```sh
+gem install jekyll:4.0.0 jekyll-redirect-from:0.16.0 rouge:3.15.0
+```
+
+### Using rbenv and pyenv
+
+A handy way to install and manage various versions of Ruby and Python is with [`rbenv`] and [`pyenv`].
+
+[`rbenv`]: https://github.com/rbenv/rbenv
+[`pyenv`]: https://github.com/pyenv/pyenv
+
+On macOS you can install them with Homebrew:

 ```sh
-$ sudo gem install jekyll jekyll-redirect-from rouge
+brew install rbenv pyenv
 ```

-Note: If you are on a system with both Ruby 1.9 and Ruby 2.0 you may need to replace gem with gem2.0.
+To activate them, you'll need to run these commands or add them to the end of your `.bash_profile`:
+
+```sh
+eval "$(rbenv init -)"
+eval "$(pyenv init -)"
+```
+
+You can now use them to install specific versions of Ruby and Python and associate them with
+the Spark home directory. Whenever you navigate to this directory or any of its subdirectories, these versions of Ruby and Python will be automatically activated.
+
+```sh
+rbenv install 2.7.0
+pyenv install 3.7.6
+
+cd /path/to/spark/root
+rbenv local 2.7.0
+pyenv local 3.7.6
+```

 ### R Documentation

-If you'd like to generate R documentation, you'll need to [install Pandoc](https://pandoc.org/installing.html)
+If you'd like to generate R documentation, you'll need to install R, [install Pandoc](https://pandoc.org/installing.html),
 and install these libraries:

 ```sh
@@ -58,7 +88,7 @@ Note: Other versions of roxygen2 might work in SparkR documentation generation b
 To generate API docs for any language, you'll need to install these libraries:

 ```sh
-$ sudo pip install sphinx mkdocs numpy
+pip install sphinx==2.3.1 mkdocs==1.0.4 numpy==1.18.1
 ```

 ## Generating the Documentation HTML
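
The README text added here says the pinned Ruby and Python versions apply in the Spark directory and any of its subdirectories. That works because `rbenv local` and `pyenv local` record the version in `.ruby-version` and `.python-version` files (the two names this commit adds to `.gitignore`), and the tools look for those files in the current directory and then in each parent. A simplified sketch of that lookup, using a hypothetical helper `find_pinned_version`; the real tools also take environment variables and the global version into account:

```sh
# Hypothetical helper (an illustration, not rbenv/pyenv code): print the version
# pinned for directory $1 by looking for the file named $2 in that directory
# and then in each of its parents.
find_pinned_version() {
  # Resolve the starting directory to an absolute path.
  _dir=$(cd "$1" && pwd) || return 1
  while :; do
    if [ -f "$_dir/$2" ]; then
      cat "$_dir/$2"
      return 0
    fi
    # Stop once the filesystem root has been checked.
    if [ "$_dir" = "/" ]; then
      return 1
    fi
    _dir=$(dirname "$_dir")
  done
}

# Possible use from anywhere inside the Spark checkout:
#   find_pinned_version . .ruby-version
#   find_pinned_version . .python-version
```

After running `rbenv local 2.7.0` in the Spark root, a lookup started from a subdirectory such as `docs/` would find the same `.ruby-version` file.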
