Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
122 changes: 122 additions & 0 deletions .github/workflows/validate-and-execute-notebooks.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,122 @@
name: Smoke Tests for Notebooks

on:
pull_request:
types: [opened, synchronize, reopened]
paths:
- "notebooks/**/*.ipynb"
- ".github/workflows/validate-and-execute-notebooks.yml"
push:
branches: [ main ]
paths:
- "notebooks/**/*.ipynb"
- ".github/workflows/validate-and-execute-notebooks.yml"

permissions:
contents: read

jobs:
validate_tests:
runs-on: ubuntu-latest
strategy:
matrix:
# Set the notebooks to validate, wildcards are allowed
notebooks_to_validate: ["notebooks/**/*.ipynb"]
steps:
- uses: actions/checkout@v4

- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: "3.12"
cache: pip

- name: Install Testing Tools
run: |
pip install nbformat ipykernel
ipython kernel install --name "python3" --user

- name: Validate Notebooks
run: |
set -ux

shopt -s globstar nullglob

for FILE in ${{ matrix.notebooks_to_validate }}; do
if [ -f "$FILE" ]; then
echo "Validating notebook '$FILE'..."

FILE="$FILE" python - <<'PY'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of having a python script in line within the action. As follow up work could we split this out into it's own file. I really like the nbformat.validate() check here.

import os, sys, nbformat

f = os.environ['FILE']

n = nbformat.read(f, nbformat.NO_CONVERT)

# Validate the notebook
nbformat.validate(n)

has_parameters_cell = False

for cell in n.cells:
if cell.cell_type == 'code':
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also add s check to ensure that the notebook has atleast one parameter tag. This will make the notebook testable

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added!


# Fail test if notebook has any code cell with outputs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For these checks here, I wonder if it would be cleaner to use https://pypi.org/project/nb-clean/ instead of writing our own script to do it.

if cell.outputs:
print(f'Code cell in notebook {f} has outputs')
sys.exit(1)

# Fail test if notebook has any code cell with execution count
if cell.execution_count:
print(f'Code cell in notebook {f} has execution count')
sys.exit(1)

# Check for code cells tagged with 'parameters'
if 'tags' in cell.metadata and 'parameters' in cell.metadata.tags:
has_parameters_cell = True

# Fail test if notebook doesn't have any code cell tagged with 'parameters'
if not has_parameters_cell:
print(f'Notebook {f} does not have any cell tagged with \'parameters\'')
sys.exit(1)
PY

fi
done

execute_tests:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we run execute only if validate_tests pass? To do so
execute_tests:
needs: validate_tests

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, changed.

needs: validate_tests
runs-on: ubuntu-latest
strategy:
matrix:
# Set the notebooks to execute
notebook_to_execute: ["notebooks/use-cases/document-conversion-standard.ipynb"]

# Set the files use in each notebook execution
file_to_use: ["https://raw.githubusercontent.com/py-pdf/sample-files/refs/heads/main/001-trivial/minimal-document.pdf"]
steps:
- uses: actions/checkout@v4

- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: "3.12"
cache: pip

- name: Install Testing Tools
run: |
pip install papermill ipykernel
ipython kernel install --name "python3" --user

- name: Execute Notebooks
run: |
set -ux

NOTEBOOK="${{ matrix.notebook_to_execute }}"
FILE="${{ matrix.file_to_use }}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great to have something in CI to validate notebook. As a long term solution, I plan to create pytests and call them from workflows. This will lets us scale and maintain as the repo grows.


echo "Executing notebook '$NOTEBOOK' with file '$FILE'..."

papermill $NOTEBOOK $NOTEBOOK.tmp.ipynb -b $(echo -n "files: [\"$FILE\"]" | base64 -w 0)

echo "✓ Notebook $NOTEBOOK executed successfully"