Add pytest configuration and new GitHub Action for detecting changed files #36
Conversation
**Walkthrough**

Adds a composite GitHub Action to detect changed files and new workflows to validate and execute notebooks, removes a previously combined workflow, and adds pytest configuration, test utilities, and tests to validate changed Jupyter notebooks in CI.
**Sequence Diagram(s)**

```mermaid
sequenceDiagram
    participant GH as GitHub
    participant ValidateWF as validate-notebooks<br/>workflow
    participant ExecuteWF as execute-notebooks<br/>workflow
    participant Action as detect-changed-files<br/>action
    participant Pytest as pytest
    GH->>ValidateWF: Trigger on PR / push (notebooks/**/*.ipynb)
    ValidateWF->>Action: Request changed files for notebooks/**/*.ipynb
    Action->>Action: Inspect event type (PR / push / other)
    alt PR
        Action->>Action: Determine base/head SHAs → git diff (pattern)
    else Push
        Action->>Action: Use HEAD~1..HEAD → git diff (pattern)
    else Other
        Action->>Action: Use find/glob to list files matching pattern
    end
    Action-->>ValidateWF: Return all_changed_files, files_count, has_changes
    ValidateWF->>ValidateWF: Setup Python, install deps
    ValidateWF->>Pytest: Run notebook validation tests
    Pytest-->>ValidateWF: Return results (artifacts uploaded)
    GH->>ExecuteWF: (separate trigger) Execute notebooks when applicable
    ExecuteWF->>ExecuteWF: Setup Python, install papermill/ipykernel
    ExecuteWF->>ExecuteWF: Run papermill with base64 param
```

**Estimated code review effort**: 🎯 3 (Moderate) | ⏱️ ~30 minutes
**Pre-merge checks and finishing touches**: ✅ Passed checks (3 passed)
📜 **Recent review details**

- Configuration used: CodeRabbit UI
- Review profile: CHILL
- Plan: Pro
- 📒 Files selected for processing (1)
- 🚧 Files skipped from review as they are similar to previous changes (1)
- ⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
Actionable comments posted: 2
🧹 Nitpick comments (2)
pytest.ini (1)
**10-16**: Reconsider the `--disable-warnings` flag.

The `--disable-warnings` flag suppresses all pytest warnings, which could hide important deprecation warnings, compatibility issues, or other problems in dependencies. For a new testing setup, it's advisable to see warnings initially. Consider removing this flag or being more selective:

```diff
-addopts = --verbose --tb=short --strict-markers --disable-warnings --color=yes -ra
+addopts = --verbose --tb=short --strict-markers --color=yes -ra
```

Alternatively, if specific warnings are noisy, filter them selectively using `-W ignore::WarningCategory`.

tests/requirements.txt (1)
**1-9**: Consider tighter version constraints for stability.

Using only lower bounds (`>=`) without upper bounds can introduce breaking changes when dependencies release new major versions. For more predictable CI behavior, consider using version ranges or pinning exact versions. Example using compatible release constraints:

```diff
 # Testing dependencies for notebook validation
-pytest>=7.4.0
+pytest~=7.4.0
-nbformat>=5.7.0
+nbformat~=5.7.0
-jupyter-client>=8.0.0
+jupyter-client~=8.0.0
 # Optional: Enhanced reporting and utilities
-pytest-html>=3.2.0          # HTML test reports
+pytest-html~=3.2.0          # HTML test reports
-pytest-json-report>=1.5.0   # JSON test reports
+pytest-json-report~=1.5.0   # JSON test reports
-pytest-xdist>=3.3.0         # Parallel test execution
+pytest-xdist~=3.3.0         # Parallel test execution
```

The `~=` operator allows patch-level updates but prevents breaking major/minor version changes.
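To make that difference concrete, here is a toy sketch of what `~=X.Y.Z` accepts. This is not pip's actual resolver, just a minimal illustration of PEP 440 compatible-release semantics for three-component versions; the function name is mine:

```python
def compatible_release(installed: str, spec: str) -> bool:
    """Rough sketch of PEP 440 compatible-release matching for ~=X.Y.Z:
    the installed version must be >= the spec and share its major.minor."""
    inst = tuple(int(p) for p in installed.split("."))
    base = tuple(int(p) for p in spec.split("."))
    return inst[:2] == base[:2] and inst >= base

print(compatible_release("7.4.3", "7.4.0"))  # True: patch bump allowed
print(compatible_release("7.5.0", "7.4.0"))  # False: minor bump rejected
print(compatible_release("8.0.0", "7.4.0"))  # False: major bump rejected
```

Note that `~=7.4` (two components) would instead allow minor bumps within major version 7; the pinning level follows from how many components the spec carries.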
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
- `.github/actions/detect-changed-files/README.md` (1 hunks)
- `.github/actions/detect-changed-files/action.yml` (1 hunks)
- `.github/workflows/execute-notebooks.yml` (1 hunks)
- `.github/workflows/validate-and-execute-notebooks.yml` (0 hunks)
- `.github/workflows/validate-notebooks.yml` (1 hunks)
- `pytest.ini` (1 hunks)
- `tests/notebook_validators.py` (1 hunks)
- `tests/requirements.txt` (1 hunks)
- `tests/test_notebooks_validate.py` (1 hunks)
💤 Files with no reviewable changes (1)
- .github/workflows/validate-and-execute-notebooks.yml
🧰 Additional context used
🧬 Code graph analysis (1)
tests/test_notebooks_validate.py (1)
tests/notebook_validators.py (5)
`NotebookValidator` (12-72), `load_notebook` (20-33), `validate_no_outputs` (35-43), `validate_no_execution_counts` (45-53), `validate_parameters_cell_exists` (55-72)
🪛 LanguageTool
.github/actions/detect-changed-files/README.md
[uncategorized] ~41-~41: The official name of this software platform is spelled with a capital “H”.
Context: ...en| GitHub token for API access | ❌ |${{ github.token }}| |base-sha` | Base SHA for...
(GITHUB)
🪛 Ruff (0.14.1)
tests/test_notebooks_validate.py
34-35: try-except-pass detected, consider logging the exception
(S110)
34-34: Do not catch blind exception: Exception
(BLE001)
tests/notebook_validators.py
27-27: Consider moving this statement to an else block
(TRY300)
29-29: Avoid specifying long messages outside the exception class
(TRY003)
31-33: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (12)
.github/actions/detect-changed-files/README.md (1)
**1-101**: Excellent documentation!

The README is comprehensive, well-structured, and provides clear examples for various use cases. The feature descriptions, input/output documentation, and workflow examples will help users effectively utilize this action.
.github/workflows/execute-notebooks.yml (2)
**21-27**: Matrix structure is prepared for expansion.

The matrix strategy with single values for `notebook_to_execute` and `file_to_use` is valid and appears to be future-proofing for testing multiple notebooks or files. This is a good pattern for scalability.
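The matrix above feeds notebook paths and file URLs into a papermill run whose parameters travel base64-encoded. A minimal sketch of that encoding round-trip follows; the parameter name and payload are illustrative, not taken from the workflow, and papermill's `-b` flag accepts base64-encoded YAML (a JSON object is also valid YAML):

```python
import base64
import json

# Hypothetical parameters for a parameterized notebook run.
params = {"file_to_use": "https://example.com/data.pdf"}

# Encode as base64 text, the shape papermill's -b flag expects.
encoded = base64.b64encode(json.dumps(params).encode()).decode()

# Round-trip check: decoding recovers the original parameters.
decoded = json.loads(base64.b64decode(encoded))
print(decoded == params)  # True
```

Base64 sidesteps shell-quoting problems when parameter values contain spaces, quotes, or URLs, which is presumably why the workflow uses it.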
**42-53**: Papermill execution looks correct.

The base64-encoded parameter passing using the `-b` flag is appropriate for parameterized notebook execution. The `set -ux` makes unset variables fatal and traces each command, providing visibility into execution.

.github/actions/detect-changed-files/action.yml (2)
**44-74**: Git diff integration looks solid.

The action correctly handles both PR and push events with appropriate SHA comparison logic. The `--diff-filter=ACMRT` flag appropriately excludes deleted files, and the pathspec pattern handling should work correctly with Git's native glob support.
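The diff output from this step is later massaged into a JSON array with a `jq` filter. For readers less familiar with `jq`, here is a rough Python equivalent of that filter; the filter string comes from the action, while the helper name is mine:

```python
import json

# Mirrors the action's jq filter:
#   jq -R -s -c 'split("\n") | map(select(length > 0))'
# i.e. newline-separated file names -> compact JSON array, blanks dropped.
def to_json_array(raw: str) -> str:
    return json.dumps(
        [line for line in raw.split("\n") if line],
        separators=(",", ":"),
    )

print(to_json_array("a.ipynb\nsub/b.ipynb\n"))  # ["a.ipynb","sub/b.ipynb"]
```

The `select(length > 0)` clause (the `if line` test here) is what keeps the trailing newline from producing an empty entry.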
**99-114**: Output calculation and reporting looks good.

The JSON array handling with `jq`, file count calculation, and output setting to `GITHUB_OUTPUT` are all implemented correctly. The defensive `|| true` in the file printing loop ensures the action doesn't fail if the loop encounters issues.

.github/workflows/validate-notebooks.yml (3)
**22-30**: Checkout and file detection integration is correct.

The `fetch-depth: 0` is essential for the git-based change detection to work properly, and the action is invoked correctly with the appropriate notebook pattern.
**43-50**: Pytest integration with changed files looks good.

The workflow correctly passes the changed notebook list to pytest via the `CHANGED_NOTEBOOKS` environment variable. The test module will use this for dynamic test parameterization, enabling selective testing of only changed notebooks in CI.
**52-60**: Verify that `pytest.log` is generated.

The workflow uploads `pytest.log` as an artifact, but pytest doesn't generate this file by default. Check if pytest is configured to generate `pytest.log`. If not, either:

- Add logging configuration to `pytest.ini`:

```ini
addopts = --verbose --tb=short --strict-markers --color=yes -ra --log-file=pytest.log --log-file-level=INFO
```

- Or remove `pytest.log` from the artifact upload:

```diff
 path: |
   .pytest_cache/
-  pytest.log
```

tests/test_notebooks_validate.py (2)
**12-48**: Dynamic test generation pattern is well-implemented.

The `pytest_generate_tests` hook correctly handles both CI mode (with `CHANGED_NOTEBOOKS`) and local discovery mode. The fallback logic ensures tests run even if the environment variable is malformed or missing. Optionally, consider logging the exception for debugging:

```diff
 except (json.JSONDecodeError, Exception):
-    pass
+    import warnings
+    warnings.warn(f"Failed to parse CHANGED_NOTEBOOKS: {changed_notebooks_env}")
```

This would help diagnose issues with the JSON parsing, though the fallback behavior is already correct.
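A condensed sketch of the CI/local fallback described above: the environment variable name matches the workflow, while the helper name and the empty-list fallback (standing in for local notebook discovery via globbing) are illustrative:

```python
import json
import os

def changed_notebooks() -> list[str]:
    """Return notebook paths from CHANGED_NOTEBOOKS, or fall back."""
    raw = os.environ.get("CHANGED_NOTEBOOKS", "")
    try:
        parsed = json.loads(raw)
        if isinstance(parsed, list):
            return parsed
    except json.JSONDecodeError:
        pass  # malformed JSON: fall through to local discovery
    return []  # local mode would glob notebooks/**/*.ipynb instead

os.environ["CHANGED_NOTEBOOKS"] = '["notebooks/a.ipynb"]'
print(changed_notebooks())  # ['notebooks/a.ipynb']
```

The `isinstance` check guards against valid-but-wrong JSON (e.g. a bare string), which a plain `json.loads` call would happily accept.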
**51-89**: Test functions are clear and comprehensive.

The three test functions validate essential notebook properties: format validity, clean execution state, and parameterization support. Each test is focused on a single concern, making failures easy to diagnose.

The `if not notebook_path` checks at the start of each test are defensive but may be redundant since `pytest.parametrize` already handles empty lists. They don't cause issues and add safety, so keeping them is fine.

tests/notebook_validators.py (2)
**20-33**: Notebook loading logic is sound.

Using `nbformat.NO_CONVERT` is appropriate for validation to preserve the original notebook format. The exception handling correctly distinguishes between validation errors and other I/O errors. If you prefer to follow Ruff's style suggestions:

```diff
     def load_notebook(self) -> nbformat.NotebookNode:
         """Load and validate basic notebook format."""
         try:
             self.notebook = nbformat.read(
                 self.notebook_path, as_version=nbformat.NO_CONVERT
             )
             nbformat.validate(self.notebook)
-            return self.notebook
         except ValidationError as e:
             raise ValueError(f"Invalid notebook format: {e}") from e
         except Exception as e:
             raise ValueError(
                 f"Failed to read notebook {self.notebook_path}: {e}"
             ) from e
+        else:
+            return self.notebook
```

However, the current code is already clear and correct.
**55-72**: Parameters cell validation is well-implemented.

The method correctly checks code cells for the `parameters` tag in cell metadata, which is the standard approach for parameterized notebook execution with tools like Papermill.
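The tag convention the validator relies on can be shown with plain dicts standing in for nbformat cells; the helper function here is a simplified sketch, not the repository's actual implementation:

```python
# Papermill's convention: a code cell is "parameterized" when the string
# "parameters" appears in its metadata.tags list.
def has_parameters_cell(cells: list[dict]) -> bool:
    return any(
        cell.get("cell_type") == "code"
        and "parameters" in cell.get("metadata", {}).get("tags", [])
        for cell in cells
    )

cells = [
    {"cell_type": "markdown", "metadata": {}},
    {"cell_type": "code", "metadata": {"tags": ["parameters"]}},
]
print(has_parameters_cell(cells))  # True
```

Papermill injects its override values into a new cell immediately after the tagged one, so a notebook without the tag simply cannot be parameterized this way.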
```bash
if command -v find >/dev/null 2>&1; then
  # Use find for better pattern matching
  PATTERN="${{ inputs.files }}"
  # Extract directory prefix and file pattern
  if [[ "$PATTERN" == *"**"* ]]; then
    # Pattern has **: extract base directory and filename pattern
    BASE_DIR=$(echo "$PATTERN" | cut -d'/' -f1)
    FILE_PATTERN=$(echo "$PATTERN" | cut -d'/' -f2-)
    CHANGED_FILES=$(find "$BASE_DIR" -type f -name "$FILE_PATTERN" 2>/dev/null | sed 's|^\./||' | jq -R -s -c 'split("\n") | map(select(length > 0))')
  else
    # Simple pattern without **: use -path directly
    CHANGED_FILES=$(find . -path "./$PATTERN" -type f | sed 's|^\./||' | jq -R -s -c 'split("\n") | map(select(length > 0))')
  fi
```
Potential issue with `find` pattern matching for `**` globs.

The pattern extraction logic may not work correctly for glob patterns containing `**`. When the pattern is split, `FILE_PATTERN` retains directory separators (e.g., `**/*.ipynb`), but the `-name` flag only matches against the filename component, not the full path. For example, with pattern `notebooks/**/*.ipynb`:

- `BASE_DIR=notebooks`
- `FILE_PATTERN=**/*.ipynb`
- `find notebooks -type f -name "**/*.ipynb"` won't match files correctly
Apply this fix to extract just the file extension pattern:

```diff
 if [[ "$PATTERN" == *"**"* ]]; then
     # Pattern has **: extract base directory and filename pattern
     BASE_DIR=$(echo "$PATTERN" | cut -d'/' -f1)
-    FILE_PATTERN=$(echo "$PATTERN" | cut -d'/' -f2-)
-    CHANGED_FILES=$(find "$BASE_DIR" -type f -name "$FILE_PATTERN" 2>/dev/null | sed 's|^\./||' | jq -R -s -c 'split("\n") | map(select(length > 0))')
+    FILE_EXT=$(echo "$PATTERN" | grep -oP '\*\.\w+$' || echo "*")
+    CHANGED_FILES=$(find "$BASE_DIR" -type f -name "$FILE_EXT" 2>/dev/null | sed 's|^\./||' | jq -R -s -c 'split("\n") | map(select(length > 0))')
 else
```

This extracts the file extension pattern (e.g., `*.ipynb`) and uses it with `-name`.
📝 **Committable suggestion**

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```bash
if command -v find >/dev/null 2>&1; then
  # Use find for better pattern matching
  PATTERN="${{ inputs.files }}"
  # Extract directory prefix and file pattern
  if [[ "$PATTERN" == *"**"* ]]; then
    # Pattern has **: extract base directory and filename pattern
    BASE_DIR=$(echo "$PATTERN" | cut -d'/' -f1)
    FILE_EXT=$(echo "$PATTERN" | grep -oP '\*\.\w+$' || echo "*")
    CHANGED_FILES=$(find "$BASE_DIR" -type f -name "$FILE_EXT" 2>/dev/null | sed 's|^\./||' | jq -R -s -c 'split("\n") | map(select(length > 0))')
  else
    # Simple pattern without **: use -path directly
    CHANGED_FILES=$(find . -path "./$PATTERN" -type f | sed 's|^\./||' | jq -R -s -c 'split("\n") | map(select(length > 0))')
  fi
```
🤖 Prompt for AI Agents
.github/actions/detect-changed-files/action.yml around lines 80-92: the current
logic splits a pattern with "**" into BASE_DIR and FILE_PATTERN but then calls
find -name with FILE_PATTERN still containing directory separators (e.g.,
"**/*.ipynb"), which -name won't match; change the extraction so that when
FILE_PATTERN contains directory components or "**/", derive the filename glob by
taking only the part after the last '/' (e.g., turn "**/*.ipynb" into "*.ipynb")
and pass that to find -name, or alternatively use find -path with a wildcarded
path pattern (e.g., "*/$FILE_PATTERN") to match the full path — implement one of
these fixes so find receives a correct filename glob for -name or a correct
full-path pattern for -path.
```python
def validate_no_execution_counts(self) -> list[str]:
    """Validate that code cells have no execution counts."""
    errors = []
    for i, cell in enumerate(self.notebook.cells):
        if cell.cell_type == "code" and cell.execution_count:
            errors.append(
                f"Cell {i}: Code cell has execution_count {cell.execution_count} (should be null)"
            )
    return errors
```
Potential issue with execution count validation.

Line 49 checks `if cell.execution_count:`, which treats 0 as falsy. However, an execution count of 0 indicates the cell has been executed, which violates the "clean state" requirement. The check should explicitly test for `is not None`. Apply this fix:

```diff
 def validate_no_execution_counts(self) -> list[str]:
     """Validate that code cells have no execution counts."""
     errors = []
     for i, cell in enumerate(self.notebook.cells):
-        if cell.cell_type == "code" and cell.execution_count:
+        if cell.cell_type == "code" and cell.execution_count is not None:
             errors.append(
                 f"Cell {i}: Code cell has execution_count {cell.execution_count} (should be null)"
             )
     return errors
```

This ensures cells with `execution_count: 0` are also caught.
📝 **Committable suggestion**

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
def validate_no_execution_counts(self) -> list[str]:
    """Validate that code cells have no execution counts."""
    errors = []
    for i, cell in enumerate(self.notebook.cells):
        if cell.cell_type == "code" and cell.execution_count is not None:
            errors.append(
                f"Cell {i}: Code cell has execution_count {cell.execution_count} (should be null)"
            )
    return errors
```
🤖 Prompt for AI Agents

In tests/notebook_validators.py around lines 45 to 53, the validation uses `if cell.execution_count:` which treats 0 as falsy and misses executed cells with execution_count 0; change the condition to `if cell.execution_count is not None` so any non-null execution_count (including 0) is flagged, update the conditional accordingly and keep the error message/append logic the same.
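The falsy-zero pitfall behind this fix is easy to demonstrate in isolation:

```python
# Zero is falsy, so `if cell.execution_count:` skips a cell whose
# execution_count is 0; `is not None` catches every non-null value.
for count in (None, 0, 1):
    print(count, bool(count), count is not None)
# None False False
# 0 False True
# 1 True True
```

The same pattern applies anywhere a field can legitimately hold 0, an empty string, or an empty list: truthiness tests conflate "absent" with "present but empty".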
Actionable comments posted: 0
🧹 Nitpick comments (1)
pytest.ini (1)
**9-15**: Reconsider the scope of `--disable-warnings`.

Disabling all warnings globally is quite broad and may hide important information about notebook execution or dependencies. Consider:

- Being more selective (e.g., disabling only specific warning categories if notebook execution is noisy)
- Documenting why warnings are suppressed
- Using `-W ignore::DeprecationWarning` or similar patterns for targeted suppression

Also note: `--strict-markers` requires any custom pytest markers used in the test suite to be registered in the `[tool:pytest]` section (e.g., `markers = my_marker: description`). Verify whether custom markers are needed or if any are already in use.
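The targeted-suppression idea above maps directly onto Python's `warnings` filter chain, which is what the `-W` flag configures. A small sketch showing that ignoring `DeprecationWarning` leaves `UserWarning` visible:

```python
import warnings

# Record warnings instead of printing them so we can inspect the result.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Equivalent in spirit to pytest's -W ignore::DeprecationWarning
    warnings.filterwarnings("ignore", category=DeprecationWarning)
    warnings.warn("old API", DeprecationWarning)   # suppressed
    warnings.warn("heads up", UserWarning)         # still surfaces

print([w.category.__name__ for w in caught])  # ['UserWarning']
```

Because `filterwarnings` prepends its rule, the ignore takes precedence over the earlier `"always"` filter for matching categories only.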
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- `pytest.ini` (1 hunks)
- `tests/requirements.txt` (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/requirements.txt
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: execute_tests (notebooks/use-cases/document-conversion-standard.ipynb, https://raw.githubusercont...
🔇 Additional comments (1)
pytest.ini (1)

**1-7**: Configuration structure and test discovery align with conventions.

The pytest configuration correctly specifies test discovery patterns that align with the CI workflow expectations from `validate-notebooks.yml` and standard pytest conventions.
Added pytest to validate notebooks, and added a workflow to run these tests. Created an action to detect changed files, which the workflows use to run tests selectively.
How Has This Been Tested?
Ran in local report
Merge criteria:
Summary by CodeRabbit
Tests
Chores
Documentation