Closed
101 changes: 101 additions & 0 deletions .github/actions/detect-changed-files/README.md
@@ -0,0 +1,101 @@
# Detect Changed Files Action

A GitHub Action that detects changed files using native Git commands with support for pull requests and push events.

## Features

- 🔍 **Smart Detection**: Automatically detects changed files based on Git history
- 📋 **Event Support**: Works with pull requests, push events, and manual triggers
- 🎯 **Pattern Matching**: Supports glob patterns for file filtering
- 📊 **Rich Outputs**: Provides file lists, change counts, and boolean flags

## Usage

### Basic Usage

```yaml
- name: Get changed files
  id: changed-files
  uses: ./.github/actions/detect-changed-files
  with:
    files: 'notebooks/**/*.ipynb'
```

### Advanced Usage

```yaml
- name: Get changed files
  id: changed-files
  uses: ./.github/actions/detect-changed-files
  with:
    files: 'src/**/*.py'
    base-sha: ${{ github.event.pull_request.base.sha }}
    head-sha: ${{ github.event.pull_request.head.sha }}
```

## Inputs

| Input | Description | Required | Default |
|-------|-------------|----------|---------|
| `files` | File pattern to match (e.g., `notebooks/**/*.ipynb`) | ✅ | - |
| `token` | GitHub token for API access | ❌ | `${{ github.token }}` |
| `base-sha` | Base SHA for comparison | ❌ | Auto-detected |
| `head-sha` | Head SHA for comparison | ❌ | Auto-detected |

## Outputs

| Output | Description |
|--------|-------------|
| `all_changed_files` | JSON array of all changed files matching the pattern |
| `has_changes` | Boolean indicating if any files changed |
| `files_count` | Number of changed files |

## Examples

### Check for notebook changes

```yaml
- name: Check for notebook changes
  id: notebooks
  uses: ./.github/actions/detect-changed-files
  with:
    files: 'notebooks/**/*.ipynb'

- name: Run tests if notebooks changed
  if: steps.notebooks.outputs.has_changes == 'true'
  run: |
    echo "Found changed notebooks:"
    echo "${{ steps.notebooks.outputs.all_changed_files }}"
```

### Conditional workflow steps

```yaml
- name: Get changed Python files
  id: python-files
  uses: ./.github/actions/detect-changed-files
  with:
    files: 'src/**/*.py'

- name: Run linting
  if: steps.python-files.outputs.has_changes == 'true'
  run: |
    echo "Linting ${{ steps.python-files.outputs.files_count }} Python files"
```

## Supported Events

- **Pull Requests**: Compares PR head with base branch
- **Push Events**: Compares current commit with previous commit
- **Manual Triggers**: Finds all files matching the pattern
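
The event handling listed above can be sketched as a small shell function. This is a minimal sketch under stated assumptions, not the action's own code: `resolve_diff_range` is a hypothetical helper name, and the `HEAD~1` fallback assumes the checkout contains at least two commits.

```shell
# Hypothetical helper (not part of the action): print "<base> <head>" for events
# that diff two commits, and print nothing for events that list all files.
resolve_diff_range() {
  local event="$1" base="${2:-}" head="${3:-}"
  case "$event" in
    pull_request)
      # Base and head are expected to come from the pull request payload.
      printf '%s %s\n' "$base" "$head"
      ;;
    push)
      # Without explicit SHAs, compare the current commit with its parent.
      printf '%s %s\n' "${base:-HEAD~1}" "${head:-HEAD}"
      ;;
    *)
      # Manual or scheduled runs have no commit range to diff.
      return 0
      ;;
  esac
}

# Example: resolve_diff_range push   prints "HEAD~1 HEAD"
```

Note that `actions/checkout` fetches a single commit by default (`fetch-depth: 1`), so `HEAD~1` or a pull request base SHA may be missing unless the workflow fetches more history.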

## Pattern Examples

- `notebooks/**/*.ipynb` - All Jupyter notebooks in notebooks directory
- `src/**/*.py` - All Python files in src directory
- `*.md` - All Markdown files at any depth (in the Git pathspec and `find -path` matching this action uses, `*` also matches `/`)
- `docs/**/*` - All files in docs directory
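
How a pattern behaves depends on how Git interprets it as a pathspec. The sketch below assumes the same `git diff --name-only --diff-filter=ACMRT` call that the action uses for pull requests and pushes; `changed_files_matching` is a hypothetical wrapper name introduced only for illustration.

```shell
# Hypothetical wrapper around the git diff call used for PR and push events.
# Prints one changed path per line, sorted bytewise for stable output.
changed_files_matching() {
  local base="$1" head="$2" pathspec="$3"
  git diff --name-only --diff-filter=ACMRT "$base" "$head" -- "$pathspec" | LC_ALL=C sort
}

# In a plain pathspec, '*' can match '/', so '*.md' also matches docs/guide.md:
#   changed_files_matching HEAD~1 HEAD '*.md'
# With the ':(glob)' pathspec magic, '*' stops at '/', so only root-level files match:
#   changed_files_matching HEAD~1 HEAD ':(glob)*.md'
```

If root-only or strict `**` semantics are needed, passing a `:(glob)` pathspec as the `files` input may be one option, though the README does not say whether the action is intended to support it.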

## Author

ODH Data Processing Team
118 changes: 118 additions & 0 deletions .github/actions/detect-changed-files/action.yml
@@ -0,0 +1,118 @@
name: 'Detect Changed Files'
description: 'Detect changed files using native Git commands with support for PRs and push events'
author: 'ODH Data Processing Team'

inputs:
  files:
    description: 'File pattern to match (e.g., notebooks/**/*.ipynb)'
    required: true
  token:
    description: 'GitHub token for API access'
    required: false
    default: ${{ github.token }}
  base-sha:
    description: 'Base SHA for comparison (auto-detected if not provided)'
    required: false
  head-sha:
    description: 'Head SHA for comparison (auto-detected if not provided)'
    required: false

outputs:
  all_changed_files:
    description: 'JSON array of all changed files matching the pattern'
    value: ${{ steps.detect.outputs.all_changed_files }}
  has_changes:
    description: 'Boolean indicating if any files changed'
    value: ${{ steps.detect.outputs.has_changes }}
  files_count:
    description: 'Number of changed files'
    value: ${{ steps.detect.outputs.files_count }}

runs:
  using: 'composite'
  steps:
    - name: Detect changed files
      id: detect
      shell: bash
      run: |
        set -euo pipefail

        echo "🔍 Detecting changed files for pattern: ${{ inputs.files }}"
        echo "📋 Event: ${{ github.event_name }}"

        # Determine comparison SHAs based on event type
        if [ "${{ github.event_name }}" == "pull_request" ]; then
          # For PRs, use provided SHAs or extract from event
          if [ -n "${{ inputs.base-sha }}" ] && [ -n "${{ inputs.head-sha }}" ]; then
            BASE_SHA="${{ inputs.base-sha }}"
            HEAD_SHA="${{ inputs.head-sha }}"
          else
            BASE_SHA="${{ github.event.pull_request.base.sha }}"
            HEAD_SHA="${{ github.event.pull_request.head.sha }}"
          fi

          echo "📊 Comparing PR: $BASE_SHA..$HEAD_SHA"

          # Get changed files for PR
          CHANGED_FILES=$(git diff --name-only --diff-filter=ACMRT $BASE_SHA..$HEAD_SHA -- '${{ inputs.files }}' | jq -R -s -c 'split("\n") | map(select(length > 0))')

        elif [ "${{ github.event_name }}" == "push" ]; then
          # For push events, compare with previous commit
          if [ -n "${{ inputs.base-sha }}" ]; then
            BASE_SHA="${{ inputs.base-sha }}"
            HEAD_SHA="${{ github.sha }}"
          else
            # Compare with previous commit
            BASE_SHA="HEAD~1"
            HEAD_SHA="HEAD"
          fi

          echo "📊 Comparing push: $BASE_SHA..$HEAD_SHA"

          # Get changed files for push
          CHANGED_FILES=$(git diff --name-only --diff-filter=ACMRT $BASE_SHA $HEAD_SHA -- '${{ inputs.files }}' | jq -R -s -c 'split("\n") | map(select(length > 0))')

        else
          # For other events (workflow_dispatch, schedule), assume no file-based filtering needed
          echo "📊 Event type '${{ github.event_name }}' - returning all matching files"

          # Find all files matching pattern
          if command -v find >/dev/null 2>&1; then
            # Use find for better pattern matching
            PATTERN="${{ inputs.files }}"
            # Extract directory prefix and file pattern
            if [[ "$PATTERN" == *"**"* ]]; then
              # Pattern has **: extract base directory and filename pattern
              BASE_DIR=$(echo "$PATTERN" | cut -d'/' -f1)
              FILE_PATTERN=$(echo "$PATTERN" | cut -d'/' -f2-)
              CHANGED_FILES=$(find "$BASE_DIR" -type f -name "$FILE_PATTERN" 2>/dev/null | sed 's|^\./||' | jq -R -s -c 'split("\n") | map(select(length > 0))')
            else
              # Simple pattern without **: use -path directly
              CHANGED_FILES=$(find . -path "./$PATTERN" -type f | sed 's|^\./||' | jq -R -s -c 'split("\n") | map(select(length > 0))')
            fi
Comment on lines +80 to +92

⚠️ Potential issue | 🟠 Major

Potential issue with find pattern matching for ** globs.

The pattern extraction logic may not work correctly for glob patterns containing **. When the pattern is split, FILE_PATTERN retains directory separators (e.g., **/*.ipynb), but the -name flag only matches against the filename component, not the full path.

For example, with pattern notebooks/**/*.ipynb:

  • BASE_DIR = notebooks
  • FILE_PATTERN = **/*.ipynb
  • find notebooks -type f -name "**/*.ipynb" won't match files correctly

Apply this fix to extract just the file extension pattern:

            if [[ "$PATTERN" == *"**"* ]]; then
             # Pattern has **: extract base directory and filename pattern
             BASE_DIR=$(echo "$PATTERN" | cut -d'/' -f1)
-             FILE_PATTERN=$(echo "$PATTERN" | cut -d'/' -f2-)
-             CHANGED_FILES=$(find "$BASE_DIR" -type f -name "$FILE_PATTERN" 2>/dev/null | sed 's|^\./||' | jq -R -s -c 'split("\n") | map(select(length > 0))')
+             FILE_EXT=$(echo "$PATTERN" | grep -oP '\*\.\w+$' || echo "*")
+             CHANGED_FILES=$(find "$BASE_DIR" -type f -name "$FILE_EXT" 2>/dev/null | sed 's|^\./||' | jq -R -s -c 'split("\n") | map(select(length > 0))')
            else

This extracts the file extension pattern (e.g., *.ipynb) and uses it with -name.


Suggested change
-          if command -v find >/dev/null 2>&1; then
-            # Use find for better pattern matching
-            PATTERN="${{ inputs.files }}"
-            # Extract directory prefix and file pattern
-            if [[ "$PATTERN" == *"**"* ]]; then
-              # Pattern has **: extract base directory and filename pattern
-              BASE_DIR=$(echo "$PATTERN" | cut -d'/' -f1)
-              FILE_PATTERN=$(echo "$PATTERN" | cut -d'/' -f2-)
-              CHANGED_FILES=$(find "$BASE_DIR" -type f -name "$FILE_PATTERN" 2>/dev/null | sed 's|^\./||' | jq -R -s -c 'split("\n") | map(select(length > 0))')
-            else
-              # Simple pattern without **: use -path directly
-              CHANGED_FILES=$(find . -path "./$PATTERN" -type f | sed 's|^\./||' | jq -R -s -c 'split("\n") | map(select(length > 0))')
-            fi
+          if command -v find >/dev/null 2>&1; then
+            # Use find for better pattern matching
+            PATTERN="${{ inputs.files }}"
+            # Extract directory prefix and file pattern
+            if [[ "$PATTERN" == *"**"* ]]; then
+              # Pattern has **: extract base directory and filename pattern
+              BASE_DIR=$(echo "$PATTERN" | cut -d'/' -f1)
+              FILE_EXT=$(echo "$PATTERN" | grep -oP '\*\.\w+$' || echo "*")
+              CHANGED_FILES=$(find "$BASE_DIR" -type f -name "$FILE_EXT" 2>/dev/null | sed 's|^\./||' | jq -R -s -c 'split("\n") | map(select(length > 0))')
+            else
+              # Simple pattern without **: use -path directly
+              CHANGED_FILES=$(find . -path "./$PATTERN" -type f | sed 's|^\./||' | jq -R -s -c 'split("\n") | map(select(length > 0))')
+            fi
🤖 Prompt for AI Agents
.github/actions/detect-changed-files/action.yml around lines 80-92: the current
logic splits a pattern with "**" into BASE_DIR and FILE_PATTERN but then calls
find -name with FILE_PATTERN still containing directory separators (e.g.,
"**/*.ipynb"), which -name won't match; change the extraction so that when
FILE_PATTERN contains directory components or "**/", derive the filename glob by
taking only the part after the last '/' (e.g., turn "**/*.ipynb" into "*.ipynb")
and pass that to find -name, or alternatively use find -path with a wildcarded
path pattern (e.g., "*/$FILE_PATTERN") to match the full path — implement one of
these fixes so find receives a correct filename glob for -name or a correct
full-path pattern for -path.
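
The first option described above (keep only the part after the last `/` as the `-name` glob) can be sketched with plain parameter expansion. This is a minimal sketch, not the committed fix: `list_matching_files` is a hypothetical helper name, it assumes the pattern contains `/**`, and it prints one path per line instead of building the JSON array with `jq`.

```shell
# Hypothetical helper: list files for a pattern such as "notebooks/**/*.ipynb".
# Assumes the pattern contains "/**"; prints one path per line, sorted bytewise.
list_matching_files() {
  local pattern="$1"
  local base_dir name_glob
  base_dir="${pattern%%/[*][*]*}"   # everything before the first "/**"
  name_glob="${pattern##*/}"        # everything after the last "/"
  # -name is compared against the file's base name, so the glob must not contain '/'.
  find "$base_dir" -type f -name "$name_glob" 2>/dev/null | LC_ALL=C sort
}

# Example: list_matching_files 'notebooks/**/*.ipynb'
```

Unlike `cut -d'/' -f1`, stripping at the first `/**` keeps multi-segment prefixes such as `src/pkg` in `src/pkg/**/*.py`. The sketch still ignores any directory components that appear after `**`.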

          else
            # Fallback: assume all files changed (conservative approach)
            CHANGED_FILES='[]'
          fi
        fi

        # Calculate if there are changes and file count
        FILES_COUNT=$(echo "$CHANGED_FILES" | jq length)
        HAS_CHANGES=$([ "$FILES_COUNT" -gt 0 ] && echo "true" || echo "false")

        # Output results
        echo "📁 Found $FILES_COUNT changed files"
        echo "$CHANGED_FILES" | jq -r '.[]' | while read -r file; do
          [ -n "$file" ] && echo " - $file"
        done || true

        # Set outputs
        echo "all_changed_files=$CHANGED_FILES" >> $GITHUB_OUTPUT
        echo "has_changes=$HAS_CHANGES" >> $GITHUB_OUTPUT
        echo "files_count=$FILES_COUNT" >> $GITHUB_OUTPUT

        echo "✅ Detection complete: has_changes=$HAS_CHANGES, files_count=$FILES_COUNT"

branding:
  icon: 'search'
  color: 'blue'
53 changes: 53 additions & 0 deletions .github/workflows/execute-notebooks.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,53 @@
name: Execute Notebooks Tests for Notebooks

on:
  pull_request:
    types: [opened, synchronize, reopened]
    paths:
      - "notebooks/**/*.ipynb"
      - ".github/workflows/execute-notebooks.yml"
  push:
    branches: [ main ]
    paths:
      - "notebooks/**/*.ipynb"
      - ".github/workflows/execute-notebooks.yml"

permissions:
  contents: read

jobs:
  execute_tests:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        # Set the notebooks to execute
        notebook_to_execute: ["notebooks/use-cases/document-conversion-standard.ipynb"]

        # Set the files to use in each notebook execution
        file_to_use: ["https://raw.githubusercontent.com/py-pdf/sample-files/refs/heads/main/001-trivial/minimal-document.pdf"]
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"
          cache: pip

      - name: Install Testing Tools
        run: |
          pip install papermill ipykernel
          ipython kernel install --name "python3" --user

      - name: Execute Notebooks
        run: |
          set -ux

          NOTEBOOK="${{ matrix.notebook_to_execute }}"
          FILE="${{ matrix.file_to_use }}"

          echo "Executing notebook '$NOTEBOOK' with file '$FILE'..."

          papermill $NOTEBOOK $NOTEBOOK.tmp.ipynb -b $(echo -n "files: [\"$FILE\"]" | base64 -w 0)

          echo "✓ Notebook $NOTEBOOK executed successfully"
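
The `-b` option in the step above passes notebook parameters to papermill as a base64-encoded YAML string. The encoding can be sketched in isolation as follows; `encode_papermill_params` is a hypothetical helper name, and the single `files` list parameter is taken from the workflow above rather than from the notebook itself.

```shell
# Hypothetical helper: build the base64 payload for papermill's -b option.
# The YAML document has one key, "files", holding a one-element list.
encode_papermill_params() {
  local file_url="$1"
  # tr strips the line wrapping that some base64 implementations add.
  printf 'files: ["%s"]' "$file_url" | base64 | tr -d '\n'
}

# Possible usage, with quoting so that paths containing spaces survive:
#   papermill "$NOTEBOOK" "$NOTEBOOK.tmp.ipynb" -b "$(encode_papermill_params "$FILE")"
```

Decoding the payload with `base64 -d` is a quick way to check what the notebook will receive.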
122 changes: 0 additions & 122 deletions .github/workflows/validate-and-execute-notebooks.yml

This file was deleted.
