RobuRishabh (Contributor) commented Oct 1, 2025

R528: Add KFP Support for Accelerator Devices

Description

This PR extends the existing Docling Kubeflow Pipelines to support accelerator device configuration. The changes build upon the refactored common components from R1065 and add the ability to specify and configure accelerator devices (GPU, CPU, etc.) for both standard and VLM Docling pipelines.

Technical Details:

  • Extended AcceleratorOptions configuration in both standard and VLM pipelines
  • Added device selection parameters (AUTO, CPU, GPU, etc.)
  • Enhanced documentation and examples for accelerator device usage
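
The device selection described above can be sketched as a small normalization step. The enum below is a stdlib stand-in for docling's AcceleratorDevice, and the function name and alias set are illustrative rather than the PR's exact API:

```python
from enum import Enum


class AcceleratorDevice(Enum):
    """Stand-in for docling's AcceleratorDevice enum (illustrative)."""
    AUTO = "auto"
    CPU = "cpu"
    CUDA = "cuda"
    MPS = "mps"


def resolve_accelerator_device(value: str) -> AcceleratorDevice:
    """Normalize a user-supplied device string to a known device."""
    device_map = {
        "auto": AcceleratorDevice.AUTO,
        "cpu": AcceleratorDevice.CPU,
        "cuda": AcceleratorDevice.CUDA,
        "gpu": AcceleratorDevice.CUDA,  # common alias for CUDA
        "mps": AcceleratorDevice.MPS,
    }
    key = value.strip().lower()
    if key not in device_map:
        raise ValueError(
            f"Invalid accelerator_device: {value!r}. Must be one of {sorted(device_map)}"
        )
    return device_map[key]
```

In the real pipelines, the resolved device would then be handed to docling's accelerator options together with the thread count.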

How Has This Been Tested?

  • Tested both the standard and VLM pipelines by running local_run.py successfully, both locally and on an OpenShift AI cluster
  • Environment: Tested on local Docker environment
  • Log files:
  1. https://rhods-dashboard-redhat-ods-applications.apps.dsp-nonfips-pool-wlggc.gcp.rh-ods.com/pipelineRuns/hukhan/runs/cf9523d0-3415-400d-a22e-4686b8bd268b
  2. https://rhods-dashboard-redhat-ods-applications.apps.dsp-nonfips-pool-wlggc.gcp.rh-ods.com/pipelineRuns/hukhan/runs/440450e2-71fd-42ae-8dbe-7205c10c25d9

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that they work.

Related Issues

Dependencies

This PR builds upon the changes from R1065 (Refactor Common Components). It should be merged after R1065 is merged to main.

Summary by CodeRabbit

  • New Features

    • Added accelerator device selection (auto, cpu, cuda, mps) for Standard and VLM pipelines.
    • Enhanced model download options per pipeline type with optional remote endpoint support.
  • Refactor

    • Separated Standard and VLM pipelines and centralized shared components; renamed converters to match.
  • Documentation

    • Reorganized guides with new paths, added accelerator section, expanded configuration options, and updated compile/run examples.
  • Chores

    • Upgraded Kubeflow Pipelines SDK to 2.14.4.

- Merged latest main branch changes
- Resolved conflicts in docling_convert_pipeline_compiled.yaml
- Updated common components to work with latest main
- Ensured compatibility with upstream changes
- Added proper documentation and README updates
- Maintained refactored common components structure

Signed-off-by: roburishabh <[email protected]>
@RobuRishabh RobuRishabh requested a review from a team as a code owner October 1, 2025 17:24
coderabbitai bot commented Oct 1, 2025

Walkthrough

Restructures Kubeflow Pipelines into standard and VLM variants with a new common/ package for shared components and constants. Adds accelerator_device support across pipelines, updates pipeline code and compiled YAML, replaces VLM component module, and revises READMEs and local_run scripts to match new paths, names, and parameters.

Changes

  • Documentation updates (kubeflow-pipelines/README.md, kubeflow-pipelines/docling-standard/README.md, kubeflow-pipelines/docling-vlm/README.md): Repoint paths and commands to the new standard/vlm files; add the accelerator_device option and new config entries; renumber sections; update compile/run examples and YAML names.
  • Common package, shared (kubeflow-pipelines/common/__init__.py, common/components.py, common/constants.py): New shared exports and constants; add import_pdfs, create_pdf_splits, and download_docling_models components; define model/image constants; expose via __all__.
  • Standard pipeline code (kubeflow-pipelines/docling-standard/local_run.py, standard_components.py, standard_convert_pipeline.py): Rename the converter to docling_convert_standard; add the accelerator_device parameter; wire in common components; pass pipeline_type=standard to the model download; update the pipeline signature and description.
  • Standard compiled spec (kubeflow-pipelines/docling-standard/standard_convert_pipeline_compiled.yaml): Rename component/task to -standard; add the docling_accelerator_device input; extend model download with pipeline_type and remote_model_endpoint_enabled; bump kfp to 2.14.4; update logs and wiring.
  • VLM pipeline restructuring (kubeflow-pipelines/docling-vlm/docling_convert_components.py removed, vlm_components.py added, local_run.py, vlm_convert_pipeline.py): Replace the old VLM components with docling_convert_vlm; add accelerator_device and remote model handling; use common components; pass pipeline_type=vlm; update local_run and the pipeline signature.
  • VLM compiled spec (kubeflow-pipelines/docling-vlm/vlm_convert_pipeline_compiled.yaml): Rename component/task to -vlm; add the accelerator parameter; extend model download to support standard/vlm/vlm-remote; adjust env/logs; bump kfp to 2.14.4; update the description and wiring.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    actor User
    participant Pipeline as Standard Convert Pipeline
    participant Common as common.components
    participant Models as download_docling_models
    participant Convert as docling_convert_standard
    participant Storage as Artifacts/Outputs

    User->>Pipeline: Run with params (pdf_filenames, accelerator_device="auto", ...)
    Pipeline->>Common: import_pdfs()
    Common-->>Storage: PDFs downloaded
    Pipeline->>Common: create_pdf_splits(num_splits)
    Common-->>Pipeline: Splits
    Pipeline->>Models: download_docling_models(pipeline_type="standard", remote_model_endpoint_enabled=false)
    Models-->>Storage: Models prepared
    loop For each split
        Pipeline->>Convert: convert(split, accelerator_device)
        Note right of Convert: Validate accelerator_device<br/>Configure accelerator options
        Convert-->>Storage: JSON/Markdown outputs
    end
    Pipeline-->>User: Completion + output location
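The fan-out in the loop above depends on partitioning the file list into splits. A minimal sketch of such a splitter follows; the real create_pdf_splits signature and splitting strategy may differ:

```python
def create_pdf_splits(filenames: list[str], num_splits: int) -> list[list[str]]:
    """Partition filenames into at most num_splits round-robin chunks."""
    if num_splits < 1:
        raise ValueError("num_splits must be >= 1")
    splits = [filenames[i::num_splits] for i in range(num_splits)]
    # Drop empty splits when there are fewer files than requested splits,
    # so downstream convert tasks are only spawned for non-empty work.
    return [s for s in splits if s]
```

Each resulting split would then drive one iteration of the convert loop in the pipeline.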
sequenceDiagram
    autonumber
    actor User
    participant Pipeline as VLM Convert Pipeline
    participant Common as common.components
    participant Models as download_docling_models
    participant Convert as docling_convert_vlm
    participant Storage as Artifacts/Outputs

    User->>Pipeline: Run with params (pdf_filenames, accelerator_device, remote_model_enabled)
    Pipeline->>Common: import_pdfs()
    Common-->>Storage: PDFs downloaded
    Pipeline->>Common: create_pdf_splits()
    Common-->>Pipeline: Splits
    Pipeline->>Models: download_docling_models(pipeline_type="vlm", remote_model_endpoint_enabled=false)
    Models-->>Storage: VLM models ready
    alt remote model enabled
        Convert->>Convert: Read secrets, configure API VLM options
    else local model
        Convert->>Convert: Configure local VLM options
    end
    loop For each split
        Pipeline->>Convert: convert(split, accelerator_device)
        Note right of Convert: Validate accelerator_device and image export mode
        Convert-->>Storage: JSON/Markdown outputs
    end
    Pipeline-->>User: Completion + output location

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Poem

I thump the ground: new paths appear,
Two trails—Standard, VLM—now clear.
With turbo paws (accelerators, aye!),
I fetch my PDFs, then hopify.
Models in a burrow, neatly shared—
YAMLs compiled, carrots prepared.
Convert complete—the meadow cheered! 🥕✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning — docstring coverage is 45.45%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check ✅ Passed — check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed — the title "R528 kfp support for accelerator devices" directly reflects the core change of adding accelerator device support to Kubeflow Pipelines and ties back to the referenced issue, making it clear and specific without extraneous detail.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 6

🧹 Nitpick comments (10)
kubeflow-pipelines/docling-standard/standard_components.py (4)

5-6: Avoid sys.path hacks; package the shared module.

Relying on sys.path.insert to reach common is brittle inside component containers. Prefer a proper package import (e.g., rename directory to a valid module name and install it, or vendor the shared code into the image PYTHONPATH).


14-14: Too many parameters; group config to reduce API surface.

Static analysis flags R0917/R0915. Consider grouping rarely changed options into structured config (e.g., dataclass/pydantic parsed from JSON), or split OCR/table/image config into helper functions to keep the component interface manageable.
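
As a sketch of the suggested grouping, a stdlib dataclass can absorb several related flags into one JSON-encoded parameter; the field and function names below are illustrative, not the component's actual API:

```python
import json
from dataclasses import dataclass, fields


@dataclass
class OcrConfig:
    """Hypothetical grouping of OCR-related component options."""
    enabled: bool = True
    force_full_page: bool = False
    bitmap_area_threshold: float = 0.05


def parse_ocr_config(raw: str) -> OcrConfig:
    """Parse the JSON string a pipeline would pass, ignoring unknown keys."""
    data = json.loads(raw) if raw else {}
    known = {f.name for f in fields(OcrConfig)}
    return OcrConfig(**{k: v for k, v in data.items() if k in known})
```

The component signature then takes one `ocr_config: str = "{}"` parameter instead of three booleans and a float, shrinking the KFP interface while keeping defaults in one place.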


138-155: Build allowed set from device_map to avoid drift; set device via normalized key.

Use the normalized dev above to index device_map when instantiating AcceleratorOptions.
Apply this diff:

-    pipeline_options.accelerator_options = AcceleratorOptions(
-        num_threads=num_threads, device=device_map[accelerator_device.lower()]
-    )
+    pipeline_options.accelerator_options = AcceleratorOptions(
+        num_threads=num_threads,
+        device=device_map[dev],
+    )

120-120: Logging is helpful; include device + threads for traceability.

Augment start/finish logs with accelerator_device and num_threads to aid support.
Apply this diff:

-    print(f"docling-standard-convert: starting with backend='{pdf_backend}', files={len(input_pdfs)}", flush=True)
+    print(
+        f"docling-standard-convert: starting backend='{pdf_backend}', files={len(input_pdfs)}, "
+        f"device='{accelerator_device}', threads={num_threads}",
+        flush=True,
+    )
...
-    print("docling-standard-convert: done", flush=True)
+    print("docling-standard-convert: done", flush=True)

Also applies to: 195-202

kubeflow-pipelines/docling-standard/standard_convert_pipeline.py (3)

4-6: Replace runtime path injection with proper package import.

Same as in components: avoid sys.path manipulation; distribute common as a Python package in the image and import it normally.


71-95: No GPU resources requested; 'cuda' won’t schedule on GPU nodes.

If users choose cuda/gpu, add GPU resource requests/limits and (optionally) node selectors/tolerations; otherwise the pod may run CPU-only on generic nodes.
Example (adjust to your cluster conventions):

from kfp import kubernetes
if docling_accelerator_device.lower() in ("cuda", "gpu"):
    converter.set_gpu_limit(1)  # ensure a GPU is scheduled
    # Optional: steer to GPU nodes and tolerate taints
    kubernetes.add_node_selector(converter, "nvidia.com/gpu.present", "true")
    kubernetes.add_toleration(converter, key="nvidia.com/gpu", operator="Exists", effect="NoSchedule")

If your environment uses a different label/taint or requires explicit requests, mirror CPU pattern (e.g., set_gpu_request(1)) per your KFP/kfp-kubernetes version.

Would you like me to open a follow-up to wire GPU scheduling in both standard and VLM pipelines?


99-101: Compilation entry point: ensure compiled YAML is committed/ignored consistently.

Decide whether the compiled YAML belongs in source control. If checked in, add a CI step to recompile and diff; otherwise, add it to .gitignore.

kubeflow-pipelines/common/constants.py (3)

6-8: Centralize accelerator device choices to keep components/pipelines consistent.

Consider defining a canonical set/aliases here (e.g., ACCELERATOR_DEVICE_ALIASES = {'auto': 'auto','cpu':'cpu','cuda':'cuda','gpu':'cuda','mps':'mps'}) and reusing it in both standard and VLM components.
Example:

+# Accelerator device aliases (canonical target on the right)
+ACCELERATOR_DEVICE_ALIASES = {
+    "auto": "auto",
+    "cpu":  "cpu",
+    "cuda": "cuda",
+    "gpu":  "cuda",
+    "mps":  "mps",
+}

Then import and use in validation/mapping to avoid drift.


2-3: Revise image verification for Red Hat registry

  • Quay image tag “2.54.0” verified successfully; the UBI9 Python image tag check failed—registry.access.redhat.com’s V2 API requires authentication (or the tag may not exist).
  • Pin both images to immutable digests (image@sha256:…) and/or update your CI script to authenticate against Red Hat’s registry before querying tags.

8-8: Remove unused MODEL_TYPE_VLM_REMOTE constant
The MODEL_TYPE_VLM_REMOTE constant is only defined in constants.py and re-exported in common/__init__.py but never referenced elsewhere. Remove its declaration (constants.py) and export entries (common/__init__.py).

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d688a0c and 14f35cf.

📒 Files selected for processing (15)
  • kubeflow-pipelines/README.md (7 hunks)
  • kubeflow-pipelines/common/__init__.py (1 hunks)
  • kubeflow-pipelines/common/components.py (1 hunks)
  • kubeflow-pipelines/common/constants.py (1 hunks)
  • kubeflow-pipelines/docling-standard/README.md (2 hunks)
  • kubeflow-pipelines/docling-standard/local_run.py (2 hunks)
  • kubeflow-pipelines/docling-standard/standard_components.py (5 hunks)
  • kubeflow-pipelines/docling-standard/standard_convert_pipeline.py (5 hunks)
  • kubeflow-pipelines/docling-standard/standard_convert_pipeline_compiled.yaml (22 hunks)
  • kubeflow-pipelines/docling-vlm/README.md (2 hunks)
  • kubeflow-pipelines/docling-vlm/docling_convert_components.py (0 hunks)
  • kubeflow-pipelines/docling-vlm/local_run.py (2 hunks)
  • kubeflow-pipelines/docling-vlm/vlm_components.py (1 hunks)
  • kubeflow-pipelines/docling-vlm/vlm_convert_pipeline.py (5 hunks)
  • kubeflow-pipelines/docling-vlm/vlm_convert_pipeline_compiled.yaml (21 hunks)
💤 Files with no reviewable changes (1)
  • kubeflow-pipelines/docling-vlm/docling_convert_components.py
🧰 Additional context used
🧬 Code graph analysis (5)
kubeflow-pipelines/docling-standard/standard_convert_pipeline.py (3)
kubeflow-pipelines/common/components.py (3)
  • import_pdfs (9-115)
  • create_pdf_splits (120-138)
  • download_docling_models (143-212)
kubeflow-pipelines/docling-standard/standard_components.py (1)
  • docling_convert_standard (14-202)
kubeflow-pipelines/docling-vlm/vlm_convert_pipeline.py (1)
  • convert_pipeline (23-83)
kubeflow-pipelines/docling-standard/local_run.py (2)
kubeflow-pipelines/common/components.py (3)
  • create_pdf_splits (120-138)
  • download_docling_models (143-212)
  • import_pdfs (9-115)
kubeflow-pipelines/docling-standard/standard_components.py (1)
  • docling_convert_standard (14-202)
kubeflow-pipelines/common/__init__.py (1)
kubeflow-pipelines/common/components.py (3)
  • import_pdfs (9-115)
  • create_pdf_splits (120-138)
  • download_docling_models (143-212)
kubeflow-pipelines/docling-vlm/vlm_convert_pipeline.py (2)
kubeflow-pipelines/common/components.py (3)
  • import_pdfs (9-115)
  • create_pdf_splits (120-138)
  • download_docling_models (143-212)
kubeflow-pipelines/docling-vlm/vlm_components.py (1)
  • docling_convert_vlm (14-174)
kubeflow-pipelines/docling-vlm/local_run.py (2)
kubeflow-pipelines/common/components.py (3)
  • create_pdf_splits (120-138)
  • download_docling_models (143-212)
  • import_pdfs (9-115)
kubeflow-pipelines/docling-vlm/vlm_components.py (1)
  • docling_convert_vlm (14-174)
🪛 Pylint (3.3.8)
kubeflow-pipelines/docling-vlm/vlm_components.py

[refactor] 14-14: Too many positional arguments (10/5)

(R0917)


[refactor] 107-112: Consider using '{"model_id": remote_model_name, "parameters": dict(max_new_tokens=400), ... }' instead of a call to 'dict'.

(R1735)


[refactor] 109-111: Consider using '{"max_new_tokens": 400}' instead of a call to 'dict'.

(R1735)


[refactor] 14-14: Too many branches (14/12)

(R0912)


[refactor] 14-14: Too many statements (72/50)

(R0915)

kubeflow-pipelines/common/components.py

[refactor] 9-9: Too many branches (21/12)

(R0912)


[refactor] 9-9: Too many statements (69/50)

(R0915)

kubeflow-pipelines/docling-standard/standard_components.py

[refactor] 14-14: Too many positional arguments (18/5)

(R0917)


[refactor] 14-14: Too many statements (66/50)

(R0915)

🪛 Ruff (0.13.2)
kubeflow-pipelines/docling-vlm/vlm_components.py

23-23: Possible hardcoded password assigned to function default: "remote_model_secret_mount_path"

(S107)


43-43: Unused noqa directive (non-enabled: PLC0415, E402)

Remove unused noqa directive

(RUF100)


44-44: Unused noqa directive (non-enabled: PLC0415, E402)

Remove unused noqa directive

(RUF100)


45-45: Unused noqa directive (non-enabled: PLC0415, E402)

Remove unused noqa directive

(RUF100)


49-49: Unused noqa directive (non-enabled: PLC0415, E402)

Remove unused noqa directive

(RUF100)


50-50: Unused noqa directive (non-enabled: PLC0415, E402)

Remove unused noqa directive

(RUF100)


51-51: Unused noqa directive (non-enabled: PLC0415, E402)

Remove unused noqa directive

(RUF100)


52-52: Unused noqa directive (non-enabled: PLC0415, E402)

Remove unused noqa directive

(RUF100)


63-63: Avoid specifying long messages outside the exception class

(TRY003)


67-69: Avoid specifying long messages outside the exception class

(TRY003)


73-73: Avoid specifying long messages outside the exception class

(TRY003)


75-75: Possible hardcoded password assigned to: "remote_model_endpoint_url_secret"

(S105)


81-81: Avoid specifying long messages outside the exception class

(TRY003)


83-83: Possible hardcoded password assigned to: "remote_model_name_secret"

(S105)


89-89: Avoid specifying long messages outside the exception class

(TRY003)


91-91: Possible hardcoded password assigned to: "remote_model_api_key_secret"

(S105)


97-97: Avoid specifying long messages outside the exception class

(TRY003)


100-100: Avoid specifying long messages outside the exception class

(TRY003)


128-130: Avoid specifying long messages outside the exception class

(TRY003)

kubeflow-pipelines/common/components.py

14-14: Possible hardcoded password assigned to function default: "s3_secret_mount_path"

(S107)


33-33: Avoid specifying long messages outside the exception class

(TRY003)


40-40: Avoid specifying long messages outside the exception class

(TRY003)


42-42: Possible hardcoded password assigned to: "s3_endpoint_url_secret"

(S105)


48-48: Avoid specifying long messages outside the exception class

(TRY003)


50-50: Possible hardcoded password assigned to: "s3_access_key_secret"

(S105)


56-56: Avoid specifying long messages outside the exception class

(TRY003)


58-58: Possible hardcoded password assigned to: "s3_secret_key_secret"

(S105)


64-64: Avoid specifying long messages outside the exception class

(TRY003)


66-66: Possible hardcoded password assigned to: "s3_bucket_secret"

(S105)


72-72: Avoid specifying long messages outside the exception class

(TRY003)


74-74: Possible hardcoded password assigned to: "s3_prefix_secret"

(S105)


80-80: Avoid specifying long messages outside the exception class

(TRY003)


83-83: Avoid specifying long messages outside the exception class

(TRY003)


86-86: Avoid specifying long messages outside the exception class

(TRY003)


102-102: Avoid specifying long messages outside the exception class

(TRY003)


212-212: Avoid specifying long messages outside the exception class

(TRY003)

kubeflow-pipelines/common/__init__.py

23-32: __all__ is not sorted

Apply an isort-style sorting to __all__

(RUF022)

kubeflow-pipelines/docling-vlm/vlm_convert_pipeline.py

63-63: Possible hardcoded password assigned to: "remote_model_secret_mount_path"

(S105)

kubeflow-pipelines/docling-standard/standard_components.py

141-143: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (15)
kubeflow-pipelines/docling-standard/README.md (1)

7-48: LGTM! Documentation updates are clear and complete.

The documentation correctly reflects the refactored structure with standard-specific paths and comprehensively documents the new docling_accelerator_device parameter along with other configuration options. The parameter values (auto, cpu, cuda, mps) align with common accelerator device types.

kubeflow-pipelines/docling-vlm/local_run.py (3)

1-6: LGTM! Path manipulation enables local imports.

The sys.path modification correctly allows importing from the parent directory to access the common package for local testing.


10-17: LGTM! Import structure is clean.

The imports correctly reference the new common package and VLM-specific components, aligning with the refactored structure.


37-50: LGTM! Accelerator device support properly integrated.

The pipeline correctly:

  • Uses MODEL_TYPE_VLM constant for model downloads
  • Passes accelerator_device="auto" to the VLM converter
  • Sets remote_model_enabled=False for local testing
kubeflow-pipelines/docling-vlm/README.md (1)

7-41: LGTM! VLM documentation is comprehensive.

The documentation correctly:

  • References VLM-specific pipeline paths
  • Documents the docling_accelerator_device parameter with appropriate recommendation for GPU usage in VLM workloads
  • Includes all relevant configuration options
  • Provides accurate compilation instructions
kubeflow-pipelines/docling-standard/local_run.py (2)

1-14: LGTM! Import structure mirrors VLM pipeline.

The sys.path manipulation and imports correctly reference the common package and standard-specific components, maintaining consistency with the VLM pipeline structure.


33-45: LGTM! Standard pipeline correctly configured.

The pipeline properly:

  • Uses MODEL_TYPE_STANDARD constant for model downloads
  • Invokes docling_convert_standard instead of the generic converter
  • Passes accelerator_device="auto" parameter
kubeflow-pipelines/docling-vlm/vlm_convert_pipeline.py (5)

1-17: LGTM! Import structure is clean and consistent.

The sys.path manipulation and imports correctly reference the common package and VLM-specific components, maintaining consistency across the codebase.


34-34: LGTM! Pipeline signature extended for accelerator support.

The docling_accelerator_device parameter with default value "auto" is appropriately added to the pipeline signature, enabling users to configure accelerator devices.


56-59: LGTM! Model download correctly configured for VLM.

The model download uses MODEL_TYPE_VLM constant and properly passes the docling_remote_model_enabled flag, ensuring the correct models are downloaded for VLM pipelines.


64-74: LGTM! Accelerator device parameter properly threaded.

The accelerator_device parameter is correctly passed from the pipeline signature to the docling_convert_vlm component, enabling accelerator configuration at runtime.


87-89: LGTM! Output filename correctly updated.

The compiled YAML filename and log message appropriately reflect the VLM-specific pipeline naming convention.

kubeflow-pipelines/docling-standard/standard_convert_pipeline.py (2)

65-68: Good: centralized model download inputs via common + explicit pipeline_type.

The use of MODEL_TYPE_STANDARD and remote_model_endpoint_enabled=False clarifies intent and makes flows consistent with VLM.


9-15: Common re-exports verified. The symbols import_pdfs, create_pdf_splits, download_docling_models, and MODEL_TYPE_STANDARD are correctly re-exported in kubeflow-pipelines/common/__init__.py.

kubeflow-pipelines/common/__init__.py (1)

8-32: Clean export surface

Re-exporting shared components/constants here keeps downstream imports tidy. 👍

Comment on lines +74 to +113
        s3_prefix_secret = "S3_PREFIX"
        s3_prefix_file_path = os.path.join(s3_secret_mount_path, s3_prefix_secret)
        if os.path.isfile(s3_prefix_file_path):
            with open(s3_prefix_file_path) as f:
                s3_prefix = f.read()
        else:
            raise ValueError(f"Key {s3_prefix_secret} not defined in secret {s3_secret_mount_path}")

        if not s3_endpoint_url:
            raise ValueError("S3_ENDPOINT_URL must be provided")

        if not s3_bucket:
            raise ValueError("S3_BUCKET must be provided")

        s3_client = boto3.client(
            's3',
            endpoint_url=s3_endpoint_url,
            aws_access_key_id=s3_access_key,
            aws_secret_access_key=s3_secret_key,
        )

        for filename in filenames_list:
            orig = f"{s3_prefix.rstrip('/')}/{filename.lstrip('/')}"
            dest = output_path_p / filename
            print(f"import-test-pdfs: downloading {orig} -> {dest} from s3", flush=True)
            s3_client.download_file(s3_bucket, orig, dest)
    else:
        if not base_url:
            raise ValueError("base_url must be provided")

        for filename in filenames_list:
            url = f"{base_url.rstrip('/')}/{filename.lstrip('/')}"
            dest = output_path_p / filename
            print(f"import-test-pdfs: downloading {url} -> {dest}", flush=True)
            with requests.get(url, stream=True, timeout=30) as resp:
                resp.raise_for_status()
                with dest.open("wb") as f:
                    for chunk in resp.iter_content(chunk_size=8192):
                        if chunk:
                            f.write(chunk)

⚠️ Potential issue | 🟠 Major

Handle empty/whitespace S3 prefixes before building the object key

If S3_PREFIX is left blank (quite common when pulling from the bucket root), the current expression builds orig = "/<filename>", which does not match the actual key and the download fails. Newlines from the secret files also bleed into the key and credentials because we never strip them. Please trim the secret contents and only prepend the prefix when it’s non-empty.

-        if os.path.isfile(s3_prefix_file_path):
-            with open(s3_prefix_file_path) as f:
-                s3_prefix = f.read()
+        if os.path.isfile(s3_prefix_file_path):
+            with open(s3_prefix_file_path) as f:
+                s3_prefix = f.read().strip()
         else:
             raise ValueError(f"Key {s3_prefix_secret} not defined in secret {s3_secret_mount_path}")
 
-        if not s3_endpoint_url:
+        s3_endpoint_url = s3_endpoint_url.strip()
+        s3_access_key = s3_access_key.strip()
+        s3_secret_key = s3_secret_key.strip()
+        s3_bucket = s3_bucket.strip()
+
+        if not s3_endpoint_url:
             raise ValueError("S3_ENDPOINT_URL must be provided")
 
         if not s3_bucket:
             raise ValueError("S3_BUCKET must be provided")
 
         s3_client = boto3.client(
             's3',
             endpoint_url=s3_endpoint_url,
             aws_access_key_id=s3_access_key,
             aws_secret_access_key=s3_secret_key,
         )
 
         for filename in filenames_list:
-            orig = f"{s3_prefix.rstrip('/')}/{filename.lstrip('/')}"
+            key = filename.lstrip('/')
+            prefix = s3_prefix.rstrip('/')
+            orig = f"{prefix}/{key}" if prefix else key
             dest = output_path_p / filename
             print(f"import-test-pdfs: downloading {orig} -> {dest} from s3", flush=True)
             s3_client.download_file(s3_bucket, orig, dest)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

        s3_prefix_secret = "S3_PREFIX"
        s3_prefix_file_path = os.path.join(s3_secret_mount_path, s3_prefix_secret)
        if os.path.isfile(s3_prefix_file_path):
            with open(s3_prefix_file_path) as f:
                s3_prefix = f.read()
        else:
            raise ValueError(f"Key {s3_prefix_secret} not defined in secret {s3_secret_mount_path}")
        if not s3_endpoint_url:
            raise ValueError("S3_ENDPOINT_URL must be provided")
        if not s3_bucket:
            raise ValueError("S3_BUCKET must be provided")
        s3_client = boto3.client(
            's3',
            endpoint_url=s3_endpoint_url,
            aws_access_key_id=s3_access_key,
            aws_secret_access_key=s3_secret_key,
        )
        for filename in filenames_list:
            orig = f"{s3_prefix.rstrip('/')}/{filename.lstrip('/')}"
            dest = output_path_p / filename
            print(f"import-test-pdfs: downloading {orig} -> {dest} from s3", flush=True)
            s3_client.download_file(s3_bucket, orig, dest)
    else:
        if not base_url:
            raise ValueError("base_url must be provided")
        for filename in filenames_list:
            url = f"{base_url.rstrip('/')}/{filename.lstrip('/')}"
            dest = output_path_p / filename
            print(f"import-test-pdfs: downloading {url} -> {dest}", flush=True)
            with requests.get(url, stream=True, timeout=30) as resp:
                resp.raise_for_status()
                with dest.open("wb") as f:
                    for chunk in resp.iter_content(chunk_size=8192):
                        if chunk:
                            f.write(chunk)

        s3_prefix_secret = "S3_PREFIX"
        s3_prefix_file_path = os.path.join(s3_secret_mount_path, s3_prefix_secret)
        if os.path.isfile(s3_prefix_file_path):
            with open(s3_prefix_file_path) as f:
                s3_prefix = f.read().strip()
        else:
            raise ValueError(f"Key {s3_prefix_secret} not defined in secret {s3_secret_mount_path}")
        s3_endpoint_url = s3_endpoint_url.strip()
        s3_access_key = s3_access_key.strip()
        s3_secret_key = s3_secret_key.strip()
        s3_bucket = s3_bucket.strip()
        if not s3_endpoint_url:
            raise ValueError("S3_ENDPOINT_URL must be provided")
        if not s3_bucket:
            raise ValueError("S3_BUCKET must be provided")
        s3_client = boto3.client(
            's3',
            endpoint_url=s3_endpoint_url,
            aws_access_key_id=s3_access_key,
            aws_secret_access_key=s3_secret_key,
        )
        for filename in filenames_list:
            key = filename.lstrip('/')
            prefix = s3_prefix.rstrip('/')
            orig = f"{prefix}/{key}" if prefix else key
            dest = output_path_p / filename
            print(f"import-test-pdfs: downloading {orig} -> {dest} from s3", flush=True)
            s3_client.download_file(s3_bucket, orig, dest)
🧰 Tools
🪛 Ruff (0.13.2)

74-74: Possible hardcoded password assigned to: "s3_prefix_secret"

(S105)


80-80: Avoid specifying long messages outside the exception class

(TRY003)


83-83: Avoid specifying long messages outside the exception class

(TRY003)


86-86: Avoid specifying long messages outside the exception class

(TRY003)


102-102: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
In kubeflow-pipelines/common/components.py around lines 74 to 113, the code
reads S3 secret values and builds object keys but doesn't strip
whitespace/newlines and unconditionally prepends a slash when S3_PREFIX is empty
which produces keys like "/filename". Trim whitespace from the secret contents
(s3_prefix, s3_access_key, s3_secret_key) using .strip(), and construct the S3
object key by only prepending the prefix when the stripped s3_prefix is
non-empty (e.g., join prefix and filename without producing a leading slash if
prefix is empty), leaving filename.lstrip('/') as-is so you never end up with a
leading '/' in the final key. Ensure any other secret-derived strings are
similarly stripped before use.
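
The fix requested above boils down to a small, easily testable helper; the function name is illustrative:

```python
def build_s3_key(prefix: str, filename: str) -> str:
    """Join an optional key prefix and a filename without producing a leading '/'.

    Secret-derived values often carry trailing newlines, so strip both inputs
    first; an empty prefix must not contribute a separator.
    """
    prefix = prefix.strip().rstrip("/")
    key = filename.strip().lstrip("/")
    return f"{prefix}/{key}" if prefix else key
```

The download loop can then call `build_s3_key(s3_prefix, filename)` and get a valid object key whether or not S3_PREFIX is set.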

    enrich_formula: bool = False,
    enrich_picture_classes: bool = False,
    enrich_picture_description: bool = False,
    accelerator_device: str = "auto",  # parameter for accelerator device

⚠️ Potential issue | 🟠 Major

Device param naming vs docs: accept 'gpu' alias.

PR text mentions AUTO/CPU/GPU, but code only accepts 'cuda'. Add 'gpu' as an alias to prevent user-visible ValueError.
Apply this diff near device validation/mapping:

-    # device validation
-    allowed_devices = ["auto", "cpu", "cuda", "mps"]
-    if accelerator_device.lower() not in allowed_devices:
-        raise ValueError(
-            f"Invalid accelerator_device: {accelerator_device}. Must be one of {allowed_devices}"
-        )
-    
-    # Map string to AcceleratorDevice enum
-    device_map = {
-        "auto": AcceleratorDevice.AUTO,
-        "cpu": AcceleratorDevice.CPU,
-        "cuda": AcceleratorDevice.CUDA,
-        "mps": AcceleratorDevice.MPS,
-    }
+    # Device aliases and validation (keep allowed values in one place to avoid drift)
+    device_map = {
+        "auto": AcceleratorDevice.AUTO,
+        "cpu":  AcceleratorDevice.CPU,
+        "cuda": AcceleratorDevice.CUDA,
+        "gpu":  AcceleratorDevice.CUDA,  # alias
+        "mps":  AcceleratorDevice.MPS,
+    }
+    dev = accelerator_device.lower()
+    if dev not in device_map:
+        raise ValueError(
+            f"Invalid accelerator_device: {accelerator_device}. Must be one of {sorted(device_map.keys())}"
+        )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

```diff
-    accelerator_device: str = "auto",  # parameter for accelerator device
+    # Device aliases and validation (keep allowed values in one place to avoid drift)
+    device_map = {
+        "auto": AcceleratorDevice.AUTO,
+        "cpu":  AcceleratorDevice.CPU,
+        "cuda": AcceleratorDevice.CUDA,
+        "gpu":  AcceleratorDevice.CUDA,  # alias
+        "mps":  AcceleratorDevice.MPS,
+    }
+    dev = accelerator_device.lower()
+    if dev not in device_map:
+        raise ValueError(
+            f"Invalid accelerator_device: {accelerator_device}. Must be one of {sorted(device_map.keys())}"
+        )
```
🤖 Prompt for AI Agents
In kubeflow-pipelines/docling-standard/standard_components.py around line 32,
the accelerator_device parameter and its validation only accept 'cuda' but the
docs/PR mention 'gpu' as an allowed alias; update the device validation/mapping
to treat 'gpu' (case-insensitive) as equivalent to 'cuda' (and continue to
accept 'auto' and 'cpu'), map 'gpu' -> 'cuda' before downstream use, and raise
the same ValueError for truly unsupported values; ensure any error messages
mention supported values including 'gpu'.
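The alias handling both prompts ask for boils down to normalizing the user-facing string before it reaches Docling. A minimal sketch, assuming plain device-name strings rather than the actual `AcceleratorDevice` enum:

```python
def normalize_device(accelerator_device: str) -> str:
    """Map a user-facing device name to the value Docling expects.

    Accepts 'gpu' (case-insensitive) as an alias for 'cuda', per the PR docs.
    """
    device_map = {
        "auto": "auto",
        "cpu": "cpu",
        "cuda": "cuda",
        "gpu": "cuda",  # alias so docs saying "GPU" keep working
        "mps": "mps",
    }
    dev = accelerator_device.strip().lower()
    if dev not in device_map:
        raise ValueError(
            f"Invalid accelerator_device: {accelerator_device}. "
            f"Must be one of {sorted(device_map)}"
        )
    return device_map[dev]
```

Keeping the aliases and the validation in one mapping means the error message and the accepted values cannot drift apart.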

Comment on lines 281 to 285
```diff
       - "\nif ! [ -x \"$(command -v pip)\" ]; then\n    python3 -m ensurepip ||\
         \ python3 -m ensurepip --user || apt-get install python3-pip\nfi\n\nPIP_DISABLE_PIP_VERSION_CHECK=1\
-        \ python3 -m pip install --quiet --no-warn-script-location 'kfp==2.14.3'\
+        \ python3 -m pip install --quiet --no-warn-script-location 'kfp==2.14.4'\
         \ '--no-deps' 'typing-extensions>=3.7.4,<5; python_version<\"3.9\"' && \"\
         $0\" \"$@\"\n"
```

⚠️ Potential issue | 🔴 Critical

kfp==2.14.4 is not published, so pip install will abort

All executor bootstrap scripts pin to kfp==2.14.4, but PyPI only ships 2.14.3 as of October 1, 2025. Every task will fail during setup instead of running the component code. Please drop back to the latest available release.

(pypi.org)

-        - "\nif ! [ -x \"$(command -v pip)\" ]; then\n    python3 -m ensurepip || python3 -m ensurepip --user || apt-get install python3-pip\nfi\n\nPIP_DISABLE_PIP_VERSION_CHECK=1 python3 -m pip install --quiet --no-warn-script-location 'kfp==2.14.4' '--no-deps' 'typing-extensions>=3.7.4,<5; python_version<\"3.9\"' && \"$0\" \"$@\"\n"
+        - "\nif ! [ -x \"$(command -v pip)\" ]; then\n    python3 -m ensurepip || python3 -m ensurepip --user || apt-get install python3-pip\nfi\n\nPIP_DISABLE_PIP_VERSION_CHECK=1 python3 -m pip install --quiet --no-warn-script-location 'kfp==2.14.3' '--no-deps' 'typing-extensions>=3.7.4,<5; python_version<\"3.9\"' && \"$0\" \"$@\"\n"

Also applies to: 317-321, 460-463, 526-530

🤖 Prompt for AI Agents
In kubeflow-pipelines/docling-standard/standard_convert_pipeline_compiled.yaml
around lines 281-285 (and similarly at 317-321, 460-463, 526-530), the executor
bootstrap script pins kfp==2.14.4 which is not published; update the pinned
version to the latest available release on PyPI (e.g., kfp==2.14.3) or change to
a permissive spec (e.g., kfp>=2.14.3,<3) so pip install succeeds; edit each
occurrence to replace 2.14.4 with the chosen valid version/spec and keep the
surrounding bootstrap logic unchanged.

docling_enrich_formula: bool = False,
docling_enrich_picture_classes: bool = False,
docling_enrich_picture_description: bool = False,
docling_accelerator_device: str = "auto", # parameter for accelerator device

⚠️ Potential issue | 🟠 Major

Param value mismatch: docs say GPU, code expects cuda.

Either update docs to say cuda, or accept 'gpu' here to match user expectations (and forward to the component).
Apply this diff to normalize earlier and pass through:

-    docling_accelerator_device: str = "auto",  # parameter for accelerator device
+    docling_accelerator_device: str = "auto",  # auto|cpu|cuda|gpu|mps

And (optional) normalize before use:

-            accelerator_device=docling_accelerator_device,  # parameter for accelerator device
+            accelerator_device=docling_accelerator_device,

(The component change will accept 'gpu'.)

📝 Committable suggestion
Suggested change

```diff
 # In kubeflow-pipelines/docling-standard/standard_convert_pipeline.py,
 # expand the parameter doc on line 43:
-    docling_accelerator_device: str = "auto",  # parameter for accelerator device
+    docling_accelerator_device: str = "auto",  # auto|cpu|cuda|gpu|mps
 # …later, when passing through to the component, drop the stale comment:
-            accelerator_device=docling_accelerator_device,  # parameter for accelerator device
+            accelerator_device=docling_accelerator_device,
```
🤖 Prompt for AI Agents
In kubeflow-pipelines/docling-standard/standard_convert_pipeline.py around line
43, the parameter docling_accelerator_device currently defaults/accepts values
like "cuda" while docs reference "GPU"; update the parameter handling so
user-facing values accept "gpu" (and "GPU") and are normalized to the
component-expected "cuda" before being passed through — either change the
parameter default/docs to "cuda" and update docs, or add a small normalization
step that maps case-insensitive "gpu" -> "cuda" (and leaves "auto"/"cpu"/"cuda"
unchanged) so both docs and code are consistent.

Comment on lines +125 to +146
```python
    # device validation
    allowed_devices = ["auto", "cpu", "cuda", "mps"]
    if accelerator_device.lower() not in allowed_devices:
        raise ValueError(
            f"Invalid accelerator_device: {accelerator_device}. Must be one of {allowed_devices}"
        )

    # Map string to AcceleratorDevice enum
    device_map = {
        "auto": AcceleratorDevice.AUTO,
        "cpu": AcceleratorDevice.CPU,
        "cuda": AcceleratorDevice.CUDA,
        "mps": AcceleratorDevice.MPS,
    }

    pipeline_cls = VlmPipeline
    pipeline_options.artifacts_path = artifacts_path_p
    pipeline_options.document_timeout = float(timeout_per_document)
    # Replace lines 127-129 with:
    pipeline_options.accelerator_options = AcceleratorOptions(
        num_threads=num_threads, device=device_map[accelerator_device.lower()]
    )
```

⚠️ Potential issue | 🟠 Major

Allow full accelerator device syntax (including cuda:N)

New validation hard-blocks legitimate Docling values such as cuda:0/cuda:1. AcceleratorOptions explicitly supports the cuda:<index> forms, so this now throws for common multi-GPU setups. Please accept the broader syntax (or delegate validation to Docling) so we don’t regress deployments that pin to a specific GPU.

Apply this diff to handle both enumerated names and cuda:<index>:

-    allowed_devices = ["auto", "cpu", "cuda", "mps"]
-    if accelerator_device.lower() not in allowed_devices:
-        raise ValueError(
-            f"Invalid accelerator_device: {accelerator_device}. Must be one of {allowed_devices}"
-        )
-
-    device_map = {
-        "auto": AcceleratorDevice.AUTO,
-        "cpu": AcceleratorDevice.CPU,
-        "cuda": AcceleratorDevice.CUDA,
-        "mps": AcceleratorDevice.MPS,
-    }
-
-    pipeline_options.accelerator_options = AcceleratorOptions(
-        num_threads=num_threads, device=device_map[accelerator_device.lower()]
-    )
+    normalized_device = accelerator_device.strip().lower()
+    if normalized_device.startswith("cuda:"):
+        target_device = normalized_device
+    else:
+        try:
+            target_device = AcceleratorDevice(normalized_device)
+        except ValueError as exc:
+            raise ValueError(
+                "Invalid accelerator_device. Expected 'auto', 'cpu', 'cuda', 'mps', or 'cuda:<index>'."
+            ) from exc
+
+    pipeline_options.accelerator_options = AcceleratorOptions(
+        num_threads=num_threads,
+        device=target_device,
+    )
📝 Committable suggestion
Suggested change
```diff
-    # device validation
-    allowed_devices = ["auto", "cpu", "cuda", "mps"]
-    if accelerator_device.lower() not in allowed_devices:
-        raise ValueError(
-            f"Invalid accelerator_device: {accelerator_device}. Must be one of {allowed_devices}"
-        )
-
-    # Map string to AcceleratorDevice enum
-    device_map = {
-        "auto": AcceleratorDevice.AUTO,
-        "cpu": AcceleratorDevice.CPU,
-        "cuda": AcceleratorDevice.CUDA,
-        "mps": AcceleratorDevice.MPS,
-    }
-
-    pipeline_cls = VlmPipeline
-    pipeline_options.artifacts_path = artifacts_path_p
-    pipeline_options.document_timeout = float(timeout_per_document)
-    # Replace lines 127-129 with:
-    pipeline_options.accelerator_options = AcceleratorOptions(
-        num_threads=num_threads, device=device_map[accelerator_device.lower()]
-    )
+    # device validation
+    normalized_device = accelerator_device.strip().lower()
+    if normalized_device.startswith("cuda:"):
+        target_device = normalized_device
+    else:
+        try:
+            target_device = AcceleratorDevice(normalized_device)
+        except ValueError as exc:
+            raise ValueError(
+                "Invalid accelerator_device. Expected 'auto', 'cpu', 'cuda', 'mps', or 'cuda:<index>'."
+            ) from exc
+
+    pipeline_cls = VlmPipeline
+    pipeline_options.artifacts_path = artifacts_path_p
+    pipeline_options.document_timeout = float(timeout_per_document)
+    pipeline_options.accelerator_options = AcceleratorOptions(
+        num_threads=num_threads,
+        device=target_device,
+    )
```
🧰 Tools
🪛 Ruff (0.13.2)

128-130: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
In kubeflow-pipelines/docling-vlm/vlm_components.py around lines 125 to 146, the
current validation rejects valid values like "cuda:0" and mapping only handles
bare names; update the logic to accept both enumerated names and "cuda:<index>"
forms by loosening validation and mapping: allow strings that match
r"^cuda:\d+$" (case-insensitive) in addition to "auto","cpu","cuda","mps", and
when a "cuda:<index>" form is provided map it to AcceleratorDevice.CUDA for
AcceleratorOptions while preserving the original "cuda:<index>" string (or pass
the string to AcceleratorOptions if it accepts device strings) so multi-GPU
pinned values are accepted; adjust the device_map/assignment accordingly and
ensure accelerator_device.lower() is used for comparisons.
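The `cuda:<index>`-tolerant validation the prompt describes can be sketched as a standalone function. This is an illustrative sketch using plain strings; the real component would hand the result to Docling's `AcceleratorOptions`.

```python
import re

def resolve_device(accelerator_device: str) -> str:
    """Validate a device string, allowing pinned-GPU forms like 'cuda:1'.

    Returns the normalized (lowercased, stripped) device string unchanged,
    so multi-GPU deployments that pin an index are not rejected.
    """
    dev = accelerator_device.strip().lower()
    if re.fullmatch(r"cuda:\d+", dev):
        return dev  # pass the pinned-GPU form through untouched
    if dev in {"auto", "cpu", "cuda", "mps"}:
        return dev
    raise ValueError(
        "Invalid accelerator_device. "
        "Expected 'auto', 'cpu', 'cuda', 'mps', or 'cuda:<index>'."
    )
```

Delegating the final check to Docling would also work; the regex route simply fails fast with a pipeline-level error message.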

Comment on lines 199 to 203
```diff
       - "\nif ! [ -x \"$(command -v pip)\" ]; then\n    python3 -m ensurepip ||\
         \ python3 -m ensurepip --user || apt-get install python3-pip\nfi\n\nPIP_DISABLE_PIP_VERSION_CHECK=1\
-        \ python3 -m pip install --quiet --no-warn-script-location 'kfp==2.14.3'\
+        \ python3 -m pip install --quiet --no-warn-script-location 'kfp==2.14.4'\
         \ '--no-deps' 'typing-extensions>=3.7.4,<5; python_version<\"3.9\"' && \"\
         $0\" \"$@\"\n"
```

⚠️ Potential issue | 🔴 Critical

kfp==2.14.4 will break every task start-up

pip install ... 'kfp==2.14.4' fails today because PyPI only exposes releases through 2.14.3. Every executor shell here (create-pdf-splits, docling-convert-vlm, download-docling-models, import-pdfs) will abort before the component code runs. Please pin to the latest published build instead. (pypi.org)

Apply this diff across the four executor blocks:

-        - "\nif ! [ -x \"$(command -v pip)\" ]; then\n    python3 -m ensurepip || python3 -m ensurepip --user || apt-get install python3-pip\nfi\n\nPIP_DISABLE_PIP_VERSION_CHECK=1 python3 -m pip install --quiet --no-warn-script-location 'kfp==2.14.4' '--no-deps' 'typing-extensions>=3.7.4,<5; python_version<\"3.9\"' && \"$0\" \"$@\"\n"
+        - "\nif ! [ -x \"$(command -v pip)\" ]; then\n    python3 -m ensurepip || python3 -m ensurepip --user || apt-get install python3-pip\nfi\n\nPIP_DISABLE_PIP_VERSION_CHECK=1 python3 -m pip install --quiet --no-warn-script-location 'kfp==2.14.3' '--no-deps' 'typing-extensions>=3.7.4,<5; python_version<\"3.9\"' && \"$0\" \"$@\"\n"

Also applies to: 235-239, 371-374, 438-442

🤖 Prompt for AI Agents
kubeflow-pipelines/docling-vlm/vlm_convert_pipeline_compiled.yaml around lines
199-203 (and also update the same pattern at 235-239, 371-374, 438-442): the
shell executor tries to pip install kfp==2.14.4 which is not available on PyPI
and causes task startup to abort; change the pinned package to the latest
published release (kfp==2.14.3) in each of the four executor blocks so the pip
install succeeds, keeping the rest of the pip flags intact and applying the same
replacement to the other three line ranges mentioned.

@alinaryan
Copy link
Contributor

alinaryan commented Oct 7, 2025

Could you title this following the format Fabiano is using in #19?
It will be easier to map to a Jira ticket.

@RobuRishabh RobuRishabh closed this Nov 3, 2025