Conversation
@treff7es treff7es commented Nov 10, 2025

## Fix: Handle None timestamps in Vertex AI ingestion to prevent crashes

Problem

Users were experiencing consistent crashes when ingesting Vertex AI metadata into DataHub with the error:
AttributeError: 'NoneType' object has no attribute 'timestamp'

The ingestion would successfully connect to GCP Vertex AI and identify assets, but would fail when
processing pipeline tasks or experiment runs that had incomplete timestamp data.

Root Cause

The Vertex AI source code was calling .timestamp() on timestamp fields without checking if they were
None first. This occurred in three locations:

  1. _get_pipeline_tasks_metadata (line 253): Calculated task duration using start_time.timestamp() and end_time.timestamp(), even though only end_time was checked for None
  2. _get_run_timestamps (line 501): Used create_time.timestamp() and update_time.timestamp() without None checks
  3. _gen_run_execution (line 547): Called datetime_to_ts_millis() on potentially None timestamps

GCP Vertex AI can return None for these timestamps when:

  • Pipeline tasks haven't started yet (no start_time)
  • Tasks are still running (no end_time)
  • Experiment runs don't have execution contexts (no create_time/update_time)
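
For illustration, a minimal sketch of the failure mode (the helper name is hypothetical, not the actual source code): calling .timestamp() on a field that is None raises exactly the AttributeError reported above.

```python
from datetime import datetime, timezone
from typing import Optional

def unsafe_duration_ms(start: Optional[datetime], end: Optional[datetime]) -> float:
    # Mirrors the unguarded pattern: if either argument is None, this raises
    # AttributeError: 'NoneType' object has no attribute 'timestamp'
    return (end.timestamp() - start.timestamp()) * 1000

# A pipeline task that has not started yet (start_time is None):
unsafe_duration_ms(None, datetime.now(timezone.utc))  # raises AttributeError
```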

Changes Made

1. Fixed _get_pipeline_tasks_metadata method

# Before: Only checked end_time
if task_detail.end_time:
    task_meta.duration = ... - task_meta.start_time.timestamp()  # ❌ Crashes if start_time is None

# After: Check both timestamps
if task_detail.end_time and task_meta.start_time:
    task_meta.duration = ... - task_meta.start_time.timestamp()  # ✅ Safe

2. Fixed _get_run_timestamps method

# Before: No None checks
duration = update_time.timestamp() * 1000 - create_time.timestamp() * 1000  # ❌ Crashes if either is None

# After: Check both timestamps
if create_time and update_time:
    duration = update_time.timestamp() * 1000 - create_time.timestamp() * 1000  # ✅ Safe
return None, None  # Consistent return when data unavailable
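
A minimal sketch of the tuple-returning shape shown above (the signature is assumed for illustration; the real _get_run_timestamps pulls these values from the Vertex AI execution object):

```python
from datetime import datetime
from typing import Optional, Tuple

def get_run_timestamps(
    create_time: Optional[datetime], update_time: Optional[datetime]
) -> Tuple[Optional[float], Optional[float]]:
    # Return (created_at_ms, duration_ms), or (None, None) when either
    # timestamp is unavailable.
    if create_time and update_time:
        created_at_ms = create_time.timestamp() * 1000
        duration_ms = update_time.timestamp() * 1000 - created_at_ms
        return created_at_ms, duration_ms
    return None, None
```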

3. Fixed _gen_run_execution method

# Before: No None checks
duration = datetime_to_ts_millis(update_time) - datetime_to_ts_millis(create_time)  # ❌ Crashes
time = datetime_to_ts_millis(create_time)  # ❌ Crashes

# After: Check and provide defaults
if create_time and update_time:
    duration = datetime_to_ts_millis(update_time) - datetime_to_ts_millis(create_time)
time = datetime_to_ts_millis(create_time) if create_time else 0  # ✅ Safe with default
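
All three fixes apply the same guard: check for None before converting, then either skip the computation or fall back to a default. A self-contained sketch of that shared pattern (safe_duration_ms is a hypothetical helper written for this description, not a function added by the PR):

```python
from datetime import datetime
from typing import Optional

def safe_duration_ms(
    start: Optional[datetime], end: Optional[datetime]
) -> Optional[float]:
    """Return the duration in milliseconds, or None if either timestamp is missing."""
    if start is None or end is None:
        return None
    return (end.timestamp() - start.timestamp()) * 1000
```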

Testing

Added three unit tests to prevent regressions; a sketch of their checks follows the list:

1. test_pipeline_task_with_none_start_time - Verifies pipeline tasks with start_time=None don't crash ingestion
2. test_pipeline_task_with_none_end_time - Verifies running tasks with end_time=None are handled gracefully
3. test_experiment_run_with_none_timestamps - Verifies experiment runs with missing timestamps continue processing
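
A hedged sketch of the kind of assertions these tests make, reusing the hypothetical safe_duration_ms helper from the sketch above (the real tests in tests/unit/test_vertexai_source.py exercise the Vertex AI source directly):

```python
from datetime import datetime, timezone

def test_pipeline_task_with_none_start_time():
    # A task that has not started yet: start_time is None, end_time is set.
    end = datetime.now(timezone.utc)
    assert safe_duration_ms(None, end) is None  # guarded: no AttributeError

def test_experiment_run_with_none_timestamps():
    # An experiment run without an execution context: both timestamps missing.
    assert safe_duration_ms(None, None) is None
```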

Test Results:
- All 15 tests passing (12 existing + 3 new)
- Linting checks passed (ruff format & check)
- No breaking changes to existing functionality

Impact

This fix allows the Vertex AI connector to gracefully handle incomplete timestamp data, which is common in production environments. The ingestion will:
- Continue processing when encountering tasks without start/end times
- Skip the duration calculation when timestamps are unavailable (instead of crashing)
- Use sensible defaults (0) for required timestamp fields
- Successfully ingest all other metadata about the assets

Files Changed

- src/datahub/ingestion/source/vertexai/vertexai.py - Fixed 3 methods with timestamp handling issues
- tests/unit/test_vertexai_source.py - Added 3 regression tests

@github-actions github-actions bot added the ingestion label Nov 10, 2025
@datahub-cyborg datahub-cyborg bot added the needs-review label Nov 10, 2025
@codecov codecov bot commented Nov 10, 2025

Codecov Report

❌ Patch coverage is 62.50000% with 3 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| .../src/datahub/ingestion/source/vertexai/vertexai.py | 62.50% | 3 Missing ⚠️ |

❌ Your patch status has failed because the patch coverage (62.50%) is below the target coverage (75.00%). You can increase the patch coverage or adjust the target coverage.


@datahub-cyborg datahub-cyborg bot added pending-submitter-merge and removed needs-review labels Nov 11, 2025
@treff7es treff7es force-pushed the vertex_ai_null_date_fix branch from 064bfc2 to 9d8a671 on November 11, 2025 13:17
@treff7es treff7es merged commit c4f63f9 into master Nov 11, 2025
61 of 62 checks passed
@treff7es treff7es deleted the vertex_ai_null_date_fix branch November 11, 2025 14:27