
@nathanng17 (Contributor) commented Oct 14, 2025

What's changing and why?

  1. Update the Health Monitoring Agent to be compatible with NVIDIA Multi-Instance GPU (MIG).

Before/After UX

Before: NA

After: NA

How was this change tested?

Tested node repair and fault detection on a newly created HyperPod EKS cluster with G5 and Trn1 instances, and ran the Health Monitoring Agent on a MIG-enabled P4 instance.

Are unit tests added?

No

Are integration tests added?

No

Reviewer Guidelines

‼️ Merge Requirements: PRs with failing integration tests cannot be merged without justification.

One of the following must be true:

  • All automated PR checks pass
  • Failed tests include local run results/screenshots proving they work
  • Changes are documentation-only

… with minor improvements and bug fixes.

@nathanng17 requested a review from a team as a code owner on October 14, 2025 19:53
@nathanng17 deployed to manual-approval on October 14, 2025 19:53 with GitHub Actions
@haardm (Contributor) left a comment

LGTM

@zhaoqizqwang merged commit 0ae955c into aws:main on Oct 14, 2025
6 checks passed