rsareddy0329
Collaborator

PR to merge all the documentation changes into the main branch for public launch.

PR Approval Steps

For Requester

  1. Description
    • Check the PR title and description for clarity. It should describe the changes made and the reason behind them.
    • Ensure that the PR follows the contribution guidelines, if applicable.
  2. Security requirements
  3. Manual review
    1. Click on the Files changed tab to see the code changes. Review the changes thoroughly:
      • Code Quality: Check for coding standards, naming conventions, and readability.
      • Functionality: Ensure that the changes meet the requirements and that all necessary code paths are tested.
      • Security: Check for any security issues or vulnerabilities.
      • Documentation: Confirm that any necessary documentation (code comments, README updates, etc.) has been updated.
  4. Check for Merge Conflicts:
    • Verify if there are any merge conflicts with the base branch. GitHub will usually highlight this. If there are conflicts, you should resolve them.

For Reviewer

  1. Go through the For Requester section to double-check each item.
  2. Request Changes or Approve the PR:
    1. If the PR is ready to be merged, click Review changes and select Approve.
    2. If changes are required, select Request changes and provide feedback. Be constructive and clear in your feedback.
  3. Merging the PR
    1. Check the Merge Method:
      1. Decide on the appropriate merge method based on your repository's guidelines (e.g., Squash and merge, Rebase and merge, or Merge).
    2. Merge the PR:
      1. Click the Merge pull request button.
      2. Confirm the merge by clicking Confirm merge.

rsareddy0329 and others added 4 commits August 5, 2025 16:00
… main (#190)

* Fix training test (#184)

* Fix SDK training test: Add wait time before refresh

* Fix training tests in canaries

* Update logging information for submitting and deleting training job (#189)

Co-authored-by: pintaoz <[email protected]>

---------

Co-authored-by: Zhaoqi <[email protected]>
Co-authored-by: pintaoz-aws <[email protected]>
Co-authored-by: pintaoz <[email protected]>
Co-authored-by: Roja Reddy Sareddy <[email protected]>
* Fix training test (#184)

* Fix SDK training test: Add wait time before refresh

* Fix training tests in canaries

* Update logging information for submitting and deleting training job (#189)

Co-authored-by: pintaoz <[email protected]>

---------

Co-authored-by: Zhaoqi <[email protected]>
Co-authored-by: pintaoz-aws <[email protected]>
Co-authored-by: pintaoz <[email protected]>
* Documentation Fixes

* Documentation Fixes

---------

Co-authored-by: Roja Reddy Sareddy <[email protected]>
@rsareddy0329 rsareddy0329 requested a review from a team as a code owner August 6, 2025 00:04
/.mypy_cache

/doc/_apidoc/
doc/_build/
Collaborator

Does this need to be /doc/_build/ here?

Collaborator Author

@rsareddy0329 rsareddy0329 Aug 6, 2025

This is mainly to make sure _build is ignored by Git version control.
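
For context on the question above: under Git's documented .gitignore rules, a pattern with a slash at the beginning or in the middle is matched relative to the directory that holds the .gitignore file, so `doc/_build/` and `/doc/_build/` should behave the same when the file sits at the repository root. A minimal sketch of that one rule follows; the helper name `is_anchored` is hypothetical, it ignores negation and escaping, and it is not Git's implementation.

```python
def is_anchored(pattern: str) -> bool:
    """Return True if a .gitignore pattern is anchored to the .gitignore's directory.

    Simplified illustration: a separator at the beginning or in the middle
    of the pattern anchors it; a trailing separator only restricts the
    match to directories.
    """
    # Drop a single trailing slash; it means "directories only", not anchoring.
    body = pattern[:-1] if pattern.endswith("/") else pattern
    return "/" in body


for pattern in ("doc/_build/", "/doc/_build/", "_build/"):
    print(f"{pattern!r}: anchored={is_anchored(pattern)}")
```

Under this reading, only a bare `_build/` would match at any depth; both forms discussed in the comment are anchored.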

source {venv-name}/bin/activate
```
```{note}
Remember to activate your virtual environment (source {venv-name}/bin/activate) each time you want to use the HyperPod CLI and SDK if you chose the virtual environment installation method.
Collaborator

Add code quote around source {venv-name}/bin/activate
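
As a quick way to confirm the reminder in the note was followed, Python itself can report whether it is running inside a virtual environment. A minimal sketch using only the standard library; the function name `in_virtualenv` is hypothetical and not part of the HyperPod CLI or SDK.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style virtual environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```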

--image pytorch/pytorch:latest \
```
````
````{tab-item} SDK
Collaborator

Is SDK code keeping parity with CLI here?

Collaborator Author

This will be a fast-follow item

```
````
````{tab-item} SDK
Collaborator

Seems like SDK code here is still using some optional variables

````
````{tab-item} SDK
```python
Collaborator

Need to update SDK code here too

# Custom endpoint
hyp list-pods hyp-custom-endpoint
```
````
Collaborator

Missing SDK code here

# Custom endpoint
hyp get-logs hyp-custom-endpoint --pod-name <pod-name>
```
````
Collaborator

Missing SDK code here


List all HyperPod PyTorch jobs in a namespace.

#### Syntax
Collaborator

The Syntax heading seems to render even larger than hyp list hyp-pytorch-job; not sure why it renders that way.

Collaborator Author

Yes, this mainly requires CSS changes. It will be a fast-follow item as well.

:::

:::{grid-item-card} HyperPod Developer Guide
:link: https://catalog.workshops.aws/sagemaker-hyperpod-eks/en-US
Collaborator

Link seems to be the same as the workshop. Maybe needs an update?

Collaborator Author

yes, checking with Shweta on this.

* Documentation Fixes

* Documentation Fixes

* Documentation Fixes

* Documentation Fixes

---------

Co-authored-by: Roja Reddy Sareddy <[email protected]>
doc/inference.md Outdated
Comment on lines 134 to 143
When creating an inference endpoint, you'll need to specify:

- **endpoint-name**: Unique identifier for your endpoint
- **instance-type**: The EC2 instance type to use
- **model-id** (JumpStart): ID of the pre-trained JumpStart model
- **image-uri** (Custom): Docker image containing your inference code
- **model-name** (Custom): Name of model to create on SageMaker
- **model-source-type** (Custom): Source type: fsx or s3
- **model-volume-mount-name** (Custom): Name of the model volume mount
- **container-port** (Custom): Port on which the model server listens
Collaborator

Can we separate this into two lists?

  1. Parameters required for JumpStart
  2. Parameters required for Custom
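
The split suggested above could look roughly like the following. This is only a sketch built from the parameter list under review: it assumes `endpoint-name` and `instance-type` apply to both endpoint types (they carry no JumpStart or Custom tag in that list), and the names `COMMON_PARAMS`, `PARAMS_BY_TYPE`, and `required_params` are hypothetical, not part of the HyperPod CLI or SDK.

```python
from typing import Dict, List

# Assumption: untagged parameters in the reviewed list are shared by both types.
COMMON_PARAMS: List[str] = ["endpoint-name", "instance-type"]

# Parameters tagged "(JumpStart)" or "(Custom)" in the reviewed list.
PARAMS_BY_TYPE: Dict[str, List[str]] = {
    "jumpstart": ["model-id"],
    "custom": [
        "image-uri",
        "model-name",
        "model-source-type",
        "model-volume-mount-name",
        "container-port",
    ],
}


def required_params(endpoint_type: str) -> List[str]:
    """Return the common parameters followed by the type-specific ones."""
    key = endpoint_type.lower()
    if key not in PARAMS_BY_TYPE:
        raise ValueError(f"unknown endpoint type: {endpoint_type!r}")
    return COMMON_PARAMS + PARAMS_BY_TYPE[key]


for kind in ("JumpStart", "Custom"):
    print(kind, "->", ", ".join(required_params(kind)))
```

Keeping the shared parameters in one place means each documented list only has to add what is specific to its endpoint type.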

Comment on lines 15 to 16
### Supported ML Frameworks
- PyTorch (version ≥ 1.10)
Collaborator

Nit: maybe "Supported ML Frameworks for Training"?

Comment on lines -30 to -39
def test_set_cluster_context(self, cluster_name):
    """Test setting cluster context."""
    result = execute_command([
        "hyp", "set-cluster-context",
        "--cluster-name", cluster_name
    ])
    assert result.returncode == 0
    context_line = result.stdout.strip().splitlines()[-1]
    assert any(text in context_line for text in ["Updated context", "Added new context"])

Collaborator

Is this change needed?

Collaborator

Looks like this change is from other commits. Can you rebase to main to clean it up?

Collaborator Author

I merged in the latest changes from main, and this change shows up in the diff. It comes from this PR: https://github.com/aws/sagemaker-hyperpod-cli/pull/184/files

* Documentation Fixes

* Documentation Fixes

* Documentation Fixes

* Documentation Fixes

* Documentation Fixes

---------

Co-authored-by: Roja Reddy Sareddy <[email protected]>
* Documentation Fixes

* Documentation Fixes

* Documentation Fixes

* Documentation Fixes

* Documentation Fixes

* Documentation Fixes

---------

Co-authored-by: Roja Reddy Sareddy <[email protected]>
@rsareddy0329 rsareddy0329 merged commit 17cfdbd into main Aug 6, 2025
4 of 5 checks passed

4 participants