fix: Improve worker nodes waiting mechanism in MPI jobs #145

satishpasumarthi · 2022-08-24T06:25:52Z

Issue #, if available:
ORTE has lost communication with a remote daemon
Description of changes:

Made non-leader nodes wait on the orted process and also watch for the status file sent by the leader node.
Added an _MPI_ERRORS_ section to __init__.py to capture MPI-related errors in the console.
Updated the logic to capture the return values of psutil.wait_procs and added a callback that is invoked on termination.
Updated documentation wherever needed.
Added EFA-specific flags for MPI, SM modelparallel, and dataparallel jobs.
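
A minimal sketch of the waiting logic described above, not the toolkit's actual implementation. The PR uses psutil.wait_procs with a termination callback; this sketch swaps in the standard library's subprocess.Popen.poll so that it runs without extra dependencies. The function name, the status file name, the polling interval, and the timeout are all hypothetical.

```python
import os
import subprocess
import sys
import tempfile
import time


def wait_for_process_or_status_file(proc, status_file, poll_interval=0.05, timeout=10.0):
    """Block until the status file appears, the process exits, or the timeout passes.

    Returns "status_file_found", "process_exited", or "timeout".
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # The leader signals completion by writing the status file.
        if os.path.exists(status_file):
            return "status_file_found"
        # poll() returns None while the process is still running.
        if proc.poll() is not None:
            return "process_exited"
        time.sleep(poll_interval)
    return "timeout"


def demo():
    """Simulate a worker whose daemon is still alive when the leader reports completion."""
    with tempfile.TemporaryDirectory() as tmp:
        status_file = os.path.join(tmp, "mpi_is_finished")  # hypothetical file name
        # A sleeping child process stands in for the orted daemon.
        daemon = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
        try:
            # Stand-in for the leader node delivering the status file.
            with open(status_file, "w") as handle:
                handle.write("done")
            return wait_for_process_or_status_file(daemon, status_file)
        finally:
            daemon.kill()
            daemon.wait()
```

Checking both conditions in one loop means a worker can stop either when its daemon goes away or when the leader reports completion, whichever happens first.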

Testing done:

Updated unit tests

Merge Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

General

I have read the CONTRIBUTING doc
I used the commit message format described in CONTRIBUTING
I have used the regional endpoint when creating S3 and/or STS clients (if appropriate)
I have updated any necessary documentation, including READMEs

Tests

I have added tests that prove my fix is effective or that my feature works (if appropriate)
I have checked that my tests are not configured for a specific region or account (if appropriate)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

sagemaker-bot · 2022-08-24T06:41:04Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-training-toolkit-pr
Commit ID: 3b64dbd
Result: SUCCEEDED
Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

src/sagemaker_training/mpi.py

mseth10 · 2022-08-24T19:45:14Z

@satishpasumarthi can we remove the function definition for _wait_orted_process_to_finish if it is not being used?

satishpasumarthi · 2022-08-24T20:03:31Z

> @satishpasumarthi can we remove the function definition for _wait_orted_process_to_finish if it is not being used?

Hi @mseth10, we will clean this up later. For now, we want to retain it in case we need to fall back.

src/sagemaker_training/mpi.py

indhub

A PT DDP training job can let node-0 exit while another node is still doing a tail operation (such as eval). Users usually let node-0 do such tail operations, but that is not required. A PT DDP job that lets node-0 exit while the rest of the nodes keep running is technically not doing anything wrong, yet the toolkit will kill such a job with this change. Should we be concerned about that?

Ideally, I think we should do a barrier after the user program has terminated: make sure all processes are done, and then exit.
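
The barrier idea in this comment can be illustrated with a small sketch. It is an assumption-laden stand-in: threads play the role of nodes and threading.Barrier plays the role of a cross-node barrier, which in a real job would have to be a distributed primitive. The function names and the sleep durations are hypothetical.

```python
import threading
import time


def run_node(rank, work_seconds, barrier, records, lock):
    """Run a fake user program, then wait at the barrier before exiting."""
    time.sleep(work_seconds)  # stands in for the user program on this node
    finished_at = time.monotonic()
    # No node proceeds to exit until every node has reached this point.
    barrier.wait(timeout=5.0)
    released_at = time.monotonic()
    with lock:
        records.append((rank, finished_at, released_at))


def run_job(work_by_rank):
    """Start one thread per simulated node and return (rank, finished, released) tuples."""
    barrier = threading.Barrier(len(work_by_rank))
    records = []
    lock = threading.Lock()
    threads = [
        threading.Thread(target=run_node, args=(rank, seconds, barrier, records, lock))
        for rank, seconds in enumerate(work_by_rank)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return records
```

With this shape, a node-0 that finishes early simply blocks at the barrier until the slowest node is done, so no node is torn down while others still run tail work.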

sagemaker-bot · 2022-09-06T20:51:15Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-training-toolkit-pr
Commit ID: 28f0ea0
Result: FAILED
Build Logs (available for 30 days)


sagemaker-bot · 2022-09-06T20:52:58Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-training-toolkit-pr
Commit ID: 28f0ea0
Result: FAILED
Build Logs (available for 30 days)


sagemaker-bot · 2022-09-08T18:36:34Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-training-toolkit-pr
Commit ID: 968708d
Result: FAILED
Build Logs (available for 30 days)


sagemaker-bot · 2022-09-08T19:43:52Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-training-toolkit-pr
Commit ID: e266c89
Result: SUCCEEDED
Build Logs (available for 30 days)


sagemaker-bot · 2022-09-08T20:04:06Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-training-toolkit-pr
Commit ID: 11bfe8d
Result: SUCCEEDED
Build Logs (available for 30 days)


src/sagemaker_training/mpi.py

yl-to · 2022-09-08T20:21:55Z

In the long run, I suggest we use a lightweight database to store status information.

For now this PR looks good.
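
One way to picture the lightweight-database suggestion is the sketch below, which uses SQLite from the Python standard library. Nothing here comes from the PR: the table layout, the function names, and the host name used in the usage note are illustrative assumptions, and sharing such a database across nodes would need storage that every node can reach.

```python
import sqlite3
import time


def open_status_db(path=":memory:"):
    """Open (or create) a status database with one row per host."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS node_status ("
        "host TEXT PRIMARY KEY, status TEXT NOT NULL, updated_at REAL NOT NULL)"
    )
    return conn


def set_status(conn, host, status):
    """Insert or overwrite the status row for a host."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO node_status (host, status, updated_at) VALUES (?, ?, ?)",
            (host, status, time.time()),
        )


def get_status(conn, host):
    """Return the last recorded status for a host, or None if it has no row."""
    row = conn.execute(
        "SELECT status FROM node_status WHERE host = ?", (host,)
    ).fetchone()
    return row[0] if row else None
```

For example, a leader could call set_status(conn, "algo-1", "finished") and a worker could poll get_status(conn, "algo-1") instead of checking for a status file.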

sagemaker-bot · 2022-09-08T21:11:00Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-training-toolkit-pr
Commit ID: fcb5a33
Result: FAILED
Build Logs (available for 30 days)


sagemaker-bot · 2022-09-09T01:46:13Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-training-toolkit-pr
Commit ID: 5c4a11d
Result: FAILED
Build Logs (available for 30 days)


sagemaker-bot · 2022-09-09T16:39:40Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-training-toolkit-pr
Commit ID: d275545
Result: FAILED
Build Logs (available for 30 days)


sagemaker-bot · 2022-09-09T18:42:27Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-training-toolkit-pr
Commit ID: 14f2894
Result: SUCCEEDED
Build Logs (available for 30 days)


src/sagemaker_training/mpi.py

src/sagemaker_training/smdataparallel.py

sagemaker-bot · 2022-09-09T21:22:25Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-training-toolkit-pr
Commit ID: 1c6dc07
Result: SUCCEEDED
Build Logs (available for 30 days)


roywei

Please remove the print; otherwise LGTM.

src/sagemaker_training/mpi.py

sagemaker-bot · 2022-09-09T22:00:30Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-training-toolkit-pr
Commit ID: 082204d
Result: SUCCEEDED
Build Logs (available for 30 days)


Referenced by later commits; entries that mention this PR:

Squashed commit of the debugger_exception branch (yl-to/sagemaker-training-toolkit), which includes:
commit 98e6a7d · ci · Sat Sep 10 00:04:39 2022 +0000 · prepare release v4.2.7
commit a8d0f7f · Satish Pasumarthi · Fri Sep 9 17:01:07 2022 -0700 · fix: improve worker node wait logic and update EFA flags (aws#145)

Merge commit ("Fix merge conflicts") whose history lists "fix: improve worker node wait logic and update EFA flags (aws#145)" followed by "prepare release v4.2.7".

satishpasumarthi changed the title ~~fix: make worker nodes sleep rather than wait on orted process~~ fix: make non-leader nodes sleep rather than wait on orted process for mpi based jobs Aug 24, 2022

wzamazon reviewed Aug 24, 2022: src/sagemaker_training/mpi.py (resolved)

yl-to reviewed Aug 24, 2022: src/sagemaker_training/mpi.py (outdated, resolved)

YangFei1990 reviewed Aug 24, 2022: src/sagemaker_training/mpi.py (outdated, resolved)

Arjunbala reviewed Aug 24, 2022

indhub suggested changes Aug 24, 2022

indhub previously approved these changes Sep 1, 2022

roywei previously approved these changes Sep 1, 2022

satishpasumarthi changed the title ~~fix: make non-leader nodes sleep rather than wait on orted process for mpi based jobs~~ fix: Improve worker nodes waiting mechanism during mpirun Sep 1, 2022

satishpasumarthi dismissed stale reviews from roywei and indhub via 9d53166 September 1, 2022 23:45

satishpasumarthi force-pushed the master branch from 9d53166 to 8914af2 September 1, 2022 23:47

satishpasumarthi force-pushed the master branch from e266c89 to 11bfe8d September 8, 2022 19:49

yl-to reviewed Sep 8, 2022: src/sagemaker_training/mpi.py (resolved)

yl-to reviewed Sep 8, 2022: src/sagemaker_training/mpi.py (outdated, resolved)

satishpasumarthi force-pushed the master branch from 3c6d9c2 to 14f2894 September 9, 2022 18:25

satishpasumarthi changed the title ~~fix: Improve worker nodes waiting mechanism during mpirun~~ fix: Improve worker nodes waiting mechanism in MPI jobs Sep 9, 2022

yl-to previously approved these changes Sep 9, 2022

roywei suggested changes Sep 9, 2022: src/sagemaker_training/mpi.py (outdated, resolved), src/sagemaker_training/smdataparallel.py (outdated, resolved)

satishpasumarthi dismissed yl-to’s stale review via 1c6dc07 September 9, 2022 21:05

roywei previously approved these changes Sep 9, 2022: src/sagemaker_training/mpi.py (outdated, resolved)

satishpasumarthi dismissed roywei’s stale review via c04b221 September 9, 2022 21:44

Commit 082204d: fix: improve worker node wait logic and update EFA flags

satishpasumarthi force-pushed the master branch from c04b221 to 082204d September 9, 2022 21:45

roywei approved these changes Sep 9, 2022

992X approved these changes Sep 9, 2022

rahul003 approved these changes Sep 9, 2022

satishpasumarthi merged commit a8d0f7f into aws:master Sep 10, 2022

11 participants
