Open
99 commits
1a4fad0
Improve vLLM workflows (#5480)
jinyan-li1 Nov 17, 2025
639e164
Merge branch 'sbuasai-main' of https://github.com/sirutBuasai/deep-le…
sirutBuasai Nov 18, 2025
2e14cea
formatting
sirutBuasai Nov 18, 2025
cc9bb0b
fix region
sirutBuasai Nov 18, 2025
436bf39
fix artifacts path
sirutBuasai Nov 18, 2025
d122974
isntall dependencies
sirutBuasai Nov 18, 2025
f92e7d9
reverse order
sirutBuasai Nov 18, 2025
94cd5d9
add port
sirutBuasai Nov 18, 2025
1bcc4b5
move scripts into their separate dir
sirutBuasai Nov 18, 2025
4aba883
change port
sirutBuasai Nov 18, 2025
c6eb798
Merge branch 'main' of https://github.com/aws/deep-learning-container…
sirutBuasai Nov 18, 2025
2a59098
style: format pre-commit check
sirutBuasai Nov 18, 2025
ac4c345
chore: run test choices
sirutBuasai Nov 18, 2025
c4b56ec
remove commitizen
sirutBuasai Nov 18, 2025
63f2da7
use main branch
sirutBuasai Nov 18, 2025
25ae1af
fix dir
sirutBuasai Nov 18, 2025
189eaf1
run benchmark sglang
sirutBuasai Nov 18, 2025
52efe21
ac run container id instead
sirutBuasai Nov 18, 2025
7569bb0
add port and host
sirutBuasai Nov 18, 2025
e38c86e
use composite action
sirutBuasai Nov 18, 2025
19555a7
input secrets
sirutBuasai Nov 18, 2025
c89c323
use shell bash
sirutBuasai Nov 18, 2025
cce1039
remove interactive
sirutBuasai Nov 18, 2025
090a2b6
logs tail 200
sirutBuasai Nov 18, 2025
68064d8
add sleep
sirutBuasai Nov 18, 2025
f1b68a3
add cleanup
sirutBuasai Nov 18, 2025
b5de06b
remove -it
sirutBuasai Nov 18, 2025
d04fb8a
fix names
sirutBuasai Nov 18, 2025
71bf8ae
use container cleanup action
sirutBuasai Nov 18, 2025
eb46dbb
remove unused step
sirutBuasai Nov 18, 2025
b27979c
Merge branch 'main' into port-sglang
sirutBuasai Nov 18, 2025
e91c443
try env.containerid
sirutBuasai Nov 18, 2025
f390fd9
use env
sirutBuasai Nov 18, 2025
bfa1930
add -it
sirutBuasai Nov 18, 2025
b31cfe2
full run
sirutBuasai Nov 18, 2025
8809763
use output
sirutBuasai Nov 18, 2025
728942d
use vars expose
sirutBuasai Nov 18, 2025
09f976b
use input
sirutBuasai Nov 18, 2025
7682be3
comment artifacts name
sirutBuasai Nov 18, 2025
4c48634
set output
sirutBuasai Nov 18, 2025
3cd1e4f
echo
sirutBuasai Nov 18, 2025
767ae3b
add outputs id
sirutBuasai Nov 18, 2025
555b8a9
iamge uri output
sirutBuasai Nov 18, 2025
9891937
test using hardcoded string
sirutBuasai Nov 18, 2025
0ff7c73
no echo
sirutBuasai Nov 18, 2025
117a51e
set my output
sirutBuasai Nov 18, 2025
db06d9a
use image uri
sirutBuasai Nov 18, 2025
3e6888d
use secret image uri file
sirutBuasai Nov 18, 2025
ddddeff
fix inputs
sirutBuasai Nov 18, 2025
1df975f
correct docker pull
sirutBuasai Nov 18, 2025
0bd33b6
use non screte var
sirutBuasai Nov 19, 2025
fc34b27
use steps image_uri
sirutBuasai Nov 19, 2025
7320e77
remove unused steps
sirutBuasai Nov 19, 2025
907962e
change uri var name
sirutBuasai Nov 19, 2025
93696e6
change step name
sirutBuasai Nov 19, 2025
261cfe9
run regression test
sirutBuasai Nov 19, 2025
6c5834a
change container_pull to ecr_authenticate
sirutBuasai Nov 19, 2025
25bafb7
remove }}
sirutBuasai Nov 19, 2025
a94c782
Merge branch 'main' into port-sglang
sirutBuasai Nov 19, 2025
0df556c
use 12xlarge fleet
sirutBuasai Nov 19, 2025
04f4482
use base sglang
sirutBuasai Nov 19, 2025
73d3f93
revert dockerfile
sirutBuasai Nov 19, 2025
72847f8
run using g6e
sirutBuasai Nov 19, 2025
af60e22
remove unnecessary wait
sirutBuasai Nov 19, 2025
ba3e0f3
rename tests
sirutBuasai Nov 19, 2025
1c9b4c4
use matrix srt test
sirutBuasai Nov 19, 2025
0a5b4f8
remove srt
sirutBuasai Nov 19, 2025
d6a8dc4
add pytest requirements
sirutBuasai Nov 19, 2025
81b905f
add add conftest
sirutBuasai Nov 19, 2025
fe346c8
test_utils
sirutBuasai Nov 19, 2025
9397f72
remove dup isort
sirutBuasai Nov 19, 2025
50c5f17
run test exampels
sirutBuasai Nov 19, 2025
d3d41a3
fix runner name
sirutBuasai Nov 19, 2025
f0b12ca
force rich terminal
sirutBuasai Nov 19, 2025
82bf96a
remove console width
sirutBuasai Nov 19, 2025
aad71f6
disable rich tracebacks
sirutBuasai Nov 19, 2025
0fc4b84
add columns
sirutBuasai Nov 19, 2025
da7b360
fix intall
sirutBuasai Nov 19, 2025
1055c4e
remove rich logger
sirutBuasai Nov 19, 2025
c9d8883
remove rich logger
sirutBuasai Nov 19, 2025
e1291b2
add sm endpoint
sirutBuasai Nov 19, 2025
649e203
source and install dependencies
sirutBuasai Nov 19, 2025
d54d5a6
reduce sleep time
sirutBuasai Nov 19, 2025
04b875e
use serve cmd instead
sirutBuasai Nov 19, 2025
da84a8e
activate venev
sirutBuasai Nov 19, 2025
2af0a46
add pytest cache
sirutBuasai Nov 19, 2025
ded0b44
enable debug
sirutBuasai Nov 20, 2025
31679f8
input image uri
sirutBuasai Nov 20, 2025
b1edba9
use assert in test instead
sirutBuasai Nov 20, 2025
a828916
remove f
sirutBuasai Nov 20, 2025
f6548dc
show test output
sirutBuasai Nov 20, 2025
dfce3f2
use ack test direction
sirutBuasai Nov 20, 2025
1f2a669
remove predictor
sirutBuasai Nov 20, 2025
38a5771
fix self error
sirutBuasai Nov 20, 2025
70397cc
fix input
sirutBuasai Nov 20, 2025
dee752b
remove model_id yield
sirutBuasai Nov 20, 2025
6d1e16b
fix get endpoint status
sirutBuasai Nov 20, 2025
84c8af9
add predictory class
sirutBuasai Nov 20, 2025
41f930c
change predictor
sirutBuasai Nov 20, 2025
3 changes: 2 additions & 1 deletion .github/actions/pr-permission-gate/action.yml
@@ -4,8 +4,9 @@ inputs:
   required-level:
     description: Minimum permission level required (read|triage|write|maintain|admin)
     default: write
+
 runs:
-  using: "composite"
+  using: composite
   steps:
     - name: Check PR sender permission
       uses: actions/github-script@v7
222 changes: 222 additions & 0 deletions .github/workflows/pr-sglang.yml
@@ -0,0 +1,222 @@
name: PR - SGLang

on:
  pull_request:
    branches:
      - main
    paths:
      - "docker/sglang/**"

permissions:
  contents: read

concurrency:
  group: pr-sglang-${{ github.event.pull_request.number }}
  cancel-in-progress: true

jobs:
  check-changes:
    runs-on: ubuntu-latest
    outputs:
      sglang-sagemaker: ${{ steps.changes.outputs.sglang-sagemaker }}
    steps:
      - uses: actions/checkout@v5
      - uses: actions/setup-python@v6
        with:
          python-version: "3.12"
      - uses: pre-commit/[email protected]
        with:
          extra_args: --all-files
      - name: Detect file changes
        id: changes
        uses: dorny/paths-filter@v3
        with:
          filters: |
            sglang-sagemaker:
              - "docker/sglang/Dockerfile"

  build-sglang-image:
    needs: [check-changes]
    if: needs.check-changes.outputs.sglang-sagemaker == 'true'
    runs-on:
      - codebuild-runner-${{ github.run_id }}-${{ github.run_attempt }}
        fleet:x86-build-runner
    outputs:
      image-uri: ${{ steps.image-uri-build.outputs.IMAGE_URI }}
    steps:
      - uses: actions/checkout@v5
      - run: .github/scripts/runner_setup.sh
      - run: .github/scripts/buildkitd.sh

      - name: ECR login
        uses: ./.github/actions/ecr-authenticate
        with:
          aws_region: ${{ vars.AWS_REGION }}
          aws_account_id: ${{ vars.AWS_ACCOUNT_ID }}

      - name: Resolve image URI for build
        id: image-uri-build
        run: |
          IMAGE_URI=${{ vars.AWS_ACCOUNT_ID }}.dkr.ecr.${{ vars.AWS_REGION }}.amazonaws.com/ci:sglang-0.5.5-gpu-py312-cu129-ubuntu22.04-sagemaker-pr-${{ github.event.pull_request.number }}
          echo "Image URI to build: $IMAGE_URI"
          echo "IMAGE_URI=$IMAGE_URI" >> $GITHUB_ENV
          echo "IMAGE_URI=$IMAGE_URI" >> $GITHUB_OUTPUT

      - name: Build image
        run: |
          docker buildx build --progress plain \
            --build-arg CACHE_REFRESH="$(date +"%Y-%m-%d")" \
            --cache-to=type=inline \
            --cache-from=type=registry,ref=${IMAGE_URI} \
            --tag ${IMAGE_URI} \
            --target sglang-sagemaker \
            -f docker/sglang/Dockerfile .

      - name: Container push
        run: |
          docker push ${IMAGE_URI}
          docker rmi ${IMAGE_URI}

  sglang-local-benchmark-test:
    needs: [build-sglang-image]
    if: needs.build-sglang-image.result == 'success'
    runs-on:
      - codebuild-runner-${{ github.run_id }}-${{ github.run_attempt }}
        fleet:x86-g6xl-runner
    steps:
      - name: Checkout DLC source
        uses: actions/checkout@v5

      - name: Container pull
        uses: ./.github/actions/ecr-authenticate
        with:
          aws_region: ${{ vars.AWS_REGION }}
          aws_account_id: ${{ vars.AWS_ACCOUNT_ID }}
          image_uri: ${{ needs.build-sglang-image.outputs.image-uri }}

      - name: Setup for SGLang datasets
        run: |
          mkdir -p /tmp/sglang/dataset
          if [ ! -f /tmp/sglang/dataset/ShareGPT_V3_unfiltered_cleaned_split.json ]; then
            echo "Downloading ShareGPT dataset..."
            wget -P /tmp/sglang/dataset https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json
          else
            echo "ShareGPT dataset already exists. Skipping download."
          fi

      - name: Start container
        run: |
          CONTAINER_ID=$(docker run -d -it --rm --gpus=all \
            -v ${HOME}/.cache/huggingface:/root/.cache/huggingface \
            -v /tmp/sglang/dataset:/dataset \
            -p 30000:30000 \
            -e SM_SGLANG_MODEL_PATH=Qwen/Qwen3-0.6B \
            -e SM_SGLANG_REASONING_PARSER=qwen3 \
            -e SM_SGLANG_HOST=127.0.0.1 \
            -e SM_SGLANG_PORT=30000 \
            -e HF_TOKEN=${{ secrets.HUGGING_FACE_HUB_TOKEN }} \
            ${{ needs.build-sglang-image.outputs.image-uri }})
          echo "CONTAINER_ID=$CONTAINER_ID" >> $GITHUB_ENV
          echo "Waiting for container startup ..."
          sleep 60s
          docker logs ${CONTAINER_ID}

      - name: Run SGLang tests
        run: |
          docker exec ${CONTAINER_ID} python3 -m sglang.bench_serving \
            --backend sglang \
            --host 127.0.0.1 --port 30000 \
            --num-prompts 1000 \
            --model Qwen/Qwen3-0.6B \
            --dataset-name sharegpt \
            --dataset-path /dataset/ShareGPT_V3_unfiltered_cleaned_split.json

      - name: Cleanup container and images
        if: always()
        uses: ./.github/actions/container-cleanup
        with:
          container_id: ${{ env.CONTAINER_ID }}

  sglang-lang-test:
    needs: [build-sglang-image]
    if: needs.build-sglang-image.result == 'success'
    runs-on:
      - codebuild-runner-${{ github.run_id }}-${{ github.run_attempt }}
        fleet:x86-g6exl-runner
    steps:
      - name: Checkout DLC source
        uses: actions/checkout@v5

      - name: Container pull
        uses: ./.github/actions/ecr-authenticate
        with:
          aws_region: ${{ vars.AWS_REGION }}
          aws_account_id: ${{ vars.AWS_ACCOUNT_ID }}
          image_uri: ${{ needs.build-sglang-image.outputs.image-uri }}

      - name: Checkout SGLang tests
        uses: actions/checkout@v5
        with:
          repository: sgl-project/sglang
          ref: v0.5.5
          path: sglang_source

      - name: Start container
        run: |
          CONTAINER_ID=$(docker run -d -it --rm --gpus=all --entrypoint /bin/bash \
            -v ${HOME}/.cache/huggingface:/root/.cache/huggingface \
            -v ./sglang_source:/workdir --workdir /workdir \
            -e HF_TOKEN=${{ secrets.HUGGING_FACE_HUB_TOKEN }} \
            ${{ needs.build-sglang-image.outputs.image-uri }})
          echo "CONTAINER_ID=$CONTAINER_ID" >> $GITHUB_ENV

      - name: Setup for SGLang tests
        run: |
          docker exec ${CONTAINER_ID} sh -c '
            set -eux

            bash scripts/ci/ci_install_dependency.sh
          '

      - name: Run SGLang tests
        run: |
          docker exec ${CONTAINER_ID} sh -c '
            set -eux
            nvidia-smi

            # Frontend Test
            cd /workdir/test/lang
            python3 run_suite.py --suite per-commit
          '

      - name: Cleanup container and images
        if: always()
        uses: ./.github/actions/container-cleanup
        with:
          container_id: ${{ env.CONTAINER_ID }}

  sglang-sagemaker-test:
    needs: [build-sglang-image]
    if: needs.build-sglang-image.result == 'success'
    runs-on:
      - codebuild-runner-${{ github.run_id }}-${{ github.run_attempt }}
        fleet:default-runner
    steps:
      - name: Checkout DLC source
        uses: actions/checkout@v5

      - name: Install test dependencies
        run: |
          uv venv
          source .venv/bin/activate

          uv pip install -r test/requirements.txt
          uv pip install -r test/sglang/sagemaker/requirements.txt

      - name: Run sagemaker tests
        env:
          FORCE_COLOR: "1"
        run: |
          source .venv/bin/activate
          cd test/
          python3 -m pytest -vsrP --image-uri ${{ needs.build-sglang-image.outputs.image-uri }} sglang/sagemaker
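
Note on the `Start container` step of `sglang-local-benchmark-test`: it waits a fixed `sleep 60s` before the benchmark runs. One possible alternative is to poll until the server answers, with an upper bound on the wait. The following is only a sketch under stated assumptions: `wait_for` is a hypothetical helper that is not part of this PR, and the probe command is left to the caller because the right readiness check for this image is not shown in the diff.

```shell
# Hypothetical helper, not part of this PR: retry a probe command until it
# succeeds or until roughly timeout_s seconds have elapsed.
# Usage: wait_for <timeout_s> <interval_s> <command> [args...]
# Note: written for plain POSIX sh, so the variables below are global.
wait_for() {
  timeout_s="$1"
  interval_s="$2"
  shift 2
  start=$(date +%s)
  # Re-run the probe; the loop ends as soon as it exits with status 0.
  until "$@"; do
    now=$(date +%s)
    if [ $((now - start)) -ge "$timeout_s" ]; then
      return 1  # gave up: the probe never succeeded within the budget
    fi
    sleep "$interval_s"
  done
  return 0
}
```

In the workflow this might look like `wait_for 300 5 curl -fsS http://127.0.0.1:30000/health`, followed by `docker logs` on failure. The `/health` path is an assumption about the server in the image and would need to be confirmed before use.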
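
Note on the `Resolve image URI for build` step: it interpolates `${{ ... }}` expressions directly into the script text. A commonly suggested alternative is to pass such values through the step's `env:` block and assemble the string in the shell, so the script body stays fixed. The sketch below assumes a hypothetical function `build_image_uri` (not part of this PR); the tag string is copied from the workflow, and the account ID, region, and PR number in the example call are made up.

```shell
# Hypothetical helper, not part of this PR: assemble the CI image URI from
# values that would arrive as environment variables or arguments.
# Usage: build_image_uri <aws_account_id> <aws_region> <pr_number>
build_image_uri() {
  account_id="$1"
  region="$2"
  pr_number="$3"
  # printf fills the three %s slots in order: account, region, PR number.
  printf '%s.dkr.ecr.%s.amazonaws.com/ci:sglang-0.5.5-gpu-py312-cu129-ubuntu22.04-sagemaker-pr-%s\n' \
    "$account_id" "$region" "$pr_number"
}

# Example call with made-up values.
build_image_uri 123456789012 us-west-2 42
```

Inside the step, the result could then be appended to `"$GITHUB_ENV"` and `"$GITHUB_OUTPUT"` exactly as the existing `echo` lines do.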
3 changes: 2 additions & 1 deletion .gitignore
@@ -3,4 +3,5 @@ __pycache__
 .idea
 *.pyc
 .venv
-.ruff_cache
+.ruff_cache
+.pytest_cache