feat: support MoE model in SLA Planner Sglang #3185
Conversation
Walkthrough

Adds MoE-aware profiling across profiler codepaths, introduces attention_dp_size and GPU-count sweeps, updates config modifiers with DEP/TEP and multinode support, renames SGLang service keys to short names, adds a MoE profiling Job manifest, adjusts decode request range logic, updates docs, the Dockerfile, and deployment naming behavior, and adds MoE dry-run tests.

Changes
Sequence Diagram(s)

sequenceDiagram
autonumber
actor U as User
participant PS as profile_sla.py
participant CM as ConfigModifier (SGLang/VLLM/TRTLLM)
participant PF as Prefill Profiler
participant DC as Decode Profiler
participant BK as Backend Engine
participant ST as PVC (/data)
U->>PS: run_profile(args)
PS->>PS: Parse flags (is_moe_model, num_gpus_per_node)
alt MoE
PS->>CM: convert_config(..., is_moe_model=True)
PS->>CM: set_config_tep_size / set_config_dep_size(num_gpus, per-node)
PS->>DC: profile_decode(..., attention_dp_size=num_gpus)
else Dense
PS->>CM: convert_config(..., is_moe_model=False)
PS->>CM: set_config_tp_size(tp_size)
PS->>DC: profile_decode(..., attention_dp_size=1)
end
PS->>PF: profile_prefill(...)
par Prefill
PF->>BK: deploy & run prefill
BK-->>PF: TTFT/throughput
and Decode
DC->>DC: get_num_request_range(attn_dp, max_conc, granularity)
DC->>BK: deploy & run decode sweep
BK-->>DC: ITL/throughput
end
PS->>ST: Write results/plots
PS-->>U: Summary JSON/plots
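In the decode branch above, get_num_request_range(attn_dp, max_conc, granularity) chooses the concurrency levels for the decode sweep. As a rough illustration only (the real function lives in benchmarks/profiler/utils/profile_decode.py and may compute the range differently), an attention-DP-aware sweep could start at one request per DP rank and stay under the global cap:

```python
# Hypothetical sketch -- not the repo's implementation. It only illustrates how
# attention_dp_size and a global concurrency cap could shape the decode sweep.
def get_num_request_range(
    attention_dp_size: int,
    max_concurrency: int,
    interpolation_granularity: int,
) -> list[int]:
    lo = attention_dp_size  # at least one in-flight request per attention DP rank
    hi = max(max_concurrency, lo)
    step = max((hi - lo) // max(interpolation_granularity - 1, 1), 1)
    levels = list(range(lo, hi + 1, step))
    if levels[-1] != hi:
        levels.append(hi)  # always include the cap itself
    return levels

print(get_num_request_range(1, 64, 5))    # dense-style sweep: [1, 16, 31, 46, 61, 64]
print(get_num_request_range(8, 2000, 5))  # MoE/DEP-style sweep: [8, 506, 1004, 1502, 2000]
```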
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~70 minutes

Possibly related PRs
Pre-merge checks

❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
Actionable comments posted: 10
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
benchmarks/profiler/utils/profile_decode.py (1)
143-170: Regression: the `_profile_decode_helper` signature change breaks `profile_decode_aiconfigurator`. The missing `attention_dp_size` argument causes a TypeError at runtime. Apply:
-def profile_decode_aiconfigurator( +def profile_decode_aiconfigurator( work_dir, num_gpus, max_kv_tokens, max_context_length, interpolation_granularity, ai_configurator_perf_estimator: AIConfiguratorPerfEstimator, **model_config_kwargs, ): @@ - return _profile_decode_helper( + return _profile_decode_helper( work_dir, num_gpus, max_kv_tokens, max_context_length, interpolation_granularity, get_itl_and_thpt_per_gpu, - ) + # For dense models attention_dp_size=1; MoE path doesn't use AI Configurator. + model_config_kwargs.get("attention_dp_size", 1) if "attention_dp_size" in model_config_kwargs else 1, + )

Optionally, make the parameter explicit:

- ai_configurator_perf_estimator: AIConfiguratorPerfEstimator, + ai_configurator_perf_estimator: AIConfiguratorPerfEstimator, + attention_dp_size: int = 1, @@ - get_itl_and_thpt_per_gpu, - # For dense models attention_dp_size=1; MoE path doesn't use AI Configurator. - model_config_kwargs.get("attention_dp_size", 1) if "attention_dp_size" in model_config_kwargs else 1, + get_itl_and_thpt_per_gpu, + attention_dp_size,

benchmarks/profiler/utils/config.py (3)
552-558: Bug: replicas set on the raw dict never makes it into the returned config. You mutate `config[...]` then return `cfg.model_dump()`, so `replicas` stays unchanged. Set it on `cfg` instead.

- # set num workers to 1 - decode_worker_config = config["spec"]["services"][ WORKER_COMPONENT_NAMES["sglang"].decode_worker_k8s_name ] - decode_worker_config["replicas"] = 1 - - return cfg.model_dump() + # set num workers to 1 + decode_worker_config = cfg.spec.services[ WORKER_COMPONENT_NAMES["sglang"].decode_worker_k8s_name ] + decode_worker_config.replicas = 1 + + return cfg.model_dump()
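The pitfall in one self-contained example (pydantic v2; the Config/Spec/Service models here are illustrative stand-ins, not the profiler's actual classes): model_validate copies the raw dict into model instances, so later mutations of the dict never reach model_dump().

```python
from pydantic import BaseModel

class Service(BaseModel):
    replicas: int = 0

class Spec(BaseModel):
    services: dict[str, Service]

class Config(BaseModel):
    spec: Spec

raw = {"spec": {"services": {"decode": {"replicas": 4}}}}
cfg = Config.model_validate(raw)  # deep-copies the dict into model instances

raw["spec"]["services"]["decode"]["replicas"] = 1       # mutates the raw dict only
print(cfg.model_dump()["spec"]["services"]["decode"])   # {'replicas': 4} -- unchanged

cfg.spec.services["decode"].replicas = 1                # mutate the model instead
print(cfg.model_dump()["spec"]["services"]["decode"])   # {'replicas': 1}
```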
451-454: Fix NameError in exception logging (the line variable may be undefined). On errors before the loop, `line` is undefined. Log the filename instead.

- except Exception as e: - logger.warning( - f"Failed to parse KV cache size from line: {line}. Error: {e}" - ) + except Exception as e: + logger.warning( + "Failed to parse KV cache size from log file %s. Error: %s", + dynamo_log_fn, + e, + )
274-281: Make removal of the flag robust: avoid ValueError if the flag is missing. Use a presence check before remove.

- # remove --is-prefill-worker flag - args.remove("--is-prefill-worker") + # remove --is-prefill-worker flag + if "--is-prefill-worker" in args: + args.remove("--is-prefill-worker")
🧹 Nitpick comments (15)
docs/benchmarks/pre_deployment_profiling.md (3)
66-68: Clarify TEP-only prefill stance for MoE. Consider adding one sentence explaining why DEP prefill is suboptimal (e.g., memory-bound tradeoffs), to avoid confusion for users comparing with dense TP.
72-74: Future TEP decode note is helpful; link to tracker/issue if available. If there's a tracking issue/PR for TEP decode support, add the link here for discoverability.
124-131: Align naming between docs and CLI: map DEP/TEP to attention_dp_size. You introduced attention_dp_size in the CLI; add a parenthetical here mapping DEP to attention_dp_size to reduce terminology drift.
deploy/utils/dynamo_deployment.py (1)
142-193: Port-forward robustness: add --address and a readiness check. To avoid binding on all interfaces and flaky sleeps:
- Add "--address 127.0.0.1".
- Probe the local URL before returning instead of fixed sleep.
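A standalone sketch of that readiness probe, pulled out as a hypothetical helper (the diff below inlines the same idea):

```python
import socket
import time

def wait_for_port(host: str, port: int, timeout: float = 15.0) -> bool:
    """Poll until a TCP connection to (host, port) succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=0.5):
                return True
        except OSError:
            time.sleep(0.2)
    return False

# e.g. after starting `kubectl port-forward ... <local_port>:<frontend_port>`:
# if not wait_for_port("127.0.0.1", local_port): raise RuntimeError("port-forward not ready")
```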
Apply this diff:
cmd = [ "kubectl", "port-forward", f"svc/{self.service_name}", f"{local_port}:{self.frontend_port}", + "--address", + "127.0.0.1", "-n", self.namespace, ] @@ - print("Waiting for port forward to establish...") - time.sleep(3) + print("Waiting for port forward to establish...") + start = time.time() + while time.time() - start < 15: + try: + import socket as _s + with _s.create_connection(("127.0.0.1", local_port), timeout=0.5): + break + except Exception: + time.sleep(0.2)benchmarks/profiler/profile_endpoint.py (1)
82-87: New --attention_dp_size flag: document allowed values and interactions. Add help text clarifying the relation to DEP (e.g., DEP size), valid ranges, and when it's used (decode/MoE only).
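For example, the help string could spell out the DEP relationship explicitly. The flag name comes from this PR; the wording, default, and surrounding parser are only a suggestion:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--attention_dp_size",
    type=int,
    default=1,
    help=(
        "Attention data-parallel (DEP) size used for the decode sweep. "
        "Keep the default of 1 for dense models; for MoE/DEP deployments set it "
        "to the number of attention DP ranks."
    ),
)
args = parser.parse_args(["--attention_dp_size", "8"])
print(args.attention_dp_size)  # 8
```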
container/Dockerfile.sglang-wideep (1)
61-61: Exclude common local/build/test files from the Docker context (update .dockerignore). Your .dockerignore blocks many model artifacts and .git but is missing common patterns that reduce image size and cache churn; add entries like:
- .venv/
- venv/
- __pycache__/
- *.pyc
- build/
- dist/
- *.egg-info/
- .pytest_cache/
- tests/
- node_modules/
Location: .dockerignore (repo root) — add the above patterns.
benchmarks/profiler/utils/defaults.py (1)
19-21: Make DECODE_MAX_CONCURRENCY configurable (env override). Hard-coding 2000 is fine as a cap; allow an env override to tune it without code changes.
Apply:
-# for MoE models with attn-dp, we might hit this limit -DECODE_MAX_CONCURRENCY = 2000 +# for MoE models with attn-dp, we might hit this limit +import os # at top of file if not present +DECODE_MAX_CONCURRENCY = int(os.getenv("DECODE_MAX_CONCURRENCY", "2000"))

Add the import near other imports:

import os

benchmarks/profiler/deploy/profile_sla_moe_job.yaml (1)
1-61: Harden pod security context; optionally pin num-gpus-per-node. Add non-root, drop caps, and disallow privilege escalation. Consider passing `--num-gpus-per-node` explicitly to document granularity. Apply:

- name: profile-sla image: ${DOCKER_IMAGE} resources: requests: cpu: "32" memory: "50Gi" + securityContext: + runAsNonRoot: true + runAsUser: 1000 + allowPrivilegeEscalation: false + readOnlyRootFilesystem: true + capabilities: + drop: ["ALL"] env: @@ - --is-moe-model + - --num-gpus-per-node + - "8"

If your cluster enforces Pod Security Standards, confirm these settings comply with your admission controller.
benchmarks/profiler/utils/config.py (7)
439-449: Robust KV concurrency parsing (avoid brittle splits and slicing). Current parsing relies on string splits and `[:-1]`, risking wrong values if the format changes. Prefer regex.

- if "Maximum concurrency for" in line: - line = line.strip().split("Maximum concurrency for ")[1] - token_count = int( - line.split(" tokens per request: ")[0].replace(",", "") - ) - concurrency = float(line.split(" tokens per request: ")[1][:-1]) - - logger.info( - f"Found KV cache info: {token_count} x {concurrency} = {int(token_count * concurrency)}" - ) - return int(token_count * concurrency) + if "Maximum concurrency for" in line: + m = re.search( + r"Maximum concurrency for\s+([\d,]+)\s+tokens per request:\s*([\d.]+)", + line, + ) + if m: + token_count = int(m.group(1).replace(",", "")) + concurrency = float(m.group(2)) + total = int(token_count * concurrency) + logger.info("Found KV cache info: %d x %s = %d", token_count, m.group(2), total) + return total
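The same regex, shown as a self-contained function with an assumed example log line (verify the exact wording against real SGLang output before relying on it):

```python
import re
from typing import Optional

_KV_RE = re.compile(
    r"Maximum concurrency for\s+([\d,]+)\s+tokens per request:\s*([\d.]+)"
)

def parse_kv_capacity(line: str) -> Optional[int]:
    """Return token_count * concurrency from a 'Maximum concurrency for ...' log line."""
    m = _KV_RE.search(line)
    if not m:
        return None
    token_count = int(m.group(1).replace(",", ""))
    concurrency = float(m.group(2))
    return int(token_count * concurrency)

# Assumed example line; the trailing 'x' is ignored by the pattern.
print(parse_kv_capacity("Maximum concurrency for 8,192 tokens per request: 100.50x"))  # 823296
```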
788-802: Narrow the broad exception in KV-cache parsing. Avoid catching bare Exception; use specific exceptions.
- except Exception as e: - logger.warning(f"Failed to parse KV cache size from log file. Error: {e}") + except (OSError, ValueError) as e: + logger.warning("Failed to parse KV cache size from log file. Error: %s", e)
432-436: Silence the unused parameter in the VLLM KV-cache method (ruff ARG003). The parameter is part of the Protocol but unused here. Add a no-op reference.
def get_kv_cache_size_from_dynamo_log( cls, dynamo_log_fn: str, attention_dp_size: int = 1 ) -> int: + _ = attention_dp_size # unused in vllm backend; kept for protocol compliance
1048-1077: Silence the unused parameter in the TRT-LLM KV-cache method (ruff ARG003). The parameter is part of the Protocol but unused here. Add a no-op reference.
def get_kv_cache_size_from_dynamo_log( cls, dynamo_log_fn: str, attention_dp_size: int = 1 ) -> int: + _ = attention_dp_size # unused in trtllm backend; kept for protocol compliance # TRT-LLM log parsing for KV cache size
31-39: Prevent duplicate log handlers on repeated imports. Wrap handler registration to avoid duplicated logs.
logger = logging.getLogger(__name__) logger.setLevel(logging.INFO) -console_handler = logging.StreamHandler() -console_handler.setLevel(logging.INFO) -formatter = logging.Formatter( - "%(asctime)s - %(name)s - %(levelname)s - %(message)s", "%Y-%m-%d %H:%M:%S" -) -console_handler.setFormatter(formatter) -logger.addHandler(console_handler) +if not logger.handlers: + console_handler = logging.StreamHandler() + console_handler.setLevel(logging.INFO) + formatter = logging.Formatter( + "%(asctime)s - %(name)s - %(levelname)s - %(message)s", "%Y-%m-%d %H:%M:%S" + ) + console_handler.setFormatter(formatter) + logger.addHandler(console_handler)
57-61: Type the multinode field to avoid relying on dynamic extra assignment. Relying on extra fields risks assignment issues under Pydantic v2. Recommend adding `multinode: Optional[MultinodeConfig] = None` to `Service` and ensuring the type is defined before `Service`.
- Option A (preferred): move `MultinodeConfig` above `Service` and add a typed field.
- Option B: add a forward reference with `from __future__ import annotations` and use `multinode: Optional["MultinodeConfig"]`.

Would you like me to generate the exact diff (including moving the class)?
Also applies to: 83-85, 167-184, 602-664, 666-731
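Sticking with Option A, a minimal sketch under assumed field names (the real Service and MultinodeConfig models in config.py carry more fields; this only shows the typed-field shape):

```python
from typing import Optional
from pydantic import BaseModel

class MultinodeConfig(BaseModel):
    nodeCount: int = 1

class Service(BaseModel):
    replicas: int = 1
    multinode: Optional[MultinodeConfig] = None  # declared field, so it always serializes

svc = Service(replicas=1)
svc.multinode = MultinodeConfig(nodeCount=2)  # ordinary assignment to a declared field
print(svc.model_dump())  # {'replicas': 1, 'multinode': {'nodeCount': 2}}
```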
102-109: Optional: remove all occurrences when stripping valued args
`remove_valued_arguments` removes only the first pair. If the flag can repeat, consider looping.

def remove_valued_arguments(args: list[str], key: str) -> list[str]: - """Remove a valued argument (e.g., --key value) from the arguments list if exists.""" - if key in args: - idx = args.index(key) - if idx + 1 < len(args): - del args[idx : idx + 2] + """Remove all occurrences of a valued argument (e.g., --key value).""" + while True: + try: + idx = args.index(key) + if idx + 1 < len(args): + del args[idx : idx + 2] + else: + del args[idx] + except ValueError: + break return args
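For reference, the loop-based variant as a standalone, runnable sketch (a behavioral illustration only; the in-repo helper may handle edge cases differently):

```python
def remove_valued_arguments(args: list[str], key: str) -> list[str]:
    """Remove every '--key value' pair (or a trailing bare key) from args in place."""
    while key in args:
        idx = args.index(key)
        end = idx + 2 if idx + 1 < len(args) else idx + 1
        del args[idx:end]
    return args

print(remove_valued_arguments(["--tp", "8", "--dp", "2", "--tp", "4"], "--tp"))
# ['--dp', '2']
```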
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (18)
- benchmarks/profiler/deploy/profile_sla_moe_job.yaml (1 hunks)
- benchmarks/profiler/profile_endpoint.py (2 hunks)
- benchmarks/profiler/profile_sla.py (22 hunks)
- benchmarks/profiler/utils/config.py (14 hunks)
- benchmarks/profiler/utils/defaults.py (1 hunks)
- benchmarks/profiler/utils/plot.py (2 hunks)
- benchmarks/profiler/utils/profile_decode.py (6 hunks)
- components/backends/sglang/deploy/agg.yaml (1 hunks)
- components/backends/sglang/deploy/agg_logging.yaml (1 hunks)
- components/backends/sglang/deploy/agg_router.yaml (1 hunks)
- components/backends/sglang/deploy/disagg.yaml (2 hunks)
- components/backends/sglang/deploy/disagg_planner.yaml (2 hunks)
- components/planner/src/dynamo/planner/defaults.py (1 hunks)
- components/planner/src/dynamo/planner/utils/planner_argparse.py (0 hunks)
- container/Dockerfile.sglang-wideep (2 hunks)
- deploy/utils/dynamo_deployment.py (2 hunks)
- docs/benchmarks/pre_deployment_profiling.md (4 hunks)
- tests/profiler/test_profile_sla_dryrun.py (1 hunks)
💤 Files with no reviewable changes (1)
- components/planner/src/dynamo/planner/utils/planner_argparse.py
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-08-30T20:43:49.632Z
Learnt from: keivenchang
PR: ai-dynamo/dynamo#2797
File: container/Dockerfile:437-449
Timestamp: 2025-08-30T20:43:49.632Z
Learning: In the dynamo project's devcontainer setup, the team prioritizes consistency across framework-specific Dockerfiles (like container/Dockerfile, container/Dockerfile.vllm, etc.) by mirroring their structure, even when individual optimizations might be possible, to maintain uniformity in the development environment setup.
Applied to files:
container/Dockerfile.sglang-wideep
📚 Learning: 2025-08-15T02:01:01.238Z
Learnt from: keivenchang
PR: ai-dynamo/dynamo#2453
File: deploy/dynamo_check.py:739-741
Timestamp: 2025-08-15T02:01:01.238Z
Learning: In the dynamo project, missing cargo should be treated as a fatal error in dynamo_check.py because developers need cargo to build the Rust components. The tool is designed to validate the complete development environment, not just import functionality.
Applied to files:
container/Dockerfile.sglang-wideep
🧬 Code graph analysis (3)
tests/profiler/test_profile_sla_dryrun.py (2)
tests/profiler/test_profile_sla_aiconfigurator.py (1)
Args(28-49)
benchmarks/profiler/profile_sla.py (1)
run_profile(63-708)
benchmarks/profiler/profile_sla.py (4)
benchmarks/profiler/utils/profile_decode.py (2)
get_num_request_range(25-37)
profile_decode_aiconfigurator(143-169)
benchmarks/profiler/utils/config.py (20)
convert_config(188-194)
convert_config(229-319)
convert_config(459-557)
convert_config(807-932)
set_config_tep_size(201-204)
set_config_tep_size(365-368)
set_config_tep_size(603-663)
set_config_tep_size(980-983)
set_config_tp_size(197-198)
set_config_tp_size(322-362)
set_config_tp_size(560-600)
set_config_tp_size(935-977)
set_config_dep_size(207-210)
set_config_dep_size(371-374)
set_config_dep_size(666-730)
set_config_dep_size(986-989)
get_kv_cache_size_from_dynamo_log(221-224)
get_kv_cache_size_from_dynamo_log(433-454)
get_kv_cache_size_from_dynamo_log(789-802)
get_kv_cache_size_from_dynamo_log(1049-1077)
benchmarks/profiler/utils/profile_cache.py (4)
check_prefill_results_exist(26-53)
load_existing_prefill_results(91-108)
check_decode_results_exist(56-88)
load_existing_decode_results(111-138)
benchmarks/profiler/utils/plot.py (1)
plot_prefill_performance(34-71)
benchmarks/profiler/utils/config.py (2)
components/backends/vllm/src/dynamo/vllm/args.py (1)
Config(39-64)
components/backends/trtllm/src/dynamo/trtllm/utils/trtllm_utils.py (1)
Config(28-93)
🪛 Ruff (0.13.1)
tests/profiler/test_profile_sla_dryrun.py
141-141: Probable insecure usage of temporary file or directory: "/tmp/test_profiling_results"
(S108)
benchmarks/profiler/utils/config.py
434-434: Unused class method argument: attention_dp_size
(ARG003)
633-635: Avoid specifying long messages outside the exception class
(TRY003)
696-698: Avoid specifying long messages outside the exception class
(TRY003)
800-800: Do not catch blind exception: Exception
(BLE001)
1050-1050: Unused class method argument: attention_dp_size
(ARG003)
🪛 markdownlint-cli2 (0.18.1)
docs/benchmarks/pre_deployment_profiling.md
162-162: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
🪛 Checkov (3.2.334)
benchmarks/profiler/deploy/profile_sla_moe_job.yaml
[medium] 3-61: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 3-61: Minimize the admission of root containers
(CKV_K8S_23)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Build and Test - dynamo
- GitHub Check: Build and Test - trtllm
- GitHub Check: Build and Test - sglang
- GitHub Check: Mirror Repository to GitLab
🔇 Additional comments (13)
docs/benchmarks/pre_deployment_profiling.md (1)
16-16: Support matrix update for SGLang MoE looks correct. Good to mark SGLang MoE as supported to reflect the DEP/TEP paths.
components/backends/sglang/deploy/disagg_planner.yaml (2)
132-163: Service key rename to prefill is consistent. No functional changes beyond the key; args remain valid.
101-132: Service key rename to `decode` is consistent. SGLang deployment manifests use short `decode`/`prefill` service keys across the repo; docs still mention `SGLangDecodeWorker`/`SGLangPrefillWorker`, but that's documentation-only — no changes required.
components/backends/sglang/deploy/disagg.yaml (2)
17-43: decode key rename LGTM. Args align with disaggregation-mode=decode.
44-70: prefill key rename LGTM. Args align with disaggregation-mode=prefill.
components/backends/sglang/deploy/agg_logging.yaml (1)
20-43: decode key rename LGTM. Consistent with the broader rename; no behavior change.
benchmarks/profiler/profile_endpoint.py (1)
104-115: Pass-through confirmed — the signatures match: benchmarks/profiler/utils/profile_decode.py defines profile_decode(..., attention_dp_size) and benchmarks/profiler/profile_endpoint.py passes args.attention_dp_size.
benchmarks/profiler/utils/plot.py (2)
35-52: LGTM: GPU-count labeling for prefill. API and annotations updated coherently.
75-87: LGTM: GPU-count labeling for decode. Legend and loop variables align with the new semantics.
tests/profiler/test_profile_sla_dryrun.py (1)
166-169: MoE recipe path verified — no action needed.
Found: recipes/deepseek-r1/sglang-wideep/tep16p-dep16d-disagg.yaml
components/backends/sglang/deploy/agg_router.yaml (1)
20-44: Consistent `decode` rename — confirm router and Service selectors use `decode`. Verify router logic, Kubernetes Service selectors, and any consumer code reference `decode` (not the previous service name). Check: components/backends/sglang/deploy/README.md (SGLangDecodeWorker / decode-only worker), docs/kubernetes/README.md:119, docs/architecture/distributed_runtime.md:36.
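The rename only stays consistent if the component-name mapping the profiler consults points at the new keys. The snippet below is an illustrative shape only, not the repo's actual definition of WORKER_COMPONENT_NAMES:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ComponentNames:
    decode_worker_k8s_name: str
    prefill_worker_k8s_name: str

# Hypothetical mapping: short keys replacing the old SGLangDecodeWorker/SGLangPrefillWorker names.
WORKER_COMPONENT_NAMES = {
    "sglang": ComponentNames(
        decode_worker_k8s_name="decode",
        prefill_worker_k8s_name="prefill",
    ),
}

print(WORKER_COMPONENT_NAMES["sglang"].decode_worker_k8s_name)  # "decode"
```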
17-17: Rename todecodelooks good — ensure mappings & docs updated.No WORKER_COMPONENT_NAMES mapping for
decode_worker_k8s_namewas found; repo still containsSGLangDecodeWorkerreferences:
- docs/kubernetes/README.md:119
- docs/architecture/distributed_runtime.md:36
- components/backends/sglang/deploy/README.md:12,19,26
Ensure WORKER_COMPONENT_NAMES maps
`decode_worker_k8s_name` -> `"decode"` and update any consumers/log paths/docs that still expect `SGLangDecodeWorker`.
benchmarks/profiler/utils/config.py (1)
167-184: Verify Pydantic behavior: does `worker_service.multinode = MultinodeConfig(...)` persist in model_dump()? If Pydantic rejects dynamic field assignment, `multinode` won't appear in the emitted config. Consider the typed-field change above.

Run a quick check in your environment to confirm the assignment persists through `cfg.model_dump()` for this model shape.
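One way to run that check in isolation (the model shape is assumed; mirror the real Service configuration, in particular whether it allows extra fields -- if it does not, the assignment itself will raise, which also answers the question):

```python
from pydantic import BaseModel, ConfigDict

class Service(BaseModel):
    model_config = ConfigDict(extra="allow")  # adjust to match the real model's config
    replicas: int = 1

svc = Service()
svc.multinode = {"nodeCount": 2}  # dynamic extra assignment, as done in the profiler code
print(svc.model_dump())           # check whether 'multinode' appears in the dump
```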
For DS-like (DeepSeek-style) MoE models, the SLA planner:
This PR:
- Adds the `--is-moe-model` flag to the pre-deployment sweep to support TEP/DEP sweeps.

Summary by CodeRabbit
New Features
Changes
Documentation
Tests