Conversation

@lsy323 (Collaborator) commented Jun 3, 2025

Somehow the test has been hanging. Buildkite log

This makes each TPU CI run take 4 hours; disable it to unblock CI.


tests/v1/entrypoints/llm/test_struct_output_generate.py::test_structured_output_with_reasoning_matrices[Qwen/Qwen3-1.7B-xgrammar-auto-deepseek_r1-None] INFO 06-03 22:27:53 [config.py:822] This model supports multiple tasks: {'embed', 'classify', 'reward', 'generate', 'score'}. Defaulting to 'generate'.
--
  | INFO 06-03 22:27:53 [config.py:1967] Disabled the custom all-reduce kernel because it is not supported on current platform.
  | INFO 06-03 22:27:53 [config.py:2176] Chunked prefill is enabled with max_num_batched_tokens=8192.
  | INFO 06-03 22:27:53 [tpu.py:105] [TPU] Forcing DYNAMO_ONCE compilation level
  | huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
  | To disable this warning, you can either:
  | - Avoid using `tokenizers` before the fork if possible
  | - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
  | INFO 06-03 22:27:55 [core.py:455] Waiting for init message from front-end.
  | INFO 06-03 22:27:55 [tpu.py:105] [TPU] Forcing DYNAMO_ONCE compilation level
  | INFO 06-03 22:27:55 [core.py:70] Initializing a V1 LLM engine (v0.9.1.dev143+gfa98d7777) with config: model='Qwen/Qwen3-1.7B', speculative_config=None, tokenizer='Qwen/Qwen3-1.7B', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=1024, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=True, quantization=None, enforce_eager=False, kv_cache_dtype=auto,  device_config=None, decoding_config=DecodingConfig(backend='xgrammar', disable_fallback=False, disable_any_whitespace=True, disable_additional_properties=False, reasoning_backend='deepseek_r1'), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=Qwen/Qwen3-1.7B, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=False, pooler_config=None, compilation_config={"level":2,"debug_dump_path":"","cache_dir":"","backend":"openxla","custom_ops":["none"],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"max_capture_size":512,"local_cache_dir":null}
  | INFO 06-03 22:27:55 [tpu_worker.py:294] tpu_commons not found, using vLLM's TPUWorker.
  | [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
  | INFO 06-03 22:27:55 [parallel_state.py:1065] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
  | WARNING 06-03 22:28:01 [tpu.py:178] Pin memory is not supported on TPU.
  | INFO 06-03 22:28:01 [tpu_model_runner.py:1574] Using exponential token paddings:
  | INFO 06-03 22:28:01 [tpu_model_runner.py:1576]     16
  | INFO 06-03 22:28:01 [tpu_model_runner.py:1576]     32
  | INFO 06-03 22:28:01 [tpu_model_runner.py:1576]     64
  | INFO 06-03 22:28:01 [tpu_model_runner.py:1576]     128
  | INFO 06-03 22:28:01 [tpu_model_runner.py:1576]     256
  | INFO 06-03 22:28:01 [tpu_model_runner.py:1576]     512
  | INFO 06-03 22:28:01 [tpu_model_runner.py:1576]     1024
  | INFO 06-03 22:28:01 [tpu_model_runner.py:1576]     2048
  | INFO 06-03 22:28:01 [tpu_model_runner.py:1576]     4096
  | INFO 06-03 22:28:01 [tpu_model_runner.py:1576]     8192
  | INFO 06-03 22:28:01 [tpu_model_runner.py:1540] Preparing request paddings:
  | INFO 06-03 22:28:01 [tpu_model_runner.py:1547]     8
  | INFO 06-03 22:28:01 [tpu_model_runner.py:1547]     16
  | INFO 06-03 22:28:01 [tpu_model_runner.py:969] Loading model from scratch...
  | INFO 06-03 22:28:01 [tpu.py:51] Cannot use None backend on TPU.
  | INFO 06-03 22:28:01 [tpu.py:54] Using Pallas V1 backend.
  | INFO 06-03 22:28:02 [weight_utils.py:292] Using model weights format ['*.safetensors']


Signed-off-by: Siyuan Liu <[email protected]>
github-actions bot commented Jun 3, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@gemini-code-assist bot (Contributor) left a comment

Hello @lsy323, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

Hello team, gemini-code-assist here to provide a summary of this pull request. This PR addresses an issue in the TPU CI pipeline where a specific test, test_structured_output_with_reasoning_matrices within test_struct_output_generate.py, has been hanging. This hang is causing the TPU CI runs to take an excessive amount of time (around 4 hours). To unblock the CI and allow other changes to be tested efficiently, this pull request temporarily skips the problematic test case in the TPU CI script.

Highlights

  • CI Fix: Skips a specific test (test_structured_output_with_reasoning_matrices) in the TPU CI buildkite script because it is currently hanging and significantly increasing CI run times.

Changelog

  • .buildkite/scripts/hardware_ci/run-tpu-v1-test.sh
    • Modified the pytest command for test_struct_output_generate.py (line 153) to exclude the test_structured_output_with_reasoning_matrices test function using the -k 'not ...' flag.
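For readers unfamiliar with the mechanism: `-k 'not <name>'` filters matching tests out at pytest's collection phase, so the hanging test is never even started. A minimal standalone sketch of the same deselection (the demo file in /tmp and its test names are illustrative stand-ins, not the real vLLM tests):

```shell
# Write a throwaway test file; the second test stands in for the one
# that hangs on TPU CI.
cat > /tmp/demo_struct_output_test.py <<'EOF'
def test_fast():
    assert True

def test_structured_output_with_reasoning_matrices():
    assert True  # stands in for the hanging test
EOF

# -k deselects matching tests at collection time; pytest reports
# "1 passed, 1 deselected" and the excluded test body never runs.
python3 -m pytest -q /tmp/demo_struct_output_test.py \
    -k 'not test_structured_output_with_reasoning_matrices'
```

The same expression syntax supports `and`/`or`, so several flaky tests could be excluded with one `-k` string if needed.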

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e., a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.


Test hangs, CI waits,
Four hours pass, build fails slow,
Skip the test, move on.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@mergify mergify bot added the ci/build label Jun 3, 2025
@gemini-code-assist bot (Contributor) left a comment

Code Review

This PR provides a clear and effective solution to unblock the TPU CI by skipping the identified hanging test (test_structured_output_with_reasoning_matrices). The change is well-targeted and the PR description clearly explains the rationale. My main suggestion focuses on adding a comment to the script for future maintainability. Thank you for addressing this CI issue promptly!

Summary of Findings

  • In-script Documentation for Test Skip: To improve long-term maintainability, it's beneficial to add a comment in the CI script explaining why a specific test is being skipped. This comment should ideally reference the PR or an issue tracking the problem, providing context for future developers and facilitating future re-evaluation of the skip.

Merge Readiness

This pull request effectively addresses the immediate issue of CI hangs by skipping the problematic test. The change is minimal and targeted.
To enhance maintainability, I've suggested adding a comment in the script to document the reason for this test skip. This is a medium-severity suggestion aimed at ensuring the context for this temporary measure isn't lost over time.

While I am not authorized to approve pull requests, I recommend addressing this suggestion to improve the script's clarity. The PR is otherwise a sensible step to unblock the CI pipeline. It's also important that the underlying issue causing the test to hang is tracked and investigated separately, with the goal of eventually re-enabling this test.

"python3 -m pytest -s -v /workspace/vllm/tests/v1/tpu/test_pallas.py"
run_and_track_test 11 "test_struct_output_generate.py" \
"python3 -m pytest -s -v /workspace/vllm/tests/v1/entrypoints/llm/test_struct_output_generate.py"
"python3 -m pytest -s -v /workspace/vllm/tests/v1/entrypoints/llm/test_struct_output_generate.py -k 'not test_structured_output_with_reasoning_matrices'"
@gemini-code-assist bot (Contributor) commented (severity: medium):

This modification to skip the test test_structured_output_with_reasoning_matrices is a good immediate step to unblock the CI pipeline, given the hanging issue described.

For long-term maintainability and context for other developers (or your future self!), would you consider adding a shell comment immediately before this run_and_track_test invocation or on the line before this specific command? This comment could briefly explain why the test is skipped and reference this PR or a tracking issue for the hang.

For example:

# Temporarily skipping test_structured_output_with_reasoning_matrices due to CI hangs.
# See PR #<this_pr_number> or issue #<issue_number_if_any> for details.
# Original command: python3 -m pytest -s -v /workspace/vllm/tests/v1/entrypoints/llm/test_struct_output_generate.py
run_and_track_test 11 "test_struct_output_generate.py" \
    "python3 -m pytest -s -v /workspace/vllm/tests/v1/entrypoints/llm/test_struct_output_generate.py -k 'not test_structured_output_with_reasoning_matrices'"

Adding such a comment would make it easier to track and eventually revisit this skip. What are your thoughts on this?

@mgoin mgoin added tpu Related to Google TPUs ready ONLY add when PR is ready to merge/full CI is needed labels Jun 3, 2025
@mgoin (Member) left a comment

Thanks for finding the current issue

@mergify mergify bot removed the tpu Related to Google TPUs label Jun 3, 2025
@yaochengji (Collaborator) left a comment

LGTM, thanks!

@yaochengji (Collaborator) commented:

Oh, we got


5.200       ERROR: THESE PACKAGES DO NOT MATCH THE HASHES FROM THE REQUIREMENTS FILE. If you have updated the package versions, please update the hashes. Otherwise, examine the package contents carefully; someone may have tampered with them.
--
  | 5.200           torch==2.7.0 from https://files.pythonhosted.org/packages/cc/2c/91d1de65573fce563f5284e69d9c56b57289625cffbbb6d533d5d56c36a5/torch-2.7.0-cp310-cp310-manylinux_2_28_x86_64.whl:
  | 5.200               Expected sha256 0b9960183b6e5b71239a3e6c883d8852c304e691c0b2955f7045e8a6d05b9183
  | 5.200                    Got        c74a63dbe482e161469797fb5b97adf97b05e3265b52855964420a343acad996


when building the Docker image
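As a side note for anyone debugging this locally: pip's hash check is just a sha256 digest over the downloaded wheel, so a mismatch like the one above can be reproduced (or ruled out) by hashing the downloaded file directly. A minimal sketch, using a throwaway file in place of the real torch wheel (the /tmp path and the "pinned digest" comparison are illustrative, not the actual artifacts from this build):

```shell
# Sketch: reproduce pip's hash check with sha256sum.
# /tmp/fake_wheel.whl stands in for the downloaded torch wheel; a real
# check would hash the file pip fetched and compare against the digest
# pinned in the requirements file.
printf 'not a real wheel' > /tmp/fake_wheel.whl
actual=$(sha256sum /tmp/fake_wheel.whl | awk '{print $1}')
echo "sha256: ${actual}"
# If this differs from the pinned digest, the download was corrupted or
# tampered with (or, as suspected here, a flaky mirror/cache served a
# different file).
```

If the locally computed digest matches the pinned one on a retry, that points at a transient download problem rather than a genuinely changed package.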

@mergify mergify bot added v1 tpu Related to Google TPUs labels Jun 4, 2025
@lsy323 (Collaborator, Author) commented Jun 4, 2025

Close this one, putting it together with #19108; both are fixing the CI issues at HEAD.

@lsy323 (Collaborator, Author) commented Jun 4, 2025

> Oh, we got `ERROR: THESE PACKAGES DO NOT MATCH THE HASHES FROM THE REQUIREMENTS FILE` (torch==2.7.0 sha256 mismatch) when building the docker

Looks like a flaky issue; I didn't hit this in another PR, #19108.

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) June 4, 2025 06:07
@vllm-bot vllm-bot merged commit 8e972d9 into vllm-project:main Jun 4, 2025
39 of 40 checks passed
Labels

ci/build · ready (ONLY add when PR is ready to merge/full CI is needed) · tpu (Related to Google TPUs) · v1
