Skip to content

Conversation

ccs96307
Copy link
Contributor

PR title

[#3332][feat] Support more sampling parameters with openai worker

Description

Resolves #3332

This PR enhances the OpenaiWorker in the scaffolding framework by adding support for more sampling parameters.

The convert_task_params method in worker.py has been refactored to use a parameter list (OpenaiWorker) for a cleaner and more maintainable implementation. This replaces a long series of repetitive function calls.

To resolve a naming conflict, the input parameter for requesting log probabilities has been renamed to num_logprobs within GenerationTask. This distinguishes it from the logprobs attribute in the result fields. The worker maps num_logprobs back to the logprobs key required by the OpenAI API.

Please let me know if there's anything I should change, I'd be happy to update the PR accordingly.

Test Coverage

  • tests/unittest/scaffolding/test_worker.py

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--disable-fail-fast --skip-test --stage-list "A10-1, xxx" --gpu-type "A30, H100_PCIe" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]

Launch build/test pipelines. All previously running jobs will be killed.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests. Will also run L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-[Post-Merge]-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-[Post-Merge]-1, xxx".

kill

kill

Kill all running builds associated with pull request.

skip

skip --comment COMMENT

Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

@WeiHaocheng
Copy link
Collaborator

/bot run

@WeiHaocheng WeiHaocheng requested a review from dc3671 June 15, 2025 03:46
@tensorrt-cicd
Copy link
Collaborator

PR_Github #8914 [ run ] triggered by Bot

@tensorrt-cicd
Copy link
Collaborator

PR_Github #8914 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #6499 completed with status: 'FAILURE'

@ccs96307
Copy link
Contributor Author

Hi @WeiHaocheng,

I noticed that I can't access the main Jenkins log (which I believe is expected), but I was able to check the log for "Upload test results" job on GitHub Actions.

It failed with the error gzip: stdin: not in gzip format when trying to download and extract test-results.tar.gz.

It looks like the root cause might be related to one of the test cases in the main pipeline, would you mind taking a look when you get a chance?

Thanks a lot in advance!

@ccs96307
Copy link
Contributor Author

ccs96307 commented Jul 3, 2025

Hi @WeiHaocheng and @dc3671,

Hope you're having a great week.

Just wanted to gently follow up on this PR. I've kept the branch up-to-date with the latest main to ensure there are no merge conflicts.

It seems the main blockers are the pending review and the CI failure. The last CI run failed with a gzip error in the test reporting step.

Please let me know if there's anything I can do to help resolve the CI issue, or if there's any other information needed from my side.

Thanks again for your time and guidance!

@WeiHaocheng
Copy link
Collaborator

/bot run

@WeiHaocheng
Copy link
Collaborator

@ccs96307 Sorry for delay! Let me restart the CI pipeline.

@tensorrt-cicd
Copy link
Collaborator

PR_Github #10836 [ run ] triggered by Bot

@ccs96307
Copy link
Contributor Author

ccs96307 commented Jul 3, 2025

Thank you so much @WeiHaocheng for the quick response and for restarting the pipeline!

I’ll keep an eye on the CI results and will follow up immediately if any issues come up.

Really appreciate your time and support~

@tensorrt-cicd
Copy link
Collaborator

PR_Github #10836 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #8006 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

@WeiHaocheng WeiHaocheng merged commit 7a31952 into NVIDIA:main Jul 4, 2025
3 checks passed
@ccs96307 ccs96307 deleted the add-scaffolding-openai-worker-parameters branch July 4, 2025 10:29
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 9, 2025
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 10, 2025
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 10, 2025
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 10, 2025
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 10, 2025
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 11, 2025
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 11, 2025
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 11, 2025
zhou-yuxin pushed a commit to zhou-yuxin/TensorRT-LLM that referenced this pull request Jul 15, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support more sampling parameters with openai worker.
4 participants