start OAIServer with max_beam_width=1 for TorchSampler #5427
Signed-off-by: Netanel Haber <[email protected]>
max_beam_width=1 for TorchSampler
/bot run
PR_Github #9672 [ run ] triggered by Bot
…https://nvbugs/5355091) Signed-off-by: Netanel Haber <[email protected]>
/bot run
PR_Github #9677 [ run ] triggered by Bot
PR_Github #9672 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #9677 [ run ] completed with state
PR_Github #9701 [ run ] triggered by Bot
PR_Github #9701 [ run ] completed with state
Signed-off-by: Netanel Haber <[email protected]>
/bot skip --comment "CI failed due to unrelated issues in main, conflicts were only in waives.txt"
PR_Github #9833 [ skip ] triggered by Bot
PR_Github #9833 [ skip ] completed with state
Signed-off-by: Netanel Haber <[email protected]>
TorchSampler recently added an assertion that max_beam_width == 1: https://github.com/NVIDIA/TensorRT-LLM/pull/4401/files#diff-76ef6eceb82cb64d4ea6d49fca82d42ac6c942a45ad9fc8d3ff6a9b49e3b466bR208

test_openai_reasoning.py never sent a request with beam search enabled, but it did start PyTorch backend servers with max_beam_width=2, so the tests fail on that assertion. This PR passes max_beam_width=1 to match the new assertion.
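
For illustration only, the failure mode and the fix can be sketched with toy stand-ins. Nothing below is TensorRT-LLM code: `ToySampler`, `server_args`, and `start_toy_server` are hypothetical names, and the `--backend` / `--max_beam_width` strings are used only as example arguments. The sketch assumes the sampler checks the beam width once at construction time, which is why a server can fail even though no request ever asks for beam search.

```python
class ToySampler:
    """Hypothetical stand-in for a sampler without beam search support."""

    def __init__(self, max_beam_width: int) -> None:
        # Mirrors the kind of check described above; not the real TorchSampler code.
        assert max_beam_width == 1, "beam search is not supported by this sampler"
        self.max_beam_width = max_beam_width


def server_args(backend: str, max_beam_width: int = 1) -> list[str]:
    """Build illustrative server arguments; the beam width defaults to 1."""
    return ["--backend", backend, "--max_beam_width", str(max_beam_width)]


def start_toy_server(args: list[str]) -> ToySampler:
    """Read the beam width from args and construct the sampler at startup."""
    width = int(args[args.index("--max_beam_width") + 1])
    return ToySampler(width)


# With the default width of 1, startup succeeds.
sampler = start_toy_server(server_args("pytorch"))

# With a width of 2, startup fails on the assertion before any request is sent.
try:
    start_toy_server(server_args("pytorch", max_beam_width=2))
except AssertionError as exc:
    print(f"startup failed: {exc}")
```

Under these assumptions, changing only the startup argument is enough, because the requests themselves never depended on a beam width greater than 1.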