Closed
Description
I set GUIDELLM__PREFERRED_ROUTE="chat_completions", but it doesn't seem to affect the route used when the request is made:
(venv) prodooser@prodooser:~/TensorRT-LLab$ guidellm config
Settings:
GUIDELLM__ENV="Environment.PROD"
GUIDELLM__DEFAULT_ASYNC_LOOP_SLEEP="0.0001"
GUIDELLM__LOGGING__DISABLED=
GUIDELLM__LOGGING__CLEAR_LOGGERS="True"
GUIDELLM__LOGGING__CONSOLE_LOG_LEVEL="WARNING"
GUIDELLM__LOGGING__LOG_FILE=
GUIDELLM__LOGGING__LOG_FILE_LEVEL=
GUIDELLM__DEFAULT_SWEEP_NUMBER="10"
GUIDELLM__REQUEST_TIMEOUT="300"
GUIDELLM__REQUEST_HTTP2="True"
GUIDELLM__MAX_CONCURRENCY="512"
GUIDELLM__MAX_WORKER_PROCESSES="10"
GUIDELLM__MAX_ADD_REQUESTS_PER_LOOP="20"
GUIDELLM__DATASET__PREFERRED_DATA_COLUMNS=["prompt","instruction","input","inputs","question","context","text","content","body","data"]
GUIDELLM__DATASET__PREFERRED_DATA_SPLITS=["test","tst","validation","val","train"]
GUIDELLM__PREFERRED_PROMPT_TOKENS_SOURCE="response"
GUIDELLM__PREFERRED_OUTPUT_TOKENS_SOURCE="response"
GUIDELLM__PREFERRED_BACKEND="openai"
GUIDELLM__PREFERRED_ROUTE="chat_completions"
GUIDELLM__OPENAI__API_KEY=
GUIDELLM__OPENAI__BEARER_TOKEN=
GUIDELLM__OPENAI__ORGANIZATION=
GUIDELLM__OPENAI__PROJECT=
GUIDELLM__OPENAI__BASE_URL="http://localhost:8000"
GUIDELLM__OPENAI__MAX_OUTPUT_TOKENS="16384"
GUIDELLM__TABLE_BORDER_CHAR="="
GUIDELLM__TABLE_HEADERS_BORDER_CHAR="-"
GUIDELLM__TABLE_COLUMN_SEPARATOR_CHAR="|"
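As a side note, the GUIDELLM__ prefix with __ nesting looks like standard pydantic-settings env mapping, so I'd expect the value above to be picked up. A minimal sketch of that mapping (my assumption about how guidellm parses these; the class and field names are hypothetical, not guidellm's actual code):

```python
# Hypothetical sketch of pydantic-settings env mapping, NOT guidellm's code.
import os

from pydantic import BaseModel
from pydantic_settings import BaseSettings, SettingsConfigDict


class OpenAISettings(BaseModel):
    base_url: str = "http://localhost:8000"


class Settings(BaseSettings):
    model_config = SettingsConfigDict(
        env_prefix="GUIDELLM__",    # GUIDELLM__PREFERRED_ROUTE -> preferred_route
        env_nested_delimiter="__",  # GUIDELLM__OPENAI__BASE_URL -> openai.base_url
    )

    preferred_route: str = "text_completions"
    openai: OpenAISettings = OpenAISettings()


os.environ["GUIDELLM__PREFERRED_ROUTE"] = "chat_completions"
print(Settings().preferred_route)  # prints "chat_completions"
```

Running the benchmark anyway: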
(venv) prodooser@prodooser:~/TensorRT-LLab$ guidellm benchmark --target "http://localhost:8000" --rate-type sweep --max-seconds 30 --data "prompt_tokens=256,output_tokens=128"
Creating backend...
2025-07-01T09:30:19.193651+0000 | text_completions | ERROR - OpenAIHTTPBackend request with headers: {'Content-Type': 'application/json'} and payload: {'prompt': 'Test connection', 'model': 'nvidia--Llama-3.3-70B-Instruct-FP8', 'stream': True, 'stream_options': {'include_usage': True}, 'max_tokens': 1, 'max_completion_tokens': 1, 'stop': None, 'ignore_eos': True} failed: Client error '400 Bad Request' for url 'http://localhost:8000/v1/completions'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/400
Traceback (most recent call last):
File "/home/prodooser/TensorRT-LLab/venv/bin/guidellm", line 8, in <module>
sys.exit(cli())
^^^^^
...
File "/home/prodooser/TensorRT-LLab/venv/lib/python3.12/site-packages/httpx/_models.py", line 829, in raise_for_status
raise HTTPStatusError(message, request=request, response=self)
httpx.HTTPStatusError: Client error '400 Bad Request' for url 'http://localhost:8000/v1/completions'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/400
The log above shows that it's still using "text_completions" instead of "chat_completions":
2025-07-01T09:30:19.193651+0000 | text_completions | ERROR - OpenAIHTTPBackend ...
and
Client error '400 Bad Request' for url 'http://localhost:8000/v1/completions'
I believe it should be hitting /v1/chat/completions instead.
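For what it's worth, a quick manual check along these lines (payload shape per the OpenAI chat completions API; model name and base URL taken from the log above) can confirm whether the server accepts the chat route, which would isolate the problem to guidellm's route selection:

```python
# Manual check against the chat endpoint; assumes the server exposes
# the standard OpenAI-compatible /v1/chat/completions route.
import httpx

payload = {
    "model": "nvidia--Llama-3.3-70B-Instruct-FP8",
    "messages": [{"role": "user", "content": "Test connection"}],
    "max_tokens": 1,
}

resp = httpx.post("http://localhost:8000/v1/chat/completions", json=payload)
resp.raise_for_status()  # a 200 here would mean only the route selection is wrong
print(resp.json()["choices"][0]["message"]["content"])
```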