Closed
Description
I have a local dev build on this commit:
lroberts@GPU77B9:~/update-vllm-env/vllm-source/vllm$ git log -n 1
commit 5265631d15d59735152c8b72b38d960110987f10 (HEAD -> main, origin/main, origin/HEAD)
Author: Vladimir <[email protected]>
Date: Fri Jan 26 08:48:17 2024 +0100
use a correct device when creating OptionalCUDAGuard (#2583)
and I have some local code that is a thin wrapper around the LLM class.
If I run this with tensor-parallel == 2, I get the following:
roberts@GPU77B9:~/llm_quantization$ FLASK_APP=quantized_flask_app.py FLASK_ENV=debug python3.10 -m flask run
* Serving Flask app 'quantized_flask_app.py' (lazy loading)
* Environment: debug
* Debug mode: off
/usr/lib/python3/dist-packages/requests/__init__.py:87: RequestsDependencyWarning: urllib3 (2.1.0) or chardet (5.2.0) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
16384
INFO 2024-01-26 22:03:13,343 abc_etal.py:195 unknown_model_name:unknown_model_version
Hello! logging initialized, starting up...
INFO 2024-01-26 22:03:13,343 abc_etal.py:196 unknown_model_name:unknown_model_version
Git commit of model: unknown_git_commit
INFO 2024-01-26 22:03:13,343 abc_etal.py:197 unknown_model_name:unknown_model_version
Git commit of cuda torch base: unknown_git_commit
INFO 2024-01-26 22:03:14,921 abc_etal.py:200 unknown_model_name:unknown_model_version
Compute device available: cuda
WARNING 01-26 22:03:16 config.py:506] Casting torch.bfloat16 to torch.float16.
WARNING 01-26 22:03:16 config.py:176] awq quantization is not fully optimized yet. The speed can be slower than non-quantized models.
2024-01-26 22:03:18,650 ERROR services.py:1329 -- Failed to start the dashboard , return code 1
2024-01-26 22:03:18,650 ERROR services.py:1354 -- Error should be written to 'dashboard.log' or 'dashboard.err'. We are printing the last 20 lines for you. See 'https://docs.ray.io/en/master/ray-observability/ray-logging.html#logging-directory-structure' to find where the log file is.
2024-01-26 22:03:18,651 ERROR services.py:1398 --
The last 20 lines of /tmp/ray/session_2024-01-26_22-03-16_731996_3725694/logs/dashboard.log (it contains the error message from the dashboard):
File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 883, in exec_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "/home/lroberts/.local/lib/python3.10/site-packages/ray/dashboard/modules/job/cli.py", line 16, in <module>
from ray.job_submission import JobStatus, JobSubmissionClient
File "/home/lroberts/.local/lib/python3.10/site-packages/ray/job_submission/__init__.py", line 2, in <module>
from ray.dashboard.modules.job.pydantic_models import DriverInfo, JobDetails, JobType
File "/home/lroberts/.local/lib/python3.10/site-packages/ray/dashboard/modules/job/pydantic_models.py", line 4, in <module>
from ray._private.pydantic_compat import BaseModel, Field, PYDANTIC_INSTALLED
File "/home/lroberts/.local/lib/python3.10/site-packages/ray/_private/pydantic_compat.py", line 100, in <module>
monkeypatch_pydantic_2_for_cloudpickle()
File "/home/lroberts/.local/lib/python3.10/site-packages/ray/_private/pydantic_compat.py", line 58, in monkeypatch_pydantic_2_for_cloudpickle
pydantic._internal._model_construction.SchemaSerializer = (
AttributeError: module 'pydantic._internal' has no attribute '_model_construction'
2024-01-26 22:03:18,879 INFO worker.py:1673 -- Started a local Ray instance.
[2024-01-26 22:03:19,820 E 3725694 3725694] core_worker.cc:205: Failed to register worker 01000000ffffffffffffffffffffffffffffffffffffffffffffffff to Raylet. IOError: [RayletClient] Unable to register worker with raylet. No such file or directory
However, tensor-parallel == 1 works fine:
lroberts@GPU77B9:~/llm_quantization$ FLASK_APP=quantized_flask_app.py FLASK_ENV=debug python3.10 -m flask run
* Serving Flask app 'quantized_flask_app.py' (lazy loading)
* Environment: debug
* Debug mode: off
/usr/lib/python3/dist-packages/requests/__init__.py:87: RequestsDependencyWarning: urllib3 (2.1.0) or chardet (5.2.0) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
16384
INFO 2024-01-26 22:04:03,519 abc_etal.py:195 unknown_model_name:unknown_model_version
Hello! logging initialized, starting up...
INFO 2024-01-26 22:04:03,519 abc_etal.py:196 unknown_model_name:unknown_model_version
Git commit of model: unknown_git_commit
INFO 2024-01-26 22:04:03,519 abc_etal.py:197 unknown_model_name:unknown_model_version
Git commit of cuda torch base: unknown_git_commit
INFO 2024-01-26 22:04:05,098 abc_etal.py:200 unknown_model_name:unknown_model_version
Compute device available: cuda
WARNING 01-26 22:04:06 config.py:506] Casting torch.bfloat16 to torch.float16.
WARNING 01-26 22:04:06 config.py:176] awq quantization is not fully optimized yet. The speed can be slower than non-quantized models.
INFO 01-26 22:04:06 llm_engine.py:72] Initializing an LLM engine with config: model='/home/lroberts/NexusRaven-13B-AWQ/', tokenizer='/home/lroberts/NexusRaven-13B-AWQ/presaved_tokenizer', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=16384, download_dir=None, load_format=auto, tensor_parallel_size=1, quantization=awq, enforce_eager=False, seed=0)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
INFO 01-26 22:04:23 llm_engine.py:316] # GPU blocks: 4145, # CPU blocks: 327
INFO 01-26 22:04:27 model_runner.py:625] Capturing the model for CUDA graphs. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI.
INFO 01-26 22:04:27 model_runner.py:629] CUDA graphs can take additional 1~3 GiB memory per GPU. If you are running out of memory, consider decreasing `gpu_memory_utilization` or enforcing eager mode. You can also reduce the `max_num_seqs` as needed to decrease memory usage.
INFO 01-26 22:04:33 model_runner.py:689] Graph capturing finished in 6 secs.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
INFO 2024-01-26 22:04:33,205 abc_etal.py:231 unknown_model_name:unknown_model_version
Startup completed!
INFO 2024-01-26 22:04:33,207 _internal.py:224 unknown_model_name:unknown_model_version
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
* Running on http://127.0.0.1:5000
INFO 2024-01-26 22:04:33,207 _internal.py:224 unknown_model_name:unknown_model_version
Press CTRL+C to quit
[OpenAIMessage(role='system', content='You are a helpful assistant.'), OpenAIMessage(role='user', content='Tell me a few reasons why someone might consider higher education. Do not repeat yourself. Response: ')]
16384
Processed prompts: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:06<00:00, 6.11s/it]
INFO 2024-01-26 22:05:05,684 _internal.py:224 unknown_model_name:unknown_model_version
127.0.0.1 - - [26/Jan/2024 22:05:05] "POST /sequence-generation/chat/json HTTP/1.1" 200 -
The request is a simple curl call that looks like this:
```bash
curl -v --trace-time -X POST -H "Content-Type: application/json" --data '{"max_tokens": 500, "messages": [{"content": "You are a helpful assistant.","role": "system"}, {"content": "Tell me a few reasons why someone might consider higher education. Do not repeat yourself. Response: ","role": "user"}], "model": "gpt-3.5-turbo", "temperature": 0}' http://localhost:5000/sequence-generation/chat/json
```
with response:
{"choices":[{"finish_reason":"stop","index":0,"message":{"content":" There are many reasons why someone might consider higher education. Here are a few:\n\n1. To gain knowledge and skills: Higher education provides students with the opportunity to learn new knowledge and skills that can be applied in their future careers.\n2. To prepare for a career: Many people choose to pursue higher education because it is a way to prepare for a specific career. For example, a student may choose to study business because they want to work in the field.\n3. To gain a competitive edge: Higher education can provide students with a competitive edge in the job market. Many employers require a degree from a reputable institution, and having one can make a candidate more attractive to potential employers.\n4. To develop critical thinking and problem-solving skills: Higher education provides students with the opportunity to develop their critical thinking and problem-solving skills.\n5. To gain a sense of community: Higher education provides students with the opportunity to connect with other students and faculty members, which can help to create a sense of community.\n6. To gain a sense of purpose: Higher education can provide students with a sense of purpose and direction in life.\n7. To gain a sense of accomplishment: Higher education can provide students with a sense of accomplishment and pride in their achievements.\n8. To gain a sense of personal growth: Higher education can provide students with the opportunity to grow and develop as individuals.\n9. To gain a sense of independence: Higher education can provide students with the opportunity to become independent and self-sufficient.\n10. To gain a sense of fulfillment: Higher education can provide students with a sense of fulfillment and satisfaction in their lives.\n\nOverall, higher education can provide students with a wide range of benefits, including the opportunity to gain knowledge and skills, prepare for a career, gain a competitive edge, develop critical thinking and problem-solving skills, gain a sense of community, gain a sense of purpose, gain a sense of accomplishment, gain a sense of personal growth, gain a sense of independence, and gain a sense of fulfillment.","role":"assistant"}}],"created":1706306706,"id":"llama-2-7b-chat-hf","object":"chat.completion","usage":{"completion_tokens":457,"prompt_tokens":49,"total_tokens":506}}
The error in the Ray dashboard log indicates some serialization problem:
2024-01-26 21:35:42,363 INFO utils.py:112 -- Get all modules by type: DashboardHeadModule
2024-01-26 21:35:42,407 INFO utils.py:123 -- Module ray.dashboard.modules.actor.actor_head cannot be loaded because we cannot import all dependencies. Install this module using `pip install 'ray[default]'` for the full dashboard functionality. Error: No module named 'opencensus'
2024-01-26 21:35:42,429 INFO utils.py:123 -- Module ray.dashboard.modules.event.event_agent cannot be loaded because we cannot import all dependencies. Install this module using `pip install 'ray[default]'` for the full dashboard functionality. Error: No module named 'grpc'
2024-01-26 21:35:42,430 INFO utils.py:123 -- Module ray.dashboard.modules.event.event_head cannot be loaded because we cannot import all dependencies. Install this module using `pip install 'ray[default]'` for the full dashboard functionality. Error: No module named 'opencensus'
2024-01-26 21:35:42,431 INFO utils.py:123 -- Module ray.dashboard.modules.healthz.healthz_agent cannot be loaded because we cannot import all dependencies. Install this module using `pip install 'ray[default]'` for the full dashboard functionality. Error: No module named 'opencensus'
2024-01-26 21:35:42,431 INFO utils.py:123 -- Module ray.dashboard.modules.healthz.healthz_head cannot be loaded because we cannot import all dependencies. Install this module using `pip install 'ray[default]'` for the full dashboard functionality. Error: No module named 'opencensus'
2024-01-26 21:35:42,450 ERROR dashboard.py:259 -- The dashboard on node GPU77B9 failed with the following error:
Traceback (most recent call last):
File "/home/lroberts/.local/lib/python3.10/site-packages/ray/dashboard/dashboard.py", line 248, in <module>
loop.run_until_complete(dashboard.run())
File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/home/lroberts/.local/lib/python3.10/site-packages/ray/dashboard/dashboard.py", line 75, in run
await self.dashboard_head.run()
File "/home/lroberts/.local/lib/python3.10/site-packages/ray/dashboard/head.py", line 325, in run
modules = self._load_modules(self._modules_to_load)
File "/home/lroberts/.local/lib/python3.10/site-packages/ray/dashboard/head.py", line 219, in _load_modules
head_cls_list = dashboard_utils.get_all_modules(DashboardHeadModule)
File "/home/lroberts/.local/lib/python3.10/site-packages/ray/dashboard/utils.py", line 121, in get_all_modules
importlib.import_module(name)
File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 883, in exec_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "/home/lroberts/.local/lib/python3.10/site-packages/ray/dashboard/modules/job/cli.py", line 16, in <module>
from ray.job_submission import JobStatus, JobSubmissionClient
File "/home/lroberts/.local/lib/python3.10/site-packages/ray/job_submission/__init__.py", line 2, in <module>
from ray.dashboard.modules.job.pydantic_models import DriverInfo, JobDetails, JobType
File "/home/lroberts/.local/lib/python3.10/site-packages/ray/dashboard/modules/job/pydantic_models.py", line 4, in <module>
from ray._private.pydantic_compat import BaseModel, Field, PYDANTIC_INSTALLED
File "/home/lroberts/.local/lib/python3.10/site-packages/ray/_private/pydantic_compat.py", line 100, in <module>
monkeypatch_pydantic_2_for_cloudpickle()
File "/home/lroberts/.local/lib/python3.10/site-packages/ray/_private/pydantic_compat.py", line 58, in monkeypatch_pydantic_2_for_cloudpickle
pydantic._internal._model_construction.SchemaSerializer = (
AttributeError: module 'pydantic._internal' has no attribute '_model_construction'

Relevant details about env:
lroberts@GPU77B9:~/update-vllm-env/vllm-source/vllm$ python -c "import pydantic; print(pydantic.__version__)"
2.5.3
lroberts@GPU77B9:~/update-vllm-env/vllm-source/vllm$ python -c "import ray; print(ray.__version__)"
/usr/lib/python3/dist-packages/requests/__init__.py:87: RequestsDependencyWarning: urllib3 (2.1.0) or chardet (5.2.0) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
2.8.0
lroberts@GPU77B9:~/update-vllm-env/vllm-source/vllm$ python -c "import torch; print(torch.__version__)"
2.1.2+cu121
It seems there is a known fix or workaround here -> ray-project/ray#41913 (comment)
but it seems that pydantic version 2 is necessary for OpenAI testing:
Line 11 in 3a0e1fc
pydantic >= 2.0 # Required for OpenAI server.
Is there a suggested workaround, or should I manually downgrade pydantic to a version lower than 2.0.0?
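
For anyone trying to reproduce this, here is a minimal sketch that collects the same environment details in one place and checks whether the attribute path from the traceback resolves by plain attribute access. The helper names (`installed_version`, `major_version`, `attribute_path_resolves`) are made up for illustration and are not part of Ray, pydantic, or vLLM. The result of the last check depends on which pydantic submodules happen to have been imported already, so treat it as a diagnostic rather than a verdict.

```python
import importlib
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def major_version(version):
    """Extract the leading integer of a version string such as '2.5.3'."""
    return int(version.split(".")[0])


def attribute_path_resolves(dotted_path):
    """Import only the top-level package, then walk the remaining names with
    plain attribute access, mirroring how the failing line reaches
    pydantic._internal._model_construction."""
    top, *rest = dotted_path.split(".")
    try:
        obj = importlib.import_module(top)
    except ImportError:
        return False
    for name in rest:
        if not hasattr(obj, name):
            return False
        obj = getattr(obj, name)
    return True


if __name__ == "__main__":
    # Hypothetical report; the package list matches the ones checked above.
    for pkg in ("pydantic", "ray", "torch"):
        print(pkg, installed_version(pkg))
    print(
        "pydantic._internal._model_construction reachable:",
        attribute_path_resolves("pydantic._internal._model_construction"),
    )
```

Reading versions through `importlib.metadata` avoids importing `ray` itself, so the check does not trigger the `RequestsDependencyWarning` or start any Ray machinery.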