
Conversation

@seanshi-scale commented Oct 11, 2023

tl;dr: RPyC as implemented in this PR is still slower than Ray, although there are probably other things we could try to speed up some of the bottlenecks in this implementation.

Benchmarking on a machine with 4 A10 GPUs, and running with tensor-parallel=4, i.e.

python -m vllm.entrypoints.api_server --tensor-parallel-size 4 --model ~/path/to/model --worker-use-rpyc

and running a single request, i.e.

curl http://localhost:8000/generate -d '{"prompt": "lorem ipsum sit dolor amet", "n": 1, "temperature": 0.01, "max_tokens": 1024, "stream": false}'

I'm getting roughly 49.9 tokens/sec with the RPyC implementation and 54.7 tokens/sec with the Ray implementation as reported by vllm itself.

The main bottleneck at this point seems to be sending the data from the engine process to the worker processes, i.e. the obtain() calls in the RPyC worker class's exposed_execute_method call. I think this happens because the objects have to be pickled on the engine process's side in response to a request from the worker processes, and this happens once per worker process. There's definitely room to make this faster, e.g. by serializing the objects better so they don't have to be pickled/unpickled, by using shared memory between the processes, or maybe by calling obtain() on each argument directly.
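
Concretely, the pattern described above is roughly the following (a hypothetical sketch, not the PR's actual worker class; the WorkerService name and _worker attribute are illustrative):

import rpyc
from rpyc.utils.classic import obtain

class WorkerService(rpyc.Service):
    """Hypothetical sketch of the obtain() pattern described above."""

    def __init__(self, worker):
        # `worker` is whatever object the engine process wants to drive remotely.
        self._worker = worker

    def exposed_execute_method(self, method_name, *args, **kwargs):
        # args/kwargs arrive as netrefs (proxies); obtain() materializes a local
        # copy of each one, which forces a pickle on the engine process's side
        # and an unpickle here, repeated in every worker process.
        local_args = [obtain(a) for a in args]
        local_kwargs = {k: obtain(v) for k, v in kwargs.items()}
        return getattr(self._worker, method_name)(*local_args, **local_kwargs)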

Aside from the changes in this PR, I had to make the following change inside RPyC's code to avoid a serious slowdown:

--- a/rpyc/utils/factory.py
+++ b/rpyc/utils/factory.py
@@ -99,7 +99,7 @@ def connect(host, port, service=VoidService, config={}, ipv6=False, keepalive=False):
    """

     :returns: an RPyC connection
     """
-    s = SocketStream.connect(host, port, ipv6=ipv6, keepalive=keepalive)
+    s = SocketStream.connect(host, port, ipv6=ipv6, keepalive=keepalive, nodelay=True)

Without this patch, I was getting roughly 15 tokens/second with the same setup as before.
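
(An aside, not part of the PR: the same TCP_NODELAY behavior can in principle be enabled without patching RPyC, since SocketStream.connect already accepts a nodelay flag; a minimal sketch with an assumed host/port:)

import rpyc
from rpyc.core.stream import SocketStream

# Build the stream ourselves with nodelay=True (disables Nagle's algorithm),
# then wrap it in a connection, instead of editing rpyc.utils.factory.connect.
stream = SocketStream.connect("localhost", 18861, nodelay=True)  # host/port assumed
conn = rpyc.connect_stream(stream, service=rpyc.VoidService)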

Also, there's some messiness in terms of when certain env vars get set and when torch or other libraries get imported, which I had to hack around.

seanshi-scale and others added 30 commits September 27, 2023 12:26
…re out where the bottleneck is that's causing 0.1 seconds of iteration lag on this small model, don't think threadpoolexecutor helped
@Juelianqvq (Contributor) commented Nov 2, 2023

The asyncio.to_thread method seems to be supported only in Python >= 3.9; is there any workaround?

@seanshi-scale (Author)

I did find https://stackoverflow.com/questions/68523752/python-module-asyncio-has-no-attribute-to-thread; I don't have the time to implement it myself, but it could be worth a shot.
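
For reference, a backport sketch along those lines (essentially what asyncio.to_thread does internally in 3.9, using run_in_executor; not code from this PR):

import asyncio
import contextvars
import functools

async def to_thread(func, *args, **kwargs):
    # Rough backport of asyncio.to_thread for Python 3.8: run func in the
    # default executor while propagating the current contextvars context.
    loop = asyncio.get_running_loop()
    ctx = contextvars.copy_context()
    call = functools.partial(ctx.run, func, *args, **kwargs)
    return await loop.run_in_executor(None, call)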

@Juelianqvq (Contributor)

> I did find https://stackoverflow.com/questions/68523752/python-module-asyncio-has-no-attribute-to-thread; I don't have the time to implement it myself, but it could be worth a shot.

Yeah, I've tried that answer before, and throughput on tp=2 llama13b dropped from 35 to 7.8 tokens/s.
tp=1 not tested yet.

@seanshi-scale (Author)

I haven't found any other workarounds, unfortunately; I've only been testing with a later version of Python.

@zhuohan123 (Member) commented Jan 12, 2024

Closing this PR in favor of #2221. Please feel free to reopen the PR if you have anything to add.

@zhuohan123 closed this Jan 12, 2024
minmin-intel pushed a commit to minmin-intel/vllm that referenced this pull request Jul 15, 2025
Set vllm-hpu-extension revision to 80985d3