@hilarious-viking
Contributor

Adds a GPU layers parameter (n_gpu_layers) to the llama.cpp wrapper.

After this change:
llm = LlamaCpp(model_path=..., n_gpu_layers=3)

Output:

....
llama_model_load_internal: [cublas] offloading 3 layers to GPU
....
llama_model_load_internal: [cublas] total VRAM used: 1148 MB
llama_init_from_file: kv self size  = 3120.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 | 

what do you know about 'HUNTER X HUNTER'

> Entering new AgentExecutor chain...

I should look up information on Hunter x Hunter.
Action: wikipedia
Action Input: Hunter x Hunter
Observation: Page: Hunter × Hunter
Summary: Hunter × Hunter (stylized as HUNTER×HUNTER and pronounced "hunter hunter") is a Japanese manga series...

For review:

"""Number of tokens to process in parallel.
Should be a number between 1 and n_ctx."""

n_gpu_layers: Optional[int] = 0
Contributor

As far as I can tell, this isn't optional in the llama_cpp package, so we probably don't want to allow it here?

Contributor Author

Same as the other params in this wrapper. In the llama-cpp-python project it's optional, with a default value of 0:
https://github.com/abetlen/llama-cpp-python/blob/main/llama_cpp/llama.py#L86

Tested without this param:
llm = LlamaCpp(model_path=...)

Output:
llama_model_load_internal: [cublas] offloading 0 layers to GPU

Contributor

By optional I mean typed as Optional (meaning users could pass in None).
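In Python's typing module, Optional[int] is shorthand for Union[int, None], so a field annotated that way advertises None as a valid value, while a plain int annotation does not. A minimal stdlib sketch that checks an annotation for this; the helper name allows_none is hypothetical and not part of LangChain or llama-cpp-python:

```python
from typing import Optional, Union, get_args


def allows_none(annotation) -> bool:
    """Return True if the type annotation admits None (e.g. Optional[int])."""
    # Optional[X] is Union[X, None]; get_args exposes the union members.
    # For a non-union type such as int, get_args returns an empty tuple.
    return type(None) in get_args(annotation)


# Optional[int] and Union[int, None] are the same type.
assert Optional[int] == Union[int, None]

print(allows_none(Optional[int]))  # True
print(allows_none(int))            # False
```

Python itself does not enforce annotations at runtime; it is the validation layer (pydantic, which the Field(...) syntax in this thread comes from) that decides whether None is accepted for the field.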

Contributor Author

Yes, correct in that regard; updated. Is it better now?
n_gpu_layers: Optional[int] = Field(0, alias="n_gpu_layers")

Contributor

I'm suggesting we make it n_gpu_layers: int = ...

Contributor

Actually, I updated it to leave it as None and not pass it in if it is None, since passing it would break backwards compatibility (n_gpu_layers support was only recently added to llama-cpp-python).
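The approach described above (default the field to None and only forward it when it is set) can be sketched as follows. This is a minimal stdlib illustration, not the merged LangChain code: the dataclass LlamaCppSettings and its method to_llama_kwargs are hypothetical names, and the returned dict stands in for the keyword arguments handed to llama-cpp-python.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class LlamaCppSettings:
    """Hypothetical stand-in for the wrapper's model parameters."""

    model_path: str
    # None means "not set by the user": the key is then left out entirely,
    # so llama-cpp-python releases that predate n_gpu_layers keep working.
    n_gpu_layers: Optional[int] = None

    def to_llama_kwargs(self) -> Dict[str, Any]:
        kwargs: Dict[str, Any] = {"model_path": self.model_path}
        # Compare against None rather than testing truthiness, so an
        # explicit n_gpu_layers=0 is still forwarded.
        if self.n_gpu_layers is not None:
            kwargs["n_gpu_layers"] = self.n_gpu_layers
        return kwargs


print(LlamaCppSettings("model.bin").to_llama_kwargs())
# {'model_path': 'model.bin'}
print(LlamaCppSettings("model.bin", n_gpu_layers=3).to_llama_kwargs())
# {'model_path': 'model.bin', 'n_gpu_layers': 3}
```

Omitting the key, rather than sending a default of 0, means the underlying library's own default applies whenever the user does not ask for GPU offloading.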

@dev2049 merged commit 7d15669 into langchain-ai:master on May 15, 2023
@hilarious-viking deleted the llama-cpp-gpu-layers-param branch on May 15, 2023 23:33
@thekit

thekit commented May 18, 2023

I'm on langchain 0.0.173 (updated today) and am setting n_gpu_layers=3 in my clone of privateGPT.

I am seeing activity on my ATI onboard graphics, but in the Windows 11 performance manager I see a flat line on my 3070. How do I target my discrete graphics card?

I am not seeing much of a speedup.

Here is how I am configured:
zylon-ai/private-gpt#275

@hilarious-viking
Contributor Author

hilarious-viking commented May 18, 2023

@thekit Under the hood, langchain.llms.LlamaCpp uses llama-cpp-python, so you need to install llama-cpp-python with CUDA support enabled (see the "Installation with OpenBLAS / cuBLAS / CLBlast" section of the llama-cpp-python README).
It seems n_gpu_layers is only available in cuBLAS mode (NVIDIA GPU + CUDA toolkit).

For reference, here is how I'm doing it (Ubuntu 22), with an update-llama.sh script in my langchain folder:

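#!/bin/bash

# Build llama-cpp-python from source with cuBLAS (CUDA) support
export CMAKE_ARGS="-DLLAMA_CUBLAS=on -DLLAMA_NATIVE=on"
export FORCE_CMAKE=1

# Remove the existing build, then reinstall while bypassing pip's cache
# so a previously built (CPU-only) wheel is not reused
pip uninstall llama-cpp-python -y
pip --no-cache-dir install llama-cpp-python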
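After reinstalling, one way to confirm that the CUDA build is in use is to look for the [cublas] lines that llama.cpp prints while loading the model, like the ones in the PR description. A minimal sketch that extracts the offloaded layer count and VRAM figure from that model-loading output; the function name parse_offload_log is hypothetical, and the regular expressions assume the exact log wording shown in this thread, which may differ in other llama.cpp versions.

```python
import re
from typing import Optional, Tuple

# Patterns assume the log wording shown in this thread, e.g.
# "llama_model_load_internal: [cublas] offloading 3 layers to GPU"
_OFFLOAD_RE = re.compile(r"\[cublas\] offloading (\d+) layers to GPU")
_VRAM_RE = re.compile(r"\[cublas\] total VRAM used: (\d+) MB")


def parse_offload_log(log: str) -> Tuple[Optional[int], Optional[int]]:
    """Return (layers_offloaded, vram_mb); None means the line was not found.

    If no [cublas] lines appear at all, llama-cpp-python was most likely
    built without cuBLAS, in which case n_gpu_layers has no effect.
    """
    layers = _OFFLOAD_RE.search(log)
    vram = _VRAM_RE.search(log)
    return (
        int(layers.group(1)) if layers else None,
        int(vram.group(1)) if vram else None,
    )


sample = (
    "llama_model_load_internal: [cublas] offloading 3 layers to GPU\n"
    "llama_model_load_internal: [cublas] total VRAM used: 1148 MB\n"
)
print(parse_offload_log(sample))  # (3, 1148)
```

A result of (None, None) on a CUDA machine suggests the reinstall did not pick up CMAKE_ARGS, while (0, ...) means the build is fine but n_gpu_layers was left at its default.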
