
Conversation

ggerganov

  • merged master
  • avoid tuples in common
  • fix layer loop condition to <= max_direction_layer
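
A minimal sketch of the loop-condition fix in the last bullet; `apply_control_vector` is a hypothetical stand-in for the per-layer apply step:

```cpp
#include <cstdint>

// Hypothetical helper representing "apply the control direction at layer il"
void apply_control_vector(int32_t il);

void apply_all_directions(int32_t max_direction_layer) {
    // the fix: use an inclusive bound so the last layer that has a
    // control direction is no longer skipped (previously `<` was used)
    for (int32_t il = 1; il <= max_direction_layer; il++) {
        apply_control_vector(il);
    }
}
```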

slaren and others added 20 commits March 12, 2024 17:55
* use multitask for embd endpoint

* specify types

* remove redundant {"n_predict", 0}
* llama : add pipeline parallelism support for batch processing with multiple CUDA GPUs

ggml-ci

* server : add -ub, --ubatch-size parameter
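
For context, a hedged sketch of what the new flags map onto in the C API, assuming the `n_ubatch` field this PR series adds to `llama_context_params`:

```cpp
#include "llama.h"

llama_context * make_ctx(llama_model * model) {
    llama_context_params cparams = llama_context_default_params();
    cparams.n_batch  = 2048; // -b:  logical batch, max tokens per llama_decode call
    cparams.n_ubatch = 512;  // -ub: physical micro-batch, max tokens per compute pass
    return llama_new_context_with_model(model, cparams);
}
```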

* fix server embedding test

* llama : fix Mamba inference for pipeline parallelism

Tested to work correctly with both `main` and `parallel` examples.

* llama : limit max batch size to n_batch

* add LLAMA_SCHED_MAX_COPIES to configure the number of input copies for pipeline parallelism
default increased to 4 (from 2)

changing this value may improve performance for some systems, but increases memory usage
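
A rough sketch of where that constant lands, assuming the `GGML_SCHED_MAX_COPIES` define that the `LLAMA_SCHED_MAX_COPIES` build option feeds:

```cpp
// With pipeline parallelism, the scheduler keeps several copies of the input
// tensors so the next micro-batch can be uploaded while the current one is
// still computing. More copies deepen the pipeline but cost more memory.
#ifndef GGML_SCHED_MAX_COPIES
#define GGML_SCHED_MAX_COPIES 4 // default raised from 2 in this PR
#endif
```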

* fix hip build

* fix sycl build (disable cpy_tensor_async)

* fix hip build

* llama : limit n_batch and n_ubatch to n_ctx during context creation

* llama : fix norm backend

* batched-bench : sync after decode

* swiftui : sync after decode
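
Both sync commits follow the same pattern; a hedged sketch, assuming the `llama_synchronize` call introduced alongside pipeline parallelism:

```cpp
#include "llama.h"
#include "ggml.h"

// llama_decode may now return before the backend finishes, so timing code
// must synchronize first or it under-reports the decode cost.
int64_t timed_decode_us(llama_context * ctx, llama_batch batch) {
    const int64_t t_start = ggml_time_us();
    llama_decode(ctx, batch);
    llama_synchronize(ctx); // block until all queued backend work is done
    return ggml_time_us() - t_start;
}
```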

* ggml : allow ggml_get_rows to use multiple threads if they are available
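
The `ggml_get_rows` change is plain row-partitioning across threads; a simplified, hedged sketch of the contiguous-F32 case (the real kernel handles many types and strides):

```cpp
#include <cstring>
#include "ggml.h"

// Thread `ith` of `nth` handles rows ith, ith + nth, ith + 2*nth, ... --
// the same interleaved split ggml's CPU ops use elsewhere.
static void get_rows_f32_sketch(const ggml_tensor * src0, // data matrix
                                const ggml_tensor * src1, // row ids (I32)
                                ggml_tensor * dst,
                                int ith, int nth) {
    const int64_t nr = ggml_nelements(src1);
    for (int64_t i = ith; i < nr; i += nth) {
        const int64_t row = ((const int32_t *) src1->data)[i];
        memcpy((char *)       dst->data  + i   * dst->nb[1],
               (const char *) src0->data + row * src0->nb[1],
               ggml_row_size(src0->type, src0->ne[0]));
    }
}
```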

* check n_ubatch >= n_tokens with non-causal attention

* llama : do not limit n_batch to n_ctx with non-causal attn
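
Taken together, a hedged sketch of the batch-size rules these commits converge on (field names follow llama.cpp's cparams/hparams convention, but the function itself is illustrative):

```cpp
#include <algorithm>
#include <cstdint>

struct batch_params { uint32_t n_ctx, n_batch, n_ubatch; bool causal_attn; };

void clamp_batch_sizes(batch_params & p) {
    if (p.causal_attn) {
        // causal attention: a long prompt can be split into micro-batches,
        // so n_batch can safely be capped at n_ctx
        p.n_batch = std::min(p.n_batch, p.n_ctx);
    } else {
        // non-causal attention (e.g. embedding models) must see the whole
        // input in one pass, so n_ubatch is raised to n_batch and decode
        // later rejects inputs with n_tokens > n_ubatch
        p.n_ubatch = p.n_batch;
    }
    p.n_ubatch = std::min(p.n_ubatch, p.n_batch);
}
```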

* server : construct batch with size of llama_n_batch
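
A hedged sketch of the server fix, using `llama_batch_init` with the size reported by `llama_n_batch`:

```cpp
#include "llama.h"

// Size the batch from the context's actual n_batch instead of a hard-coded
// constant, so slots can pack up to n_batch tokens into one decode call.
llama_batch make_server_batch(llama_context * ctx) {
    return llama_batch_init(llama_n_batch(ctx), /*embd =*/ 0, /*n_seq_max =*/ 1);
}
```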

* ggml_backend_cpu_graph_compute : fix return value when alloc fails
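
The CPU backend fix concerns the `ggml_status` value returned on allocation failure; a hedged sketch:

```cpp
#include "ggml.h"

// Before the fix, an allocation failure could fall through and still report
// success; the fix propagates the failure explicitly.
enum ggml_status cpu_graph_compute_sketch(bool alloc_ok) {
    if (!alloc_ok) {
        return GGML_STATUS_ALLOC_FAILED;
    }
    // ... run the graph ...
    return GGML_STATUS_SUCCESS;
}
```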

* llama : better n_batch and n_ubatch comment

* fix merge

* small fix

* reduce default n_batch to 2048

---------

Co-authored-by: Francis Couture-Harpin <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
* metal : build metallib + fix embed path

ggml-ci

* metal : fix embed build + update library load logic

ggml-ci

* metal : fix embedded library build

ggml-ci

* ci : fix iOS builds to use embedded library
* Refactor dtype handling to be extensible

This code behaves the same as before, but it is now structured to make
adding more NumPy dtypes easy.

* Add support for I8, I16 and I32

These types are allowed in the GGUF specification.

* Add support for I8, I16 and I32 to gguf_writer

* Add support for I8, I16, I32 to gguf_reader
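
The refactor is the table-driven idea below, rendered here as a C++ sketch since the actual change lives in the Python `gguf` package: one extensible dtype mapping instead of a per-type if/else chain, so adding I8/I16/I32 is a one-entry change.

```cpp
#include <map>
#include <string>
#include "ggml.h"

// Hedged C++ analogue of the gguf-py refactor: map NumPy dtype names to
// GGML tensor types through one table that new dtypes simply extend.
static const std::map<std::string, ggml_type> np_dtype_to_ggml = {
    {"float32", GGML_TYPE_F32},
    {"float16", GGML_TYPE_F16},
    {"int8",    GGML_TYPE_I8},  // newly supported
    {"int16",   GGML_TYPE_I16}, // newly supported
    {"int32",   GGML_TYPE_I32}, // newly supported
};
```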
* attempt to reduce the impact of a worst-case scenario

* fragmentation calculation fix

* Update llama.cpp

---------

Co-authored-by: Georgi Gerganov <[email protected]>
- increase timeout for server
- do not fail fast
vgel merged commit fc6f042 into NousResearch:vgel/repeng on Mar 14, 2024