I've been really enjoying using both llama-cpp-python and the original llama.cpp. These are amazing projects, especially for folks without massively powerful GPUs.
There's a really nice feature that was implemented in llama.cpp in January to allow self-extend (à la LongLM's approach). It works well in both the main and server examples, and plenty of folks have noted self-extend is especially useful with Mistral/Mixtral, Gemma, and Phi-2.
It appears someone else might have been asking about this earlier here. Right now I have to move in and out of Python whenever I want to run summarization on a 'just-slightly-too-long' article with self-extend. Would you consider implementing self-extend as an option in llama-cpp-python?
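
For reference, this is roughly the workaround I'm using today: shelling out from Python to the compiled llama.cpp main binary and passing its self-extend (group-attention) options. The binary path, model file, and flag names (`--grp-attn-n` / `--grp-attn-w`) are from my local build, so please double-check them against your llama.cpp version; the imagined llama-cpp-python parameters at the end are purely hypothetical.

```python
import subprocess

# Current workaround: call the llama.cpp `main` binary directly so I can use
# its self-extend (group-attention) options from a Python workflow.
# Paths and flag names below match my local build; adjust for yours.
LLAMA_MAIN = "./llama.cpp/main"  # path to the compiled main example
MODEL_PATH = "./models/mistral-7b-instruct.Q4_K_M.gguf"  # example model file

def summarize_with_self_extend(article_path: str) -> str:
    """Summarize a slightly-too-long article via llama.cpp with self-extend."""
    cmd = [
        LLAMA_MAIN,
        "-m", MODEL_PATH,
        "-c", "8192",            # extended context to run the model at
        "--grp-attn-n", "2",     # group-attention factor (self-extend)
        "--grp-attn-w", "2048",  # group-attention width (self-extend)
        "-n", "512",             # max tokens to generate
        "-f", article_path,      # prompt file: article text plus instruction
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout

# What I'd love instead is something first-class in llama-cpp-python,
# e.g. (purely hypothetical parameter names):
#   llm = Llama(model_path=MODEL_PATH, grp_attn_n=2, grp_attn_w=2048)
```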