Closed
Description
The LLamaEmbedder at the moment does not expose all of the embedding capabilities of llama.cpp.
Currently:
- It only returns a single vector as a `float[]`. Some models return a single vector which represents the entire input sequence (embedding models) and some produce an embedding vector per token (generative models).
- A pooling mode can be set with some models, which sets a method for converting many embeddings into one embedding. Probably only compatible with some models?
Improvements:
- Indicate which type of model an embedder was created with.
- Indicate how many results there are.
- Use `llama_get_embeddings`, `llama_get_embeddings_ith` and `llama_get_embeddings_seq` as appropriate to get the correct embeddings.
Random things to consider in no particular order:
- What should be returned? `float[][]` or a `Span<float>`?
- When tokens are added to the batch, should the `logits` flag be set for all tokens, none of the tokens, or just the last token?
  - Does this change with different models (generative vs embedding vs generative with pooling)?
- Should returned embeddings always be normalized?