Improve LLamaEmbedder #889

@martindevans

Description

At the moment the LLamaEmbedder does not expose all of the embedding capabilities of llama.cpp.

Currently:

  • It returns only a single vector, as a float[].
  • Some models return a single vector representing the entire input sequence (embedding models), while others produce one embedding vector per token (generative models).
  • A pooling mode can be set with some models, which selects a method for combining many embeddings into one. Probably only compatible with some models?

Improvements:

  • Indicate which type of model an embedder was created with.
  • Indicate how many results there are.
  • Use llama_get_embeddings, llama_get_embeddings_ith and llama_get_embeddings_seq as appropriate to get the correct embeddings.

Random things to consider in no particular order:

  • What should be returned? float[][] or a Span<float>?
  • When tokens are added to the batch, should the logits flag be set for all tokens, none of the tokens, or just the last token?
    • Does this change with different models (generative vs embedding vs generative with pooling)?
  • Should returned embeddings always be normalized?
