Closed
Description
The LLamaEmbedder at the moment does not expose all of the embedding capabilities of llama.cpp.
Currently:
- It only returns a single vector as a `float[]`. Some models return a single vector which represents the entire input sequence (embedding models) and some produce an embedding vector per token (generative models).
- A pooling mode can be set with some models, which sets a method for converting many embeddings into one embedding. Probably only compatible with some models?
Improvements:
- Indicate which type of model an embedder was created with.
- Indicate how many results there are.
- Use `llama_get_embeddings`, `llama_get_embeddings_ith` and `llama_get_embeddings_seq` as appropriate to get the correct embeddings.
Random things to consider in no particular order:
- What should be returned? `float[][]` or a `Span<float>`?
- When tokens are added to the batch, should the `logits` flag be set for all tokens, none of the tokens, or just the last token?
  - Does this change with different models (generative vs embedding vs generative with pooling)?
- Should returned embeddings always be normalized?