Closed
There was previously an issue asking for a way to generate a response from embeddings instead of tokens. llama.cpp now supports embeddings as input, as shown below.
    typedef struct llama_batch {
        int32_t n_tokens;
        llama_token * token;
        float * embd; // set this member and leave token NULL to use embeddings as input
        llama_pos * pos;
        int32_t * n_seq_id;
        llama_seq_id ** seq_id;
        int8_t * logits;
        llama_pos all_pos_0; // used if pos == NULL
        llama_pos all_pos_1; // used if pos == NULL
        llama_seq_id all_seq_id; // used if seq_id == NULL
    } llama_batch;
LLamaSharp already has a binding for this struct, so what remains is to add an API that lets executors accept embeddings as input.
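For illustration, here is a rough sketch of what filling the struct quoted above for embeddings input might look like on the native side. It is only a sketch under assumptions: the typedefs and the struct are local stand-ins so it compiles without llama.h, `embd_batch` is a hypothetical helper (not a llama.cpp or LLamaSharp function), and treating `all_pos_1` as the per-token position step is my understanding of that llama.cpp version rather than something stated in the struct comments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the llama.cpp typedefs (assumed to be 32-bit ints),
 * so this sketch compiles without llama.h. */
typedef int32_t llama_token;
typedef int32_t llama_pos;
typedef int32_t llama_seq_id;

/* Mirror of the llama_batch struct quoted above. */
typedef struct llama_batch {
    int32_t n_tokens;
    llama_token * token;
    float * embd;
    llama_pos * pos;
    int32_t * n_seq_id;
    llama_seq_id ** seq_id;
    int8_t * logits;
    llama_pos all_pos_0;
    llama_pos all_pos_1;
    llama_seq_id all_seq_id;
} llama_batch;

/* Hypothetical helper: wrap a caller-owned buffer of n_tokens * n_embd floats
 * as an embeddings-only batch. The buffer is not copied and must outlive the batch. */
static llama_batch embd_batch(float * embd, int32_t n_tokens,
                              llama_pos pos_0, llama_seq_id seq_id) {
    assert(embd != NULL && n_tokens > 0);

    llama_batch batch = {0};      /* pos, n_seq_id, seq_id, logits stay NULL */
    batch.n_tokens   = n_tokens;
    batch.token      = NULL;      /* no token ids: embd is the input */
    batch.embd       = embd;
    batch.all_pos_0  = pos_0;     /* starting position, used because pos == NULL */
    batch.all_pos_1  = 1;         /* assumed to be the position step per token */
    batch.all_seq_id = seq_id;    /* used because seq_id == NULL */
    return batch;
}
```

An executor-level API in LLamaSharp would presumably do the equivalent through its existing binding: take a float buffer holding one embedding vector per position, leave the token pointer null, and pass the resulting batch to decode.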
Lyrcaxis