Description
I'm interested in using the LLamaSharp ChatCompletion interface for Semantic Kernel in an application I'm working on. Looking over the implementation, I have some concerns regarding performance and chat state. I already had this conversation with @martindevans a month or so ago on Discord; you can view the convo here.
I haven't had any time to work on it until now. I'm going to have a look at fixing it, but the primary issue is the use of the StatelessExecutor for ChatCompletion. This leads to extremely long inference times as the chat history builds up, since the model has to reprocess the entire context on every inference call. I remember that the implementation used to just wrap the existing LLamaSharp ChatSession class, but that made it difficult to manage conversation state, e.g. deleting the last message and regenerating a different response.
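
To make the cost concrete, here's a minimal sketch of the difference between the two executors. The model path and prompt strings are placeholders, and exact signatures may differ between LLamaSharp versions:

```csharp
using System;
using LLama;
using LLama.Common;

var parameters = new ModelParams("model.gguf") { ContextSize = 4096 };
using var weights = LLamaWeights.LoadFromFile(parameters);

// Stateless: the whole transcript is re-sent, so the model re-evaluates
// every token of the accumulated history on each call.
var fullTranscript = "User: Hello\nAssistant: Hi!\nUser: What's the weather?";
var stateless = new StatelessExecutor(weights, parameters);
await foreach (var token in stateless.InferAsync(fullTranscript, new InferenceParams()))
    Console.Write(token);

// Stateful: the context's KV cache persists between calls, so only the
// newly appended message needs prompt processing.
using var context = weights.CreateContext(parameters);
var stateful = new InteractiveExecutor(context);
await foreach (var token in stateful.InferAsync("User: What's the weather?", new InferenceParams()))
    Console.Write(token);
```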
I'm considering writing a dedicated ChatCompletion for Semantic Kernel that doesn't use LLamaSharp's internal chat history, but instead interfaces directly with the ChatHistory defined by Semantic Kernel (a rough sketch of what I mean is below). Any thoughts from @AsakusaRinne and @xbotter? Tagging you two since Rinne made the initial implementation and xbotter rewrote it and made the newer one.
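
To illustrate the direction I'm thinking of, here's a rough sketch of such a service. The class name, delta-tracking scheme, and "Role: text" prompt formatting are hypothetical, not the existing LLamaSharp.SemanticKernel code: the idea is to implement Semantic Kernel's IChatCompletionService directly over a stateful InteractiveExecutor, and only feed the model the ChatHistory messages appended since the previous call:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using LLama;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;
// LLama.Common also defines ChatHistory and AuthorRole, so alias the one type
// needed here rather than importing the whole namespace.
using InferenceParams = LLama.Common.InferenceParams;

public sealed class StatefulLLamaChatCompletion : IChatCompletionService
{
    private readonly InteractiveExecutor _executor;
    private int _processedCount; // ChatHistory entries already evaluated into the KV cache

    public StatefulLLamaChatCompletion(InteractiveExecutor executor) => _executor = executor;

    public IReadOnlyDictionary<string, object?> Attributes { get; } = new Dictionary<string, object?>();

    public async Task<IReadOnlyList<ChatMessageContent>> GetChatMessageContentsAsync(
        ChatHistory chatHistory,
        PromptExecutionSettings? executionSettings = null,
        Kernel? kernel = null,
        CancellationToken cancellationToken = default)
    {
        // Only the messages appended since the last call need prompt processing.
        // A real implementation must also detect edits/deletions to earlier
        // messages and rewind or rebuild the context; this sketch assumes
        // append-only history and a naive prompt format.
        var newMessages = chatHistory.Skip(_processedCount)
            .Select(m => $"{m.Role}: {m.Content}");
        var prompt = string.Join("\n", newMessages) + "\nAssistant:";
        _processedCount = chatHistory.Count;

        var reply = new StringBuilder();
        await foreach (var token in _executor.InferAsync(prompt, new InferenceParams(), cancellationToken))
            reply.Append(token);

        // The generated reply is now in the KV cache too; count the assistant
        // message the caller will append so it isn't re-sent next turn.
        _processedCount++;
        return new[] { new ChatMessageContent(AuthorRole.Assistant, reply.ToString()) };
    }

    public IAsyncEnumerable<StreamingChatMessageContent> GetStreamingChatMessageContentsAsync(
        ChatHistory chatHistory,
        PromptExecutionSettings? executionSettings = null,
        Kernel? kernel = null,
        CancellationToken cancellationToken = default)
        => throw new NotImplementedException(); // streaming elided to keep the sketch short
}
```

The key design point is that Semantic Kernel's ChatHistory stays the single source of truth for the conversation, while the executor's KV cache is just an optimization over it, so deleting or editing a message would only require invalidating/rewinding the cached state rather than fighting a second, internal history.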