diff --git a/docs/ContributingGuide.md b/docs/ContributingGuide.md
index fde74428a..2ae8685ad 100644
--- a/docs/ContributingGuide.md
+++ b/docs/ContributingGuide.md
@@ -107,3 +107,17 @@ There're mainly two ways to add an example:
## Add documents
LLamaSharp uses [mkdocs](https://github.com/mkdocs/mkdocs) to build the documentation, please follow the tutorial of mkdocs to add or modify documents in LLamaSharp.
+
+For API references, LLamaSharp uses [xmldoc2md](https://github.com/charlesdevandiere/xmldoc2md) to generate Markdown documentation. To update the files in `docs/xmldocs`, run the following commands.
+
+```shell
+dotnet tool install -g XMLDoc2Markdown
+cd LLama/bin/Debug/net8 # replace with your own bin path.
+dotnet xmldoc2md LLamaSharp.dll -o ../../../../docs/xmldocs --back-button
+```
+
+If `xmldoc2md` cannot be found by the dotnet CLI, replace `dotnet xmldoc2md` with the full path to the executable, as shown below.
+
+```shell
+C:\Users\<username>\.dotnet\tools\xmldoc2md.exe LLamaSharp.dll -o ../../../../docs/xmldocs --back-button
+```
\ No newline at end of file
diff --git a/docs/xmldocs/index.md b/docs/xmldocs/index.md
index 0d85291b6..3c3af0e90 100644
--- a/docs/xmldocs/index.md
+++ b/docs/xmldocs/index.md
@@ -16,6 +16,10 @@
[LLamaQuantizer](./llama.llamaquantizer.md)
+[LLamaReranker](./llama.llamareranker.md)
+
+[LLamaTemplate](./llama.llamatemplate.md)
+
[LLamaTransforms](./llama.llamatransforms.md)
[LLamaWeights](./llama.llamaweights.md)
@@ -32,8 +36,6 @@
## LLama.Abstractions
-[AdapterCollection](./llama.abstractions.adaptercollection.md)
-
[IContextParams](./llama.abstractions.icontextparams.md)
[IHistoryTransform](./llama.abstractions.ihistorytransform.md)
@@ -46,16 +48,22 @@
[IModelParams](./llama.abstractions.imodelparams.md)
+[INativeLibrary](./llama.abstractions.inativelibrary.md)
+
+[INativeLibrarySelectingPolicy](./llama.abstractions.inativelibraryselectingpolicy.md)
+
[ITextStreamTransform](./llama.abstractions.itextstreamtransform.md)
[ITextTransform](./llama.abstractions.itexttransform.md)
-[LoraAdapter](./llama.abstractions.loraadapter.md)
+[LLamaExecutorExtensions](./llama.abstractions.llamaexecutorextensions.md)
[MetadataOverride](./llama.abstractions.metadataoverride.md)
[MetadataOverrideConverter](./llama.abstractions.metadataoverrideconverter.md)
+[TensorBufferOverride](./llama.abstractions.tensorbufferoverride.md)
+
[TensorSplitsCollection](./llama.abstractions.tensorsplitscollection.md)
[TensorSplitsCollectionConverter](./llama.abstractions.tensorsplitscollectionconverter.md)
@@ -66,14 +74,14 @@
[BatchedExecutor](./llama.batched.batchedexecutor.md)
-[CannotForkWhileRequiresInferenceException](./llama.batched.cannotforkwhilerequiresinferenceexception.md)
-
[CannotModifyWhileRequiresInferenceException](./llama.batched.cannotmodifywhilerequiresinferenceexception.md)
[CannotSampleRequiresInferenceException](./llama.batched.cannotsamplerequiresinferenceexception.md)
[CannotSampleRequiresPromptException](./llama.batched.cannotsamplerequirespromptexception.md)
+[CannotSaveWhileRequiresInferenceException](./llama.batched.cannotsavewhilerequiresinferenceexception.md)
+
[Conversation](./llama.batched.conversation.md)
[ConversationExtensions](./llama.batched.conversationextensions.md)
@@ -96,57 +104,47 @@
## LLama.Exceptions
-[GrammarExpectedName](./llama.exceptions.grammarexpectedname.md)
-
-[GrammarExpectedNext](./llama.exceptions.grammarexpectednext.md)
-
-[GrammarExpectedPrevious](./llama.exceptions.grammarexpectedprevious.md)
-
-[GrammarFormatException](./llama.exceptions.grammarformatexception.md)
-
-[GrammarUnexpectedCharAltElement](./llama.exceptions.grammarunexpectedcharaltelement.md)
-
-[GrammarUnexpectedCharRngElement](./llama.exceptions.grammarunexpectedcharrngelement.md)
-
-[GrammarUnexpectedEndElement](./llama.exceptions.grammarunexpectedendelement.md)
-
-[GrammarUnexpectedEndOfInput](./llama.exceptions.grammarunexpectedendofinput.md)
-
-[GrammarUnexpectedHexCharsCount](./llama.exceptions.grammarunexpectedhexcharscount.md)
-
-[GrammarUnknownEscapeCharacter](./llama.exceptions.grammarunknownescapecharacter.md)
+[GetLogitsInvalidIndexException](./llama.exceptions.getlogitsinvalidindexexception.md)
[LLamaDecodeError](./llama.exceptions.llamadecodeerror.md)
[LoadWeightsFailedException](./llama.exceptions.loadweightsfailedexception.md)
+[MissingTemplateException](./llama.exceptions.missingtemplateexception.md)
+
[RuntimeError](./llama.exceptions.runtimeerror.md)
+[TemplateNotFoundException](./llama.exceptions.templatenotfoundexception.md)
+
## LLama.Extensions
[IContextParamsExtensions](./llama.extensions.icontextparamsextensions.md)
[IModelParamsExtensions](./llama.extensions.imodelparamsextensions.md)
-## LLama.Grammars
-
-[Grammar](./llama.grammars.grammar.md)
-
-[GrammarRule](./llama.grammars.grammarrule.md)
+[SpanNormalizationExtensions](./llama.extensions.spannormalizationextensions.md)
## LLama.Native
+[AvxLevel](./llama.native.avxlevel.md)
+
[DecodeResult](./llama.native.decoderesult.md)
+[DefaultNativeLibrarySelectingPolicy](./llama.native.defaultnativelibraryselectingpolicy.md)
+
+[EncodeResult](./llama.native.encoderesult.md)
+
[GGMLType](./llama.native.ggmltype.md)
[GPUSplitMode](./llama.native.gpusplitmode.md)
-[LLamaBatch](./llama.native.llamabatch.md)
+[ICustomSampler](./llama.native.icustomsampler.md)
+
+[LLamaAttentionType](./llama.native.llamaattentiontype.md)
-[LLamaBeamsState](./llama.native.llamabeamsstate.md)
+[LLamaBatch](./llama.native.llamabatch.md)
-[LLamaBeamView](./llama.native.llamabeamview.md)
+[LLamaBatchEmbeddings](./llama.native.llamabatchembeddings.md)
[LLamaChatMessage](./llama.native.llamachatmessage.md)
@@ -154,16 +152,10 @@
[LLamaFtype](./llama.native.llamaftype.md)
-[LLamaGrammarElement](./llama.native.llamagrammarelement.md)
-
-[LLamaGrammarElementType](./llama.native.llamagrammarelementtype.md)
-
-[LLamaKvCacheView](./llama.native.llamakvcacheview.md)
-
-[LLamaKvCacheViewCell](./llama.native.llamakvcacheviewcell.md)
-
[LLamaKvCacheViewSafeHandle](./llama.native.llamakvcacheviewsafehandle.md)
+[LLamaLogitBias](./llama.native.llamalogitbias.md)
+
[LLamaLogLevel](./llama.native.llamaloglevel.md)
[LLamaModelKvOverrideType](./llama.native.llamamodelkvoverridetype.md)
@@ -174,60 +166,94 @@
[LLamaModelQuantizeParams](./llama.native.llamamodelquantizeparams.md)
+[LLamaModelTensorBufferOverride](./llama.native.llamamodeltensorbufferoverride.md)
+
[LLamaNativeBatch](./llama.native.llamanativebatch.md)
+[LLamaPerfContextTimings](./llama.native.llamaperfcontexttimings.md)
+
[LLamaPoolingType](./llama.native.llamapoolingtype.md)
[LLamaPos](./llama.native.llamapos.md)
[LLamaRopeType](./llama.native.llamaropetype.md)
+[LLamaSamplerChainParams](./llama.native.llamasamplerchainparams.md)
+
+[LLamaSamplingTimings](./llama.native.llamasamplingtimings.md)
+
[LLamaSeqId](./llama.native.llamaseqid.md)
[LLamaToken](./llama.native.llamatoken.md)
+[LLamaTokenAttr](./llama.native.llamatokenattr.md)
+
[LLamaTokenData](./llama.native.llamatokendata.md)
[LLamaTokenDataArray](./llama.native.llamatokendataarray.md)
[LLamaTokenDataArrayNative](./llama.native.llamatokendataarraynative.md)
-[LLamaTokenType](./llama.native.llamatokentype.md)
-
[LLamaVocabType](./llama.native.llamavocabtype.md)
[LLavaImageEmbed](./llama.native.llavaimageembed.md)
+[LoraAdapter](./llama.native.loraadapter.md)
+
[NativeApi](./llama.native.nativeapi.md)
[NativeLibraryConfig](./llama.native.nativelibraryconfig.md)
+[NativeLibraryConfigContainer](./llama.native.nativelibraryconfigcontainer.md)
+
+[NativeLibraryFromPath](./llama.native.nativelibraryfrompath.md)
+
+[NativeLibraryMetadata](./llama.native.nativelibrarymetadata.md)
+
+[NativeLibraryName](./llama.native.nativelibraryname.md)
+
+[NativeLibraryWithAvx](./llama.native.nativelibrarywithavx.md)
+
+[NativeLibraryWithCuda](./llama.native.nativelibrarywithcuda.md)
+
+[NativeLibraryWithMacOrFallback](./llama.native.nativelibrarywithmacorfallback.md)
+
+[NativeLibraryWithVulkan](./llama.native.nativelibrarywithvulkan.md)
+
+[NativeLogConfig](./llama.native.nativelogconfig.md)
+
[RopeScalingType](./llama.native.ropescalingtype.md)
[SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-[SafeLLamaGrammarHandle](./llama.native.safellamagrammarhandle.md)
-
[SafeLLamaHandleBase](./llama.native.safellamahandlebase.md)
[SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
+[SafeLLamaSamplerChainHandle](./llama.native.safellamasamplerchainhandle.md)
+
[SafeLlavaImageEmbedHandle](./llama.native.safellavaimageembedhandle.md)
[SafeLlavaModelHandle](./llama.native.safellavamodelhandle.md)
+[SystemInfo](./llama.native.systeminfo.md)
+
+[UnknownNativeLibrary](./llama.native.unknownnativelibrary.md)
+
## LLama.Sampling
[BaseSamplingPipeline](./llama.sampling.basesamplingpipeline.md)
[DefaultSamplingPipeline](./llama.sampling.defaultsamplingpipeline.md)
+[Grammar](./llama.sampling.grammar.md)
+
[GreedySamplingPipeline](./llama.sampling.greedysamplingpipeline.md)
[ISamplingPipeline](./llama.sampling.isamplingpipeline.md)
[ISamplingPipelineExtensions](./llama.sampling.isamplingpipelineextensions.md)
-[Mirostate2SamplingPipeline](./llama.sampling.mirostate2samplingpipeline.md)
+## LLama.Transformers
-[MirostateSamplingPipeline](./llama.sampling.mirostatesamplingpipeline.md)
+[PromptTemplateTransformer](./llama.transformers.prompttemplatetransformer.md)
diff --git a/docs/xmldocs/llama.abstractions.adaptercollection.md b/docs/xmldocs/llama.abstractions.adaptercollection.md
deleted file mode 100644
index 4b49d3a7f..000000000
--- a/docs/xmldocs/llama.abstractions.adaptercollection.md
+++ /dev/null
@@ -1,92 +0,0 @@
-# AdapterCollection
-
-Namespace: LLama.Abstractions
-
-A list of LoraAdapter objects
-
-```csharp
-public sealed class AdapterCollection : System.Collections.Generic.List`1[[LLama.Abstractions.LoraAdapter, LLamaSharp, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null]], System.Collections.Generic.IList`1[[LLama.Abstractions.LoraAdapter, LLamaSharp, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null]], System.Collections.Generic.ICollection`1[[LLama.Abstractions.LoraAdapter, LLamaSharp, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null]], System.Collections.Generic.IEnumerable`1[[LLama.Abstractions.LoraAdapter, LLamaSharp, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null]], System.Collections.IEnumerable, System.Collections.IList, System.Collections.ICollection, System.Collections.Generic.IReadOnlyList`1[[LLama.Abstractions.LoraAdapter, LLamaSharp, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null]], System.Collections.Generic.IReadOnlyCollection`1[[LLama.Abstractions.LoraAdapter, LLamaSharp, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null]], System.IEquatable`1[[LLama.Abstractions.AdapterCollection, LLamaSharp, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null]]
-```
-
-Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [List<LoraAdapter>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.list-1) → [AdapterCollection](./llama.abstractions.adaptercollection.md)
-Implements [IList<LoraAdapter>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ilist-1), [ICollection<LoraAdapter>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.icollection-1), [IEnumerable<LoraAdapter>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ienumerable-1), [IEnumerable](https://docs.microsoft.com/en-us/dotnet/api/system.collections.ienumerable), [IList](https://docs.microsoft.com/en-us/dotnet/api/system.collections.ilist), [ICollection](https://docs.microsoft.com/en-us/dotnet/api/system.collections.icollection), [IReadOnlyList<LoraAdapter>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ireadonlylist-1), [IReadOnlyCollection<LoraAdapter>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ireadonlycollection-1), [IEquatable<AdapterCollection>](https://docs.microsoft.com/en-us/dotnet/api/system.iequatable-1)
-
-## Properties
-
-### **Capacity**
-
-```csharp
-public int Capacity { get; set; }
-```
-
-#### Property Value
-
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-### **Count**
-
-```csharp
-public int Count { get; }
-```
-
-#### Property Value
-
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-### **Item**
-
-```csharp
-public LoraAdapter Item { get; set; }
-```
-
-#### Property Value
-
-[LoraAdapter](./llama.abstractions.loraadapter.md)
-
-## Constructors
-
-### **AdapterCollection()**
-
-```csharp
-public AdapterCollection()
-```
-
-## Methods
-
-### **Equals(AdapterCollection)**
-
-```csharp
-public bool Equals(AdapterCollection other)
-```
-
-#### Parameters
-
-`other` [AdapterCollection](./llama.abstractions.adaptercollection.md)
-
-#### Returns
-
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-### **Equals(Object)**
-
-```csharp
-public bool Equals(object obj)
-```
-
-#### Parameters
-
-`obj` [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object)
-
-#### Returns
-
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-### **GetHashCode()**
-
-```csharp
-public int GetHashCode()
-```
-
-#### Returns
-
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
diff --git a/docs/xmldocs/llama.abstractions.icontextparams.md b/docs/xmldocs/llama.abstractions.icontextparams.md
index 1cfc4794b..5252e4804 100644
--- a/docs/xmldocs/llama.abstractions.icontextparams.md
+++ b/docs/xmldocs/llama.abstractions.icontextparams.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# IContextParams
Namespace: LLama.Abstractions
@@ -8,6 +12,8 @@ The parameters for initializing a LLama context from a model.
public interface IContextParams
```
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute)
+
## Properties
### **ContextSize**
@@ -24,7 +30,7 @@ public abstract Nullable ContextSize { get; }
### **BatchSize**
-batch size for prompt processing (must be >=32 to use BLAS) (n_batch)
+maximum batch size that can be submitted at once (must be >=32 to use BLAS) (n_batch)
```csharp
public abstract uint BatchSize { get; }
@@ -34,25 +40,36 @@ public abstract uint BatchSize { get; }
[UInt32](https://docs.microsoft.com/en-us/dotnet/api/system.uint32)
-### **Seed**
+### **UBatchSize**
+
+Physical batch size
+
+```csharp
+public abstract uint UBatchSize { get; }
+```
+
+#### Property Value
+
+[UInt32](https://docs.microsoft.com/en-us/dotnet/api/system.uint32)
+
+### **SeqMax**
-Seed for the random number generator (seed)
+max number of sequences (i.e. distinct states for recurrent models)
```csharp
-public abstract uint Seed { get; }
+public abstract uint SeqMax { get; }
```
#### Property Value
[UInt32](https://docs.microsoft.com/en-us/dotnet/api/system.uint32)
-### **EmbeddingMode**
+### **Embeddings**
-Whether to use embedding mode. (embedding) Note that if this is set to true,
- The LLamaModel won't produce text response anymore.
+If true, extract embeddings (together with logits).
```csharp
-public abstract bool EmbeddingMode { get; }
+public abstract bool Embeddings { get; }
```
#### Property Value
@@ -100,24 +117,24 @@ public abstract Encoding Encoding { get; }
Number of threads (null = autodetect) (n_threads)
```csharp
-public abstract Nullable Threads { get; }
+public abstract Nullable Threads { get; }
```
#### Property Value
-[Nullable<UInt32>](https://docs.microsoft.com/en-us/dotnet/api/system.nullable-1)
+[Nullable<Int32>](https://docs.microsoft.com/en-us/dotnet/api/system.nullable-1)
### **BatchThreads**
Number of threads to use for batch processing (null = autodetect) (n_threads)
```csharp
-public abstract Nullable BatchThreads { get; }
+public abstract Nullable BatchThreads { get; }
```
#### Property Value
-[Nullable<UInt32>](https://docs.microsoft.com/en-us/dotnet/api/system.nullable-1)
+[Nullable<Int32>](https://docs.microsoft.com/en-us/dotnet/api/system.nullable-1)
### **YarnExtrapolationFactor**
@@ -227,26 +244,55 @@ public abstract bool NoKqvOffload { get; }
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+### **FlashAttention**
+
+Whether to use flash attention
+
+```csharp
+public abstract bool FlashAttention { get; }
+```
+
+#### Property Value
+
+[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+
### **DefragThreshold**
defragment the KV cache if holes/size > defrag_threshold, Set to < 0 to disable (default)
+ defragment the KV cache if holes/size > defrag_threshold, set to <= 0 to disable (default)
```csharp
-public abstract float DefragThreshold { get; }
+public abstract Nullable DefragThreshold { get; }
```
#### Property Value
-[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+[Nullable<Single>](https://docs.microsoft.com/en-us/dotnet/api/system.nullable-1)
-### **DoPooling**
+### **PoolingType**
-Whether to pool (sum) embedding results by sequence id (ignored if no pooling layer)
+How to pool (sum) embedding results by sequence id (ignored if no pooling layer)
```csharp
-public abstract bool DoPooling { get; }
+public abstract LLamaPoolingType PoolingType { get; }
```
#### Property Value
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+[LLamaPoolingType](./llama.native.llamapoolingtype.md)
+
+### **AttentionType**
+
+Attention type to use for embeddings
+
+```csharp
+public abstract LLamaAttentionType AttentionType { get; }
+```
+
+#### Property Value
+
+[LLamaAttentionType](./llama.native.llamaattentiontype.md)
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.abstractions.ihistorytransform.md b/docs/xmldocs/llama.abstractions.ihistorytransform.md
index b76503ac0..4f9475b9f 100644
--- a/docs/xmldocs/llama.abstractions.ihistorytransform.md
+++ b/docs/xmldocs/llama.abstractions.ihistorytransform.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# IHistoryTransform
Namespace: LLama.Abstractions
@@ -8,6 +12,8 @@ Transform history to plain text and vice versa.
public interface IHistoryTransform
```
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute), JsonConverterAttribute
+
## Methods
### **HistoryToText(ChatHistory)**
@@ -59,3 +65,7 @@ IHistoryTransform Clone()
#### Returns
[IHistoryTransform](./llama.abstractions.ihistorytransform.md)
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.abstractions.iinferenceparams.md b/docs/xmldocs/llama.abstractions.iinferenceparams.md
index 6b1bc27f8..2e299317f 100644
--- a/docs/xmldocs/llama.abstractions.iinferenceparams.md
+++ b/docs/xmldocs/llama.abstractions.iinferenceparams.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# IInferenceParams
Namespace: LLama.Abstractions
@@ -8,6 +12,8 @@ The parameters used for inference.
public interface IInferenceParams
```
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute)
+
## Properties
### **TokensKeep**
@@ -35,18 +41,6 @@ public abstract int MaxTokens { get; set; }
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-### **LogitBias**
-
-logit bias for specific tokens
-
-```csharp
-public abstract Dictionary LogitBias { get; set; }
-```
-
-#### Property Value
-
-[Dictionary<LLamaToken, Single>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.dictionary-2)
-
### **AntiPrompts**
Sequences where the model will stop generating further tokens.
@@ -59,198 +53,30 @@ public abstract IReadOnlyList AntiPrompts { get; set; }
[IReadOnlyList<String>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ireadonlylist-1)
-### **TopK**
-
-0 or lower to use vocab size
-
-```csharp
-public abstract int TopK { get; set; }
-```
-
-#### Property Value
-
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-### **TopP**
-
-1.0 = disabled
-
-```csharp
-public abstract float TopP { get; set; }
-```
-
-#### Property Value
-
-[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-### **MinP**
-
-0.0 = disabled
-
-```csharp
-public abstract float MinP { get; set; }
-```
-
-#### Property Value
-
-[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-### **TfsZ**
-
-1.0 = disabled
-
-```csharp
-public abstract float TfsZ { get; set; }
-```
-
-#### Property Value
-
-[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-### **TypicalP**
-
-1.0 = disabled
-
-```csharp
-public abstract float TypicalP { get; set; }
-```
-
-#### Property Value
-
-[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-### **Temperature**
-
-1.0 = disabled
-
-```csharp
-public abstract float Temperature { get; set; }
-```
-
-#### Property Value
-
-[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-### **RepeatPenalty**
-
-1.0 = disabled
-
-```csharp
-public abstract float RepeatPenalty { get; set; }
-```
-
-#### Property Value
-
-[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-### **RepeatLastTokensCount**
-
-last n tokens to penalize (0 = disable penalty, -1 = context size) (repeat_last_n)
-
-```csharp
-public abstract int RepeatLastTokensCount { get; set; }
-```
-
-#### Property Value
-
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-### **FrequencyPenalty**
-
-frequency penalty coefficient
- 0.0 = disabled
-
-```csharp
-public abstract float FrequencyPenalty { get; set; }
-```
-
-#### Property Value
-
-[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-### **PresencePenalty**
-
-presence penalty coefficient
- 0.0 = disabled
-
-```csharp
-public abstract float PresencePenalty { get; set; }
-```
-
-#### Property Value
-
-[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-### **Mirostat**
-
-Mirostat uses tokens instead of words.
- algorithm described in the paper https://arxiv.org/abs/2007.14966.
- 0 = disabled, 1 = mirostat, 2 = mirostat 2.0
-
-```csharp
-public abstract MirostatType Mirostat { get; set; }
-```
-
-#### Property Value
-
-[MirostatType](./llama.common.mirostattype.md)
-
-### **MirostatTau**
-
-target entropy
-
-```csharp
-public abstract float MirostatTau { get; set; }
-```
-
-#### Property Value
-
-[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-### **MirostatEta**
+### **SamplingPipeline**
-learning rate
+Set a custom sampling pipeline to use.
```csharp
-public abstract float MirostatEta { get; set; }
+public abstract ISamplingPipeline SamplingPipeline { get; set; }
```
#### Property Value
-[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+[ISamplingPipeline](./llama.sampling.isamplingpipeline.md)
-### **PenalizeNL**
+### **DecodeSpecialTokens**
-consider newlines as a repeatable token (penalize_nl)
+If true, special characters will be converted to text. If false, they will be invisible.
```csharp
-public abstract bool PenalizeNL { get; set; }
+public abstract bool DecodeSpecialTokens { get; set; }
```
#### Property Value
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-### **Grammar**
-
-Grammar to constrain possible tokens
+---
-```csharp
-public abstract SafeLLamaGrammarHandle Grammar { get; set; }
-```
-
-#### Property Value
-
-[SafeLLamaGrammarHandle](./llama.native.safellamagrammarhandle.md)
-
-### **SamplingPipeline**
-
-Set a custom sampling pipeline to use. If this is set All other sampling parameters are ignored!
-
-```csharp
-public abstract ISamplingPipeline SamplingPipeline { get; set; }
-```
-
-#### Property Value
-
-[ISamplingPipeline](./llama.sampling.isamplingpipeline.md)
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.abstractions.illamaexecutor.md b/docs/xmldocs/llama.abstractions.illamaexecutor.md
index 72eab1bc7..d00a478de 100644
--- a/docs/xmldocs/llama.abstractions.illamaexecutor.md
+++ b/docs/xmldocs/llama.abstractions.illamaexecutor.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# ILLamaExecutor
Namespace: LLama.Abstractions
@@ -8,6 +12,8 @@ A high level interface for LLama models.
public interface ILLamaExecutor
```
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute)
+
## Properties
### **Context**
@@ -36,7 +42,7 @@ public abstract bool IsMultiModal { get; }
### **ClipModel**
-Muti-Modal Projections / Clip Model weights
+Multi-Modal Projections / Clip Model weights
```csharp
public abstract LLavaWeights ClipModel { get; }
@@ -46,17 +52,17 @@ public abstract LLavaWeights ClipModel { get; }
[LLavaWeights](./llama.llavaweights.md)
-### **ImagePaths**
+### **Images**
-List of images: Image filename and path (jpeg images).
+List of images, in byte array format.
```csharp
-public abstract List ImagePaths { get; set; }
+public abstract List Images { get; }
```
#### Property Value
-[List<String>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.list-1)
+[List<Byte[]>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.list-1)
## Methods
@@ -82,3 +88,7 @@ A cancellation token.
#### Returns
[IAsyncEnumerable<String>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.iasyncenumerable-1)
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.abstractions.illamaparams.md b/docs/xmldocs/llama.abstractions.illamaparams.md
index e3d598db1..e4cf78fe8 100644
--- a/docs/xmldocs/llama.abstractions.illamaparams.md
+++ b/docs/xmldocs/llama.abstractions.illamaparams.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# ILLamaParams
Namespace: LLama.Abstractions
@@ -13,3 +17,7 @@ Implements [IModelParams](./llama.abstractions.imodelparams.md), [IContextParams
**Remarks:**
Mostly exists for backwards compatibility reasons, when these two were not split.
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.abstractions.imodelparams.md b/docs/xmldocs/llama.abstractions.imodelparams.md
index f319a49ee..87cd8ff32 100644
--- a/docs/xmldocs/llama.abstractions.imodelparams.md
+++ b/docs/xmldocs/llama.abstractions.imodelparams.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# IModelParams
Namespace: LLama.Abstractions
@@ -8,12 +12,17 @@ The parameters for initializing a LLama model.
public interface IModelParams
```
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute)
+
## Properties
### **MainGpu**
main_gpu interpretation depends on split_mode:
- NoneThe GPU that is used for the entire mode.RowThe GPU that is used for small tensors and intermediate results.LayerIgnored.
+
+- **None** - The GPU that is used for the entire model.
+- **Row** - The GPU that is used for small tensors and intermediate results.
+- **Layer** - Ignored.
```csharp
public abstract int MainGpu { get; set; }
@@ -28,12 +37,25 @@ public abstract int MainGpu { get; set; }
How to split the model across multiple GPUs
```csharp
-public abstract GPUSplitMode SplitMode { get; }
+public abstract Nullable SplitMode { get; }
+```
+
+#### Property Value
+
+[Nullable<GPUSplitMode>](https://docs.microsoft.com/en-us/dotnet/api/system.nullable-1)
+
+### **TensorBufferOverrides**
+
+Buffer type overrides for specific tensor patterns, allowing you to specify hardware devices to use for individual tensors or sets of tensors.
+ Equivalent to --override-tensor or -ot on the llama.cpp command line or tensor_buft_overrides internally.
+
+```csharp
+public abstract List TensorBufferOverrides { get; }
```
#### Property Value
-[GPUSplitMode](./llama.native.gpusplitmode.md)
+[List<TensorBufferOverride>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.list-1)
### **GpuLayerCount**
@@ -107,29 +129,17 @@ public abstract bool VocabOnly { get; }
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-### **LoraAdapters**
+### **CheckTensors**
-List of LoRA adapters to apply
+Validate model tensor data before loading
```csharp
-public abstract AdapterCollection LoraAdapters { get; }
+public abstract bool CheckTensors { get; }
```
#### Property Value
-[AdapterCollection](./llama.abstractions.adaptercollection.md)
-
-### **LoraBase**
-
-base model path for the lora adapter (lora_base)
-
-```csharp
-public abstract string LoraBase { get; }
-```
-
-#### Property Value
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
### **MetadataOverrides**
@@ -142,3 +152,7 @@ public abstract List MetadataOverrides { get; }
#### Property Value
[List<MetadataOverride>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.list-1)
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.abstractions.inativelibrary.md b/docs/xmldocs/llama.abstractions.inativelibrary.md
new file mode 100644
index 000000000..934f494d1
--- /dev/null
+++ b/docs/xmldocs/llama.abstractions.inativelibrary.md
@@ -0,0 +1,57 @@
+[`< Back`](./)
+
+---
+
+# INativeLibrary
+
+Namespace: LLama.Abstractions
+
+Descriptor of a native library.
+
+```csharp
+public interface INativeLibrary
+```
+
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute)
+
+## Properties
+
+### **Metadata**
+
+Metadata of this library.
+
+```csharp
+public abstract NativeLibraryMetadata Metadata { get; }
+```
+
+#### Property Value
+
+[NativeLibraryMetadata](./llama.native.nativelibrarymetadata.md)
+
+## Methods
+
+### **Prepare(SystemInfo, LLamaLogCallback)**
+
+Prepares the native library file and returns its local path.
+ If it's a relative path, LLamaSharp will search for it in the search directories you set.
+
+```csharp
+IEnumerable Prepare(SystemInfo systemInfo, LLamaLogCallback logCallback)
+```
+
+#### Parameters
+
+`systemInfo` [SystemInfo](./llama.native.systeminfo.md)
+The system information of the current machine.
+
+`logCallback` [LLamaLogCallback](./llama.native.nativelogconfig.llamalogcallback.md)
+The log callback.
+
+#### Returns
+
+[IEnumerable<String>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ienumerable-1)
+The relative paths of the library. Multiple paths can be returned so that they are tried one by one. If no file is available, return an empty collection.
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.abstractions.inativelibraryselectingpolicy.md b/docs/xmldocs/llama.abstractions.inativelibraryselectingpolicy.md
new file mode 100644
index 000000000..18649e1d2
--- /dev/null
+++ b/docs/xmldocs/llama.abstractions.inativelibraryselectingpolicy.md
@@ -0,0 +1,44 @@
+[`< Back`](./)
+
+---
+
+# INativeLibrarySelectingPolicy
+
+Namespace: LLama.Abstractions
+
+Decides which native library should be loaded according to the configuration.
+
+```csharp
+public interface INativeLibrarySelectingPolicy
+```
+
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute)
+
+## Methods
+
+### **Apply(Description, SystemInfo, LLamaLogCallback)**
+
+Select the native library.
+
+```csharp
+IEnumerable Apply(Description description, SystemInfo systemInfo, LLamaLogCallback logCallback)
+```
+
+#### Parameters
+
+`description` [Description](./llama.native.nativelibraryconfig.description.md)
+
+`systemInfo` [SystemInfo](./llama.native.systeminfo.md)
+The system information of the current machine.
+
+`logCallback` [LLamaLogCallback](./llama.native.nativelogconfig.llamalogcallback.md)
+The log callback.
+
+#### Returns
+
+[IEnumerable<INativeLibrary>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ienumerable-1)
+The information of the selected native library files, ordered by priority from highest to lowest.
+
+---
+
+[`< Back`](./)
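+
+As a non-authoritative sketch, a custom policy could force CPU-only selection. The log invocation and the exact shape of `Description`/`SystemInfo` are illustrative assumptions, not confirmed API:
+
+```csharp
+// Hypothetical policy that filters the default selection down to CPU builds.
+public class CpuOnlyPolicy : INativeLibrarySelectingPolicy
+{
+    public IEnumerable<INativeLibrary> Apply(Description description, SystemInfo systemInfo, NativeLogConfig.LLamaLogCallback logCallback)
+    {
+        logCallback?.Invoke(LLamaLogLevel.Info, "Selecting CPU-only native libraries.");
+        // Yield candidates in priority order; here only CPU variants
+        // (assumed derivable from the description) would be returned.
+        yield break; // placeholder body
+    }
+}
+```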
diff --git a/docs/xmldocs/llama.abstractions.itextstreamtransform.md b/docs/xmldocs/llama.abstractions.itextstreamtransform.md
index 69e163aa7..223646712 100644
--- a/docs/xmldocs/llama.abstractions.itextstreamtransform.md
+++ b/docs/xmldocs/llama.abstractions.itextstreamtransform.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# ITextStreamTransform
Namespace: LLama.Abstractions
@@ -8,6 +12,8 @@ Takes a stream of tokens and transforms them.
public interface ITextStreamTransform
```
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute), JsonConverterAttribute
+
## Methods
### **TransformAsync(IAsyncEnumerable<String>)**
@@ -37,3 +43,7 @@ ITextStreamTransform Clone()
#### Returns
[ITextStreamTransform](./llama.abstractions.itextstreamtransform.md)
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.abstractions.itexttransform.md b/docs/xmldocs/llama.abstractions.itexttransform.md
index f38c028d5..4f9c2d907 100644
--- a/docs/xmldocs/llama.abstractions.itexttransform.md
+++ b/docs/xmldocs/llama.abstractions.itexttransform.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# ITextTransform
Namespace: LLama.Abstractions
@@ -14,6 +18,8 @@ An interface for text transformations.
public interface ITextTransform
```
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute), JsonConverterAttribute
+
## Methods
### **Transform(String)**
@@ -43,3 +49,7 @@ ITextTransform Clone()
#### Returns
[ITextTransform](./llama.abstractions.itexttransform.md)
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.abstractions.llamaexecutorextensions.md b/docs/xmldocs/llama.abstractions.llamaexecutorextensions.md
new file mode 100644
index 000000000..686b1f746
--- /dev/null
+++ b/docs/xmldocs/llama.abstractions.llamaexecutorextensions.md
@@ -0,0 +1,51 @@
+[`< Back`](./)
+
+---
+
+# LLamaExecutorExtensions
+
+Namespace: LLama.Abstractions
+
+Extension methods for the [ILLamaExecutor](./llama.abstractions.illamaexecutor.md) interface.
+
+```csharp
+public static class LLamaExecutorExtensions
+```
+
+Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [LLamaExecutorExtensions](./llama.abstractions.llamaexecutorextensions.md)
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute), [NullableAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullableattribute), [ExtensionAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.extensionattribute)
+
+## Methods
+
+### **AsChatClient(ILLamaExecutor, IHistoryTransform, ITextStreamTransform)**
+
+Gets an instance for the specified [ILLamaExecutor](./llama.abstractions.illamaexecutor.md).
+
+```csharp
+public static IChatClient AsChatClient(ILLamaExecutor executor, IHistoryTransform historyTransform, ITextStreamTransform outputTransform)
+```
+
+#### Parameters
+
+`executor` [ILLamaExecutor](./llama.abstractions.illamaexecutor.md)
+The executor.
+
+`historyTransform` [IHistoryTransform](./llama.abstractions.ihistorytransform.md)
+The [IHistoryTransform](./llama.abstractions.ihistorytransform.md) to use to transform an input list of messages into a prompt.
+
+`outputTransform` [ITextStreamTransform](./llama.abstractions.itextstreamtransform.md)
+The [ITextStreamTransform](./llama.abstractions.itextstreamtransform.md) to use to transform the output into text.
+
+#### Returns
+
+IChatClient
+An instance for the provided [ILLamaExecutor](./llama.abstractions.illamaexecutor.md).
+
+#### Exceptions
+
+[ArgumentNullException](https://docs.microsoft.com/en-us/dotnet/api/system.argumentnullexception)
+`executor` is null.
+
+---
+
+[`< Back`](./)
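+
+A minimal usage sketch (the executor setup and transform choices are illustrative, not prescriptive):
+
+```csharp
+// Wrap an existing executor as an IChatClient (Microsoft.Extensions.AI).
+var executor = new InteractiveExecutor(context);
+IChatClient client = executor.AsChatClient(
+    new LLamaTransforms.DefaultHistoryTransform(),
+    new LLamaTransforms.KeywordTextOutputStreamTransform(new[] { "User:" }));
+```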
diff --git a/docs/xmldocs/llama.abstractions.loraadapter.md b/docs/xmldocs/llama.abstractions.loraadapter.md
deleted file mode 100644
index 09d487438..000000000
--- a/docs/xmldocs/llama.abstractions.loraadapter.md
+++ /dev/null
@@ -1,118 +0,0 @@
-# LoraAdapter
-
-Namespace: LLama.Abstractions
-
-A LoRA adapter to apply to a model
-
-```csharp
-public struct LoraAdapter
-```
-
-Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ValueType](https://docs.microsoft.com/en-us/dotnet/api/system.valuetype) → [LoraAdapter](./llama.abstractions.loraadapter.md)
-Implements [IEquatable<LoraAdapter>](https://docs.microsoft.com/en-us/dotnet/api/system.iequatable-1)
-
-## Properties
-
-### **Path**
-
-Path to the LoRA file
-
-```csharp
-public string Path { get; set; }
-```
-
-#### Property Value
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **Scale**
-
-Strength of this LoRA
-
-```csharp
-public float Scale { get; set; }
-```
-
-#### Property Value
-
-[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-## Constructors
-
-### **LoraAdapter(String, Single)**
-
-A LoRA adapter to apply to a model
-
-```csharp
-LoraAdapter(string Path, float Scale)
-```
-
-#### Parameters
-
-`Path` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-Path to the LoRA file
-
-`Scale` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-Strength of this LoRA
-
-## Methods
-
-### **ToString()**
-
-```csharp
-string ToString()
-```
-
-#### Returns
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **GetHashCode()**
-
-```csharp
-int GetHashCode()
-```
-
-#### Returns
-
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-### **Equals(Object)**
-
-```csharp
-bool Equals(object obj)
-```
-
-#### Parameters
-
-`obj` [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object)
-
-#### Returns
-
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-### **Equals(LoraAdapter)**
-
-```csharp
-bool Equals(LoraAdapter other)
-```
-
-#### Parameters
-
-`other` [LoraAdapter](./llama.abstractions.loraadapter.md)
-
-#### Returns
-
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-### **Deconstruct(String&, Single&)**
-
-```csharp
-void Deconstruct(String& Path, Single& Scale)
-```
-
-#### Parameters
-
-`Path` [String&](https://docs.microsoft.com/en-us/dotnet/api/system.string&)
-
-`Scale` [Single&](https://docs.microsoft.com/en-us/dotnet/api/system.single&)
diff --git a/docs/xmldocs/llama.abstractions.metadataoverride.md b/docs/xmldocs/llama.abstractions.metadataoverride.md
index 293e61b33..d3edf67df 100644
--- a/docs/xmldocs/llama.abstractions.metadataoverride.md
+++ b/docs/xmldocs/llama.abstractions.metadataoverride.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# MetadataOverride
Namespace: LLama.Abstractions
@@ -9,7 +13,8 @@ public sealed class MetadataOverride : System.IEquatable`1[[LLama.Abstractions.M
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [MetadataOverride](./llama.abstractions.metadataoverride.md)
-Implements [IEquatable<MetadataOverride>](https://docs.microsoft.com/en-us/dotnet/api/system.iequatable-1)
+Implements [IEquatable<MetadataOverride>](https://docs.microsoft.com/en-us/dotnet/api/system.iequatable-1)
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute), [NullableAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullableattribute), JsonConverterAttribute
## Properties
@@ -69,27 +74,21 @@ public MetadataOverride(string key, bool value)
`value` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-## Methods
+### **MetadataOverride(String, String)**
-### **WriteValue(LLamaModelMetadataOverride&)**
+Create a new override for a string key
```csharp
-internal void WriteValue(LLamaModelMetadataOverride& dest)
+public MetadataOverride(string key, string value)
```
#### Parameters
-`dest` [LLamaModelMetadataOverride&](./llama.native.llamamodelmetadataoverride&.md)
-
-### **WriteValue(Utf8JsonWriter)**
-
-```csharp
-internal void WriteValue(Utf8JsonWriter writer)
-```
+`key` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-#### Parameters
+`value` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-`writer` Utf8JsonWriter
+## Methods
### **ToString()**
@@ -148,3 +147,7 @@ public MetadataOverride $()
#### Returns
[MetadataOverride](./llama.abstractions.metadataoverride.md)
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.abstractions.metadataoverrideconverter.md b/docs/xmldocs/llama.abstractions.metadataoverrideconverter.md
index 18afc9d32..9c78883b7 100644
--- a/docs/xmldocs/llama.abstractions.metadataoverrideconverter.md
+++ b/docs/xmldocs/llama.abstractions.metadataoverrideconverter.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# MetadataOverrideConverter
Namespace: LLama.Abstractions
@@ -8,7 +12,8 @@ A JSON converter for [MetadataOverride](./llama.abstractions.metadataoverride.md
public class MetadataOverrideConverter : System.Text.Json.Serialization.JsonConverter`1[[LLama.Abstractions.MetadataOverride, LLamaSharp, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null]]
```
-Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → JsonConverter → JsonConverter<MetadataOverride> → [MetadataOverrideConverter](./llama.abstractions.metadataoverrideconverter.md)
+Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → JsonConverter → JsonConverter<MetadataOverride> → [MetadataOverrideConverter](./llama.abstractions.metadataoverrideconverter.md)
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute), [NullableAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullableattribute)
## Properties
@@ -22,6 +27,16 @@ public bool HandleNull { get; }
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+### **Type**
+
+```csharp
+public Type Type { get; }
+```
+
+#### Property Value
+
+[Type](https://docs.microsoft.com/en-us/dotnet/api/system.type)
+
## Constructors
### **MetadataOverrideConverter()**
@@ -63,3 +78,7 @@ public void Write(Utf8JsonWriter writer, MetadataOverride value, JsonSerializerO
`value` [MetadataOverride](./llama.abstractions.metadataoverride.md)
`options` JsonSerializerOptions
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.abstractions.tensorbufferoverride.md b/docs/xmldocs/llama.abstractions.tensorbufferoverride.md
new file mode 100644
index 000000000..0d8ed0671
--- /dev/null
+++ b/docs/xmldocs/llama.abstractions.tensorbufferoverride.md
@@ -0,0 +1,64 @@
+[`< Back`](./)
+
+---
+
+# TensorBufferOverride
+
+Namespace: LLama.Abstractions
+
+Represents a mapping between a tensor name pattern and a specific buffer type
+
+```csharp
+public class TensorBufferOverride
+```
+
+Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [TensorBufferOverride](./llama.abstractions.tensorbufferoverride.md)
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute), [NullableAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullableattribute)
+
+## Properties
+
+### **Pattern**
+
+Pattern to match tensor names. This is a regular expression. You can check the tensor names via `model.Metadata`.
+
+```csharp
+public string Pattern { get; set; }
+```
+
+#### Property Value
+
+[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+### **BufferType**
+
+Buffer type to use for matching tensors. Examples: CPU, GPU0, GPU1
+
+```csharp
+public string BufferType { get; set; }
+```
+
+#### Property Value
+
+[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+## Constructors
+
+### **TensorBufferOverride(String, String)**
+
+Creates a new tensor buffer override
+
+```csharp
+public TensorBufferOverride(string pattern, string bufferType)
+```
+
+#### Parameters
+
+`pattern` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+Pattern to match tensor names
+
+`bufferType` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+Buffer type to use for matching tensors
+
+---
+
+[`< Back`](./)
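+
+For illustration, assuming the model parameters expose a `TensorBufferOverrides` collection (not documented on this page):
+
+```csharp
+var parameters = new ModelParams("model.gguf");
+// Keep any tensor whose name matches the regex on the CPU buffer.
+parameters.TensorBufferOverrides.Add(new TensorBufferOverride(@"\.ffn_.*_exps\.", "CPU"));
+```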
diff --git a/docs/xmldocs/llama.abstractions.tensorsplitscollection.md b/docs/xmldocs/llama.abstractions.tensorsplitscollection.md
index d5723745a..dff87538f 100644
--- a/docs/xmldocs/llama.abstractions.tensorsplitscollection.md
+++ b/docs/xmldocs/llama.abstractions.tensorsplitscollection.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# TensorSplitsCollection
Namespace: LLama.Abstractions
@@ -5,11 +9,12 @@ Namespace: LLama.Abstractions
A fixed size array to set the tensor splits across multiple GPUs
```csharp
-public sealed class TensorSplitsCollection : System.Collections.Generic.IEnumerable`1[[System.Single, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]], System.Collections.IEnumerable
+public sealed class TensorSplitsCollection : System.Collections.Generic.IEnumerable`1[[System.Single, System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]], System.Collections.IEnumerable
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [TensorSplitsCollection](./llama.abstractions.tensorsplitscollection.md)
-Implements [IEnumerable<Single>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ienumerable-1), [IEnumerable](https://docs.microsoft.com/en-us/dotnet/api/system.collections.ienumerable)
+Implements [IEnumerable<Single>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ienumerable-1), [IEnumerable](https://docs.microsoft.com/en-us/dotnet/api/system.collections.ienumerable)
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute), [NullableAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullableattribute), [DefaultMemberAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.reflection.defaultmemberattribute), JsonConverterAttribute
## Properties
@@ -71,16 +76,6 @@ Set all values to zero
public void Clear()
```
-### **Pin()**
-
-```csharp
-internal MemoryHandle Pin()
-```
-
-#### Returns
-
-[MemoryHandle](https://docs.microsoft.com/en-us/dotnet/api/system.buffers.memoryhandle)
-
### **GetEnumerator()**
```csharp
@@ -90,3 +85,7 @@ public IEnumerator GetEnumerator()
#### Returns
[IEnumerator<Single>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ienumerator-1)
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.abstractions.tensorsplitscollectionconverter.md b/docs/xmldocs/llama.abstractions.tensorsplitscollectionconverter.md
index 3b16aade7..7d3a0dc01 100644
--- a/docs/xmldocs/llama.abstractions.tensorsplitscollectionconverter.md
+++ b/docs/xmldocs/llama.abstractions.tensorsplitscollectionconverter.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# TensorSplitsCollectionConverter
Namespace: LLama.Abstractions
@@ -8,7 +12,8 @@ A JSON converter for [TensorSplitsCollection](./llama.abstractions.tensorsplitsc
public class TensorSplitsCollectionConverter : System.Text.Json.Serialization.JsonConverter`1[[LLama.Abstractions.TensorSplitsCollection, LLamaSharp, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null]]
```
-Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → JsonConverter → JsonConverter<TensorSplitsCollection> → [TensorSplitsCollectionConverter](./llama.abstractions.tensorsplitscollectionconverter.md)
+Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → JsonConverter → JsonConverter<TensorSplitsCollection> → [TensorSplitsCollectionConverter](./llama.abstractions.tensorsplitscollectionconverter.md)
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute), [NullableAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullableattribute)
## Properties
@@ -22,6 +27,16 @@ public bool HandleNull { get; }
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+### **Type**
+
+```csharp
+public Type Type { get; }
+```
+
+#### Property Value
+
+[Type](https://docs.microsoft.com/en-us/dotnet/api/system.type)
+
## Constructors
### **TensorSplitsCollectionConverter()**
@@ -63,3 +78,7 @@ public void Write(Utf8JsonWriter writer, TensorSplitsCollection value, JsonSeria
`value` [TensorSplitsCollection](./llama.abstractions.tensorsplitscollection.md)
`options` JsonSerializerOptions
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.antipromptprocessor.md b/docs/xmldocs/llama.antipromptprocessor.md
index 73edc2779..cab384a83 100644
--- a/docs/xmldocs/llama.antipromptprocessor.md
+++ b/docs/xmldocs/llama.antipromptprocessor.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# AntipromptProcessor
Namespace: LLama
@@ -8,7 +12,8 @@ AntipromptProcessor keeps track of past tokens looking for any set Anti-Prompts
public sealed class AntipromptProcessor
```
-Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [AntipromptProcessor](./llama.antipromptprocessor.md)
+Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [AntipromptProcessor](./llama.antipromptprocessor.md)
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute), [NullableAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullableattribute)
## Constructors
@@ -67,3 +72,7 @@ public bool Add(string text)
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
true if the text buffer ends with any antiprompt
+
+---
+
+[`< Back`](./)
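+
+A short usage sketch (the constructor argument shape is an assumption based on the page above):
+
+```csharp
+var antiprompts = new AntipromptProcessor(new[] { "User:" });
+// Feed each decoded piece of text; stop generating when this returns true.
+bool shouldStop = antiprompts.Add(decodedText);
+```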
diff --git a/docs/xmldocs/llama.batched.alreadypromptedconversationexception.md b/docs/xmldocs/llama.batched.alreadypromptedconversationexception.md
index 227fb0590..7dc933556 100644
--- a/docs/xmldocs/llama.batched.alreadypromptedconversationexception.md
+++ b/docs/xmldocs/llama.batched.alreadypromptedconversationexception.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# AlreadyPromptedConversationException
Namespace: LLama.Batched
@@ -94,3 +98,21 @@ public string StackTrace { get; }
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+## Events
+
+### **SerializeObjectState**
+
+#### Caution
+
+BinaryFormatter serialization is obsolete and should not be used. See https://aka.ms/binaryformatter for more information.
+
+---
+
+```csharp
+protected event EventHandler SerializeObjectState;
+```
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.batched.batchedexecutor.md b/docs/xmldocs/llama.batched.batchedexecutor.md
index 0c3353ac1..2a90b6691 100644
--- a/docs/xmldocs/llama.batched.batchedexecutor.md
+++ b/docs/xmldocs/llama.batched.batchedexecutor.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# BatchedExecutor
Namespace: LLama.Batched
@@ -9,7 +13,8 @@ public sealed class BatchedExecutor : System.IDisposable
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [BatchedExecutor](./llama.batched.batchedexecutor.md)
-Implements [IDisposable](https://docs.microsoft.com/en-us/dotnet/api/system.idisposable)
+Implements [IDisposable](https://docs.microsoft.com/en-us/dotnet/api/system.idisposable)
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute), [NullableAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullableattribute)
## Properties
@@ -49,6 +54,18 @@ public int BatchedTokenCount { get; }
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+### **BatchQueueCount**
+
+Number of batches in the queue, waiting for [BatchedExecutor.Infer(CancellationToken)](./llama.batched.batchedexecutor.md#infercancellationtoken) to be called
+
+```csharp
+public int BatchQueueCount { get; }
+```
+
+#### Property Value
+
+[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+
### **IsDisposed**
Check if this executor has been disposed.
@@ -81,40 +98,59 @@ Parameters to create a new context
## Methods
-### **Prompt(String)**
+### **Create()**
-#### Caution
+Start a new [Conversation](./llama.batched.conversation.md)
-Use BatchedExecutor.Create instead
+```csharp
+public Conversation Create()
+```
----
+#### Returns
-Start a new [Conversation](./llama.batched.conversation.md) with the given prompt
+[Conversation](./llama.batched.conversation.md)
+
+### **Load(String)**
+
+Load a conversation that was previously saved to a file. Once loaded the conversation will
+ need to be prompted.
```csharp
-public Conversation Prompt(string prompt)
+public Conversation Load(string filepath)
```
#### Parameters
-`prompt` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+`filepath` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
#### Returns
[Conversation](./llama.batched.conversation.md)
-### **Create()**
+#### Exceptions
-Start a new [Conversation](./llama.batched.conversation.md)
+[ObjectDisposedException](https://docs.microsoft.com/en-us/dotnet/api/system.objectdisposedexception)
+
+### **Load(State)**
+
+Load a conversation that was previously saved into memory. Once loaded the conversation will need to be prompted.
```csharp
-public Conversation Create()
+public Conversation Load(State state)
```
+#### Parameters
+
+`state` [State](./llama.batched.conversation.state.md)
+
#### Returns
[Conversation](./llama.batched.conversation.md)
+#### Exceptions
+
+[ObjectDisposedException](https://docs.microsoft.com/en-us/dotnet/api/system.objectdisposedexception)
+
### **Infer(CancellationToken)**
Run inference for all conversations in the batch which have pending tokens.
@@ -140,12 +176,6 @@ public Task Infer(CancellationToken cancellation)
public void Dispose()
```
-### **GetNextSequenceId()**
-
-```csharp
-internal LLamaSeqId GetNextSequenceId()
-```
-
-#### Returns
+---
-[LLamaSeqId](./llama.native.llamaseqid.md)
+[`< Back`](./)
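+
+Combining the members above, a minimal sketch of the new workflow (tokenization details elided; `allLogits: false` assumes the parameter name shown in the Conversation docs):
+
+```csharp
+using var executor = new BatchedExecutor(weights, contextParams);
+var conversation = executor.Create();
+conversation.Prompt(executor.Context.Tokenize("Hello"), allLogits: false);
+
+// Run all queued batches before sampling from the conversation.
+while (executor.BatchQueueCount > 0)
+    await executor.Infer();
+```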
diff --git a/docs/xmldocs/llama.batched.cannotmodifywhilerequiresinferenceexception.md b/docs/xmldocs/llama.batched.cannotmodifywhilerequiresinferenceexception.md
index 09e20f8c9..e630a8791 100644
--- a/docs/xmldocs/llama.batched.cannotmodifywhilerequiresinferenceexception.md
+++ b/docs/xmldocs/llama.batched.cannotmodifywhilerequiresinferenceexception.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# CannotModifyWhileRequiresInferenceException
Namespace: LLama.Batched
@@ -92,3 +96,21 @@ public string StackTrace { get; }
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+## Events
+
+### **SerializeObjectState**
+
+#### Caution
+
+BinaryFormatter serialization is obsolete and should not be used. See https://aka.ms/binaryformatter for more information.
+
+---
+
+```csharp
+protected event EventHandler SerializeObjectState;
+```
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.batched.cannotsamplerequiresinferenceexception.md b/docs/xmldocs/llama.batched.cannotsamplerequiresinferenceexception.md
index 4bda45d36..4fe0a87bc 100644
--- a/docs/xmldocs/llama.batched.cannotsamplerequiresinferenceexception.md
+++ b/docs/xmldocs/llama.batched.cannotsamplerequiresinferenceexception.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# CannotSampleRequiresInferenceException
Namespace: LLama.Batched
@@ -94,3 +98,21 @@ public string StackTrace { get; }
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+## Events
+
+### **SerializeObjectState**
+
+#### Caution
+
+BinaryFormatter serialization is obsolete and should not be used. See https://aka.ms/binaryformatter for more information.
+
+---
+
+```csharp
+protected event EventHandler SerializeObjectState;
+```
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.batched.cannotsamplerequirespromptexception.md b/docs/xmldocs/llama.batched.cannotsamplerequirespromptexception.md
index d3a72c7b1..5f1474ad3 100644
--- a/docs/xmldocs/llama.batched.cannotsamplerequirespromptexception.md
+++ b/docs/xmldocs/llama.batched.cannotsamplerequirespromptexception.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# CannotSampleRequiresPromptException
Namespace: LLama.Batched
@@ -94,3 +98,21 @@ public string StackTrace { get; }
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+## Events
+
+### **SerializeObjectState**
+
+#### Caution
+
+BinaryFormatter serialization is obsolete and should not be used. See https://aka.ms/binaryformatter for more information.
+
+---
+
+```csharp
+protected event EventHandler SerializeObjectState;
+```
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.batched.cannotforkwhilerequiresinferenceexception.md b/docs/xmldocs/llama.batched.cannotsavewhilerequiresinferenceexception.md
similarity index 70%
rename from docs/xmldocs/llama.batched.cannotforkwhilerequiresinferenceexception.md
rename to docs/xmldocs/llama.batched.cannotsavewhilerequiresinferenceexception.md
index 752f24107..71ecdb379 100644
--- a/docs/xmldocs/llama.batched.cannotforkwhilerequiresinferenceexception.md
+++ b/docs/xmldocs/llama.batched.cannotsavewhilerequiresinferenceexception.md
@@ -1,14 +1,20 @@
-# CannotForkWhileRequiresInferenceException
+[`< Back`](./)
+
+---
+
+# CannotSaveWhileRequiresInferenceException
Namespace: LLama.Batched
-This exception is thrown when [Conversation.Fork()](./llama.batched.conversation.md#fork) is called when [Conversation.RequiresInference](./llama.batched.conversation.md#requiresinference) = true
+This exception is thrown when "Save()" is called on a [Conversation](./llama.batched.conversation.md) which has
+ already been prompted and before "Infer()" has been called on the
+ [BatchedExecutor](./llama.batched.batchedexecutor.md).
```csharp
-public class CannotForkWhileRequiresInferenceException : ExperimentalBatchedExecutorException, System.Runtime.Serialization.ISerializable
+public class CannotSaveWhileRequiresInferenceException : ExperimentalBatchedExecutorException, System.Runtime.Serialization.ISerializable
```
-Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception) → [ExperimentalBatchedExecutorException](./llama.batched.experimentalbatchedexecutorexception.md) → [CannotForkWhileRequiresInferenceException](./llama.batched.cannotforkwhilerequiresinferenceexception.md)
+Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception) → [ExperimentalBatchedExecutorException](./llama.batched.experimentalbatchedexecutorexception.md) → [CannotSaveWhileRequiresInferenceException](./llama.batched.cannotsavewhilerequiresinferenceexception.md)
Implements [ISerializable](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.serialization.iserializable)
## Properties
@@ -92,3 +98,21 @@ public string StackTrace { get; }
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+## Events
+
+### **SerializeObjectState**
+
+#### Caution
+
+BinaryFormatter serialization is obsolete and should not be used. See https://aka.ms/binaryformatter for more information.
+
+---
+
+```csharp
+protected event EventHandler SerializeObjectState;
+```
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.batched.conversation.md b/docs/xmldocs/llama.batched.conversation.md
index 115a95e42..597cba0f2 100644
--- a/docs/xmldocs/llama.batched.conversation.md
+++ b/docs/xmldocs/llama.batched.conversation.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# Conversation
Namespace: LLama.Batched
@@ -9,7 +13,8 @@ public sealed class Conversation : System.IDisposable
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [Conversation](./llama.batched.conversation.md)
-Implements [IDisposable](https://docs.microsoft.com/en-us/dotnet/api/system.idisposable)
+Implements [IDisposable](https://docs.microsoft.com/en-us/dotnet/api/system.idisposable)
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute), [NullableAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullableattribute)
## Properties
@@ -127,17 +132,31 @@ public Conversation Fork()
The copy shares internal state, so consumes very little extra memory.
-### **Sample()**
+### **GetSampleIndex(Int32)**
-Get the logits from this conversation, ready for sampling
+Get the index in the context from which each token can be sampled; the return value of this function can be used to retrieve logits
+ ([SafeLLamaContextHandle.GetLogitsIth(Int32)](./llama.native.safellamacontexthandle.md#getlogitsithint32)) or to sample a token ([SafeLLamaSamplerChainHandle.Sample(SafeLLamaContextHandle, Int32)](./llama.native.safellamasamplerchainhandle.md#samplesafellamacontexthandle-int32)).
```csharp
-public Span Sample()
+public int GetSampleIndex(int offset)
```
+#### Parameters
+
+`offset` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+How far from the end of the previous prompt logits should be sampled. Any value other than 0 requires
+ `allLogits` to have been set during prompting.
+ For example, if 5 tokens were supplied in the last prompt call, an offset of 0 samples from the last
+ of those tokens and an offset of 4 samples from the first.
+
#### Returns
-[Span<Single>](https://docs.microsoft.com/en-us/dotnet/api/system.span-1)
+[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
#### Exceptions
@@ -149,48 +168,69 @@ Thrown if this conversation was not prompted before the previous call to infer
[CannotSampleRequiresInferenceException](./llama.batched.cannotsamplerequiresinferenceexception.md)
Thrown if Infer() must be called on the executor
-### **Prompt(String)**
+### **Sample(Int32)**
-Add tokens to this conversation
+Get the logits from this conversation, ready for sampling
```csharp
-public void Prompt(string input)
+public Span Sample(int offset)
```
#### Parameters
-`input` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+`offset` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+How far from the end of the previous prompt logits should be sampled. Any value other than 0 requires `allLogits` to have been set during prompting.
+
+#### Returns
+
+[Span<Single>](https://docs.microsoft.com/en-us/dotnet/api/system.span-1)
+
+#### Exceptions
+
+[ObjectDisposedException](https://docs.microsoft.com/en-us/dotnet/api/system.objectdisposedexception)
+
+[CannotSampleRequiresPromptException](./llama.batched.cannotsamplerequirespromptexception.md)
+Thrown if this conversation was not prompted before the previous call to infer
+
+[CannotSampleRequiresInferenceException](./llama.batched.cannotsamplerequiresinferenceexception.md)
+Thrown if Infer() must be called on the executor
-### **Prompt(List<LLamaToken>)**
+### **Prompt(List<LLamaToken>, Boolean)**
Add tokens to this conversation
```csharp
-public void Prompt(List tokens)
+public void Prompt(List tokens, bool allLogits)
```
#### Parameters
`tokens` [List<LLamaToken>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.list-1)
+`allLogits` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+If true, generate logits for all tokens. If false, only generate logits for the last token.
+
#### Exceptions
[ObjectDisposedException](https://docs.microsoft.com/en-us/dotnet/api/system.objectdisposedexception)
[AlreadyPromptedConversationException](./llama.batched.alreadypromptedconversationexception.md)
-### **Prompt(ReadOnlySpan<LLamaToken>)**
+### **Prompt(ReadOnlySpan<LLamaToken>, Boolean)**
Add tokens to this conversation
```csharp
-public void Prompt(ReadOnlySpan tokens)
+public void Prompt(ReadOnlySpan tokens, bool allLogits)
```
#### Parameters
`tokens` [ReadOnlySpan<LLamaToken>](https://docs.microsoft.com/en-us/dotnet/api/system.readonlyspan-1)
+`allLogits` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+If true, generate logits for all tokens. If false, only generate logits for the last token.
+
#### Exceptions
[ObjectDisposedException](https://docs.microsoft.com/en-us/dotnet/api/system.objectdisposedexception)
@@ -215,6 +255,31 @@ public void Prompt(LLamaToken token)
[AlreadyPromptedConversationException](./llama.batched.alreadypromptedconversationexception.md)
+### **Prompt(SafeLlavaImageEmbedHandle)**
+
+Prompt this conversation with an image embedding
+
+```csharp
+public void Prompt(SafeLlavaImageEmbedHandle embedding)
+```
+
+#### Parameters
+
+`embedding` [SafeLlavaImageEmbedHandle](./llama.native.safellavaimageembedhandle.md)
+
+### **Prompt(ReadOnlySpan<Single>)**
+
+Prompt this conversation with embeddings
+
+```csharp
+public void Prompt(ReadOnlySpan embeddings)
+```
+
+#### Parameters
+
+`embeddings` [ReadOnlySpan<Single>](https://docs.microsoft.com/en-us/dotnet/api/system.readonlyspan-1)
+The raw values of the embeddings. The length of this span must be an exact multiple of the embedding size of this model.
+
### **Modify(ModifyKvCache)**
Directly modify the KV cache of this conversation
@@ -231,3 +296,35 @@ public void Modify(ModifyKvCache modifier)
[CannotModifyWhileRequiresInferenceException](./llama.batched.cannotmodifywhilerequiresinferenceexception.md)
Thrown if this method is called while [Conversation.RequiresInference](./llama.batched.conversation.md#requiresinference) == true
+
+### **Save(String)**
+
+Save the complete state of this conversation to a file. If the file already exists it will be overwritten.
+
+```csharp
+public void Save(string filepath)
+```
+
+#### Parameters
+
+`filepath` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+#### Exceptions
+
+[CannotSaveWhileRequiresInferenceException](./llama.batched.cannotsavewhilerequiresinferenceexception.md)
+
+### **Save()**
+
+Save the complete state of this conversation in system memory.
+
+```csharp
+public State Save()
+```
+
+#### Returns
+
+[State](./llama.batched.conversation.state.md)
+
+---
+
+[`< Back`](./)
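
> Taken together, the `Conversation` members changed above (token prompting with `allLogits`, direct logit access via `Sample(offset)`, and state saving) compose roughly as follows. This is an unverified sketch against the batched-executor API described in these docs; the executor construction, token source, and file path are placeholders.

```csharp
// Sketch only: assumes `executor` is a BatchedExecutor built from loaded LLamaWeights,
// and `tokens` is a List<LLamaToken> produced by the model's tokenizer.
using var conversation = executor.Create();

// Prompt with pre-tokenized input; allLogits = false keeps only the last token's logits.
conversation.Prompt(tokens, allLogits: false);

// Run inference for all pending conversations in the batch.
await executor.Infer();

// Logits for the final position are now available for sampling (offset 0).
Span<float> logits = conversation.Sample(offset: 0);

// Persist the conversation state (throws if inference is still pending).
conversation.Save("conversation.state");
```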
diff --git a/docs/xmldocs/llama.batched.conversationextensions.md b/docs/xmldocs/llama.batched.conversationextensions.md
index 30cdfa2bb..c59b6e73d 100644
--- a/docs/xmldocs/llama.batched.conversationextensions.md
+++ b/docs/xmldocs/llama.batched.conversationextensions.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# ConversationExtensions
Namespace: LLama.Batched
@@ -8,10 +12,55 @@ Extension method for [Conversation](./llama.batched.conversation.md)
public static class ConversationExtensions
```
-Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ConversationExtensions](./llama.batched.conversationextensions.md)
+Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ConversationExtensions](./llama.batched.conversationextensions.md)
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute), [NullableAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullableattribute), [ExtensionAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.extensionattribute)
## Methods
+### **Sample(Conversation, SafeLLamaSamplerChainHandle, Int32)**
+
+Sample a token from this conversation using the given sampler chain
+
+```csharp
+public static LLamaToken Sample(Conversation conversation, SafeLLamaSamplerChainHandle sampler, int offset)
+```
+
+#### Parameters
+
+`conversation` [Conversation](./llama.batched.conversation.md)
+[Conversation](./llama.batched.conversation.md) to sample from
+
+`sampler` [SafeLLamaSamplerChainHandle](./llama.native.safellamasamplerchainhandle.md)
+
+`offset` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+Offset from the end of the conversation to the logits to sample; see [Conversation.GetSampleIndex(Int32)](./llama.batched.conversation.md#getsampleindexint32) for more details.
+
+#### Returns
+
+[LLamaToken](./llama.native.llamatoken.md)
+
+### **Sample(Conversation, ISamplingPipeline, Int32)**
+
+Sample a token from this conversation using the given sampling pipeline
+
+```csharp
+public static LLamaToken Sample(Conversation conversation, ISamplingPipeline sampler, int offset)
+```
+
+#### Parameters
+
+`conversation` [Conversation](./llama.batched.conversation.md)
+[Conversation](./llama.batched.conversation.md) to sample from
+
+`sampler` [ISamplingPipeline](./llama.sampling.isamplingpipeline.md)
+
+`offset` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+Offset from the end of the conversation to the logits to sample; see [Conversation.GetSampleIndex(Int32)](./llama.batched.conversation.md#getsampleindexint32) for more details.
+
+#### Returns
+
+[LLamaToken](./llama.native.llamatoken.md)
+
### **Rewind(Conversation, Int32)**
Rewind a [Conversation](./llama.batched.conversation.md) back to an earlier state by removing tokens from the end
@@ -53,3 +102,7 @@ How much to shift tokens over by
`keep` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
The number of tokens at the start which should not be shifted
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.batched.experimentalbatchedexecutorexception.md b/docs/xmldocs/llama.batched.experimentalbatchedexecutorexception.md
index 35270bb09..54d2eed24 100644
--- a/docs/xmldocs/llama.batched.experimentalbatchedexecutorexception.md
+++ b/docs/xmldocs/llama.batched.experimentalbatchedexecutorexception.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# ExperimentalBatchedExecutorException
Namespace: LLama.Batched
@@ -92,3 +96,21 @@ public string StackTrace { get; }
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+## Events
+
+### **SerializeObjectState**
+
+#### Caution
+
+BinaryFormatter serialization is obsolete and should not be used. See https://aka.ms/binaryformatter for more information.
+
+---
+
+```csharp
+protected event EventHandler SerializeObjectState;
+```
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.chatsession-1.md b/docs/xmldocs/llama.chatsession-1.md
deleted file mode 100644
index 1f3cf67e9..000000000
--- a/docs/xmldocs/llama.chatsession-1.md
+++ /dev/null
@@ -1,85 +0,0 @@
-# ChatSession<T>
-
-Namespace: LLama
-
-```csharp
-public class ChatSession
-```
-
-#### Type Parameters
-
-`T`
-
-Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ChatSession<T>](./llama.chatsession-1.md)
-
-## Constructors
-
-### **ChatSession(T)**
-
-```csharp
-public ChatSession(T model)
-```
-
-#### Parameters
-
-`model` T
-
-## Methods
-
-### **Chat(String, String)**
-
-```csharp
-public IEnumerable Chat(string text, string prompt)
-```
-
-#### Parameters
-
-`text` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-`prompt` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-#### Returns
-
-[IEnumerable<String>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ienumerable-1)
-
-### **WithPrompt(String)**
-
-```csharp
-public ChatSession WithPrompt(string prompt)
-```
-
-#### Parameters
-
-`prompt` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-#### Returns
-
-[ChatSession<T>](./llama.chatsession-1.md)
-
-### **WithPromptFile(String)**
-
-```csharp
-public ChatSession WithPromptFile(string promptFilename)
-```
-
-#### Parameters
-
-`promptFilename` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-#### Returns
-
-[ChatSession<T>](./llama.chatsession-1.md)
-
-### **WithAntiprompt(String[])**
-
-```csharp
-public ChatSession WithAntiprompt(String[] antiprompt)
-```
-
-#### Parameters
-
-`antiprompt` [String[]](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-#### Returns
-
-[ChatSession<T>](./llama.chatsession-1.md)
diff --git a/docs/xmldocs/llama.chatsession.md b/docs/xmldocs/llama.chatsession.md
index 99c535ccf..ba0c626fe 100644
--- a/docs/xmldocs/llama.chatsession.md
+++ b/docs/xmldocs/llama.chatsession.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# ChatSession
Namespace: LLama
@@ -8,7 +12,8 @@ The main chat session class.
public class ChatSession
```
-Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ChatSession](./llama.chatsession.md)
+Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ChatSession](./llama.chatsession.md)
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute), [NullableAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullableattribute)
## Fields
@@ -149,12 +154,12 @@ public ChatSession(ILLamaExecutor executor, ChatHistory history)
## Methods
-### **InitializeSessionFromHistoryAsync(ILLamaExecutor, ChatHistory)**
+### **InitializeSessionFromHistoryAsync(ILLamaExecutor, ChatHistory, IHistoryTransform)**
Create a new chat session and preprocess history.
```csharp
-public static Task InitializeSessionFromHistoryAsync(ILLamaExecutor executor, ChatHistory history)
+public static Task InitializeSessionFromHistoryAsync(ILLamaExecutor executor, ChatHistory history, IHistoryTransform transform)
```
#### Parameters
@@ -165,9 +170,13 @@ The executor for this session
`history` [ChatHistory](./llama.common.chathistory.md)
History for this session
+`transform` [IHistoryTransform](./llama.abstractions.ihistorytransform.md)
+History Transform for this session
+
#### Returns
[Task<ChatSession>](https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.task-1)
+A new chat session.
### **WithHistoryTransform(IHistoryTransform)**
@@ -556,3 +565,7 @@ public IAsyncEnumerable RegenerateAssistantMessageAsync(InferenceParams
#### Exceptions
[InvalidOperationException](https://docs.microsoft.com/en-us/dotnet/api/system.invalidoperationexception)
+
+---
+
+[`< Back`](./)
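
> The updated `InitializeSessionFromHistoryAsync` overload documented above now takes the history transform explicitly. A hedged sketch (the executor is any `ILLamaExecutor`; `MyHistoryTransform` is a placeholder for whatever `IHistoryTransform` implementation you use):

```csharp
// Sketch only: `executor` construction is omitted.
var history = new ChatHistory();
history.AddMessage(AuthorRole.System, "You are a helpful assistant.");

ChatSession session = await ChatSession.InitializeSessionFromHistoryAsync(
    executor,                   // any ILLamaExecutor
    history,                    // history to preprocess
    new MyHistoryTransform());  // placeholder IHistoryTransform
```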
diff --git a/docs/xmldocs/llama.common.authorrole.md b/docs/xmldocs/llama.common.authorrole.md
index da1881f43..b2325e581 100644
--- a/docs/xmldocs/llama.common.authorrole.md
+++ b/docs/xmldocs/llama.common.authorrole.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# AuthorRole
Namespace: LLama.Common
@@ -9,7 +13,7 @@ public enum AuthorRole
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ValueType](https://docs.microsoft.com/en-us/dotnet/api/system.valuetype) → [Enum](https://docs.microsoft.com/en-us/dotnet/api/system.enum) → [AuthorRole](./llama.common.authorrole.md)
-Implements [IComparable](https://docs.microsoft.com/en-us/dotnet/api/system.icomparable), [IFormattable](https://docs.microsoft.com/en-us/dotnet/api/system.iformattable), [IConvertible](https://docs.microsoft.com/en-us/dotnet/api/system.iconvertible)
+Implements [IComparable](https://docs.microsoft.com/en-us/dotnet/api/system.icomparable), [ISpanFormattable](https://docs.microsoft.com/en-us/dotnet/api/system.ispanformattable), [IFormattable](https://docs.microsoft.com/en-us/dotnet/api/system.iformattable), [IConvertible](https://docs.microsoft.com/en-us/dotnet/api/system.iconvertible)
## Fields
@@ -19,3 +23,7 @@ Implements [IComparable](https://docs.microsoft.com/en-us/dotnet/api/system.icom
| System | 0 | Message comes from a "system" prompt, not written by a user or language model |
| User | 1 | Message comes from the user |
| Assistant | 2 | Messages was generated by the language model |
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.common.chathistory.md b/docs/xmldocs/llama.common.chathistory.md
index 78b5ecd2b..8535637a7 100644
--- a/docs/xmldocs/llama.common.chathistory.md
+++ b/docs/xmldocs/llama.common.chathistory.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# ChatHistory
Namespace: LLama.Common
@@ -8,7 +12,8 @@ The chat history class
public class ChatHistory
```
-Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ChatHistory](./llama.common.chathistory.md)
+Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ChatHistory](./llama.common.chathistory.md)
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute), [NullableAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullableattribute)
## Properties
@@ -91,3 +96,7 @@ public static ChatHistory FromJson(string json)
#### Returns
[ChatHistory](./llama.common.chathistory.md)
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.common.fixedsizequeue-1.md b/docs/xmldocs/llama.common.fixedsizequeue-1.md
index 1bb79f271..87db982fb 100644
--- a/docs/xmldocs/llama.common.fixedsizequeue-1.md
+++ b/docs/xmldocs/llama.common.fixedsizequeue-1.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# FixedSizeQueue<T>
Namespace: LLama.Common
@@ -14,7 +18,8 @@ public class FixedSizeQueue : , , , System.Collections.IEnumerable
`T`
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [FixedSizeQueue<T>](./llama.common.fixedsizequeue-1.md)
-Implements IReadOnlyList<T>, IReadOnlyCollection<T>, IEnumerable<T>, [IEnumerable](https://docs.microsoft.com/en-us/dotnet/api/system.collections.ienumerable)
+Implements IReadOnlyList<T>, IReadOnlyCollection<T>, IEnumerable<T>, [IEnumerable](https://docs.microsoft.com/en-us/dotnet/api/system.collections.ienumerable)
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute), [NullableAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullableattribute), [DefaultMemberAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.reflection.defaultmemberattribute)
## Properties
@@ -104,3 +109,7 @@ public IEnumerator GetEnumerator()
#### Returns
IEnumerator<T>
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.common.inferenceparams.md b/docs/xmldocs/llama.common.inferenceparams.md
index 2b6e3f122..b07de7ac9 100644
--- a/docs/xmldocs/llama.common.inferenceparams.md
+++ b/docs/xmldocs/llama.common.inferenceparams.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# InferenceParams
Namespace: LLama.Common
@@ -9,46 +13,45 @@ public class InferenceParams : LLama.Abstractions.IInferenceParams, System.IEqua
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [InferenceParams](./llama.common.inferenceparams.md)
-Implements [IInferenceParams](./llama.abstractions.iinferenceparams.md), [IEquatable<InferenceParams>](https://docs.microsoft.com/en-us/dotnet/api/system.iequatable-1)
+Implements [IInferenceParams](./llama.abstractions.iinferenceparams.md), [IEquatable<InferenceParams>](https://docs.microsoft.com/en-us/dotnet/api/system.iequatable-1)
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute), [NullableAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullableattribute)
## Properties
-### **TokensKeep**
-
-number of tokens to keep from initial prompt
+### **EqualityContract**
```csharp
-public int TokensKeep { get; set; }
+protected Type EqualityContract { get; }
```
#### Property Value
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+[Type](https://docs.microsoft.com/en-us/dotnet/api/system.type)
-### **MaxTokens**
+### **TokensKeep**
-how many new tokens to predict (n_predict), set to -1 to inifinitely generate response
- until it complete.
+Number of tokens to keep from the initial prompt when applying context shifting.
```csharp
-public int MaxTokens { get; set; }
+public int TokensKeep { get; set; }
```
#### Property Value
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-### **LogitBias**
+### **MaxTokens**
-logit bias for specific tokens
+how many new tokens to predict (n_predict); set to -1 to generate responses indefinitely until completion.
```csharp
-public Dictionary LogitBias { get; set; }
+public int MaxTokens { get; set; }
```
#### Property Value
-[Dictionary<LLamaToken, Single>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.dictionary-2)
+[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
### **AntiPrompts**
@@ -62,167 +65,37 @@ public IReadOnlyList AntiPrompts { get; set; }
[IReadOnlyList<String>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ireadonlylist-1)
-### **TopK**
-
-```csharp
-public int TopK { get; set; }
-```
-
-#### Property Value
-
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-### **TopP**
-
-```csharp
-public float TopP { get; set; }
-```
-
-#### Property Value
-
-[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-### **MinP**
-
-```csharp
-public float MinP { get; set; }
-```
-
-#### Property Value
-
-[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-### **TfsZ**
-
-```csharp
-public float TfsZ { get; set; }
-```
-
-#### Property Value
-
-[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-### **TypicalP**
-
-```csharp
-public float TypicalP { get; set; }
-```
-
-#### Property Value
-
-[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-### **Temperature**
-
-```csharp
-public float Temperature { get; set; }
-```
-
-#### Property Value
-
-[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-### **RepeatPenalty**
-
-```csharp
-public float RepeatPenalty { get; set; }
-```
-
-#### Property Value
-
-[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-### **RepeatLastTokensCount**
-
-```csharp
-public int RepeatLastTokensCount { get; set; }
-```
-
-#### Property Value
-
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-### **FrequencyPenalty**
-
-```csharp
-public float FrequencyPenalty { get; set; }
-```
-
-#### Property Value
-
-[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-### **PresencePenalty**
-
-```csharp
-public float PresencePenalty { get; set; }
-```
-
-#### Property Value
-
-[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-### **Mirostat**
-
-```csharp
-public MirostatType Mirostat { get; set; }
-```
-
-#### Property Value
-
-[MirostatType](./llama.common.mirostattype.md)
-
-### **MirostatTau**
-
-```csharp
-public float MirostatTau { get; set; }
-```
-
-#### Property Value
-
-[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-### **MirostatEta**
+### **SamplingPipeline**
```csharp
-public float MirostatEta { get; set; }
+public ISamplingPipeline SamplingPipeline { get; set; }
```
#### Property Value
-[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+[ISamplingPipeline](./llama.sampling.isamplingpipeline.md)
-### **PenalizeNL**
+### **DecodeSpecialTokens**
```csharp
-public bool PenalizeNL { get; set; }
+public bool DecodeSpecialTokens { get; set; }
```
#### Property Value
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-### **Grammar**
-
-```csharp
-public SafeLLamaGrammarHandle Grammar { get; set; }
-```
-
-#### Property Value
-
-[SafeLLamaGrammarHandle](./llama.native.safellamagrammarhandle.md)
+## Constructors
-### **SamplingPipeline**
+### **InferenceParams(InferenceParams)**
```csharp
-public ISamplingPipeline SamplingPipeline { get; set; }
+protected InferenceParams(InferenceParams original)
```
-#### Property Value
+#### Parameters
-[ISamplingPipeline](./llama.sampling.isamplingpipeline.md)
-
-## Constructors
+`original` [InferenceParams](./llama.common.inferenceparams.md)
### **InferenceParams()**
@@ -303,3 +176,7 @@ public InferenceParams $()
#### Returns
[InferenceParams](./llama.common.inferenceparams.md)
+
+---
+
+[`< Back`](./)
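
> Since the legacy sampling properties (`Temperature`, `TopK`, `Mirostat`, and so on) were removed above in favour of `SamplingPipeline`, equivalent settings now move into the pipeline object. A sketch assuming the `DefaultSamplingPipeline` from `LLama.Sampling`; the exact property names may differ by version:

```csharp
// Sketch only: values are illustrative.
var inferenceParams = new InferenceParams
{
    MaxTokens = 256,                  // -1 generates until completion
    TokensKeep = 32,                  // tokens kept when context shifting
    AntiPrompts = new[] { "User:" },
    SamplingPipeline = new DefaultSamplingPipeline
    {
        Temperature = 0.7f,           // formerly InferenceParams.Temperature
        TopK = 40,                    // formerly InferenceParams.TopK
    },
};
```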
diff --git a/docs/xmldocs/llama.common.mirostattype.md b/docs/xmldocs/llama.common.mirostattype.md
index 6d54c1814..36e98fa07 100644
--- a/docs/xmldocs/llama.common.mirostattype.md
+++ b/docs/xmldocs/llama.common.mirostattype.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# MirostatType
Namespace: LLama.Common
@@ -10,7 +14,7 @@ public enum MirostatType
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ValueType](https://docs.microsoft.com/en-us/dotnet/api/system.valuetype) → [Enum](https://docs.microsoft.com/en-us/dotnet/api/system.enum) → [MirostatType](./llama.common.mirostattype.md)
-Implements [IComparable](https://docs.microsoft.com/en-us/dotnet/api/system.icomparable), [IFormattable](https://docs.microsoft.com/en-us/dotnet/api/system.iformattable), [IConvertible](https://docs.microsoft.com/en-us/dotnet/api/system.iconvertible)
+Implements [IComparable](https://docs.microsoft.com/en-us/dotnet/api/system.icomparable), [ISpanFormattable](https://docs.microsoft.com/en-us/dotnet/api/system.ispanformattable), [IFormattable](https://docs.microsoft.com/en-us/dotnet/api/system.iformattable), [IConvertible](https://docs.microsoft.com/en-us/dotnet/api/system.iconvertible)
## Fields
@@ -19,3 +23,7 @@ Implements [IComparable](https://docs.microsoft.com/en-us/dotnet/api/system.icom
| Disable | 0 | Disable Mirostat sampling |
| Mirostat | 1 | Original mirostat algorithm |
| Mirostat2 | 2 | Mirostat 2.0 algorithm |
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.common.modelparams.md b/docs/xmldocs/llama.common.modelparams.md
index a9af0a858..737bc5552 100644
--- a/docs/xmldocs/llama.common.modelparams.md
+++ b/docs/xmldocs/llama.common.modelparams.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# ModelParams
Namespace: LLama.Common
@@ -9,10 +13,21 @@ public class ModelParams : LLama.Abstractions.ILLamaParams, LLama.Abstractions.I
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ModelParams](./llama.common.modelparams.md)
-Implements [ILLamaParams](./llama.abstractions.illamaparams.md), [IModelParams](./llama.abstractions.imodelparams.md), [IContextParams](./llama.abstractions.icontextparams.md), [IEquatable<ModelParams>](https://docs.microsoft.com/en-us/dotnet/api/system.iequatable-1)
+Implements [ILLamaParams](./llama.abstractions.illamaparams.md), [IModelParams](./llama.abstractions.imodelparams.md), [IContextParams](./llama.abstractions.icontextparams.md), [IEquatable<ModelParams>](https://docs.microsoft.com/en-us/dotnet/api/system.iequatable-1)
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute), [NullableAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullableattribute)
## Properties
+### **EqualityContract**
+
+```csharp
+protected Type EqualityContract { get; }
+```
+
+#### Property Value
+
+[Type](https://docs.microsoft.com/en-us/dotnet/api/system.type)
+
### **ContextSize**
```csharp
@@ -36,12 +51,22 @@ public int MainGpu { get; set; }
### **SplitMode**
```csharp
-public GPUSplitMode SplitMode { get; set; }
+public Nullable SplitMode { get; set; }
```
#### Property Value
-[GPUSplitMode](./llama.native.gpusplitmode.md)
+[Nullable<GPUSplitMode>](https://docs.microsoft.com/en-us/dotnet/api/system.nullable-1)
+
+### **TensorBufferOverrides**
+
+```csharp
+public List TensorBufferOverrides { get; set; }
+```
+
+#### Property Value
+
+[List<TensorBufferOverride>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.list-1)
### **GpuLayerCount**
@@ -53,10 +78,10 @@ public int GpuLayerCount { get; set; }
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-### **Seed**
+### **SeqMax**
```csharp
-public uint Seed { get; set; }
+public uint SeqMax { get; set; }
```
#### Property Value
@@ -93,75 +118,75 @@ public string ModelPath { get; set; }
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-### **LoraAdapters**
+### **Threads**
```csharp
-public AdapterCollection LoraAdapters { get; set; }
+public Nullable Threads { get; set; }
```
#### Property Value
-[AdapterCollection](./llama.abstractions.adaptercollection.md)
+[Nullable<Int32>](https://docs.microsoft.com/en-us/dotnet/api/system.nullable-1)
-### **LoraBase**
+### **BatchThreads**
```csharp
-public string LoraBase { get; set; }
+public Nullable BatchThreads { get; set; }
```
#### Property Value
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+[Nullable<Int32>](https://docs.microsoft.com/en-us/dotnet/api/system.nullable-1)
-### **Threads**
+### **BatchSize**
```csharp
-public Nullable Threads { get; set; }
+public uint BatchSize { get; set; }
```
#### Property Value
-[Nullable<UInt32>](https://docs.microsoft.com/en-us/dotnet/api/system.nullable-1)
+[UInt32](https://docs.microsoft.com/en-us/dotnet/api/system.uint32)
-### **BatchThreads**
+### **UBatchSize**
```csharp
-public Nullable BatchThreads { get; set; }
+public uint UBatchSize { get; set; }
```
#### Property Value
-[Nullable<UInt32>](https://docs.microsoft.com/en-us/dotnet/api/system.nullable-1)
+[UInt32](https://docs.microsoft.com/en-us/dotnet/api/system.uint32)
-### **BatchSize**
+### **Embeddings**
```csharp
-public uint BatchSize { get; set; }
+public bool Embeddings { get; set; }
```
#### Property Value
-[UInt32](https://docs.microsoft.com/en-us/dotnet/api/system.uint32)
+[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-### **EmbeddingMode**
+### **TensorSplits**
```csharp
-public bool EmbeddingMode { get; set; }
+public TensorSplitsCollection TensorSplits { get; set; }
```
#### Property Value
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+[TensorSplitsCollection](./llama.abstractions.tensorsplitscollection.md)
-### **TensorSplits**
+### **CheckTensors**
```csharp
-public TensorSplitsCollection TensorSplits { get; set; }
+public bool CheckTensors { get; }
```
#### Property Value
-[TensorSplitsCollection](./llama.abstractions.tensorsplitscollection.md)
+[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
### **MetadataOverrides**
@@ -283,25 +308,45 @@ public bool NoKqvOffload { get; set; }
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+### **FlashAttention**
+
+```csharp
+public bool FlashAttention { get; set; }
+```
+
+#### Property Value
+
+[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+
### **DefragThreshold**
```csharp
-public float DefragThreshold { get; set; }
+public Nullable DefragThreshold { get; set; }
```
#### Property Value
-[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+[Nullable<Single>](https://docs.microsoft.com/en-us/dotnet/api/system.nullable-1)
-### **DoPooling**
+### **PoolingType**
```csharp
-public bool DoPooling { get; set; }
+public LLamaPoolingType PoolingType { get; set; }
```
#### Property Value
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+[LLamaPoolingType](./llama.native.llamapoolingtype.md)
+
+### **AttentionType**
+
+```csharp
+public LLamaAttentionType AttentionType { get; set; }
+```
+
+#### Property Value
+
+[LLamaAttentionType](./llama.native.llamaattentiontype.md)
### **VocabOnly**
@@ -338,6 +383,16 @@ public ModelParams(string modelPath)
`modelPath` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
The model path.
+### **ModelParams(ModelParams)**
+
+```csharp
+protected ModelParams(ModelParams original)
+```
+
+#### Parameters
+
+`original` [ModelParams](./llama.common.modelparams.md)
+
## Methods
### **ToString()**
@@ -411,3 +466,7 @@ public ModelParams $()
#### Returns
[ModelParams](./llama.common.modelparams.md)
+
+---
+
+[`< Back`](./)
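
> The `ModelParams` diff above replaces several members (`EmbeddingMode` with `Embeddings`, the `DoPooling` flag with a `PoolingType` enum, and `Threads`/`BatchThreads` become `Nullable<Int32>`). A minimal sketch of constructing the updated record; the model path, values, and the `LLamaPoolingType.Unspecified` member name are illustrative assumptions:

```csharp
// Sketch only: path and values are placeholders.
var modelParams = new ModelParams("path/to/model.gguf")
{
    ContextSize = 4096,
    GpuLayerCount = 20,
    Embeddings = false,                         // replaces the old EmbeddingMode flag
    PoolingType = LLamaPoolingType.Unspecified, // replaces the old DoPooling bool
    Threads = 8,                                // now Nullable<Int32>
};
```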
diff --git a/docs/xmldocs/llama.exceptions.grammarformatexception.md b/docs/xmldocs/llama.exceptions.getlogitsinvalidindexexception.md
similarity index 58%
rename from docs/xmldocs/llama.exceptions.grammarformatexception.md
rename to docs/xmldocs/llama.exceptions.getlogitsinvalidindexexception.md
index 74a9d80c8..6ec2c9cd7 100644
--- a/docs/xmldocs/llama.exceptions.grammarformatexception.md
+++ b/docs/xmldocs/llama.exceptions.getlogitsinvalidindexexception.md
@@ -1,18 +1,34 @@
-# GrammarFormatException
+[`< Back`](./)
+
+---
+
+# GetLogitsInvalidIndexException
Namespace: LLama.Exceptions
-Base class for all grammar exceptions
+`llama_get_logits_ith` returned null, indicating that the index was invalid
```csharp
-public abstract class GrammarFormatException : System.Exception, System.Runtime.Serialization.ISerializable
+public class GetLogitsInvalidIndexException : RuntimeError, System.Runtime.Serialization.ISerializable
```
-Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception) → [GrammarFormatException](./llama.exceptions.grammarformatexception.md)
+Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception) → [RuntimeError](./llama.exceptions.runtimeerror.md) → [GetLogitsInvalidIndexException](./llama.exceptions.getlogitsinvalidindexexception.md)
Implements [ISerializable](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.serialization.iserializable)
## Properties
+### **Index**
+
+The incorrect index passed to the `llama_get_logits_ith` call
+
+```csharp
+public int Index { get; }
+```
+
+#### Property Value
+
+[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+
### **TargetSite**
```csharp
@@ -92,3 +108,33 @@ public string StackTrace { get; }
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+## Constructors
+
+### **GetLogitsInvalidIndexException(Int32)**
+
+```csharp
+public GetLogitsInvalidIndexException(int index)
+```
+
+#### Parameters
+
+`index` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+
+## Events
+
+### **SerializeObjectState**
+
+#### Caution
+
+BinaryFormatter serialization is obsolete and should not be used. See https://aka.ms/binaryformatter for more information.
+
+---
+
+```csharp
+protected event EventHandler SerializeObjectState;
+```
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.exceptions.grammarexpectedname.md b/docs/xmldocs/llama.exceptions.grammarexpectedname.md
deleted file mode 100644
index 8ad5fd212..000000000
--- a/docs/xmldocs/llama.exceptions.grammarexpectedname.md
+++ /dev/null
@@ -1,94 +0,0 @@
-# GrammarExpectedName
-
-Namespace: LLama.Exceptions
-
-Failed to parse a "name" element when one was expected
-
-```csharp
-public class GrammarExpectedName : GrammarFormatException, System.Runtime.Serialization.ISerializable
-```
-
-Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception) → [GrammarFormatException](./llama.exceptions.grammarformatexception.md) → [GrammarExpectedName](./llama.exceptions.grammarexpectedname.md)
-Implements [ISerializable](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.serialization.iserializable)
-
-## Properties
-
-### **TargetSite**
-
-```csharp
-public MethodBase TargetSite { get; }
-```
-
-#### Property Value
-
-[MethodBase](https://docs.microsoft.com/en-us/dotnet/api/system.reflection.methodbase)
-
-### **Message**
-
-```csharp
-public string Message { get; }
-```
-
-#### Property Value
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **Data**
-
-```csharp
-public IDictionary Data { get; }
-```
-
-#### Property Value
-
-[IDictionary](https://docs.microsoft.com/en-us/dotnet/api/system.collections.idictionary)
-
-### **InnerException**
-
-```csharp
-public Exception InnerException { get; }
-```
-
-#### Property Value
-
-[Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception)
-
-### **HelpLink**
-
-```csharp
-public string HelpLink { get; set; }
-```
-
-#### Property Value
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **Source**
-
-```csharp
-public string Source { get; set; }
-```
-
-#### Property Value
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **HResult**
-
-```csharp
-public int HResult { get; set; }
-```
-
-#### Property Value
-
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-### **StackTrace**
-
-```csharp
-public string StackTrace { get; }
-```
-
-#### Property Value
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
diff --git a/docs/xmldocs/llama.exceptions.grammarexpectednext.md b/docs/xmldocs/llama.exceptions.grammarexpectednext.md
deleted file mode 100644
index bdf2df13d..000000000
--- a/docs/xmldocs/llama.exceptions.grammarexpectednext.md
+++ /dev/null
@@ -1,94 +0,0 @@
-# GrammarExpectedNext
-
-Namespace: LLama.Exceptions
-
-A specified string was expected when parsing
-
-```csharp
-public class GrammarExpectedNext : GrammarFormatException, System.Runtime.Serialization.ISerializable
-```
-
-Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception) → [GrammarFormatException](./llama.exceptions.grammarformatexception.md) → [GrammarExpectedNext](./llama.exceptions.grammarexpectednext.md)
-Implements [ISerializable](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.serialization.iserializable)
-
-## Properties
-
-### **TargetSite**
-
-```csharp
-public MethodBase TargetSite { get; }
-```
-
-#### Property Value
-
-[MethodBase](https://docs.microsoft.com/en-us/dotnet/api/system.reflection.methodbase)
-
-### **Message**
-
-```csharp
-public string Message { get; }
-```
-
-#### Property Value
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **Data**
-
-```csharp
-public IDictionary Data { get; }
-```
-
-#### Property Value
-
-[IDictionary](https://docs.microsoft.com/en-us/dotnet/api/system.collections.idictionary)
-
-### **InnerException**
-
-```csharp
-public Exception InnerException { get; }
-```
-
-#### Property Value
-
-[Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception)
-
-### **HelpLink**
-
-```csharp
-public string HelpLink { get; set; }
-```
-
-#### Property Value
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **Source**
-
-```csharp
-public string Source { get; set; }
-```
-
-#### Property Value
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **HResult**
-
-```csharp
-public int HResult { get; set; }
-```
-
-#### Property Value
-
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-### **StackTrace**
-
-```csharp
-public string StackTrace { get; }
-```
-
-#### Property Value
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
diff --git a/docs/xmldocs/llama.exceptions.grammarunexpectedcharaltelement.md b/docs/xmldocs/llama.exceptions.grammarunexpectedcharaltelement.md
deleted file mode 100644
index ddaf1a51f..000000000
--- a/docs/xmldocs/llama.exceptions.grammarunexpectedcharaltelement.md
+++ /dev/null
@@ -1,94 +0,0 @@
-# GrammarUnexpectedCharAltElement
-
-Namespace: LLama.Exceptions
-
-A CHAR_ALT was created without a preceding CHAR element
-
-```csharp
-public class GrammarUnexpectedCharAltElement : GrammarFormatException, System.Runtime.Serialization.ISerializable
-```
-
-Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception) → [GrammarFormatException](./llama.exceptions.grammarformatexception.md) → [GrammarUnexpectedCharAltElement](./llama.exceptions.grammarunexpectedcharaltelement.md)
-Implements [ISerializable](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.serialization.iserializable)
-
-## Properties
-
-### **TargetSite**
-
-```csharp
-public MethodBase TargetSite { get; }
-```
-
-#### Property Value
-
-[MethodBase](https://docs.microsoft.com/en-us/dotnet/api/system.reflection.methodbase)
-
-### **Message**
-
-```csharp
-public string Message { get; }
-```
-
-#### Property Value
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **Data**
-
-```csharp
-public IDictionary Data { get; }
-```
-
-#### Property Value
-
-[IDictionary](https://docs.microsoft.com/en-us/dotnet/api/system.collections.idictionary)
-
-### **InnerException**
-
-```csharp
-public Exception InnerException { get; }
-```
-
-#### Property Value
-
-[Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception)
-
-### **HelpLink**
-
-```csharp
-public string HelpLink { get; set; }
-```
-
-#### Property Value
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **Source**
-
-```csharp
-public string Source { get; set; }
-```
-
-#### Property Value
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **HResult**
-
-```csharp
-public int HResult { get; set; }
-```
-
-#### Property Value
-
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-### **StackTrace**
-
-```csharp
-public string StackTrace { get; }
-```
-
-#### Property Value
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
diff --git a/docs/xmldocs/llama.exceptions.grammarunexpectedcharrngelement.md b/docs/xmldocs/llama.exceptions.grammarunexpectedcharrngelement.md
deleted file mode 100644
index 882ba31e7..000000000
--- a/docs/xmldocs/llama.exceptions.grammarunexpectedcharrngelement.md
+++ /dev/null
@@ -1,94 +0,0 @@
-# GrammarUnexpectedCharRngElement
-
-Namespace: LLama.Exceptions
-
-A CHAR_RNG was created without a preceding CHAR element
-
-```csharp
-public class GrammarUnexpectedCharRngElement : GrammarFormatException, System.Runtime.Serialization.ISerializable
-```
-
-Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception) → [GrammarFormatException](./llama.exceptions.grammarformatexception.md) → [GrammarUnexpectedCharRngElement](./llama.exceptions.grammarunexpectedcharrngelement.md)
-Implements [ISerializable](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.serialization.iserializable)
-
-## Properties
-
-### **TargetSite**
-
-```csharp
-public MethodBase TargetSite { get; }
-```
-
-#### Property Value
-
-[MethodBase](https://docs.microsoft.com/en-us/dotnet/api/system.reflection.methodbase)
-
-### **Message**
-
-```csharp
-public string Message { get; }
-```
-
-#### Property Value
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **Data**
-
-```csharp
-public IDictionary Data { get; }
-```
-
-#### Property Value
-
-[IDictionary](https://docs.microsoft.com/en-us/dotnet/api/system.collections.idictionary)
-
-### **InnerException**
-
-```csharp
-public Exception InnerException { get; }
-```
-
-#### Property Value
-
-[Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception)
-
-### **HelpLink**
-
-```csharp
-public string HelpLink { get; set; }
-```
-
-#### Property Value
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **Source**
-
-```csharp
-public string Source { get; set; }
-```
-
-#### Property Value
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **HResult**
-
-```csharp
-public int HResult { get; set; }
-```
-
-#### Property Value
-
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-### **StackTrace**
-
-```csharp
-public string StackTrace { get; }
-```
-
-#### Property Value
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
diff --git a/docs/xmldocs/llama.exceptions.grammarunexpectedendelement.md b/docs/xmldocs/llama.exceptions.grammarunexpectedendelement.md
deleted file mode 100644
index af98be6cf..000000000
--- a/docs/xmldocs/llama.exceptions.grammarunexpectedendelement.md
+++ /dev/null
@@ -1,94 +0,0 @@
-# GrammarUnexpectedEndElement
-
-Namespace: LLama.Exceptions
-
-An END was encountered before the last element
-
-```csharp
-public class GrammarUnexpectedEndElement : GrammarFormatException, System.Runtime.Serialization.ISerializable
-```
-
-Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception) → [GrammarFormatException](./llama.exceptions.grammarformatexception.md) → [GrammarUnexpectedEndElement](./llama.exceptions.grammarunexpectedendelement.md)
-Implements [ISerializable](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.serialization.iserializable)
-
-## Properties
-
-### **TargetSite**
-
-```csharp
-public MethodBase TargetSite { get; }
-```
-
-#### Property Value
-
-[MethodBase](https://docs.microsoft.com/en-us/dotnet/api/system.reflection.methodbase)
-
-### **Message**
-
-```csharp
-public string Message { get; }
-```
-
-#### Property Value
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **Data**
-
-```csharp
-public IDictionary Data { get; }
-```
-
-#### Property Value
-
-[IDictionary](https://docs.microsoft.com/en-us/dotnet/api/system.collections.idictionary)
-
-### **InnerException**
-
-```csharp
-public Exception InnerException { get; }
-```
-
-#### Property Value
-
-[Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception)
-
-### **HelpLink**
-
-```csharp
-public string HelpLink { get; set; }
-```
-
-#### Property Value
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **Source**
-
-```csharp
-public string Source { get; set; }
-```
-
-#### Property Value
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **HResult**
-
-```csharp
-public int HResult { get; set; }
-```
-
-#### Property Value
-
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-### **StackTrace**
-
-```csharp
-public string StackTrace { get; }
-```
-
-#### Property Value
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
diff --git a/docs/xmldocs/llama.exceptions.grammarunexpectedendofinput.md b/docs/xmldocs/llama.exceptions.grammarunexpectedendofinput.md
deleted file mode 100644
index 1d1f11331..000000000
--- a/docs/xmldocs/llama.exceptions.grammarunexpectedendofinput.md
+++ /dev/null
@@ -1,94 +0,0 @@
-# GrammarUnexpectedEndOfInput
-
-Namespace: LLama.Exceptions
-
-End-of-file was encountered while parsing
-
-```csharp
-public class GrammarUnexpectedEndOfInput : GrammarFormatException, System.Runtime.Serialization.ISerializable
-```
-
-Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception) → [GrammarFormatException](./llama.exceptions.grammarformatexception.md) → [GrammarUnexpectedEndOfInput](./llama.exceptions.grammarunexpectedendofinput.md)
-Implements [ISerializable](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.serialization.iserializable)
-
-## Properties
-
-### **TargetSite**
-
-```csharp
-public MethodBase TargetSite { get; }
-```
-
-#### Property Value
-
-[MethodBase](https://docs.microsoft.com/en-us/dotnet/api/system.reflection.methodbase)
-
-### **Message**
-
-```csharp
-public string Message { get; }
-```
-
-#### Property Value
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **Data**
-
-```csharp
-public IDictionary Data { get; }
-```
-
-#### Property Value
-
-[IDictionary](https://docs.microsoft.com/en-us/dotnet/api/system.collections.idictionary)
-
-### **InnerException**
-
-```csharp
-public Exception InnerException { get; }
-```
-
-#### Property Value
-
-[Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception)
-
-### **HelpLink**
-
-```csharp
-public string HelpLink { get; set; }
-```
-
-#### Property Value
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **Source**
-
-```csharp
-public string Source { get; set; }
-```
-
-#### Property Value
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **HResult**
-
-```csharp
-public int HResult { get; set; }
-```
-
-#### Property Value
-
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-### **StackTrace**
-
-```csharp
-public string StackTrace { get; }
-```
-
-#### Property Value
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
diff --git a/docs/xmldocs/llama.exceptions.grammarunexpectedhexcharscount.md b/docs/xmldocs/llama.exceptions.grammarunexpectedhexcharscount.md
deleted file mode 100644
index f699939f0..000000000
--- a/docs/xmldocs/llama.exceptions.grammarunexpectedhexcharscount.md
+++ /dev/null
@@ -1,94 +0,0 @@
-# GrammarUnexpectedHexCharsCount
-
-Namespace: LLama.Exceptions
-
-An incorrect number of characters were encountered while parsing a hex literal
-
-```csharp
-public class GrammarUnexpectedHexCharsCount : GrammarFormatException, System.Runtime.Serialization.ISerializable
-```
-
-Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception) → [GrammarFormatException](./llama.exceptions.grammarformatexception.md) → [GrammarUnexpectedHexCharsCount](./llama.exceptions.grammarunexpectedhexcharscount.md)
-Implements [ISerializable](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.serialization.iserializable)
-
-## Properties
-
-### **TargetSite**
-
-```csharp
-public MethodBase TargetSite { get; }
-```
-
-#### Property Value
-
-[MethodBase](https://docs.microsoft.com/en-us/dotnet/api/system.reflection.methodbase)
-
-### **Message**
-
-```csharp
-public string Message { get; }
-```
-
-#### Property Value
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **Data**
-
-```csharp
-public IDictionary Data { get; }
-```
-
-#### Property Value
-
-[IDictionary](https://docs.microsoft.com/en-us/dotnet/api/system.collections.idictionary)
-
-### **InnerException**
-
-```csharp
-public Exception InnerException { get; }
-```
-
-#### Property Value
-
-[Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception)
-
-### **HelpLink**
-
-```csharp
-public string HelpLink { get; set; }
-```
-
-#### Property Value
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **Source**
-
-```csharp
-public string Source { get; set; }
-```
-
-#### Property Value
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **HResult**
-
-```csharp
-public int HResult { get; set; }
-```
-
-#### Property Value
-
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-### **StackTrace**
-
-```csharp
-public string StackTrace { get; }
-```
-
-#### Property Value
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
diff --git a/docs/xmldocs/llama.exceptions.llamadecodeerror.md b/docs/xmldocs/llama.exceptions.llamadecodeerror.md
index 12601c23c..bfda8b8d2 100644
--- a/docs/xmldocs/llama.exceptions.llamadecodeerror.md
+++ b/docs/xmldocs/llama.exceptions.llamadecodeerror.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# LLamaDecodeError
Namespace: LLama.Exceptions
@@ -116,3 +120,21 @@ public LLamaDecodeError(DecodeResult returnCode)
#### Parameters
`returnCode` [DecodeResult](./llama.native.decoderesult.md)
+
+## Events
+
+### **SerializeObjectState**
+
+#### Caution
+
+BinaryFormatter serialization is obsolete and should not be used. See https://aka.ms/binaryformatter for more information.
+
+---
+
+```csharp
+protected event EventHandler SerializeObjectState;
+```
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.exceptions.loadweightsfailedexception.md b/docs/xmldocs/llama.exceptions.loadweightsfailedexception.md
index e3ea6a5c9..a0caa41c1 100644
--- a/docs/xmldocs/llama.exceptions.loadweightsfailedexception.md
+++ b/docs/xmldocs/llama.exceptions.loadweightsfailedexception.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# LoadWeightsFailedException
Namespace: LLama.Exceptions
@@ -9,7 +13,8 @@ public class LoadWeightsFailedException : RuntimeError, System.Runtime.Serializa
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception) → [RuntimeError](./llama.exceptions.runtimeerror.md) → [LoadWeightsFailedException](./llama.exceptions.loadweightsfailedexception.md)
-Implements [ISerializable](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.serialization.iserializable)
+Implements [ISerializable](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.serialization.iserializable)
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute), [NullableAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullableattribute)
## Properties
@@ -116,3 +121,21 @@ public LoadWeightsFailedException(string modelPath)
#### Parameters
`modelPath` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+## Events
+
+### **SerializeObjectState**
+
+#### Caution
+
+BinaryFormatter serialization is obsolete and should not be used. See https://aka.ms/binaryformatter for more information.
+
+---
+
+```csharp
+protected event EventHandler SerializeObjectState;
+```
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.exceptions.grammarexpectedprevious.md b/docs/xmldocs/llama.exceptions.missingtemplateexception.md
similarity index 63%
rename from docs/xmldocs/llama.exceptions.grammarexpectedprevious.md
rename to docs/xmldocs/llama.exceptions.missingtemplateexception.md
index 890e6bdc5..53037142c 100644
--- a/docs/xmldocs/llama.exceptions.grammarexpectedprevious.md
+++ b/docs/xmldocs/llama.exceptions.missingtemplateexception.md
@@ -1,14 +1,18 @@
-# GrammarExpectedPrevious
+[`< Back`](./)
+
+---
+
+# MissingTemplateException
Namespace: LLama.Exceptions
-A specified character was expected to preceded another when parsing
+A chat template was expected but could not be found
```csharp
-public class GrammarExpectedPrevious : GrammarFormatException, System.Runtime.Serialization.ISerializable
+public class MissingTemplateException : RuntimeError, System.Runtime.Serialization.ISerializable
```
-Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception) → [GrammarFormatException](./llama.exceptions.grammarformatexception.md) → [GrammarExpectedPrevious](./llama.exceptions.grammarexpectedprevious.md)
+Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception) → [RuntimeError](./llama.exceptions.runtimeerror.md) → [MissingTemplateException](./llama.exceptions.missingtemplateexception.md)
Implements [ISerializable](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.serialization.iserializable)
## Properties
@@ -92,3 +96,39 @@ public string StackTrace { get; }
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+## Constructors
+
+### **MissingTemplateException()**
+
+```csharp
+public MissingTemplateException()
+```
+
+### **MissingTemplateException(String)**
+
+```csharp
+public MissingTemplateException(string message)
+```
+
+#### Parameters
+
+`message` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+## Events
+
+### **SerializeObjectState**
+
+#### Caution
+
+BinaryFormatter serialization is obsolete and should not be used. See https://aka.ms/binaryformatter for more information.
+
+---
+
+```csharp
+protected event EventHandler SerializeObjectState;
+```
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.exceptions.runtimeerror.md b/docs/xmldocs/llama.exceptions.runtimeerror.md
index 3b2f4446a..cc36835b7 100644
--- a/docs/xmldocs/llama.exceptions.runtimeerror.md
+++ b/docs/xmldocs/llama.exceptions.runtimeerror.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# RuntimeError
Namespace: LLama.Exceptions
@@ -106,3 +110,21 @@ public RuntimeError(string message)
#### Parameters
`message` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+## Events
+
+### **SerializeObjectState**
+
+#### Caution
+
+BinaryFormatter serialization is obsolete and should not be used. See https://aka.ms/binaryformatter for more information.
+
+---
+
+```csharp
+protected event EventHandler SerializeObjectState;
+```
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.exceptions.grammarunknownescapecharacter.md b/docs/xmldocs/llama.exceptions.templatenotfoundexception.md
similarity index 65%
rename from docs/xmldocs/llama.exceptions.grammarunknownescapecharacter.md
rename to docs/xmldocs/llama.exceptions.templatenotfoundexception.md
index 009a5bf8f..1f689471b 100644
--- a/docs/xmldocs/llama.exceptions.grammarunknownescapecharacter.md
+++ b/docs/xmldocs/llama.exceptions.templatenotfoundexception.md
@@ -1,14 +1,18 @@
-# GrammarUnknownEscapeCharacter
+[`< Back`](./)
+
+---
+
+# TemplateNotFoundException
Namespace: LLama.Exceptions
-An unexpected character was encountered after an escape sequence
+The template with the requested name could not be found
```csharp
-public class GrammarUnknownEscapeCharacter : GrammarFormatException, System.Runtime.Serialization.ISerializable
+public class TemplateNotFoundException : RuntimeError, System.Runtime.Serialization.ISerializable
```
-Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception) → [GrammarFormatException](./llama.exceptions.grammarformatexception.md) → [GrammarUnknownEscapeCharacter](./llama.exceptions.grammarunknownescapecharacter.md)
+Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception) → [RuntimeError](./llama.exceptions.runtimeerror.md) → [TemplateNotFoundException](./llama.exceptions.templatenotfoundexception.md)
Implements [ISerializable](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.serialization.iserializable)
## Properties
@@ -92,3 +96,33 @@ public string StackTrace { get; }
#### Property Value
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+## Constructors
+
+### **TemplateNotFoundException(String)**
+
+```csharp
+public TemplateNotFoundException(string name)
+```
+
+#### Parameters
+
+`name` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+## Events
+
+### **SerializeObjectState**
+
+#### Caution
+
+BinaryFormatter serialization is obsolete and should not be used. See https://aka.ms/binaryformatter for more information.
+
+---
+
+```csharp
+protected event EventHandler SerializeObjectState;
+```
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.extensions.icontextparamsextensions.md b/docs/xmldocs/llama.extensions.icontextparamsextensions.md
index 143a918fc..f9f9439d1 100644
--- a/docs/xmldocs/llama.extensions.icontextparamsextensions.md
+++ b/docs/xmldocs/llama.extensions.icontextparamsextensions.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# IContextParamsExtensions
Namespace: LLama.Extensions
@@ -8,7 +12,8 @@ Extension methods to the IContextParams interface
public static class IContextParamsExtensions
```
-Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [IContextParamsExtensions](./llama.extensions.icontextparamsextensions.md)
+Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [IContextParamsExtensions](./llama.extensions.icontextparamsextensions.md)
+Attributes [ExtensionAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.extensionattribute)
## Methods
@@ -31,3 +36,7 @@ public static void ToLlamaContextParams(IContextParams params, LLamaContextParam
[FileNotFoundException](https://docs.microsoft.com/en-us/dotnet/api/system.io.filenotfoundexception)
[ArgumentException](https://docs.microsoft.com/en-us/dotnet/api/system.argumentexception)
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.extensions.imodelparamsextensions.md b/docs/xmldocs/llama.extensions.imodelparamsextensions.md
index 923f0f029..cf236d452 100644
--- a/docs/xmldocs/llama.extensions.imodelparamsextensions.md
+++ b/docs/xmldocs/llama.extensions.imodelparamsextensions.md
@@ -1,14 +1,19 @@
+[`< Back`](./)
+
+---
+
# IModelParamsExtensions
Namespace: LLama.Extensions
-Extention methods to the IModelParams interface
+Extension methods to the IModelParams interface
```csharp
public static class IModelParamsExtensions
```
-Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [IModelParamsExtensions](./llama.extensions.imodelparamsextensions.md)
+Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [IModelParamsExtensions](./llama.extensions.imodelparamsextensions.md)
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute), [NullableAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullableattribute), [ExtensionAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.extensionattribute)
## Methods
@@ -35,3 +40,7 @@ public static IDisposable ToLlamaModelParams(IModelParams params, LLamaModelPara
[FileNotFoundException](https://docs.microsoft.com/en-us/dotnet/api/system.io.filenotfoundexception)
[ArgumentException](https://docs.microsoft.com/en-us/dotnet/api/system.argumentexception)
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.extensions.spannormalizationextensions.md b/docs/xmldocs/llama.extensions.spannormalizationextensions.md
new file mode 100644
index 000000000..c7f7bf081
--- /dev/null
+++ b/docs/xmldocs/llama.extensions.spannormalizationextensions.md
@@ -0,0 +1,203 @@
+[`< Back`](./)
+
+---
+
+# SpanNormalizationExtensions
+
+Namespace: LLama.Extensions
+
+Extensions to span which apply in-place normalization
+
+```csharp
+public static class SpanNormalizationExtensions
+```
+
+Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [SpanNormalizationExtensions](./llama.extensions.spannormalizationextensions.md)
+Attributes [ExtensionAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.extensionattribute)
+
+## Methods
+
+### **MaxAbsoluteNormalization(Single[])**
+
+In-place multiply every element by 32760 and divide every element in the span by the max absolute value in the span
+
+```csharp
+public static Single[] MaxAbsoluteNormalization(Single[] vector)
+```
+
+#### Parameters
+
+`vector` [Single[]](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+
+#### Returns
+
+[Single[]](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+The same array
+
+### **MaxAbsoluteNormalization(Span<Single>)**
+
+In-place multiply every element by 32760 and divide every element in the span by the max absolute value in the span
+
+```csharp
+public static Span<Single> MaxAbsoluteNormalization(Span<Single> vector)
+```
+
+#### Parameters
+
+`vector` [Span<Single>](https://docs.microsoft.com/en-us/dotnet/api/system.span-1)
+
+#### Returns
+
+[Span<Single>](https://docs.microsoft.com/en-us/dotnet/api/system.span-1)
+The same span
+
+### **TaxicabNormalization(Single[])**
+
+In-place divide every element in the array by the sum of absolute values in the array
+
+```csharp
+public static Single[] TaxicabNormalization(Single[] vector)
+```
+
+#### Parameters
+
+`vector` [Single[]](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+
+#### Returns
+
+[Single[]](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+The same array
+
+**Remarks:**
+
+Also known as "Manhattan normalization".
+
+### **TaxicabNormalization(Span<Single>)**
+
+In-place divide every element in the span by the sum of absolute values in the span
+
+```csharp
+public static Span<Single> TaxicabNormalization(Span<Single> vector)
+```
+
+#### Parameters
+
+`vector` [Span<Single>](https://docs.microsoft.com/en-us/dotnet/api/system.span-1)
+
+#### Returns
+
+[Span<Single>](https://docs.microsoft.com/en-us/dotnet/api/system.span-1)
+The same span
+
+**Remarks:**
+
+Also known as "Manhattan normalization".
+
+### **EuclideanNormalization(Single[])**
+
+In-place divide every element by the euclidean length of the vector
+
+```csharp
+public static Single[] EuclideanNormalization(Single[] vector)
+```
+
+#### Parameters
+
+`vector` [Single[]](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+
+#### Returns
+
+[Single[]](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+The same array
+
+**Remarks:**
+
+Also known as "L2 normalization".
+
+### **EuclideanNormalization(Span<Single>)**
+
+In-place divide every element by the euclidean length of the vector
+
+```csharp
+public static Span<Single> EuclideanNormalization(Span<Single> vector)
+```
+
+#### Parameters
+
+`vector` [Span<Single>](https://docs.microsoft.com/en-us/dotnet/api/system.span-1)
+
+#### Returns
+
+[Span<Single>](https://docs.microsoft.com/en-us/dotnet/api/system.span-1)
+The same span
+
+**Remarks:**
+
+Also known as "L2 normalization".
+
+### **EuclideanNormalization(ReadOnlySpan<Single>)**
+
+Creates a new array containing an L2 normalization of the input vector.
+
+```csharp
+public static Single[] EuclideanNormalization(ReadOnlySpan<Single> vector)
+```
+
+#### Parameters
+
+`vector` [ReadOnlySpan<Single>](https://docs.microsoft.com/en-us/dotnet/api/system.readonlyspan-1)
+
+#### Returns
+
+[Single[]](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+A new array containing the normalized values
+
+### **PNormalization(Single[], Int32)**
+
+In-place apply p-normalization. https://en.wikipedia.org/wiki/Norm_(mathematics)#p-norm
+
+```csharp
+public static Single[] PNormalization(Single[] vector, int p)
+```
+
+#### Parameters
+
+`vector` [Single[]](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+
+`p` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+
+#### Returns
+
+[Single[]](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+The same array
+
+### **PNormalization(Span<Single>, Int32)**
+
+In-place apply p-normalization. https://en.wikipedia.org/wiki/Norm_(mathematics)#p-norm
+
+```csharp
+public static Span<Single> PNormalization(Span<Single> vector, int p)
+```
+
+#### Parameters
+
+`vector` [Span<Single>](https://docs.microsoft.com/en-us/dotnet/api/system.span-1)
+
+`p` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+
+#### Returns
+
+[Span<Single>](https://docs.microsoft.com/en-us/dotnet/api/system.span-1)
+The same span
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.grammars.grammar.md b/docs/xmldocs/llama.grammars.grammar.md
deleted file mode 100644
index da52c1bc5..000000000
--- a/docs/xmldocs/llama.grammars.grammar.md
+++ /dev/null
@@ -1,110 +0,0 @@
-# Grammar
-
-Namespace: LLama.Grammars
-
-A grammar is a set of [GrammarRule](./llama.grammars.grammarrule.md)s for deciding which characters are valid next. Can be used to constrain
- output to certain formats - e.g. force the model to output JSON
-
-```csharp
-public sealed class Grammar
-```
-
-Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [Grammar](./llama.grammars.grammar.md)
-
-## Properties
-
-### **StartRuleIndex**
-
-Index of the initial rule to start from
-
-```csharp
-public ulong StartRuleIndex { get; }
-```
-
-#### Property Value
-
-[UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
-
-### **Rules**
-
-The rules which make up this grammar
-
-```csharp
-public IReadOnlyList Rules { get; }
-```
-
-#### Property Value
-
-[IReadOnlyList<GrammarRule>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ireadonlylist-1)
-
-## Constructors
-
-### **Grammar(IReadOnlyList<GrammarRule>, UInt64)**
-
-Create a new grammar from a set of rules
-
-```csharp
-public Grammar(IReadOnlyList rules, ulong startRuleIndex)
-```
-
-#### Parameters
-
-`rules` [IReadOnlyList<GrammarRule>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ireadonlylist-1)
-The rules which make up this grammar
-
-`startRuleIndex` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
-Index of the initial rule to start from
-
-#### Exceptions
-
-[ArgumentOutOfRangeException](https://docs.microsoft.com/en-us/dotnet/api/system.argumentoutofrangeexception)
-
-## Methods
-
-### **CreateInstance()**
-
-Create a `SafeLLamaGrammarHandle` instance to use for parsing
-
-```csharp
-public SafeLLamaGrammarHandle CreateInstance()
-```
-
-#### Returns
-
-[SafeLLamaGrammarHandle](./llama.native.safellamagrammarhandle.md)
-
-### **Parse(String, String)**
-
-Parse a string of GGML BNF into a Grammar
-
-```csharp
-public static Grammar Parse(string gbnf, string startRule)
-```
-
-#### Parameters
-
-`gbnf` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-The string to parse
-
-`startRule` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-Name of the start rule of this grammar
-
-#### Returns
-
-[Grammar](./llama.grammars.grammar.md)
-A Grammar which can be converted into a SafeLLamaGrammarHandle for sampling
-
-#### Exceptions
-
-[GrammarFormatException](./llama.exceptions.grammarformatexception.md)
-Thrown if input is malformed
-
-### **ToString()**
-
-```csharp
-public string ToString()
-```
-
-#### Returns
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
diff --git a/docs/xmldocs/llama.grammars.grammarrule.md b/docs/xmldocs/llama.grammars.grammarrule.md
deleted file mode 100644
index 9a6461eb9..000000000
--- a/docs/xmldocs/llama.grammars.grammarrule.md
+++ /dev/null
@@ -1,118 +0,0 @@
-# GrammarRule
-
-Namespace: LLama.Grammars
-
-A single rule in a [Grammar](./llama.grammars.grammar.md)
-
-```csharp
-public sealed class GrammarRule : System.IEquatable`1[[LLama.Grammars.GrammarRule, LLamaSharp, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null]]
-```
-
-Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [GrammarRule](./llama.grammars.grammarrule.md)
-Implements [IEquatable<GrammarRule>](https://docs.microsoft.com/en-us/dotnet/api/system.iequatable-1)
-
-## Properties
-
-### **Name**
-
-Name of this rule
-
-```csharp
-public string Name { get; }
-```
-
-#### Property Value
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **Elements**
-
-The elements of this grammar rule
-
-```csharp
-public IReadOnlyList Elements { get; }
-```
-
-#### Property Value
-
-[IReadOnlyList<LLamaGrammarElement>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ireadonlylist-1)
-
-## Constructors
-
-### **GrammarRule(String, IReadOnlyList<LLamaGrammarElement>)**
-
-Create a new GrammarRule containing the given elements
-
-```csharp
-public GrammarRule(string name, IReadOnlyList elements)
-```
-
-#### Parameters
-
-`name` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-`elements` [IReadOnlyList<LLamaGrammarElement>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ireadonlylist-1)
-
-#### Exceptions
-
-[ArgumentException](https://docs.microsoft.com/en-us/dotnet/api/system.argumentexception)
-
-## Methods
-
-### **ToString()**
-
-```csharp
-public string ToString()
-```
-
-#### Returns
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **GetHashCode()**
-
-```csharp
-public int GetHashCode()
-```
-
-#### Returns
-
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-### **Equals(Object)**
-
-```csharp
-public bool Equals(object obj)
-```
-
-#### Parameters
-
-`obj` [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object)
-
-#### Returns
-
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-### **Equals(GrammarRule)**
-
-```csharp
-public bool Equals(GrammarRule other)
-```
-
-#### Parameters
-
-`other` [GrammarRule](./llama.grammars.grammarrule.md)
-
-#### Returns
-
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-### **<Clone>$()**
-
-```csharp
-public GrammarRule $()
-```
-
-#### Returns
-
-[GrammarRule](./llama.grammars.grammarrule.md)
diff --git a/docs/xmldocs/llama.ichatmodel.md b/docs/xmldocs/llama.ichatmodel.md
deleted file mode 100644
index 9f51ba117..000000000
--- a/docs/xmldocs/llama.ichatmodel.md
+++ /dev/null
@@ -1,57 +0,0 @@
-# IChatModel
-
-Namespace: LLama
-
-```csharp
-public interface IChatModel
-```
-
-## Properties
-
-### **Name**
-
-```csharp
-public abstract string Name { get; }
-```
-
-#### Property Value
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-## Methods
-
-### **Chat(String, String)**
-
-```csharp
-IEnumerable Chat(string text, string prompt)
-```
-
-#### Parameters
-
-`text` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-`prompt` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-#### Returns
-
-[IEnumerable<String>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ienumerable-1)
-
-### **InitChatPrompt(String)**
-
-```csharp
-void InitChatPrompt(string prompt)
-```
-
-#### Parameters
-
-`prompt` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **InitChatAntiprompt(String[])**
-
-```csharp
-void InitChatAntiprompt(String[] antiprompt)
-```
-
-#### Parameters
-
-`antiprompt` [String[]](https://docs.microsoft.com/en-us/dotnet/api/system.string)
diff --git a/docs/xmldocs/llama.instructexecutor.md b/docs/xmldocs/llama.instructexecutor.md
new file mode 100644
index 000000000..77a3d5ac5
--- /dev/null
+++ b/docs/xmldocs/llama.instructexecutor.md
@@ -0,0 +1,283 @@
+[`< Back`](./)
+
+---
+
+# InstructExecutor
+
+Namespace: LLama
+
+The LLama executor for instruct mode.
+
+```csharp
+public class InstructExecutor : StatefulExecutorBase, LLama.Abstractions.ILLamaExecutor
+```
+
+Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [StatefulExecutorBase](./llama.statefulexecutorbase.md) → [InstructExecutor](./llama.instructexecutor.md)
+Implements [ILLamaExecutor](./llama.abstractions.illamaexecutor.md)
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute), [NullableAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullableattribute)
+
+## Fields
+
+### **_logger**
+
+The logger used by this executor.
+
+```csharp
+protected ILogger _logger;
+```
+
+### **_pastTokensCount**
+
+The number of tokens that have already been processed by the model.
+
+```csharp
+protected int _pastTokensCount;
+```
+
+### **_consumedTokensCount**
+
+The number of tokens consumed by the model during the current inference.
+
+```csharp
+protected int _consumedTokensCount;
+```
+
+### **_n_session_consumed**
+
+
+
+```csharp
+protected int _n_session_consumed;
+```
+
+### **_n_matching_session_tokens**
+
+
+
+```csharp
+protected int _n_matching_session_tokens;
+```
+
+### **_pathSession**
+
+The path of the session file.
+
+```csharp
+protected string _pathSession;
+```
+
+### **_embeds**
+
+A container for the tokens to be processed and those already processed.
+
+```csharp
+protected List _embeds;
+```
+
+### **_embed_inps**
+
+A container for the input tokens.
+
+```csharp
+protected List _embed_inps;
+```
+
+### **_session_tokens**
+
+
+
+```csharp
+protected List _session_tokens;
+```
+
+### **_last_n_tokens**
+
+The last tokens generated by the model.
+
+```csharp
+protected FixedSizeQueue _last_n_tokens;
+```
+
+## Properties
+
+### **Context**
+
+The context used by the executor.
+
+```csharp
+public LLamaContext Context { get; }
+```
+
+#### Property Value
+
+[LLamaContext](./llama.llamacontext.md)
+
+### **IsMultiModal**
+
+```csharp
+public bool IsMultiModal { get; }
+```
+
+#### Property Value
+
+[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+
+### **ClipModel**
+
+```csharp
+public LLavaWeights ClipModel { get; }
+```
+
+#### Property Value
+
+[LLavaWeights](./llama.llavaweights.md)
+
+### **Images**
+
+```csharp
+public List Images { get; }
+```
+
+#### Property Value
+
+[List<Byte[]>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.list-1)
+
+## Constructors
+
+### **InstructExecutor(LLamaContext, String, String, ILogger)**
+
+
+
+```csharp
+public InstructExecutor(LLamaContext context, string instructionPrefix, string instructionSuffix, ILogger logger)
+```
+
+#### Parameters
+
+`context` [LLamaContext](./llama.llamacontext.md)
+
+`instructionPrefix` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+`instructionSuffix` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+`logger` ILogger
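+
+A minimal construction sketch. The model path and the instruction prefix/suffix strings here are illustrative, and `ModelParams`, `LLamaWeights.LoadFromFile` and `CreateContext` are assumed from the wider LLamaSharp API:
+
+```csharp
+using LLama;
+using LLama.Common;
+
+// Illustrative model path; replace with a real GGUF file.
+var parameters = new ModelParams("path/to/model.gguf");
+using var weights = LLamaWeights.LoadFromFile(parameters);
+using var context = weights.CreateContext(parameters);
+
+// The prefix and suffix wrap each instruction in the format the model was tuned on.
+var executor = new InstructExecutor(context, "### Instruction:\n", "\n### Response:\n", logger: null);
+```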
+
+## Methods
+
+### **GetStateData()**
+
+```csharp
+public ExecutorBaseState GetStateData()
+```
+
+#### Returns
+
+[ExecutorBaseState](./llama.statefulexecutorbase.executorbasestate.md)
+
+### **LoadState(ExecutorBaseState)**
+
+```csharp
+public Task LoadState(ExecutorBaseState data)
+```
+
+#### Parameters
+
+`data` [ExecutorBaseState](./llama.statefulexecutorbase.executorbasestate.md)
+
+#### Returns
+
+[Task](https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.task)
+
+### **SaveState(String)**
+
+```csharp
+public Task SaveState(string filename)
+```
+
+#### Parameters
+
+`filename` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+#### Returns
+
+[Task](https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.task)
+
+### **LoadState(String)**
+
+```csharp
+public Task LoadState(string filename)
+```
+
+#### Parameters
+
+`filename` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+#### Returns
+
+[Task](https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.task)
+
+### **GetLoopCondition(InferStateArgs)**
+
+```csharp
+protected Task GetLoopCondition(InferStateArgs args)
+```
+
+#### Parameters
+
+`args` [InferStateArgs](./llama.statefulexecutorbase.inferstateargs.md)
+
+#### Returns
+
+[Task<Boolean>](https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.task-1)
+
+### **PreprocessInputs(String, InferStateArgs)**
+
+```csharp
+protected Task PreprocessInputs(string text, InferStateArgs args)
+```
+
+#### Parameters
+
+`text` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+`args` [InferStateArgs](./llama.statefulexecutorbase.inferstateargs.md)
+
+#### Returns
+
+[Task](https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.task)
+
+### **PostProcess(IInferenceParams, InferStateArgs)**
+
+```csharp
+protected Task>> PostProcess(IInferenceParams inferenceParams, InferStateArgs args)
+```
+
+#### Parameters
+
+`inferenceParams` [IInferenceParams](./llama.abstractions.iinferenceparams.md)
+
+`args` [InferStateArgs](./llama.statefulexecutorbase.inferstateargs.md)
+
+#### Returns
+
+[Task<ValueTuple<Boolean, IReadOnlyList<String>>>](https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.task-1)
+
+### **InferInternal(IInferenceParams, InferStateArgs)**
+
+```csharp
+protected Task InferInternal(IInferenceParams inferenceParams, InferStateArgs args)
+```
+
+#### Parameters
+
+`inferenceParams` [IInferenceParams](./llama.abstractions.iinferenceparams.md)
+
+`args` [InferStateArgs](./llama.statefulexecutorbase.inferstateargs.md)
+
+#### Returns
+
+[Task](https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.task)
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.interactiveexecutor.md b/docs/xmldocs/llama.interactiveexecutor.md
new file mode 100644
index 000000000..8230e03c2
--- /dev/null
+++ b/docs/xmldocs/llama.interactiveexecutor.md
@@ -0,0 +1,299 @@
+[`< Back`](./)
+
+---
+
+# InteractiveExecutor
+
+Namespace: LLama
+
+The LLama executor for interactive mode.
+
+```csharp
+public class InteractiveExecutor : StatefulExecutorBase, LLama.Abstractions.ILLamaExecutor
+```
+
+Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [StatefulExecutorBase](./llama.statefulexecutorbase.md) → [InteractiveExecutor](./llama.interactiveexecutor.md)
+Implements [ILLamaExecutor](./llama.abstractions.illamaexecutor.md)
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute), [NullableAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullableattribute)
+
+## Fields
+
+### **_logger**
+
+The logger used by this executor.
+
+```csharp
+protected ILogger _logger;
+```
+
+### **_pastTokensCount**
+
+The number of tokens that have already been processed by the model.
+
+```csharp
+protected int _pastTokensCount;
+```
+
+### **_consumedTokensCount**
+
+The number of tokens consumed by the model during the current inference.
+
+```csharp
+protected int _consumedTokensCount;
+```
+
+### **_n_session_consumed**
+
+
+
+```csharp
+protected int _n_session_consumed;
+```
+
+### **_n_matching_session_tokens**
+
+
+
+```csharp
+protected int _n_matching_session_tokens;
+```
+
+### **_pathSession**
+
+The path of the session file.
+
+```csharp
+protected string _pathSession;
+```
+
+### **_embeds**
+
+A container for the tokens to be processed and those already processed.
+
+```csharp
+protected List _embeds;
+```
+
+### **_embed_inps**
+
+A container for the input tokens.
+
+```csharp
+protected List _embed_inps;
+```
+
+### **_session_tokens**
+
+
+
+```csharp
+protected List _session_tokens;
+```
+
+### **_last_n_tokens**
+
+The last tokens generated by the model.
+
+```csharp
+protected FixedSizeQueue _last_n_tokens;
+```
+
+## Properties
+
+### **Context**
+
+The context used by the executor.
+
+```csharp
+public LLamaContext Context { get; }
+```
+
+#### Property Value
+
+[LLamaContext](./llama.llamacontext.md)
+
+### **IsMultiModal**
+
+```csharp
+public bool IsMultiModal { get; }
+```
+
+#### Property Value
+
+[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+
+### **ClipModel**
+
+```csharp
+public LLavaWeights ClipModel { get; }
+```
+
+#### Property Value
+
+[LLavaWeights](./llama.llavaweights.md)
+
+### **Images**
+
+```csharp
+public List Images { get; }
+```
+
+#### Property Value
+
+[List<Byte[]>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.list-1)
+
+## Constructors
+
+### **InteractiveExecutor(LLamaContext, ILogger)**
+
+
+
+```csharp
+public InteractiveExecutor(LLamaContext context, ILogger logger)
+```
+
+#### Parameters
+
+`context` [LLamaContext](./llama.llamacontext.md)
+
+`logger` ILogger
+
+### **InteractiveExecutor(LLamaContext, LLavaWeights, ILogger)**
+
+
+
+```csharp
+public InteractiveExecutor(LLamaContext context, LLavaWeights clipModel, ILogger logger)
+```
+
+#### Parameters
+
+`context` [LLamaContext](./llama.llamacontext.md)
+
+`clipModel` [LLavaWeights](./llama.llavaweights.md)
+
+`logger` ILogger
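+
+A hedged sketch of interactive use. The model path is illustrative, and `ModelParams`, `LLamaWeights.LoadFromFile`, `CreateContext`, `InferenceParams` and the streaming `InferAsync` method are assumed from the wider LLamaSharp API:
+
+```csharp
+using LLama;
+using LLama.Common;
+
+var parameters = new ModelParams("path/to/model.gguf");
+using var weights = LLamaWeights.LoadFromFile(parameters);
+using var context = weights.CreateContext(parameters);
+var executor = new InteractiveExecutor(context);
+
+// Stream tokens for one turn of the conversation.
+await foreach (var token in executor.InferAsync("Hello!", new InferenceParams()))
+    Console.Write(token);
+```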
+
+## Methods
+
+### **GetStateData()**
+
+```csharp
+public ExecutorBaseState GetStateData()
+```
+
+#### Returns
+
+[ExecutorBaseState](./llama.statefulexecutorbase.executorbasestate.md)
+
+### **LoadState(ExecutorBaseState)**
+
+```csharp
+public Task LoadState(ExecutorBaseState data)
+```
+
+#### Parameters
+
+`data` [ExecutorBaseState](./llama.statefulexecutorbase.executorbasestate.md)
+
+#### Returns
+
+[Task](https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.task)
+
+### **SaveState(String)**
+
+```csharp
+public Task SaveState(string filename)
+```
+
+#### Parameters
+
+`filename` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+#### Returns
+
+[Task](https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.task)
+
+### **LoadState(String)**
+
+```csharp
+public Task LoadState(string filename)
+```
+
+#### Parameters
+
+`filename` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+#### Returns
+
+[Task](https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.task)
+
+### **GetLoopCondition(InferStateArgs)**
+
+Determines whether the loop that generates responses should continue.
+
+```csharp
+protected Task GetLoopCondition(InferStateArgs args)
+```
+
+#### Parameters
+
+`args` [InferStateArgs](./llama.statefulexecutorbase.inferstateargs.md)
+
+#### Returns
+
+[Task<Boolean>](https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.task-1)
+
+### **PreprocessInputs(String, InferStateArgs)**
+
+```csharp
+protected Task PreprocessInputs(string text, InferStateArgs args)
+```
+
+#### Parameters
+
+`text` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+`args` [InferStateArgs](./llama.statefulexecutorbase.inferstateargs.md)
+
+#### Returns
+
+[Task](https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.task)
+
+### **PostProcess(IInferenceParams, InferStateArgs)**
+
+Returns whether generation should stop.
+
+```csharp
+protected Task>> PostProcess(IInferenceParams inferenceParams, InferStateArgs args)
+```
+
+#### Parameters
+
+`inferenceParams` [IInferenceParams](./llama.abstractions.iinferenceparams.md)
+
+`args` [InferStateArgs](./llama.statefulexecutorbase.inferstateargs.md)
+
+#### Returns
+
+[Task<ValueTuple<Boolean, IReadOnlyList<String>>>](https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.task-1)
+
+### **InferInternal(IInferenceParams, InferStateArgs)**
+
+```csharp
+protected Task InferInternal(IInferenceParams inferenceParams, InferStateArgs args)
+```
+
+#### Parameters
+
+`inferenceParams` [IInferenceParams](./llama.abstractions.iinferenceparams.md)
+
+`args` [InferStateArgs](./llama.statefulexecutorbase.inferstateargs.md)
+
+#### Returns
+
+[Task](https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.task)
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.llamacache.md b/docs/xmldocs/llama.llamacache.md
deleted file mode 100644
index c789224ae..000000000
--- a/docs/xmldocs/llama.llamacache.md
+++ /dev/null
@@ -1,59 +0,0 @@
-# LLamaCache
-
-Namespace: LLama
-
-```csharp
-public class LLamaCache
-```
-
-Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [LLamaCache](./llama.llamacache.md)
-
-## Properties
-
-### **CacheSize**
-
-```csharp
-public int CacheSize { get; }
-```
-
-#### Property Value
-
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-### **Item**
-
-```csharp
-public LLamaState Item { get; set; }
-```
-
-#### Property Value
-
-[LLamaState](./llama.llamastate.md)
-
-## Constructors
-
-### **LLamaCache(Int32)**
-
-```csharp
-public LLamaCache(int capacity)
-```
-
-#### Parameters
-
-`capacity` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-## Methods
-
-### **Contains(Int32[])**
-
-```csharp
-public bool Contains(Int32[] key)
-```
-
-#### Parameters
-
-`key` [Int32[]](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-#### Returns
-
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
diff --git a/docs/xmldocs/llama.llamacontext.md b/docs/xmldocs/llama.llamacontext.md
new file mode 100644
index 000000000..8f1f2cd62
--- /dev/null
+++ b/docs/xmldocs/llama.llamacontext.md
@@ -0,0 +1,448 @@
+[`< Back`](./)
+
+---
+
+# LLamaContext
+
+Namespace: LLama
+
+A llama_context, which holds all the context required to interact with a model
+
+```csharp
+public sealed class LLamaContext : System.IDisposable
+```
+
+Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [LLamaContext](./llama.llamacontext.md)
+Implements [IDisposable](https://docs.microsoft.com/en-us/dotnet/api/system.idisposable)
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute), [NullableAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullableattribute)
+
+## Properties
+
+### **ContextSize**
+
+Total number of tokens in the context
+
+```csharp
+public uint ContextSize { get; }
+```
+
+#### Property Value
+
+[UInt32](https://docs.microsoft.com/en-us/dotnet/api/system.uint32)
+
+### **EmbeddingSize**
+
+Dimension of embedding vectors
+
+```csharp
+public int EmbeddingSize { get; }
+```
+
+#### Property Value
+
+[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+
+### **Params**
+
+The context params set for this context
+
+```csharp
+public IContextParams Params { get; }
+```
+
+#### Property Value
+
+[IContextParams](./llama.abstractions.icontextparams.md)
+
+### **NativeHandle**
+
+The native handle, which is passed to the native APIs.
+
+```csharp
+public SafeLLamaContextHandle NativeHandle { get; }
+```
+
+#### Property Value
+
+[SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
+
+**Remarks:**
+
+Be careful how you use this!
+
+### **Encoding**
+
+The encoding this model uses to handle text input.
+
+```csharp
+public Encoding Encoding { get; }
+```
+
+#### Property Value
+
+[Encoding](https://docs.microsoft.com/en-us/dotnet/api/system.text.encoding)
+
+### **GenerationThreads**
+
+Get or set the number of threads to use for generation
+
+```csharp
+public int GenerationThreads { get; set; }
+```
+
+#### Property Value
+
+[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+
+### **BatchThreads**
+
+Get or set the number of threads to use for batch processing
+
+```csharp
+public int BatchThreads { get; set; }
+```
+
+#### Property Value
+
+[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+
+### **BatchSize**
+
+Get the maximum batch size for this context
+
+```csharp
+public uint BatchSize { get; }
+```
+
+#### Property Value
+
+[UInt32](https://docs.microsoft.com/en-us/dotnet/api/system.uint32)
+
+### **Vocab**
+
+Get the special tokens for the model associated with this context
+
+```csharp
+public Vocabulary Vocab { get; }
+```
+
+#### Property Value
+
+[Vocabulary](./llama.native.safellamamodelhandle.vocabulary.md)
+
+## Constructors
+
+### **LLamaContext(LLamaWeights, IContextParams, ILogger)**
+
+Create a new LLamaContext for the given LLamaWeights
+
+```csharp
+public LLamaContext(LLamaWeights model, IContextParams params, ILogger logger)
+```
+
+#### Parameters
+
+`model` [LLamaWeights](./llama.llamaweights.md)
+
+`params` [IContextParams](./llama.abstractions.icontextparams.md)
+
+`logger` ILogger
+
+#### Exceptions
+
+[ObjectDisposedException](https://docs.microsoft.com/en-us/dotnet/api/system.objectdisposedexception)
+
+## Methods
+
+### **Tokenize(String, Boolean, Boolean)**
+
+Tokenize a string.
+
+```csharp
+public LLamaToken[] Tokenize(string text, bool addBos, bool special)
+```
+
+#### Parameters
+
+`text` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+`addBos` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+Whether to add a BOS (beginning-of-sequence) token to the text.
+
+`special` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+Allow tokenizing special and/or control tokens, which are otherwise not exposed and are treated as plain text.
+
+#### Returns
+
+[LLamaToken[]](./llama.native.llamatoken.md)
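+
+A brief usage sketch, given an existing `LLamaContext` named `context` (construction omitted):
+
+```csharp
+// Tokenize a prompt, including a BOS token, without exposing special tokens.
+LLamaToken[] tokens = context.Tokenize("Hello, world!", addBos: true, special: false);
+Console.WriteLine($"Token count: {tokens.Length}");
+```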
+
+### **DeTokenize(IReadOnlyList<LLamaToken>)**
+
+#### Caution
+
+Use a `StreamingTokenDecoder` instead
+
+---
+
+Detokenize the tokens to text.
+
+```csharp
+public string DeTokenize(IReadOnlyList tokens)
+```
+
+#### Parameters
+
+`tokens` [IReadOnlyList<LLamaToken>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ireadonlylist-1)
+
+#### Returns
+
+[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+### **SaveState(String)**
+
+Save the state to specified path.
+
+```csharp
+public void SaveState(string filename)
+```
+
+#### Parameters
+
+`filename` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+### **SaveState(String, LLamaSeqId)**
+
+Save the state of a particular sequence to specified path.
+
+```csharp
+public void SaveState(string filename, LLamaSeqId sequence)
+```
+
+#### Parameters
+
+`filename` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+`sequence` [LLamaSeqId](./llama.native.llamaseqid.md)
+
+### **GetState()**
+
+Get the state data as an opaque handle, which can be loaded later using [LLamaContext.LoadState(String)](./llama.llamacontext.md#loadstatestring)
+
+```csharp
+public State GetState()
+```
+
+#### Returns
+
+[State](./llama.llamacontext.state.md)
+
+**Remarks:**
+
+Use [LLamaContext.SaveState(String)](./llama.llamacontext.md#savestatestring) if you intend to save this state to disk.
+
+### **GetState(LLamaSeqId)**
+
+Get the state data as an opaque handle, which can be loaded later using [LLamaContext.LoadState(String)](./llama.llamacontext.md#loadstatestring)
+
+```csharp
+public SequenceState GetState(LLamaSeqId sequence)
+```
+
+#### Parameters
+
+`sequence` [LLamaSeqId](./llama.native.llamaseqid.md)
+
+#### Returns
+
+[SequenceState](./llama.llamacontext.sequencestate.md)
+
+**Remarks:**
+
+Use [LLamaContext.SaveState(String, LLamaSeqId)](./llama.llamacontext.md#savestatestring-llamaseqid) if you intend to save this state to disk.
+
+### **LoadState(String)**
+
+Load the state from specified path.
+
+```csharp
+public void LoadState(string filename)
+```
+
+#### Parameters
+
+`filename` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+### **LoadState(String, LLamaSeqId)**
+
+Load the state from specified path into a particular sequence
+
+```csharp
+public void LoadState(string filename, LLamaSeqId sequence)
+```
+
+#### Parameters
+
+`filename` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+`sequence` [LLamaSeqId](./llama.native.llamaseqid.md)
+
+### **LoadState(State)**
+
+Load the state from memory.
+
+```csharp
+public void LoadState(State state)
+```
+
+#### Parameters
+
+`state` [State](./llama.llamacontext.state.md)
+
+### **LoadState(SequenceState, LLamaSeqId)**
+
+Load the state from memory into a particular sequence
+
+```csharp
+public void LoadState(SequenceState state, LLamaSeqId sequence)
+```
+
+#### Parameters
+
+`state` [SequenceState](./llama.llamacontext.sequencestate.md)
+
+`sequence` [LLamaSeqId](./llama.native.llamaseqid.md)
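+
+A round-trip sketch of the file-based state APIs above. The path is illustrative, and `context` is an existing `LLamaContext`:
+
+```csharp
+// Persist the full context state to disk...
+context.SaveState("session.state");
+
+// ...and restore it later into an identically-configured context.
+context.LoadState("session.state");
+```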
+
+### **Encode(LLamaBatch)**
+
+
+
+```csharp
+public EncodeResult Encode(LLamaBatch batch)
+```
+
+#### Parameters
+
+`batch` [LLamaBatch](./llama.native.llamabatch.md)
+
+#### Returns
+
+[EncodeResult](./llama.native.encoderesult.md)
+
+### **EncodeAsync(LLamaBatch, CancellationToken)**
+
+
+
+```csharp
+public Task EncodeAsync(LLamaBatch batch, CancellationToken cancellationToken)
+```
+
+#### Parameters
+
+`batch` [LLamaBatch](./llama.native.llamabatch.md)
+
+`cancellationToken` [CancellationToken](https://docs.microsoft.com/en-us/dotnet/api/system.threading.cancellationtoken)
+
+#### Returns
+
+[Task<EncodeResult>](https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.task-1)
+
+### **Decode(LLamaBatch)**
+
+
+
+```csharp
+public DecodeResult Decode(LLamaBatch batch)
+```
+
+#### Parameters
+
+`batch` [LLamaBatch](./llama.native.llamabatch.md)
+
+#### Returns
+
+[DecodeResult](./llama.native.decoderesult.md)
+
+### **DecodeAsync(LLamaBatch, CancellationToken)**
+
+
+
+```csharp
+public Task DecodeAsync(LLamaBatch batch, CancellationToken cancellationToken)
+```
+
+#### Parameters
+
+`batch` [LLamaBatch](./llama.native.llamabatch.md)
+
+`cancellationToken` [CancellationToken](https://docs.microsoft.com/en-us/dotnet/api/system.threading.cancellationtoken)
+
+#### Returns
+
+[Task<DecodeResult>](https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.task-1)
+
+### **Decode(LLamaBatchEmbeddings)**
+
+
+
+```csharp
+public DecodeResult Decode(LLamaBatchEmbeddings batch)
+```
+
+#### Parameters
+
+`batch` [LLamaBatchEmbeddings](./llama.native.llamabatchembeddings.md)
+
+#### Returns
+
+[DecodeResult](./llama.native.decoderesult.md)
+
+### **DecodeAsync(LLamaBatchEmbeddings, CancellationToken)**
+
+
+
+```csharp
+public Task DecodeAsync(LLamaBatchEmbeddings batch, CancellationToken cancellationToken)
+```
+
+#### Parameters
+
+`batch` [LLamaBatchEmbeddings](./llama.native.llamabatchembeddings.md)
+
+`cancellationToken` [CancellationToken](https://docs.microsoft.com/en-us/dotnet/api/system.threading.cancellationtoken)
+
+#### Returns
+
+[Task<DecodeResult>](https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.task-1)
+
+### **DecodeAsync(List<LLamaToken>, LLamaSeqId, LLamaBatch, Int32)**
+
+
+
+```csharp
+public Task> DecodeAsync(List tokens, LLamaSeqId id, LLamaBatch batch, int n_past)
+```
+
+#### Parameters
+
+`tokens` [List<LLamaToken>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.list-1)
+
+`id` [LLamaSeqId](./llama.native.llamaseqid.md)
+
+`batch` [LLamaBatch](./llama.native.llamabatch.md)
+
+`n_past` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+
+#### Returns
+
+[Task<ValueTuple<DecodeResult, Int32, Int32>>](https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.task-1)
+A tuple containing the decode result, the number of tokens not yet decoded, and the total number of tokens decoded.
+
+### **Dispose()**
+
+```csharp
+public void Dispose()
+```
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.llamaembedder.md b/docs/xmldocs/llama.llamaembedder.md
index 333c856e5..8bf2c258a 100644
--- a/docs/xmldocs/llama.llamaembedder.md
+++ b/docs/xmldocs/llama.llamaembedder.md
@@ -1,43 +1,102 @@
+[`< Back`](./)
+
+---
+
# LLamaEmbedder
Namespace: LLama
-The embedder for LLama, which supports getting embeddings from text.
+Generate high-dimensional embedding vectors from text
+
+```csharp
+public sealed class LLamaEmbedder : System.IDisposable, Microsoft.Extensions.AI.IEmbeddingGenerator<string, Microsoft.Extensions.AI.Embedding<float>>
+```
+
+Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [LLamaEmbedder](./llama.llamaembedder.md)
+Implements [IDisposable](https://docs.microsoft.com/en-us/dotnet/api/system.idisposable), IEmbeddingGenerator<String, Embedding<Single>>, IEmbeddingGenerator
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute), [NullableAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullableattribute)
+
+## Properties
+
+### **EmbeddingSize**
+
+Dimension of embedding vectors
```csharp
-public class LLamaEmbedder
+public int EmbeddingSize { get; private set; }
```
-Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [LLamaEmbedder](./llama.llamaembedder.md)
+#### Property Value
+
+[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+
+### **Context**
+
+LLama Context
+
+```csharp
+public LLamaContext Context { get; private set; }
+```
+
+#### Property Value
+
+[LLamaContext](./llama.llamacontext.md)
## Constructors
-### **LLamaEmbedder(LLamaParams)**
+### **LLamaEmbedder(LLamaWeights, IContextParams, ILogger)**
+
+Create a new embedder, using the given LLamaWeights
```csharp
-public LLamaEmbedder(LLamaParams params)
+public LLamaEmbedder(LLamaWeights weights, IContextParams params, ILogger logger)
```
#### Parameters
-`params` [LLamaParams](./llama.llamaparams.md)
+`weights` [LLamaWeights](./llama.llamaweights.md)
+
+`params` [IContextParams](./llama.abstractions.icontextparams.md)
+
+`logger` ILogger
## Methods
-### **GetEmbeddings(String, Int32, Boolean)**
+### **Dispose()**
```csharp
-public Single[] GetEmbeddings(string text, int n_thread, bool add_bos)
+public void Dispose()
```
-#### Parameters
+### **GetEmbeddings(String, CancellationToken)**
+
+Get high dimensional embedding vectors for the given text. Depending on the pooling type used when constructing
+ this [LLamaEmbedder](./llama.llamaembedder.md), this may return one embedding vector per token or a single embedding vector for the entire string.
+
+```csharp
+public Task<IReadOnlyList<float[]>> GetEmbeddings(string input, CancellationToken cancellationToken)
+```
-`text` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+#### Parameters
-`n_thread` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+`input` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-`add_bos` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+`cancellationToken` [CancellationToken](https://docs.microsoft.com/en-us/dotnet/api/system.threading.cancellationtoken)
#### Returns
-[Single[]](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+[Task<IReadOnlyList<Single[]>>](https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.task-1)
+
+#### Exceptions
+
+[RuntimeError](./llama.exceptions.runtimeerror.md)
+
+[NotSupportedException](https://docs.microsoft.com/en-us/dotnet/api/system.notsupportedexception)
+
+**Remarks:**
+
+Embedding vectors are not normalized, consider using one of the extensions in [SpanNormalizationExtensions](./llama.extensions.spannormalizationextensions.md).
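+
+A minimal usage sketch (the model path is a placeholder, and the constructor is called with the optional logger omitted; an embedding-capable GGUF model is assumed):
+
+```csharp
+using LLama;
+using LLama.Common;
+
+// Illustrative sketch: load weights and generate embedding vectors.
+// "model.gguf" is a placeholder path, not a file shipped with LLamaSharp.
+var parameters = new ModelParams("model.gguf");
+using var weights = LLamaWeights.LoadFromFile(parameters);
+using var embedder = new LLamaEmbedder(weights, parameters);
+
+// One vector per token or one vector for the whole string,
+// depending on the pooling type of the model/context.
+IReadOnlyList<float[]> embeddings = await embedder.GetEmbeddings("Hello, world!");
+Console.WriteLine($"Vectors: {embeddings.Count}, dimension: {embedder.EmbeddingSize}");
+```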
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.llamamodel.md b/docs/xmldocs/llama.llamamodel.md
deleted file mode 100644
index 4c927a248..000000000
--- a/docs/xmldocs/llama.llamamodel.md
+++ /dev/null
@@ -1,226 +0,0 @@
-# LLamaModel
-
-Namespace: LLama
-
-```csharp
-public class LLamaModel : IChatModel
-```
-
-Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [LLamaModel](./llama.llamamodel.md)
-Implements [IChatModel](./llama.ichatmodel.md)
-
-## Properties
-
-### **Name**
-
-```csharp
-public string Name { get; set; }
-```
-
-#### Property Value
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **NativeHandle**
-
-```csharp
-public SafeLLamaContextHandle NativeHandle { get; }
-```
-
-#### Property Value
-
-[SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-
-## Constructors
-
-### **LLamaModel(String, String, Boolean, Boolean, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Dictionary<Int32, Single>, Int32, Single, Single, Single, Single, Single, Int32, Single, Single, Int32, Single, Single, String, String, String, String, List<String>, String, String, Boolean, Boolean, Boolean, Boolean, Boolean, Boolean, Boolean, Boolean, Boolean, Boolean, Boolean, Boolean, Boolean)**
-
-```csharp
-public LLamaModel(string model_path, string model_name, bool echo_input, bool verbose, int seed, int n_threads, int n_predict, int n_parts, int n_ctx, int n_batch, int n_keep, Dictionary logit_bias, int top_k, float top_p, float tfs_z, float typical_p, float temp, float repeat_penalty, int repeat_last_n, float frequency_penalty, float presence_penalty, int mirostat, float mirostat_tau, float mirostat_eta, string prompt, string path_session, string input_prefix, string input_suffix, List antiprompt, string lora_adapter, string lora_base, bool memory_f16, bool random_prompt, bool use_color, bool interactive, bool embedding, bool interactive_first, bool instruct, bool penalize_nl, bool perplexity, bool use_mmap, bool use_mlock, bool mem_test, bool verbose_prompt)
-```
-
-#### Parameters
-
-`model_path` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-`model_name` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-`echo_input` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-`verbose` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-`seed` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-`n_threads` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-`n_predict` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-`n_parts` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-`n_ctx` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-`n_batch` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-`n_keep` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-`logit_bias` [Dictionary<Int32, Single>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.dictionary-2)
-
-`top_k` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-`top_p` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-`tfs_z` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-`typical_p` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-`temp` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-`repeat_penalty` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-`repeat_last_n` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-`frequency_penalty` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-`presence_penalty` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-`mirostat` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-`mirostat_tau` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-`mirostat_eta` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-`prompt` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-`path_session` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-`input_prefix` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-`input_suffix` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-`antiprompt` [List<String>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.list-1)
-
-`lora_adapter` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-`lora_base` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-`memory_f16` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-`random_prompt` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-`use_color` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-`interactive` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-`embedding` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-`interactive_first` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-`instruct` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-`penalize_nl` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-`perplexity` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-`use_mmap` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-`use_mlock` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-`mem_test` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-`verbose_prompt` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-### **LLamaModel(LLamaParams, String, Boolean, Boolean)**
-
-```csharp
-public LLamaModel(LLamaParams params, string name, bool echo_input, bool verbose)
-```
-
-#### Parameters
-
-`params` [LLamaParams](./llama.llamaparams.md)
-
-`name` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-`echo_input` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-`verbose` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-## Methods
-
-### **WithPrompt(String)**
-
-```csharp
-public LLamaModel WithPrompt(string prompt)
-```
-
-#### Parameters
-
-`prompt` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-#### Returns
-
-[LLamaModel](./llama.llamamodel.md)
-
-### **WithPromptFile(String)**
-
-```csharp
-public LLamaModel WithPromptFile(string promptFileName)
-```
-
-#### Parameters
-
-`promptFileName` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-#### Returns
-
-[LLamaModel](./llama.llamamodel.md)
-
-### **InitChatPrompt(String)**
-
-```csharp
-public void InitChatPrompt(string prompt)
-```
-
-#### Parameters
-
-`prompt` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **InitChatAntiprompt(String[])**
-
-```csharp
-public void InitChatAntiprompt(String[] antiprompt)
-```
-
-#### Parameters
-
-`antiprompt` [String[]](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **Chat(String, String)**
-
-```csharp
-public IEnumerable Chat(string text, string prompt)
-```
-
-#### Parameters
-
-`text` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-`prompt` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-#### Returns
-
-[IEnumerable<String>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ienumerable-1)
-
-### **Call(String)**
-
-```csharp
-public IEnumerable Call(string text)
-```
-
-#### Parameters
-
-`text` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-#### Returns
-
-[IEnumerable<String>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ienumerable-1)
diff --git a/docs/xmldocs/llama.llamamodelv1.md b/docs/xmldocs/llama.llamamodelv1.md
deleted file mode 100644
index a4d02d001..000000000
--- a/docs/xmldocs/llama.llamamodelv1.md
+++ /dev/null
@@ -1,369 +0,0 @@
-# LLamaModelV1
-
-Namespace: LLama
-
-#### Caution
-
-This type is obsolete.
-
----
-
-```csharp
-public class LLamaModelV1
-```
-
-Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [LLamaModelV1](./llama.llamamodelv1.md)
-
-## Constructors
-
-### **LLamaModelV1(String, Int32, Int32, Int32, Boolean, Boolean, Boolean, Boolean, Boolean, Boolean, Int32, Int32, Int32, String, String, Boolean)**
-
-```csharp
-public LLamaModelV1(string model_path, int n_ctx, int n_parts, int seed, bool f16_kv, bool logits_all, bool vocab_only, bool use_mmap, bool use_mlock, bool embedding, int n_threads, int n_batch, int last_n_tokens_size, string lora_base, string lora_path, bool verbose)
-```
-
-#### Parameters
-
-`model_path` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-`n_ctx` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-`n_parts` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-`seed` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-`f16_kv` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-`logits_all` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-`vocab_only` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-`use_mmap` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-`use_mlock` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-`embedding` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-`n_threads` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-`n_batch` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-`last_n_tokens_size` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-`lora_base` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-`lora_path` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-`verbose` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-### **LLamaModelV1(LLamaModelV1)**
-
-```csharp
-public LLamaModelV1(LLamaModelV1 other)
-```
-
-#### Parameters
-
-`other` [LLamaModelV1](./llama.llamamodelv1.md)
-
-## Methods
-
-### **Tokenize(String)**
-
-```csharp
-public List Tokenize(string text)
-```
-
-#### Parameters
-
-`text` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-#### Returns
-
-[List<Int32>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.list-1)
-
-### **DeTokenize(IEnumerable<Int32>)**
-
-```csharp
-public string DeTokenize(IEnumerable tokens)
-```
-
-#### Parameters
-
-`tokens` [IEnumerable<Int32>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ienumerable-1)
-
-#### Returns
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **DeTokenize(Int32)**
-
-```csharp
-public string DeTokenize(int token)
-```
-
-#### Parameters
-
-`token` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-#### Returns
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **SetCache(LLamaCache)**
-
-```csharp
-public void SetCache(LLamaCache cache)
-```
-
-#### Parameters
-
-`cache` [LLamaCache](./llama.llamacache.md)
-
-### **Reset()**
-
-```csharp
-public void Reset()
-```
-
-### **Eval(List<Int32>)**
-
-```csharp
-public void Eval(List tokens)
-```
-
-#### Parameters
-
-`tokens` [List<Int32>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.list-1)
-
-### **Sample(Int32, Single, Single, Single, Single, Single)**
-
-```csharp
-public int Sample(int top_k, float top_p, float temp, float repeat_penalty, float frequency_penalty, float presence_penalty)
-```
-
-#### Parameters
-
-`top_k` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-`top_p` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-`temp` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-`repeat_penalty` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-`frequency_penalty` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-`presence_penalty` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-#### Returns
-
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-### **Generate(IEnumerable<Int32>, Int32, Single, Single, Single, Single, Single, Boolean)**
-
-```csharp
-public IEnumerable Generate(IEnumerable tokens, int top_k, float top_p, float temp, float repeat_penalty, float frequency_penalty, float presence_penalty, bool reset)
-```
-
-#### Parameters
-
-`tokens` [IEnumerable<Int32>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ienumerable-1)
-
-`top_k` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-`top_p` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-`temp` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-`repeat_penalty` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-`frequency_penalty` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-`presence_penalty` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-`reset` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-#### Returns
-
-[IEnumerable<Int32>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ienumerable-1)
-
-### **CreateEmbedding(String)**
-
-```csharp
-public Embedding CreateEmbedding(string input)
-```
-
-#### Parameters
-
-`input` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-#### Returns
-
-[Embedding](./llama.types.embedding.md)
-
-### **Embed(String)**
-
-```csharp
-public Single[] Embed(string input)
-```
-
-#### Parameters
-
-`input` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-#### Returns
-
-[Single[]](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-### **CreateCompletion(String, String, Int32, Single, Single, Int32, Boolean, String[], Single, Single, Single, Int32)**
-
-```csharp
-public IEnumerable CreateCompletion(string prompt, string suffix, int max_tokens, float temperature, float top_p, int logprobs, bool echo, String[] stop, float frequency_penalty, float presence_penalty, float repeat_penalty, int top_k)
-```
-
-#### Parameters
-
-`prompt` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-`suffix` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-`max_tokens` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-`temperature` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-`top_p` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-`logprobs` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-`echo` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-`stop` [String[]](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-`frequency_penalty` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-`presence_penalty` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-`repeat_penalty` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-`top_k` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-#### Returns
-
-[IEnumerable<CompletionChunk>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ienumerable-1)
-
-### **Call(String, String, Int32, Single, Single, Int32, Boolean, String[], Single, Single, Single, Int32)**
-
-```csharp
-public IEnumerable Call(string prompt, string suffix, int max_tokens, float temperature, float top_p, int logprobs, bool echo, String[] stop, float frequency_penalty, float presence_penalty, float repeat_penalty, int top_k)
-```
-
-#### Parameters
-
-`prompt` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-`suffix` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-`max_tokens` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-`temperature` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-`top_p` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-`logprobs` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-`echo` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-`stop` [String[]](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-`frequency_penalty` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-`presence_penalty` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-`repeat_penalty` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-`top_k` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-#### Returns
-
-[IEnumerable<CompletionChunk>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ienumerable-1)
-
-### **CreateChatCompletion(IEnumerable<ChatCompletionMessage>, Single, Single, Int32, String[], Int32, Single, Single, Single)**
-
-```csharp
-public IEnumerable CreateChatCompletion(IEnumerable messages, float temperature, float top_p, int top_k, String[] stop, int max_tokens, float presence_penalty, float frequency_penalty, float repeat_penalty)
-```
-
-#### Parameters
-
-`messages` [IEnumerable<ChatCompletionMessage>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ienumerable-1)
-
-`temperature` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-`top_p` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-`top_k` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-`stop` [String[]](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-`max_tokens` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-`presence_penalty` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-`frequency_penalty` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-`repeat_penalty` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-#### Returns
-
-[IEnumerable<ChatCompletionChunk>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ienumerable-1)
-
-### **SaveState()**
-
-```csharp
-public LLamaState SaveState()
-```
-
-#### Returns
-
-[LLamaState](./llama.llamastate.md)
-
-### **LoadState(LLamaState)**
-
-```csharp
-public void LoadState(LLamaState state)
-```
-
-#### Parameters
-
-`state` [LLamaState](./llama.llamastate.md)
-
-### **LongestTokenPrefix(IEnumerable<Int32>, IEnumerable<Int32>)**
-
-```csharp
-internal static int LongestTokenPrefix(IEnumerable a, IEnumerable b)
-```
-
-#### Parameters
-
-`a` [IEnumerable<Int32>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ienumerable-1)
-
-`b` [IEnumerable<Int32>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ienumerable-1)
-
-#### Returns
-
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-### **<CreateChatCompletion>g__GetRole|31_0(ChatCompletionMessage)**
-
-```csharp
-internal static string g__GetRole|31_0(ChatCompletionMessage message)
-```
-
-#### Parameters
-
-`message` [ChatCompletionMessage](./llama.types.chatcompletionmessage.md)
-
-#### Returns
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
diff --git a/docs/xmldocs/llama.llamaparams.md b/docs/xmldocs/llama.llamaparams.md
deleted file mode 100644
index cb74af2ad..000000000
--- a/docs/xmldocs/llama.llamaparams.md
+++ /dev/null
@@ -1,349 +0,0 @@
-# LLamaParams
-
-Namespace: LLama
-
-```csharp
-public struct LLamaParams
-```
-
-Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ValueType](https://docs.microsoft.com/en-us/dotnet/api/system.valuetype) → [LLamaParams](./llama.llamaparams.md)
-
-## Fields
-
-### **seed**
-
-```csharp
-public int seed;
-```
-
-### **n_threads**
-
-```csharp
-public int n_threads;
-```
-
-### **n_predict**
-
-```csharp
-public int n_predict;
-```
-
-### **n_parts**
-
-```csharp
-public int n_parts;
-```
-
-### **n_ctx**
-
-```csharp
-public int n_ctx;
-```
-
-### **n_batch**
-
-```csharp
-public int n_batch;
-```
-
-### **n_keep**
-
-```csharp
-public int n_keep;
-```
-
-### **logit_bias**
-
-```csharp
-public Dictionary logit_bias;
-```
-
-### **top_k**
-
-```csharp
-public int top_k;
-```
-
-### **top_p**
-
-```csharp
-public float top_p;
-```
-
-### **tfs_z**
-
-```csharp
-public float tfs_z;
-```
-
-### **typical_p**
-
-```csharp
-public float typical_p;
-```
-
-### **temp**
-
-```csharp
-public float temp;
-```
-
-### **repeat_penalty**
-
-```csharp
-public float repeat_penalty;
-```
-
-### **repeat_last_n**
-
-```csharp
-public int repeat_last_n;
-```
-
-### **frequency_penalty**
-
-```csharp
-public float frequency_penalty;
-```
-
-### **presence_penalty**
-
-```csharp
-public float presence_penalty;
-```
-
-### **mirostat**
-
-```csharp
-public int mirostat;
-```
-
-### **mirostat_tau**
-
-```csharp
-public float mirostat_tau;
-```
-
-### **mirostat_eta**
-
-```csharp
-public float mirostat_eta;
-```
-
-### **model**
-
-```csharp
-public string model;
-```
-
-### **prompt**
-
-```csharp
-public string prompt;
-```
-
-### **path_session**
-
-```csharp
-public string path_session;
-```
-
-### **input_prefix**
-
-```csharp
-public string input_prefix;
-```
-
-### **input_suffix**
-
-```csharp
-public string input_suffix;
-```
-
-### **antiprompt**
-
-```csharp
-public List antiprompt;
-```
-
-### **lora_adapter**
-
-```csharp
-public string lora_adapter;
-```
-
-### **lora_base**
-
-```csharp
-public string lora_base;
-```
-
-### **memory_f16**
-
-```csharp
-public bool memory_f16;
-```
-
-### **random_prompt**
-
-```csharp
-public bool random_prompt;
-```
-
-### **use_color**
-
-```csharp
-public bool use_color;
-```
-
-### **interactive**
-
-```csharp
-public bool interactive;
-```
-
-### **embedding**
-
-```csharp
-public bool embedding;
-```
-
-### **interactive_first**
-
-```csharp
-public bool interactive_first;
-```
-
-### **instruct**
-
-```csharp
-public bool instruct;
-```
-
-### **penalize_nl**
-
-```csharp
-public bool penalize_nl;
-```
-
-### **perplexity**
-
-```csharp
-public bool perplexity;
-```
-
-### **use_mmap**
-
-```csharp
-public bool use_mmap;
-```
-
-### **use_mlock**
-
-```csharp
-public bool use_mlock;
-```
-
-### **mem_test**
-
-```csharp
-public bool mem_test;
-```
-
-### **verbose_prompt**
-
-```csharp
-public bool verbose_prompt;
-```
-
-## Constructors
-
-### **LLamaParams(Int32, Int32, Int32, Int32, Int32, Int32, Int32, Dictionary<Int32, Single>, Int32, Single, Single, Single, Single, Single, Int32, Single, Single, Int32, Single, Single, String, String, String, String, String, List<String>, String, String, Boolean, Boolean, Boolean, Boolean, Boolean, Boolean, Boolean, Boolean, Boolean, Boolean, Boolean, Boolean, Boolean)**
-
-```csharp
-LLamaParams(int seed, int n_threads, int n_predict, int n_parts, int n_ctx, int n_batch, int n_keep, Dictionary logit_bias, int top_k, float top_p, float tfs_z, float typical_p, float temp, float repeat_penalty, int repeat_last_n, float frequency_penalty, float presence_penalty, int mirostat, float mirostat_tau, float mirostat_eta, string model, string prompt, string path_session, string input_prefix, string input_suffix, List antiprompt, string lora_adapter, string lora_base, bool memory_f16, bool random_prompt, bool use_color, bool interactive, bool embedding, bool interactive_first, bool instruct, bool penalize_nl, bool perplexity, bool use_mmap, bool use_mlock, bool mem_test, bool verbose_prompt)
-```
-
-#### Parameters
-
-`seed` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-`n_threads` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-`n_predict` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-`n_parts` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-`n_ctx` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-`n_batch` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-`n_keep` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-`logit_bias` [Dictionary<Int32, Single>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.dictionary-2)
-
-`top_k` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-`top_p` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-`tfs_z` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-`typical_p` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-`temp` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-`repeat_penalty` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-`repeat_last_n` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-`frequency_penalty` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-`presence_penalty` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-`mirostat` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-`mirostat_tau` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-`mirostat_eta` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-`model` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-`prompt` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-`path_session` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-`input_prefix` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-`input_suffix` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-`antiprompt` [List<String>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.list-1)
-
-`lora_adapter` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-`lora_base` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-`memory_f16` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-`random_prompt` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-`use_color` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-`interactive` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-`embedding` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-`interactive_first` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-`instruct` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-`penalize_nl` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-`perplexity` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-`use_mmap` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-`use_mlock` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-`mem_test` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-`verbose_prompt` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
diff --git a/docs/xmldocs/llama.llamaquantizer.md b/docs/xmldocs/llama.llamaquantizer.md
index 977185d72..4a8756391 100644
--- a/docs/xmldocs/llama.llamaquantizer.md
+++ b/docs/xmldocs/llama.llamaquantizer.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# LLamaQuantizer
Namespace: LLama
@@ -8,7 +12,8 @@ The quantizer to quantize the model.
public static class LLamaQuantizer
```
-Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [LLamaQuantizer](./llama.llamaquantizer.md)
+Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [LLamaQuantizer](./llama.llamaquantizer.md)
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute), [NullableAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullableattribute)
## Methods
@@ -81,3 +86,7 @@ Whether the quantization is successful.
#### Exceptions
[ArgumentException](https://docs.microsoft.com/en-us/dotnet/api/system.argumentexception)
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.llamareranker.md b/docs/xmldocs/llama.llamareranker.md
new file mode 100644
index 000000000..00e77dd36
--- /dev/null
+++ b/docs/xmldocs/llama.llamareranker.md
@@ -0,0 +1,131 @@
+[`< Back`](./)
+
+---
+
+# LLamaReranker
+
+Namespace: LLama
+
+Get rank scores between prompt and documents
+
+```csharp
+public sealed class LLamaReranker : System.IDisposable
+```
+
+Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [LLamaReranker](./llama.llamareranker.md)
+Implements [IDisposable](https://docs.microsoft.com/en-us/dotnet/api/system.idisposable)
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute), [NullableAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullableattribute)
+
+## Properties
+
+### **EmbeddingSize**
+
+Dimension of embedding vectors
+
+```csharp
+public int EmbeddingSize { get; }
+```
+
+#### Property Value
+
+[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+
+### **Context**
+
+LLama Context
+
+```csharp
+public LLamaContext Context { get; }
+```
+
+#### Property Value
+
+[LLamaContext](./llama.llamacontext.md)
+
+## Constructors
+
+### **LLamaReranker(LLamaWeights, IContextParams, ILogger)**
+
+Create a new reranker, using the given LLamaWeights
+
+```csharp
+public LLamaReranker(LLamaWeights weights, IContextParams params, ILogger logger)
+```
+
+#### Parameters
+
+`weights` [LLamaWeights](./llama.llamaweights.md)
+
+`params` [IContextParams](./llama.abstractions.icontextparams.md)
+
+`logger` ILogger
+
+## Methods
+
+### **Dispose()**
+
+```csharp
+public void Dispose()
+```
+
+### **GetRelevanceScores(String, IReadOnlyList<String>, Boolean, CancellationToken)**
+
+Retrieve relevance scores for input and documents by reranking, execute once.
+
+```csharp
+public Task<IReadOnlyList<float>> GetRelevanceScores(string input, IReadOnlyList<string> documents, bool normalize, CancellationToken cancellationToken)
+```
+
+#### Parameters
+
+`input` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+`documents` [IReadOnlyList<String>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ireadonlylist-1)
+
+`normalize` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+Whether to normalize the score to the range (0, 1)
+
+`cancellationToken` [CancellationToken](https://docs.microsoft.com/en-us/dotnet/api/system.threading.cancellationtoken)
+
+#### Returns
+
+[Task<IReadOnlyList<Single>>](https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.task-1)
+
+#### Exceptions
+
+[RuntimeError](./llama.exceptions.runtimeerror.md)
+
+[NotSupportedException](https://docs.microsoft.com/en-us/dotnet/api/system.notsupportedexception)
+
+### **GetRelevanceScoreWithTokenCount(String, String, Boolean, CancellationToken)**
+
+Retrieve relevance score for input and document by reranking
+
+```csharp
+public Task<(float, int)> GetRelevanceScoreWithTokenCount(string input, string document, bool normalize, CancellationToken cancellationToken)
+```
+
+#### Parameters
+
+`input` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+`document` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+`normalize` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+Whether to normalize the score to the range (0, 1)
+
+`cancellationToken` [CancellationToken](https://docs.microsoft.com/en-us/dotnet/api/system.threading.cancellationtoken)
+
+#### Returns
+
+[Task<ValueTuple<Single, Int32>>](https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.task-1)
+
+#### Exceptions
+
+[RuntimeError](./llama.exceptions.runtimeerror.md)
+
+[NotSupportedException](https://docs.microsoft.com/en-us/dotnet/api/system.notsupportedexception)
+
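+A minimal usage sketch based on the signatures above (the model path and parameter values are placeholders; `ModelParams` from `LLama.Common` implements both `IModelParams` and `IContextParams`):
+
+```csharp
+using LLama;
+using LLama.Common;
+
+var parameters = new ModelParams(@"path/to/reranker.gguf");
+using var weights = LLamaWeights.LoadFromFile(parameters);
+using var reranker = new LLamaReranker(weights, parameters);
+
+var documents = new[]
+{
+    "Llamas are domesticated camelids.",
+    "Paris is the capital of France.",
+};
+
+// Scores come back in the same order as the input documents.
+var scores = await reranker.GetRelevanceScores(
+    "What animals live in the Andes?", documents, normalize: true);
+```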
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.llamastate.md b/docs/xmldocs/llama.llamastate.md
deleted file mode 100644
index 4db410f03..000000000
--- a/docs/xmldocs/llama.llamastate.md
+++ /dev/null
@@ -1,160 +0,0 @@
-# LLamaState
-
-Namespace: LLama
-
-```csharp
-public class LLamaState : System.IEquatable`1[[LLama.LLamaState, LLamaSharp, Version=0.2.0.0, Culture=neutral, PublicKeyToken=null]]
-```
-
-Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [LLamaState](./llama.llamastate.md)
-Implements [IEquatable<LLamaState>](https://docs.microsoft.com/en-us/dotnet/api/system.iequatable-1)
-
-## Properties
-
-### **EvalTokens**
-
-```csharp
-public Queue EvalTokens { get; set; }
-```
-
-#### Property Value
-
-[Queue<Int32>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.queue-1)
-
-### **EvalLogits**
-
-```csharp
-public Queue EvalLogits { get; set; }
-```
-
-#### Property Value
-
-[Queue<Single[]>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.queue-1)
-
-### **State**
-
-```csharp
-public Byte[] State { get; set; }
-```
-
-#### Property Value
-
-[Byte[]](https://docs.microsoft.com/en-us/dotnet/api/system.byte)
-
-### **Size**
-
-```csharp
-public int Size { get; set; }
-```
-
-#### Property Value
-
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-## Constructors
-
-### **LLamaState(Queue<Int32>, Queue<Single[]>, Byte[], Int32)**
-
-```csharp
-public LLamaState(Queue EvalTokens, Queue EvalLogits, Byte[] State, int Size)
-```
-
-#### Parameters
-
-`EvalTokens` [Queue<Int32>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.queue-1)
-
-`EvalLogits` [Queue<Single[]>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.queue-1)
-
-`State` [Byte[]](https://docs.microsoft.com/en-us/dotnet/api/system.byte)
-
-`Size` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-## Methods
-
-### **ToString()**
-
-```csharp
-public string ToString()
-```
-
-#### Returns
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **PrintMembers(StringBuilder)**
-
-```csharp
-protected bool PrintMembers(StringBuilder builder)
-```
-
-#### Parameters
-
-`builder` [StringBuilder](https://docs.microsoft.com/en-us/dotnet/api/system.text.stringbuilder)
-
-#### Returns
-
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-### **GetHashCode()**
-
-```csharp
-public int GetHashCode()
-```
-
-#### Returns
-
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-### **Equals(Object)**
-
-```csharp
-public bool Equals(object obj)
-```
-
-#### Parameters
-
-`obj` [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object)
-
-#### Returns
-
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-### **Equals(LLamaState)**
-
-```csharp
-public bool Equals(LLamaState other)
-```
-
-#### Parameters
-
-`other` [LLamaState](./llama.llamastate.md)
-
-#### Returns
-
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-### **<Clone>$()**
-
-```csharp
-public LLamaState $()
-```
-
-#### Returns
-
-[LLamaState](./llama.llamastate.md)
-
-### **Deconstruct(Queue`1&, Queue`1&, Byte[]&, Int32&)**
-
-```csharp
-public void Deconstruct(Queue`1& EvalTokens, Queue`1& EvalLogits, Byte[]& State, Int32& Size)
-```
-
-#### Parameters
-
-`EvalTokens` [Queue`1&](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.queue-1&)
-
-`EvalLogits` [Queue`1&](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.queue-1&)
-
-`State` [Byte[]&](https://docs.microsoft.com/en-us/dotnet/api/system.byte&)
-
-`Size` [Int32&](https://docs.microsoft.com/en-us/dotnet/api/system.int32&)
diff --git a/docs/xmldocs/llama.llamatemplate.md b/docs/xmldocs/llama.llamatemplate.md
new file mode 100644
index 000000000..e35898bc2
--- /dev/null
+++ b/docs/xmldocs/llama.llamatemplate.md
@@ -0,0 +1,195 @@
+[`< Back`](./)
+
+---
+
+# LLamaTemplate
+
+Namespace: LLama
+
+Converts a sequence of messages into text according to a model template
+
+```csharp
+public sealed class LLamaTemplate
+```
+
+Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [LLamaTemplate](./llama.llamatemplate.md)
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute), [NullableAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullableattribute), [DefaultMemberAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.reflection.defaultmemberattribute)
+
+## Fields
+
+### **Encoding**
+
+The encoding algorithm to use
+
+```csharp
+public static Encoding Encoding;
+```
+
+## Properties
+
+### **Count**
+
+Number of messages added to this template
+
+```csharp
+public int Count { get; private set; }
+```
+
+#### Property Value
+
+[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+
+### **Item**
+
+```csharp
+public TextMessage Item { get; }
+```
+
+#### Property Value
+
+[TextMessage](./llama.llamatemplate.textmessage.md)
+
+### **AddAssistant**
+
+Whether to end the prompt with the token(s) that indicate the start of an assistant message.
+
+```csharp
+public bool AddAssistant { get; set; }
+```
+
+#### Property Value
+
+[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+
+## Constructors
+
+### **LLamaTemplate(SafeLlamaModelHandle, String, Boolean)**
+
+Construct a new template, using the default model template
+
+```csharp
+public LLamaTemplate(SafeLlamaModelHandle model, string name, bool strict)
+```
+
+#### Parameters
+
+`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
+The native handle of the loaded model.
+
+`name` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+The name of the template, in case there are many or differently named. Set to 'null' for the default behaviour of finding an appropriate match.
+
+`strict` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+Setting this to true will cause the call to throw if no valid templates are found.
+
+### **LLamaTemplate(LLamaWeights, Boolean)**
+
+Construct a new template, using the default model template
+
+```csharp
+public LLamaTemplate(LLamaWeights weights, bool strict)
+```
+
+#### Parameters
+
+`weights` [LLamaWeights](./llama.llamaweights.md)
+The handle of the loaded model's weights.
+
+`strict` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+Setting this to true will cause the call to throw if no valid templates are found.
+
+### **LLamaTemplate(String)**
+
+Construct a new template, using a custom template.
+
+```csharp
+public LLamaTemplate(string customTemplate)
+```
+
+#### Parameters
+
+`customTemplate` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+**Remarks:**
+
+Only a pre-defined list of templates is supported. See more: https://github.com/ggerganov/llama.cpp/wiki/Templates-supported-by-llama_chat_apply_template
+
+## Methods
+
+### **Add(String, String)**
+
+Add a new message to the end of this template
+
+```csharp
+public LLamaTemplate Add(string role, string content)
+```
+
+#### Parameters
+
+`role` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+`content` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+#### Returns
+
+[LLamaTemplate](./llama.llamatemplate.md)
+This template, for chaining calls.
+
+### **Add(TextMessage)**
+
+Add a new message to the end of this template
+
+```csharp
+public LLamaTemplate Add(TextMessage message)
+```
+
+#### Parameters
+
+`message` [TextMessage](./llama.llamatemplate.textmessage.md)
+
+#### Returns
+
+[LLamaTemplate](./llama.llamatemplate.md)
+This template, for chaining calls.
+
+### **RemoveAt(Int32)**
+
+Remove a message at the given index
+
+```csharp
+public LLamaTemplate RemoveAt(int index)
+```
+
+#### Parameters
+
+`index` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+
+#### Returns
+
+[LLamaTemplate](./llama.llamatemplate.md)
+This template, for chaining calls.
+
+### **Clear()**
+
+Remove all messages from the template and reset the internal state to accept/generate new messages
+
+```csharp
+public void Clear()
+```
+
+### **Apply()**
+
+Apply the template to the messages and return a span containing the results
+
+```csharp
+public ReadOnlySpan<byte> Apply()
+```
+
+#### Returns
+
+[ReadOnlySpan<Byte>](https://docs.microsoft.com/en-us/dotnet/api/system.readonlyspan-1)
+A span over the buffer that holds the applied template
+
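+A minimal sketch of building a prompt with the members above (assumes `weights` is an already-loaded `LLamaWeights`; decoding the result via the static `Encoding` field is an assumption):
+
+```csharp
+using LLama;
+
+var template = new LLamaTemplate(weights);
+template.Add("system", "You are a helpful assistant.")
+        .Add("user", "Hello!");
+template.AddAssistant = true;
+
+// Apply returns the rendered template as raw bytes.
+var bytes = template.Apply();
+var prompt = LLamaTemplate.Encoding.GetString(bytes);
+```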
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.llamatransforms.md b/docs/xmldocs/llama.llamatransforms.md
index 5b23a419c..b69b80e1a 100644
--- a/docs/xmldocs/llama.llamatransforms.md
+++ b/docs/xmldocs/llama.llamatransforms.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# LLamaTransforms
Namespace: LLama
@@ -17,3 +21,7 @@ Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object)
```csharp
public LLamaTransforms()
```
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.llamaweights.md b/docs/xmldocs/llama.llamaweights.md
new file mode 100644
index 000000000..7ef9fba93
--- /dev/null
+++ b/docs/xmldocs/llama.llamaweights.md
@@ -0,0 +1,207 @@
+[`< Back`](./)
+
+---
+
+# LLamaWeights
+
+Namespace: LLama
+
+A set of model weights, loaded into memory.
+
+```csharp
+public sealed class LLamaWeights : System.IDisposable
+```
+
+Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [LLamaWeights](./llama.llamaweights.md)
+Implements [IDisposable](https://docs.microsoft.com/en-us/dotnet/api/system.idisposable)
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute), [NullableAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullableattribute)
+
+## Properties
+
+### **NativeHandle**
+
+The native handle, which is used in the native APIs
+
+```csharp
+public SafeLlamaModelHandle NativeHandle { get; }
+```
+
+#### Property Value
+
+[SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
+
+**Remarks:**
+
+Be careful how you use this!
+
+### **ContextSize**
+
+Total number of tokens in the context
+
+```csharp
+public int ContextSize { get; }
+```
+
+#### Property Value
+
+[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+
+### **SizeInBytes**
+
+Get the size of this model in bytes
+
+```csharp
+public ulong SizeInBytes { get; }
+```
+
+#### Property Value
+
+[UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
+
+### **ParameterCount**
+
+Get the number of parameters in this model
+
+```csharp
+public ulong ParameterCount { get; }
+```
+
+#### Property Value
+
+[UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
+
+### **EmbeddingSize**
+
+Dimension of embedding vectors
+
+```csharp
+public int EmbeddingSize { get; }
+```
+
+#### Property Value
+
+[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+
+### **Vocab**
+
+Get the special tokens of this model
+
+```csharp
+public Vocabulary Vocab { get; }
+```
+
+#### Property Value
+
+[Vocabulary](./llama.native.safellamamodelhandle.vocabulary.md)
+
+### **Metadata**
+
+All metadata keys in this model
+
+```csharp
+public IReadOnlyDictionary<string, string> Metadata { get; set; }
+```
+
+#### Property Value
+
+[IReadOnlyDictionary<String, String>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ireadonlydictionary-2)
+
+## Methods
+
+### **LoadFromFile(IModelParams)**
+
+Load weights into memory
+
+```csharp
+public static LLamaWeights LoadFromFile(IModelParams params)
+```
+
+#### Parameters
+
+`params` [IModelParams](./llama.abstractions.imodelparams.md)
+
+#### Returns
+
+[LLamaWeights](./llama.llamaweights.md)
+
+### **LoadFromFileAsync(IModelParams, CancellationToken, IProgress<Single>)**
+
+Load weights into memory
+
+```csharp
+public static Task<LLamaWeights> LoadFromFileAsync(IModelParams params, CancellationToken token, IProgress<float> progressReporter)
+```
+
+#### Parameters
+
+`params` [IModelParams](./llama.abstractions.imodelparams.md)
+Parameters to use to load the model
+
+`token` [CancellationToken](https://docs.microsoft.com/en-us/dotnet/api/system.threading.cancellationtoken)
+A cancellation token that can interrupt model loading
+
+`progressReporter` [IProgress<Single>](https://docs.microsoft.com/en-us/dotnet/api/system.iprogress-1)
+Receives progress updates as the model loads (0 to 1)
+
+#### Returns
+
+[Task<LLamaWeights>](https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.task-1)
+
+#### Exceptions
+
+[LoadWeightsFailedException](./llama.exceptions.loadweightsfailedexception.md)
+Thrown if weights failed to load for any reason. e.g. Invalid file format or loading cancelled.
+
+[OperationCanceledException](https://docs.microsoft.com/en-us/dotnet/api/system.operationcanceledexception)
+Thrown if the cancellation token is cancelled.
+
+### **Dispose()**
+
+```csharp
+public void Dispose()
+```
+
+### **CreateContext(IContextParams, ILogger)**
+
+Create a llama_context using this model
+
+```csharp
+public LLamaContext CreateContext(IContextParams params, ILogger logger)
+```
+
+#### Parameters
+
+`params` [IContextParams](./llama.abstractions.icontextparams.md)
+
+`logger` ILogger
+
+#### Returns
+
+[LLamaContext](./llama.llamacontext.md)
+
+### **Tokenize(String, Boolean, Boolean, Encoding)**
+
+Convert a string of text into tokens
+
+```csharp
+public LLamaToken[] Tokenize(string text, bool add_bos, bool special, Encoding encoding)
+```
+
+#### Parameters
+
+`text` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+`add_bos` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+
+`special` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+Allow tokenizing special and/or control tokens which otherwise are not exposed and treated as plaintext.
+
+`encoding` [Encoding](https://docs.microsoft.com/en-us/dotnet/api/system.text.encoding)
+
+#### Returns
+
+[LLamaToken[]](./llama.native.llamatoken.md)
+
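+A minimal loading sketch using the members above (the path and parameter values are placeholders):
+
+```csharp
+using LLama;
+using LLama.Common;
+
+var parameters = new ModelParams(@"path/to/model.gguf");
+
+// Or: await LLamaWeights.LoadFromFileAsync(parameters, token, progress)
+using var weights = LLamaWeights.LoadFromFile(parameters);
+Console.WriteLine($"{weights.ParameterCount} parameters, {weights.SizeInBytes} bytes");
+
+using var context = weights.CreateContext(parameters);
+```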
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.llavaweights.md b/docs/xmldocs/llama.llavaweights.md
index c44fafeab..59a44ae29 100644
--- a/docs/xmldocs/llama.llavaweights.md
+++ b/docs/xmldocs/llama.llavaweights.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# LLavaWeights
Namespace: LLama
@@ -9,7 +13,8 @@ public sealed class LLavaWeights : System.IDisposable
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [LLavaWeights](./llama.llavaweights.md)
-Implements [IDisposable](https://docs.microsoft.com/en-us/dotnet/api/system.idisposable)
+Implements [IDisposable](https://docs.microsoft.com/en-us/dotnet/api/system.idisposable)
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute), [NullableAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullableattribute)
## Properties
@@ -48,6 +53,25 @@ path to the "mmproj" model file
[LLavaWeights](./llama.llavaweights.md)
+### **LoadFromFileAsync(String, CancellationToken)**
+
+Load weights into memory
+
+```csharp
+public static Task<LLavaWeights> LoadFromFileAsync(string mmProject, CancellationToken token)
+```
+
+#### Parameters
+
+`mmProject` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+path to the "mmproj" model file
+
+`token` [CancellationToken](https://docs.microsoft.com/en-us/dotnet/api/system.threading.cancellationtoken)
+
+#### Returns
+
+[Task<LLavaWeights>](https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.task-1)
+
### **CreateImageEmbeddings(LLamaContext, Byte[])**
Create the Image Embeddings from the bytes of an image.
@@ -62,11 +86,36 @@ public SafeLlavaImageEmbedHandle CreateImageEmbeddings(LLamaContext ctxLlama, By
`image` [Byte[]](https://docs.microsoft.com/en-us/dotnet/api/system.byte)
Image bytes. Supported formats:
- JPGPNGBMPTGA
+
+- JPG
+- PNG
+- BMP
+- TGA
+
+#### Returns
+
+[SafeLlavaImageEmbedHandle](./llama.native.safellavaimageembedhandle.md)
+
+### **CreateImageEmbeddings(Byte[], Int32)**
+
+Create the Image Embeddings.
+
+```csharp
+public SafeLlavaImageEmbedHandle CreateImageEmbeddings(Byte[] image, int threads)
+```
+
+#### Parameters
+
+`image` [Byte[]](https://docs.microsoft.com/en-us/dotnet/api/system.byte)
+Image in binary format (it supports jpeg format only)
+
+`threads` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+Number of threads to use
#### Returns
[SafeLlavaImageEmbedHandle](./llama.native.safellavaimageembedhandle.md)
+The SafeHandle of these embeddings
### **CreateImageEmbeddings(LLamaContext, String)**
@@ -82,7 +131,39 @@ public SafeLlavaImageEmbedHandle CreateImageEmbeddings(LLamaContext ctxLlama, st
`image` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
Path to the image file. Supported formats:
- JPGPNGBMPTGA
+
+- JPG
+- PNG
+- BMP
+- TGA
+
+#### Returns
+
+[SafeLlavaImageEmbedHandle](./llama.native.safellavaimageembedhandle.md)
+
+#### Exceptions
+
+[InvalidOperationException](https://docs.microsoft.com/en-us/dotnet/api/system.invalidoperationexception)
+
+### **CreateImageEmbeddings(String, Int32)**
+
+Create the Image Embeddings from the bytes of an image.
+
+```csharp
+public SafeLlavaImageEmbedHandle CreateImageEmbeddings(string image, int threads)
+```
+
+#### Parameters
+
+`image` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+Path to the image file. Supported formats:
+
+- JPG
+- PNG
+- BMP
+- TGA
+
+`threads` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
#### Returns
@@ -117,3 +198,7 @@ public bool EvalImageEmbed(LLamaContext ctxLlama, SafeLlavaImageEmbedHandle imag
```csharp
public void Dispose()
```
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.native.avxlevel.md b/docs/xmldocs/llama.native.avxlevel.md
new file mode 100644
index 000000000..038681ab2
--- /dev/null
+++ b/docs/xmldocs/llama.native.avxlevel.md
@@ -0,0 +1,29 @@
+[`< Back`](./)
+
+---
+
+# AvxLevel
+
+Namespace: LLama.Native
+
+Avx support configuration
+
+```csharp
+public enum AvxLevel
+```
+
+Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ValueType](https://docs.microsoft.com/en-us/dotnet/api/system.valuetype) → [Enum](https://docs.microsoft.com/en-us/dotnet/api/system.enum) → [AvxLevel](./llama.native.avxlevel.md)
+Implements [IComparable](https://docs.microsoft.com/en-us/dotnet/api/system.icomparable), [ISpanFormattable](https://docs.microsoft.com/en-us/dotnet/api/system.ispanformattable), [IFormattable](https://docs.microsoft.com/en-us/dotnet/api/system.iformattable), [IConvertible](https://docs.microsoft.com/en-us/dotnet/api/system.iconvertible)
+
+## Fields
+
+| Name | Value | Description |
+| --- | --: | --- |
+| None | 0 | No AVX |
+| Avx | 1 | Advanced Vector Extensions (supported by most processors after 2011) |
+| Avx2 | 2 | AVX2 (supported by most processors after 2013) |
+| Avx512 | 3 | AVX512 (supported by some processors after 2016, not widely supported) |
+
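+For instance, the desired AVX level can be selected before the native library is loaded (a sketch; `NativeLibraryConfig.All.WithAvx` is assumed from the companion native-library configuration API):
+
+```csharp
+using LLama.Native;
+
+// Must run before any other LLamaSharp call loads the native library.
+NativeLibraryConfig.All.WithAvx(AvxLevel.Avx2);
+```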
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.native.decoderesult.md b/docs/xmldocs/llama.native.decoderesult.md
index 86ff26b84..fe2f0fa4c 100644
--- a/docs/xmldocs/llama.native.decoderesult.md
+++ b/docs/xmldocs/llama.native.decoderesult.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# DecodeResult
Namespace: LLama.Native
@@ -9,7 +13,7 @@ public enum DecodeResult
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ValueType](https://docs.microsoft.com/en-us/dotnet/api/system.valuetype) → [Enum](https://docs.microsoft.com/en-us/dotnet/api/system.enum) → [DecodeResult](./llama.native.decoderesult.md)
-Implements [IComparable](https://docs.microsoft.com/en-us/dotnet/api/system.icomparable), [IFormattable](https://docs.microsoft.com/en-us/dotnet/api/system.iformattable), [IConvertible](https://docs.microsoft.com/en-us/dotnet/api/system.iconvertible)
+Implements [IComparable](https://docs.microsoft.com/en-us/dotnet/api/system.icomparable), [ISpanFormattable](https://docs.microsoft.com/en-us/dotnet/api/system.ispanformattable), [IFormattable](https://docs.microsoft.com/en-us/dotnet/api/system.iformattable), [IConvertible](https://docs.microsoft.com/en-us/dotnet/api/system.iconvertible)
## Fields
@@ -18,3 +22,10 @@ Implements [IComparable](https://docs.microsoft.com/en-us/dotnet/api/system.icom
| Error | -1 | An unspecified error |
| Ok | 0 | Ok. |
| NoKvSlot | 1 | Could not find a KV slot for the batch (try reducing the size of the batch or increase the context) |
+| ComputeAborted | 2 | Compute was aborted (e.g. due to callback request or timeout) |
+| AllocationFailed | -2 | Failed to allocate memory or reserve output space |
+| DecodeFailed | -3 | General failure during decode (e.g. internal error, slot failure) |
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.native.defaultnativelibraryselectingpolicy.md b/docs/xmldocs/llama.native.defaultnativelibraryselectingpolicy.md
new file mode 100644
index 000000000..e290485de
--- /dev/null
+++ b/docs/xmldocs/llama.native.defaultnativelibraryselectingpolicy.md
@@ -0,0 +1,46 @@
+[`< Back`](./)
+
+---
+
+# DefaultNativeLibrarySelectingPolicy
+
+Namespace: LLama.Native
+
+```csharp
+public class DefaultNativeLibrarySelectingPolicy : LLama.Abstractions.INativeLibrarySelectingPolicy
+```
+
+Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [DefaultNativeLibrarySelectingPolicy](./llama.native.defaultnativelibraryselectingpolicy.md)
+Implements [INativeLibrarySelectingPolicy](./llama.abstractions.inativelibraryselectingpolicy.md)
+
+## Constructors
+
+### **DefaultNativeLibrarySelectingPolicy()**
+
+```csharp
+public DefaultNativeLibrarySelectingPolicy()
+```
+
+## Methods
+
+### **Apply(Description, SystemInfo, LLamaLogCallback)**
+
+```csharp
+public IEnumerable<INativeLibrary> Apply(Description description, SystemInfo systemInfo, LLamaLogCallback logCallback)
+```
+
+#### Parameters
+
+`description` [Description](./llama.native.nativelibraryconfig.description.md)
+
+`systemInfo` [SystemInfo](./llama.native.systeminfo.md)
+
+`logCallback` [LLamaLogCallback](./llama.native.nativelogconfig.llamalogcallback.md)
+
+#### Returns
+
+[IEnumerable<INativeLibrary>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ienumerable-1)
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.native.encoderesult.md b/docs/xmldocs/llama.native.encoderesult.md
new file mode 100644
index 000000000..9106a5dcf
--- /dev/null
+++ b/docs/xmldocs/llama.native.encoderesult.md
@@ -0,0 +1,27 @@
+[`< Back`](./)
+
+---
+
+# EncodeResult
+
+Namespace: LLama.Native
+
+Return codes from llama_encode
+
+```csharp
+public enum EncodeResult
+```
+
+Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ValueType](https://docs.microsoft.com/en-us/dotnet/api/system.valuetype) → [Enum](https://docs.microsoft.com/en-us/dotnet/api/system.enum) → [EncodeResult](./llama.native.encoderesult.md)
+Implements [IComparable](https://docs.microsoft.com/en-us/dotnet/api/system.icomparable), [ISpanFormattable](https://docs.microsoft.com/en-us/dotnet/api/system.ispanformattable), [IFormattable](https://docs.microsoft.com/en-us/dotnet/api/system.iformattable), [IConvertible](https://docs.microsoft.com/en-us/dotnet/api/system.iconvertible)
+
+## Fields
+
+| Name | Value | Description |
+| --- | --: | --- |
+| Error | -1 | An unspecified error |
+| Ok | 0 | Ok. |
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.native.ggmltype.md b/docs/xmldocs/llama.native.ggmltype.md
index 2fa955d08..cd8361bc0 100644
--- a/docs/xmldocs/llama.native.ggmltype.md
+++ b/docs/xmldocs/llama.native.ggmltype.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# GGMLType
Namespace: LLama.Native
@@ -9,7 +13,7 @@ public enum GGMLType
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ValueType](https://docs.microsoft.com/en-us/dotnet/api/system.valuetype) → [Enum](https://docs.microsoft.com/en-us/dotnet/api/system.enum) → [GGMLType](./llama.native.ggmltype.md)
-Implements [IComparable](https://docs.microsoft.com/en-us/dotnet/api/system.icomparable), [IFormattable](https://docs.microsoft.com/en-us/dotnet/api/system.iformattable), [IConvertible](https://docs.microsoft.com/en-us/dotnet/api/system.iconvertible)
+Implements [IComparable](https://docs.microsoft.com/en-us/dotnet/api/system.icomparable), [ISpanFormattable](https://docs.microsoft.com/en-us/dotnet/api/system.ispanformattable), [IFormattable](https://docs.microsoft.com/en-us/dotnet/api/system.iformattable), [IConvertible](https://docs.microsoft.com/en-us/dotnet/api/system.iconvertible)
## Fields
@@ -33,3 +37,7 @@ Implements [IComparable](https://docs.microsoft.com/en-us/dotnet/api/system.icom
| GGML_TYPE_I16 | 17 | Integer, 16 bit |
| GGML_TYPE_I32 | 18 | Integer, 32 bit |
| GGML_TYPE_COUNT | 19 | The value of this entry is the count of the number of possible quant types. |
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.native.gpusplitmode.md b/docs/xmldocs/llama.native.gpusplitmode.md
index 756637bef..c7f384cb9 100644
--- a/docs/xmldocs/llama.native.gpusplitmode.md
+++ b/docs/xmldocs/llama.native.gpusplitmode.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# GPUSplitMode
Namespace: LLama.Native
@@ -9,7 +13,7 @@ public enum GPUSplitMode
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ValueType](https://docs.microsoft.com/en-us/dotnet/api/system.valuetype) → [Enum](https://docs.microsoft.com/en-us/dotnet/api/system.enum) → [GPUSplitMode](./llama.native.gpusplitmode.md)
-Implements [IComparable](https://docs.microsoft.com/en-us/dotnet/api/system.icomparable), [IFormattable](https://docs.microsoft.com/en-us/dotnet/api/system.iformattable), [IConvertible](https://docs.microsoft.com/en-us/dotnet/api/system.iconvertible)
+Implements [IComparable](https://docs.microsoft.com/en-us/dotnet/api/system.icomparable), [ISpanFormattable](https://docs.microsoft.com/en-us/dotnet/api/system.ispanformattable), [IFormattable](https://docs.microsoft.com/en-us/dotnet/api/system.iformattable), [IConvertible](https://docs.microsoft.com/en-us/dotnet/api/system.iconvertible)
**Remarks:**
@@ -21,4 +25,8 @@ llama_split_mode
| --- | --: | --- |
| None | 0 | Single GPU |
| Layer | 1 | Split layers and KV across GPUs |
-| Row | 2 | split rows across GPUs |
+| Row | 2 | Split layers and KV across GPUs, using tensor parallelism if supported |
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.native.icustomsampler.md b/docs/xmldocs/llama.native.icustomsampler.md
new file mode 100644
index 000000000..717c3f598
--- /dev/null
+++ b/docs/xmldocs/llama.native.icustomsampler.md
@@ -0,0 +1,87 @@
+[`< Back`](./)
+
+---
+
+# ICustomSampler
+
+Namespace: LLama.Native
+
+A custom sampler stage for modifying logits or selecting a token
+
+```csharp
+public interface ICustomSampler : System.IDisposable
+```
+
+Implements [IDisposable](https://docs.microsoft.com/en-us/dotnet/api/system.idisposable)
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute)
+
+## Properties
+
+### **Name**
+
+The human readable name of this stage
+
+```csharp
+public abstract string Name { get; }
+```
+
+#### Property Value
+
+[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+## Methods
+
+### **Apply(LLamaTokenDataArrayNative&)**
+
+Apply this stage to a set of logits.
+ This can modify logits or select a token (or both).
+ If logits are modified the Sorted flag must be set to false.
+
+```csharp
+void Apply(LLamaTokenDataArrayNative& tokenData)
+```
+
+#### Parameters
+
+`tokenData` [LLamaTokenDataArrayNative&](./llama.native.llamatokendataarraynative&.md)
+
+**Remarks:**
+
+If the logits are no longer sorted after the custom sampler has run it is critically important to
+ set Sorted=false. If unsure, always set it to false, this is a safe default.
+
+### **Accept(LLamaToken)**
+
+Update the internal state of the sampler when a token is chosen
+
+```csharp
+void Accept(LLamaToken token)
+```
+
+#### Parameters
+
+`token` [LLamaToken](./llama.native.llamatoken.md)
+
+### **Reset()**
+
+Reset the internal state of this sampler
+
+```csharp
+void Reset()
+```
+
+### **Clone()**
+
+Create a clone of this sampler
+
+```csharp
+ICustomSampler Clone()
+```
+
+#### Returns
+
+[ICustomSampler](./llama.native.icustomsampler.md)
+
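+A do-nothing implementation sketch of the interface (member names on `LLamaTokenDataArrayNative`, such as `Sorted`, are assumptions):
+
+```csharp
+using LLama.Native;
+
+sealed class PassthroughSampler : ICustomSampler
+{
+    public string Name => "passthrough";
+
+    public void Apply(ref LLamaTokenDataArrayNative tokenData)
+    {
+        // Modify logits here. If they may no longer be sorted afterwards,
+        // clear the sorted flag (the safe default per the remarks above).
+        tokenData.Sorted = false;
+    }
+
+    public void Accept(LLamaToken token) { }  // update internal state
+    public void Reset() { }                   // clear internal state
+    public ICustomSampler Clone() => new PassthroughSampler();
+    public void Dispose() { }
+}
+```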
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.native.llamaattentiontype.md b/docs/xmldocs/llama.native.llamaattentiontype.md
new file mode 100644
index 000000000..9d26960d4
--- /dev/null
+++ b/docs/xmldocs/llama.native.llamaattentiontype.md
@@ -0,0 +1,32 @@
+[`< Back`](./)
+
+---
+
+# LLamaAttentionType
+
+Namespace: LLama.Native
+
+
+
+```csharp
+public enum LLamaAttentionType
+```
+
+Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ValueType](https://docs.microsoft.com/en-us/dotnet/api/system.valuetype) → [Enum](https://docs.microsoft.com/en-us/dotnet/api/system.enum) → [LLamaAttentionType](./llama.native.llamaattentiontype.md)
+Implements [IComparable](https://docs.microsoft.com/en-us/dotnet/api/system.icomparable), [ISpanFormattable](https://docs.microsoft.com/en-us/dotnet/api/system.ispanformattable), [IFormattable](https://docs.microsoft.com/en-us/dotnet/api/system.iformattable), [IConvertible](https://docs.microsoft.com/en-us/dotnet/api/system.iconvertible)
+
+**Remarks:**
+
+llama_attention_type
+
+## Fields
+
+| Name | Value | Description |
+| --- | --: | --- |
+| Unspecified | -1 | Unspecified attention type. The library will attempt to find the best fit |
+| Causal | 0 | The causal mask will be applied, causing tokens to only see previous tokens in the same sequence, and not future ones |
+| NonCausal | 1 | The causal mask will not be applied, and tokens of the same sequence will be able to see each other |
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.native.llamabatch.md b/docs/xmldocs/llama.native.llamabatch.md
index 72a97796f..e37e681fd 100644
--- a/docs/xmldocs/llama.native.llamabatch.md
+++ b/docs/xmldocs/llama.native.llamabatch.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# LLamaBatch
Namespace: LLama.Native
@@ -8,7 +12,8 @@ A batch allows submitting multiple tokens to multiple sequences simultaneously
public class LLamaBatch
```
-Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [LLamaBatch](./llama.native.llamabatch.md)
+Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [LLamaBatch](./llama.native.llamabatch.md)
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute), [NullableAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullableattribute)
## Properties
@@ -48,20 +53,6 @@ public LLamaBatch()
## Methods
-### **ToNativeBatch(LLamaNativeBatch&)**
-
-```csharp
-internal GroupDisposable ToNativeBatch(LLamaNativeBatch& batch)
-```
-
-#### Parameters
-
-`batch` [LLamaNativeBatch&](./llama.native.llamanativebatch&.md)
-
-#### Returns
-
-[GroupDisposable](./llama.native.groupdisposable.md)
-
### **Add(LLamaToken, LLamaPos, ReadOnlySpan<LLamaSeqId>, Boolean)**
Add a single token to the batch at the same position in several sequences
@@ -187,18 +178,6 @@ Set TokenCount to zero for this batch
public void Clear()
```
-### **GetLogitPositions(Span<ValueTuple<LLamaSeqId, Int32>>)**
-
-Get the positions where logits can be sampled from
-
-```csharp
-internal Span> GetLogitPositions(Span> dest)
-```
-
-#### Parameters
-
-`dest` [Span<ValueTuple<LLamaSeqId, Int32>>](https://docs.microsoft.com/en-us/dotnet/api/system.span-1)
-
-#### Returns
+---
-[Span<ValueTuple<LLamaSeqId, Int32>>](https://docs.microsoft.com/en-us/dotnet/api/system.span-1)
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.native.llamabatchembeddings.md b/docs/xmldocs/llama.native.llamabatchembeddings.md
new file mode 100644
index 000000000..bd6dc673e
--- /dev/null
+++ b/docs/xmldocs/llama.native.llamabatchembeddings.md
@@ -0,0 +1,213 @@
+[`< Back`](./)
+
+---
+
+# LLamaBatchEmbeddings
+
+Namespace: LLama.Native
+
+An embeddings batch allows submitting embeddings to multiple sequences simultaneously
+
+```csharp
+public class LLamaBatchEmbeddings
+```
+
+Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [LLamaBatchEmbeddings](./llama.native.llamabatchembeddings.md)
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute), [NullableAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullableattribute)
+
+## Properties
+
+### **EmbeddingDimensions**
+
+Size of an individual embedding
+
+```csharp
+public int EmbeddingDimensions { get; }
+```
+
+#### Property Value
+
+[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+
+### **EmbeddingsCount**
+
+The number of items in this batch
+
+```csharp
+public int EmbeddingsCount { get; private set; }
+```
+
+#### Property Value
+
+[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+
+### **SequenceCapacity**
+
+Maximum number of sequences an item can be assigned to (automatically grows if exceeded)
+
+```csharp
+public int SequenceCapacity { get; private set; }
+```
+
+#### Property Value
+
+[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+
+## Constructors
+
+### **LLamaBatchEmbeddings(Int32)**
+
+Create a new batch for submitting inputs to llama.cpp
+
+```csharp
+public LLamaBatchEmbeddings(int embeddingDimensions)
+```
+
+#### Parameters
+
+`embeddingDimensions` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+
+## Methods
+
+### **Add(ReadOnlySpan<Single>, LLamaPos, ReadOnlySpan<LLamaSeqId>, Boolean)**
+
+Add a single embedding to the batch at the same position in several sequences
+
+```csharp
+public int Add(ReadOnlySpan embedding, LLamaPos pos, ReadOnlySpan sequences, bool logits)
+```
+
+#### Parameters
+
+`embedding` [ReadOnlySpan<Single>](https://docs.microsoft.com/en-us/dotnet/api/system.readonlyspan-1)
+The embedding to add
+
+`pos` [LLamaPos](./llama.native.llamapos.md)
+The position to add it at
+
+`sequences` [ReadOnlySpan<LLamaSeqId>](https://docs.microsoft.com/en-us/dotnet/api/system.readonlyspan-1)
+The set of sequences to add this token to
+
+`logits` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+
+#### Returns
+
+[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+The index that the token was added at. Use this for GetLogitsIth
+
+**Remarks:**
+
+https://github.com/ggerganov/llama.cpp/blob/ad939626577cd25b462e8026cc543efb71528472/common/common.cpp#L829C2-L829C2
+
+### **Add(ReadOnlySpan<Single>, LLamaPos, LLamaSeqId, Boolean)**
+
+Add a single embedding to the batch for a single sequence
+
+```csharp
+public int Add(ReadOnlySpan embedding, LLamaPos pos, LLamaSeqId sequence, bool logits)
+```
+
+#### Parameters
+
+`embedding` [ReadOnlySpan<Single>](https://docs.microsoft.com/en-us/dotnet/api/system.readonlyspan-1)
+
+`pos` [LLamaPos](./llama.native.llamapos.md)
+
+`sequence` [LLamaSeqId](./llama.native.llamaseqid.md)
+
+`logits` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+
+#### Returns
+
+[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+The index that the token was added at. Use this for GetLogitsIth
+
+### **Add<TParam>(TParam, WriteEmbeddingsDelegate<TParam>, LLamaPos, ReadOnlySpan<LLamaSeqId>, Boolean)**
+
+Add a single embedding to the batch at the same position in several sequences
+
+```csharp
+public int Add(TParam parameter, WriteEmbeddingsDelegate write, LLamaPos pos, ReadOnlySpan sequences, bool logits)
+```
+
+#### Type Parameters
+
+`TParam`
+Type of userdata passed to write delegate
+
+#### Parameters
+
+`parameter` TParam
+Userdata passed to write delegate
+
+`write` WriteEmbeddingsDelegate<TParam>
+Delegate called once to write data into a span
+
+`pos` [LLamaPos](./llama.native.llamapos.md)
+Position to write this embedding to
+
+`sequences` [ReadOnlySpan<LLamaSeqId>](https://docs.microsoft.com/en-us/dotnet/api/system.readonlyspan-1)
+All sequences to assign this embedding to
+
+`logits` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+Whether logits should be generated for this embedding
+
+#### Returns
+
+[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+The index that the token was added at. Use this for GetLogitsIth
+
+**Remarks:**
+
+https://github.com/ggerganov/llama.cpp/blob/ad939626577cd25b462e8026cc543efb71528472/common/common.cpp#L829C2-L829C2
+
+### **Add<TParam>(TParam, WriteEmbeddingsDelegate<TParam>, LLamaPos, LLamaSeqId, Boolean)**
+
+Add a single embedding to the batch at a position for one sequence
+
+```csharp
+public int Add(TParam parameter, WriteEmbeddingsDelegate write, LLamaPos pos, LLamaSeqId sequence, bool logits)
+```
+
+#### Type Parameters
+
+`TParam`
+Type of userdata passed to write delegate
+
+#### Parameters
+
+`parameter` TParam
+Userdata passed to write delegate
+
+`write` WriteEmbeddingsDelegate<TParam>
+Delegate called once to write data into a span
+
+`pos` [LLamaPos](./llama.native.llamapos.md)
+Position to write this embedding to
+
+`sequence` [LLamaSeqId](./llama.native.llamaseqid.md)
+Sequence to assign this embedding to
+
+`logits` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+Whether logits should be generated for this embedding
+
+#### Returns
+
+[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+The index that the token was added at. Use this for GetLogitsIth
+
+**Remarks:**
+
+https://github.com/ggerganov/llama.cpp/blob/ad939626577cd25b462e8026cc543efb71528472/common/common.cpp#L829C2-L829C2
+
+### **Clear()**
+
+Set EmbeddingsCount to zero for this batch
+
+```csharp
+public void Clear()
+```
+
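+A minimal usage sketch (illustrative only, not generated from the library; the embedding dimension of 768 and the integer conversions to `LLamaPos`/`LLamaSeqId` are assumptions):
+
+```csharp
+// Create a batch sized for the model's embedding dimension.
+var batch = new LLamaBatchEmbeddings(768);
+
+// Add one embedding to sequence 0 at position 0, requesting logits for it.
+float[] embedding = new float[768];
+int index = batch.Add(embedding, pos: 0, sequence: (LLamaSeqId)0, logits: true);
+
+// ... decode the batch, then reuse it.
+batch.Clear();
+```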
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.native.llamabeamsstate.md b/docs/xmldocs/llama.native.llamabeamsstate.md
deleted file mode 100644
index 5bdf73d4a..000000000
--- a/docs/xmldocs/llama.native.llamabeamsstate.md
+++ /dev/null
@@ -1,45 +0,0 @@
-# LLamaBeamsState
-
-Namespace: LLama.Native
-
-Passed to beam_search_callback function.
- Whenever 0 < common_prefix_length, this number of tokens should be copied from any of the beams
- (e.g. beams[0]) as they will be removed (shifted) from all beams in all subsequent callbacks.
-
-```csharp
-public struct LLamaBeamsState
-```
-
-Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ValueType](https://docs.microsoft.com/en-us/dotnet/api/system.valuetype) → [LLamaBeamsState](./llama.native.llamabeamsstate.md)
-
-## Fields
-
-### **CommonPrefixLength**
-
-Current max length of prefix tokens shared by all beams.
-
-```csharp
-public ulong CommonPrefixLength;
-```
-
-### **LastCall**
-
-True iff this is the last callback invocation.
-
-```csharp
-public bool LastCall;
-```
-
-## Properties
-
-### **Beams**
-
-The current state of each beam
-
-```csharp
-public Span Beams { get; }
-```
-
-#### Property Value
-
-[Span<LLamaBeamView>](https://docs.microsoft.com/en-us/dotnet/api/system.span-1)
diff --git a/docs/xmldocs/llama.native.llamabeamview.md b/docs/xmldocs/llama.native.llamabeamview.md
deleted file mode 100644
index f23eb95c4..000000000
--- a/docs/xmldocs/llama.native.llamabeamview.md
+++ /dev/null
@@ -1,43 +0,0 @@
-# LLamaBeamView
-
-Namespace: LLama.Native
-
-Information about a single beam in a beam search
-
-```csharp
-public struct LLamaBeamView
-```
-
-Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ValueType](https://docs.microsoft.com/en-us/dotnet/api/system.valuetype) → [LLamaBeamView](./llama.native.llamabeamview.md)
-
-## Fields
-
-### **CumulativeProbability**
-
-Cumulative beam probability (renormalized relative to all beams)
-
-```csharp
-public float CumulativeProbability;
-```
-
-### **EndOfBeam**
-
-Callback should set this to true when a beam is at end-of-beam.
-
-```csharp
-public bool EndOfBeam;
-```
-
-## Properties
-
-### **Tokens**
-
-Tokens in this beam
-
-```csharp
-public Span Tokens { get; }
-```
-
-#### Property Value
-
-[Span<LLamaToken>](https://docs.microsoft.com/en-us/dotnet/api/system.span-1)
diff --git a/docs/xmldocs/llama.native.llamachatmessage.md b/docs/xmldocs/llama.native.llamachatmessage.md
index 7b0c90358..f89cf7354 100644
--- a/docs/xmldocs/llama.native.llamachatmessage.md
+++ b/docs/xmldocs/llama.native.llamachatmessage.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# LLamaChatMessage
Namespace: LLama.Native
@@ -18,12 +22,20 @@ llama_chat_message
### **role**
+Pointer to the null terminated bytes that make up the role string
+
```csharp
public Byte* role;
```
### **content**
+Pointer to the null terminated bytes that make up the content string
+
```csharp
public Byte* content;
```
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.native.llamacontextparams.md b/docs/xmldocs/llama.native.llamacontextparams.md
index 2bb397c64..3be7f1587 100644
--- a/docs/xmldocs/llama.native.llamacontextparams.md
+++ b/docs/xmldocs/llama.native.llamacontextparams.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# LLamaContextParams
Namespace: LLama.Native
@@ -10,15 +14,12 @@ public struct LLamaContextParams
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ValueType](https://docs.microsoft.com/en-us/dotnet/api/system.valuetype) → [LLamaContextParams](./llama.native.llamacontextparams.md)
-## Fields
-
-### **seed**
+**Remarks:**
-RNG seed, -1 for random
+Changing the default values of parameters marked as [EXPERIMENTAL] may cause crashes or incorrect results in certain configurations.
+ https://github.com/ggerganov/llama.cpp/pull/7544
-```csharp
-public uint seed;
-```
+## Fields
### **n_ctx**
@@ -30,18 +31,34 @@ public uint n_ctx;
### **n_batch**
-prompt processing batch size
+logical maximum batch size that can be submitted to llama_decode
```csharp
public uint n_batch;
```
+### **n_ubatch**
+
+physical maximum batch size
+
+```csharp
+public uint n_ubatch;
+```
+
+### **n_seq_max**
+
+max number of sequences (i.e. distinct states for recurrent models)
+
+```csharp
+public uint n_seq_max;
+```
+
### **n_threads**
number of threads to use for generation
```csharp
-public uint n_threads;
+public int n_threads;
```
### **n_threads_batch**
@@ -49,7 +66,7 @@ public uint n_threads;
number of threads to use for batch processing
```csharp
-public uint n_threads_batch;
+public int n_threads_batch;
```
### **rope_scaling_type**
@@ -60,6 +77,22 @@ RoPE scaling type, from `enum llama_rope_scaling_type`
public RopeScalingType rope_scaling_type;
```
+### **llama_pooling_type**
+
+whether to pool (sum) embedding results by sequence id
+
+```csharp
+public LLamaPoolingType llama_pooling_type;
+```
+
+### **attention_type**
+
+Attention type to use for embeddings
+
+```csharp
+public LLamaAttentionType attention_type;
+```
+
### **rope_freq_base**
RoPE base frequency, 0 = from model
@@ -142,7 +175,7 @@ public IntPtr cb_eval_user_data;
### **type_k**
-data type for K cache
+data type for K cache. EXPERIMENTAL
```csharp
public GGMLType type_k;
@@ -150,20 +183,36 @@ public GGMLType type_k;
### **type_v**
-data type for V cache
+data type for V cache. EXPERIMENTAL
```csharp
public GGMLType type_v;
```
+### **abort_callback**
+
+ggml_abort_callback
+
+```csharp
+public IntPtr abort_callback;
+```
+
+### **abort_callback_user_data**
+
+User data passed into abort_callback
+
+```csharp
+public IntPtr abort_callback_user_data;
+```
+
## Properties
-### **embedding**
+### **embeddings**
-embedding mode only
+if true, extract embeddings (together with logits)
```csharp
-public bool embedding { get; set; }
+public bool embeddings { get; set; }
```
#### Property Value
@@ -182,14 +231,44 @@ public bool offload_kqv { get; set; }
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-### **do_pooling**
+### **flash_attention**
+
+whether to use flash attention. EXPERIMENTAL
+
+```csharp
+public bool flash_attention { get; set; }
+```
+
+#### Property Value
+
+[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+
+### **no_perf**
-Whether to pool (sum) embedding results by sequence id (ignored if no pooling layer)
+whether to measure performance timings
```csharp
-public bool do_pooling { get; set; }
+public bool no_perf { get; set; }
```
#### Property Value
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+
+## Methods
+
+### **Default()**
+
+Get the default LLamaContextParams
+
+```csharp
+LLamaContextParams Default()
+```
+
+#### Returns
+
+[LLamaContextParams](./llama.native.llamacontextparams.md)
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.native.llamaftype.md b/docs/xmldocs/llama.native.llamaftype.md
index 6f982cf12..158c2d9fb 100644
--- a/docs/xmldocs/llama.native.llamaftype.md
+++ b/docs/xmldocs/llama.native.llamaftype.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# LLamaFtype
Namespace: LLama.Native
@@ -9,39 +13,50 @@ public enum LLamaFtype
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ValueType](https://docs.microsoft.com/en-us/dotnet/api/system.valuetype) → [Enum](https://docs.microsoft.com/en-us/dotnet/api/system.enum) → [LLamaFtype](./llama.native.llamaftype.md)
-Implements [IComparable](https://docs.microsoft.com/en-us/dotnet/api/system.icomparable), [IFormattable](https://docs.microsoft.com/en-us/dotnet/api/system.iformattable), [IConvertible](https://docs.microsoft.com/en-us/dotnet/api/system.iconvertible)
+Implements [IComparable](https://docs.microsoft.com/en-us/dotnet/api/system.icomparable), [ISpanFormattable](https://docs.microsoft.com/en-us/dotnet/api/system.ispanformattable), [IFormattable](https://docs.microsoft.com/en-us/dotnet/api/system.iformattable), [IConvertible](https://docs.microsoft.com/en-us/dotnet/api/system.iconvertible)
+
+**Remarks:**
+
+C# representation of llama_ftype
## Fields
| Name | Value | Description |
| --- | --: | --- |
-| LLAMA_FTYPE_ALL_F32 | 0 | All f32 |
-| LLAMA_FTYPE_MOSTLY_F16 | 1 | Mostly f16 |
-| LLAMA_FTYPE_MOSTLY_Q8_0 | 7 | Mostly 8 bit |
-| LLAMA_FTYPE_MOSTLY_Q4_0 | 2 | Mostly 4 bit |
-| LLAMA_FTYPE_MOSTLY_Q4_1 | 3 | Mostly 4 bit |
-| LLAMA_FTYPE_MOSTLY_Q4_1_SOME_F16 | 4 | Mostly 4 bit, tok_embeddings.weight and output.weight are f16 |
-| LLAMA_FTYPE_MOSTLY_Q5_0 | 8 | Mostly 5 bit |
-| LLAMA_FTYPE_MOSTLY_Q5_1 | 9 | Mostly 5 bit |
-| LLAMA_FTYPE_MOSTLY_Q2_K | 10 | K-Quant 2 bit |
-| LLAMA_FTYPE_MOSTLY_Q3_K_S | 11 | K-Quant 3 bit (Small) |
-| LLAMA_FTYPE_MOSTLY_Q3_K_M | 12 | K-Quant 3 bit (Medium) |
-| LLAMA_FTYPE_MOSTLY_Q3_K_L | 13 | K-Quant 3 bit (Large) |
-| LLAMA_FTYPE_MOSTLY_Q4_K_S | 14 | K-Quant 4 bit (Small) |
-| LLAMA_FTYPE_MOSTLY_Q4_K_M | 15 | K-Quant 4 bit (Medium) |
-| LLAMA_FTYPE_MOSTLY_Q5_K_S | 16 | K-Quant 5 bit (Small) |
-| LLAMA_FTYPE_MOSTLY_Q5_K_M | 17 | K-Quant 5 bit (Medium) |
-| LLAMA_FTYPE_MOSTLY_Q6_K | 18 | K-Quant 6 bit |
-| LLAMA_FTYPE_MOSTLY_IQ2_XXS | 19 | except 1d tensors |
-| LLAMA_FTYPE_MOSTLY_IQ2_XS | 20 | except 1d tensors |
-| LLAMA_FTYPE_MOSTLY_Q2_K_S | 21 | except 1d tensors |
-| LLAMA_FTYPE_MOSTLY_IQ3_K_XS | 22 | except 1d tensors |
-| LLAMA_FTYPE_MOSTLY_IQ3_XXS | 23 | except 1d tensors |
-| LLAMA_FTYPE_MOSTLY_IQ1_S | 24 | except 1d tensors |
-| LLAMA_FTYPE_MOSTLY_IQ4_NL | 25 | except 1d tensors |
-| LLAMA_FTYPE_MOSTLY_IQ3_S | 26 | except 1d tensors |
-| LLAMA_FTYPE_MOSTLY_IQ3_M | 27 | except 1d tensors |
-| LLAMA_FTYPE_MOSTLY_IQ2_S | 28 | except 1d tensors |
-| LLAMA_FTYPE_MOSTLY_IQ2_M | 29 | except 1d tensors |
-| LLAMA_FTYPE_MOSTLY_IQ4_XS | 30 | except 1d tensors |
-| LLAMA_FTYPE_GUESSED | 1024 | File type was not specified |
+| ALL_F32 | 0 | All f32 |
+| MOSTLY_F16 | 1 | Mostly f16 |
+| MOSTLY_Q8_0 | 7 | Mostly 8 bit |
+| MOSTLY_Q4_0 | 2 | Mostly 4 bit |
+| MOSTLY_Q4_1 | 3 | Mostly 4 bit |
+| MOSTLY_Q5_0 | 8 | Mostly 5 bit |
+| MOSTLY_Q5_1 | 9 | Mostly 5 bit |
+| MOSTLY_Q2_K | 10 | K-Quant 2 bit |
+| MOSTLY_Q3_K_S | 11 | K-Quant 3 bit (Small) |
+| MOSTLY_Q3_K_M | 12 | K-Quant 3 bit (Medium) |
+| MOSTLY_Q3_K_L | 13 | K-Quant 3 bit (Large) |
+| MOSTLY_Q4_K_S | 14 | K-Quant 4 bit (Small) |
+| MOSTLY_Q4_K_M | 15 | K-Quant 4 bit (Medium) |
+| MOSTLY_Q5_K_S | 16 | K-Quant 5 bit (Small) |
+| MOSTLY_Q5_K_M | 17 | K-Quant 5 bit (Medium) |
+| MOSTLY_Q6_K | 18 | K-Quant 6 bit |
+| MOSTLY_IQ2_XXS | 19 | except 1d tensors |
+| MOSTLY_IQ2_XS | 20 | except 1d tensors |
+| MOSTLY_Q2_K_S | 21 | except 1d tensors |
+| MOSTLY_IQ3_K_XS | 22 | except 1d tensors |
+| MOSTLY_IQ3_XXS | 23 | except 1d tensors |
+| MOSTLY_IQ1_S | 24 | except 1d tensors |
+| MOSTLY_IQ4_NL | 25 | except 1d tensors |
+| MOSTLY_IQ3_S | 26 | except 1d tensors |
+| MOSTLY_IQ3_M | 27 | except 1d tensors |
+| MOSTLY_IQ2_S | 28 | except 1d tensors |
+| MOSTLY_IQ2_M | 29 | except 1d tensors |
+| MOSTLY_IQ4_XS | 30 | except 1d tensors |
+| MOSTLY_IQ1_M | 31 | except 1d tensors |
+| MOSTLY_BF16 | 32 | except 1d tensors |
+| LLAMA_FTYPE_MOSTLY_TQ1_0 | 36 | except 1d tensors |
+| LLAMA_FTYPE_MOSTLY_TQ2_0 | 37 | except 1d tensors |
+| GUESSED | 1024 | File type was not specified |
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.native.llamagrammarelement.md b/docs/xmldocs/llama.native.llamagrammarelement.md
deleted file mode 100644
index 60bb882a6..000000000
--- a/docs/xmldocs/llama.native.llamagrammarelement.md
+++ /dev/null
@@ -1,106 +0,0 @@
-# LLamaGrammarElement
-
-Namespace: LLama.Native
-
-An element of a grammar
-
-```csharp
-public struct LLamaGrammarElement
-```
-
-Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ValueType](https://docs.microsoft.com/en-us/dotnet/api/system.valuetype) → [LLamaGrammarElement](./llama.native.llamagrammarelement.md)
-Implements [IEquatable<LLamaGrammarElement>](https://docs.microsoft.com/en-us/dotnet/api/system.iequatable-1)
-
-## Fields
-
-### **Type**
-
-The type of this element
-
-```csharp
-public LLamaGrammarElementType Type;
-```
-
-### **Value**
-
-Unicode code point or rule ID
-
-```csharp
-public uint Value;
-```
-
-## Constructors
-
-### **LLamaGrammarElement(LLamaGrammarElementType, UInt32)**
-
-Construct a new LLamaGrammarElement
-
-```csharp
-LLamaGrammarElement(LLamaGrammarElementType type, uint value)
-```
-
-#### Parameters
-
-`type` [LLamaGrammarElementType](./llama.native.llamagrammarelementtype.md)
-
-`value` [UInt32](https://docs.microsoft.com/en-us/dotnet/api/system.uint32)
-
-## Methods
-
-### **IsCharElement()**
-
-```csharp
-bool IsCharElement()
-```
-
-#### Returns
-
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-### **ToString()**
-
-```csharp
-string ToString()
-```
-
-#### Returns
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **GetHashCode()**
-
-```csharp
-int GetHashCode()
-```
-
-#### Returns
-
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-### **Equals(Object)**
-
-```csharp
-bool Equals(object obj)
-```
-
-#### Parameters
-
-`obj` [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object)
-
-#### Returns
-
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-### **Equals(LLamaGrammarElement)**
-
-```csharp
-bool Equals(LLamaGrammarElement other)
-```
-
-#### Parameters
-
-`other` [LLamaGrammarElement](./llama.native.llamagrammarelement.md)
-
-#### Returns
-
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
diff --git a/docs/xmldocs/llama.native.llamagrammarelementtype.md b/docs/xmldocs/llama.native.llamagrammarelementtype.md
deleted file mode 100644
index bf69e5a7c..000000000
--- a/docs/xmldocs/llama.native.llamagrammarelementtype.md
+++ /dev/null
@@ -1,24 +0,0 @@
-# LLamaGrammarElementType
-
-Namespace: LLama.Native
-
-grammar element type
-
-```csharp
-public enum LLamaGrammarElementType
-```
-
-Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ValueType](https://docs.microsoft.com/en-us/dotnet/api/system.valuetype) → [Enum](https://docs.microsoft.com/en-us/dotnet/api/system.enum) → [LLamaGrammarElementType](./llama.native.llamagrammarelementtype.md)
-Implements [IComparable](https://docs.microsoft.com/en-us/dotnet/api/system.icomparable), [IFormattable](https://docs.microsoft.com/en-us/dotnet/api/system.iformattable), [IConvertible](https://docs.microsoft.com/en-us/dotnet/api/system.iconvertible)
-
-## Fields
-
-| Name | Value | Description |
-| --- | --: | --- |
-| END | 0 | end of rule definition |
-| ALT | 1 | start of alternate definition for rule |
-| RULE_REF | 2 | non-terminal element: reference to rule |
-| CHAR | 3 | terminal element: character (code point) |
-| CHAR_NOT | 4 | inverse char(s) ([^a], [^a-b] [^abc]) |
-| CHAR_RNG_UPPER | 5 | modifies a preceding CHAR or CHAR_ALT to be an inclusive range ([a-z]) |
-| CHAR_ALT | 6 | modifies a preceding CHAR or CHAR_RNG_UPPER to add an alternate char to match ([ab], [a-zA]) |
diff --git a/docs/xmldocs/llama.native.llamakvcacheview.md b/docs/xmldocs/llama.native.llamakvcacheview.md
deleted file mode 100644
index d77f72164..000000000
--- a/docs/xmldocs/llama.native.llamakvcacheview.md
+++ /dev/null
@@ -1,11 +0,0 @@
-# LLamaKvCacheView
-
-Namespace: LLama.Native
-
-An updateable view of the KV cache (llama_kv_cache_view)
-
-```csharp
-public struct LLamaKvCacheView
-```
-
-Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ValueType](https://docs.microsoft.com/en-us/dotnet/api/system.valuetype) → [LLamaKvCacheView](./llama.native.llamakvcacheview.md)
diff --git a/docs/xmldocs/llama.native.llamakvcacheviewcell.md b/docs/xmldocs/llama.native.llamakvcacheviewcell.md
deleted file mode 100644
index 599de961f..000000000
--- a/docs/xmldocs/llama.native.llamakvcacheviewcell.md
+++ /dev/null
@@ -1,22 +0,0 @@
-# LLamaKvCacheViewCell
-
-Namespace: LLama.Native
-
-Information associated with an individual cell in the KV cache view (llama_kv_cache_view_cell)
-
-```csharp
-public struct LLamaKvCacheViewCell
-```
-
-Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ValueType](https://docs.microsoft.com/en-us/dotnet/api/system.valuetype) → [LLamaKvCacheViewCell](./llama.native.llamakvcacheviewcell.md)
-
-## Fields
-
-### **pos**
-
-The position for this cell. Takes KV cache shifts into account.
- May be negative if the cell is not populated.
-
-```csharp
-public LLamaPos pos;
-```
diff --git a/docs/xmldocs/llama.native.llamakvcacheviewsafehandle.md b/docs/xmldocs/llama.native.llamakvcacheviewsafehandle.md
index 55ce11888..cd724922e 100644
--- a/docs/xmldocs/llama.native.llamakvcacheviewsafehandle.md
+++ b/docs/xmldocs/llama.native.llamakvcacheviewsafehandle.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# LLamaKvCacheViewSafeHandle
Namespace: LLama.Native
@@ -5,49 +9,119 @@ Namespace: LLama.Native
A safe handle for a LLamaKvCacheView
```csharp
-public class LLamaKvCacheViewSafeHandle : SafeLLamaHandleBase, System.IDisposable
+public sealed class LLamaKvCacheViewSafeHandle : SafeLLamaHandleBase, System.IDisposable
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [CriticalFinalizerObject](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.constrainedexecution.criticalfinalizerobject) → [SafeHandle](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.interopservices.safehandle) → [SafeLLamaHandleBase](./llama.native.safellamahandlebase.md) → [LLamaKvCacheViewSafeHandle](./llama.native.llamakvcacheviewsafehandle.md)
-Implements [IDisposable](https://docs.microsoft.com/en-us/dotnet/api/system.idisposable)
+Implements [IDisposable](https://docs.microsoft.com/en-us/dotnet/api/system.idisposable)
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute), [NullableAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullableattribute)
+
+## Fields
+
+### **handle**
+
+```csharp
+protected IntPtr handle;
+```
## Properties
-### **IsInvalid**
+### **CellCount**
+
+Number of KV cache cells. This will be the same as the context size.
```csharp
-public bool IsInvalid { get; }
+public int CellCount { get; }
```
#### Property Value
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-### **IsClosed**
+### **TokenCount**
+
+Get the total number of tokens in the KV cache.
+
+ For example, if there are two populated
+ cells, the first with 1 sequence id in it and the second with 2 sequence
+ ids, then you'll have 3 tokens.
```csharp
-public bool IsClosed { get; }
+public int TokenCount { get; }
```
#### Property Value
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+
+### **MaxSequenceCount**
+
+Maximum number of sequences visible for a cell. There may be more sequences than this
+ in reality, this is simply the maximum number this view can see.
+
+```csharp
+public int MaxSequenceCount { get; }
+```
-## Constructors
+#### Property Value
+
+[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-### **LLamaKvCacheViewSafeHandle(SafeLLamaContextHandle, LLamaKvCacheView)**
+### **UsedCellCount**
-Initialize a LLamaKvCacheViewSafeHandle which will call `llama_kv_cache_view_free` when disposed
+Number of populated cache cells
```csharp
-public LLamaKvCacheViewSafeHandle(SafeLLamaContextHandle ctx, LLamaKvCacheView view)
+public int UsedCellCount { get; }
```
-#### Parameters
+#### Property Value
-`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
+[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+
+### **MaxContiguous**
+
+Maximum contiguous empty slots in the cache.
+
+```csharp
+public int MaxContiguous { get; }
+```
+
+#### Property Value
+
+[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+
+### **MaxContiguousIdx**
+
+Index to the start of the MaxContiguous slot range. Can be negative when cache is full.
+
+```csharp
+public int MaxContiguousIdx { get; }
+```
+
+#### Property Value
-`view` [LLamaKvCacheView](./llama.native.llamakvcacheview.md)
+[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+
+### **IsInvalid**
+
+```csharp
+public bool IsInvalid { get; }
+```
+
+#### Property Value
+
+[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+
+### **IsClosed**
+
+```csharp
+public bool IsClosed { get; }
+```
+
+#### Property Value
+
+[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
## Methods
@@ -82,20 +156,60 @@ protected bool ReleaseHandle()
### **Update()**
-Update this view
+Read the current KV cache state into this view.
```csharp
public void Update()
```
-### **GetView()**
+### **GetCell(Int32)**
-Get the raw KV cache view
+Get the cell at the given index
```csharp
-public LLamaKvCacheView& GetView()
+public LLamaPos GetCell(int index)
```
+#### Parameters
+
+`index` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+The index of the cell [0, CellCount)
+
+#### Returns
+
+[LLamaPos](./llama.native.llamapos.md)
+Data about the cell at the given index
+
+#### Exceptions
+
+[ArgumentOutOfRangeException](https://docs.microsoft.com/en-us/dotnet/api/system.argumentoutofrangeexception)
+Thrown if index is out of range (0 <= index < CellCount)
+
+### **GetCellSequences(Int32)**
+
+Get all of the sequences assigned to the cell at the given index. The returned span will contain [LLamaKvCacheViewSafeHandle.MaxSequenceCount](./llama.native.llamakvcacheviewsafehandle.md#maxsequencecount)
+ entries even if the cell actually has more sequences than that; allocate a new view with a larger maxSequences parameter
+ if necessary. Invalid sequences will be negative values.
+
+```csharp
+public Span GetCellSequences(int index)
+```
+
+#### Parameters
+
+`index` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+The index of the cell [0, CellCount)
+
#### Returns
-[LLamaKvCacheView&](./llama.native.llamakvcacheview&.md)
+[Span<LLamaSeqId>](https://docs.microsoft.com/en-us/dotnet/api/system.span-1)
+A span containing the sequences assigned to this cell
+
+#### Exceptions
+
+[ArgumentOutOfRangeException](https://docs.microsoft.com/en-us/dotnet/api/system.argumentoutofrangeexception)
+Thrown if index is out of range (0 <= index < CellCount)
+
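+The cell accessors above can be combined into a simple cache dump (illustrative sketch; `view` is assumed to be an already-created `LLamaKvCacheViewSafeHandle`):
+
+```csharp
+// Refresh the view, then walk every cell and report its position and sequence slots.
+view.Update();
+for (var i = 0; i < view.CellCount; i++)
+{
+    LLamaPos pos = view.GetCell(i);
+    Span<LLamaSeqId> seqs = view.GetCellSequences(i);
+    // seqs always has MaxSequenceCount entries; unused slots are negative.
+    Console.WriteLine($"cell {i}: pos={pos}, sequence slots={seqs.Length}");
+}
+```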
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.native.llamalogitbias.md b/docs/xmldocs/llama.native.llamalogitbias.md
new file mode 100644
index 000000000..103ae37f2
--- /dev/null
+++ b/docs/xmldocs/llama.native.llamalogitbias.md
@@ -0,0 +1,88 @@
+[`< Back`](./)
+
+---
+
+# LLamaLogitBias
+
+Namespace: LLama.Native
+
+A bias to apply directly to a logit
+
+```csharp
+public struct LLamaLogitBias
+```
+
+Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ValueType](https://docs.microsoft.com/en-us/dotnet/api/system.valuetype) → [LLamaLogitBias](./llama.native.llamalogitbias.md)
+Implements [IEquatable<LLamaLogitBias>](https://docs.microsoft.com/en-us/dotnet/api/system.iequatable-1)
+
+## Fields
+
+### **Token**
+
+The token to apply the bias to
+
+```csharp
+public LLamaToken Token;
+```
+
+### **Bias**
+
+The bias to add
+
+```csharp
+public float Bias;
+```
+
+## Methods
+
+### **ToString()**
+
+```csharp
+string ToString()
+```
+
+#### Returns
+
+[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+### **GetHashCode()**
+
+```csharp
+int GetHashCode()
+```
+
+#### Returns
+
+[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+
+### **Equals(Object)**
+
+```csharp
+bool Equals(object obj)
+```
+
+#### Parameters
+
+`obj` [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object)
+
+#### Returns
+
+[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+
+### **Equals(LLamaLogitBias)**
+
+```csharp
+bool Equals(LLamaLogitBias other)
+```
+
+#### Parameters
+
+`other` [LLamaLogitBias](./llama.native.llamalogitbias.md)
+
+#### Returns
+
+[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+
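+As a sketch of how the two fields combine (the token ids here are arbitrary, and the int-to-`LLamaToken` conversion is assumed to exist):
+
+```csharp
+// Encourage one token and effectively ban another during sampling.
+var biases = new[]
+{
+    new LLamaLogitBias { Token = (LLamaToken)15043, Bias = 2.0f },    // boost
+    new LLamaLogitBias { Token = (LLamaToken)29871, Bias = -100.0f }, // suppress
+};
+```
+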
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.native.llamaloglevel.md b/docs/xmldocs/llama.native.llamaloglevel.md
index 5c54507d0..508be4647 100644
--- a/docs/xmldocs/llama.native.llamaloglevel.md
+++ b/docs/xmldocs/llama.native.llamaloglevel.md
@@ -1,21 +1,33 @@
+[`< Back`](./)
+
+---
+
# LLamaLogLevel
Namespace: LLama.Native
-Severity level of a log message
+Severity level of a log message. This enum should always be aligned with
+ the one defined on llama.cpp side at
+ https://github.com/ggerganov/llama.cpp/blob/0eb4e12beebabae46d37b78742f4c5d4dbe52dc1/ggml/include/ggml.h#L559
```csharp
public enum LLamaLogLevel
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ValueType](https://docs.microsoft.com/en-us/dotnet/api/system.valuetype) → [Enum](https://docs.microsoft.com/en-us/dotnet/api/system.enum) → [LLamaLogLevel](./llama.native.llamaloglevel.md)
-Implements [IComparable](https://docs.microsoft.com/en-us/dotnet/api/system.icomparable), [IFormattable](https://docs.microsoft.com/en-us/dotnet/api/system.iformattable), [IConvertible](https://docs.microsoft.com/en-us/dotnet/api/system.iconvertible)
+Implements [IComparable](https://docs.microsoft.com/en-us/dotnet/api/system.icomparable), [ISpanFormattable](https://docs.microsoft.com/en-us/dotnet/api/system.ispanformattable), [IFormattable](https://docs.microsoft.com/en-us/dotnet/api/system.iformattable), [IConvertible](https://docs.microsoft.com/en-us/dotnet/api/system.iconvertible)
## Fields
| Name | Value | Description |
| --- | --: | --- |
-| Error | 2 | Logs that highlight when the current flow of execution is stopped due to a failure. |
+| None | 0 | Logs are never written. |
+| Debug | 1 | Logs that are used for interactive investigation during development. |
+| Info | 2 | Logs that track the general flow of the application. |
| Warning | 3 | Logs that highlight an abnormal or unexpected event in the application flow, but do not otherwise cause the application execution to stop. |
-| Info | 4 | Logs that track the general flow of the application. |
-| Debug | 5 | Logs that are used for interactive investigation during development. |
+| Error | 4 | Logs that highlight when the current flow of execution is stopped due to a failure. |
+| Continue | 5 | Continue log level is equivalent to None in the way it is used in llama.cpp. |
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.native.llamamodelkvoverridetype.md b/docs/xmldocs/llama.native.llamamodelkvoverridetype.md
index 43bf13973..28b970c51 100644
--- a/docs/xmldocs/llama.native.llamamodelkvoverridetype.md
+++ b/docs/xmldocs/llama.native.llamamodelkvoverridetype.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# LLamaModelKvOverrideType
Namespace: LLama.Native
@@ -9,7 +13,7 @@ public enum LLamaModelKvOverrideType
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ValueType](https://docs.microsoft.com/en-us/dotnet/api/system.valuetype) → [Enum](https://docs.microsoft.com/en-us/dotnet/api/system.enum) → [LLamaModelKvOverrideType](./llama.native.llamamodelkvoverridetype.md)
-Implements [IComparable](https://docs.microsoft.com/en-us/dotnet/api/system.icomparable), [IFormattable](https://docs.microsoft.com/en-us/dotnet/api/system.iformattable), [IConvertible](https://docs.microsoft.com/en-us/dotnet/api/system.iconvertible)
+Implements [IComparable](https://docs.microsoft.com/en-us/dotnet/api/system.icomparable), [ISpanFormattable](https://docs.microsoft.com/en-us/dotnet/api/system.ispanformattable), [IFormattable](https://docs.microsoft.com/en-us/dotnet/api/system.iformattable), [IConvertible](https://docs.microsoft.com/en-us/dotnet/api/system.iconvertible)
**Remarks:**
@@ -22,3 +26,8 @@ llama_model_kv_override_type
| Int | 0 | Overriding an int value |
| Float | 1 | Overriding a float value |
| Bool | 2 | Overriding a bool value |
+| String | 3 | Overriding a string value |
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.native.llamamodelmetadataoverride.md b/docs/xmldocs/llama.native.llamamodelmetadataoverride.md
index 4b069e637..eb652d409 100644
--- a/docs/xmldocs/llama.native.llamamodelmetadataoverride.md
+++ b/docs/xmldocs/llama.native.llamamodelmetadataoverride.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# LLamaModelMetadataOverride
Namespace: LLama.Native
@@ -51,3 +55,15 @@ Value, **must** only be used if Tag == LLAMA_KV_OVERRIDE_BOOL
```csharp
public long BoolValue;
```
+
+### **StringValue**
+
+Value, **must** only be used if Tag == String
+
+```csharp
+public e__FixedBuffer StringValue;
+```
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.native.llamamodelparams.md b/docs/xmldocs/llama.native.llamamodelparams.md
index ca9a2982b..23465a81e 100644
--- a/docs/xmldocs/llama.native.llamamodelparams.md
+++ b/docs/xmldocs/llama.native.llamamodelparams.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# LLamaModelParams
Namespace: LLama.Native
@@ -12,6 +16,14 @@ Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object)
## Fields
+### **tensor_buft_overrides**
+
+NULL-terminated list of buffer types to use for tensors that match a pattern
+
+```csharp
+public LLamaModelTensorBufferOverride* tensor_buft_overrides;
+```
+
### **n_gpu_layers**
// number of layers to store in VRAM
@@ -30,7 +42,7 @@ public GPUSplitMode split_mode;
### **main_gpu**
-the GPU that is used for scratch and small tensors
+the GPU that is used for the entire model when split_mode is LLAMA_SPLIT_MODE_NONE
```csharp
public int main_gpu;
@@ -106,3 +118,33 @@ public bool use_mlock { get; set; }
#### Property Value
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+
+### **check_tensors**
+
+validate model tensor data
+
+```csharp
+public bool check_tensors { get; set; }
+```
+
+#### Property Value
+
+[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+
+## Methods
+
+### **Default()**
+
+Create a LLamaModelParams with default values
+
+```csharp
+LLamaModelParams Default()
+```
+
+#### Returns
+
+[LLamaModelParams](./llama.native.llamamodelparams.md)
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.native.llamamodelquantizeparams.md b/docs/xmldocs/llama.native.llamamodelquantizeparams.md
index 36d7e356c..439d0b1ee 100644
--- a/docs/xmldocs/llama.native.llamamodelquantizeparams.md
+++ b/docs/xmldocs/llama.native.llamamodelquantizeparams.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# LLamaModelQuantizeParams
Namespace: LLama.Native
@@ -32,6 +36,22 @@ quantize to this llama_ftype
public LLamaFtype ftype;
```
+### **output_tensor_type**
+
+output tensor type
+
+```csharp
+public GGMLType output_tensor_type;
+```
+
+### **token_embedding_type**
+
+token embeddings tensor type
+
+```csharp
+public GGMLType token_embedding_type;
+```
+
### **imatrix**
pointer to importance matrix data
@@ -40,6 +60,22 @@ pointer to importance matrix data
public IntPtr imatrix;
```
+### **kv_overrides**
+
+pointer to vector containing overrides
+
+```csharp
+public IntPtr kv_overrides;
+```
+
+### **tensor_types**
+
+pointer to vector containing tensor types
+
+```csharp
+public IntPtr tensor_types;
+```
+
## Properties
### **allow_requantize**
@@ -80,7 +116,7 @@ public bool only_copy { get; set; }
### **pure**
-disable k-quant mixtures and quantize all tensors to the same type
+quantize all tensors to the default type
```csharp
public bool pure { get; set; }
@@ -89,3 +125,33 @@ public bool pure { get; set; }
#### Property Value
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+
+### **keep_split**
+
+quantize to the same number of shards
+
+```csharp
+public bool keep_split { get; set; }
+```
+
+#### Property Value
+
+[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+
+## Methods
+
+### **Default()**
+
+Create a LLamaModelQuantizeParams with default values
+
+```csharp
+LLamaModelQuantizeParams Default()
+```
+
+#### Returns
+
+[LLamaModelQuantizeParams](./llama.native.llamamodelquantizeparams.md)
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.native.llamamodeltensorbufferoverride.md b/docs/xmldocs/llama.native.llamamodeltensorbufferoverride.md
new file mode 100644
index 000000000..2b1aa0bc8
--- /dev/null
+++ b/docs/xmldocs/llama.native.llamamodeltensorbufferoverride.md
@@ -0,0 +1,38 @@
+[`< Back`](./)
+
+---
+
+# LLamaModelTensorBufferOverride
+
+Namespace: LLama.Native
+
+Represents a mapping between a tensor name pattern and a backend buffer type
+ Original type: llama_model_tensor_buft_override
+
+```csharp
+public struct LLamaModelTensorBufferOverride
+```
+
+Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ValueType](https://docs.microsoft.com/en-us/dotnet/api/system.valuetype) → [LLamaModelTensorBufferOverride](./llama.native.llamamodeltensorbufferoverride.md)
+
+## Fields
+
+### **Pattern**
+
+Tensor name pattern to match
+
+```csharp
+public Byte* Pattern;
+```
+
+### **BufferType**
+
+Backend buffer type to use for matching tensors, as obtained via ggml_backend_dev_buffer_type
+
+```csharp
+public IntPtr BufferType;
+```
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.native.llamanativebatch.md b/docs/xmldocs/llama.native.llamanativebatch.md
index 56a8ab9ce..40187d669 100644
--- a/docs/xmldocs/llama.native.llamanativebatch.md
+++ b/docs/xmldocs/llama.native.llamanativebatch.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# LLamaNativeBatch
Namespace: LLama.Native
@@ -41,6 +45,7 @@ public Single* embd;
### **pos**
the positions of the respective token in the sequence
+ (if set to NULL, the token position will be tracked automatically by llama_decode)
```csharp
public LLamaPos* pos;
@@ -57,6 +62,7 @@ public Int32* n_seq_id;
### **seq_id**
the sequence to which the respective token belongs
+ (if set to NULL, the sequence ID will be assumed to be 0)
```csharp
public LLamaSeqId** seq_id;
@@ -65,7 +71,12 @@ public LLamaSeqId** seq_id;
### **logits**
if zero, the logits for the respective token will not be output
+ (if set to NULL, only the logits for the last token will be returned)
```csharp
public Byte* logits;
```
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.native.llamaperfcontexttimings.md b/docs/xmldocs/llama.native.llamaperfcontexttimings.md
new file mode 100644
index 000000000..b964644fe
--- /dev/null
+++ b/docs/xmldocs/llama.native.llamaperfcontexttimings.md
@@ -0,0 +1,97 @@
+[`< Back`](./)
+
+---
+
+# LLamaPerfContextTimings
+
+Namespace: LLama.Native
+
+LLama performance information
+
+```csharp
+public struct LLamaPerfContextTimings
+```
+
+Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ValueType](https://docs.microsoft.com/en-us/dotnet/api/system.valuetype) → [LLamaPerfContextTimings](./llama.native.llamaperfcontexttimings.md)
+
+**Remarks:**
+
+llama_perf_context_data
+
+## Properties
+
+### **ResetTimestamp**
+
+Timestamp when reset was last called
+
+```csharp
+public TimeSpan ResetTimestamp { get; }
+```
+
+#### Property Value
+
+[TimeSpan](https://docs.microsoft.com/en-us/dotnet/api/system.timespan)
+
+### **Loading**
+
+Time spent loading
+
+```csharp
+public TimeSpan Loading { get; }
+```
+
+#### Property Value
+
+[TimeSpan](https://docs.microsoft.com/en-us/dotnet/api/system.timespan)
+
+### **PromptEval**
+
+Total time spent on prompt processing
+
+```csharp
+public TimeSpan PromptEval { get; }
+```
+
+#### Property Value
+
+[TimeSpan](https://docs.microsoft.com/en-us/dotnet/api/system.timespan)
+
+### **Eval**
+
+Total time spent in eval/decode calls
+
+```csharp
+public TimeSpan Eval { get; }
+```
+
+#### Property Value
+
+[TimeSpan](https://docs.microsoft.com/en-us/dotnet/api/system.timespan)
+
+### **PrompTokensEvaluated**
+
+number of tokens in eval calls for the prompt (with batch size > 1)
+
+```csharp
+public int PrompTokensEvaluated { get; }
+```
+
+#### Property Value
+
+[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+
+### **TokensEvaluated**
+
+number of eval calls
+
+```csharp
+public int TokensEvaluated { get; }
+```
+
+#### Property Value
+
+[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+
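+For example, a rough decode throughput can be derived from these properties (a sketch, assuming `timings` is a `LLamaPerfContextTimings` value obtained from the native API):
+
+```csharp
+// Tokens processed per second during the eval/decode phase.
+double tokensPerSecond = timings.TokensEvaluated / timings.Eval.TotalSeconds;
+Console.WriteLine($"Decode speed: {tokensPerSecond:F1} tok/s");
+```
+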
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.native.llamapoolingtype.md b/docs/xmldocs/llama.native.llamapoolingtype.md
index 6e26cd24c..b5d5ed20c 100644
--- a/docs/xmldocs/llama.native.llamapoolingtype.md
+++ b/docs/xmldocs/llama.native.llamapoolingtype.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# LLamaPoolingType
Namespace: LLama.Native
@@ -9,7 +13,7 @@ public enum LLamaPoolingType
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ValueType](https://docs.microsoft.com/en-us/dotnet/api/system.valuetype) → [Enum](https://docs.microsoft.com/en-us/dotnet/api/system.enum) → [LLamaPoolingType](./llama.native.llamapoolingtype.md)
-Implements [IComparable](https://docs.microsoft.com/en-us/dotnet/api/system.icomparable), [IFormattable](https://docs.microsoft.com/en-us/dotnet/api/system.iformattable), [IConvertible](https://docs.microsoft.com/en-us/dotnet/api/system.iconvertible)
+Implements [IComparable](https://docs.microsoft.com/en-us/dotnet/api/system.icomparable), [ISpanFormattable](https://docs.microsoft.com/en-us/dotnet/api/system.ispanformattable), [IFormattable](https://docs.microsoft.com/en-us/dotnet/api/system.iformattable), [IConvertible](https://docs.microsoft.com/en-us/dotnet/api/system.iconvertible)
**Remarks:**
@@ -19,3 +23,13 @@ llama_pooling_type
| Name | Value | Description |
| --- | --: | --- |
+| Unspecified | -1 | No specific pooling type. Use the model default if this is specified in [IContextParams.PoolingType](./llama.abstractions.icontextparams.md#poolingtype) |
+| None | 0 | Do not pool embeddings (per-token embeddings) |
+| Mean | 1 | Take the mean of every token embedding |
+| CLS | 2 | Return the embedding for the special "CLS" token |
+| Last | 3 | Return the embeddings of the last token |
+| Rank | 4 | Used by reranking models to attach the classification head to the graph |
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.native.llamapos.md b/docs/xmldocs/llama.native.llamapos.md
index 2aaccbbac..f5f1d4fa4 100644
--- a/docs/xmldocs/llama.native.llamapos.md
+++ b/docs/xmldocs/llama.native.llamapos.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# LLamaPos
Namespace: LLama.Native
@@ -70,3 +74,7 @@ bool Equals(LLamaPos other)
#### Returns
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.native.llamaropetype.md b/docs/xmldocs/llama.native.llamaropetype.md
index d2b528a7c..53ab30d1f 100644
--- a/docs/xmldocs/llama.native.llamaropetype.md
+++ b/docs/xmldocs/llama.native.llamaropetype.md
@@ -1,15 +1,31 @@
+[`< Back`](./)
+
+---
+
# LLamaRopeType
Namespace: LLama.Native
+
+
```csharp
public enum LLamaRopeType
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ValueType](https://docs.microsoft.com/en-us/dotnet/api/system.valuetype) → [Enum](https://docs.microsoft.com/en-us/dotnet/api/system.enum) → [LLamaRopeType](./llama.native.llamaropetype.md)
-Implements [IComparable](https://docs.microsoft.com/en-us/dotnet/api/system.icomparable), [IFormattable](https://docs.microsoft.com/en-us/dotnet/api/system.iformattable), [IConvertible](https://docs.microsoft.com/en-us/dotnet/api/system.iconvertible)
+Implements [IComparable](https://docs.microsoft.com/en-us/dotnet/api/system.icomparable), [ISpanFormattable](https://docs.microsoft.com/en-us/dotnet/api/system.ispanformattable), [IFormattable](https://docs.microsoft.com/en-us/dotnet/api/system.iformattable), [IConvertible](https://docs.microsoft.com/en-us/dotnet/api/system.iconvertible)
+
+**Remarks:**
+
+llama_rope_type
## Fields
| Name | Value | Description |
| --- | --: | --- |
+| None | -1 | |
+| Norm | 0 | |
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.native.llamasamplerchainparams.md b/docs/xmldocs/llama.native.llamasamplerchainparams.md
new file mode 100644
index 000000000..08eef6098
--- /dev/null
+++ b/docs/xmldocs/llama.native.llamasamplerchainparams.md
@@ -0,0 +1,51 @@
+[`< Back`](./)
+
+---
+
+# LLamaSamplerChainParams
+
+Namespace: LLama.Native
+
+
+
+```csharp
+public struct LLamaSamplerChainParams
+```
+
+Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ValueType](https://docs.microsoft.com/en-us/dotnet/api/system.valuetype) → [LLamaSamplerChainParams](./llama.native.llamasamplerchainparams.md)
+
+**Remarks:**
+
+llama_sampler_chain_params
+
+## Properties
+
+### **NoPerf**
+
+whether to disable measuring performance timings
+
+```csharp
+public bool NoPerf { get; set; }
+```
+
+#### Property Value
+
+[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+
+## Methods
+
+### **Default()**
+
+Get the default LLamaSamplerChainParams
+
+```csharp
+LLamaSamplerChainParams Default()
+```
+
+#### Returns
+
+[LLamaSamplerChainParams](./llama.native.llamasamplerchainparams.md)
+
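+A typical pattern (a sketch, assuming `Default()` is callable as a static factory as the signature above suggests) is to start from the defaults and adjust fields:
+
+```csharp
+// Start from the library defaults, then disable performance measurement.
+var samplerParams = LLamaSamplerChainParams.Default();
+samplerParams.NoPerf = true;
+```
+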
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.native.llamasamplingtimings.md b/docs/xmldocs/llama.native.llamasamplingtimings.md
new file mode 100644
index 000000000..4f9f1c76e
--- /dev/null
+++ b/docs/xmldocs/llama.native.llamasamplingtimings.md
@@ -0,0 +1,23 @@
+[`< Back`](./)
+
+---
+
+# LLamaSamplingTimings
+
+Namespace: LLama.Native
+
+LLama performance information
+
+```csharp
+public struct LLamaSamplingTimings
+```
+
+Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ValueType](https://docs.microsoft.com/en-us/dotnet/api/system.valuetype) → [LLamaSamplingTimings](./llama.native.llamasamplingtimings.md)
+
+**Remarks:**
+
+llama_perf_sampler_data
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.native.llamaseqid.md b/docs/xmldocs/llama.native.llamaseqid.md
index eb34f4617..9fe6f104f 100644
--- a/docs/xmldocs/llama.native.llamaseqid.md
+++ b/docs/xmldocs/llama.native.llamaseqid.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# LLamaSeqId
Namespace: LLama.Native
@@ -78,3 +82,7 @@ bool Equals(LLamaSeqId other)
#### Returns
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.native.llamatoken.md b/docs/xmldocs/llama.native.llamatoken.md
index 8282f92c3..06297a2d9 100644
--- a/docs/xmldocs/llama.native.llamatoken.md
+++ b/docs/xmldocs/llama.native.llamatoken.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# LLamaToken
Namespace: LLama.Native
@@ -9,10 +13,133 @@ public struct LLamaToken
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ValueType](https://docs.microsoft.com/en-us/dotnet/api/system.valuetype) → [LLamaToken](./llama.native.llamatoken.md)
-Implements [IEquatable<LLamaToken>](https://docs.microsoft.com/en-us/dotnet/api/system.iequatable-1)
+Implements [IEquatable<LLamaToken>](https://docs.microsoft.com/en-us/dotnet/api/system.iequatable-1)
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute), [NullableAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullableattribute), [IsReadOnlyAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.isreadonlyattribute), [DebuggerDisplayAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.diagnostics.debuggerdisplayattribute)
+
+## Fields
+
+### **InvalidToken**
+
+Token value used to represent a null token
+
+```csharp
+public static LLamaToken InvalidToken;
+```
## Methods
+### **GetAttributes(SafeLlamaModelHandle)**
+
+Get attributes for this token
+
+```csharp
+LLamaTokenAttr GetAttributes(SafeLlamaModelHandle model)
+```
+
+#### Parameters
+
+`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
+
+#### Returns
+
+[LLamaTokenAttr](./llama.native.llamatokenattr.md)
+
+### **GetAttributes(Vocabulary)**
+
+Get attributes for this token
+
+```csharp
+LLamaTokenAttr GetAttributes(Vocabulary vocab)
+```
+
+#### Parameters
+
+`vocab` [Vocabulary](./llama.native.safellamamodelhandle.vocabulary.md)
+
+#### Returns
+
+[LLamaTokenAttr](./llama.native.llamatokenattr.md)
+
+### **GetScore(Vocabulary)**
+
+Get score for this token
+
+```csharp
+float GetScore(Vocabulary vocab)
+```
+
+#### Parameters
+
+`vocab` [Vocabulary](./llama.native.safellamamodelhandle.vocabulary.md)
+
+#### Returns
+
+[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+
+### **IsControl(SafeLlamaModelHandle)**
+
+Check if this is a control token
+
+```csharp
+bool IsControl(SafeLlamaModelHandle model)
+```
+
+#### Parameters
+
+`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
+
+#### Returns
+
+[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+
+### **IsControl(Vocabulary)**
+
+Check if this is a control token
+
+```csharp
+bool IsControl(Vocabulary vocab)
+```
+
+#### Parameters
+
+`vocab` [Vocabulary](./llama.native.safellamamodelhandle.vocabulary.md)
+
+#### Returns
+
+[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+
+### **IsEndOfGeneration(SafeLlamaModelHandle)**
+
+Check if this token should end generation
+
+```csharp
+bool IsEndOfGeneration(SafeLlamaModelHandle model)
+```
+
+#### Parameters
+
+`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
+
+#### Returns
+
+[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+
+### **IsEndOfGeneration(Vocabulary)**
+
+Check if this token should end generation
+
+```csharp
+bool IsEndOfGeneration(Vocabulary vocab)
+```
+
+#### Parameters
+
+`vocab` [Vocabulary](./llama.native.safellamamodelhandle.vocabulary.md)
+
+#### Returns
+
+[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+
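+A hedged generation-loop fragment using these helpers (assumes `model` is a loaded [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md) and `token` was produced by a sampler):
+
+```csharp
+// Stop generating once the sampled token signals end of generation (e.g. EOS).
+if (token.IsEndOfGeneration(model))
+    break;
+// Only surface non-control tokens to the user.
+bool visible = !token.IsControl(model);
+```
+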
### **ToString()**
```csharp
@@ -60,3 +187,7 @@ bool Equals(LLamaToken other)
#### Returns
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.native.llamatokenattr.md b/docs/xmldocs/llama.native.llamatokenattr.md
new file mode 100644
index 000000000..fc73029ee
--- /dev/null
+++ b/docs/xmldocs/llama.native.llamatokenattr.md
@@ -0,0 +1,41 @@
+[`< Back`](./)
+
+---
+
+# LLamaTokenAttr
+
+Namespace: LLama.Native
+
+Token attributes
+
+```csharp
+public enum LLamaTokenAttr
+```
+
+Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ValueType](https://docs.microsoft.com/en-us/dotnet/api/system.valuetype) → [Enum](https://docs.microsoft.com/en-us/dotnet/api/system.enum) → [LLamaTokenAttr](./llama.native.llamatokenattr.md)
+Implements [IComparable](https://docs.microsoft.com/en-us/dotnet/api/system.icomparable), [ISpanFormattable](https://docs.microsoft.com/en-us/dotnet/api/system.ispanformattable), [IFormattable](https://docs.microsoft.com/en-us/dotnet/api/system.iformattable), [IConvertible](https://docs.microsoft.com/en-us/dotnet/api/system.iconvertible)
+Attributes [FlagsAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.flagsattribute)
+
+**Remarks:**
+
+C# equivalent of llama_token_attr
+
+## Fields
+
+| Name | Value | Description |
+| --- | --: | --- |
+| Undefined | 0 | |
+| Unknown | 1 | |
+| Unused | 2 | |
+| Normal | 4 | |
+| Control | 8 | |
+| UserDefined | 16 | |
+| Byte | 32 | |
+| Normalized | 64 | |
+| LStrip | 128 | |
+| RStrip | 256 | |
+| SingleWord | 512 | |
+
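+Since the enum carries [FlagsAttribute], individual attributes are tested with bitwise operations, for instance:
+
+```csharp
+// Check whether a token's attributes include the Control flag.
+var attrs = LLamaTokenAttr.Control | LLamaTokenAttr.Normalized;
+bool isControl = (attrs & LLamaTokenAttr.Control) != 0;
+```
+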
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.native.llamatokendata.md b/docs/xmldocs/llama.native.llamatokendata.md
index 29151a256..b6ae94f09 100644
--- a/docs/xmldocs/llama.native.llamatokendata.md
+++ b/docs/xmldocs/llama.native.llamatokendata.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# LLamaTokenData
Namespace: LLama.Native
@@ -12,28 +16,28 @@ Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object)
## Fields
-### **id**
+### **ID**
token id
```csharp
-public LLamaToken id;
+public LLamaToken ID;
```
-### **logit**
+### **Logit**
log-odds of the token
```csharp
-public float logit;
+public float Logit;
```
-### **p**
+### **Probability**
probability of the token
```csharp
-public float p;
+public float Probability;
```
## Constructors
@@ -43,7 +47,7 @@ public float p;
Create a new LLamaTokenData
```csharp
-LLamaTokenData(LLamaToken id, float logit, float p)
+LLamaTokenData(LLamaToken id, float logit, float probability)
```
#### Parameters
@@ -52,4 +56,8 @@ LLamaTokenData(LLamaToken id, float logit, float p)
`logit` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-`p` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+`probability` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.native.llamatokendataarray.md b/docs/xmldocs/llama.native.llamatokendataarray.md
index 7ebcd2065..8a3277c24 100644
--- a/docs/xmldocs/llama.native.llamatokendataarray.md
+++ b/docs/xmldocs/llama.native.llamatokendataarray.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# LLamaTokenDataArray
Namespace: LLama.Native
@@ -12,20 +16,20 @@ Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object)
## Fields
-### **data**
+### **Data**
The LLamaTokenData
```csharp
-public Memory data;
+public Memory<LLamaTokenData> Data;
```
-### **sorted**
+### **Sorted**
Indicates if `data` is sorted by logits in descending order. If this is false the token data is in _no particular order_.
```csharp
-public bool sorted;
+public bool Sorted;
```
## Constructors
@@ -62,264 +66,50 @@ LLamaTokenDataArray Create(ReadOnlySpan logits)
[LLamaTokenDataArray](./llama.native.llamatokendataarray.md)
-### **OverwriteLogits(ReadOnlySpan<ValueTuple<LLamaToken, Single>>)**
-
-Overwrite the logit values for all given tokens
-
-```csharp
-void OverwriteLogits(ReadOnlySpan> values)
-```
-
-#### Parameters
-
-`values` [ReadOnlySpan<ValueTuple<LLamaToken, Single>>](https://docs.microsoft.com/en-us/dotnet/api/system.readonlyspan-1)
-tuples of token and logit value to overwrite
-
-### **ApplyGrammar(SafeLLamaContextHandle, SafeLLamaGrammarHandle)**
-
-Apply grammar rules to candidate tokens
-
-```csharp
-void ApplyGrammar(SafeLLamaContextHandle ctx, SafeLLamaGrammarHandle grammar)
-```
-
-#### Parameters
-
-`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-
-`grammar` [SafeLLamaGrammarHandle](./llama.native.safellamagrammarhandle.md)
-
-### **TopK(SafeLLamaContextHandle, Int32, UInt64)**
-
-Top-K sampling described in academic paper "The Curious Case of Neural Text Degeneration" https://arxiv.org/abs/1904.09751
-
-```csharp
-void TopK(SafeLLamaContextHandle context, int k, ulong minKeep)
-```
-
-#### Parameters
-
-`context` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-
-`k` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-Number of tokens to keep
-
-`minKeep` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
-Minimum number to keep
-
-### **TopP(SafeLLamaContextHandle, Single, UInt64)**
-
-Nucleus sampling described in academic paper "The Curious Case of Neural Text Degeneration" https://arxiv.org/abs/1904.09751
-
-```csharp
-void TopP(SafeLLamaContextHandle context, float p, ulong minKeep)
-```
-
-#### Parameters
-
-`context` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-
-`p` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-`minKeep` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
-
-### **MinP(SafeLLamaContextHandle, Single, UInt64)**
-
-Minimum P sampling as described in https://github.com/ggerganov/llama.cpp/pull/3841
-
-```csharp
-void MinP(SafeLLamaContextHandle context, float p, ulong minKeep)
-```
-
-#### Parameters
-
-`context` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-
-`p` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-All tokens with probability greater than this will be kept
-
-`minKeep` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
-
-### **TailFree(SafeLLamaContextHandle, Single, UInt64)**
-
-Tail Free Sampling described in https://www.trentonbricken.com/Tail-Free-Sampling/.
-
-```csharp
-void TailFree(SafeLLamaContextHandle context, float z, ulong min_keep)
-```
-
-#### Parameters
-
-`context` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-
-`z` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-`min_keep` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
-
-### **LocallyTypical(SafeLLamaContextHandle, Single, UInt64)**
-
-Locally Typical Sampling implementation described in the paper https://arxiv.org/abs/2202.00666.
-
-```csharp
-void LocallyTypical(SafeLLamaContextHandle context, float p, ulong min_keep)
-```
-
-#### Parameters
-
-`context` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-
-`p` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+### **Create(ReadOnlySpan<Single>, Memory<LLamaTokenData>)**
-`min_keep` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
-
-### **RepetitionPenalty(SafeLLamaContextHandle, ReadOnlySpan<LLamaToken>, Single, Single, Single)**
-
-Repetition penalty described in CTRL academic paper https://arxiv.org/abs/1909.05858, with negative logit fix.
- Frequency and presence penalties described in OpenAI API https://platform.openai.com/docs/api-reference/parameter-details.
-
-```csharp
-void RepetitionPenalty(SafeLLamaContextHandle context, ReadOnlySpan last_tokens, float penalty_repeat, float penalty_freq, float penalty_present)
-```
-
-#### Parameters
-
-`context` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-
-`last_tokens` [ReadOnlySpan<LLamaToken>](https://docs.microsoft.com/en-us/dotnet/api/system.readonlyspan-1)
-
-`penalty_repeat` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-`penalty_freq` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-`penalty_present` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-### **Guidance(SafeLLamaContextHandle, ReadOnlySpan<Single>, Single)**
-
-Apply classifier-free guidance to the logits as described in academic paper "Stay on topic with Classifier-Free Guidance" https://arxiv.org/abs/2306.17806
-
-```csharp
-void Guidance(SafeLLamaContextHandle context, ReadOnlySpan guidanceLogits, float guidance)
-```
-
-#### Parameters
-
-`context` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-
-`guidanceLogits` [ReadOnlySpan<Single>](https://docs.microsoft.com/en-us/dotnet/api/system.readonlyspan-1)
-Logits extracted from a separate context from the same model.
- Other than a negative prompt at the beginning, it should have all generated and user input tokens copied from the main context.
-
-`guidance` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-Guidance strength. 0 means no guidance, higher values applies stronger guidance
-
-### **Temperature(SafeLLamaContextHandle, Single)**
-
-Sample with temperature.
- As temperature increases, the prediction becomes more diverse but also vulnerable to hallucinations -- generating tokens that are sensible but not factual
-
-```csharp
-void Temperature(SafeLLamaContextHandle context, float temp)
-```
-
-#### Parameters
-
-`context` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-
-`temp` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-### **Softmax(SafeLLamaContextHandle)**
-
-Sorts candidate tokens by their logits in descending order and calculate probabilities based on logits.
+Create a new LLamaTokenDataArray, copying the data from the given logits into temporary memory.
```csharp
-void Softmax(SafeLLamaContextHandle context)
+LLamaTokenDataArray Create(ReadOnlySpan<float> logits, Memory<LLamaTokenData> buffer)
```
#### Parameters
-`context` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-
-### **SampleToken(SafeLLamaContextHandle)**
-
-Randomly selects a token from the candidates based on their probabilities.
-
-```csharp
-LLamaToken SampleToken(SafeLLamaContextHandle context)
-```
-
-#### Parameters
+`logits` [ReadOnlySpan<Single>](https://docs.microsoft.com/en-us/dotnet/api/system.readonlyspan-1)
-`context` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
+`buffer` [Memory<LLamaTokenData>](https://docs.microsoft.com/en-us/dotnet/api/system.memory-1)
+Temporary memory which will be used to work on these logits. Must be at least as large as the logits array
#### Returns
-[LLamaToken](./llama.native.llamatoken.md)
-
-### **SampleTokenGreedy(SafeLLamaContextHandle)**
-
-Selects the token with the highest probability.
-
-```csharp
-LLamaToken SampleTokenGreedy(SafeLLamaContextHandle context)
-```
-
-#### Parameters
-
-`context` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
+[LLamaTokenDataArray](./llama.native.llamatokendataarray.md)
-#### Returns
+**Remarks:**
-[LLamaToken](./llama.native.llamatoken.md)
+The memory must not be modified while this [LLamaTokenDataArray](./llama.native.llamatokendataarray.md) is in use.
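+
+A minimal usage sketch (not part of the generated reference; `logits` is a hypothetical `ReadOnlySpan<float>` obtained from a context):
+
+```csharp
+// Allocate temporary memory at least as large as the logits span,
+// then build the candidate array over it.
+var buffer = new LLamaTokenData[logits.Length];
+LLamaTokenDataArray candidates = LLamaTokenDataArray.Create(logits, buffer);
+// `buffer` must not be modified while `candidates` is in use.
+```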
-### **SampleTokenMirostat(SafeLLamaContextHandle, Single, Single, Int32, Single&)**
+### **OverwriteLogits(ReadOnlySpan<ValueTuple<LLamaToken, Single>>)**
-Mirostat 1.0 algorithm described in the paper https://arxiv.org/abs/2007.14966. Uses tokens instead of words.
+Overwrite the logit values for all given tokens
```csharp
-LLamaToken SampleTokenMirostat(SafeLLamaContextHandle context, float tau, float eta, int m, Single& mu)
+void OverwriteLogits(ReadOnlySpan<ValueTuple<LLamaToken, float>> values)
```
#### Parameters
-`context` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-
-`tau` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-The target cross-entropy (or surprise) value you want to achieve for the generated text. A higher value corresponds to more surprising or less predictable text, while a lower value corresponds to less surprising or more predictable text.
-
-`eta` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-The learning rate used to update `mu` based on the error between the target and observed surprisal of the sampled word. A larger learning rate will cause `mu` to be updated more quickly, while a smaller learning rate will result in slower updates.
-
-`m` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-The number of tokens considered in the estimation of `s_hat`. This is an arbitrary value that is used to calculate `s_hat`, which in turn helps to calculate the value of `k`. In the paper, they use `m = 100`, but you can experiment with different values to see how it affects the performance of the algorithm.
-
-`mu` [Single&](https://docs.microsoft.com/en-us/dotnet/api/system.single&)
-Maximum cross-entropy. This value is initialized to be twice the target cross-entropy (`2 * tau`) and is updated in the algorithm based on the error between the target and observed surprisal.
-
-#### Returns
-
-[LLamaToken](./llama.native.llamatoken.md)
+`values` [ReadOnlySpan<ValueTuple<LLamaToken, Single>>](https://docs.microsoft.com/en-us/dotnet/api/system.readonlyspan-1)
+tuples of token and logit value to overwrite
-### **SampleTokenMirostat2(SafeLLamaContextHandle, Single, Single, Single&)**
+### **Softmax()**
-Mirostat 2.0 algorithm described in the paper https://arxiv.org/abs/2007.14966. Uses tokens instead of words.
+Sorts candidate tokens by their logits in descending order and calculates probabilities based on the logits.
```csharp
-LLamaToken SampleTokenMirostat2(SafeLLamaContextHandle context, float tau, float eta, Single& mu)
+void Softmax()
```
-#### Parameters
-
-`context` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-
-`tau` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-The target cross-entropy (or surprise) value you want to achieve for the generated text. A higher value corresponds to more surprising or less predictable text, while a lower value corresponds to less surprising or more predictable text.
-
-`eta` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-The learning rate used to update `mu` based on the error between the target and observed surprisal of the sampled word. A larger learning rate will cause `mu` to be updated more quickly, while a smaller learning rate will result in slower updates.
-
-`mu` [Single&](https://docs.microsoft.com/en-us/dotnet/api/system.single&)
-Maximum cross-entropy. This value is initialized to be twice the target cross-entropy (`2 * tau`) and is updated in the algorithm based on the error between the target and observed surprisal.
-
-#### Returns
+---
-[LLamaToken](./llama.native.llamatoken.md)
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.native.llamatokendataarraynative.md b/docs/xmldocs/llama.native.llamatokendataarraynative.md
index 8a557cf2e..639f5ed85 100644
--- a/docs/xmldocs/llama.native.llamatokendataarraynative.md
+++ b/docs/xmldocs/llama.native.llamatokendataarraynative.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# LLamaTokenDataArrayNative
Namespace: LLama.Native
@@ -10,41 +14,59 @@ public struct LLamaTokenDataArrayNative
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ValueType](https://docs.microsoft.com/en-us/dotnet/api/system.valuetype) → [LLamaTokenDataArrayNative](./llama.native.llamatokendataarraynative.md)
-## Fields
+**Remarks:**
+
+C# equivalent of llama_token_data_array
+
+## Properties
-### **data**
+### **Data**
A pointer to an array of LlamaTokenData
```csharp
-public IntPtr data;
+public Span<LLamaTokenData> Data { get; }
```
-**Remarks:**
+#### Property Value
-Memory must be pinned in place for all the time this LLamaTokenDataArrayNative is in use
+[Span<LLamaTokenData>](https://docs.microsoft.com/en-us/dotnet/api/system.span-1)
-### **size**
+### **Sorted**
-Number of LLamaTokenData in the array
+Indicates if the items in the array are sorted, so the most likely token is first
```csharp
-public ulong size;
+public bool Sorted { get; set; }
```
-## Properties
+#### Property Value
+
+[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-### **sorted**
+### **Selected**
-Indicates if the items in the array are sorted
+The index of the selected token (i.e. not the token id)
```csharp
-public bool sorted { get; set; }
+public long Selected { get; set; }
```
#### Property Value
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+[Int64](https://docs.microsoft.com/en-us/dotnet/api/system.int64)
+
+### **Size**
+
+Number of LLamaTokenData in the array. Set this to shrink the array
+
+```csharp
+public ulong Size { get; set; }
+```
+
+#### Property Value
+
+[UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
## Methods
@@ -68,3 +90,7 @@ Created native array
[MemoryHandle](https://docs.microsoft.com/en-us/dotnet/api/system.buffers.memoryhandle)
A memory handle, pinning the data in place until disposed
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.native.llamatokentype.md b/docs/xmldocs/llama.native.llamatokentype.md
deleted file mode 100644
index 6edad50f1..000000000
--- a/docs/xmldocs/llama.native.llamatokentype.md
+++ /dev/null
@@ -1,25 +0,0 @@
-# LLamaTokenType
-
-Namespace: LLama.Native
-
-Token Types
-
-```csharp
-public enum LLamaTokenType
-```
-
-Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ValueType](https://docs.microsoft.com/en-us/dotnet/api/system.valuetype) → [Enum](https://docs.microsoft.com/en-us/dotnet/api/system.enum) → [LLamaTokenType](./llama.native.llamatokentype.md)
-Implements [IComparable](https://docs.microsoft.com/en-us/dotnet/api/system.icomparable), [IFormattable](https://docs.microsoft.com/en-us/dotnet/api/system.iformattable), [IConvertible](https://docs.microsoft.com/en-us/dotnet/api/system.iconvertible)
-
-**Remarks:**
-
-C# equivalent of llama_token_get_type
-
-## Fields
-
-| Name | Value | Description |
-| --- | --: | --- |
-| LLAMA_TOKEN_TYPE_UNDEFINED | 0 | No specific type has been set for this token |
-| LLAMA_TOKEN_TYPE_NORMAL | 1 | This is a "normal" token |
-| LLAMA_TOKEN_TYPE_UNKNOWN | 2 | An "unknown" character/text token e.g. <unk> |
-| LLAMA_TOKEN_TYPE_CONTROL | 3 | A special control token e.g. </s> |
diff --git a/docs/xmldocs/llama.native.llamavocabtype.md b/docs/xmldocs/llama.native.llamavocabtype.md
index d24b3d536..5cadd2b63 100644
--- a/docs/xmldocs/llama.native.llamavocabtype.md
+++ b/docs/xmldocs/llama.native.llamavocabtype.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# LLamaVocabType
Namespace: LLama.Native
@@ -9,7 +13,7 @@ public enum LLamaVocabType
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ValueType](https://docs.microsoft.com/en-us/dotnet/api/system.valuetype) → [Enum](https://docs.microsoft.com/en-us/dotnet/api/system.enum) → [LLamaVocabType](./llama.native.llamavocabtype.md)
-Implements [IComparable](https://docs.microsoft.com/en-us/dotnet/api/system.icomparable), [IFormattable](https://docs.microsoft.com/en-us/dotnet/api/system.iformattable), [IConvertible](https://docs.microsoft.com/en-us/dotnet/api/system.iconvertible)
+Implements [IComparable](https://docs.microsoft.com/en-us/dotnet/api/system.icomparable), [ISpanFormattable](https://docs.microsoft.com/en-us/dotnet/api/system.ispanformattable), [IFormattable](https://docs.microsoft.com/en-us/dotnet/api/system.iformattable), [IConvertible](https://docs.microsoft.com/en-us/dotnet/api/system.iconvertible)
**Remarks:**
@@ -19,3 +23,13 @@ llama_vocab_type
| Name | Value | Description |
| --- | --: | --- |
+| None | 0 | For models without vocab |
+| SentencePiece | 1 | LLaMA tokenizer based on byte-level BPE with byte fallback |
+| BytePairEncoding | 2 | GPT-2 tokenizer based on byte-level BPE |
+| WordPiece | 3 | BERT tokenizer based on WordPiece |
+| Unigram | 4 | T5 tokenizer based on Unigram |
+| RWKV | 5 | RWKV tokenizer based on greedy tokenization |
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.native.llavaimageembed.md b/docs/xmldocs/llama.native.llavaimageembed.md
index be6346cb9..9afc23413 100644
--- a/docs/xmldocs/llama.native.llavaimageembed.md
+++ b/docs/xmldocs/llama.native.llavaimageembed.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# LLavaImageEmbed
Namespace: LLama.Native
@@ -10,16 +14,28 @@ public struct LLavaImageEmbed
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ValueType](https://docs.microsoft.com/en-us/dotnet/api/system.valuetype) → [LLavaImageEmbed](./llama.native.llavaimageembed.md)
+**Remarks:**
+
+llava_image_embed
+
## Fields
### **embed**
+The embeddings of the image.
+
```csharp
public Single* embed;
```
### **n_image_pos**
+The position of the image's tokens.
+
```csharp
public int n_image_pos;
```
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.native.loraadapter.md b/docs/xmldocs/llama.native.loraadapter.md
new file mode 100644
index 000000000..5dfa34a0c
--- /dev/null
+++ b/docs/xmldocs/llama.native.loraadapter.md
@@ -0,0 +1,56 @@
+[`< Back`](./)
+
+---
+
+# LoraAdapter
+
+Namespace: LLama.Native
+
+A LoRA adapter which can be applied to a context for a specific model
+
+```csharp
+public class LoraAdapter
+```
+
+Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [LoraAdapter](./llama.native.loraadapter.md)
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute), [NullableAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullableattribute)
+
+## Properties
+
+### **Model**
+
+The model which this LoRA adapter was loaded with.
+
+```csharp
+public SafeLlamaModelHandle Model { get; }
+```
+
+#### Property Value
+
+[SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
+
+### **Path**
+
+The full path of the file this adapter was loaded from
+
+```csharp
+public string Path { get; }
+```
+
+#### Property Value
+
+[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+## Methods
+
+### **Unload()**
+
+Unload this adapter
+
+```csharp
+public void Unload()
+```
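+
+As a hedged usage sketch (the `LoadLoraFromFile` loader used here is an assumption, not documented on this page):
+
+```csharp
+// Hypothetical: obtain an adapter from a model handle, inspect it, then unload it.
+LoraAdapter adapter = model.LoadLoraFromFile("adapter.gguf");
+Console.WriteLine(adapter.Path); // full path the adapter was loaded from
+adapter.Unload();                // release the native adapter
+```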
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.native.nativeapi.md b/docs/xmldocs/llama.native.nativeapi.md
index 7df6349cb..7dab89db7 100644
--- a/docs/xmldocs/llama.native.nativeapi.md
+++ b/docs/xmldocs/llama.native.nativeapi.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# NativeApi
Namespace: LLama.Native
@@ -8,1619 +12,749 @@ Direct translation of the llama.cpp API
public static class NativeApi
```
-Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [NativeApi](./llama.native.nativeapi.md)
+Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [NativeApi](./llama.native.nativeapi.md)
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute), [NullableAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullableattribute)
## Methods
-### **llama_sample_token_mirostat(SafeLLamaContextHandle, LLamaTokenDataArrayNative&, Single, Single, Int32, Single&)**
+### **llama_empty_call()**
-Mirostat 1.0 algorithm described in the paper https://arxiv.org/abs/2007.14966. Uses tokens instead of words.
+A method that does nothing. This is a native method; calling it forces the llama native dependencies to be loaded.
```csharp
-public static LLamaToken llama_sample_token_mirostat(SafeLLamaContextHandle ctx, LLamaTokenDataArrayNative& candidates, float tau, float eta, int m, Single& mu)
+public static void llama_empty_call()
```
-#### Parameters
-
-`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
+### **llama_backend_free()**
-`candidates` [LLamaTokenDataArrayNative&](./llama.native.llamatokendataarraynative&.md)
-A vector of `llama_token_data` containing the candidate tokens, their probabilities (p), and log-odds (logit) for the current position in the generated text.
+Call once at the end of the program - currently only used for MPI
-`tau` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-The target cross-entropy (or surprise) value you want to achieve for the generated text. A higher value corresponds to more surprising or less predictable text, while a lower value corresponds to less surprising or more predictable text.
+```csharp
+public static void llama_backend_free()
+```
-`eta` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-The learning rate used to update `mu` based on the error between the target and observed surprisal of the sampled word. A larger learning rate will cause `mu` to be updated more quickly, while a smaller learning rate will result in slower updates.
+### **llama_max_devices()**
-`m` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-The number of tokens considered in the estimation of `s_hat`. This is an arbitrary value that is used to calculate `s_hat`, which in turn helps to calculate the value of `k`. In the paper, they use `m = 100`, but you can experiment with different values to see how it affects the performance of the algorithm.
+Get the maximum number of devices supported by llama.cpp
-`mu` [Single&](https://docs.microsoft.com/en-us/dotnet/api/system.single&)
-Maximum cross-entropy. This value is initialized to be twice the target cross-entropy (`2 * tau`) and is updated in the algorithm based on the error between the target and observed surprisal.
+```csharp
+public static long llama_max_devices()
+```
#### Returns
-[LLamaToken](./llama.native.llamatoken.md)
+[Int64](https://docs.microsoft.com/en-us/dotnet/api/system.int64)
-### **llama_sample_token_mirostat_v2(SafeLLamaContextHandle, LLamaTokenDataArrayNative&, Single, Single, Single&)**
+### **llama_supports_mmap()**
-Mirostat 2.0 algorithm described in the paper https://arxiv.org/abs/2007.14966. Uses tokens instead of words.
+Check if memory mapping is supported
```csharp
-public static LLamaToken llama_sample_token_mirostat_v2(SafeLLamaContextHandle ctx, LLamaTokenDataArrayNative& candidates, float tau, float eta, Single& mu)
+public static bool llama_supports_mmap()
```
-#### Parameters
-
-`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
+#### Returns
-`candidates` [LLamaTokenDataArrayNative&](./llama.native.llamatokendataarraynative&.md)
-A vector of `llama_token_data` containing the candidate tokens, their probabilities (p), and log-odds (logit) for the current position in the generated text.
+[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-`tau` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-The target cross-entropy (or surprise) value you want to achieve for the generated text. A higher value corresponds to more surprising or less predictable text, while a lower value corresponds to less surprising or more predictable text.
+### **llama_supports_mlock()**
-`eta` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-The learning rate used to update `mu` based on the error between the target and observed surprisal of the sampled word. A larger learning rate will cause `mu` to be updated more quickly, while a smaller learning rate will result in slower updates.
+Check if memory locking is supported
-`mu` [Single&](https://docs.microsoft.com/en-us/dotnet/api/system.single&)
-Maximum cross-entropy. This value is initialized to be twice the target cross-entropy (`2 * tau`) and is updated in the algorithm based on the error between the target and observed surprisal.
+```csharp
+public static bool llama_supports_mlock()
+```
#### Returns
-[LLamaToken](./llama.native.llamatoken.md)
+[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-### **llama_sample_token_greedy(SafeLLamaContextHandle, LLamaTokenDataArrayNative&)**
+### **llama_supports_gpu_offload()**
-Selects the token with the highest probability.
+Check if GPU offload is supported
```csharp
-public static LLamaToken llama_sample_token_greedy(SafeLLamaContextHandle ctx, LLamaTokenDataArrayNative& candidates)
+public static bool llama_supports_gpu_offload()
```
-#### Parameters
+#### Returns
-`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
+[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+
+### **llama_supports_rpc()**
-`candidates` [LLamaTokenDataArrayNative&](./llama.native.llamatokendataarraynative&.md)
-Pointer to LLamaTokenDataArray
+Check if RPC offload is supported
+
+```csharp
+public static bool llama_supports_rpc()
+```
#### Returns
-[LLamaToken](./llama.native.llamatoken.md)
+[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-### **llama_sample_token(SafeLLamaContextHandle, LLamaTokenDataArrayNative&)**
+### **llama_state_load_file(SafeLLamaContextHandle, String, LLamaToken[], UInt64, UInt64&)**
-Randomly selects a token from the candidates based on their probabilities.
+Load session file
```csharp
-public static LLamaToken llama_sample_token(SafeLLamaContextHandle ctx, LLamaTokenDataArrayNative& candidates)
+public static bool llama_state_load_file(SafeLLamaContextHandle ctx, string path_session, LLamaToken[] tokens_out, ulong n_token_capacity, UInt64& n_token_count_out)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-`candidates` [LLamaTokenDataArrayNative&](./llama.native.llamatokendataarraynative&.md)
-Pointer to LLamaTokenDataArray
+`path_session` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+`tokens_out` [LLamaToken[]](./llama.native.llamatoken.md)
+
+`n_token_capacity` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
+
+`n_token_count_out` [UInt64&](https://docs.microsoft.com/en-us/dotnet/api/system.uint64&)
#### Returns
-[LLamaToken](./llama.native.llamatoken.md)
+[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+
+### **llama_state_save_file(SafeLLamaContextHandle, String, LLamaToken[], UInt64)**
-### **<llama_get_embeddings>g__llama_get_embeddings_native|30_0(SafeLLamaContextHandle)**
+Save session file
```csharp
-internal static Single* g__llama_get_embeddings_native|30_0(SafeLLamaContextHandle ctx)
+public static bool llama_state_save_file(SafeLLamaContextHandle ctx, string path_session, LLamaToken[] tokens, ulong n_token_count)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
+`path_session` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+`tokens` [LLamaToken[]](./llama.native.llamatoken.md)
+
+`n_token_count` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
+
#### Returns
-[Single*](https://docs.microsoft.com/en-us/dotnet/api/system.single*)
+[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
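+
+A sketch of the save/load pair (the context and token variables are illustrative, not part of the generated reference):
+
+```csharp
+// Save the current session to disk, then restore it into a token buffer.
+NativeApi.llama_state_save_file(ctx, "session.bin", tokens, (ulong)tokens.Length);
+
+var restored = new LLamaToken[2048];
+ulong count;
+bool ok = NativeApi.llama_state_load_file(ctx, "session.bin", restored, (ulong)restored.Length, out count);
+// On success, restored[0..count] holds the tokens saved with the session.
+```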
+
+### **llama_state_seq_save_file(SafeLLamaContextHandle, String, LLamaSeqId, LLamaToken*, UIntPtr)**
-### **<llama_token_to_piece>g__llama_token_to_piece_native|44_0(SafeLlamaModelHandle, LLamaToken, Byte*, Int32)**
+Saves the specified sequence as a file on specified filepath. Can later be loaded via [NativeApi.llama_state_load_file(SafeLLamaContextHandle, String, LLamaToken[], UInt64, UInt64&)](./llama.native.nativeapi.md#llama_state_load_filesafellamacontexthandle-string-llamatoken-uint64-uint64&)
```csharp
-internal static int g__llama_token_to_piece_native|44_0(SafeLlamaModelHandle model, LLamaToken llamaToken, Byte* buffer, int length)
+public static UIntPtr llama_state_seq_save_file(SafeLLamaContextHandle ctx, string filepath, LLamaSeqId seq_id, LLamaToken* tokens, UIntPtr n_token_count)
```
#### Parameters
-`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
+`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-`llamaToken` [LLamaToken](./llama.native.llamatoken.md)
+`filepath` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-`buffer` [Byte*](https://docs.microsoft.com/en-us/dotnet/api/system.byte*)
+`seq_id` [LLamaSeqId](./llama.native.llamaseqid.md)
-`length` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+`tokens` [LLamaToken*](./llama.native.llamatoken*.md)
+
+`n_token_count` [UIntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.uintptr)
#### Returns
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+[UIntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.uintptr)
-### **<TryLoadLibraries>g__TryLoad|84_0(String)**
+### **llama_state_seq_load_file(SafeLLamaContextHandle, String, LLamaSeqId, LLamaToken*, UIntPtr, UIntPtr&)**
+
+Loads a sequence saved as a file via [NativeApi.llama_state_save_file(SafeLLamaContextHandle, String, LLamaToken[], UInt64)](./llama.native.nativeapi.md#llama_state_save_filesafellamacontexthandle-string-llamatoken-uint64) into the specified sequence
```csharp
-internal static IntPtr g__TryLoad|84_0(string path)
+public static UIntPtr llama_state_seq_load_file(SafeLLamaContextHandle ctx, string filepath, LLamaSeqId dest_seq_id, LLamaToken* tokens_out, UIntPtr n_token_capacity, UIntPtr& n_token_count_out)
```
#### Parameters
-`path` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-#### Returns
-
-[IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)
+`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-### **<TryLoadLibraries>g__TryFindPath|84_1(String, <>c__DisplayClass84_0&)**
+`filepath` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-```csharp
-internal static string g__TryFindPath|84_1(string filename, <>c__DisplayClass84_0& )
-```
+`dest_seq_id` [LLamaSeqId](./llama.native.llamaseqid.md)
-#### Parameters
+`tokens_out` [LLamaToken*](./llama.native.llamatoken*.md)
-`filename` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+`n_token_capacity` [UIntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.uintptr)
-`` [<>c__DisplayClass84_0&](./llama.native.nativeapi.<>c__displayclass84_0&.md)
+`n_token_count_out` [UIntPtr&](https://docs.microsoft.com/en-us/dotnet/api/system.uintptr&)
#### Returns
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+[UIntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.uintptr)
-### **llama_set_n_threads(SafeLLamaContextHandle, UInt32, UInt32)**
+### **llama_set_causal_attn(SafeLLamaContextHandle, Boolean)**
-Set the number of threads used for decoding
+Set whether to use causal attention or not. If set to true, the model will only attend to the past tokens
```csharp
-public static void llama_set_n_threads(SafeLLamaContextHandle ctx, uint n_threads, uint n_threads_batch)
+public static void llama_set_causal_attn(SafeLLamaContextHandle ctx, bool causalAttn)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-`n_threads` [UInt32](https://docs.microsoft.com/en-us/dotnet/api/system.uint32)
-n_threads is the number of threads used for generation (single token)
+`causalAttn` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-`n_threads_batch` [UInt32](https://docs.microsoft.com/en-us/dotnet/api/system.uint32)
-n_threads_batch is the number of threads used for prompt and batch processing (multiple tokens)
+### **llama_set_embeddings(SafeLLamaContextHandle, Boolean)**
-### **llama_vocab_type(SafeLlamaModelHandle)**
+Set whether the model is in embeddings mode or not.
```csharp
-public static LLamaVocabType llama_vocab_type(SafeLlamaModelHandle model)
+public static void llama_set_embeddings(SafeLLamaContextHandle ctx, bool embeddings)
```
#### Parameters
-`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
+`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-#### Returns
+`embeddings` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+If true, embeddings will be returned but logits will not
-[LLamaVocabType](./llama.native.llamavocabtype.md)
+### **llama_set_abort_callback(SafeLlamaModelHandle, IntPtr, IntPtr)**
-### **llama_rope_type(SafeLlamaModelHandle)**
+Set abort callback
```csharp
-public static LLamaRopeType llama_rope_type(SafeLlamaModelHandle model)
+public static void llama_set_abort_callback(SafeLlamaModelHandle ctx, IntPtr abortCallback, IntPtr abortCallbackData)
```
#### Parameters
-`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
+`ctx` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
-#### Returns
+`abortCallback` [IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)
-[LLamaRopeType](./llama.native.llamaropetype.md)
+`abortCallbackData` [IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)
-### **llama_grammar_init(LLamaGrammarElement**, UInt64, UInt64)**
+### **llama_n_seq_max(SafeLLamaContextHandle)**
-Create a new grammar from the given set of grammar rules
+Get the n_seq_max for this context
```csharp
-public static IntPtr llama_grammar_init(LLamaGrammarElement** rules, ulong n_rules, ulong start_rule_index)
+public static uint llama_n_seq_max(SafeLLamaContextHandle ctx)
```
#### Parameters
-`rules` [LLamaGrammarElement**](./llama.native.llamagrammarelement**.md)
-
-`n_rules` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
-
-`start_rule_index` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
+`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
#### Returns
-[IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)
-
-### **llama_grammar_free(IntPtr)**
-
-Free all memory from the given SafeLLamaGrammarHandle
-
-```csharp
-public static void llama_grammar_free(IntPtr grammar)
-```
-
-#### Parameters
-
-`grammar` [IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)
+[UInt32](https://docs.microsoft.com/en-us/dotnet/api/system.uint32)
-### **llama_grammar_copy(SafeLLamaGrammarHandle)**
+### **llama_get_embeddings(SafeLLamaContextHandle)**
-Create a copy of an existing grammar instance
+Get all output token embeddings.
+ When pooling_type == LLAMA_POOLING_TYPE_NONE or when using a generative model, the embeddings for which
+ llama_batch.logits[i] != 0 are stored contiguously in the order they have appeared in the batch.
+ shape: [n_outputs*n_embd]
+ Otherwise, returns an empty span.
```csharp
-public static IntPtr llama_grammar_copy(SafeLLamaGrammarHandle grammar)
+public static Single* llama_get_embeddings(SafeLLamaContextHandle ctx)
```
#### Parameters
-`grammar` [SafeLLamaGrammarHandle](./llama.native.safellamagrammarhandle.md)
+`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
#### Returns
-[IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)
+[Single*](https://docs.microsoft.com/en-us/dotnet/api/system.single*)
-### **llama_sample_grammar(SafeLLamaContextHandle, LLamaTokenDataArrayNative&, SafeLLamaGrammarHandle)**
+### **llama_chat_apply_template(Byte*, LLamaChatMessage*, UIntPtr, Boolean, Byte*, Int32)**
-Apply constraints from grammar
+Apply chat template. Inspired by the Hugging Face apply_chat_template() in Python.
```csharp
-public static void llama_sample_grammar(SafeLLamaContextHandle ctx, LLamaTokenDataArrayNative& candidates, SafeLLamaGrammarHandle grammar)
+public static int llama_chat_apply_template(Byte* tmpl, LLamaChatMessage* chat, UIntPtr n_msg, bool add_ass, Byte* buf, int length)
```
#### Parameters
-`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-
-`candidates` [LLamaTokenDataArrayNative&](./llama.native.llamatokendataarraynative&.md)
-
-`grammar` [SafeLLamaGrammarHandle](./llama.native.safellamagrammarhandle.md)
+`tmpl` [Byte*](https://docs.microsoft.com/en-us/dotnet/api/system.byte*)
+A Jinja template to use for this chat. If this is nullptr, the model’s default chat template will be used instead.
-### **llama_grammar_accept_token(SafeLLamaContextHandle, SafeLLamaGrammarHandle, LLamaToken)**
+`chat` [LLamaChatMessage*](./llama.native.llamachatmessage*.md)
+Pointer to a list of multiple llama_chat_message
-Accepts the sampled token into the grammar
+`n_msg` [UIntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.uintptr)
+Number of llama_chat_message in this chat
-```csharp
-public static void llama_grammar_accept_token(SafeLLamaContextHandle ctx, SafeLLamaGrammarHandle grammar, LLamaToken token)
-```
+`add_ass` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+Whether to end the prompt with the token(s) that indicate the start of an assistant message.
-#### Parameters
+`buf` [Byte*](https://docs.microsoft.com/en-us/dotnet/api/system.byte*)
+A buffer to hold the output formatted prompt. The recommended alloc size is 2 * (total number of characters of all messages)
-`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
+`length` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+The size of the allocated buffer
-`grammar` [SafeLLamaGrammarHandle](./llama.native.safellamagrammarhandle.md)
+#### Returns
-`token` [LLamaToken](./llama.native.llamatoken.md)
+[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+The total number of bytes of the formatted prompt. If it is larger than the size of the buffer, you may need to re-allocate the buffer and then re-apply the template.
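+
+The grow-and-retry pattern described above might look like this unsafe sketch (only buffer handling is shown; building the `chat` message array is omitted and the variable names are illustrative):
+
+```csharp
+// First call with a guessed buffer; if the formatted prompt did not fit,
+// grow the buffer to the reported size and apply the template again.
+var buf = new byte[1024];
+int written;
+fixed (byte* bufPtr = buf)
+    written = NativeApi.llama_chat_apply_template(null, chat, n_msg, true, bufPtr, buf.Length);
+if (written > buf.Length)
+{
+    buf = new byte[written];
+    fixed (byte* bufPtr = buf)
+        written = NativeApi.llama_chat_apply_template(null, chat, n_msg, true, bufPtr, buf.Length);
+}
+```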
-### **llava_validate_embed_size(SafeLLamaContextHandle, SafeLlavaModelHandle)**
+### **llama_chat_builtin_templates(Char**, UIntPtr)**
-Sanity check for clip <-> llava embed size match
+Get list of built-in chat templates
```csharp
-public static bool llava_validate_embed_size(SafeLLamaContextHandle ctxLlama, SafeLlavaModelHandle ctxClip)
+public static int llama_chat_builtin_templates(Char** output, UIntPtr len)
```
#### Parameters
-`ctxLlama` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-LLama Context
+`output` [Char**](https://docs.microsoft.com/en-us/dotnet/api/system.char**)
-`ctxClip` [SafeLlavaModelHandle](./llama.native.safellavamodelhandle.md)
-Llava Model
+`len` [UIntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.uintptr)
#### Returns
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-True if validate successfully
+[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-### **llava_image_embed_make_with_bytes(SafeLlavaModelHandle, Int32, Byte[], Int32)**
+### **llama_print_timings(SafeLLamaContextHandle)**
-Build an image embed from image file bytes
+Print out timing information for this context
```csharp
-public static SafeLlavaImageEmbedHandle llava_image_embed_make_with_bytes(SafeLlavaModelHandle ctx_clip, int n_threads, Byte[] image_bytes, int image_bytes_length)
+public static void llama_print_timings(SafeLLamaContextHandle ctx)
```
#### Parameters
-`ctx_clip` [SafeLlavaModelHandle](./llama.native.safellavamodelhandle.md)
-SafeHandle to the Clip Model
+`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-`n_threads` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-Number of threads
+### **llama_print_system_info()**
-`image_bytes` [Byte[]](https://docs.microsoft.com/en-us/dotnet/api/system.byte)
-Binary image in jpeg format
+Print system information
-`image_bytes_length` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-Bytes length of the image
+```csharp
+public static IntPtr llama_print_system_info()
+```
#### Returns
-[SafeLlavaImageEmbedHandle](./llama.native.safellavaimageembedhandle.md)
-SafeHandle to the Embeddings
+[IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)
-### **llava_image_embed_make_with_filename(SafeLlavaModelHandle, Int32, String)**
+### **llama_token_to_piece(Vocabulary, LLamaToken, Span<Byte>, Int32, Boolean)**
-Build an image embed from a path to an image filename
+Convert a single token into text
```csharp
-public static SafeLlavaImageEmbedHandle llava_image_embed_make_with_filename(SafeLlavaModelHandle ctx_clip, int n_threads, string image_path)
+public static int llama_token_to_piece(Vocabulary vocab, LLamaToken llamaToken, Span buffer, int lstrip, bool special)
```
#### Parameters
-`ctx_clip` [SafeLlavaModelHandle](./llama.native.safellavamodelhandle.md)
-SafeHandle to the Clip Model
+`vocab` [Vocabulary](./llama.native.safellamamodelhandle.vocabulary.md)
-`n_threads` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-Number of threads
+`llamaToken` [LLamaToken](./llama.native.llamatoken.md)
-`image_path` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-Image filename (jpeg) to generate embeddings
+`buffer` [Span<Byte>](https://docs.microsoft.com/en-us/dotnet/api/system.span-1)
+buffer to write string into
+
+`lstrip` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+The caller can skip up to `lstrip` leading spaces before copying (useful when encoding/decoding multiple tokens with `add_space_prefix`)
+
+`special` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+If true, special tokens are rendered in the output
#### Returns
-[SafeLlavaImageEmbedHandle](./llama.native.safellavaimageembedhandle.md)
-SafeHandel to the embeddings
+[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+The length written, or if the buffer is too small a negative that indicates the length required
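+
+The negative-return contract above suggests a retry pattern; a sketch (the helper name is hypothetical, and the binding is assumed to live on `NativeApi`):
+
+```csharp
+// Hypothetical helper: decode one token to UTF-8 text, retrying with the
+// required length when the first call reports the buffer was too small.
+static string TokenToText(SafeLlamaModelHandle.Vocabulary vocab, LLamaToken token)
+{
+    Span<byte> buffer = stackalloc byte[32];
+    var written = NativeApi.llama_token_to_piece(vocab, token, buffer, 0, false);
+    if (written < 0)
+    {
+        // A negative result is the required length, negated.
+        buffer = new byte[-written];
+        written = NativeApi.llama_token_to_piece(vocab, token, buffer, 0, false);
+    }
+    return System.Text.Encoding.UTF8.GetString(buffer.Slice(0, written));
+}
+```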
-### **llava_image_embed_free(IntPtr)**
+### **llama_log_set(LLamaLogCallback)**
-Free an embedding made with llava_image_embed_make_*
+#### Caution
+
+Use `NativeLogConfig.llama_log_set` instead
+
+---
+
+Register a callback to receive llama log messages
```csharp
-public static void llava_image_embed_free(IntPtr embed)
+public static void llama_log_set(LLamaLogCallback logCallback)
```
#### Parameters
-`embed` [IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)
-Embeddings to release
+`logCallback` [LLamaLogCallback](./llama.native.nativelogconfig.llamalogcallback.md)
-### **llava_eval_image_embed(SafeLLamaContextHandle, SafeLlavaImageEmbedHandle, Int32, Int32&)**
+### **llama_kv_self_seq_rm(SafeLLamaContextHandle, LLamaSeqId, LLamaPos, LLamaPos)**
-Write the image represented by embed into the llama context with batch size n_batch, starting at context
- pos n_past. on completion, n_past points to the next position in the context after the image embed.
+Removes all tokens that belong to the specified sequence and have positions in [p0, p1)
```csharp
-public static bool llava_eval_image_embed(SafeLLamaContextHandle ctx_llama, SafeLlavaImageEmbedHandle embed, int n_batch, Int32& n_past)
+public static bool llama_kv_self_seq_rm(SafeLLamaContextHandle ctx, LLamaSeqId seq, LLamaPos p0, LLamaPos p1)
```
#### Parameters
-`ctx_llama` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-Llama Context
+`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-`embed` [SafeLlavaImageEmbedHandle](./llama.native.safellavaimageembedhandle.md)
-Embedding handle
+`seq` [LLamaSeqId](./llama.native.llamaseqid.md)
-`n_batch` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+`p0` [LLamaPos](./llama.native.llamapos.md)
-`n_past` [Int32&](https://docs.microsoft.com/en-us/dotnet/api/system.int32&)
+`p1` [LLamaPos](./llama.native.llamapos.md)
#### Returns
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-True on success
+Returns false if a partial sequence cannot be removed. Removing a whole sequence never fails
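+
+For illustration only (assuming the binding is on `NativeApi`, and that `LLamaSeqId`/`LLamaPos` expose the usual explicit conversions from `int`), removing the first 32 positions of sequence 0 might look like:
+
+```csharp
+// Remove positions [0, 32) of sequence 0; a false return means the
+// partial range could not be removed (whole-sequence removal never fails).
+var ok = NativeApi.llama_kv_self_seq_rm(ctx, (LLamaSeqId)0, (LLamaPos)0, (LLamaPos)32);
+```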
-### **llama_model_quantize(String, String, LLamaModelQuantizeParams*)**
+### **llama_batch_init(Int32, Int32, Int32)**
-Returns 0 on success
+Allocates a batch of tokens on the heap
+ Each token can be assigned up to n_seq_max sequence ids
+ The batch has to be freed with llama_batch_free()
+ If embd != 0, llama_batch.embd will be allocated with size of n_tokens * embd * sizeof(float)
+ Otherwise, llama_batch.token will be allocated to store n_tokens llama_token
+ The rest of the llama_batch members are allocated with size n_tokens
+ All members are left uninitialized
```csharp
-public static uint llama_model_quantize(string fname_inp, string fname_out, LLamaModelQuantizeParams* param)
+public static LLamaNativeBatch llama_batch_init(int n_tokens, int embd, int n_seq_max)
```
#### Parameters
-`fname_inp` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+`n_tokens` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-`fname_out` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+`embd` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-`param` [LLamaModelQuantizeParams*](./llama.native.llamamodelquantizeparams*.md)
+`n_seq_max` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+Each token can be assigned up to n_seq_max sequence ids
#### Returns
-[UInt32](https://docs.microsoft.com/en-us/dotnet/api/system.uint32)
-Returns 0 on success
+[LLamaNativeBatch](./llama.native.llamanativebatch.md)
+
+### **llama_batch_free(LLamaNativeBatch)**
+
+Frees a batch of tokens allocated with llama_batch_init()
+
+```csharp
+public static void llama_batch_free(LLamaNativeBatch batch)
+```
+
+#### Parameters
+
+`batch` [LLamaNativeBatch](./llama.native.llamanativebatch.md)
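+
+Since every batch from `llama_batch_init` must be released with `llama_batch_free`, a try/finally pairing is a natural sketch (assuming `NativeApi` hosts the bindings; the sizes are illustrative):
+
+```csharp
+// Allocate a token batch (embd == 0 selects token storage), use it, free it.
+var batch = NativeApi.llama_batch_init(512, 0, 1);
+try
+{
+    // ... fill the batch and submit it for decoding ...
+}
+finally
+{
+    NativeApi.llama_batch_free(batch);   // every init must be matched by a free
+}
+```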
-### **llama_sample_repetition_penalties(SafeLLamaContextHandle, LLamaTokenDataArrayNative&, LLamaToken*, UInt64, Single, Single, Single)**
+### **llama_apply_adapter_cvec(SafeLLamaContextHandle, Single*, UIntPtr, Int32, Int32, Int32)**
-Repetition penalty described in CTRL academic paper https://arxiv.org/abs/1909.05858, with negative logit fix.
- Frequency and presence penalties described in OpenAI API https://platform.openai.com/docs/api-reference/parameter-details.
+Apply a loaded control vector to a llama_context, or if data is NULL, clear
+ the currently loaded vector.
+ n_embd should be the size of a single layer's control, and data should point
+ to an n_embd x n_layers buffer starting from layer 1.
+ il_start and il_end are the layer range the vector should apply to (both inclusive)
+ See llama_control_vector_load in common to load a control vector.
```csharp
-public static void llama_sample_repetition_penalties(SafeLLamaContextHandle ctx, LLamaTokenDataArrayNative& candidates, LLamaToken* last_tokens, ulong last_tokens_size, float penalty_repeat, float penalty_freq, float penalty_present)
+public static int llama_apply_adapter_cvec(SafeLLamaContextHandle ctx, Single* data, UIntPtr len, int n_embd, int il_start, int il_end)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-`candidates` [LLamaTokenDataArrayNative&](./llama.native.llamatokendataarraynative&.md)
-Pointer to LLamaTokenDataArray
+`data` [Single*](https://docs.microsoft.com/en-us/dotnet/api/system.single*)
+
+`len` [UIntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.uintptr)
-`last_tokens` [LLamaToken*](./llama.native.llamatoken*.md)
+`n_embd` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-`last_tokens_size` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
+`il_start` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-`penalty_repeat` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-Repetition penalty described in CTRL academic paper https://arxiv.org/abs/1909.05858, with negative logit fix.
+`il_end` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-`penalty_freq` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-Frequency and presence penalties described in OpenAI API https://platform.openai.com/docs/api-reference/parameter-details.
+#### Returns
-`penalty_present` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-Frequency and presence penalties described in OpenAI API https://platform.openai.com/docs/api-reference/parameter-details.
+[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
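+
+A hedged sketch of the layout described above: `data` holds `n_embd` floats per layer starting from layer 1, and the `il_start`/`il_end` range is inclusive (everything except the binding itself is hypothetical):
+
+```csharp
+// Apply a control vector covering layers 1..nLayers (both inclusive).
+unsafe void ApplyControlVector(SafeLLamaContextHandle ctx, float[] vector, int nEmbd, int nLayers)
+{
+    fixed (float* data = vector)   // expected length: nEmbd * nLayers
+    {
+        // A non-zero return indicates failure; passing data == null instead
+        // clears the currently loaded vector.
+        var rc = NativeApi.llama_apply_adapter_cvec(ctx, data, (UIntPtr)vector.Length, nEmbd, 1, nLayers);
+    }
+}
+```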
-### **llama_sample_apply_guidance(SafeLLamaContextHandle, Span<Single>, ReadOnlySpan<Single>, Single)**
+### **llama_split_path(String, UIntPtr, String, Int32, Int32)**
-Apply classifier-free guidance to the logits as described in academic paper "Stay on topic with Classifier-Free Guidance" https://arxiv.org/abs/2306.17806
+Build a split GGUF final path for this chunk.
+ llama_split_path(split_path, sizeof(split_path), "/models/ggml-model-q4_0", 2, 4) => split_path = "/models/ggml-model-q4_0-00002-of-00004.gguf"
```csharp
-public static void llama_sample_apply_guidance(SafeLLamaContextHandle ctx, Span logits, ReadOnlySpan logits_guidance, float scale)
+public static int llama_split_path(string split_path, UIntPtr maxlen, string path_prefix, int split_no, int split_count)
```
#### Parameters
-`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
+`split_path` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+`maxlen` [UIntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.uintptr)
-`logits` [Span<Single>](https://docs.microsoft.com/en-us/dotnet/api/system.span-1)
-Logits extracted from the original generation context.
+`path_prefix` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-`logits_guidance` [ReadOnlySpan<Single>](https://docs.microsoft.com/en-us/dotnet/api/system.readonlyspan-1)
-Logits extracted from a separate context from the same model.
- Other than a negative prompt at the beginning, it should have all generated and user input tokens copied from the main context.
+`split_no` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-`scale` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-Guidance strength. 1.0f means no guidance. Higher values mean stronger guidance.
+`split_count` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-### **llama_sample_apply_guidance(SafeLLamaContextHandle, Single*, Single*, Single)**
+#### Returns
+
+[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+Returns the split_path length.
+
+### **llama_split_prefix(String, UIntPtr, String, Int32, Int32)**
-Apply classifier-free guidance to the logits as described in academic paper "Stay on topic with Classifier-Free Guidance" https://arxiv.org/abs/2306.17806
+Extract the path prefix from the split_path if and only if the split_no and split_count match.
+ llama_split_prefix(split_prefix, 64, "/models/ggml-model-q4_0-00002-of-00004.gguf", 2, 4) => split_prefix = "/models/ggml-model-q4_0"
```csharp
-public static void llama_sample_apply_guidance(SafeLLamaContextHandle ctx, Single* logits, Single* logits_guidance, float scale)
+public static int llama_split_prefix(string split_prefix, UIntPtr maxlen, string split_path, int split_no, int split_count)
```
#### Parameters
-`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
+`split_prefix` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+`maxlen` [UIntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.uintptr)
+
+`split_path` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+`split_no` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-`logits` [Single*](https://docs.microsoft.com/en-us/dotnet/api/system.single*)
-Logits extracted from the original generation context.
+`split_count` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-`logits_guidance` [Single*](https://docs.microsoft.com/en-us/dotnet/api/system.single*)
-Logits extracted from a separate context from the same model.
- Other than a negative prompt at the beginning, it should have all generated and user input tokens copied from the main context.
+#### Returns
-`scale` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-Guidance strength. 1.0f means no guidance. Higher values mean stronger guidance.
+[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+Returns the split_prefix length.
-### **llama_sample_softmax(SafeLLamaContextHandle, LLamaTokenDataArrayNative&)**
+### **ggml_backend_dev_count()**
-Sorts candidate tokens by their logits in descending order and calculate probabilities based on logits.
+Get the number of available backend devices
```csharp
-public static void llama_sample_softmax(SafeLLamaContextHandle ctx, LLamaTokenDataArrayNative& candidates)
+public static UIntPtr ggml_backend_dev_count()
```
-#### Parameters
-
-`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
+#### Returns
-`candidates` [LLamaTokenDataArrayNative&](./llama.native.llamatokendataarraynative&.md)
-Pointer to LLamaTokenDataArray
+[UIntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.uintptr)
+Count of available backend devices
-### **llama_sample_top_k(SafeLLamaContextHandle, LLamaTokenDataArrayNative&, Int32, UInt64)**
+### **ggml_backend_dev_get(UIntPtr)**
-Top-K sampling described in academic paper "The Curious Case of Neural Text Degeneration" https://arxiv.org/abs/1904.09751
+Get a backend device by index
```csharp
-public static void llama_sample_top_k(SafeLLamaContextHandle ctx, LLamaTokenDataArrayNative& candidates, int k, ulong min_keep)
+public static IntPtr ggml_backend_dev_get(UIntPtr i)
```
#### Parameters
-`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-
-`candidates` [LLamaTokenDataArrayNative&](./llama.native.llamatokendataarraynative&.md)
-Pointer to LLamaTokenDataArray
+`i` [UIntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.uintptr)
+Device index
-`k` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+#### Returns
-`min_keep` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
+[IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)
+Pointer to the backend device
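+
+The two entries above combine naturally into an enumeration loop (a sketch, assuming the bindings live on `NativeApi`):
+
+```csharp
+// Walk every available backend device by index.
+var count = (int)NativeApi.ggml_backend_dev_count();
+for (var i = 0; i < count; i++)
+{
+    var dev = NativeApi.ggml_backend_dev_get((UIntPtr)i);
+    // 'dev' can then be queried further, e.g. for its buffer type.
+}
+```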
-### **llama_sample_top_p(SafeLLamaContextHandle, LLamaTokenDataArrayNative&, Single, UInt64)**
+### **ggml_backend_dev_buffer_type(IntPtr)**
-Nucleus sampling described in academic paper "The Curious Case of Neural Text Degeneration" https://arxiv.org/abs/1904.09751
+Get the buffer type for a backend device
```csharp
-public static void llama_sample_top_p(SafeLLamaContextHandle ctx, LLamaTokenDataArrayNative& candidates, float p, ulong min_keep)
+public static IntPtr ggml_backend_dev_buffer_type(IntPtr dev)
```
#### Parameters
-`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-
-`candidates` [LLamaTokenDataArrayNative&](./llama.native.llamatokendataarraynative&.md)
-Pointer to LLamaTokenDataArray
+`dev` [IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)
+Backend device pointer
-`p` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+#### Returns
-`min_keep` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
+[IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)
+Pointer to the buffer type
-### **llama_sample_min_p(SafeLLamaContextHandle, LLamaTokenDataArrayNative&, Single, UInt64)**
+### **ggml_backend_buft_name(IntPtr)**
-Minimum P sampling as described in https://github.com/ggerganov/llama.cpp/pull/3841
+Get the name of a buffer type
```csharp
-public static void llama_sample_min_p(SafeLLamaContextHandle ctx, LLamaTokenDataArrayNative& candidates, float p, ulong min_keep)
+public static IntPtr ggml_backend_buft_name(IntPtr buft)
```
#### Parameters
-`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
+`buft` [IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)
+Buffer type pointer
-`candidates` [LLamaTokenDataArrayNative&](./llama.native.llamatokendataarraynative&.md)
-Pointer to LLamaTokenDataArray
+#### Returns
-`p` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-`min_keep` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
-
-### **llama_sample_tail_free(SafeLLamaContextHandle, LLamaTokenDataArrayNative&, Single, UInt64)**
-
-Tail Free Sampling described in https://www.trentonbricken.com/Tail-Free-Sampling/.
-
-```csharp
-public static void llama_sample_tail_free(SafeLLamaContextHandle ctx, LLamaTokenDataArrayNative& candidates, float z, ulong min_keep)
-```
-
-#### Parameters
-
-`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-
-`candidates` [LLamaTokenDataArrayNative&](./llama.native.llamatokendataarraynative&.md)
-Pointer to LLamaTokenDataArray
-
-`z` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-`min_keep` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
-
-### **llama_sample_typical(SafeLLamaContextHandle, LLamaTokenDataArrayNative&, Single, UInt64)**
-
-Locally Typical Sampling implementation described in the paper https://arxiv.org/abs/2202.00666.
-
-```csharp
-public static void llama_sample_typical(SafeLLamaContextHandle ctx, LLamaTokenDataArrayNative& candidates, float p, ulong min_keep)
-```
-
-#### Parameters
-
-`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-
-`candidates` [LLamaTokenDataArrayNative&](./llama.native.llamatokendataarraynative&.md)
-Pointer to LLamaTokenDataArray
-
-`p` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-`min_keep` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
-
-### **llama_sample_typical(SafeLLamaContextHandle, LLamaTokenDataArrayNative&, Single, Single, Single)**
-
-Dynamic temperature implementation described in the paper https://arxiv.org/abs/2309.02772.
-
-```csharp
-public static void llama_sample_typical(SafeLLamaContextHandle ctx, LLamaTokenDataArrayNative& candidates, float min_temp, float max_temp, float exponent_val)
-```
-
-#### Parameters
-
-`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-
-`candidates` [LLamaTokenDataArrayNative&](./llama.native.llamatokendataarraynative&.md)
-Pointer to LLamaTokenDataArray
-
-`min_temp` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-`max_temp` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-`exponent_val` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-### **llama_sample_temp(SafeLLamaContextHandle, LLamaTokenDataArrayNative&, Single)**
-
-Modify logits by temperature
-
-```csharp
-public static void llama_sample_temp(SafeLLamaContextHandle ctx, LLamaTokenDataArrayNative& candidates, float temp)
-```
-
-#### Parameters
-
-`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-
-`candidates` [LLamaTokenDataArrayNative&](./llama.native.llamatokendataarraynative&.md)
-
-`temp` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-### **llama_get_embeddings(SafeLLamaContextHandle)**
-
-Get the embeddings for the input
-
-```csharp
-public static Span llama_get_embeddings(SafeLLamaContextHandle ctx)
-```
-
-#### Parameters
-
-`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-
-#### Returns
-
-[Span<Single>](https://docs.microsoft.com/en-us/dotnet/api/system.span-1)
-
-### **llama_chat_apply_template(SafeLlamaModelHandle, Char*, LLamaChatMessage*, IntPtr, Boolean, Char*, Int32)**
-
-Apply chat template. Inspired by hf apply_chat_template() on python.
- Both "model" and "custom_template" are optional, but at least one is required. "custom_template" has higher precedence than "model"
- NOTE: This function does not use a jinja parser. It only supports a pre-defined list of template. See more: https://github.com/ggerganov/llama.cpp/wiki/Templates-supported-by-llama_chat_apply_template
-
-```csharp
-public static int llama_chat_apply_template(SafeLlamaModelHandle model, Char* tmpl, LLamaChatMessage* chat, IntPtr n_msg, bool add_ass, Char* buf, int length)
-```
-
-#### Parameters
-
-`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
-
-`tmpl` [Char*](https://docs.microsoft.com/en-us/dotnet/api/system.char*)
-A Jinja template to use for this chat. If this is nullptr, the model’s default chat template will be used instead.
-
-`chat` [LLamaChatMessage*](./llama.native.llamachatmessage*.md)
-Pointer to a list of multiple llama_chat_message
-
-`n_msg` [IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)
-Number of llama_chat_message in this chat
-
-`add_ass` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-Whether to end the prompt with the token(s) that indicate the start of an assistant message.
-
-`buf` [Char*](https://docs.microsoft.com/en-us/dotnet/api/system.char*)
-A buffer to hold the output formatted prompt. The recommended alloc size is 2 * (total number of characters of all messages)
-
-`length` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-The size of the allocated buffer
-
-#### Returns
-
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-The total number of bytes of the formatted prompt. If is it larger than the size of buffer, you may need to re-alloc it and then re-apply the template.
-
-### **llama_token_bos(SafeLlamaModelHandle)**
-
-Get the "Beginning of sentence" token
-
-```csharp
-public static LLamaToken llama_token_bos(SafeLlamaModelHandle model)
-```
-
-#### Parameters
-
-`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
-
-#### Returns
-
-[LLamaToken](./llama.native.llamatoken.md)
-
-### **llama_token_eos(SafeLlamaModelHandle)**
-
-Get the "End of sentence" token
-
-```csharp
-public static LLamaToken llama_token_eos(SafeLlamaModelHandle model)
-```
-
-#### Parameters
-
-`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
-
-#### Returns
-
-[LLamaToken](./llama.native.llamatoken.md)
-
-### **llama_token_nl(SafeLlamaModelHandle)**
-
-Get the "new line" token
-
-```csharp
-public static LLamaToken llama_token_nl(SafeLlamaModelHandle model)
-```
-
-#### Parameters
-
-`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
-
-#### Returns
-
-[LLamaToken](./llama.native.llamatoken.md)
-
-### **llama_add_bos_token(SafeLlamaModelHandle)**
-
-Returns -1 if unknown, 1 for true or 0 for false.
-
-```csharp
-public static int llama_add_bos_token(SafeLlamaModelHandle model)
-```
-
-#### Parameters
-
-`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
-
-#### Returns
-
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-### **llama_add_eos_token(SafeLlamaModelHandle)**
-
-Returns -1 if unknown, 1 for true or 0 for false.
-
-```csharp
-public static int llama_add_eos_token(SafeLlamaModelHandle model)
-```
-
-#### Parameters
-
-`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
-
-#### Returns
-
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-### **llama_token_prefix(SafeLlamaModelHandle)**
-
-codellama infill tokens, Beginning of infill prefix
-
-```csharp
-public static int llama_token_prefix(SafeLlamaModelHandle model)
-```
-
-#### Parameters
-
-`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
-
-#### Returns
-
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-### **llama_token_middle(SafeLlamaModelHandle)**
-
-codellama infill tokens, Beginning of infill middle
-
-```csharp
-public static int llama_token_middle(SafeLlamaModelHandle model)
-```
-
-#### Parameters
-
-`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
-
-#### Returns
-
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-### **llama_token_suffix(SafeLlamaModelHandle)**
-
-codellama infill tokens, Beginning of infill suffix
-
-```csharp
-public static int llama_token_suffix(SafeLlamaModelHandle model)
-```
-
-#### Parameters
-
-`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
-
-#### Returns
-
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-### **llama_token_eot(SafeLlamaModelHandle)**
-
-codellama infill tokens, End of infill middle
-
-```csharp
-public static int llama_token_eot(SafeLlamaModelHandle model)
-```
-
-#### Parameters
-
-`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
-
-#### Returns
-
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-### **llama_print_timings(SafeLLamaContextHandle)**
-
-Print out timing information for this context
-
-```csharp
-public static void llama_print_timings(SafeLLamaContextHandle ctx)
-```
-
-#### Parameters
-
-`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-
-### **llama_reset_timings(SafeLLamaContextHandle)**
-
-Reset all collected timing information for this context
-
-```csharp
-public static void llama_reset_timings(SafeLLamaContextHandle ctx)
-```
-
-#### Parameters
-
-`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-
-### **llama_print_system_info()**
-
-Print system information
-
-```csharp
-public static IntPtr llama_print_system_info()
-```
-
-#### Returns
-
-[IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)
-
-### **llama_token_to_piece(SafeLlamaModelHandle, LLamaToken, Span<Byte>)**
-
-Convert a single token into text
-
-```csharp
-public static int llama_token_to_piece(SafeLlamaModelHandle model, LLamaToken llamaToken, Span buffer)
-```
-
-#### Parameters
-
-`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
-
-`llamaToken` [LLamaToken](./llama.native.llamatoken.md)
-
-`buffer` [Span<Byte>](https://docs.microsoft.com/en-us/dotnet/api/system.span-1)
-buffer to write string into
-
-#### Returns
-
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-The length written, or if the buffer is too small a negative that indicates the length required
-
-### **llama_tokenize(SafeLlamaModelHandle, Byte*, Int32, LLamaToken*, Int32, Boolean, Boolean)**
-
-Convert text into tokens
-
-```csharp
-public static int llama_tokenize(SafeLlamaModelHandle model, Byte* text, int text_len, LLamaToken* tokens, int n_max_tokens, bool add_bos, bool special)
-```
-
-#### Parameters
-
-`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
-
-`text` [Byte*](https://docs.microsoft.com/en-us/dotnet/api/system.byte*)
-
-`text_len` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-`tokens` [LLamaToken*](./llama.native.llamatoken*.md)
-
-`n_max_tokens` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-`add_bos` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-`special` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-Allow tokenizing special and/or control tokens which otherwise are not exposed and treated as plaintext. Does not insert a leading space.
-
-#### Returns
-
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-Returns the number of tokens on success, no more than n_max_tokens.
- Returns a negative number on failure - the number of tokens that would have been returned
-
-### **llama_log_set(LLamaLogCallback)**
-
-Register a callback to receive llama log messages
-
-```csharp
-public static void llama_log_set(LLamaLogCallback logCallback)
-```
-
-#### Parameters
-
-`logCallback` [LLamaLogCallback](./llama.native.llamalogcallback.md)
-
-### **llama_kv_cache_clear(SafeLLamaContextHandle)**
-
-Clear the KV cache
-
-```csharp
-public static void llama_kv_cache_clear(SafeLLamaContextHandle ctx)
-```
-
-#### Parameters
-
-`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-
-### **llama_kv_cache_seq_rm(SafeLLamaContextHandle, LLamaSeqId, LLamaPos, LLamaPos)**
-
-Removes all tokens that belong to the specified sequence and have positions in [p0, p1)
-
-```csharp
-public static void llama_kv_cache_seq_rm(SafeLLamaContextHandle ctx, LLamaSeqId seq, LLamaPos p0, LLamaPos p1)
-```
-
-#### Parameters
-
-`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-
-`seq` [LLamaSeqId](./llama.native.llamaseqid.md)
-
-`p0` [LLamaPos](./llama.native.llamapos.md)
-
-`p1` [LLamaPos](./llama.native.llamapos.md)
-
-### **llama_kv_cache_seq_cp(SafeLLamaContextHandle, LLamaSeqId, LLamaSeqId, LLamaPos, LLamaPos)**
-
-Copy all tokens that belong to the specified sequence to another sequence
- Note that this does not allocate extra KV cache memory - it simply assigns the tokens to the new sequence
-
-```csharp
-public static void llama_kv_cache_seq_cp(SafeLLamaContextHandle ctx, LLamaSeqId src, LLamaSeqId dest, LLamaPos p0, LLamaPos p1)
-```
-
-#### Parameters
-
-`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-
-`src` [LLamaSeqId](./llama.native.llamaseqid.md)
-
-`dest` [LLamaSeqId](./llama.native.llamaseqid.md)
-
-`p0` [LLamaPos](./llama.native.llamapos.md)
-
-`p1` [LLamaPos](./llama.native.llamapos.md)
-
-### **llama_kv_cache_seq_keep(SafeLLamaContextHandle, LLamaSeqId)**
-
-Removes all tokens that do not belong to the specified sequence
-
-```csharp
-public static void llama_kv_cache_seq_keep(SafeLLamaContextHandle ctx, LLamaSeqId seq)
-```
-
-#### Parameters
-
-`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-
-`seq` [LLamaSeqId](./llama.native.llamaseqid.md)
-
-### **llama_kv_cache_seq_add(SafeLLamaContextHandle, LLamaSeqId, LLamaPos, LLamaPos, Int32)**
-
-Adds relative position "delta" to all tokens that belong to the specified sequence and have positions in [p0, p1)
- If the KV cache is RoPEd, the KV data is updated accordingly:
- - lazily on next llama_decode()
- - explicitly with llama_kv_cache_update()
-
-```csharp
-public static void llama_kv_cache_seq_add(SafeLLamaContextHandle ctx, LLamaSeqId seq, LLamaPos p0, LLamaPos p1, int delta)
-```
-
-#### Parameters
-
-`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-
-`seq` [LLamaSeqId](./llama.native.llamaseqid.md)
-
-`p0` [LLamaPos](./llama.native.llamapos.md)
-
-`p1` [LLamaPos](./llama.native.llamapos.md)
-
-`delta` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-### **llama_kv_cache_seq_div(SafeLLamaContextHandle, LLamaSeqId, LLamaPos, LLamaPos, Int32)**
-
-Integer division of the positions by factor of `d > 1`
- If the KV cache is RoPEd, the KV data is updated accordingly:
- - lazily on next llama_decode()
- - explicitly with llama_kv_cache_update()
-
- p0 < 0 : [0, p1]
-
- p1 < 0 : [p0, inf)
-
-```csharp
-public static void llama_kv_cache_seq_div(SafeLLamaContextHandle ctx, LLamaSeqId seq, LLamaPos p0, LLamaPos p1, int d)
-```
-
-#### Parameters
-
-`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-
-`seq` [LLamaSeqId](./llama.native.llamaseqid.md)
-
-`p0` [LLamaPos](./llama.native.llamapos.md)
-
-`p1` [LLamaPos](./llama.native.llamapos.md)
-
-`d` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-### **llama_kv_cache_seq_pos_max(SafeLLamaContextHandle, LLamaSeqId)**
-
-Returns the largest position present in the KV cache for the specified sequence
-
-```csharp
-public static LLamaPos llama_kv_cache_seq_pos_max(SafeLLamaContextHandle ctx, LLamaSeqId seq)
-```
-
-#### Parameters
-
-`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-
-`seq` [LLamaSeqId](./llama.native.llamaseqid.md)
-
-#### Returns
-
-[LLamaPos](./llama.native.llamapos.md)
-
-### **llama_kv_cache_defrag(SafeLLamaContextHandle)**
-
-Defragment the KV cache. This will be applied:
- - lazily on next llama_decode()
- - explicitly with llama_kv_cache_update()
-
-```csharp
-public static LLamaPos llama_kv_cache_defrag(SafeLLamaContextHandle ctx)
-```
-
-#### Parameters
-
-`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-
-#### Returns
-
-[LLamaPos](./llama.native.llamapos.md)
-
-### **llama_kv_cache_update(SafeLLamaContextHandle)**
-
-Apply the KV cache updates (such as K-shifts, defragmentation, etc.)
-
-```csharp
-public static void llama_kv_cache_update(SafeLLamaContextHandle ctx)
-```
-
-#### Parameters
-
-`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-
-### **llama_batch_init(Int32, Int32, Int32)**
-
-Allocates a batch of tokens on the heap
- Each token can be assigned up to n_seq_max sequence ids
- The batch has to be freed with llama_batch_free()
- If embd != 0, llama_batch.embd will be allocated with size of n_tokens * embd * sizeof(float)
- Otherwise, llama_batch.token will be allocated to store n_tokens llama_token
- The rest of the llama_batch members are allocated with size n_tokens
- All members are left uninitialized
-
-```csharp
-public static LLamaNativeBatch llama_batch_init(int n_tokens, int embd, int n_seq_max)
-```
-
-#### Parameters
-
-`n_tokens` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-`embd` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-`n_seq_max` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-Each token can be assigned up to n_seq_max sequence ids
-
-#### Returns
-
-[LLamaNativeBatch](./llama.native.llamanativebatch.md)
-
-### **llama_batch_free(LLamaNativeBatch)**
-
-Frees a batch of tokens allocated with llama_batch_init()
-
-```csharp
-public static void llama_batch_free(LLamaNativeBatch batch)
-```
-
-#### Parameters
-
-`batch` [LLamaNativeBatch](./llama.native.llamanativebatch.md)
-
-### **llama_decode(SafeLLamaContextHandle, LLamaNativeBatch)**
-
-
-
-```csharp
-public static int llama_decode(SafeLLamaContextHandle ctx, LLamaNativeBatch batch)
-```
-
-#### Parameters
-
-`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-
-`batch` [LLamaNativeBatch](./llama.native.llamanativebatch.md)
-
-#### Returns
-
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-Positive return values does not mean a fatal error, but rather a warning:
- - 0: success
- - 1: could not find a KV slot for the batch (try reducing the size of the batch or increase the context)
- - < 0: error
-
-### **llama_kv_cache_view_init(SafeLLamaContextHandle, Int32)**
-
-Create an empty KV cache view. (use only for debugging purposes)
-
-```csharp
-public static LLamaKvCacheView llama_kv_cache_view_init(SafeLLamaContextHandle ctx, int n_max_seq)
-```
-
-#### Parameters
-
-`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-
-`n_max_seq` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-#### Returns
-
-[LLamaKvCacheView](./llama.native.llamakvcacheview.md)
-
-### **llama_kv_cache_view_free(LLamaKvCacheView&)**
-
-Free a KV cache view. (use only for debugging purposes)
-
-```csharp
-public static void llama_kv_cache_view_free(LLamaKvCacheView& view)
-```
-
-#### Parameters
-
-`view` [LLamaKvCacheView&](./llama.native.llamakvcacheview&.md)
-
-### **llama_kv_cache_view_update(SafeLLamaContextHandle, LLamaKvCacheView&)**
-
-Update the KV cache view structure with the current state of the KV cache. (use only for debugging purposes)
-
-```csharp
-public static void llama_kv_cache_view_update(SafeLLamaContextHandle ctx, LLamaKvCacheView& view)
-```
-
-#### Parameters
-
-`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-
-`view` [LLamaKvCacheView&](./llama.native.llamakvcacheview&.md)
-
-### **llama_get_kv_cache_token_count(SafeLLamaContextHandle)**
-
-Returns the number of tokens in the KV cache (slow, use only for debug)
- If a KV cell has multiple sequences assigned to it, it will be counted multiple times
-
-```csharp
-public static int llama_get_kv_cache_token_count(SafeLLamaContextHandle ctx)
-```
-
-#### Parameters
-
-`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-
-#### Returns
-
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-### **llama_get_kv_cache_used_cells(SafeLLamaContextHandle)**
-
-Returns the number of used KV cells (i.e. have at least one sequence assigned to them)
-
-```csharp
-public static int llama_get_kv_cache_used_cells(SafeLLamaContextHandle ctx)
-```
-
-#### Parameters
-
-`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-
-#### Returns
-
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+[IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)
+Name of the buffer type
-### **llama_beam_search(SafeLLamaContextHandle, LLamaBeamSearchCallback, IntPtr, UInt64, Int32, Int32, Int32)**
+### **llava_validate_embed_size(SafeLLamaContextHandle, SafeLlavaModelHandle)**
-Deterministically returns entire sentence constructed by a beam search.
+Sanity check for clip <-> llava embed size match
```csharp
-public static void llama_beam_search(SafeLLamaContextHandle ctx, LLamaBeamSearchCallback callback, IntPtr callback_data, ulong n_beams, int n_past, int n_predict, int n_threads)
+public static bool llava_validate_embed_size(SafeLLamaContextHandle ctxLlama, SafeLlavaModelHandle ctxClip)
```
#### Parameters
-`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-Pointer to the llama_context.
-
-`callback` [LLamaBeamSearchCallback](./llama.native.nativeapi.llamabeamsearchcallback.md)
-Invoked for each iteration of the beam_search loop, passing in beams_state.
-
-`callback_data` [IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)
-A pointer that is simply passed back to callback.
-
-`n_beams` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
-Number of beams to use.
-
-`n_past` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-Number of tokens already evaluated.
-
-`n_predict` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-Maximum number of tokens to predict. EOS may occur earlier.
-
-`n_threads` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-Number of threads.
-
-### **llama_empty_call()**
-
-A method that does nothing. This is a native method, calling it will force the llama native dependencies to be loaded.
-
-```csharp
-public static void llama_empty_call()
-```
-
-### **llama_max_devices()**
-
-Get the maximum number of devices supported by llama.cpp
-
-```csharp
-public static long llama_max_devices()
-```
-
-#### Returns
-
-[Int64](https://docs.microsoft.com/en-us/dotnet/api/system.int64)
-
-### **llama_model_default_params()**
-
-Create a LLamaModelParams with default values
-
-```csharp
-public static LLamaModelParams llama_model_default_params()
-```
-
-#### Returns
-
-[LLamaModelParams](./llama.native.llamamodelparams.md)
-
-### **llama_context_default_params()**
-
-Create a LLamaContextParams with default values
-
-```csharp
-public static LLamaContextParams llama_context_default_params()
-```
-
-#### Returns
-
-[LLamaContextParams](./llama.native.llamacontextparams.md)
-
-### **llama_model_quantize_default_params()**
-
-Create a LLamaModelQuantizeParams with default values
-
-```csharp
-public static LLamaModelQuantizeParams llama_model_quantize_default_params()
-```
-
-#### Returns
-
-[LLamaModelQuantizeParams](./llama.native.llamamodelquantizeparams.md)
-
-### **llama_supports_mmap()**
-
-Check if memory mapping is supported
-
-```csharp
-public static bool llama_supports_mmap()
-```
-
-#### Returns
-
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-### **llama_supports_mlock()**
-
-Check if memory locking is supported
-
-```csharp
-public static bool llama_supports_mlock()
-```
-
-#### Returns
-
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-### **llama_supports_gpu_offload()**
-
-Check if GPU offload is supported
+`ctxLlama` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
+LLama Context
-```csharp
-public static bool llama_supports_gpu_offload()
-```
+`ctxClip` [SafeLlavaModelHandle](./llama.native.safellavamodelhandle.md)
+Llava Model
#### Returns
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+True if validated successfully
-### **llama_set_rng_seed(SafeLLamaContextHandle, UInt32)**
+### **llava_image_embed_make_with_bytes(SafeLlavaModelHandle, Int32, Byte[], Int32)**
-Sets the current rng seed.
+Build an image embed from image file bytes
```csharp
-public static void llama_set_rng_seed(SafeLLamaContextHandle ctx, uint seed)
+public static SafeLlavaImageEmbedHandle llava_image_embed_make_with_bytes(SafeLlavaModelHandle ctx_clip, int n_threads, Byte[] image_bytes, int image_bytes_length)
```
#### Parameters
-`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-
-`seed` [UInt32](https://docs.microsoft.com/en-us/dotnet/api/system.uint32)
-
-### **llama_get_state_size(SafeLLamaContextHandle)**
-
-Returns the maximum size in bytes of the state (rng, logits, embedding
- and kv_cache) - will often be smaller after compacting tokens
+`ctx_clip` [SafeLlavaModelHandle](./llama.native.safellavamodelhandle.md)
+SafeHandle to the Clip Model
-```csharp
-public static ulong llama_get_state_size(SafeLLamaContextHandle ctx)
-```
+`n_threads` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+Number of threads
-#### Parameters
+`image_bytes` [Byte[]](https://docs.microsoft.com/en-us/dotnet/api/system.byte)
+Binary image in jpeg format
-`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
+`image_bytes_length` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+Bytes length of the image
#### Returns
-[UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
+[SafeLlavaImageEmbedHandle](./llama.native.safellavaimageembedhandle.md)
+SafeHandle to the Embeddings
-### **llama_copy_state_data(SafeLLamaContextHandle, Byte*)**
+### **llava_image_embed_make_with_filename(SafeLlavaModelHandle, Int32, String)**
-Copies the state to the specified destination address.
- Destination needs to have allocated enough memory.
+Build an image embed from a path to an image file
```csharp
-public static ulong llama_copy_state_data(SafeLLamaContextHandle ctx, Byte* dest)
+public static SafeLlavaImageEmbedHandle llava_image_embed_make_with_filename(SafeLlavaModelHandle ctx_clip, int n_threads, string image_path)
```
#### Parameters
-`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-
-`dest` [Byte*](https://docs.microsoft.com/en-us/dotnet/api/system.byte*)
-
-#### Returns
-
-[UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
-the number of bytes copied
-
-### **llama_set_state_data(SafeLLamaContextHandle, Byte*)**
-
-Set the state reading from the specified address
-
-```csharp
-public static ulong llama_set_state_data(SafeLLamaContextHandle ctx, Byte* src)
-```
-
-#### Parameters
+`ctx_clip` [SafeLlavaModelHandle](./llama.native.safellavamodelhandle.md)
+SafeHandle to the Clip Model
-`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
+`n_threads` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+Number of threads
-`src` [Byte*](https://docs.microsoft.com/en-us/dotnet/api/system.byte*)
+`image_path` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+Path to the image file (jpeg) to generate embeddings from
#### Returns
-[UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
-the number of bytes read
+[SafeLlavaImageEmbedHandle](./llama.native.safellavaimageembedhandle.md)
+SafeHandle to the embeddings
-### **llama_load_session_file(SafeLLamaContextHandle, String, LLamaToken[], UInt64, UInt64&)**
+### **llava_image_embed_free(IntPtr)**
-Load session file
+Free an embedding made with llava_image_embed_make_*
```csharp
-public static bool llama_load_session_file(SafeLLamaContextHandle ctx, string path_session, LLamaToken[] tokens_out, ulong n_token_capacity, UInt64& n_token_count_out)
+public static void llava_image_embed_free(IntPtr embed)
```
#### Parameters
-`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-
-`path_session` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-`tokens_out` [LLamaToken[]](./llama.native.llamatoken.md)
-
-`n_token_capacity` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
-
-`n_token_count_out` [UInt64&](https://docs.microsoft.com/en-us/dotnet/api/system.uint64&)
-
-#### Returns
-
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+`embed` [IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)
+Embeddings to release
-### **llama_save_session_file(SafeLLamaContextHandle, String, LLamaToken[], UInt64)**
+### **llava_eval_image_embed(SafeLLamaContextHandle, SafeLlavaImageEmbedHandle, Int32, Int32&)**
-Save session file
+Write the image represented by embed into the llama context with batch size n_batch, starting at context
+ pos n_past. On completion, n_past points to the next position in the context after the image embed.
```csharp
-public static bool llama_save_session_file(SafeLLamaContextHandle ctx, string path_session, LLamaToken[] tokens, ulong n_token_count)
+public static bool llava_eval_image_embed(SafeLLamaContextHandle ctx_llama, SafeLlavaImageEmbedHandle embed, int n_batch, Int32& n_past)
```
#### Parameters
-`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
+`ctx_llama` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
+Llama Context
-`path_session` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+`embed` [SafeLlavaImageEmbedHandle](./llama.native.safellavaimageembedhandle.md)
+Embedding handle
-`tokens` [LLamaToken[]](./llama.native.llamatoken.md)
+`n_batch` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-`n_token_count` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
+`n_past` [Int32&](https://docs.microsoft.com/en-us/dotnet/api/system.int32&)
#### Returns
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+True on success
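+For reference, the llava entry points above compose into a single evaluation flow. The snippet below is a minimal sketch, not a definitive implementation: the `ctxLlama` and `ctxClip` handles are assumed to have been created elsewhere, the thread and batch counts are illustrative, and the mapping of the `Int32&` parameter to C# `ref` is an assumption.
+
+```csharp
+// Sketch: validate the embed sizes, embed an image file, then write it into the context.
+bool sizesMatch = NativeApi.llava_validate_embed_size(ctxLlama, ctxClip);
+if (sizesMatch)
+{
+    using var embed = NativeApi.llava_image_embed_make_with_filename(ctxClip, 4, "photo.jpg");
+    int n_past = 0;
+    bool ok = NativeApi.llava_eval_image_embed(ctxLlama, embed, 512, ref n_past);
+    // On success, n_past now points to the first context position after the image.
+}
+```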
-### **llama_token_get_text(SafeLlamaModelHandle, LLamaToken)**
-
-```csharp
-public static Byte* llama_token_get_text(SafeLlamaModelHandle model, LLamaToken token)
-```
-
-#### Parameters
-
-`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
-
-`token` [LLamaToken](./llama.native.llamatoken.md)
-
-#### Returns
-
-[Byte*](https://docs.microsoft.com/en-us/dotnet/api/system.byte*)
+### **GetLoadedNativeLibrary(NativeLibraryName)**
-### **llama_token_get_score(SafeLlamaModelHandle, LLamaToken)**
+Get the loaded native library. If you are using netstandard2.0, it will always return null.
```csharp
-public static float llama_token_get_score(SafeLlamaModelHandle model, LLamaToken token)
+public static INativeLibrary GetLoadedNativeLibrary(NativeLibraryName name)
```
#### Parameters
-`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
-
-`token` [LLamaToken](./llama.native.llamatoken.md)
+`name` [NativeLibraryName](./llama.native.nativelibraryname.md)
#### Returns
-[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+[INativeLibrary](./llama.abstractions.inativelibrary.md)
-### **llama_token_get_type(SafeLlamaModelHandle, LLamaToken)**
+#### Exceptions
-```csharp
-public static LLamaTokenType llama_token_get_type(SafeLlamaModelHandle model, LLamaToken token)
-```
-
-#### Parameters
-
-`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
-
-`token` [LLamaToken](./llama.native.llamatoken.md)
-
-#### Returns
+[ArgumentException](https://docs.microsoft.com/en-us/dotnet/api/system.argumentexception)
-[LLamaTokenType](./llama.native.llamatokentype.md)
+### **llama_model_quantize(String, String, LLamaModelQuantizeParams&)**
-### **llama_n_ctx(SafeLLamaContextHandle)**
-
-Get the size of the context window for the model for this context
+Returns 0 on success
```csharp
-public static uint llama_n_ctx(SafeLLamaContextHandle ctx)
+public static uint llama_model_quantize(string fname_inp, string fname_out, LLamaModelQuantizeParams& param)
```
#### Parameters
-`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-
-#### Returns
-
-[UInt32](https://docs.microsoft.com/en-us/dotnet/api/system.uint32)
-
-### **llama_n_batch(SafeLLamaContextHandle)**
-
-Get the batch size for this context
-
-```csharp
-public static uint llama_n_batch(SafeLLamaContextHandle ctx)
-```
+`fname_inp` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-#### Parameters
+`fname_out` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
+`param` [LLamaModelQuantizeParams&](./llama.native.llamamodelquantizeparams&.md)
#### Returns
[UInt32](https://docs.microsoft.com/en-us/dotnet/api/system.uint32)
+Returns 0 on success
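+As a minimal, hedged sketch of the call above (the default construction of `LLamaModelQuantizeParams` and the file names are assumptions, and the by-ref parameter is assumed to map to C# `ref`):
+
+```csharp
+// Sketch: quantize an input GGUF file into a new quantized file.
+var quantizeParams = new LLamaModelQuantizeParams();  // hypothetical default initialization
+uint result = NativeApi.llama_model_quantize("model-f16.gguf", "model-q4_0.gguf", ref quantizeParams);
+// A result of 0 indicates success, as documented above.
+```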
-### **llama_get_logits(SafeLLamaContextHandle)**
-
-Token logits obtained from the last call to llama_decode
- The logits for the last token are stored in the last row
- Can be mutated in order to change the probabilities of the next token.
- Rows: n_tokens
- Cols: n_vocab
-
-```csharp
-public static Single* llama_get_logits(SafeLLamaContextHandle ctx)
-```
-
-#### Parameters
-
-`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-
-#### Returns
-
-[Single*](https://docs.microsoft.com/en-us/dotnet/api/system.single*)
-
-### **llama_get_logits_ith(SafeLLamaContextHandle, Int32)**
-
-Logits for the ith token. Equivalent to: llama_get_logits(ctx) + i*n_vocab
-
-```csharp
-public static Single* llama_get_logits_ith(SafeLLamaContextHandle ctx, int i)
-```
-
-#### Parameters
-
-`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-
-`i` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-#### Returns
-
-[Single*](https://docs.microsoft.com/en-us/dotnet/api/system.single*)
-
-### **llama_get_embeddings_ith(SafeLLamaContextHandle, Int32)**
-
-Get the embeddings for the ith sequence. Equivalent to: llama_get_embeddings(ctx) + i*n_embd
-
-```csharp
-public static Single* llama_get_embeddings_ith(SafeLLamaContextHandle ctx, int i)
-```
-
-#### Parameters
-
-`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-
-`i` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-#### Returns
+---
-[Single*](https://docs.microsoft.com/en-us/dotnet/api/system.single*)
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.native.nativelibraryconfig.md b/docs/xmldocs/llama.native.nativelibraryconfig.md
index 38e8f96ad..471e1367d 100644
--- a/docs/xmldocs/llama.native.nativelibraryconfig.md
+++ b/docs/xmldocs/llama.native.nativelibraryconfig.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# NativeLibraryConfig
Namespace: LLama.Native
@@ -9,16 +13,59 @@ Allows configuration of the native llama.cpp libraries to load and use.
public sealed class NativeLibraryConfig
```
-Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [NativeLibraryConfig](./llama.native.nativelibraryconfig.md)
+Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [NativeLibraryConfig](./llama.native.nativelibraryconfig.md)
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute), [NullableAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullableattribute)
## Properties
### **Instance**
-Get the config instance
+#### Caution
+
+Please use NativeLibraryConfig.All instead, or set configurations for NativeLibraryConfig.LLama and NativeLibraryConfig.LLavaShared respectively.
+
+---
+
+Set configurations for all the native libraries, including LLama and LLava
+
+```csharp
+public static NativeLibraryConfigContainer Instance { get; }
+```
+
+#### Property Value
+
+[NativeLibraryConfigContainer](./llama.native.nativelibraryconfigcontainer.md)
+
+### **All**
+
+Set configurations for all the native libraries, including LLama and LLava
+
+```csharp
+public static NativeLibraryConfigContainer All { get; }
+```
+
+#### Property Value
+
+[NativeLibraryConfigContainer](./llama.native.nativelibraryconfigcontainer.md)
+
+### **LLama**
+
+Configuration for LLama native library
+
+```csharp
+public static NativeLibraryConfig LLama { get; }
+```
+
+#### Property Value
+
+[NativeLibraryConfig](./llama.native.nativelibraryconfig.md)
+
+### **LLava**
+
+Configuration for LLava native library
```csharp
-public static NativeLibraryConfig Instance { get; }
+public static NativeLibraryConfig LLava { get; }
```
#### Property Value
@@ -30,7 +77,7 @@ public static NativeLibraryConfig Instance { get; }
Check if the native library has already been loaded. Configuration cannot be modified if this is true.
```csharp
-public static bool LibraryHasLoaded { get; internal set; }
+public bool LibraryHasLoaded { get; internal set; }
```
#### Property Value
@@ -39,22 +86,19 @@ public static bool LibraryHasLoaded { get; internal set; }
## Methods
-### **WithLibrary(String, String)**
+### **WithLibrary(String)**
Load a specified native library as backend for LLamaSharp.
When this method is called, all the other configurations will be ignored.
```csharp
-public NativeLibraryConfig WithLibrary(string llamaPath, string llavaPath)
+public NativeLibraryConfig WithLibrary(string libraryPath)
```
#### Parameters
-`llamaPath` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-The full path to the llama library to load.
-
-`llavaPath` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-The full path to the llava library to load.
+`libraryPath` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+The full path to the native library to load.
#### Returns
@@ -67,7 +111,7 @@ Thrown if `LibraryHasLoaded` is true.
### **WithCuda(Boolean)**
-Configure whether to use cuda backend if possible.
+Configure whether to use cuda backend if possible. Default is true.
```csharp
public NativeLibraryConfig WithCuda(bool enable)
@@ -86,17 +130,17 @@ public NativeLibraryConfig WithCuda(bool enable)
[InvalidOperationException](https://docs.microsoft.com/en-us/dotnet/api/system.invalidoperationexception)
Thrown if `LibraryHasLoaded` is true.
-### **WithAvx(AvxLevel)**
+### **WithVulkan(Boolean)**
-Configure the prefferred avx support level of the backend.
+Configure whether to use vulkan backend if possible. Default is true.
```csharp
-public NativeLibraryConfig WithAvx(AvxLevel level)
+public NativeLibraryConfig WithVulkan(bool enable)
```
#### Parameters
-`level` [AvxLevel](./llama.native.nativelibraryconfig.avxlevel.md)
+`enable` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
#### Returns
@@ -107,17 +151,18 @@ public NativeLibraryConfig WithAvx(AvxLevel level)
[InvalidOperationException](https://docs.microsoft.com/en-us/dotnet/api/system.invalidoperationexception)
Thrown if `LibraryHasLoaded` is true.
-### **WithAutoFallback(Boolean)**
+### **WithAvx(AvxLevel)**
-Configure whether to allow fallback when there's no match for preferred settings.
+Configure the preferred AVX support level of the backend.
+ The default value is detected automatically based on your operating system.
```csharp
-public NativeLibraryConfig WithAutoFallback(bool enable)
+public NativeLibraryConfig WithAvx(AvxLevel level)
```
#### Parameters
-`enable` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+`level` [AvxLevel](./llama.native.avxlevel.md)
#### Returns
@@ -128,14 +173,12 @@ public NativeLibraryConfig WithAutoFallback(bool enable)
[InvalidOperationException](https://docs.microsoft.com/en-us/dotnet/api/system.invalidoperationexception)
Thrown if `LibraryHasLoaded` is true.
-### **SkipCheck(Boolean)**
+### **WithAutoFallback(Boolean)**
-Whether to skip the check when you don't allow fallback. This option
- may be useful under some complex conditions. For example, you're sure
- you have your cublas configured but LLamaSharp take it as invalid by mistake.
+Configure whether to allow fallback when there's no match for preferred settings. Default is true.
```csharp
-public NativeLibraryConfig SkipCheck(bool enable)
+public NativeLibraryConfig WithAutoFallback(bool enable)
```
#### Parameters
@@ -151,12 +194,14 @@ public NativeLibraryConfig SkipCheck(bool enable)
[InvalidOperationException](https://docs.microsoft.com/en-us/dotnet/api/system.invalidoperationexception)
Thrown if `LibraryHasLoaded` is true.
-### **WithLogs(Boolean)**
+### **SkipCheck(Boolean)**
-Whether to output the logs to console when loading the native library with your configuration.
+Whether to skip the check when you don't allow fallback. This option
+ may be useful under some complex conditions. For example, you're sure
+ you have your cublas configured but LLamaSharp takes it as invalid by mistake. Default is false.
```csharp
-public NativeLibraryConfig WithLogs(bool enable)
+public NativeLibraryConfig SkipCheck(bool enable)
```
#### Parameters
@@ -172,87 +217,121 @@ public NativeLibraryConfig WithLogs(bool enable)
[InvalidOperationException](https://docs.microsoft.com/en-us/dotnet/api/system.invalidoperationexception)
Thrown if `LibraryHasLoaded` is true.
-### **WithLogs(LLamaLogLevel)**
+### **WithSearchDirectories(IEnumerable<String>)**
-Enable console logging with the specified log logLevel.
+Add self-defined search directories. Note that the file structure of the added
+ directories must be the same as the default directory. Besides, the directory
+ won't be searched recursively.
```csharp
-public NativeLibraryConfig WithLogs(LLamaLogLevel logLevel)
+public NativeLibraryConfig WithSearchDirectories(IEnumerable directories)
```
#### Parameters
-`logLevel` [LLamaLogLevel](./llama.native.llamaloglevel.md)
+`directories` [IEnumerable<String>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ienumerable-1)
#### Returns
[NativeLibraryConfig](./llama.native.nativelibraryconfig.md)
-#### Exceptions
-
-[InvalidOperationException](https://docs.microsoft.com/en-us/dotnet/api/system.invalidoperationexception)
-Thrown if `LibraryHasLoaded` is true.
-
-### **WithSearchDirectories(IEnumerable<String>)**
+### **WithSearchDirectory(String)**
Add self-defined search directories. Note that the file structure of the added
directories must be the same as the default directory. Besides, the directory
won't be used recursively.
```csharp
-public NativeLibraryConfig WithSearchDirectories(IEnumerable directories)
+public NativeLibraryConfig WithSearchDirectory(string directory)
```
#### Parameters
-`directories` [IEnumerable<String>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ienumerable-1)
+`directory` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
#### Returns
[NativeLibraryConfig](./llama.native.nativelibraryconfig.md)
-### **WithSearchDirectory(String)**
+### **WithSelectingPolicy(INativeLibrarySelectingPolicy)**
-Add self-defined search directories. Note that the file structure of the added
- directories must be the same as the default directory. Besides, the directory
- won't be used recursively.
+Set the policy which decides how to select the desired native libraries and order them by priority.
+ By default we use [DefaultNativeLibrarySelectingPolicy](./llama.native.defaultnativelibraryselectingpolicy.md).
```csharp
-public NativeLibraryConfig WithSearchDirectory(string directory)
+public NativeLibraryConfig WithSelectingPolicy(INativeLibrarySelectingPolicy policy)
```
#### Parameters
-`directory` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+`policy` [INativeLibrarySelectingPolicy](./llama.abstractions.inativelibraryselectingpolicy.md)
#### Returns
[NativeLibraryConfig](./llama.native.nativelibraryconfig.md)
-### **CheckAndGatherDescription(LibraryName)**
+### **WithLogCallback(LLamaLogCallback)**
+
+Set the log callback that will be used for all llama.cpp log messages
+
+```csharp
+public NativeLibraryConfig WithLogCallback(LLamaLogCallback callback)
+```
+
+#### Parameters
+
+`callback` [LLamaLogCallback](./llama.native.nativelogconfig.llamalogcallback.md)
+
+#### Returns
+
+[NativeLibraryConfig](./llama.native.nativelibraryconfig.md)
+
+#### Exceptions
+
+[NotImplementedException](https://docs.microsoft.com/en-us/dotnet/api/system.notimplementedexception)
+
+### **WithLogCallback(ILogger)**
+
+Set the log callback that will be used for all llama.cpp log messages
```csharp
-internal static Description CheckAndGatherDescription(LibraryName library)
+public NativeLibraryConfig WithLogCallback(ILogger logger)
```
#### Parameters
-`library` [LibraryName](./llama.native.libraryname.md)
+`logger` ILogger
#### Returns
-[Description](./llama.native.nativelibraryconfig.description.md)
+[NativeLibraryConfig](./llama.native.nativelibraryconfig.md)
+
+#### Exceptions
+
+[NotImplementedException](https://docs.microsoft.com/en-us/dotnet/api/system.notimplementedexception)
+
+### **DryRun(INativeLibrary&)**
-### **AvxLevelToString(AvxLevel)**
+Try to load the native library with the current configurations,
+ but do not actually set it to [NativeApi](./llama.native.nativeapi.md).
+
+ You can still modify the configuration after this call, but only before any call to [NativeApi](./llama.native.nativeapi.md).
```csharp
-internal static string AvxLevelToString(AvxLevel level)
+public bool DryRun(INativeLibrary& loadedLibrary)
```
#### Parameters
-`level` [AvxLevel](./llama.native.nativelibraryconfig.avxlevel.md)
+`loadedLibrary` [INativeLibrary&](./llama.abstractions.inativelibrary&.md)
+The loaded library. When loading fails, this will be null.
+ However, if you are using .NET Standard 2.0, this will never be null.
#### Returns
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+Whether the dry run succeeded.
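+
+For illustration, a dry run can validate the configuration before any `NativeApi` call commits it. A minimal sketch (assuming `NativeLibraryConfig.LLama` exposes the config instance for the llama library; that entry point is an assumption here, not documented on this page):
+
+```csharp
+using LLama.Abstractions;
+using LLama.Native;
+
+// Assumption: NativeLibraryConfig.LLama is the config for the llama library.
+var config = NativeLibraryConfig.LLama.WithCuda(true);
+
+// Try loading without committing the result to NativeApi.
+if (!config.DryRun(out INativeLibrary? loaded))
+{
+    // The configuration can still be changed here,
+    // as long as no NativeApi call has been made yet.
+    config.WithCuda(false);
+}
+```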
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.native.nativelibraryconfigcontainer.md b/docs/xmldocs/llama.native.nativelibraryconfigcontainer.md
new file mode 100644
index 000000000..c3d607853
--- /dev/null
+++ b/docs/xmldocs/llama.native.nativelibraryconfigcontainer.md
@@ -0,0 +1,282 @@
+[`< Back`](./)
+
+---
+
+# NativeLibraryConfigContainer
+
+Namespace: LLama.Native
+
+A class to apply the same configuration to multiple libraries at the same time.
+
+```csharp
+public sealed class NativeLibraryConfigContainer
+```
+
+Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [NativeLibraryConfigContainer](./llama.native.nativelibraryconfigcontainer.md)
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute), [NullableAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullableattribute)
+
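+The fluent methods below can be chained. As an illustrative sketch (assuming `NativeLibraryConfig.All` returns the shared container instance; that entry point is an assumption, not documented on this page):
+
+```csharp
+using LLama.Native;
+
+// Apply one configuration to both the llama and llava native libraries.
+// Assumption: NativeLibraryConfig.All exposes this container.
+NativeLibraryConfig.All
+    .WithCuda(true)           // prefer the CUDA backend when available
+    .WithAvx(AvxLevel.Avx2)   // preferred AVX level for CPU libraries
+    .WithAutoFallback(true);  // fall back when no exact match exists
+```
+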
+## Methods
+
+### **ForEach(Action<NativeLibraryConfig>)**
+
+Perform an action on each config in this container.
+
+```csharp
+public void ForEach(Action action)
+```
+
+#### Parameters
+
+`action` [Action<NativeLibraryConfig>](https://docs.microsoft.com/en-us/dotnet/api/system.action-1)
+
+### **WithLibrary(String, String)**
+
+Load a specified native library as the backend for LLamaSharp.
+ When this method is called, all other configurations will be ignored.
+
+```csharp
+public NativeLibraryConfigContainer WithLibrary(string llamaPath, string llavaPath)
+```
+
+#### Parameters
+
+`llamaPath` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+The full path to the llama library to load.
+
+`llavaPath` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+The full path to the llava library to load.
+
+#### Returns
+
+[NativeLibraryConfigContainer](./llama.native.nativelibraryconfigcontainer.md)
+
+#### Exceptions
+
+[InvalidOperationException](https://docs.microsoft.com/en-us/dotnet/api/system.invalidoperationexception)
+Thrown if `LibraryHasLoaded` is true.
+
+### **WithCuda(Boolean)**
+
+Configure whether to use the CUDA backend if possible.
+
+```csharp
+public NativeLibraryConfigContainer WithCuda(bool enable)
+```
+
+#### Parameters
+
+`enable` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+
+#### Returns
+
+[NativeLibraryConfigContainer](./llama.native.nativelibraryconfigcontainer.md)
+
+#### Exceptions
+
+[InvalidOperationException](https://docs.microsoft.com/en-us/dotnet/api/system.invalidoperationexception)
+Thrown if `LibraryHasLoaded` is true.
+
+### **WithVulkan(Boolean)**
+
+Configure whether to use the Vulkan backend if possible.
+
+```csharp
+public NativeLibraryConfigContainer WithVulkan(bool enable)
+```
+
+#### Parameters
+
+`enable` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+
+#### Returns
+
+[NativeLibraryConfigContainer](./llama.native.nativelibraryconfigcontainer.md)
+
+#### Exceptions
+
+[InvalidOperationException](https://docs.microsoft.com/en-us/dotnet/api/system.invalidoperationexception)
+Thrown if `LibraryHasLoaded` is true.
+
+### **WithAvx(AvxLevel)**
+
+Configure the preferred AVX support level of the backend.
+
+```csharp
+public NativeLibraryConfigContainer WithAvx(AvxLevel level)
+```
+
+#### Parameters
+
+`level` [AvxLevel](./llama.native.avxlevel.md)
+
+#### Returns
+
+[NativeLibraryConfigContainer](./llama.native.nativelibraryconfigcontainer.md)
+
+#### Exceptions
+
+[InvalidOperationException](https://docs.microsoft.com/en-us/dotnet/api/system.invalidoperationexception)
+Thrown if `LibraryHasLoaded` is true.
+
+### **WithAutoFallback(Boolean)**
+
+Configure whether to allow fallback when there's no match for preferred settings.
+
+```csharp
+public NativeLibraryConfigContainer WithAutoFallback(bool enable)
+```
+
+#### Parameters
+
+`enable` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+
+#### Returns
+
+[NativeLibraryConfigContainer](./llama.native.nativelibraryconfigcontainer.md)
+
+#### Exceptions
+
+[InvalidOperationException](https://docs.microsoft.com/en-us/dotnet/api/system.invalidoperationexception)
+Thrown if `LibraryHasLoaded` is true.
+
+### **SkipCheck(Boolean)**
+
+Whether to skip the check when fallback is not allowed. This option
+ may be useful in some complex situations, for example when you are sure
+ your cuBLAS setup is correct but LLamaSharp mistakenly treats it as invalid.
+
+```csharp
+public NativeLibraryConfigContainer SkipCheck(bool enable)
+```
+
+#### Parameters
+
+`enable` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+
+#### Returns
+
+[NativeLibraryConfigContainer](./llama.native.nativelibraryconfigcontainer.md)
+
+#### Exceptions
+
+[InvalidOperationException](https://docs.microsoft.com/en-us/dotnet/api/system.invalidoperationexception)
+Thrown if `LibraryHasLoaded` is true.
+
+### **WithSearchDirectories(IEnumerable<String>)**
+
+Add user-defined search directories. Note that the file structure of the added
+ directories must match that of the default directory. Note also that the
+ directories are not searched recursively.
+
+```csharp
+public NativeLibraryConfigContainer WithSearchDirectories(IEnumerable directories)
+```
+
+#### Parameters
+
+`directories` [IEnumerable<String>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ienumerable-1)
+
+#### Returns
+
+[NativeLibraryConfigContainer](./llama.native.nativelibraryconfigcontainer.md)
+
+### **WithSearchDirectory(String)**
+
+Add a user-defined search directory. Note that the file structure of the added
+ directory must match that of the default directory. Note also that the
+ directory is not searched recursively.
+
+```csharp
+public NativeLibraryConfigContainer WithSearchDirectory(string directory)
+```
+
+#### Parameters
+
+`directory` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+#### Returns
+
+[NativeLibraryConfigContainer](./llama.native.nativelibraryconfigcontainer.md)
+
+### **WithSelectingPolicy(INativeLibrarySelectingPolicy)**
+
+Set the policy which decides how to select the desired native libraries and order them by priority.
+ By default we use [DefaultNativeLibrarySelectingPolicy](./llama.native.defaultnativelibraryselectingpolicy.md).
+
+```csharp
+public NativeLibraryConfigContainer WithSelectingPolicy(INativeLibrarySelectingPolicy policy)
+```
+
+#### Parameters
+
+`policy` [INativeLibrarySelectingPolicy](./llama.abstractions.inativelibraryselectingpolicy.md)
+
+#### Returns
+
+[NativeLibraryConfigContainer](./llama.native.nativelibraryconfigcontainer.md)
+
+### **WithLogCallback(LLamaLogCallback)**
+
+Set the log callback that will be used for all llama.cpp log messages
+
+```csharp
+public NativeLibraryConfigContainer WithLogCallback(LLamaLogCallback callback)
+```
+
+#### Parameters
+
+`callback` [LLamaLogCallback](./llama.native.nativelogconfig.llamalogcallback.md)
+
+#### Returns
+
+[NativeLibraryConfigContainer](./llama.native.nativelibraryconfigcontainer.md)
+
+#### Exceptions
+
+[NotImplementedException](https://docs.microsoft.com/en-us/dotnet/api/system.notimplementedexception)
+
+### **WithLogCallback(ILogger)**
+
+Set the log callback that will be used for all llama.cpp log messages
+
+```csharp
+public NativeLibraryConfigContainer WithLogCallback(ILogger logger)
+```
+
+#### Parameters
+
+`logger` ILogger
+
+#### Returns
+
+[NativeLibraryConfigContainer](./llama.native.nativelibraryconfigcontainer.md)
+
+#### Exceptions
+
+[NotImplementedException](https://docs.microsoft.com/en-us/dotnet/api/system.notimplementedexception)
+
+### **DryRun(INativeLibrary&, INativeLibrary&)**
+
+Try to load the native library with the current configurations,
+ but do not actually set it to [NativeApi](./llama.native.nativeapi.md).
+
+ You can still modify the configuration after this call, but only before any call from [NativeApi](./llama.native.nativeapi.md).
+
+```csharp
+public bool DryRun(INativeLibrary& loadedLLamaNativeLibrary, INativeLibrary& loadedLLavaNativeLibrary)
+```
+
+#### Parameters
+
+`loadedLLamaNativeLibrary` [INativeLibrary&](./llama.abstractions.inativelibrary&.md)
+
+`loadedLLavaNativeLibrary` [INativeLibrary&](./llama.abstractions.inativelibrary&.md)
+
+#### Returns
+
+[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+Whether the dry run succeeded.
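+
+As an illustrative sketch (again assuming `NativeLibraryConfig.All` is the container entry point, which is an assumption here):
+
+```csharp
+using System;
+using LLama.Abstractions;
+using LLama.Native;
+
+// Dry-run both libraries at once; nothing is committed to NativeApi.
+bool ok = NativeLibraryConfig.All.DryRun(
+    out INativeLibrary? llama,
+    out INativeLibrary? llava);
+Console.WriteLine(ok
+    ? $"llama: {llama?.Metadata}, llava: {llava?.Metadata}"
+    : "no native library matched the configuration");
+```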
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.native.nativelibraryfrompath.md b/docs/xmldocs/llama.native.nativelibraryfrompath.md
new file mode 100644
index 000000000..dbe079119
--- /dev/null
+++ b/docs/xmldocs/llama.native.nativelibraryfrompath.md
@@ -0,0 +1,65 @@
+[`< Back`](./)
+
+---
+
+# NativeLibraryFromPath
+
+Namespace: LLama.Native
+
+A native library specified with a local file path.
+
+```csharp
+public class NativeLibraryFromPath : LLama.Abstractions.INativeLibrary
+```
+
+Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [NativeLibraryFromPath](./llama.native.nativelibraryfrompath.md)
+Implements [INativeLibrary](./llama.abstractions.inativelibrary.md)
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute), [NullableAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullableattribute)
+
+## Properties
+
+### **Metadata**
+
+```csharp
+public NativeLibraryMetadata Metadata { get; }
+```
+
+#### Property Value
+
+[NativeLibraryMetadata](./llama.native.nativelibrarymetadata.md)
+
+## Constructors
+
+### **NativeLibraryFromPath(String)**
+
+
+
+```csharp
+public NativeLibraryFromPath(string path)
+```
+
+#### Parameters
+
+`path` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+## Methods
+
+### **Prepare(SystemInfo, LLamaLogCallback)**
+
+```csharp
+public IEnumerable Prepare(SystemInfo systemInfo, LLamaLogCallback logCallback)
+```
+
+#### Parameters
+
+`systemInfo` [SystemInfo](./llama.native.systeminfo.md)
+
+`logCallback` [LLamaLogCallback](./llama.native.nativelogconfig.llamalogcallback.md)
+
+#### Returns
+
+[IEnumerable<String>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ienumerable-1)
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.native.nativelibrarymetadata.md b/docs/xmldocs/llama.native.nativelibrarymetadata.md
new file mode 100644
index 000000000..6e21f6acf
--- /dev/null
+++ b/docs/xmldocs/llama.native.nativelibrarymetadata.md
@@ -0,0 +1,205 @@
+[`< Back`](./)
+
+---
+
+# NativeLibraryMetadata
+
+Namespace: LLama.Native
+
+Information of a native library file.
+
+```csharp
+public class NativeLibraryMetadata : System.IEquatable`1[[LLama.Native.NativeLibraryMetadata, LLamaSharp, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null]]
+```
+
+Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [NativeLibraryMetadata](./llama.native.nativelibrarymetadata.md)
+Implements [IEquatable<NativeLibraryMetadata>](https://docs.microsoft.com/en-us/dotnet/api/system.iequatable-1)
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute), [NullableAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullableattribute)
+
+## Properties
+
+### **EqualityContract**
+
+```csharp
+protected Type EqualityContract { get; }
+```
+
+#### Property Value
+
+[Type](https://docs.microsoft.com/en-us/dotnet/api/system.type)
+
+### **NativeLibraryName**
+
+Which kind of library it is.
+
+```csharp
+public NativeLibraryName NativeLibraryName { get; set; }
+```
+
+#### Property Value
+
+[NativeLibraryName](./llama.native.nativelibraryname.md)
+
+### **UseCuda**
+
+Whether it's compiled with cublas.
+
+```csharp
+public bool UseCuda { get; set; }
+```
+
+#### Property Value
+
+[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+
+### **UseVulkan**
+
+Whether it's compiled with vulkan.
+
+```csharp
+public bool UseVulkan { get; set; }
+```
+
+#### Property Value
+
+[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+
+### **AvxLevel**
+
+Which AvxLevel it's compiled with.
+
+```csharp
+public AvxLevel AvxLevel { get; set; }
+```
+
+#### Property Value
+
+[AvxLevel](./llama.native.avxlevel.md)
+
+## Constructors
+
+### **NativeLibraryMetadata(NativeLibraryName, Boolean, Boolean, AvxLevel)**
+
+Information of a native library file.
+
+```csharp
+public NativeLibraryMetadata(NativeLibraryName NativeLibraryName, bool UseCuda, bool UseVulkan, AvxLevel AvxLevel)
+```
+
+#### Parameters
+
+`NativeLibraryName` [NativeLibraryName](./llama.native.nativelibraryname.md)
+Which kind of library it is.
+
+`UseCuda` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+Whether it's compiled with cublas.
+
+`UseVulkan` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+Whether it's compiled with vulkan.
+
+`AvxLevel` [AvxLevel](./llama.native.avxlevel.md)
+Which AvxLevel it's compiled with.
+
+### **NativeLibraryMetadata(NativeLibraryMetadata)**
+
+```csharp
+protected NativeLibraryMetadata(NativeLibraryMetadata original)
+```
+
+#### Parameters
+
+`original` [NativeLibraryMetadata](./llama.native.nativelibrarymetadata.md)
+
+## Methods
+
+### **ToString()**
+
+```csharp
+public string ToString()
+```
+
+#### Returns
+
+[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+### **PrintMembers(StringBuilder)**
+
+```csharp
+protected bool PrintMembers(StringBuilder builder)
+```
+
+#### Parameters
+
+`builder` [StringBuilder](https://docs.microsoft.com/en-us/dotnet/api/system.text.stringbuilder)
+
+#### Returns
+
+[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+
+### **GetHashCode()**
+
+```csharp
+public int GetHashCode()
+```
+
+#### Returns
+
+[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+
+### **Equals(Object)**
+
+```csharp
+public bool Equals(object obj)
+```
+
+#### Parameters
+
+`obj` [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object)
+
+#### Returns
+
+[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+
+### **Equals(NativeLibraryMetadata)**
+
+```csharp
+public bool Equals(NativeLibraryMetadata other)
+```
+
+#### Parameters
+
+`other` [NativeLibraryMetadata](./llama.native.nativelibrarymetadata.md)
+
+#### Returns
+
+[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+
+### **<Clone>$()**
+
+```csharp
+public NativeLibraryMetadata $()
+```
+
+#### Returns
+
+[NativeLibraryMetadata](./llama.native.nativelibrarymetadata.md)
+
+### **Deconstruct(NativeLibraryName&, Boolean&, Boolean&, AvxLevel&)**
+
+```csharp
+public void Deconstruct(NativeLibraryName& NativeLibraryName, Boolean& UseCuda, Boolean& UseVulkan, AvxLevel& AvxLevel)
+```
+
+#### Parameters
+
+`NativeLibraryName` [NativeLibraryName&](./llama.native.nativelibraryname&.md)
+
+`UseCuda` [Boolean&](https://docs.microsoft.com/en-us/dotnet/api/system.boolean&)
+
+`UseVulkan` [Boolean&](https://docs.microsoft.com/en-us/dotnet/api/system.boolean&)
+
+`AvxLevel` [AvxLevel&](./llama.native.avxlevel&.md)
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.native.nativelibraryname.md b/docs/xmldocs/llama.native.nativelibraryname.md
new file mode 100644
index 000000000..ef555cfa9
--- /dev/null
+++ b/docs/xmldocs/llama.native.nativelibraryname.md
@@ -0,0 +1,27 @@
+[`< Back`](./)
+
+---
+
+# NativeLibraryName
+
+Namespace: LLama.Native
+
+The name of the native library
+
+```csharp
+public enum NativeLibraryName
+```
+
+Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ValueType](https://docs.microsoft.com/en-us/dotnet/api/system.valuetype) → [Enum](https://docs.microsoft.com/en-us/dotnet/api/system.enum) → [NativeLibraryName](./llama.native.nativelibraryname.md)
+Implements [IComparable](https://docs.microsoft.com/en-us/dotnet/api/system.icomparable), [ISpanFormattable](https://docs.microsoft.com/en-us/dotnet/api/system.ispanformattable), [IFormattable](https://docs.microsoft.com/en-us/dotnet/api/system.iformattable), [IConvertible](https://docs.microsoft.com/en-us/dotnet/api/system.iconvertible)
+
+## Fields
+
+| Name | Value | Description |
+| --- | --: | --- |
+| LLama | 0 | The native library compiled from llama.cpp. |
+| LLava | 1 | The native library compiled from the LLaVA example of llama.cpp. |
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.native.nativelibrarywithavx.md b/docs/xmldocs/llama.native.nativelibrarywithavx.md
new file mode 100644
index 000000000..812327489
--- /dev/null
+++ b/docs/xmldocs/llama.native.nativelibrarywithavx.md
@@ -0,0 +1,69 @@
+[`< Back`](./)
+
+---
+
+# NativeLibraryWithAvx
+
+Namespace: LLama.Native
+
+A native library compiled with AVX support but without CUDA/cuBLAS.
+
+```csharp
+public class NativeLibraryWithAvx : LLama.Abstractions.INativeLibrary
+```
+
+Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [NativeLibraryWithAvx](./llama.native.nativelibrarywithavx.md)
+Implements [INativeLibrary](./llama.abstractions.inativelibrary.md)
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute), [NullableAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullableattribute)
+
+## Properties
+
+### **Metadata**
+
+```csharp
+public NativeLibraryMetadata Metadata { get; }
+```
+
+#### Property Value
+
+[NativeLibraryMetadata](./llama.native.nativelibrarymetadata.md)
+
+## Constructors
+
+### **NativeLibraryWithAvx(NativeLibraryName, AvxLevel, Boolean)**
+
+
+
+```csharp
+public NativeLibraryWithAvx(NativeLibraryName libraryName, AvxLevel avxLevel, bool skipCheck)
+```
+
+#### Parameters
+
+`libraryName` [NativeLibraryName](./llama.native.nativelibraryname.md)
+
+`avxLevel` [AvxLevel](./llama.native.avxlevel.md)
+
+`skipCheck` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+
+## Methods
+
+### **Prepare(SystemInfo, LLamaLogCallback)**
+
+```csharp
+public IEnumerable Prepare(SystemInfo systemInfo, LLamaLogCallback logCallback)
+```
+
+#### Parameters
+
+`systemInfo` [SystemInfo](./llama.native.systeminfo.md)
+
+`logCallback` [LLamaLogCallback](./llama.native.nativelogconfig.llamalogcallback.md)
+
+#### Returns
+
+[IEnumerable<String>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ienumerable-1)
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.native.nativelibrarywithcuda.md b/docs/xmldocs/llama.native.nativelibrarywithcuda.md
new file mode 100644
index 000000000..06b2e20bd
--- /dev/null
+++ b/docs/xmldocs/llama.native.nativelibrarywithcuda.md
@@ -0,0 +1,71 @@
+[`< Back`](./)
+
+---
+
+# NativeLibraryWithCuda
+
+Namespace: LLama.Native
+
+A native library compiled with CUDA/cuBLAS.
+
+```csharp
+public class NativeLibraryWithCuda : LLama.Abstractions.INativeLibrary
+```
+
+Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [NativeLibraryWithCuda](./llama.native.nativelibrarywithcuda.md)
+Implements [INativeLibrary](./llama.abstractions.inativelibrary.md)
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute), [NullableAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullableattribute)
+
+## Properties
+
+### **Metadata**
+
+```csharp
+public NativeLibraryMetadata Metadata { get; }
+```
+
+#### Property Value
+
+[NativeLibraryMetadata](./llama.native.nativelibrarymetadata.md)
+
+## Constructors
+
+### **NativeLibraryWithCuda(Int32, NativeLibraryName, AvxLevel, Boolean)**
+
+
+
+```csharp
+public NativeLibraryWithCuda(int majorCudaVersion, NativeLibraryName libraryName, AvxLevel avxLevel, bool skipCheck)
+```
+
+#### Parameters
+
+`majorCudaVersion` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+
+`libraryName` [NativeLibraryName](./llama.native.nativelibraryname.md)
+
+`avxLevel` [AvxLevel](./llama.native.avxlevel.md)
+
+`skipCheck` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+
+## Methods
+
+### **Prepare(SystemInfo, LLamaLogCallback)**
+
+```csharp
+public IEnumerable Prepare(SystemInfo systemInfo, LLamaLogCallback logCallback)
+```
+
+#### Parameters
+
+`systemInfo` [SystemInfo](./llama.native.systeminfo.md)
+
+`logCallback` [LLamaLogCallback](./llama.native.nativelogconfig.llamalogcallback.md)
+
+#### Returns
+
+[IEnumerable<String>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ienumerable-1)
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.native.nativelibrarywithmacorfallback.md b/docs/xmldocs/llama.native.nativelibrarywithmacorfallback.md
new file mode 100644
index 000000000..9871e0890
--- /dev/null
+++ b/docs/xmldocs/llama.native.nativelibrarywithmacorfallback.md
@@ -0,0 +1,65 @@
+[`< Back`](./)
+
+---
+
+# NativeLibraryWithMacOrFallback
+
+Namespace: LLama.Native
+
+A native library compiled on Mac, or a fallback from all other libraries in the selection.
+
+```csharp
+public class NativeLibraryWithMacOrFallback : LLama.Abstractions.INativeLibrary
+```
+
+Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [NativeLibraryWithMacOrFallback](./llama.native.nativelibrarywithmacorfallback.md)
+Implements [INativeLibrary](./llama.abstractions.inativelibrary.md)
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute), [NullableAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullableattribute)
+
+## Properties
+
+### **Metadata**
+
+```csharp
+public NativeLibraryMetadata Metadata { get; }
+```
+
+#### Property Value
+
+[NativeLibraryMetadata](./llama.native.nativelibrarymetadata.md)
+
+## Constructors
+
+### **NativeLibraryWithMacOrFallback(NativeLibraryName)**
+
+
+
+```csharp
+public NativeLibraryWithMacOrFallback(NativeLibraryName libraryName)
+```
+
+#### Parameters
+
+`libraryName` [NativeLibraryName](./llama.native.nativelibraryname.md)
+
+## Methods
+
+### **Prepare(SystemInfo, LLamaLogCallback)**
+
+```csharp
+public IEnumerable Prepare(SystemInfo systemInfo, LLamaLogCallback logCallback)
+```
+
+#### Parameters
+
+`systemInfo` [SystemInfo](./llama.native.systeminfo.md)
+
+`logCallback` [LLamaLogCallback](./llama.native.nativelogconfig.llamalogcallback.md)
+
+#### Returns
+
+[IEnumerable<String>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ienumerable-1)
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.native.nativelibrarywithvulkan.md b/docs/xmldocs/llama.native.nativelibrarywithvulkan.md
new file mode 100644
index 000000000..dc997ec51
--- /dev/null
+++ b/docs/xmldocs/llama.native.nativelibrarywithvulkan.md
@@ -0,0 +1,71 @@
+[`< Back`](./)
+
+---
+
+# NativeLibraryWithVulkan
+
+Namespace: LLama.Native
+
+A native library compiled with Vulkan.
+
+```csharp
+public class NativeLibraryWithVulkan : LLama.Abstractions.INativeLibrary
+```
+
+Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [NativeLibraryWithVulkan](./llama.native.nativelibrarywithvulkan.md)
+Implements [INativeLibrary](./llama.abstractions.inativelibrary.md)
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute), [NullableAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullableattribute)
+
+## Properties
+
+### **Metadata**
+
+```csharp
+public NativeLibraryMetadata Metadata { get; }
+```
+
+#### Property Value
+
+[NativeLibraryMetadata](./llama.native.nativelibrarymetadata.md)
+
+## Constructors
+
+### **NativeLibraryWithVulkan(String, NativeLibraryName, AvxLevel, Boolean)**
+
+
+
+```csharp
+public NativeLibraryWithVulkan(string vulkanVersion, NativeLibraryName libraryName, AvxLevel avxLevel, bool skipCheck)
+```
+
+#### Parameters
+
+`vulkanVersion` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+`libraryName` [NativeLibraryName](./llama.native.nativelibraryname.md)
+
+`avxLevel` [AvxLevel](./llama.native.avxlevel.md)
+
+`skipCheck` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+
+## Methods
+
+### **Prepare(SystemInfo, LLamaLogCallback)**
+
+```csharp
+public IEnumerable Prepare(SystemInfo systemInfo, LLamaLogCallback logCallback)
+```
+
+#### Parameters
+
+`systemInfo` [SystemInfo](./llama.native.systeminfo.md)
+
+`logCallback` [LLamaLogCallback](./llama.native.nativelogconfig.llamalogcallback.md)
+
+#### Returns
+
+[IEnumerable<String>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ienumerable-1)
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.native.nativelogconfig.md b/docs/xmldocs/llama.native.nativelogconfig.md
new file mode 100644
index 000000000..a5ac92c5c
--- /dev/null
+++ b/docs/xmldocs/llama.native.nativelogconfig.md
@@ -0,0 +1,46 @@
+[`< Back`](./)
+
+---
+
+# NativeLogConfig
+
+Namespace: LLama.Native
+
+Configure llama.cpp logging
+
+```csharp
+public static class NativeLogConfig
+```
+
+Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [NativeLogConfig](./llama.native.nativelogconfig.md)
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute), [NullableAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullableattribute)
+
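+For illustration, a minimal sketch of routing llama.cpp logs to `Microsoft.Extensions.Logging` (the console logger setup assumes the `Microsoft.Extensions.Logging.Console` package, which is not part of LLamaSharp):
+
+```csharp
+using LLama.Native;
+using Microsoft.Extensions.Logging;
+
+// Route all llama.cpp log messages to a console ILogger.
+using var factory = LoggerFactory.Create(builder => builder.AddConsole());
+NativeLogConfig.llama_log_set(factory.CreateLogger("llama.cpp"));
+```
+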
+## Methods
+
+### **llama_log_set(LLamaLogCallback)**
+
+Register a callback to receive llama log messages
+
+```csharp
+public static void llama_log_set(LLamaLogCallback logCallback)
+```
+
+#### Parameters
+
+`logCallback` [LLamaLogCallback](./llama.native.nativelogconfig.llamalogcallback.md)
+
+### **llama_log_set(ILogger)**
+
+Register a callback to receive llama log messages
+
+```csharp
+public static void llama_log_set(ILogger logger)
+```
+
+#### Parameters
+
+`logger` ILogger
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.native.ropescalingtype.md b/docs/xmldocs/llama.native.ropescalingtype.md
index 928627b4b..a68d95768 100644
--- a/docs/xmldocs/llama.native.ropescalingtype.md
+++ b/docs/xmldocs/llama.native.ropescalingtype.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# RopeScalingType
Namespace: LLama.Native
@@ -9,7 +13,7 @@ public enum RopeScalingType
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ValueType](https://docs.microsoft.com/en-us/dotnet/api/system.valuetype) → [Enum](https://docs.microsoft.com/en-us/dotnet/api/system.enum) → [RopeScalingType](./llama.native.ropescalingtype.md)
-Implements [IComparable](https://docs.microsoft.com/en-us/dotnet/api/system.icomparable), [IFormattable](https://docs.microsoft.com/en-us/dotnet/api/system.iformattable), [IConvertible](https://docs.microsoft.com/en-us/dotnet/api/system.iconvertible)
+Implements [IComparable](https://docs.microsoft.com/en-us/dotnet/api/system.icomparable), [ISpanFormattable](https://docs.microsoft.com/en-us/dotnet/api/system.ispanformattable), [IFormattable](https://docs.microsoft.com/en-us/dotnet/api/system.iformattable), [IConvertible](https://docs.microsoft.com/en-us/dotnet/api/system.iconvertible)
**Remarks:**
@@ -23,3 +27,8 @@ C# equivalent of llama_rope_scaling_type
| None | 0 | Do not apply any RoPE scaling |
| Linear | 1 | Positional linear interpolation, as described by kaikendev: https://kaiokendev.github.io/til#extending-context-to-8k |
| Yarn | 2 | YaRN scaling: https://arxiv.org/pdf/2309.00071.pdf |
+| LongRope | 3 | LongRope scaling |
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.native.safellamacontexthandle.md b/docs/xmldocs/llama.native.safellamacontexthandle.md
index af07938ec..8ea66954f 100644
--- a/docs/xmldocs/llama.native.safellamacontexthandle.md
+++ b/docs/xmldocs/llama.native.safellamacontexthandle.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# SafeLLamaContextHandle
Namespace: LLama.Native
@@ -9,21 +13,18 @@ public sealed class SafeLLamaContextHandle : SafeLLamaHandleBase, System.IDispos
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [CriticalFinalizerObject](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.constrainedexecution.criticalfinalizerobject) → [SafeHandle](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.interopservices.safehandle) → [SafeLLamaHandleBase](./llama.native.safellamahandlebase.md) → [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-Implements [IDisposable](https://docs.microsoft.com/en-us/dotnet/api/system.idisposable)
-
-## Properties
+Implements [IDisposable](https://docs.microsoft.com/en-us/dotnet/api/system.idisposable)
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute), [NullableAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullableattribute)
-### **VocabCount**
+## Fields
-Total number of tokens in vocabulary of this model
+### **handle**
```csharp
-public int VocabCount { get; }
+protected IntPtr handle;
```
-#### Property Value
-
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+## Properties
### **ContextSize**
@@ -61,6 +62,54 @@ public uint BatchSize { get; }
[UInt32](https://docs.microsoft.com/en-us/dotnet/api/system.uint32)
+### **UBatchSize**
+
+Get the physical maximum batch size for this context
+
+```csharp
+public uint UBatchSize { get; }
+```
+
+#### Property Value
+
+[UInt32](https://docs.microsoft.com/en-us/dotnet/api/system.uint32)
+
+### **GenerationThreads**
+
+Get or set the number of threads used for generation of a single token.
+
+```csharp
+public int GenerationThreads { get; set; }
+```
+
+#### Property Value
+
+[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+
+### **BatchThreads**
+
+Get or set the number of threads used for prompt and batch processing (multiple token).
+
+```csharp
+public int BatchThreads { get; set; }
+```
+
+#### Property Value
+
+[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+
+### **PoolingType**
+
+Get the pooling type for this context
+
+```csharp
+public LLamaPoolingType PoolingType { get; }
+```
+
+#### Property Value
+
+[LLamaPoolingType](./llama.native.llamapoolingtype.md)
+
### **ModelHandle**
Get the model which this context is using
@@ -73,6 +122,30 @@ public SafeLlamaModelHandle ModelHandle { get; }
[SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
+### **Vocab**
+
+Get the vocabulary for the model this context is using
+
+```csharp
+public Vocabulary Vocab { get; }
+```
+
+#### Property Value
+
+[Vocabulary](./llama.native.safellamamodelhandle.vocabulary.md)
+
+### **KvCacheCanShift**
+
+Check if the context supports KV cache shifting
+
+```csharp
+public bool KvCacheCanShift { get; }
+```
+
+#### Property Value
+
+[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+
### **IsInvalid**
```csharp
@@ -135,18 +208,71 @@ public static SafeLLamaContextHandle Create(SafeLlamaModelHandle model, LLamaCon
[RuntimeError](./llama.exceptions.runtimeerror.md)
-### **GetLogits()**
+### **AddLoraAdapter(LoraAdapter, Single)**
+
+Add a LoRA adapter to this context
+
+```csharp
+public void AddLoraAdapter(LoraAdapter lora, float scale)
+```
+
+#### Parameters
+
+`lora` [LoraAdapter](./llama.native.loraadapter.md)
+
+`scale` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+
+#### Exceptions
+
+[ArgumentException](https://docs.microsoft.com/en-us/dotnet/api/system.argumentexception)
+
+[RuntimeError](./llama.exceptions.runtimeerror.md)
+
+### **RemoveLoraAdapter(LoraAdapter)**
+
+Remove a LoRA adapter from this context
+
+```csharp
+public bool RemoveLoraAdapter(LoraAdapter lora)
+```
+
+#### Parameters
+
+`lora` [LoraAdapter](./llama.native.loraadapter.md)
+
+#### Returns
+
+[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+Indicates whether the lora was in this context and was removed
+
+### **ClearLoraAdapters()**
+
+Remove all LoRA adapters from this context
+
+```csharp
+public void ClearLoraAdapters()
+```
+
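+A hedged sketch of the adapter lifecycle (assuming the adapter was loaded via `SafeLlamaModelHandle.LoadLoraFromFile`; the path and scale are illustrative only):
+
+```csharp
+// Hypothetical usage sketch - not generated from the source.
+var lora = model.LoadLoraFromFile("adapter.gguf");
+context.AddLoraAdapter(lora, scale: 1.0f);
+// ... run inference with the adapter applied ...
+bool removed = context.RemoveLoraAdapter(lora); // true if it was present in this context
+```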
+### **GetLogits(Int32)**
-Token logits obtained from the last call to llama_decode
- The logits for the last token are stored in the last row
+Token logits obtained from the last call to llama_decode.
+ The logits for the last token are stored in the last row.
+ Only tokens with `logits = true` requested are present.
Can be mutated in order to change the probabilities of the next token.
Rows: n_tokens
Cols: n_vocab
```csharp
-public Span GetLogits()
+public Span GetLogits(int numTokens)
```
+#### Parameters
+
+`numTokens` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+The number of tokens whose logits should be retrieved, in [numTokens X n_vocab] format.
+ Tokens are ordered as they appear in the LLamaBatch (first tokens first, etc).
+ This is helpful when requesting logits for many tokens in a sequence, or when decoding multiple sequences in one go.
+
#### Returns
[Span<Single>](https://docs.microsoft.com/en-us/dotnet/api/system.span-1)
@@ -167,6 +293,42 @@ public Span GetLogitsIth(int i)
[Span<Single>](https://docs.microsoft.com/en-us/dotnet/api/system.span-1)
+### **GetEmbeddingsIth(LLamaPos)**
+
+Get the embeddings for the ith sequence.
+ Equivalent to: llama_get_embeddings(ctx) + ctx->output_ids[i]*n_embd
+
+```csharp
+public Span GetEmbeddingsIth(LLamaPos pos)
+```
+
+#### Parameters
+
+`pos` [LLamaPos](./llama.native.llamapos.md)
+
+#### Returns
+
+[Span<Single>](https://docs.microsoft.com/en-us/dotnet/api/system.span-1)
+A pointer to the first float in an embedding, length = ctx.EmbeddingSize
+
+### **GetEmbeddingsSeq(LLamaSeqId)**
+
+Get the embeddings for a specific sequence.
+ Equivalent to: llama_get_embeddings(ctx) + ctx->output_ids[i]*n_embd
+
+```csharp
+public Span GetEmbeddingsSeq(LLamaSeqId seq)
+```
+
+#### Parameters
+
+`seq` [LLamaSeqId](./llama.native.llamaseqid.md)
+
+#### Returns
+
+[Span<Single>](https://docs.microsoft.com/en-us/dotnet/api/system.span-1)
+A pointer to the first float in an embedding, length = ctx.EmbeddingSize
+
### **Tokenize(String, Boolean, Boolean, Encoding)**
Convert the given text into tokens
@@ -218,6 +380,33 @@ A span to attempt to write into. If this is too small nothing will be written
[UInt32](https://docs.microsoft.com/en-us/dotnet/api/system.uint32)
The size of this token. **nothing will be written** if this is larger than `dest`
+### **Synchronize()**
+
+Wait until all computations are finished. This is done automatically when using any of the functions that obtain computation results,
+ so it is not usually necessary to call this explicitly.
+
+```csharp
+public void Synchronize()
+```
+
+### **Encode(LLamaBatch)**
+
+Processes a batch of tokens with the encoder part of the encoder-decoder model. Stores the encoder output
+ internally for later use by the decoder cross-attention layers.
+
+```csharp
+public DecodeResult Encode(LLamaBatch batch)
+```
+
+#### Parameters
+
+`batch` [LLamaBatch](./llama.native.llamabatch.md)
+
+#### Returns
+
+[DecodeResult](./llama.native.decoderesult.md)
+0 = success
+ < 0 = error (the KV cache state is restored to the state before this call)
+
### **Decode(LLamaBatch)**
@@ -236,49 +425,62 @@ public DecodeResult Decode(LLamaBatch batch)
Positive return values do not mean a fatal error, but rather a warning:
- 0: success
- 1: could not find a KV slot for the batch (try reducing the size of the batch or increase the context)
- - < 0: error
+ - < 0: error (the KV cache state is restored to the state before this call)
+
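+As a rough, hedged sketch (assuming a `LLamaBatch` populated via its `Add` method; exact batch and enum APIs may differ between versions), a decode call can be issued and checked like this:
+
+```csharp
+// Hypothetical usage sketch - not generated from the source.
+var batch = new LLamaBatch();
+var pos = 0;
+foreach (var token in promptTokens)
+{
+    // Request logits only for the final token in the batch.
+    var isLast = pos == promptTokens.Count - 1;
+    batch.Add(token, pos++, (LLamaSeqId)0, logits: isLast);
+}
+
+var result = context.Decode(batch);
+if (result != DecodeResult.Ok)
+    throw new InvalidOperationException($"Decode failed: {result}");
+```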
+### **Decode(LLamaBatchEmbeddings)**
-### **Decode(List<LLamaToken>, LLamaSeqId, LLamaBatch, Int32&)**
-Decode a set of tokens in batch-size chunks.
```csharp
-internal ValueTuple Decode(List tokens, LLamaSeqId id, LLamaBatch batch, Int32& n_past)
+public DecodeResult Decode(LLamaBatchEmbeddings batch)
```
#### Parameters
-`tokens` [List<LLamaToken>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.list-1)
+`batch` [LLamaBatchEmbeddings](./llama.native.llamabatchembeddings.md)
-`id` [LLamaSeqId](./llama.native.llamaseqid.md)
+#### Returns
-`batch` [LLamaBatch](./llama.native.llamabatch.md)
+[DecodeResult](./llama.native.decoderesult.md)
+Positive return values do not mean a fatal error, but rather a warning:
+ - 0: success
+ - 1: could not find a KV slot for the batch (try reducing the size of the batch or increase the context)
+ - < 0: error
-`n_past` [Int32&](https://docs.microsoft.com/en-us/dotnet/api/system.int32&)
+### **GetStateSize()**
+
+Get the size of the state, when saved as bytes
+
+```csharp
+public UIntPtr GetStateSize()
+```
#### Returns
-[ValueTuple<DecodeResult, Int32>](https://docs.microsoft.com/en-us/dotnet/api/system.valuetuple-2)
-A tuple, containing the decode result and the number of tokens that have not been decoded yet.
+[UIntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.uintptr)
-### **GetStateSize()**
+### **GetStateSize(LLamaSeqId)**
-Get the size of the state, when saved as bytes
+Get the size of the KV cache for a single sequence ID, when saved as bytes
```csharp
-public ulong GetStateSize()
+public UIntPtr GetStateSize(LLamaSeqId sequence)
```
+#### Parameters
+
+`sequence` [LLamaSeqId](./llama.native.llamaseqid.md)
+
#### Returns
-[UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
+[UIntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.uintptr)
-### **GetState(Byte*, UInt64)**
+### **GetState(Byte*, UIntPtr)**
Get the raw state of this context, encoded as bytes. Data is written into the `dest` pointer.
```csharp
-public ulong GetState(Byte* dest, ulong size)
+public UIntPtr GetState(Byte* dest, UIntPtr size)
```
#### Parameters
@@ -286,12 +488,12 @@ public ulong GetState(Byte* dest, ulong size)
`dest` [Byte*](https://docs.microsoft.com/en-us/dotnet/api/system.byte*)
Destination to write to
-`size` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
+`size` [UIntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.uintptr)
Number of bytes available to write to in dest (check required size with `GetStateSize()`)
#### Returns
-[UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
+[UIntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.uintptr)
The number of bytes written to dest
#### Exceptions
@@ -299,38 +501,36 @@ The number of bytes written to dest
[ArgumentOutOfRangeException](https://docs.microsoft.com/en-us/dotnet/api/system.argumentoutofrangeexception)
Thrown if dest is too small
-### **GetState(IntPtr, UInt64)**
+### **GetState(Byte*, UIntPtr, LLamaSeqId)**
-Get the raw state of this context, encoded as bytes. Data is written into the `dest` pointer.
+Get the raw state of a single sequence from this context, encoded as bytes. Data is written into the `dest` pointer.
```csharp
-public ulong GetState(IntPtr dest, ulong size)
+public UIntPtr GetState(Byte* dest, UIntPtr size, LLamaSeqId sequence)
```
#### Parameters
-`dest` [IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)
+`dest` [Byte*](https://docs.microsoft.com/en-us/dotnet/api/system.byte*)
Destination to write to
-`size` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
+`size` [UIntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.uintptr)
Number of bytes available to write to in dest (check required size with `GetStateSize()`)
+`sequence` [LLamaSeqId](./llama.native.llamaseqid.md)
+The sequence to get state data for
+
#### Returns
-[UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
+[UIntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.uintptr)
The number of bytes written to dest
-#### Exceptions
-
-[ArgumentOutOfRangeException](https://docs.microsoft.com/en-us/dotnet/api/system.argumentoutofrangeexception)
-Thrown if dest is too small
-
-### **SetState(Byte*)**
+### **SetState(Byte*, UIntPtr)**
Set the raw state of this context
```csharp
-public ulong SetState(Byte* src)
+public UIntPtr SetState(Byte* src, UIntPtr size)
```
#### Parameters
@@ -338,56 +538,75 @@ public ulong SetState(Byte* src)
`src` [Byte*](https://docs.microsoft.com/en-us/dotnet/api/system.byte*)
The pointer to read the state from
+`size` [UIntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.uintptr)
+Number of bytes that can be safely read from the pointer
+
#### Returns
-[UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
+[UIntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.uintptr)
Number of bytes read from the src pointer
-### **SetState(IntPtr)**
+### **SetState(Byte*, UIntPtr, LLamaSeqId)**
-Set the raw state of this context
+Set the raw state of a single sequence
```csharp
-public ulong SetState(IntPtr src)
+public UIntPtr SetState(Byte* src, UIntPtr size, LLamaSeqId sequence)
```
#### Parameters
-`src` [IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)
+`src` [Byte*](https://docs.microsoft.com/en-us/dotnet/api/system.byte*)
The pointer to read the state from
+`size` [UIntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.uintptr)
+Number of bytes that can be safely read from the pointer
+
+`sequence` [LLamaSeqId](./llama.native.llamaseqid.md)
+Sequence ID to set
+
#### Returns
-[UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
+[UIntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.uintptr)
Number of bytes read from the src pointer
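+Together, `GetStateSize`, `GetState` and `SetState` can be used to snapshot and restore a context. A minimal unsafe sketch (buffer sizing and lifetime are the caller's responsibility; this is illustrative, not generated from the source):
+
+```csharp
+// Hypothetical usage sketch - not generated from the source.
+var size = context.GetStateSize();
+var buffer = new byte[(int)size];
+unsafe
+{
+    fixed (byte* ptr = buffer)
+    {
+        // Snapshot the full context state into the buffer.
+        var written = context.GetState(ptr, size);
+        // ... later, roll the context back to the snapshot:
+        context.SetState(ptr, written);
+    }
+}
+```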
-### **SetSeed(UInt32)**
+### **GetTimings()**
-Set the RNG seed
+Get performance information
```csharp
-public void SetSeed(uint seed)
+public LLamaPerfContextTimings GetTimings()
```
-#### Parameters
+#### Returns
-`seed` [UInt32](https://docs.microsoft.com/en-us/dotnet/api/system.uint32)
+[LLamaPerfContextTimings](./llama.native.llamaperfcontexttimings.md)
-### **SetThreads(UInt32, UInt32)**
+### **ResetTimings()**
-Set the number of threads used for decoding
+Reset all performance information for this context
```csharp
-public void SetThreads(uint threads, uint threadsBatch)
+public void ResetTimings()
```
-#### Parameters
+### **KvCacheUpdate()**
+
+Apply KV cache updates (such as K-shifts, defragmentation, etc.)
+
+```csharp
+public void KvCacheUpdate()
+```
+
+### **KvCacheDefrag()**
-`threads` [UInt32](https://docs.microsoft.com/en-us/dotnet/api/system.uint32)
-n_threads is the number of threads used for generation (single token)
+Defragment the KV cache. This will be applied:
+ - lazily on next llama_decode()
+ - explicitly with llama_kv_self_update()
-`threadsBatch` [UInt32](https://docs.microsoft.com/en-us/dotnet/api/system.uint32)
-n_threads_batch is the number of threads used for prompt and batch processing (multiple tokens)
+```csharp
+public void KvCacheDefrag()
+```
### **KvCacheGetDebugView(Int32)**
@@ -432,7 +651,7 @@ public int KvCacheCountTokens()
### **KvCacheClear()**
-Clear the KV cache
+Clear the KV cache - both cell info is erased and KV data is zeroed
```csharp
public void KvCacheClear()
@@ -526,3 +745,23 @@ public void KvCacheSequenceDivide(LLamaSeqId seq, LLamaPos p0, LLamaPos p1, int
`p1` [LLamaPos](./llama.native.llamapos.md)
`divisor` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+
+### **KvCacheMaxPosition(LLamaSeqId)**
+
+Returns the largest position present in the KV cache for the specified sequence
+
+```csharp
+public LLamaPos KvCacheMaxPosition(LLamaSeqId seq)
+```
+
+#### Parameters
+
+`seq` [LLamaSeqId](./llama.native.llamaseqid.md)
+
+#### Returns
+
+[LLamaPos](./llama.native.llamapos.md)
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.native.safellamagrammarhandle.md b/docs/xmldocs/llama.native.safellamagrammarhandle.md
deleted file mode 100644
index 0a08687d4..000000000
--- a/docs/xmldocs/llama.native.safellamagrammarhandle.md
+++ /dev/null
@@ -1,123 +0,0 @@
-# SafeLLamaGrammarHandle
-
-Namespace: LLama.Native
-
-A safe reference to a `llama_grammar`
-
-```csharp
-public class SafeLLamaGrammarHandle : SafeLLamaHandleBase, System.IDisposable
-```
-
-Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [CriticalFinalizerObject](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.constrainedexecution.criticalfinalizerobject) → [SafeHandle](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.interopservices.safehandle) → [SafeLLamaHandleBase](./llama.native.safellamahandlebase.md) → [SafeLLamaGrammarHandle](./llama.native.safellamagrammarhandle.md)
-Implements [IDisposable](https://docs.microsoft.com/en-us/dotnet/api/system.idisposable)
-
-## Properties
-
-### **IsInvalid**
-
-```csharp
-public bool IsInvalid { get; }
-```
-
-#### Property Value
-
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-### **IsClosed**
-
-```csharp
-public bool IsClosed { get; }
-```
-
-#### Property Value
-
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-## Methods
-
-### **ReleaseHandle()**
-
-```csharp
-protected bool ReleaseHandle()
-```
-
-#### Returns
-
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-### **Create(IReadOnlyList<GrammarRule>, UInt64)**
-
-Create a new llama_grammar
-
-```csharp
-public static SafeLLamaGrammarHandle Create(IReadOnlyList rules, ulong start_rule_index)
-```
-
-#### Parameters
-
-`rules` [IReadOnlyList<GrammarRule>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ireadonlylist-1)
-A list of list of elements, each inner list makes up one grammar rule
-
-`start_rule_index` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
-The index (in the outer list) of the start rule
-
-#### Returns
-
-[SafeLLamaGrammarHandle](./llama.native.safellamagrammarhandle.md)
-
-#### Exceptions
-
-[RuntimeError](./llama.exceptions.runtimeerror.md)
-
-### **Create(LLamaGrammarElement**, UInt64, UInt64)**
-
-Create a new llama_grammar
-
-```csharp
-public static SafeLLamaGrammarHandle Create(LLamaGrammarElement** rules, ulong nrules, ulong start_rule_index)
-```
-
-#### Parameters
-
-`rules` [LLamaGrammarElement**](./llama.native.llamagrammarelement**.md)
-rules list, each rule is a list of rule elements (terminated by a LLamaGrammarElementType.END element)
-
-`nrules` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
-total number of rules
-
-`start_rule_index` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
-index of the start rule of the grammar
-
-#### Returns
-
-[SafeLLamaGrammarHandle](./llama.native.safellamagrammarhandle.md)
-
-#### Exceptions
-
-[RuntimeError](./llama.exceptions.runtimeerror.md)
-
-### **Clone()**
-
-Create a copy of this grammar instance
-
-```csharp
-public SafeLLamaGrammarHandle Clone()
-```
-
-#### Returns
-
-[SafeLLamaGrammarHandle](./llama.native.safellamagrammarhandle.md)
-
-### **AcceptToken(SafeLLamaContextHandle, LLamaToken)**
-
-Accepts the sampled token into the grammar
-
-```csharp
-public void AcceptToken(SafeLLamaContextHandle ctx, LLamaToken token)
-```
-
-#### Parameters
-
-`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-
-`token` [LLamaToken](./llama.native.llamatoken.md)
diff --git a/docs/xmldocs/llama.native.safellamahandlebase.md b/docs/xmldocs/llama.native.safellamahandlebase.md
index eccbff036..ab0622dfd 100644
--- a/docs/xmldocs/llama.native.safellamahandlebase.md
+++ b/docs/xmldocs/llama.native.safellamahandlebase.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# SafeLLamaHandleBase
Namespace: LLama.Native
@@ -11,6 +15,14 @@ public abstract class SafeLLamaHandleBase : System.Runtime.InteropServices.SafeH
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [CriticalFinalizerObject](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.constrainedexecution.criticalfinalizerobject) → [SafeHandle](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.interopservices.safehandle) → [SafeLLamaHandleBase](./llama.native.safellamahandlebase.md)
Implements [IDisposable](https://docs.microsoft.com/en-us/dotnet/api/system.idisposable)
+## Fields
+
+### **handle**
+
+```csharp
+protected IntPtr handle;
+```
+
## Properties
### **IsInvalid**
@@ -44,3 +56,7 @@ public string ToString()
#### Returns
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.native.safellamamodelhandle.md b/docs/xmldocs/llama.native.safellamamodelhandle.md
index e6dd6e64a..7276357e2 100644
--- a/docs/xmldocs/llama.native.safellamamodelhandle.md
+++ b/docs/xmldocs/llama.native.safellamamodelhandle.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# SafeLlamaModelHandle
Namespace: LLama.Native
@@ -9,25 +13,34 @@ public sealed class SafeLlamaModelHandle : SafeLLamaHandleBase, System.IDisposab
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [CriticalFinalizerObject](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.constrainedexecution.criticalfinalizerobject) → [SafeHandle](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.interopservices.safehandle) → [SafeLLamaHandleBase](./llama.native.safellamahandlebase.md) → [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
-Implements [IDisposable](https://docs.microsoft.com/en-us/dotnet/api/system.idisposable)
+Implements [IDisposable](https://docs.microsoft.com/en-us/dotnet/api/system.idisposable)
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute), [NullableAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullableattribute)
+
+## Fields
+
+### **handle**
+
+```csharp
+protected IntPtr handle;
+```
## Properties
-### **VocabCount**
+### **RopeType**
-Total number of tokens in vocabulary of this model
+Get the rope (positional embedding) type for this model
```csharp
-public int VocabCount { get; }
+public LLamaRopeType RopeType { get; }
```
#### Property Value
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+[LLamaRopeType](./llama.native.llamaropetype.md)
### **ContextSize**
-Total number of tokens in the context
+The number of tokens in the context that this model was trained for
```csharp
public int ContextSize { get; }
@@ -85,231 +98,236 @@ public ulong ParameterCount { get; }
[UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
-### **Description**
+### **LayerCount**
-Get a description of this model
+Get the number of layers in this model
```csharp
-public string Description { get; }
+public int LayerCount { get; }
```
#### Property Value
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-### **MetadataCount**
+### **HeadCount**
-Get the number of metadata key/value pairs
+Get the number of heads in this model
```csharp
-public int MetadataCount { get; }
+public int HeadCount { get; }
```
#### Property Value
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-### **IsInvalid**
+### **KVHeadCount**
+
+Get the number of KV heads in this model
```csharp
-public bool IsInvalid { get; }
+public int KVHeadCount { get; }
```
#### Property Value
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-### **IsClosed**
+### **HasEncoder**
+
+Returns true if the model contains an encoder that requires a llama_encode() call
```csharp
-public bool IsClosed { get; }
+public bool HasEncoder { get; }
```
#### Property Value
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-## Constructors
+### **HasDecoder**
-### **SafeLlamaModelHandle()**
+Returns true if the model contains a decoder that requires a llama_decode() call
```csharp
-public SafeLlamaModelHandle()
+public bool HasDecoder { get; }
```
-## Methods
+#### Property Value
-### **ReleaseHandle()**
+[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+
+### **IsRecurrent**
+
+Returns true if the model is recurrent (like Mamba, RWKV, etc.)
```csharp
-protected bool ReleaseHandle()
+public bool IsRecurrent { get; }
```
-#### Returns
+#### Property Value
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-### **LoadFromFile(String, LLamaModelParams)**
+### **Description**
-Load a model from the given file path into memory
+Get a description of this model
```csharp
-public static SafeLlamaModelHandle LoadFromFile(string modelPath, LLamaModelParams lparams)
+public string Description { get; }
```
-#### Parameters
+#### Property Value
-`modelPath` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-`lparams` [LLamaModelParams](./llama.native.llamamodelparams.md)
+### **MetadataCount**
-#### Returns
+Get the number of metadata key/value pairs
-[SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
+```csharp
+public int MetadataCount { get; }
+```
-#### Exceptions
+#### Property Value
-[RuntimeError](./llama.exceptions.runtimeerror.md)
+[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-### **llama_model_apply_lora_from_file(SafeLlamaModelHandle, String, Single, String, Int32)**
+### **Vocab**
-Apply a LoRA adapter to a loaded model
- path_base_model is the path to a higher quality model to use as a base for
- the layers modified by the adapter. Can be NULL to use the current loaded model.
- The model needs to be reloaded before applying a new adapter, otherwise the adapter
- will be applied on top of the previous one
+Get the vocabulary of this model
```csharp
-public static int llama_model_apply_lora_from_file(SafeLlamaModelHandle model_ptr, string path_lora, float scale, string path_base_model, int n_threads)
+public Vocabulary Vocab { get; }
```
-#### Parameters
+#### Property Value
-`model_ptr` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
+[Vocabulary](./llama.native.safellamamodelhandle.vocabulary.md)
-`path_lora` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+### **IsInvalid**
-`scale` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+```csharp
+public bool IsInvalid { get; }
+```
-`path_base_model` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+#### Property Value
-`n_threads` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-#### Returns
+### **IsClosed**
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-Returns 0 on success
+```csharp
+public bool IsClosed { get; }
+```
-### **llama_model_meta_val_str(SafeLlamaModelHandle, Byte*, Byte*, Int64)**
+#### Property Value
-Get metadata value as a string by key name
+[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-```csharp
-public static int llama_model_meta_val_str(SafeLlamaModelHandle model, Byte* key, Byte* buf, long buf_size)
-```
+## Constructors
-#### Parameters
+### **SafeLlamaModelHandle()**
-`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
+```csharp
+public SafeLlamaModelHandle()
+```
-`key` [Byte*](https://docs.microsoft.com/en-us/dotnet/api/system.byte*)
+## Methods
-`buf` [Byte*](https://docs.microsoft.com/en-us/dotnet/api/system.byte*)
+### **ReleaseHandle()**
-`buf_size` [Int64](https://docs.microsoft.com/en-us/dotnet/api/system.int64)
+```csharp
+protected bool ReleaseHandle()
+```
#### Returns
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-The length of the string on success, or -1 on failure
+[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-### **ApplyLoraFromFile(String, Single, String, Nullable<Int32>)**
+### **LoadFromFile(String, LLamaModelParams)**
-Apply a LoRA adapter to a loaded model
+Load a model from the given file path into memory
```csharp
-public void ApplyLoraFromFile(string lora, float scale, string modelBase, Nullable threads)
+public static SafeLlamaModelHandle LoadFromFile(string modelPath, LLamaModelParams lparams)
```
#### Parameters
-`lora` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+`modelPath` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-`scale` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+`lparams` [LLamaModelParams](./llama.native.llamamodelparams.md)
-`modelBase` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-A path to a higher quality model to use as a base for the layers modified by the
- adapter. Can be NULL to use the current loaded model.
+#### Returns
-`threads` [Nullable<Int32>](https://docs.microsoft.com/en-us/dotnet/api/system.nullable-1)
+[SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
#### Exceptions
[RuntimeError](./llama.exceptions.runtimeerror.md)
-### **TokenToSpan(LLamaToken, Span<Byte>)**
+### **LoadLoraFromFile(String)**
-Convert a single llama token into bytes
+Load a LoRA adapter from file. The adapter will be associated with this model but will not be applied.
```csharp
-public uint TokenToSpan(LLamaToken token, Span dest)
+public LoraAdapter LoadLoraFromFile(string path)
```
#### Parameters
-`token` [LLamaToken](./llama.native.llamatoken.md)
-Token to decode
-
-`dest` [Span<Byte>](https://docs.microsoft.com/en-us/dotnet/api/system.span-1)
-A span to attempt to write into. If this is too small nothing will be written
+`path` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
#### Returns
-[UInt32](https://docs.microsoft.com/en-us/dotnet/api/system.uint32)
-The size of this token. **nothing will be written** if this is larger than `dest`
-
-### **TokensToSpan(IReadOnlyList<LLamaToken>, Span<Char>, Encoding)**
+[LoraAdapter](./llama.native.loraadapter.md)
-#### Caution
+#### Exceptions
-Use a StreamingTokenDecoder instead
+[InvalidOperationException](https://docs.microsoft.com/en-us/dotnet/api/system.invalidoperationexception)
----
+### **TokenToSpan(LLamaToken, Span<Byte>, Int32, Boolean)**
-Convert a sequence of tokens into characters.
+Convert a single llama token into bytes
```csharp
-internal Span TokensToSpan(IReadOnlyList tokens, Span dest, Encoding encoding)
+public uint TokenToSpan(LLamaToken token, Span dest, int lstrip, bool special)
```
#### Parameters
-`tokens` [IReadOnlyList<LLamaToken>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ireadonlylist-1)
+`token` [LLamaToken](./llama.native.llamatoken.md)
+Token to decode
-`dest` [Span<Char>](https://docs.microsoft.com/en-us/dotnet/api/system.span-1)
+`dest` [Span<Byte>](https://docs.microsoft.com/en-us/dotnet/api/system.span-1)
+A span to attempt to write into. If this is too small nothing will be written
-`encoding` [Encoding](https://docs.microsoft.com/en-us/dotnet/api/system.text.encoding)
+`lstrip` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+The caller can skip up to 'lstrip' leading spaces before copying (useful when encoding/decoding multiple tokens with 'add_space_prefix')
+
+`special` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+If true, special characters will be converted to text. If false, they will be invisible.
#### Returns
-[Span<Char>](https://docs.microsoft.com/en-us/dotnet/api/system.span-1)
-The section of the span which has valid data in it.
- If there was insufficient space in the output span this will be
- filled with as many characters as possible, starting from the _last_ token.
+[UInt32](https://docs.microsoft.com/en-us/dotnet/api/system.uint32)
+The size of this token. **nothing will be written** if this is larger than `dest`
### **Tokenize(String, Boolean, Boolean, Encoding)**
Convert a string of text into tokens
```csharp
-public LLamaToken[] Tokenize(string text, bool add_bos, bool special, Encoding encoding)
+public LLamaToken[] Tokenize(string text, bool addBos, bool special, Encoding encoding)
```
#### Parameters
`text` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-`add_bos` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+`addBos` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
`special` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
Allow tokenizing special and/or control tokens which otherwise are not exposed and treated as plaintext.
@@ -336,30 +354,30 @@ public SafeLLamaContextHandle CreateContext(LLamaContextParams params)
[SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-### **MetadataKeyByIndex(Int32)**
+### **MetadataValueByKey(String)**
-Get the metadata key for the given index
+Get the metadata value for the given key
```csharp
-public Nullable> MetadataKeyByIndex(int index)
+public Nullable> MetadataValueByKey(string key)
```
#### Parameters
-`index` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-The index to get
+`key` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+The key to fetch
#### Returns
[Nullable<Memory<Byte>>](https://docs.microsoft.com/en-us/dotnet/api/system.nullable-1)
-The key, null if there is no such key or if the buffer was too small
+The value, null if there is no such key
-### **MetadataValueByIndex(Int32)**
+### **MetadataKeyByIndex(Int32)**
-Get the metadata value for the given index
+Get the metadata key for the given index
```csharp
-public Nullable> MetadataValueByIndex(int index)
+public Nullable> MetadataKeyByIndex(int index)
```
#### Parameters
@@ -370,54 +388,47 @@ The index to get
#### Returns
[Nullable<Memory<Byte>>](https://docs.microsoft.com/en-us/dotnet/api/system.nullable-1)
-The value, null if there is no such value or if the buffer was too small
-
-### **ReadMetadata()**
-
-```csharp
-internal IReadOnlyDictionary ReadMetadata()
-```
-
-#### Returns
+The key, null if there is no such key or if the buffer was too small
-[IReadOnlyDictionary<String, String>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ireadonlydictionary-2)
+### **MetadataValueByIndex(Int32)**
-### **<llama_model_meta_key_by_index>g__llama_model_meta_key_by_index_native|23_0(SafeLlamaModelHandle, Int32, Byte*, Int64)**
+Get the metadata value for the given index
```csharp
-internal static int g__llama_model_meta_key_by_index_native|23_0(SafeLlamaModelHandle model, int index, Byte* buf, long buf_size)
+public Nullable> MetadataValueByIndex(int index)
```
#### Parameters
-`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
-
`index` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-`buf` [Byte*](https://docs.microsoft.com/en-us/dotnet/api/system.byte*)
-
-`buf_size` [Int64](https://docs.microsoft.com/en-us/dotnet/api/system.int64)
+The index to get
#### Returns
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+[Nullable<Memory<Byte>>](https://docs.microsoft.com/en-us/dotnet/api/system.nullable-1)
+The value, null if there is no such value or if the buffer was too small
+
+### **GetTemplate(String, Boolean)**
-### **<llama_model_meta_val_str_by_index>g__llama_model_meta_val_str_by_index_native|24_0(SafeLlamaModelHandle, Int32, Byte*, Int64)**
+Get a chat template from the model. Returns null if not available.
+ If name is null, returns the default chat template.
```csharp
-internal static int g__llama_model_meta_val_str_by_index_native|24_0(SafeLlamaModelHandle model, int index, Byte* buf, long buf_size)
+public string GetTemplate(string name, bool strict)
```
#### Parameters
-`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
+`name` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+The name of the template, in case there are many or differently named. Set to 'null' for the default behaviour of finding an appropriate match.
-`index` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+`strict` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+Setting this to true will cause the call to throw if no valid templates are found.
-`buf` [Byte*](https://docs.microsoft.com/en-us/dotnet/api/system.byte*)
+#### Returns
-`buf_size` [Int64](https://docs.microsoft.com/en-us/dotnet/api/system.int64)
+[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-#### Returns
+---
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.native.safellamasamplerchainhandle.md b/docs/xmldocs/llama.native.safellamasamplerchainhandle.md
new file mode 100644
index 000000000..7eece9ec7
--- /dev/null
+++ b/docs/xmldocs/llama.native.safellamasamplerchainhandle.md
@@ -0,0 +1,573 @@
+[`< Back`](./)
+
+---
+
+# SafeLLamaSamplerChainHandle
+
+Namespace: LLama.Native
+
+A chain of sampler stages that can be used to select tokens from logits.
+
+```csharp
+public sealed class SafeLLamaSamplerChainHandle : SafeLLamaHandleBase, System.IDisposable
+```
+
+Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [CriticalFinalizerObject](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.constrainedexecution.criticalfinalizerobject) → [SafeHandle](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.interopservices.safehandle) → [SafeLLamaHandleBase](./llama.native.safellamahandlebase.md) → [SafeLLamaSamplerChainHandle](./llama.native.safellamasamplerchainhandle.md)
+Implements [IDisposable](https://docs.microsoft.com/en-us/dotnet/api/system.idisposable)
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute), [NullableAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullableattribute)
+
+**Remarks:**
+
+Wraps a handle returned from `llama_sampler_chain_init`. Other samplers are owned by this chain and are never directly exposed.
+
+## Fields
+
+### **handle**
+
+```csharp
+protected IntPtr handle;
+```
+
+## Properties
+
+### **Count**
+
+Get the number of samplers in this chain
+
+```csharp
+public int Count { get; }
+```
+
+#### Property Value
+
+[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+
+### **IsInvalid**
+
+```csharp
+public bool IsInvalid { get; }
+```
+
+#### Property Value
+
+[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+
+### **IsClosed**
+
+```csharp
+public bool IsClosed { get; }
+```
+
+#### Property Value
+
+[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+
+## Constructors
+
+### **SafeLLamaSamplerChainHandle()**
+
+```csharp
+public SafeLLamaSamplerChainHandle()
+```
+
+## Methods
+
+### **ReleaseHandle()**
+
+```csharp
+protected bool ReleaseHandle()
+```
+
+#### Returns
+
+[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+
+### **Apply(LLamaTokenDataArrayNative&)**
+
+Apply this sampler to a set of candidates
+
+```csharp
+public void Apply(LLamaTokenDataArrayNative& candidates)
+```
+
+#### Parameters
+
+`candidates` [LLamaTokenDataArrayNative&](./llama.native.llamatokendataarraynative&.md)
+
+### **Sample(SafeLLamaContextHandle, Int32)**
+
+Sample and accept a token from the idx-th output of the last evaluation. Shorthand for:
+
+```csharp
+var logits = ctx.GetLogitsIth(idx);
+var token_data_array = LLamaTokenDataArray.Create(logits);
+using var _ = LLamaTokenDataArrayNative.Create(token_data_array, out var native_token_data);
+sampler_chain.Apply(ref native_token_data);
+var token = native_token_data.Data.Span[native_token_data.Selected];
+sampler_chain.Accept(token);
+return token;
+```
+
+```csharp
+public LLamaToken Sample(SafeLLamaContextHandle context, int index)
+```
+
+#### Parameters
+
+`context` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
+
+`index` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+
+#### Returns
+
+[LLamaToken](./llama.native.llamatoken.md)
+
+### **Reset()**
+
+Reset the state of this sampler
+
+```csharp
+public void Reset()
+```
+
+### **Accept(LLamaToken)**
+
+Accept a token and update the internal state of this sampler
+
+```csharp
+public void Accept(LLamaToken token)
+```
+
+#### Parameters
+
+`token` [LLamaToken](./llama.native.llamatoken.md)
+
+### **GetName(Int32)**
+
+Get the name of the sampler at the given index
+
+```csharp
+public string GetName(int index)
+```
+
+#### Parameters
+
+`index` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+
+#### Returns
+
+[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+### **GetSeed(Int32)**
+
+Get the seed of the sampler at the given index, if applicable; returns LLAMA_DEFAULT_SEED otherwise.
+
+```csharp
+public uint GetSeed(int index)
+```
+
+#### Parameters
+
+`index` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+
+#### Returns
+
+[UInt32](https://docs.microsoft.com/en-us/dotnet/api/system.uint32)
+
+### **Create(LLamaSamplerChainParams)**
+
+Create a new sampler chain
+
+```csharp
+public static SafeLLamaSamplerChainHandle Create(LLamaSamplerChainParams params)
+```
+
+#### Parameters
+
+`params` [LLamaSamplerChainParams](./llama.native.llamasamplerchainparams.md)
+
+#### Returns
+
+[SafeLLamaSamplerChainHandle](./llama.native.safellamasamplerchainhandle.md)
+
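+The stages documented below can be combined into a typical sampling pipeline. The following is an illustrative sketch, not generated documentation; it assumes an existing `SafeLLamaContextHandle` named `ctx` and that `LLamaSamplerChainParams.Default()` is available to produce default parameters.
+
+```csharp
+// Build a chain: narrow the candidate set, apply temperature, then sample.
+var chain = SafeLLamaSamplerChainHandle.Create(LLamaSamplerChainParams.Default());
+chain.AddTopK(40);                      // keep the 40 most likely tokens
+chain.AddTemperature(0.8f);             // soften the distribution
+chain.AddDistributionSampler(seed: 42); // pick from the remaining distribution
+
+// Sample and accept a token from the last evaluation output.
+var token = chain.Sample(ctx, -1);
+```
+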
+### **AddClone(SafeLLamaSamplerChainHandle, Int32)**
+
+Clone a sampler stage from another chain and add it to this chain
+
+```csharp
+public void AddClone(SafeLLamaSamplerChainHandle src, int index)
+```
+
+#### Parameters
+
+`src` [SafeLLamaSamplerChainHandle](./llama.native.safellamasamplerchainhandle.md)
+The chain to clone a stage from
+
+`index` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+The index of the stage to clone
+
+### **Remove(Int32)**
+
+Remove a sampler stage from this chain
+
+```csharp
+public void Remove(int index)
+```
+
+#### Parameters
+
+`index` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+
+#### Exceptions
+
+[ArgumentOutOfRangeException](https://docs.microsoft.com/en-us/dotnet/api/system.argumentoutofrangeexception)
+
+### **AddCustom<TSampler>(TSampler)**
+
+Add a custom sampler stage
+
+```csharp
+public void AddCustom<TSampler>(TSampler sampler)
+```
+
+#### Type Parameters
+
+`TSampler`
+
+#### Parameters
+
+`sampler` TSampler
+
+### **AddGreedySampler()**
+
+Add a sampler which picks the most likely token.
+
+```csharp
+public void AddGreedySampler()
+```
+
+### **AddDistributionSampler(UInt32)**
+
+Add a sampler which picks from the probability distribution of all tokens
+
+```csharp
+public void AddDistributionSampler(uint seed)
+```
+
+#### Parameters
+
+`seed` [UInt32](https://docs.microsoft.com/en-us/dotnet/api/system.uint32)
+
+### **AddMirostat1Sampler(Int32, UInt32, Single, Single, Int32)**
+
+Mirostat 1.0 algorithm described in the paper https://arxiv.org/abs/2007.14966. Uses tokens instead of words.
+
+```csharp
+public void AddMirostat1Sampler(int vocabCount, uint seed, float tau, float eta, int m)
+```
+
+#### Parameters
+
+`vocabCount` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+
+`seed` [UInt32](https://docs.microsoft.com/en-us/dotnet/api/system.uint32)
+
+`tau` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+The target cross-entropy (or surprise) value you want to achieve for the generated text. A higher value corresponds to more surprising or less predictable text, while a lower value corresponds to less surprising or more predictable text.
+
+`eta` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+The learning rate used to update `mu` based on the error between the target and observed surprisal of the sampled word. A larger learning rate will cause `mu` to be updated more quickly, while a smaller learning rate will result in slower updates.
+
+`m` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+The number of tokens considered in the estimation of `s_hat`. This is an arbitrary value that is used to calculate `s_hat`, which in turn helps to calculate the value of `k`. In the paper, they use `m = 100`, but you can experiment with different values to see how it affects the performance of the algorithm.
+
+### **AddMirostat2Sampler(UInt32, Single, Single)**
+
+Mirostat 2.0 algorithm described in the paper https://arxiv.org/abs/2007.14966. Uses tokens instead of words.
+
+```csharp
+public void AddMirostat2Sampler(uint seed, float tau, float eta)
+```
+
+#### Parameters
+
+`seed` [UInt32](https://docs.microsoft.com/en-us/dotnet/api/system.uint32)
+
+`tau` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+The target cross-entropy (or surprise) value you want to achieve for the generated text. A higher value corresponds to more surprising or less predictable text, while a lower value corresponds to less surprising or more predictable text.
+
+`eta` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+The learning rate used to update `mu` based on the error between the target and observed surprisal of the sampled word. A larger learning rate will cause `mu` to be updated more quickly, while a smaller learning rate will result in slower updates.
+
+### **AddTopK(Int32)**
+
+Top-K sampling described in academic paper "The Curious Case of Neural Text Degeneration" https://arxiv.org/abs/1904.09751
+
+```csharp
+public void AddTopK(int k)
+```
+
+#### Parameters
+
+`k` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+
+**Remarks:**
+
+Setting k <= 0 makes this a no-op.
+
+### **AddTopNSigma(Single)**
+
+Top n sigma sampling as described in academic paper "Top-nσ: Not All Logits Are You Need" https://arxiv.org/pdf/2411.07641
+
+```csharp
+public void AddTopNSigma(float n)
+```
+
+#### Parameters
+
+`n` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+
+### **AddTopP(Single, IntPtr)**
+
+Nucleus sampling described in academic paper "The Curious Case of Neural Text Degeneration" https://arxiv.org/abs/1904.09751
+
+```csharp
+public void AddTopP(float p, IntPtr minKeep)
+```
+
+#### Parameters
+
+`p` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+
+`minKeep` [IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)
+
+### **AddMinP(Single, IntPtr)**
+
+Minimum P sampling as described in https://github.com/ggerganov/llama.cpp/pull/3841
+
+```csharp
+public void AddMinP(float p, IntPtr minKeep)
+```
+
+#### Parameters
+
+`p` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+
+`minKeep` [IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)
+
+### **AddTypical(Single, IntPtr)**
+
+Locally Typical Sampling implementation described in the paper https://arxiv.org/abs/2202.00666.
+
+```csharp
+public void AddTypical(float p, IntPtr minKeep)
+```
+
+#### Parameters
+
+`p` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+
+`minKeep` [IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)
+
+### **AddTemperature(Single)**
+
+Apply temperature to the logits.
+ If temperature is less than zero the maximum logit is left unchanged and the rest are set to -infinity
+
+```csharp
+public void AddTemperature(float t)
+```
+
+#### Parameters
+
+`t` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+
+### **AddDynamicTemperature(Single, Single, Single)**
+
+Dynamic temperature implementation (a.k.a. entropy) described in the paper https://arxiv.org/abs/2309.02772.
+
+```csharp
+public void AddDynamicTemperature(float t, float delta, float exponent)
+```
+
+#### Parameters
+
+`t` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+
+`delta` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+
+`exponent` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+
+### **AddXTC(Single, Single, Int32, UInt32)**
+
+XTC sampler as described in https://github.com/oobabooga/text-generation-webui/pull/6335
+
+```csharp
+public void AddXTC(float p, float t, int minKeep, uint seed)
+```
+
+#### Parameters
+
+`p` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+
+`t` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+
+`minKeep` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+
+`seed` [UInt32](https://docs.microsoft.com/en-us/dotnet/api/system.uint32)
+
+### **AddFillInMiddleInfill(SafeLlamaModelHandle)**
+
+This sampler is meant to be used for fill-in-the-middle infilling, after top_k + top_p sampling
+
+ 1. if the sum of the EOG probs times the number of candidates is higher than the sum of the other probs -> pick EOG
+ 2. combine probs of tokens that have the same prefix
+ example:
+ - before:
+ "abc": 0.5
+ "abcd": 0.2
+ "abcde": 0.1
+ "dummy": 0.1
+ - after:
+ "abc": 0.8
+ "dummy": 0.1
+ 3. discard non-EOG tokens with low prob
+ 4. if no tokens are left -> pick EOT
+
+```csharp
+public void AddFillInMiddleInfill(SafeLlamaModelHandle model)
+```
+
+#### Parameters
+
+`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
+
+### **AddGrammar(SafeLlamaModelHandle, String, String)**
+
+Create a sampler which makes tokens impossible unless they match the grammar.
+
+```csharp
+public void AddGrammar(SafeLlamaModelHandle model, string grammar, string root)
+```
+
+#### Parameters
+
+`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
+The model that this grammar will be used with
+
+`grammar` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+`root` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+Root rule of the grammar
+
+### **AddGrammar(Vocabulary, String, String)**
+
+Create a sampler which makes tokens impossible unless they match the grammar.
+
+```csharp
+public void AddGrammar(Vocabulary vocab, string grammar, string root)
+```
+
+#### Parameters
+
+`vocab` [Vocabulary](./llama.native.safellamamodelhandle.vocabulary.md)
+The vocabulary that this grammar will be used with
+
+`grammar` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+`root` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+Root rule of the grammar
+
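+As an illustrative sketch (assuming an existing `chain` and a loaded `SafeLlamaModelHandle` named `model`), a small GBNF grammar can constrain generation to two words:
+
+```csharp
+// Illustrative sketch: only "yes" or "no" can be generated.
+const string grammar = "root ::= \"yes\" | \"no\"";
+chain.AddGrammar(model, grammar, "root");
+```
+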
+### **AddLazyGrammar(SafeLlamaModelHandle, String, String, ReadOnlySpan<String>, ReadOnlySpan<LLamaToken>)**
+
+Create a sampler using lazy grammar sampling: https://github.com/ggerganov/llama.cpp/pull/9639
+
+```csharp
+public void AddLazyGrammar(SafeLlamaModelHandle model, string grammar, string root, ReadOnlySpan<string> patterns, ReadOnlySpan<LLamaToken> triggerTokens)
+```
+
+#### Parameters
+
+`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
+
+`grammar` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+Grammar in GBNF form
+
+`root` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+Root rule of the grammar
+
+`patterns` [ReadOnlySpan<String>](https://docs.microsoft.com/en-us/dotnet/api/system.readonlyspan-1)
+A list of patterns that will trigger the grammar sampler. Each pattern is matched from the start of the generation output, and the grammar sampler is fed content starting from its first match group.
+
+`triggerTokens` [ReadOnlySpan<LLamaToken>](https://docs.microsoft.com/en-us/dotnet/api/system.readonlyspan-1)
+A list of tokens that will trigger the grammar sampler. The grammar sampler will be fed content starting from the trigger token, inclusive.
+
+### **AddPenalties(Int32, Single, Single, Single)**
+
+Create a sampler that applies various repetition penalties.
+
+ Avoid using on the full vocabulary as searching for repeated tokens can become slow. For example, apply top-k or top-p sampling first.
+
+```csharp
+public void AddPenalties(int penaltyCount, float repeat, float freq, float presence)
+```
+
+#### Parameters
+
+`penaltyCount` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+How many tokens of history to consider when calculating penalties
+
+`repeat` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+Repetition penalty
+
+`freq` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+Frequency penalty
+
+`presence` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+Presence penalty
+
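+A hedged sketch of how this stage is typically ordered, per the remark above, assuming an existing `chain`:
+
+```csharp
+// Narrow the candidate set first, then penalize recent repetition.
+chain.AddTopK(40);
+chain.AddPenalties(64, 1.1f, 0.0f, 0.0f); // last 64 tokens, mild repeat penalty
+```
+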
+### **AddDry(SafeLlamaModelHandle, ReadOnlySpan<String>, Single, Single, Int32, Int32)**
+
+DRY sampler, designed by p-e-w, as described in: https://github.com/oobabooga/text-generation-webui/pull/5677.
+ Porting Koboldcpp implementation authored by pi6am: https://github.com/LostRuins/koboldcpp/pull/982
+
+```csharp
+public void AddDry(SafeLlamaModelHandle model, ReadOnlySpan<string> sequenceBreakers, float multiplier, float base, int allowedLength, int penaltyLastN)
+```
+
+#### Parameters
+
+`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
+The model this sampler will be used with
+
+`sequenceBreakers` [ReadOnlySpan<String>](https://docs.microsoft.com/en-us/dotnet/api/system.readonlyspan-1)
+
+`multiplier` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+penalty multiplier, 0.0 = disabled
+
+`base` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+exponential base
+
+`allowedLength` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+repeated sequences longer than this are penalized
+
+`penaltyLastN` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+how many tokens to scan for repetitions (0 = entire context)
+
+### **AddLogitBias(Int32, Span<LLamaLogitBias>)**
+
+Create a sampler that applies a bias directly to the logits
+
+```csharp
+public void AddLogitBias(int vocabSize, Span<LLamaLogitBias> biases)
+```
+
+#### Parameters
+
+`vocabSize` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+
+`biases` [Span<LLamaLogitBias>](https://docs.microsoft.com/en-us/dotnet/api/system.span-1)
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.native.safellavaimageembedhandle.md b/docs/xmldocs/llama.native.safellavaimageembedhandle.md
index 741c5acf1..7da2ff7c3 100644
--- a/docs/xmldocs/llama.native.safellavaimageembedhandle.md
+++ b/docs/xmldocs/llama.native.safellavaimageembedhandle.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# SafeLlavaImageEmbedHandle
Namespace: LLama.Native
@@ -9,10 +13,55 @@ public sealed class SafeLlavaImageEmbedHandle : SafeLLamaHandleBase, System.IDis
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [CriticalFinalizerObject](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.constrainedexecution.criticalfinalizerobject) → [SafeHandle](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.interopservices.safehandle) → [SafeLLamaHandleBase](./llama.native.safellamahandlebase.md) → [SafeLlavaImageEmbedHandle](./llama.native.safellavaimageembedhandle.md)
-Implements [IDisposable](https://docs.microsoft.com/en-us/dotnet/api/system.idisposable)
+Implements [IDisposable](https://docs.microsoft.com/en-us/dotnet/api/system.idisposable)
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute), [NullableAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullableattribute)
+
+## Fields
+
+### **handle**
+
+```csharp
+protected IntPtr handle;
+```
## Properties
+### **Model**
+
+Get the model used to create this image embedding
+
+```csharp
+public SafeLlavaModelHandle Model { get; private set; }
+```
+
+#### Property Value
+
+[SafeLlavaModelHandle](./llama.native.safellavamodelhandle.md)
+
+### **EmbeddingDimensions**
+
+Get the number of dimensions in an embedding
+
+```csharp
+public int EmbeddingDimensions { get; }
+```
+
+#### Property Value
+
+[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+
+### **PatchCount**
+
+Get the number of "patches" in an image embedding
+
+```csharp
+public int PatchCount { get; }
+```
+
+#### Property Value
+
+[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+
### **IsInvalid**
```csharp
@@ -33,6 +82,14 @@ public bool IsClosed { get; }
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+## Constructors
+
+### **SafeLlavaImageEmbedHandle()**
+
+```csharp
+public SafeLlavaImageEmbedHandle()
+```
+
## Methods
### **CreateFromFileName(SafeLlavaModelHandle, LLamaContext, String)**
@@ -40,18 +97,52 @@ public bool IsClosed { get; }
Create an image embed from an image file
```csharp
-public static SafeLlavaImageEmbedHandle CreateFromFileName(SafeLlavaModelHandle ctxLlava, LLamaContext ctxLlama, string image)
+public static SafeLlavaImageEmbedHandle CreateFromFileName(SafeLlavaModelHandle clip, LLamaContext ctx, string image)
```
#### Parameters
-`ctxLlava` [SafeLlavaModelHandle](./llama.native.safellavamodelhandle.md)
+`clip` [SafeLlavaModelHandle](./llama.native.safellavamodelhandle.md)
-`ctxLlama` [LLamaContext](./llama.llamacontext.md)
+`ctx` [LLamaContext](./llama.llamacontext.md)
`image` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
Path to the image file. Supported formats:
- JPGPNGBMPTGA
+
+- JPG
+- PNG
+- BMP
+- TGA
+
+#### Returns
+
+[SafeLlavaImageEmbedHandle](./llama.native.safellavaimageembedhandle.md)
+
+#### Exceptions
+
+[InvalidOperationException](https://docs.microsoft.com/en-us/dotnet/api/system.invalidoperationexception)
+
+### **CreateFromFileName(SafeLlavaModelHandle, String, Int32)**
+
+Create an image embed from an image file
+
+```csharp
+public static SafeLlavaImageEmbedHandle CreateFromFileName(SafeLlavaModelHandle clip, string image, int threads)
+```
+
+#### Parameters
+
+`clip` [SafeLlavaModelHandle](./llama.native.safellavamodelhandle.md)
+
+`image` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+Path to the image file. Supported formats:
+
+- JPG
+- PNG
+- BMP
+- TGA
+
+`threads` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
#### Returns
@@ -66,18 +157,48 @@ Path to the image file. Supported formats:
Create an image embed from the bytes of an image.
```csharp
-public static SafeLlavaImageEmbedHandle CreateFromMemory(SafeLlavaModelHandle ctxLlava, LLamaContext ctxLlama, Byte[] image)
+public static SafeLlavaImageEmbedHandle CreateFromMemory(SafeLlavaModelHandle clip, LLamaContext ctx, Byte[] image)
```
#### Parameters
-`ctxLlava` [SafeLlavaModelHandle](./llama.native.safellavamodelhandle.md)
+`clip` [SafeLlavaModelHandle](./llama.native.safellavamodelhandle.md)
+
+`ctx` [LLamaContext](./llama.llamacontext.md)
+
+`image` [Byte[]](https://docs.microsoft.com/en-us/dotnet/api/system.byte)
+Image bytes. Supported formats:
+
+- JPG
+- PNG
+- BMP
+- TGA
+
+#### Returns
+
+[SafeLlavaImageEmbedHandle](./llama.native.safellavaimageembedhandle.md)
+
+### **CreateFromMemory(SafeLlavaModelHandle, Byte[], Int32)**
+
+Create an image embed from the bytes of an image.
-`ctxLlama` [LLamaContext](./llama.llamacontext.md)
+```csharp
+public static SafeLlavaImageEmbedHandle CreateFromMemory(SafeLlavaModelHandle clip, Byte[] image, int threads)
+```
+
+#### Parameters
+
+`clip` [SafeLlavaModelHandle](./llama.native.safellavamodelhandle.md)
`image` [Byte[]](https://docs.microsoft.com/en-us/dotnet/api/system.byte)
Image bytes. Supported formats:
- JPGPNGBMPTGA
+
+- JPG
+- PNG
+- BMP
+- TGA
+
+`threads` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
#### Returns
@@ -92,3 +213,21 @@ protected bool ReleaseHandle()
#### Returns
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+
+### **GetEmbedding(Span<Single>, Int32)**
+
+Copy the embeddings data to the destination span
+
+```csharp
+public void GetEmbedding(Span<float> dest, int index)
+```
+
+#### Parameters
+
+`dest` [Span<Single>](https://docs.microsoft.com/en-us/dotnet/api/system.span-1)
+
+`index` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+
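+A hedged usage sketch, assuming an existing `SafeLlavaImageEmbedHandle` named `embed`:
+
+```csharp
+// Copy each patch embedding into a reusable buffer.
+var buffer = new float[embed.EmbeddingDimensions];
+for (var i = 0; i < embed.PatchCount; i++)
+{
+    embed.GetEmbedding(buffer, i); // fills buffer with the i-th patch
+    // ... consume buffer ...
+}
+```
+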
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.native.safellavamodelhandle.md b/docs/xmldocs/llama.native.safellavamodelhandle.md
index 4f8179f32..3778e9aa4 100644
--- a/docs/xmldocs/llama.native.safellavamodelhandle.md
+++ b/docs/xmldocs/llama.native.safellavamodelhandle.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# SafeLlavaModelHandle
Namespace: LLama.Native
@@ -9,10 +13,43 @@ public sealed class SafeLlavaModelHandle : SafeLLamaHandleBase, System.IDisposab
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [CriticalFinalizerObject](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.constrainedexecution.criticalfinalizerobject) → [SafeHandle](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.interopservices.safehandle) → [SafeLLamaHandleBase](./llama.native.safellamahandlebase.md) → [SafeLlavaModelHandle](./llama.native.safellavamodelhandle.md)
-Implements [IDisposable](https://docs.microsoft.com/en-us/dotnet/api/system.idisposable)
+Implements [IDisposable](https://docs.microsoft.com/en-us/dotnet/api/system.idisposable)
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute), [NullableAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullableattribute)
+
+## Fields
+
+### **handle**
+
+```csharp
+protected IntPtr handle;
+```
## Properties
+### **EmbeddingDimensions**
+
+Get the number of dimensions in an embedding
+
+```csharp
+public int EmbeddingDimensions { get; }
+```
+
+#### Property Value
+
+[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+
+### **PatchCount**
+
+Get the number of "patches" in an image embedding
+
+```csharp
+public int PatchCount { get; }
+```
+
+#### Property Value
+
+[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+
### **IsInvalid**
```csharp
@@ -33,6 +70,14 @@ public bool IsClosed { get; }
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+## Constructors
+
+### **SafeLlavaModelHandle()**
+
+```csharp
+public SafeLlavaModelHandle()
+```
+
## Methods
### **ReleaseHandle()**
@@ -70,7 +115,7 @@ SafeHandle of the Clip Model
[InvalidOperationException](https://docs.microsoft.com/en-us/dotnet/api/system.invalidoperationexception)
-[RuntimeError](./llama.exceptions.runtimeerror.md)
+[LoadWeightsFailedException](./llama.exceptions.loadweightsfailedexception.md)
### **CreateImageEmbeddings(LLamaContext, String)**
@@ -93,6 +138,27 @@ Image filename (it supports jpeg format only)
[SafeLlavaImageEmbedHandle](./llama.native.safellavaimageembedhandle.md)
return the SafeHandle of these embeddings
+### **CreateImageEmbeddings(String, Int32)**
+
+Create the Image Embeddings.
+
+```csharp
+public SafeLlavaImageEmbedHandle CreateImageEmbeddings(string image, int threads)
+```
+
+#### Parameters
+
+`image` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+Path to the image file (it supports jpeg format only)
+
+`threads` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+Number of threads to use
+
+#### Returns
+
+[SafeLlavaImageEmbedHandle](./llama.native.safellavaimageembedhandle.md)
+return the SafeHandle of these embeddings
+
### **CreateImageEmbeddings(LLamaContext, Byte[])**
Create the Image Embeddings.
@@ -114,6 +180,27 @@ Image in binary format (it supports jpeg format only)
[SafeLlavaImageEmbedHandle](./llama.native.safellavaimageembedhandle.md)
return the SafeHandle of these embeddings
+### **CreateImageEmbeddings(Byte[], Int32)**
+
+Create the Image Embeddings.
+
+```csharp
+public SafeLlavaImageEmbedHandle CreateImageEmbeddings(Byte[] image, int threads)
+```
+
+#### Parameters
+
+`image` [Byte[]](https://docs.microsoft.com/en-us/dotnet/api/system.byte)
+Image in binary format (it supports jpeg format only)
+
+`threads` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+Number of threads to use
+
+#### Returns
+
+[SafeLlavaImageEmbedHandle](./llama.native.safellavaimageembedhandle.md)
+return the SafeHandle of these embeddings
+
### **EvalImageEmbed(LLamaContext, SafeLlavaImageEmbedHandle, Int32&)**
Evaluates the image embeddings.
@@ -136,3 +223,7 @@ The current embeddings to evaluate
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
True on success
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.native.systeminfo.md b/docs/xmldocs/llama.native.systeminfo.md
new file mode 100644
index 000000000..666d55140
--- /dev/null
+++ b/docs/xmldocs/llama.native.systeminfo.md
@@ -0,0 +1,201 @@
+[`< Back`](./)
+
+---
+
+# SystemInfo
+
+Namespace: LLama.Native
+
+Operating system information.
+
+```csharp
+public class SystemInfo : System.IEquatable<SystemInfo>
+```
+
+Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [SystemInfo](./llama.native.systeminfo.md)
+Implements [IEquatable<SystemInfo>](https://docs.microsoft.com/en-us/dotnet/api/system.iequatable-1)
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute), [NullableAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullableattribute)
+
+## Properties
+
+### **EqualityContract**
+
+```csharp
+protected Type EqualityContract { get; }
+```
+
+#### Property Value
+
+[Type](https://docs.microsoft.com/en-us/dotnet/api/system.type)
+
+### **OSPlatform**
+
+
+
+```csharp
+public OSPlatform OSPlatform { get; set; }
+```
+
+#### Property Value
+
+[OSPlatform](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.interopservices.osplatform)
+
+### **CudaMajorVersion**
+
+
+
+```csharp
+public int CudaMajorVersion { get; set; }
+```
+
+#### Property Value
+
+[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+
+### **VulkanVersion**
+
+
+
+```csharp
+public string VulkanVersion { get; set; }
+```
+
+#### Property Value
+
+[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+## Constructors
+
+### **SystemInfo(OSPlatform, Int32, String)**
+
+Operating system information.
+
+```csharp
+public SystemInfo(OSPlatform OSPlatform, int CudaMajorVersion, string VulkanVersion)
+```
+
+#### Parameters
+
+`OSPlatform` [OSPlatform](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.interopservices.osplatform)
+
+`CudaMajorVersion` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+
+`VulkanVersion` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+### **SystemInfo(SystemInfo)**
+
+```csharp
+protected SystemInfo(SystemInfo original)
+```
+
+#### Parameters
+
+`original` [SystemInfo](./llama.native.systeminfo.md)
+
+## Methods
+
+### **Get()**
+
+Get the system information of the current machine.
+
+```csharp
+public static SystemInfo Get()
+```
+
+#### Returns
+
+[SystemInfo](./llama.native.systeminfo.md)
+
+#### Exceptions
+
+[PlatformNotSupportedException](https://docs.microsoft.com/en-us/dotnet/api/system.platformnotsupportedexception)
+
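+A minimal usage sketch:
+
+```csharp
+// Detect OS, CUDA and Vulkan availability for native library selection.
+var info = SystemInfo.Get();
+Console.WriteLine($"{info.OSPlatform}, CUDA {info.CudaMajorVersion}, Vulkan {info.VulkanVersion}");
+```
+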
+### **ToString()**
+
+```csharp
+public string ToString()
+```
+
+#### Returns
+
+[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+### **PrintMembers(StringBuilder)**
+
+```csharp
+protected bool PrintMembers(StringBuilder builder)
+```
+
+#### Parameters
+
+`builder` [StringBuilder](https://docs.microsoft.com/en-us/dotnet/api/system.text.stringbuilder)
+
+#### Returns
+
+[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+
+### **GetHashCode()**
+
+```csharp
+public int GetHashCode()
+```
+
+#### Returns
+
+[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+
+### **Equals(Object)**
+
+```csharp
+public bool Equals(object obj)
+```
+
+#### Parameters
+
+`obj` [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object)
+
+#### Returns
+
+[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+
+### **Equals(SystemInfo)**
+
+```csharp
+public bool Equals(SystemInfo other)
+```
+
+#### Parameters
+
+`other` [SystemInfo](./llama.native.systeminfo.md)
+
+#### Returns
+
+[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+
+### **&lt;Clone&gt;$()**
+
+```csharp
+public SystemInfo <Clone>$()
+```
+
+#### Returns
+
+[SystemInfo](./llama.native.systeminfo.md)
+
+### **Deconstruct(OSPlatform&, Int32&, String&)**
+
+```csharp
+public void Deconstruct(OSPlatform& OSPlatform, Int32& CudaMajorVersion, String& VulkanVersion)
+```
+
+#### Parameters
+
+`OSPlatform` [OSPlatform&](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.interopservices.osplatform&)
+
+`CudaMajorVersion` [Int32&](https://docs.microsoft.com/en-us/dotnet/api/system.int32&)
+
+`VulkanVersion` [String&](https://docs.microsoft.com/en-us/dotnet/api/system.string&)
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.native.unknownnativelibrary.md b/docs/xmldocs/llama.native.unknownnativelibrary.md
new file mode 100644
index 000000000..c39733fc9
--- /dev/null
+++ b/docs/xmldocs/llama.native.unknownnativelibrary.md
@@ -0,0 +1,59 @@
+[`< Back`](./)
+
+---
+
+# UnknownNativeLibrary
+
+Namespace: LLama.Native
+
+When you are using .NET Standard 2.0, dynamic native library loading is not supported.
+ This class will be returned in [NativeLibraryConfig.DryRun(INativeLibrary&)](./llama.native.nativelibraryconfig.md#dryruninativelibrary&).
+
+```csharp
+public class UnknownNativeLibrary : LLama.Abstractions.INativeLibrary
+```
+
+Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [UnknownNativeLibrary](./llama.native.unknownnativelibrary.md)
+Implements [INativeLibrary](./llama.abstractions.inativelibrary.md)
+
+## Properties
+
+### **Metadata**
+
+```csharp
+public NativeLibraryMetadata Metadata { get; }
+```
+
+#### Property Value
+
+[NativeLibraryMetadata](./llama.native.nativelibrarymetadata.md)
+
+## Constructors
+
+### **UnknownNativeLibrary()**
+
+```csharp
+public UnknownNativeLibrary()
+```
+
+## Methods
+
+### **Prepare(SystemInfo, LLamaLogCallback)**
+
+```csharp
+public IEnumerable Prepare(SystemInfo systemInfo, LLamaLogCallback logCallback)
+```
+
+#### Parameters
+
+`systemInfo` [SystemInfo](./llama.native.systeminfo.md)
+
+`logCallback` [LLamaLogCallback](./llama.native.nativelogconfig.llamalogcallback.md)
+
+#### Returns
+
+[IEnumerable<String>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ienumerable-1)
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.quantizer.md b/docs/xmldocs/llama.quantizer.md
deleted file mode 100644
index e1220b76d..000000000
--- a/docs/xmldocs/llama.quantizer.md
+++ /dev/null
@@ -1,63 +0,0 @@
-# Quantizer
-
-Namespace: LLama
-
-```csharp
-public class Quantizer
-```
-
-Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [Quantizer](./llama.quantizer.md)
-
-## Constructors
-
-### **Quantizer()**
-
-```csharp
-public Quantizer()
-```
-
-## Methods
-
-### **Quantize(String, String, LLamaFtype, Int32, Boolean)**
-
-```csharp
-public static bool Quantize(string srcFileName, string dstFilename, LLamaFtype ftype, int nthread, bool printInfo)
-```
-
-#### Parameters
-
-`srcFileName` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-`dstFilename` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-`ftype` [LLamaFtype](./llama.native.llamaftype.md)
-
-`nthread` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-`printInfo` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-#### Returns
-
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-### **Quantize(String, String, String, Int32, Boolean)**
-
-```csharp
-public static bool Quantize(string srcFileName, string dstFilename, string ftype, int nthread, bool printInfo)
-```
-
-#### Parameters
-
-`srcFileName` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-`dstFilename` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-`ftype` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-`nthread` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-`printInfo` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-#### Returns
-
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
diff --git a/docs/xmldocs/llama.sampling.basesamplingpipeline.md b/docs/xmldocs/llama.sampling.basesamplingpipeline.md
index 9ebadc012..498c258f9 100644
--- a/docs/xmldocs/llama.sampling.basesamplingpipeline.md
+++ b/docs/xmldocs/llama.sampling.basesamplingpipeline.md
@@ -1,103 +1,94 @@
+[`< Back`](./)
+
+---
+
# BaseSamplingPipeline
Namespace: LLama.Sampling
-Base class for implementing custom sampling pipelines. This provides a helpful framework for implementing `ISamplingPipeline`.
-
```csharp
public abstract class BaseSamplingPipeline : ISamplingPipeline, System.IDisposable
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [BaseSamplingPipeline](./llama.sampling.basesamplingpipeline.md)
-Implements [ISamplingPipeline](./llama.sampling.isamplingpipeline.md), [IDisposable](https://docs.microsoft.com/en-us/dotnet/api/system.idisposable)
+Implements [ISamplingPipeline](./llama.sampling.isamplingpipeline.md), [IDisposable](https://docs.microsoft.com/en-us/dotnet/api/system.idisposable)
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute), [NullableAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullableattribute)
-## Properties
+## Constructors
-### **Grammar**
+### **BaseSamplingPipeline()**
-Grammar to constrain valid tokens
+Create a new sampler wrapping a llama.cpp sampler chain
```csharp
-public SafeLLamaGrammarHandle Grammar { get; set; }
+public BaseSamplingPipeline()
```
-#### Property Value
-
-[SafeLLamaGrammarHandle](./llama.native.safellamagrammarhandle.md)
-
## Methods
-### **Sample(SafeLLamaContextHandle, Span<Single>, ReadOnlySpan<LLamaToken>)**
+### **CreateChain(SafeLLamaContextHandle)**
+
+Create a sampling chain. This will be called once; the base class will automatically dispose the chain.
```csharp
-public LLamaToken Sample(SafeLLamaContextHandle ctx, Span logits, ReadOnlySpan lastTokens)
+protected abstract SafeLLamaSamplerChainHandle CreateChain(SafeLLamaContextHandle context)
```
#### Parameters
-`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
+`context` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-`logits` [Span<Single>](https://docs.microsoft.com/en-us/dotnet/api/system.span-1)
+#### Returns
-`lastTokens` [ReadOnlySpan<LLamaToken>](https://docs.microsoft.com/en-us/dotnet/api/system.readonlyspan-1)
+[SafeLLamaSamplerChainHandle](./llama.native.safellamasamplerchainhandle.md)
-#### Returns
+### **Dispose()**
-[LLamaToken](./llama.native.llamatoken.md)
+```csharp
+public void Dispose()
+```
-### **Accept(SafeLLamaContextHandle, LLamaToken)**
+### **Sample(SafeLLamaContextHandle, Int32)**
```csharp
-public void Accept(SafeLLamaContextHandle ctx, LLamaToken token)
+public LLamaToken Sample(SafeLLamaContextHandle ctx, int index)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-`token` [LLamaToken](./llama.native.llamatoken.md)
+`index` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-### **ProcessLogits(SafeLLamaContextHandle, Span<Single>, ReadOnlySpan<LLamaToken>)**
+#### Returns
-Process the raw logit values
+[LLamaToken](./llama.native.llamatoken.md)
+
+### **Apply(SafeLLamaContextHandle, LLamaTokenDataArray)**
```csharp
-protected abstract void ProcessLogits(SafeLLamaContextHandle ctx, Span logits, ReadOnlySpan lastTokens)
+public void Apply(SafeLLamaContextHandle ctx, LLamaTokenDataArray data)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-The context being sampled from
-`logits` [Span<Single>](https://docs.microsoft.com/en-us/dotnet/api/system.span-1)
-The logits produced by the model
+`data` [LLamaTokenDataArray](./llama.native.llamatokendataarray.md)
-`lastTokens` [ReadOnlySpan<LLamaToken>](https://docs.microsoft.com/en-us/dotnet/api/system.readonlyspan-1)
-A list of tokens recently returned by the model
+### **Apply(SafeLLamaContextHandle, LLamaTokenDataArrayNative&)**
-### **ProcessTokenDataArray(SafeLLamaContextHandle, LLamaTokenDataArray, ReadOnlySpan<LLamaToken>)**
-
-Process the LLamaTokenDataArray and select a single token
+Apply this sampling chain to a LLamaTokenDataArrayNative
```csharp
-protected abstract LLamaToken ProcessTokenDataArray(SafeLLamaContextHandle ctx, LLamaTokenDataArray candidates, ReadOnlySpan lastTokens)
+public void Apply(SafeLLamaContextHandle ctx, LLamaTokenDataArrayNative& data)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-The context being sampled from
-
-`candidates` [LLamaTokenDataArray](./llama.native.llamatokendataarray.md)
-The LLamaTokenDataArray data produced by the model
-`lastTokens` [ReadOnlySpan<LLamaToken>](https://docs.microsoft.com/en-us/dotnet/api/system.readonlyspan-1)
-A list of tokens recently returned by the model
-
-#### Returns
-
-[LLamaToken](./llama.native.llamatoken.md)
+`data` [LLamaTokenDataArrayNative&](./llama.native.llamatokendataarraynative&.md)
### **Reset()**
@@ -105,18 +96,16 @@ A list of tokens recently returned by the model
public void Reset()
```
-### **Clone()**
+### **Accept(LLamaToken)**
```csharp
-public abstract ISamplingPipeline Clone()
+public void Accept(LLamaToken token)
```
-#### Returns
+#### Parameters
-[ISamplingPipeline](./llama.sampling.isamplingpipeline.md)
+`token` [LLamaToken](./llama.native.llamatoken.md)
-### **Dispose()**
+---
-```csharp
-public void Dispose()
-```
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.sampling.defaultsamplingpipeline.md b/docs/xmldocs/llama.sampling.defaultsamplingpipeline.md
index e052ade53..92c14bca6 100644
--- a/docs/xmldocs/llama.sampling.defaultsamplingpipeline.md
+++ b/docs/xmldocs/llama.sampling.defaultsamplingpipeline.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# DefaultSamplingPipeline
Namespace: LLama.Sampling
@@ -5,11 +9,12 @@ Namespace: LLama.Sampling
 An implementation of ISamplingPipeline which mimics the default llama.cpp sampling
```csharp
-public sealed class DefaultSamplingPipeline : BaseSamplingPipeline, ISamplingPipeline, System.IDisposable
+public class DefaultSamplingPipeline : BaseSamplingPipeline, ISamplingPipeline, System.IDisposable
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [BaseSamplingPipeline](./llama.sampling.basesamplingpipeline.md) → [DefaultSamplingPipeline](./llama.sampling.defaultsamplingpipeline.md)
-Implements [ISamplingPipeline](./llama.sampling.isamplingpipeline.md), [IDisposable](https://docs.microsoft.com/en-us/dotnet/api/system.idisposable)
+Implements [ISamplingPipeline](./llama.sampling.isamplingpipeline.md), [IDisposable](https://docs.microsoft.com/en-us/dotnet/api/system.idisposable)
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute), [NullableAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullableattribute)
## Properties
@@ -18,12 +23,12 @@ Implements [ISamplingPipeline](./llama.sampling.isamplingpipeline.md), [IDisposa
Bias values to add to certain logits
```csharp
-public Dictionary LogitBias { get; }
+public IReadOnlyDictionary LogitBias { get; set; }
```
#### Property Value
-[Dictionary<Int32, Single>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.dictionary-2)
+[IReadOnlyDictionary<LLamaToken, Single>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ireadonlydictionary-2)
### **RepeatPenalty**
@@ -37,70 +42,94 @@ public float RepeatPenalty { get; set; }
[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-### **AlphaFrequency**
+### **FrequencyPenalty**
Frequency penalty as described by OpenAI: https://platform.openai.com/docs/api-reference/chat/create
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text
so far, decreasing the model's likelihood to repeat the same line verbatim.
```csharp
-public float AlphaFrequency { get; set; }
+public float FrequencyPenalty { get; set; }
```
#### Property Value
[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-### **AlphaPresence**
+### **PresencePenalty**
Presence penalty as described by OpenAI: https://platform.openai.com/docs/api-reference/chat/create
Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the
text so far, increasing the model's likelihood to talk about new topics.
```csharp
-public float AlphaPresence { get; set; }
+public float PresencePenalty { get; set; }
```
#### Property Value
[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-### **Temperature**
+### **PenaltyCount**
-Temperature to apply (higher temperature is more "creative")
+How many tokens should be considered for penalties
```csharp
-public float Temperature { get; set; }
+public int PenaltyCount { get; set; }
```
#### Property Value
-[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-### **TopK**
+### **PenalizeNewline**
-Number of tokens to keep in TopK sampling
+Whether the newline token should be protected from being modified by penalty
```csharp
-public int TopK { get; set; }
+public bool PenalizeNewline { get; set; }
```
#### Property Value
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+
+### **PreventEOS**
+
+Whether the EOS token should be suppressed. Setting this to 'true' prevents EOS from being sampled
+
+```csharp
+public bool PreventEOS { get; set; }
+```
+
+#### Property Value
-### **TailFreeZ**
+[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-Z value for tail free sampling
+### **Temperature**
+
+Temperature to apply (higher temperature is more "creative")
```csharp
-public float TailFreeZ { get; set; }
+public float Temperature { get; set; }
```
#### Property Value
[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+### **TopK**
+
+Number of tokens to keep in TopK sampling
+
+```csharp
+public int TopK { get; set; }
+```
+
+#### Property Value
+
+[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+
### **TypicalP**
P value for locally typical sampling
@@ -137,29 +166,53 @@ public float MinP { get; set; }
[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-### **PenalizeNewline**
+### **Grammar**
-Whether the newline value should be protected from being modified by logit bias and repeat penalty
+Grammar to apply to constrain possible tokens
```csharp
-public bool PenalizeNewline { get; set; }
+public Grammar Grammar { get; set; }
```
#### Property Value
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+[Grammar](./llama.sampling.grammar.md)
-### **Grammar**
+### **MinKeep**
+
+The minimum number of tokens to keep for samplers which remove tokens
+
+```csharp
+public int MinKeep { get; set; }
+```
+
+#### Property Value
+
+[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+
+### **Seed**
+
+Seed to use for random sampling
+
+```csharp
+public uint Seed { get; set; }
+```
+
+#### Property Value
+
+[UInt32](https://docs.microsoft.com/en-us/dotnet/api/system.uint32)
+
+### **GrammarOptimization**
-Grammar to constrain valid tokens
+Selected grammar optimization mode
```csharp
-public SafeLLamaGrammarHandle Grammar { get; set; }
+public GrammarOptimizationMode GrammarOptimization { get; set; }
```
#### Property Value
-[SafeLLamaGrammarHandle](./llama.native.safellamagrammarhandle.md)
+[GrammarOptimizationMode](./llama.sampling.defaultsamplingpipeline.grammaroptimizationmode.md)
## Constructors
@@ -171,56 +224,58 @@ public DefaultSamplingPipeline()
## Methods
-### **ProcessLogits(SafeLLamaContextHandle, Span<Single>, ReadOnlySpan<LLamaToken>)**
+### **Dispose()**
```csharp
-protected void ProcessLogits(SafeLLamaContextHandle ctx, Span logits, ReadOnlySpan lastTokens)
+public void Dispose()
```
-#### Parameters
-
-`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
+### **Reset()**
-`logits` [Span<Single>](https://docs.microsoft.com/en-us/dotnet/api/system.span-1)
-
-`lastTokens` [ReadOnlySpan<LLamaToken>](https://docs.microsoft.com/en-us/dotnet/api/system.readonlyspan-1)
+```csharp
+public void Reset()
+```
-### **ProcessTokenDataArray(SafeLLamaContextHandle, LLamaTokenDataArray, ReadOnlySpan<LLamaToken>)**
+### **Accept(LLamaToken)**
```csharp
-protected LLamaToken ProcessTokenDataArray(SafeLLamaContextHandle ctx, LLamaTokenDataArray candidates, ReadOnlySpan lastTokens)
+public void Accept(LLamaToken token)
```
#### Parameters
-`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
+`token` [LLamaToken](./llama.native.llamatoken.md)
+
+### **CreateChain(SafeLLamaContextHandle)**
+
+```csharp
+protected SafeLLamaSamplerChainHandle CreateChain(SafeLLamaContextHandle context)
+```
-`candidates` [LLamaTokenDataArray](./llama.native.llamatokendataarray.md)
+#### Parameters
-`lastTokens` [ReadOnlySpan<LLamaToken>](https://docs.microsoft.com/en-us/dotnet/api/system.readonlyspan-1)
+`context` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
#### Returns
-[LLamaToken](./llama.native.llamatoken.md)
+[SafeLLamaSamplerChainHandle](./llama.native.safellamasamplerchainhandle.md)
-### **Accept(SafeLLamaContextHandle, LLamaToken)**
+### **Sample(SafeLLamaContextHandle, Int32)**
```csharp
-public void Accept(SafeLLamaContextHandle ctx, LLamaToken token)
+public LLamaToken Sample(SafeLLamaContextHandle ctx, int index)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-`token` [LLamaToken](./llama.native.llamatoken.md)
+`index` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-### **Clone()**
+#### Returns
-```csharp
-public ISamplingPipeline Clone()
-```
+[LLamaToken](./llama.native.llamatoken.md)
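+
+As a sketch of the renamed API (hypothetical values; `FrequencyPenalty` and `PresencePenalty` replace the old `AlphaFrequency`/`AlphaPresence` properties):
+
+```csharp
+using LLama.Sampling;
+
+var pipeline = new DefaultSamplingPipeline
+{
+    Temperature = 0.7f,
+    TopK = 40,
+    FrequencyPenalty = 0.1f, // formerly AlphaFrequency
+    PresencePenalty = 0.1f,  // formerly AlphaPresence
+    Seed = 42,
+};
+```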
-#### Returns
+---
-[ISamplingPipeline](./llama.sampling.isamplingpipeline.md)
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.sampling.grammar.md b/docs/xmldocs/llama.sampling.grammar.md
new file mode 100644
index 000000000..ffe147183
--- /dev/null
+++ b/docs/xmldocs/llama.sampling.grammar.md
@@ -0,0 +1,169 @@
+[`< Back`](./)
+
+---
+
+# Grammar
+
+Namespace: LLama.Sampling
+
+A grammar in GBNF form
+
+```csharp
+public class Grammar : System.IEquatable<Grammar>
+```
+
+Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [Grammar](./llama.sampling.grammar.md)
+Implements [IEquatable<Grammar>](https://docs.microsoft.com/en-us/dotnet/api/system.iequatable-1)
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute), [NullableAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullableattribute)
+
+## Properties
+
+### **EqualityContract**
+
+```csharp
+protected Type EqualityContract { get; }
+```
+
+#### Property Value
+
+[Type](https://docs.microsoft.com/en-us/dotnet/api/system.type)
+
+### **Gbnf**
+
+
+
+```csharp
+public string Gbnf { get; set; }
+```
+
+#### Property Value
+
+[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+### **Root**
+
+
+
+```csharp
+public string Root { get; set; }
+```
+
+#### Property Value
+
+[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+## Constructors
+
+### **Grammar(String, String)**
+
+A grammar in GBNF form
+
+```csharp
+public Grammar(string Gbnf, string Root)
+```
+
+#### Parameters
+
+`Gbnf` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+`Root` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
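+
+For illustration only (a sketch with a made-up GBNF rule, not from the source), a grammar restricting output to "yes" or "no" could be constructed as:
+
+```csharp
+using LLama.Sampling;
+
+// The first argument is the GBNF text, the second the name of the root rule.
+var grammar = new Grammar("root ::= \"yes\" | \"no\"", "root");
+```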
+
+### **Grammar(Grammar)**
+
+```csharp
+protected Grammar(Grammar original)
+```
+
+#### Parameters
+
+`original` [Grammar](./llama.sampling.grammar.md)
+
+## Methods
+
+### **ToString()**
+
+```csharp
+public string ToString()
+```
+
+#### Returns
+
+[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+### **PrintMembers(StringBuilder)**
+
+```csharp
+protected bool PrintMembers(StringBuilder builder)
+```
+
+#### Parameters
+
+`builder` [StringBuilder](https://docs.microsoft.com/en-us/dotnet/api/system.text.stringbuilder)
+
+#### Returns
+
+[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+
+### **GetHashCode()**
+
+```csharp
+public int GetHashCode()
+```
+
+#### Returns
+
+[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+
+### **Equals(Object)**
+
+```csharp
+public bool Equals(object obj)
+```
+
+#### Parameters
+
+`obj` [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object)
+
+#### Returns
+
+[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+
+### **Equals(Grammar)**
+
+```csharp
+public bool Equals(Grammar other)
+```
+
+#### Parameters
+
+`other` [Grammar](./llama.sampling.grammar.md)
+
+#### Returns
+
+[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+
+### **&lt;Clone&gt;$()**
+
+```csharp
+public Grammar <Clone>$()
+```
+
+#### Returns
+
+[Grammar](./llama.sampling.grammar.md)
+
+### **Deconstruct(String&, String&)**
+
+```csharp
+public void Deconstruct(String& Gbnf, String& Root)
+```
+
+#### Parameters
+
+`Gbnf` [String&](https://docs.microsoft.com/en-us/dotnet/api/system.string&)
+
+`Root` [String&](https://docs.microsoft.com/en-us/dotnet/api/system.string&)
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.sampling.greedysamplingpipeline.md b/docs/xmldocs/llama.sampling.greedysamplingpipeline.md
index a02bada5c..9a2002b32 100644
--- a/docs/xmldocs/llama.sampling.greedysamplingpipeline.md
+++ b/docs/xmldocs/llama.sampling.greedysamplingpipeline.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# GreedySamplingPipeline
Namespace: LLama.Sampling
@@ -9,21 +13,22 @@ public class GreedySamplingPipeline : BaseSamplingPipeline, ISamplingPipeline, S
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [BaseSamplingPipeline](./llama.sampling.basesamplingpipeline.md) → [GreedySamplingPipeline](./llama.sampling.greedysamplingpipeline.md)
-Implements [ISamplingPipeline](./llama.sampling.isamplingpipeline.md), [IDisposable](https://docs.microsoft.com/en-us/dotnet/api/system.idisposable)
+Implements [ISamplingPipeline](./llama.sampling.isamplingpipeline.md), [IDisposable](https://docs.microsoft.com/en-us/dotnet/api/system.idisposable)
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute), [NullableAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullableattribute)
## Properties
### **Grammar**
-Grammar to constrain valid tokens
+Grammar to apply to constrain possible tokens
```csharp
-public SafeLLamaGrammarHandle Grammar { get; set; }
+public Grammar Grammar { get; set; }
```
#### Property Value
-[SafeLLamaGrammarHandle](./llama.native.safellamagrammarhandle.md)
+[Grammar](./llama.sampling.grammar.md)
## Constructors
@@ -35,44 +40,20 @@ public GreedySamplingPipeline()
## Methods
-### **ProcessLogits(SafeLLamaContextHandle, Span<Single>, ReadOnlySpan<LLamaToken>)**
-
-```csharp
-protected void ProcessLogits(SafeLLamaContextHandle ctx, Span logits, ReadOnlySpan lastTokens)
-```
-
-#### Parameters
-
-`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-
-`logits` [Span<Single>](https://docs.microsoft.com/en-us/dotnet/api/system.span-1)
-
-`lastTokens` [ReadOnlySpan<LLamaToken>](https://docs.microsoft.com/en-us/dotnet/api/system.readonlyspan-1)
-
-### **ProcessTokenDataArray(SafeLLamaContextHandle, LLamaTokenDataArray, ReadOnlySpan<LLamaToken>)**
+### **CreateChain(SafeLLamaContextHandle)**
```csharp
-protected LLamaToken ProcessTokenDataArray(SafeLLamaContextHandle ctx, LLamaTokenDataArray candidates, ReadOnlySpan lastTokens)
+protected SafeLLamaSamplerChainHandle CreateChain(SafeLLamaContextHandle context)
```
#### Parameters
-`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-
-`candidates` [LLamaTokenDataArray](./llama.native.llamatokendataarray.md)
-
-`lastTokens` [ReadOnlySpan<LLamaToken>](https://docs.microsoft.com/en-us/dotnet/api/system.readonlyspan-1)
+`context` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
#### Returns
-[LLamaToken](./llama.native.llamatoken.md)
-
-### **Clone()**
+[SafeLLamaSamplerChainHandle](./llama.native.safellamasamplerchainhandle.md)
-```csharp
-public ISamplingPipeline Clone()
-```
-
-#### Returns
+---
-[ISamplingPipeline](./llama.sampling.isamplingpipeline.md)
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.sampling.isamplingpipeline.md b/docs/xmldocs/llama.sampling.isamplingpipeline.md
index 481b3514a..c5176baea 100644
--- a/docs/xmldocs/llama.sampling.isamplingpipeline.md
+++ b/docs/xmldocs/llama.sampling.isamplingpipeline.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# ISamplingPipeline
Namespace: LLama.Sampling
@@ -8,16 +12,17 @@ Convert a span of logits into a single sampled token. This interface can be impl
public interface ISamplingPipeline : System.IDisposable
```
-Implements [IDisposable](https://docs.microsoft.com/en-us/dotnet/api/system.idisposable)
+Implements [IDisposable](https://docs.microsoft.com/en-us/dotnet/api/system.idisposable)
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute)
## Methods
-### **Sample(SafeLLamaContextHandle, Span<Single>, ReadOnlySpan<LLamaToken>)**
+### **Sample(SafeLLamaContextHandle, Int32)**
-Sample a single token from the given logits
+Sample a single token from the given context at the given position
```csharp
-LLamaToken Sample(SafeLLamaContextHandle ctx, Span logits, ReadOnlySpan lastTokens)
+LLamaToken Sample(SafeLLamaContextHandle ctx, int index)
```
#### Parameters
@@ -25,29 +30,26 @@ LLamaToken Sample(SafeLLamaContextHandle ctx, Span logits, ReadOnlySpan
The context being sampled from
-`logits` [Span<Single>](https://docs.microsoft.com/en-us/dotnet/api/system.span-1)
-The logits produced by the model
-
-`lastTokens` [ReadOnlySpan<LLamaToken>](https://docs.microsoft.com/en-us/dotnet/api/system.readonlyspan-1)
-A span of tokens recently returned by the model
+`index` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+Position to sample logits from
#### Returns
[LLamaToken](./llama.native.llamatoken.md)
-### **Accept(SafeLLamaContextHandle, LLamaToken)**
+### **Apply(SafeLLamaContextHandle, LLamaTokenDataArray)**
-Update the pipeline, with knowledge that a particular token was just accepted
+Apply this pipeline to a set of token data
```csharp
-void Accept(SafeLLamaContextHandle ctx, LLamaToken token)
+void Apply(SafeLLamaContextHandle ctx, LLamaTokenDataArray data)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-`token` [LLamaToken](./llama.native.llamatoken.md)
+`data` [LLamaTokenDataArray](./llama.native.llamatokendataarray.md)
### **Reset()**
@@ -57,14 +59,18 @@ Reset all internal state of the sampling pipeline
void Reset()
```
-### **Clone()**
+### **Accept(LLamaToken)**
-Create a copy of this sampling pipeline
+Update the pipeline, with knowledge that a particular token was just accepted
```csharp
-ISamplingPipeline Clone()
+void Accept(LLamaToken token)
```
-#### Returns
+#### Parameters
+
+`token` [LLamaToken](./llama.native.llamatoken.md)
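+
+A typical decode step under the new interface might look like this (a sketch; assumes `pipeline` and a native context handle `ctx` already exist, and that `index` points at the logits to sample):
+
+```csharp
+// Sample a token from the given position, then report it back so
+// stateful samplers (e.g. repetition penalties) stay up to date.
+LLamaToken token = pipeline.Sample(ctx, index);
+pipeline.Accept(token);
+```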
+
+---
-[ISamplingPipeline](./llama.sampling.isamplingpipeline.md)
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.sampling.isamplingpipelineextensions.md b/docs/xmldocs/llama.sampling.isamplingpipelineextensions.md
index 6ac83b875..1df1f4900 100644
--- a/docs/xmldocs/llama.sampling.isamplingpipelineextensions.md
+++ b/docs/xmldocs/llama.sampling.isamplingpipelineextensions.md
@@ -1,38 +1,44 @@
+[`< Back`](./)
+
+---
+
# ISamplingPipelineExtensions
Namespace: LLama.Sampling
-Extensions methods for ISamplingPipeline
+Extension methods for [ISamplingPipeline](./llama.sampling.isamplingpipeline.md)
```csharp
public static class ISamplingPipelineExtensions
```
-Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ISamplingPipelineExtensions](./llama.sampling.isamplingpipelineextensions.md)
+Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ISamplingPipelineExtensions](./llama.sampling.isamplingpipelineextensions.md)
+Attributes [ExtensionAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.extensionattribute)
## Methods
-### **Sample(ISamplingPipeline, SafeLLamaContextHandle, Span<Single>, List<LLamaToken>)**
+### **Sample(ISamplingPipeline, LLamaContext, Int32)**
-Sample a single token from the given logits
+Sample a single token from the given context at the given position
```csharp
-public static LLamaToken Sample(ISamplingPipeline pipeline, SafeLLamaContextHandle ctx, Span logits, List lastTokens)
+public static LLamaToken Sample(ISamplingPipeline pipe, LLamaContext ctx, int index)
```
#### Parameters
-`pipeline` [ISamplingPipeline](./llama.sampling.isamplingpipeline.md)
+`pipe` [ISamplingPipeline](./llama.sampling.isamplingpipeline.md)
-`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
+`ctx` [LLamaContext](./llama.llamacontext.md)
The context being sampled from
-`logits` [Span<Single>](https://docs.microsoft.com/en-us/dotnet/api/system.span-1)
-The logits produced by the model
-
-`lastTokens` [List<LLamaToken>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.list-1)
-A list of tokens recently returned by the model
+`index` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+Position to sample logits from
#### Returns
[LLamaToken](./llama.native.llamatoken.md)
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.sampling.mirostate2samplingpipeline.md b/docs/xmldocs/llama.sampling.mirostate2samplingpipeline.md
deleted file mode 100644
index 954dd35ff..000000000
--- a/docs/xmldocs/llama.sampling.mirostate2samplingpipeline.md
+++ /dev/null
@@ -1,120 +0,0 @@
-# Mirostate2SamplingPipeline
-
-Namespace: LLama.Sampling
-
-A sampling pipeline which uses mirostat (v2) to select tokens
-
-```csharp
-public class Mirostate2SamplingPipeline : BaseSamplingPipeline, ISamplingPipeline, System.IDisposable
-```
-
-Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [BaseSamplingPipeline](./llama.sampling.basesamplingpipeline.md) → [Mirostate2SamplingPipeline](./llama.sampling.mirostate2samplingpipeline.md)
-Implements [ISamplingPipeline](./llama.sampling.isamplingpipeline.md), [IDisposable](https://docs.microsoft.com/en-us/dotnet/api/system.idisposable)
-
-## Properties
-
-### **Mu**
-
-Currently learned mu value
-
-```csharp
-public float Mu { get; }
-```
-
-#### Property Value
-
-[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-### **Tau**
-
-target entropy
-
-```csharp
-public float Tau { get; set; }
-```
-
-#### Property Value
-
-[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-### **Eta**
-
-learning rate
-
-```csharp
-public float Eta { get; set; }
-```
-
-#### Property Value
-
-[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-### **Grammar**
-
-Grammar to constrain valid tokens
-
-```csharp
-public SafeLLamaGrammarHandle Grammar { get; set; }
-```
-
-#### Property Value
-
-[SafeLLamaGrammarHandle](./llama.native.safellamagrammarhandle.md)
-
-## Constructors
-
-### **Mirostate2SamplingPipeline()**
-
-```csharp
-public Mirostate2SamplingPipeline()
-```
-
-## Methods
-
-### **ProcessLogits(SafeLLamaContextHandle, Span<Single>, ReadOnlySpan<LLamaToken>)**
-
-```csharp
-protected void ProcessLogits(SafeLLamaContextHandle ctx, Span logits, ReadOnlySpan lastTokens)
-```
-
-#### Parameters
-
-`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-
-`logits` [Span<Single>](https://docs.microsoft.com/en-us/dotnet/api/system.span-1)
-
-`lastTokens` [ReadOnlySpan<LLamaToken>](https://docs.microsoft.com/en-us/dotnet/api/system.readonlyspan-1)
-
-### **ProcessTokenDataArray(SafeLLamaContextHandle, LLamaTokenDataArray, ReadOnlySpan<LLamaToken>)**
-
-```csharp
-protected LLamaToken ProcessTokenDataArray(SafeLLamaContextHandle ctx, LLamaTokenDataArray candidates, ReadOnlySpan lastTokens)
-```
-
-#### Parameters
-
-`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-
-`candidates` [LLamaTokenDataArray](./llama.native.llamatokendataarray.md)
-
-`lastTokens` [ReadOnlySpan<LLamaToken>](https://docs.microsoft.com/en-us/dotnet/api/system.readonlyspan-1)
-
-#### Returns
-
-[LLamaToken](./llama.native.llamatoken.md)
-
-### **Reset()**
-
-```csharp
-public void Reset()
-```
-
-### **Clone()**
-
-```csharp
-public ISamplingPipeline Clone()
-```
-
-#### Returns
-
-[ISamplingPipeline](./llama.sampling.isamplingpipeline.md)
diff --git a/docs/xmldocs/llama.sampling.mirostatesamplingpipeline.md b/docs/xmldocs/llama.sampling.mirostatesamplingpipeline.md
deleted file mode 100644
index 143c5d232..000000000
--- a/docs/xmldocs/llama.sampling.mirostatesamplingpipeline.md
+++ /dev/null
@@ -1,120 +0,0 @@
-# MirostateSamplingPipeline
-
-Namespace: LLama.Sampling
-
-A sampling pipeline which uses mirostat (v1) to select tokens
-
-```csharp
-public class MirostateSamplingPipeline : BaseSamplingPipeline, ISamplingPipeline, System.IDisposable
-```
-
-Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [BaseSamplingPipeline](./llama.sampling.basesamplingpipeline.md) → [MirostateSamplingPipeline](./llama.sampling.mirostatesamplingpipeline.md)
-Implements [ISamplingPipeline](./llama.sampling.isamplingpipeline.md), [IDisposable](https://docs.microsoft.com/en-us/dotnet/api/system.idisposable)
-
-## Properties
-
-### **Mu**
-
-Currently learned mu value
-
-```csharp
-public float Mu { get; }
-```
-
-#### Property Value
-
-[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-### **Tau**
-
-target entropy
-
-```csharp
-public float Tau { get; set; }
-```
-
-#### Property Value
-
-[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-### **Eta**
-
-learning rate
-
-```csharp
-public float Eta { get; set; }
-```
-
-#### Property Value
-
-[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-### **Grammar**
-
-Grammar to constrain valid tokens
-
-```csharp
-public SafeLLamaGrammarHandle Grammar { get; set; }
-```
-
-#### Property Value
-
-[SafeLLamaGrammarHandle](./llama.native.safellamagrammarhandle.md)
-
-## Constructors
-
-### **MirostateSamplingPipeline()**
-
-```csharp
-public MirostateSamplingPipeline()
-```
-
-## Methods
-
-### **ProcessLogits(SafeLLamaContextHandle, Span<Single>, ReadOnlySpan<LLamaToken>)**
-
-```csharp
-protected void ProcessLogits(SafeLLamaContextHandle ctx, Span logits, ReadOnlySpan lastTokens)
-```
-
-#### Parameters
-
-`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-
-`logits` [Span<Single>](https://docs.microsoft.com/en-us/dotnet/api/system.span-1)
-
-`lastTokens` [ReadOnlySpan<LLamaToken>](https://docs.microsoft.com/en-us/dotnet/api/system.readonlyspan-1)
-
-### **ProcessTokenDataArray(SafeLLamaContextHandle, LLamaTokenDataArray, ReadOnlySpan<LLamaToken>)**
-
-```csharp
-protected LLamaToken ProcessTokenDataArray(SafeLLamaContextHandle ctx, LLamaTokenDataArray candidates, ReadOnlySpan lastTokens)
-```
-
-#### Parameters
-
-`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-
-`candidates` [LLamaTokenDataArray](./llama.native.llamatokendataarray.md)
-
-`lastTokens` [ReadOnlySpan<LLamaToken>](https://docs.microsoft.com/en-us/dotnet/api/system.readonlyspan-1)
-
-#### Returns
-
-[LLamaToken](./llama.native.llamatoken.md)
-
-### **Reset()**
-
-```csharp
-public void Reset()
-```
-
-### **Clone()**
-
-```csharp
-public ISamplingPipeline Clone()
-```
-
-#### Returns
-
-[ISamplingPipeline](./llama.sampling.isamplingpipeline.md)
diff --git a/docs/xmldocs/llama.sessionstate.md b/docs/xmldocs/llama.sessionstate.md
index 0d6c8dbef..29421760c 100644
--- a/docs/xmldocs/llama.sessionstate.md
+++ b/docs/xmldocs/llama.sessionstate.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# SessionState
Namespace: LLama
@@ -9,10 +13,21 @@ public class SessionState : System.IEquatable`1[[LLama.SessionState, LLamaSharp,
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [SessionState](./llama.sessionstate.md)
-Implements [IEquatable<SessionState>](https://docs.microsoft.com/en-us/dotnet/api/system.iequatable-1)
+Implements [IEquatable<SessionState>](https://docs.microsoft.com/en-us/dotnet/api/system.iequatable-1)
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute), [NullableAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullableattribute)
## Properties
+### **EqualityContract**
+
+```csharp
+protected Type EqualityContract { get; }
+```
+
+#### Property Value
+
+[Type](https://docs.microsoft.com/en-us/dotnet/api/system.type)
+
### **ExecutorState**
Saved executor state for the session in JSON format.
@@ -109,6 +124,16 @@ public SessionState(State contextState, ExecutorBaseState executorState, ChatHis
`historyTransform` [IHistoryTransform](./llama.abstractions.ihistorytransform.md)
+### **SessionState(SessionState)**
+
+```csharp
+protected SessionState(SessionState original)
+```
+
+#### Parameters
+
+`original` [SessionState](./llama.sessionstate.md)
+
## Methods
### **Save(String)**
@@ -215,3 +240,7 @@ public SessionState $()
#### Returns
[SessionState](./llama.sessionstate.md)
+
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.statefulexecutorbase.md b/docs/xmldocs/llama.statefulexecutorbase.md
new file mode 100644
index 000000000..8e4b2226d
--- /dev/null
+++ b/docs/xmldocs/llama.statefulexecutorbase.md
@@ -0,0 +1,404 @@
+[`< Back`](./)
+
+---
+
+# StatefulExecutorBase
+
+Namespace: LLama
+
+The base class for stateful LLama executors.
+
+```csharp
+public abstract class StatefulExecutorBase : LLama.Abstractions.ILLamaExecutor
+```
+
+Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [StatefulExecutorBase](./llama.statefulexecutorbase.md)
+Implements [ILLamaExecutor](./llama.abstractions.illamaexecutor.md)
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute), [NullableAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullableattribute)
+
+## Fields
+
+### **_logger**
+
+The logger used by this executor.
+
+```csharp
+protected ILogger _logger;
+```
+
+### **_pastTokensCount**
+
+The tokens that were already processed by the model.
+
+```csharp
+protected int _pastTokensCount;
+```
+
+### **_consumedTokensCount**
+
+The tokens that were consumed by the model during the current inference.
+
+```csharp
+protected int _consumedTokensCount;
+```
+
+### **_n_session_consumed**
+
+
+
+```csharp
+protected int _n_session_consumed;
+```
+
+### **_n_matching_session_tokens**
+
+
+
+```csharp
+protected int _n_matching_session_tokens;
+```
+
+### **_pathSession**
+
+The path of the session file.
+
+```csharp
+protected string _pathSession;
+```
+
+### **_embeds**
+
+A container for tokens that are waiting to be processed and tokens that have already been processed.
+
+```csharp
+protected List _embeds;
+```
+
+### **_embed_inps**
+
+A container for the input tokens.
+
+```csharp
+protected List _embed_inps;
+```
+
+### **_session_tokens**
+
+
+
+```csharp
+protected List _session_tokens;
+```
+
+### **_last_n_tokens**
+
+The last tokens generated by the model.
+
+```csharp
+protected FixedSizeQueue _last_n_tokens;
+```
+
+## Properties
+
+### **Context**
+
+The context used by the executor.
+
+```csharp
+public LLamaContext Context { get; }
+```
+
+#### Property Value
+
+[LLamaContext](./llama.llamacontext.md)
+
+### **IsMultiModal**
+
+```csharp
+public bool IsMultiModal { get; }
+```
+
+#### Property Value
+
+[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+
+### **ClipModel**
+
+```csharp
+public LLavaWeights ClipModel { get; }
+```
+
+#### Property Value
+
+[LLavaWeights](./llama.llavaweights.md)
+
+### **Images**
+
+```csharp
+public List Images { get; }
+```
+
+#### Property Value
+
+[List<Byte[]>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.list-1)
+
+## Constructors
+
+### **StatefulExecutorBase(LLamaContext, ILogger)**
+
+
+
+```csharp
+protected StatefulExecutorBase(LLamaContext context, ILogger logger)
+```
+
+#### Parameters
+
+`context` [LLamaContext](./llama.llamacontext.md)
+
+`logger` ILogger
+
+### **StatefulExecutorBase(LLamaContext, LLavaWeights, ILogger)**
+
+
+
+```csharp
+public StatefulExecutorBase(LLamaContext context, LLavaWeights lLavaWeights, ILogger logger)
+```
+
+#### Parameters
+
+`context` [LLamaContext](./llama.llamacontext.md)
+
+`lLavaWeights` [LLavaWeights](./llama.llavaweights.md)
+
+`logger` ILogger
+
+## Methods
+
+### **WithSessionFile(String)**
+
+This API is currently not verified.
+
+```csharp
+public StatefulExecutorBase WithSessionFile(string filename)
+```
+
+#### Parameters
+
+`filename` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+#### Returns
+
+[StatefulExecutorBase](./llama.statefulexecutorbase.md)
+
+#### Exceptions
+
+[ArgumentNullException](https://docs.microsoft.com/en-us/dotnet/api/system.argumentnullexception)
+
+[RuntimeError](./llama.exceptions.runtimeerror.md)
+
+### **SaveSessionFile(String)**
+
+This API is currently not verified.
+
+```csharp
+public void SaveSessionFile(string filename)
+```
+
+#### Parameters
+
+`filename` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+### **HandleRunOutOfContext(Int32)**
+
+When the context runs out, keep some tokens from the original prompt and recompute the logits in batches.
+
+```csharp
+protected void HandleRunOutOfContext(int tokensToKeep)
+```
+
+#### Parameters
+
+`tokensToKeep` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+
+### **TryReuseMatchingPrefix()**
+
+Try to reuse the matching prefix from the session file.
+
+```csharp
+protected void TryReuseMatchingPrefix()
+```
+
+### **GetLoopCondition(InferStateArgs)**
+
+Decide whether to continue the loop.
+
+```csharp
+protected abstract Task GetLoopCondition(InferStateArgs args)
+```
+
+#### Parameters
+
+`args` [InferStateArgs](./llama.statefulexecutorbase.inferstateargs.md)
+
+#### Returns
+
+[Task<Boolean>](https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.task-1)
+
+### **PreprocessInputs(String, InferStateArgs)**
+
+Preprocess the inputs before the inference.
+
+```csharp
+protected abstract Task PreprocessInputs(string text, InferStateArgs args)
+```
+
+#### Parameters
+
+`text` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+`args` [InferStateArgs](./llama.statefulexecutorbase.inferstateargs.md)
+
+#### Returns
+
+[Task](https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.task)
+
+### **PostProcess(IInferenceParams, InferStateArgs)**
+
+Do some post processing after the inference.
+
+```csharp
+protected abstract Task>> PostProcess(IInferenceParams inferenceParams, InferStateArgs args)
+```
+
+#### Parameters
+
+`inferenceParams` [IInferenceParams](./llama.abstractions.iinferenceparams.md)
+
+`args` [InferStateArgs](./llama.statefulexecutorbase.inferstateargs.md)
+
+#### Returns
+
+[Task<ValueTuple<Boolean, IReadOnlyList<String>>>](https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.task-1)
+
+### **InferInternal(IInferenceParams, InferStateArgs)**
+
+The core inference logic.
+
+```csharp
+protected abstract Task InferInternal(IInferenceParams inferenceParams, InferStateArgs args)
+```
+
+#### Parameters
+
+`inferenceParams` [IInferenceParams](./llama.abstractions.iinferenceparams.md)
+
+`args` [InferStateArgs](./llama.statefulexecutorbase.inferstateargs.md)
+
+#### Returns
+
+[Task](https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.task)
+
+### **SaveState(String)**
+
+Save the current state to a file.
+
+```csharp
+public abstract Task SaveState(string filename)
+```
+
+#### Parameters
+
+`filename` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+#### Returns
+
+[Task](https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.task)
+
+### **GetStateData()**
+
+Get the current state data.
+
+```csharp
+public abstract ExecutorBaseState GetStateData()
+```
+
+#### Returns
+
+[ExecutorBaseState](./llama.statefulexecutorbase.executorbasestate.md)
+
+### **LoadState(ExecutorBaseState)**
+
+Load the state from data.
+
+```csharp
+public abstract Task LoadState(ExecutorBaseState data)
+```
+
+#### Parameters
+
+`data` [ExecutorBaseState](./llama.statefulexecutorbase.executorbasestate.md)
+
+#### Returns
+
+[Task](https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.task)
+
+### **LoadState(String)**
+
+Load the state from a file.
+
+```csharp
+public abstract Task LoadState(string filename)
+```
+
+#### Parameters
+
+`filename` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+#### Returns
+
+[Task](https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.task)
+
+### **InferAsync(String, IInferenceParams, CancellationToken)**
+
+Execute the inference.
+
+```csharp
+public IAsyncEnumerable InferAsync(string text, IInferenceParams inferenceParams, CancellationToken cancellationToken)
+```
+
+#### Parameters
+
+`text` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+The prompt. If null, generation will continue where it left off previously.
+
+`inferenceParams` [IInferenceParams](./llama.abstractions.iinferenceparams.md)
+
+`cancellationToken` [CancellationToken](https://docs.microsoft.com/en-us/dotnet/api/system.threading.cancellationtoken)
+
+#### Returns
+
+[IAsyncEnumerable<String>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.iasyncenumerable-1)
+
+### **PrefillPromptAsync(String)**
+
+Asynchronously runs a prompt through the model to populate the KV cache without generating any new tokens.
+ This can reduce the latency of the first response when the user's first input is not immediately available.
+
+```csharp
+public Task PrefillPromptAsync(string prompt)
+```
+
+#### Parameters
+
+`prompt` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+Prompt to process
+
+#### Returns
+
+[Task](https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.task)
+
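+## Example
+
+A minimal sketch of saving and restoring executor state. It assumes `InteractiveExecutor` as a concrete `StatefulExecutorBase` subclass and uses a placeholder model path; adjust both to your setup.
+
+```csharp
+using LLama;
+using LLama.Common;
+
+// Placeholder path: replace with your own GGUF model file.
+var parameters = new ModelParams("model.gguf");
+using var weights = LLamaWeights.LoadFromFile(parameters);
+using var context = weights.CreateContext(parameters);
+
+StatefulExecutorBase executor = new InteractiveExecutor(context);
+
+// Persist the executor state so the session can be resumed later.
+await executor.SaveState("session.bin");
+await executor.LoadState("session.bin");
+```
+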
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.statelessexecutor.md b/docs/xmldocs/llama.statelessexecutor.md
new file mode 100644
index 000000000..60c2e6e8b
--- /dev/null
+++ b/docs/xmldocs/llama.statelessexecutor.md
@@ -0,0 +1,128 @@
+[`< Back`](./)
+
+---
+
+# StatelessExecutor
+
+Namespace: LLama
+
+This executor processes each input as a one-time job. Previous inputs do not affect the
+ response to the current input.
+
+```csharp
+public class StatelessExecutor : LLama.Abstractions.ILLamaExecutor
+```
+
+Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [StatelessExecutor](./llama.statelessexecutor.md)
+Implements [ILLamaExecutor](./llama.abstractions.illamaexecutor.md)
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute), [NullableAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullableattribute)
+
+## Properties
+
+### **IsMultiModal**
+
+```csharp
+public bool IsMultiModal { get; }
+```
+
+#### Property Value
+
+[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+
+### **ClipModel**
+
+```csharp
+public LLavaWeights ClipModel { get; }
+```
+
+#### Property Value
+
+[LLavaWeights](./llama.llavaweights.md)
+
+### **Images**
+
+```csharp
+public List Images { get; }
+```
+
+#### Property Value
+
+[List<Byte[]>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.list-1)
+
+### **Context**
+
+The context used by the executor when running the inference.
+
+```csharp
+public LLamaContext Context { get; private set; }
+```
+
+#### Property Value
+
+[LLamaContext](./llama.llamacontext.md)
+
+### **ApplyTemplate**
+
+If true, applies the default template to the prompt, as defined by the rules of llama_chat_apply_template.
+
+```csharp
+public bool ApplyTemplate { get; set; }
+```
+
+#### Property Value
+
+[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+
+### **SystemMessage**
+
+The system message to use with the prompt. Only used when [StatelessExecutor.ApplyTemplate](./llama.statelessexecutor.md#applytemplate) is true.
+
+```csharp
+public string SystemMessage { get; set; }
+```
+
+#### Property Value
+
+[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+## Constructors
+
+### **StatelessExecutor(LLamaWeights, IContextParams, ILogger)**
+
+Create a new stateless executor which will use the given model
+
+```csharp
+public StatelessExecutor(LLamaWeights weights, IContextParams params, ILogger logger)
+```
+
+#### Parameters
+
+`weights` [LLamaWeights](./llama.llamaweights.md)
+
+`params` [IContextParams](./llama.abstractions.icontextparams.md)
+
+`logger` ILogger
+
+## Methods
+
+### **InferAsync(String, IInferenceParams, CancellationToken)**
+
+```csharp
+public IAsyncEnumerable InferAsync(string prompt, IInferenceParams inferenceParams, CancellationToken cancellationToken)
+```
+
+#### Parameters
+
+`prompt` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+`inferenceParams` [IInferenceParams](./llama.abstractions.iinferenceparams.md)
+
+`cancellationToken` [CancellationToken](https://docs.microsoft.com/en-us/dotnet/api/system.threading.cancellationtoken)
+
+#### Returns
+
+[IAsyncEnumerable<String>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.iasyncenumerable-1)
+
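+## Example
+
+A hedged sketch of one-shot inference with `StatelessExecutor`. The model path is a placeholder, and `InferenceParams` settings are illustrative only.
+
+```csharp
+using LLama;
+using LLama.Common;
+
+// Placeholder path: replace with your own GGUF model file.
+var parameters = new ModelParams("model.gguf");
+using var weights = LLamaWeights.LoadFromFile(parameters);
+
+var executor = new StatelessExecutor(weights, parameters);
+var inferenceParams = new InferenceParams { MaxTokens = 64 };
+
+// Each call is independent: earlier prompts do not influence this one.
+await foreach (var token in executor.InferAsync("What is an apple?", inferenceParams))
+    Console.Write(token);
+```
+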
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.streamingtokendecoder.md b/docs/xmldocs/llama.streamingtokendecoder.md
index e5fc9e73e..ccb730ee1 100644
--- a/docs/xmldocs/llama.streamingtokendecoder.md
+++ b/docs/xmldocs/llama.streamingtokendecoder.md
@@ -1,3 +1,7 @@
+[`< Back`](./)
+
+---
+
# StreamingTokenDecoder
Namespace: LLama
@@ -8,7 +12,8 @@ Decodes a stream of tokens into a stream of characters
public sealed class StreamingTokenDecoder
```
-Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [StreamingTokenDecoder](./llama.streamingtokendecoder.md)
+Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [StreamingTokenDecoder](./llama.streamingtokendecoder.md)
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute), [NullableAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullableattribute)
## Properties
@@ -24,6 +29,18 @@ public int AvailableCharacters { get; }
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+### **DecodeSpecialTokens**
+
+If true, special tokens will be converted to text. If false, they will be invisible.
+
+```csharp
+public bool DecodeSpecialTokens { get; set; }
+```
+
+#### Property Value
+
+[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+
## Constructors
### **StreamingTokenDecoder(Encoding, LLamaWeights)**
@@ -173,20 +190,6 @@ Set the decoder back to its initial state
public void Reset()
```
-### **<Add>g__TokenToBytes|9_0(Byte[]&, LLamaToken, SafeLlamaModelHandle)**
-
-```csharp
-internal static Span g__TokenToBytes|9_0(Byte[]& bytes, LLamaToken token, SafeLlamaModelHandle model)
-```
-
-#### Parameters
-
-`bytes` [Byte[]&](https://docs.microsoft.com/en-us/dotnet/api/system.byte&)
-
-`token` [LLamaToken](./llama.native.llamatoken.md)
-
-`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
-
-#### Returns
+---
-[Span<Byte>](https://docs.microsoft.com/en-us/dotnet/api/system.span-1)
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.transformers.prompttemplatetransformer.md b/docs/xmldocs/llama.transformers.prompttemplatetransformer.md
new file mode 100644
index 000000000..4dcfe13b5
--- /dev/null
+++ b/docs/xmldocs/llama.transformers.prompttemplatetransformer.md
@@ -0,0 +1,98 @@
+[`< Back`](./)
+
+---
+
+# PromptTemplateTransformer
+
+Namespace: LLama.Transformers
+
+A prompt formatter that will use llama.cpp's template formatter
+ If your model is not supported, you will need to define your own formatter according to the chat prompt specification for your model
+
+```csharp
+public class PromptTemplateTransformer : LLama.Abstractions.IHistoryTransform
+```
+
+Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [PromptTemplateTransformer](./llama.transformers.prompttemplatetransformer.md)
+Implements [IHistoryTransform](./llama.abstractions.ihistorytransform.md)
+Attributes [NullableContextAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullablecontextattribute), [NullableAttribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.nullableattribute)
+
+## Constructors
+
+### **PromptTemplateTransformer(LLamaWeights, Boolean)**
+
+A prompt formatter that will use llama.cpp's template formatter
+ If your model is not supported, you will need to define your own formatter according to the chat prompt specification for your model
+
+```csharp
+public PromptTemplateTransformer(LLamaWeights model, bool withAssistant)
+```
+
+#### Parameters
+
+`model` [LLamaWeights](./llama.llamaweights.md)
+
+`withAssistant` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+
+## Methods
+
+### **HistoryToText(ChatHistory)**
+
+```csharp
+public string HistoryToText(ChatHistory history)
+```
+
+#### Parameters
+
+`history` [ChatHistory](./llama.common.chathistory.md)
+
+#### Returns
+
+[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+### **TextToHistory(AuthorRole, String)**
+
+```csharp
+public ChatHistory TextToHistory(AuthorRole role, string text)
+```
+
+#### Parameters
+
+`role` [AuthorRole](./llama.common.authorrole.md)
+
+`text` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+
+#### Returns
+
+[ChatHistory](./llama.common.chathistory.md)
+
+### **Clone()**
+
+```csharp
+public IHistoryTransform Clone()
+```
+
+#### Returns
+
+[IHistoryTransform](./llama.abstractions.ihistorytransform.md)
+
+### **ToModelPrompt(LLamaTemplate)**
+
+Apply the template to the messages and return the resulting prompt as a string
+
+```csharp
+public static string ToModelPrompt(LLamaTemplate template)
+```
+
+#### Parameters
+
+`template` [LLamaTemplate](./llama.llamatemplate.md)
+
+#### Returns
+
+[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+The formatted template string as defined by the model
+
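+## Example
+
+A hedged sketch of turning a `ChatHistory` into a model-formatted prompt with the transformer. The model path is a placeholder and the messages are illustrative.
+
+```csharp
+using LLama;
+using LLama.Common;
+using LLama.Transformers;
+
+// Placeholder path: replace with your own GGUF model file.
+var parameters = new ModelParams("model.gguf");
+using var weights = LLamaWeights.LoadFromFile(parameters);
+
+var transformer = new PromptTemplateTransformer(weights, withAssistant: true);
+
+var history = new ChatHistory();
+history.AddMessage(AuthorRole.System, "You are a concise assistant.");
+history.AddMessage(AuthorRole.User, "Hello!");
+
+// Produces the prompt text using the model's built-in chat template.
+string prompt = transformer.HistoryToText(history);
+```
+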
+---
+
+[`< Back`](./)
diff --git a/docs/xmldocs/llama.types.chatcompletion.md b/docs/xmldocs/llama.types.chatcompletion.md
deleted file mode 100644
index 303be9a62..000000000
--- a/docs/xmldocs/llama.types.chatcompletion.md
+++ /dev/null
@@ -1,188 +0,0 @@
-# ChatCompletion
-
-Namespace: LLama.Types
-
-```csharp
-public class ChatCompletion : System.IEquatable`1[[LLama.Types.ChatCompletion, LLamaSharp, Version=0.2.0.0, Culture=neutral, PublicKeyToken=null]]
-```
-
-Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ChatCompletion](./llama.types.chatcompletion.md)
-Implements [IEquatable<ChatCompletion>](https://docs.microsoft.com/en-us/dotnet/api/system.iequatable-1)
-
-## Properties
-
-### **Id**
-
-```csharp
-public string Id { get; set; }
-```
-
-#### Property Value
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **Object**
-
-```csharp
-public string Object { get; set; }
-```
-
-#### Property Value
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **Created**
-
-```csharp
-public int Created { get; set; }
-```
-
-#### Property Value
-
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-### **Model**
-
-```csharp
-public string Model { get; set; }
-```
-
-#### Property Value
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **Choices**
-
-```csharp
-public ChatCompletionChoice[] Choices { get; set; }
-```
-
-#### Property Value
-
-[ChatCompletionChoice[]](./llama.types.chatcompletionchoice.md)
-
-### **Usage**
-
-```csharp
-public CompletionUsage Usage { get; set; }
-```
-
-#### Property Value
-
-[CompletionUsage](./llama.types.completionusage.md)
-
-## Constructors
-
-### **ChatCompletion(String, String, Int32, String, ChatCompletionChoice[], CompletionUsage)**
-
-```csharp
-public ChatCompletion(string Id, string Object, int Created, string Model, ChatCompletionChoice[] Choices, CompletionUsage Usage)
-```
-
-#### Parameters
-
-`Id` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-`Object` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-`Created` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-`Model` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-`Choices` [ChatCompletionChoice[]](./llama.types.chatcompletionchoice.md)
-
-`Usage` [CompletionUsage](./llama.types.completionusage.md)
-
-## Methods
-
-### **ToString()**
-
-```csharp
-public string ToString()
-```
-
-#### Returns
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **PrintMembers(StringBuilder)**
-
-```csharp
-protected bool PrintMembers(StringBuilder builder)
-```
-
-#### Parameters
-
-`builder` [StringBuilder](https://docs.microsoft.com/en-us/dotnet/api/system.text.stringbuilder)
-
-#### Returns
-
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-### **GetHashCode()**
-
-```csharp
-public int GetHashCode()
-```
-
-#### Returns
-
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-### **Equals(Object)**
-
-```csharp
-public bool Equals(object obj)
-```
-
-#### Parameters
-
-`obj` [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object)
-
-#### Returns
-
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-### **Equals(ChatCompletion)**
-
-```csharp
-public bool Equals(ChatCompletion other)
-```
-
-#### Parameters
-
-`other` [ChatCompletion](./llama.types.chatcompletion.md)
-
-#### Returns
-
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-### **<Clone>$()**
-
-```csharp
-public ChatCompletion $()
-```
-
-#### Returns
-
-[ChatCompletion](./llama.types.chatcompletion.md)
-
-### **Deconstruct(String&, String&, Int32&, String&, ChatCompletionChoice[]&, CompletionUsage&)**
-
-```csharp
-public void Deconstruct(String& Id, String& Object, Int32& Created, String& Model, ChatCompletionChoice[]& Choices, CompletionUsage& Usage)
-```
-
-#### Parameters
-
-`Id` [String&](https://docs.microsoft.com/en-us/dotnet/api/system.string&)
-
-`Object` [String&](https://docs.microsoft.com/en-us/dotnet/api/system.string&)
-
-`Created` [Int32&](https://docs.microsoft.com/en-us/dotnet/api/system.int32&)
-
-`Model` [String&](https://docs.microsoft.com/en-us/dotnet/api/system.string&)
-
-`Choices` [ChatCompletionChoice[]&](./llama.types.chatcompletionchoice&.md)
-
-`Usage` [CompletionUsage&](./llama.types.completionusage&.md)
diff --git a/docs/xmldocs/llama.types.chatcompletionchoice.md b/docs/xmldocs/llama.types.chatcompletionchoice.md
deleted file mode 100644
index 85a009e1f..000000000
--- a/docs/xmldocs/llama.types.chatcompletionchoice.md
+++ /dev/null
@@ -1,146 +0,0 @@
-# ChatCompletionChoice
-
-Namespace: LLama.Types
-
-```csharp
-public class ChatCompletionChoice : System.IEquatable`1[[LLama.Types.ChatCompletionChoice, LLamaSharp, Version=0.2.0.0, Culture=neutral, PublicKeyToken=null]]
-```
-
-Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ChatCompletionChoice](./llama.types.chatcompletionchoice.md)
-Implements [IEquatable<ChatCompletionChoice>](https://docs.microsoft.com/en-us/dotnet/api/system.iequatable-1)
-
-## Properties
-
-### **Index**
-
-```csharp
-public int Index { get; set; }
-```
-
-#### Property Value
-
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-### **Message**
-
-```csharp
-public ChatCompletionMessage Message { get; set; }
-```
-
-#### Property Value
-
-[ChatCompletionMessage](./llama.types.chatcompletionmessage.md)
-
-### **FinishReason**
-
-```csharp
-public string FinishReason { get; set; }
-```
-
-#### Property Value
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-## Constructors
-
-### **ChatCompletionChoice(Int32, ChatCompletionMessage, String)**
-
-```csharp
-public ChatCompletionChoice(int Index, ChatCompletionMessage Message, string FinishReason)
-```
-
-#### Parameters
-
-`Index` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-`Message` [ChatCompletionMessage](./llama.types.chatcompletionmessage.md)
-
-`FinishReason` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-## Methods
-
-### **ToString()**
-
-```csharp
-public string ToString()
-```
-
-#### Returns
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **PrintMembers(StringBuilder)**
-
-```csharp
-protected bool PrintMembers(StringBuilder builder)
-```
-
-#### Parameters
-
-`builder` [StringBuilder](https://docs.microsoft.com/en-us/dotnet/api/system.text.stringbuilder)
-
-#### Returns
-
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-### **GetHashCode()**
-
-```csharp
-public int GetHashCode()
-```
-
-#### Returns
-
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-### **Equals(Object)**
-
-```csharp
-public bool Equals(object obj)
-```
-
-#### Parameters
-
-`obj` [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object)
-
-#### Returns
-
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-### **Equals(ChatCompletionChoice)**
-
-```csharp
-public bool Equals(ChatCompletionChoice other)
-```
-
-#### Parameters
-
-`other` [ChatCompletionChoice](./llama.types.chatcompletionchoice.md)
-
-#### Returns
-
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-### **<Clone>$()**
-
-```csharp
-public ChatCompletionChoice $()
-```
-
-#### Returns
-
-[ChatCompletionChoice](./llama.types.chatcompletionchoice.md)
-
-### **Deconstruct(Int32&, ChatCompletionMessage&, String&)**
-
-```csharp
-public void Deconstruct(Int32& Index, ChatCompletionMessage& Message, String& FinishReason)
-```
-
-#### Parameters
-
-`Index` [Int32&](https://docs.microsoft.com/en-us/dotnet/api/system.int32&)
-
-`Message` [ChatCompletionMessage&](./llama.types.chatcompletionmessage&.md)
-
-`FinishReason` [String&](https://docs.microsoft.com/en-us/dotnet/api/system.string&)
diff --git a/docs/xmldocs/llama.types.chatcompletionchunk.md b/docs/xmldocs/llama.types.chatcompletionchunk.md
deleted file mode 100644
index ae1747bff..000000000
--- a/docs/xmldocs/llama.types.chatcompletionchunk.md
+++ /dev/null
@@ -1,174 +0,0 @@
-# ChatCompletionChunk
-
-Namespace: LLama.Types
-
-```csharp
-public class ChatCompletionChunk : System.IEquatable`1[[LLama.Types.ChatCompletionChunk, LLamaSharp, Version=0.2.0.0, Culture=neutral, PublicKeyToken=null]]
-```
-
-Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ChatCompletionChunk](./llama.types.chatcompletionchunk.md)
-Implements [IEquatable<ChatCompletionChunk>](https://docs.microsoft.com/en-us/dotnet/api/system.iequatable-1)
-
-## Properties
-
-### **Id**
-
-```csharp
-public string Id { get; set; }
-```
-
-#### Property Value
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **Model**
-
-```csharp
-public string Model { get; set; }
-```
-
-#### Property Value
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **Object**
-
-```csharp
-public string Object { get; set; }
-```
-
-#### Property Value
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **Created**
-
-```csharp
-public int Created { get; set; }
-```
-
-#### Property Value
-
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-### **Choices**
-
-```csharp
-public ChatCompletionChunkChoice[] Choices { get; set; }
-```
-
-#### Property Value
-
-[ChatCompletionChunkChoice[]](./llama.types.chatcompletionchunkchoice.md)
-
-## Constructors
-
-### **ChatCompletionChunk(String, String, String, Int32, ChatCompletionChunkChoice[])**
-
-```csharp
-public ChatCompletionChunk(string Id, string Model, string Object, int Created, ChatCompletionChunkChoice[] Choices)
-```
-
-#### Parameters
-
-`Id` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-`Model` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-`Object` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-`Created` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-`Choices` [ChatCompletionChunkChoice[]](./llama.types.chatcompletionchunkchoice.md)
-
-## Methods
-
-### **ToString()**
-
-```csharp
-public string ToString()
-```
-
-#### Returns
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **PrintMembers(StringBuilder)**
-
-```csharp
-protected bool PrintMembers(StringBuilder builder)
-```
-
-#### Parameters
-
-`builder` [StringBuilder](https://docs.microsoft.com/en-us/dotnet/api/system.text.stringbuilder)
-
-#### Returns
-
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-### **GetHashCode()**
-
-```csharp
-public int GetHashCode()
-```
-
-#### Returns
-
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-### **Equals(Object)**
-
-```csharp
-public bool Equals(object obj)
-```
-
-#### Parameters
-
-`obj` [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object)
-
-#### Returns
-
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-### **Equals(ChatCompletionChunk)**
-
-```csharp
-public bool Equals(ChatCompletionChunk other)
-```
-
-#### Parameters
-
-`other` [ChatCompletionChunk](./llama.types.chatcompletionchunk.md)
-
-#### Returns
-
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-### **<Clone>$()**
-
-```csharp
-public ChatCompletionChunk $()
-```
-
-#### Returns
-
-[ChatCompletionChunk](./llama.types.chatcompletionchunk.md)
-
-### **Deconstruct(String&, String&, String&, Int32&, ChatCompletionChunkChoice[]&)**
-
-```csharp
-public void Deconstruct(String& Id, String& Model, String& Object, Int32& Created, ChatCompletionChunkChoice[]& Choices)
-```
-
-#### Parameters
-
-`Id` [String&](https://docs.microsoft.com/en-us/dotnet/api/system.string&)
-
-`Model` [String&](https://docs.microsoft.com/en-us/dotnet/api/system.string&)
-
-`Object` [String&](https://docs.microsoft.com/en-us/dotnet/api/system.string&)
-
-`Created` [Int32&](https://docs.microsoft.com/en-us/dotnet/api/system.int32&)
-
-`Choices` [ChatCompletionChunkChoice[]&](./llama.types.chatcompletionchunkchoice&.md)
diff --git a/docs/xmldocs/llama.types.chatcompletionchunkchoice.md b/docs/xmldocs/llama.types.chatcompletionchunkchoice.md
deleted file mode 100644
index 8d15fbd5c..000000000
--- a/docs/xmldocs/llama.types.chatcompletionchunkchoice.md
+++ /dev/null
@@ -1,146 +0,0 @@
-# ChatCompletionChunkChoice
-
-Namespace: LLama.Types
-
-```csharp
-public class ChatCompletionChunkChoice : System.IEquatable`1[[LLama.Types.ChatCompletionChunkChoice, LLamaSharp, Version=0.2.0.0, Culture=neutral, PublicKeyToken=null]]
-```
-
-Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ChatCompletionChunkChoice](./llama.types.chatcompletionchunkchoice.md)
-Implements [IEquatable<ChatCompletionChunkChoice>](https://docs.microsoft.com/en-us/dotnet/api/system.iequatable-1)
-
-## Properties
-
-### **Index**
-
-```csharp
-public int Index { get; set; }
-```
-
-#### Property Value
-
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-### **Delta**
-
-```csharp
-public ChatCompletionChunkDelta Delta { get; set; }
-```
-
-#### Property Value
-
-[ChatCompletionChunkDelta](./llama.types.chatcompletionchunkdelta.md)
-
-### **FinishReason**
-
-```csharp
-public string FinishReason { get; set; }
-```
-
-#### Property Value
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-## Constructors
-
-### **ChatCompletionChunkChoice(Int32, ChatCompletionChunkDelta, String)**
-
-```csharp
-public ChatCompletionChunkChoice(int Index, ChatCompletionChunkDelta Delta, string FinishReason)
-```
-
-#### Parameters
-
-`Index` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-`Delta` [ChatCompletionChunkDelta](./llama.types.chatcompletionchunkdelta.md)
-
-`FinishReason` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-## Methods
-
-### **ToString()**
-
-```csharp
-public string ToString()
-```
-
-#### Returns
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **PrintMembers(StringBuilder)**
-
-```csharp
-protected bool PrintMembers(StringBuilder builder)
-```
-
-#### Parameters
-
-`builder` [StringBuilder](https://docs.microsoft.com/en-us/dotnet/api/system.text.stringbuilder)
-
-#### Returns
-
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-### **GetHashCode()**
-
-```csharp
-public int GetHashCode()
-```
-
-#### Returns
-
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-### **Equals(Object)**
-
-```csharp
-public bool Equals(object obj)
-```
-
-#### Parameters
-
-`obj` [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object)
-
-#### Returns
-
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-### **Equals(ChatCompletionChunkChoice)**
-
-```csharp
-public bool Equals(ChatCompletionChunkChoice other)
-```
-
-#### Parameters
-
-`other` [ChatCompletionChunkChoice](./llama.types.chatcompletionchunkchoice.md)
-
-#### Returns
-
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-### **<Clone>$()**
-
-```csharp
-public ChatCompletionChunkChoice $()
-```
-
-#### Returns
-
-[ChatCompletionChunkChoice](./llama.types.chatcompletionchunkchoice.md)
-
-### **Deconstruct(Int32&, ChatCompletionChunkDelta&, String&)**
-
-```csharp
-public void Deconstruct(Int32& Index, ChatCompletionChunkDelta& Delta, String& FinishReason)
-```
-
-#### Parameters
-
-`Index` [Int32&](https://docs.microsoft.com/en-us/dotnet/api/system.int32&)
-
-`Delta` [ChatCompletionChunkDelta&](./llama.types.chatcompletionchunkdelta&.md)
-
-`FinishReason` [String&](https://docs.microsoft.com/en-us/dotnet/api/system.string&)
diff --git a/docs/xmldocs/llama.types.chatcompletionchunkdelta.md b/docs/xmldocs/llama.types.chatcompletionchunkdelta.md
deleted file mode 100644
index 12244ff52..000000000
--- a/docs/xmldocs/llama.types.chatcompletionchunkdelta.md
+++ /dev/null
@@ -1,132 +0,0 @@
-# ChatCompletionChunkDelta
-
-Namespace: LLama.Types
-
-```csharp
-public class ChatCompletionChunkDelta : System.IEquatable`1[[LLama.Types.ChatCompletionChunkDelta, LLamaSharp, Version=0.2.0.0, Culture=neutral, PublicKeyToken=null]]
-```
-
-Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ChatCompletionChunkDelta](./llama.types.chatcompletionchunkdelta.md)
-Implements [IEquatable<ChatCompletionChunkDelta>](https://docs.microsoft.com/en-us/dotnet/api/system.iequatable-1)
-
-## Properties
-
-### **Role**
-
-```csharp
-public string Role { get; set; }
-```
-
-#### Property Value
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **Content**
-
-```csharp
-public string Content { get; set; }
-```
-
-#### Property Value
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-## Constructors
-
-### **ChatCompletionChunkDelta(String, String)**
-
-```csharp
-public ChatCompletionChunkDelta(string Role, string Content)
-```
-
-#### Parameters
-
-`Role` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-`Content` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-## Methods
-
-### **ToString()**
-
-```csharp
-public string ToString()
-```
-
-#### Returns
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **PrintMembers(StringBuilder)**
-
-```csharp
-protected bool PrintMembers(StringBuilder builder)
-```
-
-#### Parameters
-
-`builder` [StringBuilder](https://docs.microsoft.com/en-us/dotnet/api/system.text.stringbuilder)
-
-#### Returns
-
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-### **GetHashCode()**
-
-```csharp
-public int GetHashCode()
-```
-
-#### Returns
-
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-### **Equals(Object)**
-
-```csharp
-public bool Equals(object obj)
-```
-
-#### Parameters
-
-`obj` [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object)
-
-#### Returns
-
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-### **Equals(ChatCompletionChunkDelta)**
-
-```csharp
-public bool Equals(ChatCompletionChunkDelta other)
-```
-
-#### Parameters
-
-`other` [ChatCompletionChunkDelta](./llama.types.chatcompletionchunkdelta.md)
-
-#### Returns
-
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-### **<Clone>$()**
-
-```csharp
-public ChatCompletionChunkDelta $()
-```
-
-#### Returns
-
-[ChatCompletionChunkDelta](./llama.types.chatcompletionchunkdelta.md)
-
-### **Deconstruct(String&, String&)**
-
-```csharp
-public void Deconstruct(String& Role, String& Content)
-```
-
-#### Parameters
-
-`Role` [String&](https://docs.microsoft.com/en-us/dotnet/api/system.string&)
-
-`Content` [String&](https://docs.microsoft.com/en-us/dotnet/api/system.string&)
diff --git a/docs/xmldocs/llama.types.chatcompletionmessage.md b/docs/xmldocs/llama.types.chatcompletionmessage.md
deleted file mode 100644
index 6d0eb1da4..000000000
--- a/docs/xmldocs/llama.types.chatcompletionmessage.md
+++ /dev/null
@@ -1,146 +0,0 @@
-# ChatCompletionMessage
-
-Namespace: LLama.Types
-
-```csharp
-public class ChatCompletionMessage : System.IEquatable`1[[LLama.Types.ChatCompletionMessage, LLamaSharp, Version=0.2.0.0, Culture=neutral, PublicKeyToken=null]]
-```
-
-Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ChatCompletionMessage](./llama.types.chatcompletionmessage.md)
-Implements [IEquatable<ChatCompletionMessage>](https://docs.microsoft.com/en-us/dotnet/api/system.iequatable-1)
-
-## Properties
-
-### **Role**
-
-```csharp
-public ChatRole Role { get; set; }
-```
-
-#### Property Value
-
-[ChatRole](./llama.types.chatrole.md)
-
-### **Content**
-
-```csharp
-public string Content { get; set; }
-```
-
-#### Property Value
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **Name**
-
-```csharp
-public string Name { get; set; }
-```
-
-#### Property Value
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-## Constructors
-
-### **ChatCompletionMessage(ChatRole, String, String)**
-
-```csharp
-public ChatCompletionMessage(ChatRole Role, string Content, string Name)
-```
-
-#### Parameters
-
-`Role` [ChatRole](./llama.types.chatrole.md)
-
-`Content` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-`Name` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-## Methods
-
-### **ToString()**
-
-```csharp
-public string ToString()
-```
-
-#### Returns
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **PrintMembers(StringBuilder)**
-
-```csharp
-protected bool PrintMembers(StringBuilder builder)
-```
-
-#### Parameters
-
-`builder` [StringBuilder](https://docs.microsoft.com/en-us/dotnet/api/system.text.stringbuilder)
-
-#### Returns
-
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-### **GetHashCode()**
-
-```csharp
-public int GetHashCode()
-```
-
-#### Returns
-
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-### **Equals(Object)**
-
-```csharp
-public bool Equals(object obj)
-```
-
-#### Parameters
-
-`obj` [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object)
-
-#### Returns
-
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-### **Equals(ChatCompletionMessage)**
-
-```csharp
-public bool Equals(ChatCompletionMessage other)
-```
-
-#### Parameters
-
-`other` [ChatCompletionMessage](./llama.types.chatcompletionmessage.md)
-
-#### Returns
-
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-### **<Clone>$()**
-
-```csharp
-public ChatCompletionMessage $()
-```
-
-#### Returns
-
-[ChatCompletionMessage](./llama.types.chatcompletionmessage.md)
-
-### **Deconstruct(ChatRole&, String&, String&)**
-
-```csharp
-public void Deconstruct(ChatRole& Role, String& Content, String& Name)
-```
-
-#### Parameters
-
-`Role` [ChatRole&](./llama.types.chatrole&.md)
-
-`Content` [String&](https://docs.microsoft.com/en-us/dotnet/api/system.string&)
-
-`Name` [String&](https://docs.microsoft.com/en-us/dotnet/api/system.string&)
diff --git a/docs/xmldocs/llama.types.chatmessagerecord.md b/docs/xmldocs/llama.types.chatmessagerecord.md
deleted file mode 100644
index bc143fb99..000000000
--- a/docs/xmldocs/llama.types.chatmessagerecord.md
+++ /dev/null
@@ -1,132 +0,0 @@
-# ChatMessageRecord
-
-Namespace: LLama.Types
-
-```csharp
-public class ChatMessageRecord : System.IEquatable`1[[LLama.Types.ChatMessageRecord, LLamaSharp, Version=0.2.0.0, Culture=neutral, PublicKeyToken=null]]
-```
-
-Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ChatMessageRecord](./llama.types.chatmessagerecord.md)
-Implements [IEquatable<ChatMessageRecord>](https://docs.microsoft.com/en-us/dotnet/api/system.iequatable-1)
-
-## Properties
-
-### **Message**
-
-```csharp
-public ChatCompletionMessage Message { get; set; }
-```
-
-#### Property Value
-
-[ChatCompletionMessage](./llama.types.chatcompletionmessage.md)
-
-### **Time**
-
-```csharp
-public DateTime Time { get; set; }
-```
-
-#### Property Value
-
-[DateTime](https://docs.microsoft.com/en-us/dotnet/api/system.datetime)
-
-## Constructors
-
-### **ChatMessageRecord(ChatCompletionMessage, DateTime)**
-
-```csharp
-public ChatMessageRecord(ChatCompletionMessage Message, DateTime Time)
-```
-
-#### Parameters
-
-`Message` [ChatCompletionMessage](./llama.types.chatcompletionmessage.md)
-
-`Time` [DateTime](https://docs.microsoft.com/en-us/dotnet/api/system.datetime)
-
-## Methods
-
-### **ToString()**
-
-```csharp
-public string ToString()
-```
-
-#### Returns
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **PrintMembers(StringBuilder)**
-
-```csharp
-protected bool PrintMembers(StringBuilder builder)
-```
-
-#### Parameters
-
-`builder` [StringBuilder](https://docs.microsoft.com/en-us/dotnet/api/system.text.stringbuilder)
-
-#### Returns
-
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-### **GetHashCode()**
-
-```csharp
-public int GetHashCode()
-```
-
-#### Returns
-
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-### **Equals(Object)**
-
-```csharp
-public bool Equals(object obj)
-```
-
-#### Parameters
-
-`obj` [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object)
-
-#### Returns
-
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-### **Equals(ChatMessageRecord)**
-
-```csharp
-public bool Equals(ChatMessageRecord other)
-```
-
-#### Parameters
-
-`other` [ChatMessageRecord](./llama.types.chatmessagerecord.md)
-
-#### Returns
-
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-### **<Clone>$()**
-
-```csharp
-public ChatMessageRecord $()
-```
-
-#### Returns
-
-[ChatMessageRecord](./llama.types.chatmessagerecord.md)
-
-### **Deconstruct(ChatCompletionMessage&, DateTime&)**
-
-```csharp
-public void Deconstruct(ChatCompletionMessage& Message, DateTime& Time)
-```
-
-#### Parameters
-
-`Message` [ChatCompletionMessage&](./llama.types.chatcompletionmessage&.md)
-
-`Time` [DateTime&](https://docs.microsoft.com/en-us/dotnet/api/system.datetime&)
diff --git a/docs/xmldocs/llama.types.chatrole.md b/docs/xmldocs/llama.types.chatrole.md
deleted file mode 100644
index d8f88c120..000000000
--- a/docs/xmldocs/llama.types.chatrole.md
+++ /dev/null
@@ -1,15 +0,0 @@
-# ChatRole
-
-Namespace: LLama.Types
-
-```csharp
-public enum ChatRole
-```
-
-Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ValueType](https://docs.microsoft.com/en-us/dotnet/api/system.valuetype) → [Enum](https://docs.microsoft.com/en-us/dotnet/api/system.enum) → [ChatRole](./llama.types.chatrole.md)
-Implements [IComparable](https://docs.microsoft.com/en-us/dotnet/api/system.icomparable), [IFormattable](https://docs.microsoft.com/en-us/dotnet/api/system.iformattable), [IConvertible](https://docs.microsoft.com/en-us/dotnet/api/system.iconvertible)
-
-## Fields
-
-| Name | Value | Description |
-| --- | --: | --- |
diff --git a/docs/xmldocs/llama.types.completion.md b/docs/xmldocs/llama.types.completion.md
deleted file mode 100644
index 78d43329b..000000000
--- a/docs/xmldocs/llama.types.completion.md
+++ /dev/null
@@ -1,188 +0,0 @@
-# Completion
-
-Namespace: LLama.Types
-
-```csharp
-public class Completion : System.IEquatable`1[[LLama.Types.Completion, LLamaSharp, Version=0.2.0.0, Culture=neutral, PublicKeyToken=null]]
-```
-
-Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [Completion](./llama.types.completion.md)
-Implements [IEquatable<Completion>](https://docs.microsoft.com/en-us/dotnet/api/system.iequatable-1)
-
-## Properties
-
-### **Id**
-
-```csharp
-public string Id { get; set; }
-```
-
-#### Property Value
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **Object**
-
-```csharp
-public string Object { get; set; }
-```
-
-#### Property Value
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **Created**
-
-```csharp
-public int Created { get; set; }
-```
-
-#### Property Value
-
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-### **Model**
-
-```csharp
-public string Model { get; set; }
-```
-
-#### Property Value
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **Choices**
-
-```csharp
-public CompletionChoice[] Choices { get; set; }
-```
-
-#### Property Value
-
-[CompletionChoice[]](./llama.types.completionchoice.md)
-
-### **Usage**
-
-```csharp
-public CompletionUsage Usage { get; set; }
-```
-
-#### Property Value
-
-[CompletionUsage](./llama.types.completionusage.md)
-
-## Constructors
-
-### **Completion(String, String, Int32, String, CompletionChoice[], CompletionUsage)**
-
-```csharp
-public Completion(string Id, string Object, int Created, string Model, CompletionChoice[] Choices, CompletionUsage Usage)
-```
-
-#### Parameters
-
-`Id` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-`Object` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-`Created` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-`Model` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-`Choices` [CompletionChoice[]](./llama.types.completionchoice.md)
-
-`Usage` [CompletionUsage](./llama.types.completionusage.md)
-
-## Methods
-
-### **ToString()**
-
-```csharp
-public string ToString()
-```
-
-#### Returns
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **PrintMembers(StringBuilder)**
-
-```csharp
-protected bool PrintMembers(StringBuilder builder)
-```
-
-#### Parameters
-
-`builder` [StringBuilder](https://docs.microsoft.com/en-us/dotnet/api/system.text.stringbuilder)
-
-#### Returns
-
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-### **GetHashCode()**
-
-```csharp
-public int GetHashCode()
-```
-
-#### Returns
-
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-### **Equals(Object)**
-
-```csharp
-public bool Equals(object obj)
-```
-
-#### Parameters
-
-`obj` [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object)
-
-#### Returns
-
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-### **Equals(Completion)**
-
-```csharp
-public bool Equals(Completion other)
-```
-
-#### Parameters
-
-`other` [Completion](./llama.types.completion.md)
-
-#### Returns
-
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-### **<Clone>$()**
-
-```csharp
-public Completion $()
-```
-
-#### Returns
-
-[Completion](./llama.types.completion.md)
-
-### **Deconstruct(String&, String&, Int32&, String&, CompletionChoice[]&, CompletionUsage&)**
-
-```csharp
-public void Deconstruct(String& Id, String& Object, Int32& Created, String& Model, CompletionChoice[]& Choices, CompletionUsage& Usage)
-```
-
-#### Parameters
-
-`Id` [String&](https://docs.microsoft.com/en-us/dotnet/api/system.string&)
-
-`Object` [String&](https://docs.microsoft.com/en-us/dotnet/api/system.string&)
-
-`Created` [Int32&](https://docs.microsoft.com/en-us/dotnet/api/system.int32&)
-
-`Model` [String&](https://docs.microsoft.com/en-us/dotnet/api/system.string&)
-
-`Choices` [CompletionChoice[]&](./llama.types.completionchoice&.md)
-
-`Usage` [CompletionUsage&](./llama.types.completionusage&.md)
diff --git a/docs/xmldocs/llama.types.completionchoice.md b/docs/xmldocs/llama.types.completionchoice.md
deleted file mode 100644
index c9fd69d74..000000000
--- a/docs/xmldocs/llama.types.completionchoice.md
+++ /dev/null
@@ -1,160 +0,0 @@
-# CompletionChoice
-
-Namespace: LLama.Types
-
-```csharp
-public class CompletionChoice : System.IEquatable`1[[LLama.Types.CompletionChoice, LLamaSharp, Version=0.2.0.0, Culture=neutral, PublicKeyToken=null]]
-```
-
-Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [CompletionChoice](./llama.types.completionchoice.md)
-Implements [IEquatable<CompletionChoice>](https://docs.microsoft.com/en-us/dotnet/api/system.iequatable-1)
-
-## Properties
-
-### **Text**
-
-```csharp
-public string Text { get; set; }
-```
-
-#### Property Value
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **Index**
-
-```csharp
-public int Index { get; set; }
-```
-
-#### Property Value
-
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-### **Logprobs**
-
-```csharp
-public CompletionLogprobs Logprobs { get; set; }
-```
-
-#### Property Value
-
-[CompletionLogprobs](./llama.types.completionlogprobs.md)
-
-### **FinishReason**
-
-```csharp
-public string FinishReason { get; set; }
-```
-
-#### Property Value
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-## Constructors
-
-### **CompletionChoice(String, Int32, CompletionLogprobs, String)**
-
-```csharp
-public CompletionChoice(string Text, int Index, CompletionLogprobs Logprobs, string FinishReason)
-```
-
-#### Parameters
-
-`Text` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-`Index` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-`Logprobs` [CompletionLogprobs](./llama.types.completionlogprobs.md)
-
-`FinishReason` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-## Methods
-
-### **ToString()**
-
-```csharp
-public string ToString()
-```
-
-#### Returns
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **PrintMembers(StringBuilder)**
-
-```csharp
-protected bool PrintMembers(StringBuilder builder)
-```
-
-#### Parameters
-
-`builder` [StringBuilder](https://docs.microsoft.com/en-us/dotnet/api/system.text.stringbuilder)
-
-#### Returns
-
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-### **GetHashCode()**
-
-```csharp
-public int GetHashCode()
-```
-
-#### Returns
-
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-### **Equals(Object)**
-
-```csharp
-public bool Equals(object obj)
-```
-
-#### Parameters
-
-`obj` [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object)
-
-#### Returns
-
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-### **Equals(CompletionChoice)**
-
-```csharp
-public bool Equals(CompletionChoice other)
-```
-
-#### Parameters
-
-`other` [CompletionChoice](./llama.types.completionchoice.md)
-
-#### Returns
-
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-### **<Clone>$()**
-
-```csharp
-public CompletionChoice $()
-```
-
-#### Returns
-
-[CompletionChoice](./llama.types.completionchoice.md)
-
-### **Deconstruct(String&, Int32&, CompletionLogprobs&, String&)**
-
-```csharp
-public void Deconstruct(String& Text, Int32& Index, CompletionLogprobs& Logprobs, String& FinishReason)
-```
-
-#### Parameters
-
-`Text` [String&](https://docs.microsoft.com/en-us/dotnet/api/system.string&)
-
-`Index` [Int32&](https://docs.microsoft.com/en-us/dotnet/api/system.int32&)
-
-`Logprobs` [CompletionLogprobs&](./llama.types.completionlogprobs&.md)
-
-`FinishReason` [String&](https://docs.microsoft.com/en-us/dotnet/api/system.string&)
diff --git a/docs/xmldocs/llama.types.completionchunk.md b/docs/xmldocs/llama.types.completionchunk.md
deleted file mode 100644
index 2fc9dd9fd..000000000
--- a/docs/xmldocs/llama.types.completionchunk.md
+++ /dev/null
@@ -1,174 +0,0 @@
-# CompletionChunk
-
-Namespace: LLama.Types
-
-```csharp
-public class CompletionChunk : System.IEquatable`1[[LLama.Types.CompletionChunk, LLamaSharp, Version=0.2.0.0, Culture=neutral, PublicKeyToken=null]]
-```
-
-Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [CompletionChunk](./llama.types.completionchunk.md)
-Implements [IEquatable<CompletionChunk>](https://docs.microsoft.com/en-us/dotnet/api/system.iequatable-1)
-
-## Properties
-
-### **Id**
-
-```csharp
-public string Id { get; set; }
-```
-
-#### Property Value
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **Object**
-
-```csharp
-public string Object { get; set; }
-```
-
-#### Property Value
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **Created**
-
-```csharp
-public int Created { get; set; }
-```
-
-#### Property Value
-
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-### **Model**
-
-```csharp
-public string Model { get; set; }
-```
-
-#### Property Value
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **Choices**
-
-```csharp
-public CompletionChoice[] Choices { get; set; }
-```
-
-#### Property Value
-
-[CompletionChoice[]](./llama.types.completionchoice.md)
-
-## Constructors
-
-### **CompletionChunk(String, String, Int32, String, CompletionChoice[])**
-
-```csharp
-public CompletionChunk(string Id, string Object, int Created, string Model, CompletionChoice[] Choices)
-```
-
-#### Parameters
-
-`Id` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-`Object` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-`Created` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-`Model` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-`Choices` [CompletionChoice[]](./llama.types.completionchoice.md)
-
-## Methods
-
-### **ToString()**
-
-```csharp
-public string ToString()
-```
-
-#### Returns
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **PrintMembers(StringBuilder)**
-
-```csharp
-protected bool PrintMembers(StringBuilder builder)
-```
-
-#### Parameters
-
-`builder` [StringBuilder](https://docs.microsoft.com/en-us/dotnet/api/system.text.stringbuilder)
-
-#### Returns
-
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-### **GetHashCode()**
-
-```csharp
-public int GetHashCode()
-```
-
-#### Returns
-
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-### **Equals(Object)**
-
-```csharp
-public bool Equals(object obj)
-```
-
-#### Parameters
-
-`obj` [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object)
-
-#### Returns
-
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-### **Equals(CompletionChunk)**
-
-```csharp
-public bool Equals(CompletionChunk other)
-```
-
-#### Parameters
-
-`other` [CompletionChunk](./llama.types.completionchunk.md)
-
-#### Returns
-
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-### **<Clone>$()**
-
-```csharp
-public CompletionChunk $()
-```
-
-#### Returns
-
-[CompletionChunk](./llama.types.completionchunk.md)
-
-### **Deconstruct(String&, String&, Int32&, String&, CompletionChoice[]&)**
-
-```csharp
-public void Deconstruct(String& Id, String& Object, Int32& Created, String& Model, CompletionChoice[]& Choices)
-```
-
-#### Parameters
-
-`Id` [String&](https://docs.microsoft.com/en-us/dotnet/api/system.string&)
-
-`Object` [String&](https://docs.microsoft.com/en-us/dotnet/api/system.string&)
-
-`Created` [Int32&](https://docs.microsoft.com/en-us/dotnet/api/system.int32&)
-
-`Model` [String&](https://docs.microsoft.com/en-us/dotnet/api/system.string&)
-
-`Choices` [CompletionChoice[]&](./llama.types.completionchoice&.md)
diff --git a/docs/xmldocs/llama.types.completionlogprobs.md b/docs/xmldocs/llama.types.completionlogprobs.md
deleted file mode 100644
index 234b30386..000000000
--- a/docs/xmldocs/llama.types.completionlogprobs.md
+++ /dev/null
@@ -1,160 +0,0 @@
-# CompletionLogprobs
-
-Namespace: LLama.Types
-
-```csharp
-public class CompletionLogprobs : System.IEquatable`1[[LLama.Types.CompletionLogprobs, LLamaSharp, Version=0.2.0.0, Culture=neutral, PublicKeyToken=null]]
-```
-
-Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [CompletionLogprobs](./llama.types.completionlogprobs.md)
-Implements [IEquatable<CompletionLogprobs>](https://docs.microsoft.com/en-us/dotnet/api/system.iequatable-1)
-
-## Properties
-
-### **TextOffset**
-
-```csharp
-public Int32[] TextOffset { get; set; }
-```
-
-#### Property Value
-
-[Int32[]](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-### **TokenLogProbs**
-
-```csharp
-public Single[] TokenLogProbs { get; set; }
-```
-
-#### Property Value
-
-[Single[]](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-### **Tokens**
-
-```csharp
-public String[] Tokens { get; set; }
-```
-
-#### Property Value
-
-[String[]](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **TopLogprobs**
-
-```csharp
-public Dictionary`2[] TopLogprobs { get; set; }
-```
-
-#### Property Value
-
-[Dictionary`2[]](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.dictionary-2)
-
-## Constructors
-
-### **CompletionLogprobs(Int32[], Single[], String[], Dictionary`2[])**
-
-```csharp
-public CompletionLogprobs(Int32[] TextOffset, Single[] TokenLogProbs, String[] Tokens, Dictionary`2[] TopLogprobs)
-```
-
-#### Parameters
-
-`TextOffset` [Int32[]](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-`TokenLogProbs` [Single[]](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-`Tokens` [String[]](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-`TopLogprobs` [Dictionary`2[]](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.dictionary-2)
-
-## Methods
-
-### **ToString()**
-
-```csharp
-public string ToString()
-```
-
-#### Returns
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **PrintMembers(StringBuilder)**
-
-```csharp
-protected bool PrintMembers(StringBuilder builder)
-```
-
-#### Parameters
-
-`builder` [StringBuilder](https://docs.microsoft.com/en-us/dotnet/api/system.text.stringbuilder)
-
-#### Returns
-
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-### **GetHashCode()**
-
-```csharp
-public int GetHashCode()
-```
-
-#### Returns
-
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-### **Equals(Object)**
-
-```csharp
-public bool Equals(object obj)
-```
-
-#### Parameters
-
-`obj` [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object)
-
-#### Returns
-
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-### **Equals(CompletionLogprobs)**
-
-```csharp
-public bool Equals(CompletionLogprobs other)
-```
-
-#### Parameters
-
-`other` [CompletionLogprobs](./llama.types.completionlogprobs.md)
-
-#### Returns
-
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-### **<Clone>$()**
-
-```csharp
-public CompletionLogprobs $()
-```
-
-#### Returns
-
-[CompletionLogprobs](./llama.types.completionlogprobs.md)
-
-### **Deconstruct(Int32[]&, Single[]&, String[]&, Dictionary`2[]&)**
-
-```csharp
-public void Deconstruct(Int32[]& TextOffset, Single[]& TokenLogProbs, String[]& Tokens, Dictionary`2[]& TopLogprobs)
-```
-
-#### Parameters
-
-`TextOffset` [Int32[]&](https://docs.microsoft.com/en-us/dotnet/api/system.int32&)
-
-`TokenLogProbs` [Single[]&](https://docs.microsoft.com/en-us/dotnet/api/system.single&)
-
-`Tokens` [String[]&](https://docs.microsoft.com/en-us/dotnet/api/system.string&)
-
-`TopLogprobs` [Dictionary`2[]&](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.dictionary-2&)
diff --git a/docs/xmldocs/llama.types.completionusage.md b/docs/xmldocs/llama.types.completionusage.md
deleted file mode 100644
index c45092349..000000000
--- a/docs/xmldocs/llama.types.completionusage.md
+++ /dev/null
@@ -1,146 +0,0 @@
-# CompletionUsage
-
-Namespace: LLama.Types
-
-```csharp
-public class CompletionUsage : System.IEquatable`1[[LLama.Types.CompletionUsage, LLamaSharp, Version=0.2.0.0, Culture=neutral, PublicKeyToken=null]]
-```
-
-Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [CompletionUsage](./llama.types.completionusage.md)
-Implements [IEquatable<CompletionUsage>](https://docs.microsoft.com/en-us/dotnet/api/system.iequatable-1)
-
-## Properties
-
-### **PromptTokens**
-
-```csharp
-public int PromptTokens { get; set; }
-```
-
-#### Property Value
-
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-### **CompletionTokens**
-
-```csharp
-public int CompletionTokens { get; set; }
-```
-
-#### Property Value
-
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-### **TotalTokens**
-
-```csharp
-public int TotalTokens { get; set; }
-```
-
-#### Property Value
-
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-## Constructors
-
-### **CompletionUsage(Int32, Int32, Int32)**
-
-```csharp
-public CompletionUsage(int PromptTokens, int CompletionTokens, int TotalTokens)
-```
-
-#### Parameters
-
-`PromptTokens` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-`CompletionTokens` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-`TotalTokens` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-## Methods
-
-### **ToString()**
-
-```csharp
-public string ToString()
-```
-
-#### Returns
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **PrintMembers(StringBuilder)**
-
-```csharp
-protected bool PrintMembers(StringBuilder builder)
-```
-
-#### Parameters
-
-`builder` [StringBuilder](https://docs.microsoft.com/en-us/dotnet/api/system.text.stringbuilder)
-
-#### Returns
-
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-### **GetHashCode()**
-
-```csharp
-public int GetHashCode()
-```
-
-#### Returns
-
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-### **Equals(Object)**
-
-```csharp
-public bool Equals(object obj)
-```
-
-#### Parameters
-
-`obj` [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object)
-
-#### Returns
-
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-### **Equals(CompletionUsage)**
-
-```csharp
-public bool Equals(CompletionUsage other)
-```
-
-#### Parameters
-
-`other` [CompletionUsage](./llama.types.completionusage.md)
-
-#### Returns
-
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-### **<Clone>$()**
-
-```csharp
-public CompletionUsage $()
-```
-
-#### Returns
-
-[CompletionUsage](./llama.types.completionusage.md)
-
-### **Deconstruct(Int32&, Int32&, Int32&)**
-
-```csharp
-public void Deconstruct(Int32& PromptTokens, Int32& CompletionTokens, Int32& TotalTokens)
-```
-
-#### Parameters
-
-`PromptTokens` [Int32&](https://docs.microsoft.com/en-us/dotnet/api/system.int32&)
-
-`CompletionTokens` [Int32&](https://docs.microsoft.com/en-us/dotnet/api/system.int32&)
-
-`TotalTokens` [Int32&](https://docs.microsoft.com/en-us/dotnet/api/system.int32&)
diff --git a/docs/xmldocs/llama.types.embedding.md b/docs/xmldocs/llama.types.embedding.md
deleted file mode 100644
index 4f37d7e2e..000000000
--- a/docs/xmldocs/llama.types.embedding.md
+++ /dev/null
@@ -1,160 +0,0 @@
-# Embedding
-
-Namespace: LLama.Types
-
-```csharp
-public class Embedding : System.IEquatable`1[[LLama.Types.Embedding, LLamaSharp, Version=0.2.0.0, Culture=neutral, PublicKeyToken=null]]
-```
-
-Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [Embedding](./llama.types.embedding.md)
-Implements [IEquatable<Embedding>](https://docs.microsoft.com/en-us/dotnet/api/system.iequatable-1)
-
-## Properties
-
-### **Object**
-
-```csharp
-public string Object { get; set; }
-```
-
-#### Property Value
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **Model**
-
-```csharp
-public string Model { get; set; }
-```
-
-#### Property Value
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **Data**
-
-```csharp
-public EmbeddingData[] Data { get; set; }
-```
-
-#### Property Value
-
-[EmbeddingData[]](./llama.types.embeddingdata.md)
-
-### **Usage**
-
-```csharp
-public EmbeddingUsage Usage { get; set; }
-```
-
-#### Property Value
-
-[EmbeddingUsage](./llama.types.embeddingusage.md)
-
-## Constructors
-
-### **Embedding(String, String, EmbeddingData[], EmbeddingUsage)**
-
-```csharp
-public Embedding(string Object, string Model, EmbeddingData[] Data, EmbeddingUsage Usage)
-```
-
-#### Parameters
-
-`Object` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-`Model` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-`Data` [EmbeddingData[]](./llama.types.embeddingdata.md)
-
-`Usage` [EmbeddingUsage](./llama.types.embeddingusage.md)
-
-## Methods
-
-### **ToString()**
-
-```csharp
-public string ToString()
-```
-
-#### Returns
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **PrintMembers(StringBuilder)**
-
-```csharp
-protected bool PrintMembers(StringBuilder builder)
-```
-
-#### Parameters
-
-`builder` [StringBuilder](https://docs.microsoft.com/en-us/dotnet/api/system.text.stringbuilder)
-
-#### Returns
-
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-### **GetHashCode()**
-
-```csharp
-public int GetHashCode()
-```
-
-#### Returns
-
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-### **Equals(Object)**
-
-```csharp
-public bool Equals(object obj)
-```
-
-#### Parameters
-
-`obj` [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object)
-
-#### Returns
-
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-### **Equals(Embedding)**
-
-```csharp
-public bool Equals(Embedding other)
-```
-
-#### Parameters
-
-`other` [Embedding](./llama.types.embedding.md)
-
-#### Returns
-
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-### **<Clone>$()**
-
-```csharp
-public Embedding $()
-```
-
-#### Returns
-
-[Embedding](./llama.types.embedding.md)
-
-### **Deconstruct(String&, String&, EmbeddingData[]&, EmbeddingUsage&)**
-
-```csharp
-public void Deconstruct(String& Object, String& Model, EmbeddingData[]& Data, EmbeddingUsage& Usage)
-```
-
-#### Parameters
-
-`Object` [String&](https://docs.microsoft.com/en-us/dotnet/api/system.string&)
-
-`Model` [String&](https://docs.microsoft.com/en-us/dotnet/api/system.string&)
-
-`Data` [EmbeddingData[]&](./llama.types.embeddingdata&.md)
-
-`Usage` [EmbeddingUsage&](./llama.types.embeddingusage&.md)
diff --git a/docs/xmldocs/llama.types.embeddingdata.md b/docs/xmldocs/llama.types.embeddingdata.md
deleted file mode 100644
index 2ee5f636b..000000000
--- a/docs/xmldocs/llama.types.embeddingdata.md
+++ /dev/null
@@ -1,146 +0,0 @@
-# EmbeddingData
-
-Namespace: LLama.Types
-
-```csharp
-public class EmbeddingData : System.IEquatable`1[[LLama.Types.EmbeddingData, LLamaSharp, Version=0.2.0.0, Culture=neutral, PublicKeyToken=null]]
-```
-
-Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [EmbeddingData](./llama.types.embeddingdata.md)
-Implements [IEquatable<EmbeddingData>](https://docs.microsoft.com/en-us/dotnet/api/system.iequatable-1)
-
-## Properties
-
-### **Index**
-
-```csharp
-public int Index { get; set; }
-```
-
-#### Property Value
-
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-### **Object**
-
-```csharp
-public string Object { get; set; }
-```
-
-#### Property Value
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **Embedding**
-
-```csharp
-public Single[] Embedding { get; set; }
-```
-
-#### Property Value
-
-[Single[]](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-## Constructors
-
-### **EmbeddingData(Int32, String, Single[])**
-
-```csharp
-public EmbeddingData(int Index, string Object, Single[] Embedding)
-```
-
-#### Parameters
-
-`Index` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-`Object` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-`Embedding` [Single[]](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-
-## Methods
-
-### **ToString()**
-
-```csharp
-public string ToString()
-```
-
-#### Returns
-
-[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-
-### **PrintMembers(StringBuilder)**
-
-```csharp
-protected bool PrintMembers(StringBuilder builder)
-```
-
-#### Parameters
-
-`builder` [StringBuilder](https://docs.microsoft.com/en-us/dotnet/api/system.text.stringbuilder)
-
-#### Returns
-
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-### **GetHashCode()**
-
-```csharp
-public int GetHashCode()
-```
-
-#### Returns
-
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-
-### **Equals(Object)**
-
-```csharp
-public bool Equals(object obj)
-```
-
-#### Parameters
-
-`obj` [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object)
-
-#### Returns
-
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-### **Equals(EmbeddingData)**
-
-```csharp
-public bool Equals(EmbeddingData other)
-```
-
-#### Parameters
-
-`other` [EmbeddingData](./llama.types.embeddingdata.md)
-
-#### Returns
-
-[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-
-### **<Clone>$()**
-
-```csharp
-public EmbeddingData $()
-```
-
-#### Returns
-
-[EmbeddingData](./llama.types.embeddingdata.md)