Skip to content

Conversation

Lyrcaxis
Copy link
Contributor

There was an issue where executors wouldn't tokenize the special tokens at all, leaving them as text
(e.g.: <|im_start|> would be tokenized as [< + | + im + _ + start + | + >] instead of a single token.)

I also saw no way of doing it via inference parameters. Not sure if that was intended but this PR fixes it.

@martindevans
Copy link
Member

Can you rebase this onto master? The CI is failing due to a mistake I made earlier (merging a bad commit to master)

@martindevans martindevans merged commit f01c13e into SciSharp:master Apr 20, 2024
@Lyrcaxis Lyrcaxis deleted the fix-tokenization-issues branch April 21, 2024 00:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants