Just one of the things I'm learning. https://github.com/hchiam/learning
Could be used to tell the user ahead of time that there are too many tokens in their input. This repo's demo lets you check token counts for a few different LLMs.
For example, here's an OpenAI token counter that can be implemented in JS with js-tiktoken:
```js
import { encodingForModel } from "js-tiktoken";

// encodingForModel maps an OpenAI model name (e.g. "gpt-4") to its encoding;
// use getEncoding instead if you have an encoding name like "cl100k_base".
const tokenCount = encodingForModel(modelName).encode(text).length;
```
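Putting that to use for the "warn ahead of time" idea, here's a minimal sketch, where `checkTokenLimit` and the 8192-token default are just illustrative assumptions (look up the actual context window of your target model):

```js
import { encodingForModel } from "js-tiktoken";

// Hypothetical helper: count tokens and flag inputs over an assumed limit.
function checkTokenLimit(text, modelName = "gpt-4", maxTokens = 8192) {
  const tokenCount = encodingForModel(modelName).encode(text).length;
  return { tokenCount, overLimit: tokenCount > maxTokens };
}

const { tokenCount, overLimit } = checkTokenLimit("some very long prompt...");
if (overLimit) console.warn(`Input is ${tokenCount} tokens; over the limit.`);
```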
Or, for other models, maybe use @xenova/transformers:
- https://huggingface.co/docs/transformers.js/main/en/api/models#module_models.LlamaPreTrainedModel
- https://huggingface.co/docs/transformers.js/main/en/api/tokenizers#tokenizers
```js
import { AutoTokenizer } from "@xenova/transformers";

const tokenizer = await AutoTokenizer.from_pretrained(modelName);
// tokenizer(text) returns tensors; input_ids.size is the tensor's total
// element count, i.e. the number of tokens for a single input string:
const { input_ids } = await tokenizer(text);
const tokenCount = input_ids.size;
```
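To flag over-long input with this approach, one option is to compare against the tokenizer's `model_max_length`. This is a sketch under assumptions: `"Xenova/llama-tokenizer"` is just an example model id, and `model_max_length` isn't reliably populated for every tokenizer in transformers.js.

```js
import { AutoTokenizer } from "@xenova/transformers";

// Example model id from the Hugging Face hub; swap in your own:
const tokenizer = await AutoTokenizer.from_pretrained("Xenova/llama-tokenizer");
const { input_ids } = await tokenizer("Hello, world!");

// model_max_length can be a huge placeholder value for some tokenizers,
// so treat this as a sanity check rather than a guarantee:
if (input_ids.size > tokenizer.model_max_length) {
  console.warn("Input likely exceeds the model's context window.");
}
```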
Or maybe use llama-tokenizer-js for Meta's Llama:
- https://github.com/belladoreai/llama-tokenizer-js?tab=readme-ov-file#-llama-tokenizer-js-
- LICENSE
```js
import llamaTokenizer from "llama-tokenizer-js";

// encode() returns an array of token ids (it prepends a BOS token by default):
const tokenCount = llamaTokenizer.encode(text).length;
```
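For example, a quick check against Llama 2's 4096-token context window (other Llama variants have different limits, so treat the constant as an assumption to adjust):

```js
import llamaTokenizer from "llama-tokenizer-js";

const CONTEXT_WINDOW = 4096; // Llama 2; adjust for other Llama variants.

const text = "Paste the user's prompt here...";
const tokenCount = llamaTokenizer.encode(text).length;
console.log(
  tokenCount > CONTEXT_WINDOW
    ? `Too long: ${tokenCount} tokens (limit ${CONTEXT_WINDOW}).`
    : `OK: ${tokenCount} tokens.`
);
```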
To run this repo's demo locally, you need yarn and vite; then run `cd demo; yarn dev` and open http://localhost:5173/.
Or just go to this live demo: https://hchiam-llm-token-count.surge.sh/