Pinned repositories

- hiyouga/LLaMA-Factory (Public): Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
- vllm-project/vllm (Public): A high-throughput and memory-efficient inference and serving engine for LLMs
- lm-sys/FastChat (Public): An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
- BerriAI/litellm (Public): Python SDK and proxy server (LLM gateway) to call 100+ LLM APIs in the OpenAI format, including Bedrock, Azure, OpenAI, Vertex AI, Cohere, Anthropic, SageMaker, Hugging Face, Replicate, and Groq.
- NVIDIA/TensorRT-LLM (Public): TensorRT-LLM provides an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations for efficient inference on NVIDIA GPUs.
- microsoft/LLMLingua (Public): [EMNLP'23, ACL'24] Compresses the prompt and KV cache to speed up LLM inference and sharpen the model's perception of key information, achieving up to 20x compression with minimal performance loss.