xlite-dev

Develop ML/AI toolkits and ML/AI/CUDA Learning resources.

Pinned

  1. LeetCUDA Public

    📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

    Cuda 5.9k 619

  2. lite.ai.toolkit Public

    🛠 A lite C++ AI toolkit: 100+ models with MNN, ORT and TRT, including Det, Seg, Stable-Diffusion, Face-Fusion, etc.🎉

    C++ 4.2k 754

  3. LLM-Infra Public

    📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉

    Python 4.3k 297

  4. DiT-Infra Public

    📚A curated list of Awesome Diffusion Inference Papers with Codes: Sampling, Cache, Quantization, Parallelism, etc.🎉

    Python 354 18

  5. torchlm Public

    💎An easy-to-use PyTorch library for face landmarks detection: training, evaluation, inference, and 100+ data augmentations.🎉

    Python 265 25

  6. ffpa-attn Public

    ⚡️FFPA: Extend FlashAttention-2 with Split-D, achieve ~O(1) SRAM complexity for large headdim, ~2x↑ vs SDPA EA.🎉

    Cuda 201 9

Repositories

Showing 10 of 33 repositories
  • DiT-Infra Public

    📚A curated list of Awesome Diffusion Inference Papers with Codes: Sampling, Cache, Quantization, Parallelism, etc.🎉

    Python 354 GPL-3.0 18 0 0 Updated Aug 5, 2025
  • LLM-Infra Public

    📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉

    Python 4,338 GPL-3.0 297 0 0 Updated Aug 5, 2025
  • ffpa-attn Public

    ⚡️FFPA: Extend FlashAttention-2 with Split-D, achieve ~O(1) SRAM complexity for large headdim, ~2x↑ vs SDPA EA.🎉

    Cuda 201 GPL-3.0 9 2 0 Updated Aug 5, 2025
  • diffusers Public Forked from huggingface/diffusers

    🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch and FLAX.

    Python 0 Apache-2.0 6,235 0 0 Updated Aug 5, 2025
  • cache-dit Public Forked from vipshop/cache-dit

    🤗A Training-free and Easy-to-use Cache Acceleration Toolbox for Diffusion Transformers🔥

    Python 4 4 0 0 Updated Aug 5, 2025
  • lite.ai.toolkit Public

    🛠 A lite C++ AI toolkit: 100+ models with MNN, ORT and TRT, including Det, Seg, Stable-Diffusion, Face-Fusion, etc.🎉

    C++ 4,198 GPL-3.0 754 0 0 Updated Aug 5, 2025
  • .github Public
    1 0 0 0 Updated Aug 5, 2025
  • SageAttention Public Forked from thu-ml/SageAttention

    Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.

    Cuda 0 Apache-2.0 180 0 0 Updated Aug 5, 2025
  • pytorch Public Forked from pytorch/pytorch

    Tensors and Dynamic neural networks in Python with strong GPU acceleration

    Python 0 25,484 0 0 Updated Aug 5, 2025
  • LeetCUDA Public

    📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

    Cuda 5,880 GPL-3.0 619 3 0 Updated Aug 1, 2025
