link

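Rust genai
- https://github.com/YoungHaKim7/rust-genai
- Original Code https://github.com/jeremychone/rust-genai
CUDA notes (work in progress) are collected here.
- https://github.com/YoungHaKim7/rust_CUDA_training

TPUs for machine learning (TPU deep dive)

NVIDIA CUDA libraries by use case (NumPy-like libraries)

Well-organized machine learning research notes: https://www.natolambert.com/writing/debugging-mbrl
(External link) A channel with good ML paper summaries: 임커밋
- CNN
  - (External link) Understanding CNNs in 20 minutes | 임커밋
  - (External link) 230725_Deep Learning 101: CNN, Convolutional Neural Network | 신박Ai
DeepSeek analysis materials
- (External link) 250220_DeepSeek Technical Analysis Workshop | A close look at the secrets behind DeepSeek's competitiveness (1/2) | TERA KAIST
- (External link) 250220_DeepSeek Technical Analysis Workshop | A close look at the secrets behind DeepSeek's competitiveness (2/2) | TERA KAIST
  - The five-part DeepSeek series posted on GeekNews is good ^^
    - The path to open-sourcing the DeepSeek inference engine
(External link) 250320_What Are AI Agents Really About? | ByteByteGo
(External link) 250219 What is MCP? Integrate AI Agents with Databases & APIs | IBM Technology
250419_Microsoft develops BitNet, an ultra-efficient AI model that runs on CPUs

A guidebook for people swimming in the world of RAG

Useful Ollama models (using them a lot these days, 241217)

Machine learning implemented in Rust
- CubeCL - Rust-based GPU kernels for CUDA, ROCm, and WGPU

Machine learning implemented in C++
- (241217) Finally posted; porting this to Rust would be amazing, lol. GN⁺: Building an LLM inference engine from scratch with C++ and CUDA
Machine learning implemented in C
- 250617 (External link) Machine learning in C was a mistake | Faisal's Devlog

The hot topic these days: focus on 1-bit models. A look at NVIDIA's floating-point formats

Understanding machine learning from a computer science perspective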

Compiling CUDA with clang

C++ examples for the Vulkan graphics API

https://github.com/Rust-GPU/VulkanShaderExamples

🐉 Making Rust a first-class language and ecosystem for GPU shaders 🚧|🔝|

https://github.com/Rust-GPU/rust-gpu

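Basic facts about GPUs|🔝|

GPUs compute far faster than they can access memory, so the memory hierarchy becomes the performance bottleneck.
Depending on its arithmetic intensity (AI), an operation is either memory-bound or compute-bound; the threshold for the A100 GPU is about 13 FLOPs/Byte.
Key performance optimization strategies include operator fusion (Fusio…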
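
The memory-bound vs compute-bound threshold can be worked out by hand as peak compute divided by peak memory bandwidth. The sketch below is only an illustration: the A100 figures (19.5 TFLOP/s FP32, 1555 GB/s for the 40 GB part) are assumed spec-sheet numbers, not taken from the linked article, and the function names are made up for this example.

```rust
/// Ridge point of a simple roofline model, in FLOPs per byte:
/// peak compute throughput divided by peak memory bandwidth.
fn ridge_point(peak_flops_per_s: f64, bandwidth_bytes_per_s: f64) -> f64 {
    peak_flops_per_s / bandwidth_bytes_per_s
}

/// Arithmetic intensity of a kernel: FLOPs performed per byte moved.
fn arithmetic_intensity(flops: f64, bytes_moved: f64) -> f64 {
    flops / bytes_moved
}

/// A kernel whose intensity is below the ridge point is limited by memory bandwidth.
fn is_memory_bound(intensity: f64, ridge: f64) -> bool {
    intensity < ridge
}

fn main() {
    // Assumed A100 (40 GB) figures: 19.5 TFLOP/s FP32, 1555 GB/s HBM bandwidth.
    let ridge = ridge_point(19.5e12, 1.555e12);
    println!("ridge point: {:.1} FLOPs/byte", ridge);

    // Element-wise f32 add: 1 FLOP per element, 12 bytes moved (2 loads + 1 store).
    let add = arithmetic_intensity(1.0, 12.0);
    println!("vector add: {:.3} FLOPs/byte, memory-bound: {}", add, is_memory_bound(add, ridge));

    // n x n f32 matmul with ideal data reuse: 2n^3 FLOPs over 3 * n^2 * 4 bytes.
    let n = 4096.0_f64;
    let matmul = arithmetic_intensity(2.0 * n * n * n, 3.0 * n * n * 4.0);
    println!("matmul: {:.1} FLOPs/byte, memory-bound: {}", matmul, is_memory_bound(matmul, ridge));
}
```

Under these assumptions the element-wise add sits far below the ridge point while the large matmul sits far above it, which is why fusing small memory-bound operations tends to pay off.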

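TPUs for machine learning (TPU deep dive)|🔝|

250623 TPU deep dive
The TPU is a custom chip developed by Google for large-scale AI training and inference, built on a different design philosophy from GPUs.
It emphasizes scalability and energy efficiency, and co-designs the hardware (e.g., system-on-chip composition, large on-chip memory) with the software (the XLA compiler).
Its core structure consists of the systolic array and …

CUDA (NVIDIA libraries by use case)|🔝|

Well summarized starting from the 55-minute mark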
- https://www.youtube.com/live/_waPvOwL9Z8?si=YGTqNQXqNVldUdwy


cuPYNUMERIC	NUMERICAL COMPUTING
cuLITHO	COMPUTATIONAL LITHOGRAPHY
AERIAL	5G/6G SIGNAL PROCESSING
cuOPT	DECISION OPTIMIZATION
PARABRICKS	GENE SEQUENCING
MONAI	MEDICAL IMAGING (docs: https://docs.monai.io/en/stable/networks.html)
EARTH-2	WEATHER ANALYTICS
cuQUANTUM CUDA-Q	QUANTUM COMPUTING
cuEQUIVARIANCE cuTENSOR	QUANTUM CHEMISTRY

A clean written summary
- https://medium.com/@vmule942/cuda-x-for-every-industry-revolutionizing-computing-across-domains-82ef84b4f59a

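The path to open-sourcing the DeepSeek inference engine

The DeepSeek team announced a plan to give its internal inference engine (DeepSeek Inference Engine) back to the open-source community.
The existing inference engine is based on vLLM, and sharing it is being considered as demand for deploying the DeepSeek-V3 and R1 models grows.
A full release is difficult because of dependencies on existing code and infrastructure and the maintenance burden; instead, modularization and …

Introducing 3FS, DeepSeek's distributed file system

3FS is a high-performance open-source distributed file system developed by DeepSeek that supports large-scale data processing and high throughput.
It behaves like an ordinary file system, but it actually stores data across multiple machines behind an abstraction, so users do not need to be aware of the distribution.
Four main components (Meta, Mgmtd, Sto…

DeepSeek open-sources the 3FS file system and the Smallpond data processing framework (5 of 5) (github.com/deepseek-ai)|🔝|

250228
Fire-Flyer File System (3FS) is a high-performance distributed file system designed for AI training and inference workloads. It uses modern SSDs and RDMA networks to provide a shared storage layer and simplifies the development of distributed applications.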
- https://news.hada.io/topic?id=19489

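DeepSeek open-sources three optimized parallelism strategies (4 of 5)

Strategies and code used in DeepSeek V3/R1
- DualPipe: a bidirectional pipeline parallelism algorithm for computation-communication overlap
- EPLB: an expert-parallel load balancer
- Profile-Data: profiling data from DeepSeek's infrastructure, for analyzing computation-communication overlap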
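
To get a feel for what an expert-parallel load balancer has to do, here is a toy sketch. It is not EPLB's actual algorithm; it is a plain greedy longest-processing-time heuristic, and the per-expert loads and function names are hypothetical.

```rust
/// Greedy longest-processing-time placement: heaviest expert first,
/// always onto the currently least-loaded GPU.
/// Returns, for each GPU, the indices of the experts placed on it.
fn place_experts(expert_loads: &[u64], num_gpus: usize) -> Vec<Vec<usize>> {
    let mut order: Vec<usize> = (0..expert_loads.len()).collect();
    order.sort_by(|&a, &b| expert_loads[b].cmp(&expert_loads[a]));

    let mut placement = vec![Vec::new(); num_gpus];
    let mut gpu_load = vec![0u64; num_gpus];
    for expert in order {
        // Pick the GPU with the smallest accumulated load so far.
        let target = (0..num_gpus)
            .min_by_key(|&g| gpu_load[g])
            .expect("num_gpus must be > 0");
        placement[target].push(expert);
        gpu_load[target] += expert_loads[expert];
    }
    placement
}

/// Total load per GPU for a given placement.
fn gpu_loads(expert_loads: &[u64], placement: &[Vec<usize>]) -> Vec<u64> {
    placement
        .iter()
        .map(|experts| experts.iter().map(|&e| expert_loads[e]).sum())
        .collect()
}

fn main() {
    // Hypothetical per-expert token counts for 8 experts.
    let loads = [90, 10, 40, 30, 70, 20, 60, 50];
    let placement = place_experts(&loads, 4);
    println!("placement: {:?}", placement);
    println!("per-GPU load: {:?}", gpu_loads(&loads, &placement));
}
```

The point is only that skewed expert loads can be spread almost evenly across GPUs; a production balancer also has to deal with replication and communication topology.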

DualPipe

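DeepSeek open-sources DeepGEMM (3 of 5) (github.com/deepseek-ai)|🔝|

https://news.hada.io/topic?id=19444
- A library for efficient FP8 matrix multiplication (GEMM) that supports the fine-grained scaling scheme proposed in DeepSeek-V3
- Supports both regular GEMMs and Mix-of-Experts (MoE) grouped GEMMs
- Implemented in CUDA; nothing is compiled at install time, and kernels are compiled at runtime by a lightweight Just-In-Time (JIT) module
- Currently supports only NVIDIA Hopper tensor cores
- Uses CUDA-core two-level accumulation (promotion) to compensate for the imprecise accumulation of FP8 tensor cores
- Borrows some concepts from CUTLASS and CuTe but cuts down on complex template dependencies, giving a simple design with only about 300 lines of kernel code
- Well suited to learning Hopper FP8 matrix operations and optimization techniques
- Despite the lightweight design, it matches or beats expert-tuned libraries across a range of matrix sizes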

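DeepSeek open-sources DeepEP (2 of 5) (github.com/deepseek-ai)|🔝|

https://news.hada.io/topic?id=19421
- A high-performance communication library for Mixture-of-Experts (MoE) and expert parallelism (EP)
- Provides GPU-based all-to-all kernels for fast MoE dispatch and combine operations
- Supports low-precision operations such as FP8
- Applies the group-limited gating algorithm proposed in the DeepSeek-V3 paper to optimize asymmetric-domain bandwidth forwarding
  - e.g., optimizing NVLink → RDMA data transfers
- Provides high throughput suited to training and inference prefilling
- Includes RDMA-only low-latency kernels for latency-sensitive inference decoding
- Provides a communication-computation overlap technique (that does not occupy SM resources)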

DeepSeek open-sources FlashMLA (1 of 5) (github.com/deepseek-ai)

https://news.hada.io/topic?id=19401
- An efficient MLA decoding kernel for Hopper GPUs
- Optimized for serving variable-length sequences
- Currently released:
  - BF16
  - Paged kvcache with a block size of 64
- Benchmark: with CUDA 12.6 on an H800 SXM5, up to 3000 GB/s in memory-bound configurations and 580 TFLOPS in compute-bound configurations
- Inspired by FlashAttention 2&3 and cutlass
- The first of the five open-source projects released as DeepSeek Open Infra
https://github.com/deepseek-ai/FlashMLA

(250203) All machine learning models explained 👍 good|🔝|

All Machine Learning Models Clearly Explained! | AI For Beginners
- https://youtu.be/0YdpwSYMY6I?si=Kr2-FYBC6273a9RN
Transformers (how LLMs work) explained visually | DL5 | 3Blue1Brown
- https://youtu.be/wjZofJX0v4M?si=50KTkZEfDVPazy7o

How to download models from huggingface.co|🔝|

https://huggingface.co/TheBloke/LLaMA-13b-GGUF

pip3 install huggingface-hub

huggingface-cli download TheBloke/LLaMA-13b-GGUF llama-13b.Q4_K_M.gguf --local-dir . --local-dir-use-symlinks False

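(241217) Finally posted; porting this to Rust would be amazing, lol. GN⁺: Building an LLM inference engine from scratch with C++ and CUDA|🔝|

https://andrewkchan.dev/posts/yalm.html
- https://github.com/andrewkchan/yalm
  - Built from a fork of https://github.com/zeux/calm
- How to build an LLM inference engine with C++ and CUDA, without libraries
- Doing so helps you understand the full stack of LLM inference and appreciate how much different optimizations affect inference speed
- Goal: implement the model for fast single-batch inference on a single CPU + GPU server, with faster token processing than llama.cpp...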

Run LLaMA inference on CPU, with Rust 🦀🚀🦙|🔝|

https://github.com/rustformers/llama-rs
- Is this the right address? https://github.com/rustformers/llm
Inference Llama 2 in one file of pure Rust 🦀
- https://github.com/gaxler/llama2.rs

Fast ML inference & training for ONNX models in Rust (found while looking into computer vision, YOLO)|🔝|

ort
Fast ML inference & training for ONNX models in Rust
- Rust bindings for ONNX Runtime
  - https://crates.io/crates/ort
  - https://github.com/pykeio/ort
Guidebook: ort is an open-source Rust binding for ONNX Runtime.

Artificial_Intelligence(NLP, Natural Language Processing models and pipelines.)|🔝|

burn-candle
- Burn is a new comprehensive dynamic Deep Learning Framework built using Rust with extreme flexibility, compute efficiency and portability as its primary goals.
  - burn https://github.com/tracel-ai/burn
- https://crates.io/crates/burn-candle
- https://github.com/tracel-ai/burn/tree/main/crates/burn-candle
rust-bert
- Rust native ready-to-use NLP pipelines and transformer-based models (BERT, DistilBERT, GPT2,...)
- https://github.com/guillaume-be/rust-bert
- https://crates.io/crates/rust-bert

Rust MachineLearning |🔝|

linfa
- A Rust machine learning framework.
- https://github.com/rust-ml/linfa
- https://crates.io/crates/linfa
ndarray
- ndarray: an N-dimensional array with array views, multidimensional slicing, and efficient operations
- https://github.com/rust-ndarray/ndarray
- https://crates.io/crates/ndarray

dfdx: shape checked deep learning in rust|🔝|

dfdx
https://github.com/coreylowman/dfdx
- dfdx v0.11.0: ergonomic GPU accelerated deep learning ENTIRELY in rust!
  - https://coreylowman.github.io/2023/03/15/release-0.11.0.html

Minimalist ML framework for Rust|🔝|

https://github.com/huggingface/candle

Useful Ollama models|🔝|

# llama3.3 (405B-class quality on a home computer, though still very slow for now. 241212)
# New state of the art 70B model. Llama 3.3 70B offers similar performance compared to Llama 3.1 405B model.
ollama run llama3.3

# Check that the ollama process is running
$ pgrep ollama
6327
6521

# Exit the ollama prompt with "/bye", then stop the service
$ systemctl stop ollama.service

# 4.7GB
ollama run llama3.1

# 26GB
ollama run mixtral:8x7b

# 39GB
ollama run llama3.1:70b

# 79GB
ollama run mixtral:8x22b
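
Once a model has been pulled, the local Ollama server can also be called over HTTP instead of through the interactive prompt. The following is a minimal std-only sketch that builds a raw request for the `/api/generate` endpoint; the address `127.0.0.1:11434` is assumed to be the default, the helper names are made up for this example, and the response is printed raw rather than parsed.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

/// Escape a string for use inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::new();
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build a raw HTTP/1.1 request for Ollama's /api/generate endpoint.
fn build_generate_request(host: &str, model: &str, prompt: &str) -> String {
    let body = format!(
        "{{\"model\":\"{}\",\"prompt\":\"{}\",\"stream\":false}}",
        json_escape(model),
        json_escape(prompt)
    );
    format!(
        "POST /api/generate HTTP/1.1\r\nHost: {}\r\nContent-Type: application/json\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{}",
        host,
        body.len(),
        body
    )
}

fn main() -> std::io::Result<()> {
    let host = "127.0.0.1:11434"; // assumed default Ollama address
    let request = build_generate_request(host, "llama3.1", "Why is the sky blue?");
    match TcpStream::connect(host) {
        Ok(mut stream) => {
            stream.write_all(request.as_bytes())?;
            let mut response = String::new();
            stream.read_to_string(&mut response)?;
            println!("{}", response); // raw HTTP response, headers included
        }
        Err(e) => eprintln!("could not reach Ollama at {}: {}", host, e),
    }
    Ok(())
}
```

In a real project an HTTP client crate and a JSON library would be the more comfortable choice; the raw socket version is used here only to keep the sketch dependency-free.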

(A good explanation of machine learning in C++ code.) Snake learns with NEUROEVOLUTION (implementing NEAT from scratch in C++) |Tech With Nikola|🔝|

https://youtu.be/lAjcH-hCusg?si=eeEWJpy3SacoQYAb
- As expected, the key is using the sigmoid function and a bias!
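
As a reminder of what the sigmoid and the bias do in a single neuron, here is a small sketch. It is not code from the video; the weights and inputs are made-up values.

```rust
/// Logistic sigmoid: squashes any real number into (0, 1).
fn sigmoid(x: f64) -> f64 {
    1.0 / (1.0 + (-x).exp())
}

/// One neuron: weighted sum of the inputs, plus a bias, passed through the sigmoid.
fn neuron(inputs: &[f64], weights: &[f64], bias: f64) -> f64 {
    let sum: f64 = inputs.iter().zip(weights).map(|(x, w)| x * w).sum();
    sigmoid(sum + bias)
}

fn main() {
    let inputs = [0.0, 0.0];
    let weights = [0.8, -0.4]; // hypothetical weights
    // With all-zero inputs the weights contribute nothing; only the bias moves the output.
    for bias in [-2.0, 0.0, 2.0] {
        println!("bias {:+.1} -> output {:.3}", bias, neuron(&inputs, &weights, bias));
    }
}
```

Without the bias term the neuron could never output anything but 0.5 for an all-zero input, which is why the bias matters as much as the activation function.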

Focus on 1-bit models: NVIDIA may be done for too|🔝|

Will NVIDIA Survive The Era of 1-Bit LLMs? | Finxter
- https://youtu.be/HGbTAV8RoZQ?si=X6qZbabhOkAOj8Xr
Matrix Multiplication is AI - What 1.58b LLMs Mean for NVIDIA | Finxter
- https://youtu.be/isOcqRuJkAo?si=zlzqt5gaTdc7y3LX

	Blackwell	Hopper
Supported Tensor Core precisions	FP64, TF32, BF16, FP16, FP8, INT8, FP6, FP4	FP64, TF32, BF16, FP16, FP8, INT8
Supported CUDA* Core precisions	FP64, FP32, FP16, BF16	FP64, FP32, FP16, BF16, INT8

Nvidia Blackwell Deep Dive (GB 200 NVL72) $125B Revenue Projection
- https://youtu.be/gxRGPTv82AY?si=AplrKMKzFXO5qOOw

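Microsoft develops BitNet, an ultra-efficient AI model that runs on CPUs|🔝|

Microsoft researchers developed an ultra-efficient AI model called BitNet b1.58 2B4T.
Through 1-bit (ternary, about 1.58 bits per weight) quantization it achieves high speed and low memory use, so it can run even on CPUs, and it is released under the MIT license.
Appl…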
https://news.hada.io/topic?id=20406
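
For intuition, the ternary quantization idea can be sketched in a few lines. This follows the absmean scheme described in the BitNet b1.58 paper (scale by the mean absolute value, round, clamp to {-1, 0, 1}) as I understand it; it is not Microsoft's implementation, and the sample weights are made up.

```rust
/// Absmean ternary quantization: scale by the mean absolute value,
/// round to the nearest integer, and clamp into {-1, 0, 1}.
/// Returns the ternary weights and the scale needed to approximate the originals.
fn quantize_ternary(weights: &[f32]) -> (Vec<i8>, f32) {
    let eps = 1e-6_f32; // avoids division by zero for an all-zero tensor
    let scale = weights.iter().map(|w| w.abs()).sum::<f32>() / weights.len() as f32;
    let q = weights
        .iter()
        .map(|w| (w / (scale + eps)).round().clamp(-1.0, 1.0) as i8)
        .collect();
    (q, scale)
}

fn main() {
    let weights = [0.9_f32, -0.05, 0.4, -1.2, 0.02, -0.6]; // hypothetical weights
    let (q, scale) = quantize_ternary(&weights);
    println!("ternary: {:?}, scale: {:.3}", q, scale);

    // Dequantize to see how coarse the approximation is.
    let approx: Vec<f32> = q.iter().map(|&t| t as f32 * scale).collect();
    println!("dequantized: {:?}", approx);
}
```

Because every weight is -1, 0, or 1, a matrix multiply reduces to additions and subtractions, which is where the CPU-friendliness comes from.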

A closer look at NVIDIA chips (240617)|🔝|

https://youtu.be/0v-W7TM6NCE?si=eVgokdiNLwQA0Tob

NVIDIA hardware is optimized for 16-bit floating point (FP16/BF16)|🔝|

So 1-bit models are being approached in a completely different way:
- Develop optimized kernels for 1-bit operations
- Use FPGAs or ASICs for 1-bit operations

BitNet b1.58(This Work). vs 16-bit Float(FP16/BF16)|🔝|

See 9 min 46 s

Why BitNet b1.58?|🔝|

Each cell takes only three values:
- { -1, 0, 1 }
- How many bits are needed to differentiate three equally likely states?

$$\log_2(3) \approx 1.58$$
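
The 1.58 figure is easy to check, and it also explains a practical packing trick: since 3^5 = 243 fits in one byte, five ternary weights can share 8 bits (1.6 bits each). The sketch below is only an illustration; the `pack5`/`unpack5` helpers are hypothetical and not taken from BitNet's code.

```rust
/// Information content, in bits, of one symbol drawn from `states` equally likely values.
fn bits_per_symbol(states: u32) -> f64 {
    (states as f64).log2()
}

/// Pack five ternary digits (each -1, 0, or 1) into one byte as a base-3 number.
fn pack5(trits: [i8; 5]) -> u8 {
    trits.iter().fold(0u8, |acc, &t| acc * 3 + (t + 1) as u8)
}

/// Inverse of `pack5`.
fn unpack5(mut byte: u8) -> [i8; 5] {
    let mut trits = [0i8; 5];
    for slot in trits.iter_mut().rev() {
        *slot = (byte % 3) as i8 - 1;
        byte /= 3;
    }
    trits
}

fn main() {
    println!("log2(3) = {:.3} bits per ternary weight", bits_per_symbol(3));

    let trits = [-1, 0, 1, 1, -1];
    let byte = pack5(trits);
    println!("{:?} -> byte {} -> {:?}", trits, byte, unpack5(byte));
    println!("packed cost: {:.1} bits per weight", 8.0 / 5.0);
}
```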

https://youtu.be/HGbTAV8RoZQ?si=0Bu_ovLzuTI9SRWR

(Around April 2024) GN⁺: The era of 1-bit LLMs: ternary parameters for cost-effective computing (arxiv.org)|🔝|

https://news.hada.io/topic?id=13573

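Vector DB concepts & a definition of LLMs|🔝|

Source: http://www.itdaily.kr/news/articleView.html?idxno=220008
An LLM is
- a kind of deep learning model: a language model made up of vector data. According to the article, an LLM made up of vector data already has a built-in vector DB that stores that data.
A vector DB is
- Training deep learning models, in which similar vector values form clusters, requires a data store specialized for vector data. To meet this need, vector DBs emerged that can quickly vectorize (embed) unstructured rather than structured data and then store and read it.
Four advantages of vector DBs (they specialize in the following)
- Vector data processing
- Vector search and similarity analysis
- Large-scale vector data processing
- Vector data updates

1. Vector data processing
- Vector DBs are specialized for handling vector data. Generative AI mainly uses vector data, because the tokens in an LLM are vectorized and stored in the built-in vector DB. In generative AI, unstructured data such as images, audio, and text is mostly converted into vector form for processing, and a vector DB can store and manage that vector data effectively.
1. Vector search and similarity analysis
- Vector DBs provide features specialized for vector search and similarity analysis. An EDB representative explained: “Generative AI often needs to analyze the similarity between vectors or search vector data. Vector DBs provide features specialized for these requirements, so they can compute similarity between vectors and search them effectively. Generative AI can therefore perform fast and accurate vector search and similarity analysis through a vector DB.”
1. Large-scale vector data processing
- Third, vector DBs can handle large volumes of vector data. Generative AI models deal with large amounts of vector data. A vector DB can store and process that data effectively, so it can keep up with the large-scale data of generative AI models. Widely used relational databases (RDBs) are not well suited to storing vector data because of the varying length and shape of vectors, querying and indexing, data-model integrity constraints, and so on.
1. Vector data updates
- Vector DBs provide features for updating and querying vector data. A vector DB is generally not used on its own. In most cases the LLM is used through LangChain, described in the article as a platform that collects and stores language data, with the vector DB connected to it. LangChain can be seen as the intermediary that links data sources, word embeddings, vector DBs, and so on to the LLM. If the latest data is embedded and stored in a separately built vector DB, it can be applied to the LLM without retraining the LLM.
  - “Hallucination, cited as a critical problem of generative AI, is in large part caused by stale data. An LLM is built from data learned up to a certain point in time, so the training data has to be kept up to date. But refreshing the training data takes a lot of money and time: infrastructure costs, GPU costs, staffing. A vector DB can solve much of this problem.”
How vector DBs differ
- “A vector DB is actually not very different from other DBMSs. What differs is the nature of the data it handles and how that data is processed. A vector DB mainly holds data in real-number form, and it can offer various methods for extracting highly similar results from that data,” the representative said. “To measure the similarity between vectorized data, measures such as cosine similarity and Euclidean distance are mainly used. Cosine similarity measures how similar two vectors are through the angle between them, and Euclidean distance measures the straight-line distance between two vector values in the plane to gauge how similar they are.”
- “A vector DB enables meaning-based queries. A conventional DBMS focuses on extracting data efficiently through the relationships between schemas, while a vector DB focuses on effectively extracting the meaning of numeric vectors. A DB that stores numeric data and provides various algorithms for measuring similarity is exactly what a vector DB is.”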
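
The two similarity measures named above can be written down in a few lines. This is a minimal sketch with made-up sample vectors, not the API of any particular vector DB.

```rust
/// Dot product of two equal-length vectors.
fn dot(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Cosine of the angle between two vectors: 1.0 means same direction, 0.0 means orthogonal.
fn cosine_similarity(a: &[f64], b: &[f64]) -> f64 {
    dot(a, b) / (dot(a, a).sqrt() * dot(b, b).sqrt())
}

/// Straight-line distance between two vectors: 0.0 means identical.
fn euclidean_distance(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f64>().sqrt()
}

fn main() {
    let a = [1.0, 2.0, 3.0];
    let b = [2.0, 4.0, 6.0]; // same direction as a, twice the length
    let c = [-3.0, 0.0, 1.0]; // orthogonal to a

    println!("cos(a, b) = {:.3}", cosine_similarity(&a, &b));
    println!("cos(a, c) = {:.3}", cosine_similarity(&a, &c));
    println!("dist(a, b) = {:.3}", euclidean_distance(&a, &b));
}
```

Note how `a` and `b` count as identical under cosine similarity but not under Euclidean distance: cosine ignores vector length, which is one reason the choice of measure matters.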

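A beginner's guide to vector embeddings (timescale.com)|🔝|

https://news.hada.io/topic?id=15094&utm_source=weekly&utm_medium=email&utm_campaign=202423
- 26P by xguru 24.05.31.
Types of vector embeddings
- Word embeddings: represent words in NLP and capture semantic relationships between words. Used for language translation, word similarity, sentiment analysis, and more.
- Sentence embeddings: capture the meaning and context of a sentence. Used for information retrieval, text classification, sentiment analysis, and more.
- Document embeddings: capture the content of documents such as reports and articles. Used for recommendation systems, information retrieval, document similarity and classification, and more.
- Graph embeddings: represent the nodes and edges of a graph in a vector space. Used for node classification, community detection, link prediction, and more.
- Image embeddings: represent various aspects of an image. Used for content-based recommendation systems, image and object recognition, image search systems, and more.
- Product embeddings: represent digital or physical products. Used for product recommendation and classification systems, product search, and more.
- Audio embeddings: represent the rhythm, tone, pitch, and other properties of an audio signal. Used for emotion detection, speech recognition, music recommendation, and more.
How neural networks create embeddings
- Representation learning: the network maps high-dimensional data into a lower-dimensional space while preserving its important features.
- Training: the network learns to turn data into meaningful embeddings by adjusting the weights and biases of its neurons.
- Example: in a network that classifies movie reviews as positive or negative, word embeddings are learned along the way; words such as "good" and "excellent" end up with similar embeddings.
How vector embeddings work
- Vector space: objects or features are represented as points in a multidimensional vector space, with similar items located close together.
- Distance measures: Euclidean distance, cosine similarity, and similar measures quantify the relationships between vectors.
- Example: the vectors for "cat" and "dog" lie closer together than the vectors for "cat" and "car".
Building with vector embeddings
- Chatbots: respond better to user queries, generate contextually relevant answers, and keep conversations coherent.
- Semantic search engines: return results based on semantic similarity instead of keyword matching.
- Text classification systems: classify documents by their phrases and words.
- Recommendation systems: recommend content based on the similarity of keywords and descriptions.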
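
A semantic search engine of the kind listed above boils down to "embed the query, then rank the stored vectors by similarity". The sketch below does this by brute force; the 3-dimensional "embeddings" are hand-made for illustration, whereas a real system would get them from an embedding model and usually use an index instead of a full scan.

```rust
/// Cosine similarity between two equal-length vectors.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}

/// Return the `k` documents most similar to the query, best first.
fn top_k<'a>(query: &[f32], docs: &'a [(&'a str, Vec<f32>)], k: usize) -> Vec<(&'a str, f32)> {
    let mut scored: Vec<(&str, f32)> = docs
        .iter()
        .map(|(name, vector)| (*name, cosine(query, vector)))
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1)); // highest similarity first
    scored.truncate(k);
    scored
}

fn main() {
    // Hand-made toy embeddings; the values are assumptions for illustration only.
    let docs = vec![
        ("cat", vec![0.9, 0.8, 0.1]),
        ("dog", vec![0.8, 0.9, 0.2]),
        ("car", vec![0.1, 0.2, 0.9]),
    ];
    let query = [0.85, 0.85, 0.1]; // pretend this is the embedding of "kitten"

    for (name, score) in top_k(&query, &docs, 2) {
        println!("{name}: {score:.3}");
    }
}
```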
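How to create vector embeddings for your data
- Data collection: gather various kinds of data such as text, audio, images, and time series.
- Preprocessing: tokenization, noise removal, image resizing, normalization, and other steps that make the data suitable for analysis.
- Chunking: split text into sentences or words, images into segments, and time series into intervals.
- Vectorization: convert each piece of data into a vector, using for example OpenAI's text embedding models for text, CNN models for images, and spectrograms for audio.
How to store vector embeddings
- Vector databases: use a database that can store and search vector data efficiently.
- PostgreSQL: can store vector data alongside other relational data; the pgvector extension lets you store and query vectors.
Other good reads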

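C is king, as expected|🔝|

llm.c now supports multi-GPU training and is ~7% faster than PyTorch
Simple LLM training code written by Andrej Karpathy in pure C/CUDA
It now does multi-GPU training in bfloat16 with Flash Attention
Implemented in ~3000 lines of C/CUDA code, and overall up to about 7% faster than PyTorch
Work done so far
- Mixed-precision training (bfloat16)
- Normalized...

Bye bye PyTorch 👋 PyTorch is really awful, lol. It was hard putting up with it all this time, lol. I'm fed up, so let's study more and build it in Rust, lol|🔝|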

https://news.hada.io/topic?id=14228

Going to try this right away|🔝|

karpathy/llm.c#344

MachineLearning_Tutorial|🔝|

Introduction to Deep Learning
- https://github.com/sjchoi86/intro-dl
Lecture notes on Bayesian deep learning
- https://github.com/sjchoi86/bayes-nn
PyTorch is clunky but still the incumbent, so let's study it (PyTorch for Deep Learning & Machine Learning – Full Course | freeCodeCamp.org)
- https://youtu.be/V_xro1bcAuA?si=ZVKXLB8Q6kwugdm2

The paradigm is shifting from LLM to LMM~~|🔝|

231012_LLMs are old news... 'LMMs' trained on images too are on the rise
- https://www.aitimes.com/news/articleView.html?idxno=154291
- After 'large language models (LLMs)', next come 'large multimodal models (LMMs)'
- LLM(Large Language Models)
- Multimodality and Large Multimodal Models (LMMs)
  - https://huyenchip.com/2023/10/10/multimodal.html
231014_Every database will soon become a vector database (nextword.substack.com)
- https://news.hada.io/topic?id=11263&utm_source=discord&utm_medium=bot&utm_campaign=1480
- https://nextword.substack.com/p/vector-database-is-not-a-separate
231014_Llama 2 Everywhere (L2E) - standalone, binary-portable, bootable Llama 2 (github.com/trholding)
- https://news.hada.io/topic?id=11285&utm_source=discord&utm_medium=bot&utm_campaign=1480
- https://github.com/trholding/llama2.c

Quickly running Rust in Jupyter notebooks|🔝|

https://racum.blog/articles/rust-jupyter/
First, you need to download and build the kernel itself via cargo:

$ cargo install --locked evcxr_jupyter

Then, use its binary to automatically install it inside Jupyter:

$ evcxr_jupyter --install

Running fast, portable Llama2 inference on heterogeneous edge devices with Rust+WASM (secondstate.io)|🔝|

https://news.hada.io/topic?id=11847&utm_source=discord&utm_medium=bot&utm_campaign=1480

m1 macOS pytorch install

https://pytorch.org/get-started/locally/

h2oGPT - a fully open-source GPT (github.com/h2oai)|🔝|

https://github.com/h2oai/h2ogpt
- Korean news article: "Easy even on a laptop": a walkthrough of 5 LLM tools for local systems https://www.ciokorea.com/news/305929?page=0,0
https://news.hada.io/topic?id=9105

People fine-tuning llama2|🔝|

https://news.hada.io/topic?id=10898&utm_source=discord&utm_medium=bot&utm_campaign=1480
- https://economiceco.tistory.com/18790

JS course: No Black Box Machine Learning Course – Learn Without Libraries|🔝|

https://youtu.be/vDDjtwQDw2k?si=exYH6L2aHAYEqGTJ

Machine Learning & Neural Networks without Libraries – No Black Box Course
- https://youtu.be/3wwiOSxDAmg?si=FndfDStC4CRGoDWM

AlphaGo - The Movie | Full award-winning documentary|🔝|

https://youtu.be/WXuK6gekU1Y?si=D9ZPN7Lxc6icN2g9

Building a neural network FROM SCRATCH (no Tensorflow/Pytorch, just numpy & math)|🔝|

Build a neural network in C without the TensorFlow/PyTorch libraries. Must try ❤

https://youtu.be/w8yWXqWQYmU

Understand neural network math through pictures; even complex networks make sense!!! The best: Why Neural Networks can learn (almost) anything | Emergent Garden|🔝|

https://youtu.be/0QczhVg5HaI

Dalai - Automatically install, run, and play with LLaMA on your computer|🔝|

What is Dalai?

It lets you one-click install LLaMA on your machine. No need to bother building cpp files, cloning GitHub, and downloading files and stuff. Everything is automated. Dalai is an open source tool in the Large Language Model Tools category of a tech stack.

https://cocktailpeanut.github.io/dalai/#/

https://stackshare.io/dalai?utm_source=weekly_digest&utm_medium=email&utm_campaign=03292023&utm_content=new_tool

A forum post someone in Korea wrote about it

https://www.ddengle.com/board_free/19129866

The Pile is a large, diverse, open source language modelling data set|🔝|

https://github.com/EleutherAI/the-pile

brew install libtorch(macOS)|🔝|

Run this first, before running pytorch!!!

export LIBTORCH='/opt/homebrew/Cellar/pytorch/1.13.1'

export LD_LIBRARY_PATH=$LIBTORCH:$LD_LIBRARY_PATH


echo $LD_LIBRARY_PATH
/opt/homebrew/Cellar/pytorch/1.13.1:

Rust Artificial Intelligence (The Simple Way)|🔝|

https://youtu.be/StMP7g-0wK4

https://github.com/guillaume-be/rust-bert

The AI community building the future.|🔝|

https://huggingface.co/

How to Build a Machine Learning Model in Rust|🔝|

https://www.freecodecamp.org/news/how-to-build-a-machine-learning-model-in-rust/

Rust Machine Learning Book|🔝|

https://rust-ml.github.io/book/

Unicode (Vim Plug-in)|🔝|

https://github.com/chrisbra/unicode.vim

Ex commands:

:UnicodeTable    - Print Unicode Table in new window
:Digraphs        - Search for specific digraph char
:UnicodeSearch   - Search for specific unicode char
:UnicodeSearch!  - Search for specific unicode char (and add at current cursor position)
:UnicodeName     - Identify character under cursor (like ga command)
:DownloadUnicode - Download (or update) Unicode data
:UnicodeCache    - Create cache file

Normal mode commands:


<C-X><C-G>  - Complete Digraph char
<C-X><C-Z>  - Complete Unicode char
<F4>        - Combine characters into digraphs
Scripting Functions:
unicode#FindUnicodeBy() - Find unicode characters
unicode#FindDigraphBy() - Find Digraph char
unicode#Digraph()       - Returns digraph char
unicode#UnicodeName()   - Identifies unicode character (by value)

Natural Language Processing for Rust|🔝|

https://github.com/lexi-sh/rs-natural

This repository is a list of machine learning libraries written in Rust. It's a compilation of GitHub repositories, blogs, books, movies, discussions, papers, etc. 🦀|🔝|

https://github.com/vaaaaanquish/Awesome-Rust-MachineLearning

Here is an example of a simple machine learning program written in the Rust programming language:|🔝|

GPT3

https://chat.openai.com/chat

This program sets up a game context and creates a new MainState struct to hold the game's state. It then runs the game loop, which handles events and updates the game state accordingly. The MainState struct and its associated methods and the event handling methods can be customized to implement the specific game logic and mechanics.


// Assumes the rusty-machine and rand (0.8) crates are listed in Cargo.toml
use rand::{thread_rng, Rng};
use rusty_machine::learning::k_means::KMeansClassifier;
use rusty_machine::learning::UnSupModel;
use rusty_machine::linalg::Matrix;

fn main() {
    // Create 100 random 2-D points, stored row-major as a flat Vec<f64>
    let mut rng = thread_rng();
    let values: Vec<f64> = (0..200).map(|_| rng.gen::<f64>()).collect();
    let data = Matrix::new(100, 2, values);

    // Create a new k-means model with 2 clusters
    let mut model = KMeansClassifier::new(2);

    // Train the model on the data
    model.train(&data).unwrap();

    // Predict the cluster assignments for the data
    let assignments = model.predict(&data).unwrap();

    // Print the cluster assignments
    println!("Cluster assignments: {:?}", assignments);
}
This program creates some random data and uses the KMeansClassifier type from the rusty_machine crate to train a k-means clustering model on the data. It then uses the trained model to predict the cluster assignments for the data and prints the results. The KMeansClassifier type and the train and predict methods can be customized to implement different machine learning algorithms and apply them to different types of data.

As generated, the code was subtly off from the real rusty_machine API, lol. Check the docs:

https://athemathmo.github.io/rusty-machine/doc/rusty_machine/index.html

How to Build a Machine Learning Model in Rust|🔝|

https://www.freecodecamp.org/news/how-to-build-a-machine-learning-model-in-rust/

Machine_Learning_Rust|🔝|

Machine learning basics|🔝|

Machine Learning Course for Beginners | freeCodeCamp.org
- https://youtu.be/NWONeJKn6kc?si=wmh6EmpSKH1ZIzXH
Understanding NVIDIA GPU Hardware as a CUDA C Programmer | Episode 2: GPU Compute Architecture | 0Mean1Sigma
- https://youtu.be/1Goq8Yc3dfo?si=KgztF66DmIwP7qa9
Awesome Production Machine Learning
- This repository contains a curated list of awesome open source libraries that will help you deploy, monitor, version, scale, and secure your production machine learning 🚀
- https://github.com/EthicalML/awesome-production-machine-learning

Collections of Rust code
- Lots of recent code
  - https://github.com/e-tornike/best-of-ml-rust
- Good, but dated...
  - This repository accompanies Practical Machine Learning with Rust by Joydeep Bhattacharjee (Apress, 2020). https://github.com/Apress/practical-machine-learning-w-rust
  - https://github.com/vaaaaanquish/Awesome-Rust-MachineLearning

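[Translation] A guidebook for people swimming in the world of RAG|🔝|

https://sigridjin.medium.com/rag-%EC%84%B8%EC%83%81%EC%9D%84-%ED%97%A4%EC%97%84%EC%B9%98%EB%8A%94-%EC%82%AC%EB%9E%8C%EB%93%A4%EC%9D%84-%EC%9C%84%ED%95%9C-%EA%B0%80%EC%9D%B4%EB%93%9C%EB%B6%81-3e90f515d800

This article walks through the whole process of embeddings and information retrieval: embedding concepts and limitations, dataset creation and labeling, evaluation of various off-the-shelf models, hybrid search and reranking, embedding model fine-tuning and optimization, and interpretability.

Topics covered

A discussion of embeddings and their generalizability
Humans and + ...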

Compiling CUDA with clang|🔝|

https://llvm.org/docs/CompileCudaWithLLVM.html

clang++ axpy.cu -o axpy --cuda-gpu-arch=<GPU arch> \
    -L<CUDA install path>/<lib64 or lib>             \
    -lcudart_static -ldl -lrt -pthread

./axpy
y[0] = 2
y[1] = 4
y[2] = 6
y[3] = 8

https://gist.github.com/anonymous/855e277884eb6b388cd2f00d956c2fd4

#include <iostream>

__global__ void axpy(float a, float* x, float* y) {
  y[threadIdx.x] = a * x[threadIdx.x];
}

int main(int argc, char* argv[]) {
  const int kDataLen = 4;

  float a = 2.0f;
  float host_x[kDataLen] = {1.0f, 2.0f, 3.0f, 4.0f};
  float host_y[kDataLen];

  // Copy input data to device.
  float* device_x;
  float* device_y;
  cudaMalloc(&device_x, kDataLen * sizeof(float));
  cudaMalloc(&device_y, kDataLen * sizeof(float));
  cudaMemcpy(device_x, host_x, kDataLen * sizeof(float),
             cudaMemcpyHostToDevice);

  // Launch the kernel.
  axpy<<<1, kDataLen>>>(a, device_x, device_y);

  // Copy output data to host.
  cudaDeviceSynchronize();
  cudaMemcpy(host_y, device_y, kDataLen * sizeof(float),
             cudaMemcpyDeviceToHost);

  // Print the results.
  for (int i = 0; i < kDataLen; ++i) {
    std::cout << "y[" << i << "] = " << host_y[i] << "\n";
  }

  cudaDeviceReset();
  return 0;
}

A collection of machine learning resources built with Rust

Porting GPU shaders to Rust 30x faster with AI

June 24, 2025 · 15 min read | Christian Legnitto
- (250624)Rust GPU and Rust CUDA maintainer

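CubeCL - Rust-based GPU kernels for CUDA, ROCm, and WGPU|🔝|

CubeCL is a high-performance, multi-platform language extension that lets you write GPU kernels in Rust
It fully supports functions, generics, and structs, with partial support for traits, methods, and type inference
It supports WGPU-, CUDA-, and ROCm-based runtimes, and an optimized JIT CPU runtime using SIMD instructions is also in development
*…

250511 Llama.cpp now supports vision (multimodal input)|🔝|

Llama.cpp now supports multimodal input (including vision) through libmtmd
- Via llama-mtmd-cli, or the OpenAI-compatible /chat/completions API through llama-server
Multimodal features work out of the box with models such as Gemma 3, SmolVLM, Pixtral, Qwen 2/2.5, Mistral Small, and InternVL
- Pre-quantized models…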
