Commit 42326bc

Merge pull request #5 from AIComputing101/coketaste/readme
Update the docs in the project
2 parents f3cec0e + c6c8f6e commit 42326bc

14 files changed

+48
-48
lines changed

README.md

Lines changed: 20 additions & 20 deletions
@@ -9,7 +9,7 @@

 **A comprehensive, hands-on educational project for mastering GPU programming with CUDA and HIP**

-*From beginner fundamentals to production-ready optimization techniques*
+*From beginner fundamentals to professional-grade optimization techniques*

 ## 📑 Table of Contents

@@ -37,7 +37,7 @@
 - **9 comprehensive modules** covering beginner to expert topics
 - **71 working code examples** in both CUDA and HIP
 - **Cross-platform support** for NVIDIA and AMD GPUs
-- **Production-ready development environment** with Docker
+- **Comprehensive development environment** with Docker
 - **Professional tooling** including profilers, debuggers, and CI/CD

 Perfect for students, researchers, and developers looking to master GPU computing.
@@ -198,7 +198,7 @@ This architectural knowledge is essential for writing efficient GPU code and is
 | 🎯 **Complete Curriculum** | 9 progressive modules from basics to advanced topics |
 | 💻 **Cross-Platform** | Full CUDA and HIP support for NVIDIA and AMD GPUs |
 | 🐳 **Docker Ready** | Complete containerized development environment with CUDA 12.9.1 & ROCm 7.0 |
-| 🔧 **Production Quality** | Professional build systems, auto-detection, testing, and profiling |
+| 🔧 **Professional Quality** | Professional build systems, auto-detection, testing, and profiling |
 | 📊 **Performance Focus** | Optimization techniques and benchmarking throughout |
 | 🌐 **Community Driven** | Open source with comprehensive contribution guidelines |
 | 🧪 **Advanced Libraries** | Support for Thrust, MIOpen, and production ML frameworks |
@@ -252,21 +252,21 @@ Choose your track based on your experience level:

 ## 📚 Modules

-Our comprehensive curriculum progresses from fundamental concepts to production-ready optimization techniques:
+Our comprehensive curriculum progresses from fundamental concepts to advanced optimization techniques:

-| Module | Level | Duration | Focus Area | Key Topics | Examples |
-|--------|-------|----------|------------|------------|----------|
-| [**Module 1**](modules/module1/) | 👶 Beginner | 4-6h | **GPU Fundamentals** | Architecture, Memory, First Kernels | 13 |
-| [**Module 2**](modules/module2/) | 👶→🔥 | 6-8h | **Memory Optimization** | Coalescing, Shared Memory, Texture | 10 |
-| [**Module 3**](modules/module3/) | 🔥 Intermediate | 6-8h | **Execution Models** | Warps, Occupancy, Synchronization | 12 |
-| [**Module 4**](modules/module4/) | 🔥→🚀 | 8-10h | **Advanced Programming** | Streams, Multi-GPU, Unified Memory | 9 |
-| [**Module 5**](modules/module5/) | 🚀 Advanced | 6-8h | **Performance Engineering** | Profiling, Bottleneck Analysis | 5 |
-| [**Module 6**](modules/module6/) | 🚀 Advanced | 8-10h | **Parallel Algorithms** | Reduction, Scan, Convolution | 10 |
-| [**Module 7**](modules/module7/) | 🚀 Expert | 8-10h | **Algorithmic Patterns** | Sorting, Graph Algorithms | 4 |
-| [**Module 8**](modules/module8/) | 🚀 Expert | 10-12h | **Domain Applications** | ML, Scientific Computing | 4 |
-| [**Module 9**](modules/module9/) | 🚀 Expert | 6-8h | **Production Deployment** | Libraries, Integration, Scaling | 4 |
+| Module | Level | Focus Area | Key Topics | Examples |
+|--------|-------|------------|------------|----------|
+| [**Module 1**](modules/module1/) | 👶 Beginner | **GPU Fundamentals** | Architecture, Memory, First Kernels | 13 |
+| [**Module 2**](modules/module2/) | 👶→🔥 | **Memory Optimization** | Coalescing, Shared Memory, Texture | 10 |
+| [**Module 3**](modules/module3/) | 🔥 Intermediate | **Execution Models** | Warps, Occupancy, Synchronization | 12 |
+| [**Module 4**](modules/module4/) | 🔥→🚀 | **Advanced Programming** | Streams, Multi-GPU, Unified Memory | 9 |
+| [**Module 5**](modules/module5/) | 🚀 Advanced | **Performance Engineering** | Profiling, Bottleneck Analysis | 5 |
+| [**Module 6**](modules/module6/) | 🚀 Advanced | **Parallel Algorithms** | Reduction, Scan, Convolution | 10 |
+| [**Module 7**](modules/module7/) | 🚀 Expert | **Algorithmic Patterns** | Sorting, Graph Algorithms | 4 |
+| [**Module 8**](modules/module8/) | 🚀 Expert | **Domain Applications** | ML, Scientific Computing | 4 |
+| [**Module 9**](modules/module9/) | 🚀 Expert | **Production Deployment** | Libraries, Integration, Scaling | 4 |

-**📈 Progressive Learning Path: 71 Examples • 50+ Hours • Beginner to Expert**
+**📈 Progressive Learning Path: 71 Examples • Beginner to Expert**

 ### Learning Progression

@@ -387,7 +387,7 @@ Experience the full development environment with zero setup:
 **Container Specifications:**
 - **CUDA**: NVIDIA CUDA 12.9.1 on Ubuntu 22.04
 - **ROCm**: AMD ROCm 7.0 on Ubuntu 24.04
-- **Libraries**: Production-ready toolchains with debugging support
+- **Libraries**: Professional toolchains with debugging support

 **[📖 Complete Docker Guide →](docker/README.md)**

@@ -415,7 +415,7 @@ make debug # Debug builds with extra checks

 ### Advanced Build Features
 - **Automatic GPU Detection**: Detects NVIDIA/AMD hardware and builds accordingly
-- **Production Optimization**: `-O3`, fast math, architecture-specific optimizations
+- **Professional Optimization**: `-O3`, fast math, architecture-specific optimizations
 - **Debug Support**: Full debugging symbols and validation checks
 - **Library Management**: Automatic detection of optional dependencies (NVML, MIOpen)
 - **Cross-Platform**: Single Makefile supports both CUDA and HIP builds
@@ -426,7 +426,7 @@ make debug # Debug builds with extra checks
 |--------------|-------------------|------------------|--------------|
 | **Beginner** | 10-100x | 60-80% | Educational |
 | **Intermediate** | 50-500x | 80-95% | Optimized |
-| **Advanced** | 100-1000x | 85-95% | Production |
+| **Advanced** | 100-1000x | 85-95% | Professional |
 | **Expert** | 500-5000x | 95%+ | Library-Quality |

 ## 🐛 Troubleshooting
@@ -507,7 +507,7 @@ If you use this project in your research, education, or publications, please cit
 author={{Stephen Shao}},
 year={2025},
 howpublished={\url{https://github.com/AIComputing101/gpu-programming-101}},
-note={A complete GPU programming educational resource with 70+ production-ready examples covering fundamentals through advanced optimization techniques for NVIDIA CUDA and AMD HIP platforms}
+note={A complete GPU programming educational resource with 71 comprehensive examples covering fundamentals through advanced optimization techniques for NVIDIA CUDA and AMD HIP platforms}
 }
 ```

modules/module3/README.md

Lines changed: 1 addition & 1 deletion
@@ -334,4 +334,4 @@ ncu --metrics l1tex__data_bank_conflicts_pipe_lsu_mem_shared_op_ld.sum ./02_scan

 ---

-**Note**: This module provides both educational implementations (showing algorithm progression) and production-ready optimized versions. Focus on understanding the concepts before optimizing for specific use cases.
+**Note**: This module provides both educational implementations (showing algorithm progression) and optimized versions. Focus on understanding the concepts before optimizing for specific use cases.

modules/module5/README.md

Lines changed: 2 additions & 2 deletions
@@ -436,11 +436,11 @@ Module 5 represents the pinnacle of GPU performance optimization, covering:
 - **Memory Subsystem Optimization** across all levels of the GPU memory hierarchy
 - **Compute Optimization Strategies** for maximum algorithmic efficiency
 - **Cross-Platform Performance** considerations for portable high-performance code
-- **Production-Ready Optimization** techniques used in industry applications
+- **Professional Optimization** techniques used in industry applications

 These skills are essential for:
 - Achieving maximum performance from GPU investments
-- Building production-quality high-performance applications
+- Building professional-quality high-performance applications
 - Understanding performance trade-offs in GPU algorithm design
 - Developing performance-portable code across GPU architectures

modules/module6/README.md

Lines changed: 1 addition & 1 deletion
@@ -367,4 +367,4 @@ These algorithms form the foundation for more complex applications covered in su
 **Difficulty**: Intermediate-Advanced
 **Prerequisites**: Modules 1-5 completion, parallel algorithm concepts

-**Note**: This module emphasizes both educational understanding and production-ready implementations. Focus on mastering the algorithmic concepts before diving into platform-specific optimizations.
+**Note**: This module emphasizes both educational understanding and optimized implementations. Focus on mastering the algorithmic concepts before diving into platform-specific optimizations.

modules/module7/README.md

Lines changed: 1 addition & 1 deletion
@@ -372,4 +372,4 @@ Master these concepts to tackle the most demanding computational challenges and
 **Difficulty**: Advanced
 **Prerequisites**: Modules 1-6 completion, advanced algorithm knowledge

-**Note**: This module focuses on production-level implementations of sophisticated algorithms. Emphasis is placed on understanding both the theoretical foundations and practical optimization techniques required for real-world deployment.
+**Note**: This module focuses on advanced-level implementations of sophisticated algorithms. Emphasis is placed on understanding both the theoretical foundations and practical optimization techniques required for real-world deployment.

modules/module8/README.md

Lines changed: 4 additions & 4 deletions
@@ -30,7 +30,7 @@ By completing this module, you will:

 #### 1. Deep Learning Inference Kernels (`01_deep_learning_*.cu/.cpp`)

-Production-quality neural network inference implementations:
+Professional-quality neural network inference implementations:

 - **Custom Convolution Kernels**: Optimized for specific layer configurations
 - **GEMM Optimization**: High-performance matrix multiplication for fully connected layers
@@ -214,7 +214,7 @@ make monte_carlo # Monte Carlo simulations
 make finance # Computational finance
 make library_integration # Library integration examples

-# Production builds with optimizations
+# Professional builds with optimizations
 make production

 # Debug builds for development
@@ -396,7 +396,7 @@ make scaling_analysis
 Module 8 bridges the gap between GPU programming techniques and real-world applications:

 - **Domain Expertise**: Apply GPU techniques to solve actual industry problems
-- **Production Quality**: Build applications that meet real-world performance and accuracy requirements
+- **Professional Quality**: Build applications that meet real-world performance and accuracy requirements
 - **Integration Skills**: Successfully integrate GPU computing into existing workflows and systems
 - **Optimization Mastery**: Achieve optimal performance for domain-specific computational patterns

@@ -414,4 +414,4 @@ Master these domain-specific applications to become a complete GPU computing exp
 **Difficulty**: Advanced
 **Prerequisites**: Modules 1-7 completion, domain-specific knowledge

-**Note**: This module emphasizes real-world application development with production-quality implementations. Students should focus on both technical excellence and practical deployment considerations.
+**Note**: This module emphasizes real-world application development with professional-quality implementations. Students should focus on both technical excellence and practical deployment considerations.

modules/module8/examples/01_deep_learning_cuda.cu

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 /**
 * Module 8: Domain-Specific Applications - Deep Learning Inference Kernels (CUDA)
 *
-* Production-quality neural network inference implementations optimized for NVIDIA GPU architectures.
+* Professional-quality neural network inference implementations optimized for NVIDIA GPU architectures.
 * This example demonstrates custom convolution kernels, GEMM optimization, activation functions,
 * and mixed precision inference with Tensor Core utilization.
 *

modules/module8/examples/01_deep_learning_hip.cpp

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@

 const int WAVEFRONT_SIZE = 64;earning Inference Kernels (HIP)
 *
-* Production-quality neural network inference implementations optimized for AMD GPU architectures.
+* Professional-quality neural network inference implementations optimized for AMD GPU architectures.
 * This example demonstrates deep learning kernels adapted for ROCm/HIP with wavefront-aware
 * optimizations and LDS utilization patterns specific to AMD hardware.
 *

modules/module8/examples/Makefile

Lines changed: 3 additions & 3 deletions
@@ -24,7 +24,7 @@ BUILD_HIP = 0
 GPU_VENDOR = NONE
 endif

-# Compiler flags for production-quality applications
+# Compiler flags for professional-quality applications
 CUDA_FLAGS = -std=c++17 -O3 -arch=sm_70 -lineinfo --use_fast_math
 CUDA_DEBUG_FLAGS = -std=c++17 -g -G -arch=sm_70
 HIP_FLAGS = -std=c++17 -O3 -ffast-math
@@ -186,7 +186,7 @@ debug: CUDA_FLAGS = $(CUDA_DEBUG_FLAGS)
 debug: HIP_FLAGS = $(HIP_DEBUG_FLAGS)
 debug: all

-# Production builds with maximum optimization
+# Professional builds with maximum optimization
 .PHONY: production
 production: CUDA_FLAGS += -DNDEBUG -Xptxas -O3
 production: HIP_FLAGS += -DNDEBUG
@@ -589,7 +589,7 @@ help:
 @echo " cuda - Build CUDA applications only"
 @echo " hip - Build HIP applications only"
 @echo " debug - Build with debug flags"
-@echo " production - Build with maximum optimization"
+@echo " professional - Build with maximum optimization"
 @echo " clean - Remove build artifacts"
 @echo ""
 @echo "Domain Application Targets:"

modules/module9/README.md

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
 # Module 9: Production GPU Programming

-This module focuses on building enterprise-grade GPU applications with emphasis on deployment, maintenance, scalability, and integration with production systems. Learn how to transition from prototype to production-ready GPU software.
+This module focuses on building enterprise-grade GPU applications with emphasis on deployment, maintenance, scalability, and integration with production systems. Learn how to transition from prototype to professional-grade GPU software.

 ## Learning Objectives

@@ -357,7 +357,7 @@ make cost_analysis
 - [ ] Monitoring and observability built into the application

 ### Infrastructure
-- [ ] Production-grade Kubernetes cluster with GPU support
+- [ ] Enterprise-grade Kubernetes cluster with GPU support
 - [ ] Monitoring and alerting infrastructure deployed
 - [ ] Backup and disaster recovery procedures implemented
 - [ ] Security scanning and vulnerability management in place
