
NVIDIA Advanced CUDA Programming Course Plan

 

1. From GPU to GPGPU
  • Performance and parallelism
  • GPU evolution
  • Parallel systems: multicore and clustering

2. CUDA programming model

  • Key principles
  • Threads and blocks
  • Language extensions
  • Attributes
  • Builtin types and variables
  • Kernel invocation operator
  • CUDA runtime API
  • Asynchronous execution
  • Handling runtime errors in CUDA
  • Querying GPU capabilities
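The topics above (attributes, built-in variables, the kernel invocation operator, asynchronous execution, and runtime error handling) can be illustrated with a minimal vector-add sketch; array sizes and launch configuration here are arbitrary choices for illustration:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// The __global__ attribute marks a kernel; the built-in variables
// blockIdx, blockDim and threadIdx identify each thread.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    float *a, *b, *c;
    cudaMalloc(&a, n * sizeof(float));
    cudaMalloc(&b, n * sizeof(float));
    cudaMalloc(&c, n * sizeof(float));

    // Kernel invocation operator <<<blocks, threads>>>; the launch is
    // asynchronous with respect to the host thread.
    vecAdd<<<(n + 255) / 256, 256>>>(a, b, c, n);

    // Runtime error handling: check for launch errors explicitly.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess)
        fprintf(stderr, "launch failed: %s\n", cudaGetErrorString(err));

    cudaDeviceSynchronize();  // wait for the asynchronous launch to finish
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```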

3. Memory hierarchy

  • Global memory
  • Example: matrix multiplication in global memory
  • Optimizing global memory usage
  • Block-shared memory
  • Example: matrix multiplication with shared memory
  • Shared memory access patterns
  • Constant memory
  • Texture memory
  • Unified virtual address space (UVA)
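A sketch of the shared-memory matrix-multiplication example from this section; for brevity it assumes the matrix dimension n is a multiple of the tile size and that the grid exactly covers the output:

```cuda
#define TILE 16

// Each block computes one TILE x TILE tile of C = A * B, staging tiles
// of A and B in block-shared memory so each global load is reused
// TILE times instead of once.
__global__ void matMulShared(const float *A, const float *B,
                             float *C, int n) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];
    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;
    for (int t = 0; t < n / TILE; ++t) {
        // Coalesced global loads into shared memory
        As[threadIdx.y][threadIdx.x] = A[row * n + t * TILE + threadIdx.x];
        Bs[threadIdx.y][threadIdx.x] = B[(t * TILE + threadIdx.y) * n + col];
        __syncthreads();          // tile fully loaded before use
        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();          // done with this tile before overwriting
    }
    C[row * n + col] = acc;
}
```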

4. Implementing basic data processing

  • Parallel reduction
  • Prefix sum (scan)
  • CUDA implementation
  • CUDPP implementation
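A minimal CUDA sketch of per-block parallel reduction (tree-based sum); it assumes a power-of-two block size, a launch with blockDim.x * sizeof(float) bytes of dynamic shared memory, and a second pass (or host loop) to combine the per-block partial sums:

```cuda
// Tree-based reduction within one block; each block writes one partial sum.
__global__ void reduceSum(const float *in, float *out, int n) {
    extern __shared__ float sdata[];
    unsigned tid = threadIdx.x;
    unsigned i = blockIdx.x * blockDim.x + tid;
    sdata[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();
    // Halve the number of active threads each step; sequential
    // addressing avoids shared-memory bank conflicts.
    for (unsigned s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) sdata[tid] += sdata[tid + s];
        __syncthreads();
    }
    if (tid == 0) out[blockIdx.x] = sdata[0];
}
```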

5. CUDA Libraries

  • CUBLAS
  • CUSPARSE
  • CUFFT
  • CURAND
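As an example of the library interfaces covered here, a hedged CUBLAS sketch: a single SGEMM call computing C = alpha·A·B + beta·C on square matrices (CUBLAS uses column-major storage; dA, dB, dC are assumed to be device pointers allocated elsewhere):

```cuda
#include <cublas_v2.h>

void sgemm(const float *dA, const float *dB, float *dC, int n) {
    cublasHandle_t handle;
    cublasCreate(&handle);
    const float alpha = 1.0f, beta = 0.0f;
    // m = n = k for square matrices; leading dimension is n throughout
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                n, n, n, &alpha, dA, n, dB, n, &beta, dC, n);
    cublasDestroy(handle);
}
```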

6. CUDA Fortran Overview

7. Using multiple GPUs

  • CUDA context
  • fork
  • MPI
  • POSIX threads
  • OpenMP
  • Boost.Thread
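Of the host-side threading options listed above, the OpenMP variant is the shortest to sketch: one host thread per device, each binding its own CUDA context via cudaSetDevice (the function name launchOnAllGpus and the omitted per-device work are placeholders):

```cuda
#include <omp.h>
#include <cuda_runtime.h>

// One host thread per GPU; each thread selects its device and then
// allocates memory and launches kernels independently.
void launchOnAllGpus() {
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);
    #pragma omp parallel num_threads(deviceCount)
    {
        int dev = omp_get_thread_num();
        cudaSetDevice(dev);          // bind this thread to device `dev`
        // ... per-device allocations and kernel launches go here ...
        cudaDeviceSynchronize();     // wait for this device's work
    }
}
```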

8. CUDA Streams

  • Example: concurrent kernel execution
  • Example: matrix multiplication
  • Example: Multi-GPU Async Copy
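A sketch of the copy/compute overlap pattern these examples build on: chunks of input alternate between two streams so transfers in one stream overlap kernels in the other. It assumes hIn/hOut are page-locked (cudaHostAlloc), dBuf holds two chunks, n divides evenly, and a kernel `process` is defined elsewhere:

```cuda
#include <cuda_runtime.h>

__global__ void process(float *d, int n);  // placeholder kernel

void pipeline(float *hIn, float *hOut, float *dBuf, int n, int chunks) {
    cudaStream_t s[2];
    cudaStreamCreate(&s[0]);
    cudaStreamCreate(&s[1]);
    int chunk = n / chunks;
    for (int c = 0; c < chunks; ++c) {
        cudaStream_t st = s[c % 2];          // alternate streams
        float *d = dBuf + (c % 2) * chunk;   // double-buffered device space
        // Async H2D copy, kernel, and D2H copy are ordered within the
        // stream but overlap with work queued in the other stream.
        cudaMemcpyAsync(d, hIn + c * chunk, chunk * sizeof(float),
                        cudaMemcpyHostToDevice, st);
        process<<<chunk / 256, 256, 0, st>>>(d, chunk);
        cudaMemcpyAsync(hOut + c * chunk, d, chunk * sizeof(float),
                        cudaMemcpyDeviceToHost, st);
    }
    cudaDeviceSynchronize();
    cudaStreamDestroy(s[0]);
    cudaStreamDestroy(s[1]);
}
```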

9. Debugging

  • Principles and terminology
  • gdb
  • cuda-gdb
  • Nsight
  • CUDA (Visual) Profiler
  • cuda-memcheck

10. OpenCL Overview

  • Simple example
  • OpenCL host API
  • Developing and deploying OpenCL kernels
  • Comparison with CUDA

11. Optimization Techniques

 

Hands-on exercises

  • Parallel sine function computation
  • Matrix-matrix multiply with shared memory