CSE 599I - Spring 2017
Accelerated Computing - Programming GPUs
EEB 037, Mon/Wed 3:00 - 4:20
Instructor: Tanner Schmidt
Office Hours: Mon 4:30 - 6:00, CSE 674
Course Description
This course is an introduction to accelerated computing using graphics processing units (GPUs). We will focus on CUDA programming, but the concepts taught will apply to other GPU frameworks as well. The course will start by covering the CUDA syntax extensions and the CUDA runtime API, then move on to more advanced topics such as bandwidth optimization, memory access performance, and floating-point considerations. We will learn about common parallel computing patterns such as scans and reductions, and study use cases for GPU acceleration such as matrix multiplication and convolution.
Prerequisites
As CUDA is an extension of the C language, students taking this course should be familiar with C programming.
Prior knowledge of computer architecture concepts such as locality of reference will be useful but is not required.
Grading
Grades for this course will be based on a series of 3-5 programming assignments designed to allow students to apply GPU programming skills taught in the lectures.
Textbook (Optional)
Programming Massively Parallel Processors, Third Edition: A Hands-on Approach
David B. Kirk and Wen-mei W. Hwu.
I can provide students with a code for a 30% discount on the textbook from Elsevier.
Computing Resources
For the programming assignments, students will need access to a computer with a CUDA-compatible GPU. I can help arrange access to a remote CUDA-capable machine for students without local access.
Schedule and Slides (subject to change)
3 / 27 | Course Introduction | |
3 / 29 | Intro to CUDA C | axpy |
4 / 3 | CUDA parallelism model | |
4 / 5 | Memory and data locality; thread execution / computational efficiency | TiledMatrixMultiplication |
4 / 10 | Memory performance; stencil pattern | |
4 / 12 | Prefix sum pattern | |
4 / 17 | Histogram pattern | |
4 / 19 | Sparse matrix pattern | TiledMatrixMultiplication due |
4 / 24 | Merge sort pattern | Assignment 2 |
4 / 26 | Graph search pattern | |
5 / 1 | Advanced host / device interface; streams, events, and concurrency | |
5 / 3 | Dynamic parallelism / recursion | |
5 / 8 | Floating point considerations; intrinsic functions | Assignment 2 due; Final project |
5 / 10 | In-warp shuffles | |
5 / 15 | Multi-GPU programming | Final project proposal due |
5 / 17 | No class; go see the CSE 599g guest lecture on cuDNN instead (5/18, 1:00 - 2:00 pm, CSE 305) | |
5 / 22 | OpenCL / OpenACC | |
5 / 24 | Beyond CUDA | |
5 / 29 | Memorial Day (no class) | |
5 / 31 | No class / work on projects | |
6 / 1 | | Final project due |