CS267 is designed to teach students how to program parallel computers to efficiently solve challenging problems in science and engineering, where high-performance computers or clusters are required either to perform complex simulations or to analyze enormous datasets.

But the course also addresses a broader issue: since the mid-2000s, not only are the fastest computers parallel, but nearly all computers are parallel, because the physics of semiconductor manufacturing no longer lets conventional sequential processors get faster year after year, as they had historically. So any program that needs to run faster will have to become a parallel program. Processor designs are also shifting toward larger numbers of smaller, simpler cores and more complex memory architectures in an effort to continue boosting performance. For background on these trends, see the Berkeley View report. These changes are not limited to science and engineering but affect the entire computing industry, which has long depended on selling new computers by running users' programs faster without the users having to reprogram them. Today, computers from mobile phones and embedded devices to supercomputers and cloud data centers are parallel, and programmers faced with performance goals for software need to understand how to take advantage of the various types of parallelism.

Students in CS267 will gain the skills to use some of the best existing parallel programming tools, will learn how to analyze and tune programs for performance, and will get an overview of parallel architectures, algorithms, and applications, as well as a number of open research questions.
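As a taste of the shared-memory programming covered in the course (see Lecture 7 below), here is a minimal sketch assuming OpenMP, one such widely used tool; the exact toolset may vary by offering, and this example is illustrative rather than course-provided code:

```c
// Minimal OpenMP sketch: a parallel dot product.
// Build with an OpenMP-capable compiler, e.g. gcc -fopenmp dot.c -o dot
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

int main(void) {
    const int n = 10000000;
    double *x = malloc(n * sizeof(double));
    double *y = malloc(n * sizeof(double));
    for (int i = 0; i < n; i++) { x[i] = 1.0; y[i] = 2.0; }

    double sum = 0.0;
    double t0 = omp_get_wtime();
    // Each thread accumulates a private partial sum; the reduction
    // clause combines them safely, avoiding a data race on sum.
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += x[i] * y[i];
    double t1 = omp_get_wtime();

    printf("dot = %f, time = %f s, max threads = %d\n",
           sum, t1 - t0, omp_get_max_threads());
    free(x);
    free(y);
    return 0;
}
```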

Course Curriculum

  • Lecture 1: Introduction
  • Lecture 2: Processors, Memories, Roofline and MatMul
  • Lecture 3: MatMul (cont'd) + Parallel Architectures (see the blocked-MatMul sketch after this list)
  • Lecture 4: NERSC, Cori, Knights Landing, and Other Matters
  • Lecture 5: Sources of Parallelism and Locality (Part 1)
  • Lecture 6: Sources of Parallelism and Locality (Part 2)
  • Lecture 7: Shared Memory Programming
  • Lecture 8: Cloud Computing and Big Data Processing
  • Lecture 9a: Data Parallelism and Tricks with Trees
  • Lecture 9b: Distributed Memory Machines and Programming
  • Lecture 10: Partitioned Global Address Space Languages
  • Lecture 11: An Introduction to CUDA/OpenCL and GPUs
  • Lecture 12: Dense Linear Algebra
  • Lecture 13: Sorting and Searching
  • Lecture 14: Autotuning and Sparse-Matrix-Vector-Multiplication
  • Lecture 15: Graph Partitioning
  • Lecture 16: Structured Grids
  • Lecture 17: Parallel Graph Algorithms
  • Lecture 18a: Parallel Machine Learning
  • Lecture 19: Fast Fourier Transform
  • Lecture 21: Scientific Software Ecosystems
  • Lecture 24: Sparse Linear Algebra
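Lectures 2 and 3 use matrix multiplication as a running example of tuning for the memory hierarchy. The sketch below is a hypothetical illustration of the classic cache-blocking (tiling) idea, not code from the course; the function name matmul_blocked and the tile size BLOCK = 64 are assumptions, and BLOCK would in practice be tuned so that three tiles fit in cache:

```c
// Hypothetical sketch of cache blocking for C = C + A*B with square
// n x n matrices in row-major storage. Tiles of the three matrices
// stay in cache across the inner loops, improving data reuse.
#include <stdio.h>
#include <stdlib.h>

#define BLOCK 64  /* assumed tile size; tune for the target cache */

static void matmul_blocked(int n, const double *A, const double *B, double *C) {
    for (int ii = 0; ii < n; ii += BLOCK)
        for (int jj = 0; jj < n; jj += BLOCK)
            for (int kk = 0; kk < n; kk += BLOCK)
                // Multiply one pair of BLOCK x BLOCK tiles.
                for (int i = ii; i < ii + BLOCK && i < n; i++)
                    for (int k = kk; k < kk + BLOCK && k < n; k++) {
                        double a = A[i * n + k];
                        for (int j = jj; j < jj + BLOCK && j < n; j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
}

int main(void) {
    int n = 512;
    double *A = calloc((size_t)n * n, sizeof(double));
    double *B = calloc((size_t)n * n, sizeof(double));
    double *C = calloc((size_t)n * n, sizeof(double));
    for (int i = 0; i < n * n; i++) { A[i] = 1.0; B[i] = 1.0; }
    matmul_blocked(n, A, B, C);
    // With all-ones inputs, every entry of C should equal n.
    printf("C[0][0] = %f (expected %d)\n", C[0], n);
    free(A);
    free(B);
    free(C);
    return 0;
}
```

The design point this illustrates is the one the roofline model makes precise: blocking raises arithmetic intensity (flops per byte moved from memory), which is what moves a kernel from memory-bound toward compute-bound performance.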