Last updated: November 11, 2023


Unlimited Duration


This course includes:

Unlimited Duration

Badge on Completion

Certificate of completion


CS267 is designed to teach students how to program parallel computers to efficiently solve challenging problems in science and engineering, where high performance computers or clusters are required either to perform complex simulations or to analyze enormous datasets.

But the course also addresses a broader issue: since the mid-2000s, not only are the fastest computers parallel, but nearly all computers are parallel, because the physics of semiconductor manufacturing no longer lets conventional sequential processors get faster year after year, as they had historically. So all programs that need to run faster will have to become parallel programs. The types of processors are also changing, toward smaller, simpler processors and more complex memory architectures, in an effort to continue boosting performance. For background on these trends, see the Berkeley View report. These changes are not limited to science and engineering but affect the entire computing industry, which has depended on selling new computers by running their users' programs faster without the users having to reprogram them. Today, computers from mobile phones and embedded devices to supercomputers and cloud data centers are parallel, and programmers faced with performance goals for software need to understand how to take advantage of various types of parallelism.

Students in CS267 will gain the skills to use some of the best existing parallel programming tools, will learn how to analyze and tune for performance, and will get an overview of parallel architectures, algorithms, and applications, as well as a number of open research questions.

Course Curriculum

  • Lecture 1: Introduction
  • Lecture 2: Processors, Memories, Roofline and MatMul
  • Lecture 3: MatMul (con’t) + Parallel Architectures
  • Lecture 4: NERSC, Cori, Knights Landing, and Other Matters
  • Lecture 5: Sources of Parallelism and Locality (Part 1)
  • Lecture 6: Sources of Parallelism and Locality (Part 2)
  • Lecture 7: Shared Memory Programming
  • Lecture 8: Cloud Computing and Big Data Processing
  • Lecture 9a: Data Parallelism and Tricks with Trees
  • Lecture 9b: Distributed Memory Machines and Programming
  • Lecture 10: Partitioned Global Address Space Languages
  • Lecture 11: An Introduction to CUDA/OpenCL and GPUs
  • Lecture 12: Dense Linear Algebra
  • Lecture 13: Sorting and Searching
  • Lecture 14: Autotuning and Sparse-Matrix-Vector-Multiplication
  • Lecture 15: Graph Partitioning
  • Lecture 16: Structured Grids
  • Lecture 17: Parallel Graph Algorithms
  • Lecture 18a: Parallel Machine Learning
  • Lecture 19: Fast Fourier Transform
  • Lecture 21: Scientific Software Ecosystems
  • Lecture 24: Sparse Linear Algebra

About the instructor

We are an educational and skills marketplace that supports skills enhancement and free, equal education for millions of learners across the globe. We add new courses and training every day. We welcome learners of all ages and backgrounds. There is so much available to learn and to share.