I just completed my first open-source commit: the sort! method for Julia’s official CUDA library, CUDA.jl. You can now sort arrays stored in GPU memory. It’s an implementation of Quicksort, which is a bit tricky to parallelize effectively. Here I’ll share a few of the insights that made my pull request come together.

This article balances code, algorithms, and hardware considerations like a whiteboard interview: lots of pictures and a little code when it helps. I’ve organized the content as follows:

  1. Overview of GPUs and CUDA.jl
  2. Parallelizing Quicksort for GPUs
  3. Using sort! for fun and profit

