Portfolio · Mark Saroufim

Recent Projects (2024-2025)

Project Popcorn - Training an LLM in public to generate efficient CUDA and Triton kernels, with 4 key workstreams: Data, Infra, Science and Language.
GPU MODE (fka CUDA MODE) - I cofounded this community with Andreas Kopf. Probably my most important project, a Discord community where people learn how GPUs work and then go on to ship real world projects.
torchao: PyTorch native quantization - I was a cofounder and tech lead; our goal was to make it trivial to explore and apply quantization algorithms.

NeurIPS Hacker Cup AI challenge - We opened up the popular Hacker Cup competition to AI for the first time ever because we need harder evals.
The PyTorch fast series - Christian Puhrsch and Horace He pitched me on getting SOTA fast inference working for LLMs, diffusion models and image models using native PyTorch. I was the TL for our benchmarking and evaluation efforts.
NeurIPS LLM efficiency challenge - I was one of the 2 lead organizers, along with Weiwei Yang, for one of the most popular NeurIPS 2023 competitions, where you had to finetune an LLM in 1 day on 1 GPU.
MLPerf Algorithmic Efficiency - I played a small role making sure the PyTorch baselines were good, but hey, maybe we finally dethroned Adam.
Launch of PyTorch 2.0 - I led the alpha-user readiness workstream and was the first person to ship compiled models both internally and externally. We also wrote a pretty nice ASPLOS paper.
torchserve - I was one of the lead maintainers of our inference server, which was used in prod by some massive workloads like Walmart Search. I touched most parts of it, and it’s where I learnt to respond to lots of issues and PRs, something I then got to practice some more on pytorch/examples.

TorchX Ray Scheduler - This is when I met Kiuk Chung and actually learnt how to write good code. We built a Ray scheduler for PyTorch that let you launch large jobs from a notebook, and it was used in production by some mega companies.
The Great Stagnation - The most popular thing I ever wrote. Unfortunately it didn’t age particularly well, but it was peak COVID and I was writing and livestreaming a lot.
Graph Neural Networks for healthcare - This is when I learnt I could just reach out to amazing people like Michael Bronstein, and I got 5x speedups on GNN runs.
Local Updates - Parallel Training of Deep Networks with Local Updates on the Graphcore IPU. It was a true pleasure working with Misha, Seth and Luke, who all went on to do great things.
The Robot Overlord Manual - I wrote a book about everything I learnt during my time at yuri.ai, mostly focused on applied ML, RL and robotics.
yuri.ai: Game AI using RL - It was truly magical for me seeing AI beat top humans at DOTA and StarCraft, so I wanted to do the same for all games. This was unfortunately my most painful and least successful project, but I learnt a lot about game development and online communities. One saving grace was that I became an amateur game developer, which meant a lot considering I’ve been a lifelong competitive gamer.

High Dimensional Geometry - I met two role models, Sanjoy Dasgupta and Charles Elkan, and got really deep into ML theory while working on accelerating nearest neighbor search.
