A deep dive into the architectural differences that matter most when programming modern ML accelerators. From Nvidia's tensor cores and TMEM to AMD's chiplet topology to Google's systolic arrays, this post explores how hardware-specific design choices shape kernel optimization strategies.
Evaluating LLM negotiation and planning through head-to-head fantasy football leagues. Ten models compete across full NFL seasons, testing long-horizon decision-making and multi-agent interaction.
Exploring the evolution of hardware-software co-design from the 1960s to today, and the role of generative AI in autonomous performance engineering. This journey through computing history reveals how hardware awareness has shaped software optimization, from early compiler innovations to modern learned optimization techniques.