Research notes on AI systems, TPUs, LLM inference, distributed systems, and federated learning. Most of it comes from experiments I actually ran and papers I actually read, not second-hand summaries.
Featured
- LLM Inference on TPU v6e-4: Benchmarking small dense, large MoE, and large dense models on one TPU v6e-4 host
- Is TPU really 4x cheaper than GPU? Checking the real cost gap between TPU and GPU from a TCO angle