A new technical paper titled “SPAD: Specialized Prefill and Decode Hardware for Disaggregated LLM Inference” was published by researchers at Princeton University and University of Washington. “Large ...
A new technical paper titled “Prefill vs. Decode Bottlenecks: SRAM-Frequency Tradeoffs and the Memory-Bandwidth Ceiling” was published by researchers at Uppsala University. “Energy consumption ...