Abstract: Near-memory processing (NMP), which places lightweight processing units near the DRAM memory, has been actively studied to speed up the execution of memory-intensive applications by reducing ...
A novel AI-acceleration paper presents a method to optimize sparse matrix multiplication for machine learning models, particularly focusing on structured sparsity. Structured sparsity involves a ...