This project implements various sparse matrix computations in CUDA and C++. It includes conversion routines between sparse matrix formats and efficient CUDA kernels for Sparse Matrix-Vector ...
in which sparse_self_attention is an instance of SparseSelfAttention. This module computes attention context through sparse attention replacing underlying matrix multiplications and softmax with their ...
Abstract: Sparsification technology is crucial for deploying convolutional neural networks in resource-constrained environments. However, the efficiency of sparse models is hampered by irregular ...