PolarGrad (Polar Gradient methods; Lau et al., 2025) is a class of matrix-gradient optimizers based on the concept of gradient-anisotropy preconditioning in optimization. It is closely related to Muon ...
Clone this repository and change into the project directory. If a CUDA-enabled GPU is available, we strongly recommend installing the GPU version of PyTorch and running the notebooks on GPU. GPU ...
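Before running the notebooks, it can help to confirm whether PyTorch actually sees a CUDA device. The following is a minimal sketch, not part of the repository: the helper name `pick_device` is hypothetical, and the only PyTorch call it relies on is `torch.cuda.is_available()`. It falls back to the CPU when PyTorch is missing or no GPU is visible.

```python
# Sketch: choose a device for running the notebooks.
# `pick_device` is a hypothetical helper, not part of the PolarGrad repository.
try:
    import torch  # assumed to be installed as part of the setup
except ImportError:
    torch = None  # allows the check to run even without PyTorch


def pick_device() -> str:
    """Return "cuda" when PyTorch can see a CUDA GPU, otherwise "cpu"."""
    if torch is not None and torch.cuda.is_available():
        return "cuda"
    return "cpu"


if __name__ == "__main__":
    print(f"Using device: {pick_device()}")
```

If this reports `cpu` on a machine with a CUDA-enabled GPU, the installed PyTorch build is probably the CPU-only version.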