L-GreCo: A Framework for Layerwise Adaptive Gradient Compression
Published in 2022
In this paper, we address optimal gradient compression in distributed training of neural networks. Our proposed algorithm, called L-GreCo, uses dynamic programming to find an optimal layer-wise allocation of compression parameters, trading off compression ratio against accumulated compression error. L-GreCo preserves model accuracy while providing training-time speedups under different compression schemes, across multiple tasks and architectures. We are currently working toward a submission to MLSys 2023.
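To give a flavor of the layer-wise selection step, here is a minimal sketch of a dynamic program that picks one compression level per layer so as to minimize total compressed size under a total error budget. The function name `choose_levels`, the discretized error units, and the exact objective are illustrative assumptions, not the paper's precise formulation.

```python
# Hypothetical sketch: per-layer compression selection via dynamic programming.
# Each layer offers candidate levels (compressed_size, error_units); we minimize
# total size subject to a global error budget (errors discretized to integers).

from typing import List, Tuple

def choose_levels(
    layers: List[List[Tuple[float, int]]],  # per layer: [(size, error_units), ...]
    error_budget: int,                      # total allowed error, in error units
) -> Tuple[float, List[int]]:
    """Return (minimal total size, chosen level index for each layer)."""
    INF = float("inf")
    # dp[e] = minimal total size over layers processed so far,
    # using at most e error units in total.
    dp = [0.0] * (error_budget + 1)
    choices: List[List[int]] = []  # choices[i][e] = level picked for layer i

    for options in layers:
        new_dp = [INF] * (error_budget + 1)
        picks = [-1] * (error_budget + 1)
        for e in range(error_budget + 1):
            for k, (size, err) in enumerate(options):
                if err <= e and dp[e - err] + size < new_dp[e]:
                    new_dp[e] = dp[e - err] + size
                    picks[e] = k
        dp = new_dp
        choices.append(picks)

    # Backtrack from the full budget to recover the per-layer choices.
    e, levels = error_budget, []
    for i in range(len(layers) - 1, -1, -1):
        k = choices[i][e]
        assert k >= 0, "error budget too small for this layer's cheapest level"
        levels.append(k)
        e -= layers[i][k][1]
    levels.reverse()
    return dp[error_budget], levels
```

The table has `O(L * B)` entries for `L` layers and budget `B`, and each entry scans the candidate levels, so the search stays cheap relative to training itself; maximizing compression under an error constraint is equivalent to this minimization.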