Parallel Restarted SGD with Faster Convergence and Less Communication: Demystifying Why Model Averaging Works for Deep Learning. Hao Yu, Sen Yang, Shenghuo Zhu, 2019

Article (conference proceedings)