Researchers at the University of Michigan have developed an open-source optimization framework called Zeus that can reduce the energy consumption of deep learning models by up to 75% without requiring any new hardware. The tool, presented at the 2023 USENIX Symposium on Networked Systems Design and Implementation, addresses the growing climate burden of artificial intelligence, a concern now that cloud computing already out-emits commercial aviation.
Deep learning models use multilayered artificial neural networks to tackle machine learning tasks by learning from massive datasets. They benefit greatly from the parallel-processing capabilities of graphics processing units (GPUs), which consume about 70% of the power that goes into training one of these models. Zeus reduces energy consumption using two software knobs: the GPU power limit and the deep learning model's batch size parameter. Zeus tunes these settings in real time to find the optimal tradeoff point, where energy usage is minimized with as little impact on training time as possible.
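To make the tradeoff concrete, here is a minimal sketch of how a (power limit, batch size) configuration might be scored against a weighted energy-versus-time cost and the cheapest one selected. This is an illustration of the idea described above, not Zeus's actual API; the cost weighting, the `max_power_w` scaling term, and the profiled numbers are all assumptions made for the example.

```python
def cost(energy_j, time_s, eta=0.5, max_power_w=300):
    # Weighted energy/time tradeoff: eta favors energy savings,
    # (1 - eta) favors speed. Scaling time by a reference power
    # (watts) puts both terms in joules so they can be added.
    return eta * energy_j + (1 - eta) * max_power_w * time_s

def best_config(measurements, eta=0.5, max_power_w=300):
    # measurements: {(power_limit_w, batch_size): (energy_j, time_s)}
    # Returns the knob setting with the lowest weighted cost.
    return min(
        measurements,
        key=lambda k: cost(*measurements[k], eta=eta, max_power_w=max_power_w),
    )

# Hypothetical profiling results for one recurring training job.
profiled = {
    (300, 128): (9000.0, 30.0),  # fastest, but power-hungry
    (200, 128): (7000.0, 34.0),  # balanced
    (150, 64):  (6500.0, 45.0),  # most energy-efficient, slowest
}
print(best_config(profiled))  # → (200, 128)
```

With an even 50/50 weighting, the middle configuration wins: it spends slightly more time than the fastest setting but saves enough energy to come out ahead. Shifting `eta` toward 1 would favor the low-power setting instead.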
The team visually demonstrated this tradeoff point by plotting every possible combination of the two parameters. Exhaustive profiling of that kind isn't practical for a given training job, but Zeus exploits the repetitive nature of machine learning training to come very close. Zeus is the first framework designed to plug into existing workflows for a variety of machine learning tasks and GPUs, reducing energy consumption without requiring any changes to hardware or datacenter infrastructure.
The team has also developed complementary software called Chase that reduces the carbon footprint even further. Chase prioritizes speed when low-carbon energy is available and chooses efficiency at the expense of speed during peak times, which are more likely to require ramping up carbon-intensive generation such as coal. "Our aim is to design and implement solutions that do not conflict with realistic constraints, while still reducing the carbon footprint of DNN training," said Zhenning Yang, a master's student in computer science and engineering.
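Chase's scheduling policy, as described above, can be sketched as a simple carbon-aware decision rule. The function name, the threshold value, and the mode labels below are hypothetical, chosen only to illustrate the speed-versus-efficiency switch; they are not taken from the Chase codebase.

```python
def choose_mode(carbon_intensity_g_per_kwh, threshold=200):
    # Illustrative policy: when grid carbon intensity is low (clean
    # energy is plentiful), favor training speed; during carbon-heavy
    # peak periods, favor energy efficiency instead. The 200 gCO2/kWh
    # threshold is an assumption for this sketch.
    if carbon_intensity_g_per_kwh < threshold:
        return "speed"
    return "efficiency"

print(choose_mode(120))  # → speed
print(choose_mode(450))  # → efficiency
```

A real scheduler would also weigh job deadlines and forecasted grid conditions, but the core idea is this single switch between the two operating points Zeus already knows how to reach.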
The study was supported in part by the National Science Foundation, VMware, the Kwanjeong Educational Foundation, and computing credits provided by CloudLab and Chameleon Cloud.