Abolfazl Younesi, Abbas Shabrang Maryan, Elyas Oustad, Zahra Najafabadi Samani, Mohsen Ansari, Thomas Fahringer
Splitwise, a Lyapunov-assisted deep reinforcement learning (DRL) framework, optimizes LLM deployment by adaptively partitioning models across edge devices and the cloud, significantly reducing latency and energy consumption compared to existing methods.
Deploying large language models (LLMs) on devices like smartphones or small computers is difficult because such devices often lack sufficient compute power and memory. Offloading to the cloud can help, but it can be slow and expensive. The new system, called Splitwise, uses Lyapunov-assisted deep reinforcement learning to split the model's workload between the device and the cloud, making inference faster and more energy efficient. The system adapts to changes in network speed and can recover from connection problems. Experiments show that it outperforms older methods in both latency and energy consumption.
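To illustrate the kind of edge-cloud partitioning decision described above, here is a minimal sketch of a Lyapunov-style drift-plus-penalty split-point selector. All names, parameters, and the cost model are illustrative assumptions, not the paper's actual algorithm or API: the DRL agent in Splitwise would learn this policy rather than enumerate candidates as done here.

```python
def choose_split(num_layers, edge_ms, cloud_ms, link_ms, edge_mj, q, v=1.0):
    """Pick how many layers (k) to run on the edge device before offloading
    the rest to the cloud, by minimizing a drift-plus-penalty score:
        score(k) = v * energy(k) + q * latency(k)
    where q plays the role of a Lyapunov queue backlog weighting latency
    and v trades off energy cost. (Hypothetical cost model.)
    """
    best_k, best_score = 0, float("inf")
    for k in range(num_layers + 1):  # k layers on the edge, rest in the cloud
        latency = edge_ms * k + cloud_ms * (num_layers - k)
        if k < num_layers:
            # Intermediate activations must cross the network link.
            latency += link_ms
        energy = edge_mj * k  # device-side energy grows with on-device layers
        score = v * energy + q * latency
        if score < best_score:
            best_k, best_score = k, score
    return best_k
```

With an expensive link, running everything on the device wins despite the energy cost; with a cheap link and a fast cloud, full offloading wins instead. A learned policy generalizes this trade-off to time-varying bandwidth without enumerating every split.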