If you're looking to build AI agents into your workflows, don't waste the valuable compute power of large language models on these systems.
That's the opinion of a group of Nvidia researchers, who recently made the case for "small language models," or SLMs, noting that while LLMs have been the engines of generative AI up until now, they're probably overkill for supporting more focused AI agents. Instead, SLMs may present a smarter approach.
A surge in agentic AI systems will bring a host of applications that use language models to carry out a few specialized tasks over and over, without much variation, the Nvidia team, led by Peter Belcak, said in the report.
SLMs "are sufficiently powerful, inherently more suitable, and necessarily more economical for many invocations in agentic systems," the report said. They could play an important role in the future of agentic AI.
In situations "where general-purpose conversational abilities are essential, heterogeneous agentic systems -- agents invoking multiple different models -- are the natural choice," the researchers continued.
SLMs could also be instrumental in lowering AI costs. Using LLMs for AI agents can be expensive, and their broad, general-purpose capabilities often exceed what agentic use cases functionally require.
"Insisting on LLMs for all such tasks reflects a misallocation of computational resources -one that is economically inefficient and environmentally unsustainable at scale," the report said.
In many current deployments, AI agents communicate with chosen LLM API endpoints by making requests to centralized cloud infrastructure that hosts these models, the report said. Such LLM API endpoints "are specifically designed to serve a large volume of diverse requests using one generalist LLM."
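For illustration, here is a minimal sketch of that prevailing pattern, assuming an OpenAI-style chat-completions endpoint; the URL, API key, and model name are placeholders, not details from the report:

```python
import requests

# Hypothetical OpenAI-style chat-completions endpoint; URL, key, and
# model name below are placeholders, not taken from the Nvidia report.
API_URL = "https://api.example.com/v1/chat/completions"
API_KEY = "sk-..."  # placeholder credential

def call_llm(prompt: str) -> str:
    """Send a single agent request to a centralized, generalist LLM."""
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": "generalist-llm",  # one large model serves every request
            "messages": [{"role": "user", "content": prompt}],
        },
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]
```

Every agent task, no matter how narrow or repetitive, flows through the same large hosted model in this setup.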
This LLM-based operational model is deeply ingrained. And there's a money angle at work, too. The report estimated a $63 billion market for LLM API and hosting cloud infrastructure.
"It is assumed that this operational model will remain the cornerstone of the industry without any substantial alterations, and that the large initial investment will deliver returns comparable to traditional software and internet solutions within three to four years," the report said.
As organizations roll out AI agents across a broad range of functions, they will recognize that LLMs are too much for these systems, said Virginia Dignum, professor of responsible AI at Umea University and chair of the ACM Technology Policy Council, in a separate recent discussion. In most cases, "the idea proposed as agentic AI consists of building an active interface on top of a large language model," she said.
There are issues with this view of agentic AI built on LLMs. First, such agents could be wasteful.
"LLMs are trained over huge amounts of data and computation to be able to deal with broad language issues. An agent ... is usually meant to deal with specific questions. You don't expect your realtor to discuss philosophy, or your travel agent to be able to produce art," she said. "I see a potential huge waste of data and compute to build such agents on top of LLMs."
Multi-agent collaboration, in Dignum's view, is the most effective route to getting results from agentic AI. "What is key is applications based on collaboration between many smaller agents that use less data and training, but can achieve more by combining with other agents," Dignum explained. "A distributed approach -- less computationally heavy, more inclusive, and more able to address the difference between contexts and cultures."
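As a rough illustration of that distributed pattern, the sketch below chains several small, specialized agents through a simple coordinator. The roles, models, and helper functions are hypothetical, not drawn from Dignum's remarks:

```python
from typing import Callable

def make_agent(role: str) -> Callable[[str], str]:
    """Each agent wraps a small model specialized for one narrow role."""
    def agent(task: str) -> str:
        # Stand-in for an SLM inference call scoped to this role.
        return f"[{role}] {task}"
    return agent

# Illustrative roles only; a real system would define its own.
extract = make_agent("extract-fields")
plan = make_agent("plan-steps")
execute = make_agent("execute-step")

def run_pipeline(task: str) -> str:
    # The coordinator combines narrow agents instead of relying on
    # one large generalist model for the whole workflow.
    return execute(plan(extract(task)))

print(run_pipeline("book a flight to Oslo next Tuesday"))
```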
The Nvidia team offered the following suggestions for deploying SLMs:
Consider costs: "Organizations should consider adopting small language models for agentic applications to reduce latency, energy consumption, and infrastructure costs, particularly in scenarios where real-time or on-device inference is required," they stated.
Consider modular design: "Leverage SLMs for routine, narrow tasks and reserve LLMs for more complex reasoning -- thereby improving efficiency and maintainability." (A minimal routing sketch of this idea follows the list.)
Consider specialization: "Take advantage of the agility of SLMs by fine-tuning them for specific tasks, enabling faster iteration cycles and easier adaptation to evolving use cases and requirements."
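Here is a minimal sketch of the modular-design suggestion above, assuming a hypothetical call_model() helper and made-up model names: routine tasks go to a fine-tuned SLM, and everything else falls back to a generalist LLM.

```python
# Routine, narrow tasks that a fine-tuned SLM handles well (illustrative).
ROUTINE_TASKS = {"extract_date", "classify_intent", "fill_form_field"}

def call_model(model: str, prompt: str) -> str:
    """Stand-in for a real inference call (local SLM or hosted LLM API)."""
    return f"[{model}] response to: {prompt}"

def route(task: str, prompt: str) -> str:
    if task in ROUTINE_TASKS:
        # Narrow, repetitive work goes to a cheap, fast, specialized SLM.
        return call_model(f"slm-finetuned-{task}", prompt)
    # Open-ended reasoning falls back to the generalist LLM.
    return call_model("generalist-llm", prompt)

print(route("extract_date", "Invoice due by March 3, 2026"))
print(route("draft_negotiation_email", "Ask the vendor for a 10% discount"))
```

The routing rule here is a simple lookup; a production system would decide based on task complexity, confidence, or cost budgets.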
SLMs offer a number of advantages over LLMs for agents, the Nvidia team explained, including lower latency and reduced memory and compute requirements. Plus, they can be cheaper while still getting the job done.