Making big leaps with small models: What are small language models and super tiny language models?

 

Did you know that small language models and super tiny language models can produce richer, more specific outputs that increasingly outperform larger models across various benchmarks? SRI policy researcher Jamie A. Sandhu writes about small models making big impacts in the field of AI.


Small language models (SLMs) and super tiny language models (STLMs) are beginning to make a big impact in the field of artificial intelligence (AI). Despite the potential of these technologies, they have yet to become part of the broader conversation around AI—in fact, at the time of writing, there is no existing Wikipedia page for them. 

After our recent article explaining large language models (LLMs) and how they work, this piece shifts focus to exploring SLMs and STLMs, laying the foundation for understanding their role in creating a safer, more sustainable AI ecosystem that benefits everyone.

Characteristics and benefits of “small” and “super tiny” models: Is bigger always better?

LLMs, SLMs, and STLMs perform similar functions in powering language-based AI applications. Their underlying mechanics are designed to synthetically generate content—usually text, but more recently audio—by identifying patterns in the data and making predictions.

As their names imply, the primary distinction is size, which significantly impacts their adoption and usage. To understand how these technologies work, we first need to unpack the acronyms. What exactly do we mean by “large” in LLMs, “small” in SLMs, and “super tiny” in STLMs? 

This difference in size is largely determined by the number of parameters within each model. Parameters are the elements that allow models to learn from training data and make predictions. For instance, LLMs like Meta’s Llama 3 boast an impressive 400 billion parameters, while OpenAI’s GPT-4 is estimated to have a staggering 1.8 trillion. In contrast, SLMs developed by Apple and Microsoft range from 3 to 14 billion parameters, and STLMs, still in development by researchers, contain between 10 and 100 million.


Learned automatically during training, parameters shape how the models process inputs and generate outputs. Think of parameters like camera settings, which photographers adjust to capture optimal photos in different environments and for different objects and persons; similarly, language models adjust their parameters for optimal text and audio predictions.

The more parameters these models have, the more detailed and specific the content they can create, and the broader range of topics they can cover. However, this comes at a cost: they need significantly larger amounts of data to train them effectively. So, LLMs are extraordinarily costly to develop and predominantly controlled by a few large companies.
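To make these scale differences concrete, the short Python sketch below estimates parameter counts from a model’s width, depth, and vocabulary size, using a simplified version of the standard transformer accounting (embedding table plus attention and feed-forward weights per layer). The three configurations are illustrative assumptions spanning the STLM-to-LLM range, not the actual specifications of any named model.

```python
# Back-of-the-envelope parameter counting for a transformer-style model.
# Simplified: ignores biases, layer norms, and output heads.

def transformer_params(d_model, n_layers, vocab_size, ffn_mult=4):
    """Approximate parameter count: embeddings + per-layer attention/FFN weights."""
    embed = vocab_size * d_model              # token embedding table
    attn = 4 * d_model * d_model              # Q, K, V, and output projections
    ffn = 2 * d_model * (ffn_mult * d_model)  # two feed-forward matrices
    return embed + n_layers * (attn + ffn)

# Hypothetical configurations for illustration only:
for name, d, layers, vocab in [
    ("STLM-ish", 256, 4, 8_000),
    ("SLM-ish", 2_048, 24, 32_000),
    ("LLM-ish", 8_192, 80, 128_000),
]:
    print(f"{name}: ~{transformer_params(d, layers, vocab):,} parameters")
```

Even this rough arithmetic shows why cost grows so quickly: widening and deepening the network multiplies the weight matrices, and every additional parameter must be trained on correspondingly more data.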

The emergence of SLMs and STLMs, however, presents new opportunities due to their more modest requirements. According to VentureBeat, these smaller models operate on smaller datasets with fewer parameters, reducing the need for powerful hardware, extensive computational resources, and cloud-based data centres. This not only translates to a more cost-effective and scalable solution, but also reduces the environmental impact of AI development, making it potentially accessible to a broader range of businesses and organizations while promoting more sustainable and socially responsible technology use.

You might be wondering: if LLMs are so impressive due to the massive quantity of data and parameters they have, wouldn’t SLMs and STLMs be less efficient or less accurate due to their smaller scale?

In fact, smaller models can be more efficient in specific, targeted applications, and this efficiency stems precisely from their smaller scale and narrower focus. This makes them a practical choice when the vast resources an LLM requires aren't available or necessary.

In other words, while an LLM is capable of performing all the functions of an SLM or STLM in a more integrated and perhaps advanced manner, SLMs and STLMs excel when they are designed with a specific focus and specialized for particular functions. 

For instance, a financial authority could deploy an STLM for fraud detection in specific transactions, without requiring extensive computational power. In education, SLMs might provide personalized learning experiences and real-time feedback to students. And, although not exclusively, the reduced resource requirements could open new avenues for non-profit organizations and local businesses to adopt AI tools—a current challenge highlighted by the Dais and Canadian Chamber of Commerce, among other public policy organizations.

Mechanics and methods: How do SLMs and STLMs work?

First, the capabilities of all language models—whether large or small—are fundamentally constrained by the volume and type of data they’re trained on. The content these models generate is essentially a rearrangement or combination of pre-existing elements derived from the data. Because SLMs and STLMs are designed to carry out specific functions, the size of their datasets is optimized to serve those roles. In contrast, the aim of an LLM is to capture more comprehensive language patterns, nuances, and knowledge, requiring much larger datasets.

While LLMs can do incredible things by analyzing vast and diverse datasets, SLMs and STLMs have their own distinct advantages—and these are influencing further research. For example, an LLM might analyze countless online articles and books to refine its understanding of how multiple languages (human and computer) are constructed for different topics and genres. In contrast, an SLM might be trained on a specific dataset, such as medical journals, to provide specialized language processing for healthcare applications. And an STLM might be trained similarly but on an even more compact dataset, as demonstrated by Microsoft researchers who used a very small model to generate millions of children's stories from a discrete dataset of 3,000 words. Research on these smaller models is beginning to identify that they are not only more resource-efficient and interpretable (for more on explainability, see here) but also allow for focused, specialized applications, making them valuable tools in areas where precision and efficiency are key. 

Size is also crucial when it comes to the number of parameters these models use. As mentioned earlier, parameters are what models use to learn from their training data and make predictions.

While increasing the number of parameters typically enhances the performance of language models, researchers are now demonstrating how certain processing techniques—such as quantization, pruning, and knowledge distillation—can compensate for fewer parameters and meet the needs of resource-constrained models. These techniques, explained in further detail by experts at the Center for Security and Emerging Technology, essentially enhance efficiency and maintain high performance despite the smaller scale of SLMs and STLMs. This sets emerging SLMs and STLMs apart from earlier models of similar size and ensures that these models remain computationally effective, practical, and cost-efficient for various specialized applications.
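To give a flavour of one of these techniques, the Python sketch below shows a toy version of post-training quantization: floating-point weights are mapped to 8-bit integers with a single scale factor, shrinking memory roughly fourfold at a small cost in precision. Real quantization schemes (per-channel scales, zero points, calibration data) are considerably more involved; this is a simplified, assumption-laden illustration, not how any particular model is quantized.

```python
# Minimal post-training quantization sketch: floats -> int8 + scale.

def quantize(weights, bits=8):
    """Map float weights to signed integers plus a per-tensor scale factor."""
    qmax = 2 ** (bits - 1) - 1                      # e.g. 127 for int8
    scale = max(abs(w) for w in weights) / qmax     # largest weight maps to qmax
    return [round(w / scale) for w in weights], scale

def dequantize(q_weights, scale):
    """Recover approximate float weights from the integer representation."""
    return [q * scale for q in q_weights]

weights = [0.42, -1.37, 0.05, 0.91, -0.64]          # made-up example weights
q, scale = quantize(weights)
approx = dequantize(q, scale)
for w, a in zip(weights, approx):
    print(f"{w:+.3f} -> {a:+.3f} (error {abs(w - a):.4f})")
```

The rounding error is bounded by half the scale factor, which is why a well-quantized model loses little accuracy while each weight drops from 32 bits to 8.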

Recent examples show that when smaller and finely-tuned datasets are combined with advanced techniques, SLMs and STLMs can produce richer, more specific outputs that increasingly outperform larger models across various benchmarks in language, reasoning, coding, and math.

In specialized areas, this method proves particularly valuable, offering more purposeful solutions compared to the complexity and broader scope of LLMs. For example, an SLM might power mental health AI chatbots or provide personalized audio tours in museums and art galleries. In another vein, an STLM could be integrated into devices and applications to support emergency responders during natural disasters or be tailored to understand non-standard speech patterns, thereby empowering people with speech impairments.

Small models, big impact

SLMs and STLMs are showing their potential to drive us towards a safer and more sustainable AI-enabled economy. Yet they also underscore the need for policymakers to be wary of upcoming challenges.

The reduced technical requirements of these smaller models are improving some aspects of AI safety. For instance, SLMs and STLMs can offer enhanced privacy and security because they present fewer technical vulnerabilities. As underscored by early adopters, their lower computational needs allow for local processing on devices or on-premises servers; this can improve data security and oversight by companies and organizations, and reduce opportunities for nefarious actors to undermine systems. This could be particularly valuable for critical sectors where AI adoption lags due to extensive risks around operational integrity and security, such as nuclear energy or public services.

In addition, researchers are discovering that language models trained on carefully selected small datasets are less likely to generate false or misleading information (commonly referred to as “hallucinations,” explained in a recent SRI Seminar Series talk). This is particularly valuable as AI is integrated into critical sectors, services, and applications that could potentially cause widespread harm. According to experts, it also reduces risk exposure for companies and organizations looking to adopt AI technologies without inheriting the potential risks associated with larger, more unpredictable language models.

This approach is particularly beneficial for supporting AI regulation. In healthcare, for instance, AI-enabled medical devices supported by SLMs could allow regulators to conduct technical audits more effectively. This contrasts with LLMs, which face ongoing auditing challenges due to significant resource requirements and model complexity. Additionally, by streamlining regulatory processes, these technologies may also support RegTech tools in the future, where SLMs and STLMs are used to augment regulatory and compliance efforts for more efficient outcomes.

The potential for small models to make advanced AI more accessible to a broader range of organizations and communities is becoming increasingly evident. However, this accessibility could also heighten potential risks, including unforeseen safety issues, misuse, and more widespread job displacement. To manage these risks effectively, comprehensive oversight and consideration of regulatory innovations, such as the AI registries advocated by Gillian K. Hadfield and colleagues, are essential. Although governance research on SLMs and STLMs is still in its early stages, these measures could deepen critical insights into the technology’s potential, while assessing small models alongside larger ones to quantify risks, identify emerging threats, and support the development of a safer, more sustainable AI ecosystem.

The death of LLMs?

It has been suggested that smaller language models may mark the end of larger ones. Whether this is true remains to be seen. However, given the advantages of SLMs and STLMs, it's clear that many organizations will make the switch. As Clem Delangue, CEO and co-founder of Hugging Face, wrote regarding SLMs: “You don't need a million-dollar Formula 1 car to get to work every day, and you don't need a banking customer success chatbot to tell you the meaning of life!” At the same time, proposed legislation targeting large, resource-intensive models may encourage a shift toward smaller, more efficient alternatives.

Nevertheless, the emergence of SLMs and STLMs illustrates the rapid pace of innovation in the AI field, highlighting the importance of educating society, decision-makers, and users about this transformative technology. As AI increasingly becomes part of our daily lives, it could be argued that ensuring AI safety is contingent on first ensuring widespread factual knowledge and understanding of the very mechanisms that put us at risk.


About the author

Jamie A. Sandhu is a policy researcher at the Schwartz Reisman Institute for Technology and Society at the University of Toronto. With several years of experience, including work at the United Nations, various European organizations, and the Government of Canada, he specializes in geopolitics, international security, and both technology governance and the use of technology to enhance governance processes. Jamie is driven to shape policy and regulation that balances industry needs, institutional integrity, socioeconomic mechanisms, and societal well-being. His dedication has earned him a track record of guiding decision-makers in tackling cross-sector socio-economic challenges arising from technological advancements and leading efforts to bridge knowledge gaps among stakeholders to achieve shared goals and common understanding. His expertise is supported by a BA in international relations from the University of British Columbia, complemented by an MSc in politics and technology from the Technical University of Munich. Jamie’s current research interests revolve around international cooperation on AI and advancing AI safety through a socio-technical approach to AI governance.

