Microsoft unveils Phi-3 family of compact language models

Author: Overseas News
Posted: 2024.05.13 21:58


Microsoft has announced the Phi-3 family of open small language models (SLMs), touting them as the most capable and cost-effective of their size available. The innovative training approach developed by Microsoft researchers has allowed the Phi-3 models to outperform larger models on language, coding, and math benchmarks.

“What we’re going to start to see is not a shift from large to small, but a shift from a singular category of models to a portfolio of models where customers get the ability to make a decision on what is the best model for their scenario,” said Sonali Yadav, Principal Product Manager for Generative AI at Microsoft.

The first Phi-3 model, Phi-3-mini at 3.8 billion parameters, is now publicly available in Azure AI Model Catalog, Hugging Face, Ollama, and as an NVIDIA NIM microservice. Despite its compact size, Phi-3-mini outperforms models twice its size. Additional Phi-3 models like Phi-3-small (7B parameters) and Phi-3-medium (14B parameters) will follow soon.

phi-3-mini: 3.8B model matching Mixtral 8x7B and GPT-3.5

Plus a 7B model that matches Llama 3 8B in many benchmarks.

Plus a 14B model. https://t.co/2h0xahzUUS
— Mira (@_Mira___Mira_) April 23, 2024

“Some customers may only need small models, some will need big models and many are going to want to combine both in a variety of ways,” said Luis Vargas, Microsoft VP of AI.

The key advantage of SLMs is that they are small enough to run on-device, which enables low-latency AI experiences without network connectivity. Potential use cases include smart sensors, cameras, farming equipment, and more. Keeping data on the device also benefits privacy.
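To make the on-device point concrete, here is a rough back-of-the-envelope sketch of how much memory the weights alone would need at the parameter counts quoted above. The 16-bit and 4-bit precisions are illustrative assumptions rather than figures from Microsoft, and the estimate ignores activations, caches, and runtime overhead.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for the model weights only, in gigabytes (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Parameter counts as quoted in the article.
PHI3_SIZES = {
    "Phi-3-mini": 3.8e9,
    "Phi-3-small": 7e9,
    "Phi-3-medium": 14e9,
}

for name, params in PHI3_SIZES.items():
    fp16 = weight_memory_gb(params, 16)  # assumed 16-bit weights
    int4 = weight_memory_gb(params, 4)   # assumed 4-bit quantised weights
    print(f"{name}: ~{fp16:.1f} GB at 16-bit, ~{int4:.1f} GB at 4-bit")
```

Under these assumptions, Phi-3-mini needs roughly 7.6 GB for 16-bit weights and about 1.9 GB at 4-bit, which is the range where phones and embedded hardware become plausible targets.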


Large language models (LLMs) excel at complex reasoning over vast datasets. That strength suits applications like drug discovery, where a model must understand interactions described across a large body of scientific literature. However, SLMs offer a compelling alternative for simpler tasks such as answering queries, summarisation, and content generation.

“Rather than chasing ever-larger models, Microsoft is developing tools with more carefully curated data and specialised training,” commented Victor Botev, CTO and Co-Founder of Iris.ai.

“This allows for improved performance and reasoning abilities without the massive computational costs of models with trillions of parameters. Fulfilling this promise would mean tearing down a huge adoption barrier for businesses looking for AI solutions.”

Breakthrough training technique

Microsoft’s leap in SLM quality came from an innovative approach to filtering and generating training data, inspired by children’s bedtime stories.

“Instead of training on just raw web data, why don’t you look for data which is of extremely high quality?” asked Sebastien Bubeck, Microsoft VP leading SLM research.

Microsoft researcher Ronen Eldan’s nightly reading routine with his daughter sparked the idea for ‘TinyStories’, a dataset of millions of simple narratives generated by prompting a large model with combinations of words a 4-year-old would know. Remarkably, a 10M parameter model trained on TinyStories could generate fluent stories with perfect grammar.
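The prompting idea can be sketched in a few lines. This is only an illustration: the word list, the choice of one noun, one verb, and one adjective per prompt, and the prompt wording are all hypothetical, since the article does not give the actual vocabulary or template used for TinyStories.

```python
import random

# Hypothetical toy vocabulary; the real word list is not given in the article.
VOCABULARY = {
    "noun": ["dog", "ball", "tree", "cake", "boat"],
    "verb": ["jump", "find", "share", "hide", "sing"],
    "adjective": ["happy", "tiny", "red", "sleepy", "brave"],
}


def build_story_prompt(rng: random.Random) -> str:
    """Pick one word of each kind and ask for a story that uses all of them."""
    words = [rng.choice(VOCABULARY[kind]) for kind in ("noun", "verb", "adjective")]
    return (
        "Write a short story using only words a 4-year-old would understand. "
        f"The story must include the words: {', '.join(words)}."
    )


rng = random.Random(0)  # fixed seed so the sampled prompts are repeatable
prompts = [build_story_prompt(rng) for _ in range(3)]
for prompt in prompts:
    print(prompt)
```

Each prompt would then be sent to a large model, and the returned stories collected into the dataset. Varying the word combination is what keeps millions of generated stories from collapsing into the same few plots.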

Building on that early success, the team procured high-quality web data vetted for educational value to create the ‘CodeTextbook’ dataset. The dataset was built up through repeated rounds of prompting, generation, and filtering by both humans and large AI models.

“A lot of care goes into producing these synthetic data,” Bubeck said. “We don’t take everything that we produce.”
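A filtering round of the kind Bubeck describes might look something like the sketch below. The scoring function here is a deliberately crude keyword heuristic standing in for the human and large-model reviewers mentioned in the article, and the threshold and sample texts are made up for illustration.

```python
from typing import Callable, List


def toy_educational_score(text: str) -> float:
    """Hypothetical stand-in scorer: fraction of explanatory cue phrases present."""
    cues = ("because", "for example", "therefore", "definition", "step")
    lowered = text.lower()
    return sum(cue in lowered for cue in cues) / len(cues)


def filter_samples(
    samples: List[str], score: Callable[[str], float], threshold: float
) -> List[str]:
    """Keep only the samples whose score meets the threshold."""
    return [sample for sample in samples if score(sample) >= threshold]


# Made-up candidate texts, not real training data.
candidates = [
    "Click here to win a prize!!!",
    "A loop repeats a step because the condition is still true. "
    "For example, counting to ten.",
    "lol idk",
]

kept = filter_samples(candidates, toy_educational_score, threshold=0.4)
print(kept)
```

Only the textbook-style explanation survives the filter in this toy run. In a real pipeline the scorer would be a trained classifier or a large model acting as a judge, and rejected samples could feed back into the next round of prompting.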

The high-quality training data proved transformative. “Because it’s reading from textbook-like material…you make the task of the language model to read and understand this material much easier,” Bubeck explained.

Mitigating AI safety risks

Careful data curation alone is not enough, Microsoft emphasises: the Phi-3 release went through additional safety practices that mirror the company’s standard processes for all generative AI models.

“As with all generative AI model releases, Microsoft’s product and responsible AI teams used a multi-layered approach to manage and mitigate risks in developing Phi-3 models,” a blog post stated.

This included further training examples to reinforce expected behaviours, assessments to identify vulnerabilities through red-teaming, and offering Azure AI tools for customers to build trustworthy applications atop Phi-3.



The post Microsoft unveils Phi-3 family of compact language models appeared first on AI News.