Microsoft makes Phi-3 generally available, previews its Phi-3-vision multimodal small language model
Microsoft’s Phi-3 models are now generally available ahead of the AI PC era. The company also revealed its Phi-3-vision multimodal variant.
Microsoft is making its Phi-3 lightweight model family generally available to developers, nearly a month after first announcing it. Phi-3-medium, Phi-3-small, and Phi-3-mini are now available, with Phi-3-mini offered as part of Azure AI. The company is also showing off Phi-3-vision, a multimodal variant of the small model with 4.2 billion parameters.
Phi-3 for all
Developed by Microsoft Research, Phi-3 is a family of small language models designed to pack as much reasoning punch as far larger models at a significantly lower cost. It is the fourth generation of compact language models Microsoft has worked on: Phi-1 arrived a year ago, followed by Phi-1.5 and Phi-2.
Not every use case calls for a large language model. The push to run AI locally or on-device is leading developers to seek out smaller yet still capable options, and the field is growing: besides Phi-3, it includes Google’s Gemma 2 and Hugging Face’s Zephyr. And Microsoft didn’t build just one small model. Phi-3 comes in three sizes: Phi-3-mini with 3.8 billion parameters, Phi-3-small with 7 billion, and Phi-3-medium with 14 billion. The company has said the family performs on par with OpenAI’s GPT-3.5 in a far more lightweight form.
The timing of Phi-3’s general release is no coincidence, with the dawn of the AI PC coming soon. Developers can now use the different variants to bring their AI features to laptops, mobile devices, and wearables.
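For developers who want to experiment before wiring anything into Azure AI, a minimal sketch of running Phi-3-mini locally might look like the following. This assumes the checkpoint is published to Hugging Face under the ID microsoft/Phi-3-mini-4k-instruct and that the chat-template flow shown here matches it; adjust for your own deployment.

```python
# Sketch: run Phi-3-mini locally via Hugging Face transformers.
# The model ID below is an assumption; substitute your own checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"  # assumed Hugging Face ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # 3.8B params fits on a single consumer GPU
    device_map="auto",
    trust_remote_code=True,
)

# Build a chat-formatted prompt and generate a short completion.
messages = [{"role": "user", "content": "In one sentence, why do small language models matter?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```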
What we know about Phi-3-vision
Besides releasing Phi-3, Microsoft is introducing a new model variant that supports general visual reasoning as well as chart, graph, and table reasoning. Called Phi-3-vision, it has 4.2 billion parameters. When implemented, users can ask questions about a chart or pose open-ended questions about a specific image.
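In practice, querying the model about an image could look like the sketch below. Everything here is an assumption about the preview: the Hugging Face ID microsoft/Phi-3-vision-128k-instruct, the <|image_1|> placeholder convention, and the example image URL are illustrative, not confirmed details of the release.

```python
# Sketch: ask Phi-3-vision an open-ended question about an image.
# Model ID, prompt format, and URL are assumptions for illustration.
import requests
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Phi-3-vision-128k-instruct"  # assumed Hugging Face ID
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto", trust_remote_code=True
)

# Hypothetical chart image; replace with your own file or URL.
image = Image.open(requests.get("https://example.com/chart.png", stream=True).raw)

# The <|image_1|> token marks where the image attaches in the prompt.
messages = [{"role": "user", "content": "<|image_1|>\nWhat trend does this chart show?"}]
prompt = processor.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = processor(text=prompt, images=[image], return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=100)
# Strip the prompt tokens and decode only the model's answer.
answer = processor.tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)
print(answer)
```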
Incidentally, Google also debuted its own lightweight multimodal model last week at its developer conference. PaliGemma offers similar capabilities but, at 3 billion parameters, is slightly smaller than Microsoft’s version.
Having AI that can interpret multiple forms of input is valuable to developers, and if a model can deliver LLM-like performance at a fraction of the cost, it could drive broader adoption.
Though Phi-3-vision has been announced as a preview, Microsoft has not said when it will become generally available.