Hugging Face offers inference as a service powered by Nvidia NIM

Hugging Face is offering developers an inference-as-a-service powered by Nvidia NIM microservices.

The new service will bring up to five times better token efficiency with popular AI models to millions of developers and enable immediate access to NIM microservices running on Nvidia DGX Cloud.

The companies made the announcements during Nvidia CEO Jensen Huang’s talk at the Siggraph computer graphics conference in Denver, Colorado.

One of the world’s largest AI communities — comprising four million developers on the Hugging Face platform — is gaining easy access to Nvidia-accelerated inference on some of the most popular AI models.


New inference-as-a-service capabilities will enable developers to rapidly deploy leading large language models such as the Llama 3 family and Mistral AI models with optimization from Nvidia NIM microservices running on Nvidia DGX Cloud.

Announced today at the Siggraph conference, the service will help developers quickly prototype with open-source AI models hosted on the Hugging Face Hub and deploy them in production. Hugging Face Enterprise Hub users can tap serverless inference for increased flexibility, minimal infrastructure overhead, and optimized performance with Nvidia NIM.
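
As a rough sketch of what this workflow looks like from the developer's side, the snippet below queries a hosted Llama 3 model through the Hugging Face Hub's Python client. The model ID, prompt, and parameters are illustrative assumptions rather than details from the announcement, and serverless NIM-backed inference may be gated behind an Enterprise Hub plan.

```python
# Minimal sketch: querying a hosted open-source model via the Hugging Face Hub
# Python client. Model ID and parameters are illustrative, not from the article.
from huggingface_hub import InferenceClient

client = InferenceClient()  # picks up an HF token from the environment or local login

response = client.chat_completion(
    model="meta-llama/Meta-Llama-3-70B-Instruct",  # a Llama 3 family model (assumption)
    messages=[{"role": "user", "content": "In one sentence, what is a NIM microservice?"}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```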

Kari Briski, Nvidia's vice president of generative AI software product management, said in a press briefing that the time to put generative AI into production is now, but that for some this can be a daunting task.

“Developers want easy ways to work with APIs and prototype and test how a model might perform within their application for both accuracy and latency,” she said. “Applications have multiple models that work together connecting to different data sources to achieve a response, and you need models across many tasks and modalities and you need them to be optimized.”

This is why Nvidia is launching the inference-as-a-service built on Nvidia NIM microservices.

The inference service complements Train on DGX Cloud, an AI training service already available on Hugging Face.

Developers facing a growing number of open-source models can benefit from a hub where they can easily compare options. These training and inference tools give Hugging Face developers new ways to experiment with, test and deploy cutting-edge models on Nvidia-accelerated infrastructure. They’re made easily accessible using the “Train” and “Deploy” drop-down menus on Hugging Face model cards, letting users get started with just a few clicks.

Inference-as-a-service powered by Nvidia NIM


Nvidia NIM is a collection of AI microservices — including Nvidia AI foundation models and open-source community models — optimized for inference using industry-standard application programming interfaces, or APIs.
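
In practice, "industry-standard APIs" means the microservices speak the familiar OpenAI-compatible chat-completions protocol, so existing client code can point at a NIM endpoint with little change. The base URL, credential, and model name below are placeholders, a sketch rather than the service's documented configuration.

```python
# Hedged sketch: calling a NIM endpoint through an OpenAI-compatible client.
# Base URL, API key, and model name are placeholders (assumptions).
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",  # example NIM-hosted endpoint
    api_key="YOUR_NVIDIA_API_KEY",                   # placeholder credential
)

completion = client.chat.completions.create(
    model="meta/llama3-70b-instruct",  # example NIM model identifier
    messages=[{"role": "user", "content": "Explain token throughput in one sentence."}],
    max_tokens=64,
)
print(completion.choices[0].message.content)
```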

NIM offers users higher efficiency in processing tokens — the units of data used and generated by a language model. The optimized microservices also improve the efficiency of the underlying Nvidia DGX Cloud infrastructure, which can increase the speed of critical AI applications.
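
To make "token" concrete: a tokenizer splits text into the subword units a model reads and writes, and token counts are the measure behind throughput and cost figures. The snippet below uses the GPT-2 tokenizer purely as an illustration.

```python
# Illustration only: counting the tokens in a sentence with a common tokenizer.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokens = tokenizer.tokenize("Hugging Face offers inference as a service powered by Nvidia NIM.")
print(tokens)       # the subword pieces the model actually processes
print(len(tokens))  # the count that throughput and billing are measured in
```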

This means developers see faster, more robust results from an AI model accessed as a NIM compared with other versions of the model. The 70-billion-parameter version of Llama 3, for example, delivers up to five times higher throughput when accessed as a NIM compared with off-the-shelf deployment on Nvidia H100 Tensor Core GPU-powered systems.

The Nvidia DGX Cloud platform is purpose-built for generative AI, offering developers easy access to reliable accelerated computing infrastructure that can help them bring production-ready applications to market faster.

The platform provides scalable GPU resources that support every step of AI development, from prototype to production, without requiring developers to make long-term AI infrastructure commitments.

Hugging Face inference-as-a-service on Nvidia DGX Cloud powered by NIM microservices offers easy access to compute resources that are optimized for AI deployment, enabling users to experiment with the latest AI models in an enterprise-grade environment.

Microservices for OpenUSD framework

Nvidia is bringing OpenUSD to metaverse-like industrial applications.

At Siggraph, Nvidia also introduced generative AI models and NIM microservices for the OpenUSD framework to accelerate developers’ abilities to build highly accurate virtual worlds for the next evolution of AI.