Hugging Face offers inference as a service powered by Nvidia NIM

Hugging Face is offering developers an inference-as-a-service powered by Nvidia NIM microservices.

The new service will bring up to five times better token efficiency with popular AI models to millions of developers and enable immediate access to NIM microservices running on Nvidia DGX Cloud.

The companies made the announcements during Nvidia CEO Jensen Huang’s talk at the Siggraph computer graphics conference in Denver, Colorado.

One of the world’s largest AI communities — comprising four million developers on the Hugging Face platform — is gaining easy access to Nvidia-accelerated inference on some of the most popular AI models.


New inference-as-a-service capabilities will enable developers to rapidly deploy leading large language models such as the Llama 3 family and Mistral AI models with optimization from Nvidia NIM microservices running on Nvidia DGX Cloud.

Announced today at the Siggraph conference, the service will help developers quickly prototype with open-source AI models hosted on the Hugging Face Hub and deploy them in production. Hugging Face Enterprise Hub users can tap serverless inference for increased flexibility, minimal infrastructure overhead, and optimized performance with Nvidia NIM.
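
As a rough sketch of what this workflow looks like from the developer's side, the snippet below queries a hosted Llama 3 model through the Hugging Face Hub's Python client. The model ID, prompt, and parameters are illustrative assumptions rather than details from the announcement, and serverless NIM-backed inference may be gated behind an Enterprise Hub plan.

```python
# Minimal sketch: querying a hosted open-source model via the Hugging Face Hub
# Python client. Model ID and parameters are illustrative, not from the article.
from huggingface_hub import InferenceClient

client = InferenceClient()  # picks up an HF token from the environment or local login

response = client.chat_completion(
    model="meta-llama/Meta-Llama-3-70B-Instruct",  # a Llama 3 family model (assumption)
    messages=[{"role": "user", "content": "In one sentence, what is a NIM microservice?"}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```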

Kari Briski, Nvidia's vice president of generative AI software product management, said in a press briefing that the time to put generative AI into production is now, but that for some this can be a daunting task.

“Developers want easy ways to work with APIs and prototype and test how a model might perform within their application for both accuracy and latency,” she said. “Applications have multiple models that work together connecting to different data sources to achieve a response, and you need models across many tasks and modalities and you need them to be optimized.”

This is why Nvidia is launching the inference-as-a-service built on Nvidia NIM microservices.

The inference service complements Train on DGX Cloud, an AI training service already available on Hugging Face.

Developers facing a growing number of open-source models can benefit from a hub where they can easily compare options. These training and inference tools give Hugging Face developers new ways to experiment with, test and deploy cutting-edge models on Nvidia-accelerated infrastructure. They’re made easily accessible using the “Train” and “Deploy” drop-down menus on Hugging Face model cards, letting users get started with just a few clicks.

Inference-as-a-service powered by Nvidia NIM


Nvidia NIM is a collection of AI microservices — including Nvidia AI foundation models and open-source community models — optimized for inference using industry-standard application programming interfaces, or APIs.
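
In practice, "industry-standard APIs" means the microservices speak the familiar OpenAI-compatible chat-completions protocol, so existing client code can point at a NIM endpoint with little change. The base URL, credential, and model name below are placeholders, a sketch rather than the service's documented configuration.

```python
# Hedged sketch: calling a NIM endpoint through an OpenAI-compatible client.
# Base URL, API key, and model name are placeholders (assumptions).
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",  # example NIM-hosted endpoint
    api_key="YOUR_NVIDIA_API_KEY",                   # placeholder credential
)

completion = client.chat.completions.create(
    model="meta/llama3-70b-instruct",  # example NIM model identifier
    messages=[{"role": "user", "content": "Explain token throughput in one sentence."}],
    max_tokens=64,
)
print(completion.choices[0].message.content)
```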

NIM offers users higher efficiency in processing tokens — the units of data used and generated by a language model. The optimized microservices also improve the efficiency of the underlying Nvidia DGX Cloud infrastructure, which can increase the speed of critical AI applications.
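
To make "token" concrete: a tokenizer splits text into the subword units a model reads and writes, and token counts are the measure behind throughput and cost figures. The snippet below uses the GPT-2 tokenizer purely as an illustration.

```python
# Illustration only: counting the tokens in a sentence with a common tokenizer.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokens = tokenizer.tokenize("Hugging Face offers inference as a service powered by Nvidia NIM.")
print(tokens)       # the subword pieces the model actually processes
print(len(tokens))  # the count that throughput and billing are measured in
```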

This means developers see faster, more robust results from an AI model accessed as a NIM compared with other versions of the model. The 70-billion-parameter version of Llama 3, for example, delivers up to five times higher throughput when accessed as a NIM compared with off-the-shelf deployment on Nvidia H100 Tensor Core GPU-powered systems.

The Nvidia DGX Cloud platform is purpose-built for generative AI, offering developers easy access to reliable accelerated computing infrastructure that can help them bring production-ready applications to market faster.

The platform provides scalable GPU resources that support every step of AI development, from prototype to production, without requiring developers to make long-term AI infrastructure commitments.

Hugging Face inference-as-a-service on Nvidia DGX Cloud powered by NIM microservices offers easy access to compute resources that are optimized for AI deployment, enabling users to experiment with the latest AI models in an enterprise-grade environment.

Microservices for OpenUSD framework

Nvidia is bringing OpenUSD to metaverse-like industrial applications.

At Siggraph, Nvidia also introduced generative AI models and NIM microservices for the OpenUSD framework to accelerate developers’ abilities to build highly accurate virtual worlds for the next evolution of AI.