Anyscale optimizes open source AI deployments with Endpoints



With generative AI increasingly becoming table stakes, the big question facing many organizations is how to scale usage in a cost-efficient manner.

That’s a question that Robert Nishihara, CEO and co-founder of Anyscale, is looking to answer. Anyscale is the lead commercial vendor behind the widely deployed open source Ray framework for distributed machine learning training and inference. This week at Ray Summit, which runs Sept. 18-19 in San Francisco, Nishihara is outlining the success and growth of Ray to date and revealing what’s next.

Among the big pieces of news announced today is the general availability of Anyscale Endpoints, which enables organizations to easily fine tune and deploy open source large language models (LLMs). Anyscale is also announcing a new expanded partnership with Nvidia that will see Nvidia’s software for inference and training optimized for the Anyscale Platform.

“If you took an Uber ride, ordered something on Instacart, listened to something on Spotify, or watched Netflix or TikTok, or used OpenAI’s ChatGPT, you’re interacting with models built with Ray,” Nishihara told VentureBeat. “It’s really everywhere.”


Nishihara said that in his keynote at Ray Summit he will detail how various vendors have been able to scale AI and cut costs at the same time. Among the success metrics he’s sharing: Instacart is now able to train models up to 12 times faster with 100 times more data than ever before, and Pinterest was able to cut costs by 40% for its AI processing while training thousands of models.

“I’m really trying to hammer in the point that if you care about cost and performance, Ray is the way to go for LLMs and generative AI,” he said.

From Aviary to Anyscale Endpoints

From a product perspective, the Anyscale Platform is the commercially supported version of Ray, providing the enterprise capabilities that organizations rely on to scale and deploy any type of training and inference workload.

The new Anyscale Endpoints service provides a different capability. Nishihara explained that Anyscale Endpoints provides API access to open source LLMs without the need for an organization to deploy or manage the models on its own. He added that Anyscale Endpoints allows developers to easily integrate LLMs into their products through a simple API, in much the same way many organizations use the OpenAI API today.

“With Anyscale Endpoints, customers can query models like Llama 2 to get responses,” Nishihara said. “Anyscale handles running and optimizing the models behind the scenes.”
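Because Anyscale Endpoints is meant to be used much like the OpenAI API, a query can be sketched roughly as follows. Note that the base URL, model identifier, and payload shape shown here are illustrative assumptions, not Anyscale’s documented interface; developers should consult Anyscale’s own documentation for the real values.

```python
import json

# Hypothetical base URL -- an assumption for illustration only.
BASE_URL = "https://api.endpoints.anyscale.com/v1"


def build_chat_request(model: str, user_message: str, api_key: str):
    """Assemble an OpenAI-style chat completion request for an
    Endpoints-like API. Returns the URL, headers, and JSON payload;
    sending it (e.g. with requests.post) is left to the caller."""
    url = f"{BASE_URL}/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return url, headers, json.dumps(body)


# The model id below is an assumption; Endpoints exposes open models
# such as Llama 2 under provider-specific names.
url, headers, payload = build_chat_request(
    "meta-llama/Llama-2-7b-chat-hf",
    "Summarize what Ray is in one sentence.",
    api_key="YOUR_API_KEY",
)
print(url)
```

The appeal of this pattern is that an application already written against an OpenAI-style chat completions API needs little more than a different base URL and model name to target an open source model instead.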

In some respects, Anyscale Endpoints benefits from development the company has been doing with its open source Aviary project, which debuted in May.

Aviary is an open source project for running open source LLMs on top of Ray. Nishihara noted that while Aviary allows users to run LLMs themselves using Python code on Ray, Anyscale Endpoints provides a simpler API experience where users just query models through the API without having to deploy anything. Anyscale takes care of running and optimizing the models behind the scenes.

Fine tuning and private deployments improve open source LLM utility

Anyscale is also enabling fine tuning of open source LLMs like Llama 2.

LLMs often come in both large and small versions, and it typically costs organizations more to use the larger models. As such, Nishihara noted that many organizations are looking to use smaller models to reduce costs; the challenge is that those smaller models aren’t necessarily as good as the large ones. Fine tuning is one way Nishihara said he is seeing organizations make smaller models work.

With fine tuning, organizations can customize a model to improve performance and quality on a specific task. This can help make smaller, more cost-efficient models that are viable alternatives to larger ones.

Going a step further, when it comes to customized training and data, some organizations don’t feel comfortable using publicly accessible LLMs. To help support those users, Anyscale is also launching a Private Endpoints service that enables the deployment of Anyscale Endpoints within an organization’s own virtual private cloud (VPC). With Private Endpoints, sensitive customer data and models never have to leave a company’s own infrastructure. It also provides opportunities to deeply customize and optimize the backend deployment.

The overall goal for Nishihara is to focus on efficiency and making it cheaper for organizations to work with LLMs.

“We’re an infrastructure company. The advantage we have is deep expertise in performance optimizations and infrastructure, and we’re going to do everything we can to really double down on that and just continue to make it faster and cheaper,” he said.

