Consolidation looms for pack of upstack data tools

Industry activity suggests consolidation may lie ahead for a noisy ecosystem of upstack data tools that ride top data platforms. …

In recent years, a new breed of cloud data platforms has arisen right in the backyard of hyperscale mainstays such as AWS and Microsoft. Today, Snowflake, Databricks and a handful of others are successfully driving enterprise data efforts, enabling global giants to connect, store and generate insights from information flowing from different sources.

These platforms give companies tremendous power and capabilities. But their dominance has also triggered a “gold rush” of sorts. Case in point: a massive surge in the number of upstack tools for data infrastructure.

A crowded ecosystem of tools has arisen in the wake of Snowflake’s and Databricks’ successes. The tool vendors seek to unlock the potential of modern data platforms. Yet even as their ranks grow, they may also see consolidation. Signs of that appeared earlier this week in analytics engineering house dbt Labs’ agreement to acquire Transform, which has sought to create a semantic data layer to better integrate the modern data stack.

While players like Snowflake and Databricks provide a platform to host the data and build applications, they can’t do it all. There are plenty of areas in the data lifecycle that these solutions do not fully serve — like data ingestion, transformation, orchestration, management and observability. Modern-day upstack tools, provided by third-party vendors, fill these gaps.

“A large number of companies are vying to provide different products and services to companies [that] are trying to build on top of the Snowflake and Databricks ecosystems,” according to Sean Knapp, founder and CEO of Ascend.io, which automates data and analytics engineering workloads. Knapp told VentureBeat that crowding in this space has been compounded by overfunding, leaving many would-be features thriving as separate companies.

Evolution of data monoliths

When data platforms rose to the fore, the earliest adopters looked to address their immediate pain points by building the required software solutions on their own. This was the first wave in the evolution of upstack data tools, when there was no pattern or widespread adoption to justify the existence of enterprise solutions.

Gradually, as needs emerged from the early adopter era, the second wave of point solutions arose. This is where most enterprises are right now. They take whatever specialized data tools they can find to solve small pieces of the puzzle and achieve significant gains in short timeframes.

Today, Snowflake and Databricks support partner tools in the dozens. Some popular ones come from dbt Labs, Matillion and Prophecy (for data prep and transformation); Hightouch, Hevo and Fivetran (for data ingestion); and Anomalo and Lightup (for data quality).

Meanwhile, business intelligence stalwarts like Alteryx, Power BI and Tableau tailor analytics and visualization tooling now widely used in Snowflake and Databricks implementations.

There is much overlap in what the vendors provide, and many solutions also cover aspects like data science and observability.

Most available upstack tools do the job well, but when there are too many solutions for different capabilities on the same infrastructure, teams may end up architecting extremely complex data ecosystems. They have to assemble, integrate and manage all their disparate tools at the same time, which means paying not only for the technology in use but for engineering time and opportunity cost. This directly impacts ROI.

Further, when data bounces among multiple tools, it becomes very difficult to tune and optimize its movement and processing.

“Moving from a simple monolithic model to a complex model with hundreds or even thousands of interdependencies can lead to a data ecosystem that is difficult to understand and maintain, requires many costly licenses, and forces a steep learning curve for user training and onboarding,” Ben Haynes, co-founder and CEO of Directus, told VentureBeat. Directus fields a data platform which includes a “back-end-as-service engine” for developers along with no-code tooling for non-technical users.

The component services within a stack are also constantly moving targets.

“If one of the services advances and another stagnates or is no longer supported, the integrations and dependencies between them may break,” Haynes added. “One dependency breaking can have a domino effect, bringing operations to a halt. Because microservices often don’t perfectly bookend to each other, there can also be gaps in capabilities that need to be filled with custom code and logic.”

Are new waves of consolidation ahead?  

As teams tire of managing dozens of tools, and standard patterns emerge of what’s needed in the long run, the third wave, “rapid consolidation,” is expected to rise. Here teams will look to implement a single platform that unifies most, if not all, of the capabilities they use. Such capabilities often include ingestion, transformation and observability. Teams will look to reduce complexity and better focus on core product requirements.

“What our data does, how we’re doing it, or how we’re applying the information may be different, but there are many common patterns. As we see these patterns emerge, there’s tremendous value in creating a single platform that unifies a lot more of these capabilities,” Knapp explained. 

“With consolidation, our teams don’t have to spend the majority of their time just cobbling together and integrating tools, which is non-value add,” he added. “The more unified system makes them more efficient and paves the way for new advancements. You can, for instance, apply really advanced layers of intelligence to data lifecycle because you have more unified metadata and can build automated systems.”

For his part, Directus leader Haynes sees a balanced “hub-and-spoke” model emerging, where the hub serves as a baseline of common or critical functionality, doing 80% of the job, but still provides the option to easily connect other business-critical, hyper-specialized tools such as those from Stripe, HubSpot or Salesforce.

Broadly, the consolidation of upstack tools is expected to come through mergers and acquisitions, driven by private equity and especially by the dominant data platforms.

Snowflake, for instance, recently announced the decision to acquire Myst for time-series forecasting as well as SnowConvert to aid cloud migration. Similarly, last month, Thoma Bravo-owned Qlik announced its intent to join efforts with Talend, another Thoma Bravo-owned entity.

“It makes a ton of sense for the Snowflakes and the Databricks of the world to be very acquisitive. Whether we see really big acquisitions right now or whether they come towards the latter half of this year or the next year is a point of question. I’d probably bet more on the latter half of this year and early part of next year,” Knapp said. For Snowflake and Databricks, he added, there will be some level of caution around acquiring entities that could create competitive dynamics inside of their ecosystems.
