Databricks
Data lakehouse software for managing, querying, and analyzing enterprise-scale AI and data workloads

Revenue: $4.00B (2025)
Valuation: $100.00B (2025)
Growth Rate (y/y): 50% (2025)
Funding: $9.60B (2025)

Details
Headquarters: San Francisco, CA
CEO: Ali Ghodsi
Founding year: 2013

Revenue

Sacra estimates that Databricks reached $4B in annual recurring revenue (ARR) in August 2025, up from $3.0B at the end of 2024.

As of June 2024, Databricks was reporting 80% gross margins, down from 85% a year prior, and net dollar retention of 140%.
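Net dollar retention is commonly defined as the current ARR of a cohort of existing customers (after expansion, contraction, and churn, and excluding new customers) divided by that cohort's ARR twelve months earlier. A minimal Python sketch, using made-up cohort numbers rather than Databricks data, shows how a 140% figure can arise:

```python
def net_dollar_retention(start_arr: float, expansion: float,
                         contraction: float, churn: float) -> float:
    """Return NDR for a fixed cohort; revenue from new customers is excluded."""
    end_arr = start_arr + expansion - contraction - churn
    return end_arr / start_arr


# Hypothetical cohort worth $100M of ARR twelve months ago.
ndr = net_dollar_retention(start_arr=100e6, expansion=50e6,
                           contraction=4e6, churn=6e6)
print(f"NDR: {ndr:.0%}")
```

At 140%, the existing customer base alone would grow ARR by 40% in a year before any new customers are added.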

In April 2023, its data warehousing product, Databricks SQL, hit $100M in ARR one year after launching. A year later, Databricks SQL had grown to $400M ARR.

The company’s pay-as-you-go model aligns billing with usage, charging customers based on their tier, the processing power consumed, and the duration of use. Premium and enterprise tiers offer advanced capabilities, including enhanced security, governance, and data processing features. Databricks runs on Microsoft Azure, Google Cloud, and AWS, with charges varying slightly across platforms.

Primarily serving large enterprises, Databricks manages contracts valued in the millions of dollars annually. By June 2024, the company had over 11,500 customers globally with an average contract value (ACV) of $208,696, reflecting its concentration among enterprises with significant data processing needs.
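ACV is typically total recurring revenue divided by the number of customers. Assuming that is how the $208,696 figure was derived (the report does not say), working backward gives the implied revenue base:

```python
# Reported June 2024 figures; "over 11,500" makes this an approximation.
customers = 11_500
acv_usd = 208_696

# Assumption: ACV = total annual recurring revenue / number of customers.
implied_arr = customers * acv_usd
print(f"Implied ARR: ${implied_arr / 1e9:.2f}B")
```

That works out to roughly $2.4B of annualized revenue in mid-2024, spread across customers whose individual contracts vary widely around the average.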

Valuation

In September 2025, Databricks closed a $1B Series K at a ~$100B valuation, a 61% step-up from its $62B valuation as of December 2024, bringing total funding to about $9.6B.

Previously, in December 2024, Databricks raised a $10 billion Series J, described as non-dilutive financing. The round, led by Thrive Capital and co-led by Andreessen Horowitz, DST Global, and Insight Partners, valued the company at $62 billion, equating to a 20.6x forward revenue multiple based on its 2024 ARR of $3 billion.
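The step-up and revenue multiple quoted in this section follow from simple ratios. Recomputing them from the stated figures gives a 61% step-up and a multiple of a little over 20.6x:

```python
# Figures quoted in the Valuation section (USD).
series_j_valuation = 62e9    # December 2024
series_k_valuation = 100e9   # September 2025, approximate
arr_2024 = 3.0e9             # 2024 ARR used for the multiple

step_up = series_k_valuation / series_j_valuation - 1
arr_multiple = series_j_valuation / arr_2024

print(f"Series J -> K step-up: {step_up:.0%}")
print(f"Series J valuation / 2024 ARR: {arr_multiple:.2f}x")
```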

With $14+ billion raised to date, key investors include T. Rowe Price, NVIDIA, and strategic partners such as Microsoft and BlackRock.

Product

Databricks' Data Intelligence Platform combines data lake storage, data warehouse analytics, and machine learning in a single cloud-hosted environment. It is designed to consolidate systems for storing raw data, running analytics queries, and training AI models.

At its foundation, Databricks uses Delta Lake, an open-source storage layer that brings database-like reliability and performance to data lakes stored in cloud object storage like AWS S3 or Azure Blob Storage. Unity Catalog sits on top as a governance layer, managing permissions, lineage, and metadata across all data and AI assets with fine-grained access controls.

Core components include Databricks SQL, which provides a warehouse-style interface for business analysts to run queries and create dashboards using familiar SQL syntax. For data scientists and engineers, collaborative notebooks support Python, R, Scala, and SQL for interactive data exploration and model development.

Mosaic AI covers the machine learning lifecycle from feature engineering and model training to deployment and monitoring. It can fine-tune and serve both open-source and proprietary models, with built-in support for frameworks like TensorFlow and PyTorch.

Recent additions include Lakebase, which adds transactional database capabilities for operational workloads, and Agent Bricks, a framework for building AI agents that can access and reason over enterprise data while maintaining governance controls.

Delta Lake

Delta Lake is an open-source storage layer that provides ACID transactions and scalable metadata handling, and unifies streaming and batch data processing on data lakes. It enables organizations to build reliable data pipelines and maintain data quality at scale.
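Delta Lake's ACID guarantees rest on a transaction log (the table's `_delta_log` directory) of ordered, numbered commits: a writer may create version N+1 only if no other writer has already done so, and readers reconstruct a table snapshot by replaying the log. The following is a toy, in-memory Python model of that idea under simplifying assumptions (no real object storage, no checkpoints, no conflict resolution); the class and method names are hypothetical and are not the API of Delta Lake or any of its client libraries.

```python
from __future__ import annotations

from dataclasses import dataclass, field


class ConcurrentCommitError(Exception):
    """Raised when another writer has already committed the target version."""


@dataclass
class ToyTransactionLog:
    # version -> ordered actions, e.g. ("add", "part-0.parquet")
    commits: dict[int, list[tuple[str, str]]] = field(default_factory=dict)

    def latest_version(self) -> int:
        return max(self.commits, default=-1)

    def commit(self, read_version: int, actions: list[tuple[str, str]]) -> int:
        """Write version read_version + 1 only if nobody else has (put-if-absent)."""
        new_version = read_version + 1
        if new_version in self.commits:
            raise ConcurrentCommitError(f"version {new_version} already exists")
        self.commits[new_version] = list(actions)
        return new_version

    def snapshot(self, version: int | None = None) -> set[str]:
        """Replay add/remove actions up to `version` to list the live data files."""
        if version is None:
            version = self.latest_version()
        files: set[str] = set()
        for v in range(version + 1):
            for op, path in self.commits.get(v, []):
                if op == "add":
                    files.add(path)
                elif op == "remove":
                    files.discard(path)
        return files


log = ToyTransactionLog()
v0 = log.commit(read_version=-1, actions=[("add", "part-0.parquet")])

# Two writers both read version 0; only the first to claim version 1 succeeds.
log.commit(read_version=v0, actions=[("add", "part-1.parquet")])
try:
    log.commit(read_version=v0, actions=[("remove", "part-0.parquet")])
except ConcurrentCommitError as err:
    print("second writer must re-read the log and retry:", err)

print(sorted(log.snapshot()))   # current table state
print(sorted(log.snapshot(0)))  # older version ("time travel")
```

In Delta Lake itself, commits are JSON files that are periodically compacted into Parquet checkpoints, and a writer that loses the race checks whether its changes actually conflict before retrying; the toy skips all of that.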

Delta Lake is the key component of Databricks’s push into the BI and data analytics category, where it competes with data warehouse providers such as Snowflake, Amazon, and others.

Spark

Databricks’s core offering is managed Spark clusters: groups of machines used to run data analysis. It provides a web-based portal where data scientists create these Spark clusters for their analysis workloads. The portal also includes a notebook-style workspace where data scientists collaboratively write queries in SQL, Python, and other languages, and a scheduler that data engineers can use in place of Airflow or Prefect to run data pipelines on a regular schedule.
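Databricks exposes that scheduler through its Jobs REST API. The sketch below shows what a two-task nightly pipeline definition might look like, based on our reading of the Jobs API 2.1 `jobs/create` request schema. The workspace URL, notebook paths, cluster ID, job name, and token are hypothetical placeholders, and field names should be checked against the current API reference before use. The request is built but deliberately not sent.

```python
import json
import urllib.request

WORKSPACE_URL = "https://example.cloud.databricks.com"  # hypothetical workspace


def build_nightly_job(notebook_dir: str, cluster_id: str) -> dict:
    """Two tasks: ingest, then transform after ingest succeeds; daily at 02:00 UTC."""
    return {
        "name": "nightly-sales-pipeline",
        "tasks": [
            {
                "task_key": "ingest",
                "notebook_task": {"notebook_path": f"{notebook_dir}/ingest"},
                "existing_cluster_id": cluster_id,
            },
            {
                "task_key": "transform",
                "depends_on": [{"task_key": "ingest"}],
                "notebook_task": {"notebook_path": f"{notebook_dir}/transform"},
                "existing_cluster_id": cluster_id,
            },
        ],
        "schedule": {
            # Quartz fields: second minute hour day-of-month month day-of-week
            "quartz_cron_expression": "0 0 2 * * ?",
            "timezone_id": "UTC",
            "pause_status": "UNPAUSED",
        },
    }


job = build_nightly_job("/Repos/team/pipelines", "0123-456789-abcdefgh")
request = urllib.request.Request(
    f"{WORKSPACE_URL}/api/2.1/jobs/create",
    data=json.dumps(job).encode("utf-8"),
    headers={"Authorization": "Bearer <token>", "Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(request) would submit the job; omitted here.
print(request.get_method(), request.full_url)
```

Expressing task dependencies with `depends_on` is what lets this replace a DAG in Airflow or Prefect: the scheduler runs `transform` only after `ingest` completes.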

MosaicML

In June 2023, Databricks acquired MosaicML for $1.3B, a move aimed at bolstering its capabilities in training large language models (LLMs) and image generation models. MosaicML has developed tools and infrastructure to simplify and reduce the cost of running LLMs from data preparation to training and managing infrastructure.

Training LLMs like GPT-3 involves significant costs, but MosaicML claims it can train GPT-3-quality models for its customers for as little as $325K (compared with $368K for Google LaMDA, $1M for Bloom, and $841K for GPT-3).

These prices also include a full suite of MLOps tools, further reducing the additional personnel required to train a model reliably.
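Taking MosaicML's claimed figures at face value, the relative savings are simple to compute. This is arithmetic on the numbers quoted in this section, not an independent cost estimate:

```python
# Costs as quoted in this section (USD); MosaicML's figure is its own claim.
mosaic_claim = 325_000
reference_costs = {"Google LaMDA": 368_000, "Bloom": 1_000_000, "GPT-3": 841_000}

# Fractional savings of the MosaicML claim relative to each reference model.
savings = {name: 1 - mosaic_claim / cost for name, cost in reference_costs.items()}

for name, frac in savings.items():
    print(f"{name}: {frac:.1%} lower")
```

On these numbers, the claimed advantage is modest against LaMDA (about 12%) but roughly 61% against GPT-3 and about two-thirds against Bloom.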

MLflow

MLflow is an open-source platform for managing the machine learning lifecycle. It includes capabilities for experiment tracking, model versioning, and model deployment. MLflow helps data scientists and engineers streamline the process of developing and productionizing machine learning models.
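The core ideas of experiment tracking and model versioning can be illustrated without MLflow itself. The sketch below is a hypothetical, stdlib-only stand-in (the `ToyTracker` class and its methods are invented for illustration and are not MLflow's API) that logs each run's parameters and metrics, picks the best run, and registers it as a numbered model version:

```python
from __future__ import annotations

import itertools
from dataclasses import dataclass, field


@dataclass
class Run:
    run_id: int
    params: dict[str, float] = field(default_factory=dict)
    metrics: dict[str, float] = field(default_factory=dict)


class ToyTracker:
    """Hypothetical stand-in for an experiment tracker plus model registry."""

    def __init__(self) -> None:
        self.runs: list[Run] = []
        self.registry: dict[str, list[int]] = {}  # model name -> run_id per version
        self._ids = itertools.count(1)

    def log_run(self, params: dict[str, float], metrics: dict[str, float]) -> Run:
        run = Run(next(self._ids), dict(params), dict(metrics))
        self.runs.append(run)
        return run

    def best_run(self, metric: str) -> Run:
        # Lower is better for this metric (e.g. RMSE).
        return min(self.runs, key=lambda r: r.metrics[metric])

    def register(self, name: str, run: Run) -> int:
        versions = self.registry.setdefault(name, [])
        versions.append(run.run_id)
        return len(versions)  # versions are numbered 1, 2, 3, ...


tracker = ToyTracker()
for lr in (0.1, 0.01, 0.001):
    rmse = abs(lr - 0.01) + 0.5  # placeholder for a real training/evaluation step
    tracker.log_run({"learning_rate": lr}, {"rmse": rmse})

best = tracker.best_run("rmse")
version = tracker.register("churn-model", best)
print(best.params, version)
```

In MLflow, the corresponding calls are `mlflow.log_param`, `mlflow.log_metric`, and `mlflow.register_model`, with runs and versions persisted to a tracking server rather than held in memory.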

Databricks SQL

Databricks SQL is a data warehouse aimed at data analysts who are used to working in SQL. It lets users run SQL queries on top of Delta Lake, create visualizations, and build and share dashboards.

AI

In September 2025, Databricks announced a multi-year partnership worth at least $100M to integrate OpenAI’s latest models, including GPT-5, directly into Agent Bricks and its data platform, following earlier additions of OpenAI’s open-weight models (gpt-oss 20B and 120B). The company also unveiled its Mooncake technology, which targets up to 100x acceleration for agentic AI by eliminating ETL for analytics and AI.

Neon

In May 2025, Databricks announced its acquisition of Neon, a cloud-based database software startup, for approximately $1 billion.

Neon, founded in 2021, has quickly attracted over 18,000 customers—including OpenAI, Adobe, and Replit—by offering a scalable, open-source alternative to AWS Aurora Postgres. Neon’s architecture separates compute from storage, improving cost efficiency and flexibility for cloud-native workloads.

The addition of Neon expands Databricks’s capabilities in database management and strengthens its position as a unified data infrastructure platform, deepening its support for developers and AI-centric enterprises.

Business Model

Databricks uses a B2B, consumption-based SaaS model in which customers pay for compute, storage, and data processing usage rather than fixed licenses or seat counts. Pricing ties to utilization and scales as data volumes and AI workloads grow.

The company generates revenue through Databricks Units (DBUs), which customers consume when running workloads on the platform. Different types of compute clusters and AI services have different DBU rates, with more advanced capabilities like GPU-accelerated machine learning priced at higher rates.
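The billing arithmetic can be sketched as DBUs consumed per hour × hours run × the price per DBU for that compute type. All workloads and rates below are hypothetical placeholders, not Databricks list prices, and the sketch ignores any cloud infrastructure billed separately:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    dbu_per_hour: float  # DBUs consumed per hour by the cluster or service
    usd_per_dbu: float   # price per DBU for this compute type (hypothetical)
    hours: float         # runtime in the billing period


def dbu_cost(w: Workload) -> float:
    """Consumption charge = DBUs/hour x hours x price per DBU."""
    return w.dbu_per_hour * w.hours * w.usd_per_dbu


# Hypothetical month of usage for one customer.
monthly = [
    Workload("nightly ETL job", dbu_per_hour=8, usd_per_dbu=0.15, hours=60),
    Workload("BI SQL warehouse", dbu_per_hour=12, usd_per_dbu=0.70, hours=200),
    Workload("GPU fine-tuning", dbu_per_hour=40, usd_per_dbu=0.65, hours=10),
]

for w in monthly:
    print(f"{w.name:<18} ${dbu_cost(w):>9,.2f}")
print(f"{'total':<18} ${sum(map(dbu_cost, monthly)):>9,.2f}")
```

Because the bill scales with hours and DBU intensity rather than seats, adding a workload (or running an existing one longer) raises revenue without a new contract.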

Customers typically start with specific use cases like data engineering or business intelligence, then expand into additional workloads as they consolidate their data stack onto the platform. The unified architecture creates switching costs, since moving integrated data pipelines, trained models, and governance policies to alternative platforms requires significant re-engineering effort.

Databricks maintains gross margins in the 70-80% range typical of cloud software companies, with primary costs including cloud infrastructure, data processing, and the underlying compute resources provided by AWS, Microsoft Azure, and Google Cloud Platform. The platform runs across all three major cloud providers, reducing dependence on any single hyperscaler while enabling customers to avoid vendor lock-in.

Expansion within existing accounts is common. As organizations build more data pipelines, train additional AI models, and onboard new user groups, consumption increases without requiring new sales cycles.

Competition


Cloud data warehouses

Snowflake is Databricks' most direct competitor in the cloud analytics market, with both companies targeting enterprise data teams and offering similar consumption-based pricing models. Snowflake has historically been more concentrated in pure analytics workloads and business intelligence use cases, while Databricks has focused on machine learning and data engineering.

Competition has intensified as both platforms expand into each other's core areas. Snowflake has added machine learning capabilities and launched Snowpark for advanced analytics, while Databricks has expanded its SQL analytics through Databricks SQL and added features for business users.

Both companies are competing to be the primary platform for AI workloads, with Snowflake launching Cortex for AI applications and Databricks investing in its Mosaic AI capabilities and open-source model development.

Hyperscaler integrated stacks

Amazon Web Services, Microsoft Azure, and Google Cloud Platform each offer integrated data and AI services that compete with Databricks' unified platform approach. AWS provides services like Redshift, SageMaker, and EMR, while Microsoft offers Azure Synapse Analytics and Azure Machine Learning as part of its broader Fabric platform.

These hyperscalers have structural advantages including zero-egress pricing for data movement within their ecosystems, tight integration with other cloud services, and the ability to bundle data platform costs into broader enterprise agreements. Microsoft's partnership with Databricks on Azure creates a dynamic in which the companies collaborate on one cloud while competing on others.

Google Cloud has partnered with Databricks to integrate Gemini models natively into the platform, indicating a more collaborative stance relative to AWS and Microsoft.

Specialized AI platforms

A new category of AI-native platforms is emerging to challenge Databricks' position in machine learning and generative AI workloads. Companies like Palantir offer specialized AI applications for specific industries, while newer entrants focus on making AI development more accessible to non-technical users.

The rise of foundation models and AI agents is creating opportunities for platforms that specialize in model fine-tuning, deployment, and orchestration. Databricks competes by offering these capabilities as part of its broader data platform, positioning integrated workflows as an alternative to point solutions.

TAM Expansion

Operational databases

Databricks is expanding beyond analytical workloads into operational database markets through Lakebase and the acquisition of Neon, a serverless Postgres platform. This move addresses the $100 billion operational database market and enables the company to support real-time applications and transactional workloads alongside analytics.

Integrating operational and analytical data on a single platform is meant to remove the traditional boundary between OLTP (transactional) and OLAP (analytical) systems. This unified approach is particularly valuable for AI applications that need both historical data for training and real-time data for inference.
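As a toy, single-node illustration of the access pattern (using Python's built-in SQLite, not Lakebase or Neon), the same table can serve small transactional writes and an analytical aggregate without an ETL hop in between. Production systems handle scale, concurrency, and storage layout very differently, so this shows only the idea:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")


def place_order(region: str, amount: float) -> None:
    """Operational (OLTP-style) path: one small atomic write per order."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("INSERT INTO orders (region, amount) VALUES (?, ?)", (region, amount))


for region, amount in [("EU", 120.0), ("US", 80.0), ("EU", 45.5), ("APAC", 60.0)]:
    place_order(region, amount)

# Analytical (OLAP-style) path: aggregate over the same, just-written rows.
rows = conn.execute(
    "SELECT region, COUNT(*), SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(rows)
```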

By supporting the full spectrum of data workloads, Databricks can capture more of the total data infrastructure spend within existing customer accounts and attract new customers who need operational database capabilities.

AI agent development

The emergence of AI agents represents a significant expansion opportunity as organizations move beyond traditional analytics to autonomous AI systems. Databricks' Agent Bricks framework and its integration with large language models position the company to capture value from the growing market for AI application development.

Enterprise AI agents require access to governed data, model serving infrastructure, and monitoring capabilities—all areas where Databricks has existing strengths. The platform's ability to ground AI models in enterprise data while maintaining security and compliance controls creates a natural expansion path.

As AI agents become more prevalent in business processes, the consumption of compute resources for model inference and data processing is expected to grow substantially, driving increased platform usage.

Vertical solutions

Databricks is developing industry-specific solutions that package its platform capabilities for particular sectors like cybersecurity, financial services, and healthcare. The Data Intelligence for Cybersecurity offering demonstrates how the company can create specialized applications while leveraging its core data platform.

These vertical solutions allow Databricks to compete more effectively against specialized vendors in each industry while commanding premium pricing for domain-specific functionality. The approach also enables faster sales cycles by providing pre-built solutions that address common industry use cases.

The company's partnerships with industry leaders like SAP and Palantir create additional channels for reaching vertical markets and integrating with existing enterprise software ecosystems.

Risks

Hyperscaler competition: Databricks depends on AWS, Microsoft Azure, and Google Cloud Platform for underlying infrastructure while competing with their integrated data services. These cloud providers could change pricing, limit API access, or enhance their own offerings in ways that disadvantage Databricks, particularly as they prioritize their own unified data platforms.

AI commoditization: The rapid advancement of open-source AI tools and models could reduce demand for Databricks' proprietary AI capabilities. If foundation models become commoditized and AI development tools become freely available, customers might choose to build their own solutions rather than pay for Databricks' integrated platform, pressuring both pricing and differentiation.

Implementation complexity: Despite efforts to broaden access to data and AI, Databricks remains a sophisticated platform that requires significant technical expertise to implement and optimize. Organizations with limited data engineering capabilities may struggle to realize value from the platform, potentially limiting adoption among mid-market customers and slowing expansion beyond technical teams.

Funding Rounds

Share Name   Issue Price   Issued At
Series K     $150.00       Sep 2025
Series J     $92.50        Dec 2024
Series I     $73.50        Jun 2024
Series H     $73.48413     Aug 2021
Series G     $59.122633    Feb 2021
Series F     $14.316133    Oct 2019
Series E     $7.0959667    Feb 2019
Series D     $2.794133     Aug 2017
Series C     $1.966        Dec 2016
Series B     $1.1151204    Jun 2014
Series A     $0.25945      Sep 2013
Source: Databricks Certificate of Incorporation.


DISCLAIMERS

This report is for information purposes only and is not to be used or considered as an offer or the solicitation of an offer to sell or to buy or subscribe for securities or other financial instruments. Nothing in this report constitutes investment, legal, accounting or tax advice or a representation that any investment or strategy is suitable or appropriate to your individual circumstances or otherwise constitutes a personal trade recommendation to you.

This research report has been prepared solely by Sacra and should not be considered a product of any person or entity that makes such report available, if any.

Information and opinions presented in the sections of the report were obtained or derived from sources Sacra believes are reliable, but Sacra makes no representation as to their accuracy or completeness. Past performance should not be taken as an indication or guarantee of future performance, and no representation or warranty, express or implied, is made regarding future performance. Information, opinions and estimates contained in this report reflect a determination at its original date of publication by Sacra and are subject to change without notice.

Sacra accepts no liability for loss arising from the use of the material presented in this report, except that this exclusion of liability does not apply to the extent that liability arises under specific statutes or regulations applicable to Sacra. Sacra may have issued, and may in the future issue, other reports that are inconsistent with, and reach different conclusions from, the information presented in this report. Those reports reflect different assumptions, views and analytical methods of the analysts who prepared them and Sacra is under no obligation to ensure that such other reports are brought to the attention of any recipient of this report.

All rights reserved. All material presented in this report, unless specifically indicated otherwise is under copyright to Sacra. Sacra reserves any and all intellectual property rights in the report. All trademarks, service marks and logos used in this report are trademarks or service marks or registered trademarks or service marks of Sacra. Any modification, copying, displaying, distributing, transmitting, publishing, licensing, creating derivative works from, or selling any report is strictly prohibited. None of the material, nor its content, nor any copy of it, may be altered in any way, transmitted to, copied or distributed to any other party, without the prior express written permission of Sacra. Any unauthorized duplication, redistribution or disclosure of this report will result in prosecution.