Build an open, managed, and intelligent data lakehouse on Google Cloud

Unify and govern your multimodal data with a high-performance data lakehouse that is integrated with Google's industry-leading AI. Get the most out of Apache Iceberg and the industry's first autoscaling serverless Spark to simplify data processing, analytics, and AI initiatives.

Bring the power of Google to your open data lakehouse

New innovations in open data formats, intelligent data-to-AI governance, accelerated data processing, and advanced AI-assisted development tools all combine to streamline data management and accelerate innovation.

BigLake

Get the openness of Apache Iceberg with enterprise-grade storage management

BigLake provides a native Iceberg storage engine for Cloud Storage interoperability, delivers unified runtime metadata management, enables advanced analytics and data science, and provides automated data management with built-in governance. Any Iceberg-compatible engine can use BigLake’s automated table management to enhance query performance and reduce costs.


“Partnering with Google Cloud has been instrumental in our journey to build Snap's next-generation, open lakehouse and democratize Spark and Iceberg in our developer community!”

Zhengyi Liu, Senior Manager, Software Engineer, Snap

Google Cloud Serverless for Apache Spark

Serverless Spark delivers high performance and lightning-fast processing, with no cluster management required

Transform your lakehouse with Google Cloud Serverless for Apache Spark. Experience rapid startup and zero operational overhead while improving performance for your Spark workloads with the new Lightning Engine. Combined with Gemini, it boosts productivity and offers up to 60% lower TCO.


“We see SQL and Spark as two complementary ways of accessing and transforming data. Spark is especially useful to us in use cases that require complex business logic, which although niche, are extremely business-critical. Having a unified platform for SQL, Spark, and AI, with the development experience in notebooks will considerably simplify these critical use cases.”

Andrés Sopeña Pérez, Head of Content Engineering, Trivago

Dataplex Universal Catalog

Simplify data discovery, understanding, and trust for your data lakehouse

Dataplex Universal Catalog is the unified data-to-AI governance solution for Google Cloud. The AI-powered catalog centralizes business, technical, and operational metadata across Google Cloud and provides AI-powered insights. It supports open formats like Apache Iceberg to enable integrated governance across your entire lakehouse.


“Dataplex has been instrumental in transforming our data platform into a secure, efficient, and scalable data ecosystem. With a focus on data governance, discovery, observability, and security compliance, we are equipped to meet the challenges of data management in the digital age. Dataplex empowers our teams to unlock the full potential of data and drive Box.Inc's continued growth and innovation.”

Asmita Kulkarni, Senior Product Manager, Box.Inc

BigQuery Studio and IDE extensions

Enhance Apache Spark for advanced data science and AI/ML workloads within lakehouse architectures by streamlining development and operations

Dataproc advances Spark for AI/ML on lakehouses with innovations for ML Runtimes with GPU drivers and common ML libraries. Colab Enterprise notebooks in BigQuery Studio and third-party IDEs provide integrated MLOps with Vertex AI and streamlined production pipelines to accelerate data science.


“Shopify has invested in employing a team with a diverse array of skill sets to remain ahead of trends for data science and engineering. In early testing with BigQuery Studio, we liked Google's ability to connect different tools for different users within a simplified experience. We see this as an opportunity to reduce friction across our team without sacrificing scale we expect from BigQuery.”

Zac Roberts, Data Engineering Manager, Shopify

Start your data lakehouse journey today

Whether you're migrating legacy systems or architecting an Iceberg-first lakehouse, Google Cloud has the technology to help you build an open, managed, and AI-ready lakehouse.

