Build an open, managed, and intelligent data lakehouse on Google Cloud
Unify and govern your multimodal data with a high-performance data lakehouse that is integrated with Google's industry-leading AI. Get the most out of Apache Iceberg and the industry's first autoscaling serverless Spark to simplify data processing, analytics, and AI initiatives.
New innovations in open data formats, intelligent data-to-AI governance, accelerated data processing, and advanced AI-assisted development tools all combine to streamline data management and accelerate innovation.
BigLake
Get the openness of Apache Iceberg with enterprise-grade storage management
BigLake provides a native Iceberg storage engine for Cloud Storage interoperability, delivers unified runtime metadata management, enables advanced analytics and data science, and provides automated data management with built-in governance. Any Iceberg-compatible engine can use BigLake’s automated table management to improve query performance and reduce costs.
“Partnering with Google Cloud has been instrumental in our journey to build Snap's next-generation, open lakehouse and democratize Spark and Iceberg in our developer community!”
Zhengyi Liu, Senior Manager, Software Engineer, Snap
Google Cloud Serverless for Apache Spark
Serverless Spark delivers high performance and lightning-fast processing, with no cluster management required
Transform your lakehouse with Google Cloud Serverless for Apache Spark. Experience rapid startup and zero operational overhead while improving performance for your Spark workloads with the new Lightning Engine. Combined with Gemini, it boosts productivity and offers up to 60% lower TCO.
“We see SQL and Spark as two complementary ways of accessing and transforming data. Spark is especially useful to us in use cases that require complex business logic, which although niche, are extremely business-critical. Having a unified platform for SQL, Spark, and AI, with the development experience in notebooks will considerably simplify these critical use cases.”
Andrés Sopeña Pérez, Head of Content Engineering, Trivago
Dataplex Universal Catalog
Simplify data discovery, understanding, and trust for your data lakehouse
Dataplex Universal Catalog is the unified data-to-AI governance solution for Google Cloud. The catalog centralizes business, technical, and operational metadata across Google Cloud and provides AI-powered insights. It supports open formats like Apache Iceberg to enable integrated governance across your entire lakehouse.
“Dataplex has been instrumental in transforming our data platform into a secure, efficient, and scalable data ecosystem. With a focus on data governance, discovery, observability, and security compliance, we are equipped to meet the challenges of data management in the digital age. Dataplex empowers our teams to unlock the full potential of data and drive Box.Inc's continued growth and innovation.”
Asmita Kulkarni, Senior Product Manager, Box.Inc
BigQuery Studio and IDE extensions
Enhance Apache Spark for advanced data science and AI/ML workloads in lakehouse architectures by streamlining development and operations
Dataproc advances Spark for AI/ML on lakehouses with new ML Runtimes that include GPU drivers and common ML libraries. Colab Enterprise notebooks in BigQuery Studio and third-party IDEs provide integrated MLOps with Vertex AI and streamlined production pipelines to accelerate data science.
“Shopify has invested in employing a team with a diverse array of skill sets to remain ahead of trends for data science and engineering. In early testing with BigQuery Studio, we liked Google's ability to connect different tools for different users within a simplified experience. We see this as an opportunity to reduce friction across our team without sacrificing scale we expect from BigQuery.”
Zac Roberts, Data Engineering Manager, Shopify