
Stop Data Loss In BigQuery With HYCU's New Atomic Backups

Written by
Sathya Sankaran
Published on
February 13, 2025

For years, HYCU has been the most comprehensive data protection solution for Google Cloud workloads, protecting more Google Cloud services than any other backup solution or service. In recognition of this long-standing innovation for Google Cloud users, Google named HYCU the Google Cloud Partner of the Year for Backup and DR at Google Cloud Next 24 in Las Vegas for its continued commitment to raising the bar for resiliency and recovery of customers’ crown jewels: their data.

Google Cloud is often fondly called “The Data Cloud” because of its extremely robust data portfolio. Google BigQuery is the centerpiece of Google’s data strategy: a unified data platform that allows users to store, analyze, and visualize multiple petabytes of multi-modal data.

BigQuery is a fully managed service that supports structured and unstructured data, including open table formats; supports multiple processing engines; processes data across multiple clouds; and ingests data both in batches and through real-time streaming (IoT, social media feeds, events, etc.).

In the age of AI, it is important to note that AI comes to where the data is, not the other way around. With BigQuery ML, AI models are being democratized and made accessible to anyone with basic SQL skills. This makes BigQuery one of the most strategic workloads for all enterprises. Many industry leaders such as Walmart, Spotify, Wayfair, Home Depot, Ford, and Palo Alto Networks rely on BigQuery as their data platform of choice.

To ensure your Google BigQuery data is adequately protected across multiple failure domains, you’ll need enterprise-class backup and recovery that is comprehensive in what it protects, consistent across dependent datasets, and granular during recovery. HYCU R-Cloud is the first and only enterprise backup solution to add backup and recovery support for Google BigQuery. We have customers protecting BigQuery data at several TB per minute.

Taking innovation for BigQuery one step further, HYCU now supports Atomic Backup Sets for BigQuery. Atomic Backup Sets are designed to ensure consistent views and queries of data spread across multiple datasets in BigQuery. Whether it is dependent datasets from different sources or views that cross-reference different datasets, organizations now have a consistent copy that protects them against data loss for much longer than the weeklong Time Travel window available to BigQuery users.

Why Data Protection is Important for BigQuery

The number one reason you need to protect your BigQuery data is to prevent data loss. Data loss in Google BigQuery can occur for a number of reasons, so it's crucial to be aware of the risks. Here are some common scenarios:

  • Zone-Level and Lower-Level Failures: Hardware or network issues in a specific zone can make your data unavailable, or even cause it to be lost, if it's not replicated across other zones.
  • Regional Failures: Major events like natural disasters can affect an entire region. If your backups are only stored there, you might lose access to your data when you need it most.
  • Bugs in SQL Code: Small errors in SQL queries can accidentally delete or corrupt data if safeguards aren't in place.
  • Human Error: Accidental deletions or misconfigurations can lead to unintended data loss.
  • Insider Threats: Authorized individuals might intentionally delete or leak data, posing serious risks to your data's security.

Being aware of these risks helps you take steps to protect your data in BigQuery.

The High Cost of Recreating Your BigQuery Dataset

Traditionally, data warehouses are a copy of transformed data from multiple sources, and many wonder why they need to be backed up. However, an important consideration is to factor in the time taken and the costs involved in recreating the warehouse if there is sustained data loss. Costs include:

  • ETL (Extract, Transform, Load)
  • Streaming
  • API
  • Pipeline services, egress, and more.  

In addition, with massively scalable systems like BigQuery, many customers rely on real-time event streaming to populate the data warehouse. In many cases, recreating it would not even be possible, because their only copy of the data is stored as a BigQuery dataset.

While Time Travel and Snapshot capabilities are available through the service, protection past seven days requires a backup. Modern regulations like DORA require a larger failure domain for critical applications. Most regulated industries, such as healthcare and financial services, are also subject to compliance, long-term retention, and durability requirements.
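To make the seven-day limit concrete, here is a minimal Python sketch that builds a Time Travel query using BigQuery's `FOR SYSTEM_TIME AS OF` clause and refuses any timestamp that falls outside the window. The function name, the table name used when calling it, and the fixed seven-day constant are assumptions for illustration only; the window on a given dataset may be configured to be shorter.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the weeklong Time Travel window described above.
TIME_TRAVEL_WINDOW = timedelta(days=7)


def time_travel_query(table: str, as_of: datetime, now: datetime) -> str:
    """Return a query reading `table` as of `as_of`, or raise if out of reach.

    Both datetimes are expected to be timezone-aware.
    """
    if as_of > now:
        raise ValueError("as_of is in the future")
    if now - as_of > TIME_TRAVEL_WINDOW:
        raise ValueError("as_of is older than the Time Travel window; a backup is needed")
    # Render the timestamp in UTC as a BigQuery TIMESTAMP literal.
    stamp = as_of.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S+00")
    return f"SELECT * FROM `{table}` FOR SYSTEM_TIME AS OF TIMESTAMP '{stamp}'"
```

A request for a point three days back yields a query string; a request for a point eight days back raises `ValueError`, which is the case where only a backup can help.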

Why Atomic Backup Sets?

While BigQuery can easily handle massive datasets, it is common for BigQuery users to segment their data into several datasets. This segmentation offers them better control over:  

  • Data Organization and Management
  • Granular Access Control
  • Performance and Query Optimization
  • Query Cost Management
  • Data Lifecycle/Record Expiration Management

Even with segmented datasets, BigQuery offers several ways to analyze and mine data across these datasets, including federated queries, cross-dataset joins, and views. Views are virtual tables that provide a way to encapsulate complex queries and present them as simple tables. This is particularly useful for creating reusable queries that can be shared across different teams, and views often become the primary method through which BigQuery users consume data.
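As a rough illustration of how a view ties datasets together, the Python sketch below scans a view's SQL for table references and reports which datasets it reads from. It is a heuristic, not a SQL parser: it assumes every reference is written as `project.dataset.table` inside a single pair of backticks, and the project, dataset, and table names are hypothetical.

```python
import re

# Assumption: references look like `project.dataset.table` inside one pair of
# backticks; other spellings are not detected by this sketch.
_TABLE_REF = re.compile(r"`([\w-]+)\.(\w+)\.([\w-]+)`")


def datasets_referenced(view_sql: str) -> set[str]:
    """Return the 'project.dataset' names that a view query reads from."""
    return {f"{project}.{dataset}" for project, dataset, _ in _TABLE_REF.findall(view_sql)}


# Hypothetical view that joins tables living in two different datasets.
view_sql = """
SELECT o.order_id, c.region
FROM `acme.sales.orders` AS o
JOIN `acme.crm.customers` AS c ON o.customer_id = c.customer_id
"""
print(sorted(datasets_referenced(view_sql)))  # ['acme.crm', 'acme.sales']
```

A view like this one only returns trustworthy results if both `sales` and `crm` are restored from the same moment.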

During backup, it is therefore important that these underlying datasets are protected with a version from the same point in time so that the views remain reliable. Another key point is that as these datasets get larger, traditional backups take longer and create a larger inconsistency window, which makes atomic backup sets even more critical.
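The inconsistency window can be shown with a small, self-contained Python simulation. It does not use BigQuery at all: the two change logs, the integer timestamps, and the helper names are all made up for illustration. A customer is added at time 5 and an order for that customer at time 6; copying the two datasets at different moments captures the order without its customer.

```python
from bisect import bisect_right

# Hypothetical change logs: (commit_time, contents) pairs, oldest first.
customers_log = [(0, {"c1"}), (5, {"c1", "c2"})]
orders_log = [(0, {("o1", "c1")}), (6, {("o1", "c1"), ("o2", "c2")})]


def as_of(log, t):
    """Return the latest version of a dataset committed at or before time t."""
    times = [commit for commit, _ in log]
    index = bisect_right(times, t)
    if index == 0:
        raise ValueError("no version exists at or before the requested time")
    return log[index - 1][1]


def dangling_orders(customers, orders):
    """Orders whose customer is missing, i.e. rows a joining view would drop."""
    return {order for order, customer in orders if customer not in customers}


# Staggered backup: customers copied at t=4, orders only finished at t=7.
staggered = dangling_orders(as_of(customers_log, 4), as_of(orders_log, 7))
# Atomic backup: both datasets captured as of the same point in time.
atomic = dangling_orders(as_of(customers_log, 7), as_of(orders_log, 7))

print(staggered)  # {'o2'}
print(atomic)     # set()
```

The longer the first copy takes, the wider the gap between the two capture times, and the more rows can end up mismatched.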

It is also important to note that exporting data from BigQuery doesn’t include the Time Travel data, so you can’t trace an export back to a consistent point in time. As a result, the ability to coordinate consistency at the time of backup is critically important.

Atomic Backup Sets is a powerful new capability that allows users to group datasets and ensure they are backed up at the same point in time across the entire set. This is particularly useful for maintaining data integrity across related datasets.

Benefits of Using Atomic Backup Sets

  1. Data Integrity: Ensures that related datasets are consistent with each other, preventing discrepancies that can arise from exporting datasets at different times. Views that reference tables in other datasets are common, and exporting these dependent datasets together helps achieve better consistency.
  2. Simplified Management: Grouping datasets makes it easier to manage and organize your data exports.
  3. Enhanced Reliability: By protecting datasets at the same point in time, you reduce the risk of data mismatches and improve the reliability of your data analysis.

How easy is it to accomplish Atomic Backup Sets?

At HYCU, we always strive to make it easy for customers. Creating an Atomic Backup Set is as simple as tagging the associated datasets with an Atomic-Backup-set label. This label lets you define which datasets should be grouped together. When a backup is initiated, all datasets with the same Atomic-Backup-set label value will be protected using the same point in time, ensuring consistency across the group. Today, this grouping is only available for BigQuery datasets hosted within the same region.
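The grouping rule described above can be sketched in a few lines of Python. This is not HYCU's implementation, only an illustration of the rule: datasets in one project are grouped when they share both a region and a label value. The dictionary layout, the dataset names, and the lowercase label key are assumptions; BigQuery label keys are lowercase, so the exact key spelling should be taken from the HYCU documentation.

```python
from collections import defaultdict

# Assumption: how the label key might be spelled on a dataset.
LABEL_KEY = "atomic-backup-set"


def group_atomic_sets(datasets):
    """Group dataset IDs by (region, label value); unlabeled datasets are skipped."""
    groups = defaultdict(list)
    for ds in datasets:
        value = ds.get("labels", {}).get(LABEL_KEY)
        if value is not None:
            groups[(ds["location"], value)].append(ds["dataset_id"])
    return dict(groups)


# Hypothetical datasets belonging to a single project.
datasets = [
    {"dataset_id": "sales", "location": "us-central1", "labels": {LABEL_KEY: "finance"}},
    {"dataset_id": "crm", "location": "us-central1", "labels": {LABEL_KEY: "finance"}},
    {"dataset_id": "sales_eu", "location": "europe-west1", "labels": {LABEL_KEY: "finance"}},
    {"dataset_id": "scratch", "location": "us-central1", "labels": {}},
]
groups = group_atomic_sets(datasets)
```

Here `sales` and `crm` land in one group, while `sales_eu` forms its own group despite carrying the same label value, because it is hosted in a different region.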

Getting Started with Atomic Backup Sets

To start using Atomic Backup Sets in your BigQuery backups, follow these simple steps:

  1. Label Your Datasets: Add the Atomic-Backup-set label to the datasets you want to be protected together. HYCU will display a new group in the R-Cloud UI using the format <project name>_<region name>_<Atomic-Backup-set name>.
  2. Associate Policy: Associate your desired backup policy with the new group in HYCU R-Cloud. When the policy kicks off a backup for BigQuery, HYCU will automatically group and back up the BigQuery datasets with the same Atomic-Backup-set label at the same point in time.
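The labeling step can also be scripted. The sketch below is one possible approach, not an official HYCU tool: it takes a client that behaves like `google.cloud.bigquery.Client` and uses its `get_dataset` and `update_dataset` methods to attach the label, and it formats the group name following the pattern quoted in step 1. The helper names, the lowercase label key, and the example identifiers are assumptions.

```python
# Assumption: the label key HYCU looks for. BigQuery label keys are lowercase,
# so confirm the exact spelling in the HYCU documentation.
LABEL_KEY = "atomic-backup-set"


def add_to_atomic_set(client, dataset_id: str, set_name: str):
    """Attach the grouping label to one dataset and return the updated dataset.

    `client` is expected to behave like google.cloud.bigquery.Client.
    """
    dataset = client.get_dataset(dataset_id)
    labels = dict(dataset.labels or {})  # keep any labels already present
    labels[LABEL_KEY] = set_name
    dataset.labels = labels
    # Only the labels field is sent in the update request.
    return client.update_dataset(dataset, ["labels"])


def group_display_name(project: str, region: str, set_name: str) -> str:
    """Format a name as <project name>_<region name>_<Atomic-Backup-set name>."""
    return f"{project}_{region}_{set_name}"
```

With the `google-cloud-bigquery` package installed and credentials configured, a call such as `add_to_atomic_set(bigquery.Client(), "my-project.sales", "finance")` would label one dataset; repeating it for each related dataset places them in the same set.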

Recovering BigQuery Datasets

Your recovery options remain flexible. You can continue to restore individual datasets and tables to the same project or a different project, with the same name or a new name. Any dataset that is part of an Atomic Backup Set will have recovery points that were protected at the same point in time. When you restore datasets, their views and routines are restored along with them.

Conclusion

HYCU's introduction of Atomic Backup Sets in BigQuery exports is a significant step forward in data management. By leveraging consistency groupings and atomic backups, you can ensure that your BigQuery datasets are consistent, reliable, and easier to manage. Whether you're dealing with large-scale data analysis, trending, mining on historical data, or simply need to maintain data integrity, Atomic Backup Sets provide a robust solution to meet your needs.


Head of Cloud Products

Sathya Sankaran is a seasoned cloud technology executive currently serving as Head of Cloud Products at HYCU. Previously, as the founder and general manager of CloudCasa by Catalogic, he successfully launched and scaled a pioneering backup-as-a-service platform for Kubernetes workloads, leading it to achieve market leadership status. With over a decade of experience in cloud and data protection, Sankaran has demonstrated a talent for identifying market opportunities and delivering innovative solutions that address critical challenges in cloud infrastructure.
