The concept of the Data Warehouse (DW) has been around since the late 1980s, and it remains a critical component for any company that wants to adopt a data-driven culture. Here's why Google Cloud Platform's BigQuery might be the right choice to host your data warehouse.

Data Warehouses and the Cloud

The primary purpose of a data warehouse is to collect and store data from many different data sources and to make this data available for fast, reliable, secure, and easy retrieval, as well as subsequent analysis and insight.

With the rise of cloud computing, the major cloud providers, Amazon (AWS), Microsoft (Azure), and Google (GCP), among others, now offer their own data warehouse solutions. These cloud providers make it easier to manage and horizontally scale our data warehouses while also facilitating integration with the providers' other tools.

GCP Advantages

In comparison with its competitors, a big advantage that Google Cloud Platform (GCP) offers is its serverless data warehouse: BigQuery. With BigQuery, you don’t have to worry about managing, provisioning, or sizing any infrastructure; instead, you can focus on your data and on how you can use it to improve your company’s products, services, operations, and decision making.
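
To illustrate how little operational overhead this involves, here is a minimal sketch using the google-cloud-bigquery Python client (the project, dataset, and table names are hypothetical placeholders): there is no cluster to create or size, just credentials and a SQL query.

```python
# Minimal sketch: querying BigQuery with the google-cloud-bigquery Python client.
# The project, dataset, and table names below are hypothetical placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-analytics-project")  # uses default credentials

sql = """
    SELECT order_date, SUM(total_amount) AS daily_revenue
    FROM `my-analytics-project.sales.orders`
    GROUP BY order_date
    ORDER BY order_date DESC
    LIMIT 30
"""

# No clusters to provision or size: BigQuery allocates the compute for us.
for row in client.query(sql).result():
    print(row.order_date, row.daily_revenue)
```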

Like other modern data lakehouse tools, BigQuery separates storage from processing, a separation that helps achieve better availability, scalability, and cost-efficiency. For storage, BigQuery uses Colossus, Google’s distributed file system; for processing, it uses Dremel, a large, multi-tenant cluster that executes Standard SQL queries. These resources are orchestrated by Borg, Kubernetes' predecessor, and communicate over Jupiter, Google’s petabit network, as shown below:

Figure 1: BigQuery Architecture, image courtesy of Google Cloud

GCP and Google Product Integrations

Another big advantage that GCP offers is easy integration with other GCP and Google products, which makes it the best choice for website analytics. Google Analytics' enterprise version, GA 360, can easily export its data to BigQuery, which enables us to better understand our customers’ journey and behavior. Having the data in BigQuery also makes it easier to relate user data with external sources and to apply machine learning models using BigQuery ML (which we will cover in detail in future posts), generating deeper and better insights in a language that most data analysts are familiar with: SQL.
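
As a rough sketch of what this looks like in practice, the snippet below trains a simple purchase-propensity model with BigQuery ML directly over exported Google Analytics sessions; the dataset, model, and column names are illustrative and may not match your GA 360 export exactly.

```python
# Sketch: training a BigQuery ML model over Google Analytics export data with SQL.
# Dataset, model, and column names are illustrative placeholders.
from google.cloud import bigquery

client = bigquery.Client()

create_model_sql = """
    CREATE OR REPLACE MODEL `analytics.purchase_propensity`
    OPTIONS (model_type = 'logistic_reg') AS
    SELECT
      IF(totals.transactions IS NULL, 0, 1) AS label,
      IFNULL(totals.pageviews, 0)           AS pageviews,
      IFNULL(totals.timeOnSite, 0)          AS time_on_site,
      device.deviceCategory                 AS device_category
    FROM `my-project.analytics.ga_sessions_*`
"""

# CREATE MODEL runs as a regular query job; .result() waits for training to finish.
client.query(create_model_sql).result()
```

Once the model is trained, predictions can be generated with ML.PREDICT in the same SQL dialect, so the whole workflow stays inside BigQuery.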

GCP also provides many other useful integrations, such as Cloud Storage, Bigtable, Pub/Sub, Dataflow, Data Studio, Looker, Data Catalog, Cloud Composer, and others, making it much easier to build an end-to-end data pipeline. And since these resources are serverless, we don’t have to worry about provisioning and maintaining any infrastructure. For example, combining BigQuery with Pub/Sub and Dataflow makes it easier to run real-time streaming analytics, as shown below:

Figure 2: Complex event streaming reference architecture
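
A minimal Apache Beam (Dataflow) sketch of that pattern is shown below, assuming a hypothetical Pub/Sub subscription and BigQuery table; a production pipeline would add message validation, error handling, and windowed aggregations.

```python
# Minimal sketch of a streaming pipeline: Pub/Sub -> Dataflow (Apache Beam) -> BigQuery.
# The subscription, table, and schema below are hypothetical placeholders.
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    streaming=True,
    project="my-analytics-project",
    runner="DataflowRunner",
    region="us-central1",
    temp_location="gs://my-temp-bucket/tmp",
)

with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        # Read raw event messages from a Pub/Sub subscription.
        | "ReadEvents" >> beam.io.ReadFromPubSub(
            subscription="projects/my-analytics-project/subscriptions/events-sub")
        # Decode each message from JSON into a dictionary.
        | "ParseJson" >> beam.Map(json.loads)
        # Stream the rows into a BigQuery table for real-time analysis.
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            table="my-analytics-project:analytics.events",
            schema="event_name:STRING,user_id:STRING,event_ts:TIMESTAMP",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)
    )
```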

BI Engine

BigQuery also offers BI Engine, an in-memory analysis service that improves BigQuery's integration with data visualization tools like Data Studio, Looker, Tableau, QlikView, and Power BI by providing:

  1. Faster queries: BI Engine's in-memory analysis service reduces query response times.
  2. Simplified architecture: BI Engine lets visualization tools query data in BigQuery directly, without complicated ETL jobs or data extracts.
  3. Smart tuning: BI Engine tunes queries automatically by moving data between its in-memory storage, the BigQuery query cache, and BigQuery storage to improve performance and load times.

Why Choose BigQuery

In summary, BigQuery can help you achieve the data availability and scalability your business needs without worrying about the underlying infrastructure or operations, all at a competitive cost and with a complete ecosystem that supports the most common business scenarios.

At Avenue Code, we have several Google Cloud Platform experts who can help you modernize your Data Warehouse to be highly available, scalable, and cost-efficient.

Additional Data Modernization Resources

If you enjoyed today's post, be sure to check out the other Snippets in our data analytics series: The 6 Pillars of Data Modernization Success and 4 Strategies to Boost Sales with Data Mining.


Author

Frederico Caram

Frederico Caram is a Data Architect at Avenue Code. He enjoys reading historical fantasy novels, ballroom dancing, and playing video games.

