Over the past decade, more and more companies have moved to the cloud. This trend is complemented by a growing community of data consumers who collect, ingest, store, and analyze data to gain business insights that inform decision making. As cloud migration accelerates, however, new concerns about information management arise.

Three of the biggest concerns are related to data protection, compliance and regulations, and visibility and control. In today's article, we'll explain each concern, as well as industry best practices to address them. 

1. Protecting Data (Digital Assets)

Data protection is often the biggest concern in moving to cloud computing, since it involves storing business data in a public cloud infrastructure, and in most cases companies deploy the enterprise system together with the business data. With security threats and breaches on the rise, data security is a sensitive topic. Ransomware cases have increased around the world, and no organization wants to be the next victim.

From a risk management perspective, protecting data against unauthorized access is a top priority. The goal is to avoid leaks of sensitive information, which ranges from personally identifiable information (PII) to confidential corporate information, intellectual property, and trade secrets.

2. Compliance and Regulations

Different jurisdictions have their own regulations covering data management and security. Examples include the European Union's General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA), as well as industry-specific laws and standards like the Health Insurance Portability and Accountability Act of 1996 (HIPAA) in healthcare, the Global Legal Entity Identifier (LEI) in the financial industry, and the Payment Card Industry Data Security Standard (PCI DSS) for payment card data, to mention a few of the most important.

Compliance teams are responsible for guaranteeing adherence to these regulations and standards, and they may have concerns about regulatory oversight of data stored in the cloud.

3. Lack of Visibility and Control

Most organizations still struggle to understand the volume of data they have, what exactly they have, and where it's all stored. This raises questions about how much potential value this data creates versus the amount of risk it generates.

Unfortunately, many data management professionals and data consumers lack visibility into their own data landscape. They don't know which data assets are available, where they are located, whether and how they can be used, who has access to which data, and whether those people should have access to it. This uncertainty limits companies' ability to further leverage their own data to improve productivity or drive business value.

Essential Processes for Data Governance

These risk factors highlight critical processes that are essential for data governance:

  1. Data Assessment
  2. Metadata Cataloging
  3. Data Quality
  4. Access Control Management
  5. Information Security

In fact, the need to address these risks while enjoying the benefits of cloud computing has increased the value of understanding data governance and of discovering what is important for business operations and decision making. A survey conducted by McKinsey & Company illustrates this point:

 

Lack of Data Quality. Image courtesy of McKinsey & Company.

What is Data Governance (and Why Do We Need It)?

Data governance is one part of the overall discipline of data management, albeit a very important one. Whereas data governance is about the roles, responsibilities, and processes that ensure accountability for and ownership of data assets, DAMA International defines data management as “an overarching term that describes the processes used to plan, specify, enable, create, acquire, maintain, use, archive, retrieve, control, and purge data.” 

In practice, the core mission of data governance teams is generally to:

  1. Optimize, organize, secure, govern, and regulate corporate data assets to ensure reliable and secure business insights;
  2. Influence and inform the future state designs that result from the overarching business transformation; and
  3. Build technologies, policies, and frameworks that make it easier and more intuitive for data consumers to do the right thing when it comes to protecting the corporation.

Data governance needs to be in place for the full data lifecycle, from the moment data is collected or ingested through the point at which that data is archived or destroyed. Throughout that lifecycle, data governance focuses on making the data available to all data consumers in a form that they can readily access and understand in business terms.

This way, the data can be used to generate the desired business outcomes (analysis and insights) while also conforming to regulatory standards where relevant. The ultimate goal of data governance is to enhance trust in the data.

Trustworthy data is a must-have for using corporate data to support decision making, risk assessment, and management by key performance indicators (KPIs). Decisions backed by data inspire more confidence because the process behind them rests on evidence.

 

Primary Data Governance Topics. Image courtesy of Finextra.

Data Governance Framework

The primary goal of a data governance framework is to support the creation of a single set of rules and processes for collecting, storing, and using data. In this way, the framework makes it easier to streamline and scale core governance processes, enabling you to maintain compliance and security standards, democratize data, and support decision making.

The framework should include data discovery to create a data catalog view across lines of business and/or corporate master data. This includes not only the data itself, but also data relationships and lineage, technical and business metadata, data profiling, data quality, data classification, data engineering, and the overall workflow.

A data governance framework supports the execution of data governance by defining the essential process components of a data governance program, such as:

  1. Implementing process changes to improve and manage data quality;
  2. Managing data issues and identifying data owners;
  3. Building a data catalog and creating reference data and master data;
  4. Protecting data privacy by enforcing and monitoring data policies based on information classification and access control; and
  5. Driving data literacy, as well as provisioning and delivering data.
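
To make item 4 more concrete, here is a minimal Python sketch of a policy check that combines information classification with access control. The classification levels, role names, and the can_access function are all hypothetical and chosen only for illustration. A real program would take them from its own policies and from the access management tool it uses.

```python
from enum import IntEnum


class Classification(IntEnum):
    """Hypothetical sensitivity levels, ordered from least to most sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Hypothetical policy: the highest classification each role may read.
ROLE_CLEARANCE = {
    "analyst": Classification.INTERNAL,
    "data_steward": Classification.CONFIDENTIAL,
    "security_officer": Classification.RESTRICTED,
}


def can_access(role: str, asset_classification: Classification) -> bool:
    """Return True if the role's clearance covers the asset's classification."""
    clearance = ROLE_CLEARANCE.get(role)
    if clearance is None:
        return False  # unknown roles are denied by default
    return asset_classification <= clearance
```

In this sketch, an analyst can read data classified as INTERNAL but not CONFIDENTIAL, and any role missing from the policy is denied. Denying by default is a deliberate choice here, since a gap in the policy should not silently grant access.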

Outcomes can be measured and monitored throughout the execution of established processes, then optimized for trust, privacy, and data protection. Key outcomes include: tracking processes covering data quality and data proliferation; monitoring for data privacy and risk exposure; alerts for anomalies and the creation of an audit trail; and issue management and workflow facilitation.
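
As one example of a measurable outcome, the Python sketch below computes a simple data quality metric (completeness) and flags fields that fall below a threshold, which could then feed an alert. The function names, the sample rows, and the 0.9 threshold are assumptions made for illustration, not part of any specific tool.

```python
from typing import List, Mapping


def completeness(rows: List[Mapping[str, object]], field: str) -> float:
    """Share of rows in which `field` is present and non-empty."""
    if not rows:
        return 1.0  # nothing to check, so nothing is missing
    filled = sum(1 for row in rows if row.get(field) not in (None, ""))
    return filled / len(rows)


def fields_below_threshold(
    rows: List[Mapping[str, object]], fields: List[str], threshold: float
) -> List[str]:
    """Return the fields whose completeness is below the threshold."""
    return [f for f in fields if completeness(rows, f) < threshold]


# Hypothetical sample: two of the four rows have no usable email.
sample_rows = [
    {"customer_id": 1, "email": "a@example.com"},
    {"customer_id": 2, "email": None},
    {"customer_id": 3, "email": ""},
    {"customer_id": 4, "email": "d@example.com"},
]

flagged = fields_below_threshold(sample_rows, ["customer_id", "email"], 0.9)
```

With this sample, the email field has a completeness of 0.5 and is flagged, while customer_id is fully populated. Tracking such a score over time is one simple way to monitor data quality as processes run.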

An overall data governance program framework covering core macro activities. Image courtesy of Data Governance: The Definitive Guide: People, Processes, and Tools to Operationalize Data Trustworthiness.

Business Benefits of Robust Data Governance

It’s important to state the benefits we can expect from data governance in both the short term and the long term. Setting a data governance strategy is critical. So is designing an operational model that runs the data governance framework in stages, so that the model can evolve in step with the level of data governance maturity to be achieved, as illustrated below.

General overview of data governance maturity. Image adapted from IBM Maturity Model.

A good data governance strategy and a solid operational model allow companies to know that, whether the data they are accessing is current or historical data, it will be reliable and usable for analysis. The benefits of data governance can be summarized as follows:

Business benefits of a data governance program.

Data Governance with Google Cloud Platform

Google offers some of the most trusted tools to enable data governance at an organizational level. These include Data Catalog, which supports data discoverability and metadata management, and data class-level controls that allow sensitive data to be separated from other data within containers, as well as other tools like Cloud Data Loss Prevention and Identity and Access Management.

Below is a GCP data governance infrastructure overview:

Data Catalog and DLP. Image courtesy of Google Cloud.

Data Catalog is a fully managed and scalable metadata management service from Google Cloud's Data Analytics family of products. Its focus is searching for insightful data, understanding data, and making data useful.

There are two main ways to interact with Data Catalog:

  1. Searching data assets you have access to; and
  2. Tagging assets with metadata.

How does Data Catalog work? It can catalog native metadata from data assets in the following Google Cloud sources:

  1. BigQuery datasets, tables, and views;
  2. Pub/Sub topics; and
  3. Dataproc Metastore databases and tables.

You can also use Data Catalog APIs to create and manage entries for custom data resource types. Once your data is catalogued, you can add your own metadata to these assets using tags.
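
The two interactions described above, tagging and searching, can be illustrated with a small in-memory Python sketch. This is not the Data Catalog API: the MiniCatalog class, its methods, and the sample entries are hypothetical, and they only mimic the idea of attaching metadata to entries and then filtering on it.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Entry:
    """A cataloged data asset with free-form key/value tags."""
    name: str
    description: str = ""
    tags: Dict[str, str] = field(default_factory=dict)


class MiniCatalog:
    """Toy in-memory catalog; a stand-in for a real metadata service."""

    def __init__(self) -> None:
        self._entries: Dict[str, Entry] = {}

    def register(self, name: str, description: str = "") -> Entry:
        entry = Entry(name, description)
        self._entries[name] = entry
        return entry

    def tag(self, name: str, key: str, value: str) -> None:
        self._entries[name].tags[key] = value

    def search(self, text: str = "", **tag_filters: str) -> List[str]:
        """Return names of entries matching the text and every tag filter."""
        needle = text.lower()
        matches = []
        for entry in self._entries.values():
            haystack = f"{entry.name} {entry.description}".lower()
            if needle not in haystack:
                continue
            if all(entry.tags.get(k) == v for k, v in tag_filters.items()):
                matches.append(entry.name)
        return sorted(matches)


catalog = MiniCatalog()
catalog.register("sales.orders", "Customer orders")
catalog.register("hr.salaries", "Employee salaries")
catalog.tag("hr.salaries", "sensitivity", "restricted")
```

Here, catalog.search("orders") finds the sales table by its name and description, while catalog.search(sensitivity="restricted") finds the HR table through its tag. A managed service adds what this toy leaves out, such as access-aware search results and structured tag definitions.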

In addition, Data Catalog works with Cloud Data Loss Prevention (DLP), whose auto-tagging engine can automatically identify sensitive data.
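
The general idea of automatically identifying sensitive data can be shown with a deliberately simple Python sketch that scans sample column values with regular expressions and suggests a tag per column. It does not use Cloud DLP and is far less capable than a managed detection service. The patterns, the label names, and the suggest_tags function are assumptions made for illustration.

```python
import re
from typing import Dict, Iterable, List

# Hypothetical detectors; real services rely on far richer techniques.
PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "US_SSN_LIKE": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def detect_labels(values: Iterable[object]) -> List[str]:
    """Return the sorted labels whose pattern matches at least one value."""
    found = set()
    for value in values:
        for label, pattern in PATTERNS.items():
            if pattern.search(str(value)):
                found.add(label)
    return sorted(found)


def suggest_tags(columns: Dict[str, List[object]]) -> Dict[str, List[str]]:
    """Map each column with a detection to its suggested sensitivity labels."""
    suggestions = {}
    for column, values in columns.items():
        labels = detect_labels(values)
        if labels:
            suggestions[column] = labels
    return suggestions


# Hypothetical sample values per column.
sample_columns = {
    "contact": ["ana@example.com", "n/a"],
    "tax_id": ["123-45-6789"],
    "city": ["Lisbon"],
}

suggested = suggest_tags(sample_columns)
```

For this sample, the contact and tax_id columns receive a suggested label and the city column does not. The resulting labels could then be attached to catalog entries as tags, so that sensitive columns are easy to find and to protect.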

Conclusion

Data governance helps organizations better manage the availability, usability, integrity, and security of their corporate data. With the right technology, data governance can also deliver tremendous business value and support a company's digital transformation journey.

At its most basic level, data governance is about bringing data under control and keeping it secure. Successful data governance requires knowing where data is located, how it originated, who has access to it, and what it contains. Effective data governance is a prerequisite for maintaining business compliance, whether that compliance is self-imposed or mandated by an industry or external regulatory body.

The quality, veracity, and availability of data to authorized personnel can also determine whether an organization meets or violates stringent regulatory requirements.

Data Governance at Avenue Code

At Avenue Code, we have several Google Cloud Platform experts who can help you implement data governance processes based on market best practices to achieve high availability, usability, integrity, and security of corporate data.

Want to know more about how to make the most of your data? Check out the other blogs in our data analytics series:

The 6 Pillars of Data Modernization Success

4 Strategies to Boost Sales with Data Mining

Modernizing Your Data Warehouse with BigQuery

Data Lakes: The Key to Data Modernization

What You Need to Know About Data Pipelines

Data Orchestration in GCP


Author

Andre Soares

Andre Soares is a GCP Specialist with a developing career in data architecture and engineering, data governance, IT architecture, and soft architecture within technology companies like Avenue Code, SAP, Oracle, and Keyrus, as well as banks like Itaú - Unibanco (the largest LatAm Bank).

