Looking to level up your data-driven decision making? Machine learning and artificial intelligence models in GCP can help you predict the future.

In our previous posts, we explained how data warehouses and data lakes can help us manage our data in a more secure, cost-effective, reliable, and scalable way. We also talked about how to use Google Cloud Platform to improve data pipelines and their orchestration, as well as how data visualization can support business decision making.

In today's post, we are going to talk about a way to go a step further in improving our data-driven decision making. While data visualization can help us better understand what happened in the past through descriptive analysis and diagnosis, machine learning and artificial intelligence models can help us become more proactive with predictive and prescriptive analysis, and they can also help us extract information from unstructured data to enrich our analysis.

ML Models

There are several different machine learning models and architectures, but most of them fall into one or more of the following categories:

Supervised Learning: learns from examples. We provide the model with inputs paired with their expected outputs so that it can learn the relationship between them and predict the output for inputs it has not seen before. The most relevant sub-categories of this type of learning are:

  1. Classification: The model learns how to determine to which category new observations belong based on the inputs. A classic example is filtering emails as "spam" or "not spam."
  2. Regression: The model learns how to predict a numeric value from a set of input variables, and how each of these variables influences the outcome.
  3. Forecasting: The model predicts future values based on past and present time series data.
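To make the idea of learning from labeled examples more concrete, here is a minimal sketch of the regression case in plain Python. It fits a straight line by ordinary least squares, which is only one of many possible regression techniques. The `ad_spend` and `sales` values are hypothetical data invented for this illustration.

```python
from statistics import mean

def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

def predict(model, x):
    """Apply the fitted line to a new, unseen input."""
    slope, intercept = model
    return slope * x + intercept

# Hypothetical labeled examples: inputs paired with their expected outputs.
ad_spend = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]

model = fit_line(ad_spend, sales)
estimate = predict(model, 5.0)  # prediction for an input outside the training data
```

The training step only sees the labeled pairs; the prediction step then extrapolates to a new input using what was learned.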

Unsupervised Learning: uses the input data to extract patterns without relying on labeled outputs. Unsupervised learning models usually seek to organize data according to their features or characteristics, and they usually fall into one of the following groups:

  1. Clustering: Clustering groups the data based on distance or relationship criteria. It’s especially useful for segmenting data into several groups in order to perform analysis on them to identify patterns.
  2. Dimensionality reduction: Dimensionality reduction reduces the number of variables being considered based on their relationships and correlations; it may improve some models' performance and reduce overfitting.
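As a rough illustration of distance-based clustering, the sketch below implements a basic k-means loop in plain Python. The points and starting centroids are hypothetical, and production libraries use more careful initialization and stopping criteria than this fixed number of iterations.

```python
import math

def nearest_index(point, centroids):
    """Return the index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def assign(points, centroids):
    """Group every point under its nearest centroid."""
    clusters = [[] for _ in centroids]
    for point in points:
        clusters[nearest_index(point, centroids)].append(point)
    return clusters

def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points and moving centroids to cluster means."""
    for _ in range(iterations):
        clusters = assign(points, centroids)
        centroids = [
            # An empty cluster keeps its previous centroid.
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else old
            for members, old in zip(clusters, centroids)
        ]
    return centroids, assign(points, centroids)

# Hypothetical unlabeled data: two visually separate groups of 2D points.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```

No labels are provided at any point: the groups emerge only from the distances between the observations.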

Semi-Supervised Learning: similar to supervised learning, but it also uses unlabeled data to enlarge the dataset the model learns from. It may also apply unsupervised learning techniques to enrich the data before making predictions.

Reinforcement Learning: uses a trial-and-error approach to learning. The model tries a set of actions in an environment or in a model of the environment, and based on the result (which can be defined in the form of a reward), it optimizes its actions to better fit the problem.
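The trial-and-error loop can be sketched with an epsilon-greedy agent on a multi-armed bandit, one of the simplest reinforcement learning settings. This is an illustrative example only: the reward probabilities, step count, and exploration rate below are assumptions chosen for the demonstration.

```python
import random

def epsilon_greedy(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Learn which action pays best by trying actions and observing rewards."""
    rng = random.Random(seed)
    counts = [0] * len(reward_probs)    # how often each action was tried
    values = [0.0] * len(reward_probs)  # running average reward per action
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action.
            action = rng.randrange(len(reward_probs))
        else:
            # Exploit: pick the action with the best estimated reward so far.
            action = max(range(len(values)), key=values.__getitem__)
        # The environment returns a reward of 1 or 0 for the chosen action.
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]
    return values, counts

# Hypothetical environment: three actions with hidden success probabilities.
values, counts = epsilon_greedy([0.2, 0.5, 0.8])
best_action = counts.index(max(counts))
```

The agent is never told which action is best; it shifts toward the most rewarding one as its reward estimates improve.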

There are tested machine learning models for a wide range of applications, such as computer vision, natural language, recommendation systems, graphs, time series, speech, music, audio, robotics, and games. A good reference for research papers on models is Papers with Code.

Since some of these models are broadly applicable, it's relatively easy to find ready-to-use, pre-trained implementations of them in open source libraries or offered by vendors as Managed Services.

Machine Learning in GCP

Google is one of the main players when it comes to machine learning, and Google Cloud Platform leverages this expertise to cover many different customer needs in an easy, cost-effective, and scalable way using Managed APIs, BigQuery ML, and Vertex AI:

Managed APIs: GCP provides Managed APIs that can be used to solve common Machine Learning problems without the need to train a new model or have deep knowledge regarding the underlying technology. Some of the APIs provided by Google are:

  • Speech-to-Text API: Automatic speech recognition.
  • Vision API: Extract information from images and pictures.
  • Video Intelligence API: Extract data and information from videos.
  • Natural Language API: Insightful text analysis with machine learning.
  • Translation: Dynamically translate between languages using Google machine learning.
  • OCR (Optical Character Recognition): Extract text from documents with world-class accuracy; support for more than 200 languages and handwriting recognition for 50 languages.
  • Document AI: Unlock insights from documents with machine learning.
  • Recommendations AI: Deliver highly personalized product recommendations at scale.

BigQuery ML: Allows businesses to build and deploy models using SQL directly inside BigQuery. BigQuery ML supports common use cases such as:

  1. Regression
  2. Classification
  3. Clustering
  4. Forecasting
  5. Recommendation
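To give a feel for what "models using SQL" looks like, the sketch below builds a BigQuery ML `CREATE MODEL` statement as a string. The helper `create_model_sql`, the dataset and table names, and the choice of one model type per use case are assumptions made for this illustration; BigQuery ML offers other model types, and forecasting and recommendation models need additional options (such as time series or user/item columns) that are not shown here.

```python
# One possible BigQuery ML model_type for each use case in the list above.
MODEL_TYPES = {
    "regression": "LINEAR_REG",
    "classification": "LOGISTIC_REG",
    "clustering": "KMEANS",
    "forecasting": "ARIMA_PLUS",
    "recommendation": "MATRIX_FACTORIZATION",
}

def create_model_sql(model_name, use_case, training_query, label_col=None):
    """Build a CREATE MODEL statement (hypothetical helper, not a Google API)."""
    options = [f"model_type = '{MODEL_TYPES[use_case]}'"]
    if label_col is not None:
        # Supervised models need to know which column holds the expected output.
        options.append(f"input_label_cols = ['{label_col}']")
    return (
        f"CREATE OR REPLACE MODEL `{model_name}`\n"
        f"OPTIONS ({', '.join(options)})\n"
        f"AS {training_query}"
    )

# Hypothetical dataset, table, and column names.
sql = create_model_sql(
    model_name="mydataset.sales_model",
    use_case="regression",
    training_query="SELECT ad_spend, region, sales FROM `mydataset.sales`",
    label_col="sales",
)
print(sql)
```

The resulting statement could then be run in the BigQuery console or submitted through a client library; that step is left out here because it requires a GCP project and credentials.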


Vertex AI: Vertex AI is a Managed Machine Learning and AI platform that helps you manage the whole lifecycle of your Machine Learning product. It offers a single interface and an API to apply Machine Learning models to different scenarios, as well as MLOps tools to remove the complexity of model maintenance. 

Vertex AI helps you train AutoML models with minimal code, or create and manage custom models and their whole pipeline.

AutoML: AutoML uses various pre-trained models and customizes them to your business context. Some of GCP's AutoML offerings are:

  • AutoML Image: Derive insights from object detection and image classification, in the cloud or at the edge.
  • AutoML Video: Enable powerful content discovery and engaging video experiences.
  • AutoML Text: Reveal the structure and meaning of text through machine learning.
  • AutoML Translation: Dynamically detect and translate between languages.
  • AutoML Tabular: Automatically build and deploy state-of-the-art machine learning models on structured data.

Custom ML: Vertex AI also allows us to create, manage, and deploy custom models using well-known frameworks like TensorFlow Extended, Kubeflow, scikit-learn, and XGBoost; it also supports custom-made containers.

Vertex AI also integrates with other GCP data tools such as BigQuery, Dataproc, Dataflow, and Cloud Storage, among others, making it easier to integrate your models in your data pipeline without having to worry about the underlying infrastructure.

Getting Started: The First Step to Data-Driven Decisions

As a certified Google Partner specializing in Data Analytics, our Avenue Code team has several Google Cloud Platform experts who can help you create, manage, deploy, and integrate your models, using the best tools for each scenario so that your data can better support your business decisions.

Want to know more about how to make the most of your data? Check out the other blogs in our data analytics series:

The 6 Pillars of Data Modernization Success

4 Strategies to Boost Sales with Data Mining

Modernizing Your Data Warehouse with BigQuery

Data Lakes: The Key to Data Modernization

What You Need to Know About Data Pipelines

Data Orchestration in GCP

What Every Company Needs to Know About Data Governance and Security

Data Visualization in GCP


Author

Frederico Caram

Frederico Caram is a Data Architect at Avenue Code. He enjoys reading historical fantasy novels, ballroom dancing, and playing video games.

