Important note: this blog is a brief summary of the Machine Learning Model Development using AutoML use case. For full details, please refer to the complete official document for this work.
This blog outlines the development and deployment of a machine-learning solution aimed at predicting credit card fraud. By leveraging AutoML and Google Cloud services, the solution provides a robust framework for addressing fraud detection, ultimately helping companies safeguard their financial interests.
In this blog, we address the business question of predicting credit card fraud from a set of transaction features. Anticipating fraud cases allows companies such as credit card issuers and insurers to take preventive action, and deploying a machine learning model for this task strengthens their fraud detection capabilities and improves their financial outcomes.
The focus is to develop a machine-learning solution that accurately predicts credit card fraud. This involves presenting a complete machine learning workflow, from data exploration to model deployment. The goal is to provide a predictive classifier model that can forecast credit card transaction outcomes (fraud or not fraud) based on a given set of features.
The proposed solution is a predictive classifier model, trained using AutoML and deployed on the Google Cloud Model Repository. The model receives prediction requests and returns forecasts of credit card transaction outcomes, enabling the company to act swiftly on predicted fraud cases and improve its financial results.
The data for this demo consists of two datasets:
Both datasets were built from the original raw creditcard.csv file available on Kaggle.
The workflow includes:
The objective is to illustrate the implementation of a complete machine-learning workflow to achieve accurate predictions.
In short, the workflow runs from data exploration to model deployment: the AutoML-trained classifier is deployed from the Google Cloud Model Repository to an endpoint that serves online or batch predictions of transaction outcomes (fraud or not fraud).
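For online predictions, the deployed model is called through its endpoint. The sketch below assumes the endpoint is hosted on Vertex AI, Google Cloud's service for AutoML models, and uses the `google-cloud-aiplatform` SDK (`aiplatform.init`, `aiplatform.Endpoint`, `Endpoint.predict`). The project, location, endpoint ID, and feature names are hypothetical placeholders, and sending feature values as strings is an assumption based on Google's tabular samples; check the deployed model's input schema before relying on it.

```python
def build_instance(transaction):
    """Convert one transaction into a tabular prediction instance.

    Assumption: feature values are sent as strings; verify against
    the deployed model's input schema.
    """
    return {name: str(value) for name, value in transaction.items()}


def predict_fraud(project, location, endpoint_id, transactions):
    """Send online prediction requests to a deployed Vertex AI endpoint (hypothetical IDs)."""
    # Imported here so build_instance() can be used without the SDK installed
    from google.cloud import aiplatform

    aiplatform.init(project=project, location=location)
    endpoint = aiplatform.Endpoint(endpoint_id)
    response = endpoint.predict(instances=[build_instance(t) for t in transactions])
    return response.predictions


# Hypothetical transaction; the real feature set depends on the feature engineering step
print(build_instance({"Time": 0, "V14": -0.31, "Amount": 149.62}))
# {'Time': '0', 'V14': '-0.31', 'Amount': '149.62'}
```

For an AutoML tabular classifier, each returned prediction is expected to list the candidate classes with their scores; confirm the exact structure against the endpoint's actual response.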
By implementing a machine learning solution to predict credit card fraud, companies can proactively address fraud cases, thus safeguarding financial assets and improving overall financial health.
For effective data exploration, partners should:
Key Findings
Figure 1: Countplot of the response variable Class
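Figure 1 highlights how rare fraud is relative to non-fraud transactions. As a minimal sketch, the counts behind such a countplot can be computed with the Python standard library, assuming the label column is `Class` with `1` marking fraud, as in the public Kaggle creditcard.csv. The inline CSV rows are made-up stand-ins for the real file.

```python
import csv
import io
from collections import Counter


def class_distribution(csv_file, label_column="Class"):
    """Count rows per label and return (counts, share of fraud rows)."""
    reader = csv.DictReader(csv_file)
    counts = Counter(row[label_column] for row in reader)
    total = sum(counts.values())
    # "1" marks fraud in the Kaggle creditcard.csv file
    fraud_share = counts.get("1", 0) / total if total else 0.0
    return counts, fraud_share


# Tiny inline sample standing in for creditcard.csv (hypothetical values)
sample = io.StringIO(
    "Time,V14,Amount,Class\n"
    "0,-0.31,149.62,0\n"
    "1,0.14,2.69,0\n"
    "2,-9.21,1.00,1\n"
    "3,0.20,378.66,0\n"
)
counts, fraud_share = class_distribution(sample)
print(counts)                # Counter({'0': 3, '1': 1})
print(f"{fraud_share:.2%}")  # 25.00%
```

For the real file, pass an open handle instead, e.g. `with open("creditcard.csv", newline="") as f: counts, share = class_distribution(f)`.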
Feature Engineering Steps
Feature Engineering Details
All preprocessing steps were aligned with the feature engineering steps previously mentioned.
For Demo 3, a predictive classifier was trained using AutoML with its default options. AutoML handled model training, validation, and selection automatically, so no additional libraries were needed for these steps.
Figure 2: Steps in training the classifier model in AutoML
In Figure 2, we chose the Other option.
Figure 3: Steps in training the classifier model in AutoML
Figure 4: Steps in training the classifier model in AutoML
The train-test split was designed to balance the training dataset while leaving the test dataset unbalanced.
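The blog does not say how the training set was balanced. One common approach, sketched below as an assumption, is to make a stratified split first, so that the test set keeps the original fraud ratio, and then randomly undersample the non-fraud rows in the training portion only. The function name, the 20% test fraction, the seed, and the toy rows are illustrative choices.

```python
import random


def split_balanced_train(rows, label_key="Class", test_fraction=0.2, seed=42):
    """Stratified train/test split, then undersample the majority class in train only.

    The test set keeps the original (unbalanced) class ratio.
    """
    rng = random.Random(seed)
    by_label = {}
    for row in rows:
        by_label.setdefault(row[label_key], []).append(row)

    train, test = [], []
    for group in by_label.values():
        rng.shuffle(group)  # shuffles the internal per-label list, not the caller's list
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])

    # Random undersampling: keep as many rows per class as the rarest class has
    train_by_label = {}
    for row in train:
        train_by_label.setdefault(row[label_key], []).append(row)
    n_min = min(len(g) for g in train_by_label.values())
    balanced = []
    for group in train_by_label.values():
        balanced.extend(rng.sample(group, n_min))
    rng.shuffle(balanced)
    return balanced, test


# Toy data: 10 fraud rows and 990 non-fraud rows (hypothetical)
rows = [{"id": i, "Class": 1} for i in range(10)] + [{"id": i, "Class": 0} for i in range(10, 1000)]
train, test = split_balanced_train(rows)
print(len(train), sum(r["Class"] for r in train))  # 16 8
print(len(test), sum(r["Class"] for r in test))    # 200 2
```

Undersampling discards most non-fraud training rows; oversampling the fraud class or using class weights are alternatives with different trade-offs.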
The development followed Google's ML Best Practices, utilizing recommended products and tools:
AutoML provides a comprehensive set of metrics and performance indicators. For the final model on the training dataset, the recommended metric is the area under the ROC curve (ROC AUC), given the slightly unbalanced nature of the training data.
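The ROC AUC has a simple probabilistic reading: it is the probability that a randomly chosen fraud case receives a higher score than a randomly chosen non-fraud case, with ties counting as half. Because it is built from per-class rates, it is not inflated by the majority class the way plain accuracy is. The sketch below computes it directly from that definition in pure Python; the labels and scores are made-up toy values, and in practice scikit-learn's `roc_auc_score` computes the same quantity more efficiently.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This O(P*N) pairwise version favours clarity over speed.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


labels = [0, 0, 1, 1, 0, 1]               # 1 = fraud (toy data)
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]  # hypothetical model scores
print(roc_auc(labels, scores))            # 8 of 9 pairs ranked correctly, about 0.889
```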
The final classification model achieved the following:
Overall, the machine learning workflow demonstrated how to implement a comprehensive solution to predict credit card fraud, adhering to best practices and optimizing model performance.
Figure 5: Final classification model performance on the training set
Figure 6: Final model confusion matrix
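The confusion matrix shows that the final model (provided by AutoML) correctly predicted all non-fraud cases and 91% of the fraud cases in the training set. Such near-perfect training performance is a classic warning sign of overfitting; confirming it requires comparing these results with the model's performance on the held-out, unbalanced test set.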
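The percentages quoted above are per-class recalls: each row of the confusion matrix (actual class) is divided by its total. To test the overfitting hypothesis, the same numbers would be computed on the held-out test set and compared with the training values. A minimal pure-Python sketch follows; the toy predictions are hypothetical and are not outputs of the AutoML model.

```python
def confusion_matrix(y_true, y_pred, labels=(0, 1)):
    """Count (actual, predicted) pairs; 1 = fraud, 0 = non-fraud."""
    cm = {(a, p): 0 for a in labels for p in labels}
    for a, p in zip(y_true, y_pred):
        cm[(a, p)] += 1
    return cm


def per_class_recall(cm, labels=(0, 1)):
    """Correct predictions divided by the actual count of each class (row-normalised diagonal)."""
    recall = {}
    for a in labels:
        row_total = sum(cm[(a, p)] for p in labels)
        recall[a] = cm[(a, a)] / row_total if row_total else float("nan")
    return recall


# Toy predictions (hypothetical), not the AutoML model's actual results
train_recall = per_class_recall(confusion_matrix([0, 0, 0, 1, 1], [0, 0, 0, 1, 1]))
test_recall = per_class_recall(confusion_matrix([0, 0, 0, 0, 1, 1], [0, 0, 1, 0, 1, 0]))

# A large train-minus-test gap per class points to overfitting
gap = {c: train_recall[c] - test_recall[c] for c in train_recall}
print(train_recall)  # {0: 1.0, 1: 1.0}
print(test_recall)   # {0: 0.75, 1: 0.5}
print(gap)           # {0: 0.25, 1: 0.5}
```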
Figure 7: Final Model Feature Importances
Figure 7 shows that V14 was the most important feature for explaining the response variable Class.
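AutoML computes its own feature importances, and the blog does not describe that method. To illustrate how a feature such as V14 can be ranked for any trained classifier, the sketch below implements permutation importance, a different, model-agnostic technique: shuffle one feature's values across rows and measure how much accuracy drops. The threshold rule standing in for the model and the toy rows are hypothetical.

```python
import random


def accuracy(model, rows, labels):
    """Share of rows where the model's prediction equals the label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(labels)


def permutation_importance(model, rows, labels, feature_names, n_repeats=5, seed=0):
    """Mean accuracy drop when one feature's values are shuffled, sorted high to low."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = {}
    for name in feature_names:
        drops = []
        for _ in range(n_repeats):
            values = [r[name] for r in rows]
            rng.shuffle(values)
            # Replace only this feature; all other columns stay aligned with their labels
            permuted = [{**r, name: v} for r, v in zip(rows, values)]
            drops.append(baseline - accuracy(model, permuted, labels))
        importances[name] = sum(drops) / n_repeats
    return dict(sorted(importances.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical stand-in model: flag fraud when V14 is strongly negative
def toy_model(row):
    return 1 if row["V14"] < -5 else 0


rows = [
    {"V14": -9.2, "Amount": 1.0}, {"V14": -7.5, "Amount": 99.9}, {"V14": -6.1, "Amount": 0.8},
    {"V14": 0.3, "Amount": 12.5}, {"V14": -0.4, "Amount": 5.0}, {"V14": 1.1, "Amount": 250.0},
    {"V14": 0.9, "Amount": 3.3}, {"V14": -1.2, "Amount": 45.0}, {"V14": 0.1, "Amount": 7.7},
    {"V14": 0.6, "Amount": 18.2},
]
labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
print(permutation_importance(toy_model, rows, labels, ["V14", "Amount"]))
```

Because the toy rule ignores `Amount`, shuffling it never changes a prediction and its importance is exactly zero, while shuffling `V14` degrades accuracy.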
Figure 8: Final model precision-recall, ROC, and precision-recall-by-threshold curves.
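The precision-recall-by-threshold panel shows how the choice of decision threshold trades precision against recall. The sketch below assumes the model outputs a fraud score and flags a transaction when the score is at or above the threshold, then computes both metrics for a few thresholds. The data are toy values, and precision is set to 1.0 when nothing is flagged, mirroring the (recall 0, precision 1) end point that scikit-learn's `precision_recall_curve` appends.

```python
def precision_recall_at_thresholds(labels, scores, thresholds):
    """Precision and recall of the rule 'flag as fraud if score >= threshold'."""
    n_pos = sum(1 for y in labels if y == 1)
    results = []
    for t in thresholds:
        flagged = [y for y, s in zip(labels, scores) if s >= t]
        tp = sum(1 for y in flagged if y == 1)
        # Convention: precision is 1.0 when nothing is flagged
        precision = tp / len(flagged) if flagged else 1.0
        recall = tp / n_pos if n_pos else 0.0
        results.append((t, precision, recall))
    return results


labels = [0, 0, 1, 1, 0, 1]               # 1 = fraud (toy data)
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]  # hypothetical model scores
for t, p, r in precision_recall_at_thresholds(labels, scores, [0.3, 0.5, 0.95]):
    print(f"threshold={t:.2f} precision={p:.2f} recall={r:.2f}")
# threshold=0.30 precision=0.75 recall=1.00
# threshold=0.50 precision=1.00 recall=0.67
# threshold=0.95 precision=1.00 recall=0.00
```

Raising the threshold here removes the false positive but also misses fraud cases, which is the trade-off the curve visualizes.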