A credit union system is using Generative AI to enhance customer support, developing an intelligent chatbot designed to provide accurate and timely responses to user inquiries about its services, policies, and other relevant information.
The chatbot uses a Retrieval-Augmented Generation (RAG) system that searches a dataset of public website documents, retrieves the most relevant information, and generates coherent responses based on the retrieved data. The system's underlying model is then fine-tuned on a large, publicly available finance dataset of 52K instruction-following examples on financial topics.
The company's intelligent chatbot is a valuable tool for customer support and can provide users with the information they need, helping to improve customer satisfaction and reduce the workload on the customer support team.
Business Goal and Generative AI Solution
Business Challenge and Goal
The primary business goal of this project was to enhance customer support for a large Brazilian credit union system by developing an intelligent chatbot aimed at providing accurate and timely responses to user inquiries about its services, policies, and other relevant information.
Generative AI Use Case
This project’s use case for Generative AI is to implement a RAG system, which combines document retrieval with Generative AI to deliver precise and contextually relevant answers to user questions. The RAG system is designed to search a dataset of documents, retrieve the most relevant information, and generate coherent responses based on the retrieved data.
The Generative AI Solution in Practice
The Generative AI solution is applied in this project to address the business goal by:
- Automating content response: Generative AI can automatically generate high-quality answers, freeing up support teams to focus on more strategic initiatives and improving productivity.
- Improving efficiency: Generative AI can automate many tasks that are currently performed manually, helping businesses save time and money.
- Improving customer service: Generative AI can be used to provide personalized customer service, resolving customer issues quickly and efficiently.
- Reducing costs: Generative AI can help businesses save money by automating tasks and reducing the need for human labor.
Our solution uses a RAG system to answer financial questions about the institution, drawing on the context of all its website information. We leverage LangChain to build customized chains and workflows, and Google Cloud services to build, train, and serve the solution as a chatbot through an API.
As the backbone of this solution, we use the latest LLM from Google, Gemini Pro: a model capable of handling our multi-language (Portuguese and English) requirements, with high scalability through Vertex AI and highly customizable outputs through Vertex AI fine-tuning.
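As a minimal sketch, the Gemini Pro backbone can be wired into LangChain as follows; the project ID, region, and sample prompt are illustrative placeholders, and the resulting client is reused in the later snippets:

```python
# Minimal sketch: Gemini Pro on Vertex AI as a LangChain chat model.
# Project ID, region, and the sample prompt are illustrative placeholders.
import vertexai
from langchain_google_vertexai import ChatVertexAI

vertexai.init(project="my-gcp-project", location="us-central1")

llm = ChatVertexAI(
    model_name="gemini-pro",  # the foundation model behind this solution
    temperature=0.2,          # keep answers factual for support use
    max_output_tokens=1024,
)

# The same client serves both Portuguese and English inquiries.
print(llm.invoke("Quais serviços a cooperativa oferece?").content)
```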
Model Selection
We chose Gemini Pro, the latest LLM from Google, as our foundation model: it handles multiple tasks, comes with a long context window suited to more complex tasks, and offers extensive fine-tuning capabilities, a combination that meets our specific needs.
A fine-tuned model, trained on a specific dataset or task, performs better on that task than a general-purpose model. Since we needed a model that could handle the complexities and nuances of our specific use case, fine-tuning was essential to achieve the desired level of accuracy and performance. We therefore limited our options to models that offered fine-tuning capabilities.
The platform provides robust integration with Google Workspace, allowing us to incorporate AI-powered assistance into our daily workflows seamlessly. Additionally, Gemini Pro's user-friendly interface and customizable design options enable us to tailor the solution to our unique requirements.
Criteria for Model Selection:
- Relevance to the task: We chose a model that was specifically designed for the task of answering questions about the institution’s services and policies.
- Accuracy and reliability: We selected a model that has been shown to produce accurate and reliable results on similar tasks.
- Scalability: We chose a model that could be scaled to handle the large volume of inquiries that we expected to receive.
- Ease of integration: We selected a model that could be easily integrated with our existing chatbot platform.
Note: No code snippets are provided in this section because no code was used to select the foundation model; only size, flexibility, and scalability were taken into consideration for the choice.
Prompt Enrichment/Model Tuning
For this task, we chose the Alpaca LoRA dataset for its diverse data sources, which combine structured financial data with conversational examples. This variety of inputs enhances the model's ability to generalize and perform well in different contexts.
The dataset was then split into training and validation sets, following a standard 80%–20% split. The model tuning runs as a Vertex AI Supervised Tuning Job, which lets us monitor the training process and its metrics through the platform.
The Supervised Tuning Job handles device usage and distribution automatically based on the model and dataset size, eliminating infrastructure concerns. It also computes a loss metric that measures how closely the model's responses match the dataset outputs.
Although the specific metric used for this computation is not documented, it is likely cross-entropy, the standard loss for Low-Rank Adaptation (LoRA) and other current state-of-the-art fine-tuning algorithms.
Dataset in GCP Cloud Storage
- Function used to define a column appropriate for fine-tuning, as described at https://cloud.google.com/vertex-ai/generative-ai/docs/models/gemini-supervised-tuning-about
- Definition of a numpy seed for splitting the dataset into training and validation sets
- Split of the dataset, as shown in the sketch below
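A hedged sketch of these three steps follows; the file name and column names are illustrative Alpaca-style placeholders, and the exact JSONL schema expected by Gemini supervised tuning is described in the documentation linked above:

```python
# Sketch of the dataset preparation: build a tuning-ready column, seed
# numpy, and split 80%-20%. File and column names are placeholders.
import numpy as np
import pandas as pd

df = pd.read_json("finance-alpaca.json")  # the 52K instruction-following examples

def to_tuning_example(row):
    """Formats one Alpaca-style row into the messages layout documented
    for Gemini supervised tuning (one user turn, one model turn)."""
    return {
        "messages": [
            {"role": "user", "content": row["instruction"]},
            {"role": "model", "content": row["output"]},
        ]
    }

df["tuning_example"] = df.apply(to_tuning_example, axis=1)

np.random.seed(42)  # fixed seed so the split is reproducible
mask = np.random.rand(len(df)) < 0.8
train, validation = df[mask], df[~mask]

# Vertex AI expects JSONL files, typically staged in a Cloud Storage bucket.
train["tuning_example"].to_json("train.jsonl", orient="records", lines=True)
validation["tuning_example"].to_json("validation.jsonl", orient="records", lines=True)
```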
Model Tuning Implementation
The model tuning is then deployed as a Vertex AI Supervised Tuning Job through its SDK, and the tuning run can be monitored in the Vertex AI tuning platform, including the metrics and the model details.
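A minimal sketch of that SDK call is shown below, assuming the JSONL files from the previous step have been uploaded to Cloud Storage; bucket paths and the display name are placeholders, and in older SDK versions the module lives under vertexai.preview.tuning:

```python
# Launch the Supervised Tuning Job through the Vertex AI SDK.
from vertexai.tuning import sft

tuning_job = sft.train(
    source_model="gemini-1.0-pro-002",                     # base Gemini model
    train_dataset="gs://my-bucket/train.jsonl",            # 80% split
    validation_dataset="gs://my-bucket/validation.jsonl",  # 20% split
    tuned_model_display_name="credit-union-support-tuned",
)

# The job runs asynchronously; loss curves and model details appear in the
# Vertex AI tuning console, and the endpoint is set once tuning completes.
print(tuning_job.tuned_model_endpoint_name)
```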
Model Evaluation Metric
We used the default loss function from AutoML as the evaluation metric.
As seen in the validation and training metrics, the model's total loss diminished over time.
Model Grounding
The model is grounded in real-world facts using a RAG framework:
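A hedged sketch of that grounding pipeline is shown below: website documents are chunked, embedded, and indexed, and the top matches are retrieved as context at query time. The loader URL, embedding model, and vector store are illustrative stand-ins:

```python
# Sketch of the RAG grounding: index the public website content and
# retrieve the most relevant chunks for each user question.
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import FAISS
from langchain_google_vertexai import VertexAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

docs = WebBaseLoader("https://www.example-credit-union.com/faq").load()
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=100
).split_documents(docs)

store = FAISS.from_documents(
    chunks, VertexAIEmbeddings(model_name="textembedding-gecko@003")
)
retriever = store.as_retriever(search_kwargs={"k": 4})

# The retrieved chunks become the factual context the model must answer from.
question = "What are the current loan rates?"
context = "\n\n".join(d.page_content for d in retriever.invoke(question))
```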
Prompt Engineering
By applying prompt enrichment techniques, we can enhance the model's task comprehension, reduce ambiguity, and achieve more accurate and relevant results. These techniques significantly improve the quality of output from language models, resulting in more informed and meaningful responses.
Prompt enrichment elements used in this demo
- Prompt Templates
Using a template ensures consistent and structured prompts. By clearly defining a format with placeholders, the AI can generate responses that are both relevant and contextually appropriate.
- Role and Task Specification
By clearly defining the AI’s role and specific task, we help set a precise context. This ensures that the AI understands its purpose and the nature of the responses it should generate, leading to more accurate and contextually appropriate outputs.
- Source Attribution
By instructing the AI to always include the source of its information, the prompt ensures that responses are not only informative but also verifiable, increasing user trust.
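The sketch below combines all three elements in one template; the wording is illustrative rather than the production prompt, and it reuses the llm, context, and question from the earlier snippets:

```python
# Illustrative template: fixed structure with placeholders, an explicit
# role and task, and a source-attribution instruction.
from langchain_core.prompts import PromptTemplate

template = PromptTemplate.from_template(
    "You are a customer-support assistant for a credit union.\n"     # role
    "Answer the question using ONLY the context below, and always "  # task
    "cite the source of the information you use.\n\n"                # attribution
    "Context:\n{context}\n\n"
    "Question: {question}\n\n"
    "Answer:"
)

prompt = template.format(context=context, question=question)
print(llm.invoke(prompt).content)
```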
Model Evaluation
Before a model can go into production, we need to ensure that it meets a certain quality standard—a process that can be achieved in several ways.
To filter out potentially harmful or misleading outputs, we utilize Vertex AI’s content filtering through LangChain’s library. Below are the “safety_settings” that block harmful content, which are then applied to the chain.
Demonstration of safety filters
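A sketch of those settings is shown below; the category and threshold names follow the enums re-exported by the langchain-google-vertexai package, and the chosen thresholds are illustrative:

```python
# safety_settings that block harmful content, applied to the chain's model.
from langchain_google_vertexai import ChatVertexAI, HarmBlockThreshold, HarmCategory

safety_settings = {
    HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
    HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
    HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
    HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
}

# Any response that trips a filter is blocked before it reaches the user.
safe_llm = ChatVertexAI(model_name="gemini-pro", safety_settings=safety_settings)
```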
Another way to ensure quality content is human oversight through Reinforcement Learning from Human Feedback (RLHF). When we considered applying this method during the fine-tuning of our model, we discovered that Gemini models are not yet supported for this type of tuning, as seen below:
Supported models for RLHF
We then had to decide between the Gemini Pro model and a T5 model as the foundation for our AI agent. Given the reliability Gemini Pro offers and the fact that it comes pre-tuned with RLHF, we believed that sticking with our original choice would deliver better results.
Finally, we used a manually created test dataset to evaluate the RAG system, since our fine-tuning dataset lacked questions specific to the company's services. We therefore built a custom dataset and evaluated the system's responses on coherence, fluency, and safety metrics.
To test whether the LLM followed the instructions it was given, we asked it a list of questions, and its responses were then used in the evaluation.
- The coherence metric evaluates the model's ability to provide a coherent response.
- The fluency metric assesses the model's mastery of language.
- The safety metric measures whether the response contains any unsafe text.
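As a hedged sketch, such scores can be produced with Vertex AI's rapid-evaluation SDK as follows; the single question is an illustrative stand-in for the manually created test set, and the metric names follow the SDK's built-in pointwise metrics:

```python
# Score the chatbot's answers on coherence, fluency, and safety.
import pandas as pd
from vertexai.preview.evaluation import EvalTask

questions = ["What services does the credit union offer?"]  # stand-in for the test set
responses = [llm.invoke(q).content for q in questions]      # answers under evaluation

eval_task = EvalTask(
    dataset=pd.DataFrame({"prompt": questions, "response": responses}),
    metrics=["coherence", "fluency", "safety"],
)
result = eval_task.evaluate()

# Mean scores per metric, comparable across the untuned and tuned models.
print(result.summary_metrics)
```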
Below is the list of questions asked, developed to challenge the LLM and ensure it adheres to the designated instructions while its responses are evaluated:
Evaluation metrics for untuned LLM
Evaluation metrics of the tuned LLM
Our fine-tuned LLM outperformed the untuned version in terms of coherence and fluency, as demonstrated in the metrics above.
Furthermore, our RAG system demonstrated exceptional performance by generating coherent answers and achieving a perfect safety score.
Author
Davy Costa
Experienced professional in Software Development, Data Engineering and Analysis, Cloud Infrastructure, Machine Learning, and Generative AI.