From 58de064709535eb48acbd357833b4144ba3eabc1 Mon Sep 17 00:00:00 2001 From: chetan-hirapara Date: Tue, 12 Aug 2025 15:02:53 +0000 Subject: [PATCH 1/2] Complaint analysis using TD genAI --- .../Complaint_Analysis_Customer360.ipynb | 476 +++++ .../Complaint_Summarization.ipynb | 1594 +++++++++++++++++ .../Complaints_Classification.ipynb | 555 ++++++ .../Complaints_Clustering.ipynb | 804 +++++++++ .../Sentiment_Analysis.ipynb | 761 ++++++++ .../Topic_Modelling.ipynb | 507 ++++++ .../requirements.txt | 4 + 7 files changed, 4701 insertions(+) create mode 100644 VantageCloud_Lake/UseCases/Customer_Complaints_Analyzer/Complaint_Analysis_Customer360.ipynb create mode 100644 VantageCloud_Lake/UseCases/Customer_Complaints_Analyzer/Complaint_Summarization.ipynb create mode 100644 VantageCloud_Lake/UseCases/Customer_Complaints_Analyzer/Complaints_Classification.ipynb create mode 100644 VantageCloud_Lake/UseCases/Customer_Complaints_Analyzer/Complaints_Clustering.ipynb create mode 100644 VantageCloud_Lake/UseCases/Customer_Complaints_Analyzer/Sentiment_Analysis.ipynb create mode 100644 VantageCloud_Lake/UseCases/Customer_Complaints_Analyzer/Topic_Modelling.ipynb create mode 100644 VantageCloud_Lake/UseCases/Customer_Complaints_Analyzer/requirements.txt diff --git a/VantageCloud_Lake/UseCases/Customer_Complaints_Analyzer/Complaint_Analysis_Customer360.ipynb b/VantageCloud_Lake/UseCases/Customer_Complaints_Analyzer/Complaint_Analysis_Customer360.ipynb new file mode 100644 index 00000000..51006c8a --- /dev/null +++ b/VantageCloud_Lake/UseCases/Customer_Complaints_Analyzer/Complaint_Analysis_Customer360.ipynb @@ -0,0 +1,476 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "1f44f3bc-51cc-47e3-a033-f2883ed97408", + "metadata": {}, + "source": [ + "
\n", + "

\n", + " In-Database Complaints Analysis Integration with Customer360 using LLMs\n", + "
\n", + " \"Teradata\"\n", + "

\n", + "
" + ] + }, + { + "cell_type": "markdown", + "id": "19c23c87-b5a5-4905-913e-d4f7980497fa", + "metadata": {}, + "source": [ + "

Introduction:

\n", + "

Complaints Analysis Integration with Customer360 is a comprehensive approach to managing customer complaints and feedback within the framework of a Customer 360-degree view using Teradata Vantage and Amazon Bedrock. This integration aims to provide a seamless and personalized customer experience by leveraging data from various sources, including CRM systems, marketing platforms, and social media.

The key components of this integration include:

\n", + "\n", + "
  1. Customer 360 Data Manager: Responsible for managing and maintaining a comprehensive view of customer data, including collection, integration, and analysis from multiple sources.
  2. Complaints Management Dashboard: Analyzes customer complaints, providing insights into complaint volume, trends, and resolution progress.
  3. Customer Insights: Tools for gaining insights into customer behavior and preferences, enabling targeted marketing campaigns and informed business decisions.

The benefits of this integration include:

  1. Improved Customer Experience: By integrating complaints analysis with Customer 360, businesses can address customer complaints more effectively, leading to increased customer satisfaction and loyalty.
  2. Data-Driven Decision Making: The integration provides a centralized platform for analyzing customer data, enabling businesses to make informed decisions about product development, marketing strategies, and customer engagement.
  3. Enhanced Customer Insights: The comprehensive view of customer data allows businesses to better understand customer needs and preferences, leading to more targeted and effective marketing efforts.
\n", + "\n", + "\n", + "

By integrating complaints analysis with Customer 360, businesses can create a more comprehensive and personalized customer experience, driving business growth and customer satisfaction.

\n", + "\n", + "

Steps in the analysis:

\n", + "
    \n", + "
  1. Connect to Vantage\n", + "
  2. Configure server-side LLM access using the teradatagenai package\n", + "
  3. Execute in-database Sentiment Analysis, Topic Modeling, and Complaint Summarization\n", + "
  4. Integrate data with Customer 360\n", + "
  5. Cleanup\n", + "
" + ] + }, + { + "cell_type": "markdown", + "id": "439c593b-d12a-4edf-8b5f-6070f1122914", + "metadata": {}, + "source": [ + "
\n", + "

Download and install additional software as needed" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ebc0a97b-78f8-4274-9844-d3b1d31848e6", + "metadata": {}, + "outputs": [], + "source": [ + "%pip install --upgrade -r requirements.txt --quiet" + ] + }, + { + "cell_type": "markdown", + "id": "7a8a5583-362f-4d11-8b3a-17b20637f517", + "metadata": {}, + "source": [ + "

\n", + "

Note: Please restart the kernel after executing the cell above. The simplest way to restart the kernel is by typing zero zero: 0 0

" + ] + }, + { + "cell_type": "markdown", + "id": "7e7e5046-3c5f-4f6d-aeaf-60028655ff13", + "metadata": {}, + "source": [ + "
\n", + "

Here, we import the required libraries, set environment variables and environment paths (if required).

" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "83dc0922-f932-4378-a505-3bb2d1f1243b", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "import pandas as pd\n", + "import json, os, warnings\n", + "import getpass\n", + "from dotenv import dotenv_values\n", + "from teradataml import *\n", + "from teradatagenai import TeradataAI, TextAnalyticsAI, VSManager, VectorStore, VSApi\n", + "\n", + "\n", + "# Set display options for dataframes, plots, and warnings\n", + "%matplotlib inline\n", + "warnings.filterwarnings('ignore')\n", + "display.max_rows = 5\n", + "pd.set_option('display.max_colwidth', None)\n", + "display.suppress_vantage_runtime_warnings = True" + ] + }, + { + "cell_type": "markdown", + "id": "768bf2ed-ae11-4969-b20a-88496e4a2b67", + "metadata": {}, + "source": [ + "
\n", + "1. Connect to Vantage\n", + "

Connection information is defined in an external .env file; adjust it as necessary.

" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "454a2e81-c377-4058-9e68-78abd801ad9c", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "print(\"Checking if this environment is ready to connect to VantageCloud Lake...\")\n", + "\n", + "if os.path.exists(\"/home/jovyan/JupyterLabRoot/VantageCloud_Lake/.config/.env\"):\n", + "    print(\"Your environment parameter file exists. Please proceed with this use case.\")\n", + "    # Load all the variables from the .env file into a dictionary\n", + "    env_vars = dotenv_values(\"/home/jovyan/JupyterLabRoot/VantageCloud_Lake/.config/.env\")\n", + "    # Create the Context\n", + "    eng = create_context(host=env_vars.get(\"host\"), username=env_vars.get(\"username\"), password=env_vars.get(\"my_variable\"))\n", + "    execute_sql('''SET query_band='DEMO=text_analytics_teradatagenai_aws_huggingface.ipynb;' UPDATE FOR SESSION;''')\n", + "    print(\"Connected to VantageCloud Lake with:\", eng)\n", + "else:\n", + "    print(\"Your environment has not been prepared for connecting to VantageCloud Lake.\")\n", + "    print(\"Please contact the support team.\")" + ] + }, + { + "cell_type": "markdown", + "id": "c62b56de-5a86-4c82-9d47-17d7a3314974", + "metadata": { + "tags": [] + }, + "source": [ + "
\n", + "

2. Set up the LLM connection

\n", + "\n", + "

The teradatagenai Python library can connect to cloud-based LLM services as well as instantiate private models running at scale on local GPU compute. In this case, we will use Anthropic's Claude through Amazon Bedrock; the code below sets the model to anthropic.claude-v2.

\n", + "\n", + "
    \n", + "
  1. aws_access_key_id: Enter your AWS access key ID\n", + "
  2. aws_secret_access_key: Enter your AWS secret access key\n", + "
  3. region name: Enter the AWS region you want to configure (e.g., us-east-1)\n", + "
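The three values above are collected with `getpass` prompts in the next cell. As a minimal sketch of an alternative, the helper below first looks for the standard AWS environment variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_DEFAULT_REGION`) and only prompts for whatever is missing. The function name `read_aws_settings` and the sample values are hypothetical and not part of the notebook.

```python
import getpass
import os


def read_aws_settings(env=None, prompt=getpass.getpass):
    """Return access key, secret key, and region, prompting only for missing values."""
    env = os.environ if env is None else env
    # Standard AWS environment variable names; the dict keys mirror the notebook's variables.
    sources = {
        "access_key": "AWS_ACCESS_KEY_ID",
        "secret_key": "AWS_SECRET_ACCESS_KEY",
        "region_name": "AWS_DEFAULT_REGION",
    }
    settings = {}
    for name, env_var in sources.items():
        value = env.get(env_var, "").strip()
        if not value:
            # Fall back to an interactive prompt, as the notebook does.
            value = prompt(f"{env_var}: ").strip()
        settings[name] = value
    return settings


# Example with a fake environment (placeholder values), so nothing is prompted.
fake_env = {
    "AWS_ACCESS_KEY_ID": "AKIAEXAMPLE",
    "AWS_SECRET_ACCESS_KEY": "example-secret",
    "AWS_DEFAULT_REGION": "us-east-1",
}
settings = read_aws_settings(env=fake_env)
print(settings["region_name"])
```

The returned dictionary uses the same names (`access_key`, `secret_key`, `region_name`) that the notebook passes to `TeradataAI`.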
      " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "49e92709-9a74-4372-a57a-7b078fe2ae2c", + "metadata": {}, + "outputs": [], + "source": [ + "access_key = getpass.getpass('aws_access_key_id: ')\n", + "secret_key = getpass.getpass('aws_secret_access_key: ')\n", + "region_name = getpass.getpass('region name: ')" + ] + }, + { + "cell_type": "markdown", + "id": "6ae3eee2-4c49-4b5a-921f-9a310b2796e1", + "metadata": {}, + "source": [ + "
      \n", + "

      3. Use the TextAnalyticsAI API to Perform Various Text Analytics Tasks

      \n", + "

      You can run help(TextAnalyticsAI) to read more about this API.

      " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d47859af-aebc-41c7-a3a1-919dccf57584", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Provide model details\n", + "model_name=\"anthropic.claude-v2\"\n", + "\n", + "# Select in-database or external model\n", + "llm = TeradataAI(api_type = 'AWS',\n", + " model_name = model_name,\n", + " region = region_name,\n", + " # authorization = 'Repositories.BedrockAuth'\n", + " access_key = access_key,\n", + " secret_key = secret_key)\n", + "\n", + "obj = TextAnalyticsAI(llm=llm)" + ] + }, + { + "cell_type": "markdown", + "id": "69cdee3e-feb5-4cfd-b8d5-99baadcde44e", + "metadata": {}, + "source": [ + "
      \n", + "4. Using LLM for Sentiment Analysis, Topic Modeling and Complaint Summarization\n", + "\n", + "

      Sentiment Analysis, Topic Modeling and Complaint Summarization using Large Language Models (LLMs) revolutionizes the way we understand and categorize vast collections of text data. LLMs excel in understanding the semantics and context of words, enabling sophisticated topic modeling techniques.

      \n", + "\n", + "

      Sentiment Analysis Using Large Language Models (LLMs) is a cutting-edge approach to understanding customer opinions and emotions expressed through text-based data. This advanced technique leverages the capabilities of LLMs to accurately identify and categorize sentiment as positive, negative, or neutral, providing businesses with valuable insights into customer perceptions and preferences.

      \n", + "\n", + "

      LLMs can generate coherent topics without needing predefined categories, making them ideal for exploratory analysis of diverse datasets. Moreover, their ability to capture subtle nuances in language allows for more precise topic identification, even in noisy or ambiguous texts.

      \n", + "\n", + "
      \n", + "

      4.1 Inspect source data

      \n", + "\n", + "

      The Teradata Python package (teradataml) allows users to work with data using common Python syntax and methods without moving data to the client. All operations are pushed to the MPP platform, allowing rapid, performant analytics on data at any scale. In this case, the DataFrame object represents a table or query in-database which could contain millions or billions of records.

      " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3506e344-bf92-44f4-9cee-7d85eac3290b", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "customer_data = DataFrame('\"DEMO_ComplaintAnalysis\".\"Customer_360_Details\"')\n", + "customer_data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fb0b5121-6071-42cd-a928-af0116094289", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "complaints_data = DataFrame(in_schema('DEMO_ComplaintAnalysis', 'Customer_360_Complaints'))\n", + "complaints_data" + ] + }, + { + "cell_type": "markdown", + "id": "85336f1f-88af-4a03-ba6d-23a69df6a881", + "metadata": {}, + "source": [ + "
      \n", + "

      4.2 Sentiment Analysis

      \n", + "\n", + "

      Extract the sentiment (positive, negative, neutral) using in-database functions that can execute in-platform or call out to Large Language Models of choice.

      " + ] + }, + { + "cell_type": "markdown", + "id": "d1eb6496-37ca-4060-82b9-4137b27a733d", + "metadata": {}, + "source": [ + "

      A simple method call will extract the sentiment of customer complaints in-database using the desired LLM and cloud service provider (CSP).

      " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7da87396-1753-49ec-a9c2-cf4eed6ad304", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "tdf_sentiment = obj.analyze_sentiment(column = 'consumer_complaint_narrative', \n", + " data = complaints_data)[['Customer_ID','Sentiment','consumer_complaint_narrative']]\n", + "tdf_sentiment" + ] + }, + { + "cell_type": "markdown", + "id": "c023695d-eb93-4a57-bb26-95d45b13152a", + "metadata": {}, + "source": [ + "
      \n", + "

      4.3 Topic Modeling

      \n", + "\n", + "\n", + "

      LLMs can generate coherent topics without needing predefined categories, making them ideal for exploratory analysis of diverse datasets. Moreover, their ability to capture subtle nuances in language allows for more precise topic identification, even in noisy or ambiguous texts. In this case, we are looking for specific topics to drive downstream analytics.

      \n", + "

      Provide a list of topics to use for classification.
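Before running the classification, it can help to check the label list for categories that overlap, since near-synonymous labels may split similar complaints between them. The following stdlib-only sketch flags label pairs that share a word. The word-overlap heuristic and the helper name `shared_word_pairs` are assumptions for illustration and are not part of teradatagenai.

```python
from itertools import combinations

# The label list used in the classification cell of this notebook.
labels = [
    "Mortgage Application",
    "Payment Trouble",
    "Mortgage Closing",
    "Report Inaccuracy",
    "Payment Struggle",
]


def shared_word_pairs(names):
    """Return (label_a, label_b, shared_words) for every pair with a word in common."""
    pairs = []
    for a, b in combinations(names, 2):
        shared = set(a.lower().split()) & set(b.lower().split())
        if shared:
            pairs.append((a, b, sorted(shared)))
    return pairs


for a, b, words in shared_word_pairs(labels):
    print(f"{a!r} and {b!r} share {words}")
```

A shared word does not by itself mean two labels are redundant; it only marks pairs worth a second look.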

      " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f93540a7-731a-4cfb-a047-a8dbd3241b32", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "tdf_topics = obj.classify(column = 'consumer_complaint_narrative', \n", + " data = complaints_data,\n", + " labels = ['Mortgage Application',\n", + " 'Payment Trouble',\n", + " 'Mortgage Closing',\n", + " 'Report Inaccuracy',\n", + " 'Payment Struggle'])[['Customer_ID','Labels','consumer_complaint_narrative']]\n", + "tdf_topics" + ] + }, + { + "cell_type": "markdown", + "id": "394cf875-57a4-4f2a-8c5e-e4fa6424f037", + "metadata": {}, + "source": [ + "
      \n", + "

      4.4 Summarization

      \n", + "\n", + "\n", + "

      The summarize method uses the model to generate an abstractive summary of the text in the specified column of a database table. The levels argument controls conciseness: higher values yield more concise summaries.

      " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d3bd5286-1dd5-4e59-8f36-edbfcde946bf", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "tdf_summary = obj.summarize(column = 'consumer_complaint_narrative', \n", + " data = complaints_data,\n", + " levels = 2 # higher values provide more concise summary\n", + " )[['Customer_ID','Summary','consumer_complaint_narrative']]\n", + "\n", + "tdf_summary" + ] + }, + { + "cell_type": "markdown", + "id": "6830c6a0-a260-4c0a-8643-d68fa5509d68", + "metadata": {}, + "source": [ + "
      \n", + "5. Consolidated data in-database Customer360\n", + "\n", + "

      The developer can now perform simple joins on the data in-database to provide a consolidated view of the complaint summary, sentiment, topic label, and customer information.
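The joins in the next cell match each derived table to the customer table on the customer key and drop the duplicate key column. As a minimal local sketch of that logic, the plain-Python example below uses made-up rows and a hypothetical `left_join` helper. It assumes left-join semantics; whether the in-database call behaves the same way depends on the `how` argument of `DataFrame.join`, which the notebook does not set.

```python
# Made-up rows standing in for the in-database tables.
customers = [
    {"Customer Identifier": 1, "Name": "A. Example"},
    {"Customer Identifier": 2, "Name": "B. Sample"},
]
topics = [{"Customer_ID": 1, "Labels": "Payment Trouble"}]
summaries = [{"Customer_ID": 1, "Summary": "Payment posted late."}]
sentiments = [{"Customer_ID": 1, "Sentiment": "negative"}]


def left_join(left, right, left_key, right_key):
    """Left-join two lists of dicts, dropping the right-hand key column."""
    index = {}
    for row in right:
        index.setdefault(row[right_key], []).append(row)
    right_cols = [c for c in (right[0] if right else {}) if c != right_key]
    joined = []
    for row in left:
        matches = index.get(row[left_key])
        if not matches:
            # Keep the customer and fill the derived columns with None.
            joined.append({**row, **{c: None for c in right_cols}})
            continue
        for match in matches:
            joined.append({**row, **{c: match[c] for c in right_cols}})
    return joined


combined = customers
for derived in (topics, summaries, sentiments):
    combined = left_join(combined, derived, "Customer Identifier", "Customer_ID")

for row in combined:
    print(row)
```

Note that a customer with several complaints would be matched several times in each of the three joins, multiplying rows. If a complaint-level key is available, joining the derived tables on it first would avoid that.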

      " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "72d12627-7440-4988-bab3-cd4811474f18", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "tdf_combined = customer_data.join(tdf_topics.drop('consumer_complaint_narrative', axis = 1), on = ['\"Customer Identifier\" = Customer_ID']).drop('Customer_ID', axis = 1)\n", + "tdf_combined = tdf_combined.join(tdf_summary.drop('consumer_complaint_narrative', axis = 1), on = ['\"Customer Identifier\" = Customer_ID']).drop('Customer_ID', axis = 1)\n", + "tdf_combined = tdf_combined.join(tdf_sentiment.drop('consumer_complaint_narrative', axis = 1), on = ['\"Customer Identifier\"= Customer_ID']).drop('Customer_ID', axis = 1)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ed87a62b-a38a-4acb-9e94-51475866afbd", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "tdf_combined[['Customer Identifier','Name','Sentiment','Labels','Summary']]" + ] + }, + { + "cell_type": "markdown", + "id": "06cbb0f4-d026-4a0b-8ab8-6982b7f7777a", + "metadata": {}, + "source": [ + "
      \n", + "

      5.1 Persist the dataset

      \n", + "

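      A single call to copy_to_sql materializes the data as a table. The cell below sets temporary = True, which creates a session-scoped table; set temporary = False to create a permanent table instead.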

      " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2aae2e32-0a4b-4fef-9e65-501c41f98564", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "copy_to_sql(tdf_combined, table_name = 'Customer360', temporary = True, if_exists = 'replace')" + ] + }, + { + "cell_type": "markdown", + "id": "561ff317-6468-4941-bf9b-840849bfb09d", + "metadata": {}, + "source": [ + "
      \n", + "6. Cleanup" + ] + }, + { + "cell_type": "markdown", + "id": "476fb82b-d7a0-4e29-a6d4-8372a247cea8", + "metadata": {}, + "source": [ + "

      Databases and Tables

      \n", + "

      The following code closes the connection to Vantage. The Customer360 table above was created as a temporary table, so it does not persist beyond the session.

      " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "358f56a0-ad97-4317-ab4c-88be00b8d179", + "metadata": {}, + "outputs": [], + "source": [ + "remove_context()" + ] + }, + { + "cell_type": "markdown", + "id": "6cf670cd-4594-458d-af98-8efff5a72f73", + "metadata": {}, + "source": [ + "
      \n", + "Dataset:\n", + "
      \n", + "
      \n", + "

      The dataset is sourced from the Consumer Financial Protection Bureau.

      " + ] + }, + { + "cell_type": "markdown", + "id": "eeebf3ab-357c-488e-ba9d-78bf82f4d0dd", + "metadata": {}, + "source": [ + "
      \n", + "
      ClearScape Analytics™
      \n", + "
      \n", + "
      \n", + " Copyright © Teradata Corporation - 2024, 2025. All Rights Reserved\n", + "
      \n", + "
      \n", + "
      " + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.10" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/VantageCloud_Lake/UseCases/Customer_Complaints_Analyzer/Complaint_Summarization.ipynb b/VantageCloud_Lake/UseCases/Customer_Complaints_Analyzer/Complaint_Summarization.ipynb new file mode 100644 index 00000000..84a2b743 --- /dev/null +++ b/VantageCloud_Lake/UseCases/Customer_Complaints_Analyzer/Complaint_Summarization.ipynb @@ -0,0 +1,1594 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "1f44f3bc-51cc-47e3-a033-f2883ed97408", + "metadata": {}, + "source": [ + "
      \n", + "

      \n", + " Complaints Summarization Using Vantage and LLM model\n", + "
      \n", + " \"Teradata\"\n", + "

      \n", + "
      " + ] + }, + { + "cell_type": "markdown", + "id": "62ff194b-33c1-4c2f-9c5f-042cb29e3dc6", + "metadata": {}, + "source": [ + "

      Introduction:

      \n", + "\n", + "

      In this demo, we take a deep dive into complaint summarization using Teradata Vantage and Anthropic's Claude LLM on Amazon Bedrock. This solution helps organizations manage and analyze customer complaints efficiently, providing actionable insights to improve customer satisfaction and business operations.

      \n", + "\n", + "

      Key Features:

      \n", + "\n", + "
        \n", + "
      1. AI-Powered Summarization: Utilizing advanced natural language processing (NLP) and machine learning algorithms, the system automatically summarizes complaints, identifying key issues, sentiment, and root causes.\n", + "
      2. Real-Time Analytics: The platform provides real-time analytics and visualization tools, enabling users to track complaint trends, sentiment, and issue resolution rates.\n", + "
      3. Customizable Dashboards: Users can create personalized dashboards to monitor specific complaint categories, product lines, or geographic regions, ensuring targeted insights and swift action.\n", + "
      4. Integration with Anthropic's Claude on Amazon Bedrock: Integration between Teradata Vantage and Anthropic's Claude models on Amazon Bedrock enables users to leverage cloud-based infrastructure and advanced analytics capabilities.\n", + "
      \n", + "\n", + "

      Benefits:

      \n", + "
        \n", + "
      1. Enhanced Customer Experience: By quickly identifying and addressing customer concerns, organizations can improve customer satisfaction and loyalty.\n", + "
      2. Operational Efficiency: Automated complaint summarization and analytics reduce manual processing time, allowing teams to focus on issue resolution and strategic decision-making.\n", + "
      3. Data-Driven Decision-Making: The platform provides actionable insights, enabling organizations to make informed decisions and drive business growth.\n", + "
      \n", + "\n", + "

      Steps in the analysis:

      \n", + "
        \n", + "
      1. Configuring the environment\n", + "
      2. Connect to Vantage\n", + "
      3. Configuring Anthropic's Claude on Amazon Bedrock\n", + "
      4. Complaints Summarization\n", + "
      5. Cleanup\n", + "
      " + ] + }, + { + "cell_type": "markdown", + "id": "6b08d1de-6e5f-4294-993f-3ba54b7f41c3", + "metadata": {}, + "source": [ + "
      \n", + "1. Configuring the environment\n", + "
      \n", + "

      1.1 Downloading and installing additional software needed" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ebc0a97b-78f8-4274-9844-d3b1d31848e6", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "%%capture\n", + "!pip install -r requirements.txt --upgrade --quiet" + ] + }, + { + "cell_type": "markdown", + "id": "54b6bcb4-ea98-42f9-9088-b15b4ddd03bc", + "metadata": {}, + "source": [ + "

      \n", + "

      Note: Please restart the kernel after executing these two lines. The simplest way to restart the Kernel is by typing zero zero: 0 0

      \n", + "
      " + ] + }, + { + "cell_type": "markdown", + "id": "a6f5da05-f559-4158-845f-056ab67bce20", + "metadata": {}, + "source": [ + "
      \n", + "

      1.2 Import the required libraries

      \n", + "

      Here, we import the required libraries, set environment variables and environment paths (if required).

      " + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "83dc0922-f932-4378-a505-3bb2d1f1243b", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Standard library and environment file support\n", + "import os\n", + "import warnings\n", + "import getpass\n", + "from dotenv import dotenv_values\n", + "\n", + "# Data manipulation and analysis\n", + "import numpy as np\n", + "import pandas as pd\n", + "\n", + "# Plotting\n", + "import plotly.express as px\n", + "\n", + "# Progress bar\n", + "from tqdm import tqdm\n", + "\n", + "# Machine learning and other utilities from Teradata\n", + "from teradataml import *\n", + "from teradatagenai import TeradataAI, TextAnalyticsAI, VSManager, VectorStore, VSApi\n", + "\n", + "# Requests\n", + "import requests\n", + "\n", + "# Display settings\n", + "display.max_rows = 5\n", + "pd.set_option('display.max_colwidth', None)\n", + "\n", + "# Set display options for dataframes, plots, and warnings\n", + "%matplotlib inline\n", + "warnings.filterwarnings('ignore')\n", + "display.suppress_vantage_runtime_warnings = True" + ] + }, + { + "cell_type": "markdown", + "id": "768bf2ed-ae11-4969-b20a-88496e4a2b67", + "metadata": {}, + "source": [ + "
      \n", + "2. Connect to Vantage\n", + "

      Connection information is read from an external .env file, so no password prompt appears; adjust the file as necessary.
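The connection cell that follows reads `host`, `username`, and `my_variable` from a `.env` file using `dotenv_values`. As a rough illustration of the file layout it expects, this stdlib-only sketch parses a made-up sample. It is a simplified stand-in for python-dotenv, not its implementation, and every value shown is a placeholder.

```python
def parse_env_text(text):
    """Parse KEY=VALUE lines into a dict, skipping blanks and '#' comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


# Hypothetical file contents; the key names match those read by the notebook.
sample = """
# VantageCloud Lake connection settings (placeholder values)
host=example-host.invalid
username=demo_user
my_variable="not-a-real-password"
"""

env_vars = parse_env_text(sample)
missing = [k for k in ("host", "username", "my_variable") if not env_vars.get(k)]
print("missing keys:", missing)
```

Checking for missing keys before calling `create_context` gives a clearer error than a failed login when the file is incomplete.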

      " + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "454a2e81-c377-4058-9e68-78abd801ad9c", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Checking if this environment is ready to connect to VantageCloud Lake...\n", + "Your environment parameter file exist. Please proceed with this use case.\n", + "Connected to VantageCloud Lake with: Engine(teradatasql://CH255039:***@54.156.178.22)\n" + ] + } + ], + "source": [ + "print(\"Checking if this environment is ready to connect to VantageCloud Lake...\")\n", + "\n", + "if os.path.exists(\"/home/jovyan/JupyterLabRoot/VantageCloud_Lake/.config/.env\"):\n", + " print(\"Your environment parameter file exist. Please proceed with this use case.\")\n", + " # Load all the variables from the .env file into a dictionary\n", + " env_vars = dotenv_values(\"/home/jovyan/JupyterLabRoot/VantageCloud_Lake/.config/.env\")\n", + " # Create the Context\n", + " eng = create_context(host=env_vars.get(\"host\"), username=env_vars.get(\"username\"), password=env_vars.get(\"my_variable\"))\n", + " execute_sql('''SET query_band='DEMO=text_analytics_teradatagenai_aws_huggingface.ipynb;' UPDATE FOR SESSION;''')\n", + " print(\"Connected to VantageCloud Lake with:\", eng)\n", + "else:\n", + " print(\"Your environment has not been prepared for connecting to VantageCloud Lake.\")\n", + " print(\"Please contact the support team.\")" + ] + }, + { + "cell_type": "markdown", + "id": "aed444a1-f0de-4bff-b0b9-d2c4e92573f5", + "metadata": {}, + "source": [ + "

      Begin running steps with Shift + Enter keys.

      " + ] + }, + { + "cell_type": "markdown", + "id": "8f83a8eb-1bd1-4bdb-ab20-cb02bf1ac869", + "metadata": {}, + "source": [ + "
      \n", + "

      3. Set up the LLM connection

      \n", + "\n", + "

      The teradatagenai Python library can connect to cloud-based LLM services as well as instantiate private models running at scale on local GPU compute. In this case, we will use Anthropic's Claude through Amazon Bedrock; the code below sets the model to anthropic.claude-v2.

      \n", + "\n", + "
        \n", + "
      1. aws_access_key_id: Enter your AWS access key ID\n", + "
      2. aws_secret_access_key: Enter your AWS secret access key\n", + "
      3. region name: Enter the AWS region you want to configure (e.g., us-east-1)\n", + "
          " + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "145fbb61-8ee8-4b4b-9b68-0748eb0f0f90", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdin", + "output_type": "stream", + "text": [ + "aws_access_key_id: ····················\n", + "aws_secret_access_key: ········································\n", + "region name: ·········\n" + ] + } + ], + "source": [ + "access_key = getpass.getpass('aws_access_key_id: ')\n", + "secret_key = getpass.getpass('aws_secret_access_key: ')\n", + "region_name = getpass.getpass('region name: ')" + ] + }, + { + "cell_type": "markdown", + "id": "5a1d0e66-4c3e-4784-9674-961642226529", + "metadata": {}, + "source": [ + "
          \n", + "

          3.1 Use the TextAnalyticsAI API to Perform Various Text Analytics Tasks

          \n", + "

          You can execute the help function at the bottom of this notebook to read more about this API.

          " + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "00636e83-ffec-4415-9309-4b025bbb0276", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Provide model details\n", + "model_name=\"anthropic.claude-v2\"\n", + "\n", + "# Select in-database or external model\n", + "llm = TeradataAI(api_type = 'AWS',\n", + " model_name = model_name,\n", + " region = region_name,\n", + " # authorization = 'Repositories.BedrockAuth'\n", + " access_key = access_key,\n", + " secret_key = secret_key)\n", + "\n", + "obj = TextAnalyticsAI(llm=llm)" + ] + }, + { + "cell_type": "markdown", + "id": "596cfbe5-e6d7-4b14-9d46-08929397e1a3", + "metadata": {}, + "source": [ + "
          \n", + "4. Complaints summarization\n", + "

          Complaint summarization with Large Language Models (LLMs) condenses lengthy complaints into concise, informative summaries. LLMs extract the key issues, sentiment, and requested resolutions, which speeds up understanding of and response to customer grievances.

          \n", + "\n", + "

          Because each summary captures the primary issue, the prevailing sentiment, and any proposed resolution, teams can triage complaints faster and respond more consistently. In the cells below, we summarize a five-row sample of the Consumer_Complaints table.

          " + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "30ac5e3a-9264-46f5-94eb-76e813503ac6", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "df = DataFrame(in_schema('DEMO_ComplaintAnalysis', 'Consumer_Complaints'))" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "0174db51-edcc-42d2-b251-c3171648a0e2", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "tdf_summary = obj.summarize(column = 'consumer_complaint_narrative', \n", + " data = df.iloc[:5],\n", + " levels = 2 # higher values provide more concise summary\n", + " )[['complaint_id','Summary','consumer_complaint_narrative']]" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "a33a52e0-fe44-4c6a-aeef-af8245a28b1d", + "metadata": {}, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "882ab327761a4339a82d3574e992f984", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "\n", + "\n", + "\t\n", + "\n", + "\n", + "\t\n", + "\t\t\n", + "\t\t\n", + "\t\t\n", + "\t\n", + "\t\n", + "\t\t\n", + "\t\t\n", + "\t\t\n", + "\t\n", + "\t\n", + "\t\t\n", + "\t\t\n", + "\t\t\n", + "\t\n", + "\t\n", + "\t\t\n", + "\t\t\n", + "\t\t\n", + "\t\n", + "\t\n", + "\t\t\n", + "\t\t\n", + "\t\t\n", + "\t\n", + "
          complaint_id | Summary | consumer_complaint_narrative
          1295007 | 'Discover Card payments were withdrawn over a month after being posted, causing frustration.' | On XXXX XXXX and XXXX, 2015 ( as well several phone calls and chat sessions with Discover Card customer service ), I made inquiries to why a payment of {$61.00} that was posted to my Discover Card account on XXXX XXXX, 2015, did not have the funds withdrawn from my financial institution, XXXX XXXX XXXX XXXX XXXX in XXXX, NC, until XXXX XXXX, 2015. Nobody at either Discover Card or XXXX XXXX ' XXXX XXXX could give me a direct answer. Instead, I got the \" pass the buck '' routine. Neither financial institution should be allowed to treat customers this way. I have documentation available as proof of these events taking place.
          1294108 | A Discover cardholder had his long-term account unexpectedly closed, losing his rewards, and though promised the rewards by check, has yet to receive it after 6 weeks. | I have been a Discover credit card holder since 2007. During my entire membership with Discover, I was never late with payments, and always stayed under my credit line, and never charged my credit card for any purposes other than making a legitimate purchase. \n", + "\n", + "About a month ago, without any notice in advance, Discover closed my account and thereby wiped out my existing cashback rewards of {$300.00}. I contacted the company and demanded for an explanation. However, the only reason I got is \" we are no longer able to meet your servicing needs '', and I was told the rewards will be mailed to me in a check. Now, after 6 weeks, I still have n't received any check from Discover regarding my rewards. \n", + "\n", + "I respectfully urge the CFPB to take this matter seriously and to look into this case. We consumers are powerless to protect ourselves from discriminatory actions like this.
          1294987 | 'I am falsely accused of owing debt for an unused credit card.' | I am being accused of having a Discover Card debt that I did n't pay off, so Discover has turned over the account to another company ; XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX to harass me. I knew nothing about this, I 've NEVER used a credit card, the name is not even my legal name, now a sheriff served my mom papers where this attorney office is threatening to seize my properties! This is so wrong to do to people! I 'VE NEVER OPENED A CREDIT CARD!!!!
          1294888 | 'In XXXX my checking account was compromised resulting in additional unexpected interest charges despite sending regular payments.' | In XXXX XXXX my checking acct was compromised that my credit card payment was auto-pay through. Since then I made monthly money order payments. My balance in XX/XX/XXXX was {$2000.00}. I sent 4 ) {$200.00} payments. I do NOT use the card for purchases & the interest is about {$3.00} a month. My balance is still {$1400.00}. I am clearly not credited for my XX/XX/XXXX payment. My bill went up that month instead of down. They insist that I was credited. There is some confusion because they repeatedly tried getting the payment through the bank, even though I told them not to. So they credited the bank payment then added it back on repeatedly. But it should n't take a genius to do the math. {$2000.00} minus {$800.00} is {$1200.00} not {$1400.00}. I called twice & wrote a letter & keep getting the same answer.
          1294631 | 'XXXX did not fulfill a promotion although the customer satisfied the terms.' | XXXX offered a {$100.00} gift card when applying for aDiscoverXXXX Card. I clicked on the link, applied and compliedwith the requirements. The requirement was to make XXXXpurchase with the card within 3 months. It did not say thatthe purchase had to be with XXXX. After a few months, Icontacted the bank, and they said they knew nothing of theoffer. I contacted them again and the management saidI did not apply for the right card. XXXX denied anyknowledge of the offer, although it was on their website. \n", + "I would not have applied for the card, but for the offer.
          " + ], + "text/plain": [ + " complaint_id Summary consumer_complaint_narrative\n", + "0 1295007 'Discover Card payments were withdrawn over a month after being posted, causing frustration.' On XXXX XXXX and XXXX, 2015 ( as well several phone calls and chat sessions with Discover Card customer service ), I made inquiries to why a payment of {$61.00} that was posted to my Discover Card account on XXXX XXXX, 2015, did not have the funds withdrawn from my financial institution, XXXX XXXX XXXX XXXX XXXX in XXXX, NC, until XXXX XXXX, 2015. Nobody at either Discover Card or XXXX XXXX ' XXXX XXXX could give me a direct answer. Instead, I got the \" pass the buck '' routine. Neither financial institution should be allowed to treat customers this way. I have documentation available as proof of these events taking place.\n", + "1 1294108 A Discover cardholder had his long-term account unexpectedly closed, losing his rewards, and though promised the rewards by check, has yet to receive it after 6 weeks. I have been a Discover credit card holder since 2007. During my entire membership with Discover, I was never late with payments, and always stayed under my credit line, and never charged my credit card for any purposes other than making a legitimate purchase. \\n\\nAbout a month ago, without any notice in advance, Discover closed my account and thereby wiped out my existing cashback rewards of {$300.00}. I contacted the company and demanded for an explanation. However, the only reason I got is \" we are no longer able to meet your servicing needs '', and I was told the rewards will be mailed to me in a check. Now, after 6 weeks, I still have n't received any check from Discover regarding my rewards. \\n\\nI respectfully urge the CFPB to take this matter seriously and to look into this case. We consumers are powerless to protect ourselves from discriminatory actions like this.\n", + "2 1294987 'I am falsely accused of owing debt for an unused credit card.' 
I am being accused of having a Discover Card debt that I did n't pay off, so Discover has turned over the account to another company ; XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX to harass me. I knew nothing about this, I 've NEVER used a credit card, the name is not even my legal name, now a sheriff served my mom papers where this attorney office is threatening to seize my properties! This is so wrong to do to people! I 'VE NEVER OPENED A CREDIT CARD!!!!\n", + "3 1294888 'In XXXX my checking account was compromised resulting in additional unexpected interest charges despite sending regular payments.' In XXXX XXXX my checking acct was compromised that my credit card payment was auto-pay through. Since then I made monthly money order payments. My balance in XX/XX/XXXX was {$2000.00}. I sent 4 ) {$200.00} payments. I do NOT use the card for purchases & the interest is about {$3.00} a month. My balance is still {$1400.00}. I am clearly not credited for my XX/XX/XXXX payment. My bill went up that month instead of down. They insist that I was credited. There is some confusion because they repeatedly tried getting the payment through the bank, even though I told them not to. So they credited the bank payment then added it back on repeatedly. But it should n't take a genius to do the math. {$2000.00} minus {$800.00} is {$1200.00} not {$1400.00}. I called twice & wrote a letter & keep getting the same answer.\n", + "4 1294631 'XXXX did not fulfill a promotion although the customer satisfied the terms.' XXXX offered a {$100.00} gift card when applying for aDiscoverXXXX Card. I clicked on the link, applied and compliedwith the requirements. The requirement was to make XXXXpurchase with the card within 3 months. It did not say thatthe purchase had to be with XXXX. After a few months, Icontacted the bank, and they said they knew nothing of theoffer. I contacted them again and the management saidI did not apply for the right card. 
XXXX denied anyknowledge of the offer, although it was on their website. \\nI would not have applied for the card, but for the offer." + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "tdf_summary" + ] + }, + { + "cell_type": "markdown", + "id": "0d8ace4c-b948-413a-bdf7-53fa20088326", + "metadata": {}, + "source": [ + "
          \n", + "

          4.1 Graph for Complaint and Summary Lengths

          The scatter plot below compares the length of each complaint narrative (x-axis, Narrative Length) with the length of its generated summary (y-axis, Summary Length), both measured in characters. In this five-complaint sample, longer narratives tend to produce somewhat longer summaries, but every summary is far shorter than its source narrative. The gap between the two lengths, rather than a downward trend, is what indicates how effectively the LLM condenses each complaint.
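
          The relationship between the two lengths can also be checked numerically. The snippet below is a minimal sketch, not part of the original notebook: it hard-codes the character counts that appear in the plot output for the five sample complaints, and the helper name compression_ratios is a hypothetical one introduced only for illustration.

```python
# Character counts copied from the plot output for the five sample complaints,
# ordered by narrative length.
narrative_lengths = [455, 563, 630, 815, 879]
summary_lengths = [64, 78, 94, 132, 168]


def compression_ratios(narratives, summaries):
    '''Return summary length divided by narrative length for each pair.'''
    return [s / n for n, s in zip(narratives, summaries)]


ratios = compression_ratios(narrative_lengths, summary_lengths)

# Print one row per complaint so the lengths and ratio can be compared side by side.
for n, s, r in zip(narrative_lengths, summary_lengths, ratios):
    print(f'narrative={n:4d}  summary={s:3d}  ratio={r:.2f}')
```

          For this sample, each summary is less than one fifth of the length of its narrative. With a larger slice of the table the same ratio could be computed from the actual text columns instead of hard-coded values.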

          " + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "3f8191c6-4ab5-4b6c-9a51-f8cbca2c9aba", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "text/html": [ + " \n", + " " + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "application/vnd.plotly.v1+json": { + "config": { + "plotlyServerURL": "https://plot.ly" + }, + "data": [ + { + "customdata": [ + [ + 1294987, + "I am being accused of having a Discover Card debt ...", + " 'I am falsely accused of owing debt for an unused..." + ], + [ + 1294631, + "XXXX offered a {$100.00} gift card when applying f...", + " 'XXXX did not fulfill a promotion although the cu..." + ], + [ + 1295007, + "On XXXX XXXX and XXXX, 2015 ( as well several phon...", + " 'Discover Card payments were withdrawn over a mon..." + ], + [ + 1294888, + "In XXXX XXXX my checking acct was compromised that...", + " 'In XXXX my checking account was compromised resu..." + ], + [ + 1294108, + "I have been a Discover credit card holder since 20...", + " A Discover cardholder had his long-term account u..." + ] + ], + "hovertemplate": "Narrative Length=%{x}
          Summary Length=%{y}
          complaint_id=%{customdata[0]}
          truncated_narrative=%{customdata[1]}
          truncated_summary=%{customdata[2]}", + "legendgroup": "", + "marker": { + "color": "#636efa", + "symbol": "circle" + }, + "mode": "markers", + "name": "", + "orientation": "v", + "showlegend": false, + "type": "scatter", + "x": [ + 455, + 563, + 630, + 815, + 879 + ], + "xaxis": "x", + "y": [ + 64, + 78, + 94, + 132, + 168 + ], + "yaxis": "y" + } + ], + "layout": { + "autosize": true, + "legend": { + "tracegroupgap": 0 + }, + "template": { + "data": { + "bar": [ + { + "error_x": { + "color": "#2a3f5f" + }, + "error_y": { + "color": "#2a3f5f" + }, + "marker": { + "line": { + "color": "#E5ECF6", + "width": 0.5 + }, + "pattern": { + "fillmode": "overlay", + "size": 10, + "solidity": 0.2 + } + }, + "type": "bar" + } + ], + "barpolar": [ + { + "marker": { + "line": { + "color": "#E5ECF6", + "width": 0.5 + }, + "pattern": { + "fillmode": "overlay", + "size": 10, + "solidity": 0.2 + } + }, + "type": "barpolar" + } + ], + "carpet": [ + { + "aaxis": { + "endlinecolor": "#2a3f5f", + "gridcolor": "white", + "linecolor": "white", + "minorgridcolor": "white", + "startlinecolor": "#2a3f5f" + }, + "baxis": { + "endlinecolor": "#2a3f5f", + "gridcolor": "white", + "linecolor": "white", + "minorgridcolor": "white", + "startlinecolor": "#2a3f5f" + }, + "type": "carpet" + } + ], + "choropleth": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "type": "choropleth" + } + ], + "contour": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "colorscale": [ + [ + 0, + "#0d0887" + ], + [ + 0.1111111111111111, + "#46039f" + ], + [ + 0.2222222222222222, + "#7201a8" + ], + [ + 0.3333333333333333, + "#9c179e" + ], + [ + 0.4444444444444444, + "#bd3786" + ], + [ + 0.5555555555555556, + "#d8576b" + ], + [ + 0.6666666666666666, + "#ed7953" + ], + [ + 0.7777777777777778, + "#fb9f3a" + ], + [ + 0.8888888888888888, + "#fdca26" + ], + [ + 1, + "#f0f921" + ] + ], + "type": "contour" + } + ], + "contourcarpet": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + 
}, + "type": "contourcarpet" + } + ], + "heatmap": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "colorscale": [ + [ + 0, + "#0d0887" + ], + [ + 0.1111111111111111, + "#46039f" + ], + [ + 0.2222222222222222, + "#7201a8" + ], + [ + 0.3333333333333333, + "#9c179e" + ], + [ + 0.4444444444444444, + "#bd3786" + ], + [ + 0.5555555555555556, + "#d8576b" + ], + [ + 0.6666666666666666, + "#ed7953" + ], + [ + 0.7777777777777778, + "#fb9f3a" + ], + [ + 0.8888888888888888, + "#fdca26" + ], + [ + 1, + "#f0f921" + ] + ], + "type": "heatmap" + } + ], + "heatmapgl": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "colorscale": [ + [ + 0, + "#0d0887" + ], + [ + 0.1111111111111111, + "#46039f" + ], + [ + 0.2222222222222222, + "#7201a8" + ], + [ + 0.3333333333333333, + "#9c179e" + ], + [ + 0.4444444444444444, + "#bd3786" + ], + [ + 0.5555555555555556, + "#d8576b" + ], + [ + 0.6666666666666666, + "#ed7953" + ], + [ + 0.7777777777777778, + "#fb9f3a" + ], + [ + 0.8888888888888888, + "#fdca26" + ], + [ + 1, + "#f0f921" + ] + ], + "type": "heatmapgl" + } + ], + "histogram": [ + { + "marker": { + "pattern": { + "fillmode": "overlay", + "size": 10, + "solidity": 0.2 + } + }, + "type": "histogram" + } + ], + "histogram2d": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "colorscale": [ + [ + 0, + "#0d0887" + ], + [ + 0.1111111111111111, + "#46039f" + ], + [ + 0.2222222222222222, + "#7201a8" + ], + [ + 0.3333333333333333, + "#9c179e" + ], + [ + 0.4444444444444444, + "#bd3786" + ], + [ + 0.5555555555555556, + "#d8576b" + ], + [ + 0.6666666666666666, + "#ed7953" + ], + [ + 0.7777777777777778, + "#fb9f3a" + ], + [ + 0.8888888888888888, + "#fdca26" + ], + [ + 1, + "#f0f921" + ] + ], + "type": "histogram2d" + } + ], + "histogram2dcontour": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "colorscale": [ + [ + 0, + "#0d0887" + ], + [ + 0.1111111111111111, + "#46039f" + ], + [ + 0.2222222222222222, + "#7201a8" + ], + [ + 
0.3333333333333333, + "#9c179e" + ], + [ + 0.4444444444444444, + "#bd3786" + ], + [ + 0.5555555555555556, + "#d8576b" + ], + [ + 0.6666666666666666, + "#ed7953" + ], + [ + 0.7777777777777778, + "#fb9f3a" + ], + [ + 0.8888888888888888, + "#fdca26" + ], + [ + 1, + "#f0f921" + ] + ], + "type": "histogram2dcontour" + } + ], + "mesh3d": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "type": "mesh3d" + } + ], + "parcoords": [ + { + "line": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "parcoords" + } + ], + "pie": [ + { + "automargin": true, + "type": "pie" + } + ], + "scatter": [ + { + "fillpattern": { + "fillmode": "overlay", + "size": 10, + "solidity": 0.2 + }, + "type": "scatter" + } + ], + "scatter3d": [ + { + "line": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "marker": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "scatter3d" + } + ], + "scattercarpet": [ + { + "marker": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "scattercarpet" + } + ], + "scattergeo": [ + { + "marker": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "scattergeo" + } + ], + "scattergl": [ + { + "marker": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "scattergl" + } + ], + "scattermapbox": [ + { + "marker": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "scattermapbox" + } + ], + "scatterpolar": [ + { + "marker": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "scatterpolar" + } + ], + "scatterpolargl": [ + { + "marker": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "scatterpolargl" + } + ], + "scatterternary": [ + { + "marker": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "scatterternary" + } + ], + "surface": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "colorscale": [ + [ + 0, + "#0d0887" + ], + [ + 
0.1111111111111111, + "#46039f" + ], + [ + 0.2222222222222222, + "#7201a8" + ], + [ + 0.3333333333333333, + "#9c179e" + ], + [ + 0.4444444444444444, + "#bd3786" + ], + [ + 0.5555555555555556, + "#d8576b" + ], + [ + 0.6666666666666666, + "#ed7953" + ], + [ + 0.7777777777777778, + "#fb9f3a" + ], + [ + 0.8888888888888888, + "#fdca26" + ], + [ + 1, + "#f0f921" + ] + ], + "type": "surface" + } + ], + "table": [ + { + "cells": { + "fill": { + "color": "#EBF0F8" + }, + "line": { + "color": "white" + } + }, + "header": { + "fill": { + "color": "#C8D4E3" + }, + "line": { + "color": "white" + } + }, + "type": "table" + } + ] + }, + "layout": { + "annotationdefaults": { + "arrowcolor": "#2a3f5f", + "arrowhead": 0, + "arrowwidth": 1 + }, + "autotypenumbers": "strict", + "coloraxis": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "colorscale": { + "diverging": [ + [ + 0, + "#8e0152" + ], + [ + 0.1, + "#c51b7d" + ], + [ + 0.2, + "#de77ae" + ], + [ + 0.3, + "#f1b6da" + ], + [ + 0.4, + "#fde0ef" + ], + [ + 0.5, + "#f7f7f7" + ], + [ + 0.6, + "#e6f5d0" + ], + [ + 0.7, + "#b8e186" + ], + [ + 0.8, + "#7fbc41" + ], + [ + 0.9, + "#4d9221" + ], + [ + 1, + "#276419" + ] + ], + "sequential": [ + [ + 0, + "#0d0887" + ], + [ + 0.1111111111111111, + "#46039f" + ], + [ + 0.2222222222222222, + "#7201a8" + ], + [ + 0.3333333333333333, + "#9c179e" + ], + [ + 0.4444444444444444, + "#bd3786" + ], + [ + 0.5555555555555556, + "#d8576b" + ], + [ + 0.6666666666666666, + "#ed7953" + ], + [ + 0.7777777777777778, + "#fb9f3a" + ], + [ + 0.8888888888888888, + "#fdca26" + ], + [ + 1, + "#f0f921" + ] + ], + "sequentialminus": [ + [ + 0, + "#0d0887" + ], + [ + 0.1111111111111111, + "#46039f" + ], + [ + 0.2222222222222222, + "#7201a8" + ], + [ + 0.3333333333333333, + "#9c179e" + ], + [ + 0.4444444444444444, + "#bd3786" + ], + [ + 0.5555555555555556, + "#d8576b" + ], + [ + 0.6666666666666666, + "#ed7953" + ], + [ + 0.7777777777777778, + "#fb9f3a" + ], + [ + 0.8888888888888888, + "#fdca26" + ], + 
[ + 1, + "#f0f921" + ] + ] + }, + "colorway": [ + "#636efa", + "#EF553B", + "#00cc96", + "#ab63fa", + "#FFA15A", + "#19d3f3", + "#FF6692", + "#B6E880", + "#FF97FF", + "#FECB52" + ], + "font": { + "color": "#2a3f5f" + }, + "geo": { + "bgcolor": "white", + "lakecolor": "white", + "landcolor": "#E5ECF6", + "showlakes": true, + "showland": true, + "subunitcolor": "white" + }, + "hoverlabel": { + "align": "left" + }, + "hovermode": "closest", + "mapbox": { + "style": "light" + }, + "paper_bgcolor": "white", + "plot_bgcolor": "#E5ECF6", + "polar": { + "angularaxis": { + "gridcolor": "white", + "linecolor": "white", + "ticks": "" + }, + "bgcolor": "#E5ECF6", + "radialaxis": { + "gridcolor": "white", + "linecolor": "white", + "ticks": "" + } + }, + "scene": { + "xaxis": { + "backgroundcolor": "#E5ECF6", + "gridcolor": "white", + "gridwidth": 2, + "linecolor": "white", + "showbackground": true, + "ticks": "", + "zerolinecolor": "white" + }, + "yaxis": { + "backgroundcolor": "#E5ECF6", + "gridcolor": "white", + "gridwidth": 2, + "linecolor": "white", + "showbackground": true, + "ticks": "", + "zerolinecolor": "white" + }, + "zaxis": { + "backgroundcolor": "#E5ECF6", + "gridcolor": "white", + "gridwidth": 2, + "linecolor": "white", + "showbackground": true, + "ticks": "", + "zerolinecolor": "white" + } + }, + "shapedefaults": { + "line": { + "color": "#2a3f5f" + } + }, + "ternary": { + "aaxis": { + "gridcolor": "white", + "linecolor": "white", + "ticks": "" + }, + "baxis": { + "gridcolor": "white", + "linecolor": "white", + "ticks": "" + }, + "bgcolor": "#E5ECF6", + "caxis": { + "gridcolor": "white", + "linecolor": "white", + "ticks": "" + } + }, + "title": { + "x": 0.05 + }, + "xaxis": { + "automargin": true, + "gridcolor": "white", + "linecolor": "white", + "ticks": "", + "title": { + "standoff": 15 + }, + "zerolinecolor": "white", + "zerolinewidth": 2 + }, + "yaxis": { + "automargin": true, + "gridcolor": "white", + "linecolor": "white", + "ticks": "", + "title": { + 
"standoff": 15 + }, + "zerolinecolor": "white", + "zerolinewidth": 2 + } + } + }, + "title": { + "text": "Complaint and Summary Lengths" + }, + "xaxis": { + "anchor": "y", + "autorange": true, + "domain": [ + 0, + 1 + ], + "range": [ + -0.23926380368098163, + 4.2392638036809815 + ], + "title": { + "text": "Narrative Length" + }, + "type": "category" + }, + "yaxis": { + "anchor": "x", + "autorange": true, + "domain": [ + 0, + 1 + ], + "range": [ + 55.41747572815534, + 176.58252427184465 + ], + "title": { + "text": "Summary Length" + }, + "type": "linear" + } + } + }, + "image/png": "iVBORw0KGgoAAAANSUhEUgAABOcAAAFoCAYAAAAcvJ8bAAAAAXNSR0IArs4c6QAAIABJREFUeF7t3X+MZdVhJ/hTNLGbhsEoDQyBxW6bjViGYZK2EpyFKDEZB2tZYyeKsBYtvWEXCBLm19KSna6xolaSKQISPTC0UTCwYdwZtQIzirETtCaOSUaGXZMfnSzDWmzGQzsMPd62O2Oz0LRtunt1H7nlV7deVb165557z3vnU/9AV73z437Oee+c+tb9MXf8+PHjwRcBAgQIECBAgAABAgQIECBAgAABAp0LzAnnOjfXIAECBAgQIECAAAECBAgQIECAAIGBgHDORCBAgAABAgQIECBAgAABAgQIECDQk4Bwrid4zRIgQIAAAQIECBAgQIAAAQIECBAQzpkDBAgQIECAAAECBAgQIECAAAECBHoSEM71BK9ZAgQIECBAgAABAgQIECBAgAABAsI5c4AAAQIECBAgQIAAAQIECBAgQIBATwLCuZ7gNUuAAAECBAgQIECAAAECBAgQIEBAOGcOECBAgAABAgQIECBAgAABAgQIEOhJQDjXE7xmCRAgQIAAAQIECBAgQIAAAQIECAjnzAECBAgQIECAAAECBAgQIECAAAECPQkI53qC1ywBAgQIECBAgAABAgQIECBAgAAB4Zw5QIAAAQIECBAgQIAAAQIECBAgQKAnAeFcT/CaJUCAAAECBAgQIECAAAECBAgQICCcMwcIECBAgAABAgQIECBAgAABAgQI9CQgnOsJXrMECBAgQIAAAQIECBAgQIAAAQIEhHPmAAECBAgQIECAAAECBAgQIECAAIGeBIRzPcFrlgABAgQIECBAgAABAgQIECBAgIBwzhwgQIAAAQIECBAgQIAAAQIECBAg0JOAcK4neM0SIECAAAECBAgQIECAAAECBAgQEM6ZAwQIECBAgAABAgQIECBAgAABAgR6EhDO9QSvWQIECBAgQIAAAQIECBAgQIAAAQLCOXOAAAECBAgQIECAAAECBAgQIECAQE8Cwrme4DVLgAABAgQIECBAgAABAgQIECBAQDhnDhAgQIAAAQIECBAgQIAAAQIECBDoSUA41xO8ZgkQIECAAAECBAgQIECAAAECBAgI58wBAgQIECBAgAABAgQIECBAgAABAj0JCOd6gtcsAQIECBAgQIAAAQIECBAgQIAAAeGcOUCAAAECBAgQIECAAAECBAgQIECgJwHhXE/wmiVAgAABAgQIECBAgAABAgQIECAgnDMHCBAgQIAAAQIECBAgQIAAAQIECPQkIJzrCV6zBAgQIECAAAECBAgQIECAAAECBIRz5gABAgQIECBAgAABAgQIECBAgACBngSEcz3Ba5YAAQIECBAgQIAAAQIECBAgQICAcM4cIECAAAECBAgQIEC
AAAECBAgQINCTgHCuJ3jNEiBAgAABAgQIECBAgAABAgQIEBDOmQMECBAgQIAAAQIECBAgQIAAAQIEehIQzvUE32azj+x9Mjz8r/8gPHzPx8OF529ps+qprqtyefzzfxL23D8fzth82lQfi873K/DCi/vD9dvvDtf/jx8K1119Rb+d0ToBAgQIECBAgAABAgQIzJSAcO7vh/Obh74dtt2yEF4+cHDJAH/kg5eGhR03ZD3oMeFcVXbXg4+F3Qu3hcsu2bqu48w9sFhvOPf0s/vCzfP3LTO448aPCmTWNTOWv/jwG0cWbau5tumkjZE1tl+8/gx470U/uuw9n/tcb19DjQQIECBAgAABAgQIECDQlYBwLoRQB1TNIK7+Zb0ajJzPvhLOjX67rCecm7/zofDEF55ZFlKuNDe6eoPOSjvCuVkZScdBgAABAgQIECBAgAABAm0LFB/O1WdLrXZ21L985N+Gq3/hn2Z7aWRMOBczoXI/m2jccG6t46hC2r2f/eNw63W/FMNVdFnhXNHD7+AJECBAgAABAgQIECBAYBWBosO5OjA48P8eWteZcfXZVLXr+7ZeMDjjavhSvToY2rXzpnDHzgcWL5c99+wzB20d/Na3B/ewevW1w4Nq6u/X90YbvsTu53/2J5ZcbtkMEkeFc80+Vm2cesqmZfela5Ydt9060Kr7X1usdRnwSpeONi+rHQ7MqrqrS2/rr1GX4I66LPmCH31XeO31N9Yc27pP417aW/dtYf6GJZcC13246sr3L14G29Y8qOpsOtTzoD7rr/Zpzo9JzN/zrh9ZnHP//T99X/jW3706qL45z8cN3cZ9XdVG/dqv7Pvq4pivNOfv+41bw29/5olQv7b5PqoraM7X6r1w6j84OZxz1umDY3r98JGRl7XX7+2X/vYbi/ecG2c+NsdkpX5ZnQgQIECAAAECBAgQIECAQNHhXP0L+2WXbh37vnLVL91/+fzfLAY+KwV8dTg2/Ev5cIA06vvD97oafu1w4DXqLK+Vwrlqeg/fvH6l1w0/TGKldkcFWGudcTbq7VXV80d/+udLvFeruwr/hoOZUcew0jg2x2qlt3td/h2nnrJmkFfVsd5wrgoW25gHTYc6sFzLZxLzZsi6UoC5kkXTetxwbtRYjroXXP3+Gg6cV2pj1Nmxo147zj3nxpmPK31GjAo3LUEECBAgQIAAAQIECBAgQEA4t44nMK4UUKwUHox6UuhKl1o2v79aUND85X/cy1pXOrNrVDjXvCl+HWacdeYPLwZrk4Rzo95y66l71DFUHt84+Hcrnr04zv0Cxz27bJJwro15MHw2XtWHUQ6rfX+lsGzc8Rw1RlWd4wag44Zzq43l8Dxdac5X4zi/8NDiGaIr9bvu+/C8GSecaz6ttTkOq9Xxe098KVx5+SVZPgzDUkiAAAECBAgQIECAAAEC/QkI59YRzq0UrI0KHsYN4eqhX0841wwmVgoqVrr0dLUzrVYKF0Yd46Th3DhPxl2p7mb/VgtDxr3n3PDbb6W+DV/yut4z53II52LMh+foqCC3GRyuFsBWP1vpaa0rBY6jAtHVwrnqibv1eK02R5tBYEw4V4fZw5fkespvfwublgkQIECAAAECBAgQIDBNAsK57XeHcS9rXe0soVFns7URyjTPYKsmV/PsoFFBRX3Pq+GAoI0z54bDlUnCuVFPP53kzLnaZbU+TBLONd+8o57YO23hXKx5bdKcP+vxHefMuZXC5OExqUO3ccO51e4nmCKcq/o66p55o+73OE0Lhb4SIECAAAECBAgQIECAQDqBosO59T4QYlrOnFvrrLPmAwvWc1lrTDi31ll541xiudKZc6PO3lpPeLTaW6wZBE1TONeG+bBNbfrbd90Rfn3XvwqXXnzRkvsaruQ4Tji32plzzXrHDedWuydeqnCu2df13tMw3ce9mgkQIECAAAECBAgQIEAgR4Giw7lqQEbdLL45UH/wxf8jVE9t/PcvvjR4gmXzqZ593HN
u+F5ZzaBirXvjtRXOrSdMqUxXCg1jzpxbLfQZN5yrvKqvyy7ZOvI92jwrcr3hZxtnUE56z7k2zIdRhh/Y8PQz+5Y9/TcmnBsnwKvrHzecW+vejcPvo9XuT7fWmNdnclbtVU+O/dAH/tslFOPeFzLHRUKfCBAgQIAAAQIECBAgQCCtQPHhXMU76rK/6vujLmmsgprhUGK1p7W2Eco0L2ut+zocEDZ/8R8VSAzfc6yte86tJ0ypPEe9fvgSwLWeSjs8JsMuo8LI+nvDT0ld6a202mvry4OHvVd70ufLBw4ue7psG/Ng0nCuDfOmW23SfKLrah9V486VOgS74EffteTedFX5X134dLjxmg+HC8/fMnjPDp/xWbc9ai6Mem39PqpC9+F74K106fp6wrlttyyE4fk57rGn/ahXOwECBAgQIECAAAECBAjkKiCc+/uRGeeG+fUg1r/Y1/9u/oJfB35thDJV2DP8NereVaPCh+bxVCHVrp03hTt2PhDaOnNuOCyr+7lWYNO8H1d1PPf9xq3htz/zRJjkstZmKFP/u+rHeVvOCaPGYNSbcaX7nY0a21HHvZpvG/Ng0nBuOBStzuiqvtZr3vRq3vNwnA+3UfdhGy43HKKu9NrhubWecK5+P+568LHFJquA+mv7X1n2lN9m2/X4v/S33wjXj3h4zKggfNRcWut9MY6h1xAgQIAAAQIECBAgQIDAbAoI5zId19Uux8u0y7pVgMCsnAW22iWsBQyjQyRAgAABAgQIECBAgACBjASEcxkNxnBXhHOZDkzh3Vrt6ae50lR9/qM//fOwsOOGxS5OcvZfrsenXwQIECBAgAABAgQIECAw3QLCuUzHTziX6cAU3K1pPWuuDhSHh26cexEWPNQOnQABAgQIECBAgAABAgQ6FBDOdYitKQIECBAgQIAAAQIECBAgQIAAAQLDAsI584EAAQIECBAgQIAAAQIECBAgQIBATwLCuZ7gNUuAAAECBAgQIECAAAECBAgQIEBAOGcOECBAgAABAgQIECBAgAABAgQIEOhJQDjXE7xmCRAgQIAAAQIECBAgQIAAAQIECAjnzAECBAgQIECAAAECBAgQIECAAAECPQkI53qC1ywBAgQIECBAgAABAgQIECBAgAAB4Zw5QIAAAQIECBAgQIAAAQIECBAgQKAnAeFcT/CaJUCAAAECBAgQIECAAAECBAgQICCcMwcIECBAgAABAgQIECBAgAABAgQI9CQgnOsJXrMECBAgQIAAAQIECBAgQIAAAQIEhHPmAAECBAgQIECAAAECBAgQIECAAIGeBIRzPcFrlgABAgQIECBAgAABAgQIECBAgIBwzhwgQIAAAQIECBAgQIAAAQIECBAg0JOAcK4neM0SIECAAAECBAgQIECAAAECBAgQEM6ZAwQIECBAgAABAgQIECBAgAABAgR6EhDO9QSvWQIECBAgQIAAAQIECBAgQIAAAQLCOXOAAAECBAgQIECAAAECBAgQIECAQE8Cwrme4DVLgAABAgQIECBAgAABAgQIECBAQDhnDhAgQIAAAQIECBAgQIAAAQIECBDoSUA41xO8ZgkQIECAAAECBAgQIECAAAECBAgI58wBAgQIECBAgAABAgQIECBAgAABAj0JCOd6gtcsAQIECBAgQIAAAQIECBAgQIAAAeGcOUCAAAECBAgQIECAAAECBAgQIECgJwHhXE/wmiVAgAABAgQIECBAgAABAgQIECAgnDMHCBAgQIAAAQIECBAgQIAAAQIECPQkIJzrCV6zBAgQIECAAAECBAgQIECAAAECBIRz5gABAgQIECBAgAABAgQIECBAgACBngSEcz3Ba5YAAQIECBAgQIAAAQIECBAgQICAcM4cIECAAAECBAgQIECAAAECBAgQINCTgHCuJ3jNEiBAgAABAgQIECBAgAABAgQIEBDOmQMECBAgQIAAAQIECBAgQIAAAQIEehIQzvUEr1kCBAgQIECAAAECBAgQIECAAAE
CwjlzgAABAgQIECBAgAABAgQIECBAgEBPAsK5nuA1S4AAAQIECBAgQIAAAQIECBAgQEA4Zw4QIECAAAECBAgQIECAAAECBAgQ6ElAONcTvGYJECBAgAABAgQIECBAgAABAgQICOfMAQIECBAgQIAAAQIECBAgQIAAAQI9CQjneoLXLAECBAgQIECAAAECBAgQIECAAAHhnDlAgAABAgQIECBAgAABAgQIECBAoCcB4VxP8JolQIAAAQIECBAgQIAAAQIECBAgIJwzBwgQIECAAAECBAgQIECAAAECBAj0JCCc6wleswQIECBAgAABAgQIECBAgAABAgSEc+YAAQIECBAgQIAAAQIECBAgQIAAgZ4EhHM9wWuWAAECBAgQIECAAAECBAgQIECAgHDOHCBAgAABAgQIECBAgAABAgQIECDQk4Bwrid4zRIgQIAAAQIECBAgQIAAAQIECBAQzkXOgQOH3oisYTqKn735pFDKsU7HiOglgckF/sFJJ4YwNxf+v8Pfn7wSJQkQyEbAGp3NUOgIgWgBa3Q0oQoIZCVQ0hpdHauvyQWEc5PbDUqWEliV9KESOSUUJ5C9gI1/9kOkgwTWJWCNXheXFxPIWsAanfXw6ByBdQuUtEYL59Y9PZYUEM7F+QnnIv0UJ0CgewEb/+7NtUggpUBJG/+UjuomkIOANTqHUdAHAu0JlLRGC+fi5o1wLs5POBfppzgBAt0L2Ph3b65FAikFStr4p3RUN4EcBKzROYyCPhBoT6CkNVo4FzdvhHNxfsK5SD/FCRDoXsDGv3tzLRJIKVDSxj+lo7oJ5CBgjc5hFPSBQHsCJa3Rwrm4eSOci/MTzkX6KU6AQPcCNv7dm2uRQEqBkjb+KR3VTSAHAWt0DqOgDwTaEyhpjRbOxc0b4Vycn3Au0k9xAgS6F7Dx795ciwRSCpS08U/pqG4COQhYo3MYBX0g0J5ASWu0cC5u3gjn4vyEc5F+ihMg0L2AjX/35lokkFKgpI1/Skd1E8hBwBqdwyjoA4H2BEpao4VzcfNGOBfnJ5yL9FOcAIHuBWz8uzfXIoGUAiVt/FM6qptADgLW6BxGQR8ItCdQ0hotnIubN8K5OD/hXKSf4gQIdC9g49+9uRYJpBQoaeOf0lHdBHIQsEbnMAr6QKAdgSNHQjjh6MZwbMORsHFjO3XmXItwLm50hHNxfsK5SD/FCRDoXsDGv3tzLRJIKSCcS6mrbgLdCliju/XWGoFUAp//wxPCn/3FCYvV/+R7j4UrP3QsVXNZ1CucixsG4Vycn3Au0k9xAgS6F7Dx795ciwRSCgjnUuqqm0C3Atbobr21RiCFwEv758LvfGbDsqp/+Zqj4bz3HE/RZBZ1CufihkE4F+cnnIv0U5wAge4FbPy7N9cigZQCwrmUuuom0K2ANbpbb60RSCHw7758Qvjil35w1lzdxgd+7lj4mZ+e3bPnhHNxs0k4F+cnnIv0U5wAge4FbPy7N9cigZQCwrmUuuom0K2ANbpbb60RSCGw76/mwu9/bvmZc7/44aNh6487cy6F+SzUKZyLHMUDh96IrGE6itv4T8c46SWBcQRs/MdR8hoC0yNgjZ6esdJTAmsJWKPXEvJzAvkLvP56CPfef2L47vd+0Ne3vy2E2295M5x8cv79n7SHzpybVO6tcsK5OD9nzkX6KU6AQPcCNv7dm2uRQEoB4VxKXXUT6FbAGt2tt9YIpBI49K258Ny+ufDadzaEU95xNFy89XjYfPrsnjVXOQrn4maTcC7OTzgX6ac4AQLdC9j4d2+uRQIpBYRzKXXVTaBbAWt0t95aI5BaoKQ1WjgXN5uEc3F+wrlIP8UJEOhewMa/e3MtEkgpUNLGP6WjugnkIGCNzmEU9IFAewIlrdHCubh5I5yL8xPORfopToBA9wI2/t2ba5FASoGSNv4pHdVNIAcBa3QOo6APBNoTKGmNFs7FzRvhXJyfcC7ST3ECBLoXsPHv3lyLBFIKlLTxT+mobgI
5CFijcxgFfSDQnkBJa7RwLm7eCOfi/IRzkX6KEyDQvYCNf/fmWiSQUqCkjX9KR3UTyEHAGp3DKOgDgfYESlqjhXNx80Y4F+cnnIv0U5wAge4FbPy7N9cigZQCJW38Uzqqm0AOAtboHEZBHwi0J1DSGi2ci5s3wrk4P+FcpJ/iBAh0L2Dj3725FgmkFChp45/SUd0EchCwRucwCvpAoD2BktZo4VzcvBHOxfkJ5yL9FCdAoHsBG//uzbVIIKVASRv/lI7qJpCDgDU6h1HQBwLtCZS0Rgvn4uaNcC7OTzgX6ac4AQLdC9j4d2+uRQIpBUra+Kd0VDeBHASs0TmMgj4QaE+gpDVaOBc3b4RzcX7CuUg/xQkQ6F7Axr97cy0SSClQ0sY/paO6CeQgYI3OYRT0gUB7AiWt0cK5uHkjnIvzE85F+ilOgED3Ajb+3ZtrkUBKgZI2/ikd1U0gBwFrdA6joA8E2hMoaY0WzsXNG+FcnJ9wLtJPcQIEuhew8e/eXIsEUgqUtPFP6ahuAjkIWKNzGAV9INCeQElrtHAubt4I5+L8hHORfooTINC9gI1/9+ZaJJBSoKSNf0pHdRPIQcAancMo6AOB9gRKWqOFc3HzRjgX5yeci/RTnACB7gVs/Ls31yKBlAIlbfxTOqqbQA4C1ugcRkEfCLQnUNIaLZyLmzdJw7lvHvp22HbLQnj5wMFlvTz37DPDnvvnwxmbT4s7gp5LHzj0Rs896Kb5kj5UuhHVCoH+BGz8+7PXMoEUAtboFKrqJNCPgDW6H3etEkglUNIaLZyLm0VJw7n5Ox8a9G5hxw1xvcy4tHAu48HRNQIERgrY+JsYBGZLoKSN/2yNnKMhsFzAGm1WEJgtgZLWaOFc3NxNFs5VZ83dtOPesHP7teHC87fE9TLj0sK5jAdH1wgQEM6ZAwQKEChp41/AcDrEwgWEc4VPAIc/cwIlrdHCubjpK5yL83PPuUg/xQkQ6F7Axr97cy0SSClQ0sY/paO6CeQgYI3OYRT0gUB7AiWt0cK5uHmTLJyrulVd1nrelnPCdVdfEdfLjEs7cy7jwdE1AgRGCtj4mxgEZkugpI3/bI2coyGwXMAabVYQmC2BktZo4Vzc3E0azr3w4v7w4O9+LvzW/K+ETSdtjOtpC6Uf2ftk+Nr+V5bdA68KEZ/4wjNLWrjjxo8uhopVuV0PPjb4+fu2XhB2L9y2eDzCuRYGRhUECHQqYOPfKbfGCCQXKGnjnxxTAwR6FrBG9zwAmifQskBJa7RwLm7ytBrOrfZ01mY3u3xa69PP7gs3z9836MJHPnjpyHCu+tmoB1dUZe/avXfxybLNh1wI5+ImoNIECHQvYOPfvbkWCaQUKGnjn9JR3QRyELBG5zAK+kCgPYGS1mjhXNy8aTWci+tK+tKrnTm3UjjXvDS3GdYJ59KPmxYIEGhXwMa/XU+1EehboKSNf9/W2ieQWsAanVpY/QS6FShpjRbOxc2tZOHcak9rrQKuPY8/teTy0LjDGK/0uJe11pe0Hn7jyOCMu0svvmjxEtfqUt07dn4q7Nr5scFTaIVz49l7FQEC+QjY+OczFnpCoA2Bkjb+bXipg0DOAtbonEdH3wisX6CkNVo4t/75MVyil3CuCrh23vNoeODO28MZm0+LO4J1lF4pnBuuourb9dvvDgvzNwzuL1eFc9uuujxcdsnWwcua4dzx48fX0YPpfenc3Fwo5Vind5T0fBoFvv/msfC2H9owjV3XZwIECBAgQGBGBL73/WPhh06cm5GjcRgE8hEo6ffo6lh9TS7QSzhXhWTPPPd8NmfONfnqS1mv/oWfc+bc3+OUlPhP/nZSksB0CPir/HSMk14SGFfAGj2ulNcRyF/AGp3/GOkhgfUIlLRGO3NuPTNj+WtbD+fqM89efe3wij079ZRN4eF7Pj64LLTLr3HOnKv6M3yfOfece2uESvpQ6XJOaotAHwI2/n2oa5NAOgFrdDpbNRPoWsAa3bW49gikFSh
pjRbOxc2l1sO5ujur3XMursuTlx4VzlX93PvZPw63XvdLg4qbl616WqtwbvIZpySBPAVs/PMcF70iMKlASRv/SY2UIzAtAtboaRkp/SQwnkBJa7Rwbrw5sdKrkoVzcd1qt3QVsFX3jhv+2r1w2+A+cvVDH76y76uLP65/Vn+jCvV2PfjY4J/Vfeiqn286aePg3x4I0e5YqY0AgfQCNv7pjbVAoEuBkjb+Xbpqi0AfAtboPtS1SSCdQElrtHAubh4VEc7FEa1eWjiXUlfdBAikELDxT6GqTgL9CZS08e9PWcsEuhGwRnfjrBUCXQmUtEYL5+JmVbJwrrpcdNstC+HlAwdX7eEdN340XHf1FXFH0WNp4VyP+JomQGAiARv/idgUIpCtQEkb/2wHQccItCRgjW4JUjUEMhEoaY0WzsVNumThXNWt5sMUqu/Vl5Fuu+rywSWi1eWml1580dQGdMK5uAmoNAEC3QvY+HdvrkUCKQVK2vindFQ3gRwErNE5jII+EGhPoKQ1WjgXN2+ShXOrPRBi+MEM1f3g9jz+1JL7uMUdUrelhXPdemuNAIF4ARv/eEM1EMhJoKSNf07u+kIghYA1OoWqOgn0J1DSGi2ci5tnvYRzw4HcS3/7jbDznkfDA3feHs7YfFrc0fRQWjjXA7omCRCIErDxj+JTmEB2AiVt/LPD1yECLQtYo1sGVR2BngVKWqOFc3GTLVk4N3z5avVU1OGv4XCuekrqXbv3hj33zwvn4sYyaemSPlSSQqqcQAYCNv4ZDIIuEGhRwBrdIqaqCPQsYI3ueQA0T6BlgZLWaOFc3ORJFs5V3apCuPmFh8LD93w8XHj+lkFP6wdFfOLmq0MV2lWXuD7z3PMua40bx+SlS/pQSY6pAQI9C9j49zwAmifQsoA1umVQ1RHoUcAa3SO+pgkkEChpjRbOxU2gpOFc1bUXXtwfrt9+d3j1tcOLPd29cNsgmJuFL5e1zsIoOgYCZQnY+Jc13o529gVK2vjP/mg6wtIFrNGlzwDHP2sCJa3Rwrm42Zs8nIvrXv6lhXP5j5EeEiCwVMDG34wgMFsCJW38Z2vkHA2B5QLWaLOCwGwJlLRGC+fi5m7ScK6+hPXlAweX9fLcs8+c2vvMDR+McC5uAipNgED3Ajb+3ZtrkUBKgZI2/ikd1U0gBwFrdA6joA8E2hMoaY0WzsXNm6Th3PydDw16t7DjhrheZlxaOJfx4OgaAQIjBWz8TQwCsyVQ0sZ/tkbO0RBYLmCNNisIzJZASWu0cC5u7iYL56qz5m7acW/Yuf3axYdBxHU1z9LCuTzHRa8IEFhZwMbf7CAwWwIlbfxna+QcDQHhnDlAYNYFSlqjhXNxs1k4F+cXhHORgIoTINC5gHCuc3INEkgqUNLGPymkyglkIGCNzmAQdIFAiwIlrdHCubiJkyycq7pVXdZ63pZzwnVXXxHXy4xLC+cyHhxdI0BgpICNv4lBYLYEStr4z9bIORoCywWs0WYFgdkSKGmNFs7Fzd2k4dwLL+4PD/7u58Jvzf9K2HTSxrieZlpaOJfpwOgWAQIrCtj4mxwEZkugpI3/bI2coyHSRbpfAAAgAElEQVQgnDMHCMy6QElrtHAubjYnC+dWe1Jr1WVPa40buK5Ll/Sh0rWt9gh0LSCc61pcewTSClij0/qqnUCXAtboLrW1RSC9QElrtHAubj4lC+fiujU9pZ05Nz1jpacECLwlYONvJhCYLYGSNv6zNXKOhsByAWu0WUFgtgRKWqOFc3FzVzgX5+eBEJF+ihMg0L2AjX/35lokkFKgpI1/Skd1E8hBwBqdwyjoA4H2BEpao4VzcfNGOBfnJ5yL9FOcAIHuBWz8uzfXIoGUAiVt/FM6qptADgLW6BxGQR8ItCdQ0hotnIubN0nDucNvHAk3z98XvrLvq+HUUzaFh+/5eHj3O88afO/Siy+aiae4uqw1bgIqTYBA9wI2/t2ba5FASoGSNv4pHdVNoE+Bo8dC+Ou
/ngtf//qGEMJceNe73gw/9mPHw4YT+uyVtgkQiBUoaY0WzsXNlqTh3PydD4XztpwTrv6Fnwu/uvDpcOM1Hw4Xnr8lPP3svrDn8afC7oXbpv4prsK5uAmoNAEC3QsI57o31yKBlAIlbfxTOqqbQJ8CT33xhPDlZ5cmcT99ybFw+QeO9dktbRMgEClQ0hotnIubLMnCuepprTftuDfs3H7t4Gy54XDuhRf3h533PBoeuPP2cMbm0+KOoOfSwrmeB0DzBAisW0A4t24yBQhkLVDSxj/rgdA5AhECu+7bEL79nbklNZz2juPhjtuORtSqKAECfQuUtEYL5+JmWy/hnDPn4gatj9Ilfaj04atNAl0KCOe61NYWgfQC1uj0xlogkFrg1//5ieHNRg534oYQfu2fvZm6afUTIJBQoKQ1WjgXN5GShXNVtx7Z+2R45rnnw12fvDH8xr2fGVzWeubpp4VttyyEq658v3vOxY1dp6VL+lDpFFZjBHoQEM71gK5JAgkFrNEJcVVNoCOB/+1fbQj7v770zLl3nns8XP8/O3OuoyHQDIEkAiWt0cK5uCmUNJyruladJVc9AGL4q7rX3GWXbI3reSalXdaayUDoBgECYwsI58am8kICUyFQ0sZ/KgZEJwlMIPDKgbnwe4+fsHhpa3VJ61W/dCyc+18dn6A2RQgQyEWgpDVaOBc365KHc3Hdy7+0cC7/MdJDAgSWCgjnzAgCsyVQ0sZ/tkbO0RBYKnD8eAhvvHZiCHNz4aSTv1/9xxcBAlMuUNIaLZyLm6y9hHMeCBE3aH2ULulDpQ9fbRLoUkA416W2tgikF7BGpzfWAoGuBKzRXUlrh0A3AiWt0cK5uDklnIvzC86ciwRUnACBzgVs/Dsn1yCBpAIlbfyTQqqcQAYC1ugMBkEXCLQoUNIaLZyLmzjCuTg/4Vykn+IECHQvYOPfvbkWCaQUKGnjn9JR3QRyELBG5zAK+kCgPYGS1mjhXNy8Ec7F+QnnIv0UJ0CgewEb/+7NtUggpUBJG/+UjuomkIOANTqHUdAHAu0JlLRGC+fi5o1wLs5POBfppzgBAt0L2Ph3b65FAikFStr4p3RUN4EcBKzROYyCPhBoT6CkNVo4FzdvhHNxfsK5SD/FCRDoXsDGv3tzLRJIKVDSxj+lo7oJ5CBgjc5hFPSBQHsCJa3Rwrm4edN6OPfNQ98O225ZCC8fOLhqz849+8yw5/75cMbm0+KOoOfSHgjR8wBongCBdQvY+K+bTAECWQuUtPHPeiB0jkALAtboFhBVQSAjgZLWaOFc3MRrPZyL6870lRbOTd+Y6TGB0gVs/EufAY5/1gRK2vjP2tg5HgJNAWu0OUFgtgRKWqOFc3FzVzgX5+ey1kg/xQkQ6F7Axr97cy0SSClQ0sY/paO6CeQgYI3OYRT0gUB7AiWt0cK5uHkjnIvzE85F+ilOgED3Ajb+3ZtrkUBKgZI2/ikd1U0gBwFrdA6joA8E2hMoaY0WzsXNG+FcnJ9wLtJPcQIEuhew8e/eXIsEUgqUtPFP6ahuAjkIWKNzGAV9INCeQElrtHAubt4I5+L8hHORfooTINC9gI1/9+ZaJJBSoKSNf0pHdRPIQcAancMo6AOB9gRKWqOFc3HzRjgX5yeci/RTnACB7gVs/Ls31yKBlAIlbfxTOqqbQA4C1ugcRkEfCLQnUNIaLZyLmzfCuTg/4Vykn+IECHQvYOPfvbkWCaQUKGnjn9JR3QRyELBG5zAK+kCgPYGS1mjhXNy8SRbOffPQt8O2WxbCVVe+P1x39RVxvcy49IFDb2Tcu/a6VtKHSntqaiKQp4CNf57jolcEJhWwRk8qpxyB/ASs0fmNiR4RiBEoaY0WzsXMlBCShXNVt55+dl+4ef6+xR5+5IOXhoUdN8T1OLPSwrnMBkR3CBBYU8DGf00iLyAwVQIlbfynamB0lsAEAtboCdAUIZCxQElrtHAubiImDeeGu1afSffygYODb89KUCeci5u
AShMg0L2AjX/35lokkFKgpI1/Skd1E8hBwBqdwyjoA4H2BEpao4VzcfOms3DuhRf3h+u33x1efe3wsh5Pc1AnnIubgEoTINC9gI1/9+ZaJJBSoKSNf0pHdRPIQcAancMo6AOB9gRKWqOFc3HzJmk498jeJ8OuBx9b7OGoEK46o+4Tv/lguOuTN4YzNp8WdzQ9lBbO9YCuSQIEogRs/KP4FCaQnUBJG//s8HWIQMsC1uiWQVVHoGeBktZo4VzcZEsWznkgRNzA5Fa6pA+V3Oz1h0DbAjb+bYuqj0C/Atbofv21TqBNAWt0m5rqItC/QElrtHAubr4lC+fiujU9pZ05Nz1jpacECLwlYONvJhCYLYGSNv6zNXKOhsByAWu0WUFgtgRKWqOFc3FzN1k4V505d9OOe8PO7deGC8/fEtfLjEsL5zIeHF0jQGCkgI2/iUFgtgRK2vjP1sg5GgLCOXOAwKwLlLRGC+fiZrNwLs4vCOciARUnQKBzAeFc5+QaJJBUoKSNf1JIlRPIQMAancEg6AKBFgVKWqOFc3ETJ1k4V3Vr/s6Hws//7E+Eyy7ZGtfLjEsL5zIeHF0jQGCkgI2/iUFgtgRK2vjP1sg5GgLLBazRZgWB2RIoaY0WzsXN3aTh3Asv7g8P/u7nwm/N/0rYdNLGuJ5mWlo4l+nA6BYBAisK2PibHARmS6Ckjf9sjZyjISCcMwcIzLpASWu0cC5uNicL5+qntb584ODIHp579plhz/3z4YzNp8UdQc+lhXM9D4DmCRBYt4Bwbt1kChDIWqCkjX/WA6FzBFoQsEa3gKgKAhkJlLRGC+fiJl6ycC6uW9NTWjg3PWOlpwQIvCVg428mEJgtgZI2/rM1co6GwHIBa7RZQWC2BEpao4VzcXNXOBfn54EQkX6KEyDQvYCNf/fmWiSQUqCkjX9KR3UTyEHAGp3DKOgDgfYESlqjhXNx8yZpOFfdc+767XeHV187vKyXLmuNG7iuS5f0odK1rfYIdC1g49+1uPYIpBWwRqf1VTuBLgWs0V1qa4tAeoGS1mjhXNx8ShbOHX7jSLh5/r5w6cUXhZ967z9a8mCIWXqKq8ta4yag0gQIdC9g49+9uRYJpBQoaeOf0lHdBHIQsEbnMAr6QKA9gZLWaOFc3LxJFs5VD4S4ace9Yef2awc93HnPo+GBO28fPADi6Wf3hT2PPxV2L9w29U9xFc7FTUClCRDoXsDGv3tzLRJIKVDSxj+lo7oJ5CBgjc5hFPSBQHsCJa3Rwrm4edNJOHfm6aeFT/zmg+GuT944COeqy12Hw7q4Q+i3tHCuX3+tEyCwfgEb//WbKUEgZ4GSNv45j4O+EWhDwBrdhqI6COQjUNIaLZyLm3fJwrnhy1qvu/qKUF3Ket6Wc0L1/4/sfTI889zzzpyLG7tOS5f0odIprMYI9CBg498DuiYJJBSwRifEVTWBjgWs0R2Da45AYoGS1mjhXNxkShbONbtVXea67ZaF8PKBg+HUUzaFh+/5eLjw/C1xvc+gtDPnMhgEXSBAYF0CNv7r4vJiAtkLlLTxz34wdJBApIA1OhJQcQKZCZS0Rgvn4iZfZ+FcXDfzLS2cy3ds9IwAgdECNv5mBoHZEihp4z9bI+doCCwXsEabFQRmS6CkNVo4Fzd3hXNxfkE4FwmoOAECnQvY+HdOrkECSQVK2vgnhVQ5gQwErNEZDIIuEGhRoKQ1WjgXN3GEc3F+wrlIP8UJEOhewMa/e3MtEkgpUNLGP6WjugnkIGCNzmEU9IFAewIlrdHCubh5kzScq57Kev32u8Orrx1e1stzzz4z7Ll/fvD01mn+cubcNI+evhMoU8DGv8xxd9SzK1DSxn92R9GREXhLwBptJhCYLYGS1mjhXNzcTRbONZ/WGtfNfEsL5/IdGz0jQGC0gI2/mUFgtgRK2vjP1sg5GgLLBazRZgWB2RIoaY0WzsXN3WThXPV01pt23Bt2br82m6eyPrL3yfC1/a+EhR03LFE
bfpJs9YPdC7eFyy7ZuviaqtyuBx8b/Pt9Wy8Y/HzTSRsH/xbOxU1ApQkQ6F7Axr97cy0SSClQ0sY/paO6CeQgYI3OYRT0gUB7AiWt0cK5uHmTLJyrz5zbdtXlS4KuuO5OVvrpZ/eFm+fvGxT+yAcvXRLONc/wqy7FvWPnp8KunR8bhIpV2bt27128BHf+zocG9dQBn3BusjFRigCB/gRs/Puz1zKBFAIlbfxT+KmTQE4C1uicRkNfCMQLlLRGC+fi5kuycK7qVnXG2TPPPb/kTLO47saVHnXmXBXG7bzn0fDAnbcP7n/XDOuqMO68LeeE666+YtB4M6wTzsWNidIECHQvYOPfvbkWCaQUKGnjn9JR3QRyELBG5zAK+kCgPYGS1mjhXNy8SRrO5fZAiFHhXDNsqzjrs+M+efu2wRl3l1580WI41zyzTjgXNwGVJkCgewEb/+7NtUggpUBJG/+UjuomkIOANTqHUdAHAu0JlLRGC+fi5k2ycC7HB0KsFM7tefypJWf3NcO54Utzm+Hcd79/NG4EpqT0239oQyjlWKdkSHRzRgSOHj0eTtxwQqdHs2HD3KC9qm1fBAhMv8DbfuiE8L3vH5v+A3EEBAiEvtboN48dCxtOeGt/4IsAgfYESvo9ujpWX5MLJAvnpuWBELFnzn3rO9+dXH+KSp7+jreHUo51ioZFV2dA4Hg4Hubmut0Mb3rbhhDm5sLh7745A4IOgQCB0099e/jWq2XsR4w2gVkX6GuNPn78eJgL3e5HZn0sHR+BSqCk36OrY/U1uUCycC6nB0LUPO45N/lEKel03MmVlCQwHQIumZmOcdJLAuMKWKPHlfI6AvkLWKPzHyM9JLAegZLWaJe1rmdmLH9tsnCuamoaHgjhaa3jTaCSPlTGE/EqAtMrYOM/vWOn5wRGCVijzQsCsyNgjZ6dsXQkBCqBktZo4VzcnE8WzlWXtW67ZSG8fODgyB6ee/aZYc/984MnpKb+qi5drR7sMPy1e+G2cNklWwffavZ1+Gd1yLjrwccGr33f1guW3J/OAyFSj576CRBoW8DGv21R9RHoV6CkjX+/0lonkF7AGp3eWAsEuhQoaY0WzsXNrGThXFy3pqe0cG56xkpPCRB4S8DG30wgMFsCJW38Z2vkHA2B5QLWaLOCwGwJlLRGC+fi5q5wLs4vCOciARUnQKBzARv/zsk1SCCZwJEjIZxwdGM4tuFI2LgxWTMqJkCgIwFrdEfQmiHQkYBwriPoGWgmaTj3wov7w/Xb7w6vvnZ4GVWXl7WmHCfhXEpddRMgkELAxj+FqjoJdC/w+T88IfzZX5yw2PBPvvdYuPJDx7rviBYJEGhNwBrdGqWKCGQhIJzLYhimohPJwrnmgxamQmOCTgrnJkBThACBXgVs/Hvl1ziBVgRe2j8XfuczG5bV9cvXHA3nved4K22ohACB7gWs0d2ba5FASgHhXErd2ao7WThXPWThph33hp3brw0Xnr9lttSGjkY4N7ND68AIzKyAjf/MDq0DK0jg3335hPDFL/3grLn60D/wc8fCz/y0s+cKmgoOdcYErNEzNqAOp3gB4VzxU2BsgGThXH3m3LarLl98KurYvZqiFwrnpmiwdJUAgYGAjb+JQGD6Bfb91Vz4/c8tP3PuFz98NGz9cWfOTf8IO4JSBazRpY68455VAeHcrI5s+8eVLJyruvrI3ifDM889H3Yv3BY2nTSbdykWzrU/KdVIgEBaARv/tL5qJ9CFwOuvh3Dv/SeG737vB629/W0h3H7Lm+Hkk7vogTYIEEghYI1OoapOAv0JCOf6s5+2lpOGcx4IMW3TYeX+lvShMjuj5kgIjBaw8TczCMyGwKFvzYXn9s2F176zIZzyjqPh4q3Hw+bTnTU3G6PrKEoVsEaXOvKOe1YFSvo9ujpWX5MLJAvnPBBi8kHJsWRJHyo5+usTgTYFbPzb1FQXgf4FrNH9j4EeEGhLwBrdlqR6COQhUNIaLZyLm3PJwjk
PhIgbmNxKl/Shkpu9/hBoW8DGv21R9RHoV8Aa3a+/1gm0KWCNblNTXQT6FyhpjRbOxc23ZOGcB0LEDUxupUv6UMnNXn8ItC1g49+2qPoI9Ctgje7XX+sE2hSwRrepqS4C/QuUtEYL5+LmW7JwruqWB0LEDU5OpUv6UMnJXV8IpBCw8U+hqk4C/QlYo/uz1zKBtgWs0W2Lqo9AvwIlrdHCubi5liycqy5r3XbLQnj5wMGRPTz37DPDnvvnwxmbT4s7gp5Le1przwOgeQIE1i1g479uMgUIZC1Q0sY/64HQOQItCFijW0BUBYGMBEpao4VzcRMvWTgX163pKS2cm56x0lMCBN4SsPE3EwjMlkBJG//ZGjlHQ2C5gDXarCAwWwIlrdHCubi5K5yL8wvCuUhAxQkQ6FzAxr9zcg0SSCpQ0sY/KaTKCWQgYI3OYBB0gUCLAiWt0cK5uImTLJxzWWvcwORWuqQPldzs9YdA2wI2/m2Lqo9AvwLW6H79tU6gTQFrdJua6iLQv0BJa7RwLm6+JQvnVupW9RTXX134dLjxmg+HC8/fEtf7DEo7cy6DQdAFAgTWJWDjvy4uLyaQvUBJG//sB0MHCUQKWKMjARUnkJlASWu0cC5u8nUezlXdrZ7i+rX9r4SFHTfE9T6D0sK5DAZBFwgQWJeAjf+6uLyYQPYCJW38sx8MHSQQKWCNjgRUnEBmAiWt0cK5uMnXSzj3wov7w857Hg0P3Hm7p7XGjV9npUv6UOkMVUMEehKw8e8JXrMEEglYoxPBqpZADwLW6B7QNUkgoUBJa7RwLm4iCefi/DwQItJPcQIEuhew8e/eXIsEUgqUtPFP6ahuAjkIWKNzGAV9INCeQElrtHAubt70Es7N3/nQoNcua40bvC5Ll/Sh0qWrtgj0IWDj34e6NgmkE7BGp7NVM4GuBazRXYtrj0BagZLWaOFc3FxKFs6t9rTW9229IOxeuC1sOmljXO8zKO2ecxkMgi4QILAuARv/dXF5MYHsBUra+Gc/GDpIIFLAGh0JqDiBzARKWqOFc3GTL1k4F9et6SktnJuesdJTAgTeErDxNxMIzJZASRv/2Ro5R0NguYA12qwgMFsCJa3Rwrm4uSuci/Nzz7lIP8UJEOhewMa/e3MtEkgpUNLGP6WjugnkIGCNzmEU9IFAewIlrdHCubh5kySce2Tvk+Hhf/0H4eF7Ph4uPH/LoIdPP7sv3Dx/3+D/77jxo+G6q6+I63kmpZ05l8lA6AYBAmML2PiPTeWFBKZCoKSN/1QMiE4SiBCwRkfgKUogQ4GS1mjhXNwETBLONR/4UN1/7qYd94ad268N737nWYOQbttVl4fLLtka1/sMSgvnMhgEXSBAYGyBQ4fmwiuvbAhhbi6c8yNvhs2nHx+7rBcSIJCnQEkb/zxHQK8ItCcgnGvPUk0EchAoaY0WzsXNuNbDufpBEJ+4+erF8K06a27P408tPgSi+e+4Q+i3tHCuX3+tEyAwvsBf/V9z4bOf2xCOHXurzNxcCL/4kaPhx/+JgG58Ra8kkJ9ASRv//PT1iEC7AsK5dj3VRqBvgZLWaOFc3GxLEs7VZ8nVl7Q2z6R74cX9Yec9j4YH7rw9nLH5tLgj6Lm0cK7nAdA8AQJjC/zLT20I3zo0t+T1p28+Hm792NGx6/BCAgTyEyhp45+fvh4RaFdAONeup9oI9C1Q0hotnIubbZ2Fc+dtOWfxPnPCubhB66N0SR8qffhqk0AXAr/+z08MbzZyuBM3hPBr/+zNLprXBgECiQSs0YlgVUugBwHhXA/omiSQUKCkNVo4FzeRWg/nDr9xZMk95Zr/rrpbXdZ61+69Yc/9886cixu/zkqX9KHSGaqGCHQs4My5jsE1R6AjAWt0R9CaIdCBgHCuA2RNEOhQoKQ1WjgXN7FaD+eq7lRPa33muecH95j7yr6vLgvimpe5xh1Cv6Vd1tqvv9YJEBhfwD3nxrfySgLTJFDSxn+axkVfCUwiIJy
bRE0ZAvkKlLRGC+fi5mGScK7qUhXAPfGFZwa9q0K6+sms1Vlz1dNah78Xdwj9lhbO9euvdQIE1ifgaa3r8/JqAtMgUNLGfxrGQx8JxAgI52L0lCWQn0BJa7RwLm7+JQvn4ro1PaWFc9MzVnpKgMBbAjb+ZgKB2RIoaeM/WyPnaAgsF7BGmxUEZkugpDVaOBc3d4VzcX5BOBcJqDgBAp0L2Ph3Tq5BAkkFStr4J4VUOYEMBKzRGQyCLhBoUaCkNVo4FzdxhHNxfsK5SD/FCRDoXsDGv3tzLRJIKVDSxj+lo7oJ5CBgjc5hFPSBQHsCJa3Rwrm4eSOci/MTzkX6KU6AQPcCNv7dm2uRQEqBkjb+KR3VTSAHAWt0DqOgDwTaEyhpjRbOxc0b4Vycn3Au0k9xAgS6F7Dx795ciwRSCpS08U/pqG4COQhYo3MYBX0g0J5ASWu0cC5u3gjn4vyEc5F+ihMg0L2AjX/35lokkFKgpI1/Skd1E8hBwBqdwyjoA4H2BEpao4VzcfNGOBfnJ5yL9FOcAIHuBWz8uzfXIoGUAiVt/FM6qptADgLW6BxGQR8ItCdQ0hotnIubN8K5OD/hXKSf4gQIdC9g49+9uRYJpBQoaeOf0lHdBHIQsEbnMAr6QKA9gZLWaOFc3LwRzsX5Ceci/RQnQKB7ARv/7s21SCClQEkb/5SO6iaQg4A1OodR0AcC7QmUtEYL5+LmjXAuzk84F+mnOAEC3QvY+HdvrkUCKQVK2vindFQ3gRwErNE5jII+EGhPoKQ1WjgXN2+Ec3F+wrlIP8UJEOhewMa/e3MtEkgpUNLGP6WjugnkIGCNzmEU9IFAewIlrdHCubh5I5yL8xPORfopToBA9wI2/t2ba5FASoGSNv4pHdVNIAcBa3QOo6APBNoTKGmNFs7FzRvhXJyfcC7ST3ECBLoXsPHv3lyLBFIKlLTxT+mobgI5CFijcxgFfSDQnkBJa7RwLm7eCOfi/IRzkX6KEyDQvYCNf/fmWiSQUqCkjX9KR3UTyEHAGp3DKOgDgfYESlqjhXNx80Y4F+c38+HcoW/Nhef2zYXXvrMhnPKOo+HircfD5tOPR6opToBAnwI2/n3qa5tA+wIlbfzb11MjgbwErNF5jYfeEIgVKGmNFs7FzRbhXJzfTIdzr78ewr33nxi++70fIL39bSHcfsub4eSTI+EUJ0CgNwEb/97oNUwgiUBJG/8kgColkJGANTqjwdAVAi0IlLRGC+fiJoxwLs5vpsO5fX81F37/cxuWCf3ih4+GrT/u7LnIqaM4gd4EbPx7o9cwgSQCJW38kwCqlEBGAtbojAZDVwi0IFDSGi2ci5swwrk4v5kO55764gnhy8+esEzopy85Fi7/wLFIOcUJEOhLwMa/L3ntEkgjUNLGP42gWgnkI2CNzmcs9IRAGwIlrdHCubgZI5yL85vpcO6l/XPhdz6z/My5X77maDjvPc6ci5w6ihPoTcDGvzd6DRNIIlDSxj8JoEoJZCRgjc5oMHSFQAsCJa3Rwrm4CSOci/Ob6XCuovn8H54Q/uwvfnD23E++91i48kPOmoucNooT6FXAxr9Xfo0TaF2gpI1/63gqJJCZgDU6swHRHQKRAiWt0cK5uMkinIvzm/lwruI5ciSEE45uDMc2HAkbN0aCKU6AQO8CNv69D4EOEGhVoKSNf6twKiOQoYA1OsNB0SUCEQIlrdHCuYiJEkIQzsX5FRHOVUQlfahETgnFCWQvYOOf/RDpIIF1CVij18XlxQSyFrBGZz08Okdg3QIlrdHCuXVPjyUFhHNxfsK5SD/FCRDoXsDGv3tzLRJIKVDSxj+lo7oJ5CBgjc5hFPSBQHsCJa3Rwrm4eSOci/MTzkX6KU6AQPcCNv7dm2uRQEqBkjb+KR3VTSAHAWt0DqOgDwTaEyhpjRbOxc0b4Vycn3Au0k9xAgS6F7Dx795ciwRSCpS08U/pqG4COQhYo3MYBX0g0J5ASWu0cC5u3gjn4vy
Ec5F+ihMg0L2AjX/35lokkFKgpI1/Skd1E8hBwBqdwyjoA4H2BEpao4VzcfNGOBfnJ5yL9FOcAIHuBWz8uzfXIoGUAiVt/FM6qptADgLW6BxGQR8ItCdQ0hotnIubN8K5OD/hXKSf4gQIdC9g49+9uRYJpBQoaeOf0lHdBHIQsEbnMAr6QKA9gZLWaOFc3LwRzsX5Ceci/RQnQKB7ARv/7s21SCClQEkb/5SO6iaQg4A1OodR0AcC7QmUtEYL5+LmjXAuzk84F+mnOAEC3QvY+HdvrkUCKQVK2vindFQ3gRwErNE5jII+EGhPoKQ1WjgXN2+Ec3F+ShMgQIAAAQIECBAgQIAAAQIECBCYWEA4NzGdggQIECBAgAABAgQIECBAgAABAgTiBIRzcX5KEyBAgAABAgQIECBAgAABAgQIEJhYQDg3MZ2CBAgQIECAAAECBAgQIECAAAECBOIEhHNxfjNVev7Oh8JfPv83Yc/98+GMzacNjq363hNfeGbJcd5x40fDdVdfEWigaasAABB3SURBVF54cX+4fvvd4dXXDi/+/Nyzz1xSfqaAHAyBzAW+eejbYdstC+HlAwcXe3rqKZvCw/d8PFx4/pbB9w6/cSTcPH9f+Mq+rw7+Xb+fR73fdy/cFi67ZGvmR617BGZbYHitbb6fm2t08z37yN4nw64HHxsAvW/rBaH6+aaTNs42mKMjkLHA8Huy+X6uu12953fe82h44M7bF/fja+3JMz5kXSMwswLNffdHPnhpWNhxw+B4R/2e3FyLh9dwa/TMTpN1HZhwbl1cs/vi+sOhGa5V36++6g+aYYHqQ+eOnZ8Ku3Z+bPEX/9kVcmQE8heoNwmfuPnqkaFaHcxdevFFg4B9+Ksq+y8+/Xj45O3bBr+8e3/nP956OPsCq70P13rPPv3svnDX7r2LfzBbbT2ffUlHSKB/geZ7svnv4V/0R/2x23u4/zHUAwK1QHNPvdoeuy5TvYfP23LOYA9eBfXPPPf84I9m1Vf1h/Ozzvzhkb9zUy9HQDhXzliveKTVh8PX9r8Sfv5nf2LJRr4qIJwzQQhMj8Ba4Vz9Xh8VtjePcq26pkdFTwlMp0C90d921eVjncHafM8O/xJQCTSDgOlU0WsC0ysw/Mv4an8EW+3Muerox1nDp1dJzwlMh8CoffJavzfXZ8RWR1hd6TL8x3Rr9HSMe+peCudSC2de//BGobrMbfiv7HU4N3xZ6/AlcM3TdV3Smvlg697MCzRPr1/rEriVLqmpf5GfX3hoySWxMw/oAAlkJDDqMvXVLnupNvb1e/bd7zxr8Ff44bNknQ2b0eDqSpEC9Xv67H+4eXC2zN7Pfmnwx/Fm2DbuZa3De/IiQR00gZ4F6svUq/fzPz7/3eGmHfeGnduvHXlF2fAfzEYFe9bongczk+aFc5kMRB/dqDbyex5/avEeNGsl9nUYtzB/w8i/4lcfOt84+HfuadPHYGqTwAiBatPw+Of/ZHBZ28mbNg5+WR8+C2f45/V9JodDd/ecM60I9CfQ/AW9PpOuednLqPfsqLPubPz7G0stE6gFqr3y//Mf/1P46t98Paz3nnPDimvtyYkTIJBeoF5Xq5b+03/+Zhi+51zz/dq8j2TzLDtrdPrxmoYWhHPTMEqJ+jh8U9rhJlY7A655mcxaHzyJuq5aAgTGEKj+Mlf/Fa8+k2Y4nFvt0lWXtY4B7CUEEgqMOntmtT+iDb9nqzPsnDmXcHBUTWACgeatJYbPdq0f2lRVu9KZc80mV9uTT9A9RQgQWIfA8B67ev+u9Ae0le5FN+rseFehrWMAZvSlwrkZHdhJDmutM+eqOoVzk8gqQ6AfgebGofn+bf7cxr+fcdIqgVECo96fzTPeV3vPuueceUUgL4FRa3DzvlPCubzGTG8IrCQwKkRv3leyKjvO79f16/7oT//cPSULn3LCucInwPDhj3pq1N7P/nG49bpfGrysebrt7z3xpfCP/5v3LF5X7ylSJhOBfgW
q93D1ddklWwf/bW4Smn+lH/75S3/7jfDEF74c5m+9ZvH9fv32u8NKl7H3e6RaJ1CGwPDtIqojHj4brlqTV3vPelprGXPEUU6PQPNWEus5c64K61fbk0+Pgp4SmA2B+sy3q658/+Dpq6POnBvnCa6jfseeDSFHMYmAcG4StRkt09zI1x8o1YMi6q/he1BVr69+Uai/VrtR9YySOSwCWQmM85CW4cvZh0+fX+v9ntWB6gyBQgSa78vh+9mM854dfr9bowuZNA4za4EqcK8ftNa859yoy9zq9/w47/esD1znCMygQHPf3bzn3Kgz6WqG4bIuZ53ByTHhIQnnJoRTjAABAgQIECBAgAABAgQIECBAgECsgHAuVlB5AgQIECBAgAABAgQIECBAgAABAhMKCOcmhFOMAAECBAgQIECAAAECBAgQIECAQKyAcC5WUHkCBAgQIECAAAECBAgQIECAAAECEwoI5yaEU4wAAQIECBAgQIAAAQIECBAgQIBArIBwLlZQeQIECBAgQIAAAQIECBAgQIAAAQITCgjnJoRTjAABAgQIECBAgAABAgQIECBAgECsgHAuVlB5AgQIECBAgAABAgQIECBAgAABAhMKCOcmhFOMAAECBAgQIECAAAECBAgQIECAQKyAcC5WUHkCBAgQIECAAAECBAgQIECAAAECEwoI5yaEU4wAAQIECBAgQIAAAQIECBAgQIBArIBwLlZQeQIECBAgQIAAAQIECBAgQIAAAQITCgjnJoRTjAABAgQIECBAgAABAgQIECBAgECsgHAuVlB5AgQIECBAgAABAgQIECBAgAABAhMKCOcmhFOMAAECBAgQIECAAAECBAgQIECAQKyAcC5WUHkCBAgQIECAAAECBAgQIECAAAECEwoI5yaEU4wAAQIECBAgQIAAAQIECBAgQIBArIBwLlZQeQIECBAgQIAAgVYEnn52X7hr996w5/75cMbm01qpUyUECBAgQIAAgdwFhHO5j5D+ESBAgAABAhMJvPDi/nD99rsHZR++5+PhwvO3LNYzf+dDg/9f2HHDRHXHFhoVQh1+40i4ef6+cNaZP5y0X1Xb8wsPLTOJPab1lK+P9dKLLwrXXX3FYlHh3HoUvZYAAQIECBCYFQHh3KyMpOMgQIAAAQIElghU4dwdOz81+N45Z50edi/cFjadtHHw7xzDua6GTzjXlbR2CBAgQIAAAQLjCQjnxnPyKgIECBAgQGDKBOpw7n/5H/67cO9D/yYszN8QLrtk68hw7pG9T4ZdDz62eISnnrJpyZll9Rldn7j56sFZZ6++djjcceNHw3ve9SODyzCb3/+p9/6jwVl71evqr+r11Vli9Rl9wz/7yAcvDZ+8fdvgzLn6bLKVziJrBovNvlchZH2co4ZsnHCu2ceqf/VZhqMsqnaa7Y46zvdtvWDwut+8d0944gvPLPM+eOi/LPMcVfeUTUXdJUCAAAECBAisKiCcM0EIECBAgACBmRSow7ldOz8W/s+//L/DM889v3j23KiAqwra6lCrCrwe//yfLN77rAqkquCsDpfqM/BW+n7V9hNf+HKYv/WagW0dVNUB4WqXtdbh3DcPfTtsu2VhEPzV/Wp+r+rn8HE125kknBt2qy4Fbl5uO+qYm16j+jHc16pfw0Fk3c9x6p7JyeqgCBAgQIAAgaIFhHNFD7+DJ0CAAAECsyswHDKdefppS4KutS5rrUKwm3bcG3Zuv3Zwr7qVzmJbzz3SqjbP23LO4Oy5ccK5amSqMt84+HeLoWJVbs/jTw3+/frhI0v6WI/kWse21plzw/0cDs3qBzX8+xdfWvbQhmagN6oP44ZzzQdCNOue3RnryAgQIECAAIFSBYRzpY684yZAgAABAjMu0Ax1hs/u+heffnxw9MMPhKgCpeFLLauf15dqThLO1WeBDTPXl4eOG84NH8O733nWkrPNRl02Wrc1fBlqc5hXC+fqs+S+su+ry2bHuWefOTiTcK1wrtnPuiLh3Iy
/4RweAQIECBAgMLGAcG5iOgUJECBAgACBnAWa4Vx9SehVV74/fG3/K4vhXP39s//h5sUz1JqXj643nKuCvqef2bfkvnXDZ5ONG84NP9W0uo9d9YCL6jLd6my+Sc8oGyecaz5FdXicR/V9VIi47arLl9z7TjiX87tF3wgQIECAAIE+BYRzfeprmwABAgQIEEgmMCq8qoOlKog768wfHpw5V71u5z2PhgfuvD2csfm0QX9iwrk6UGuGU8Ph3Kg2h4O46tLX+qsOtX7swv86/PUL/2HFAHFcyHEua63qGj6rcJJwrhnwNe+Pt9bls/VYTBpCjuvhdQQIECBAgACBvgWEc32PgPYJECBAgACBJAKjQp3hyzbrSz9HPbygvsR1kstamw9QqA6uvsR1tTZXCufqoPDlAweXPRF11Bl6VVv/8ev/eXBvu1Ffa4VzdV/rp8vWYWV1KXD1RNnqkte17gvXfEDEqLMTm/fTq53WqjvJZFEpAQIECBAgQKBHAeFcj/iaJkCAAAECBNIJrHTGVTMoGw7P6t782v/6P4Xf+b3/ffFJqeu9rHU4UKvqrEK5+qs+I60KsHY9+Njg29XPq+Br1BNMq59XQdZfPv83i0+PHVYbrqf6/qmnbFpyOW1TeNS98JrlRt3Prg7r1rqstbrktvoa7ld1v7qf+al/Ev7DS68sO/OvCh3rPh889F/WDP7SzRg1EyBAgAABAgT6ERDO9eOuVQIECBAgQIBAUQJVWFfd62+ly2WLwnCwBAgQIECAAIEhAeGc6UCAAAECBAgQINCqQHXm3RNf+HKYv/WaQb3Ne/i12pjKCBAgQIAAAQJTLiCcm/IB1H0CBAgQIECAQG4Coy6Lre/fl1tf9YcAAQIECBAg0LeAcK7vEdA+AQIECBAgQIAAAQIECBAgQIBAsQLCuWKH3oETIECAAAECBAgQIECAAAECBAj0LSCc63sEtE+AAAECBAgQIECAAAECBAgQIFCsgHCu2KF34AQIECBAgAABAgQIECBAgAABAn0LCOf6HgHtEyBAgAABAgQIECBAgAABAgQIFCsgnCt26B04AQIECBAgQIAAAQIECBAgQIBA3wLCub5HQPsECBAgQIAAAQIECBAgQIAAAQLFCgjnih16B06AAAECBAgQIECAAAECBAgQINC3gHCu7xHQPgECBAgQIECAAAECBAgQIECAQLECwrlih96BEyBAgAABAgQIECBAgAABAgQI9C0gnOt7BLRPgAABAgQIECBAgAABAgQIECBQrIBwrtihd+AECBAgQIAAAQIECBAgQIAAAQJ9Cwjn+h4B7RMgQIAAAQIECBAgQIAAAQIECBQrIJwrdugdOAECBAgQIECAAAECBAgQIECAQN8Cwrm+R0D7BAgQIECAAAECBAgQIECAAAECxQoI54odegdOgAABAgQIECBAgAABAgQIECDQt4Bwru8R0D4BAgQIECBAgAABAgQIECBAgECxAsK5YofegRMgQIAAAQIECBAgQIAAAQIECPQtIJzrewS0T4AAAQIECBAgQIAAAQIECBAgUKyAcK7YoXfgBAgQIECAAAECBAgQIECAAAECfQsI5/oeAe0TIECAAAECBAgQIECAAAECBAgUKyCcK3boHTgBAgQIECBAgAABAgQIECBAgEDfAsK5vkdA+wQIECBAgAABAgQIECBAgAABAsUKCOeKHXoHToAAAQIECBAgQIAAAQIECBAg0LeAcK7vEdA+AQIECBAgQIAAAQIECBAgQIBAsQLCuWKH3oETIECAAAECBAgQIECAAAECBAj0LSCc63sEtE+AAAECBAgQIECAAAECBAgQIFCsgHCu2KF34AQIECBAgAABAgQIECBAgAABAn0LCOf6HgHtEyBAgAABAgQIECBAgAABAgQIFCsgnCt26B04AQIECBAgQIAAAQIECBAgQIBA3wLCub5HQPsECBAgQIAAAQIECBAgQIAAAQLFCgjnih16B06AAAECBAgQIECAAAE
CBAgQINC3gHCu7xHQPgECBAgQIECAAAECBAgQIECAQLECwrlih96BEyBAgAABAgQIECBAgAABAgQI9C0gnOt7BLRPgAABAgQIECBAgAABAgQIECBQrIBwrtihd+AECBAgQIAAAQIECBAgQIAAAQJ9C/z/M3OuPzYPLyYAAAAASUVORK5CYII=", + "text/html": [ + "
          " + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Truncate text for hover data\n", + "df = tdf_summary.to_pandas()\n", + "max_chars = 50 # Maximum characters to display\n", + "df['truncated_narrative'] = df['consumer_complaint_narrative'].apply(lambda x: x[:max_chars] + '...' if len(x) > max_chars else x)\n", + "df['truncated_summary'] = df['Summary'].apply(lambda x: x[:max_chars] + '...' if len(x) > max_chars else x)\n", + "\n", + "# Calculate the length of consumer_complaint_narrative and Summary\n", + "df['narrative_length'] = df['consumer_complaint_narrative'].apply(len)\n", + "df['summary_length'] = df['Summary'].apply(len)\n", + "\n", + "# Create a scatter plot\n", + "fig = px.scatter(df.sort_values(['narrative_length']), x='narrative_length', y='summary_length',\n", + " hover_data=['complaint_id', 'truncated_narrative', 'truncated_summary'],\n", + " labels={'narrative_length': 'Narrative Length', 'summary_length': 'Summary Length'},\n", + " title='Complaint and Summary Lengths')\n", + "\n", + "# Update the x-axis to show values as they are (not in scientific notation)\n", + "fig.update_xaxes(type='category')\n", + "\n", + "# Show the plot\n", + "fig.show()" + ] + }, + { + "cell_type": "markdown", + "id": "06cbb0f4-d026-4a0b-8ab8-6982b7f7777a", + "metadata": {}, + "source": [ + "

          Now the results can be saved back to Vantage.

          " + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "8b9a58e0-ec2e-4e76-8f51-f1757ca5a8b5", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "copy_to_sql(df = df, table_name = 'Complaints_Summaries', if_exists = 'replace')" + ] + }, + { + "cell_type": "markdown", + "id": "561ff317-6468-4941-bf9b-840849bfb09d", + "metadata": {}, + "source": [ + "
          \n", + "5. Cleanup" + ] + }, + { + "cell_type": "markdown", + "id": "85f388c4-82cd-45dd-89c0-45854a118f24", + "metadata": {}, + "source": [ + "

          Work Tables

          \n", + "

          Clean up work tables to prevent errors next time.

          " + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "56ae9d4d-5a68-496c-884c-911de451314e", + "metadata": {}, + "outputs": [], + "source": [ + "tables = ['Complaints_Summaries']\n", + "\n", + "# Loop through the list of tables and execute the drop table command for each table\n", + "for table in tables:\n", + " try:\n", + " db_drop_table(table_name=table)\n", + " except:\n", + " pass" + ] + }, + { + "cell_type": "markdown", + "id": "476fb82b-d7a0-4e29-a6d4-8372a247cea8", + "metadata": {}, + "source": [ + "

          Databases and Tables

          \n", + "

          The following code will clean up tables and databases created above.

          " + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "358f56a0-ad97-4317-ab4c-88be00b8d179", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "remove_context()" + ] + }, + { + "cell_type": "markdown", + "id": "6cf670cd-4594-458d-af98-8efff5a72f73", + "metadata": {}, + "source": [ + "
          \n", + "Dataset:\n", + "
          \n", + "
          \n", + "

          The dataset is sourced from the Consumer Financial Protection Bureau.

          " + ] + }, + { + "cell_type": "markdown", + "id": "eeebf3ab-357c-488e-ba9d-78bf82f4d0dd", + "metadata": {}, + "source": [ + "
          \n", + "
          ClearScape Analytics™
          \n", + "
          \n", + "
          \n", + " Copyright © Teradata Corporation - 2024. All Rights Reserved\n", + "
          \n", + "
          \n", + "
          " + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.10" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/VantageCloud_Lake/UseCases/Customer_Complaints_Analyzer/Complaints_Classification.ipynb b/VantageCloud_Lake/UseCases/Customer_Complaints_Analyzer/Complaints_Classification.ipynb new file mode 100644 index 00000000..46560a95 --- /dev/null +++ b/VantageCloud_Lake/UseCases/Customer_Complaints_Analyzer/Complaints_Classification.ipynb @@ -0,0 +1,555 @@ +{ + "cells": [ + { + "cell_type": "raw", + "id": "9fa8099b-5aa9-4a85-ab26-21e7df9e5a9a", + "metadata": {}, + "source": [ + "
          \n", + "

          \n", + " Complaints Classification using Vantage and LLM\n", + "
          \n", + " \"Teradata\"\n", + "

          \n", + "
          " + ] + }, + { + "cell_type": "markdown", + "id": "bbae5cd8-f5bd-4b0c-9ae6-ff8ef4bdd737", + "metadata": {}, + "source": [ + "

          Introduction:

          \n", + "\n", + "

          Revolutionize customer complaint resolution with our pioneering solution, which seamlessly integrates the capabilities of Teradata Vantage and Anthropic's Claude LLM on AWS Bedrock. This powerful synergy enables businesses to classify customer complaints with unmatched precision and speed, allowing them to swiftly identify and address concerns, thereby elevating overall customer satisfaction and loyalty.

          \n", + "\n", + "

          Key Features:

          \n", + "
            \n", + "
          • Automated Classification: Our AI-driven model categorizes complaints into predefined categories, ensuring consistency and reducing manual effort.
          \n", + "
          • Contextual Understanding: The system comprehends the nuances of customer feedback, capturing subtle differences in tone and language.
          \n", + "
          • Real-time Insights: Generate instant reports and analytics to identify trends, patterns, and areas for improvement.
          \n", + "
          \n", + "\n", + "\n", + "

          Benefits:

          \n", + "
            \n", + "
          • Enhanced Customer Experience: Swiftly address customer concerns, fostering trust and loyalty.
          \n", + "
          • Improved Operational Efficiency: Reduce manual processing time, allowing teams to focus on high-value tasks.
          \n", + "
          • Data-Driven Decision Making: Make informed decisions with actionable insights from complaint data.
          \n", + "
          \n", + "\n", + "

          Experience the transformative power of Generative AI in complaints classification.

          \n", + "\n", + "

          Steps in the analysis:

          \n", + "
            \n", + "
          1. Configuring the environment
          \n", + "
          2. Connect to Vantage
          \n", + "
          3. Configuring AWS Bedrock - Anthropic's Claude LLM model
          \n", + "
          4. Classify Complaints
          \n", + "
          5. Cleanup
          \n", + "
          " + ] + }, + { + "cell_type": "markdown", + "id": "6eb81395-b727-4eaa-a514-09f67c11b888", + "metadata": {}, + "source": [ + "
          \n", + "1. Configuring the environment\n", + "
          \n", + "

          1.1 Downloading and installing additional software needed" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "74720dfc-1195-403e-a596-c9a55773fedc", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "%%capture\n", + "!pip install -r requirements.txt --upgrade --quiet" + ] + }, + { + "cell_type": "markdown", + "id": "3e8d47d7-1a44-4da1-afa5-46ce90312021", + "metadata": {}, + "source": [ + "

          \n", + "

          Note: Please restart the kernel after executing these two lines. The simplest way to restart the kernel is to press the zero key twice: 0 0

          \n", + "
          " + ] + }, + { + "cell_type": "markdown", + "id": "7bfc2916-153f-4a56-95ff-ebeb08c0112c", + "metadata": {}, + "source": [ + "
          \n", + "

          1.2 Import the required libraries

          \n", + "

          Here, we import the required libraries, set environment variables and environment paths (if required).

          " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "83dc0922-f932-4378-a505-3bb2d1f1243b", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Data manipulation and analysis\n", + "import numpy as np\n", + "import pandas as pd\n", + "import json, warnings\n", + "import getpass\n", + "\n", + "# Visualization\n", + "import plotly.express as px\n", + "import matplotlib.pyplot as plt\n", + "from wordcloud import WordCloud\n", + "\n", + "# Progress bar\n", + "from tqdm import tqdm\n", + "\n", + "# Machine learning and other utilities from Teradata\n", + "from teradataml import *\n", + "from teradatagenai import TeradataAI, TextAnalyticsAI, VSManager, VectorStore, VSApi\n", + "\n", + "# Requests\n", + "import requests\n", + "\n", + "# Display settings\n", + "display.max_rows = 5\n", + "pd.set_option('display.max_colwidth', None)\n", + "# Set display options for dataframes, plots, and warnings\n", + "%matplotlib inline\n", + "warnings.filterwarnings('ignore')\n", + "display.suppress_vantage_runtime_warnings = True" + ] + }, + { + "cell_type": "markdown", + "id": "e365debb-d67d-45b9-aaad-e54bcc474818", + "metadata": {}, + "source": [ + "
          \n", + "2. Connect to Vantage\n", + "

          We will be prompted to provide the password. We will enter the password, press the Enter key, and then use the down arrow to go to the next cell.

          " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "454a2e81-c377-4058-9e68-78abd801ad9c", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "print(\"Checking if this environment is ready to connect to VantageCloud Lake...\")\n", + "\n", + "if os.path.exists(\"/home/jovyan/JupyterLabRoot/VantageCloud_Lake/.config/.env\"):\n", + " print(\"Your environment parameter file exist. Please proceed with this use case.\")\n", + " # Load all the variables from the .env file into a dictionary\n", + " env_vars = dotenv_values(\"/home/jovyan/JupyterLabRoot/VantageCloud_Lake/.config/.env\")\n", + " # Create the Context\n", + " eng = create_context(host=env_vars.get(\"host\"), username=env_vars.get(\"username\"), password=env_vars.get(\"my_variable\"))\n", + " execute_sql('''SET query_band='DEMO=text_analytics_teradatagenai_aws_huggingface.ipynb;' UPDATE FOR SESSION;''')\n", + " print(\"Connected to VantageCloud Lake with:\", eng)\n", + "else:\n", + " print(\"Your environment has not been prepared for connecting to VantageCloud Lake.\")\n", + " print(\"Please contact the support team.\")" + ] + }, + { + "cell_type": "markdown", + "id": "dee44870-6034-4353-96df-f5bd00970fa9", + "metadata": {}, + "source": [ + "

          Begin running steps with Shift + Enter keys.

          " + ] + }, + { + "cell_type": "markdown", + "id": "6cbcc301-e792-46b4-84de-e775832be566", + "metadata": {}, + "source": [ + "
          \n", + "

          2. Set up the LLM connection

          \n", + "\n", + "

          The teradatagenai Python library can both connect to cloud-based LLM services and instantiate private models running at scale on local GPU compute. In this case we will use Anthropic's claude-instant-v1 for low-cost, high-throughput tasks.

          \n", + "\n", + "
            \n", + "
          1. aws_access_key_id: Enter your AWS access key ID
          \n", + "
          2. aws_secret_access_key: Enter your AWS secret access key
          \n", + "
          3. region name: Enter the AWS region you want to configure (e.g., us-east-1)
          \n", + "
              " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c20b89e0-4e61-4603-be2a-033437b2c568", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "access_key = getpass.getpass('aws_access_key_id: ')\n", + "secret_key = getpass.getpass('aws_secret_access_key: ')\n", + "region_name = getpass.getpass('region name: ')" + ] + }, + { + "cell_type": "markdown", + "id": "5385eddb-7912-41dd-bf5a-d44d9bc53f9c", + "metadata": {}, + "source": [ + "
              \n", + "

              3. Use the TextAnalyticsAI API to Perform Various Text Analytics Tasks

              \n", + "

              You can execute the help function at the bottom of this notebook to read more about this API.

              " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "bbb331fe-a3ff-42cf-a491-8347d53818e8", + "metadata": {}, + "outputs": [], + "source": [ + "# Provide model details\n", + "model_name=\"anthropic.claude-v2\"\n", + "\n", + "# Select in-database or external model\n", + "llm = TeradataAI(api_type = 'AWS',\n", + " model_name = model_name,\n", + " region = region_name,\n", + " # authorization = 'Repositories.BedrockAuth'\n", + " access_key = access_key,\n", + " secret_key = secret_key)\n", + "\n", + "obj = TextAnalyticsAI(llm=llm)" + ] + }, + { + "cell_type": "markdown", + "id": "3d1a07c7-a9d7-49a7-a2ac-1e68c0a1282c", + "metadata": {}, + "source": [ + "
              \n", + "4. Classify Complaints\n", + "

              We'll use a sample of the data to classify complaints

              " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cac31f37-2eab-410a-b1ec-9fda12f141c5", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "tdf = DataFrame(in_schema('DEMO_ComplaintAnalysis', 'Consumer_Complaints'))\n", + "tdf = tdf.assign(id = tdf.complaint_id).drop('complaint_id', axis = 1)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b3f38a59-8ffb-4010-897d-5d571692365a", + "metadata": {}, + "outputs": [], + "source": [ + "tdf_classified = obj.classify(\n", + " column=\"consumer_complaint_narrative\",\n", + " data=tdf,\n", + " accumulate=\"0:5\",\n", + " labels=[\"Complaint\", \"Non-Complaint\"],\n", + " multi_label=True\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7734659a-5b9e-4989-b334-1e17b77754d4", + "metadata": {}, + "outputs": [], + "source": [ + "tdf_classified = tdf_classified.assign(Prediction = tdf_classified.Labels.oreplace(\"[\").oreplace(\"]\")).drop(columns=['Labels', 'Message'])\n", + "tdf_classified = tdf_classified.assign(Prediction = tdf_classified.Prediction.cast(type_=VARCHAR(15)))\n", + "tdf_classified = tdf_classified.assign(Prediction = tdf_classified.Prediction.str.strip())" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "60feb089-0011-44db-83cd-f4f70f5ff79b", + "metadata": {}, + "outputs": [], + "source": [ + "tdf_classified" + ] + }, + { + "cell_type": "markdown", + "id": "e8daa142-e48f-4caf-bd3a-4e85e93d2e6e", + "metadata": {}, + "source": [ + "
              \n", + "

              4.1 Consumer Complaints Prediction vs Occurrences

              \n", + "\n", + "

              A bar chart of the number of narratives assigned to each predicted label (Complaint or Non-Complaint). Comparing the counts shows at a glance how the classifier split the dataset and whether that split matches expectations for a complaints dataset.

              " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "32919a9a-7a29-410a-9207-341b0f857652", + "metadata": {}, + "outputs": [], + "source": [ + "from IPython.display import display, Markdown\n", + "def display_helper(msg):\n", + " return display(Markdown(\n", + " f\"\"\"
              \n", + "

              Note: \n", + "{msg}

              \"\"\"))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "141400a2-a4a6-42ea-9f44-651534e474d6", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from collections import Counter\n", + "df = tdf_classified.to_pandas()\n", + "data = Counter(df['Prediction'])\n", + "\n", + "# Convert Counter data to DataFrame\n", + "viz_df = pd.DataFrame.from_dict(data, orient='index', columns=['Count']).reset_index()\n", + "\n", + "# Rename columns\n", + "viz_df.columns = ['Prediction', 'Count']\n", + "\n", + "# Create bar graph using Plotly Express\n", + "fig = px.bar(viz_df, x='Prediction', y='Count', color='Prediction',\n", + " labels={'Count': 'Number of Occurrences', 'Prediction': 'Prediction'})\n", + "\n", + "# Show the plot\n", + "fig.show()" + ] + }, + { + "cell_type": "markdown", + "id": "5cc01cc7-269f-4136-a945-fcc149b04e37", + "metadata": {}, + "source": [ + "
              \n", + "

              4.2 Word Cloud for Consumer Complaints Prediction

              \n", + "\n", + "

              A visual representation of consumer complaints prediction, highlighting the most frequent words and pain points in customer feedback. This word cloud helps identify trends, sentiment, and areas for improvement, enabling data-driven decision making.

              " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "df58cb69-27f5-4878-a0d3-132c501a95e8", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "complaint = df[df['Prediction'] == 'Complaint']\n", + "complaint_text = ' '.join(complaint['consumer_complaint_narrative'])\n", + "\n", + "# Replace 'X' with blank space\n", + "modified_string = complaint_text.replace('X', '')\n", + "\n", + "if len(modified_string) > 0:\n", + " wordcloud = WordCloud(width=800, height=400, background_color='white').generate(modified_string)\n", + "\n", + " # Display the word cloud\n", + " plt.imshow(wordcloud, interpolation='bilinear')\n", + " plt.title(\"Complaints\")\n", + " plt.tight_layout()\n", + " plt.axis(\"off\")\n", + " plt.show()\n", + "else:\n", + " display_helper(\"We included both complaint and non-complaint options for completeness. But since this is a complaints dataset, we don't expect to see any complaints.\")" + ] + }, + { + "cell_type": "markdown", + "id": "afe85dc8-e179-4f31-93db-1ba497724aa4", + "metadata": {}, + "source": [ + "
              \n", + "

              4.3 Word Cloud for Non-Complaints Prediction

              \n", + "\n", + "

              A word cloud of the narratives predicted as non-complaints, highlighting the most frequent words and positive sentiments in customer feedback. It helps identify recurring themes and areas of satisfaction, enabling data-driven decision making.

              " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fef35daa-b6a9-44a8-8562-a464d49d92f5", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "non_complaint = df[df['Prediction'] == 'Non-Complaint']\n", + "non_complaint_text = ' '.join(non_complaint['consumer_complaint_narrative'])\n", + "\n", + "# Replace 'X' with blank space\n", + "modified_string = non_complaint_text.replace('X', '')\n", + "\n", + "if len(modified_string) > 0:\n", + " wordcloud = WordCloud(width=800, height=400, background_color='white').generate(modified_string)\n", + "\n", + " # Display the word cloud\n", + " plt.imshow(wordcloud, interpolation='bilinear')\n", + " plt.title(\"Non-Complaints\")\n", + " plt.tight_layout()\n", + " plt.axis(\"off\")\n", + " plt.show()\n", + "else:\n", + " display_helper(\"We included both complaint and non-complaint options for completeness. But since this is a complaints dataset, we don't expect to see any non-complaints.\")" + ] + }, + { + "cell_type": "markdown", + "id": "8522801e-48fb-4c6d-af93-3538e13ea294", + "metadata": {}, + "source": [ + "

              Now the results can be saved back to Vantage.

              " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2b2d6203-9526-43ed-8b25-f5548376a10c", + "metadata": {}, + "outputs": [], + "source": [ + "copy_to_sql(df = df, table_name = 'complaints_classified', if_exists = 'replace')" + ] + }, + { + "cell_type": "markdown", + "id": "d37c5d71-e309-496a-a0d9-5ac9b3567fe0", + "metadata": {}, + "source": [ + "
              \n", + "5. Cleanup" + ] + }, + { + "cell_type": "markdown", + "id": "04a9cf3a-a133-4695-b683-72c4a448bbaa", + "metadata": {}, + "source": [ + "

              Work Tables

              \n", + "

              Clean up work tables to prevent errors next time.
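A cleanup loop that silently ignores every error can hide real problems, such as a lost connection. The sketch below shows one possible variant that records which tables could not be dropped; `drop_tables` and `fake_drop` are hypothetical names introduced for illustration, and the real drop call is passed in as a function.

```python
from typing import Callable, Dict, Iterable

def drop_tables(names: Iterable[str], drop_fn: Callable[[str], object]) -> Dict[str, str]:
    """Try to drop each table; return {table_name: error message} for failures."""
    failures = {}
    for name in names:
        try:
            drop_fn(name)
        except Exception as exc:  # narrower than a bare except: KeyboardInterrupt still propagates
            failures[name] = str(exc)
    return failures

# Hypothetical stand-in for the real drop call, for demonstration only.
def fake_drop(name: str) -> None:
    if name == "missing_table":
        raise RuntimeError("table does not exist")

print(drop_tables(["complaints_classified", "missing_table"], fake_drop))
```

In the notebook, the function passed in could be `lambda t: db_drop_table(table_name=t)`, so a table that does not exist is reported rather than silently skipped.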

              " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fe618c2d-b637-4577-a6c7-01a417c2e27d", + "metadata": {}, + "outputs": [], + "source": [ + "tables = ['complaints_classified']\n", + "\n", + "# Loop through the list of tables and execute the drop table command for each table\n", + "for table in tables:\n", + " try:\n", + " db_drop_table(table_name=table)\n", + " except:\n", + " pass" + ] + }, + { + "cell_type": "markdown", + "id": "3265d0a0-2b15-4383-b657-c6329d547ece", + "metadata": {}, + "source": [ + "

              Databases and Tables

              \n", + "

              The following code will clean up tables and databases created above.

              " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "358f56a0-ad97-4317-ab4c-88be00b8d179", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "remove_context()" + ] + }, + { + "cell_type": "markdown", + "id": "b82f52a3-2aad-44d8-b4dd-606249c85962", + "metadata": {}, + "source": [ + "
              \n", + "Dataset:\n", + "
              \n", + "
              \n", + "

              The dataset is sourced from the Consumer Financial Protection Bureau.

              " + ] + }, + { + "cell_type": "markdown", + "id": "ffa325e2-7b1b-49c1-a932-e75f71bf002d", + "metadata": {}, + "source": [ + "
              \n", + "
              ClearScape Analytics™
              \n", + "
              \n", + "
              \n", + " Copyright © Teradata Corporation - 2024. All Rights Reserved\n", + "
              \n", + "
              \n", + "
              " + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.10" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/VantageCloud_Lake/UseCases/Customer_Complaints_Analyzer/Complaints_Clustering.ipynb b/VantageCloud_Lake/UseCases/Customer_Complaints_Analyzer/Complaints_Clustering.ipynb new file mode 100644 index 00000000..fa69a4aa --- /dev/null +++ b/VantageCloud_Lake/UseCases/Customer_Complaints_Analyzer/Complaints_Clustering.ipynb @@ -0,0 +1,804 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "473a7d4a-dc19-4b44-a062-7da705197114", + "metadata": {}, + "source": [ + "
              \n", + "

              \n", + " Complaints Clustering using Vantage and LLM\n", + "
              \n", + " \"Teradata\"\n", + "

              \n", + "
              " + ] + }, + { + "cell_type": "markdown", + "id": "0b39f161-0198-4044-9956-3dc9c9b304b0", + "metadata": {}, + "source": [ + "

              Introduction:

              \n", + "\n", + "

              This feature uses clustering techniques powered by Teradata Vantage and Amazon's Titan embeddings model on AWS Bedrock to group similar customer complaints together. By identifying common themes and patterns, it provides valuable insights into the key issues and pain points experienced by customers.

              \n", + "\n", + "\n", + "

              Key Features of Complaints Clustering:

              \n", + "
                \n", + "
              • Leverages advanced clustering algorithms powered by Teradata Vantage and Amazon's Titan embeddings.
              • \n", + "
              • Groups similar customer complaints together, revealing common themes and pain points.
              • \n", + "
              • Provides clients with a deeper understanding of the key issues affecting their customers.
              • \n", + "
              • Enables clients to prioritize and address the most pressing concerns more effectively.
              • \n", + "
              • Helps clients identify opportunities for product improvements and enhanced customer experience.
              • \n", + "
              \n", + "\n", + "\n", + "

              Unlock the revolutionary potential of Generative AI to categorize and analyze complaints with unparalleled efficiency.

              \n", + "\n", + "

              Steps in the analysis:

              \n", + "
                \n", + "
              1. Configuring the environment
              2. \n", + "
              3. Connect to Vantage
              4. \n", + "
              5. Data Exploration
              6. \n", + "
              7. Configuring AWS Titan Embeddings
              8. \n", + "
              9. Cluster the Complaints
              10. \n", + "
              11. Cleanup
              12. \n", + "
              " + ] + }, + { + "cell_type": "markdown", + "id": "16ffb0c2-17a5-475f-9e98-1f86919a9ec9", + "metadata": {}, + "source": [ + "
              \n", + "1. Configuring the environment\n", + "
              \n", + "

              1.1 Downloading and installing additional software needed" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0e97af3c-0907-4a05-adf1-40b4ea1cded0", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "%%capture\n", + "!pip install -r requirements.txt --upgrade --quiet" + ] + }, + { + "cell_type": "markdown", + "id": "3a81070d-da02-49db-a422-c36143018277", + "metadata": {}, + "source": [ + "

              \n", + "

              Note: Please restart the kernel after executing these two lines. The simplest way to restart the kernel is to press the 0 key twice (0 0) while the notebook is in command mode.

              \n", + "
              " + ] + }, + { + "cell_type": "markdown", + "id": "034d8e59-1e9e-410c-9e56-d77543163727", + "metadata": {}, + "source": [ + "
              \n", + "

              1.2 Import the required libraries

              \n", + "

              Here, we import the required libraries, set environment variables and environment paths (if required).

              " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c9f49cf5-f7b9-461a-a59c-867a414d3bcc", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Data manipulation and analysis\n", + "import pandas as pd\n", + "\n", + "# Suppress warnings\n", + "import warnings\n", + "warnings.filterwarnings(\"ignore\", category=DeprecationWarning)\n", + "\n", + "# General imports\n", + "import os\n", + "import getpass\n", + "\n", + "# Plotting packages\n", + "import plotly.express as px\n", + "import plotly.graph_objects as go\n", + "\n", + "# Teradata library\n", + "from teradataml import *\n", + "from teradatagenai import TeradataAI, TextAnalyticsAI, VSManager, VectorStore, VSApi\n", + "from sqlalchemy import func\n", + "\n", + "# Display settings\n", + "display.max_rows = 5\n", + "display.print_sqlmr_query = False\n", + "display.suppress_vantage_runtime_warnings = True\n", + "configure.val_install_location = \"val\"\n", + "configure.byom_install_location = \"byom\"" + ] + }, + { + "cell_type": "markdown", + "id": "8452ad4b-84f7-4b7d-b523-5f24125264c7", + "metadata": {}, + "source": [ + "
              \n", + "2. Connect to Vantage\n", + "

              We will be prompted to provide the password. We will enter the password, press the Enter key, and then use the down arrow to go to the next cell.
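Before creating the context, it can help to confirm that the environment file actually supplied every value the connection needs. The following is a minimal sketch, assuming the settings are available as a dictionary such as the `env_vars` loaded in the connection cell; `missing_settings` is a hypothetical helper, and the key names are the ones used by that cell.

```python
# Key names taken from the connection cell; adjust if the .env file differs.
REQUIRED_KEYS = ("host", "username", "my_variable")

def missing_settings(env_vars: dict, required=REQUIRED_KEYS) -> list:
    """Return the required keys that are absent or empty in env_vars."""
    return [key for key in required if not env_vars.get(key)]

# Made-up values used only to show the check.
example = {"host": "example-host", "username": "demo_user", "my_variable": ""}
print(missing_settings(example))
```

An empty list means all three values are present; otherwise the returned names point to what is missing from the file.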

              " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c25f3d12-a0db-43ce-8244-294ce0132097", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "print(\"Checking if this environment is ready to connect to VantageCloud Lake...\")\n", + "\n", + "if os.path.exists(\"/home/jovyan/JupyterLabRoot/VantageCloud_Lake/.config/.env\"):\n", + " print(\"Your environment parameter file exist. Please proceed with this use case.\")\n", + " # Load all the variables from the .env file into a dictionary\n", + " env_vars = dotenv_values(\"/home/jovyan/JupyterLabRoot/VantageCloud_Lake/.config/.env\")\n", + " # Create the Context\n", + " eng = create_context(host=env_vars.get(\"host\"), username=env_vars.get(\"username\"), password=env_vars.get(\"my_variable\"))\n", + " execute_sql('''SET query_band='DEMO=text_analytics_teradatagenai_aws_huggingface.ipynb;' UPDATE FOR SESSION;''')\n", + " print(\"Connected to VantageCloud Lake with:\", eng)\n", + "else:\n", + " print(\"Your environment has not been prepared for connecting to VantageCloud Lake.\")\n", + " print(\"Please contact the support team.\")" + ] + }, + { + "cell_type": "markdown", + "id": "11417221-3825-471d-af67-2f38de4059b7", + "metadata": {}, + "source": [ + "

              Begin running steps with Shift + Enter keys.

              " + ] + }, + { + "cell_type": "markdown", + "id": "62bbb47c-6490-45a9-b3c0-f9281f49ae35", + "metadata": {}, + "source": [ + "
              \n", + "

              2. Set up the LLM connection

              \n", + "\n", + "

              The teradatagenai Python library can connect to cloud-based LLM services as well as instantiate private models running at scale on local GPU compute. In this case we will use Amazon's Titan text embeddings model (amazon.titan-embed-text-v2:0) through AWS Bedrock.

              \n", + "\n", + "
                \n", + "
              1. aws_access_key_id: Enter your AWS access key ID
              2. \n", + "
              3. aws_secret_access_key: Enter your AWS secret access key
              4. \n", + "
              5. region name: Enter the AWS region you want to configure (e.g., us-east-1)
              6. \n", + "
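The three values above can also be read from environment variables, with an interactive prompt as a fallback. The sketch below assumes the standard AWS variable names (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_DEFAULT_REGION); `read_setting` is a hypothetical helper. It hides input only for secrets, since a region name is not sensitive and is easier to type when it is visible.

```python
import getpass
import os

def read_setting(name, secret=True, env=os.environ,
                 ask_secret=getpass.getpass, ask_plain=input):
    """Return env[name] if it is set and non-empty, otherwise prompt for it.

    Only secrets are requested with hidden input.
    """
    value = env.get(name)
    if value:
        return value
    prompt = f"{name}: "
    return ask_secret(prompt) if secret else ask_plain(prompt)

# Demonstration with a made-up environment, so no prompt is shown here.
demo_env = {"AWS_DEFAULT_REGION": "us-east-1"}
region_name = read_setting("AWS_DEFAULT_REGION", secret=False, env=demo_env)
print(region_name)
```

In the notebook, the access key and secret key would be read the same way with `secret=True`, and the resulting region could be passed on to the model configuration instead of a fixed value.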
                  " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "09c6a41f-08af-4c99-8b02-ef1e9e6759e4", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "access_key = getpass.getpass('aws_access_key_id: ')\n", + "secret_key = getpass.getpass('aws_secret_access_key: ')\n", + "region_name = getpass.getpass('region name: ')" + ] + }, + { + "cell_type": "markdown", + "id": "86e2a715-df46-4090-908f-496e63908371", + "metadata": {}, + "source": [ + "
                  \n", + "3. Data Exploration" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d35afaed-2485-4a3f-9784-8fe5fd6c08f4", + "metadata": {}, + "outputs": [], + "source": [ + "df = DataFrame(in_schema('DEMO_ComplaintAnalysis', 'Consumer_Complaints'))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "94557669-e96c-496a-a35c-d4b23ca93e15", + "metadata": {}, + "outputs": [], + "source": [ + "# df.columns" + ] + }, + { + "cell_type": "markdown", + "id": "9cb7fca4-365a-41b3-a2f8-feb472be6786", + "metadata": {}, + "source": [ + "
                  \n", + "

                  3.1 Graph for Count of Product Complaints Over Years

                  \n", + "\n", + "

                  The graph shows the number of complaints per year, broken down by product.
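To make the aggregation behind this chart concrete, here is a small pure-Python sketch of counting complaints per product and year. The rows are made up for illustration and `counts_by_product_and_year` is a hypothetical helper; in the notebook the same grouping is done in-database before plotting.

```python
from collections import defaultdict

# Illustrative (product, year) rows; the values are made up for the example.
rows = [
    ("Credit card", 2021), ("Credit card", 2021), ("Credit card", 2022),
    ("Mortgage", 2021), ("Mortgage", 2022), ("Mortgage", 2022),
]

def counts_by_product_and_year(records):
    """Return {product: [(year, count), ...]} with years in ascending order."""
    table = defaultdict(lambda: defaultdict(int))
    for product, year in records:
        table[product][year] += 1
    return {product: sorted(years.items()) for product, years in table.items()}

print(counts_by_product_and_year(rows))
```

Each product maps to one line in the chart, with the year on the x-axis and the count on the y-axis.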

                  " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4ab30b1b-13e4-4720-9df2-a2901cf92785", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "viz_df = df.assign(year = func.td_year_of_calendar(df.date_received.expression))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4b692e80-8976-4240-9599-5aac385c94ef", + "metadata": {}, + "outputs": [], + "source": [ + "pd_df = viz_df.select(['product','year','complaint_id']).groupby(['product', 'year']).agg(['count']).to_pandas()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4161f6ec-4250-4c0c-823f-c832daea1192", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Sorting the DataFrame by year for each product\n", + "pd_df_sorted = pd_df.sort_values(by = ['product', 'year'])\n", + "\n", + "# Plotting using Plotly\n", + "fig = px.line(\n", + " pd_df_sorted,\n", + " x = 'year',\n", + " y = 'count_complaint_id',\n", + " color = 'product',\n", + " markers = True,\n", + " title = 'Count of Product Complaints Over Years'\n", + ")\n", + "\n", + "fig.update_layout(\n", + " xaxis_title = 'Year',\n", + " yaxis_title = 'Count',\n", + " legend_title = 'Product',\n", + " width = 1200,\n", + " height = 600\n", + ")\n", + "\n", + "fig.show()" + ] + }, + { + "cell_type": "markdown", + "id": "d595d54c-e906-4411-b744-3cc61585dd5e", + "metadata": {}, + "source": [ + "
                  \n", + "

                  3.2 Graph for Count of Complaints by Months

                  \n", + "

                  The graph shows the number of complaints per month. The mean monthly count is above 500, and July and August have the highest counts.
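The mean and the peak months mentioned above can be computed directly from the monthly counts. The sketch below uses made-up counts and a hypothetical `summarize` helper; it also takes month names from the standard `calendar` module instead of a hand-written mapping (the names are English under the default locale).

```python
import calendar
from statistics import mean

# Illustrative counts keyed by month number; not taken from the dataset.
counts_by_month = {1: 480, 2: 450, 7: 610, 8: 640, 12: 500}

def summarize(counts):
    """Return (mean count, names of the month(s) with the highest count)."""
    peak = max(counts.values())
    peak_months = [calendar.month_name[m] for m in sorted(counts) if counts[m] == peak]
    return mean(counts.values()), peak_months

avg, peaks = summarize(counts_by_month)
print(avg, peaks)
```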

                  " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1d99c576-05c8-470c-b7a5-afaa1c71086c", + "metadata": {}, + "outputs": [], + "source": [ + "df = df.assign(complaint_month = func.td_month_of_year(df.date_received.expression))\n", + "grp_gen = df.select(['complaint_month','complaint_id']).groupby(['complaint_month']).agg(['count']).to_pandas()\n", + "\n", + "# Define a reverse mapping dictionary\n", + "reverse_month_mapping = {1: 'January', 2: 'February', 3: 'March', 4: 'April', 5: 'May', 6: 'June',\n", + " 7: 'July', 8: 'August', 9: 'September', 10: 'October', 11: 'November', 12: 'December'}\n", + "\n", + "# Create a new column with month names based on reverse mapping\n", + "grp_gen['month'] = grp_gen['complaint_month'].map(reverse_month_mapping)\n", + "\n", + "\n", + "fig = px.bar(\n", + " grp_gen.sort_values(by = 'complaint_month'),\n", + " x = 'month', y = 'count_complaint_id',\n", + " labels = {\n", + " 'count_complaint_id': 'Number of Complaints',\n", + " 'month': 'Complaint Month'\n", + " },\n", + " title = 'Number of Complaints by Month'\n", + ")\n", + "\n", + "# Add hover information\n", + "fig.update_traces(hovertemplate = 'Month: %{x}
                  Number of Complaints: %{y:,}')\n", + "\n", + "fig.show()" + ] + }, + { + "cell_type": "markdown", + "id": "5c43d615-4487-4b26-9339-c5a26515e949", + "metadata": {}, + "source": [ + "
                  \n", + "\n", + "

                  3.3 Graph for Number of Complaints by Product

                  The graph shows the number of complaints received per product. The highest numbers of complaints relate to credit cards or prepaid cards, and to credit reporting and credit repair services.

                  " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d1797ad2-aa25-4945-aa85-e30611ae96da", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "grp_gen = df.select(['product','complaint_id']).groupby(['product']).agg(['count']).to_pandas()\n", + "\n", + "fig = px.bar(\n", + " grp_gen,\n", + " x = 'product',\n", + " y = 'count_complaint_id',\n", + " labels = {\n", + " 'count_complaint_id': 'Number of Complaints',\n", + " 'product': 'Product'\n", + " },\n", + " title = 'Number of Complaints by Product'\n", + ")\n", + "\n", + "# Add hover information\n", + "fig.update_traces(hovertemplate = 'Product: %{x}
                  Number of Complaints: %{y:,}')\n", + "\n", + "fig.show()" + ] + }, + { + "cell_type": "markdown", + "id": "2b83e400-c23c-438f-8e40-218e90e36e7f", + "metadata": {}, + "source": [ + "
                  \n", + "\n", + "

                  3.4 Graph for Number of Complaints by Issue

                  The graph shows the number of complaints for the ten most frequent issues. The highest number of complaints relates to the issue 'incorrect information on your report'.

                  " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "eab68ba2-e74b-418d-bee4-46750a90a444", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "grp_gen = df.select(['issue','complaint_id']).groupby(['issue']).agg(['count']).to_pandas()\n", + "\n", + "grp_gen = grp_gen.sort_values('count_complaint_id', ascending = False)[:10]\n", + "\n", + "fig = px.bar(\n", + " grp_gen,\n", + " x = 'issue',\n", + " y = 'count_complaint_id',\n", + " labels = {\n", + " 'count_complaint_id': 'Number of Complaints',\n", + " 'issue': 'Issue'\n", + " },\n", + " title = 'Number of Complaints by Issue(Top 10)'\n", + ")\n", + "\n", + "# Add hover information\n", + "fig.update_traces(hovertemplate = 'Issue: %{x}
                  Number of Complaints: %{y:,}')\n", + "\n", + "fig.show()" + ] + }, + { + "cell_type": "markdown", + "id": "9655211c-4ba2-4a33-b3f2-777633fbc889", + "metadata": {}, + "source": [ + "
                  \n", + "\n", + "

                  3.5 Graph for Number of Complaints by Sub-Issue

                  \n", + "\n", + "

                  The graph shows the number of complaints for the ten most frequent sub-issues. The highest number of complaints relates to the sub-issue 'information belongs to someone else'.

                  " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5d456215-beee-4133-9ff5-70131299bdf0", + "metadata": {}, + "outputs": [], + "source": [ + "grp_gen = df.select(['sub_issue','complaint_id']).groupby(['sub_issue']).agg(['count']).to_pandas()\n", + "\n", + "grp_gen = grp_gen.sort_values('count_complaint_id', ascending = False)[:10]\n", + "\n", + "fig = px.bar(\n", + " grp_gen,\n", + " x = 'sub_issue',\n", + " y = 'count_complaint_id',\n", + " labels = {\n", + " 'count_complaint_id': 'Number of Complaints',\n", + " 'sub_issue': 'Sub-Issue'\n", + " },\n", + " title='Number of Complaints by Sub-Issue(Top 10)'\n", + ")\n", + "\n", + "# Add hover information\n", + "fig.update_traces(hovertemplate = 'Sub-Issue: %{x}
                  Number of Complaints: %{y:,}')\n", + "\n", + "fig.show()" + ] + }, + { + "cell_type": "markdown", + "id": "17a98018-d49e-49a3-9e98-b0e5cd2bdc86", + "metadata": {}, + "source": [ + "
                  \n", + "\n", + "

                  3.6 Graph for Number of Complaints by Channel

                  \n", + "\n", + "

                  The graph shows the number of complaints received per submission channel. The data shows that all the complaints were submitted via the web channel.

                  " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c63e282f-4e6f-40d5-8edf-852c07b1cb4c", + "metadata": {}, + "outputs": [], + "source": [ + "grp_gen = df.select(['submitted_via','complaint_id']).groupby(['submitted_via']).agg(['count']).to_pandas()\n", + "\n", + "# Create a mapping of numbers to product names\n", + "product_mapping = {i: product for i, product in enumerate(grp_gen['submitted_via'])}\n", + "\n", + "# Replace product names with numbers in the DataFrame\n", + "grp_gen['product_num'] = grp_gen['submitted_via'].map(\n", + " {product: i for i, product in enumerate(grp_gen['submitted_via'])}\n", + ")\n", + "\n", + "fig = px.bar(\n", + " grp_gen,\n", + " x = 'submitted_via',\n", + " y = 'count_complaint_id',\n", + " labels = {\n", + " 'count_complaint_id': 'Number of Complaints',\n", + " 'submitted_via': 'Submitted Via'\n", + " },\n", + " title = 'Number of Complaints by Channel'\n", + ")\n", + "\n", + "# Add hover information\n", + "fig.update_traces(hovertemplate = 'Submitted Via: %{x}
                  Number of Complaints: %{y:,}')\n", + "\n", + "fig.show()" + ] + }, + { + "cell_type": "markdown", + "id": "03493ca9-52b0-4546-96d8-e4d26485f557", + "metadata": {}, + "source": [ + "
                  \n", + "4. Generating Embeddings\n", + "\n", + "

                  \n", + "

                  \n", + "
                  \n", + "

                  \n", + " The embeddings() function generates vector representations of text from a specified column, capturing the semantic meaning of each entry.\n", + "

                  \n", + "

                  \n", + " These embeddings can then be used for tasks such as semantic similarity, clustering, retrieval, or as input features for downstream machine learning models.\n", + "

                  \n", + "
                  \n", + "
                  " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "21d9ffd0-1fdb-40e0-9091-df5666858ad6", + "metadata": {}, + "outputs": [], + "source": [ + "len(df.columns)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0e7a1554-d70d-4b0d-8404-b4acec6a5adc", + "metadata": {}, + "outputs": [], + "source": [ + "# Instantiate the TeradataAI class with the Amazon Bedrock model.\n", + "llm_embedding = TeradataAI(api_type=\"aws\", \n", + " model_name=\"amazon.titan-embed-text-v2:0\",\n", + " access_key=access_key,\n", + " secret_key=secret_key,\n", + " region=\"us-west-2\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5d3c9f4a-6cf8-42ba-a329-87e297f47461", + "metadata": {}, + "outputs": [], + "source": [ + "# Instantiate the TextAnalyticsAI class with the embedding model.\n", + "obj_embeddings = TextAnalyticsAI(llm=llm_embedding)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cc30cc34-3739-486d-a91b-fa2f2c863331", + "metadata": {}, + "outputs": [], + "source": [ + "# Generate embeddings\n", + "tdf_embeddings = obj_embeddings.embeddings(column=\"consumer_complaint_narrative\",data=df.iloc[:10],accumulate=\"0:17\",output_format='VECTOR')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "abbb2073-7169-4d43-ae30-cd855a9842f1", + "metadata": {}, + "outputs": [], + "source": [ + "tdf_embeddings.info()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7fc26c6b-9474-4876-8071-fc27f3d93402", + "metadata": {}, + "outputs": [], + "source": [ + "tdf_embeddings.columns[:5] + tdf_embeddings.columns[6:-3] + [\"Message\"]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e1e3d86d-9b64-4e5e-a3fe-0ca7dd5a35f4", + "metadata": {}, + "outputs": [], + "source": [ + "tdf_embeddings = tdf_embeddings.drop(columns=tdf_embeddings.columns[:5] + tdf_embeddings.columns[6:-3] + [\"Message\"])" + ] + }, + { + "cell_type": "code", 
+ "execution_count": null, + "id": "41bc94fd-f53a-4fd3-8027-9c7c746c0595", + "metadata": {}, + "outputs": [], + "source": [ + "copy_to_sql(df = tdf_embeddings, table_name = 'complaints_embeddings', if_exists = 'replace')" + ] + }, + { + "cell_type": "markdown", + "id": "1c4e6736-9070-4713-a010-bbb1a7a1f9d0", + "metadata": {}, + "source": [ + "
                  \n", + "5. Cluster the Complaints\n", + "\n", + "

                  We cluster a small sample of the complaints (the rows embedded in the previous step) rather than the entire dataset. This keeps the embedding and clustering steps quick while still showing how similar complaints are grouped.
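To illustrate what k-means does with the embedding vectors, here is a minimal pure-Python sketch of the standard Lloyd's algorithm. It is not the in-database implementation used in this notebook; the function names, the toy two-dimensional vectors, and the caller-supplied starting centroids are all assumptions made for the example.

```python
import math

def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def kmeans(points, centroids, max_iter=20):
    """Plain Lloyd's algorithm; `centroids` are caller-supplied starting points."""
    centroids = [list(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids

# Toy 2-D vectors; real embedding vectors have many more dimensions.
vectors = [(0.0, 0.1), (0.1, 0.0), (0.9, 1.0), (1.0, 0.9)]
labels, centers = kmeans(vectors, centroids=[vectors[0], vectors[2]])
print(labels)
```

Complaints whose embeddings end up with the same label belong to the same cluster, which is the role the cluster id column plays in the result inspected in the cells that follow.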

                  " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d8511855-c7f5-4783-8e5a-5a2b14b79b4f", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "KMeans_Model = KMeans(\n", + " data = DataFrame('complaints_embeddings'),\n", + " id_column = \"complaint_id\",\n", + " target_columns = [\"Embedding\"],\n", + " output_cluster_assignment = True,\n", + " num_clusters = 5\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "63a55e88-1632-46eb-b86c-3ee3d16edd8a", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "print(\"Data information: \\n\", KMeans_Model.model_data.shape)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7d5a3743-a51a-46cd-888c-eecd85e3f3ab", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "KMeans_Model.result" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7cd95cbc-7434-4248-84f8-20260410bd7e", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "embeddings_cluster = DataFrame('complaints_embeddings').join(\n", + " other = KMeans_Model.result,\n", + " how = \"inner\",\n", + " on = \"complaint_id=complaint_id\",\n", + " lprefix = \"L_\"\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "94563f0e-6d80-4b73-ac5a-8aa787ddfc9b", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# View complaints in cluster 1\n", + "embeddings_cluster[['td_clusterid_kmeans','complaint_id','consumer_complaint_narrative']] \\\n", + " .loc[embeddings_cluster.td_clusterid_kmeans == 1]" + ] + }, + { + "cell_type": "markdown", + "id": "f8f29be4-9609-4c96-a7ab-e2f4a60ec106", + "metadata": {}, + "source": [ + "
                  \n", + "6. Cleanup" + ] + }, + { + "cell_type": "markdown", + "id": "9140d843-0e89-4f1a-9018-dbde4e41f389", + "metadata": {}, + "source": [ + "

                  Work Tables

                  \n", + "

                  Clean up work tables to prevent errors next time.

                  " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9da55306-cfe5-4d34-978d-16c236504630", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "tables = ['complaints_embeddings']\n", + "\n", + "# Loop through the list of tables and execute the drop table command for each table\n", + "for table in tables:\n", + " try:\n", + " db_drop_table(table_name=table)\n", + " except:\n", + " pass" + ] + }, + { + "cell_type": "markdown", + "id": "73a8ead8-0698-4449-bc11-5a07351e3fb6", + "metadata": {}, + "source": [ + "

                  Databases and Tables

                  \n", + "

                  The following code will clean up tables and databases created above.

                  " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7e697922-8d63-4709-b7b3-0f705a938255", + "metadata": {}, + "outputs": [], + "source": [ + "remove_context()" + ] + }, + { + "cell_type": "markdown", + "id": "75e7e6a6-1413-4b18-b800-4899bcdfb31a", + "metadata": {}, + "source": [ + "
                  \n", + "Dataset:\n", + "
                  \n", + "
                  \n", + "

                  The dataset is sourced from the Consumer Financial Protection Bureau.

                  " + ] + }, + { + "cell_type": "markdown", + "id": "f9586bbb-6521-4988-bc00-40a67b08b695", + "metadata": {}, + "source": [ + "
                  \n", + "
                  ClearScape Analytics™
                  \n", + "
                  \n", + "
                  \n", + " Copyright © Teradata Corporation - 2024. All Rights Reserved\n", + "
                  \n", + "
                  \n", + "
                  " + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.10" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/VantageCloud_Lake/UseCases/Customer_Complaints_Analyzer/Sentiment_Analysis.ipynb b/VantageCloud_Lake/UseCases/Customer_Complaints_Analyzer/Sentiment_Analysis.ipynb new file mode 100644 index 00000000..a2573934 --- /dev/null +++ b/VantageCloud_Lake/UseCases/Customer_Complaints_Analyzer/Sentiment_Analysis.ipynb @@ -0,0 +1,761 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "5ce3dd8a-f3e5-40d5-ab4c-d40cc358cf7f", + "metadata": {}, + "source": [ + "
                  \n", + "

                  \n", + " Sentiment Analysis Using Vantage and LLM\n", + "
                  \n", + " \"Teradata\"\n", + "

                  \n", + "
                  " + ] + }, + { + "cell_type": "markdown", + "id": "57054f7e-befe-4029-b4ac-b82019240b1f", + "metadata": {}, + "source": [ + "

                  Introduction:

                  \n", + "\n", + "

                  Sentiment analysis with Teradata Vantage and Anthropic's Claude LLM on AWS Bedrock extracts insights from unstructured data. This process empowers businesses to swiftly identify and address customer concerns, enhancing overall customer satisfaction and loyalty.

                  \n", + "\n", + "

                  Key Features:

                  \n", + "
                    \n", + "
                  • Polarity Classification: Classifies each piece of feedback as positive, negative, or neutral.
                  • \n", + "
                  • Emotion Detection: Identifies specific emotions such as happiness, anger, and sadness, capturing subtle differences in tone and language.
                  • \n", + "
                  • Aspect-Based Sentiment Analysis: Analyzes sentiment towards specific features or aspects of a product or service.
                  • \n", + "
                  • Fine-Grained Sentiment Analysis: Provides detailed sentiment analysis at the phrase or clause level.
                  • \n", + "
                  • Subjectivity Classification: Distinguishes between objective and subjective text.
                  • \n", + "\n", + "
                  \n", + "\n", + "\n", + "

                  Benefits:

                  \n", + "
                    \n", + "
                  • Improved Customer Satisfaction: Enhances customer experience by addressing concerns and improving products.
                  • \n", + "
                  • Competitive Advantage: Provides valuable insights to stay ahead of competitors.
                  • \n", + "
                  • Objective Insights: Offers unbiased and accurate sentiment analysis.
                  • \n", + "
                  • Real-Time Decision Making: Enables swift responses to customer concerns and market trends.
                  • \n", + "
                  • Scalability: Handles large volumes of data efficiently.
                  • \n", + "
                  \n", + "\n", + "

                  Experience the transformative power of Generative AI in complaints sentiment analysis.

                  \n", + "\n", + "

                  Steps in the analysis:

                  \n", + "
                    \n", + "
                  1. Configuring the environment
                  2. \n", + "
                  3. Connect to Vantage
                  4. \n", + "
                  5. Configuring AWS Bedrock - Anthropic's Claude LLM model
                  6. \n", + "
                  7. Complaints Sentiment Analysis
                  8. \n", + "
                  9. Cleanup
                  10. \n", + "
                  " + ] + }, + { + "cell_type": "markdown", + "id": "aa85be84-f59c-4bd9-95b5-2d0f805d33d5", + "metadata": {}, + "source": [ + "
                  \n", + "1. Configuring the environment\n", + "
                  \n", + "

                  1.1 Downloading and installing additional software needed" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "03eb4587-6632-4b08-a121-c98ea5a8fc48", + "metadata": {}, + "outputs": [], + "source": [ + "%%capture\n", + "!pip install -r requirements.txt --upgrade --quiet" + ] + }, + { + "cell_type": "markdown", + "id": "c1fd81e4-0df9-4360-b3ac-72214f135296", + "metadata": {}, + "source": [ + "

                  \n", + "

                  Note: Please restart the kernel after executing these two lines. The simplest way to restart the kernel is to press the zero key twice: 0 0

                  \n", + "
                  " + ] + }, + { + "cell_type": "markdown", + "id": "035d9ed5-5fde-431e-bce4-f358f8223a1b", + "metadata": {}, + "source": [ + "
                  \n", + "

                  1.2 Import the required libraries

                  \n", + "

                  Here, we import the required libraries, set environment variables and environment paths (if required).

                  " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "70311ba4-fad6-42b7-b0bc-bf6f45b4d73b", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Data manipulation and analysis\n", + "import numpy as np\n", + "import pandas as pd\n", + "import getpass\n", + "\n", + "# Visualization\n", + "import plotly.express as px\n", + "import matplotlib.pyplot as plt\n", + "from wordcloud import WordCloud\n", + "\n", + "# Progress bar\n", + "from tqdm import tqdm\n", + "\n", + "# Machine learning and other utilities from Teradata\n", + "from teradataml import *\n", + "from sqlalchemy import func\n", + "from teradatagenai import TeradataAI, TextAnalyticsAI, VSManager, VectorStore, VSApi\n", + "\n", + "# Requests\n", + "import requests\n", + "\n", + "# Display settings\n", + "display.max_rows = 5\n", + "pd.set_option('display.max_colwidth', None)\n", + "\n", + "# Set display options for dataframes, plots, and warnings\n", + "%matplotlib inline\n", + "warnings.filterwarnings('ignore')\n", + "display.suppress_vantage_runtime_warnings = True" + ] + }, + { + "cell_type": "markdown", + "id": "1763fd95-5000-4a2d-8b86-7be261e20847", + "metadata": {}, + "source": [ + "
                  \n", + "2. Connect to Vantage\n", + "

                  We will be prompted to provide the password. We will enter the password, press the Enter key, and then use the down arrow to go to the next cell.
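
The connection cell below reads `host`, `username`, and `my_variable` from a `.env` file. As a minimal sketch, assuming the file holds plain `KEY=VALUE` lines, the check below reports which of those keys are missing before a connection is attempted. The helper names `read_env_file` and `missing_keys` are hypothetical, and the parser is a simplified stand-in for `dotenv_values`, not a replacement for it.

```python
from pathlib import Path

# Keys the connection cell reads from the .env file.
REQUIRED_KEYS = ("host", "username", "my_variable")


def read_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Remove surrounding whitespace and optional quotes from the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]
```

If `missing_keys(read_env_file(path))` returns a non-empty list, the environment is probably not ready for `create_context`.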

                  " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "de5eba56-d38f-4204-b30d-232c7d894eb0", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "print(\"Checking if this environment is ready to connect to VantageCloud Lake...\")\n", + "\n", + "if os.path.exists(\"/home/jovyan/JupyterLabRoot/VantageCloud_Lake/.config/.env\"):\n", + " print(\"Your environment parameter file exist. Please proceed with this use case.\")\n", + " # Load all the variables from the .env file into a dictionary\n", + " env_vars = dotenv_values(\"/home/jovyan/JupyterLabRoot/VantageCloud_Lake/.config/.env\")\n", + " # Create the Context\n", + " eng = create_context(host=env_vars.get(\"host\"), username=env_vars.get(\"username\"), password=env_vars.get(\"my_variable\"))\n", + " execute_sql('''SET query_band='DEMO=text_analytics_teradatagenai_aws_huggingface.ipynb;' UPDATE FOR SESSION;''')\n", + " print(\"Connected to VantageCloud Lake with:\", eng)\n", + "else:\n", + " print(\"Your environment has not been prepared for connecting to VantageCloud Lake.\")\n", + " print(\"Please contact the support team.\")" + ] + }, + { + "cell_type": "markdown", + "id": "0ce81699-6345-4dbf-a3a8-b9402c7b6a98", + "metadata": {}, + "source": [ + "

                  Begin running steps with Shift + Enter keys.

                  " + ] + }, + { + "cell_type": "markdown", + "id": "c4aedced-59c4-4887-9ff1-25cfe56e0403", + "metadata": {}, + "source": [ + "
                  \n", + "

                  2. Set up the LLM connection

                  \n", + "\n", + "

                  The teradatagenai Python library can connect to cloud-based LLM services and can also instantiate private models running at scale on local GPU compute. In this case we will use Anthropic's Claude through AWS Bedrock; the model is set to anthropic.claude-v2 in the code below.

                  \n", + "\n", + "
                    \n", + "
                  1. aws_access_key_id: Enter your AWS access key ID
                  2. \n", + "
                  3. aws_secret_access_key: Enter your AWS secret access key
                  4. \n", + "
                  5. region name: Enter the AWS region you want to configure (e.g., us-east-1)
                  6. \n", + "
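
The three values above are typed in at a prompt each time the notebook runs. A minimal sketch of an alternative, assuming the credentials may already be exported in the environment: read each value from an environment variable and prompt only when it is unset. The helper `read_setting` is hypothetical and not part of teradatagenai.

```python
import getpass
import os


def read_setting(env_name, prompt, secret=True):
    """Return the value of env_name, or prompt the user if it is unset."""
    value = os.environ.get(env_name, "").strip()
    if value:
        return value
    # Fall back to an interactive prompt; hide the input only for secrets.
    return getpass.getpass(prompt) if secret else input(prompt)
```

For example, `read_setting("AWS_ACCESS_KEY_ID", "aws_access_key_id: ")` would return the exported key if present. `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `AWS_DEFAULT_REGION` are the conventional AWS variable names; whether they are set in this environment is an assumption.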
                      " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4c768569-e23c-4601-a673-85219da5a925", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "access_key = getpass.getpass('aws_access_key_id: ')\n", + "secret_key = getpass.getpass('aws_secret_access_key: ')\n", + "region_name = getpass.getpass('region name: ')" + ] + }, + { + "cell_type": "markdown", + "id": "5fed3ec9-5527-4a9c-af43-597532854827", + "metadata": {}, + "source": [ + "
                      \n", + "

                      3. Use the TextAnalyticsAI API to Perform Various Text Analytics Tasks

                      \n", + "

                      You can execute the help function at the bottom of this notebook to read more about this API.

                      " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a9ee8d70-231f-4fff-b416-6588acd6ac88", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Provide model details\n", + "model_name=\"anthropic.claude-v2\"\n", + "\n", + "# Select in-database or external model\n", + "llm = TeradataAI(api_type = 'AWS',\n", + " model_name = model_name,\n", + " region = region_name,\n", + " # authorization = 'Repositories.BedrockAuth'\n", + " access_key = access_key,\n", + " secret_key = secret_key)\n", + "\n", + "obj = TextAnalyticsAI(llm=llm)" + ] + }, + { + "cell_type": "markdown", + "id": "4332348c-f45e-4f03-9184-81746d39f566", + "metadata": {}, + "source": [ + "
                      \n", + "4. Complaints Sentiment Analysis\n", + "

                      We'll analyze the sentiments of a sample of customer complaints data.

                      " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ff462b32-8742-4ac0-a40f-3e201120b3bc", + "metadata": {}, + "outputs": [], + "source": [ + "tdf = DataFrame(in_schema('DEMO_ComplaintAnalysis', 'Consumer_Complaints'))\n", + "tdf" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4d091617-4179-4953-ac50-7f9e966e4c0d", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "tdf_sentiment = obj.analyze_sentiment(column = 'consumer_complaint_narrative', \n", + " data = tdf)[['date_received','complaint_id','Sentiment','consumer_complaint_narrative', 'product']]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f09bdb52-f057-4cad-b82d-8b92c29ca10d", + "metadata": {}, + "outputs": [], + "source": [ + "tdf_sentiment" + ] + }, + { + "cell_type": "markdown", + "id": "9162a5c4-a3ee-44c0-be09-5bfaa9b5d827", + "metadata": {}, + "source": [ + "

                      Now the results can be saved back to Vantage.

                      " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1626db2a-1159-4560-8ae3-704abcd9e081", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "copy_to_sql(df = tdf_sentiment, table_name = 'complaints_sentiment', if_exists = 'replace')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5245af62-abf4-4ed5-b4c2-7f06620587a8", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "sentiment_df = DataFrame('complaints_sentiment')\n", + "sentiment_df = sentiment_df.assign(date_received = sentiment_df.date_received.cast(type_=DATE))\n", + "sentiment_df = sentiment_df.assign(Sentiment = sentiment_df.Sentiment.str.strip())\n", + "print('Before: ', sentiment_df.shape)\n", + "sentiment_df = sentiment_df.loc[sentiment_df.Sentiment.isin(['positive', 'negative', 'neutral'])]\n", + "print('After: ', sentiment_df.shape)" + ] + }, + { + "cell_type": "markdown", + "id": "c29ca4fd-3282-4eac-a402-0c72542a5a30", + "metadata": {}, + "source": [ + "
                      \n", + "

                      4.1 Consumer Sentiments Prediction vs Occurrences

                      \n", + "\n", + "

                      The bar chart below shows how many complaints were predicted for each sentiment (positive, negative, neutral). It helps identify trends, patterns, and areas for improvement, supporting data-driven decision making.
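
The counting step behind this chart can be sketched in plain Python. The notebook strips whitespace from the labels before filtering; the version below also lowercases them, which is an added assumption in case the model returns labels such as `Negative`. The function name `count_sentiments` is hypothetical.

```python
from collections import Counter

VALID_LABELS = ("positive", "negative", "neutral")


def count_sentiments(labels):
    """Count valid labels after trimming whitespace and lowercasing."""
    cleaned = (str(label).strip().lower() for label in labels)
    counts = Counter(label for label in cleaned if label in VALID_LABELS)
    # Report every category, including those with zero occurrences.
    return {label: counts.get(label, 0) for label in VALID_LABELS}
```

Returning a zero for absent categories keeps all three bars on the chart even when a sample contains no positive or neutral complaints.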

                      " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "31955976-23c1-4897-9140-77768fd4400c", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from IPython.display import display, Markdown\n", + "def display_helper(msg):\n", + " return display(Markdown(\n", + " f\"\"\"
                      \n", + "

                      Note: \n", + "{msg}

                      \"\"\"))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "faa3be08-0b27-4748-a7d8-32023b183761", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from collections import Counter\n", + "data = Counter(sentiment_df[['Sentiment']].get_values().flatten())\n", + "\n", + "# Convert Counter data to DataFrame\n", + "df = pd.DataFrame.from_dict(data, orient='index', columns=['Count']).reset_index()\n", + "\n", + "# Rename columns\n", + "df.columns = ['Sentiment', 'Count']\n", + "\n", + "# Create bar graph using Plotly Express\n", + "fig = px.bar(df, x='Sentiment', y='Count', color='Sentiment',\n", + " labels={'Count': 'Number of Occurrences', 'Sentiment': 'Sentiment'})\n", + "\n", + "# Show the plot\n", + "fig.show()" + ] + }, + { + "cell_type": "markdown", + "id": "1211a149-ec81-4cd3-8423-963952da4d01", + "metadata": {}, + "source": [ + "
                      \n", + "

                      4.2 Word Cloud for Negative Consumer Sentiment Prediction

                      \n", + "\n", + "

                      Unlock the power of customer feedback with our intuitive word cloud visualization, which provides a comprehensive snapshot of negative consumer complaints sentiment. This innovative tool highlights the most frequently occurring words and pain points in customer feedback, empowering businesses to:

                      1. Identify trends and sentiment patterns
                      2. Pinpoint areas for improvement
                      3. Make data-driven decisions to enhance customer satisfaction and loyalty

                      By leveraging this word cloud, businesses can proactively address customer concerns, refine their products and services, and ultimately drive growth through a deeper understanding of their customers' needs and preferences.
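
The word cloud cell removes every capital `X` from the text, which also alters ordinary words that contain one. A minimal sketch of a narrower cleanup, assuming redactions appear as runs of two or more capital `X` characters (for example `XXXX`): remove only those runs. The function name `remove_redactions` is hypothetical.

```python
import re

# Assumption: redactions appear as runs of two or more capital X characters.
REDACTION = re.compile(r"\bX{2,}\b")


def remove_redactions(text):
    """Drop redaction placeholders and collapse the leftover whitespace."""
    without_masks = REDACTION.sub(" ", text)
    return re.sub(r"\s+", " ", without_masks).strip()
```

With this pattern, a word such as `Xbox` or an abbreviation such as `TX` is left intact, while a standalone `XXXX` is dropped.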

                      " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d7f3644c-c55d-4864-87ee-57403c86e9b7", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "neg = sentiment_df[sentiment_df['Sentiment'] == 'negative'].to_pandas()\n", + "neg_text = ' '.join(neg['consumer_complaint_narrative'])\n", + "\n", + "# Replace 'X' with blank space\n", + "modified_string = neg_text.replace('X', '')\n", + "\n", + "if len(modified_string) > 0:\n", + " wordcloud = WordCloud(width=800, height=400, background_color='white').generate(modified_string)\n", + "\n", + " # Display the word cloud\n", + " plt.imshow(wordcloud, interpolation='bilinear')\n", + " plt.tight_layout()\n", + " plt.axis(\"off\")\n", + " plt.show()\n", + "else:\n", + " display_helper(\"We included positive, negative, and neutral categories to cover all bases. But in this sample, it's possible that none of the complaints are actually negative.\")" + ] + }, + { + "cell_type": "markdown", + "id": "a3c9017a-4470-4184-b7c3-334a2dd7ef4d", + "metadata": {}, + "source": [ + "
                      \n", + "

                      4.3 Word Cloud for Neutral Consumer Sentiment Prediction

                      \n", + "\n", + "

                      Tap into the insights of customer feedback with our intuitive word cloud visualization, which offers a detailed overview of neutral consumer complaints sentiment.

                      " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "02aede14-b229-4835-a878-aed4bf1fef94", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "neu = sentiment_df[sentiment_df['Sentiment'] == 'neutral'].to_pandas()\n", + "neu_text = ' '.join(neu['consumer_complaint_narrative'])\n", + "\n", + "# Replace 'X' with blank space\n", + "modified_string = neu_text.replace('X', '')\n", + "\n", + "if len(modified_string) > 0:\n", + " wordcloud = WordCloud(width=800, height=400, background_color='white').generate(modified_string)\n", + "\n", + " # Display the word cloud\n", + " plt.imshow(wordcloud, interpolation='bilinear')\n", + " plt.tight_layout()\n", + " plt.axis(\"off\")\n", + " plt.show()\n", + "else:\n", + " display_helper(\"To cover all possible scenarios, we included positive, negative, and neutral categories in our analysis. However, given that this dataset consists of complaints, it's expected that the model would rarely, if ever, encounter positive or neutral responses.\")" + ] + }, + { + "cell_type": "markdown", + "id": "d63d3d37-cf05-4497-9d74-a612762fe3e7", + "metadata": {}, + "source": [ + "
                      \n", + "

                      4.4 Word Cloud for Positive Consumer Sentiment Prediction

                      \n", + "\n", + "

                      Explore customer feedback insights with our intuitive word cloud visualization, providing a detailed overview of positive consumer sentiment.

                      " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5f881a57-4421-436b-9d2b-1103ba6ad005", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "pos = sentiment_df[sentiment_df['Sentiment'] == 'positive'].to_pandas()\n", + "pos_text = ' '.join(pos['consumer_complaint_narrative'])\n", + "\n", + "# Replace 'X' with blank space\n", + "modified_string = pos_text.replace('X', '')\n", + "\n", + "if len(modified_string) > 0:\n", + " wordcloud = WordCloud(width=800, height=400, background_color='white').generate(modified_string)\n", + "\n", + " # Display the word cloud\n", + " plt.imshow(wordcloud, interpolation='bilinear')\n", + " plt.tight_layout()\n", + " plt.axis(\"off\")\n", + " plt.show()\n", + "else:\n", + " display_helper(\"To cover all possible scenarios, we included positive, negative, and neutral categories in our analysis. However, given that this dataset consists of complaints, it's expected that the model would rarely, if ever, encounter positive or neutral responses.\")" + ] + }, + { + "cell_type": "markdown", + "id": "3f81682d-268c-498a-b7d6-fa3d57698fd8", + "metadata": {}, + "source": [ + "
                      \n", + "

                      4.5 Negative Sentiment per Product Over Years

                      \n", + "\n", + "

                      This graph tracks the negative sentiment associated with different products over time, offering valuable insights into evolving customer perceptions and pain points.

                      \n", + "\n", + "

                      We will use the Vantage in-database function OrdinalEncodingFit, which identifies distinct categorical values from the input data or a user-defined list and generates those values along with the ordinal value for each category. 0:\n", + "\n", + " viz_senti = viz_neg.select(['product','Sentiment', 'year']).groupby(['product', 'year']).agg(['sum']).to_pandas()\n", + "\n", + " # Sorting the DataFrame by year for each product\n", + " pd_df_sorted = viz_senti.sort_values(by=['product', 'year'])\n", + "\n", + " # Plotting using Plotly\n", + " fig = px.line(pd_df_sorted, x='year', y='sum_Sentiment', color='product', markers=True, title='Negative Sentiment per Product Over Years')\n", + " fig.update_layout(xaxis_title='Year', yaxis_title='Count', legend_title='Product', width=1000, height=600)\n", + "\n", + " fig.show()\n", + "else:\n", + " display_helper(\"We included positive, negative, and neutral categories to cover all bases. But in this sample, it's possible that none of the complaints are actually negative.\")" + ] + }, + { + "cell_type": "markdown", + "id": "8fc49e14-3beb-4206-811e-558026246e50", + "metadata": {}, + "source": [ + "
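
The idea of ordinal encoding described in section 4.5 can be illustrated in plain Python. This is a sketch of the concept only, not the teradataml `OrdinalEncodingFit` implementation. The names `fit_ordinal` and `transform_ordinal` are hypothetical, and the ordinals shown are not necessarily the ones assigned in the notebook.

```python
def fit_ordinal(values, categories=None):
    """Map each category to an ordinal, in list order or sorted order."""
    # Use the user-defined list when given, else the sorted distinct values.
    ordered = list(categories) if categories is not None else sorted(set(values))
    return {category: index for index, category in enumerate(ordered)}


def transform_ordinal(values, mapping, default=-1):
    """Replace each value with its ordinal; unseen values get `default`."""
    return [mapping.get(value, default) for value in values]
```

Passing an explicit `categories` list fixes the order of the ordinals, which matters when later code filters on a specific value.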


                      \n", + "

                      4.6 Neutral Sentiment per Product Over Years

                      \n", + "\n", + "

                      This graph tracks the neutral sentiment associated with different products over time, offering valuable insights into evolving customer perceptions and pain points.

                      " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "04da72eb-c4c4-4087-9a1f-f47f0dc31e23", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "viz_neu = result[result['Sentiment'] == 0]\n", + "\n", + "if viz_neu.shape[0] > 0:\n", + " viz_senti = viz_neu.select(['product','Sentiment', 'year']).groupby(['product', 'year']).agg(['sum']).to_pandas()\n", + "\n", + " # Sorting the DataFrame by year for each product\n", + " pd_df_sorted = viz_senti.sort_values(by=['product', 'year'])\n", + "\n", + " # Plotting using Plotly\n", + " fig = px.line(pd_df_sorted, x='year', y='sum_Sentiment', color='product', markers=True, title='Neutral Sentiment per Product Over Years')\n", + " fig.update_layout(xaxis_title='Year', yaxis_title='Count', legend_title='Product', width=1000, height=600)\n", + "\n", + " fig.show()\n", + "else:\n", + " display_helper(\"To cover all possible scenarios, we included positive, negative, and neutral categories in our analysis. However, given that this dataset consists of complaints, it's expected that the model would rarely, if ever, encounter positive or neutral responses.\")" + ] + }, + { + "cell_type": "markdown", + "id": "389e811a-737d-4953-965b-56b591409aad", + "metadata": {}, + "source": [ + "
                      \n", + "

                      4.7 Positive Sentiment per Product Over Years

                      \n", + "\n", + "

                      This graph tracks the positive sentiment associated with different products over time, offering valuable insights into evolving customer perceptions and pain points.

                      " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b218073f-c7b5-4971-ac96-5bd2217e80dd", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "viz_pos = result[result['Sentiment'] == 1]\n", + "\n", + "if viz_pos.shape[0] > 0:\n", + " viz_senti = viz_pos.select(['product','Sentiment', 'year']).groupby(['product', 'year']).agg(['sum']).to_pandas()\n", + "\n", + " # Sorting the DataFrame by year for each product\n", + " pd_df_sorted = viz_senti.sort_values(by=['product', 'year'])\n", + "\n", + " # Plotting using Plotly\n", + " fig = px.line(pd_df_sorted, x='year', y='sum_Sentiment', color='product', markers=True, title='Positive Sentiment per Product Over Years')\n", + " fig.update_layout(xaxis_title='Year', yaxis_title='Count', legend_title='Product', width=1000, height=600)\n", + "\n", + " fig.show()\n", + "else:\n", + " display_helper(\"To cover all possible scenarios, we included positive, negative, and neutral categories in our analysis. However, given that this dataset consists of complaints, it's expected that the model would rarely, if ever, encounter positive or neutral responses.\")" + ] + }, + { + "cell_type": "markdown", + "id": "a44bc090-a387-41b6-90a5-638a16dc3d4f", + "metadata": {}, + "source": [ + "
                      \n", + "5. Cleanup" + ] + }, + { + "cell_type": "markdown", + "id": "daefb5ac-30ad-4eec-aa1d-ea5d1f232626", + "metadata": {}, + "source": [ + "

                      Work Tables

                      \n", + "

                      Clean up work tables to prevent errors next time.
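
The cleanup cell ignores any error raised while dropping a table. A minimal sketch of a variant that keeps going but records what failed, assuming the drop operation is passed in as a function: the helper `drop_tables` is hypothetical.

```python
def drop_tables(table_names, drop_fn):
    """Try to drop each table; return (name, error message) for failures."""
    failures = []
    for name in table_names:
        try:
            drop_fn(name)
        except Exception as exc:  # keep going, but remember what failed
            failures.append((name, str(exc)))
    return failures
```

In this notebook, `drop_fn` could be `lambda name: db_drop_table(table_name=name)`, and the returned list could be printed so that unexpected failures are visible.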

                      " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "bc7d7637-276b-4c0e-a79c-82aaa33e9a51", + "metadata": {}, + "outputs": [], + "source": [ + "tables = ['complaints_sentiment']\n", + "\n", + "# Loop through the list of tables and execute the drop table command for each table\n", + "for table in tables:\n", + " try:\n", + " db_drop_table(table_name=table)\n", + " except:\n", + " pass" + ] + }, + { + "cell_type": "markdown", + "id": "3b6ba28d-07d2-4322-9f59-e81855e1389c", + "metadata": {}, + "source": [ + "

                      Databases and Tables

                      \n", + "

                      The following code will clean up tables and databases created above.

                      " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b7e82715-0855-4d59-990c-481d3a9d3f1b", + "metadata": {}, + "outputs": [], + "source": [ + "remove_context()" + ] + }, + { + "cell_type": "markdown", + "id": "ebbd137a-7bfc-49fa-a18e-ca34ba68919d", + "metadata": {}, + "source": [ + "
                      \n", + "Dataset:\n", + "
                      \n", + "
                      \n", + "

                      The dataset is sourced from the Consumer Financial Protection Bureau.

                      " + ] + }, + { + "cell_type": "markdown", + "id": "19bdc0df-dc13-4a13-bf73-c7039a54c3ba", + "metadata": {}, + "source": [ + "
                      \n", + "
                      ClearScape Analytics™
                      \n", + "
                      \n", + "
                      \n", + " Copyright © Teradata Corporation - 2024. All Rights Reserved\n", + "
                      \n", + "
                      \n", + "
                      " + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.10" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/VantageCloud_Lake/UseCases/Customer_Complaints_Analyzer/Topic_Modelling.ipynb b/VantageCloud_Lake/UseCases/Customer_Complaints_Analyzer/Topic_Modelling.ipynb new file mode 100644 index 00000000..e58410f2 --- /dev/null +++ b/VantageCloud_Lake/UseCases/Customer_Complaints_Analyzer/Topic_Modelling.ipynb @@ -0,0 +1,507 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "877a4d39-8348-4330-9121-ec41f89e0b66", + "metadata": {}, + "source": [ + "
                      \n", + "

                      \n", + " Topic Modelling using Vantage and LLM\n", + "
                      \n", + " \"Teradata\"\n", + "

                      \n", + "
                      " + ] + }, + { + "cell_type": "markdown", + "id": "a1cca519-4be2-49f4-85f6-c81f26a6c4fe", + "metadata": {}, + "source": [ + "

                      Introduction:

                      \n", + "\n", + "

                      In this comprehensive user demo, we will delve into the world of topic modeling using Teradata Vantage and AWS Bedrock - Anthropic's Claude LLM model. This cutting-edge technology empowers businesses to uncover hidden insights from vast amounts of consumer complaints data, enabling them to identify trends, improve customer satisfaction, and enhance their overall brand reputation.

                      \n", + "\n", + "

                      Key Features:

                      \n", + "\n", + "
                        \n", + "
                      1. Scalable Data Ingestion: Seamlessly integrate and process large volumes of consumer complaints data from various sources into Teradata Vantage.
                      2. \n", + "
                      3. Advanced Topic Modelling: Utilize state-of-the-art topic modeling algorithms to identify and categorize underlying themes and sentiments within the complaints data, providing actionable insights.
                      4. \n", + "
                      5. Real-time Analytics: Leverage Teradata Vantage's real-time analytics capabilities to monitor and respond to emerging trends and issues in consumer complaints.
                      6. \n", + "
                      7. Customizable Dashboards: Create tailored dashboards to visualize and track key performance indicators (KPIs) and metrics specific to your business needs.
                      8. \n", + "
                      9. Integration with AWS Bedrock - Anthropic's Claude LLM model: Seamlessly integrate with AWS Bedrock - Anthropic's Claude LLM model to analyze consumer complaints data.
                      \n", + "\n", + "

                      Benefits:

                      \n", + "\n", + "
                        \n", + "
                      1. Enhanced Customer Insights: Gain a deeper understanding of customer concerns and preferences, enabling data-driven decision-making.
                      2. \n", + "
                      3. Improved Customer Satisfaction: Identify and address recurring issues, leading to increased customer satisfaction and loyalty.
                      4. \n", + "
                      5. Competitive Advantage: Stay ahead of the competition by proactively addressing consumer complaints and improving brand reputation.
                      6. \n", + "
                      7. Cost Savings: Reduce the financial burden of handling and resolving consumer complaints by identifying and addressing root causes.
                      8. \n", + "
                      9. Data-Driven Decision-Making: Make informed business decisions based on actionable insights derived from topic modeling and real-time analytics.
                      \n", + "\n", + "

                      Steps in the analysis:

                      \n", + "
                        \n", + "
                      1. Configuring the environment
                      2. \n", + "
                      3. Connect to Vantage
                      4. \n", + "
                      5. Configuring AWS Bedrock - Anthropic's Claude LLM model
                      6. \n", + "
                      7. Exploring the data
                      8. \n", + "
                      9. Topic Modelling
                      10. \n", + "
                      11. Cleanup
                      12. \n", + "
                      " + ] + }, + { + "cell_type": "markdown", + "id": "09ec9238-d615-49de-adbe-04165d332bf7", + "metadata": {}, + "source": [ + "
                      \n", + "1. Configuring the environment\n", + "
                      \n", + "

                      1.1 Downloading and installing additional software needed" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ef4ea764-3ec3-43bf-ae6a-528ab9fd5883", + "metadata": {}, + "outputs": [], + "source": [ + "%%capture\n", + "!pip install -r requirements.txt --upgrade --quiet" + ] + }, + { + "cell_type": "markdown", + "id": "d700c07d-c611-455f-9335-35312dbc12a2", + "metadata": {}, + "source": [ + "

                      \n", + "

                      Note: Please restart the kernel after executing these two lines. The simplest way to restart the kernel is to press the zero key twice: 0 0

                      \n", + "
                      " + ] + }, + { + "cell_type": "markdown", + "id": "7b7e2b4e-39a6-4e0b-a4c7-8dbfa7d02e7b", + "metadata": {}, + "source": [ + "
                      \n", + "

                      1.2 Import the required libraries

                      \n", + "

                      Here, we import the required libraries, set environment variables and environment paths (if required).

                      " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0d327004-e3cd-4c48-90d2-e3b1c4602e24", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Data manipulation and analysis\n", + "import numpy as np\n", + "import pandas as pd\n", + "import json, warnings\n", + "import getpass\n", + "\n", + "# Visualization\n", + "import plotly.express as px\n", + "\n", + "# Progress bar\n", + "from tqdm import tqdm\n", + "\n", + "# Machine learning and other utilities from Teradata\n", + "from teradataml import *\n", + "from teradatagenai import TeradataAI, TextAnalyticsAI, VSManager, VectorStore, VSApi\n", + "\n", + "# Requests\n", + "import requests\n", + "\n", + "# Display settings\n", + "display.max_rows = 5\n", + "pd.set_option('display.max_colwidth', None)\n", + "\n", + "# Set display options for dataframes, plots, and warnings\n", + "%matplotlib inline\n", + "warnings.filterwarnings('ignore')\n", + "display.suppress_vantage_runtime_warnings = True" + ] + }, + { + "cell_type": "markdown", + "id": "3cc65823-8d76-4ce4-90c0-7005679e8dcc", + "metadata": {}, + "source": [ + "
                      \n", + "2. Connect to Vantage\n", + "

                      We will be prompted to provide the password. We will enter the password, press the Enter key, and then use the down arrow to go to the next cell.

                      " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "bec3d84c-b60d-4bd8-9d85-9a905d1c01fa", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "print(\"Checking if this environment is ready to connect to VantageCloud Lake...\")\n", + "\n", + "if os.path.exists(\"/home/jovyan/JupyterLabRoot/VantageCloud_Lake/.config/.env\"):\n", + " print(\"Your environment parameter file exist. Please proceed with this use case.\")\n", + " # Load all the variables from the .env file into a dictionary\n", + " env_vars = dotenv_values(\"/home/jovyan/JupyterLabRoot/VantageCloud_Lake/.config/.env\")\n", + " # Create the Context\n", + " eng = create_context(host=env_vars.get(\"host\"), username=env_vars.get(\"username\"), password=env_vars.get(\"my_variable\"))\n", + " execute_sql('''SET query_band='DEMO=text_analytics_teradatagenai_aws_huggingface.ipynb;' UPDATE FOR SESSION;''')\n", + " print(\"Connected to VantageCloud Lake with:\", eng)\n", + "else:\n", + " print(\"Your environment has not been prepared for connecting to VantageCloud Lake.\")\n", + " print(\"Please contact the support team.\")" + ] + }, + { + "cell_type": "markdown", + "id": "cdd806c9-dcd4-4b6a-ae42-854b9cc74187", + "metadata": {}, + "source": [ + "

                      Begin running steps with Shift + Enter keys.

                      " + ] + }, + { + "cell_type": "markdown", + "id": "7e0fc107-abf5-4d22-8ed5-6113299808da", + "metadata": {}, + "source": [ + "
                      \n", + "

                      2. Set up the LLM connection

                      \n", + "\n", + "

                      The teradatagenai Python library can connect to cloud-based LLM services and can also instantiate private models running at scale on local GPU compute. In this case we will use Anthropic's Claude through AWS Bedrock; the model is set to anthropic.claude-v2 in the code below.

                      \n", + "\n", + "
                        \n", + "
                      1. aws_access_key_id: Enter your AWS access key ID\n", + "
                      2. aws_secret_access_key: Enter your AWS secret access key\n", + "
                      3. region name: Enter the AWS region you want to configure (e.g., us-east-1)\n", + "
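
                      When the notebook runs unattended, the three values can come from environment variables instead of prompts. The sketch below is one possible approach, not part of teradatagenai: the helper read_setting is hypothetical, and the variable names are the standard AWS ones (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_DEFAULT_REGION). It falls back to a prompt only when a variable is unset.

```python
import getpass
import os


def read_setting(env_name, prompt, secret=True, environ=None, ask=None):
    """Return environ[env_name] if set and non-empty, otherwise prompt for it."""
    environ = os.environ if environ is None else environ
    value = environ.get(env_name, "").strip()
    if value:
        return value
    if ask is None:
        # Hide keys while typing; a region name is not secret, so echo it.
        ask = getpass.getpass if secret else input
    return ask(prompt).strip()


# Demo with a fake environment and a fake prompt so nothing blocks on input.
fake_env = {"AWS_DEFAULT_REGION": "us-east-1"}
region = read_setting("AWS_DEFAULT_REGION", "region name: ", secret=False, environ=fake_env)
key_id = read_setting("AWS_ACCESS_KEY_ID", "aws_access_key_id: ", environ=fake_env,
                      ask=lambda prompt: "typed-value")
print(region, key_id)
```

                      In the notebook itself, the environ and ask arguments would be left at their defaults so that real environment variables and real prompts are used.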
                          " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "da08bf82-3b16-4b00-981a-99877f12ca41", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "access_key = getpass.getpass('aws_access_key_id: ')\n", + "secret_key = getpass.getpass('aws_secret_access_key: ')\n", + "region_name = getpass.getpass('region name: ')" + ] + }, + { + "cell_type": "markdown", + "id": "66ebf0a6-9b58-418e-bf4d-9dbf3ec365a8", + "metadata": {}, + "source": [ + "
                          \n", + "

                          3. Use the TextAnalyticsAI API to Perform Various Text Analytics Tasks

                          \n", + "

                          You can run help(TextAnalyticsAI) in a code cell to read more about this API.

                          " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "67b80af4-55d3-48e5-95d5-b30224ad484d", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Provide model details\n", + "model_name=\"anthropic.claude-v2\"\n", + "\n", + "# Select in-database or external model\n", + "llm = TeradataAI(api_type = 'AWS',\n", + " model_name = model_name,\n", + " region = region_name,\n", + " # authorization = 'Repositories.BedrockAuth'\n", + " access_key = access_key,\n", + " secret_key = secret_key)\n", + "\n", + "obj = TextAnalyticsAI(llm=llm)" + ] + }, + { + "cell_type": "markdown", + "id": "a9cabc44-d40d-4f27-ab0a-cfc09315e179", + "metadata": {}, + "source": [ + "
                          \n", + "4. Exploring the data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e4076638-4401-4239-a379-2d7df1b209bd", + "metadata": {}, + "outputs": [], + "source": [ + "df = DataFrame('\"DEMO_ComplaintAnalysis\".\"Consumer_Complaints\"')\n", + "df" + ] + }, + { + "cell_type": "markdown", + "id": "aa2e019c-086c-4574-bda8-27494c64c090", + "metadata": {}, + "source": [ + "

                          Here we subset the data to keep only the complaints whose product is Mortgage. We then count those complaints by issue and sub-issue and pick the top 5 topics.
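
                          The filter-then-count logic can be sketched on a small in-memory list with the standard library alone. The rows below are made up for illustration and are not taken from the Consumer_Complaints table; the notebook performs the equivalent steps in-database with teradataml.

```python
from collections import Counter

# Made-up rows standing in for (product, issue, sub_issue) from the complaints table.
rows = [
    ("Mortgage", "Trouble during payment process", None),
    ("Mortgage", "Struggling to pay mortgage", None),
    ("Mortgage", "Trouble during payment process", None),
    ("Credit card", "Billing dispute", None),
    ("Mortgage", "Closing on a mortgage", None),
]

# Keep only Mortgage complaints, then count each (issue, sub_issue) pair.
mortgage = [(issue, sub_issue) for product, issue, sub_issue in rows if product == "Mortgage"]
counts = Counter(mortgage)

# most_common(5) returns up to five groups, highest count first.
top_five = counts.most_common(5)
for (issue, sub_issue), n in top_five:
    print(issue, sub_issue, n)
```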

                          " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c11e365c-0c34-447b-9889-8c7681790cce", + "metadata": {}, + "outputs": [], + "source": [ + "df = df[df.product == 'Mortgage']" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7fd59f8a-349e-4f4f-b689-534b6cb9a97f", + "metadata": {}, + "outputs": [], + "source": [ + "df.select(['issue', 'sub_issue', 'complaint_id']).groupby(['issue', 'sub_issue']).agg(['count']).sort('count_complaint_id', ascending = False)" + ] + }, + { + "cell_type": "markdown", + "id": "deeba736-584f-4fff-8439-fe934764722e", + "metadata": {}, + "source": [ + "

                          According to the result above, we can classify the issues into the following topics:

                          \n", + "\n", + "
                            \n", + "
                          • Mortgage Application: Applying or refinancing\n", + "
                          • Payment Trouble: Issues during payment\n", + "
                          • Mortgage Closing: Finalizing the mortgage\n", + "
                          • Report Inaccuracy: Incorrect information\n", + "
                          • Payment Struggle: Difficulty paying\n", + "
                              " + ] + }, + { + "cell_type": "markdown", + "id": "bdd3a9fe-e8db-463f-b647-28864a6cbc23", + "metadata": {}, + "source": [ + "
                              \n", + "5. Topic Modelling\n", + "\n", + "

                              Topic modeling using Large Language Models (LLMs) revolutionizes the way we understand and categorize vast collections of text data. LLMs excel in understanding the semantics and context of words, enabling sophisticated topic modeling techniques.

                              \n", + "\n", + "

                              Traditionally, topic modeling algorithms like Latent Dirichlet Allocation (LDA) rely on statistical patterns within documents to identify topics. However, LLMs offer a more nuanced approach. By leveraging their deep understanding of language, LLMs can extract complex themes and topics from unstructured text data with higher accuracy and flexibility.

                              \n", + "\n", + "

                              LLMs can generate coherent topics without needing predefined categories, making them ideal for exploratory analysis of diverse datasets. Moreover, their ability to capture subtle nuances in language allows for more precise topic identification, even in noisy or ambiguous texts.

                              \n", + "\n", + "

                              Reasoning with a Chain of Thought: Imagine you're trying to solve a problem. With a large language model, you start with an initial idea or question. Then, you use the model's capabilities to explore related concepts, gradually connecting them together. Each step builds upon the previous one, leading you closer to understanding or solving the problem. It's like putting together puzzle pieces, one by one, until you see the whole picture.
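
                              The step-by-step idea can be sketched as a prompt template that asks the model to reason first and name a topic last, plus a small parser that accepts only topics from the allowed list. This is an illustration under stated assumptions: the function names build_cot_prompt and extract_label are hypothetical, the reply is hand-written, and nothing here describes how teradatagenai builds its own prompts.

```python
def build_cot_prompt(narrative, labels):
    """Assemble a prompt that asks for step-by-step reasoning, then one label."""
    label_lines = "\n".join(f"- {label}" for label in labels)
    return (
        "Read the complaint and think step by step about what the customer is describing.\n"
        f"Then choose exactly one topic from this list:\n{label_lines}\n"
        "Finish with a line of the form 'Label: <topic>'.\n\n"
        f"Complaint: {narrative}"
    )


def extract_label(response, labels):
    """Return the label named on the last 'Label:' line, or None if it is not allowed."""
    for line in reversed(response.splitlines()):
        if line.strip().lower().startswith("label:"):
            candidate = line.split(":", 1)[1].strip()
            for label in labels:
                if candidate.lower() == label.lower():
                    return label
            return None
    return None


labels = ["Mortgage Application", "Payment Trouble", "Mortgage Closing",
          "Report Inaccuracy", "Payment Struggle"]
prompt = build_cot_prompt("My payment was applied to the wrong account.", labels)
# Hand-written stand-in for a model reply; no LLM is called here.
reply = "The customer describes a misapplied payment.\nLabel: payment trouble"
print(extract_label(reply, labels))
```

                              Restricting the parsed answer to the known labels keeps free-form model text from leaking into the Labels column used for counting later.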

                              " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cf854c5a-5b50-4950-b146-bcd137abfa75", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "tdf_topics = obj.classify(column = 'consumer_complaint_narrative', \n", + " data = df,\n", + " labels = ['Mortgage Application',\n", + " 'Payment Trouble',\n", + " 'Mortgage Closing',\n", + " 'Report Inaccuracy',\n", + " 'Payment Struggle'])[['complaint_id','Labels','consumer_complaint_narrative']]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0ae39c62-a536-40ed-aafd-59e2cda7dc41", + "metadata": {}, + "outputs": [], + "source": [ + "tdf_topics" + ] + }, + { + "cell_type": "markdown", + "id": "00cb0cb4-572d-49c4-9068-c4dcfc326c7e", + "metadata": {}, + "source": [ + "

                              Now the results can be saved back to Vantage.

                              " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a394e4d4-c28a-411c-8b03-cc1cc28b6814", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "copy_to_sql(df = tdf_topics, table_name = 'topic_prediction', if_exists = 'replace')" + ] + }, + { + "cell_type": "markdown", + "id": "ea01a05e-2116-4c7c-94db-ac369a14231f", + "metadata": {}, + "source": [ + "
                              \n", + "

                              5.1 Number of Complaints by Predicted Topic

                              \n", + "\n", + "

                              A graph illustrating the Number of Complaints by Predicted Topic reveals that the majority of complaints are centered around Mortgage Application, while the fewest are related to Mortgage Closing.

                              " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "67c13569-4302-400d-965f-288c1e42dcf4", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "grp_gen = DataFrame('topic_prediction').select(['Labels','complaint_id']).groupby(['Labels']).agg(['count']).to_pandas()\n", + "\n", + "grp_gen = grp_gen.sort_values('count_complaint_id', ascending = False)[:10]\n", + "\n", + "fig = px.bar(grp_gen, x='Labels', y='count_complaint_id',\n", + " labels={'count_complaint_id': 'Number of Complaints', 'Labels': 'Labels'},\n", + " title='Number of Complaints by Predicted Topic')\n", + "\n", + "# Add hover information\n", + "fig.update_traces(hovertemplate='Issue: %{x}<br>Number of Complaints: %{y:,}')\n", + "\n", + "fig.show()" + ] + }, + { + "cell_type": "markdown", + "id": "a48cb9e9-e322-4868-81f6-719a48ebb706", + "metadata": {}, + "source": [ + "
                              \n", + "6. Cleanup" + ] + }, + { + "cell_type": "markdown", + "id": "ea475160-bd2f-4b42-bc2b-424aa75eb269", + "metadata": {}, + "source": [ + "

                              Work Tables

                              \n", + "

                              Clean up work tables to prevent errors next time.
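
                              If it is useful to know which drops failed rather than ignoring every error, the cleanup loop in this section can be wrapped in a helper that records failures. The sketch below is hypothetical: drop_tables is not a teradataml function, and fake_drop stands in for db_drop_table so the example runs without a database connection.

```python
def drop_tables(tables, drop_fn):
    """Try to drop each table; return {table_name: error_message} for failures."""
    failures = {}
    for table in tables:
        try:
            drop_fn(table_name=table)
        except Exception as exc:  # keep going, but remember what went wrong
            failures[table] = str(exc)
    return failures


# Stand-in for db_drop_table so the sketch runs without a database connection.
def fake_drop(table_name):
    if table_name != "topic_prediction":
        raise RuntimeError(f"table {table_name} does not exist")


print(drop_tables(["topic_prediction", "not_a_table"], fake_drop))
```

                              In the notebook, passing db_drop_table as drop_fn would apply the same loop to the real work tables.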

                              " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f132db5b-4889-4089-9bd3-ef737aa51d9a", + "metadata": {}, + "outputs": [], + "source": [ + "tables = ['topic_prediction']\n", + "\n", + "# Loop through the list of tables and execute the drop table command for each table\n", + "for table in tables:\n", + " try:\n", + " db_drop_table(table_name=table)\n", + " except Exception:\n", + " pass" + ] + }, + { + "cell_type": "markdown", + "id": "07a477ec-ea8e-4394-afcb-3fce4f3c0d19", + "metadata": {}, + "source": [ + "

                              Databases and Tables

                              \n", + "

                              The following code closes the connection to Vantage. remove_context() also removes the temporary objects that teradataml created during the session.

                              " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ef7255f3-2d27-4b26-b391-149e79ee0d50", + "metadata": {}, + "outputs": [], + "source": [ + "remove_context()" + ] + }, + { + "cell_type": "markdown", + "id": "b6cbe888-1ced-464e-b5e4-05d6af91dd5e", + "metadata": {}, + "source": [ + "
                              \n", + "Dataset:\n", + "
                              \n", + "
                              \n", + "

                              The dataset is sourced from the Consumer Financial Protection Bureau.

                              " + ] + }, + { + "cell_type": "markdown", + "id": "6c5372e5-c7c9-4e9e-908a-6fba19b4e022", + "metadata": {}, + "source": [ + "
                              \n", + "
                              ClearScape Analytics™
                              \n", + "
                              \n", + "
                              \n", + " Copyright © Teradata Corporation - 2024. All Rights Reserved\n", + "
                              \n", + "
                              \n", + "
                              " + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.10" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/VantageCloud_Lake/UseCases/Customer_Complaints_Analyzer/requirements.txt b/VantageCloud_Lake/UseCases/Customer_Complaints_Analyzer/requirements.txt new file mode 100644 index 00000000..7a3fb086 --- /dev/null +++ b/VantageCloud_Lake/UseCases/Customer_Complaints_Analyzer/requirements.txt @@ -0,0 +1,4 @@ +teradataml==20.0.0.5 +teradatagenai>=20.0.0.1 +wordcloud +Pillow \ No newline at end of file From 915dde8332fd22f7101c9474ca9338e121cdeb45 Mon Sep 17 00:00:00 2001 From: chetan-hirapara Date: Tue, 19 Aug 2025 12:25:49 +0000 Subject: [PATCH 2/2] added yaml files for playwrite --- .../.Complaint_Analysis_Customer360.yaml | 10 ++++++++++ .../.Complaint_Summarization.yaml | 10 ++++++++++ .../.Complaints_Classification.yaml | 10 ++++++++++ .../.Complaints_Clustering.yaml | 10 ++++++++++ .../.Sentiment_Analysis.yaml | 10 ++++++++++ .../Customer_Complaints_Analyzer/.Topic_Modelling.yaml | 10 ++++++++++ 6 files changed, 60 insertions(+) create mode 100644 VantageCloud_Lake/UseCases/Customer_Complaints_Analyzer/.Complaint_Analysis_Customer360.yaml create mode 100644 VantageCloud_Lake/UseCases/Customer_Complaints_Analyzer/.Complaint_Summarization.yaml create mode 100644 VantageCloud_Lake/UseCases/Customer_Complaints_Analyzer/.Complaints_Classification.yaml create mode 100644 VantageCloud_Lake/UseCases/Customer_Complaints_Analyzer/.Complaints_Clustering.yaml create mode 100644 VantageCloud_Lake/UseCases/Customer_Complaints_Analyzer/.Sentiment_Analysis.yaml create mode 100644 
VantageCloud_Lake/UseCases/Customer_Complaints_Analyzer/.Topic_Modelling.yaml diff --git a/VantageCloud_Lake/UseCases/Customer_Complaints_Analyzer/.Complaint_Analysis_Customer360.yaml b/VantageCloud_Lake/UseCases/Customer_Complaints_Analyzer/.Complaint_Analysis_Customer360.yaml new file mode 100644 index 00000000..57799a8d --- /dev/null +++ b/VantageCloud_Lake/UseCases/Customer_Complaints_Analyzer/.Complaint_Analysis_Customer360.yaml @@ -0,0 +1,10 @@ +inputs: + - type: env + value: 'AWS_ACCESS_KEY_ID' + cell: 12 + - type: env + value: 'AWS_SECRET_ACCESS_KEY' + cell: 12 + - type: env + value: 'AWS_DEFAULT_REGION' + cell: 12 diff --git a/VantageCloud_Lake/UseCases/Customer_Complaints_Analyzer/.Complaint_Summarization.yaml b/VantageCloud_Lake/UseCases/Customer_Complaints_Analyzer/.Complaint_Summarization.yaml new file mode 100644 index 00000000..57799a8d --- /dev/null +++ b/VantageCloud_Lake/UseCases/Customer_Complaints_Analyzer/.Complaint_Summarization.yaml @@ -0,0 +1,10 @@ +inputs: + - type: env + value: 'AWS_ACCESS_KEY_ID' + cell: 12 + - type: env + value: 'AWS_SECRET_ACCESS_KEY' + cell: 12 + - type: env + value: 'AWS_DEFAULT_REGION' + cell: 12 diff --git a/VantageCloud_Lake/UseCases/Customer_Complaints_Analyzer/.Complaints_Classification.yaml b/VantageCloud_Lake/UseCases/Customer_Complaints_Analyzer/.Complaints_Classification.yaml new file mode 100644 index 00000000..57799a8d --- /dev/null +++ b/VantageCloud_Lake/UseCases/Customer_Complaints_Analyzer/.Complaints_Classification.yaml @@ -0,0 +1,10 @@ +inputs: + - type: env + value: 'AWS_ACCESS_KEY_ID' + cell: 12 + - type: env + value: 'AWS_SECRET_ACCESS_KEY' + cell: 12 + - type: env + value: 'AWS_DEFAULT_REGION' + cell: 12 diff --git a/VantageCloud_Lake/UseCases/Customer_Complaints_Analyzer/.Complaints_Clustering.yaml b/VantageCloud_Lake/UseCases/Customer_Complaints_Analyzer/.Complaints_Clustering.yaml new file mode 100644 index 00000000..57799a8d --- /dev/null +++ 
b/VantageCloud_Lake/UseCases/Customer_Complaints_Analyzer/.Complaints_Clustering.yaml @@ -0,0 +1,10 @@ +inputs: + - type: env + value: 'AWS_ACCESS_KEY_ID' + cell: 12 + - type: env + value: 'AWS_SECRET_ACCESS_KEY' + cell: 12 + - type: env + value: 'AWS_DEFAULT_REGION' + cell: 12 diff --git a/VantageCloud_Lake/UseCases/Customer_Complaints_Analyzer/.Sentiment_Analysis.yaml b/VantageCloud_Lake/UseCases/Customer_Complaints_Analyzer/.Sentiment_Analysis.yaml new file mode 100644 index 00000000..57799a8d --- /dev/null +++ b/VantageCloud_Lake/UseCases/Customer_Complaints_Analyzer/.Sentiment_Analysis.yaml @@ -0,0 +1,10 @@ +inputs: + - type: env + value: 'AWS_ACCESS_KEY_ID' + cell: 12 + - type: env + value: 'AWS_SECRET_ACCESS_KEY' + cell: 12 + - type: env + value: 'AWS_DEFAULT_REGION' + cell: 12 diff --git a/VantageCloud_Lake/UseCases/Customer_Complaints_Analyzer/.Topic_Modelling.yaml b/VantageCloud_Lake/UseCases/Customer_Complaints_Analyzer/.Topic_Modelling.yaml new file mode 100644 index 00000000..57799a8d --- /dev/null +++ b/VantageCloud_Lake/UseCases/Customer_Complaints_Analyzer/.Topic_Modelling.yaml @@ -0,0 +1,10 @@ +inputs: + - type: env + value: 'AWS_ACCESS_KEY_ID' + cell: 12 + - type: env + value: 'AWS_SECRET_ACCESS_KEY' + cell: 12 + - type: env + value: 'AWS_DEFAULT_REGION' + cell: 12