diff --git a/VantageCloud_Lake/UseCases/Chatbot_Teradata_Vector_Store/Chatbot_Teradata_Vector_Store.ipynb b/VantageCloud_Lake/UseCases/Chatbot_Teradata_Vector_Store/Chatbot_Teradata_Vector_Store.ipynb new file mode 100644 index 00000000..c8d11127 --- /dev/null +++ b/VantageCloud_Lake/UseCases/Chatbot_Teradata_Vector_Store/Chatbot_Teradata_Vector_Store.ipynb @@ -0,0 +1,696 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "59308445-4a7a-46d1-aa75-7fb1d43e2852", + "metadata": {}, + "source": [ + "
\n", + "

\n", + " ChatBot using Teradata's Enterprise Vector Store\n", + "
\n", + " \"Teradata\"\n", + "

\n", + "
" + ] + }, + { + "cell_type": "markdown", + "id": "9ef48202-fdc2-4d31-9395-ab62354c0143", + "metadata": {}, + "source": [ + "

Introduction:

\n", + "\n", + "

In today’s information-driven world, organizations need smarter, faster, and more efficient ways to interact with vast amounts of data. Traditional search systems, which rely solely on keywords, often fail to provide the most relevant results, especially when dealing with complex or unstructured data. Teradata's Enterprise Vector Store takes a revolutionary approach to this problem by enabling semantic search — the ability to search based on meaning rather than keywords.\n", + "

\n", + "\n", + "

\n", + "In this demo, we showcase the power of Teradata's Enterprise Vector Store combined with conversational AI, demonstrating how the system can intelligently retrieve and present relevant data, all within a chat interface.

\n", + "\n", + "\n", + "
\"mortgage
\n", + "\n", + "
\n", + "

Architecture Overview

\n", + "\n", + " \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "e1d43659-3200-4dab-9478-31577b2feebc", + "metadata": {}, + "source": [ + "

Key Components

\n", + "\n", + " \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "89ba1249-362b-40ab-8921-05f5bb795b22", + "metadata": {}, + "source": [ + "
\n", + "1. Configuring the environment\n", + "
\n", + "

Note: Installing the required libraries takes approximately 4 to 5 minutes the first time. If the libraries are already installed, the cell completes within about 5 seconds.

" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "60640844-b70a-4c11-922b-cc9e02ae592d", + "metadata": {}, + "outputs": [], + "source": [ + "%%capture\n", + "!pip install -r requirements.txt" + ] + }, + { + "cell_type": "markdown", + "id": "6bcf38ca-4b14-4e1e-8fcc-6462e89d4d47", + "metadata": {}, + "source": [ + "

\n", + " The above statements will install the required libraries to run this demo. To gain access to installed libraries after running this, restart the kernel.

" + ] + }, + { + "cell_type": "markdown", + "id": "579119b0-c920-4c1b-bff3-6c14a9c4cedf", + "metadata": {}, + "source": [ + "
\n", + "

Note: The install cell above is needed only if you run the notebook on a platform, other than ClearScape Analytics Experience, that does not already have the libraries installed. After running it, restart the kernel to bring the installed libraries into memory. The simplest way to restart the kernel is to press zero twice: 0 0

" + ] + }, + { + "cell_type": "markdown", + "id": "fec1ac53-54e6-4a2a-8b2d-5d4dfb5a5dca", + "metadata": {}, + "source": [ + "
\n", + "

Note: The first time you run this demo, reload the page after installing the libraries by clicking the 'Reload' button or pressing F5 on your keyboard. This refreshes the notebook so that the Chatbot interface can use the newly installed libraries.

" + ] + }, + { + "cell_type": "markdown", + "id": "fd594369-361b-4abb-b99c-07eb266535c2", + "metadata": {}, + "source": [ + "
\n", + "

1.1 Import the required libraries

\n", + "\n", + "

Here, we import the required libraries, set environment variables and environment paths (if required).

" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "743580c4-a481-49b1-8662-eb605f68ea33", + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "import glob\n", + "from dotenv import dotenv_values\n", + "import panel as pn\n", + "from teradataml import *\n", + "from teradatagenai import VSManager, VectorStore\n", + "from teradataml import create_context, set_auth_token\n", + "import sys\n", + "import time" + ] + }, + { + "cell_type": "markdown", + "id": "6c129348-e588-4b09-8daa-b361e3e51912", + "metadata": {}, + "source": [ + "
\n", + "2. Connect to VantageCloud Lake\n", + "

Connect to VantageCloud using create_context from the teradataml Python library. If this environment has been prepared for connecting to a VantageCloud Lake OAF Container, all the details required will be loaded and you will see an acknowledgement after executing this cell.
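Before the connection is attempted, it can help to confirm that the loaded settings contain everything the connection needs. The sketch below is a hypothetical helper, not part of teradataml; it assumes the key names are the ones this notebook reads from env_vars (host, username, my_variable) and simply reports which of them are missing or empty.

```python
# Hypothetical helper: key names mirror the ones this notebook reads from env_vars.
REQUIRED_KEYS = ('host', 'username', 'my_variable')


def missing_connection_keys(env_vars, required=REQUIRED_KEYS):
    # Return the required keys that are absent or empty in the loaded settings.
    return [key for key in required if not env_vars.get(key)]


# Toy settings dictionary (assumed values, for illustration only).
example = {'host': 'example-host', 'username': 'demo_user', 'my_variable': ''}
print(missing_connection_keys(example))
```

An empty result means all three values are present; any other result lists what to fix in the .env file before connecting.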

" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "196569ab-be81-4ef8-9b43-14a560e75f3a", + "metadata": {}, + "outputs": [], + "source": [ + "print(\"Checking if this environment is ready to connect to VantageCloud Lake...\")\n", + "\n", + "if os.path.exists(\"/home/jovyan/JupyterLabRoot/VantageCloud_Lake/.config/.env\"):\n", + "    print(\"Your environment parameter file exists. Please proceed with this use case.\")\n", + "    # Load all the variables from the .env file into a dictionary\n", + "    env_vars = dotenv_values(\"/home/jovyan/JupyterLabRoot/VantageCloud_Lake/.config/.env\")\n", + "    # Create the Context\n", + "    eng = create_context(host=env_vars.get(\"host\"), username=env_vars.get(\"username\"), password=env_vars.get(\"my_variable\"))\n", + "    execute_sql('''SET query_band='DEMO=Chatbot_Teradata_Vector_Store.ipynb;' UPDATE FOR SESSION;''')\n", + "    print(\"Connected to VantageCloud Lake with:\", eng)\n", + "else:\n", + "    print(\"Your environment has not been prepared for connecting to VantageCloud Lake.\")\n", + "    print(\"Please contact the support team.\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e4f550a3-3f74-4175-a772-97718618269b", + "metadata": {}, + "outputs": [], + "source": [ + "# We've already loaded all the values into our environment variables and into a dictionary, env_vars.\n", + "\n", + "if set_auth_token(base_url=env_vars.get(\"ues_uri\"),\n", + "                  pat_token=env_vars.get(\"access_token\"), \n", + "                  pem_file=env_vars.get(\"pem_file\"),\n", + "                  valid_from=int(time.time())\n", + "                  ):\n", + "    print(\"UES Authentication successful\")\n", + "else:\n", + "    print(\"UES Authentication failed. Check credentials.\")\n", + "    sys.exit(1)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3b8b5b67-82cd-4158-851a-a14cb17d9387", + "metadata": {}, + "outputs": [], + "source": [ + "VSManager.health()" + ] + }, + { + "cell_type": "markdown", + "id": "aeb01b06-6ed5-4e05-92f8-f40d5a2825d7", + "metadata": {}, + "source": [ + "
\n", + "3. Initializing the Vector Store\n", + "

Here, we initialize the Vector Store, which will store the document embeddings. This vector store will be used to index and search the uploaded documents efficiently.

" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "468e0aac-47d4-472e-9894-cbc009f66638", + "metadata": {}, + "outputs": [], + "source": [ + "# Create the vector store\n", + "document_vector_store = VectorStore(\"testing\")" + ] + }, + { + "cell_type": "markdown", + "id": "a0fc4794-3dfa-427e-85da-dcbdb2db3bc5", + "metadata": {}, + "source": [ + "
\n", + "

3.1 File Upload Setup

\n", + "\n", + "

We initialize the Panel extension to create a user interface that allows document uploads. The panel interface enables users to select and upload documents.

" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8180e570-e239-4906-8911-26cfcf2abdbe", + "metadata": {}, + "outputs": [], + "source": [ + "# File upload functionality using Panel\n", + "pn.extension()" + ] + }, + { + "cell_type": "markdown", + "id": "be834101-4a2f-4615-8c4c-63613c1e5c27", + "metadata": {}, + "source": [ + "
\n", + "

3.2 File Upload Handling

\n", + "\n", + "

This function saves the uploaded file into a local directory called data. It checks for supported file types and ensures that the file is saved correctly.
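The file-type check can be combined with a small safety step on the file name itself. The sketch below is an illustration only: safe_pdf_name is a hypothetical helper, not something this notebook or Panel provides. It keeps just the last path component of the uploaded name, so that a name containing directory parts cannot point outside the data folder, and it accepts only the .pdf extension.

```python
import os

# Assumed to match the single file type this demo accepts.
SUPPORTED_EXTENSIONS = ('.pdf',)


def safe_pdf_name(filename):
    # Keep only the final path component, dropping any directory parts.
    name = os.path.basename(filename)
    extension = os.path.splitext(name)[1].lower()
    # Reject empty names and anything that is not a PDF.
    if not name or extension not in SUPPORTED_EXTENSIONS:
        return None
    return name


print(safe_pdf_name('../notes.pdf'))
print(safe_pdf_name('notes.txt'))
```

A caller would save the file only when the helper returns a name, and show an error message otherwise.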

" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6d22df3c-f0be-45e9-be9c-6bbe6a840425", + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "\n", + "# Full path to your desired project folder\n", + "PROJECT_DIR = \"/home/jovyan/JupyterLabRoot/VantageCloud_Lake/UseCases/Chatbot_Teradata_Vector_Store\"\n", + "\n", + "# Ensure current working directory is valid and set to the project folder\n", + "try:\n", + "    _ = os.getcwd()\n", + "except FileNotFoundError:\n", + "    os.chdir(PROJECT_DIR)\n", + "else:\n", + "    if os.getcwd() != PROJECT_DIR:\n", + "        os.chdir(PROJECT_DIR)\n", + "\n", + "print(\"Current Working Directory:\", os.getcwd())" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6dd06e5d-3547-485d-bc77-dc6082449a52", + "metadata": {}, + "outputs": [], + "source": [ + "# Global variable to track upload status\n", + "upload_completed = False" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5eb55627-8d66-4e0e-b27d-cf1d0f5f0759", + "metadata": {}, + "outputs": [], + "source": [ + "def save_uploaded_file(file_content, filename):\n", + "    \"\"\"Save uploaded PDF file to data folder inside PROJECT_DIR\"\"\"\n", + "    global upload_completed\n", + "\n", + "    if file_content is None or filename is None:\n", + "        return \"No file selected\"\n", + "\n", + "    # Create data folder inside PROJECT_DIR if it doesn't exist\n", + "    data_folder = os.path.join(PROJECT_DIR, \"data\")\n", + "    if not os.path.exists(data_folder):\n", + "        os.makedirs(data_folder)\n", + "\n", + "    # Only support PDFs now\n", + "    supported_extensions = ['.pdf']\n", + "    file_extension = os.path.splitext(filename.lower())[1]\n", + "    if file_extension not in supported_extensions:\n", + "        return f\"Error: {filename} is not a supported document type. Only PDFs are allowed.\"\n", + "\n", + "    # Save file to data folder\n", + "    file_path = os.path.join(data_folder, filename)\n", + "    try:\n", + "        with open(file_path, 'wb') as f:\n", + "            f.write(file_content)\n", + "        upload_completed = True\n", + "        return f\"Successfully uploaded: {filename} to {data_folder}\"\n", + "    except Exception as e:\n", + "        return f\"Error saving file: {str(e)}\"\n" + ] + }, + { + "cell_type": "markdown", + "id": "e714226a-7514-4853-a36a-85bb7832a105", + "metadata": {}, + "source": [ + "
\n", + "4. File Input Widget\n", + "

We create a File Input widget using Panel, allowing users to select one or more document files for upload. Only PDF files are supported.

" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9f6a52d9-093f-4b47-8dc8-f99afeed50c2", + "metadata": {}, + "outputs": [], + "source": [ + "# Create file input widget\n", + "file_input = pn.widgets.FileInput(\n", + "    accept='.pdf',\n", + "    multiple=True,\n", + "    sizing_mode='stretch_width'\n", + ")\n", + "\n", + "# Create upload button and status\n", + "upload_button = pn.widgets.Button(name=\"Upload Documents\", button_type=\"primary\")\n", + "status_pane = pn.pane.HTML(\"Please select document files to upload.\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e31f7c93-7dcc-4ba5-b583-d779bf261814", + "metadata": {}, + "outputs": [], + "source": [ + "def handle_upload(event):\n", + "    \"\"\"Handle the upload button click\"\"\"\n", + "    global upload_completed\n", + "    \n", + "    if file_input.value is None:\n", + "        status_pane.object = \"Please select a file first.\"\n", + "        return\n", + "    \n", + "    if isinstance(file_input.value, list):\n", + "        # Multiple files\n", + "        results = []\n", + "        filenames = file_input.filename if isinstance(file_input.filename, list) else [file_input.filename]\n", + "        for i, file_content in enumerate(file_input.value):\n", + "            filename = filenames[i] if i < len(filenames) else f\"file_{i}\"\n", + "            result = save_uploaded_file(file_content, filename)\n", + "            results.append(result)\n", + "        status_messages = \"<br>\".join([f\"{result}\" for result in results])\n", + "        status_pane.object = status_messages\n", + "    else:\n", + "        # Single file\n", + "        result = save_uploaded_file(file_input.value, file_input.filename)\n", + "        if \"Successfully\" in result:\n", + "            status_pane.object = f\"{result}\"\n", + "        else:\n", + "            status_pane.object = f\"{result}\"" + ] + }, + { + "cell_type": "markdown", + "id": "d1d731bb-33b4-4f01-b5df-5a470c21f837", + "metadata": {}, + "source": [ + "
\n", + "

4.1 Upload Button and Status Display

\n", + "\n", + "

This section sets up a button to trigger the file upload process and a status pane that shows the upload progress or completion messages.

" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "255477f9-03dc-4714-addb-39ef61099c5b", + "metadata": {}, + "outputs": [], + "source": [ + "# Bind the upload function to the button\n", + "upload_button.on_click(handle_upload)\n", + "\n", + "# Create upload interface\n", + "upload_interface = pn.Column(\n", + "    pn.pane.HTML(\"📁 Upload Document Files\"),\n", + "    pn.pane.HTML(\"Supported formats: PDF\"),\n", + "    pn.pane.HTML(\"Select one or more PDF files to upload:\"),\n", + "    file_input,\n", + "    pn.Spacer(height=10),\n", + "    upload_button,\n", + "    pn.Spacer(height=10),\n", + "    status_pane,\n", + "    width=600,\n", + "    margin=(10, 10)\n", + ")\n", + "\n", + "# Display the upload interface in the notebook\n", + "upload_interface" + ] + }, + { + "cell_type": "markdown", + "id": "c69d4077-b132-4f2a-bb93-52b146755aa7", + "metadata": {}, + "source": [ + "

Collect the uploaded PDF files from the data folder. These files are the documents that the Vector Store will index; the cell raises an error if no PDF files are found.

" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "33c57156-e374-433b-953c-a53071df92e3", + "metadata": {}, + "outputs": [], + "source": [ + "data_folder = os.path.join(PROJECT_DIR, \"data\")\n", + "supported_patterns = [\"*.pdf\"]\n", + "files = []\n", + "for pattern in supported_patterns:\n", + "    files.extend(glob.glob(os.path.join(data_folder, pattern)))\n", + "\n", + "if len(files) == 0:\n", + "    raise FileNotFoundError(\"No PDF files found in the data directory.\")\n", + "else:\n", + "    print(\"Input PDF files from data folder:\")\n", + "    for file in files:\n", + "        print(os.path.basename(file))" + ] + }, + { + "cell_type": "markdown", + "id": "70784fc5-128f-48b4-b75d-28cc55b8f775", + "metadata": {}, + "source": [ + "
\n", + "

4.2 Creating Vector Store

\n", + "\n", + "

Initialize and configure the Teradata Vector Store with the required parameters. This is the core step where we set up the vector store with the relevant models, algorithms, and document files. The Vector Store will index the uploaded documents and prepare them for fast retrieval using similarity search.
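Indexing documents for similarity search generally starts by splitting each document into smaller chunks, which is what a chunk size setting controls. The sketch below illustrates the idea with plain fixed-size character chunks. It is an assumption for illustration only: how the Vector Store actually splits documents, and what unit its chunk size is measured in, is not shown in this notebook, and chunk_text is a hypothetical function.

```python
def chunk_text(text, chunk_size=100):
    # Split text into consecutive pieces of at most chunk_size characters.
    if chunk_size <= 0:
        raise ValueError('chunk_size must be positive')
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


# A 250-character toy document yields chunks of 100, 100 and 50 characters.
sample = 'a' * 250
print([len(chunk) for chunk in chunk_text(sample)])
```

Smaller chunks give more precise matches but carry less context each, so the chunk size is a trade-off worth tuning for the documents at hand.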

" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3262a5ff-a2a4-404d-8f90-0ebeec508dc9", + "metadata": {}, + "outputs": [], + "source": [ + "document_vector_store.create(\n", + " embeddings_model=\"amazon.titan-embed-text-v2:0\",\n", + " chat_completion_model=\"anthropic.claude-3-5-sonnet-20240620-v1:0\",\n", + " search_algorithm=\"VECTORDISTANCE\",\n", + " top_k=10,\n", + " object_names=\"tbl_testing\",\n", + " data_columns=[\"chunks\"],\n", + " vector_column=\"VectorIndex\",\n", + " chunk_size=100,\n", + " optimized_chunking=False,\n", + " document_files=files,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "73a680b0-95a4-4724-8d74-614287b0fa59", + "metadata": {}, + "source": [ + "

Check the current status of the Teradata Vector Store after it has been created. This step ensures that the Vector Store has been successfully initialized and is ready for processing queries.

" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c0ec921c-8844-4d90-b608-af76e5fa0147", + "metadata": {}, + "outputs": [], + "source": [ + "document_vector_store.status()" + ] + }, + { + "cell_type": "markdown", + "id": "883cc1aa-86e6-4db2-a769-33b6529a5744", + "metadata": {}, + "source": [ + "

The `run_query` function is designed to process and answer user queries based on the document content stored in the Teradata Vector Store. This function leverages the embeddings created from the uploaded documents to retrieve relevant information and provide answers.
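Retrieval from embeddings can be pictured as ranking stored chunks by how close their vectors are to the vector of the question. The sketch below shows that ranking with cosine similarity over a toy in-memory index. It is not how the Teradata Vector Store is implemented; the vectors, the chunk texts, and the function names cosine_similarity and top_k are all made up for illustration.

```python
import math


def cosine_similarity(a, b):
    # Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector, indexed_chunks, k=2):
    # indexed_chunks is a list of (chunk_text, vector) pairs.
    scored = [(cosine_similarity(query_vector, vec), text) for text, vec in indexed_chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy index with hand-written 3-dimensional vectors (assumed values).
toy_index = [
    ('refund policy', [1.0, 0.0, 0.0]),
    ('office lunch menu', [0.0, 1.0, 0.0]),
    ('return and refund steps', [0.9, 0.1, 0.0]),
]
print(top_k([1.0, 0.0, 0.0], toy_index, k=2))
```

In a real system the vectors come from an embedding model and the top-ranked chunks are passed to a chat model as context for the answer.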

" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f4a7acf9-384c-4035-a65c-c5a873d0f9c8", + "metadata": {}, + "outputs": [], + "source": [ + "# Function to run a query from the PDF content\n", + "def run_query(query: str):\n", + " res = document_vector_store.ask(question=query)\n", + " return res" + ] + }, + { + "cell_type": "markdown", + "id": "f87df1e6-fe5c-412e-ad8b-f4ebafc55559", + "metadata": {}, + "source": [ + "

The callback function is responsible for handling the chat messages from the user and providing appropriate responses. It acts as the core mechanism for processing user input and querying the Teradata Vector Store to generate responses based on the uploaded document content.
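Because the callback sits between the chat window and the query function, it is also a convenient place to handle empty input and failures. The sketch below is one possible variation, not the notebook's own callback: make_callback is a hypothetical factory that wraps any question-answering function and keeps the (contents, user, instance) signature used in this demo.

```python
def make_callback(answer_fn):
    # Wrap a question-answering function so that failures become a chat message.
    def callback(contents, user, instance):
        question = (contents or '').strip()
        if not question:
            return 'Please enter a question.'
        try:
            return answer_fn(question)
        except Exception as exc:
            return f'Sorry, the query failed: {exc}'
    return callback


# Stand-in for the real query function, used here only to exercise the wrapper.
def fake_answer(question):
    return 'answer: ' + question


demo_callback = make_callback(fake_answer)
print(demo_callback('What is covered?', 'User', None))
```

With this shape, an error raised while querying shows up as a readable reply in the chat instead of leaving the user waiting.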

" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "930096a7-2c58-4fb8-9923-5ef87bd70a09", + "metadata": {}, + "outputs": [], + "source": [ + "# Callback function for handling chat messages and providing responses\n", + "def callback(contents, user, instance):\n", + " \"\"\"Handles the chat interaction and returns the response.\"\"\"\n", + " # Process the contents of the message\n", + " response = run_query(contents) \n", + " return response" + ] + }, + { + "cell_type": "markdown", + "id": "19a27512-ff10-4b97-b64f-ca817ba15675", + "metadata": {}, + "source": [ + "
\n", + "

Note: The chatbot accesses multiple components, including databases and LLMs, which may cause a brief delay in responses. Your patience is appreciated.

\n", + "
" + ] + }, + { + "cell_type": "markdown", + "id": "4815b98e-f073-4341-9a01-a3d1dd0aa3b2", + "metadata": {}, + "source": [ + "
\n", + "5. Panel's Chat Interface\n", + "

The chatbot uses Panel's ChatInterface to handle the user interface for interactions. This interface allows users to input questions and view responses in real-time, providing an intuitive and smooth experience for engaging with the documents.

" + ] + }, + { + "cell_type": "markdown", + "id": "37019136-98ea-459f-8c18-0cbc932a15d2", + "metadata": {}, + "source": [ + "

\n", + " You can ask the chatbot about anything in the documents you uploaded. Here are some example queries:\n", + "

\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9dff307f-96bf-4834-9ab3-bf1fe9a90f2c", + "metadata": {}, + "outputs": [], + "source": [ + "# Using Panel's ChatInterface for the chatbot UI\n", + "pn.chat.ChatInterface(\n", + " callback=callback,\n", + " show_rerun=False, # Hide rerun button\n", + " show_undo=False, # Hide undo button\n", + " show_clear=False, # Hide clear button\n", + " width=800,\n", + " height=400\n", + ").servable()" + ] + }, + { + "cell_type": "markdown", + "id": "32295f2c-7184-4eb5-9157-ba4721907df0", + "metadata": {}, + "source": [ + "If the chatbot didn't work when you pressed ENTER, on your first time using this demo on your environment, did you use F5 to reload the site? See instructions at the top of the notebook.
\n", + "If you asked a question and got no response after a few minutes, you may need to type 0 0 to restart the kernel and re-run the demo. Questions outside the model seem to confuse the chatbot.
" + ] + }, + { + "cell_type": "markdown", + "id": "f99373cf-30a0-4bbc-be1b-d8b8f53d173c", + "metadata": {}, + "source": [ + "
\n", + "6. Cleanup\n", + "

Call the destroy() method of the VectorStore object to clean up the objects created during this demo.

" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "135d8649-af09-4d58-b7c5-ba364f662695", + "metadata": {}, + "outputs": [], + "source": [ + "# Destroy the vector store after use\n", + "document_vector_store.destroy()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8d639486-b32b-43b9-b38b-0b890d576de0", + "metadata": {}, + "outputs": [], + "source": [ + "remove_context()" + ] + }, + { + "cell_type": "markdown", + "id": "de13f7b7-b27c-4088-a1ee-103a0fc32ef9", + "metadata": {}, + "source": [ + "

Link:

\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "2fad35c0-a3e3-4c23-b76e-ebe1de1a9545", + "metadata": {}, + "source": [ + "" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.10" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/VantageCloud_Lake/UseCases/Chatbot_Teradata_Vector_Store/images/header.png b/VantageCloud_Lake/UseCases/Chatbot_Teradata_Vector_Store/images/header.png new file mode 100644 index 00000000..49229cd9 Binary files /dev/null and b/VantageCloud_Lake/UseCases/Chatbot_Teradata_Vector_Store/images/header.png differ diff --git a/VantageCloud_Lake/UseCases/Chatbot_Teradata_Vector_Store/requirements.txt b/VantageCloud_Lake/UseCases/Chatbot_Teradata_Vector_Store/requirements.txt new file mode 100644 index 00000000..135c4067 --- /dev/null +++ b/VantageCloud_Lake/UseCases/Chatbot_Teradata_Vector_Store/requirements.txt @@ -0,0 +1,8 @@ +teradataml==20.0.0.5 +teradatagenai +streamlit +python-dotenv +plotly +ipykernel +panel +jupyter_bokeh