diff --git a/VantageCloud_Lake/Getting_Started/VectorStore/SQL_Fundamentals.pdf b/VantageCloud_Lake/Getting_Started/VectorStore/SQL_Fundamentals.pdf
new file mode 100644
index 00000000..d0755c0a
Binary files /dev/null and b/VantageCloud_Lake/Getting_Started/VectorStore/SQL_Fundamentals.pdf differ
diff --git a/VantageCloud_Lake/Getting_Started/VectorStore/VectorStore_Getting_Started.ipynb b/VantageCloud_Lake/Getting_Started/VectorStore/VectorStore_Getting_Started.ipynb
new file mode 100644
index 00000000..e3db4b9e
--- /dev/null
+++ b/VantageCloud_Lake/Getting_Started/VectorStore/VectorStore_Getting_Started.ipynb
@@ -0,0 +1,1419 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "9944e564-a16d-44fb-b9e0-0dd16a45eb1f",
+ "metadata": {},
+ "source": [
+ " \n",
+ " Vector Store - Getting Started\n",
+ "
\n",
+ " \n",
+ "
Introduction
\n", + "\n", + "\n", + "Vector stores are specialized databases designed for efficient storage, indexing, and searching of high-dimensional vector embeddings. These embeddings, generated by AI models, enable similarity search and power applications in machine learning, NLP, recommendation systems, and image/video retrieval.\n", + "
\n", + "\n", + "\n", + "This notebook provides an in-depth exploration of vector stores in the Teradata Database, covering key concepts, vector embedding mechanisms, and efficient search techniques.\n", + "
\n", + "\n", + "---\n", + "\n", + "Key Concepts
\n", + "\n", + "What is a Vector Store?
\n", + "\n", + "\n", + "A vector store contains a vectorized representation of data, typically created using embeddings from AI models. It allows for high-speed similarity searches beyond traditional keyword matching.\n", + "
\n", + "\n", + "\n", + "Vector embeddings are numerical representations of data (text, images, audio, etc.) mapped into a multi-dimensional space.\n", + "
\n", + "\n", + "\n", + "\n", + "Each embedding is a vector that captures semantic or content-based relationships. For example:\n", + "
\n", + "\n", + "Data Type | \n", + "Example | \n", + "Vector Representation | \n", + "
---|---|---|
Text | \n", + "\"King\" | \n", + "[0.12, 0.45, 0.67, ...] | \n", + "
Image | \n", + "A picture of a cat | \n", + "[0.23, 0.78, 0.55, ...] | \n", + "
\n", + "Embedding generation is carried out by the AI_TextEmbedding function.\n", + "
\n", + "Supported models\n", + "
Vector Search
\n", + "\n", + "\n", + "Unlike traditional search methods, vector search understands the meaning behind a query. It provides relevant results by analyzing semantic relationships rather than exact keyword matches.\n", + "
\n", + "\n", + "\n", + "Common types of embeddings include:\n", + "
\n", + "\n", + "Vector Store in the Teradata Database
\n", + "\n", + "\n", + "A vector in Teradata is a specialized column type called Vector.\n", + "
\n", + "\n", + "\n", + "The vector store in Teradata consists of schemas and tables that enable vector search functionality.\n", + "
\n", + "\n", + "Vector Search Algorithms
\n", + "\n", + "\n", + "Teradata supports three main search algorithms:\n", + "
\n", + "\n", + "1. VectorDistance
\n", + "2. Kmeans
\n", + "3. HNSW
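To make the idea of an exact, distance-based search concrete, here is a minimal, self-contained sketch in plain Python. It is not Teradata's VectorDistance implementation: the toy vectors, the helper names `cosine_similarity` and `top_k`, and the choice of cosine similarity as the metric are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: List[float], store: Dict[str, List[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Score every stored vector against the query and keep the k most similar."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical 3-dimensional "embeddings"; real embeddings have hundreds of dimensions.
toy_store = {
    "king": [0.90, 0.80, 0.10],
    "queen": [0.85, 0.82, 0.15],
    "cat": [0.10, 0.20, 0.95],
}

results = top_k([0.88, 0.79, 0.12], toy_store, k=2)
for name, score in results:
    print(f"{name}: {score:.4f}")
```

An exact scan like this compares the query against every stored vector, so its cost grows linearly with the number of rows. Clustering-based and graph-based indexes such as K-means and HNSW generally trade a little accuracy for speed by examining only part of the data.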
\n", + "Types of Vector Store
\n", + "\n", + "Type | \n", + "Description | \n", + "Use Case | \n", + "
---|---|---|
1. Content-Based | \n", + "Built on the contents of a table (or file/PDF converted to a table). Queries return the top relevant rows based on similarity. | \n", + "Can be combined with LLM-generated textual responses for more accurate answers. | \n", + "
2. Metadata-Based | \n", + "Built on the metadata of tables. Queries return the top matching tables based on similarity. Used in SQL generation for data retrieval. | \n", + "Helps form textual responses by retrieving data from relevant tables for more precise answers. | \n", + "
What we will do in this Notebook
\n", + "\n", + "\n", + "This notebook guides us through a set of exercises on the vector store management, ask, and pattern management APIs. By the end of this tutorial, we will have gained hands-on experience with creating and managing vector store objects and patterns, as well as asking questions and performing similarity searches. Here's what we'll learn:\n", + "
\n", + "\n", + "For this notebook to run properly, we need teradataml version 20.00.00.05 or later and teradatagenai version 20.00.00.1 or later. The commands below check whether each package is installed and install the required version if it is missing.
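Note that `pip show` only reports whether a package is present; it does not compare versions. If we also want to confirm that an installed package meets the minimum version, a sketch along the following lines may help. The helper names `parse_version`, `meets_minimum`, and `is_installed_at_least` are hypothetical, and the comparison assumes purely numeric, dot-separated versions.

```python
import re
from importlib import metadata
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '20.00.00.05' into (20, 0, 0, 5); stops at the first non-numeric piece."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare two versions numerically, padding the shorter one with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def is_installed_at_least(package: str, minimum: str) -> bool:
    """False if the package is missing or older than the minimum version."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return False
    return meets_minimum(installed, minimum)


# Minimum versions taken from the text above.
for package, minimum in [("teradataml", "20.00.00.05"), ("teradatagenai", "20.00.00.1")]:
    status = "OK" if is_installed_at_least(package, minimum) else "missing or too old"
    print(f"{package} (>= {minimum}): {status}")
```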
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "19d7a949-fd55-4f26-9187-61c8bcfff976", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "!pip show teradataml || pip install teradataml==20.00.00.05" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "62bbd4ae-273e-4e1d-9c31-1b92972f4326", + "metadata": {}, + "outputs": [], + "source": [ + "!pip show teradatagenai || pip install teradatagenai==20.00.00.1\n" + ] + }, + { + "cell_type": "markdown", + "id": "8c5a7b4a-41f9-4171-9d8b-ace73987d332", + "metadata": {}, + "source": [ + "Note:If the above commands install the modules please restart the kernel after executing those lines to bring the installed libraries into memory. The simplest way to restart the Kernel is by typing zero zero: 0 0
\n", + "2. Connect to VantageCloud Lake
\n", + "Connect to VantageCloud using `create_context` from the teradataml Python library. Input your connection details, including the host, username, password and Analytic Compute Group name.
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "bb7a1034-78fe-4866-9b31-45972607ff2e", + "metadata": {}, + "outputs": [], + "source": [ + "print(\"Checking if this environment is ready to connect to VantageCloud Lake...\")\n", + "\n", + "if os.path.exists(\"/home/jovyan/JupyterLabRoot/VantageCloud_Lake/.config/.env\"):\n", + " print(\"Your environment parameter file exist. Please proceed with this use case.\")\n", + " # Load all the variables from the .env file into a dictionary\n", + " env_vars = dotenv_values(\"/home/jovyan/JupyterLabRoot/VantageCloud_Lake/.config/.env\")\n", + " # Create the Context\n", + " eng = create_context(host=env_vars.get(\"host\"), username=env_vars.get(\"username\"), password=env_vars.get(\"my_variable\"))\n", + " execute_sql('''SET query_band='DEMO=VCL_GettingStarted_VectorStore.ipynb;' UPDATE FOR SESSION;''')\n", + " print(\"Connected to VantageCloud Lake with:\", eng)\n", + "else:\n", + " print(\"Your environment has not been prepared for connecting to VantageCloud Lake.\")\n", + " print(\"Please contact the support team.\")" + ] + }, + { + "cell_type": "markdown", + "id": "83e979f6-4014-462f-9ab7-587b79a22e0f", + "metadata": {}, + "source": [ + "3. Authenticate into User Environment Service (UES)
\n", + "\n", + "UES authentication is required to create and manage the Python or R environments that we will be creating. A VantageCloud Lake user can easily create the authentication objects using the Console in a VantageCloud Lake environment. The step to create these authentication objects has already been performed for you.\n", + "
\n", + "\n", + " \n", + "
4. Understanding Vector Store management
\n", + "VSManager handles vector store management. Its methods are described below.\n", + "
4.1 health() - Service Health Check\n", + "
What It Does: Ensures that the Vector Store service is up and running smoothly. It’s like a quick check-up to make sure everything is operational.
\n",
+ " What We Get: A clean report in the form of a DataFrame, indicating the health status of the service.
4.2 list() - List All Vector Stores\n", + "
What It Does: Retrieves a list of all vector stores available in our environment. It’s like checking our inventory of vector stores.
\n",
+ " What We Get: A DataFrame with the details of each vector store.
4.3 list_sessions() - Active Sessions Overview\n", + "
What It Does: Lists all active sessions within the Vector Data Store. Useful for managing our active interactions with the system. This is available only for admin users.
\n",
+ " What We Get: A DataFrame with details about each active session.
4.4 disconnect() - End Active Sessions\n", + "
What It Does: Ends the active session identified by the given session ID (admin users only). If no session ID is provided, it terminates all active sessions created within the current Python session. This is useful when we’re done working and want to disconnect from the system.
\n",
+ " What We Get: No return value, but all active sessions are closed.
5. Getting Started with Vector Store
\n", + "Steps to create Vector Store\n", + "
Checking the Status of the vector store\n", + "
\n",
+ "Although the create, update, and destroy APIs are synchronous and return only after the operation completes, we can confirm the outcome with the status API.
\n",
+ " Check the status: The status function allows us to confirm whether a particular operation succeeded.
Wait for the status to become Ready to use the vector store.
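If we ever need to wait for the Ready state programmatically, a generic polling loop is one option. The sketch below is an assumption-laden illustration rather than part of teradatagenai: `wait_until_ready` is a hypothetical helper, and `get_status` stands for any callable we write ourselves that returns the current status as a string (how to extract that string from the status output is not shown here).

```python
import time
from typing import Callable


def wait_until_ready(
    get_status: Callable[[], str],
    ready_value: str = "READY",
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call get_status() repeatedly until it reports ready_value or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = str(get_status())
        # Case-insensitive exact match, so "Ready" and "READY" both count.
        if status.strip().lower() == ready_value.lower():
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Status is still {status!r} after {timeout_s} seconds")
        sleep(interval_s)


# Demonstration with a fake status source instead of a real vector store.
fake_statuses = iter(["CREATING", "CREATING", "Ready"])
final_status = wait_until_ready(lambda: next(fake_statuses), interval_s=0.0)
print("Final status:", final_status)
```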
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a8751cae-1426-406c-ba0d-ca6da429c628", + "metadata": {}, + "outputs": [], + "source": [ + "vs.get_details()" + ] + }, + { + "cell_type": "markdown", + "id": "4a648c5f-0eeb-4bd4-aa76-684b2fb0b7bc", + "metadata": {}, + "source": [ + "Use the list function to retrieve all available vector stores and confirm if the vector store has been created." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "731128f2-d330-406f-8bf3-167edb816ffb", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "VSManager.list()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "dab3f97e-818c-4a03-9ed7-70954f22e782", + "metadata": {}, + "outputs": [], + "source": [ + "# NOTE: To destroy a vector store, use the destroy() function.\n", + "#vs.destroy()" + ] + }, + { + "cell_type": "markdown", + "id": "2e985e3b-2898-4748-8910-00bada77969f", + "metadata": {}, + "source": [ + "
5.1 Performing a Similarity Search in a Vector Store\n", + "
Similarity search allows us to find the most relevant vectors for a given input. By passing a question to an existing vector store, we can retrieve the closest matches based on the selected search algorithm.
\n",
+ "
This process enables efficient retrieval of information based on vector embeddings, making it useful for applications such as semantic search, recommendation systems, and document retrieval. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b4f5fe77-0265-456b-b80e-efcd163bca73", + "metadata": {}, + "outputs": [], + "source": [ + "question = 'Which item has the most comments?'\n", + "response = vs.similarity_search(question=question)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "880c8950-73c0-42d6-8453-0cbb4b029a15", + "metadata": {}, + "outputs": [], + "source": [ + "# The \"similar_objects\" in the response is by default a DataFrame with top_k entries.\n", + "response.similar_objects" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4f5c764a-84a7-471f-a596-85b065578819", + "metadata": {}, + "outputs": [], + "source": [ + "# Alternative option: Using JSON format may be faster.\n", + "# We repeat the previous call, now specifying return_type='json'\n", + "question = 'Which item has the most comments?'\n", + "response = vs.similarity_search(question=question, return_type='json')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1fb6d371-bb87-44ce-b187-74575e9b70d5", + "metadata": {}, + "outputs": [], + "source": [ + "# The \"similar_objects\" in the response is now a JSON object with top_k entries.\n", + "response.similar_objects" + ] + }, + { + "cell_type": "markdown", + "id": "c8084c92-cc15-4912-a04e-4aaa8b77dcab", + "metadata": {}, + "source": [ + "
5.2 Generating a Natural Language Response\n", + "
Once the similarity search method retrieves relevant results from the vector store, we can generate a natural language response by providing a prompt.
\n",
+ "
As we can see below, by passing the question and the prompt, the response is formatted in a conversational manner, as suggested in the prompt.
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3e0fa563-d19b-4a38-9291-3538d934c0d9", + "metadata": {}, + "outputs": [], + "source": [ + "question='Did anyone say anything about the material of the shirt?'\n", + "prompt='Format the response in a conversational way.'\n", + "response = vs.prepare_response(question=question, similarity_results=response, prompt=prompt)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "04f8d25e-523a-4e43-bd8c-0d4175aec2f1", + "metadata": {}, + "outputs": [], + "source": [ + "# NOTE: The \"response\" object is a string. If we ask to display the string itself,\n", + "# special characters such as new lines will not be interpreted.\n", + "# To show the actual text with new lines, pass the \"response\" object to print().\n", + "\n", + "print(response)" + ] + }, + { + "cell_type": "markdown", + "id": "241e3d5b-5b0a-4ee8-8af7-650820b774cf", + "metadata": {}, + "source": [ + "5.3 ask - Unified Similarity Search & Response Generation\n", + "
To improve efficiency, we can combine similarity search and response generation into a single operation. This method performs a similarity search in the vector store using the input query and then prepares a natural language response based on the retrieved results.
\n",
+ "
Faster execution: Combining both steps into a single function call reduces response time.
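Conceptually, a combined call retrieves the relevant rows and then hands them, together with the question and the custom instructions, to a language model. The plain-Python sketch below illustrates that flow only; it does not show how the vector store implements it. `build_prompt`, `ask_once`, `fake_search`, and `fake_generate` are hypothetical names, and the prompt layout is an assumption.

```python
from typing import Callable, List


def build_prompt(question: str, rows: List[str], instructions: str) -> str:
    """Assemble instructions, retrieved rows, and the question into one prompt string."""
    context = "\n".join(f"- {row}" for row in rows)
    return f"{instructions.strip()}\n\nContext:\n{context}\n\nQuestion: {question}"


def ask_once(
    question: str,
    instructions: str,
    search: Callable[[str], List[str]],
    generate: Callable[[str], str],
) -> str:
    """Retrieve, then generate: one call that chains similarity search and response preparation."""
    rows = search(question)
    return generate(build_prompt(question, rows, instructions))


# Stand-ins for the real retrieval step and the real language model.
def fake_search(question: str) -> List[str]:
    return ["Comment 7: the shirt shrank after one wash", "Comment 12: the fabric feels rough"]


def fake_generate(prompt: str) -> str:
    return f"(model response to a prompt of {len(prompt)} characters)"


answer = ask_once(
    question="Are there any negative comments on shirts?",
    instructions="Only provide information that is present in the data.",
    search=fake_search,
    generate=fake_generate,
)
print(answer)
```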
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "379a2188-ab34-4618-b77c-81efee008eb5", + "metadata": {}, + "outputs": [], + "source": [ + "custom_prompt = \"\"\"Do not assume information. \n", + "Only provide information that is present in the data.\n", + "Format results like this:\n", + "Comment ID: \n", + "Comment: \n", + "\"\"\"\n", + "question ='Are there any negative comments on shirts?'\n", + "response = vs.ask(question=question, prompt=custom_prompt)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5cd35bfa-3472-4767-bd4c-dafd11610264", + "metadata": {}, + "outputs": [], + "source": [ + "print(response)" + ] + }, + { + "cell_type": "markdown", + "id": "de5ed738-be27-41de-af25-7458eb61cc77", + "metadata": {}, + "source": [ + "6. Initializing and Using the Existing Vector
\n", + "Using an Existing Vector Store
If we have an existing vector store that we want to reuse, we can initialize it by providing the name of the store. This allows us to interact with and perform operations on the existing store without creating a new one. \n",
+ "
6.1 Updating Arguments of an Existing Vector Store\n", + "
We can update specific parameters of an existing vector store without having to recreate it. For instance, we may want to modify the search algorithm or other configuration settings to better suit our needs. This can be done using the update() API, which allows us to change the arguments of the vector store.
\n",
+ "
6.2 Performing Similarity Search, Preparing Response, and Asking on the Updated Vector Store\n", + "
Once we have updated the vector store, we can perform a similarity search, prepare a natural language response, and ask the updated vector store to retrieve the most relevant information.
\n",
+ "
7. Building a Vector Store from PDF Documents
\n", + "
To create a vector store from a PDF document, follow these steps: \n",
+ "
1. Create an Instance of Vector Store: First, instantiate a new vector store where the PDF content will be stored. \n",
+ "
2. Call the create Method
\n",
+ "Using the vector store instance, call the create() method and pass the PDF document file(s) as the document_files argument to upload and vectorize the content.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "c3a7f156-cc88-4521-afd3-f4f265bf9d63",
+ "metadata": {
+ "editable": true,
+ "slideshow": {
+ "slide_type": ""
+ },
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "pdf_vs.create(embeddings_model= 'amazon.titan-embed-text-v1',\n",
+ " search_algorithm= 'VECTORDISTANCE',\n",
+ " top_k= 10,\n",
+ " object_names= ['sql_fundamentals'],\n",
+ " data_columns= ['chunks'],\n",
+ " vector_column= 'VectorIndex',\n",
+ " chunk_size= 500,\n",
+ " optimized_chunking=False,\n",
+ " document_files=files)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "f4ce8b8a-2af0-4b6b-85a3-8f43fae7c36a",
+ "metadata": {
+ "scrolled": true
+ },
+ "outputs": [],
+ "source": [
+ "pdf_vs.status()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "7b6607a0-285e-488d-b929-0c27acad1563",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "VSManager.list()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5c6be509-11a4-48e2-9fa4-39d418c5fee3",
+ "metadata": {},
+ "source": [
+ "
7.1 Asking a Question Based on Uploaded Document to the Vector Store\n", + "
Once we have uploaded our PDF content into the vector store, we can ask specific questions based on the content stored in the vector store. The vector store will use its similarity search to find the most relevant information and provide a detailed response.
\n",
+ "Steps to Ask a Question from the Vector Store \n",
+ "
8. Access Management in Vector Store
\n", + "We can manage both admin and user permissions for users in our vector store. This section shows how to grant and revoke these permissions to control access effectively.
\n",
+ " Note: Only an admin user can view permissions or grant/revoke user/admin access to the vector store.\n",
+ "
8.1 Listing User Permissions in the Vector Store \n", + "
To view the permissions currently assigned to users in the vector store, we can use the list_user_permissions() method. This will display a list of all users and the respective access rights they have on the Vector Store.
\n",
+ " For the commands below, we'll need a separate user ID.
8.2 Granting USER Permission\n", + "
We can grant or revoke USER permission for specific users to control their access level to the vector store.
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1829e98e-4736-4f03-b012-76803a6a65a4", + "metadata": {}, + "outputs": [], + "source": [ + "updated_vs.grant.user('8.3 Revoke USER Permission\n", + "
Use the revoke.user() method to remove USER permission from a user.
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "49610a71-236b-47d6-843e-f669cb088565", + "metadata": {}, + "outputs": [], + "source": [ + "updated_vs.revoke.user('8.4 Granting ADMIN Permission\n", + "
We can grant or revoke ADMIN permission for specific users to control their access level to the vector store.
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8f58e9af-3732-4c95-80e2-2de6afbed02a", + "metadata": {}, + "outputs": [], + "source": [ + "updated_vs.grant.admin('8.5 Revoke ADMIN Permission\n", + "
Use the revoke.admin() method to remove ADMIN permission from a user.
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e464e5df-a88d-4aa5-ad08-f0f15d57761a", + "metadata": {}, + "outputs": [], + "source": [ + "updated_vs.revoke.admin('9.1 Destroying the Vector Store
\n", + "When a vector store is no longer needed, we can permanently delete it to free up resources. The destroy operation removes the vector store and all associated data.
\n",
+ "
9.2 Disconnect from the session
\n", + "Use the VSManager.disconnect() method to disconnect from the session.
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "33d6f002-93b1-4955-84a9-750650d6a9a6", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "VSManager.disconnect()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4a5068f4-2507-4ac4-8339-9b41d20eaef8", + "metadata": {}, + "outputs": [], + "source": [ + "remove_context()" + ] + }, + { + "cell_type": "markdown", + "id": "2d75485b-b4e3-4415-ae26-554149520897", + "metadata": {}, + "source": [ + "" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.10" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +}