diff --git a/VantageCloud_Lake/Getting_Started/VectorStore/SQL_Fundamentals.pdf b/VantageCloud_Lake/Getting_Started/VectorStore/SQL_Fundamentals.pdf new file mode 100644 index 00000000..d0755c0a Binary files /dev/null and b/VantageCloud_Lake/Getting_Started/VectorStore/SQL_Fundamentals.pdf differ diff --git a/VantageCloud_Lake/Getting_Started/VectorStore/VectorStore_Getting_Started.ipynb b/VantageCloud_Lake/Getting_Started/VectorStore/VectorStore_Getting_Started.ipynb new file mode 100644 index 00000000..e3db4b9e --- /dev/null +++ b/VantageCloud_Lake/Getting_Started/VectorStore/VectorStore_Getting_Started.ipynb @@ -0,0 +1,1419 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "9944e564-a16d-44fb-b9e0-0dd16a45eb1f", + "metadata": {}, + "source": [ + "
\n", + "

\n", + " Vector Store - Getting Started\n", + "
\n", + " \"Teradata\"\n", + "

\n", + "
" + ] + }, + { + "cell_type": "markdown", + "id": "67ed6f6f-da47-40ca-8877-6849fc54060a", + "metadata": {}, + "source": [ + "

Introduction

\n", + "\n", + "

\n", + "Vector stores are specialized databases designed for efficient storage, indexing, and searching of high-dimensional vector embeddings. These embeddings, generated by AI models, enable similarity search and power applications in machine learning, NLP, recommendation systems, and image/video retrieval.\n", + "

\n", + "\n", + "

\n", + "This notebook provides an in-depth exploration of vector stores in the Teradata Database, covering key concepts, vector embedding mechanisms, and efficient search techniques.\n", + "

\n", + "\n", + "---\n", + "\n", + "

Key Concepts

\n", + "\n", + "

What is a Vector Store?

\n", + "\n", + "

\n", + "A vector store contains a vectorized representation of data, typically created using embeddings from AI models. It allows for high-speed similarity searches beyond traditional keyword matching.\n", + "

\n", + "\n", + "

\n", + "Vector embeddings are numerical representations of data (text, images, audio, etc.) mapped into a multi-dimensional space.\n", + "

\n", + "\n", + "\n", + "

\n", + "Each embedding is a vector that captures semantic or content-based relationships. For example:\n", + "

\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
| Data Type | Example | Vector Representation |
|---|---|---|
| Text | \"King\" | [0.12, 0.45, 0.67, ...] |
| Image | A picture of a cat | [0.23, 0.78, 0.55, ...] |
\n", + "\n", + "

\n", + "Embeddings are generated by the AI_TextEmbedding function.\n", + "
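To make the idea of vectors capturing semantic relationships concrete, here is a minimal sketch that compares toy vectors with cosine similarity. The three-dimensional values and the `cosine_similarity` helper are made up for illustration; real embeddings produced by a model have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional embeddings (illustrative values only).
king = [0.9, 0.8, 0.1]
queen = [0.85, 0.82, 0.15]
cat = [0.1, 0.2, 0.95]

sim_king_queen = cosine_similarity(king, queen)
sim_king_cat = cosine_similarity(king, cat)

# Related concepts point in similar directions, so their score is closer to 1.
print(f"king vs queen: {sim_king_queen:.3f}")
print(f"king vs cat:   {sim_king_cat:.3f}")
```

With these toy values the first score is much higher than the second, which is the property a similarity search relies on.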

\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "id": "e0d29c1e-2ba2-4b40-8834-e0fd6fe574f3", + "metadata": {}, + "source": [ + "

What we will do in this Notebook

\n", + "\n", + "

\n", + "This notebook guides us through a set of exercises on the vector store management, ask, and pattern management APIs. By the end of this tutorial, we will have gained hands-on experience with creating and managing vector store objects and patterns, as well as asking questions and performing similarity searches. Here's what we'll learn:\n", + "

\n", + "\n", + "
\n", + "
    \n", + "
  1. Understanding Vector Store management\n", + "
  2. Creating a new Vector Store\n", + "
  3. Using existing Vector Store\n", + "
  4. Access Management in Vector Store\n", + "
  5. Building a Vector Store from PDF Documents\n", + "
  6. Pattern Management for Vector Store\n", + "
  7. Working with metadata-based Vector Store\n", + "\n", + "\n", + "
\n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "id": "2e8c7c2b-0216-451c-b5e5-8972c51fbe63", + "metadata": {}, + "source": [ + "
\n", + "1. Configure the environment" + ] + }, + { + "cell_type": "markdown", + "id": "9ccc135c-7f1d-423a-bdcd-aded45bb0a0d", + "metadata": {}, + "source": [ + "

For this notebook to run properly, we need teradataml version 20.00.00.05 or later and teradatagenai version 20.00.00.1 or later. The commands below check whether each package is installed and install the required version if it is missing.
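The pip commands only test whether a package is present, not whether its version is recent enough. A minimal sketch of an explicit version comparison follows; the helper names `parse_version`, `meets_minimum`, and `installed_version` are hypothetical, and the comparison assumes purely numeric dotted versions such as the ones quoted above.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Convert a dotted version such as '20.00.00.05' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum version."""
    a, b = parse_version(installed), parse_version(minimum)
    # Pad the shorter tuple with zeros so '20.0' compares equal to '20.0.0.0'.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Minimum versions taken from the text above.
REQUIRED = {"teradataml": "20.00.00.05", "teradatagenai": "20.00.00.1"}

for package, minimum in REQUIRED.items():
    found = installed_version(package)
    if found is None:
        print(f"{package}: not installed, need {minimum} or later")
    elif meets_minimum(found, minimum):
        print(f"{package}: {found} is sufficient")
    else:
        print(f"{package}: {found} is older than {minimum}")
```

Comparing tuples of integers avoids the pitfalls of string comparison, where "20.00.00.10" would sort before "20.00.00.5".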

" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "19d7a949-fd55-4f26-9187-61c8bcfff976", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "!pip show teradataml || pip install teradataml==20.00.00.05" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "62bbd4ae-273e-4e1d-9c31-1b92972f4326", + "metadata": {}, + "outputs": [], + "source": [ + "!pip show teradatagenai || pip install teradatagenai==20.00.00.1\n" + ] + }, + { + "cell_type": "markdown", + "id": "8c5a7b4a-41f9-4171-9d8b-ace73987d332", + "metadata": {}, + "source": [ + "
\n", + "

Note: If the above commands install the modules, restart the kernel afterwards so that the newly installed libraries are loaded. The simplest way to restart the kernel is to press zero twice: 0 0

\n", + "
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3c2016c4-a193-4fdd-8ba7-28fb5a769935", + "metadata": {}, + "outputs": [], + "source": [ + "from getpass import getpass\n", + "# Required imports\n", + "from teradatagenai import VSManager, VectorStore, VSPattern, VSApi\n", + "from teradataml import *\n", + "import os\n", + "import sys\n", + "from dotenv import dotenv_values\n", + "\n", + "# Suppress warnings\n", + "import warnings\n", + "\n", + "warnings.filterwarnings('ignore')\n", + "display.suppress_vantage_runtime_warnings = True\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "id": "561440c8-a673-46c6-a9bc-c0141ead34b1", + "metadata": {}, + "source": [ + "
\n", + "

2. Connect to VantageCloud Lake

\n", + "

Connect to VantageCloud using `create_context` from the teradataml Python library. Input your connection details, including the host, username, password and Analytic Compute Group name.

" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "bb7a1034-78fe-4866-9b31-45972607ff2e", + "metadata": {}, + "outputs": [], + "source": [ + "print(\"Checking if this environment is ready to connect to VantageCloud Lake...\")\n", + "\n", + "if os.path.exists(\"/home/jovyan/JupyterLabRoot/VantageCloud_Lake/.config/.env\"):\n", + " print(\"Your environment parameter file exists. Please proceed with this use case.\")\n", + " # Load all the variables from the .env file into a dictionary\n", + " env_vars = dotenv_values(\"/home/jovyan/JupyterLabRoot/VantageCloud_Lake/.config/.env\")\n", + " # Create the Context\n", + " eng = create_context(host=env_vars.get(\"host\"), username=env_vars.get(\"username\"), password=env_vars.get(\"my_variable\"))\n", + " execute_sql('''SET query_band='DEMO=VCL_GettingStarted_VectorStore.ipynb;' UPDATE FOR SESSION;''')\n", + " print(\"Connected to VantageCloud Lake with:\", eng)\n", + "else:\n", + " print(\"Your environment has not been prepared for connecting to VantageCloud Lake.\")\n", + " print(\"Please contact the support team.\")" + ] + }, + { + "cell_type": "markdown", + "id": "83e979f6-4014-462f-9ab7-587b79a22e0f", + "metadata": {}, + "source": [ + "
\n", + "

3. Authenticate into User Environment Service (UES)

\n", + "\n", + "

UES authentication is required to create and manage the Python or R environments that we will be creating. A VantageCloud Lake user can easily create the authentication objects using the Console in a VantageCloud Lake environment. The step to create these authentication objects has already been performed for you.\n", + "

\n", + "

\n", + " \n", + "

" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9744f809-f693-463b-9b01-73119c0e2802", + "metadata": {}, + "outputs": [], + "source": [ + "# We've already loaded all the values into our environment variables and into a dictionary, env_vars.\n", + "# username=env_vars.get(\"username\") isn't required when using base_url, pat and pem.\n", + "ues_uri=env_vars.get(\"ues_uri\")\n", + "if ues_uri.endswith(\"/open-analytics\"):\n", + " ues_uri = ues_uri[:-15] # remove the trailing \"/open-analytics\" (15 characters)\n", + "\n", + "if set_auth_token(base_url=ues_uri,\n", + " pat_token=env_vars.get(\"access_token\"), \n", + " pem_file=env_vars.get(\"pem_file\")\n", + " ):\n", + " print(\"UES Authentication successful\")\n", + "else:\n", + " print(\"UES Authentication failed. Check credentials.\")\n", + " sys.exit(1)" + ] + }, + { + "cell_type": "markdown", + "id": "36ca84ab-4a8a-46b5-8b54-08a4f4f576a2", + "metadata": {}, + "source": [ + "
\n", + "

4. Understanding Vector Store management

\n", + "

VSManager handles vector store management. Its methods are described below.\n", + "

\n", + "

" + ] + }, + { + "cell_type": "markdown", + "id": "6daf439c-85a7-4a7b-9e4b-064767c64655", + "metadata": {}, + "source": [ + "
\n", + "

4.1 health() - Service Health Check\n", + "

What It Does: Checks that the Vector Store service is up and running. It’s like a quick check-up to make sure everything is operational.
\n", + " What We Get: A clean report in the form of a DataFrame, indicating the health status of the service.

" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "197e60de-a4ef-4e7d-8682-534b7d5e17af", + "metadata": {}, + "outputs": [], + "source": [ + "VSManager.health()" + ] + }, + { + "cell_type": "markdown", + "id": "bcc2c5a1-43b4-4194-875a-1ffea1d8b5c5", + "metadata": {}, + "source": [ + "
\n", + "

4.2 list() - List All Vector Stores\n", + "

What It Does: Retrieves a list of all vector stores available in our environment. It’s like checking our inventory of vector stores.
\n", + " What We Get: A DataFrame with the details of each vector store.

" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "83b83fcd-7e35-401d-8906-c65f27473ac1", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "vslist = VSManager.list()\n", + "vslist" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a60035c4-82fe-40fd-bcfe-ea0f7416e502", + "metadata": {}, + "outputs": [], + "source": [ + "# Specify your own Database username to isolate your vector stores among all.\n", + "# vslist[ (vslist['database_name'] == '') & (vslist['permission'] == 'ADMIN') ]" + ] + }, + { + "cell_type": "markdown", + "id": "008e372e-003f-425e-8f08-b759cc8c962d", + "metadata": {}, + "source": [ + "
\n", + "

4.3 list_sessions() - Active Sessions Overview\n", + "

What It Does: Lists all active sessions within the Vector Data Store. Useful for managing our active interactions with the system. This is available only for admin users.
\n", + " What We Get: A DataFrame with details about each active session.

" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4f322d8e-d59a-45a6-be97-4e1e60316854", + "metadata": {}, + "outputs": [], + "source": [ + "sessions = VSManager.list_sessions()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3018afab-0fd1-4538-b55a-2d53c3901522", + "metadata": {}, + "outputs": [], + "source": [ + "sessions.session_details" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1e584812-0620-42ec-91e6-7c98c1dcca83", + "metadata": {}, + "outputs": [], + "source": [ + "sessions.current_session_id" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f458ed7b-bf21-4b77-a19e-fca711a9d824", + "metadata": {}, + "outputs": [], + "source": [ + "sessions.total_active_sessions" + ] + }, + { + "cell_type": "markdown", + "id": "97ee9251-1b15-48bc-b16d-9db4e53dce5f", + "metadata": {}, + "source": [ + "
\n", + "

4.4 disconnect() - End Active Sessions\n", + "

What It Does: Ends an active session identified by its session ID (admin users only). If no session ID is provided, it terminates all active sessions created within the current Python session. This is useful when we’re done working and want to disconnect from the system.
\n", + " What We Get: No return value, but all active sessions are closed.

" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c7846a1f-ea61-4d6f-90de-6d015ad4f947", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "# VSManager.disconnect()" + ] + }, + { + "cell_type": "markdown", + "id": "4619607f-46c4-4135-bdd6-ed1ca8a26040", + "metadata": {}, + "source": [ + "
\n", + "

5. Getting Started with Vector Store

\n", + "

Steps to create Vector Store\n", + "

\n", + "

\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "906bb504-5c67-4c80-a594-1797c5b811af", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "# ACTION: Specify a name for a new vector store.\n", + "vs = VectorStore(name=\"vs_comment\", log=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "700463d5-5773-4b71-9073-e79dab9446f4", + "metadata": {}, + "outputs": [], + "source": [ + "# We need to specify the data source to create the embeddings for the \n", + "# new vector store. The data source can be a table name given as a string \n", + "# (fully qualified, if needed) or a teradataml DataFrame.\n", + "#\n", + "# The following creates a teradataml DataFrame from this table to use as input.\n", + "# Because this use case is for demonstration only, we take a subset of the data.\n", + "input_data = DataFrame.from_query(\"SELECT * FROM DEMO_Retail.Web_Comment where comment_id < 1000\")\n", + "input_data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a3f219d3-6688-4160-9e81-d0d50ac068a0", + "metadata": {}, + "outputs": [], + "source": [ + "# Create the vector store with the name specified earlier.\n", + "# The following specification omits the \"chat_completion_model\" argument.\n", + "# Per the user guide, the default value is \"anthropic.claude-3-haiku-20240307-v1:0\"\n", + "# NOTE: If you ask to create an existing vector store, an error will be thrown.\n", + "vs.create(embeddings_model= 'amazon.titan-embed-text-v1',\n", + " search_algorithm= 'KMEANS',\n", + " seed=10,\n", + " top_k=5,\n", + " metric='EUCLIDEAN',\n", + " object_names= input_data,\n", + " key_columns= ['comment_id'],\n", + " data_columns= ['comment_text'],\n", + " vector_column= 'VectorIndex')" + ] + }, + { + "cell_type": "markdown", + "id": "b289f82a-03da-4831-b601-6fce799f6ca8", + "metadata": {}, + "source": [ + "

Checking the Status of the vector store\n", + "

\n", + "Although the create, update, and destroy APIs are synchronous and return only after the operation is complete, we can confirm the outcome with the status API.
\n", + " Check the status: The status function lets us confirm whether a particular operation succeeded.

" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "feba49df-799f-4a7a-ba8d-59d5df3eee40", + "metadata": {}, + "outputs": [], + "source": [ + "vs.status()" + ] + }, + { + "cell_type": "markdown", + "id": "1661b1fa-3c27-4a47-b41b-e6f53ad998a8", + "metadata": {}, + "source": [ + "

Wait for the status to become Ready before using the vector store.
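Waiting for a status can be automated with a small polling loop. The sketch below is generic: `wait_until_ready` is a hypothetical helper, and `get_status` is a caller-supplied function that returns the current status as a string. How to extract that string from the output of `vs.status()` is not shown in this notebook, so it is left to the caller. The "Creating" value in the demo is a made-up intermediate status.

```python
import time


def wait_until_ready(get_status, ready_value="Ready", timeout_s=600.0,
                     poll_s=5.0, sleep=time.sleep):
    """Call get_status() repeatedly until it returns ready_value.

    Raises TimeoutError if the status is still not ready after timeout_s seconds.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status == ready_value:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"status is still {status!r} after {timeout_s} s")
        sleep(poll_s)


# Demo with a fake status source standing in for the real service.
_fake_statuses = iter(["Creating", "Creating", "Ready"])
result = wait_until_ready(lambda: next(_fake_statuses), poll_s=0.0)
print(result)
```

Passing the status reader as a function keeps the loop independent of any particular client library and makes it easy to test.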

" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a8751cae-1426-406c-ba0d-ca6da429c628", + "metadata": {}, + "outputs": [], + "source": [ + "vs.get_details()" + ] + }, + { + "cell_type": "markdown", + "id": "4a648c5f-0eeb-4bd4-aa76-684b2fb0b7bc", + "metadata": {}, + "source": [ + "

Use the list function to retrieve all available vector stores and confirm if the vector store has been created." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "731128f2-d330-406f-8bf3-167edb816ffb", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "VSManager.list()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "dab3f97e-818c-4a03-9ed7-70954f22e782", + "metadata": {}, + "outputs": [], + "source": [ + "# NOTE: To destroy a vector store, use the destroy() function.\n", + "#vs.destroy()" + ] + }, + { + "cell_type": "markdown", + "id": "2e985e3b-2898-4748-8910-00bada77969f", + "metadata": {}, + "source": [ + "


\n", + "

5.1 Performing a Similarity Search in a Vector Store\n", + "

Similarity search allows us to find the most relevant vectors based on a given input. By passing a question to an existing vector store, we can retrieve the closest matches based on the selected search algorithm.
\n", + "

    How It Works\n", + "
  1. Input a question: Pass a question or input text to the question argument of similarity_search().\n", + "
  2. Perform the Similarity Search: The vector store searches for the most relevant matches using the specified search_algorithm.\n", + "
  3. Retrieve Top Matches: The result contains the top_k most relevant entries along with their similarity scores, helping us understand how closely they match the input.\n", + "
\n", + "
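The steps above can be sketched as a brute-force search over a handful of vectors. This is only an illustration of ranking by Euclidean distance and keeping the `top_k` closest entries. The `top_k_search` helper, the two-dimensional vectors, and the IDs are made up, the embedding of the question is replaced by a hand-written query vector, and the index-based algorithms configured in this notebook (such as KMEANS or HNSW) are not reproduced here.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k_search(query_vec, store, k):
    """Return the k (key, distance) pairs closest to query_vec, nearest first.

    store maps a key (for example a comment ID) to its embedding vector.
    """
    scored = [(key, euclidean(query_vec, vec)) for key, vec in store.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Hypothetical 2-dimensional embeddings keyed by comment ID.
store = {
    101: [0.10, 0.90],
    102: [0.80, 0.20],
    103: [0.15, 0.85],
    104: [0.50, 0.50],
}
query = [0.12, 0.88]  # stands in for the embedded question

matches = top_k_search(query, store, k=2)
for key, distance in matches:
    print(key, round(distance, 4))
```

A smaller distance means a closer match, so the first entry in the result is the most relevant one under this metric.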

This process enables efficient retrieval of information based on vector embeddings, making it useful for applications such as semantic search, recommendation systems, and document retrieval. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b4f5fe77-0265-456b-b80e-efcd163bca73", + "metadata": {}, + "outputs": [], + "source": [ + "question = 'Which item has most comments?'\n", + "response = vs.similarity_search(question=question)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "880c8950-73c0-42d6-8453-0cbb4b029a15", + "metadata": {}, + "outputs": [], + "source": [ + "# The \"similar_objects\" in the response is by default a DataFrame with top_k entries.\n", + "response.similar_objects" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4f5c764a-84a7-471f-a596-85b065578819", + "metadata": {}, + "outputs": [], + "source": [ + "# Alternative option: Using JSON format could be speedier.\n", + "# We repeat the previous call, now specifying return_type='json'\n", + "question = 'Which item has most comments?'\n", + "response = vs.similarity_search(question=question, return_type='json')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1fb6d371-bb87-44ce-b187-74575e9b70d5", + "metadata": {}, + "outputs": [], + "source": [ + "# The \"similar_objects\" in the response is now a JSON object with top_k entries.\n", + "response.similar_objects" + ] + }, + { + "cell_type": "markdown", + "id": "c8084c92-cc15-4912-a04e-4aaa8b77dcab", + "metadata": {}, + "source": [ + "


\n", + "

5.2 Generating a Natural Language Response\n", + "

Once the similarity search method retrieves relevant results from the vector store, we can generate a natural language response by providing a prompt.
\n", + "

    Steps to Prepare the Response\n", + "
  1. Input a question: Pass a question or input text to the question argument of prepare_response().\n", + "
  2. Retrieve Similarity Results: The similarity_search() method returns the most relevant matches along with their similarity scores.\n", + "
  3. Input a prompt: By passing prompt, we ask the model to format the output in a particular way.\n", + "
\n", + "

As we can see below, by passing the question and the prompt, the response is formatted in a conversational manner as suggested in the prompt.

" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3e0fa563-d19b-4a38-9291-3538d934c0d9", + "metadata": {}, + "outputs": [], + "source": [ + "question='Did anyone say anything about the material of the shirt?'\n", + "prompt='Format the response in a conversational way.'\n", + "response = vs.prepare_response(question=question, similarity_results=response, prompt=prompt)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "04f8d25e-523a-4e43-bd8c-0d4175aec2f1", + "metadata": {}, + "outputs": [], + "source": [ + "# NOTE: The \"response\" object is a string. If we ask to display the string itself,\n", + "# the special characters like new lines will not be interpreted.\n", + "# To show the actual text with new lines, pass the \"response\" object to print().\n", + "\n", + "print(response)" + ] + }, + { + "cell_type": "markdown", + "id": "241e3d5b-5b0a-4ee8-8af7-650820b774cf", + "metadata": {}, + "source": [ + "
\n", + "

5.3 ask - Unified Similarity Search & Response Generation\n", + "

To improve efficiency, we can combine similarity search and response generation into a single operation. This method performs a similarity search in the vector store using the input query and then prepares a natural language response based on the retrieved results.
\n", + "

    How It Works\n", + "
  1. Input a question: Pass a question or input text to the question argument of ask().\n", + "
  2. Input a prompt: By passing prompt, we ask the model to format the output in a particular way.\n", + "
\n", + "

Faster Execution: Combining both steps into a single function call reduces response time, making the system more efficient.
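Conceptually, a combined search-and-respond call has to merge the retrieved rows, the formatting instructions, and the question into one request for the chat model. The sketch below shows one generic way to assemble such a prompt. It is not the teradatagenai implementation; `build_prompt` is a hypothetical helper and the sample comments are invented.

```python
def build_prompt(question, retrieved, instructions):
    """Assemble a single text prompt from instructions, retrieved rows, and a question.

    retrieved is a list of (row_id, text) pairs, for example the top_k matches
    returned by a similarity search.
    """
    context_lines = [f"[{row_id}] {text}" for row_id, text in retrieved]
    parts = [instructions.strip(), "", "Context:", *context_lines, "",
             f"Question: {question}"]
    return "\n".join(parts)


# Invented sample rows standing in for similarity search results.
retrieved = [
    (17, "The shirt fabric feels thin."),
    (42, "Great fit, soft cotton."),
]
prompt_text = build_prompt(
    question="Are there any negative comments on shirts?",
    retrieved=retrieved,
    instructions="Only provide information that is present in the context.",
)
print(prompt_text)
```

Keeping the retrieved text in a clearly labelled context section makes it easier to instruct the model to answer only from that data.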

" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "379a2188-ab34-4618-b77c-81efee008eb5", + "metadata": {}, + "outputs": [], + "source": [ + "custom_prompt = \"\"\"Do not assume information. \n", + "Only provide information that is present in the data.\n", + "Format results like this:\n", + "Comment ID: \n", + "Comment: \n", + "\"\"\"\n", + "question ='Are there any negative comments on shirts?'\n", + "response = vs.ask(question=question, prompt=custom_prompt)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5cd35bfa-3472-4767-bd4c-dafd11610264", + "metadata": {}, + "outputs": [], + "source": [ + "print(response)" + ] + }, + { + "cell_type": "markdown", + "id": "de5ed738-be27-41de-af25-7458eb61cc77", + "metadata": {}, + "source": [ + "
\n", + "

6. Initializing and Using an Existing Vector Store

\n", + "

Using an Existing Vector Store
If we have an existing vector store that we want to reuse, we can initialize it by providing the name of the store. This allows us to interact with and perform operations on the existing store without creating a new one. \n", + "

    Steps to Initialize existing Vector Store \n", + "
  • Provide the Name of the Vector Store: Specify the name of an existing vector store that we want to access. In our case, for object_names we will pass in the teradataml DataFrame.\n", + "
  • Initialize the Vector Store Instance: By using the provided name, we can initialize the vector store instance and begin interacting with it.\n", + "
\n", + "

\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "36622aeb-61bb-4f6b-8cd6-7e4626f1d056", + "metadata": {}, + "outputs": [], + "source": [ + "# ACTION: Specify an existing vector store to initialize.\n", + "updated_vs = VectorStore(\"vs_comment\")" + ] + }, + { + "cell_type": "markdown", + "id": "22b3256d-2274-4ccc-a9b3-c69946e3a624", + "metadata": {}, + "source": [ + "
\n", + "

6.1 Updating Arguments of an Existing Vector Store\n", + "

We can update specific parameters of an existing vector store without having to recreate it. For instance, we may want to modify the search algorithm or other configuration settings to better suit our needs. This can be done using the update() API, which allows us to change the arguments of the vector store.
\n", + "

    Steps to Update an Existing Vector Store\n", + "
  1. Access the Existing Vector Store: We first ensure that an existing vector store is already initialized.\n", + "
  2. Use the update() API: Then we call the update() method on the vector store instance and provide the new argument(s) we want to modify, such as the search_algorithm.\n", + "
\n", + "

" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0c6e9451-72ee-4a14-862c-9bbf80301da1", + "metadata": {}, + "outputs": [], + "source": [ + "updated_vs.update(search_algorithm=\"HNSW\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "bbdd8374-99c0-4055-8dab-a951b08a046f", + "metadata": {}, + "outputs": [], + "source": [ + "updated_vs.status()" + ] + }, + { + "cell_type": "markdown", + "id": "34a8421c-579f-4746-9084-68c3fb657dcc", + "metadata": {}, + "source": [ + "
\n", + "

6.2 Performing Similarity Search, Preparing Response, and Asking on the Updated Vector Store\n", + "

Once we have updated the vector store, we can seamlessly perform a similarity search, prepare a natural language response, and ask the updated vector store to retrieve the most relevant information.
\n", + "

    Steps to Perform on the Updated Vector Store\n", + "
  1. Similarity Search: Use the similarity_search() method to find the most relevant matches for your input question based on the updated configuration.\n", + " " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2bfa398c-2513-4cc8-862a-c8cf9a007711", + "metadata": {}, + "outputs": [], + "source": [ + "question = 'What category of item has most comments?'\n", + "response = updated_vs.similarity_search(question=question)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1c62ac4c-0922-436f-b1bd-c64dcfad4c8d", + "metadata": {}, + "outputs": [], + "source": [ + "response.similar_objects" + ] + }, + { + "cell_type": "markdown", + "id": "f6c77aba-fab9-46b4-a90f-34effaf7ff29", + "metadata": {}, + "source": [ + "
      \n", + "
  2. Prepare the Response: After retrieving the search results, use the prepare_response() function to generate a user-friendly, natural language response based on the most relevant matches." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a61c96bf-23a0-418d-abea-866d1fbe577c", + "metadata": {}, + "outputs": [], + "source": [ + "question = 'What category of item has most comments?'\n", + "response = updated_vs.prepare_response(question=question, similarity_results=response)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a9471b30-6998-4a1b-92e0-fb2f0aa66157", + "metadata": {}, + "outputs": [], + "source": [ + "print(response)" + ] + }, + { + "cell_type": "markdown", + "id": "134e270c-7014-4a2d-9774-bdd8e31435d8", + "metadata": {}, + "source": [ + "
        \n", + "
  3. Ask the Updated Vector Store: Combine these operations using the ask() method, which performs the search and generates the response in one call, ensuring faster and more efficient retrieval of information." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e4429a2a-ff77-420e-aff7-7fed004ad203", + "metadata": {}, + "outputs": [], + "source": [ + "custom_prompt = \"\"\"Do not assume information. \n", + "Only provide information that is present in the data.\n", + "Format results like this:\n", + "Comment ID: \n", + "Comment: \n", + "\"\"\"\n", + "question ='Are there any positive comments on shirts?'\n", + "response = updated_vs.ask(question=question, prompt=custom_prompt)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "48d0d617-026d-4a00-a7af-dcb2f3ba7adb", + "metadata": {}, + "outputs": [], + "source": [ + "print(response)" + ] + }, + { + "cell_type": "markdown", + "id": "36491bf5-3132-4efd-81d3-bde3de20a014", + "metadata": {}, + "source": [ + "
        \n", + "

        7. Building a Vector Store from PDF Documents

        \n", + "


        To create a vector store from a PDF document, follow these steps: \n", + "
        1. Create an Instance of Vector Store: First, instantiate a new vector store where the PDF content will be stored. \n", + "

      \n", + "

      " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "441b8dd9-c205-49ef-b3d0-41744cc2d6f8", + "metadata": {}, + "outputs": [], + "source": [ + "# ACTION: Specify a name for a new vector store.\n", + "pdf_vs = VectorStore('vs_sql_pdf')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0b3d5383-9d25-4ea6-9b3f-7891fcd0669d", + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "files = [os.path.join(os.getcwd(), \"SQL_Fundamentals.pdf\")]" + ] + }, + { + "cell_type": "markdown", + "id": "e9626c7d-1271-4265-8310-1a2f87a625fc", + "metadata": {}, + "source": [ + "

      2. Call the create Method
      \n", + "Using the vector store instance, call the create() method and pass the PDF document file(s) as the document_files argument to upload and vectorize the content.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c3a7f156-cc88-4521-afd3-f4f265bf9d63", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "pdf_vs.create(embeddings_model= 'amazon.titan-embed-text-v1',\n", + " search_algorithm= 'VECTORDISTANCE',\n", + " top_k= 10,\n", + " object_names= ['sql_fundamentals'],\n", + " data_columns= ['chunks'],\n", + " vector_column= 'VectorIndex',\n", + " chunk_size= 500,\n", + " optimized_chunking=False,\n", + " document_files=files)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f4ce8b8a-2af0-4b6b-85a3-8f43fae7c36a", + "metadata": { + "scrolled": true + }, + "outputs": [], + "source": [ + "pdf_vs.status()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7b6607a0-285e-488d-b929-0c27acad1563", + "metadata": {}, + "outputs": [], + "source": [ + "VSManager.list()" + ] + }, + { + "cell_type": "markdown", + "id": "5c6be509-11a4-48e2-9fa4-39d418c5fee3", + "metadata": {}, + "source": [ + "


      \n", + "

      7.1 Asking a Question Based on Uploaded Document to the Vector Store\n", + "

      Once we have uploaded our PDF content into the vector store, we can ask specific questions based on the content stored in the vector store. The vector store will use its similarity search to find the most relevant information and provide a detailed response.
      \n", + "Steps to Ask a Question from the Vector Store \n", + "

        \n", + "
      1. Define the Question and Prompt: Formulate the question we would like to ask and provide a prompt for context. In our case, the question is about searching for NULL and NOT NULL values in Teradata.\n", + "
      2. Use the ask Method: Call the ask() method on the vector store instance, passing both the question and prompt. The vector store will then return a response based on the content of the uploaded document.

      \n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "318fac6f-33ce-4e25-9b15-9f8ce0bdf146", + "metadata": {}, + "outputs": [], + "source": [ + "question= 'How to search for Nulls and Not Nulls together in Teradata with an example?'\n", + "prompt= 'Briefly explain, provide a syntax in educational tone.'\n", + "response = pdf_vs.ask(question=question, prompt=prompt)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "820cbe87-68bf-492d-ac50-e38cf883d3f1", + "metadata": {}, + "outputs": [], + "source": [ + "print(response)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "575c8c81-fb78-42de-937c-f0fa0dd8cc0b", + "metadata": {}, + "outputs": [], + "source": [ + "#pdf_vs.destroy()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3048deea-4ff6-400b-b8b0-461467777a2f", + "metadata": {}, + "outputs": [], + "source": [ + "pdf_vs.status()" + ] + }, + { + "cell_type": "markdown", + "id": "c4a8568f-33cf-45af-812a-a4fe61ec0d14", + "metadata": { + "tags": [] + }, + "source": [ + "
      \n", + "

      8. Access Management in Vector Store

      \n", + "

      We can manage both admin and user permissions for users in our vector store. This section shows how to grant and revoke these permissions to control access effectively.
      \n", + " Note: Only an admin user can view permissions or grant/revoke USER/ADMIN access to the vector store.\n", + "

      " + ] + }, + { + "cell_type": "markdown", + "id": "186d42cc-7070-468f-928f-cba2cfd0d5c3", + "metadata": {}, + "source": [ + "
      \n", + "

      8.1 Listing User Permissions in the Vector Store \n", + "

      To view the permissions currently assigned to users in the vector store, we can use the list_user_permissions() method. It displays all users and their respective access rights on the vector store.
      \n", + " The commands below require a separate user ID.

      \n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3922c660-9ee9-417c-85ee-940f79a9ccf9", + "metadata": {}, + "outputs": [], + "source": [ + "vs.list_user_permissions()" + ] + }, + { + "cell_type": "markdown", + "id": "b3e3b422-60ea-4e52-924d-a44ae2ae3d6c", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "source": [ + "
      \n", + "

      8.2 Granting USER Permission\n", + "

      We can grant or revoke USER permission for specific users to control their access level to the vector store. Use the grant.user() method to grant USER permission to a user.

      " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1829e98e-4736-4f03-b012-76803a6a65a4", + "metadata": {}, + "outputs": [], + "source": [ + "updated_vs.grant.user('')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ede2b13b-dbc6-40c5-980f-13957cb28588", + "metadata": {}, + "outputs": [], + "source": [ + "# check granted permission\n", + "updated_vs.list_user_permissions()" + ] + }, + { + "cell_type": "markdown", + "id": "785a62c7-7178-4ed4-a35a-f0c73098ca5f", + "metadata": {}, + "source": [ + "
      \n", + "

      8.3 Revoking USER Permission\n", + "

      Use the revoke.user() method to remove USER permission from a user.

      " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "49610a71-236b-47d6-843e-f669cb088565", + "metadata": {}, + "outputs": [], + "source": [ + "updated_vs.revoke.user('')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c3f2c3aa-fcdd-41bb-9d33-1522e253a86c", + "metadata": {}, + "outputs": [], + "source": [ + "# check revoked permission\n", + "updated_vs.list_user_permissions()" + ] + }, + { + "cell_type": "markdown", + "id": "4a67f91d-2f55-47f4-8ec5-004eb990fd28", + "metadata": {}, + "source": [ + "
      \n", + "

      8.4 Granting ADMIN Permission\n", + "

      We can grant or revoke ADMIN permission for specific users to control their access level to the vector store. Use the grant.admin() method to grant ADMIN permission to a user.

      " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8f58e9af-3732-4c95-80e2-2de6afbed02a", + "metadata": {}, + "outputs": [], + "source": [ + "updated_vs.grant.admin('')" + ] + }, + { + "cell_type": "markdown", + "id": "e606b63b-0295-45ec-b173-0fb2ad796530", + "metadata": {}, + "source": [ + "
      \n", + "

      8.5 Revoking ADMIN Permission\n", + "

      Use the revoke.admin() method to remove ADMIN permission from a user.

      " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e464e5df-a88d-4aa5-ad08-f0f15d57761a", + "metadata": {}, + "outputs": [], + "source": [ + "updated_vs.revoke.admin('')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "22f9c6af-bb6b-4b4a-8abc-ba6d400ac5cb", + "metadata": {}, + "outputs": [], + "source": [ + "updated_vs.list_user_permissions()" + ] + }, + { + "cell_type": "markdown", + "id": "25cac8c6-0265-4c87-a258-9f9598c9657c", + "metadata": {}, + "source": [ + "
      \n", + "9. Cleanup\n", + "

      9.1 Destroying the Vector Store

      \n", + "

      When a vector store is no longer needed, we can permanently delete it to free up resources. The destroy operation removes the vector store and all associated data.
      \n", + "

        Important Considerations\n", + "
      1. Irreversible Action: Once a vector store is destroyed, it cannot be recovered.
      2. \n", + "
      3. Does Not Remove Connection: Destroying a vector store deletes only the vector store; it does not remove the underlying database connection. The instance remains active and can be used to create a new vector store.\n", + "
      4. \n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f43fbb71-ae5f-4b66-bb26-1312faed0382", + "metadata": {}, + "outputs": [], + "source": [ + "updated_vs.destroy()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "512219b5-f604-43e9-abc0-e6b82b1cbcd8", + "metadata": {}, + "outputs": [], + "source": [ + "updated_vs.status()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "30723532-fa34-4f3f-a25b-b79177bb36ac", + "metadata": {}, + "outputs": [], + "source": [ + "pdf_vs.destroy()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3e51fb05-70db-4f89-ab8a-545559655781", + "metadata": {}, + "outputs": [], + "source": [ + "pdf_vs.status()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "41edd1f4-c89d-4f3c-965e-c0aa49b6fa4f", + "metadata": {}, + "outputs": [], + "source": [ + "VSManager.list()" + ] + }, + { + "cell_type": "markdown", + "id": "68bd0c83-62de-49dc-87fb-61e9a8e0bc1a", + "metadata": {}, + "source": [ + "
        \n", + "\n", + "

        9.2 Disconnect from the session

        \n", + "

        Use the VSManager.disconnect() method to disconnect from the session.

        " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "33d6f002-93b1-4955-84a9-750650d6a9a6", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "VSManager.disconnect()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4a5068f4-2507-4ac4-8339-9b41d20eaef8", + "metadata": {}, + "outputs": [], + "source": [ + "remove_context()" + ] + }, + { + "cell_type": "markdown", + "id": "2d75485b-b4e3-4415-ae26-554149520897", + "metadata": {}, + "source": [ + "" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.10" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +}