diff --git a/.tests/skip_files.txt b/.tests/skip_files.txt
index 468e100e..6eefdc9f 100644
--- a/.tests/skip_files.txt
+++ b/.tests/skip_files.txt
@@ -19,3 +19,4 @@
../ModelOps/12_ModelOps_Model_Factory_REST_Python.ipynb
../UseCases/Data_Dictionary/Data_Dictionary_Raw.ipynb
../UseCases/Augmented_call_center_AgenticAI/Augmented_call_center_AgenticAI.ipynb
+../UseCases/Opensource_Data_Science_OAF/Opensource_Data_Science_OAF.ipynb
diff --git a/UseCases/Opensource_Data_Science_OAF/Opensource_Data_Science_OAF.ipynb b/UseCases/Opensource_Data_Science_OAF/Opensource_Data_Science_OAF.ipynb
new file mode 100755
index 00000000..b1f26487
--- /dev/null
+++ b/UseCases/Opensource_Data_Science_OAF/Opensource_Data_Science_OAF.ipynb
@@ -0,0 +1,1091 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "hawaiian-daniel",
+ "metadata": {},
+ "source": [
+ "\n",
+ "\n",
+ " \n",
+ " Leveraging Open Source Machine Learning with ClearScape Analytics and Open Analytics Framework\n",
+ "
\n",
+ " \n",
+ "
Open-source Machine Learning, AI, and Advanced Analytics tools, techniques, and resources offer enterprises limitless opportunities to drive new insights and business value from their internal and external data landscape. Unfortunately, with these opportunities come significant challenges to realizing success. Some of these challenges include:
\n", + "VantageCloud Lake Edition Open Analytics Framework is the only enterprise-class platform that addresses these challenges with a simple, powerful architecture. The following demonstration illustrates how users can take any open-source tool or package of choice, deploy it to a custom, isolated environment, and then execute it in parallel at massive scale.
\n", + "\n", + "This demonstration runs on a VantageCloud Lake Analytic Cluster architecture and uses the shared data sets created in the previous demonstration, specifically the \"Txn_History\" data, which represents \"CashApp\"-style transaction history stored in the Vantage Object File System (OFS).
\n", + "\n", + "The high-level process is as follows:
\n", + "\n", + "\n",
+ "
\n", + " \n", + " | ![]() |
This notebook consists of three primary demonstrations
\n", + "Python Package Imports
\n", + "\n", + "As is standard practice, start by importing the required packages and libraries. Execute this cell to import packages for Teradata automation as well as machine learning, analytics, utility, and data management.
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "southeast-density", + "metadata": {}, + "outputs": [], + "source": [ + "# install other required packages\n", + "%pip install xgboost" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "great-shadow", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Import the Python library teradataml and the specific environment setup modules.\n", + "#\n", + "import warnings\n", + "from teradataml import *\n", + "from db_utils import *\n", + "warnings.filterwarnings('ignore')\n", + "display.suppress_vantage_runtime_warnings = True\n", + "\n", + "from IPython.display import display as ipydisplay\n", + "from IPython.display import clear_output \n", + "\n", + "from sklearn.metrics import accuracy_score, roc_auc_score, confusion_matrix, ConfusionMatrixDisplay\n", + "import matplotlib.pyplot as plt\n", + "#\n", + "# Account for the data types to be used with the script.\n", + "#\n", + "from teradatasqlalchemy.types import BIGINT, VARCHAR, FLOAT, INTEGER\n", + "from collections import OrderedDict\n", + "#\n", + "# Other case-specific imports.\n", + "#\n", + "import json, os, sys, getpass\n", + "import pandas as pd\n", + "from time import sleep\n", + "\n", + "# container name - set here for easier notebook navigation\n", + "### User will also be asked to change it ###\n", + "oaf_name = 'OAF_demo_env'\n", + "###########################\n", + "print(f'using \"{oaf_name}\" for the OAF environment')\n", + "\n", + "# get the current Python version so the custom container is deployed with a matching version\n", + "python_version = str(sys.version_info[0]) + '.' + str(sys.version_info[1])\n", + "print(f'Using Python version {python_version} for user environment')" + ] + }, + { + "cell_type": "markdown", + "id": "muslim-intention", + "metadata": {}, + "source": [ + "Connect to Vantage
\n", + "\n", + "Before performing any operations in Vantage, we need to connect to the system. The code below reads a variables file (vars.json, which was used in the prior environment setup and data engineering examples) and connects to Vantage with this information. The Vantage connection is referred to as a \"Context\", a common Python-RDBMS connection pattern.
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "pretty-forge", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# load vars json\n", + "with open('../../vars.json', 'r') as f:\n", + " session_vars = json.load(f)\n", + "\n", + "# Create the SQLAlchemy Context\n", + "host = session_vars['environment']['host']\n", + "username = session_vars['hierarchy']['users']['business_users'][1]['username']\n", + "password = session_vars['hierarchy']['users']['business_users'][1]['password']\n", + "\n", + "# UES Authentication information\n", + "ues_url = session_vars['environment']['UES_URI']\n", + "configure.ues_url = ues_url\n", + "pat_token = session_vars['hierarchy']['users']['business_users'][1]['pat_token']\n", + "pem_file = session_vars['hierarchy']['users']['business_users'][1]['key_file']\n", + "\n", + "compute_group = session_vars['hierarchy']['users']['business_users'][1]['compute_group']\n", + "\n", + "# check for existing connection\n", + "eng = check_and_connect(host=host, username=username, password=password, compute_group = compute_group)\n", + "print(eng)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e3507baa-0c76-488a-b6af-fb704b0c6542", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# check cluster status\n", + "res = check_cluster_start(compute_group = compute_group)" + ] + }, + { + "cell_type": "markdown", + "id": "offshore-watch", + "metadata": {}, + "source": [ + "Demo 1 - Custom Container Management
\n", + "\n", + "\n", + "\n", + "The Teradata Vantage Python Client Library provides simple, powerful methods for creating and maintaining custom Python runtime environments in the VantageCloud environment. This gives practitioners complete control over the runtime their models and analytics use on the Analytic Cluster, and therefore over model behavior and analytic accuracy. The following demonstration shows how easy it is to create a custom XGBoost-based scoring environment.
\n", + "\n", + "Custom environments are persistent. Users only need to create them once; after that, they can be saved, updated, or modified as needed.
\n", + "\n", + "Container Management Process
\n", + "\n",
+ "
\n", + " \n", + " \n", + " | \n",
+ " ![]() | \n",
+ "
Connect to the Environment Service
\n", + "\n", + "To better support integration with cloud services and common automation tools, the User Environment Service (UES) is accessed via RESTful APIs. These APIs can be called directly or, as in the examples below, through methods of the Teradata Package for Python (teradataml).
\n", + "\n", + "To authenticate to the UES infrastructure, the user must use the same account that is used to connect to the database. The following cell first checks for a valid UES token; if none is found, it calls set_auth_token with the user's personal access token (PAT) and PEM key file read from vars.json.
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "seasonal-jonathan", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# check to see if there is a valid UES auth\n", + "# if not, authenticate\n", + "try:\n", + " demo_env = get_env(oaf_name)\n", + " print('Existing valid UES token')\n", + "\n", + "except Exception as e:\n", + " if '''NoneType' object has no attribute 'value''' in str(e) or '''Failed to execute get_env''' in str(e):\n", + " if set_auth_token(ues_url = ues_url, username = username, pat_token = pat_token, pem_file = pem_file):\n", + " print('UES Authentication successful')\n", + " else:\n", + " print('UES Authentication failed, check URL and account info')\n", + " pass\n", + " else:\n", + " raise\n", + " " + ] + }, + { + "cell_type": "markdown", + "id": "eligible-newfoundland", + "metadata": {}, + "source": [ + "Create a Custom Container in Vantage
\n", + "\n", + "If desired, the user can create a new custom environment by starting with a \"base\" image and customizing it. The steps are:
\n", + "Install Dependencies
\n", + "\n", + "The second step in the customization process is to install Python package dependencies. This set of code:\n", + "
\n", + "\n", + "Demo 2 - Install Custom Models and Scripts
\n", + "\n", + "Once the custom runtime environment has been created, the user can load custom, user-created assets. For the purposes of this demonstration, we will load two files:
\n", + "\n", + "Once again, the Vantage Python Library makes this process straightforward with two simple methods:
\n", + "\n", + "\n",
+ "
\n", + " | \n",
+ " ![]() | \n",
+ "
Install User Files in the Cluster Container
\n", + "\n", + "Users can load any asset into the environment using the install_file method. This ensures that only authenticated users can install files into a dedicated filesystem, which helps prevent malicious code injection. Users pass the file name and a flag indicating whether to replace an existing file.
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "large-luther", + "metadata": {}, + "outputs": [], + "source": [ + "# Install xgboost model file.\n", + "#\n", + "demo_env.install_file('xgb_model', replace = True)\n", + "\n", + "# Install the desired Python script into the environment.\n", + "demo_env.install_file('Demo_XGB_Scoring.py', replace = True)" + ] + }, + { + "cell_type": "markdown", + "id": "minimal-transport", + "metadata": {}, + "source": [ + "List all installed files
\n", + "\n", + "The files property lists each asset's name, size, and last-updated timestamp. As above, these methods let users manage the container remotely, since the containers live in the Vantage environment.
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "running-tribute", + "metadata": {}, + "outputs": [], + "source": [ + "# Verify the files have been installed correctly.\n", + "demo_env.files" + ] + }, + { + "cell_type": "markdown", + "id": "responsible-switzerland", + "metadata": {}, + "source": [ + "Demo 3 - Model Scoring at Scale
\n", + "\n", + "VantageCloud Lake Edition Analytic Clusters combine the power and scale of native ClearScape Analytics functions with open, flexible runtime environments, letting users balance built-in data preparation, transformation, and feature engineering functions with custom code and models at massive scale.
\n", + "\n", + "Enterprise-class customers report reducing data preparation and model scoring times from several hours per run to seconds, effectively enabling near-real-time model scoring.
\n", + "\n", + "This demonstration will illustrate these key concepts:
\n", + "\n", + "\n",
+ "
\n", + " \n", + " | \n",
+ " ![]() | \n",
+ "
Data Transformation/Feature Engineering
\n", + "\n", + "Create a reference to the data set in Vantage, and apply powerful transformation functions directly to the data. ClearScape Analytics is a suite of in-database, massively parallel processing functions for statistical analysis, data cleaning and transformation, machine learning, text analytics, and model scoring. Practitioners can combine these functions with open-source modeling, as illustrated here, or build powerful, native end-to-end pipelines using only these functions.
\n", + "\n", + "Engineer Features
\n", + "\n", + "Call the ClearScape One Hot Encoding function to transform the categorical column into numeric features.
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "imposed-match", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Perform native one-hot encoding on the data\n", + "# These functions use a \"fit-and-transform\" pattern\n", + "# that supports reuse and easier operationalization of the transformation process\n", + "\n", + "from teradataml import OneHotEncodingFit, OneHotEncodingTransform\n", + "\n", + "# Create a reference to the transaction history data set in Vantage\n", + "tdf_test = DataFrame('\"demo_ofs\".\"txn_history\"')\n", + "\n", + "res_ohe = OneHotEncodingFit(data = tdf_test, \n", + " target_column = 'txn_type', \n", + " categorical_values = ['CASH_OUT', 'CASH_IN', 'TRANSFER', 'DEBIT', 'PAYMENT'], \n", + " other_column = 'other',\n", + " is_input_dense = True)\n", + "\n", + "res_transformed = OneHotEncodingTransform(data = tdf_test, object = res_ohe.result, is_input_dense = True)\n", + "res_transformed.result.head(5)" + ] + }, + { + "cell_type": "markdown", + "id": "collectible-gather", + "metadata": {}, + "source": [ + "Execute the Scoring function
\n", + "\n", + "Now that the categorical column has been encoded, the XGBoost model can be called. This is done via Apply, to which we pass:
\n", + "\n", + "Finally, the script is executed by calling the \"execute_script\" method. This \"lazy\" evaluation allows for a more modular and performant architecture.
\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "unlimited-liver", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "\n", + "apply_obj = Apply(data = res_transformed.result.drop(['step', 'nameOrig', 'nameDest', 'isFlaggedFraud'], axis = 1),\n", + " apply_command = 'python3 Demo_XGB_Scoring.py',\n", + " returns = {'txn_id': VARCHAR(20), 'Prob_0': VARCHAR(30), \n", + " 'Prob_1': VARCHAR(30), 'Prediction':VARCHAR(2),\n", + " 'Actual': VARCHAR(2)},\n", + " env_name = demo_env,\n", + " )" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "opening-manner", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Execute the Python script inside the remote user environment.\n", + "# The result is a teradataml DataFrame. \n", + "#\n", + "\n", + "\n", + "scored_data = apply_obj.execute_script()\n", + "\n", + "# Only return five rows - minimize network overhead\n", + "scored_data.head(5)" + ] + }, + { + "cell_type": "markdown", + "id": "chief-falls", + "metadata": {}, + "source": [ + "Analyze the Results
\n", + "\n", + "It is common practice to measure the efficacy of a model. For this demonstration, a \"Confusion Matrix\" is generated that shows the quantity of true vs. false positives and negatives for the model.
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "distinguished-motor", + "metadata": {}, + "outputs": [], + "source": [ + "# Copy the predictions to the client\n", + "# to generate the simple Confusion Matrix\n", + "# and print the AUC (Area Under the ROC Curve)\n", + "\n", + "df_test = scored_data.to_pandas(all_rows = True)\n", + "cm = confusion_matrix(df_test['Actual'].astype(int), df_test['Prediction'].astype(int))\n", + "disp = ConfusionMatrixDisplay(confusion_matrix = cm, display_labels = ['0', '1'])\n", + "fig, ax = plt.subplots(figsize=(10,10))\n", + "disp.plot(ax=ax)\n", + "\n", + "plt.show()\n", + "\n", + "# Get the AUC score from the predicted probability of fraud (Prob_1), not the 0/1 prediction\n", + "# Anything over .75 is decent\n", + "AUC = roc_auc_score(df_test['Actual'].astype(int), df_test['Prob_1'].astype(float))\n", + "print(f'AUC: {AUC}')" + ] + }, + { + "cell_type": "markdown", + "id": "conceptual-crash", + "metadata": {}, + "source": [ + "Disconnect from Vantage
\n", + "\n", + "Once complete, one can remove the custom environment (if desired) and close the \"context\" to the Vantage system.
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e43065f2-19c8-4815-9f3d-3e638325070d", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# check cluster status\n", + "res = check_cluster_stop(compute_group = compute_group)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "tired-purple", + "metadata": {}, + "outputs": [], + "source": [ + "# uninstall the libraries from the environment first before removing it\n", + "demo_env.uninstall_lib(libs = demo_env.libs['name'].to_list())\n", + "remove_env(demo_env.env_name)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fiscal-animal", + "metadata": {}, + "outputs": [], + "source": [ + "remove_context()" + ] + }, + { + "cell_type": "markdown", + "id": "material-groove", + "metadata": {}, + "source": [ + "Appendix - Model Training and Evaluation
\n", + "\n", + "VantageCloud Lake Edition Analytic Clusters and ClearScape Analytics functions can also be leveraged for model training. This appendix shows an abbreviated process for developing and testing an open-source fraud detection model with Vantage and XGBoost.
" + ] + }, + { + "cell_type": "markdown", + "id": "abroad-underground", + "metadata": {}, + "source": [ + "Connect to Vantage
\n", + "\n", + "If necessary, connect to Vantage. If the context from above is still valid, this cell does not need to be run. The code below reads a variables file (vars.json, which was used in the prior environment setup and data engineering examples) and connects to Vantage with this information. The Vantage connection is referred to as a \"Context\", a common Python-RDBMS connection pattern.
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "contemporary-rouge", + "metadata": {}, + "outputs": [], + "source": [ + "# load vars json (same path as the connection cell at the top of the notebook)\n", + "with open('../../vars.json', 'r') as f:\n", + " session_vars = json.load(f)\n", + "\n", + "# Create the SQLAlchemy Context\n", + "host = session_vars['environment']['host']\n", + "username = session_vars['hierarchy']['users']['business_users'][1]['username']\n", + "password = session_vars['hierarchy']['users']['business_users'][1]['password']\n", + "\n", + "# UES Authentication information\n", + "ues_url = session_vars['environment']['UES_URI']\n", + "configure.ues_url = ues_url\n", + "pat_token = session_vars['hierarchy']['users']['business_users'][1]['pat_token']\n", + "pem_file = session_vars['hierarchy']['users']['business_users'][1]['key_file']\n", + "\n", + "compute_group = session_vars['hierarchy']['users']['business_users'][1]['compute_group']\n", + "\n", + "# check for existing connection\n", + "eng = check_and_connect(host=host, username=username, password=password, compute_group = compute_group)\n", + "print(eng)" + ] + }, + { + "cell_type": "markdown", + "id": "modified-services", + "metadata": {}, + "source": [ + "Get a reference to the data
\n", + "\n", + "Create a teradataml DataFrame that references the data set in Vantage. The data could be a table in direct-attached block storage, a table in Performance-Optimized Object Storage (OFS), or data stored in an open format in any object store.
\n", + "\n", + "teradataml DataFrames do not copy data into local memory, so complex analytic and transformation operations can run against data at any scale while leveraging the parallel processing and workload isolation of Vantage Analytic Clusters.
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "american-centre", + "metadata": {}, + "outputs": [], + "source": [ + "# Re-create the data reference to ensure this section uses the same table as above\n", + "tdf_test = DataFrame('\"demo_ofs\".\"txn_history\"')\n", + "tdf_test.head(5)" + ] + }, + { + "cell_type": "markdown", + "id": "terminal-network", + "metadata": {}, + "source": [ + "Engineer Features
\n", + "\n", + "Call the ClearScape One Hot Encoding function to transform the categorical column into numeric features.
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "higher-courage", + "metadata": {}, + "outputs": [], + "source": [ + "from teradataml import OneHotEncodingFit, OneHotEncodingTransform\n", + "\n", + "res_ohe = OneHotEncodingFit(data = tdf_test, \n", + " target_column = 'txn_type', \n", + " categorical_values = ['CASH_OUT', 'CASH_IN', 'TRANSFER', 'DEBIT', 'PAYMENT'], \n", + " other_column = 'other',\n", + " is_input_dense = True)\n", + "\n", + "res_transformed = OneHotEncodingTransform(data = tdf_test, object = res_ohe.result, is_input_dense = True)\n", + "res_transformed.result.head(5)" + ] + }, + { + "cell_type": "markdown", + "id": "billion-drawing", + "metadata": {}, + "source": [ + "Design for Operations
\n", + "\n", + "Persist the \"Fit\" table so it can be reused for the operational transformation of new data.
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "meaning-trading", + "metadata": {}, + "outputs": [], + "source": [ + "# copy the fit table to a permanent table for use later\n", + "res = copy_to_sql(res_ohe.result, table_name = 'OHE_FIT_TABLE', schema_name = 'demo_ofs', if_exists = 'replace')" + ] + }, + { + "cell_type": "markdown", + "id": "cognitive-dream", + "metadata": {}, + "source": [ + "Test/Train Split
\n", + "\n", + "The extraordinarily fast \"Sample\" function splits the data into multiple data sets in seconds. Here it creates a 20% test set (sampleid 1) and an 80% training set (sampleid 2).
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ignored-scholar", + "metadata": {}, + "outputs": [], + "source": [ + "tdf_samples = res_transformed.result.sample(frac = [0.2, 0.8])\n", + "copy_to_sql(tdf_samples[tdf_samples['sampleid'] == 2], table_name = 'txns_train', schema_name = 'demo_ofs', if_exists = 'replace')\n", + "copy_to_sql(tdf_samples[tdf_samples['sampleid'] == 1], table_name = 'txns_test', schema_name = 'demo_ofs', if_exists = 'replace')" + ] + }, + { + "cell_type": "markdown", + "id": "major-nudist", + "metadata": {}, + "source": [ + "Train the Model
\n", + "\n", + "Use the open-source XGBoost classifier to train the model on the \"training\" split created above.
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "demanding-bouquet", + "metadata": {}, + "outputs": [], + "source": [ + "# Create a Pandas DataFrame\n", + "df_train = DataFrame('\"demo_ofs\".\"txns_train\"').to_pandas(all_rows = True)\n", + "\n", + "# define the input columns and target variable:\n", + "X_train = df_train[['txn_type_CASH_OUT', 'txn_type_CASH_IN', 'txn_type_TRANSFER',\n", + " 'txn_type_DEBIT', 'txn_type_PAYMENT', 'txn_type_other', 'amount','oldbalanceOrig', 'newbalanceOrig',\n", + " 'oldbalanceDest', 'newbalanceDest']]\n", + "y_train = df_train[['isFraud']]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "strong-lottery", + "metadata": {}, + "outputs": [], + "source": [ + "# Fit the Model\n", + "warnings.filterwarnings('ignore')\n", + "from xgboost import XGBClassifier\n", + "\n", + "model = XGBClassifier()\n", + "model.fit(X_train, y_train)" + ] + }, + { + "cell_type": "markdown", + "id": "atmospheric-occasions", + "metadata": {}, + "source": [ + "Test the Model
\n", + "\n", + "It is common practice to measure the efficacy of a model. For this demonstration, a \"Confusion Matrix\" is generated that shows the quantity of true vs. false positives and negatives for the model.
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "australian-religion", + "metadata": {}, + "outputs": [], + "source": [ + "# Return a Pandas DataFrame from the split data above\n", + "\n", + "df_test = DataFrame('\"demo_ofs\".\"txns_test\"').to_pandas(all_rows = True)\n", + "\n", + "# Define the input columns and target\n", + "X_test = df_test[['txn_type_CASH_OUT', 'txn_type_CASH_IN', 'txn_type_TRANSFER',\n", + " 'txn_type_DEBIT', 'txn_type_PAYMENT', 'txn_type_other', 'amount','oldbalanceOrig', 'newbalanceOrig',\n", + " 'oldbalanceDest', 'newbalanceDest']]\n", + "y_test = df_test[['isFraud']]\n", + "\n", + "\n", + "# Predict the class and the probability of Fraud\n", + "y_pred = model.predict(X_test)\n", + "y_prob = model.predict_proba(X_test)\n", + "\n", + "\n", + "# Generate the Confusion Matrix\n", + "df_test[['prob_0', 'prob_1']] = y_prob\n", + "df_test['prediction'] = y_pred\n", + "\n", + "cm = confusion_matrix(df_test['isFraud'], df_test['prediction'])\n", + "disp = ConfusionMatrixDisplay(confusion_matrix = cm, display_labels = ['0', '1'])\n", + "fig, ax = plt.subplots(figsize=(10,10))\n", + "disp.plot(ax=ax)\n", + "\n", + "plt.show()\n", + "\n", + "# Get the AUC score from the predicted probability of fraud (prob_1), not the 0/1 prediction\n", + "# Anything over .75 is decent\n", + "AUC = roc_auc_score(df_test['isFraud'], df_test['prob_1'])\n", + "print(f'AUC: {AUC}')" + ] + }, + { + "cell_type": "markdown", + "id": "proper-friendship", + "metadata": {}, + "source": [ + "Save the Model
\n", + "\n", + "Save the model file in native XGBoost format. This is the xgb_model file installed into the user environment in Demo 2.
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "assured-progressive", + "metadata": {}, + "outputs": [], + "source": [ + "model.save_model('xgb_model')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "formed-sheet", + "metadata": {}, + "outputs": [], + "source": [ + "remove_context()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "changed-certification", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "3.10.0", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.0" + }, + "toc-autonumbering": false, + "toc-showmarkdowntxt": true + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/UseCases/Opensource_Data_Science_OAF/images/Container_Layout.png b/UseCases/Opensource_Data_Science_OAF/images/Container_Layout.png new file mode 100755 index 00000000..79fac5d8 Binary files /dev/null and b/UseCases/Opensource_Data_Science_OAF/images/Container_Layout.png differ diff --git a/UseCases/Opensource_Data_Science_OAF/images/In_DB_Functions.png b/UseCases/Opensource_Data_Science_OAF/images/In_DB_Functions.png new file mode 100755 index 00000000..7445ea5f Binary files /dev/null and b/UseCases/Opensource_Data_Science_OAF/images/In_DB_Functions.png differ diff --git a/UseCases/Opensource_Data_Science_OAF/images/ML_Step1.png b/UseCases/Opensource_Data_Science_OAF/images/ML_Step1.png new file mode 100755 index 00000000..8266119f Binary files /dev/null and b/UseCases/Opensource_Data_Science_OAF/images/ML_Step1.png differ diff --git a/UseCases/Opensource_Data_Science_OAF/images/Model.png b/UseCases/Opensource_Data_Science_OAF/images/Model.png new file mode 100755 index 00000000..228bf77b Binary files 
/dev/null and b/UseCases/Opensource_Data_Science_OAF/images/Model.png differ diff --git a/UseCases/Opensource_Data_Science_OAF/images/OAF_Env.png b/UseCases/Opensource_Data_Science_OAF/images/OAF_Env.png new file mode 100755 index 00000000..1be627c3 Binary files /dev/null and b/UseCases/Opensource_Data_Science_OAF/images/OAF_Env.png differ diff --git a/UseCases/Opensource_Data_Science_OAF/images/OAF_Overview.png b/UseCases/Opensource_Data_Science_OAF/images/OAF_Overview.png new file mode 100755 index 00000000..73b29048 Binary files /dev/null and b/UseCases/Opensource_Data_Science_OAF/images/OAF_Overview.png differ diff --git a/UseCases/Opensource_Data_Science_OAF/images/OAF_Scoring.png b/UseCases/Opensource_Data_Science_OAF/images/OAF_Scoring.png new file mode 100755 index 00000000..239be028 Binary files /dev/null and b/UseCases/Opensource_Data_Science_OAF/images/OAF_Scoring.png differ diff --git a/UseCases/Opensource_Data_Science_OAF/images/Overview.png b/UseCases/Opensource_Data_Science_OAF/images/Overview.png new file mode 100755 index 00000000..0ca2cc23 Binary files /dev/null and b/UseCases/Opensource_Data_Science_OAF/images/Overview.png differ diff --git a/UseCases/Opensource_Data_Science_OAF/images/TeradataLogo.png b/UseCases/Opensource_Data_Science_OAF/images/TeradataLogo.png new file mode 100644 index 00000000..a6811164 Binary files /dev/null and b/UseCases/Opensource_Data_Science_OAF/images/TeradataLogo.png differ diff --git a/UseCases/Opensource_Data_Science_OAF/images/new-tab-icon.png b/UseCases/Opensource_Data_Science_OAF/images/new-tab-icon.png new file mode 100644 index 00000000..34b83204 Binary files /dev/null and b/UseCases/Opensource_Data_Science_OAF/images/new-tab-icon.png differ
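Both the "Analyze the Results" and "Test the Model" cells report an AUC. ROC AUC measures how well a score *ranks* fraudulent transactions above legitimate ones, so it is normally computed from the predicted probability of fraud; given only 0/1 predictions, it can evaluate just a single threshold. The following plain-Python sketch (not scikit-learn's implementation) computes ROC AUC from its pairwise-ranking definition on a small, made-up example to show the difference.

```python
from itertools import product


def roc_auc(labels, scores):
    """ROC AUC as the probability that a randomly chosen positive example
    scores higher than a randomly chosen negative one (ties count as 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError('need at least one positive and one negative label')
    wins = 0.0
    # O(P * N) pairwise comparison: fine for illustration, too slow for large data.
    for pos, neg in product(positives, negatives):
        if pos > neg:
            wins += 1.0
        elif pos == neg:
            wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up example: true labels and predicted probabilities of fraud.
labels = [0, 0, 1, 1]
prob_fraud = [0.1, 0.6, 0.4, 0.9]
hard_predictions = [1 if p >= 0.5 else 0 for p in prob_fraud]

print(roc_auc(labels, prob_fraud))        # → 0.75
print(roc_auc(labels, hard_predictions))  # → 0.5
```

With scikit-learn, the equivalent call is `roc_auc_score(labels, prob_fraud)`, with the probability column passed as the second argument.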
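The "Engineer Features" and "Design for Operations" cells rely on a fit-and-transform pattern. The fit step records the category list once, and the persisted fit result is reused so that new data always produces the same set of columns, even when a category is absent or an unseen value appears. The plain-Python sketch below only illustrates that idea: `ohe_fit` and `ohe_transform` are hypothetical helpers, not teradataml functions, and the sketch does not claim to reproduce the exact output layout of OneHotEncodingTransform. The `txn_type_<value>` and `txn_type_other` column names follow the feature list used in the training cell.

```python
def ohe_fit(target_column, categorical_values, other_column='other'):
    """Hypothetical 'fit': record the fixed category list and the catch-all column."""
    return {'target': target_column,
            'values': list(categorical_values),
            'other': other_column}


def ohe_transform(rows, fit):
    """Hypothetical 'transform': replace the target column with 0/1 indicator columns."""
    target = fit['target']
    encoded = []
    for row in rows:
        # Keep every column except the categorical one being encoded.
        new_row = {k: v for k, v in row.items() if k != target}
        value = row[target]
        for category in fit['values']:
            new_row[f'{target}_{category}'] = 1 if value == category else 0
        # Values outside the fitted list are flagged in the catch-all column.
        new_row[f"{target}_{fit['other']}"] = 0 if value in fit['values'] else 1
        encoded.append(new_row)
    return encoded


fit = ohe_fit('txn_type', ['CASH_OUT', 'CASH_IN', 'TRANSFER', 'DEBIT', 'PAYMENT'])
# Hypothetical new rows; 'REFUND' is not one of the fitted categories.
new_rows = [{'txn_id': 'a1', 'txn_type': 'TRANSFER', 'amount': 250.0},
            {'txn_id': 'a2', 'txn_type': 'REFUND', 'amount': 10.0}]
for row in ohe_transform(new_rows, fit):
    print(row)
```

Because the fit result fixes the column list, both rows come out with identical keys. This is the property that makes persisting the fit table (as `OHE_FIT_TABLE`) useful for scoring new data.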
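The "Execute the Scoring function" cell runs `python3 Demo_XGB_Scoring.py` through Apply, but that script is not part of this change. As with the Script table operator, a script run by Apply reads input rows from standard input and writes result rows to standard output. The sketch below shows that shape only, under assumptions that are not taken from the notebook: tab-delimited rows with no header, `txn_id` in the first field and the actual label in the last, and a hypothetical `score_row` function standing in for the XGBoost model. Model loading is deliberately omitted so that the example stays self-contained. The output field order follows the `returns` mapping of the Apply call: txn_id, Prob_0, Prob_1, Prediction, Actual.

```python
DELIMITER = '\t'  # assumption: tab-delimited rows on input and output


def score_row(features):
    """Hypothetical stand-in for the model: returns an estimated P(fraud).

    A real scoring script would load the installed model file and call it here;
    this toy threshold on the first feature only keeps the sketch self-contained.
    """
    return 0.9 if features[0] > 1000 else 0.1


def process(lines, out):
    """Score delimited input rows and write one result row per input row."""
    for line in lines:
        line = line.rstrip('\n')
        if not line:
            continue  # skip blank lines
        fields = line.split(DELIMITER)
        # Assumed layout: txn_id, numeric features..., actual label (last field).
        txn_id, actual = fields[0], fields[-1]
        features = [float(x) for x in fields[1:-1]]
        prob_1 = score_row(features)
        prob_0 = 1.0 - prob_1
        prediction = 1 if prob_1 >= 0.5 else 0
        # Output order mirrors the Apply returns: txn_id, Prob_0, Prob_1, Prediction, Actual.
        out.write(DELIMITER.join(
            [txn_id, str(prob_0), str(prob_1), str(prediction), actual]) + '\n')
```

In an actual script, the entry point would be `process(sys.stdin, sys.stdout)` after `import sys`. Locally, the loop can be exercised by passing a list of strings and an `io.StringIO` buffer. Keeping the parsing separate from `score_row` means the model call can be swapped without touching the stdin/stdout handling.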