diff --git a/aws_sagemaker_studio/frameworks/mxnet_onnx_ei/mxnet_onnx_ei.ipynb b/aws_sagemaker_studio/frameworks/mxnet_onnx_ei/mxnet_onnx_ei.ipynb index 3fa8a51c35..a002e89ca4 100644 --- a/aws_sagemaker_studio/frameworks/mxnet_onnx_ei/mxnet_onnx_ei.ipynb +++ b/aws_sagemaker_studio/frameworks/mxnet_onnx_ei/mxnet_onnx_ei.ipynb @@ -4,7 +4,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Hosting ONNX models with Amazon Elastic Inference\n", + "# Hosting ONNX models with Amazon Elastic Inference (Testing)\n", "\n", "*(This notebook was tested with the \"Python 3 (MXNet CPU Optimized)\" kernel.)*\n", "\n", diff --git a/aws_sagemaker_studio/introduction_to_amazon_algorithms/xgboost_abalone/new_markdown_notebook.ipynb b/aws_sagemaker_studio/introduction_to_amazon_algorithms/xgboost_abalone/new_markdown_notebook.ipynb new file mode 100644 index 0000000000..1da3151775 --- /dev/null +++ b/aws_sagemaker_studio/introduction_to_amazon_algorithms/xgboost_abalone/new_markdown_notebook.ipynb @@ -0,0 +1,42 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Hello World!" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "How are you?" + ] + } + ], + "metadata": { + "anaconda-cloud": {}, + "celltoolbar": "Tags", + "instance_type": "ml.t3.medium", + "kernelspec": { + "display_name": "Python 3 (Data Science)", + "language": "python", + "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-west-2:236514542706:image/datascience-1.0" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.6" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/aws_sagemaker_studio/sagemaker_algorithms/linear_learner_mnist/new_code_notebook.ipynb b/aws_sagemaker_studio/sagemaker_algorithms/linear_learner_mnist/new_code_notebook.ipynb new file mode 100644 index 0000000000..c8714b6f85 --- /dev/null +++ b/aws_sagemaker_studio/sagemaker_algorithms/linear_learner_mnist/new_code_notebook.ipynb @@ -0,0 +1,60 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Some Dumb Notebook I Wrote" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This notebook has no purpose" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import sagemaker" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(\"Hello world with code!\")" + ] + } + ], + "metadata": { + "anaconda-cloud": {}, + "celltoolbar": "Tags", + "instance_type": "ml.t3.medium", + "kernelspec": { + "display_name": "Python 3 (Data Science)", + "language": "python", + "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-west-2:236514542706:image/datascience-1.0" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.6" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/ingest_data/ingest-data-types/ingest_tabular_data.ipynb b/ingest_data/ingest-data-types/ingest_tabular_data_rename.ipynb similarity index 89% rename from ingest_data/ingest-data-types/ingest_tabular_data.ipynb rename to 
ingest_data/ingest-data-types/ingest_tabular_data_rename.ipynb index c8214d3a6b..60b895e40a 100644 --- a/ingest_data/ingest-data-types/ingest_tabular_data.ipynb +++ b/ingest_data/ingest-data-types/ingest_tabular_data_rename.ipynb @@ -200,25 +200,6 @@ "with fs.open(data_s3fs_location) as f:\n", " print(pd.read_csv(f, nrows=5))" ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### 3.2 AWS Data Wrangler\n", - "[AWS Data Wrangler](https://github.com/awslabs/aws-data-wrangler) is an open-source Python library that extends the power of the Pandas library to AWS, connecting DataFrames and AWS data-related services (Amazon Redshift, AWS Glue, Amazon Athena, Amazon EMR, Amazon QuickSight, etc.), which we will cover in later sections. It is built on top of other open-source projects like Pandas, Apache Arrow, Boto3, s3fs, SQLAlchemy, Psycopg2 and PyMySQL, and offers abstracted functions to execute common ETL tasks like loading/unloading data from data lakes, data warehouses, and databases. Note that you would need `s3fs version > 0.4.0` for the `awswrangler csv reader` to work." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "data_wr_location = \"s3://{}/{}/{}\".format(bucket, prefix, filename) # S3 URL\n", - "wr_data = wr.s3.read_csv(path=data_wr_location, nrows=5)\n", - "wr_data.head()" - ] } ], "metadata": { diff --git a/ingest_data/ingest-data-types/ingest_text_data.ipynb b/ingest_data/ingest-data-types/ingest_text_data_rename.ipynb similarity index 95% rename from ingest_data/ingest-data-types/ingest_text_data.ipynb rename to ingest_data/ingest-data-types/ingest_text_data_rename.ipynb index b2a4c58f77..2d3860080c 100644 --- a/ingest_data/ingest-data-types/ingest_text_data.ipynb +++ b/ingest_data/ingest-data-types/ingest_text_data_rename.ipynb @@ -480,18 +480,6 @@ "text_data_new.to_csv(filename, index=False)\n", "upload_to_s3(bucket, \"text_twitter_sentiment_full\", filename)" ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Citation\n", - "Twitter140 Data, Go, A., Bhayani, R. and Huang, L., 2009. Twitter sentiment classification using distant supervision. CS224N Project Report, Stanford, 1(2009), p.12.\n", - "\n", - "SMS Spam data, Almeida, T.A., Gómez Hidalgo, J.M., Yamakami, A. Contributions to the Study of SMS Spam Filtering: New Collection and Results. Proceedings of the 2011 ACM Symposium on Document Engineering (DOCENG'11), Mountain View, CA, USA, 2011.\n", - "\n", - "J! Archive, J! Archive is created by fans, for fans. The Jeopardy! game show and all elements thereof, including but not limited to copyright and trademark thereto, are the property of Jeopardy Productions, Inc. and are protected under law. This website is not affiliated with, sponsored by, or operated by Jeopardy Productions, Inc." - ] } ], "metadata": { diff --git a/sagemaker-python-sdk/mxnet_gluon_mnist/mxnet_mnist_with_gluon.ipynb b/sagemaker-python-sdk/mxnet_gluon_mnist/mxnet_mnist_with_gluon.ipynb index ec63333be4..b0767bc4da 100644 --- a/sagemaker-python-sdk/mxnet_gluon_mnist/mxnet_mnist_with_gluon.ipynb +++ b/sagemaker-python-sdk/mxnet_gluon_mnist/mxnet_mnist_with_gluon.ipynb @@ -4,7 +4,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# MNIST Training with MXNet and Gluon\n", + "## MNIST Training with MXNet and Gluon (Testing)\n", "\n", "MNIST is a widely used dataset for handwritten digit classification.
It consists of 70,000 labeled 28x28 pixel grayscale images of hand-written digits. The dataset is split into 60,000 training images and 10,000 test images. There are 10 classes (one for each of the 10 digits). This tutorial shows how to train and test an MNIST model on SageMaker using MXNet and the Gluon API.\n", "\n", @@ -36,7 +36,8 @@ "\n", "sagemaker_session = sagemaker.Session()\n", "\n", - "role = get_execution_role()" + "role = get_execution_role()\n", + "print(role)" ] }, { diff --git a/sagemaker-python-sdk/pytorch_batch_inference/sagemaker_batch_inference_torchserve.ipynb b/sagemaker-python-sdk/pytorch_batch_inference/sagemaker_batch_inference_torchserve.ipynb index 88045af95f..ea2190fe78 100644 --- a/sagemaker-python-sdk/pytorch_batch_inference/sagemaker_batch_inference_torchserve.ipynb +++ b/sagemaker-python-sdk/pytorch_batch_inference/sagemaker_batch_inference_torchserve.ipynb @@ -8,17 +8,6 @@ "# SageMaker Real-time Dynamic Batching Inference with Torchserve" ] }, - { - "cell_type": "markdown", - "id": "d1647aa0-0140-40fc-bf58-bd8cf786d7a4", - "metadata": {}, - "source": [ - "This notebook demonstrates the use of dynamic batching on SageMaker with [torchserve](https://github.com/pytorch/serve/) as a model server. It covers the following:\n", - "1. Batch inference using a DLC, i.e. SageMaker's default backend container. This is done by using the SageMaker Python SDK in script mode.\n", - "2. Specifying inference parameters for torchserve using environment variables.\n", - "3. Option to use a custom container with a torchserve config file baked into the container." - ] - }, { "cell_type": "markdown", "id": "beb7434c-2d73-41dc-a56c-7db10b9f552f", @@ -264,16 +253,6 @@ "source": [ "predictor.delete_endpoint(predictor.endpoint_name)" ] - }, - { - "cell_type": "markdown", - "id": "1c7d981a-46de-46af-9e96-fcd66e3f057a", - "metadata": {}, - "source": [ - "## Conclusion\n", - "\n", - "Through this exercise, we were able to understand the basics of batch inference using torchserve on Amazon SageMaker. We learnt that we can have several inference requests from different processes/users batched together, and the results will be processed as a batch of inputs. We also learnt that we could either use SageMaker's default DLC container as the base environment and supply an inference.py script with the model, or create a custom container that can be used with SageMaker for more involved workflows."
- ] } ], "metadata": { diff --git a/sagemaker-python-sdk/pytorch_mnist/pytorch_mnist.ipynb b/sagemaker-python-sdk/pytorch_mnist/pytorch_mnist.ipynb index 7c91e84339..d320c07023 100644 --- a/sagemaker-python-sdk/pytorch_mnist/pytorch_mnist.ipynb +++ b/sagemaker-python-sdk/pytorch_mnist/pytorch_mnist.ipynb @@ -4,16 +4,7 @@ "cell_type": "code", "execution_count": 1, "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\u001b[33mWARNING: Skipping torchvison as it is not installed.\u001b[0m\n", - "yes: standard output: Broken pipe\n" - ] - } - ], + "outputs": [], "source": [ "!yes | pip uninstall torchvison\n", "!pip install -qU torchvision" @@ -711,6 +702,15 @@ "source": [ "sagemaker_session.delete_endpoint(endpoint_name=predictor.endpoint_name)" ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(\"End of notebook\")" + ] } ], "metadata": { diff --git a/sagemaker-python-sdk/pytorch_mnist/pytorch_mnist_elastic_inference.ipynb b/sagemaker-python-sdk/pytorch_mnist/pytorch_mnist_elastic_inference_rename.ipynb similarity index 100% rename from sagemaker-python-sdk/pytorch_mnist/pytorch_mnist_elastic_inference.ipynb rename to sagemaker-python-sdk/pytorch_mnist/pytorch_mnist_elastic_inference_rename.ipynb diff --git a/sagemaker_processing/basic_sagemaker_data_processing/basic_sagemaker_processing.ipynb b/sagemaker_processing/basic_sagemaker_data_processing/basic_sagemaker_processing.ipynb index 552236814b..9c900413b4 100644 --- a/sagemaker_processing/basic_sagemaker_data_processing/basic_sagemaker_processing.ipynb +++ b/sagemaker_processing/basic_sagemaker_data_processing/basic_sagemaker_processing.ipynb @@ -148,72 +148,7 @@ "execution_count": null, "metadata": {}, "outputs": [], - "source": [ - "%%capture output\n", - "\n", - "from sagemaker.processing import ProcessingInput, ProcessingOutput\n", - "\n", - "sklearn_processor.run(\n", - " code=\"preprocessing.py\",\n", - " # arguments = [\"arg1\", \"arg2\"], # Arguments can optionally be specified here\n", - " inputs=[ProcessingInput(source=\"dataset.csv\", destination=\"/opt/ml/processing/input\")],\n", - " outputs=[\n", - " ProcessingOutput(source=\"/opt/ml/processing/output/train\"),\n", - " ProcessingOutput(source=\"/opt/ml/processing/output/validation\"),\n", - " ProcessingOutput(source=\"/opt/ml/processing/output/test\"),\n", - " ],\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Get the Processing job logs and retrieve the job name." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(output)\n", - "job_name = str(output).split(\"\\n\")[1].split(\" \")[-1]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Confirm that the output dataset files were written to S3." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import boto3\n", - "\n", - "s3_client = boto3.client(\"s3\")\n", - "default_bucket = sagemaker.Session().default_bucket()\n", - "for i in range(1, 4):\n", - " prefix = s3_client.list_objects(\n", - " Bucket=default_bucket, Prefix=job_name + \"/output/output-\" + str(i) + \"/\"\n", - " )[\"Contents\"][0][\"Key\"]\n", - " print(\"s3://\" + default_bucket + \"/\" + prefix)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Conclusion\n", - "\n", - "In this notebook, we read a dataset from S3 and processed it into train, test, and validation sets using a SageMaker Processing job. You can extend this example for preprocessing your own datasets in preparation for machine learning or other applications." - ] + "source": [] } ], "metadata": { diff --git a/sagemaker_processing/fairness_and_explainability/text_explainability_sagemaker_algorithm/container/README.md b/sagemaker_processing/fairness_and_explainability/text_explainability_sagemaker_algorithm/container/README.md deleted file mode 100644 index dea5286cc6..0000000000 --- a/sagemaker_processing/fairness_and_explainability/text_explainability_sagemaker_algorithm/container/README.md +++ /dev/null @@ -1,72 +0,0 @@ -# Bring-your-own Algorithm Sample - -This example shows how to package an algorithm for use with SageMaker. - -SageMaker supports two execution modes: _training_ where the algorithm uses input data to train a new model and _serving_ where the algorithm accepts HTTP requests and uses the previously trained model to do an inference (also called "scoring", "prediction", or "transformation"). - -The algorithm that we have built here supports both training and scoring in SageMaker with the same container image. It is perfectly reasonable to build an algorithm that supports only training _or_ scoring as well as to build an algorithm that has separate container images for training and scoring. - -In order to build a production-grade inference server into the container, we use the following stack to make the implementer's job simple: - -1. __[nginx][nginx]__ is a light-weight layer that handles the incoming HTTP requests and manages the I/O in and out of the container efficiently. -2. __[gunicorn][gunicorn]__ is a WSGI pre-forking worker server that runs multiple copies of your application and load balances between them. -3. __[flask][flask]__ is a simple web framework used in the inference app that you write. It lets you respond to calls on the `/ping` and `/invocations` endpoints without having to write much code. - -## The Structure of the Sample Code - -The components are as follows: - -* __Dockerfile__: The _Dockerfile_ describes how the image is built and what it contains. It is a recipe for your container and gives you tremendous flexibility to construct almost any execution environment you can imagine. Here, we use the Dockerfile to describe a pretty standard Python science stack and the simple scripts that we're going to add to it. See the [Dockerfile reference][dockerfile] for what's possible here. - -* __build\_and\_push.sh__: The script to build the Docker image (using the Dockerfile above) and push it to the [Amazon EC2 Container Registry (ECR)][ecr] so that it can be deployed to SageMaker. Specify the name of the image as the argument to this script. The script will generate a full name for the repository in your account and your configured AWS region.
If this ECR repository doesn't exist, the script will create it. - -* __blazing_text__: The directory that contains the application to run in the container. See the next section for details about each of the files. - -* __local-test__: A directory containing scripts and a setup for running simple training and inference jobs locally so that you can test that everything is set up correctly. See below for details. - -### The application run inside the container - -When SageMaker starts a container, it will invoke the container with an argument of either __train__ or __serve__. We have set this container up so that the argument is treated as the command that the container executes. When training, it will run the __train__ program included and, when serving, it will run the __serve__ program. - - -* __serve__: The wrapper that starts the inference server. In most cases, you can use this file as-is. -* __wsgi.py__: The startup shell for the individual server workers. This only needs to be changed if you changed where predictor.py is located or is named. -* __predictor.py__: The algorithm-specific inference server. This is the file that you modify with your own algorithm's code. -* __nginx.conf__: The configuration for the nginx master server that manages the multiple workers. - -### Setup for local testing - -The subdirectory local-test contains scripts and sample data for testing the built container image on the local machine. When building your own algorithm, you'll want to modify it appropriately. - -* __train-local.sh__: Instantiate the container configured for training. -* __serve-local.sh__: Instantiate the container configured for serving. -* __predict.sh__: Run predictions against a locally instantiated server. -* __test-dir__: The directory that gets mounted into the container with test data mounted in all the places that match the container schema. -* __payload.csv__: Sample data used by predict.sh for testing the server. - -#### The directory tree mounted into the container - -The tree under test-dir is mounted into the container and mimics the directory structure that SageMaker would create for the running container during training or hosting. - -* __input/config/hyperparameters.json__: The hyperparameters for the training job. -* __input/data/training/leaf_train.csv__: The training data. -* __model__: The directory where the algorithm writes the model file. -* __output__: The directory where the algorithm can write its success or failure file. - -## Environment variables - -When you create an inference server, you can control some of Gunicorn's options via environment variables. These -can be supplied as part of the CreateModel API call. - - Parameter Environment Variable Default Value - --------- -------------------- ------------- - number of workers MODEL_SERVER_WORKERS the number of CPU cores - timeout MODEL_SERVER_TIMEOUT 60 seconds - - -[skl]: http://scikit-learn.org "scikit-learn Home Page" -[dockerfile]: https://docs.docker.com/engine/reference/builder/ "The official Dockerfile reference guide" -[ecr]: https://aws.amazon.com/ecr/ "ECR Home Page" -[nginx]: http://nginx.org/ -[gunicorn]: http://gunicorn.org/ -[flask]: http://flask.pocoo.org/
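
As a quick illustration of the environment-variable mechanism the deleted README describes, here is a minimal sketch of supplying `MODEL_SERVER_WORKERS` and `MODEL_SERVER_TIMEOUT` through the CreateModel API with boto3. The model name, image URI, model artifact URL, and role ARN are hypothetical placeholders, not values from this repository:

```python
import boto3

sm_client = boto3.client("sagemaker")

# Sketch only: every name, URI, and ARN below is a hypothetical placeholder.
sm_client.create_model(
    ModelName="byo-algorithm-example",
    PrimaryContainer={
        "Image": "123456789012.dkr.ecr.us-west-2.amazonaws.com/byo-algorithm:latest",
        "ModelDataUrl": "s3://example-bucket/model/model.tar.gz",
        # Gunicorn options read by the serving stack at container startup.
        "Environment": {
            "MODEL_SERVER_WORKERS": "2",    # default: the number of CPU cores
            "MODEL_SERVER_TIMEOUT": "120",  # default: 60 seconds
        },
    },
    ExecutionRoleArn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
)
```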