Skip to content

Inference Failure for MXNet MME Deployment in SageMaker #135

@swlkn

Description

@swlkn

Describe the bug
When deploying a multi model endpoint using the SageMaker MXNet container (which uses this library) I receive the following error when I try to invoke the endpoint: "AttributeError: 'NoneType' object has no attribute 'transform'.

To reproduce
Train a simple MXNet model in SageMaker using the example here.

At the end of the notebook, you verify that you can receive real time predictions from a SageMaker Endpoint using the model/code.

Next, add the following code to the notebook to deploy the same model as a multi model endpoint:

mme = MultiDataModel(name='mxnet-mme-final', model_data_prefix='<enter your own bucket + prefix>',
                                              model=model, sagemaker_session=sagemaker_session)

mme.deploy(initial_instance_count=1, instance_type='ml.t2.medium')

Copy the model artifact to the model_data_prefix so it's accessible by MME.

Once it finishes deploying, try to invoke it using sagemaker_runtime in boto3, you will get the same error.

Expected behavior
I'd expect to be able to deploy the same model to MME as I can as a single endpoint using the SageMaker MXNet container. The SageMaker documentation says that MXNet container supports MME.

CloudWatch Endpoint Logs at time of failure

2021-02-11 19:25:28,443 [INFO ] pool-1-thread-1 ACCESS_LOG - /10.32.0.2:49282 "GET /ping HTTP/1.1" 200 1
2021-02-11 19:25:29,687 [WARN ] W-9000-29028cdaefc80ac4662997191 com.amazonaws.ml.mms.wlm.WorkerLifeCycle - attachIOStreams() threadName=W-9000-29028cdaefc80ac4662997191
2021-02-11 19:25:29,807 [INFO ] W-9000-29028cdaefc80ac4662997191-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - model_service_worker started with args: --sock-type unix --sock-name /home/model-server/tmp/.mms.sock.9000 --handler sagemaker_mxnet_serving_container.handler_service --model-path /opt/ml/models/29028cdaefc80ac4662997191ec53602/model --model-name 29028cdaefc80ac4662997191ec53602 --preload-model false --tmp-dir /home/model-server/tmp
2021-02-11 19:25:29,809 [INFO ] W-9000-29028cdaefc80ac4662997191-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Listening on port: /home/model-server/tmp/.mms.sock.9000
2021-02-11 19:25:29,809 [INFO ] W-9000-29028cdaefc80ac4662997191-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - [PID] 69
2021-02-11 19:25:29,809 [INFO ] W-9000-29028cdaefc80ac4662997191-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - MMS worker started.
2021-02-11 19:25:29,809 [INFO ] W-9000-29028cdaefc80ac4662997191-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Python runtime: 3.6.10
2021-02-11 19:25:29,809 [INFO ] epollEventLoopGroup-3-1 com.amazonaws.ml.mms.wlm.ModelManager - Model 29028cdaefc80ac4662997191ec53602 loaded.
2021-02-11 19:25:29,818 [INFO ] W-9000-29028cdaefc80ac4662997191 com.amazonaws.ml.mms.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.mms.sock.9000
2021-02-11 19:25:29,833 [INFO ] W-9000-29028cdaefc80ac4662997191-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Connection accepted: /home/model-server/tmp/.mms.sock.9000.
2021-02-11 19:25:31,055 [INFO ] W-9000-29028cdaefc80ac4662997191-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Model 29028cdaefc80ac4662997191ec53602 loaded io_fd=86fe85fffe5a7fee-00000026-00000002-7cbbb181288e375a-efae222a
2021-02-11 19:25:31,057 [INFO ] W-9000-29028cdaefc80ac4662997191 com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 1195
2021-02-11 19:25:31,058 [INFO ] W-9000-29028cdaefc80ac4662997191 ACCESS_LOG - /10.32.0.2:49282 "POST /models HTTP/1.1" 200 1400
2021-02-11 19:25:31,061 [WARN ] W-9000-29028cdaefc80ac4662997191 com.amazonaws.ml.mms.wlm.WorkerLifeCycle - attachIOStreams() threadName=W-29028cdaefc80ac4662997191-1
2021-02-11 19:25:31,106 [INFO ] W-29028cdaefc80ac4662997191-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Invoking custom service failed.
2021-02-11 19:25:31,106 [INFO ] W-9000-29028cdaefc80ac4662997191 com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 2
2021-02-11 19:25:31,106 [INFO ] W-29028cdaefc80ac4662997191-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Traceback (most recent call last):
2021-02-11 19:25:31,106 [INFO ] W-29028cdaefc80ac4662997191-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File "/usr/local/lib/python3.6/site-packages/mms/service.py", line 108, in predict
2021-02-11 19:25:31,106 [INFO ] W-29028cdaefc80ac4662997191-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - ret = self._entry_point(input_batch, self.context)
2021-02-11 19:25:31,106 [INFO ] W-29028cdaefc80ac4662997191-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File "/usr/local/lib/python3.6/site-packages/sagemaker_inference/default_handler_service.py", line 50, in handle
2021-02-11 19:25:31,106 [INFO ] W-29028cdaefc80ac4662997191-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - return self._service.transform(data, context)
2021-02-11 19:25:31,107 [INFO ] W-29028cdaefc80ac4662997191-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - AttributeError: 'NoneType' object has no attribute 'transform'
2021-02-11 19:25:31,109 [INFO ] W-9000-29028cdaefc80ac4662997191 ACCESS_LOG - /10.32.0.2:49282 "POST /models/29028cdaefc80ac4662997191ec53602/invoke HTTP/1.1" 503 14

System information
A description of your system. Please provide:

  • Toolkit version: Latest? The version that is installed in the SageMaker container.
  • Framework version: 1.6.0
  • Python version: 3.6
  • CPU or GPU: CPU
  • Custom Docker image (Y/N): N

Additional context
I think I may know the cause of the bug.

Looking in to the handler_service.py code, the initialize() method retrieves the appropriate Transformer object and sets it to self._service. It then calls the parent class constructor, but it doesn't pass in the transformer object. The parent constructor accepts a transformer as a parameter (see here), but the parent class defaults the value of self._service to None, so when handle() is called by MMS, it uses the self._service set by the parent class, resulting in the error shown above.

If you look at SageMaker containers that successfully deploy MME, they pass the Transformer() in to the default handler constructor.

def initialize(self, context):
"""Calls the Transformer method that validates the user module against
the SageMaker inference contract.
"""
properties = context.system_properties
model_dir = properties.get("model_dir")
# add model_dir/code to python path
code_dir_path = "{}:".format(model_dir + '/code')
if PYTHON_PATH_ENV in os.environ:
os.environ[PYTHON_PATH_ENV] = code_dir_path + os.environ[PYTHON_PATH_ENV]
else:
os.environ[PYTHON_PATH_ENV] = code_dir_path
self._service = self._user_module_transformer(model_dir)
super(HandlerService, self).initialize(context)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions