
Conversation

Contributor

@vdantu vdantu commented Apr 16, 2021

Issue #, if available:

PR Checklist

  • I've prepended PR tag with frameworks/job this applies to : [mxnet, tensorflow, pytorch] | [ei/neuron] | [build] | [test] | [benchmark] | [ec2, ecs, eks, sagemaker]
  • (If applicable) I've documented below the DLC image/dockerfile this relates to
  • (If applicable) I've documented below the tests I've run on the DLC image
  • (If applicable) I've reviewed the licenses of updated and new binaries and their dependencies to make sure all licenses are on the Apache Software Foundation Third Party License Policy Category A or Category B license list. See https://www.apache.org/legal/resolved.html.
  • (If applicable) I've scanned the updated and new binaries to make sure they do not have vulnerabilities associated with them.

Pytest Marker Checklist

  • (If applicable) I have added the marker @pytest.mark.model("<model-type>") to the new tests which I have added, to specify the Deep Learning model that is used in the test (use "N/A" if the test doesn't use a model)
  • (If applicable) I have added the marker @pytest.mark.integration("<feature-being-tested>") to the new tests which I have added, to specify the feature that will be tested
  • (If applicable) I have added the marker @pytest.mark.multinode(<integer-num-nodes>) to the new tests which I have added, to specify the number of nodes used on a multi-node test
  • (If applicable) I have added the marker @pytest.mark.processor(<"cpu"/"gpu"/"eia"/"neuron">) to the new tests which I have added, if a test is specifically applicable to only one processor type (an illustrative sketch of these markers follows this checklist)
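For illustration, a hypothetical test annotated with these markers could look like the sketch below (the test name, model, node count, and processor are placeholders, not taken from this PR):

import pytest

@pytest.mark.model("bert-base-uncased")            # Deep Learning model used by the test, or "N/A"
@pytest.mark.integration("huggingface_inference")  # feature being tested
@pytest.mark.multinode(2)                          # number of nodes for a multi-node test
@pytest.mark.processor("cpu")                      # only if the test targets a single processor type
def test_hf_cpu_inference():
    # Placeholder body; a real test would exercise the DLC image under test.
    assert True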

EIA/NEURON Checklist

  • When creating a PR:
  • I've modified src/config/build_config.py in my PR branch by setting ENABLE_EI_MODE = True or ENABLE_NEURON_MODE = True
  • When PR is reviewed and ready to be merged:
  • I've reverted the code change on the config file mentioned above

Benchmark Checklist

  • When creating a PR:
  • I've modified src/config/test_config.py in my PR branch by setting ENABLE_BENCHMARK_DEV_MODE = True
  • When PR is reviewed and ready to be merged:
  • I've reverted the code change on the config file mentioned above

Reviewer Checklist

  • For reviewer, before merging, please cross-check:
  • I've verified the code change on the config file mentioned above has already been reverted

Description:

Tests run:

DLC image/dockerfile:

Additional context:

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@vdantu vdantu force-pushed the hugging_face branch 3 times, most recently from 34286af to 30bfb8a on April 19, 2021 05:20
Contributor

@philschmid philschmid left a comment

I added a few comments. I think we can optimize the base container even further, but I have already started testing with it for PyTorch and TensorFlow.

ARG PYTHON=python3
ARG PYTHON_VERSION=3.7.10
ARG PT_INFERENCE_URL=https://aws-pytorch-binaries.s3-us-west-2.amazonaws.com/r1.7.1_inference/20210112-183245/c1130f2829b03c0997b9813211a7c0f600fc569a/cpu/torch-1.7.1-cp36-cp36m-manylinux1_x86_64.whl
ARG PT_TORCHVISION_URL=https://torchvision-build.s3-us-west-2.amazonaws.com/1.7.1/cpu/torchvision-0.8.2-cp36-cp36m-linux_x86_64.whl
Contributor

The same goes for torchvision, which we don't need at all.

Contributor Author

@vdantu vdantu Apr 20, 2021

Nope, we don't need this; we will have the HF container bring in the PT/TF frameworks.

python=$PYTHON_VERSION \
&& /opt/conda/bin/conda install -y \
# conda 4.10.0 requires ruamel_yaml to be installed. Currently pinned at latest.
ruamel_yaml==0.15.100 \
Contributor

Do we need ruamel_yaml for the inference container?

python=$PYTHON_VERSION \
&& /opt/conda/bin/conda install -y \
# conda 4.10.0 requires ruamel_yaml to be installed. Currently pinned at latest.
ruamel_yaml==0.15.100 \
Contributor

Do we need ruamel_yaml for the inference container?

@@ -0,0 +1,5 @@
vmargs=-XX:+UseContainerSupport -XX:InitialRAMPercentage=8.0 -XX:MaxRAMPercentage=10.0 -XX:-UseLargePages -XX:+UseG1GC -XX:+ExitOnOutOfMemoryError
Contributor

I would place this file in build_artifacts instead of artifacts to keep it consistent across all frameworks.
https://github.com/vdantu/deep-learning-containers/tree/hugging_face/pytorch/inference/docker/build_artifacts

Also, can you remove the __init__.py from artifacts?

Contributor Author

Sure, we can name it either artifacts or build_artifacts; the two are used interchangeably.

@@ -0,0 +1,25 @@
# Copyright 2019-2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
Contributor

I would place this file in build_artifacts instead of artifacts to keep it consistent across all frameworks.
https://github.com/vdantu/deep-learning-containers/tree/hugging_face/pytorch/inference/docker/build_artifacts

Contributor Author

There are framework repositories with both "artifacts" and "build-artifacts". We could move it to either; I don't have a preference.

&& mkdir -p /home/model-server/tmp \
&& chown -R model-server /home/model-server

COPY mms-entrypoint.py /usr/local/bin/dockerd-entrypoint.py
Contributor

This should probably be adjusted to the right directory for the build context. Not sure what the docker build context is, but I doubt it is artifacts/.

&& chown -R model-server /home/model-server

COPY mms-entrypoint.py /usr/local/bin/dockerd-entrypoint.py
COPY config.properties /home/model-server
Contributor

This should probably be adjusted to the right directory for the build context. Not sure what the docker build context is, but I doubt it is artifacts/.

Comment on lines 69 to 62
RUN /opt/conda/bin/conda install -c \
conda-forge \
&& /opt/conda/bin/conda install -y \
requests==2.22.0 \
&& /opt/conda/bin/conda clean -ya \
&& pip install --upgrade pip --trusted-host pypi.org --trusted-host files.pythonhosted.org \
&& ln -s /opt/conda/bin/pip /usr/local/bin/pip3 \
&& pip install packaging==20.4 \
enum-compat==0.0.3 \
"cryptography>3.2"
Contributor

Suggested change
RUN /opt/conda/bin/conda install -c \
conda-forge \
&& /opt/conda/bin/conda install -y \
requests==2.22.0 \
&& /opt/conda/bin/conda clean -ya \
&& pip install --upgrade pip --trusted-host pypi.org --trusted-host files.pythonhosted.org \
&& ln -s /opt/conda/bin/pip /usr/local/bin/pip3 \
&& pip install packaging==20.4 \
enum-compat==0.0.3 \
"cryptography>3.2"
RUN pip install --upgrade pip --trusted-host pypi.org --trusted-host files.pythonhosted.org \
&& ln -s /opt/conda/bin/pip /usr/local/bin/pip3 \
&& pip install packaging==20.4 \
enum-compat==0.0.3 \
"cryptography>3.2"


ARG MMS_VERSION=1.1.2
ARG PYTHON=python3
ARG PYTHON_VERSION=3.7.10
Contributor

@philschmid philschmid Apr 19, 2021

Why have you chosen PYTHON_VERSION=3.7.10 when the other PyTorch inference DLCs use 3.6.13 and the AWS-optimized PyTorch only supports py36?

Suggested change
ARG PYTHON_VERSION=3.7.10
ARG PYTHON_VERSION=3.6.13

Contributor Author

Yup. I need to revert this.


ARG MMS_VERSION=1.1.2
ARG PYTHON=python3
ARG PYTHON_VERSION=3.7.10
Contributor

Why have you chosen PYTHON_VERSION=3.7.10 when the other PyTorch inference DLCs use 3.6.13 and the AWS-optimized PyTorch only supports py36?

Suggested change
ARG PYTHON_VERSION=3.7.10
ARG PYTHON_VERSION=3.6.13

openjdk-8-jdk \
openjdk-8-jre \
vim \
wget \
Contributor

Suggested change
wget \
wget \
git \

openjdk-8-jdk \
openjdk-8-jre \
vim \
wget \
Contributor

Suggested change
wget \
wget \
git \

Contributor

philschmid commented Apr 19, 2021

Also, when I compared the dockerfile with the old PyTorch 1.4.0 one, I noticed that you removed, among other packages, mkl-include and mkl, which are still included in the latest PyTorch and TensorFlow versions. What is the reason behind this removal?


EXPOSE 8080 8081
ENTRYPOINT ["python", "/usr/local/bin/dockerd-entrypoint.py"]
CMD ["multi-model-server", "--start", "--mms-config", "/home/model-server/config.properties"]
Contributor

Suggested change
CMD ["multi-model-server", "--start", "--mms-config", "/home/model-server/config.properties"]
CMD ["serve"]

Not sure why you start the model-server like that and not use the dockerd-entrypoint.py with the inference toolkit.
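For context, a minimal sketch of how such a dockerd-entrypoint.py dispatch could look (this is an assumed illustration, not the actual inference toolkit entrypoint; the serving command and config path mirror the Dockerfile quoted above):

import shlex
import subprocess
import sys

if __name__ == "__main__":
    if len(sys.argv) > 1 and sys.argv[1] == "serve":
        # Start MMS with the config shipped in the image.
        subprocess.check_call(
            ["multi-model-server", "--start",
             "--mms-config", "/home/model-server/config.properties"]
        )
    else:
        # Otherwise run whatever command was passed to the container.
        subprocess.check_call(shlex.split(" ".join(sys.argv[1:])))

    # Keep the container alive; MMS continues running in the background after --start.
    subprocess.call(["tail", "-f", "/dev/null"])

With an entrypoint along these lines, CMD ["serve"] would be sufficient and SageMaker could still override the command.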


EXPOSE 8080 8081
ENTRYPOINT ["python", "/usr/local/bin/dockerd-entrypoint.py"]
CMD ["multi-model-server", "--start", "--mms-config", "/home/model-server/config.properties"]
Contributor

Suggested change
CMD ["multi-model-server", "--start", "--mms-config", "/home/model-server/config.properties"]
CMD ["serve"]

Not sure why you start the model-server like that and not use the dockerd-entrypoint.py with the inference toolkit.

&& ${HOME_DIR}/oss_compliance/generate_oss_compliance.sh ${HOME_DIR} ${PYTHON} \
&& rm -rf ${HOME_DIR}/oss_compliance*

RUN curl https://aws-dlc-licenses.s3.amazonaws.com/pytorch-1.7/license.txt -o /license.txt
Contributor

Not sure why we should include the license in the base container. I would rather add it to the PyTorch-specific one, and do the same for TensorFlow, since the base container does not include the framework the license is for.

&& ${HOME_DIR}/oss_compliance/generate_oss_compliance.sh ${HOME_DIR} ${PYTHON} \
&& rm -rf ${HOME_DIR}/oss_compliance*

RUN curl https://aws-dlc-licenses.s3.amazonaws.com/pytorch-1.7/license.txt -o /license.txt
Contributor

Not sure why we should include the license in the base container. I would rather add it to the PyTorch-specific one, and do the same for TensorFlow, since the base container does not include the framework the license is for.

@philschmid
Contributor

While testing the DLC with PyTorch, more specifically with the
https://aws-pytorch-binaries.s3-us-west-2.amazonaws.com/r1.7.1_inference/20210112-183245/c1130f2829b03c0997b9813211a7c0f600fc569a/cpu/torch-1.7.1-cp36-cp36m-manylinux1_x86_64.whl build of PyTorch, I get the following error.

  File "/opt/conda/lib/python3.6/site-packages/huggingface_hub/hub_mixin.py", line 15, in <module>
    import torch
  File "/opt/conda/lib/python3.6/site-packages/torch/__init__.py", line 189, in <module>
    _load_global_deps()
  File "/opt/conda/lib/python3.6/site-packages/torch/__init__.py", line 142, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/opt/conda/lib/python3.6/ctypes/__init__.py", line 348, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libmkl_intel_lp64.so: cannot open shared object file: No such file or directory

When I use the base torch==1.7.1+cpu it works. Can you please look into this?
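One way to narrow this down (an assumed debugging step, not something included in this PR) is to compare how the two wheels were built; with a wheel that imports successfully, torch's build configuration shows which BLAS backend it links against:

import torch

print(torch.__version__)
# Prints compile-time options, including whether MKL / MKL-DNN were enabled;
# a wheel built against MKL needs libmkl_intel_lp64.so available at runtime.
print(torch.__config__.show())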

@vdantu vdantu force-pushed the hugging_face branch 10 times, most recently from 994bc2e to a842576 on April 21, 2021 00:24
@vdantu vdantu force-pushed the hugging_face branch 3 times, most recently from eaa1b29 to 4f02ada on April 21, 2021 01:03
Contributor

@philschmid philschmid left a comment

Thanks for starting the refactoring. Sadly it seems like all the comments got erased and not all of them were resolved. I went through each dockerfile again, line by line, and added comments and thoughts/questions.
Furthermore, I saw your commit message Added py36 and py37 docker files to readily use aws TF and PT wheels.
If we need py37 for TensorFlow and py36 for PyTorch, we could either make PYTHON_VERSION a build arg, so we only need one dockerfile if that is the only difference, or we could build without a base container and add the framework-specific commands to the framework dockerfiles. It makes no sense to have 4 base container files and 4 framework-specific files when the only difference is the Python version; I would rather have only the 4 framework-specific containers.

I also saw that you used force-push. I am not sure whether that is why the comments got erased, but I think it should be avoided.

Comment on lines +64 to +71
RUN wget https://www.open-mpi.org/software/ompi/v4.0/downloads/openmpi-$OPEN_MPI_VERSION.tar.gz \
&& gunzip -c openmpi-$OPEN_MPI_VERSION.tar.gz | tar xf - \
&& cd openmpi-$OPEN_MPI_VERSION \
&& ./configure --prefix=/home/.openmpi \
&& make all install \
&& cd .. \
&& rm openmpi-$OPEN_MPI_VERSION.tar.gz \
&& rm -rf openmpi-$OPEN_MPI_VERSION
Contributor

I am not sure why we should need OPEN_MPI in an inference CPU container. Is it used by some sort of SageMaker or MMS feature?

Contributor Author

Currently the aws-fx wheel depends on this. We can revisit this once (if?) we remove that dependency in the future.

ARG MMS_VERSION=1.1.2
ARG PYTHON=python3
ARG PYTHON_VERSION=3.6.13
ARG OPEN_MPI_VERSION=4.0.1
Contributor

Why did you add OPEN_MPI_VERSION compared to the initial one?

Contributor Author

This is to get our aws-pytorch wheel working.

openssl \
openjdk-8-jdk-headless \
vim \
wget \
Contributor

Suggested change
wget \
wget \
git \

Contributor Author

Do we need git for building/installing the HF inference toolkit? If yes, we should add it on top of the base container spec. We could remove it later once we publish the toolkits.

vim \
wget \
curl \
unzip \
Contributor

Compared to the initial dockerfile you have removed several packages.

    zlib1g-dev \
    libreadline-gplv2-dev \
    libncursesw5-dev \
    libssl-dev \
    libsqlite3-dev \
    libgdbm-dev \
    libc6-dev \
    libbz2-dev \
    tk-dev \
    libffi-dev \

Are they not needed?

Contributor Author

Yes. Functionally these don't seem to be required. Will revisit some (like libffi) later. We technically shouldn't need any dev packages in prod containers.

python=$PYTHON_VERSION \
&& /opt/conda/bin/conda install -y \
# conda 4.10.0 requires ruamel_yaml to be installed. Currently pinned at latest.
ruamel_yaml==0.15.100 \
Contributor

Do we need ruamel_yaml for the inference container?

Contributor Author

The comment suggested that we need this for conda. Considering we are on 4.10.1, we might not need it anymore, but we need to verify that before removing it.

# conda 4.10.0 requires ruamel_yaml to be installed. Currently pinned at latest.
ruamel_yaml==0.15.100 \
cython==0.29.12 \
botocore \
Contributor

First of all, is botocore required? If yes, I would suggest pinning the version. It is also new compared to the initial dockerfile you provided.

Comment on lines +65 to +72
RUN wget https://www.open-mpi.org/software/ompi/v4.0/downloads/openmpi-$OPEN_MPI_VERSION.tar.gz \
&& gunzip -c openmpi-$OPEN_MPI_VERSION.tar.gz | tar xf - \
&& cd openmpi-$OPEN_MPI_VERSION \
&& ./configure --prefix=/home/.openmpi \
&& make all install \
&& cd .. \
&& rm openmpi-$OPEN_MPI_VERSION.tar.gz \
&& rm -rf openmpi-$OPEN_MPI_VERSION
Contributor

I am not sure why we should need OPEN_MPI in an inference CPU container. Is it used by some sort of SageMaker or MMS feature?

Comment on lines +74 to +75
ENV PATH="$PATH:/home/.openmpi/bin"
ENV LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/home/.openmpi/lib/"
Contributor

Related to the OPEN_MPI comment above.

Comment on lines +87 to +88
COPY mms-entrypoint.py /usr/local/bin/dockerd-entrypoint.py
COPY config.properties /home/model-server
Contributor

This should probably be adjusted to the right directory for the build context. Not sure what the docker build context is, but I doubt it is artifacts/.


EXPOSE 8080 8081
ENTRYPOINT ["python", "/usr/local/bin/dockerd-entrypoint.py"]
CMD ["multi-model-server", "--start", "--mms-config", "/home/model-server/config.properties"]
Contributor

Not sure why you start the model-server like that and not use the dockerd-entrypoint.py with the inference toolkit.

Suggested change
CMD ["multi-model-server", "--start", "--mms-config", "/home/model-server/config.properties"]
CMD ["serve"]

Contributor Author

@vdantu vdantu left a comment

I see some duplicate comments; was that intentional?

@@ -0,0 +1,5 @@
vmargs=-XX:+UseContainerSupport -XX:InitialRAMPercentage=8.0 -XX:MaxRAMPercentage=10.0 -XX:-UseLargePages -XX:+UseG1GC -XX:+ExitOnOutOfMemoryError
Contributor Author

Sure, we can name it either artifacts or build_artifacts; the two are used interchangeably.

ARG MMS_VERSION=1.1.2
ARG PYTHON=python3
ARG PYTHON_VERSION=3.6.13
ARG OPEN_MPI_VERSION=4.0.1
Contributor Author

This is to get our aws-pytorch wheel working.

openssl \
openjdk-8-jdk-headless \
vim \
wget \
Contributor Author

Do we need git for building/installing the HF inference toolkit? If yes, we should add it on top of the base container spec. We could remove it later once we publish the toolkits.

vim \
wget \
curl \
unzip \
Contributor Author

Yes. Functionally these don't seem to be required. Will revisit some (like libffi) later. We technically shouldn't need any dev packages in prod containers.

wget \
curl \
unzip \
&& apt-get clean \
Contributor Author

Yup. This was required to get the GPU version of the aws-pytorch wheel running on GPU containers. Will revisit whether we can remove it.

Comment on lines +64 to +71
RUN wget https://www.open-mpi.org/software/ompi/v4.0/downloads/openmpi-$OPEN_MPI_VERSION.tar.gz \
&& gunzip -c openmpi-$OPEN_MPI_VERSION.tar.gz | tar xf - \
&& cd openmpi-$OPEN_MPI_VERSION \
&& ./configure --prefix=/home/.openmpi \
&& make all install \
&& cd .. \
&& rm openmpi-$OPEN_MPI_VERSION.tar.gz \
&& rm -rf openmpi-$OPEN_MPI_VERSION
Contributor Author

Currently the aws-fx wheel depends on this. We can revisit this once (if?) we remove that dependency in the future.

Comment on lines +90 to +91
COPY mms-entrypoint.py /usr/local/bin/dockerd-entrypoint.py
COPY config.properties /home/model-server
Contributor Author

To build the inference containers, we need to add logic to buildspec.yml. That will tell the container building pipelines how to pull in the required artifacts and Dockerfiles and build the container.


EXPOSE 8080 8081
ENTRYPOINT ["python", "/usr/local/bin/dockerd-entrypoint.py"]
CMD ["multi-model-server", "--start", "--mms-config", "/home/model-server/config.properties"]
Contributor Author

We will have an ENTRYPOINT in the Dockerfile which will be called with serve. Why can't we follow something like this?

# conda 4.10.0 requires ruamel_yaml to be installed. Currently pinned at latest.
ruamel_yaml==0.15.100 \
cython==0.29.12 \
botocore \
Contributor Author

Good point; we can pin this to the current latest version.

@@ -0,0 +1,107 @@
FROM nvidia/cuda:11.0-cudnn8-runtime-ubuntu18.04
Contributor Author

I am currently testing this. The CUDA interoperability is more nuanced than I thought 😄. Will update this if endpoint creation fails.

@vdantu vdantu changed the title from [WIP][HuggingFace Inference] Add new base HuggingFace inference containers to [HuggingFace Inference] Add new base HuggingFace inference containers on Apr 22, 2021
@philschmid
Contributor

Can be closed due to #1077.

@@ -0,0 +1,110 @@
FROM ubuntu:18.04


We should use public ECR; Docker Hub will throttle pull requests.


EXPOSE 8080 8081
ENTRYPOINT ["python", "/usr/local/bin/dockerd-entrypoint.py"]
CMD ["multi-model-server", "--start", "--mms-config", "/home/model-server/config.properties"]


I think we don't need the CMD since these images are for SageMaker only (we don't need to support the DLC use case). SageMaker overrides it anyway: https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-inference-code.html#your-algorithms-inference-code-run-image

@@ -0,0 +1,107 @@
FROM nvidia/cuda:11.0-cudnn8-runtime-ubuntu18.04


Same here, use public ECR.

# conda 4.10.0 requires ruamel_yaml to be installed. Currently pinned at latest.
ruamel_yaml==0.15.100 \
cython==0.29.12 \
botocore \


Why do we need boto?

Comment on lines +65 to +75
RUN wget https://www.open-mpi.org/software/ompi/v4.0/downloads/openmpi-$OPEN_MPI_VERSION.tar.gz \
&& gunzip -c openmpi-$OPEN_MPI_VERSION.tar.gz | tar xf - \
&& cd openmpi-$OPEN_MPI_VERSION \
&& ./configure --prefix=/home/.openmpi \
&& make all install \
&& cd .. \
&& rm openmpi-$OPEN_MPI_VERSION.tar.gz \
&& rm -rf openmpi-$OPEN_MPI_VERSION

ENV PATH="$PATH:/home/.openmpi/bin"
ENV LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/home/.openmpi/lib/"


I think it might be required for the DLC image use case; SageMaker probably doesn't need this.

Contributor

saimidu commented Jun 28, 2021

Closing this PR now. Let's create a new PR if we want to consider this approach again.

@saimidu saimidu closed this Jun 28, 2021