[HuggingFace Inference] Add new base HuggingFace inference containers #1040
Conversation
Force-pushed from 34286af to 30bfb8a.
I added a few comments. I think we can optimize the base container even further, but I have already started testing with it for PyTorch and TensorFlow.
ARG PYTHON=python3
ARG PYTHON_VERSION=3.7.10
ARG PT_INFERENCE_URL=https://aws-pytorch-binaries.s3-us-west-2.amazonaws.com/r1.7.1_inference/20210112-183245/c1130f2829b03c0997b9813211a7c0f600fc569a/cpu/torch-1.7.1-cp36-cp36m-manylinux1_x86_64.whl
ARG PT_TORCHVISION_URL=https://torchvision-build.s3-us-west-2.amazonaws.com/1.7.1/cpu/torchvision-0.8.2-cp36-cp36m-linux_x86_64.whl
The same goes for torchvision, which we don't need at all.
Nope, we don't need this, and we will have the HF container bring in the PT/TF frameworks.
python=$PYTHON_VERSION \
&& /opt/conda/bin/conda install -y \
# conda 4.10.0 requires ruamel_yaml to be installed. Currently pinned at latest.
ruamel_yaml==0.15.100 \
Do we need ruamel_yaml for the inference container?
python=$PYTHON_VERSION \
&& /opt/conda/bin/conda install -y \
# conda 4.10.0 requires ruamel_yaml to be installed. Currently pinned at latest.
ruamel_yaml==0.15.100 \
Do we need ruamel_yaml for the inference container?
@@ -0,0 +1,5 @@
vmargs=-XX:+UseContainerSupport -XX:InitialRAMPercentage=8.0 -XX:MaxRAMPercentage=10.0 -XX:-UseLargePages -XX:+UseG1GC -XX:+ExitOnOutOfMemoryError
I would place this file in build_artifacts instead of artifacts to keep it the same across all frameworks: https://github.com/vdantu/deep-learning-containers/tree/hugging_face/pytorch/inference/docker/build_artifacts
Also, can you remove the __init__.py from artifacts?
Sure, we can name it either artifacts or build_artifacts; the two are used interchangeably.
@@ -0,0 +1,25 @@
# Copyright 2019-2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
I would place this file in build_artifacts instead of artifacts to keep it the same across all frameworks: https://github.com/vdantu/deep-learning-containers/tree/hugging_face/pytorch/inference/docker/build_artifacts
There are framework repositories with "artifacts" and "build_artifacts". We could move it to either; I don't have a preference.
&& mkdir -p /home/model-server/tmp \
&& chown -R model-server /home/model-server

COPY mms-entrypoint.py /usr/local/bin/dockerd-entrypoint.py
This should probably be adjusted to the right directory for the build context. Not sure what the context of docker build is, but I doubt it is artifacts/.
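For illustration, a minimal sketch of how the COPY sources depend on the build context; the build command and the artifacts/ prefix below are assumptions about how the pipeline invokes the build, not something confirmed in this PR:
# Assuming the build runs from the framework root, e.g.: docker build -f docker/Dockerfile.cpu .
# the COPY sources must be given relative to that context, not to the Dockerfile location:
COPY artifacts/mms-entrypoint.py /usr/local/bin/dockerd-entrypoint.py
COPY artifacts/config.properties /home/model-server/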
&& chown -R model-server /home/model-server

COPY mms-entrypoint.py /usr/local/bin/dockerd-entrypoint.py
COPY config.properties /home/model-server
This should probably be adjusted to the right directory for the build context. Not sure what the context of docker build is, but I doubt it is artifacts/.
RUN /opt/conda/bin/conda install -c \
conda-forge \
&& /opt/conda/bin/conda install -y \
requests==2.22.0 \
&& /opt/conda/bin/conda clean -ya \
&& pip install --upgrade pip --trusted-host pypi.org --trusted-host files.pythonhosted.org \
&& ln -s /opt/conda/bin/pip /usr/local/bin/pip3 \
&& pip install packaging==20.4 \
enum-compat==0.0.3 \
"cryptography>3.2"
Suggested change: replace
RUN /opt/conda/bin/conda install -c \
conda-forge \
&& /opt/conda/bin/conda install -y \
requests==2.22.0 \
&& /opt/conda/bin/conda clean -ya \
&& pip install --upgrade pip --trusted-host pypi.org --trusted-host files.pythonhosted.org \
&& ln -s /opt/conda/bin/pip /usr/local/bin/pip3 \
&& pip install packaging==20.4 \
enum-compat==0.0.3 \
"cryptography>3.2"
with:
RUN pip install --upgrade pip --trusted-host pypi.org --trusted-host files.pythonhosted.org \
&& ln -s /opt/conda/bin/pip /usr/local/bin/pip3 \
&& pip install packaging==20.4 \
enum-compat==0.0.3 \
"cryptography>3.2"
ARG MMS_VERSION=1.1.2
ARG PYTHON=python3
ARG PYTHON_VERSION=3.7.10
Why have you chosen PYTHON_VERSION=3.7.10 when the other PyTorch inference DLCs use 3.6.13 and the AWS-optimized PyTorch only supports py36?
Suggested change: replace ARG PYTHON_VERSION=3.7.10 with ARG PYTHON_VERSION=3.6.13.
Yup. I need to revert this.
ARG MMS_VERSION=1.1.2
ARG PYTHON=python3
ARG PYTHON_VERSION=3.7.10
Why have you chosen PYTHON_VERSION=3.7.10 when the other PyTorch inference DLCs use 3.6.13 and the AWS-optimized PyTorch only supports py36?
Suggested change: replace ARG PYTHON_VERSION=3.7.10 with ARG PYTHON_VERSION=3.6.13.
openjdk-8-jdk \
openjdk-8-jre \
vim \
wget \
Suggested change: add "git \" after "wget \".
openjdk-8-jdk \
openjdk-8-jre \
vim \
wget \
Suggested change: add "git \" after "wget \".
Also, when I compared the …
EXPOSE 8080 8081
ENTRYPOINT ["python", "/usr/local/bin/dockerd-entrypoint.py"]
CMD ["multi-model-server", "--start", "--mms-config", "/home/model-server/config.properties"]
Suggested change: replace CMD ["multi-model-server", "--start", "--mms-config", "/home/model-server/config.properties"] with CMD ["serve"].
Not sure why you start the model-server like that and not use the dockerd-entrypoint.py with the inference toolkit.
EXPOSE 8080 8081
ENTRYPOINT ["python", "/usr/local/bin/dockerd-entrypoint.py"]
CMD ["multi-model-server", "--start", "--mms-config", "/home/model-server/config.properties"]
Suggested change: replace CMD ["multi-model-server", "--start", "--mms-config", "/home/model-server/config.properties"] with CMD ["serve"].
Not sure why you start the model-server like that and not use the dockerd-entrypoint.py with the inference toolkit.
&& ${HOME_DIR}/oss_compliance/generate_oss_compliance.sh ${HOME_DIR} ${PYTHON} \
&& rm -rf ${HOME_DIR}/oss_compliance*

RUN curl https://aws-dlc-licenses.s3.amazonaws.com/pytorch-1.7/license.txt -o /license.txt
I'm not sure why we should include the license in the base container. I would rather add it to the PyTorch-specific one, and likewise for TensorFlow, since the base container does not include the framework that the license covers.
&& ${HOME_DIR}/oss_compliance/generate_oss_compliance.sh ${HOME_DIR} ${PYTHON} \
&& rm -rf ${HOME_DIR}/oss_compliance*

RUN curl https://aws-dlc-licenses.s3.amazonaws.com/pytorch-1.7/license.txt -o /license.txt
I'm not sure why we should include the license in the base container. I would rather add it to the PyTorch-specific one, and likewise for TensorFlow, since the base container does not include the framework that the license covers.
While testing the DLC with PyTorch, I ran into the following error:
File "/opt/conda/lib/python3.6/site-packages/huggingface_hub/hub_mixin.py", line 15, in <module>
import torch
File "/opt/conda/lib/python3.6/site-packages/torch/__init__.py", line 189, in <module>
_load_global_deps()
File "/opt/conda/lib/python3.6/site-packages/torch/__init__.py", line 142, in _load_global_deps
ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
File "/opt/conda/lib/python3.6/ctypes/__init__.py", line 348, in __init__
self._handle = _dlopen(self._name, mode)
OSError: libmkl_intel_lp64.so: cannot open shared object file: No such file or directory
When I use the base …
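A minimal sketch of one possible fix for the error above, assuming the aws-pytorch CPU wheel expects MKL to come from the conda environment rather than being bundled in the wheel (the package choice is an assumption that would need to be verified):
# Assumption: torch dlopens libmkl_intel_lp64.so from the conda env, so installing MKL there may resolve the error
RUN /opt/conda/bin/conda install -y mkl mkl-include \
&& /opt/conda/bin/conda clean -ya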
Force-pushed from 994bc2e to a842576.
Force-pushed from eaa1b29 to 4f02ada.
Thanks for starting with the refactoring. Sadly, it seems like all the comments got erased and not all of them were resolved. I went through each dockerfile again, line by line, and added comments and thoughts/questions.
Furthermore, I saw your commit message Added py36 and py37 docker files to readily use aws TF and PT wheels.
If we need py37 for TensorFlow and py36 for PyTorch, we could either make PYTHON_VERSION a build arg, so we only need one dockerfile if that is the only difference, or we could skip the base container entirely and add the framework-specific commands to the framework dockerfiles.
It makes no sense to have 4 base container files and 4 framework-specific files when the only difference is the Python version. I would rather have only the 4 framework-specific containers.
I also saw that you used force-push; I am not sure if that is why the comments got erased, but I think it should be avoided.
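For reference, a minimal sketch of the build-arg idea; the build command is hypothetical and the version values are simply the ones discussed above:
ARG PYTHON=python3
ARG PYTHON_VERSION=3.6.13
# Each framework image overrides the default at build time, e.g.:
#   docker build --build-arg PYTHON_VERSION=3.7.10 -f Dockerfile.cpu .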
RUN wget https://www.open-mpi.org/software/ompi/v4.0/downloads/openmpi-$OPEN_MPI_VERSION.tar.gz \
&& gunzip -c openmpi-$OPEN_MPI_VERSION.tar.gz | tar xf - \
&& cd openmpi-$OPEN_MPI_VERSION \
&& ./configure --prefix=/home/.openmpi \
&& make all install \
&& cd .. \
&& rm openmpi-$OPEN_MPI_VERSION.tar.gz \
&& rm -rf openmpi-$OPEN_MPI_VERSION
I am not sure why we should need OPEN_MPI in an inference CPU container. Is it used by some sort of SageMaker or MMS feature?
Currently aws-fx wheel depends on this. We can revisit this once (if?) we remove this dependency in the future.
ARG MMS_VERSION=1.1.2
ARG PYTHON=python3
ARG PYTHON_VERSION=3.6.13
ARG OPEN_MPI_VERSION=4.0.1
Why did you add OPEN_MPI_VERSION compared to the initial one?
This is to get our aws-pytorch wheel working.
openssl \
openjdk-8-jdk-headless \
vim \
wget \
Suggested change: add "git \" after "wget \".
Do we need git for building/installing HF inference toolkit? If yes, we should add it on top of base container spec. We could later remove it when we publish the toolkits.
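If git does turn out to be required, it would most likely be for a pip install straight from the repository while the toolkit is unpublished; a hedged sketch of what that step might look like (the repository URL is hypothetical):
# Assumption: the HF inference toolkit is installed from git until it is published to PyPI
RUN pip install git+https://github.com/aws/sagemaker-huggingface-inference-toolkit.git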
vim \
wget \
curl \
unzip \
Compared to the initial dockerfile, you have removed several packages:
zlib1g-dev \
libreadline-gplv2-dev \
libncursesw5-dev \
libssl-dev \
libsqlite3-dev \
libgdbm-dev \
libc6-dev \
libbz2-dev \
tk-dev \
libffi-dev \
Are they not needed?
Yes. Functionally these don't seem to be required. Will revisit some (like libffi) later. We technically shouldn't need any dev packages in prod containers.
python=$PYTHON_VERSION \
&& /opt/conda/bin/conda install -y \
# conda 4.10.0 requires ruamel_yaml to be installed. Currently pinned at latest.
ruamel_yaml==0.15.100 \
Do we need ruamel_yaml for the inference container?
The comment suggested that we need this for conda. Considering we are on 4.10.1, we might not need it anymore, but we need to verify that before removing it.
# conda 4.10.0 requires ruamel_yaml to be installed. Currently pinned at latest.
ruamel_yaml==0.15.100 \
cython==0.29.12 \
botocore \
First of all, is botocore required? If yes, I would suggest pinning the version. It is also new compared to the initial one you provided.
RUN wget https://www.open-mpi.org/software/ompi/v4.0/downloads/openmpi-$OPEN_MPI_VERSION.tar.gz \
&& gunzip -c openmpi-$OPEN_MPI_VERSION.tar.gz | tar xf - \
&& cd openmpi-$OPEN_MPI_VERSION \
&& ./configure --prefix=/home/.openmpi \
&& make all install \
&& cd .. \
&& rm openmpi-$OPEN_MPI_VERSION.tar.gz \
&& rm -rf openmpi-$OPEN_MPI_VERSION
I am not sure why we should need OPEN_MPI in an inference CPU container. Is it used by some sort of SageMaker or MMS feature?
ENV PATH="$PATH:/home/.openmpi/bin"
ENV LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/home/.openmpi/lib/"
Related to the OPEN_MPI comment above.
COPY mms-entrypoint.py /usr/local/bin/dockerd-entrypoint.py
COPY config.properties /home/model-server
This should probably be adjusted to the right directory for the build context. Not sure what the context of docker build is, but I doubt it is artifacts/.
EXPOSE 8080 8081
ENTRYPOINT ["python", "/usr/local/bin/dockerd-entrypoint.py"]
CMD ["multi-model-server", "--start", "--mms-config", "/home/model-server/config.properties"]
Not sure why you start the model-server like that and not use the dockerd-entrypoint.py with the inference toolkit.
Suggested change: replace CMD ["multi-model-server", "--start", "--mms-config", "/home/model-server/config.properties"] with CMD ["serve"].
I see some duplicate comments, was that intentional?
@@ -0,0 +1,5 @@
vmargs=-XX:+UseContainerSupport -XX:InitialRAMPercentage=8.0 -XX:MaxRAMPercentage=10.0 -XX:-UseLargePages -XX:+UseG1GC -XX:+ExitOnOutOfMemoryError
Sure, we can name it either artifacts or build_artifacts; the two are used interchangeably.
ARG MMS_VERSION=1.1.2
ARG PYTHON=python3
ARG PYTHON_VERSION=3.6.13
ARG OPEN_MPI_VERSION=4.0.1
This is to get our aws-pytorch wheel working.
openssl \
openjdk-8-jdk-headless \
vim \
wget \
Do we need git for building/installing HF inference toolkit? If yes, we should add it on top of base container spec. We could later remove it when we publish the toolkits.
vim \
wget \
curl \
unzip \
Yes. Functionally these don't seem to be required. Will revisit some (like libffi) later. We technically shouldn't need any dev packages in prod containers.
wget \
curl \
unzip \
&& apt-get clean \
Yup. This was required for getting the GPU version of the aws-pytorch wheel to run on GPU containers. Will revisit whether we can remove this.
RUN wget https://www.open-mpi.org/software/ompi/v4.0/downloads/openmpi-$OPEN_MPI_VERSION.tar.gz \
&& gunzip -c openmpi-$OPEN_MPI_VERSION.tar.gz | tar xf - \
&& cd openmpi-$OPEN_MPI_VERSION \
&& ./configure --prefix=/home/.openmpi \
&& make all install \
&& cd .. \
&& rm openmpi-$OPEN_MPI_VERSION.tar.gz \
&& rm -rf openmpi-$OPEN_MPI_VERSION
Currently aws-fx wheel depends on this. We can revisit this once (if?) we remove this dependency in the future.
COPY mms-entrypoint.py /usr/local/bin/dockerd-entrypoint.py
COPY config.properties /home/model-server
To build the inference containers, we need to add logic to buildspec.yml. That will tell the container building pipelines how to pull in the required artifacts and Dockerfiles and build the container.
EXPOSE 8080 8081
ENTRYPOINT ["python", "/usr/local/bin/dockerd-entrypoint.py"]
CMD ["multi-model-server", "--start", "--mms-config", "/home/model-server/config.properties"]
We will have ENTRYPOINT in the Dockerfile, which will be called with serve. Why can't we follow something like this?
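For reference, a minimal sketch of the pattern being referred to, assuming the entrypoint script itself starts the model server when it receives the "serve" argument (as the other DLC inference images do):
# dockerd-entrypoint.py dispatches on its first argument: "serve" starts MMS with the
# baked-in config, anything else is executed as a regular command.
ENTRYPOINT ["python", "/usr/local/bin/dockerd-entrypoint.py"]
CMD ["serve"]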
# conda 4.10.0 requires ruamel_yaml to be installed. Currently pinned at latest.
ruamel_yaml==0.15.100 \
cython==0.29.12 \
botocore \
Good point, we can pin this to the current latest version.
@@ -0,0 +1,107 @@
FROM nvidia/cuda:11.0-cudnn8-runtime-ubuntu18.04
I am currently testing this. The CUDA interoperability is more nuanced than I thought. 😄 Will update this if endpoint creation fails.
Can be closed, due to #1077.
@@ -0,0 +1,110 @@
FROM ubuntu:18.04
We should use public ECR; Docker Hub will throttle pull requests.
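For illustration, switching the base image to ECR Public might look like the line below; the exact registry path and tag are assumptions that would need to be confirmed:
# Assumption: an equivalent Ubuntu 18.04 base image is available on ECR Public
FROM public.ecr.aws/ubuntu/ubuntu:18.04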
EXPOSE 8080 8081
ENTRYPOINT ["python", "/usr/local/bin/dockerd-entrypoint.py"]
CMD ["multi-model-server", "--start", "--mms-config", "/home/model-server/config.properties"]
I think we don't need the CMD, since these images are for SageMaker only (we don't need to support the DLC use case). SageMaker overrides it anyway: https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-inference-code.html#your-algorithms-inference-code-run-image
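Per the linked SageMaker docs, the hosting service launches the container roughly as:
docker run <image> serve
so "serve" arrives as the argument to the ENTRYPOINT regardless of what CMD is baked into the image.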
@@ -0,0 +1,107 @@
FROM nvidia/cuda:11.0-cudnn8-runtime-ubuntu18.04
Same here, use ECR public
# conda 4.10.0 requires ruamel_yaml to be installed. Currently pinned at latest.
ruamel_yaml==0.15.100 \
cython==0.29.12 \
botocore \
Why do we need boto?
RUN wget https://www.open-mpi.org/software/ompi/v4.0/downloads/openmpi-$OPEN_MPI_VERSION.tar.gz \
&& gunzip -c openmpi-$OPEN_MPI_VERSION.tar.gz | tar xf - \
&& cd openmpi-$OPEN_MPI_VERSION \
&& ./configure --prefix=/home/.openmpi \
&& make all install \
&& cd .. \
&& rm openmpi-$OPEN_MPI_VERSION.tar.gz \
&& rm -rf openmpi-$OPEN_MPI_VERSION

ENV PATH="$PATH:/home/.openmpi/bin"
ENV LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/home/.openmpi/lib/"
I think it might be required for a DLC image. SageMaker probably doesn't need this.
Closing this PR now. Let's create a new PR if we want to consider this approach again.
Issue #, if available:
PR Checklist
Pytest Marker Checklist
Added @pytest.mark.model("<model-type>") to the new tests which I have added, to specify the Deep Learning model that is used in the test (use "N/A" if the test doesn't use a model)
Added @pytest.mark.integration("<feature-being-tested>") to the new tests which I have added, to specify the feature that will be tested
Added @pytest.mark.multinode(<integer-num-nodes>) to the new tests which I have added, to specify the number of nodes used on a multi-node test
Added @pytest.mark.processor(<"cpu"/"gpu"/"eia"/"neuron">) to the new tests which I have added, if a test is specifically applicable to only one processor type
EIA/NEURON Checklist
Modified src/config/build_config.py in my PR branch by setting ENABLE_EI_MODE = True or ENABLE_NEURON_MODE = True
Benchmark Checklist
Modified src/config/test_config.py in my PR branch by setting ENABLE_BENCHMARK_DEV_MODE = True
Reviewer Checklist
Description:
Tests run:
DLC image/dockerfile:
Additional context:
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.