Commit 7ef675a

bhanutejagk and Bhanu Teja Goshikonda authored
Addressing package regression in Pytorch-training-2.7 image (#5478)
* Building the image and basic testing
* Addressed package regression caused by smclarify dependency s3fs
* building only training image
* explicitly declared typer, langcodes, language_data to fix package regression
* Removed AML2_CPU_ARM64_US_EAST_1 since AMI not available
* removed version pins for awscli and boto3
* added boto3 package
* removed boto3 declaration for cpu image
* added boto3 without any specific pin
* pinned awscli and boto3 to the versions in prod image
* added typer, langcodes and language_data in sagemaker image recipe as well
* reduced the version of awscli and boto3
* reverted awscli and boto3 to 1.42.61
* updated awscli and boto3 versions in gpu to 1.42.61 to match those in cpu
* reverted toml file

---------

Co-authored-by: Bhanu Teja Goshikonda <[email protected]>
1 parent 0ec41cb commit 7ef675a
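
The commit message describes two kinds of change: explicitly declaring typer, langcodes and language_data, and capping awscli and boto3. A minimal sketch of how those constraints might be checked inside a built image follows. The cap values are copied from the Dockerfile changes in this commit; the helper names (`version_tuple`, `find_problems`, `installed_versions`) are hypothetical and not part of the repository, and the version comparison is a simplified numeric one rather than full PEP 440 handling.

```python
from importlib import metadata

# Upper bounds copied from the Dockerfile changes in this commit.
VERSION_CAPS = {"awscli": "1.42.61", "boto3": "1.40.61"}
# Packages the commit declares explicitly, without a version pin.
REQUIRED = ("typer", "langcodes", "language_data")


def version_tuple(version):
    """Turn '1.42.61' into (1, 42, 61); stop at the first non-numeric piece."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def find_problems(installed, caps=VERSION_CAPS, required=REQUIRED):
    """Return a list of problems for a {package name: version} mapping."""
    problems = []
    for name in required:
        if name not in installed:
            problems.append(f"{name} is not installed")
    for name, cap in caps.items():
        if name not in installed:
            problems.append(f"{name} is not installed")
        elif version_tuple(installed[name]) > version_tuple(cap):
            problems.append(f"{name} {installed[name]} exceeds cap {cap}")
    return problems


def installed_versions(names):
    """Look up installed versions via importlib.metadata; skip missing ones."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass
    return found


if __name__ == "__main__":
    names = list(VERSION_CAPS) + list(REQUIRED)
    for problem in find_problems(installed_versions(names)):
        print(problem)
```

Run inside the container, the script prints one line per missing package or exceeded cap and nothing when the image satisfies the constraints. Keeping `find_problems` free of any lookup makes it easy to exercise with a hand-written mapping; a stricter check would probably use `packaging.version` instead of the tuple comparison.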

3 files changed: +20 -3 lines changed

pytorch/training/docker/2.7/py3/Dockerfile.cpu

Lines changed: 11 additions & 2 deletions
@@ -241,11 +241,14 @@ RUN pip install --no-cache-dir -U torch==${PYTORCH_VERSION} \
  fastai==2.8.2 \
  accelerate \
  # pin numpy requirement for fastai dependency
- # requires explicit declaration of spacy, thic, blis
+ # requires explicit declaration of spacy, thic, blis, typer, langcodes, language_data
  spacy \
  #thinc 8.3.6 is not compatible with numpy 1.26.4 (sagemaker doesn't support latest numpy)
  thinc==8.3.4 \
  blis \
+ typer \
+ langcodes \
+ language_data \
  numpy \
  && pip uninstall -y dataclasses

@@ -312,15 +315,21 @@ RUN pip install --no-cache-dir -U torch==${PYTORCH_VERSION} \
  fastai==2.8.2 \
  accelerate \
  # pin numpy requirement for fastai dependency
- # requires explicit declaration of spacy, thic, blis
+ # requires explicit declaration of spacy, thic, blis, typer, langcodes, language_data
  spacy \
  thinc==8.3.4 \
  blis \
  numpy \
+ typer \
+ langcodes \
+ language_data \
  && pip uninstall -y dataclasses

  # Install SM packages
  RUN pip install --no-cache-dir -U \
+ # address package regression caused by smclarify depedency s3fs"
+ "awscli<=1.42.61" \
+ "boto3<=1.40.61" \
  smclarify \
  "sagemaker>=2.9.0,<3" \
  "sagemaker-experiments<1" \

pytorch/training/docker/2.7/py3/cu128/Dockerfile.gpu

Lines changed: 7 additions & 1 deletion
@@ -143,11 +143,14 @@ RUN pip install --no-cache-dir \
  "tornado>=6.5.1" \
  "fastai==2.8.2" \
  # pin numpy requirement for fastai dependency
- # requires explicit declaration of spacy, thic, blis
+ # requires explicit declaration of spacy, thic, blis, typer, langcodes, language_data
  spacy \
  #thinc 8.3.6 is not compatible with numpy 1.26.4 (sagemaker doesn't support latest numpy)
  "thinc==8.3.4" \
  blis \
+ typer \
+ langcodes \
+ language_data \
  "jinja2>=3.1.6"\
  "typing-extensions>=4.14.1" \
  && pip uninstall -y dataclasses
@@ -195,6 +198,9 @@ RUN chmod +x /usr/local/bin/start_with_right_hostname.sh

  # Install SM packages
  RUN pip install --no-cache-dir -U \
+ # address package regression caused by smclarify depedency s3fs"
+ "awscli<=1.42.61" \
+ "boto3<=1.40.61" \
  smclarify \
  "sagemaker>=2.9.0,<3" \
  "sagemaker-experiments<1" \

test/test_utils/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -140,10 +140,12 @@ def get_ami_id_ssm(region_name, parameter_path):
  ami_name_pattern="Deep Learning ARM64 AMI OSS Nvidia Driver GPU PyTorch 2.2.? (Ubuntu 20.04) ????????",
  IncludeDeprecated=True,
  )
+
  AML2_CPU_ARM64_US_EAST_1 = get_ami_id_boto3(
  region_name="us-east-1",
  ami_name_pattern="Deep Learning ARM64 Base OSS Nvidia Driver GPU AMI (Amazon Linux 2) ????????",
  )
+
  PT_GPU_PY3_BENCHMARK_IMAGENET_AMI_US_EAST_1 = "ami-0673bb31cc62485dd"
  PT_GPU_PY3_BENCHMARK_IMAGENET_AMI_US_WEST_2 = "ami-02d9a47bc61a31d43"
