
Conversation

mshannon-sil
Collaborator

@mshannon-sil mshannon-sil commented Dec 4, 2023

Upgrading the production Docker container to Python 3.11. I also changed some of the tests I wrote because, when I first updated all of the Poetry packages, a few tests stopped passing. I've since reverted most of the packages to their original versions and upgraded only the packages that needed it, but I kept the test changes regardless for maximum compatibility.
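For context, a version bump like this typically touches the Poetry Python constraint in pyproject.toml; the fragment below is an illustrative sketch, not taken from this PR's diff (the exact constraint and surrounding keys are assumptions):

```toml
# pyproject.toml — illustrative sketch of the Python 3.11 constraint bump
[tool.poetry.dependencies]
python = "^3.11"
```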



@mshannon-sil mshannon-sil self-assigned this Dec 4, 2023
@mshannon-sil mshannon-sil linked an issue Dec 4, 2023 that may be closed by this pull request
@johnml1135
Collaborator

tests/translation/huggingface/test_hugging_face_nmt_model_trainer.py line 145 at r1 (raw file):

            training_args,
            corpus,
            src_lang="en_XX",

Why would the test without the `_XX` fail on newer versions of Python? We should support language codes that don't include a script code.

Collaborator

@johnml1135 johnml1135 left a comment


Reviewed 3 of 4 files at r1, all commit messages.
Reviewable status: 3 of 4 files reviewed, 1 unresolved discussion (waiting on @ddaspit and @mshannon-sil)

Collaborator Author

@mshannon-sil mshannon-sil left a comment


Reviewable status: 3 of 4 files reviewed, 1 unresolved discussion (waiting on @ddaspit and @johnml1135)


tests/translation/huggingface/test_hugging_face_nmt_model_trainer.py line 145 at r1 (raw file):

Previously, johnml1135 (John Lambert) wrote…

Why would the test without the `_XX` fail on newer versions of Python? We should support language codes that don't include a script code.

I don't think the newer version of Python was what made it fail; rather, I had updated transformers without needing to, and the updated version caught that I was using the wrong language codes for this tokenizer. The tokenizer that the tiny-random-nllb model uses is actually the slow MBartTokenizer rather than NllbTokenizerFast, perhaps because MBartTokenizer takes much less space for testing. That tokenizer does include script codes in all of its default language codes. I'm not sure why it wasn't failing earlier when I was using the wrong language codes, but I assume one of the recent updates fixed it to fail properly when it sees an unrecognized language code. So I don't think we need to make any further changes to these tests.
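The failure mode described above can be sketched as a strict membership check against the tokenizer's fixed code list. The helper and code set below are hypothetical stand-ins for what the tokenizer does internally, not the actual transformers implementation; the suffixed codes (e.g. `en_XX`) match MBartTokenizer's defaults, whereas NllbTokenizerFast uses codes like `eng_Latn`:

```python
# Hypothetical sketch: a tokenizer that only accepts language codes from
# its own fixed list will reject a bare "en" where "en_XX" passes.

# MBartTokenizer-style suffixed codes (a small illustrative subset).
MBART_LANG_CODES = {"en_XX", "es_XX", "fr_XX", "de_DE", "ro_RO"}

def validate_src_lang(src_lang: str, known_codes: set[str]) -> str:
    """Return the code if the tokenizer recognizes it, else raise."""
    if src_lang not in known_codes:
        raise ValueError(f"Unrecognized language code: {src_lang!r}")
    return src_lang

validate_src_lang("en_XX", MBART_LANG_CODES)   # accepted
# validate_src_lang("en", MBART_LANG_CODES)    # would raise ValueError
```

This mirrors why the test had to use `src_lang="en_XX"`: the stricter check in the updated transformers release surfaces the mismatch instead of silently accepting an unknown code.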

Collaborator

@johnml1135 johnml1135 left a comment


Reviewed 1 of 4 files at r1.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @ddaspit)

@johnml1135 johnml1135 merged commit dbac78e into main Dec 5, 2023
@mshannon-sil mshannon-sil deleted the #35_python_3.11 branch December 5, 2023 20:11



Successfully merging this pull request may close these issues.

Update python to 3.11
