-
Notifications
You must be signed in to change notification settings - Fork 31.2k
Trigger corresponding pipeline tests if tests/utils/tiny_model_summary.json is modified
#27693
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| # in order to trigger pipeline tests even if no code change at all | ||
| elif "tests/utils/tiny_model_summary.json" in modified_files: | ||
| test_files_to_run = ["tests"] | ||
| repo_utils_launch = any(f.split(os.path.sep)[0] == "utils" for f in modified_files) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not working as modified_files is computed by get_modified_python_files(diff_with_last_commit=diff_with_last_commit), so tiny_model_summary is never in that list.
My bad, sorry
| # TODO (ydshieh): Check this. See https://app.circleci.com/pipelines/github/huggingface/transformers/79292/workflows/fa2ba644-8953-44a6-8f67-ccd69ca6a476/jobs/1012905 | ||
| def is_pipeline_test_to_skip( | ||
| self, pipeline_test_casse_name, config_class, model_architecture, tokenizer_name, processor_name | ||
| ): | ||
| return True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missed in the hotfix #27687
|
The documentation is not available anymore as the PR was closed or merged. |
| old_keys = set(old_content.keys()) | ||
| new_keys = set(new_content.keys()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
keys are always a set
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. 'dict_keys' is not the same as set: 'dict_keys' object has no attribute 'intersection'
a = {1: 2, 2: 3}
b = {2: 3, 3: 4}
a.keys()
b.keys()
c1 = set(a.keys()).intersection(set(b.keys()))
print(c1)
c2 = a.keys().intersection(b.keys())
print(c2)gives
{2}
Traceback (most recent call last):
File "C:\Users\33611\Desktop\Project\transformers\temp3.py", line 11, in <module>
c2 = a.keys().intersection(b.keys())
AttributeError: 'dict_keys' object has no attribute 'intersection'
Process finished with exit code 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright! Thanks
utils/tests_fetcher.py
Outdated
| # if both have the same key, check its content | ||
| for key in common_keys: | ||
| if old_content[key] != new_content[key]: | ||
| diff_keys.add(key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mmm it's not a diff_key but diff_value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
they are the keys which
- either only in one set,
- or in both sets, but values are different.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can give it a different name though.
| # get the module where the model classes are defined. We want to use the main `__init__` file, but it requires | ||
| # all the framework being installed, which is not ideal for a simple script like test fetcher. | ||
| # So we create a temporary and modified main `__init__` and access its `_import_structure`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not use inspect on the class? The transformers packages should be installed and
>>> import inspect
>>> from transformers import *
>>> inspect.getfile(globals()["CLIPConfig"])
'/Users/arthurzucker/Work/transformers/src/transformers/models/clip/configuration_clip.py'we can split on src and be done with it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using inspect will make this test_fethcer require all the dependency being installed, otherwise one will get a dummy object, also the process will become slower due to all the importing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mmm but it's more resilient less error prone. Agreed tho that we would need every tf flax and torch dependency which is not optimal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
more resilient less error prone
I agree too.
I don't think the community would run test fetcher. So I am fine if we decide to use inspect (on the CI). Two things to consider:
- On
CircleCI: the fetch test job might take longer (bigger cache to download, longer installation if cache miss) - On
Push CI: the results would look a bit strange/confusing, asFlaxis not installed in our CI docker image.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understood correctly we need dependencies anyway on the machine that will run the tests no? So what we can do is:
- load the object
- if it's a dummy, log that dummy object and don't use it (will help you with debugging). That means we don't have the deps on the machine
- else, we have the deps, all good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you confirm that it's ok to install transformers[dev] in the CircleCI job fetch_tests in .circleci/config.yml, then I am OK with the suggestions :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
currently , test fetcher run in ~ 1m30.
Loading cache of the whole .[dev] will take 5 minutes + a bit more for all the importing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fetching the test should be fast, so would just do this job last. I thought we had a different machine running the pipeline test and would say let's forward getting the files to update over there if there are any changes to the tiny model!
Cache is so slow :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
forward getting the files to update over there
If the commit(s) in a PR only modifies the tests/utils/tiny_model_summary.json, then fetch_tests in .circleci/config.yml will give an empty job. So there is no job (like torch_job / tf_job) to perform getting the files to update, and we will miss the desired pipeline tests to run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the offline discussion. Yes, your suggestion is a good possibility (other than the fact the file .circleci/create_circleci_config.py might get a bit more complex in order to get the actual tests to run dynamically).
As mentioned, I proposed to merge the current version, and I will work in a follow up PR to implement your suggested idea. So at the end, if there is any unexpected undesired issue, we can at least have a working version.
ArthurZucker
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed offline, feel free to merge as is, and work on a follow up PR for less brittle !
What does this PR do?
Trigger corresponding pipeline tests if
tests/utils/tiny_model_summary.jsonis modified.It happened several times that we merged a PR and pipeline testing failed, because the CI triggered on those PRs didn't cover all files (as
tests/utils/tiny_model_summary.jsonis not a python file)