Skip to content

Conversation

@ydshieh
Copy link
Collaborator

@ydshieh ydshieh commented Nov 24, 2023

What does this PR do?

Trigger corresponding pipeline tests if tests/utils/tiny_model_summary.json is modified.

It happened several times that we merged a PR and pipeline testing failed, because the CI triggered on those PRs didn't cover all files (as tests/utils/tiny_model_summary.json is not a python file)

Comment on lines -952 to -955
# in order to trigger pipeline tests even if no code change at all
elif "tests/utils/tiny_model_summary.json" in modified_files:
test_files_to_run = ["tests"]
repo_utils_launch = any(f.split(os.path.sep)[0] == "utils" for f in modified_files)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not working as modified_files is computed by get_modified_python_files(diff_with_last_commit=diff_with_last_commit), so tiny_model_summary is never in that list.

My bad, sorry

@ydshieh ydshieh requested a review from ArthurZucker November 24, 2023 14:32
Comment on lines +291 to +295
# TODO (ydshieh): Check this. See https://app.circleci.com/pipelines/github/huggingface/transformers/79292/workflows/fa2ba644-8953-44a6-8f67-ccd69ca6a476/jobs/1012905
def is_pipeline_test_to_skip(
self, pipeline_test_casse_name, config_class, model_architecture, tokenizer_name, processor_name
):
return True
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missed in the hotfix #27687

@HuggingFaceDocBuilderDev
Copy link

HuggingFaceDocBuilderDev commented Nov 24, 2023

The documentation is not available anymore as the PR was closed or merged.

@ydshieh ydshieh marked this pull request as draft November 24, 2023 15:37
@ydshieh ydshieh marked this pull request as ready for review November 24, 2023 15:41
Comment on lines +302 to +303
old_keys = set(old_content.keys())
new_keys = set(new_content.keys())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keys are always a set

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. 'dict_keys' is not the same as set: 'dict_keys' object has no attribute 'intersection'

a = {1: 2, 2: 3}
b = {2: 3, 3: 4}

a.keys()
b.keys()

c1 = set(a.keys()).intersection(set(b.keys()))
print(c1)


c2 = a.keys().intersection(b.keys())
print(c2)

gives

{2}
Traceback (most recent call last):
  File "C:\Users\33611\Desktop\Project\transformers\temp3.py", line 11, in <module>
    c2 = a.keys().intersection(b.keys())
AttributeError: 'dict_keys' object has no attribute 'intersection'

Process finished with exit code 1

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright! Thanks

# if both have the same key, check its content
for key in common_keys:
if old_content[key] != new_content[key]:
diff_keys.add(key)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmm it's not a diff_key but diff_value

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they are the keys which

  • either only in one set,
  • or in both sets, but values are different.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can give it a different name though.

Comment on lines +319 to +321
# get the module where the model classes are defined. We want to use the main `__init__` file, but it requires
# all the framework being installed, which is not ideal for a simple script like test fetcher.
# So we create a temporary and modified main `__init__` and access its `_import_structure`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not use inspect on the class? The transformers packages should be installed and

>>> import inspect
>>> from transformers import *
>>> inspect.getfile(globals()["CLIPConfig"])
 '/Users/arthurzucker/Work/transformers/src/transformers/models/clip/configuration_clip.py'

we can split on src and be done with it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using inspect will make this test_fethcer require all the dependency being installed, otherwise one will get a dummy object, also the process will become slower due to all the importing.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmm but it's more resilient less error prone. Agreed tho that we would need every tf flax and torch dependency which is not optimal.

Copy link
Collaborator Author

@ydshieh ydshieh Nov 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more resilient less error prone

I agree too.

I don't think the community would run test fetcher. So I am fine if we decide to use inspect (on the CI). Two things to consider:

  • On CircleCI: the fetch test job might take longer (bigger cache to download, longer installation if cache miss)
  • On Push CI: the results would look a bit strange/confusing, as Flax is not installed in our CI docker image.

Copy link
Collaborator

@ArthurZucker ArthurZucker Nov 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understood correctly we need dependencies anyway on the machine that will run the tests no? So what we can do is:

  • load the object
  • if it's a dummy, log that dummy object and don't use it (will help you with debugging). That means we don't have the deps on the machine
  • else, we have the deps, all good

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you confirm that it's ok to install transformers[dev] in the CircleCI job fetch_tests in .circleci/config.yml, then I am OK with the suggestions :-)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currently , test fetcher run in ~ 1m30.

Loading cache of the whole .[dev] will take 5 minutes + a bit more for all the importing

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fetching the test should be fast, so would just do this job last. I thought we had a different machine running the pipeline test and would say let's forward getting the files to update over there if there are any changes to the tiny model!
Cache is so slow :(

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

forward getting the files to update over there

If the commit(s) in a PR only modifies the tests/utils/tiny_model_summary.json, then fetch_tests in .circleci/config.yml will give an empty job. So there is no job (like torch_job / tf_job) to perform getting the files to update, and we will miss the desired pipeline tests to run.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the offline discussion. Yes, your suggestion is a good possibility (other than the fact the file .circleci/create_circleci_config.py might get a bit more complex in order to get the actual tests to run dynamically).

As mentioned, I proposed to merge the current version, and I will work in a follow up PR to implement your suggested idea. So at the end, if there is any unexpected undesired issue, we can at least have a working version.

@ydshieh ydshieh requested a review from ArthurZucker November 28, 2023 08:51
Copy link
Collaborator

@ArthurZucker ArthurZucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline, feel free to merge as is, and work on a follow up PR for less brittle !

@ydshieh ydshieh merged commit 30e92ea into main Nov 28, 2023
@ydshieh ydshieh deleted the trigger_ci_when_tiny_summary_modified branch November 28, 2023 16:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants