
Conversation

@ringohoffman

What does this PR do?

Fixes #23980

This PR fixes the type hints that users see when calling PreTrainedModel.from_pretrained.

before:

import transformers

bert_model = transformers.BertForSequenceClassification.from_pretrained("...")
reveal_type(bert_model)  # Type of "bert_model" is "tuple[Unknown | BertForSequenceClassification, dict[str, Unbound | Unknown] | dict[str, Unknown | list[Unknown]] | Unknown] | Unknown | BertForSequenceClassification"

bert_model_and_loading_info = transformers.BertForSequenceClassification.from_pretrained("...", output_loading_info=True)
reveal_type(bert_model_and_loading_info)  # Type of "bert_model_and_loading_info" is "tuple[Unknown | BertForSequenceClassification, dict[str, Unbound | Unknown] | dict[str, Unknown | list[Unknown]] | Unknown] | Unknown | BertForSequenceClassification"

after:

import transformers

bert_model = transformers.BertForSequenceClassification.from_pretrained("...")
reveal_type(bert_model)  # Type of "bert_model" is "BertForSequenceClassification"

bert_model_and_loading_info = transformers.BertForSequenceClassification.from_pretrained("...", output_loading_info=True)
reveal_type(bert_model_and_loading_info)  # Type of "bert_model_and_loading_info" is "Tuple[BertForSequenceClassification, LoadingInfo]"
Summary of changes:

  1. move output_loading_info from variadic kwargs to an explicit kwarg
  2. create overloaded signatures for from_pretrained based on its value (sketched below)
  3. add a LoadingInfo TypedDict
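
For reference, a minimal sketch of the approach (not the PR's exact diff: the signature is heavily simplified, and the LoadingInfo keys are an assumption based on the loading-info dict from_pretrained builds today):

from typing import List, Literal, Tuple, Type, TypedDict, TypeVar, overload

T = TypeVar("T", bound="PreTrainedModel")

class LoadingInfo(TypedDict):
    # Assumed keys, mirroring the dict returned with output_loading_info=True.
    missing_keys: List[str]
    unexpected_keys: List[str]
    mismatched_keys: List[str]
    error_msgs: List[str]

class PreTrainedModel:
    @overload
    @classmethod
    def from_pretrained(
        cls: Type[T],
        pretrained_model_name_or_path: str,
        *args,
        output_loading_info: Literal[False] = False,
        **kwargs,
    ) -> T: ...

    @overload
    @classmethod
    def from_pretrained(
        cls: Type[T],
        pretrained_model_name_or_path: str,
        *args,
        output_loading_info: Literal[True],
        **kwargs,
    ) -> Tuple[T, LoadingInfo]: ...

    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *args, output_loading_info=False, **kwargs):
        # Single runtime implementation; the overloads above only describe its types.
        ...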

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Documentation: @sgugger, @stevhliu and @MKhalusova

Fixes huggingface#23980; move output_loading_info from variadic kwargs to an explicit kwarg and create overloaded signatures based on its value; also add LoadingInfo TypedDict
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

Collaborator

@sgugger sgugger left a comment

Again, any code using overload won't be accepted (you can search the source code, it's not present anywhere).

@ringohoffman
Author

you can search the source code, it's not present anywhere

Searching in the source code, you can see it here: https://github.com/search?q=repo%3Ahuggingface%2Ftransformers+%40overload&type=code

Is there a specific part of this overload that makes it difficult to merge in?

This overload makes it obvious to users that you only get the LoadingInfo when you pass in output_loading_info=True and that otherwise they will only get the model they expected.
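
Concretely (a hypothetical snippet, assuming this PR's overloads are applied), a type checker can then catch mismatched unpacking:

import transformers

model = transformers.BertForSequenceClassification.from_pretrained("...")
model.eval()  # OK: model is inferred as BertForSequenceClassification

model, info = transformers.BertForSequenceClassification.from_pretrained("...")  # flagged: the return type is not a tuple

model, info = transformers.BertForSequenceClassification.from_pretrained(
    "...", output_loading_info=True
)  # OK: Tuple[BertForSequenceClassification, LoadingInfo]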

@github-actions
Contributor

github-actions bot commented Jul 6, 2023

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@ringohoffman
Author

@amyeroberts RE: #26125 (comment)

Previously a similar proposal was very strongly rejected (#23980 (comment) and #24035)

This PR was rejected with little explanation and without any discussion. I would be so grateful if you would be willing to hear me out.

I'm generally against introducing overloads.

However, this does seem to be an issue which is raised by many independent community members and finding an easy to maintain alternative we can agree on would be good. Perhaps we can open an issue to discuss addressing this more generally?

overload describes polymorphic functions: functions whose output types depend on their input types. This is a very natural and common pattern, and transformers already uses polymorphic functions extensively.
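
For illustration (a made-up function, not from transformers), here is the pattern in miniature:

from typing import List, Literal, Union, overload

@overload
def parse(text: str, as_tokens: Literal[True]) -> List[str]: ...
@overload
def parse(text: str, as_tokens: Literal[False] = False) -> str: ...
def parse(text: str, as_tokens: bool = False) -> Union[str, List[str]]:
    # One runtime implementation; the overloads just describe how the
    # return type tracks the as_tokens argument.
    return text.split() if as_tokens else text.strip()

reveal_type(parse("a b"))                  # str
reveal_type(parse("a b", as_tokens=True))  # List[str]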

The alternative would be to rewrite these functions as separate, distinct functions for each input/output type combination (not practical for existing functions). You will end up with the same number of signatures as if you just used overload, but with the added burden of having to test and maintain these as separate functions, as well as cluttering your namespace with a bunch of variants of the same function.
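
That alternative would look something like this (hypothetical names):

from typing import Tuple

# Two separate public functions, with largely duplicated bodies to keep in
# sync, instead of two overload signatures on a single function:
def load_model(name: str) -> "PreTrainedModel":
    ...

def load_model_with_loading_info(name: str) -> "Tuple[PreTrainedModel, LoadingInfo]":
    ...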

Of course, I have to mention that the most important and ubiquitous function in all of transformers, PreTrainedModel.from_pretrained, is polymorphic, and is inferred to have a monster return type:

import transformers

llama = transformers.LlamaForCausalLM.from_pretrained(...)
reveal_type(llama)  # tuple[Unknown | LlamaForCausalLM, dict[str, Unbound | Unknown] | dict[str, Unknown | list[Unknown]] | Unknown] | Unknown | LlamaForCausalLM

I'm really not certain on what basis the claim is made that overload signatures bloat the code. PreTrainedModel.from_pretrained is 1106 lines long! My suggestion in this PR is 40 lines long and spells out exactly how the output type varies with the input, which is currently not at all obvious (see the inferred type above).

PyTorch makes considerable use of overload, with many of its overloads having gone untouched for years. How many more return type variants are planned to be added to PreTrainedModel.from_pretrained? In my experience maintaining codebases that use overload, the number of overload signatures tends to remain the same over time. As a testament to this, this PR could still be merged today without changes from what was originally proposed 1 year ago. If you account for the different parameter names, it could have been merged as is since v1.0.0, 5 years ago.

Finally, and maybe most importantly, whether or not Hugging Face type checks its code, users are going to continue to run type checkers on code that uses Hugging Face code. This is why the community keeps suggesting these changes. Refusal to adopt well-established Python typing features like overload is directly harmful to the user experience. Type checking is the tool that powers IDE auto-completion and code navigation, and helps users to reason about code and ensure correctness.

In light of all of this, how do you feel about revisiting this PR? I am sure so many users would be delighted to make use of such type hinting at such little cost to Hugging Face.

Thank you for your consideration.


For what it's worth, I think your choice of type checker makes a huge difference. I don't use or recommend mypy at all. I use the type checker built into VS Code, pyright. It is a lot faster, has fewer bugs, and fixes issues much, much faster than mypy. As of right now, it has 6 open issues compared to mypy's 2628.

@amyeroberts
Contributor

This PR was rejected with little explanation and without any discussion. I would be so grateful if you would be willing to hear me out.

As commented here - #26125 (comment) - please feel free to open an issue where a solution (not necessarily relating to overloads) can be discussed.

However, this does seem to be an issue which is raised by many independent community members and finding an easy to maintain alternative we can agree on would be good. Perhaps we can open an issue to discuss addressing this more generally?

Since #26125 around a year ago, I'm not aware of any issues or PRs opened addressing this. Given the number of issues opened here daily, it doesn't appear to be something commonly raised or requested. Most of what I see is through GitHub, so it's possible there are active discussions elsewhere, e.g. on the forum. If so, please share links, as this will help guide the discussion and decisions.

To be clear, we're not saying that this doesn't provide any utility - it definitely does - the question is whether the additional code and its maintenance is worth it.

The alternative would be to rewrite these functions as separate, distinct functions for each input/output type combination (not practical for existing functions).

This is assuming we want to enable this feature!

To provide some context, we previously had a push to increase compliance with type checkers about 2 years ago. This resulted in many issues and users complaining, so we eventually reverted this support (#18485).

I'm really not certain on what basis the claim is made that overload signatures bloat the code. PreTrainedModel.from_pretrained is 1106 lines long! My suggestion in this PR is 40 lines long and spells out exactly how the output type varies with the input, which is currently not at all obvious (see the inferred type above).

As you point out - the modeling file is already huge (something we want to address!). Doing this adds even more code. And, as @Narsil mentions, overloads can introduce bugs and make it harder to know what code is being called.

PyTorch makes considerable use of overload, with many of its overloads having gone untouched for years. How many more return type variants are planned to be added to PreTrainedModel.from_pretrained? In my experience maintaining codebases that use overload, the number of overload signatures tends to remain the same over time. As a testament to this, this PR could still be merged today without changes from what was originally proposed 1 year ago. If you account for the different parameter names, it could have been merged as is since v1.0.0, 5 years ago.

PyTorch is an amazing but different library, with different considerations and engineering choices. For one thing, the Hugging Face transformers team is small! For core logic like this, there are really only 2 - 3 of us responsible for maintenance, and that's alongside our other commitments.

How many more return type variants are planned to be added to PreTrainedModel.from_pretrained?

transformers is a very dynamic library - we don't plan these things out in advance!

Finally, and maybe most importantly, whether or not Hugging Face type checks its code, users are going to continue to run type checkers on code that uses Hugging Face code. This is why the community keeps suggesting these changes. Refusal to adopt well-established Python typing features like overload is directly harmful to the user experience. Type checking is the tool that powers IDE auto-completion and code navigation, and helps users to reason about code and ensure correctness.

You can't build software based on what people are doing with it downstream. People are of course welcome to run type checkers, but it's something we've decided not to support, and so we make no guarantees about how well it works.

Python is ultimately not a great language for making typing assumptions. If the library were being created and written fresh now, then having more explicit and full coverage would be ideal. Sadly, that's not the case.

at such little cost to Hugging Face.

This is a strong assumption.
