-
Notifications
You must be signed in to change notification settings - Fork 30.9k
Control first downsample stride in ResNet #26374
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A new configuration parameter is being suggested in the ResNet config to attend the needs of a new model. It was done to preserve backward compatibility by default. :)
However, the name of the suggested config parameter should be more intuitive to leverage other models. Could you please, make this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for opening the follow up PR 🤗
if config.layer_type == "bottleneck": | ||
self.layers = nn.Sequential( | ||
# downsampling is done in the first layer with stride of 2 | ||
layer( | ||
in_channels, | ||
out_channels, | ||
stride=stride, | ||
activation=config.hidden_act, | ||
reduce_first=config.reduce_first, | ||
), | ||
*[layer(out_channels, out_channels, activation=config.hidden_act) for _ in range(depth - 1)], | ||
) | ||
else: | ||
self.layers = nn.Sequential( | ||
# downsampling is done in the first layer with stride of 2 | ||
layer(in_channels, out_channels, stride=stride, activation=config.hidden_act), | ||
*[layer(out_channels, out_channels, activation=config.hidden_act) for _ in range(depth - 1)], | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can be simplified 😉
if config.layer_type == "bottleneck": | |
self.layers = nn.Sequential( | |
# downsampling is done in the first layer with stride of 2 | |
layer( | |
in_channels, | |
out_channels, | |
stride=stride, | |
activation=config.hidden_act, | |
reduce_first=config.reduce_first, | |
), | |
*[layer(out_channels, out_channels, activation=config.hidden_act) for _ in range(depth - 1)], | |
) | |
else: | |
self.layers = nn.Sequential( | |
# downsampling is done in the first layer with stride of 2 | |
layer(in_channels, out_channels, stride=stride, activation=config.hidden_act), | |
*[layer(out_channels, out_channels, activation=config.hidden_act) for _ in range(depth - 1)], | |
) | |
reduce_first = config.reduce_first if config.layer_type == "bottleneck" else False | |
self.layers = nn.Sequential( | |
# downsampling is done in the first layer with stride of 2 | |
layer(in_channels, out_channels, stride=stride, activation=config.hidden_act), | |
*[layer(out_channels, out_channels, activation=config.hidden_act) for _ in range(depth - 1)], | |
) |
or something like this
Hi @ArthurZucker @rafaelpadilla . Thanks for your advice. The resnet backbone which downsamples in bottleneck layer can be found in resnet. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
But including docstring in the constructors of ResNetBottleNeckLayer
and ResNetBasicLayer
seems important to me, as the new parameter downsample_in_bottleneck
is introduced.
out_channels: int, | ||
stride: int = 1, | ||
activation: str = "relu", | ||
downsample_in_bottleneck: bool = False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the purpose of downsample_in_bottleneck
here: it prevents one from distinguishing between ResNetBottleNeckLayer
and ResNetBasicLayer
. But it is not being use in this class.
So, I think it is valuable to include a docstring here, and clarify why downsample_in_bottleneck
isn't used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually let's not pollute this class with this, the if else logic should be done to have the correct args.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree! It's cleaner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM appart for the arg that is not used. Let's have explicit and useful args, doing a if else is okay to chose the layer and more understandable
out_channels: int, | ||
stride: int = 1, | ||
activation: str = "relu", | ||
downsample_in_bottleneck: bool = False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually let's not pollute this class with this, the if else logic should be done to have the correct args.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. |
Sure! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, this should be the last iteration!
if config.layer_type == "bottleneck": | ||
self.layers = nn.Sequential( | ||
# downsampling is done in the first layer with stride of 2 | ||
layer( | ||
in_channels, | ||
out_channels, | ||
stride=stride, | ||
activation=config.hidden_act, | ||
downsample_in_bottleneck=config.downsample_in_bottleneck, | ||
), | ||
*[layer(out_channels, out_channels, activation=config.hidden_act) for _ in range(depth - 1)], | ||
) | ||
else: | ||
self.layers = nn.Sequential( | ||
# downsampling is done in the first layer with stride of 2 | ||
layer(in_channels, out_channels, stride=stride, activation=config.hidden_act), | ||
*[layer(out_channels, out_channels, activation=config.hidden_act) for _ in range(depth - 1)], | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No as I said before the only layer that is different here is the call. Something like
if config.layer_type == "bottleneck": | |
self.layers = nn.Sequential( | |
# downsampling is done in the first layer with stride of 2 | |
layer( | |
in_channels, | |
out_channels, | |
stride=stride, | |
activation=config.hidden_act, | |
downsample_in_bottleneck=config.downsample_in_bottleneck, | |
), | |
*[layer(out_channels, out_channels, activation=config.hidden_act) for _ in range(depth - 1)], | |
) | |
else: | |
self.layers = nn.Sequential( | |
# downsampling is done in the first layer with stride of 2 | |
layer(in_channels, out_channels, stride=stride, activation=config.hidden_act), | |
*[layer(out_channels, out_channels, activation=config.hidden_act) for _ in range(depth - 1)], | |
) | |
if config.layer_type == "bottleneck": | |
first_layer = layer(in_channels,out_channels,stride=stride,activation=config.hidden_act,downsample_in_bottleneck=config.downsample_in_bottleneck) | |
else: | |
first_layer = layer(in_channels, out_channels, stride=stride, activation=config.hidden_act) | |
self.layers = nn.Sequential(first_layer,*[layer(out_channels, out_channels, activation=config.hidden_act) for _ in range(depth - 1)]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
something like this should be enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for this round! merging 😉
Control first downsample stride in ResNet (huggingface#26374)
Hi @ArthurZucker
Relate to 25856. I added a new parameter in config to control the stride for the first bottleneck layer in stages. Would you please help me review it? Thx!