
Conversation

@Wheest (Contributor) commented Apr 13, 2019

I was experiencing an error where grouped convolution didn't work for groups != num_input_channels.

It turns out that in python/tvm/relay/op/nn/_nn.py, if groups != 1, then the depthwise convolution schedule is used. This is incorrect, since depthwise convolution is the special case groups == input_channels.

This pull request attempts to fix the issue.

However, note that the attrs object does not have a property for input_channels, only output_channels; the latter is what the fix uses. This does not affect correctness in the current implementation, since depthwise convolution always has input_channels == output_channels.

However, if depthwise convolution ever gains a depth_multiplier parameter, where N filters are applied to each input channel, then this fix would no longer be correct. As it stands, depth-multiplied depthwise convolutions do not appear to be implemented.

This Jupyter notebook demonstrates the problem that the PR fixes.
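
For reference, here is a minimal sketch of the dispatch logic the fix aims for. The function shape and the topi schedule calls are illustrative of the API at the time, not the exact contents of _nn.py:

    # A hedged sketch of the corrected schedule dispatch, not the exact code in
    # python/tvm/relay/op/nn/_nn.py (which already imports `topi` and the relay
    # op registry as `reg`). Note that attrs.channels is the *output* channel
    # count, which only stands in for input channels because depthwise
    # convolution currently has in_channels == out_channels.
    @reg.register_schedule("nn.conv2d")
    def schedule_conv2d(attrs, outs, target):
        groups = attrs.groups
        layout = attrs.data_layout
        channels = attrs.channels  # TODO: should really be the input channels

        with target:
            if groups == 1:
                return topi.generic.schedule_conv2d_nchw(outs)
            if groups == channels and layout == "NCHW":
                # true depthwise convolution: one filter group per input channel
                return topi.generic.schedule_depthwise_conv2d_nchw(outs)
            # general grouped convolution
            return topi.generic.schedule_group_conv2d_nchw(outs)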

@cbalint13 (Contributor)

@Wheest,

groups = attrs.groups
layout = attrs.data_layout
kernel_layout = attrs.kernel_layout
channels = attrs.channels #TODO should be input channels
Member

It is not safe to use attrs.channels, since this attr is optional.

Contributor Author

In that case, in_channels needs to be exposed to the function, either by adding it to the attrs (defined at nn.h:51) or by passing it as an extra argument (such as the tinfo suggestion) from every caller in the codebase.

Would there ever be a case where in_channels is indeterminate when this function is called?

Member

in_channels should be available from the shape of the input / weight after type inference.
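
To illustrate the idea, a small sketch at the Relay level; the helper name and the NCHW assumption are mine, not part of the discussion:

    # A hedged sketch, assuming an NCHW data layout and that type inference has
    # already run, so checked_type is populated on the conv2d call's arguments.
    def in_channels_of(conv_call):
        data_ty = conv_call.args[0].checked_type  # relay.TensorType of the data input
        return int(data_ty.shape[1])              # dimension 1 is C in NCHW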

Member

@Wheest As suggested in #3070 (comment), you can get input channels via outs[0].op.attrs["workload"]. Can you check if this works for you?
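
For context, a rough sketch of what that access looks like inside the schedule function; the exact layout of the workload tuple is an assumption about how autotvm records the conv2d arguments:

    # A hedged sketch, inside schedule_conv2d(attrs, outs, target). When the
    # compute came from an autotvm-registered template, the output op carries a
    # "workload" attr recording the original arguments, so in_channels can be
    # recovered from it.
    workload = outs[0].op.attrs["workload"]
    data_entry = workload[1]           # recorded shape/dtype of the data tensor
    in_channels = int(data_entry[1])   # channel dimension of an NCHW input (assumed)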

Contributor Author

Thanks for the insight @vinx13, I would never have grokked all of the info that outs stores. I can confirm that setting in_channels = outs[0].op.attrs['workload'][1][1] works for groups == 1 and groups == in_channels. However, interestingly, when 1 < groups < in_channels, outs[0].op.attrs appears to be empty. This suggests that whatever calls schedule_conv2d does not set the workload attribute when using grouped convolutions. I am investigating why.

Contributor

Hi @Wheest, I did notice such circumstances in my PR. Please let me know if you make any new discoveries :)

Member

@Wheest FYI, workload is set here.
This is probably because group_conv2d_nchw is not registered as an autotvm template.
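
For reference, a rough skeleton of what such a registration looked like with the autotvm API of the time; the target and template keys are illustrative and the bodies are elided, so this is not the eventual fix:

    # A hedged skeleton: registering the compute and schedule with autotvm is
    # what causes the "workload" attr to be attached to the output op, which is
    # why depthwise_conv2d_nchw exposes it and group_conv2d_nchw does not.
    from tvm import autotvm
    import topi

    @autotvm.register_topi_compute(topi.nn.group_conv2d_nchw, ["cuda", "gpu"], "direct")
    def group_conv2d_nchw_cuda(cfg, data, kernel, stride, padding, dilation,
                               groups, out_dtype="float32"):
        ...  # grouped-conv compute definition goes here

    @autotvm.register_topi_schedule(topi.generic.schedule_group_conv2d_nchw,
                                    ["cuda", "gpu"], "direct")
    def schedule_group_conv2d_nchw_cuda(cfg, outs):
        ...  # schedule construction for outs goes here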

Contributor

@vinx13 You might be correct: depthwise_conv2d_nchw has been properly registered, while group_conv2d_nchw has not. I'm working on fixing that.

Contributor Author

Thanks, I'll hold off until then. Why would autotvm templates be used in this workflow? I thought the process was:

  1. Define the model using the high-level Relay API (with the bug this PR addresses happening here).
  2. Generate the model description in Relay IR.
  3. Compile the model to the chosen backend.

As I understand it, you might use autotvm at step 3 to tune and optimise the program. But I thought of it as a "separate tool" in the TVM suite, not used in steps 1 or 2.

Member

Actually, we only need the extra attributes when calling the schedule. One way is to add the needed attributes in the topi compute. Autotvm already does this, but we can also register them manually. For the group conv case, we can add an attribute when calling tvm.compute.
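
To illustrate the suggestion, a hedged, self-contained sketch using the pre-0.7 tvm.compute API; the attribute name "in_channels" is a placeholder rather than an established convention:

    import tvm

    # A minimal sketch: tvm.compute accepts an `attrs` dict that is attached to
    # the resulting op, so a grouped-conv compute could record what its schedule
    # later needs and read it back via outs[0].op.attrs, without relying on
    # autotvm's "workload".
    n = 16
    A = tvm.placeholder((n, n), name="A")
    B = tvm.compute(
        (n, n),
        lambda i, j: A[i, j] * 2.0,
        name="B",
        attrs={"in_channels": n},   # placeholder attribute, for illustration only
    )
    print(B.op.attrs["in_channels"])  # the attribute is visible on the op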

@Wheest (Contributor Author) commented Apr 25, 2019

I have tried using the OpenCL backend with grouped convolutions, using the example in the notebook gist.

However, this backend only works with groups == 1 and groups == in_channels. All other group counts fail with:

  File "/home/wheest/tools/tvm/src/op/tensorize.cc", line 336
TVMError: Check failed: Equal(lhs, rhs): Failed to match the compute with TensorIntrin tensor_intrin's declaration  provided= reduce(combiner=comm_reducer(result=[(x + y)], lhs=[x], rhs=[y], identity_element=[0]), source=[(int32(x(rc))*int32(y(rc)))], axis=[iter_var(rc, Range(min=0, extent=4))], where=(uint1)1, value_index=0), intrin=  reduce(combiner=comm_reducer(result=[(x + y)], lhs=[x], rhs=[y], identity_element=[0]), source=[(int32(x(rc))*int32(y(rc)))], axis=[iter_var(rc, Range(min=0, extent=4))], where=(uint1)1, value_index=0)
During handling of the above exception, another exception occurred:

TVMError: Check failed: Equal(lhs, rhs): Failed to match the compute with TensorIntrin tensor_intrin's declaration  provided= reduce(combiner=comm_reducer(result=[(x + y)], lhs=[x], rhs=[y], identity_element=[0]), source=[(int32(x(rc))*int32(y(rc)))], axis=[iter_var(rc, Range(min=0, extent=4))], where=(uint1)1, value_index=0), intrin=  reduce(combiner=comm_reducer(result=[(x + y)], lhs=[x], rhs=[y], identity_element=[0]), source=[(int32(x(rc))*int32(y(rc)))], axis=[iter_var(rc, Range(min=0, extent=4))], where=(uint1)1, value_index=0)
Error during compile function
-----------------------------
v0.0.1
%1 = fn (%p0: Tensor[(1, 512, 28, 28), float32], %p1: Tensor[(512, 256, 3, 3), float32], __dict__=meta[StrMap][0]) -> Tensor[(1, 512, 26, 26), float32] {
  %0 = nn.conv2d(%p0, %p1, groups=2, channels=512, kernel_size=[3, 3]) // ty=Tensor[(1, 512, 26, 26), float32]
  %0
}

This happens after the Relay schedule phase, though I'm still tracing what is going wrong. Pointers from codebase aficionados appreciated.

@vinx13 self-assigned this Apr 26, 2019
@vinx13 (Member) commented Apr 26, 2019

@Wheest The NCHWc template that uses tensorization is only for CUDA; I believe this is what caused the error.
Please also check whether PR #3070 fixes the group conv issue here.

@vinx13 added the status: need update label Apr 28, 2019
@Wheest (Contributor Author) commented Apr 29, 2019

Yes @vinx13, PR #3070 fixed the issue, and it is a superset of everything this PR set out to achieve, so I am closing this one for now. Thanks for the guidance, all!

@Wheest closed this Apr 29, 2019