[Packed Func] Pack args fp16 support #13532
Conversation
Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment. Generated by tvm-bot
I am not too sure if we should do this, mainly because fp16 intrinsics can be slow and software-emulated in some of the kernels. How about passing fp32 as the scalar argument and doing a live conversion to fp16 in the kernel, when we look at the GPU setting?
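The suggested alternative can be illustrated with a small stdlib-only Python sketch (assumption: this is a toy stand-in for the packed-argument path, not TVM's actual PackedFunc implementation). The host packs the scalar as fp32; the "kernel" narrows it to fp16 on use. Python's `struct` `"e"` format emulates IEEE 754 binary16.

```python
import struct

def pack_scalar_as_fp32(value):
    # Host side: pack the scalar as fp32 even though the kernel
    # computes in fp16 (toy stand-in for the packed-arg union).
    return struct.pack("<f", value)

def kernel_use_as_fp16(buf):
    # Device side: "live conversion" - read the fp32 argument and
    # round it to fp16 (struct's "e" format is IEEE 754 binary16).
    (f32,) = struct.unpack("<f", buf)
    return struct.unpack("<e", struct.pack("<e", f32))[0]

# 1.5 is exactly representable in binary16, so no precision is lost.
print(kernel_use_as_fp16(pack_scalar_as_fp32(1.5)))
```

Values not representable in binary16 (e.g. 0.1) would be rounded at the conversion step, which is the precision trade-off of this approach.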
Hit this problem while tuning some convolutions from ResNet-18 on CUDA with fp16. Commenting to stay in the loop on this problem.
@shingjan I will have time next week to look at this and try TQ's suggestion. One workaround for now, I believe, is disabling the TIR CSE pass.
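For reference, a hedged sketch of that workaround as a configuration fragment (assumption: the registered pass name below, `tir.CommonSubexprElimTIR`, may differ across TVM versions — verify it against `tvm.tir.transform` in your build):

```python
import tvm

# Assumption: disable the TIR CSE pass by its registered name via
# PassContext; the exact name may vary between TVM versions.
with tvm.transform.PassContext(
    opt_level=3,
    disabled_pass=["tir.CommonSubexprElimTIR"],
):
    # Build / tune as usual inside this context, e.g.
    # lib = relay.build(mod, target="cuda", params=params)
    pass
```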
Also getting the error when tuning winograd fp16 tasks from the Stable Diffusion UNet. Normal convolution works.
Yup, can confirm that disabling TIR CSE fixes this issue. It turns out to be the combination of CSE and fp16 scalar arguments, and I'm not sure if disabling CSE is a good idea.
Same concern here. Right now I am just cherry-picking this PR, and it worked. Will circle back, hopefully next week.
Turns out, for winograd fp16, the issue was due to the following expressions, which are used by two kernels in winograd, so they are computed on the host once and passed to the winograd kernels as scalar arguments. @AndrewZhaoLuo I wonder, for layer_norm, what variables are being passed between the two kernels? Useless variables like the above, or something meaningful? Given that you said disabling CSE works around this issue, maybe it is the former. Then improved algebraic simplification could fix it as well.
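As a toy illustration of this failure mode (assumption: a simplified stand-in, not TVM's actual CSE pass): before CSE, each kernel recomputes the shared subexpression locally; after CSE, the host computes it once and passes it across the kernel boundary as a scalar argument — and when that scalar is fp16, packing it is exactly what this PR addresses.

```python
def kernels_before_cse(x):
    # Each "kernel" recomputes the shared subexpression x * 0.5
    # itself, so no scalar crosses the host/device boundary.
    a = (x * 0.5) + 1.0
    b = (x * 0.5) - 1.0
    return a, b

def kernel1(t):
    return t + 1.0

def kernel2(t):
    return t - 1.0

def kernels_after_cse(x):
    # CSE hoists x * 0.5: it is computed once (on the host) and
    # handed to both kernels as a scalar argument - the value whose
    # fp16 packing triggers the error discussed here.
    t = x * 0.5
    return kernel1(t), kernel2(t)

# Both forms compute the same result; only where the shared
# subexpression lives changes.
print(kernels_before_cse(4.0) == kernels_after_cse(4.0))
```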
Can confirm that this was a CSE issue, which is now fixed for all the use cases I care about. @masahi has a good summary above of what occurred. I am closing for now to avoid delving deeper.
Needs testing
Should solve https://discuss.tvm.apache.org/t/resnet-tomixedprecision-tuning-error/14015