-
Notifications
You must be signed in to change notification settings - Fork 301
NVPTX: Add f16 SIMD intrinsics #1626
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NVPTX: Add f16 SIMD intrinsics #1626
Conversation
|
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @Amanieu (or someone else) some time within the next two weeks. |
5d937f5 to
dd7e165
Compare
dd7e165 to
21ff034
Compare
|
Thanks for reviewing! I have fixed the two comments from @Amanieu I have circumvented the naming issue around min/minimum brought up by @CryZe by instead implementing the intrinsics that makes sense to have named as I have also re-run the tests with my branch containing I see that there might be yet another slight naming issue. I have named the functions as |
|
CUDA seems to have an API for |
I see that the names are consistent with the ptx ISA for functions. I will change all the function names to be consistent with this naming (both CUDA and PTX ISA). I will refrain from adopting the leading double underscore The part I'm a bit more unsure about is naming the type |
21ff034 to
79cb9fa
Compare
|
I updated the PR a little while ago. I also re-run the rustc assembly tests I have in my branch to verify the correct instructions are still being produced. I also ported a much used function in our (my workplace) internal code and verified that tests were passing. (And there was also a notable performance boost on the relevant benchmarks). And with that, I think this one should be ready for merging. |
79cb9fa to
4d6fed8
Compare
This implements parts of the
f16x2intrinsics as discussed in rust-lang/rust#125440 (comment)It's unfortunately not possible to test that these intrinsics produce the correct instructions with the current test infrastructure. This was discussed on zulip
I have created assembly tests in my own branch of
rust-langrepo that verifies the correct instructions are used. I have added#[assert_instr]attributes nonetheless. I'm not sure if it's desirable to follow up onrust-langwith a PR including the tests or not?