The documentation suggests that you ensure the GPU is synchronized at the end of every sample, e.g. by calling synchronize() or using @sync. However, this is generally overkill: the overhead from @sync can be of the same order of magnitude as the actual cost of the kernel call, or even higher, which makes the measurement highly inaccurate. I usually end up calling @sync once every N kernel calls to mitigate this. Also, @benchmark generally gives a good estimate without the sync if you ignore the minimum time and use the median.
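To make the three approaches concrete, here is a rough sketch of how they might be compared, assuming CUDA.jl and BenchmarkTools.jl are installed and a CUDA-capable GPU is available. The workload `scale!`, the helper `scale_batch!`, the array size, and `N = 100` are all made up for illustration; they are not from the documentation.

```julia
using CUDA, BenchmarkTools, Statistics

a = CUDA.rand(Float32, 1 << 20)
b = similar(a)

# Hypothetical workload: one small broadcast kernel per call.
scale!(b, a) = (b .= 2f0 .* a; nothing)

# Launch the kernel n times back to back, with no sync in between.
function scale_batch!(b, a, n)
    for _ in 1:n
        scale!(b, a)
    end
    return nothing
end

N = 100  # arbitrary batch size, chosen only for this sketch

# (1) Synchronize after every kernel call, as the documentation suggests.
t_each = @belapsed CUDA.@sync scale!($b, $a)

# (2) Synchronize once per N kernel calls and divide, so the sync
#     overhead is spread over the whole batch.
t_total = @belapsed CUDA.@sync scale_batch!($b, $a, $N)
t_batch = t_total / N

# (3) No sync at all: ignore the minimum and read off the median.
trial = @benchmark scale!($b, $a)
t_median = time(median(trial)) / 1e9  # BenchmarkTools reports nanoseconds

synchronize()  # drain any kernels still queued before moving on

println("sync every call:    ", t_each, " s")
println("sync every N calls: ", t_batch, " s per call")
println("no sync, median:    ", t_median, " s")
```

If the sync overhead is significant for your kernel, I would expect (1) to come out noticeably larger than (2). How close (3) lands to (2) will depend on the kernel and the hardware, so treat it as a sanity check rather than a guarantee.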