CUDA: force use of mad.lo.u32 for ROTATE_LEFT

November 10, 2019

How can I force CUDA to use mad.lo.u32 for ROTATE_LEFT?
The target is Compute Capability 1.2.

__global__ void fun(unsigned int *mem)
{
    int a = 3;
    int b = 5;
    int c = 6;
    int d;

    // Inline PTX: d = a * b + c (low 32 bits of the product, plus c)
    asm("mad.lo.u32 %0, %1, %2, %3;" : "=r"(d) : "r"(a), "r"(b), "r"(c));

    // With the values above, d = 3 * 5 + 6 = 21
    *mem = d;
}
This produces the correct result. However, when I try to define the rotate as a multiply-add:
#define ROTATE_LEFT2(x, n) (int)x*(1>>(32-n))+(x<<(int)n)
the compiler does not emit a mad instruction.
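
A possible explanation, offered as a sketch rather than a confirmed diagnosis: in ROTATE_LEFT2 as written, 1>>(32-n) evaluates to 0 for every n from 1 to 31, so the product is a constant zero and the compiler has no multiplication left to fuse. A rotate can instead be written as a multiply-add by multiplying by 1<<n and adding the bits that fall off the top, x>>(32-n), with x treated as unsigned. The C below illustrates this. The names ROTATE_LEFT_MAD, ROTATE_LEFT_REF, check_rotate_forms, and rotate_left_mad_asm are made up for this example, and the part inside #ifdef __CUDACC__ is only compiled by nvcc.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate left written as multiply-add: x * 2^n + (x >> (32 - n)).
 * Valid for 1 <= n <= 31. The two terms occupy disjoint bit ranges,
 * so adding them gives the same result as OR-ing them. */
#define ROTATE_LEFT_MAD(x, n) \
    ((uint32_t)(x) * ((uint32_t)1 << (n)) + ((uint32_t)(x) >> (32 - (n))))

/* Conventional shift/or rotate, used here only as a reference. */
#define ROTATE_LEFT_REF(x, n) \
    (((uint32_t)(x) << (n)) | ((uint32_t)(x) >> (32 - (n))))

/* Host-side sanity check: both forms must agree for every n in 1..31. */
static void check_rotate_forms(uint32_t x)
{
    for (unsigned n = 1; n < 32; ++n) {
        assert(ROTATE_LEFT_MAD(x, n) == ROTATE_LEFT_REF(x, n));
    }
}

#ifdef __CUDACC__
/* Device-only variant that requests mad.lo.u32 explicitly via inline PTX,
 * in the same style as the kernel in the question. Assumes 1 <= n <= 31. */
__device__ unsigned int rotate_left_mad_asm(unsigned int x, unsigned int n)
{
    unsigned int d;
    asm("mad.lo.u32 %0, %1, %2, %3;"
        : "=r"(d)
        : "r"(x), "r"(1u << n), "r"(x >> (32 - n)));
    return d;
}
#endif
```

Note that PTX is an intermediate representation: ptxas still chooses the final machine instructions, so a mad in the PTX does not guarantee a single multiply-add in the hardware code.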

References:
https://www.openwall.com/lists/john-dev/2012/03/22/7
https://devtalk.nvidia.com/default/topic/489750/ptx-assembly-help-33-/
https://devtalk.nvidia.com/default/topic/478578/integer-mad-instruction/
https://www.blackhat.com/presentations/bh-usa-09/BEVAND/BHUSA09-Bevand-MD5-SLIDES.pdf

November 10, 2019

Author

This optimization won't work:
https://docs.nvidia.com/gameworks/content/developertools/desktop/analysis/report/cudaexperiments/kernellevel/achievediops.htm

ADD: Weighted sum of all executed integer additions (IADD). The default weight is 1.
MUL: Weighted sum of all executed integer multiplications (IMUL). The default weight is 1.
MAD: Weighted sum of all executed integer multiply-add (IMAD) instructions. The default weight is 2.

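With these default weights, 1 (ADD) + 1 (MUL) = 2 (MAD), so by this metric replacing a separate multiply and add with a single mad gives no speed improvement.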
