Tuts 4 You

Cuda force use mad.lo.u32 for ROTATE_LEFT


CodeExplorer


How can I force CUDA to use mad.lo.u32 for ROTATE_LEFT?
Target: Compute Capability 1.2

__global__ void fun(unsigned int * mem)
{
    int a = 3;
    int b = 5;
    int c = 6;
    int d;
    asm("mad.lo.u32 %0, %1, %2, %3;": "=r"(d) : "r"(a), "r"(b), "r"(c) : );
    // d = a*b+c = 3*5+6 = 21
    *mem = d;
}
This produces the correct result. However, when I try defining the rotate as:
#define ROTATE_LEFT2(x, n) (int)x*(1>>(32-n))+(x<<(int)n)
the generated code contains no mad instruction.
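
A possible reason, offered as a guess rather than a confirmed diagnosis: for any n from 1 to 31, (1>>(32-n)) is 0, so the multiply in ROTATE_LEFT2 folds away and only x<<n is left, which is a plain shift and not a rotate. The multiply-add form of a rotate is x*(1<<n) + (x>>(32-n)). Below is a minimal sketch of that identity. The names HD, rotl_ref, rotl_mad, rotl_posted and rotl_mad_ptx are made up for illustration, everything assumes 32-bit unsigned operands and 0 < n < 32, and the inline PTX simply reuses the asm pattern from the kernel above.

```cpp
#include <cstdint>

// Builds as plain C++; under nvcc the same helpers are also usable on the device.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Reference rotate left, valid for 0 < n < 32.
HD inline uint32_t rotl_ref(uint32_t x, uint32_t n) {
    return (x << n) | (x >> (32u - n));
}

// Multiply-add form: x * 2^n + (x >> (32 - n)).
// The low n bits of x * 2^n are zero, so '+' gives the same result as '|'.
HD inline uint32_t rotl_mad(uint32_t x, uint32_t n) {
    return x * (1u << n) + (x >> (32u - n));
}

// The ROTATE_LEFT2 macro from the post, transcribed with unsigned operands.
// (1u >> (32 - n)) is 0 for 0 < n < 32, so only (x << n) survives.
HD inline uint32_t rotl_posted(uint32_t x, uint32_t n) {
    return x * (1u >> (32u - n)) + (x << n);
}

#ifdef __CUDACC__
// Device-only variant that spells out the instruction explicitly,
// following the inline asm pattern from the kernel in the post.
__device__ inline uint32_t rotl_mad_ptx(uint32_t x, uint32_t n) {
    uint32_t d;
    asm("mad.lo.u32 %0, %1, %2, %3;"
        : "=r"(d)
        : "r"(x), "r"(1u << n), "r"(x >> (32u - n)));
    return d;
}
#endif
```

Whether the compiler turns rotl_mad into a mad instruction by itself is up to the compiler; the inline asm variant is the only one that requests it explicitly.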

References:
https://www.openwall.com/lists/john-dev/2012/03/22/7
https://devtalk.nvidia.com/default/topic/489750/ptx-assembly-help-33-/
https://devtalk.nvidia.com/default/topic/478578/integer-mad-instruction/
https://www.blackhat.com/presentations/bh-usa-09/BEVAND/BHUSA09-Bevand-MD5-SLIDES.pdf


This optimization won't help. According to the instruction weights listed here:
https://docs.nvidia.com/gameworks/content/developertools/desktop/analysis/report/cudaexperiments/kernellevel/achievediops.htm

ADD Weighted sum of all executed integer additions (IADD). The default weight is 1.
MUL Weighted sum of all executed integer multiplications (IMUL). The default weight is 1.
MAD Weighted sum of all executed integer multiply-add (IMAD) instructions. The default weight is 2.

1 (ADD) + 1 (MUL) = 2 (MAD), so going by these weights there is no speed improvement.
 
