Tuts 4 You
CodeExplorer

Cuda force use mad.lo.u32 for ROTATE_LEFT


CodeExplorer

How can I force CUDA to use mad.lo.u32 for ROTATE_LEFT?
Target: Compute Capability 1.2

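__global__ void fun(unsigned int *mem)
{
    // Unsigned operands match the .u32 type of the instruction.
    unsigned int a = 3;
    unsigned int b = 5;
    unsigned int c = 6;
    unsigned int d;
    // d = a * b + c (low 32 bits of the product, plus c)
    asm("mad.lo.u32 %0, %1, %2, %3;" : "=r"(d) : "r"(a), "r"(b), "r"(c));
    *mem = d;
}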
This produces the correct result. However, when I try defining the rotate as:
#define ROTATE_LEFT2(x, n) (int)x*(1>>(32-n))+(x<<(int)n)
no mad instruction is generated.
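One possible reason the macro yields no mad: for any n from 1 to 31, `1>>(32-n)` evaluates to 0, so the multiply term vanishes and the whole expression reduces to `x<<n`. The sketch below shows one way the rotate could be expressed as a multiply-add instead. It relies on the fact that `x << n` and `x >> (32 - n)` share no set bits, so OR equals ADD, and that `x << n` equals `x * 2^n` modulo 2^32. The helper names `mad_lo_u32`, `rotate_left_mad` and `rotate_left_ref` are hypothetical, not from the original post, and the host fallback exists only so the file also builds without nvcc.

```cpp
#include <cassert>

// Let the same source build with a plain C++ compiler as well as nvcc.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical helper: low 32 bits of a * b + c.
__host__ __device__ inline unsigned int mad_lo_u32(unsigned int a,
                                                   unsigned int b,
                                                   unsigned int c)
{
#ifdef __CUDA_ARCH__
    // Device path: the same inline PTX as in the kernel above.
    unsigned int d;
    asm("mad.lo.u32 %0, %1, %2, %3;" : "=r"(d) : "r"(a), "r"(b), "r"(c));
    return d;
#else
    // Host path: unsigned arithmetic wraps modulo 2^32.
    return a * b + c;
#endif
}

// Hypothetical helper: rotate left by n bits, assuming 1 <= n <= 31.
// x * 2^n supplies the shifted-up bits, x >> (32 - n) the wrapped-around bits.
__host__ __device__ inline unsigned int rotate_left_mad(unsigned int x,
                                                        unsigned int n)
{
    return mad_lo_u32(x, 1u << n, x >> (32u - n));
}

// Conventional shift/or rotate, used here only as a reference.
__host__ __device__ inline unsigned int rotate_left_ref(unsigned int x,
                                                        unsigned int n)
{
    return (x << n) | (x >> (32u - n));
}

// Small self-check comparing the two forms on the host.
inline void check_rotate_examples()
{
    assert(rotate_left_mad(0x12345678u, 8) == rotate_left_ref(0x12345678u, 8));
    assert(rotate_left_mad(0x80000001u, 1) == 0x00000003u);
}
```

Even with inline PTX, the later compilation stages remain free to pick different machine instructions, so the generated code still needs to be inspected to confirm that a mad is actually emitted.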

References:
https://www.openwall.com/lists/john-dev/2012/03/22/7
https://devtalk.nvidia.com/default/topic/489750/ptx-assembly-help-33-/
https://devtalk.nvidia.com/default/topic/478578/integer-mad-instruction/
https://www.blackhat.com/presentations/bh-usa-09/BEVAND/BHUSA09-Bevand-MD5-SLIDES.pdf

CodeExplorer

This optimization won't help. The Achieved IOPS experiment documentation lists these default weights:
https://docs.nvidia.com/gameworks/content/developertools/desktop/analysis/report/cudaexperiments/kernellevel/achievediops.htm

ADD: Weighted sum of all executed integer additions (IADD). The default weight is 1.
MUL: Weighted sum of all executed integer multiplications (IMUL). The default weight is 1.
MAD: Weighted sum of all executed integer multiply-add (IMAD) instructions. The default weight is 2.

One IADD plus one IMUL has weight 1 + 1 = 2, the same as a single IMAD, so there is no speed improvement.
 

