CUDA: force use of mad.lo.u32 for ROTATE_LEFT

November 10, 2019

How can I force CUDA to use mad.lo.u32 for ROTATE_LEFT?
Target: compute capability 1.2.

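__global__ void fun(unsigned int *mem)
{
    // Unsigned operands, to match the .u32 type of the instruction.
    unsigned int a = 3;
    unsigned int b = 5;
    unsigned int c = 6;
    unsigned int d;

    // d = a * b + c (low 32 bits), here 3 * 5 + 6 = 21
    asm("mad.lo.u32 %0, %1, %2, %3;" : "=r"(d) : "r"(a), "r"(b), "r"(c));
    *mem = d;
}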
This produces the expected result. However, when I try defining the rotate as:
#define ROTATE_LEFT2(x, n) (int)x*(1>>(32-n))+(x<<(int)n)
the compiler does not generate a mad instruction.
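
One thing that may be worth checking in the macro above: for a rotate count n between 1 and 31, the factor 1>>(32-n) evaluates to 0, so the product term vanishes and only the plain shift x<<n is left for the compiler to emit. A rotate that really has multiply-add shape would be x * 2^n + (x >> (32 - n)). The sketch below is only an illustration under stated assumptions: the names HD, mad_lo_u32, rotl_shift, rotl_mad and rotate_left2_by_4 are hypothetical, the functions model the arithmetic in plain C++ rather than inline PTX, and nothing here shows which instruction nvcc actually selects (it may well turn a multiply by a power of two back into a shift).

```cpp
#include <cstdint>

// Builds as plain C++; under nvcc the same functions are host/device.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Arithmetic model of mad.lo.u32: low 32 bits of a * b + c.
// Unsigned 32-bit arithmetic wraps modulo 2^32, which gives the ".lo" part.
HD constexpr uint32_t mad_lo_u32(uint32_t a, uint32_t b, uint32_t c) {
    return a * b + c;
}

// Conventional rotate left, valid for 1 <= n <= 31.
HD constexpr uint32_t rotl_shift(uint32_t x, unsigned n) {
    return (x << n) | (x >> (32u - n));
}

// Same rotate in multiply-add shape: x * 2^n + (x >> (32 - n)).
// The two terms occupy disjoint bit ranges, so '+' and '|' agree.
HD constexpr uint32_t rotl_mad(uint32_t x, unsigned n) {
    return mad_lo_u32(x, 1u << n, x >> (32u - n));
}

// The macro from the post, copied verbatim for comparison.
#define ROTATE_LEFT2(x, n) (int)x*(1>>(32-n))+(x<<(int)n)

// Evaluates the posted macro for a fixed rotate count of 4.
// Here 1>>(32-4) is 0, so the result is just x << 4.
HD constexpr uint32_t rotate_left2_by_4(uint32_t x) {
    return (uint32_t)(ROTATE_LEFT2(x, 4));
}
```

With x = 0x80000001 and n = 4, rotl_shift and rotl_mad both give 0x00000018. The posted macro drops the bits that should wrap around: for x = 0x10000001 it yields 0x00000010 where a true rotate by 4 yields 0x00000011.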

References:
https://www.openwall.com/lists/john-dev/2012/03/22/7
https://devtalk.nvidia.com/default/topic/489750/ptx-assembly-help-33-/
https://devtalk.nvidia.com/default/topic/478578/integer-mad-instruction/
https://www.blackhat.com/presentations/bh-usa-09/BEVAND/BHUSA09-Bevand-MD5-SLIDES.pdf

November 10, 2019

This optimization won't help, judging by NVIDIA's documentation for the Achieved IOPS experiment:
https://docs.nvidia.com/gameworks/content/developertools/desktop/analysis/report/cudaexperiments/kernellevel/achievediops.htm

ADD Weighted sum of all executed integer additions (IADD). The default weight is 1.
MUL Weighted sum of all executed integer multiplications (IMUL). The default weight is 1.
MAD Weighted sum of all executed integer multiply-add (IMAD) instructions. The default weight is 2.

ADD (weight 1) + MUL (weight 1) = 2, the same as one MAD (weight 2), so replacing a separate multiply and add with a single mad gives no speed improvement.
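
The comparison can be written out as a small sketch. It uses only the default weights quoted above (ADD = 1, MUL = 1, MAD = 2). The struct IntOpMix, the function weighted_int_ops and the two instruction mixes are hypothetical names made up for illustration; they are not measurements from a profiler.

```cpp
#include <cstdint>

// Executed integer instruction counts for one code variant
// (hypothetical struct, for illustration only).
struct IntOpMix {
    uint64_t add;
    uint64_t mul;
    uint64_t mad;
};

// Weighted sum with the default weights quoted above:
// ADD = 1, MUL = 1, MAD = 2.
constexpr uint64_t weighted_int_ops(const IntOpMix& m) {
    return 1 * m.add + 1 * m.mul + 2 * m.mad;
}

// One rotate done as a separate multiply and add...
constexpr IntOpMix separate_mul_add{1, 1, 0};
// ...versus the same rotate done as one fused multiply-add.
constexpr IntOpMix fused_mad{0, 0, 1};

// Both variants have the same weighted total under these weights.
static_assert(weighted_int_ops(separate_mul_add) == weighted_int_ops(fused_mad),
              "ADD + MUL and a single MAD have equal weighted totals");
```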

CodeExplorer
