Tuts 4 You

leftrotate in SSE



I have these two ways of doing a left rotate with SSE:


#define ROTATE_LEFT_SSE(x, n) {__m128i tmp; tmp = _mm_srli_epi32(x, 32-(n)); x = _mm_slli_epi32(x, n); x = _mm_or_si128(x, tmp);}



#define ROTATE_LEFT_SSE(x, n) {x = _mm_or_si128(_mm_slli_epi32(x, n), _mm_srli_epi32(x, 32-(n)));}

Which one would you choose? The second one, without the temporary variable?
The speed difference seems to be small; it could just be a bug in my timing code.


Depending on compiler optimizations, the 2nd one would probably be the slightly faster of the two. Assuming the compiler doesn't inject temporary variables of its own, it should technically take fewer clock cycles than the first. However, with optimizations enabled, the compiler could just as well optimize the first one into the same code as the 2nd.
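To make the comparison concrete, here is a minimal sketch of both variants written as inline functions instead of macros. The names rotl32_tmp, rotl32_expr, rotl32_lane0, rotl32_variants_agree and rotl32_self_check are made up for this example and are not from the original post; the sketch assumes an x86 target with SSE2 and a shift count n between 1 and 31.

```c
#include <assert.h>     /* assert, used by the self-check below */
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Variant 1: explicit temporary, mirroring the first macro. */
static inline __m128i rotl32_tmp(__m128i x, int n)
{
    __m128i tmp = _mm_srli_epi32(x, 32 - n);  /* bits that wrap around */
    x = _mm_slli_epi32(x, n);                 /* bits that stay in place */
    return _mm_or_si128(x, tmp);
}

/* Variant 2: single expression, mirroring the second macro. */
static inline __m128i rotl32_expr(__m128i x, int n)
{
    return _mm_or_si128(_mm_slli_epi32(x, n), _mm_srli_epi32(x, 32 - n));
}

/* Hypothetical helper: rotate v (broadcast to all lanes) and return lane 0. */
static uint32_t rotl32_lane0(uint32_t v, int n)
{
    return (uint32_t)_mm_cvtsi128_si32(rotl32_expr(_mm_set1_epi32((int)v), n));
}

/* Hypothetical helper: returns 1 if both variants agree in all four lanes. */
static int rotl32_variants_agree(uint32_t v, int n)
{
    __m128i x = _mm_set1_epi32((int)v);
    __m128i eq = _mm_cmpeq_epi32(rotl32_tmp(x, n), rotl32_expr(x, n));
    return _mm_movemask_epi8(eq) == 0xFFFF;
}

/* Checks every shift count from 1 to 31 on one sample value. */
static void rotl32_self_check(void)
{
    for (int n = 1; n < 32; ++n)
        assert(rotl32_variants_agree(0x80000001u, n));
}
```

Calling rotl32_self_check() from your own main is enough to confirm that the two forms compute the same result. Whether they also compile to the same instructions depends on your compiler and flags, so it is worth looking at the generated assembly rather than guessing.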

At this point you're basically trading readability against minimal optimizations. If your goal is speed, test both and go with whichever yields the better results. But with modern compilers I'd assume both will perform very similarly as long as optimizations are turned on.
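For the "test both" part, something along these lines could serve as a starting point. It is only a rough sketch: the macro names ROTL_SSE_TMP and ROTL_SSE_EXPR, the functions bench_tmp, bench_expr and print_rotl_timings, the shift count 7 and the added constant are all assumptions chosen for illustration. Timing with clock() is coarse, so use a large iteration count and repeat the measurement several times.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* The two macro forms under test, renamed so both can coexist. */
#define ROTL_SSE_TMP(x, n) do { \
        __m128i tmp_ = _mm_srli_epi32((x), 32 - (n)); \
        (x) = _mm_slli_epi32((x), (n)); \
        (x) = _mm_or_si128((x), tmp_); \
    } while (0)

#define ROTL_SSE_EXPR(x, n) do { \
        (x) = _mm_or_si128(_mm_slli_epi32((x), (n)), \
                           _mm_srli_epi32((x), 32 - (n))); \
    } while (0)

/* Runs iters dependent rotate+add steps with the first macro.
   Returns lane 0 as a checksum so the loop cannot be discarded;
   stores the elapsed CPU seconds if seconds is non-null. */
static uint32_t bench_tmp(long iters, double *seconds)
{
    __m128i x = _mm_set1_epi32(0x12345678);
    const __m128i k = _mm_set1_epi32((int)0x9E3779B9u);  /* arbitrary constant */
    clock_t t0 = clock();
    for (long i = 0; i < iters; ++i) {
        ROTL_SSE_TMP(x, 7);
        x = _mm_add_epi32(x, k);
    }
    if (seconds)
        *seconds = (double)(clock() - t0) / CLOCKS_PER_SEC;
    return (uint32_t)_mm_cvtsi128_si32(x);
}

/* Same loop, using the second macro. */
static uint32_t bench_expr(long iters, double *seconds)
{
    __m128i x = _mm_set1_epi32(0x12345678);
    const __m128i k = _mm_set1_epi32((int)0x9E3779B9u);
    clock_t t0 = clock();
    for (long i = 0; i < iters; ++i) {
        ROTL_SSE_EXPR(x, 7);
        x = _mm_add_epi32(x, k);
    }
    if (seconds)
        *seconds = (double)(clock() - t0) / CLOCKS_PER_SEC;
    return (uint32_t)_mm_cvtsi128_si32(x);
}

/* Prints both timings; matching checksums show both loops did the same work. */
static void print_rotl_timings(long iters)
{
    double s1 = 0.0, s2 = 0.0;
    uint32_t c1 = bench_tmp(iters, &s1);
    uint32_t c2 = bench_expr(iters, &s2);
    printf("tmp:  %.3f s (checksum %08x)\n", s1, (unsigned)c1);
    printf("expr: %.3f s (checksum %08x)\n", s2, (unsigned)c2);
}
```

Returning a checksum is a guard against the kind of timing bug mentioned in the question: if the result of the loop is never used, an optimizing compiler is free to drop the loop and you end up measuring nothing. If the two checksums differ, the measurement is not comparing like with like.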
