Posted September 11, 2019

Given these two ways of doing a left rotate:

Quote
#define ROTATE_LEFT_SSE(x, n) {__m128i tmp;tmp = _mm_srli_epi32(x, 32-n);x = _mm_slli_epi32(x, n);x = _mm_or_si128(x, tmp);};

versus:

Quote
#define ROTATE_LEFT_SSE(x, n) {x = _mm_or_si128(_mm_slli_epi32(x, n), _mm_srli_epi32(x, 32-n));};

Which one would you choose? The second one, without the temporary variable? The speed difference seems small; it could just be an error in my timing measurement.
September 11, 2019

Depending on compiler optimizations, the 2nd one would probably be the slightly faster of the two. Assuming the compiler doesn't inject temporary variables, it should technically take fewer clock cycles than the first. However, again with optimizations, the compiler could turn the first one into something that works more like the 2nd. At this point you're basically weighing readability against minimal optimizations. If your goal is speed, test both, see which yields the better results, and go with that. But with modern compilers I'd assume both will perform very similarly if optimizations are allowed and turned on.
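Before timing anything, it can help to confirm that the two forms really compute the same thing. The following is a minimal sketch, not the original poster's code: it restates both macros as inline functions and compares them against a scalar rotate. The names `rotl32_tmp`, `rotl32_expr`, `rotl32_scalar` and `rotations_agree` are hypothetical, chosen only for this example, and the sketch assumes an x86 target with SSE2 and a rotate count between 1 and 31.

```c
#include <assert.h>     /* for callers that assert on rotations_agree() */
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Variant 1: explicit temporary, mirroring the first macro. */
static inline __m128i rotl32_tmp(__m128i x, int n)
{
    __m128i tmp = _mm_srli_epi32(x, 32 - n);
    x = _mm_slli_epi32(x, n);
    return _mm_or_si128(x, tmp);
}

/* Variant 2: single expression, mirroring the second macro. */
static inline __m128i rotl32_expr(__m128i x, int n)
{
    return _mm_or_si128(_mm_slli_epi32(x, n), _mm_srli_epi32(x, 32 - n));
}

/* Scalar reference rotate; only valid for n in 1..31. */
static inline uint32_t rotl32_scalar(uint32_t v, int n)
{
    return (v << n) | (v >> (32 - n));
}

/* Returns 1 if both SSE variants match the scalar rotate in every
   32-bit lane for every count from 1 to 31, and 0 otherwise. */
int rotations_agree(void)
{
    const uint32_t in[4] = {0x01234567u, 0x89abcdefu, 0xdeadbeefu, 0x80000001u};
    const __m128i x = _mm_loadu_si128((const __m128i *)in);

    for (int n = 1; n < 32; ++n) {
        uint32_t a[4], b[4];
        _mm_storeu_si128((__m128i *)a, rotl32_tmp(x, n));
        _mm_storeu_si128((__m128i *)b, rotl32_expr(x, n));
        for (int i = 0; i < 4; ++i) {
            if (a[i] != b[i] || a[i] != rotl32_scalar(in[i], n))
                return 0;
        }
    }
    return 1;
}
```

Calling `assert(rotations_agree())` from a test is enough to check equivalence. To see whether the compiler treats the two variants differently, comparing the generated assembly at the optimization level you actually ship (for example with `-O2 -S`) is usually more reliable than a small timing difference.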