CodeExplorer
Posted September 11, 2019

Given these two ways of doing a left rotate:

#define ROTATE_LEFT_SSE(x, n) {__m128i tmp;tmp = _mm_srli_epi32(x, 32-n);x = _mm_slli_epi32(x, n);x = _mm_or_si128(x, tmp);};

versus:

#define ROTATE_LEFT_SSE(x, n) {x = _mm_or_si128(_mm_slli_epi32(x, n), _mm_srli_epi32(x, 32-n));};

Which one would you choose? The second one, without the temporary variable? The speed difference seems to be small; it may just be an error in my timing measurement.
atom0s
Posted September 11, 2019

Depending on compiler optimizations, the 2nd one would probably be slightly faster. Assuming the compiler doesn't inject temporary variables, it should technically take fewer clock cycles than the first. However, again with optimizations, the compiler could turn the first one into something that works much like the 2nd.

At this point you're basically trading readability against minimal optimizations. If your goal is speed, test both, see which yields the better results, and go with that. But with modern compilers I'd assume both will perform very similarly if optimizations are allowed and turned on.
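Before timing anything, it can help to confirm that the two forms really compute the same thing. Below is a minimal sketch of that check, not a benchmark. The names `rotl_tmp`, `rotl_expr`, `rotl_scalar` and `rotl_variants_agree` are made up for this example, and it assumes an x86 target with SSE2 enabled (the default on x86-64). The two variants are written as inline functions rather than macros, mainly so that `n` is evaluated once: with the macros as posted, an argument such as `a+b` would expand `32-n` to `32-a+b`.

```c
#include <assert.h>     /* for asserting on the check's result */
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Variant 1: explicit temporary, mirroring the first macro. */
static inline __m128i rotl_tmp(__m128i x, int n)
{
    __m128i tmp = _mm_srli_epi32(x, 32 - n);
    x = _mm_slli_epi32(x, n);
    return _mm_or_si128(x, tmp);
}

/* Variant 2: single expression, mirroring the second macro. */
static inline __m128i rotl_expr(__m128i x, int n)
{
    return _mm_or_si128(_mm_slli_epi32(x, n), _mm_srli_epi32(x, 32 - n));
}

/* Scalar reference rotate for one 32-bit lane; only valid for 0 < n < 32. */
static inline uint32_t rotl_scalar(uint32_t v, int n)
{
    return (v << n) | (v >> (32 - n));
}

/* Returns 1 if both SSE variants match the scalar reference
   in every lane for n = 1..31, and 0 otherwise. */
static int rotl_variants_agree(void)
{
    const uint32_t in[4] = { 0x01234567u, 0x89ABCDEFu, 0x80000001u, 0xFFFFFFFFu };

    for (int n = 1; n < 32; ++n) {
        uint32_t a[4], b[4];
        __m128i x = _mm_loadu_si128((const __m128i *)in);

        _mm_storeu_si128((__m128i *)a, rotl_tmp(x, n));
        _mm_storeu_si128((__m128i *)b, rotl_expr(x, n));

        for (int i = 0; i < 4; ++i) {
            if (a[i] != b[i] || a[i] != rotl_scalar(in[i], n))
                return 0;
        }
    }
    return 1;
}
```

Calling `assert(rotl_variants_agree());` once at startup is enough for the check. To see whether the compiler treats the two forms differently, compare the generated assembly for both functions with optimizations on (for example `-O2 -S` with GCC or Clang) rather than relying only on timings.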