Speed optimization question comparing

And that behavior is perfectly normal and expected. You really should not try to be smarter than the compiler. You aren't. 99% of programmers aren't. So just let the compiler do its job.

There is so much more to optimize in your code - in the logic and algorithm. It's much wiser to spend your time on that.

21 hours ago, kao said:

There is so much more to optimize in your code - in the logic and algorithm.

Noticed it.
For example:


    dif1 = Ban[1]-Ban[0];
    dif1_div = rightrotate(dif1, s[0]);
    F1 = (Cn xor (Bn or (not Dn)))+An;
    Mn[0] = dif1_div-F1;

    lr_val1_bak = leftrotate(F1+Mn[0], s[0]);
    Ban1Test = Ban[0]+lr_val1_bak;

    if (Ban[1]!=Ban1Test)
        //printf("Invalid Ban[1] value!!!!\n");
        goto StartOfSearch;

    lr_val1_bak = leftrotate(F1+Mn[0], s[0]);
    // But Mn[0] = dif1_div-F1, so:
    lr_val1_bak = leftrotate(F1+dif1_div-F1, s[0]);

    lr_val1_bak = leftrotate(dif1_div, s[0]);

    dif1 = Ban[1]-Ban[0];
    dif1_div = rightrotate(dif1, s[0]);

    lr_val1_bak = leftrotate(rightrotate(dif1, s[0]), s[0]);
    // The left and right rotations by s[0] cancel each other, so this simplifies to:
    lr_val1_bak = dif1 = Ban[1]-Ban[0];

    // While the final test:
    Ban1Test = Ban[0]+lr_val1_bak = Ban[0]+Ban[1]-Ban[0];
    Ban1Test = Ban[1];

    // The only conclusion is that it was a useless check!
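
The same simplification can be checked in a few lines of C. This is only a sketch: leftrotate and rightrotate are assumed here to be plain 32-bit rotations (as in MD5), and ban1_test is a made-up helper name that repeats the steps above with unsigned 32-bit arithmetic.

```c
#include <stdint.h>

/* Assumed 32-bit rotations; s is reduced mod 32 so that s == 0 is safe. */
static uint32_t leftrotate(uint32_t x, unsigned s)
{
    s &= 31u;
    return (x << s) | (x >> ((32u - s) & 31u));
}

static uint32_t rightrotate(uint32_t x, unsigned s)
{
    s &= 31u;
    return (x >> s) | (x << ((32u - s) & 31u));
}

/* Hypothetical helper: recomputes Ban1Test the same way as the snippet above. */
static uint32_t ban1_test(uint32_t ban0, uint32_t ban1, uint32_t f1, unsigned s)
{
    uint32_t dif1     = ban1 - ban0;
    uint32_t dif1_div = rightrotate(dif1, s);
    uint32_t m0       = dif1_div - f1;           /* Mn[0] */
    uint32_t lr_val   = leftrotate(f1 + m0, s);  /* F1 cancels modulo 2^32 */
    return ban0 + lr_val;                        /* always equals ban1 */
}
```

Whatever F1 and the shift are, ban1_test() hands back ban1, so the comparison against Ban[1] can never fail.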



The only solution was to patch the compiler:


Target: Pelles C for Windows Version 8.00.60 x86
Symptom: ignores the register keyword when compiling with any optimisation option other than None
Location: PellesC\Bin\pocc.exe
process arguments:
-std:C99 -Tx86-coff -Zi -Ot -Ob1 -fp:fast -W1 -Gd -Ze "D:\MD5PrimeBrute\simple.c" -Fo"D:\MD5PrimeBrute\output\main.obj"

004D6EB4   .  68 D05A5D00                   PUSH 5D5AD0                              ;  UNICODE "Os"
004D6EB9   .  57                            PUSH EDI
004D6EBA   .  E8 B1F50700                   CALL 00556470                            ;  pocc.00556470
004D6EBF   .  83C4 08                       ADD ESP,8
004D6EC2   .  85C0                          TEST EAX,EAX
004D6EC4   .  74 12                         JE SHORT 004D6ED8                        ;  pocc.004D6ED8

004D6ED8   > \C705 D8F65F00 01000000        MOV DWORD PTR DS:[5FF6D8],1

004D6EE7   > \68 C45A5D00                   PUSH 5D5AC4                              ;  UNICODE "Ot"
004D6EEC   .  57                            PUSH EDI
004D6EED   .  E8 7EF50700                   CALL 00556470                            ;  pocc.00556470
004D6EF2   .  83C4 08                       ADD ESP,8
004D6EF5   .  85C0                          TEST EAX,EAX
004D6EF7   .  74 12                         JE SHORT 004D6F0B                        ;  pocc.004D6F0B

004D6F0B   > \C705 D8F65F00 02000000        MOV DWORD PTR DS:[5FF6D8],2
004D6F15   .  E9 3E030000                   JMP 004D7258                             ;  pocc.004D7258

2 = Ot
1 = Os
0 = no optimisations
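
Read back into C, the option parsing above would look roughly like the sketch below. Everything named here is made up for illustration (opt_level, parse_opt_level), and the routine at 00556470 is only assumed to behave like wcscmp, i.e. to return 0 on a match; the listing does not show what it really does or how other options are handled.

```c
#include <wchar.h>

/* Hypothetical names for the values written to DS:[5FF6D8]. */
enum opt_level { OPT_NONE = 0, OPT_SIZE = 1, OPT_TIME = 2 };

/* Assumption: the call at 00556470 is a wcscmp-like compare (0 means equal). */
static enum opt_level parse_opt_level(const wchar_t *arg)
{
    if (wcscmp(arg, L"Os") == 0)
        return OPT_SIZE;   /* MOV DWORD PTR DS:[5FF6D8],1 */
    if (wcscmp(arg, L"Ot") == 0)
        return OPT_TIME;   /* MOV DWORD PTR DS:[5FF6D8],2 */
    return OPT_NONE;       /* simplification: no optimisation option seen */
}
```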

This is the check code:
004E0290  /$  53                   PUSH EBX
004E0291  |.  8B5C24 08            MOV EBX,DWORD PTR SS:[ESP+8]
004E0295  |.  833D 38F75F00 00     CMP DWORD PTR DS:[5FF738],0
004E029C  |.  0F8F BC000000        JG 004E035E                              ;  pocc.004E035E
004E02A2  |.  837B 2C 00           CMP DWORD PTR DS:[EBX+2C],0
004E02A6  |.  74 75                JE SHORT 004E031D                        ;  pocc.004E031D
004E02A8  |.  8B03                 MOV EAX,DWORD PTR DS:[EBX]
004E02AA  |.  F640 18 08           TEST BYTE PTR DS:[EAX+18],8
004E02AE  |.  75 6D                JNZ SHORT 004E031D                       ;  pocc.004E031D

Change the byte at 004E02A6 from 74 to EB (JE becomes an unconditional short JMP)
CMP DWORD PTR DS:[EBX+2C],0 - compares the optimisation options with 0
004E031D is the "good" boy!
The 8 tested at 004E02AA means naked functions, so it doesn't matter!

0055347D  |> \A1 7CAC6C00              MOV EAX,DWORD PTR DS:[6CAC7C]
00553482  |.  8338 00                  CMP DWORD PTR DS:[EAX],0
00553485  |.  75 0C                    JNZ SHORT 00553493                       ;  pocc.00553493
00553487  |.  C705 9CAC6C00 00000000   MOV DWORD PTR DS:[6CAC9C],0
00553491  |.  EB 09                    JMP SHORT 0055349C                       ;  pocc.0055349C

The jump at 00553485 should NOT be taken!
So replace the two bytes at 00553485 (75 0C) with two NOPs: 90 90
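
For anyone who prefers to apply the two changes from code instead of a hex editor, here is a minimal sketch. patch_bytes and demo_patches are hypothetical helpers, not part of any tool; the bytes are copied from the listings above. The offsets are positions inside the small sample arrays only, because the file offsets inside pocc.exe are not given here and have to be worked out from the virtual addresses first.

```c
#include <stddef.h>
#include <string.h>

/* Overwrites len bytes at buf+off with repl, but only if they currently
   match expect. Returns 1 on success, 0 on mismatch or out-of-range. */
static int patch_bytes(unsigned char *buf, size_t size, size_t off,
                       const unsigned char *expect,
                       const unsigned char *repl, size_t len)
{
    if (off > size || len > size - off)
        return 0;
    if (memcmp(buf + off, expect, len) != 0)
        return 0;
    memcpy(buf + off, repl, len);
    return 1;
}

/* Applies both patches to byte sequences copied from the listings. */
static int demo_patches(void)
{
    /* 004E02A2: CMP [EBX+2C],0 ; JE 004E031D ; MOV EAX,[EBX] */
    unsigned char check[] = { 0x83, 0x7B, 0x2C, 0x00, 0x74, 0x75, 0x8B, 0x03 };
    /* 00553482: CMP [EAX],0 ; JNZ 00553493 */
    unsigned char test2[] = { 0x83, 0x38, 0x00, 0x75, 0x0C };

    const unsigned char je[]   = { 0x74 },       jmp[]  = { 0xEB };
    const unsigned char jnz[]  = { 0x75, 0x0C }, nops[] = { 0x90, 0x90 };

    return patch_bytes(check, sizeof check, 4, je, jmp, 1)     /* JE -> JMP   */
        && patch_bytes(test2, sizeof test2, 3, jnz, nops, 2);  /* JNZ -> NOPs */
}
```

Checking the expected bytes before writing means a different pocc.exe build is left untouched instead of being corrupted.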

After these patches everything works as it should: the compiler honours the register keyword and still applies its automatic optimisations.
With registers the run time is 13-14 seconds instead of 20 seconds.


