Killboy, posted January 5, 2009:
Hi,
I was trying to code my own packer, and after finishing most of the import processing I was going to move on to compressing the sections. LZMA provides pretty good compression, so I chose it over aplib, also because aplib only works on x86 from what it says on their website.
Anyway, I downloaded the LZMA SDK and it's a complete mess. There is no documentation whatsoever, just 4 folders, one for each programming language. There is only some decompression code for ANSI C, and the C++ code is totally messed up: 4 subfolders, all with the most meaningless names. I don't have a single clue which files to copy and what functions to call.
Has anyone ever dealt with this stuff? I could use the DLL for packing, but I don't want to ship the DLL with it, and I'd have to keep a DLL for each platform. But I still need the decompression code for the packer stub.
Could anyone point me in the right direction?

Armaked0n, posted January 5, 2009 (edited January 5, 2009):
I've attached an LZMA decompression function (from RLPack 1.21). I hope it's what you're looking for.
Attachment: lzma_depack.rar

GaBoR, posted January 5, 2009 (edited January 5, 2009):
Here is a packer called SimplePack which uses LZMA and has source code too: http://mordor.in/files/projects/simple-pack.1.0.lzma.beta.src.rar

Killboy (thread author), posted January 5, 2009:
Thanks, but I was looking for how to include it in C++. Should have mentioned that in the text, doh ^^
Heh, I knew the name was familiar, this stuff was ripped from Packman. I took another look and hell yeah, it comes with C++ source.
Thanks

cektop, posted January 5, 2009:
As far as I can see, LzmaEncoder is the class you are looking for, and the Code method is the one to call when you want to compress something. There are also a few steps before this.
I stopped using anything but C# a long time ago, so I'll point you to an example there. You should look at the LzmaAlone (or Lzma_Alone for other languages) project if you want to see some examples. In the LzmaAlone.cs file, the following code does most of the job. You also need to set some properties beforehand, open the streams, etc.

    Compression.LZMA.Encoder encoder = new Compression.LZMA.Encoder();
    encoder.SetCoderProperties(propIDs, properties);
    encoder.WriteCoderProperties(outStream);
    Int64 fileSize;
    if (eos || stdInMode)
        fileSize = -1;
    else
        fileSize = inStream.Length;
    for (int i = 0; i < 8; i++)
        outStream.WriteByte((Byte)(fileSize >> (8 * i)));
    if (trainStream != null)
    {
        CDoubleStream doubleStream = new CDoubleStream();
        doubleStream.s1 = trainStream;
        doubleStream.s2 = inStream;
        doubleStream.fileIndex = 0;
        inStream = doubleStream;
        long trainFileSize = trainStream.Length;
        doubleStream.skipSize = 0;
        if (trainFileSize > dictionary)
            doubleStream.skipSize = trainFileSize - dictionary;
        trainStream.Seek(doubleStream.skipSize, SeekOrigin.Begin);
        encoder.SetTrainSize((uint)(trainFileSize - doubleStream.skipSize));
    }
    encoder.Code(inStream, outStream, -1, -1, null);
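
For C++ readers, the header that the snippet above writes before calling Code can be sketched as follows. This is only an illustration, not SDK code: the function name makeLzmaAloneHeader is made up here, and it assumes the usual .lzma layout of one properties byte ((pb * 5 + lp) * 9 + lc), four bytes of dictionary size and eight bytes of uncompressed size, both little-endian, with -1 meaning "size unknown".

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of the LZMA SDK): builds the 13-byte header
// that precedes the compressed data in an .lzma ("LZMA alone") stream.
//   byte  0     : properties byte, (pb * 5 + lp) * 9 + lc
//   bytes 1..4  : dictionary size, little-endian
//   bytes 5..12 : uncompressed size, little-endian (-1 = unknown)
std::array<std::uint8_t, 13> makeLzmaAloneHeader(unsigned lc, unsigned lp, unsigned pb,
                                                 std::uint32_t dictSize,
                                                 std::int64_t uncompressedSize)
{
    // Limits of the LZMA parameters.
    assert(lc <= 8 && lp <= 4 && pb <= 4);

    std::array<std::uint8_t, 13> h{};
    h[0] = static_cast<std::uint8_t>((pb * 5 + lp) * 9 + lc);

    for (int i = 0; i < 4; ++i)
        h[1 + i] = static_cast<std::uint8_t>(dictSize >> (8 * i));

    // Same byte-by-byte loop as in the C# snippet: low byte first.
    const auto size = static_cast<std::uint64_t>(uncompressedSize);
    for (int i = 0; i < 8; ++i)
        h[5 + i] = static_cast<std::uint8_t>(size >> (8 * i));

    return h;
}
```

With the common settings lc=3, lp=0, pb=2 the first byte comes out as 0x5D, which is why so many .lzma files start with that value. A stub that hard-codes its parameters can skip this header entirely and store only the raw stream.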

mudlord, posted September 11, 2011:
This is a bump and a half, but I figured I should post here.
Is there a version of this LZMA depacker that preserves the stack?
The ASM version of aplib works flawlessly though...

atom0s, posted September 12, 2011:
Check out 7-Zip's page that has an LZMA SDK: http://www.7-zip.org/sdk.html

mudlord, posted September 12, 2011:
Thanks, but I meant the ASM version of the LZMA depacker.

ghandi, posted September 18, 2011 (edited September 18, 2011):
Couldn't you place a PUSHAD/POPAD pair at the prologue/epilogue of the function to preserve the registers? The decompress function uses EBP to reference its parameters, and placing the PUSHAD after the locals are allocated on the stack means that any EBP indexing will remain accurate. When the function completes, POPAD restores all registers before the return value is placed in EAX, then the stack is cleaned up with LEAVE and RET 0Ch.
You could also use individual pushes to save only EDX, EBX, ESI and EDI, but PUSHAD is a single byte and doesn't cause any noticeable performance loss when used here. I don't have a clue what the clock cycles are when comparing PUSHAD to PUSH; maybe the speed of a function that is called a lot (such as when factorizing numbers, or in an imaging library when blending) could be increased by trimming this down to a handful of individual PUSHes, but for this function PUSHAD is suitable imho.
HR,
Ghandi

; Author: Brandon LaCombe
; Date: February 3, 2006
; License: Public Domain
; The C prototype for this function is:
; DWORD STDCALL Decompress(PVOID pvDest, PVOID pvSource, PVOID pvWorkMem);
_lzma_depack:
    push ebp
    mov ebp,esp
    sub esp,30h ; <- Allocating local variable space
    pushad ; <- Save our registers onto the stack
    xor eax,eax
    inc eax
    mov edi,[ebp+10h] ; <- Function accessing parameter (pvWorkMem)
    mov [ebp-14h],eax ; <- Function initialising local variable (to 1)
    mov [ebp-1Ch],eax ; <- Etc...
    mov [ebp-18h],eax
    mov [ebp-28h],eax
    mov eax,400h
    xor edx,edx
    mov ecx,30736h
    rep stosd
    mov eax,[ebp+0Ch]
    push 5
    mov [ebp-8],eax
    mov [ebp-10h],edx
    mov [ebp-1],dl
    mov [ebp-0Ch],edx
    mov [ebp+0Ch],edx
    or eax,0FFFFFFFFh
    pop ecx
@loc_401041:
    mov esi,[ebp-8]
    mov edx,[ebp+0Ch]
    movzx esi,byte ptr[esi]
    shl edx,8
    or edx,esi
    inc dword ptr[ebp-8]
    dec ecx
    mov [ebp+0Ch],edx
    jnz @loc_401041
@loc_401058:
    mov esi,[ebp-10h]
    mov ecx,[ebp-0Ch]
    mov edx,[ebp+10h]
    and esi,3
    shl ecx,4
    add ecx,esi
    cmp eax,1000000h
    lea edi,[edx+ecx*4]
    jnb @loc_40108A
    mov edx,[ebp-8]
    mov ecx,[ebp+0Ch]
    movzx edx,byte ptr[edx]
    shl ecx,8
    or ecx,edx
    shl eax,8
    inc dword ptr[ebp-8]
    mov [ebp+0Ch],ecx
@loc_40108A:
    mov ecx,[edi]
    mov ebx,eax
    shr ebx,0Bh
    imul ebx,ecx
    cmp [ebp+0Ch],ebx
    jnb @loc_401207
    mov esi,800h
    sub esi,ecx
    shr esi,5
    add esi,ecx
    movzx ecx,byte ptr[ebp-1]
    imul ecx,0C00h
    xor edx,edx
    mov [edi],esi
    mov esi,[ebp+10h]
    inc edx
    cmp dword ptr[ebp-0Ch],7
    lea ecx,[ecx+esi+1CD8h]
    mov eax,ebx
    mov [ebp-20h],ecx
    jl @loc_401170
    mov ecx,[ebp-10h]
    sub ecx,[ebp-14h]
    mov esi,[ebp+8]
    movzx ecx,byte ptr[ecx+esi]
    mov [ebp-24h],ecx
@loc_4010E1:
    shl dword ptr[ebp-24h],1
    mov esi,[ebp-24h]
    mov edi,[ebp-20h]
    and esi,100h
    cmp eax,1000000h
    lea ecx,[esi+edx]
    lea ecx,[edi+ecx*4+400h]
    mov [ebp-2Ch],ecx
    jnb @loc_40111B
    mov ebx,[ebp-8]
    mov edi,[ebp+0Ch]
    movzx ebx,byte ptr[ebx]
    shl edi,8
    or edi,ebx
    shl eax,8
    inc dword ptr[ebp-8]
    mov [ebp+0Ch],edi
@loc_40111B:
    mov ecx,[ecx]
    mov edi,eax
    shr edi,0Bh
    imul edi,ecx
    cmp [ebp+0Ch],edi
    jnb @loc_401149
    mov eax,edi
    mov edi,800h
    sub edi,ecx
    shr edi,5
    add edi,ecx
    mov ecx,[ebp-2Ch]
    add edx,edx
    test esi,esi
    mov [ecx],edi
    jnz @loc_4011C9
    jmp @loc_401162
@loc_401149:
    sub [ebp+0Ch],edi
    sub eax,edi
    mov edi,ecx
    shr edi,5
    sub ecx,edi
    test esi,esi
    mov edi,[ebp-2Ch]
    mov [edi],ecx
    lea edx,[edx+edx+1]
    jz @loc_4011C9
@loc_401162:
    cmp edx,100h
    jl @loc_4010E1
    jmp @loc_4011D1
@loc_401170:
    cmp eax,1000000h
    mov ecx,[ebp-20h]
    lea edi,[ecx+edx*4]
    jnb @loc_401194
    mov esi,[ebp-8]
    mov ecx,[ebp+0Ch]
    movzx esi,byte ptr[esi]
    shl ecx,8
    or ecx,esi
    shl eax,8
    inc dword ptr[ebp-8]
    mov [ebp+0Ch],ecx
@loc_401194:
    mov ecx,[edi]
    mov esi,eax
    shr esi,0Bh
    imul esi,ecx
    cmp [ebp+0Ch],esi
    jnb @loc_4011B7
    mov eax,esi
    mov esi,800h
    sub esi,ecx
    shr esi,5
    add esi,ecx
    mov [edi],esi
    add edx,edx
    jmp @loc_4011C9
@loc_4011B7:
    sub [ebp+0Ch],esi
    sub eax,esi
    mov esi,ecx
    shr esi,5
    sub ecx,esi
    mov [edi],ecx
    lea edx,[edx+edx+1]
@loc_4011C9:
    cmp edx,100h
    jl @loc_401170
@loc_4011D1:
    mov esi,[ebp-10h]
    mov ecx,[ebp+8]
    inc dword ptr[ebp-10h]
    cmp dword ptr[ebp-0Ch],4
    mov [ebp-1],dl
    mov [esi+ecx],dl
    jge @loc_4011EF
    and dword ptr[ebp-0Ch],0
    jmp @loc_401058
@loc_4011EF:
    cmp dword ptr[ebp-0Ch],0Ah
    jge @loc_4011FE
    sub dword ptr[ebp-0Ch],3
    jmp @loc_401058
@loc_4011FE:
    sub dword ptr[ebp-0Ch],6
    jmp @loc_401058
@loc_401207:
    sub [ebp+0Ch],ebx
    mov edx,ecx
    shr edx,5
    sub ecx,edx
    mov edx,[ebp-0Ch]
    sub eax,ebx
    cmp eax,1000000h
    mov [edi],ecx
    mov ecx,[ebp+10h]
    lea edx,[ecx+edx*4+300h]
    jnb @loc_401240
    mov edi,[ebp-8]
    mov ecx,[ebp+0Ch]
    movzx edi,byte ptr[edi]
    shl ecx,8
    or ecx,edi
    shl eax,8
    inc dword ptr[ebp-8]
    mov [ebp+0Ch],ecx
@loc_401240:
    mov ecx,[edx]
    mov edi,eax
    shr edi,0Bh
    imul edi,ecx
    cmp [ebp+0Ch],edi
    jnb @loc_401292
    mov eax,edi
    mov edi,800h
    sub edi,ecx
    shr edi,5
    add edi,ecx
    cmp dword ptr[ebp-0Ch],7
    mov ecx,[ebp-18h]
    mov [ebp-28h],ecx
    mov ecx,[ebp-1Ch]
    mov [ebp-18h],ecx
    mov ecx,[ebp-14h]
    mov [edx],edi
    mov [ebp-1Ch],ecx
    jge @loc_40127D
    and dword ptr[ebp-0Ch],0
    jmp @loc_401284
@loc_40127D:
    mov dword ptr[ebp-0Ch],3
@loc_401284:
    mov ecx,[ebp+10h]
    add ecx,0CC8h
    jmp @loc_40147B
@loc_401292:
    sub [ebp+0Ch],edi
    sub eax,edi
    mov edi,ecx
    shr edi,5
    sub ecx,edi
    cmp eax,1000000h
    mov [edx],ecx
    mov ecx,[ebp-0Ch]
    mov edx,[ebp+10h]
    lea edi,[edx+ecx*4+330h]
    jnb @loc_4012CB
    mov edx,[ebp-8]
    mov ecx,[ebp+0Ch]
    movzx edx,byte ptr[edx]
    shl ecx,8
    or ecx,edx
    shl eax,8
    inc dword ptr[ebp-8]
    mov [ebp+0Ch],ecx
@loc_4012CB:
    mov ecx,[edi]
    mov edx,eax
    shr edx,0Bh
    imul edx,ecx
    cmp [ebp+0Ch],edx
    jnb @loc_40137F
    mov ebx,800h
    sub ebx,ecx
    shr ebx,5
    add ebx,ecx
    mov ecx,[ebp-0Ch]
    add ecx,0Fh
    shl ecx,4
    mov [edi],ebx
    mov edi,[ebp+10h]
    add ecx,esi
    cmp edx,1000000h
    mov eax,edx
    lea edi,[edi+ecx*4]
    jnb @loc_401320
    mov ecx,[ebp+0Ch]
    shl edx,8
    mov eax,edx
    mov edx,[ebp-8]
    movzx edx,byte ptr[edx]
    shl ecx,8
    or ecx,edx
    inc dword ptr[ebp-8]
    mov [ebp+0Ch],ecx
@loc_401320:
    mov ecx,[edi]
    mov edx,eax
    shr edx,0Bh
    imul edx,ecx
    cmp [ebp+0Ch],edx
    jnb @loc_40136C
    mov esi,[ebp-10h]
    mov eax,edx
    mov edx,800h
    sub edx,ecx
    shr edx,5
    add edx,ecx
    xor ecx,ecx
    cmp dword ptr[ebp-0Ch],7
    mov [edi],edx
    mov edx,[ebp+8]
    setnl cl
    lea ecx,[ecx+ecx+9]
    mov [ebp-0Ch],ecx
    mov ecx,[ebp-10h]
    sub ecx,[ebp-14h]
    inc dword ptr[ebp-10h]
    mov cl,[ecx+edx]
    mov [ebp-1],cl
    mov [esi+edx],cl
    jmp @loc_401058
@loc_40136C:
    sub [ebp+0Ch],edx
    sub eax,edx
    mov edx,ecx
    shr edx,5
    sub ecx,edx
    mov [edi],ecx
    jmp @loc_40145F
@loc_40137F:
    sub [ebp+0Ch],edx
    sub eax,edx
    mov edx,ecx
    shr edx,5
    sub ecx,edx
    cmp eax,1000000h
    mov edx,[ebp+10h]
    mov [edi],ecx
    mov ecx,[ebp-0Ch]
    lea edx,[edx+ecx*4+360h]
    jnb @loc_4013B8
    mov edi,[ebp-8]
    mov ecx,[ebp+0Ch]
    movzx edi,byte ptr[edi]
    shl ecx,8
    or ecx,edi
    shl eax,8
    inc dword ptr[ebp-8]
    mov [ebp+0Ch],ecx
@loc_4013B8:
    mov ecx,[edx]
    mov edi,eax
    shr edi,0Bh
    imul edi,ecx
    cmp [ebp+0Ch],edi
    jnb @loc_4013DC
    mov eax,edi
    mov edi,800h
    sub edi,ecx
    shr edi,5
    add edi,ecx
    mov ecx,[ebp-1Ch]
    mov [edx],edi
    jmp @loc_401456
@loc_4013DC:
    sub [ebp+0Ch],edi
    sub eax,edi
    mov edi,ecx
    shr edi,5
    sub ecx,edi
    cmp eax,1000000h
    mov [edx],ecx
    mov ecx,[ebp-0Ch]
    mov edx,[ebp+10h]
    lea edx,[edx+ecx*4+390h]
    jnb @loc_401415
    mov edi,[ebp-8]
    mov ecx,[ebp+0Ch]
    movzx edi,byte ptr[edi]
    shl ecx,8
    or ecx,edi
    shl eax,8
    inc dword ptr[ebp-8]
    mov [ebp+0Ch],ecx
@loc_401415:
    mov ecx,[edx]
    mov edi,eax
    shr edi,0Bh
    imul edi,ecx
    cmp [ebp+0Ch],edi
    jnb @loc_401439
    mov eax,edi
    mov edi,800h
    sub edi,ecx
    shr edi,5
    add edi,ecx
    mov ecx,[ebp-18h]
    mov [edx],edi
    jmp @loc_401450
@loc_401439:
    sub [ebp+0Ch],edi
    sub eax,edi
    mov edi,ecx
    shr edi,5
    sub ecx,edi
    mov [edx],ecx
    mov edx,[ebp-18h]
    mov ecx,[ebp-28h]
    mov [ebp-28h],edx
@loc_401450:
    mov edx,[ebp-1Ch]
    mov [ebp-18h],edx
@loc_401456:
    mov edx,[ebp-14h]
    mov [ebp-1Ch],edx
    mov [ebp-14h],ecx
@loc_40145F:
    xor ecx,ecx
    cmp dword ptr[ebp-0Ch],7
    setnl cl
    dec ecx
    and ecx,0FFFFFFFDh
    add ecx,0Bh
    mov [ebp-0Ch],ecx
    mov ecx,[ebp+10h]
    add ecx,14D0h
@loc_40147B:
    cmp eax,1000000h
    jnb @loc_401499
    mov edi,[ebp-8]
    mov edx,[ebp+0Ch]
    movzx edi,byte ptr[edi]
    shl edx,8
    or edx,edi
    shl eax,8
    inc dword ptr[ebp-8]
    mov [ebp+0Ch],edx
@loc_401499:
    mov edx,[ecx]
    mov edi,eax
    shr edi,0Bh
    imul edi,edx
    cmp [ebp+0Ch],edi
    jnb @loc_4014C5
    mov eax,edi
    mov edi,800h
    sub edi,edx
    shr edi,5
    add edi,edx
    shl esi,5
    and dword ptr[ebp-24h],0
    mov [ecx],edi
    lea ecx,[esi+ecx+8]
    jmp @loc_401523
@loc_4014C5:
    sub [ebp+0Ch],edi
    sub eax,edi
    mov edi,edx
    shr edi,5
    sub edx,edi
    cmp eax,1000000h
    mov [ecx],edx
    jnb @loc_4014F1
    mov edi,[ebp-8]
    mov edx,[ebp+0Ch]
    movzx edi,byte ptr[edi]
    shl edx,8
    or edx,edi
    shl eax,8
    inc dword ptr[ebp-8]
    mov [ebp+0Ch],edx
@loc_4014F1:
    mov edx,[ecx+4]
    mov edi,eax
    shr edi,0Bh
    imul edi,edx
    cmp [ebp+0Ch],edi
    jnb @loc_40152C
    mov eax,edi
    mov edi,800h
    sub edi,edx
    shr edi,5
    add edi,edx
    shl esi,5
    mov [ecx+4],edi
    lea ecx,[esi+ecx+208h]
    mov dword ptr[ebp-24h],8
@loc_401523:
    mov dword ptr[ebp-20h],3
    jmp @loc_40154F
@loc_40152C:
    sub [ebp+0Ch],edi
    mov esi,edx
    shr esi,5
    sub edx,esi
    sub eax,edi
    mov [ecx+4],edx
    add ecx,408h
    mov dword ptr[ebp-24h],10h
    mov dword ptr[ebp-20h],8
@loc_40154F:
    mov edx,[ebp-20h]
    xor ebx,ebx
    mov [ebp-2Ch],edx
    inc ebx
@loc_401558:
    cmp eax,1000000h
    jnb @loc_401576
    mov esi,[ebp-8]
    mov edx,[ebp+0Ch]
    movzx esi,byte ptr[esi]
    shl edx,8
    or edx,esi
    shl eax,8
    inc dword ptr[ebp-8]
    mov [ebp+0Ch],edx
@loc_401576:
    mov edx,[ecx+ebx*4]
    mov esi,eax
    shr esi,0Bh
    imul esi,edx
    cmp [ebp+0Ch],esi
    jnb @loc_40159B
    mov eax,esi
    mov esi,800h
    sub esi,edx
    shr esi,5
    add esi,edx
    mov [ecx+ebx*4],esi
    add ebx,ebx
    jmp @loc_4015AE
@loc_40159B:
    sub [ebp+0Ch],esi
    sub eax,esi
    mov esi,edx
    shr esi,5
    sub edx,esi
    mov [ecx+ebx*4],edx
    lea ebx,[ebx+ebx+1]
@loc_4015AE:
    dec dword ptr[ebp-2Ch]
    jnz @loc_401558
    mov ecx,[ebp-20h]
    xor edx,edx
    inc edx
    mov esi,edx
    shl esi,cl
    mov ecx,[ebp-24h]
    sub ecx,esi
    add ebx,ecx
    cmp dword ptr[ebp-0Ch],4
    mov [ebp-30h],ebx
    jge @loc_401765
    add dword ptr[ebp-0Ch],7
    cmp ebx,4
    jge @loc_4015DE
    mov ecx,ebx
    jmp @loc_4015E1
@loc_4015DE:
    push 3
    pop ecx
@loc_4015E1:
    mov esi,[ebp+10h]
    shl ecx,8
    lea edi,[ecx+esi+6C0h]
    mov dword ptr[ebp-2Ch],6
@loc_4015F5:
    cmp eax,1000000h
    jnb @loc_401613
    mov esi,[ebp-8]
    mov ecx,[ebp+0Ch]
    movzx esi,byte ptr[esi]
    shl ecx,8
    or ecx,esi
    shl eax,8
    inc dword ptr[ebp-8]
    mov [ebp+0Ch],ecx
@loc_401613:
    mov ecx,[edi+edx*4]
    mov esi,eax
    shr esi,0Bh
    imul esi,ecx
    cmp [ebp+0Ch],esi
    jnb @loc_401638
    mov eax,esi
    mov esi,800h
    sub esi,ecx
    shr esi,5
    add esi,ecx
    mov [edi+edx*4],esi
    add edx,edx
    jmp @loc_40164B
@loc_401638:
    sub [ebp+0Ch],esi
    sub eax,esi
    mov esi,ecx
    shr esi,5
    sub ecx,esi
    mov [edi+edx*4],ecx
    lea edx,[edx+edx+1]
@loc_40164B:
    dec dword ptr[ebp-2Ch]
    jnz @loc_4015F5
    sub edx,40h
    cmp edx,4
    mov edi,edx
    jl @loc_401736
    mov ecx,edx
    sar ecx,1
    and edi,1
    dec ecx
    or edi,2
    cmp edx,0Eh
    mov [ebp-14h],ecx
    jge @loc_401683
    shl edi,cl
    mov ecx,edi
    sub ecx,edx
    mov edx,[ebp+10h]
    lea ebx,[edx+ecx*4+0ABCh]
    jmp @loc_4016C9
@loc_401683:
    sub ecx,4
@loc_401686:
    cmp eax,1000000h
    jnb @loc_4016A4
    mov esi,[ebp-8]
    mov edx,[ebp+0Ch]
    movzx esi,byte ptr[esi]
    shl edx,8
    or edx,esi
    shl eax,8
    inc dword ptr[ebp-8]
    mov [ebp+0Ch],edx
@loc_4016A4:
    shr eax,1
    add edi,edi
    cmp [ebp+0Ch],eax
    jb @loc_4016B3
    sub [ebp+0Ch],eax
    or edi,1
@loc_4016B3:
    dec ecx
    jnz @loc_401686
    mov ebx,[ebp+10h]
    add ebx,0C88h
    shl edi,4
    mov dword ptr[ebp-14h],4
@loc_4016C9:
    xor ecx,ecx
    inc ecx
    mov [ebp-20h],ebx
    mov [ebp-24h],ecx
@loc_4016D2:
    cmp eax,1000000h
    jnb @loc_4016F0
    mov esi,[ebp-8]
    mov edx,[ebp+0Ch]
    movzx esi,byte ptr[esi]
    shl edx,8
    or edx,esi
    shl eax,8
    inc dword ptr[ebp-8]
    mov [ebp+0Ch],edx
@loc_4016F0:
    mov edx,[ebx+ecx*4]
    mov esi,eax
    shr esi,0Bh
    imul esi,edx
    cmp [ebp+0Ch],esi
    jnb @loc_401715
    mov eax,esi
    mov esi,800h
    sub esi,edx
    shr esi,5
    add esi,edx
    mov [ebx+ecx*4],esi
    add ecx,ecx
    jmp @loc_40172E
@loc_401715:
    sub [ebp+0Ch],esi
    mov ebx,[ebp-20h]
    sub eax,esi
    mov esi,edx
    shr esi,5
    sub edx,esi
    or edi,[ebp-24h]
    mov [ebx+ecx*4],edx
    lea ecx,[ecx+ecx+1]
@loc_40172E:
    shl dword ptr[ebp-24h],1
    dec dword ptr[ebp-14h]
    jnz @loc_4016D2
@loc_401736:
    inc edi
    mov [ebp-14h],edi
    jz @loc_40176A
    mov ebx,[ebp-30h]
@loc_40173F:
    mov ecx,[ebp-10h]
    inc ebx
    sub ecx,edi
    inc ebx
    add ecx,[ebp+8]
@loc_401749:
    mov dl,[ecx]
    mov esi,[ebp-10h]
    mov edi,[ebp+8]
    dec ebx
    inc dword ptr[ebp-10h]
    inc ecx
    test ebx,ebx
    mov [ebp-1],dl
    mov [esi+edi],dl
    jnz @loc_401749
    jmp @loc_401058
@loc_401765:
    mov edi,[ebp-14h]
    jmp @loc_40173F
@loc_40176A:
    popad
    mov eax,[ebp-10h]
    leave
    retn 0Ch
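
For anyone porting the stub to C++, most of the listing above is one pattern repeated: normalise when the range drops below 1000000h, compute a bound from an 11-bit probability, then branch and adapt the probability by a shift of 5. A minimal sketch of that single step follows. The struct and member names are made up for illustration, the register mapping in the comments is my reading of the listing, and like the assembly it does no bounds checking on the input.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative sketch of the adaptive binary range decoder step that the
// assembly repeats at every "cmp eax,1000000h" site.
struct RangeDecoder {
    const std::uint8_t* in;  // next input byte (kept in [ebp-8])
    std::uint32_t range;     // current range   (kept in EAX)
    std::uint32_t code;      // current code    (kept in [ebp+0Ch])

    explicit RangeDecoder(const std::uint8_t* src)
        : in(src), range(0xFFFFFFFFu), code(0)
    {
        // Prime the decoder with five bytes, as the loop at @loc_401041
        // does; the first byte is shifted out of the 32-bit code again.
        for (int i = 0; i < 5; ++i)
            code = (code << 8) | *in++;
    }

    // Decodes one bit and updates `prob` (the listing initialises every
    // probability slot in the work memory to 400h).
    unsigned decodeBit(std::uint32_t& prob)
    {
        assert(prob > 0 && prob < 0x800u);

        if (range < 0x1000000u) {          // normalise: pull in one byte
            code = (code << 8) | *in++;
            range <<= 8;
        }

        const std::uint32_t bound = (range >> 11) * prob;
        if (code < bound) {                // bit 0: probability moves up
            range = bound;
            prob += (0x800u - prob) >> 5;
            return 0;
        }
        range -= bound;                    // bit 1: probability moves down
        code -= bound;
        prob -= prob >> 5;
        return 1;
    }
};
```

Everything else in the depacker (literals, match lengths, distances) is built from calls to this step with different probability tables, which is why the hand-written version is so repetitive.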

mudlord, posted September 18, 2011:
*facepalm* Forgot that, thanx! I'll let you know how it goes.

mudlord, posted September 19, 2011:
Thanks Ghandi. The code had an overhead of around 1902 bytes. Not bad for something offering quite significant savings over aplib.

ghandi, posted September 19, 2011:
Nice to know it worked, glad I could help. If I recall correctly, doesn't LZMA compression efficiency increase (up to a threshold) as the size of the files/chunks increases? I think that is why some compression tools offer the choice of both, but I'm not sure.
HR,
Ghandi

mudlord, posted September 19, 2011 (edited September 19, 2011):
Yep, you are correct, the compression efficiency seems to be related to the size of the data. Under a certain limit, it's worth just using libraries like JCALG and aplib, since the overhead of the LZMA depacker is too much. Plus, it seems those algorithms get good rates anyway on small sets of data, and the difference then with LZMA is only a few KB. o.o
From my experiments with aplib in my packer, more often than not aplib works really well with small targets like crackmes and some of my keygen/patch templates. LZMA worked really well on my bigger stuff like my GBA emulator, plus some other emulators/apps I tried (which are around 800KB-10MB in size). Multimedia stuff, it seems, compresses great, which is nice.
I'm considering adding some detection for the size of exes/dlls, so that the compression could be chosen optimally on the fly. Of course, overrides would be added in that case, too.
I suppose implementing a code preprocessor à la kkrunchy could help with code compression even more, especially with LZ-based compressors like aplib. I'm interested in researching how other algorithms do, too, like PAQ or even more LZ variants like LZH...
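
One way the "choose on the fly, with overrides" idea could look in C++ is sketched below. It is not mudlord's implementation: the Candidate struct and chooseCodec function are hypothetical, and instead of a fixed size threshold it simply compares depacker stub size plus packed payload size for each codec that was tried. Only the 1902-byte LZMA stub figure comes from this thread; every other number in the usage note is invented.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of one compression attempt.
struct Candidate {
    std::string name;         // e.g. "aplib" or "lzma"
    std::size_t stubBytes;    // size of the depacker shipped in the stub
    std::size_t packedBytes;  // size of the payload after compression
};

// Returns the index of the candidate with the smallest total of stub plus
// payload. If `forced` names one of the candidates, that one wins instead
// (the user override); an empty or unknown name means "no override".
std::size_t chooseCodec(const std::vector<Candidate>& candidates,
                        const std::string& forced = "")
{
    assert(!candidates.empty());

    if (!forced.empty())
        for (std::size_t i = 0; i < candidates.size(); ++i)
            if (candidates[i].name == forced)
                return i;

    std::size_t best = 0;
    for (std::size_t i = 1; i < candidates.size(); ++i) {
        const std::size_t total = candidates[i].stubBytes + candidates[i].packedBytes;
        const std::size_t bestTotal = candidates[best].stubBytes + candidates[best].packedBytes;
        if (total < bestTotal)
            best = i;
    }
    return best;
}
```

For a small crackme where LZMA saves only a few hundred bytes of payload, the 1902-byte stub tips the total in favour of the smaller depacker; for a multi-megabyte emulator the stub size is noise and LZMA wins. Compressing with every codec costs packing time, so a size threshold could still be used to skip the attempt that is unlikely to win.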