[Unpack/DecompileMe] little .pyc, why are you so weird?

August 23, 2015

Here you go... have fun!

You can use it like this: name.pyc <password>

Edited August 23, 2015 by bomblader

August 24, 2015

The file is nothing special. It has been processed by PjOrion

It breaks existing disassemblers and decompilers since it introduces junk instructions in the code stream which are never executed.

Python unlike java does not have a bytecode verifier, so python never complains unless those are executed.

To handle this we need a recursive disassembler like IDA Pro.

There is already python processor module(Ch 19), but it does not seem to work for for version 2.7

When I have time, I will come up with a tool which will automatically obfuscate such files

Here is a cmd output, showing why decompilers/disassemblers fail. Note the junk instructions.



>>> import marshal, dis

>>> f = open('1.pyc', 'rb')

>>> f.seek(8)

>>> co = marshal.load(f)

>>> dis.disassemble(co)

  1     >>    0 SETUP_EXCEPT            99 (to 102)

              3 <144>                  387

              6 STOP_CODE

              7 JUMP_FORWARD           217 (to 227)

             10 <157>                44944

             13 LOAD_NAME            28929

Traceback (most recent call last):

  File "<stdin>", line 1, in <module>

  File "C:\Python27\lib\dis.py", line 97, in disassemble

    print '(' + co.co_names[oparg] + ')',

IndexError: tuple index out of range

Edited August 24, 2015 by Extreme Coders

September 4, 2015

anyone else?

Edited September 4, 2015 by bomblader

September 12, 2015

I have looked at the protection in a bit more detail.

It isn't sophisticated, but it introduces good protection, never seen before in python (atleast I never saw) .

It uses exception handling to redirect flow. In addition it introduces control flow obfuscation and variable name obfuscation.

However there aren't any opaque predicates.

For example here is one of the code objects disassembled.



0 SETUP_EXCEPT 99

3 <INVALID>

102 POP_TOP

103 POP_TOP

104 POP_TOP

105 LOAD_CONST 1

108 JUMP_FORWARD 14

125 MAKE_FUNCTION 0

128 JUMP_ABSOLUTE 205

205 STORE_FAST 0

208 JUMP_ABSOLUTE 145

145 SETUP_FINALLY 6

148 JUMP_ABSOLUTE 160

160 JUMP_ABSOLUTE 55

55 LOAD_CONST 2

58 JUMP_FORWARD 16

77 LOAD_CONST 0

80 JUMP_ABSOLUTE 25

25 IMPORT_NAME 0

28 JUMP_FORWARD 34

65 STORE_FAST 1

68 JUMP_ABSOLUTE 33

33 LOAD_GLOBAL 1

36 JUMP_ABSOLUTE 88

88 LOAD_FAST 1

91 JUMP_ABSOLUTE 4

4 CALL_FUNCTION 1

7 JUMP_FORWARD 217

227 POP_TOP 

228 JUMP_ABSOLUTE 165

165 LOAD_FAST 0

168 JUMP_ABSOLUTE 135

135 LOAD_FAST 0

138 JUMP_FORWARD 34

175 CALL_FUNCTION 1

178 JUMP_ABSOLUTE 14

14 POP_TOP 

15 JUMP_ABSOLUTE 43

43 POP_BLOCK 

44 JUMP_ABSOLUTE 213

213 LOAD_CONST 0

216 JUMP_FORWARD 20

239 DELETE_FAST 0

242 JUMP_ABSOLUTE 117

117 END_FINALLY 

118 JUMP_ABSOLUTE 189

189 LOAD_CONST 0

192 JUMP_ABSOLUTE 238

238 RETURN_VALUE

The first instruction sets up an exception handler. The second instruction is always invalid.

This will generate an exception and control will be transferred to 102.

From there there we need to disassemble instructions while tracking jumps and transferring control flow as needed.

Note that JUMP_FORWARD is a relative jump whereas JUMP_ABSOLUTE is an absolute jump.

To make the file decompilable, more effort is needed.

From the disassembly, we need to build up a basic block representation of the code.

Next, the list of basic blocks needs to be coalesced to remove the unnecessary jump instructions in between.

Finally, this intermediate representation must be assembled to generate the deobfuscated code.

All instructions offsets must be properly handled.

Removing variable naming obfuscation is easy.

Just rename all the entries in co_names member of the corresponding object and you are done.

September 13, 2015

Extreme Coders, is there a way to "debug" the .pyc file and find out exactly the order of execution witout manually checking each op_code?

September 13, 2015

At the moment, python does not seem to have a bytecode debugger.

One of the possible workarounds, is to debug the python vm itself.

The relevant code is in the function PyEval_EvalFrameEx in ceval.c The huge switch is the interpreter loop.

Another, but much easier way is to build python from source.

The LLTRACE flag needs to be defined in the preprocessor definitions.

This way python will display all instructions executed by it.

A third way is to (ab)use the line number mapping feature of code objects. See here for an implementation

If you follow any of the above methods, all you get is an instruction trace for a particular execution.

You would never achieve complete code coverage, which is necessary for proper deobfuscation.

Edited September 13, 2015 by Extreme Coders

September 14, 2015

Using some graphviz magic, here is the control flow graph of one of the simpler obfuscated code object.

In contrast, here is the cfg from a normal file.

And here is a one of them splattered with jumps

Note that the obfuscated code objects, contains unnecessary jumps between the basic blocks.
These jumps need to be removed i.e the basic blocks need to be coalesced.

Next, this IR needs to be assembled back to a code object, after which it will hopefully be decompilable.

EDIT: Added another sample

Edited September 19, 2015 by Extreme Coders

September 22, 2015

Just added the cfg simplification module taking some ideas from the llvm project http://llvm.org/docs/doxygen/html/SimplifyCFGPass_8cpp_source.html

It can now deobfuscate this

-------->

Now need to develop an assembler, which can turn this basic blocks into instructions.

Hopefully, LLVM will again be of help

September 28, 2015

The assembler has been developed, but some more changes need to be done to make it workable (and readable)

Currently, it optimises python bytecode even better than the python compiler, as a result decompilers (which are usually pattern matching) have problems.

This need to be fixed. Here is the source.



import types

import opcode

import collections

import Queue

import marshal

import pydotplus

import cStringIO

class BasicBlock:

    def __init__(self):

        self.addr = 0

        self.predecessors = []

        self.successors = []

        self.instructions = []

        self.refHandlerIns = []

        self.isHandler = False

        self.isEntry = False

        self.b_seen = False # b_seen is used to perform a DFS of basicblocks

    def addPredecessor(self, bb):

        self.predecessors.append(bb)

    def addSuccessor(self, bb):

        self.successors.append(bb)

    def addInstruction(self, ins):

        self.instructions.append(ins)
    def blockSize(self):

        return reduce(lambda size, ins: size + ins.size, self.instructions, 0)

class Instruction:

    def __init__(self, opkode, arg, size):

        self.opkode = opkode

        self.arg = arg

        self.size = size

class Disassembler:

    def __init__(self, code_object):

        assert isinstance(code_object, types.CodeType)

        self.c_stream = map(ord, code_object.co_code)

    def disasAt(self, offset):

        assert offset < len(self.c_stream)
        opkode = self.c_stream[offset]
        # Invalid instruction

        if opkode not in opcode.opmap.values():

            return Instruction(-1, None, 1)
        if opkode < opcode.HAVE_ARGUMENT:

            return Instruction(opkode, None, 1)
        if opkode >= opcode.HAVE_ARGUMENT:

            arg = (self.c_stream[offset + 2] << 8 ) | self.c_stream[offset + 1]

            return Instruction(opkode, arg, 3)

def isRetIns(ins):

    return ins.opkode == opcode.opmap['RETURN_VALUE']

def isBranchIns(ins):

    branchIns = [opcode.opmap[x] for x in [\

        'JUMP_IF_FALSE_OR_POP', \

        'JUMP_IF_TRUE_OR_POP', \

        'JUMP_ABSOLUTE', \

        'POP_JUMP_IF_FALSE',\

        'POP_JUMP_IF_TRUE',\

        'CONTINUE_LOOP',\

        'FOR_ITER',\

        'JUMP_FORWARD',\

        ]]
    return ins.opkode in branchIns

def isCondiBranchIns(ins):

    condiBranchIns = [opcode.opmap[x] for x in [\

        'JUMP_IF_FALSE_OR_POP', \

        'JUMP_IF_TRUE_OR_POP', \

        'POP_JUMP_IF_FALSE',\

        'POP_JUMP_IF_TRUE',\

        'FOR_ITER',\

        ]]
    return ins.opkode in condiBranchIns

def isHandlerIns(ins):

    handlerIns = [opcode.opmap[x] for x in ['SETUP_LOOP', 'SETUP_EXCEPT', 'SETUP_FINALLY', 'SETUP_WITH']]

    return ins.opkode in handlerIns

def getInsCrossRef(ins, addr):

    targets = []
    if ins.opkode == opcode.opmap['JUMP_IF_FALSE_OR_POP']:

        targets.append(addr + ins.size)

        targets.append(ins.arg)
    elif ins.opkode == opcode.opmap['JUMP_IF_TRUE_OR_POP']:

        targets.append(addr + ins.size)

        targets.append(ins.arg)
    elif ins.opkode == opcode.opmap['JUMP_ABSOLUTE']:

        targets.append(ins.arg)
    elif ins.opkode == opcode.opmap['POP_JUMP_IF_FALSE']:

        targets.append(addr + ins.size)

        targets.append(ins.arg)
    elif ins.opkode == opcode.opmap['POP_JUMP_IF_TRUE']:

        targets.append(addr + ins.size)

        targets.append(ins.arg)
    elif ins.opkode == opcode.opmap['CONTINUE_LOOP']:

        targets.append(ins.arg)
    elif ins.opkode == opcode.opmap['FOR_ITER']:

        targets.append(addr + ins.size)

        targets.append(addr + ins.size + ins.arg)
    elif ins.opkode == opcode.opmap['JUMP_FORWARD']:

        targets.append(addr + ins.size + ins.arg)
    elif ins.opkode == opcode.opmap['SETUP_LOOP']:

        targets.append(addr + ins.size)

        targets.append(addr + ins.size + ins.arg)

    elif ins.opkode == opcode.opmap['SETUP_EXCEPT']:

        targets.append(addr + ins.size)

        targets.append(addr + ins.size + ins.arg)
    elif ins.opkode == opcode.opmap['SETUP_FINALLY']:

        targets.append(addr + ins.size)

        targets.append(addr + ins.size + ins.arg)
    elif ins.opkode == opcode.opmap['SETUP_WITH']:

        targets.append(addr + ins.size)

        targets.append(addr + ins.size + ins.arg)
    elif ins.opkode != opcode.opmap['RETURN_VALUE']:

        targets.append(addr + ins.size)
    return targets

def  _leaderSortFunc(elem1, elem2):

    if elem1.addr != elem2.addr:

        return elem1.addr - elem2.addr

    else:

        if elem1.type == 'S':

            return -1

        else:

            return 1

def findLeaders(code_object, oep):

    Leader = collections.namedtuple('leader', ['type', 'addr'])
    leader_set = set()

    leader_set.add(Leader('S', oep))
    # Queue to contain list of addresses to be analyzed by linear sweep disassembly algorithm

    analysis_Q = Queue.Queue()

    analysis_Q.put(oep)
    analyzed_addresses = set()
    disassembler = Disassembler(code_object)
    while not analysis_Q.empty():

        addr = analysis_Q.get()
        while True:

            ins = disassembler.disasAt(addr)

            analyzed_addresses.add(addr)
            # If current instruction is a return, stop disassembling further

            # current address is an end leader

            if isRetIns(ins):

                leader_set.add(Leader('E', addr))

                break
            # If current instruction is braching, stop disassembling further

            # the current instr is an end leader, branch target is start leader

            if isBranchIns(ins):

                leader_set.add(Leader('E', addr))

                for target in getInsCrossRef(ins, addr):

                    leader_set.add(Leader('S', target))

                    if target not in analyzed_addresses:

                        analysis_Q.put(target)

                break
            # Current instruction is not branching

            else:

                # Get cross refs

                cross_refs = getInsCrossRef(ins, addr)

                addr = cross_refs[0] # The immediate next instruction
                # Some non branching instructions like SETUP_LOOP,

                # SETUP_EXCEPT can have more than 1 cross references

                if len(cross_refs) == 2:

                    leader_set.add(Leader('S', cross_refs[1]))
                    if cross_refs[1] not in analyzed_addresses:

                        analysis_Q.put(cross_refs[1])
    return sorted(leader_set, cmp = _leaderSortFunc)
def buildBasicBlocks(leaders, code_object, entry_addr):

    i = 0

    bb_list = []

    disassembler = Disassembler(code_object)
    while i < len(leaders):

        leader1, leader2 = leaders[i], leaders[i+1]

        addr1, addr2 = leader1.addr, leader2.addr

        bb = BasicBlock()

        bb_list.append(bb)

        bb.addr = addr1

        offset = 0

        if addr1 == entry_addr:

            bb.isEntry = True
        if leader1.type == 'S' and leader2.type == 'E':

            while addr1 + offset <= addr2:

                ins = disassembler.disasAt(addr1  + offset)

                bb.addInstruction(ins)

                offset += ins.size

            i += 2
        elif leader1.type == 'S' and leader2.type == 'S':

            while addr1 + offset < addr2:

                ins = disassembler.disasAt(addr1  + offset)

                bb.addInstruction(ins)

                offset += ins.size

            i += 1
    return bb_list

def insMnemonic(ins):

    return opcode.opname[ins.opkode]

def findbbinBBList(bb_list, bb_addr):

    for i in range(len(bb_list)):

        if bb_list[i].addr == bb_addr:

            return i
    raise Exception("No basic block with an address {} exists!!".format(bb_addr))  # Should not happen

def buildPositionIndepedentBasicBlock(bb_list):

    for bb in bb_list:

        offset = 0

        for i in range(len(bb.instructions)):

            ins = bb.instructions[i]
            # The last ins of a bb is processed specially

            if i == len(bb.instructions) - 1:

                cross_ref = getInsCrossRef(ins, bb.addr + offset)
                if isBranchIns(ins):
                    # Conditional branch ins have 2 cross refs

                    if isCondiBranchIns(ins):

                        # ref1 is the address of next instruction

                        # ref2 is the address of the branch target

                        ref1, ref2 = cross_ref[0], cross_ref[1]
                        pos = findbbinBBList(bb_list, ref2)

                        ins.arg = bb_list[pos]

                        bb.addSuccessor(bb_list[pos])

                        bb_list[pos].addPredecessor(bb)
                        pos = findbbinBBList(bb_list, ref1)

                        bb.addSuccessor(bb_list[pos])

                        bb_list[pos].addPredecessor(bb)

                    # Unconditional branch ins have 1 cross ref

                    else:

                        ref = cross_ref[0]

                        pos = findbbinBBList(bb_list, ref)

                        ins.arg = bb_list[pos]

                        bb.addSuccessor(bb_list[pos])

                        bb_list[pos].addPredecessor(bb)

                # FOR_ITER, SETUP_LOOP, SETUP_EXCEPT, SETUP_FINALLY, SETUP_WITH

                # They have 2 cross refs

                elif isHandlerIns(ins):

                    # ref1 is the address of next instruction

                    # ref2 is the address of the handler

                    ref1, ref2 = cross_ref[0], cross_ref[1]
                    pos = findbbinBBList(bb_list, ref2)

                    bb_list[pos].isHandler = True

                    bb_list[pos].refHandlerIns.append(ins)
                    ins.arg = bb_list[pos]

                    pos = findbbinBBList(bb_list, ref1)

                    bb.addSuccessor(bb_list[pos])

                    bb_list[pos].addPredecessor(bb)

                # For RETURN_VALUE instruction, nothing to do

                elif isRetIns(ins):

                    pass

                # Normal instructions, have only 1 cross ref

                else:

                    ref = cross_ref[0]

                    pos = findbbinBBList(bb_list, ref)

                    bb.addSuccessor(bb_list[pos])

                    bb_list[pos].addPredecessor(bb)

            # Not the last instruction

            else:

                if isHandlerIns(ins):

                    ref = getInsCrossRef(ins, bb.addr + offset)[1]

                    pos = findbbinBBList(bb_list, ref)

                    bb_list[pos].isHandler = True

                    bb_list[pos].refHandlerIns.append(ins)
                    ins.arg = bb_list[pos]
            offset += ins.size

def findOEP(code_object):

    '''

    Finds the original entry point of a code object obfuscated by PjOrion.

    DO NOT call this for a non obfsucated code object.
    :param code_object: the code object

    :type code_object: code    

    :returns: the entrypoint

    :rtype: int

    '''    

    disassembler = Disassembler(code_object)

    ins = disassembler.disasAt(0)
    try:

        assert insMnemonic(ins) == 'SETUP_EXCEPT'

        except_handler = 0 + ins.arg + ins.size
        assert disassembler.disasAt(3).opkode == -1

        assert insMnemonic(disassembler.disasAt(except_handler)) == 'POP_TOP'

        assert insMnemonic(disassembler.disasAt(except_handler + 1)) == 'POP_TOP'

        assert insMnemonic(disassembler.disasAt(except_handler + 2)) == 'POP_TOP'

        return except_handler + 3

    except:

        return -1

def simplifyPass1(bb_list):

    """

    Eliminates a basic block that only contains an unconditional branch.

    """

    foo = True
    while foo:

        foo = False

        for i in range(len(bb_list)):

            bb = bb_list[i]

            if bb.isHandler and len(bb.instructions) == 1:

                ins = bb.instructions[0]

                if insMnemonic(ins) == 'JUMP_FORWARD' or insMnemonic(ins) == 'JUMP_ABSOLUTE':

                    branch_target_bb = bb.successors[0] # Branch target of this basic block

                    branch_target_bb.predecessors.remove(bb)
                    branch_target_bb.isHandler = True

                    for refIns in bb.refHandlerIns:

                        refIns.arg = branch_target_bb
                    branch_target_bb.refHandlerIns = bb.refHandlerIns
                    # Now iterate over all predecessors of this bb

                    for j in range(len(bb.predecessors)):

                        # Remove this bb from the successor list

                        # Add branch target bb to the successor list

                        bb.predecessors[j].successors.remove(bb)

                        bb.predecessors[j].addSuccessor(branch_target_bb)

                        branch_target_bb.addPredecessor(bb.predecessors[j])
                        last_ins = bb.predecessors[j].instructions[-1]

                        if last_ins.opkode in opcode.hasjabs or last_ins.opkode in opcode.hasjrel:

                            last_ins.arg = branch_target_bb
                    del bb_list[i]

                    foo = True

                    break

            elif not bb.isHandler and len(bb.instructions) == 1:

                ins = bb.instructions[0]

                if insMnemonic(ins) == 'JUMP_FORWARD' or insMnemonic(ins) == 'JUMP_ABSOLUTE':

                    branch_target_bb = bb.successors[0] # Branch target of this basic block

                    branch_target_bb.predecessors.remove(bb)
                    # Now iterate over all predecessors of this bb

                    for j in range(len(bb.predecessors)):

                        # Remove this bb from the successor list

                        # Add branch target bb to the successor list

                        bb.predecessors[j].successors.remove(bb)

                        bb.predecessors[j].addSuccessor(branch_target_bb)

                        branch_target_bb.addPredecessor(bb.predecessors[j])
                        last_ins = bb.predecessors[j].instructions[-1]

                        if last_ins.opkode in opcode.hasjabs or last_ins.opkode in opcode.hasjrel:

                            last_ins.arg = branch_target_bb
                    del bb_list[i]

                    foo = True

                    break
def simplifyPass2(bb_list):

    """

    Merges a basic block into its predecessor if there is only one and the

    predecessor only has one successor.

    """
    foo = True

    while foo:

        foo = False

        for i in range(len(bb_list)):

            bb = bb_list[i]
            # Not a handler block & has only 1 predecessor

            if not bb.isHandler and len(bb.predecessors) == 1:

                pred = bb.predecessors[0]

                # Predecessor has only 1 successor

                if len(pred.successors) == 1:

                    # Merge this bb with its predecessor

                    last_ins_pred = pred.instructions[-1]
                    # If last instruction of predecessor is either JUMP_ABSOLUTE or JUMP_FORWARD, delete it

                    if insMnemonic(last_ins_pred) == 'JUMP_ABSOLUTE' or insMnemonic(last_ins_pred) == 'JUMP_FORWARD':

                        del pred.instructions[-1]
                    # Append all instructions of current bb

                    for ins in bb.instructions:

                        pred.addInstruction(ins)
                    del pred.successors[:]
                    for succ in bb.successors:

                        pred.addSuccessor(succ)

                        succ.predecessors.remove(bb)

                        succ.addPredecessor(pred)
                    del bb_list[i]

                    foo = True

                    break

def bbToDot(bb):

    dot = '<<table align = "left" border = "0">'

    if bb.isEntry:

        dot += '<tr><td align = "left"><font point-size = "8" color = "#9dd600">entrypoint:</font></td></tr>'
    elif bb.isHandler:

        dot += '<tr><td align = "left"><font point-size = "8" color = "#9dd600">handler:</font></td></tr>'
    #else:

    #    dot += '<tr><td align = "left"><font point-size = "8" color = "#9dd600">off_{}:</font></td></tr>'.format(bb.addr)
    for ins in bb.instructions:

        dot += '<tr><td align = "left">{}</td></tr>'.format(insMnemonic(ins))

    dot += '</table>>'
    return pydotplus.Node('off_{}'.format(bb.addr), shape='none', style='filled', color='#2d2d2d',

                        label=dot, fontcolor='white', fontname='Consolas', fontsize='9')

def buildEdges(graph, nodelist, bb_list):

    for i in range(len(bb_list)):

        bb = bb_list[i]

        for succ in bb.successors:

            graph.add_edge(pydotplus.Edge(nodelist[i], nodelist[bb_list.index(succ)]))

def buildGraph(bb_list):

    graph = pydotplus.Dot(graph_type='digraph')

    # graph.set('splines', 'curved')

    nodelist = []

    for bb in bb_list:

        node = bbToDot(bb)

        graph.add_node(node)

        nodelist.append(node)
    buildEdges(graph, nodelist, bb_list)

    graph.write_svg('1_d.svg')
class Assembler:

    def __init__(self, bb_list):

        self.bb_list = bb_list

        self.a_postorder = [None] * len(bb_list)

        self.a_nblocks = 0

    def assemble(self):

        for bb in self.bb_list:

            if bb.isEntry:

                self._dfs(bb)

                break
        # Can't modify the bytecode after computing jump offsets.

        self._assembleJumpOffsets()            

        return self._emit()

    def _assembleIns(self, ins):

        size = ins.size
        if ins.opkode >= opcode.HAVE_ARGUMENT:

            arg = ins.arg
        if size == 1:

            return chr(ins.opkode)
        elif size == 3:

            return chr(ins.opkode) + chr(arg & 0xFF) + chr((arg >> 8) & 0xFF)
        else:

            raise Exception('EXTENDED_ARG not yet implemented')

    def _emit(self):

        code = cStringIO.StringIO()

        for i in range(len(self.a_postorder) - 1, -1, -1):

            bb = self.a_postorder[i]
            for ins in bb.instructions:

                code.write(self._assembleIns(ins))
        return code.getvalue()             

    def _dfs(self, bb):

        if bb.b_seen:

            return

        bb.b_seen = True
        if len(bb.successors) > 0:

            self._dfs(bb.successors[0])

        for i in range(len(bb.instructions)):

            ins = bb.instructions[i]

            if isinstance(ins.arg, BasicBlock):

            #if ins.opkode in opcode.hasjabs or ins.opkode in opcode.hasjrel:

                self._dfs(ins.arg)

        if len(bb.successors) == 2:

            self._dfs(bb.successors[1])
        self.a_postorder[self.a_nblocks] = bb

        self.a_nblocks += 1

    def _assembleJumpOffsets(self):

        totsize = 0
        # Iterate in reverse order and calculate the addresses of each bb

        for i in range(len(self.a_postorder) - 1, -1, -1):

            bsize = self.a_postorder[i].blockSize()

            self.a_postorder[i].addr = totsize

            totsize += bsize
        # We have calculated the offsets of each bb
        for bb in self.a_postorder:

            bsize = bb.addr

            for ins in bb.instructions:

                bsize += ins.size

                if ins.opkode in opcode.hasjabs:

                    ins.arg = ins.arg.addr
                elif ins.opkode in opcode.hasjrel:

                    ins.arg = ins.arg.addr - bsize
def deobfuscate(code_object):

    assert isinstance(code_object, types.CodeType)

    oep = findOEP(code_object)
    if oep == -1:

        print 'Not generating cfg for ', code_object.co_name

        return code_object.co_code
    leader_set = findLeaders(code_object, oep)

    bb_list = buildBasicBlocks(leader_set, code_object, oep)

    buildPositionIndepedentBasicBlock(bb_list)

    print 'Original number of basic blocks: ', len(bb_list)

    simplifyPass1(bb_list)

    print 'Number of basic blocks after pass 1: ', len(bb_list)

    simplifyPass2(bb_list)

    print 'Number of basic blocks after pass 2: ', len(bb_list)

    #buildGraph(bb_list)

    return Assembler(bb_list).assemble()

def recurseCodeObjects(code_obj):

    mod_const = []

    for const in code_obj.co_consts:

        if isinstance(const, types.CodeType):

            mod_const.append(recurseCodeObjects(const))

        else:

            mod_const.append(const)
    argcount = code_obj.co_argcount

    nlocals = code_obj.co_nlocals

    stacksize = code_obj.co_stacksize

    flags = code_obj.co_flags

    codestring = deobfuscate(code_obj)

    constants = tuple(mod_const)

    names = code_obj.co_names

    varnames = tuple('var{}'.format(i) for i in range(len(code_obj.co_varnames)))

    filename = code_obj.co_filename

    import random

    name = str(random.randint(100,999)) # 'renamed'  # XXX: Use a better way

    firstlineno = code_obj.co_firstlineno

    lnotab = code_obj.co_lnotab

    return types.CodeType(argcount, nlocals, stacksize, \

                          flags, codestring, constants, names, \

                          varnames,

                          filename, 

                          name, 

                          firstlineno, 

                          lnotab)
def main():

    fSrc = open('ob1.pyc', 'rb')

    fSrc.seek(8)

    c_obj = marshal.load(fSrc)

    fSrc.close()

    fOut = open('ob1_deobf.pyc', 'wb') 

    fOut.write('\x03\xf3\x0d\x0a\0\0\0\0')

    marshal.dump(recurseCodeObjects(c_obj), fOut)

    fOut.close()

if __name__ == '__main__':

    main()

November 6, 2015

@ExtremeCoders Can we deobfuscate files such as ?

http://pastebin.com/jfpvdw2L

Obfuscated with pyobfuscate

Edited November 6, 2015 by madskillz

November 7, 2015

@madskillz: That is source code obfuscation which is different from binary obfuscation.

pyobfuscate operates on the source files whereas pjorion operates on the binary level.

To reverse the obfuscation of pyobfuscate you need to develop a python source code parser.

The ast, tokenize & parser modules would be a good starting point to learn about this.

For manual deobfuscation use a good python ide such as pycharm.

As an instance there are many redundant if's which are always true,



if 30 - 30: o0oOOo0O0Ooo - O0 % o0oOOo0O0Ooo - OoooooooOO * O0 * OoooooooOO

if 60 - 60: iIii1I11I1II1 / i1IIi * oO0o - I1ii11iIi11i + o0oOOo0O0Ooo

if 94 - 94: i1IIi % Oo0Ooo

if 68 - 68: Ii1I / O0

pycharm can automatically refactor the code to remove such statements.

November 20, 2015

How about the "protect .pyc file" option in orion?

Looks like it loads multiple times encoded code and run it in memory or something. Is there any way to "dump" the final loaded code from memory?

Also, is there any way to re-assemble disassembled python code?

Edited November 20, 2015 by bomblader

November 20, 2015

12 minutes ago, bomblader said:

Looks like it loads multiple times encoded code and run it in memory or something. Is there any way to "dump" the final loaded code from memory?

There is nothing to dump as nothing is decrypted. All the protector does is control flow obfuscation. It takes the individual instructions, link them by unconditional jumps, and scatters them.

19 minutes ago, bomblader said:

Also, is there any way to re-assemble disassembled python code?

The deobfuscator already does that, but it needs more refinement to make it usable in all cases. I would eventually like to make this a full fledged deobfuscator but there are no promises when.

November 20, 2015

24 minutes ago, Extreme Coders said:

There is nothing to dump as nothing is decrypted. All the protector does is control flow obfuscation. It takes the individual instructions, link them by unconditional jumps, and scatters them.

The deobfuscator already does that, but it needs more refinement to make it usable in all cases. I would eventually like to make this a full fledged deobfuscator but there are no promises when.

I am talking about transforming the disassembled code into a .pyc file. (Not getting the original source code out of it)

The "protect" option makes the disassembled file look like this: http://pastebin.com/NuGjqJDK (check the code ending stuff at the bottom)

Edited November 20, 2015 by bomblader

May 11, 2016

The project has been open sourced on github (https://github.com/extremecoders-re/PjOrion-Deobfuscator)

Note that this is currently in pre-alpha stage which would be improved with time.
You can read the related blog entry at https://0xec.blogspot.com/2016/05/pjorion-deobfuscator-open-sourced.html

Sign In

[Unpack/DecompileMe] little .pyc, why are you so weird?

Recommended Posts

bomblader

Extreme Coders

bomblader

Extreme Coders

bomblader

Extreme Coders

Extreme Coders

Extreme Coders

Extreme Coders

madskillz

Extreme Coders

bomblader

Extreme Coders

bomblader

Extreme Coders

Create an account or sign in to comment

Create an account

Sign in

Community

Search Engines

Code Search

File Search

Search Engines