Ripping functions from PE binaries and reusing them elsewhere

Accessing a PE’s function via Python!

There may be occasions where a PE binary contains helpful functions that could be reused elsewhere. If understanding the function’s implementation is not critical, it may be possible to rip the code out as-is. The function is then used as a black box: input data goes in and the processed output comes back, without any knowledge of its internal workings.

There are some applications where this technique could be especially useful:

Recently, I was disassembling a binary that had a function to decrypt a block of data. At over a thousand lines of x86 instructions, the function proved tricky to re-implement, so I tried this process as an experiment.

The general outline of the process is as follows:


Feasibility analysis

IDA disassembly

The desired function is first identified, along with its inputs and outputs.

In my case, the decryption function is located at 0x41C6D0 and takes three parameters: the cipher (in the edx register), the plaintext output (on the stack) and the key (on the stack). All three are addresses of byte arrays located in .data.

As the desired instructions will be copied as-is, some issues may arise from this process:

If the function does not run into many of the issues above, it should be possible to try ripping it!

Ripping the code

There are apparently two popular utilities to accomplish this:

I chose the latter, TMG Ripper Studio (TMGRS), as the setup process was simply downloading, unarchiving and running it.

TMG Ripper Studio

Using TMGRS is straightforward:

  1. Specify the path to your binary and the virtual address of the function to be ripped (0x41C6D0 in my case). When Start trace is clicked, TMGRS automatically determines the file offset, traces the disassembly and displays the data references used by that region of instructions.
  2. Review the data references to ensure that the data types are correct (IDA usually makes accurate guesses). Clicking on Process datarefs will save those changes.
  3. Add an identifier of your choice. This allows the function to be accessed via call yourIdentifier later. When you are ready, Save to asm and you will have a MASM-compatible rip!
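TMGRS works out the file offset in step 1 on its own, but the arithmetic behind that kind of lookup is worth seeing once. The following is a minimal sketch of the usual PE translation from a virtual address to a file offset, not a description of how TMGRS itself is implemented. The image base and the section values are hypothetical placeholders, not taken from the binary in this post; the real ones come from the PE headers.

```python
from typing import NamedTuple


class Section(NamedTuple):
    name: str
    virtual_address: int      # RVA at which the section is mapped in memory
    virtual_size: int         # size of the section in memory
    pointer_to_raw_data: int  # offset of the section's data inside the file


def va_to_file_offset(va, image_base, sections):
    """Translate a virtual address into an offset within the PE file."""
    rva = va - image_base  # relative virtual address
    for s in sections:
        if s.virtual_address <= rva < s.virtual_address + s.virtual_size:
            return rva - s.virtual_address + s.pointer_to_raw_data
    raise ValueError(f"VA {va:#x} is not inside any known section")


# Hypothetical values for illustration only (assumption, not from the post).
IMAGE_BASE = 0x400000
SECTIONS = [Section(".text", 0x1000, 0x30000, 0x400)]

print(hex(va_to_file_offset(0x41C6D0, IMAGE_BASE, SECTIONS)))
```

With these made-up section values the function address from this post maps to a file offset inside .text; a different section layout gives a different offset.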

Packaging the code

Packaged as a command line tool

At this point, a (M)ASM file is ready with the desired instructions, although it is incapable of doing anything on its own. Again, there are many ways to use this data, and I chose to use it like a command line utility: by building a wrapper to take in command-line parameters, and return the result via the standard output.

Modifications to the original ASM file include:

The ASM file that I worked on can be downloaded here. If you intend to hack on it, take note that there is no bounds checking and it expects the data to be properly formatted.

Building the ASM file requires MASM32. Running the commands below, with your_file replaced by your filename, should hopefully result in a working binary in the same folder.

c:\masm32\bin\ml /c /coff /Cp your_file.asm
c:\masm32\bin\link /SUBSYSTEM:CONSOLE /LIBPATH:c:\masm32\lib your_file.obj
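If the ASM file is rebuilt often, the two commands can be driven from a script. This is only a sketch: it assumes MASM32 lives in c:\masm32 as in the commands above, and the helper names build_commands and build are made up for illustration.

```python
import subprocess

MASM_ROOT = r"c:\masm32"  # assumed install location, as in the commands above


def build_commands(stem, masm_root=MASM_ROOT):
    """Return the assemble and link command lines for <stem>.asm."""
    return [
        [masm_root + r"\bin\ml", "/c", "/coff", "/Cp", stem + ".asm"],
        [masm_root + r"\bin\link", "/SUBSYSTEM:CONSOLE",
         "/LIBPATH:" + masm_root + r"\lib", stem + ".obj"],
    ]


def build(stem):
    """Run both steps in order, stopping at the first one that fails."""
    for cmd in build_commands(stem):
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

Calling build("your_file") on a Windows machine with MASM32 installed would run the assembler followed by the linker.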

Troubleshooting

Issues may arise from assuming incorrect data types during the ripping process. What I guessed to be DWORDs turned out to be byte arrays. However, the indices were determined at runtime, and I was unsure how much data the function was accessing. I got around this issue by ripping a large chunk of the memory region. This works, since unused contiguous data bytes do not affect the program flow, at the expense of a slightly larger binary.

Reusing the function elsewhere

Using it from Python

Running this Python script results in the output shown in the first image of this post

The ripped function can now be reliably accessed from other sources as a command line tool. In Python3 with the subprocess module, this one-liner runs the binary, then reads and returns the standard output as a string:

subprocess.run(['your_binary.exe', 'your_parameters'], stdout=subprocess.PIPE).stdout.decode('utf-8')
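For repeated use, the one-liner can be wrapped in a small function that also notices when the binary fails. This is a sketch under assumptions: call_ripped is a made-up name, and it assumes the wrapper exits with a non-zero code on failure, which the ripped binary may or may not do.

```python
import subprocess


def call_ripped(binary, *args, timeout=10):
    """Run the wrapper binary and return its standard output as text."""
    result = subprocess.run(
        [binary, *args],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout,  # give up if the binary hangs
        check=True,       # raise CalledProcessError on a non-zero exit code
    )
    # Strip the trailing newline (\n or \r\n) that console programs usually print.
    return result.stdout.decode("utf-8").strip()
```

Usage would look like call_ripped('your_binary.exe', 'your_parameters'), with the same placeholder names as the one-liner above.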

The performance could be better, with each call taking about 9.36ms on my 2012 MacBook Pro. It is likely that most of the delay stems from the process creation overhead. However, the goal of ripping and reusing a PE’s function elsewhere has been achieved! ^_^
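A per-call figure like this can be estimated by timing a batch of runs and averaging. The sketch below is one way to do that, not necessarily how the number above was obtained; mean_call_ms is a made-up helper, and the Python interpreter is used as a stand-in command so that the snippet runs on any machine.

```python
import subprocess
import sys
import time


def mean_call_ms(cmd, runs=20):
    """Average wall-clock time, in milliseconds, of running cmd `runs` times."""
    start = time.perf_counter()
    for _ in range(runs):
        subprocess.run(cmd, stdout=subprocess.PIPE, check=True)
    return (time.perf_counter() - start) * 1000 / runs


# Stand-in command (assumption): swap in ['your_binary.exe', 'your_parameters'].
stand_in = [sys.executable, "-c", "pass"]
print(f"{mean_call_ms(stand_in, runs=5):.2f} ms per call")
```

Timing a command that does almost no work gives a rough idea of how much of each call is process startup rather than the ripped function itself.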


If you are working with Sublime Text, MasmAssembly is a nifty package that adds syntax highlighting to MASM-formatted files.

Nerd snipe: If you can figure out what’s going on in the decryption function in the earlier ASM file, please let me know!