Funny thing about things like that is that you can likely write tools to automat...

3eed · on June 18, 2020

I'm gonna write about this in pt. 2. Basically you can use symbolic execution to recover the CFG[1] (using something like miasm), eliminate dead code, restore dynamic library calls through emulation, and so on. But the point is that it would take an incredible amount of work and co-operation between tools, and even then you wouldn't have begun to understand anything about the binary, which is a whole other story. There is a bit of a shortcut to all of this, though: combined with a couple of tools, it lets you make sense of what's going on in this binary. I'm gonna reveal it in my next post.
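
As a rough illustration of the first two steps above (resolving indirect branch targets to rebuild the CFG, then flagging unreachable blocks as dead code), here is a minimal Python sketch. It does not use miasm: the toy instruction format, the `BLOCKS` table, and the helper names are all invented for this example. Plain constant evaluation stands in for real symbolic execution, which would also have to handle conditionals, memory, and unknown inputs.

```python
MASK = 0xFFFFFFFFFFFFFFFF  # keep arithmetic within 64 bits

# Hypothetical toy IR: address -> list of (op, *args) tuples.
# "jmp_reg" is an indirect branch whose target is computed inside the block.
BLOCKS = {
    0x1000: [("mov", "rax", 0x1F00), ("add", "rax", 0x100), ("jmp_reg", "rax")],
    0x2000: [("mov", "rbx", 0x5555), ("xor", "rbx", 0x6555), ("jmp_reg", "rbx")],
    0x3000: [("ret",)],
    0x4000: [("mov", "rax", 1), ("jmp", 0x4000)],  # junk block, never reached
}


def resolve_successors(block):
    """Evaluate one block concretely and return its successor addresses.

    Assumes every register is written in the block before it is read,
    which is what makes plain constant evaluation sufficient here.
    """
    regs = {}
    for op, *args in block:
        if op == "mov":
            regs[args[0]] = args[1] & MASK
        elif op == "add":
            regs[args[0]] = (regs[args[0]] + args[1]) & MASK
        elif op == "xor":
            regs[args[0]] = (regs[args[0]] ^ args[1]) & MASK
        elif op == "jmp":
            return [args[0]]
        elif op == "jmp_reg":
            return [regs[args[0]]]
        elif op == "ret":
            return []
    return []


def recover_cfg(blocks, entry):
    """Worklist traversal from the entry block, following resolved targets."""
    cfg = {}
    worklist = [entry]
    while worklist:
        addr = worklist.pop()
        if addr in cfg or addr not in blocks:
            continue
        cfg[addr] = resolve_successors(blocks[addr])
        worklist.extend(cfg[addr])
    return cfg


cfg = recover_cfg(BLOCKS, 0x1000)
dead = sorted(set(BLOCKS) - set(cfg))  # blocks the traversal never reached
```

On this toy input the traversal reaches three blocks and reports the block at `0x4000` as dead. Getting the same result on a real binary is where the "incredible amount of work" comes in.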

[1]: https://blog.quarkslab.com/deobfuscation-recovering-an-ollvm...

novaleaf · on June 18, 2020

Awesome write-up, really engaging! I enjoy the cliffhanger in the last line... "one strange trick"....

3eed · on June 18, 2020

"Evan Spiegel Hates this Trick!"

imtringued · on June 18, 2020

Most obfuscation techniques are lossy. You lose information such as project structure, names of files, data types, variable names and so on. Decompilation and deobfuscation might give you a shadow of the original source code, but the benefits are overstated because the advantages over working directly with assembly code aren't that big. Most of the time is spent finding the dozen relevant functions out of 10000. If you truly need access to the entire source code, your time is better spent on an open-source project.

MauranKilom · on June 18, 2020

> You lose information such as project structure, names of files, data types, variable names and so on.

You lose half of those by not having debugging symbols and the other half by stripping the binary. This is all lost during compilation already, not due to explicit obfuscation. If you've ever worked with a compiler that is mediocre at generating debug symbols, you'll know it's the compiler doing extra work that provides all these, not obfuscation that removes them.

3eed · on June 18, 2020

Couldn't agree more.

zelly · on June 18, 2020

That works if the obfuscating patterns are all straightforward, like a regular grammar. But if it's not possible to distinguish obfuscation from genuine code, the problem could quickly become intractable (NP-hard).
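
To make the "regular grammar" case concrete, here is a small Python sketch of a pattern-based cleanup pass. The textual listing format and the two junk patterns are made up for illustration and ignore side effects such as flags. The point is only that junk with a fixed, recognizable shape can be stripped with regular expressions, which is exactly what stops working once junk looks like genuine code.

```python
import re

# Hypothetical junk patterns, one instruction per line.
JUNK_PATTERNS = [
    # push/pop of the same register with nothing in between
    re.compile(r"^push (\w+)\npop \1\n", re.MULTILINE),
    # the same xor-with-constant applied twice in a row cancels out
    re.compile(r"^(xor \w+, 0x[0-9a-f]+)\n\1\n", re.MULTILINE),
]


def strip_junk(listing):
    """Remove known junk patterns repeatedly until nothing more matches.

    Repeating matters: removing one pattern can make another one adjacent.
    """
    changed = True
    while changed:
        changed = False
        for pattern in JUNK_PATTERNS:
            listing, count = pattern.subn("", listing)
            changed = changed or count > 0
    return listing


LISTING = "\n".join([
    "mov rdi, rsi",
    "push rax",
    "xor rbx, 0x1337",
    "xor rbx, 0x1337",
    "pop rax",
    "call target",
]) + "\n"

cleaned = strip_junk(LISTING)
```

Here the xor pair is removed first, which leaves the push/pop pair adjacent so the second pass removes it too, and only the `mov` and the `call` survive.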

saagarjha · on June 22, 2020

Generally obfuscated code is easy to spot, if not easy to reverse.

bluesign · on June 18, 2020

It's very unlikely that you can, actually. It's similar to why we cannot recover the source of a binary even if we know how the compiler works.

q3k · on June 18, 2020

We cannot have _the_ source, but we can have a good enough approximation of it, especially if a human is in the loop (see: commercial decompilation software like the Hex-Rays decompiler, Binary Ninja, and even Ghidra).

pfundstein · on June 18, 2020

The point is that we cannot automate reversing these obfuscation mechanisms the same way we cannot automate reversing a binary file to a higher level than assembly.

Andoryuuta · on June 18, 2020

This is not quite true, especially with current state-of-the-art tools like Ghidra, IDA Pro (with Hex-Rays), etc.

In fact, Rolf Rolles wrote a wonderful guest post[1] for the Hex-Rays blog about automating the reversal of this exact obfuscator, though he wasn't aware of its origins at the time.

[1]: https://www.hex-rays.com/blog/hex-rays-microcode-api-vs-obfu...

3eed · on June 18, 2020

All these are great programs, but none of them can understand that level of obfuscation so far. As stated in the post, both Ghidra and IDA interpret the very first block in any of the obfuscated functions, which ends with an indirect branch, as a complete function in and of itself. That's because in the usual case an indirect branch at the end of a block is a tail call: it terminates one function and starts another, all with the same stack frame.

EDIT: also keep in mind the CFG isn't flattened here.
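
A toy Python model of that boundary heuristic, to show its effect. This is not how Ghidra or IDA are actually implemented. The flat `LISTING` and the rule "a `ret` or an indirect jump ends the function" are simplifying assumptions made up for this sketch.

```python
# Hypothetical flat listing: (address, mnemonic) pairs for ONE obfuscated
# function made of three blocks chained by indirect jumps.
LISTING = [
    (0x1000, "mov"), (0x1004, "add"), (0x1008, "jmp_reg"),  # block 1
    (0x100C, "mov"), (0x1010, "xor"), (0x1014, "jmp_reg"),  # block 2
    (0x1018, "mov"), (0x101C, "ret"),                       # block 3
]

# Assumed heuristic: an indirect jump is treated like a tail call,
# so it ends the current function just as a ret does.
TERMINATORS = {"ret", "jmp_reg"}


def split_functions(listing):
    """Group addresses into 'functions', closing one at every terminator."""
    functions, current = [], []
    for addr, mnemonic in listing:
        current.append(addr)
        if mnemonic in TERMINATORS:
            functions.append(current)
            current = []
    if current:  # trailing instructions without a terminator
        functions.append(current)
    return functions


functions = split_functions(LISTING)
```

Under this heuristic the single real function comes out as three separate one-block "functions", which matches the symptom described above.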

underdeserver · on June 18, 2020

I think the idea is that Ghidra's and IDA's plugin systems allow for manipulation of binaries at a level that allows writing deobfuscators over them.

QuickReply · on June 18, 2020

Run this inside the developer console:

Array.from(document.images).forEach(img => console.log(img.src = img.src.replace("http://hexblog.com", "https://hex-rays.com")));

to make the blog readable.

underdeserver · on June 18, 2020

Exactly. Such tools are definitely possible, even if they rely on Ghidra's or IDA's plugin systems.

What I like is the economics of the idea that one company can build an obfuscator, and then another company can build an anti-obfuscator which completely nullifies the value proposition of the first company.