Reverse engineering a custom CPU from a single program

Arathorn · on Sept 25, 2019

I spent a summer doing something very similar to this at a major military radio manufacturer in the mid-90s. It turned out that one of their product lines from the late 70s used an entirely custom 8-bit CPU for which the instruction set had somehow been entirely lost. However, they still had the firmware on a stack of EPROMs. So, the mission was to reverse engineer the old CPU to reimplement it on a modern DSP. Turned out that you can get surprisingly far based on a frequency analysis of things that look like opcodes ("let's assume it has two accumulator registers; that loading is the most common instruction; etc."), making some educated guesses about how the designer would have allocated the opcode bits, and then plonking an HP logic state analyser straight over the top of the 32-pin DIL to check the hypothesis. Fun times :)
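
A minimal sketch of what such a frequency analysis could look like, not the tooling actually used back then. It assumes the firmware is available as raw bytes and, purely for illustration, that the opcode sits in the top three bits of each byte; the sample image and both function names are made up. It also counts every byte, so operands and data tables will blur the picture on a real dump.

```python
from collections import Counter

def byte_histogram(firmware: bytes, top: int = 5):
    """Return the `top` most common byte values with their counts."""
    return Counter(firmware).most_common(top)

def high_bits_histogram(firmware: bytes, bits: int = 3):
    """Count how often each value of the top `bits` bits occurs,
    a rough probe for an opcode field packed into the high bits."""
    shift = 8 - bits
    return Counter(b >> shift for b in firmware)

# Tiny made-up image; a real one would be read from the EPROM dump.
sample = bytes([0xA1, 0x10, 0xA1, 0x22, 0xA2, 0x10, 0x4C, 0xA1, 0x33])
print(byte_histogram(sample, 2))
print(high_bits_histogram(sample))
```

If one bit pattern dominates, "this is the load instruction" becomes a hypothesis worth checking against the hardware.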

nomadluap · on Sept 25, 2019

I'm curious as to why you call the package "DIL" instead of "DIP". I've never heard them called "DIL" before.

lintuxvi · on Sept 25, 2019

They're interchangeable depending on what literature you read. Funnily enough, like ATM machine, DIP package can be redundant, being equivalent to "dual in-line package package".

Arathorn · on Sept 26, 2019

I think I just picked this up from my father, who’s an old-school hardware hacker. So it’s either old-fashioned lingo (he’s been retired 20 years or so) or British or both :) Or perhaps I’m just confused, given I’m mainly a software guy.

woodrowbarlow · on Sept 25, 2019

i'm gonna guess it's because L and P are next to each other on the keyboard.

misterdoubt · on Sept 25, 2019

Good ol' DI;s

phire · on Sept 25, 2019

This is pretty much the same process we went through to reverse engineer the custom "VPU" instruction set for the co-processor that the Raspberry Pi's firmware runs on.

We used the publicly available bootcode.bin and loader.bin to RE most of the ISA before the Pis even started shipping, though there were some more obscure instructions that we weren't sure about until we could run our own code.

But when my pi did arrive, we knew enough to write a binary that would blink an LED on more-or-less the first attempt.

I guess the real lesson is that custom ISAs are not a good form of security.

Someone · on Sept 25, 2019

I think that depends on how far you’re willing to go. For example, it doesn’t take many transistors to XOR every instruction byte with the least significant byte of the address it is read from before sending it to the CPU proper.

And that’s just the easy first version. XOR with a hash of the address, swap bits across entire cache lines when loading each cache line in the instruction cache, etc.

Such mechanisms would be fairly secure even if the attacker has access to the machine code of a JITter. The attacker would have to crack the encryption to understand what the JITter does.
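
A minimal software sketch of the "easy first version", assuming the image is a flat byte string loaded at a known base address. The function name, the base address, and the sample bytes are invented for illustration; in hardware this would be a handful of XOR gates on the fetch path rather than code.

```python
def xor_with_address(image: bytes, base: int = 0) -> bytes:
    """XOR each byte with the low 8 bits of the address it is stored at.
    XOR is its own inverse, so the same function scrambles and unscrambles."""
    return bytes(b ^ ((base + i) & 0xFF) for i, b in enumerate(image))

plain = bytes([0x3E, 0x01, 0xC3, 0x00, 0x10])  # made-up instruction bytes
scrambled = xor_with_address(plain, base=0x8000)
print(scrambled.hex())
print(xor_with_address(scrambled, base=0x8000) == plain)
```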

earenndil · on Sept 25, 2019

Security by obscurity is, in general, not a great idea.

SomeOldThrow · on Sept 25, 2019

Depends on your needs and the threat profile.

dymk · on Sept 25, 2019

They're not supposed to be for security, they're to serve a specific functional purpose.

NotSammyHagar · on Sept 25, 2019

What's that purpose then for a secret instruction set?

mnd999 · on Sept 25, 2019

Probably so they can change it at will without having to worry about breaking other people’s code.

londons_explore · on Sept 25, 2019

Companies either want to be earning money from an API, or they want to keep it secret.

An internal API which people reverse engineer and use is just going to lead to hassles later when you want to change the API, when people start writing bots or abuse the API in ways you hadn't imagined, or expose bugs in the API the official client didn't.

Someone · on Sept 25, 2019

I haven’t heard it about CPUs, but worries about patents also could be part of the reason. That is/used to be a fairly common argument as to why GPU makers do not release hardware specs.

autoexec · on Sept 25, 2019

Is that because they know they are violating other people's patents and don't want to be caught, or just that the patent system is such a nightmare that, even though they didn't intentionally do it, something they've done is probably infringing on another company's patents?

monocasa · on Sept 25, 2019

_9vzr · on Sept 25, 2019

Vendor lock-in for their IDE/toolchain. If the ISA isn't open, you're going to be paying the vendor for a license to compile new code.

ccurrens · on Sept 25, 2019

Cool! This reminds me a lot of this[0] where there was a specification for a custom VM in a newspaper along with a binary. It still blows me away when I read how they solved the puzzle.

[0] https://safiire.github.io/blog/2017/08/19/solving-danish-def...

jve · on Sept 25, 2019

I imagine that this would be a good way to find some of the best reverse-engineers in the world :)

wolfgke · on Sept 25, 2019

> I imagine that this would be a good way to find some of the best reverse-engineers in the world :)

Just a consideration: aren't the names of these persons "principally" known (at least if you are willing to do some investigations) if you are a company/government agency that has an interest in them?

lallysingh · on Sept 25, 2019

Nope. How would you find out who reverse engineers a lot and gets good at it?

Some names will be popular through fame or common channels, but you'll never get a full list. Especially RE when some of their activities aren't legal and they don't want to be found.

wolfgke · on Sept 25, 2019

> Especially RE when some of their activities aren't legal and they don't want to be found.

I have trouble believing that if you don't want to be found, you will participate in a reverse-engineering competition with your real-life identity.

lallysingh · on Sept 25, 2019

No, you wouldn't. I was responding to the idea that all the really skilled people are well known.

meithecatte · on Sept 25, 2019

I'd imagine that this is a good way to pique their interest. Then, when they succeed in solving the puzzle, they are less likely to forgo the "prize" of working with the agency.

wolfgke · on Sept 25, 2019

I respect your reasoning, just a counterpoint:

I (and I know quite a few programmers who think the same) really hate it when some fancy puzzle/problem is presented to pique my interest, but the real work to be done has nothing to do with the marketing. I don't believe that this kind of "false advertising" is a smart way to retain talent.

If I wanted to attract talent, I would rather put some problems on the company website that are really related to problems that occur(red) in such a job position, to attract talent that loves exactly the kind of problems that are likely to occur in the position I want to fill.

MrBuddyCasino · on Sept 25, 2019

That was impressive.

anyfoo · on Sept 25, 2019

Hah, I use that "resize the window until aligned/a pattern emerges" trick, too. If you think about it, humans' pattern recognition over vision works impressively well. I'm sure there is plenty of reason why we evolved that way, but the fact that you can take that ability and adapt it to something which is completely artificial and "unnatural" (file representations on a screen), completely without any conscious effort (you just resize the screen until you suddenly intuitively "perceive" a very abrupt and marked change), is amazing.

souprock · on Sept 25, 2019

That is fun with an FPGA bitstream. Used parts of the chip can look like a fluffy cloud.

mrfredward · on Sept 25, 2019

Resizing the window is my go to method for trying to figure out how many bytes per vertex there are in the meshes of old video games. I agree that human pattern recognition is impressive.
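
The window-resizing trick can also be roughly automated. Below is a minimal sketch, assuming fixed-size records in which at least some byte positions repeat from one record to the next; the scoring rule, both function names, and the synthetic 5-byte records are assumptions made for illustration, and eyeballing the dump will often still be faster.

```python
def match_score(data: bytes, width: int) -> float:
    """Fraction of bytes equal to the byte `width` positions later."""
    pairs = len(data) - width
    hits = sum(data[i] == data[i + width] for i in range(pairs))
    return hits / pairs

def guess_stride(data: bytes, max_width: int = 64) -> int:
    """Return the candidate width with the best score.
    Multiples of the true stride tend to tie; max() keeps the first
    maximum it sees, which is the smallest such width."""
    widths = range(1, min(max_width, len(data) - 1) + 1)
    return max(widths, key=lambda w: match_score(data, w))

# Synthetic records: three constant bytes and two varying ones per record.
records = bytes(b for i in range(40) for b in (0xFF, 0x00, i, (i * 3) & 0xFF, 0x80))
print(guess_stride(records, max_width=16))
```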

breakingcups · on Sept 25, 2019

I once used that by accident to find the encoding of a game's sprites. Due to a fluke I had it set to 27 characters wide and while scrolling through, I noticed an ASCII representation of a bike + rider that appeared in the game. Its outline was relatively easy to spot because transparent pixels were encoded as zeroes, so the background appeared in the readout as periods.
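
A minimal sketch of that kind of readout, assuming the usual hex-editor convention of printing non-printable bytes (including zero) as periods. The `render` helper and the sample bytes are made up; for the sprite described above the width would be 27.

```python
def render(data: bytes, width: int) -> str:
    """Lay bytes out in rows of `width`, showing printable ASCII as-is
    and everything else (including zero) as a period."""
    rows = []
    for start in range(0, len(data), width):
        row = data[start:start + width]
        rows.append("".join(chr(b) if 0x20 <= b < 0x7F else "." for b in row))
    return "\n".join(rows)

# Made-up 3-pixel-wide "sprite": zero is transparent, 35 happens to be '#'.
print(render(bytes([0, 35, 0, 35, 35, 35]), width=3))
```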

lainga · on Sept 25, 2019

It's almost Kasiski's test!

https://en.wikipedia.org/wiki/Kasiski_examination
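
For comparison, a minimal sketch of the core of a Kasiski examination: find repeated substrings, record the distances between consecutive occurrences, and look for a common factor. The function names and the toy string are invented; on real ciphertext, coincidental repeats usually drag a plain GCD down to 1, so the most frequent factors are inspected instead.

```python
from collections import defaultdict
from functools import reduce
from math import gcd

def repeat_distances(text: str, n: int = 3) -> list:
    """Distances between consecutive occurrences of every repeated n-gram."""
    positions = defaultdict(list)
    for i in range(len(text) - n + 1):
        positions[text[i:i + n]].append(i)
    return [b - a for p in positions.values() for a, b in zip(p, p[1:])]

def likely_period(text: str, n: int = 3) -> int:
    """GCD of all repeat distances; 0 if nothing repeats."""
    return reduce(gcd, repeat_distances(text, n), 0)

toy = "ABCDEFGHABCXYZWVABC"  # "ABC" recurs at offsets 0, 8 and 16
print(repeat_distances(toy), likely_period(toy))
```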

zxcvgm · on Sept 25, 2019

I was following along OpenTechLab [0] as they tried to reverse engineer a real CPU used in HDMI repeaters. The instruction set was already partially reversed [1], but gaining code execution allowed a small stub to be written to infer more based on before & after register states.

[0] https://opentechlab.org.uk/

[1] https://github.com/v3l0c1r4pt0r/lkv-wiki/wiki/Instruction-Se...
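
A minimal sketch of the before/after idea, not OpenTechLab's actual stub: given register snapshots taken around one unknown instruction, list what changed and check which simple operations would explain it. The register names, the candidate operation list, and the 32-bit width are all assumptions for illustration.

```python
import operator

CANDIDATES = {
    "add": operator.add, "sub": operator.sub,
    "and": operator.and_, "or": operator.or_, "xor": operator.xor,
}

def changed_registers(before: dict, after: dict) -> dict:
    """Map each register that changed to its (old, new) values."""
    return {r: (before[r], after[r]) for r in before if before[r] != after[r]}

def guess_ops(a: int, b: int, result: int, bits: int = 32) -> list:
    """Names of candidate operations with op(a, b) == result modulo 2**bits."""
    mask = (1 << bits) - 1
    return [name for name, fn in CANDIDATES.items() if fn(a, b) & mask == result]

before = {"r1": 5, "r2": 7, "r3": 0}   # hypothetical snapshot
after = {"r1": 5, "r2": 7, "r3": 12}
print(changed_registers(before, after))
print(guess_ops(before["r1"], before["r2"], after["r3"]))
```

A single input pair can match several operations by coincidence, so in practice the same unknown instruction would be run on several different inputs.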

v3l0c1r4pt0r · on Sept 25, 2019

In the end, the CPU in question turned out to be a core with the OpenRISC architecture, but it was nevertheless quite an interesting challenge. I did my part based entirely on a source file that I was sure had been compiled into the binary (part of FreeRTOS), but yes, being able to execute code and observe the results allowed the guy from OpenTechLab to achieve a lot more than I did.

archi42 · on Sept 25, 2019

As someone who sees a lot of different assemblers I really enjoyed this read. But the most important lesson learned for the next CTF: Just probe the PRNG and see if it is predictable :P

gp2000 · on Sept 25, 2019

Yes, enjoyable and good ideas. I feel like I've been on that "project" before. You do a bunch of work, dig into things and then discover a simple answer that, in hindsight, could have been applied immediately and made all the investigation irrelevant.

Usually out of pride or the sunk cost fallacy (or something like it) I'll convince myself there was no other way the problem was going to be solved. Either way the next time around I spend just a little bit longer trying to think of an easy way out.

saagarjha · on Sept 25, 2019

> We internally had a bunch other names for these things – I called them kibbles, and Zach called them hecs.

Being a CTF challenge, I'm surprised that they didn't settle on something decidedly more rude ;) I wonder if the organizers can release their assembler for the architecture, or a spec at least…

q3k · on Sept 25, 2019

The author has released their tooling: https://github.com/koriakin/cpuadventure/blob/master/README....

saagarjha · on Sept 25, 2019

Thanks!

djmips · on Sept 25, 2019

I'll contribute the obvious observation that game system emulators are reverse engineered in this way. Most of the time CPU specs are available, but in some cases a weird custom DSP on a cart or other co-processor needs to be figured out, and this is the kind of puzzling that does that.

classified · on Sept 25, 2019

I once reverse-engineered the complete instruction set of the CPU in a pocket calculator with a built-in BASIC interpreter. That was fun. My disassembler and assembler BASIC programs are still functional today.

sq_ · on Sept 25, 2019

That sounds awesome. If you don’t mind, could you talk about what you did some more?

classified · on Sept 25, 2019

It's a Sharp PC-E500S. The BASIC has PEEK, POKE, and CALL (for running assembly) commands, which is all you need for a hacking orgy. I'm not sure you could even still get hold of that hardware today.

zests · on Sept 25, 2019

How did they even get the game running in the first place?

artemist · on Sept 25, 2019

There was a server that we socat'ed into. (I made some minor contributions to solving this problem on PPP)

woodrowbarlow · on Sept 25, 2019

the CTF prompt is fiction. they wrote a CPU emulator in about 300 lines of C!

https://github.com/koriakin/cpuadventure/blob/master/emu.c

vectorEQ · on Sept 25, 2019

very nice write up :D good job!