Reverse engineering a custom CPU from a single program (robertxiao.ca)
339 points by nneonneo on Sept 25, 2019 | 47 comments


I spent a summer doing something very similar to this at a major military radio manufacturer in the mid 90s. It turned out that one of their product lines from the late 70s used an entirely custom 8-bit CPU whose instruction set had somehow been lost completely. However, they still had the firmware on a stack of EPROMs, so the mission was to reverse engineer the old CPU and reimplement it on a modern DSP. It turned out that you can get surprisingly far with a frequency analysis of things that look like opcodes ("let's assume it has two accumulator registers; that loading is the most common instruction; etc."), some educated guesses about how the designer would have allocated the opcode bits, and then plonking an HP logic state analyser straight over the top of the 32-pin DIL to check the hypothesis. Fun times :)
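A minimal sketch of the frequency-analysis step, in Python. It assumes (hypothetically) that the EPROM contents have been dumped to a byte string and that opcodes are single bytes; operands and data get counted too, so the numbers only seed hypotheses. `opcode_histogram` and `bit_field_stats` are illustrative names, not tools from the original project.

```python
from collections import Counter


def opcode_histogram(firmware: bytes, top: int = 10) -> list[tuple[int, int]]:
    """Count how often each byte value occurs and return the most common ones.

    Assumes a hypothetical 8-bit ISA where any byte could be an opcode, so
    operands and data are counted as well; treat the result as a hint only.
    """
    return Counter(firmware).most_common(top)


def bit_field_stats(opcodes: list[int]) -> list[float]:
    """For each bit position from 7 down to 0, the fraction of the given
    opcodes with that bit set. Bits that are almost always 0 or 1 hint at
    how the designer allocated opcode fields."""
    n = len(opcodes)
    return [sum((op >> bit) & 1 for op in opcodes) / n for bit in range(7, -1, -1)]
```

For example, `opcode_histogram(bytes([0x3E, 0x01, 0x3E, 0x02, 0xC9]), 1)` returns `[(0x3E, 2)]`, making `0x3E` the first candidate for a common instruction such as a load.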


I'm curious as to why you call the package "DIL" instead of "DIP". I've never heard them called "DIL" before.


The terms are interchangeable; it depends on what literature you read. Funnily enough, like "ATM machine", "DIP package" is redundant, since it expands to "dual in-line package package".


I think I just picked this up from my father, who’s an old-school hardware hacker. So it’s either old-fashioned lingo (he’s been retired 20 years or so) or British or both :) Or perhaps I’m just confused, given I’m mainly a software guy.


I'm gonna guess it's because L and P are next to each other on the keyboard.


Good ol' DI;s


This is pretty much the same process we went through to reverse engineer the custom "VPU" instruction set for the co-processor that the Raspberry Pi's firmware runs on.

We used the publicly available bootcode.bin and loader.bin to RE most of the ISA before the Pis even started shipping, though there were some more obscure instructions that we weren't sure about until we could run our own code.

But when my Pi did arrive, we knew enough to write a binary that would blink an LED on more-or-less the first attempt.

I guess the real lesson is that custom ISAs are not a good form of security.


I think that depends on how far you’re willing to go. For example, it doesn’t take many transistors to XOR every instruction byte with the least significant byte of the address it is read from before sending it to the CPU proper.
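To make the idea concrete, here is a sketch of the address-keyed XOR described above, modelled in Python rather than gates. The function name `scramble` and the use of the low 8 address bits as the key are illustrative assumptions. Because XOR is its own inverse, the same function both encodes the stored image and decodes each fetched byte.

```python
def scramble(code: bytes, base: int = 0) -> bytes:
    """XOR each instruction byte with the low 8 bits of its address.

    `base` is the (hypothetical) load address of the first byte. Applying the
    function twice with the same base returns the original bytes.
    """
    return bytes(b ^ ((base + i) & 0xFF) for i, b in enumerate(code))
```

Note that a run of zero-filled padding would come out as `00 01 02 03 ...`, so this first version leaks its key quite visibly.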

And that’s just the easy first version. XOR with a hash of the address, swap bits across entire cache lines when loading each cache line in the instruction cache, etc.

Such mechanisms would be fairly secure even if the attacker had access to the machine code of a JITter. The attacker would have to crack the encryption to understand what the JITter does.
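The stronger variants can be sketched the same way. This is a hypothetical scheme, not any real product's: each byte is XORed with an 8-bit multiplicative hash of its address, then every 16-byte cache line is rotated by an amount derived from the line address (a byte rotation stands in for the bit swapping mentioned in the comment). The line size, hash constant, and function names are all assumptions.

```python
LINE = 16  # hypothetical instruction-cache line size, in bytes


def addr_hash(addr: int) -> int:
    """Hypothetical 8-bit address hash: top byte of a 32-bit multiplicative hash."""
    return ((addr * 2654435761) & 0xFFFFFFFF) >> 24


def line_rotation(line_addr: int) -> int:
    """Hypothetical per-line rotation amount, derived from the line's address."""
    return addr_hash(line_addr) % LINE


def encrypt(code: bytes, base: int = 0) -> bytes:
    assert base % LINE == 0 and len(code) % LINE == 0, "whole, aligned lines only"
    out = bytearray()
    for off in range(0, len(code), LINE):
        line_addr = base + off
        # Step 1: XOR each byte with a hash of its own address.
        xored = bytes(b ^ addr_hash(line_addr + i)
                      for i, b in enumerate(code[off:off + LINE]))
        # Step 2: rotate the whole line left by an address-dependent amount.
        r = line_rotation(line_addr)
        out += xored[r:] + xored[:r]
    return bytes(out)


def decrypt(blob: bytes, base: int = 0) -> bytes:
    assert base % LINE == 0 and len(blob) % LINE == 0, "whole, aligned lines only"
    out = bytearray()
    for off in range(0, len(blob), LINE):
        line_addr = base + off
        line = blob[off:off + LINE]
        # Undo step 2: rotate right by the same amount.
        r = line_rotation(line_addr)
        unrotated = line[LINE - r:] + line[:LINE - r]
        # Undo step 1: XOR is its own inverse.
        out += bytes(b ^ addr_hash(line_addr + i) for i, b in enumerate(unrotated))
    return bytes(out)
```

Here `decrypt` plays the role of the logic that would sit between the instruction cache and the CPU core.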


Security by obscurity is, in general, not a great idea.


Depends on your needs and the threat profile.


They're not supposed to be for security, they're to serve a specific functional purpose.


What's that purpose then for a secret instruction set?


Probably so they can change it at will without having to worry about breaking other people’s code.


Companies either want to be earning money from an API, or they want to keep it secret.

An internal API that people reverse engineer and use is just going to lead to hassles later: when you want to change the API, when people start writing bots or abusing the API in ways you hadn't imagined, or when they expose bugs in the API that the official client never triggered.


I haven’t heard it argued about CPUs, but worries about patents could also be part of the reason. That is (or used to be) a fairly common argument for why GPU makers do not release hardware specs.


Is that because they know they are violating other people's patents and don't want to be caught, or just that the patent system is such a nightmare that, even if they didn't do it intentionally, something they've done probably infringes on another company's patents?


Yes.


Vendor lock-in for their IDE/toolchain. If the ISA isn't open, you're going to be paying the vendor for a license to compile new code.


Cool! This reminds me a lot of this[0] where there was a specification for a custom VM in a newspaper along with a binary. It still blows me away when I read how they solved the puzzle.

[0] https://safiire.github.io/blog/2017/08/19/solving-danish-def...


I imagine that this would be a good way to find some of the best reverse-engineers in the world :)


> I imagine that this would be a good way to find some of the best reverse-engineers in the world :)

Just a consideration: aren't the names of these people known in principle (at least if you are willing to do some investigating) to a company or government agency that has an interest in them?


Nope. How would you find out who reverse engineers a lot and gets good at it?

Some names will be popular through fame or common channels, but you'll never get a full list. Especially in RE, where some of their activities aren't legal and they don't want to be found.


> Especially in RE, where some of their activities aren't legal and they don't want to be found.

I have trouble believing that if you don't want to be found, you will participate in a reverse-engineering competition with your real-life identity.


No, you wouldn't. I was responding to the idea that all the really skilled people are well known.


I'd imagine that this is a good way to pique their interest. Then, when they succeed in solving the puzzle, they are less likely to forgo the "prize" of working with the agency.


I respect your reasoning, just a counterpoint:

I (and quite a few programmers I know feel the same) really hate it when a fancy puzzle or problem is presented to pique my interest, but the real work has nothing to do with the marketing. I don't believe that kind of "false advertising" is a smart way to retain talent.

If I wanted to attract talent, I would rather put problems on the company website that are genuinely drawn from the work in that position, to attract people who love exactly the kind of problems the job I want to fill is likely to involve.


That was impressive.


Hah, I use that "resize the window until it aligns and a pattern emerges" trick, too. If you think about it, human visual pattern recognition works impressively well. I'm sure there are plenty of reasons why we evolved that way, but the fact that you can take that ability and adapt it to something completely artificial and "unnatural" (file representations on a screen), without any conscious effort (you just resize the window until you suddenly, intuitively perceive a very abrupt and striking change), is amazing.


That's fun to try with an FPGA bitstream: the used parts of the chip can look like a fluffy cloud.


Resizing the window is my go to method for trying to figure out how many bytes per vertex there are in the meshes of old video games. I agree that human pattern recognition is impressive.
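The resize-until-it-lines-up trick can also be approximated numerically as a rough autocorrelation: for each candidate width, count how often a byte equals the byte exactly one row below it. This sketch assumes records have a fixed size and contain some repeated bytes (padding, constant flags, high bytes of floats); `stride_scores` and `guess_stride` are hypothetical helpers.

```python
def stride_scores(data: bytes, max_width: int = 64) -> dict[int, float]:
    """For each candidate row width w, the fraction of positions i where
    data[i] == data[i + w]. Data made of fixed-size records (e.g. vertices)
    tends to score high when w is the record size or a multiple of it."""
    scores = {}
    for w in range(1, min(max_width, len(data) - 1) + 1):
        pairs = len(data) - w
        matches = sum(1 for i in range(pairs) if data[i] == data[i + w])
        scores[w] = matches / pairs
    return scores


def guess_stride(data: bytes, max_width: int = 64) -> int:
    """Pick the smallest width scoring within 0.01 of the best score,
    since multiples of the true stride score about as well as the stride."""
    scores = stride_scores(data, max_width)
    best = max(scores.values())
    return min(w for w, s in scores.items() if s >= best - 0.01)
```

On real files the scores are noisier, so looking at the top few widths, rather than trusting a single answer, is usually the safer way to use this.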


I once used that by accident to find the encoding of a game's sprites. By a fluke I had the view set to 27 characters wide, and while scrolling through I noticed an ASCII representation of a bike and rider that appeared in the game. Its outline was relatively easy to spot because transparent pixels were encoded as zeroes, so the background appeared in the readout as periods.
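A small sketch of that kind of readout, assuming (hypothetically) one byte per pixel with zero meaning transparent. Unlike a hex editor's text column, which shows printable bytes as their characters, this simplified `render` helper prints every non-zero byte as `#` so the outline is easier to see.

```python
def render(data: bytes, width: int, offset: int = 0) -> list[str]:
    """Lay bytes out in rows of `width` characters: zero bytes become '.',
    any other byte becomes '#', so shapes drawn on a zero background stand out."""
    rows = []
    for start in range(offset, len(data), width):
        row = data[start:start + width]
        rows.append("".join("." if b == 0 else "#" for b in row))
    return rows
```

Printing `render(blob, w)` for a range of widths `w` and looking for a recognisable shape is the programmatic version of resizing the window.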



I was following along with OpenTechLab [0] as they tried to reverse engineer a real CPU used in HDMI repeaters. The instruction set had already been partially reversed [1], but gaining code execution let them write a small stub that inferred more by comparing register states before and after each instruction.

[0] https://opentechlab.org.uk/

[1] https://github.com/v3l0c1r4pt0r/lkv-wiki/wiki/Instruction-Se...
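The before/after register trick can be framed as hypothesis testing: run an unknown instruction on chosen inputs, then keep only the candidate semantics that explain every observation. This sketch assumes 32-bit registers and a two-operand instruction, and the candidate list is invented for illustration; it is not how OpenTechLab's stub actually worked.

```python
from typing import Callable

MASK = 0xFFFFFFFF  # assume 32-bit registers (hypothetical)

# Hypothetical candidate semantics for an unknown two-operand instruction.
CANDIDATES: dict[str, Callable[[int, int], int]] = {
    "add": lambda a, b: (a + b) & MASK,
    "sub": lambda a, b: (a - b) & MASK,
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "xor": lambda a, b: a ^ b,
    "shl": lambda a, b: (a << (b & 31)) & MASK,
}


def infer(observations: list[tuple[int, int, int]]) -> list[str]:
    """Each observation is (rA_before, rB_before, rA_after), e.g. captured by
    a stub that loads test values, executes one unknown instruction, and dumps
    the registers. Return the candidates consistent with every observation."""
    return [name for name, op in CANDIDATES.items()
            if all(op(a, b) == after for a, b, after in observations)]
```

With the single observation `(1, 1, 0)` both `sub` and `xor` fit; adding `(6, 3, 3)` leaves only `sub`, which is why choosing inputs that separate the hypotheses matters.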


The CPU mentioned eventually turned out to be an OpenRISC core, but it was nevertheless quite an interesting challenge. I did my part based entirely on a source file (part of FreeRTOS) that I was sure had been compiled into the binary, but yes, being able to execute code and observe the results allowed the guy from OpenTechLab to achieve a lot more than I did.


As someone who sees a lot of different assemblers I really enjoyed this read. But the most important lesson learned for the next CTF: Just probe the PRNG and see if it is predictable :P
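As an illustration of why probing a PRNG pays off, here is a sketch of predicting a linear congruential generator from three consecutive outputs. It assumes (hypothetically) an LCG with a known prime modulus; the challenge's actual generator is not described in this thread. With a power-of-two modulus, `x1 - x0` may not be invertible and recovery needs more care.

```python
M = 2**31 - 1  # hypothetical modulus; assumed prime so differences are invertible


def lcg(seed: int, a: int, c: int, m: int = M):
    """Yield successive values of x_{n+1} = (a * x_n + c) mod m."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x


def recover_lcg(x0: int, x1: int, x2: int, m: int = M) -> tuple[int, int]:
    """Recover (a, c) from three consecutive outputs, given a known prime m:
    x2 - x1 = a * (x1 - x0) (mod m), then c = x1 - a * x0 (mod m).
    Requires x1 != x0 (mod m)."""
    a = (x2 - x1) * pow((x1 - x0) % m, -1, m) % m  # modular inverse needs Python 3.8+
    c = (x1 - a * x0) % m
    return a, c
```

Once `a` and `c` are known, every later output follows from `(a * x + c) % M`.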


Yes, enjoyable and good ideas. I feel like I've been on that "project" before. You do a bunch of work, dig into things and then discover a simple answer that, in hindsight, could have been applied immediately and made all the investigation irrelevant.

Usually out of pride or the sunk cost fallacy (or something like it) I'll convince myself there was no other way the problem was going to be solved. Either way the next time around I spend just a little bit longer trying to think of an easy way out.


> We internally had a bunch other names for these things – I called them kibbles, and Zach called them hecs.

Being a CTF challenge, I'm surprised that they didn't settle on something decidedly more rude ;) I wonder if the organizers can release their assembler for the architecture, or a spec at least…



Thanks!


I'll contribute the obvious observation that game console hardware gets reverse engineered this way for emulators. Most of the time the CPU specs are available, but in some cases a weird custom DSP on a cart, or some other coprocessor, needs to be figured out, and this is the kind of puzzling that does it.


I once reverse-engineered the complete instruction set of the CPU in a pocket calculator with a built-in BASIC interpreter. That was fun. My disassembler and assembler BASIC programs are still functional today.


That sounds awesome. If you don’t mind, could you talk about what you did some more?


It's a Sharp PC-E500S. The BASIC has PEEK, POKE, and CALL (for running assembly) commands, which is all you need for a hacking orgy. I'm not sure you could even still get hold of that hardware today.


How did they even get the game running in the first place?


There was a server that we socat'ed into. (I made some minor contributions to solving this problem on PPP)


The CTF prompt is fiction. They wrote a CPU emulator in about 300 lines of C!

https://github.com/koriakin/cpuadventure/blob/master/emu.c
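For a sense of scale, the core of such an emulator is a fetch-decode-execute loop. The sketch below uses an invented byte-coded ISA (four 8-bit registers; opcodes `0x01` LI, `0x02` ADD, `0x03` DEC, `0x04` JNZ, `0xFF` HALT); it is not the instruction set from the challenge or from `emu.c`.

```python
def run(mem: bytes, max_steps: int = 10_000) -> list[int]:
    """Interpret a tiny hypothetical ISA and return the registers on HALT."""
    regs = [0] * 4
    pc = 0
    for _ in range(max_steps):
        op = mem[pc]  # fetch
        if op == 0x01:  # LI r, imm
            regs[mem[pc + 1]] = mem[pc + 2]
            pc += 3
        elif op == 0x02:  # ADD rd, rs (8-bit wraparound)
            rd, rs = mem[pc + 1], mem[pc + 2]
            regs[rd] = (regs[rd] + regs[rs]) & 0xFF
            pc += 3
        elif op == 0x03:  # DEC r (8-bit wraparound)
            r = mem[pc + 1]
            regs[r] = (regs[r] - 1) & 0xFF
            pc += 2
        elif op == 0x04:  # JNZ r, addr: jump if register r is non-zero
            pc = mem[pc + 2] if regs[mem[pc + 1]] != 0 else pc + 3
        elif op == 0xFF:  # HALT
            return regs
        else:
            raise ValueError(f"illegal opcode {op:#04x} at {pc:#x}")
    raise RuntimeError("step limit exceeded")
```

For example, this program multiplies 3 by 4 through repeated addition: `run(bytes([1,0,0, 1,1,3, 1,2,4, 2,0,1, 3,2, 4,2,9, 0xFF]))` returns `[12, 3, 0, 0]`.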


Very nice write-up :D Good job!



