I never took a compilers course in college and came to regret that. Ten years later I started reading the Dragon Book, and realized how I could trivially modify the NFA regular expression search algorithm to solve a problem I’d seen in digital forensics and eDiscovery. The writing is extremely clear and easy to follow, and I now tell all interns to be sure to take a compilers course before they graduate. Well-earned!
It was just a few days ago that I mentioned how often I see the dragon book held up as a punching bag. It seems that at some point the book, along with K&R, became a target for unsubstantiated criticism. I've pressed for details, and I've seen others do the same, but it never ends in anything concrete—just an implicit appeal to the new conventional wisdom that they do more harm than good. Reminds me of how "truth" gets synthesized on Reddit.
Here's a counterexample: Russ Cox has a series on regular expressions, beginning with Regular Expression Matching Can Be Simple And Fast <https://swtch.com/~rsc/regexp/regexp1.html>. Over the last several years it has become very well received. In it, he lays out the case for Thompson NFAs and laments that the approach is not more widely known, having become a sort of lost knowledge. (See also Jonathan Blow's <https://www.youtube.com/watch?v=ZSRHeXYDLko>.) Given all the criticism of the dragon book, though, you'd think enough people had actually read it for someone to point out that Thompson is cited in Chapter 3, where his approach is described in some detail.
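For anyone who hasn't read the article: the core idea is to simulate the set of all NFA states the input could be in at once, rather than backtracking through one path at a time, which makes matching linear in the input length. A minimal sketch in Python, using a made-up NFA encoding (dict of state -> labeled edges) purely for illustration, not the book's or Cox's representation:

    # Thompson/Pike-style NFA simulation: track the *set* of reachable states,
    # so there is no backtracking and matching is linear in the input length.
    def epsilon_closure(nfa, states):
        """All states reachable from `states` via epsilon (None-labeled) edges."""
        stack, seen = list(states), set(states)
        while stack:
            s = stack.pop()
            for label, nxt in nfa.get(s, []):
                if label is None and nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    def nfa_match(nfa, start, accept, text):
        current = epsilon_closure(nfa, {start})
        for ch in text:
            moved = {nxt for s in current
                         for label, nxt in nfa.get(s, []) if label == ch}
            current = epsilon_closure(nfa, moved)
        return accept in current

    # Hand-built NFA for the regex a(b|c)*  (state 5 is the accepting state).
    nfa = {0: [('a', 1)],
           1: [(None, 2), (None, 5)],
           2: [('b', 4), ('c', 4)],
           4: [(None, 1)]}
    assert nfa_match(nfa, start=0, accept=5, text="abcb")
    assert not nfa_match(nfa, start=0, accept=5, text="ba")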
I think that what's actually going on is that, like The Art Of Computer Programming, the dragon book is widely referenced and rarely read. Now, enter the 21st century meme factory.
Having said that, the most recent comment I saw that was casually dismissive of the dragon book—ironically for its "second-rate methods" that have "held back" compiler development—did motivate me to dig the book out, which led to me spending an hour or so chasing down the source of McCarthy's "ho, ho, you're confusing theory with practice" quip, since the origin story of Lisp's eval is also told in Chapter 11 of the dragon book. <https://hypothes.is/a/pQs39pAhEeupnAuO5qfOeQ>
As others noted, the Dragon Book spends far too much time on parsing, and not nearly enough on optimizations. To be fair, the first edition of the book is from 1986. It looks like the 2nd edition (2006) adds three chapters on topics like data flow analysis and parallelization. I don't happen to have that version of the book, so I don't know how good the material is.
I took compilers in grad school in the early 90's. Even then, we mostly skipped parsing because that was considered covered in another class (Theory of Computation, a class that covered regular expressions, finite automata, Turing machines, P vs NP, those sorts of topics).
The professor spent as much time as he could on optimizations, since that was his research focus. He later went on to write his own compiler book (Engineering a Compiler; Cooper & Torczon), so you can compare the table of contents (https://www.elsevier.com/books/engineering-a-compiler/cooper...) and see what current compiler researchers feel are important topics to cover in a textbook.
Not throwing the Dragon Book under the bus, but it probably is in the "cited widely but rarely read" category as you have noted, just from its position as being first/early.
An anecdote from me - I had occasion to write a parser for a work project a few years ago. Rather than reach for flex/bison or any theory I had from compiler courses... I went with the new-to-me method of a parser combinator. This was awesome, and I might even describe it as fun. Granted, the target language I was parsing was much simpler than a programming language, but I hope that approach is covered in today's compilers courses. I remember really tedious parsing homework problems back in the day.
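For anyone who hasn't run into them: a parser combinator is just a small function that consumes a prefix of the input and returns a parsed value plus the remaining text, and bigger parsers are built by composing those functions. A toy sketch in Python (my own illustration, not from any particular library):

    # Each parser is a function: text -> (value, remaining_text), or None on failure.
    def char(c):
        """Match a single literal character."""
        def parse(text):
            return (c, text[1:]) if text.startswith(c) else None
        return parse

    def many(p):
        """Apply parser p zero or more times, collecting the results."""
        def parse(text):
            results = []
            while (r := p(text)) is not None:
                value, text = r
                results.append(value)
            return (results, text)
        return parse

    def seq(*parsers):
        """Apply parsers in order; fail if any of them fails."""
        def parse(text):
            values = []
            for p in parsers:
                r = p(text)
                if r is None:
                    return None
                value, text = r
                values.append(value)
            return (values, text)
        return parse

    # A "key=value" line from a toy config format.
    alnum = lambda text: (text[0], text[1:]) if text and text[0].isalnum() else None
    key_value = seq(many(alnum), char('='), many(alnum))
    print(key_value("size=42"))  # ([['s','i','z','e'], '=', ['4','2']], '')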
> As others noted, the Dragon Book spends far too much time on parsing, and not nearly enough on optimizations.
If you actually want to specialize in writing compilers, that may be true.
But most people who study compilers don't write compilers. Instead, the time they spend studying compilers will help them to better understand programming language design, which will make them more effective at programming anything.
And for that, it is far better to spend lots of time on parsing than on esoteric compiler optimizations.
> (Engineering a Compiler; Cooper & Torczon)
I bought that book, but didn't get much out of it because it concentrates on compiler optimizations and seems to be truly targeted for an audience of compiler specialists. Instead, I chose Programming Language Pragmatics by Michael L. Scott, and went through nearly the entire thing with a study group over about 10 months.
The D programming language design is based in part on my experience with designing and building compiler optimizers, including the first data flow analysis C compiler for DOS.
It's why D has, for example, immutable types rather than just const types. (You can't optimize C const pointers because another mutable reference to the same value can change it.) D's contracts are also designed with an eye towards enabling optimizations.
> Instead, the time they spend studying compilers will help them to better understand programming language design, which will make them more effective at programming anything.
What makes you think that? Especially the second part, but actually both.
I can confirm that the second edition gives a quite-good treatment of data flow analysis. Not as thorough as you'd find in a proper program analysis book (where Nielson is the usual choice, though the recent Rival/Yi book on abstract interpretation is fantastic IMO) but it's well-written and at around the right level of detail for a compilers course.
Source: PL PhD candidate, used (parts of) it to teach a graduate compilers course.
> To be fair, the first edition of the book is from 1986.
The first edition of Compilers: Principles, Techniques, and Tools is from 1986. The first edition of the dragon book wasn't Compilers, it was Principles of Compiler Design (1977). The 2nd edition of Compilers is the 3rd dragon book.
My impression (it's been a very long time since my compilers class) was that it didn't spend too much time on parsing; rather, it spent too much time on how to implement parser generators. I've spent a lot of time parsing things in my life, but much less on the details of LR parsers.
I was doing compiler design back in the late 70s/early 80s, and writing your own compiler generator was quite common at the time. That only changed when things like yacc/bison became available, which not only did an excellent job but also allowed one to hack on them and/or their output (for example, my Algol68 compiler needed to be able to decide how to resolve some shift/reduce conflicts on the fly ... that's how you do changeable operator precedence).
Well, it was basically the only book out at that time. I remember a few others from the 1990s (Compiler Design in C, by Holub; and later, after I graduated, Modern Compiler Implementation in C, by Appel).
Hey something cool I found while googling - Holub's book is available as a free download from the author! (https://holub.com/compiler/)
The Dragon Book has flaws, but it is also a case of the pioneer catching all the arrows. But the OP asked about unsubstantiated criticisms of the book, so I added the one I remember from my courses - mostly too much on parsing, not enough on everything else. The 2nd compiler course I took didn't even use a textbook, Dragon Book or otherwise; it was a handful of papers and a semester-long project on SSA and associated documentation.
I took a professional development course in C from Allen Holub in Berkeley, ages ago, that I think was the base material for that book. I can say that a roomful of talented people worked really, really hard to keep up the pace in that course. Like some other classes at UC Berkeley proper, surviving the course says something about you! It was all in portable C, and rigorous.
I took a course in compilers from Hennessy and Ullman at Stanford in the 80's. The course material was a series of papers written by them on various techniques, including a lot of data flow stuff.
I've thought this before, but I think it's a shame that parsing is lumped in with compiler courses. Parsing is such a huge subject, and I'd say the Dragon Book only lightly touches on it (because it's a compiler book, after all).
CS course designers of the future - split the compilers course into two separate subjects: 1. Compilers, and 2. Parsing!
> I've pressed for details, and I've seen others do the same, but it never ends in anything concrete—just an implicit appeal to the new conventional wisdom that they do more harm than good.
Really? Merits of the criticism aside (I don't know anything about compilers, so I'm not capable of judging), the received wisdom seems to be that the Dragon book spends far too much time on parsing and not enough time on other, more important aspects of modern compiler development.
Exactly. I don't have the book in front of me, but I remember its typechecking section being hilariously short and imperative-focused. Also, in general, the book doesn't focus on implementation at all. That's okay in some respects; it has probably stayed more relevant and useful by not talking about implementation. But it makes it possible to read a bunch of the book and still have no clue how to write code to build a compiler.
I took a compilers class around the dragon book. I think the valid criticism is the sequence of the book. An undergraduate compiler class can only cover perhaps half the book.
IMHO, the number of students that would benefit from learning about AST transformation and code gen vastly outnumbers the folks that would benefit from learning about finite automata transformation (and I took a dedicated course in that).
There simply aren’t that many folks working on regex engines.
I second this based on my experience. My master's thesis was in code generation for Haskell, wherein I spent a fair bit of time learning tree pattern matching, AST manipulation, register allocation, etc. One can achieve a non-trivial speedup by understanding these different phases of code generation.
Another reason, in the context of compilers, is that there's not much to be gained by micro-optimizing lexing and parsing. After all, code is compiled just once and executed many times. So you know where much of the fun lies. The undergrads unfortunately usually miss out on the meatier part of the compiler course.
It used to be the case that lexing was the hottest part of a compiler, because it had to touch every byte of source and subsequent phases did not have so much data to crunch. But that stopped being true in the 1990s as optimizing compilers became mainstream technology.
> There simply aren’t that many folks working on regex engines.
You'd be surprised: there's even a dedicated annual international conference on implementation of automata that has been going since 1996: <https://link.springer.com/conference/wia>
There are constantly new algorithms, libraries, tools, and applications around regular languages and regular relations (an extension of regular expressions that is more elegant, symmetric, and expressive, and takes a lot of the ugliness of regexes out). Simplifying somewhat, finite state transducers are NFAs "with output".
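To make "with output" concrete, here's a toy deterministic transducer in Python (the encoding is my own, purely for illustration): like a DFA, but each transition also emits something, which is how FSTs do string rewriting.

    # Assumed FST encoding: state -> {input class: (output, next_state)};
    # an output of None means "copy the input character through unchanged".
    # This transducer collapses runs of spaces into a single space.
    FST = {
        "start":    {" ": (" ", "in_space"), "other": (None, "start")},
        "in_space": {" ": ("",  "in_space"), "other": (None, "start")},
    }

    def transduce(text):
        state, out = "start", []
        for ch in text:
            key = " " if ch == " " else "other"
            emitted, state = FST[state][key]
            out.append(ch if emitted is None else emitted)
        return "".join(out)

    print(transduce("a  b   c"))  # -> "a b c"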
> An undergraduate compiler class can only cover perhaps half the book.
When I took a compilers course that used this book, the professor had his own syllabus and would use the dragon book in support of his plan. We jumped around quite a bit and certainly didn't use all of the book.
That course was one of my favorites and had a huge impact on me. I haven't written any compilers, but have used a lot of the concepts (especially state machines) over the years.
> There simply aren’t that many folks working on regex engines.
Yes, that's true, but there are A LOT of LeetCode/HackerRank problems that are trivially solvable when one has a solid grasp of DFAs and state machines in general!
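For example, "is this string a valid signed integer?" is basically a three-state DFA. A quick Python sketch (the problem here is just a generic interview-style example, not any specific question):

    # DFA as a transition table: state -> {character class: next state}.
    TRANSITIONS = {
        "start":  {"sign": "signed", "digit": "number"},
        "signed": {"digit": "number"},
        "number": {"digit": "number"},
    }
    ACCEPTING = {"number"}

    def classify(ch):
        return "sign" if ch in "+-" else "digit" if ch.isdigit() else "other"

    def is_integer(s):
        state = "start"
        for ch in s:
            state = TRANSITIONS.get(state, {}).get(classify(ch))
            if state is None:          # no transition: reject immediately
                return False
        return state in ACCEPTING

    assert is_integer("-42") and is_integer("7")
    assert not is_integer("4-2") and not is_integer("+") and not is_integer("")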
They are good books. (I started the first one when writing a compiler).
The valid criticisms of them relate to improvements in parsing and in optimization, which have progressed significantly in the years since they were written.
Fun fact: on the compiler we wrote, Jeanne Musinski, PhD, a student of Ullman's, was on our team. She wrote the parser, and actually discovered an improvement in error recovery during our project and published a paper after conferring with Ullman.
I also see this sort of commentary regarding lex and yacc, the programs used in the dragon book. I always muse that the commentators probably have no idea how much software they currently use that, unbeknownst to them, has lex/flex and/or yacc/bison dependencies.
>I think that what's actually going on is that, like The Art Of Computer Programming, the dragon book is widely referenced and rarely read. Now, enter the 21st century meme factory.
In my opinion this book is "hard to read" in the sense that it's dense with mathematical formality.
There are a lot of good concepts, but I bet you could write it in a fraction of the length.
Although I really enjoyed compilers in college, I never thought I would build one for practical reasons.
I ended up using the knowledge for two very interesting projects. One was a template engine in Java for turning variables into clauses in SQL queries. The other was an interpreter for PL/SQL that helps to extract procedures and functions in order to do partial updates of packages in production (the user chooses which procedures to send to production and doesn't have to update the whole package).
My first job had me write a compiler from biology protocol descriptions to robot commands. It wasn't Turing complete, but it did need to allocate and assign "registers", which were the physical spaces available for the 96-well sample plates.
Agreed, of all the CS classes I took in college, the compilers class has been the most useful in my career. Still have the copy of the "dragon book" from that class.
Weird, I remember the book as being too dense and cryptic without sufficient explanatory text for a compilers course when I originally read it...maybe I am mis-remembering. I will have to find a copy and take another look.
Contrary to the opinion of many HN posters, I actually believe the front-end (lexing, parsing, semantic analysis) is more important to most software engineers.
Consider: most careers never involve working on the back-end. But lexing and parsing are very valuable whenever you want to parse something, say a configuration file. These kinds of things come up far more often in day-to-day work than code generation and optimization.
I would recommend most software engineers avoid inventing a new configuration format or other DSL and the parser/lexer for it. This can easily lead to hard-to-debug programs and long-term technical debt. Always look for existing, well-tested solutions first (even JSON!).
Even if you do end up inventing your own language, you can avoid hand-writing a low-level parser/lexer by using a higher-level formalism, like a context-free grammar (see Lark https://github.com/lark-parser/lark). Defining and maintaining a grammar is much easier.
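For what it's worth, a Lark grammar for a tiny "key = value" config format looks roughly like this (the grammar is my own toy example; the Lark calls are the standard ones from its docs):

    from lark import Lark

    grammar = r"""
        start: pair*
        pair:  CNAME "=" value
        value: NUMBER | ESCAPED_STRING | CNAME

        %import common.CNAME
        %import common.NUMBER
        %import common.ESCAPED_STRING
        %import common.WS
        %ignore WS
    """

    parser = Lark(grammar)
    tree = parser.parse('name = "demo"\nretries = 3\n')
    print(tree.pretty())  # a parse tree you can walk or transform

You get the tree for free and never touch character-level lexing yourself.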
I agree, partially. I also think the front end of a compiler is a more broadly useful skill than the back-end material. However, I believe most people would be better served knowing how to work with a recursive descent parser instead of a bottom-up parser generator, which is something the dragon book spends a lot of time on. For this reason, one of my favorite compiler books is munificent's Crafting Interpreters.
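To make the contrast concrete, a hand-written recursive descent parser is just one function per grammar rule; here's the shape of one for simple arithmetic (a sketch of the general technique, not code from either book):

    import re

    # Grammar: expr -> term (('+'|'-') term)* ;  term -> NUMBER | '(' expr ')'
    def tokenize(src):
        return re.findall(r"\d+|[+\-()]", src)

    class Parser:
        def __init__(self, tokens):
            self.tokens, self.pos = tokens, 0

        def peek(self):
            return self.tokens[self.pos] if self.pos < len(self.tokens) else None

        def eat(self, expected=None):
            tok = self.peek()
            if tok is None or (expected is not None and tok != expected):
                raise SyntaxError(f"expected {expected!r}, got {tok!r}")
            self.pos += 1
            return tok

        def expr(self):
            value = self.term()
            while self.peek() in ("+", "-"):
                op = self.eat()
                value = value + self.term() if op == "+" else value - self.term()
            return value

        def term(self):
            if self.peek() == "(":
                self.eat("(")
                value = self.expr()
                self.eat(")")
                return value
            return int(self.eat())

    print(Parser(tokenize("1 + (2 - 3) + 40")).expr())  # -> 40

Each rule in the grammar maps directly to a method, which is why this style is so much easier to follow and debug than a generated table.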
I'm actually reading that book. I'm going back to university to study CS and I have a course about Java which is exactly the language he uses in the first part of the book. I think this would be a good exercise for learning the language as well as learning lexing/parsing.
I got stuck on the BNF notation in section 6.1. I'm not exactly sure what I didn't understand, but overall I felt like I could only copy what he says without grasping the underlying rules. Maybe I need a primer on BNF first.
I think they should split university courses - a course about parsing (no need to even mention compilers) and a course about compilers that starts from an AST. Make the former a requirement and the latter an elective.
It gives you an AST you can assume is valid and has you do the work to rename variables and methods. Then it gives you an AST and has you develop your target code (in the case of the specific class, LLVM IR). Then the AST is checked for valid semantics. Then the course has you check a source program for syntactic validity and generate an AST.
It gives people the advantage of understanding what the AST is for on a deeper level before deciding how to represent it. I think this sort of class and a separate parsing class could be stellar together.
My university did that... sort of. There was a required "Theory of Computation" course using the Ullman & Hopcroft textbook "Introduction to Automata Theory, Languages, and Computation". A newer textbook could also be Sipser's "Introduction to the Theory of Computation".
Aho and Ullman have a great book about parsing, translating, and compiling that I'm currently reading. It's not an either/or thing - you can learn about both, and they are related to each other.
I agree that lexing and parsing are probably more useful day to day, but for those use cases, I think an approach like parser combinators will be much easier than the formal approach taken in a compilers course or book.
I wholeheartedly disagree. Compiler engineering is ultimately about taking data in one representation and converting it to another. In practice, much of the job is figuring out what data you need access to in order to perform a transformation and test that it is sound - the plumbing is usually uninteresting and straightforward.
Code generation and optimization are just the task you're trying to complete; the actual implementation is not so different from most SWE grunt work. But many of the abstractions you build when trying to do it are useful to understand and use in practice everywhere you write software.
If you've ever worked on an application that ingests data and spits out meaningful output, you've written a compiler. Failing to understand the shared abstractions is not good engineering.
"Jargon", perhaps. "Never been in use by authors"? Not so. The current "international economy edition" of Aho, et al uses "lexing" 29 times, according to the search function on Amazon. Flex & Bison by John Levine (O'Reilly and Associates) uses "lexing" 38 times. And so on.
The book "The AWK Programming Language" by Aho, Weinberger and Kernighan is a real gem. Chapter 6 is "Little Languages"; it shows you how to write interpreters and compilers using AWK. In particular, it even shows how to start a bootstrap: you write a compiler to a subset of AWK that writes down C code. You don't need to read all five preceding chapters first; Chapter 1 is enough.
Generally when a compiler produces C code (or assembly, or whatever) it is described as emitting C code. "A compiler for a subset of AWK that emits C code"
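In that spirit, here is a completely made-up toy "little language" being compiled down to C (emitted as text) in a few lines of Python; the commands and output are invented purely for illustration:

    # Each input line like "set x 1" or "add x 3" becomes one C statement.
    def emit_c(lines):
        body = []
        for line in lines:
            op, var, amount = line.split()
            if op == "set":
                body.append(f"    int {var} = {amount};")
            elif op == "add":
                body.append(f"    {var} += {amount};")
            else:
                raise ValueError(f"unknown command: {op}")
        return "int main(void) {\n" + "\n".join(body) + "\n    return 0;\n}\n"

    print(emit_c(["set x 1", "add x 3"]))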
Aho was my CS Theory professor at undergrad. He was incredibly polite and in my opinion, quite compassionate towards his beleaguered CS students. Cheers to him and Ullman.
Same. Aho was one of the nicest professors of my undergrad studies. He was also one of the few who actually taught well, and who placed a heavy emphasis on understanding the basics and then being able to build on top of that.
Alfred Aho taught my Software Engineering I and II courses at Columbia University when I was an undergrad, and they were my favorite Computer Science courses. He was a great, compassionate teacher, with a mix of theoretical and real world instruction. Really a highlight for me. Glad to hear he won the Turing Award.
Amusingly, I've noticed that my 1988 edition of the book does not mention tail calls whatsoever, except in one early chapter, where they come up in relation to the implementation of some parsing.
Index: [...] Tail recursion 52-53
On pages 52-53, they show how, in a token matching procedure that takes no arguments, and returns nothing, the recursive self call can be manually replaced by a backwards goto: "[w]e can speed up a program by replacing tail recursion by iteration. For a procedure without parameters, a tail-recursive call can be simply replaced by a jump to the beginning of the procedure."
There is no mention of the possibility of compilers doing this for you, or that there are other reasons for tail calls, like eliminating stack growth. The rest of the book is mum about tail calls.
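For reference, the transformation described on those pages looks like this in miniature (my own example, not the book's): a tail-recursive call, where nothing remains to be done after the recursion, is replaced by a jump back to the top of the procedure, i.e. a loop, so the stack no longer grows.

    def count_down_recursive(n):
        if n == 0:
            return
        print(n)
        count_down_recursive(n - 1)   # tail call: the last action, result unused

    def count_down_iterative(n):
        while n != 0:                 # the "jump to the beginning of the procedure"
            print(n)
            n -= 1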
One of the enduring memories of bumbling through my first few months as an undergrad was coming upon a torn, well-read copy of the Dragon Book in the institute library. To date, it remains one of the best pieces of CS literature I have read.
Seven years down the line, it feels strangely warm to read those names again.
The Dragon book is great! That book opened my mind to how compilers "do it" when I was starting college.
That being said, their book on Data Structures and Algorithms is severely underrated. It doesn't go into as much depth as more classical texts like Cormen, and it doesn't condense infinite knowledge into each paragraph like Knuth does, but it's a very nice read that goes from simple data structures all the way to secondary memory, and it's really easy to follow.
I've been out of college for a long time. I'm not a particularly nostalgic person, but there are a couple of books I still have from college, and the Dragon Book is one I still like to read.
What's today's equivalent of the "Dragon book"? I guess it should cover new stuff such as JIT compilation, concurrent garbage collection, optimizations that are part of LLVM, and modern type systems.
Can someone recommend a good online course in automata theory (or even a book)? I tried Ullman's course on edX (or Coursera, I don't remember). But I found it a bit difficult to follow -- it seemed a bit dry and monotonous, to me at least.
Had the same experience with Ullman's book (first edition at least). I have enough theoretical background in CS and am not averse to reading dense material.
Sipser has a lot less notation and more English explanations of the concepts. I picked it up and read most of it after graduating - it's pretty easy to follow (though, if I recall, some of the terminology around Turing-complete languages differed slightly from the Ullman text).
I’ve deeply enjoyed his videos and after the course in which we treated the material, regexes were all of a sudden extremely obvious. I benefit from it almost daily.
Accompanying book I would recommend: Introduction to the Theory of Computation by Michael Sipser.
Credits go to TU Delft professors for architecting the course.
That was my thought too... I didn't do a compilers course so never had the joy of reading the dragon book... but Data Structures & Algorithms was the textbook for second-year CompSci...
A PhD advisor is the academic who guides your PhD work. Sergey Brin is a co-founder of Google. They're pointing out how connected many people in computer science are.
> They're pointing out how connected many people in computer science are.
Your last sentence is mostly relevant to my point.
The parent shouldn't be so easily dazzled by the listing of just one example and then proclaim that they're 'recognising greats'.
Let's have more examples of other 'greats' connected to, or even advised by, Jeffrey Ullman, rather than mentioning only one 'great' - even if that one is a co-founder of Google.
For teaching (and indeed writing) compilers, I greatly prefer Niklaus Wirth's simple recursive descent approach (and associated texts) to the encyclopedic bulk and complexity of the Dragon book!
However, I like Automata Theory, Languages and Computation. It's a book that has grown on me over the years.
As an aside, I think it's kind of interesting and cool to give the Turing award this year for writing textbooks! It's a type of deep scholarship that seems underappreciated and rarely rewarded.
My "Dragon Book" story is I went for a haircut in a pretty high-end salon. As a guy who was used to getting cheap haircuts in bargain places, I was feeling very out of place, which only became worse when they seated me next to a very glamourous-looking woman who was having a bunch of different highlighting and colouring things done to her hair.
Then I noticed the book she was reading - the dragon book. We ended up having a fantastic conversation.
Is it possible in the future that incremental compilation methods will become basic compiler knowledge? I think it would be an amazing step forward if the default compiler design would always result in IDE-inspectable code. Language Server Protocol is amazing, but it's only half the story.
Back in the dark ages when 'Data Structures and Algorithms' was our state of the art textbook, we called the author trio of Aho, Hopcroft, and Ullman the 'Three Wise Men.' Glad to see Aho and Ullman get more serious recognition.
Good for them. The Dragon Book is a great compiler guide, despite some understandable criticism I've heard about it not teaching bleeding-edge techniques at the time. I'm using it right now for my compiler course, and it still holds up to a solid extent. Just out of curiosity, though, why wasn't Sethi (co-author) included? Was he just an author and not involved in as much research as Aho and Ullman?
“Alfred Vaino Aho” - this is the first time I've looked his name up in full (with the middle name). It seems like a Finnish or maybe Estonian name; I wonder if he has Finnish or Estonian heritage.
thank you aho & ullman. your dragon book got me hooked & started a 40 year career in compilers/languages. first job, an algol68 compiler on burroughs hw back in the middle ages.
This is trolling, but as most of the HN comments here mention the dragon book and a careless reader may get the same impression, let me reply seriously anyway, and point out that the Turing award is not for that: https://awards.acm.org/about/2020-turing
It is for their pioneering research work in algorithms and theory related to compilers (some of which indeed went into some of their books later). Also, even if you consider only books, they wrote nine books, and neither of the two mentioned as most influential is "the dragon book". The first mentioned is the book by Aho, Hopcroft and Ullman: The Design and Analysis of Computer Algorithms (1974):
> a classic in the field and was one of the most cited books in computer science research for more than a decade. It became the standard textbook for algorithms courses throughout the world when computer science was still an emerging field.
This predates other major algorithms textbooks like say Kleinberg and Tardos (2005), Skiena (1st ed 1997), CLRS (1st ed 1990), or Sedgewick (1st ed 1983). Easily the standard textbook for more than a decade (and still used in some universities; it's still in print in some countries).
> Principles of Compiler Design (1977)
This is the "green dragon book", not to be confused with Compilers: Principles, Techniques, and Tools (1986, 2nd ed 2006) aka the "red dragon book" and the one people usually mean by "dragon book". This book is not even mentioned in the award citation. (Their automata book was widely used too.)
So the idea that the award was given solely or even primarily for the dragon book seems entirely inaccurate. The Wikipedia pages on Aho and Ullman give some idea of their work: indexed grammars, nested-stack automata, egrep, fgrep / Aho-Corasick algorithm, the algorithms that went into yacc and lex, AWK (Aho), and "one of the founders of the field of database theory" (Ullman).
Just read some of his comments about Iran on his wiki. It shouldn’t impact the recognition of his achievements. Ironically though I was disheartened to see that a student reaching out to him was shot down because of his (valid) views on the Iranian government.
Here's a letter that Ullman signed (5th name up from the bottom) that seems to pretty explicitly condemn apartheid in South Africa [1]. The letter's main argument is that Israel is not an apartheid state (which I guess is something of a controversial topic) but overall the sentiment of the letter reads to me as "apartheid is bad".
How can Israel not be considered an apartheid state? You cannot simply dismiss the apartheid as “malicious propaganda” while condemning it. Unless of course you don’t mind what is happening but want to pretend like you are against it.