Autoconf is one of the absolutely hilarious things about UNIX. On one hand, we've got people optimizing kernels down to individual instructions (often doing very unsafe, dirty C tricks underneath), sometimes with super-clunky and overly complex APIs as a result... and on the other hand you have all the shell-script nuttery like the behemoth heap of kludges that is autoconf. The disconnect is wild to me.
And, oh by the way, underneath? Shells have some absolutely bonkers, embarrassingly dumb parsers and interpreters: aliases that can override keywords and change the parsing of a script, for instance. Or the fact that some (most) shells interpret a shell script one line at a time, and that "built-ins" like the condition syntax in an if look like syntax but might be little binaries underneath (look up how "[" and "]" are handled in some shells -- they might not be built in at all!).
What a wild irony that all the ickiest parts of UNIX--shells and scripts and autoconf and all that stringly-typed pipe stuff, ended up becoming (IMHO) the most reusable, pluggable programming environment yet devised...
Alan Perlis Epigram #9: "It is better to have 100 functions operate on one data structure than 10 functions on 10 data structures."
Unix settled on the string as its data structure, and it's a little too low-level. All Lisp programmers know that s-expressions would have been the right choice. JSON also works.
IMO a big reason Python is popular is that so many big, seemingly unrelated libraries (AI things, scipy, Pillow, pandas, GDAL rasters, matplotlib, etc.) all work with the "NumPy array" datatype and can therefore do things with each other's data.
Relatedly, this is why big companies (eg Google) all have internal standard interchange formats (eg Protos). It’s so every programmer and every service and every file stored can always be consumed or produced as needed.
Even smaller companies I’ve seen end up with an “interfaces” or “types” repo that contains definitions of every external object.
What's worse, with autotools, is that while maybe only one programmer in 100 knows how to do ./configure; make; make install, only one in 10,000 knows m4. It's one of those odd corners of the ecosystem, like troff. It would be very easy to slip an exploit written in m4 past me, or one in troff for that matter, because I've never paid any attention to either.
Yup, I call that the Perlis-Thompson Principle -- because Ken Thompson made a similar combinatorial argument about software composition: you should design it around "one thing".
Files had structure on pre-Unix OSes, but they don't on Unix, because structured files don't compose.
The "Uniform Interface Constraint" of REST is the same thing, and resembles a file system -- everything is GET / POST on URLs.
---
So the reason that shell/Unix and HTTP are so common is a MATHEMATICAL property of software growth.
How do you make a Zig program talk to a Mojo program? Probably with a byte stream.
What about a Clojure program and a Common Lisp program? Probably a byte stream. (Ironically, S-expressions have no commonly used "exterior" interchange format)
Every time a new language is introduced, I think "well there's another reason you're going to need a shell script".
---
The larger the system, the more heterogeneous it is. And software is larger now, which is why MORE GLUE is needed.
> What about a Clojure program and a Common Lisp program? Probably a byte stream. (Ironically, S-expressions have no commonly used "exterior" interchange format)
Probably because the details of S-expressions differ under the hood - a simple (foo ()) can mean a different thing in CL (where the second element is the symbol NIL) than in Scheme (where it is an empty list and not a symbol); also because Clojure introduces additional bracketing patterns, such as [] for vectors and {} for maps, which will confuse non-Clojure readers. And let's not talk about the sexp variants of Lisp-Flavored Erlang.
Also, I have no idea if anyone ever cared enough to implement cross-language software transactional memory that would work across a process boundary, to close the gap between a Clojure process and a Lisp process. It could work in theory, but someone would need to do it - either get paid to do it, or have enough mojo of their own to write and bulletproof such a thing.
And, since everyone serializes to JSON anyway, serializing to JSON is a decent enough choice for communicating across Lisps as well.
However the "exterior" extension seems obvious to me -- why hasn't anyone produced a distributed Lisp?
Well I guess Clojure/EDN is that, but nobody has produced a POLYGLOT system involving distributed Lisp. Common Lisp users probably don't use EDN very much, I gather.
---
I found this POSE exterior s-expression format, but it seems to have attracted little interest from the Lisp community.
Common Lisp can read and write s-expressions from files. It can also compile source files and load files (both source and compiled).
The syntax for Common Lisp is standardized. Thus a lot of data may be shipped either in text or compiled files. Several implementations can also create snapshots of the runtime memory. A few domains use CL s-expressions (or a subset of that) as language syntax or as a data syntax.
Distributed Lisp applications exist, but may not use text to exchange data. For example there is a defined CORBA mapping for Common Lisp, which enabled Common Lisp software to take part in distributed software using a standard, which allows mixed language software.
How does Lisp Flavored Erlang talk to Common Lisp? Probably with a byte stream. The VMs are different.
I've noticed Erlang/Elixir also favor interior composition! Parsing is not idiomatic there -- they're actually BAD languages for parsing text. (Though they do have good affordances for parsing bytes)
They prefer to pass around Erlang terms, which are copied inside the VM.
I get that -- it's more convenient. But it's also a reason why "the rest of the world" has more code reuse -- e.g. Go talks to JavaScript talks to Python.
Most projects have multiple languages. Monoglot projects become polyglot projects when they grow bigger.
---
So I say
- Lisp prefers its own (interior) narrow waist
- Erlang prefers its own (interior) narrow waist -- even the Lisp one is second class!
- The rest of the world uses the (exterior) byte stream narrow waist, but they complain about it a lot :)
It won't. And stuff like Solaris and AIX still lives on and uses /bin/sh and has been incredibly slow to embrace even bash, and configure scripts still need to support them. Plus, any change you make today won't be remotely universal until 10 years have passed.
And back in 2005-ish I thought for sure that we'd have standardized on perl as a replacement for bash scripting by now.
But if everyone on Linux and BSD uses bash/OSH, and Solaris AIX use /bin/sh, it will be a success. The latter platforms do not influence the rest of computing very much -- they lag behind.
---
The busybox shell and the FreeBSD shell have been gaining bash features lately -- which means they are gaining OSH features.
Because OSH is the most bash-compatible shell, by a mile. There's no other project like it.
---
I agree with the logic -- it is very hard to change languages.
It's like changing English to Esperanto. The inertia is incredible.
So you need to provide an upgrade path, and that's what Oils does. OSH is compatible, and YSH is a new design.
Sure, but the problem is that Solaris/AIX will be very slow to actually install it, so you can't count on it being there, and those platforms DO influence things like autoconf and configure scripts. And the people who run them pay big money for support contracts (one bank's support contract is worth much more than 10,000 Linux users who don't pay anyone a dime). The inertia is very real, bigger than you think, and you can't wave it off by suggesting they don't influence computing.
At any rate, you need to actually accomplish everyone on Linux and BSD having that available and having used it for the past 10 years first. And get the core distros switching over to using it so that the glue code in the distro is all YSH plus probably python and some legacy sh. Actually show me that.
And yeah, I think the analogy of trying to change English to Esperanto is about right. I learned about Esperanto in the mid-80s...
Shell is used A LOT -- for data science, machine learning, cloud, CI, putting together Linux distros / bootstrapping, embedded systems, and other heterogeneous problems.
configure is one use case. And autoconf in particular will probably be around forever, regardless of the existence of Solaris/AIX -- for the simple reason that many old and important packages like coreutils use autoconf.
Probably nobody is going to rewrite the coreutils build system soon. (coreutils might be rewritten/obsolete before that happens!)
The good news is that OSH can run configure scripts as-is :) It's actually easy to run them, because they are meant to run on many different shells.
---
It's also not a binary yes/no thing. Cobol and Fortran aren't dead, but people choose to write new projects in different languages now, and that's good! I had an entire career without seeing any Cobol at all. Zero.
Likewise OSH can run POSIX configure scripts forever (and those same configure scripts can be run by /bin/sh and bash)
But I think lots of people will choose to write brand new YSH scripts as well.
The computing world is only getting more heterogeneous (in both time and space). Shell accommodates that, and Oils is designed to accommodate that.
The UNIX haters book happened for a reason, and sadly many of the complaints have hardly changed.
At least in the cloud and mobile space, we have largely moved beyond that, in the age of serverless, managed containers and managed programming languages.
Smalltalk- and Lisp-based REPLs are just as programmable, even more so given the whole-OS exposure, and it's no wonder that notebooks, which were inspired by them, got adopted by the science community.
Note that quite a lot of these problems would just go away if people would stop trying to conform to some "portable" POSIX nonsense. There's no reason why shell scripting needs to be this painful, but people keep doing it to themselves by restricting themselves to the smallest possible subset of features, and the most ancient ones at that. The so-called "bashisms", for example, make everything so much easier and less error-prone, but people act like it's wrong to use those features. When you read stuff about shell scripting there's usually an entire section on avoiding the evil bashisms and torturing the script until it runs in the mythical POSIX shell, as if that were some kind of virtuous thing to do, like bash is pushing you into temptation or something.
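To make that concrete, here's a quick side-by-side of the kind of bashism in question versus the POSIX contortion (the $mode variable and file paths are just illustrative):

    # bash: arrays and [[ ]] tests
    files=(src/*.c)
    if [[ $mode == verbose ]]; then
        printf '%s\n' "${files[@]}"
    fi

    # POSIX sh: no arrays, so you juggle the positional parameters and quote carefully
    set -- src/*.c
    if [ "$mode" = verbose ]; then
        printf '%s\n' "$@"
    fi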
Just throw POSIX into the trash where it belongs. Then we can finally evolve past it.
Also, shellcheck is a wonderful tool. The script isn't done until shellcheck stops complaining.
> What a wild irony that all the ickiest parts of UNIX--shells and scripts and autoconf and all that stringly-typed pipe stuff, ended up becoming (IMHO) the most reusable, pluggable programming environment yet devised...
I increasingly believe any shell script over ~128 “symbols” should be re-written in a “real” programming language.
I’ll gladly take a slightly longer Python, Go, or Rust “script” over Bash hell every day of the week.
These days, apart from python, I end up rewriting a lot of bash in ansible as that’s what a lot of infrastructure engineers know and it’s somewhat decent at running the code.
> scripts and autoconf and all that stringly-typed pipe stuff, ended up becoming (IMHO) the most reusable, pluggable programming environment yet devised...
I mean yes, but also only because the rest of computing is stuck in the same era. Meanwhile I'm over here popping nix-shells and having my nice, exact same nushell on every machine I could possibly want it on.
IMO it's all symptoms of the same problems. Related, I don't know how people start new projects and use autotools. But then again I'd say the same about cmake. But then again I just yesterday read a nightmare Lobsters comment about the realities of meson at the edges. And then I just sigh, thankful that I'm a Rust hipster (and then deal with cmake at the edges all the same). Computers are fun!
You can pre-answer most autoconf checks using /etc/config.site, and a system distributor can create a /usr/share/config.site[1]. It's just really hard to get right, and nobody uses it effectively.
Autoconf also has a config.cache[2], which it uses to store answers for reuse across repeated checks and across reruns of the configure script.
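For reference, config.site is just a shell fragment that configure sources before running its checks; a distributor's file might look roughly like this (the cache-variable names are only illustrative, since the relevant ones depend on which checks a given package actually runs):

    # /usr/share/config.site (or /etc/config.site, pointed to by $CONFIG_SITE)
    test "$prefix" = NONE && prefix=/usr/local
    # Pre-seed answers so configure can skip the corresponding tests:
    ac_cv_func_malloc_0_nonnull=${ac_cv_func_malloc_0_nonnull:-yes}
    ac_cv_file__dev_urandom=${ac_cv_file__dev_urandom:-yes}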
and if I understand correctly, any such /etc/config.site falls prey to one of the more famous N problems in computer science, since the only truly stable things one could put in any such file would be the host triple and byte order. Maybe if there was a systemwide hook into deb or rpm to regenerate it after any package updates it would solve the cache invalidation part
In my experience, config.cache fails to properly separate platform feature tests from build configuration options, so you need to delete config.cache if any build options change. It is extremely brittle and buggy. config.cache is definitely not suitable as a basis for platforms to provide pre-cooked answers to platform feature tests.
In other words, it's not just as simple as "figure out how to compile things on this system, once, and then reuse that information in the future". Different projects can be compiled with different options, cross-compilation is a thing, etc. As soon as you try to solve this with a simple global `site_config.h` you discover that that doesn't quite work.
I use autotools for my C++ projects and I hate it (in my weak defense, I hate cmake more). config.cache helps (and ccache helps a lot, too; most of my ccache hits are on stupid autoconf test programs; ya gotta use ccache if you use autoconf), but rerunning configure is still s l o w.
Worse, automake and autoconf have a cyclic dependency, so if you're a developer (as opposed to a user building from source on a system) you wind up running configure all the time. This is because, most probably, your Makefile.am lists every source file individually. If you add a new source file, you must rerun autoreconf to regenerate your Makefile — and that also generates configure anew. Then the Makefile sees that configure is newer than config.h (or some other output, I don't want to look up the gory details) and, blammo!, when you run make, first configure runs.
Of course with git, this happens not just when you add a new source file to your project... it happens whenever you switch branches, where there's variance in the source files in the project. configure configure configure configure.
As much as I hate it, autoconf isn't the worst part of autotools. Automake is... pointless? Is it really less effort to write Makefile.am than to write a straight Makefile? Wouldn't that be simpler?
And then there's libtool, which should be drug out into a back alley and executed. A 2000 LoC shell script, whose whole purpose in life is to figure out, over and over and over again, stateless in the same way as TFA complains about with autoconf, precisely which compiler flags should be used to build a shared library on a system amongst all the old compilers, even though today everyone just uses gcc and clang, which are flag-compatible, and determining which flags to use is not really that hard. Please, someone, anyone, please kill libtool and banish it from this earth.
I tried Meson — ran into a problem, read the docs, asked a question on the mailing list, never got an answer, never got past it. Bazel's obviously a terrible Google project and won't work if you're not sufficiently Googly. Cmake, a disaster. I liked tup, but it seems to have died (and it didn't solve the autoconf problem anyway).
Which new build system will deliver us from the 18th circle of hell? (probably cargo, as the Rust fanbois work to get C/C++ deprecated...)
> If you add a new source file, you must rerun autoreconf to regenerate your Makefile
No you don't: automake generates rules for `Makefile.am -> Makefile.in` (`automake`) and `Makefile.in -> Makefile` (`config.status`). You should just be able to edit `Makefile.am` and run `make`.
> Is it really less effort to write Makefile.am than to write a straight Makefile? Wouldn't that be simpler?
I don't think so. Not if you want to get DESTDIR support right, and proper dependency tracking (dep info as a side-effect of compilation, so you get all the headers and don't slow down one-off builds), and support for all the standard Makefile targets, etc.
> I don't think so. Not if you want to get DESTDIR support right, and proper dependency tracking (dep info as a side-effect of compilation, so you get all the headers and don't slow down one-off builds), and support for all the standard Makefile targets, etc.
With any make, yes. With modern GNU make you can embed sh code inside a makefile with the define hack, so no -- it's ugly, but no.
I've written lots of C code that runs on lots of architectures & platforms: PDPs, x86, ARM; Windows, BSD, Linux, VMS. My projects have spanned the gamut from tiny to gargantuan (millions of lines of code). I've never needed autoconf... what's it even for!?
Well, as archaic as it is, and as awful as its execution model is, autoconf is at least somewhat useful:
* determining availability of different syscalls (yay, printf)
* determining compiler
* verifying dependencies are installed and available (e.g., openssl)
This last one is made much easier with pkg-config and the pkg-config macros in autoconf. There's an argument to be made that pkg-config is bad because it will fail the build if you're just missing the wrong .pc file somewhere, even if your system satisfies the needs of the program - true enough. But pkg-config _does_ make checking for dependencies in autoconf much easier, and it does something essential for correctness, which is gathering up the necessary compilation and linker flags of your dependencies to pass along to the compiler so that you can use them correctly and match ABIs.
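For what it's worth, the generated configure check boils down to roughly this (a sketch using openssl as the example module, not the literal macro expansion):

    if pkg-config --exists 'openssl >= 1.1.0'; then
        OPENSSL_CFLAGS=$(pkg-config --cflags openssl)
        OPENSSL_LIBS=$(pkg-config --libs openssl)
    else
        echo "configure: error: OpenSSL >= 1.1.0 not found (missing .pc file?)" >&2
        exit 1
    fi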
A good configure script will guide you through getting dependencies installed, which is much nicer for users than confronting them with compilation/linker errors.
In decades past, if you were developing software for Unix you would inevitably run across compatibility problems. Even with the POSIX standard, every Unix was trying new things, inventing new things, or just implementing the same thing in slightly different ways. So your programs had to have a bunch of macros to switch between different variations. Some API would have two arguments on Unix A but three on Unix B, so you would have to turn that api call into a macro that would insert the missing third argument or whatever. But now your users needed to know which configuration of your software to use. You could try documenting all the little knobs and switches and make them choose the correct values, but someone hit on a clever idea.
Just write a little program that tries calling the API with two arguments, and see if it compiles. If it does, you could automatically define THAT_API_HAS_TWO_ARGS and your code will do the right thing. If it fails to compile, then define something else.
So you start writing a shell script that will run through all of these tests and dump out a header containing all of these configuration choices. Call the script `configure` and the header `config.h` and you are good.
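In other words, each test is a tiny fragment of shell along these lines (using the hypothetical API and macro name from the paragraph above):

    # does that_api() accept two arguments on this system?
    printf '#include <that_api.h>\nint main(void) { that_api("x", 1); return 0; }\n' > conftest.c
    if ${CC:-cc} -c conftest.c -o conftest.o 2>/dev/null; then
        echo '#define THAT_API_HAS_TWO_ARGS 1' >> config.h
    fi
    rm -f conftest.c conftest.o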
Except that actually writing a shell script that is really, actually POSIX correct is pretty hard. And the code is pretty repetitive. Mistakes start to creep in, bashisms break things when the script is actually run under `sh`, and so on. And it turns out that all the little tools we rely on, like find and grep and sed and awk, are all different on different platforms too! Turns out that nice option you were passing to grep is just an extension, only present on your specific flavor of Unix.
So the next clever idea was to write a program that generates the configure script. The repetitive bits can mostly be solved by text substitution, so someone had the bright idea to use M4. Thus autoconf was born.
Autoconf solved a lot of problems. It ensured that the configure script was actually POSIX correct. It did this even though the generated script was even more complex than ever, with extra logging and tracing and debug output that most people never look at, but which is actually super useful for debugging. The configure script you had written by hand didn’t have any of that!
And autoconf came with baked-in knowledge about all the things you might need to test for. Any time someone discovered some new difference between Unices, new tests were added to autoconf for it. Autoconf users hardly had to do any more work than to look up the name of the macro that it would define for you.
Of course, M4 causes actual brain damage in regular practitioners, and nobody cares to write software for any Unix except Linux these days. Even the BSDs are an afterthought. Technically OpenSolaris is still out there, with superior features like ZFS and Zones and whatnot, but that’s a lot of extra work. So autoconf is a dinosaur, solving problems that nobody really has any more.
Rachel is not wrong that it’s crazy to rerun the same configure script over and over, but really only developers have to do that. Most people just run the thing once when they install your software, and never see it again. And most people don't even do that any more, because they download binaries from their package manager most of the time. It’s just the distro packagers that actually run your configure script.
And she's right that the answers configure gets should be cached system-wide. Autoconf does support that, but it's harder to do correctly than she remembers. And by the time it was possible, usage was already waning in practice.
It always seemed to me that most of these differences and macros could be defined in a static header file that has a million #ifdefs for different OS and compiler versions. That seems simpler and more reliable, and usable for cross compiling, than running dynamic checks with autoconf.
Hmmm. Nope. The file would be near unusable and contain a pile of things you don’t care about where autoconf can cope with the one or two cases you are trying to manage.
Despite the description above of the genesis of autoconf, at the time when there were dozens of nearly compatible unices, it was a key enabling technology for free software to be distributed and shouldn’t be discounted just because it’s a little hard to understand.
It was absolutely a boon to all developers who used it. If you didn’t use it you were doing things the hard way. Just don’t look too hard at the implementation, because like I said M4 causes actual brain damage.
That's a very nice summary. But you're missing one crucial point. These are GNU tools and if you use them within that ecosystem, you also often use gnulib.
Gnulib is effectively the GNU Portability Library. Your fancy, new, auto-generated configure script can find all the differences, but someone still needs to account for them and write alternative code to support the various platforms. This is where gnulib comes into play. It reads all the configure checks and plugs in replacements / stubs for whatever is different. This allows you, the developer, to simply target GNU/Linux in your code while gnulib handles everything else (pretty much automatically), making it portable across all the unices and even non-UNIX systems.
Yeah, there’s a lot of software that uses autoconf without gnulib. It’s common to include a load of default platform tests in the configure script that are used by gnulib, but without gnulib the software never uses the results of the tests. All these futile tests are an appalling waste of time and energy.
This comment should really be promoted to the top level and pinned. I wanted to write something similar, but if I had, it would have been less detailed than this.
It seems more, then, that devs need to just move on from gross C (and C++?) at this point to modern mandates without all this insecure baggage and constant security issues from lack of memory management.
IMO Cmake's "killer app" is that it will generate projects for Visual Studio and Xcode, which allows you to get access to the Windows/Mac specific tooling that those products offer. It's also a LOT faster on Windows than Autotools because of how NTFS performs poorly when dealing with writing a ton of small files. I agree that its syntax is ugly but once I got beyond that I found it fairly simple to work with. The most annoying part was figuring out which MSVC flags corresponded with the GCC flags I use.
Out of curiosity, what does your C++ development workflow look like? Do you stick to using a text editor rather than an IDE?
It's kind of weird that these days, many (most?) projects use Cmake directly as their build tool. They don't generate a VC or Xcode project and then work from that; they generate a VC project and then immediately build it. The CMakeLists.txt effectively is the project definition. VS now has support for opening CMake projects directly, without creating a VS project file from them. So the thing that made CMake useful (and what also makes it complicated...), is now used less and less.
It also hides stuff so users look at simple, colourful lines with progress indicators instead of naked, overwhelming calls to the compiler.
That said, as a user I prefer autoconf. The semantics of --with-foo or --enable-bar are simple and well documented. And ./configure --help shows all the possibilities. Faffing around with -DCMAKE_STUFF is horrible. Ccmake should be nice but in practice never works properly.
As a developer, both autoconf and cmake are horrifying and a huge time sink to get them to behave.
We should just write Makefiles. If you want to be really nice, have it source a Makefile.inc file and keep all the user configuration stuff in there. If you want to be really really nice, consider accepting pull requests for .inc files that work with popular distros.
I just write (gnu) makefiles for my projects these days and invoke pkg-config to detect libraries and features. It's easy for me because I've been doing it this way for decades. But there are way too many footguns for me to recommend it to new developers (although I'm not sure what I would recommend instead either).
I think Make is exactly what you want, and I do recommend it to everybody, since the default alternative is usually something heinous like CMake which isn't really an improvement. You want the bit of logic to create a "build system" out of Make abstracted into a library, and then it's perfect. I use this: https://github.com/dkogan/mrbuild/ but there're many other ways to do it
What I want is a DAG, but with better syntax than make.
The closest I have come to ideal is a custom ninja build file generator, which is pretty simple. It's been easier for other people to maintain than make or even rake.
Makefiles are too brittle and allow all kinds of user errors. Targets with too many dependencies, too few dependencies, race conditions. The default doesn’t even handle transitive #include dependencies without you yourself plugging in complex .d-file generation using the compiler.
Are they though? The "complex .d-file generation" is "gcc -MMD" and a "-include *.d" at the end of the Makefile. You specify the dependencies, and it's on you to get those right, as it is in every other build system. I've never seen a race condition.
The problem with Make, is that it works just well-enough to allow people to sorta get it to work without requiring them to learn how to actually use the thing. Then they don't read the manual, and complain that Make doesn't, in fact, work.
Nice job, you just hit a footgun: try to remove any .h file (and the corresponding #include directive) and see your build just break. You need an empty `%.h: ;` directive for make to ignore missing .h files.
Make is full of this stuff and as a distro packager having to deal with each's individual quirks is prone to drive one insane. Please consider using meson, at least.
I do not believe that adding such rules for ignoring files is wise.
I prefer to just run "make clean" after I delete, rename or move source files.
With a properly written "Makefile" nothing else needs to be done and there will never be a broken build. With a properly written "Makefile" nothing has to be done when adding source files.
I don't know how you can think CMake is worse than autoconf. At least it has a declarative approach with good mechanisms for reusing common logic. It's much easier for IDEs/editors to work with it.
I don't think it's worse, per se, but I hate it/myself _more_ when I contemplate switching from autotools to cmake -- it's like upgrading from CP/M to DOS, when what I'd like to have is a Mac.
And as much as autotools blows, every time I try to build some project that uses cmake I inevitably go through some rigmarole making a build directory (not that out-of-source tree builds aren't the right thing, they are, but why make me create one?) and finding some obscure set of -D options to get the sucker to build on my system. configure && make still has cmake beat from a user's perspective.
Yeah it’s not intended to be used directly for end user builds. Its ultimate product is a set of makefiles (or Visual Studio project files) and the developer should distribute those.
libtool does serve a purpose: it causes plain 'make' to produce shell script wrappers as outputs, which is maybe kinda sorta vaguely useful when developing and utterly and completely obnoxious if you want an actual ELF file.
So you end up running 'make install' when you didn’t actually want to install or trying to remember how to find the actual ELF files. And you then wonder why 'make install' appears to be building things instead of just installing them, and you curse at libtool.
In its slight defense, the magic incantation to convince gcc (or, yikes, pre-gcc cc or ld or…) to competently produce a DSO is both nontrivial and platform dependent. Last time I tried to get all the options right off the top of my head, sensible defaults were not a thing.
Oh right, how could I forget that awful aspect of using libtool? I hate my life; maybe 2024 will be the year I find a decent build tool and the motivation to move projects to it.
Contra your slight defense, is it really so hard to produce a good DSO these days? -fPIC is half the battle and a couple other weird flags for macOS/darwin will do you. Cross-compiling for Windows, sure, a few more flags — but you look them up once on stackoverflow, you have three options, and you're done.
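For the common case today, the incantation libtool exists to rediscover is roughly the following (a sketch for gcc/clang on Linux, plus the macOS variant; library names are illustrative):

    cc -fPIC -c foo.c -o foo.o
    cc -shared -Wl,-soname,libfoo.so.1 -o libfoo.so.1.0.0 foo.o
    ln -sf libfoo.so.1.0.0 libfoo.so.1     # runtime name
    ln -sf libfoo.so.1 libfoo.so           # link-time name
    # macOS: cc -dynamiclib -install_name /usr/local/lib/libfoo.1.dylib -o libfoo.1.dylib foo.o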
I always thought that purpose of libtool was to incorrectly guess that it was smarter than the developer in figuring how to link files together. When I've been trying non-default link magic stuff (something like lto), trying to figure out how to get libtool to actually pass the damn necessary linker flags to the linker was damn near impossible.
> Bazel's obviously a terrible Google project and won't work if you're not sufficiently Googly
I don’t even know what that means, sounds like a bunch of fud to me. C++ and Java are first-class languages in Bazel. It works great for these, then Go, then you start venturing into the alpha land
Works great if you follow Google's style of vendoring all of your dependencies. If you try to rely on third-party package managers like Maven or Ivy or whatever C++ uses, you're in for some serious pain. The more you venture into OSS tools like Spring, gRPC, containers, the more you have to incorporate shoddy and poorly-maintained third-party Bazel tools which inevitably stomp on each other in subtle ways.
My team abandoned Bazel for Gradle, and it has been a tremendous quality-of-life improvement.
> Works great if you follow Google's style of vendoring all of your dependencies. If you try to rely on third-party package managers like Maven or Ivy or whatever C++ uses, you're in for some serious pain.
It sounds like you've mainly dealt with Java because it's not a pain in C/C++ - if you wish you can link dynamically with system libs or vendor .so binaries. And at least for Maven I used rules_jvm_external which worked fine.
I’ll grant that container rules were a mess for a long time but new oci rules are much improved.
> My team abandoned Bazel for Gradle, and it has been a tremendous quality-of-life improvement.
I hear as much if not more gripes about gradle of the same variety - "hurr durr it sucks bc i don't like it". Personally I'm thinking this is more appropriate critique for an art piece not a build system.
I've done some C/C++ with Bazel before. It's nice, except that you end up having to vendor all your dependencies if you want repeatable builds :). Had more success sharing projects and building a community with CMake, horrible as that tool is.
The reason Gradle is a quality-of-life improvement is that we had boxed ourselves into a corner. rules_jvm_external, rules_docker, rules_proto, and a set of others all relied on internal details of other rules (accidentally, because pre-MODULE.bazel everything gets loaded into one giant global namespace and collisions happen). We basically got ourselves locked in, unable to move off of rules_docker, upgrade our JVM beyond 17, or even upgrade Spring, without breaking and having to upgrade everything else.
Gradle, which has first-party support for dependency management, and which is the primary system supported by large OSS vendors (Spring, etc), doesn't have these pitfalls. Poorly-written Gradle plugins can sink you as surely as Bazel packages can, but there is less opportunity for you to discover them.
I'm well aware that we were Doing Bazel Wrong, and I've certainly had my share of pain with Gradle, but never in my life have I seen so much time wasted on a build system. For plain old JVM stuff, Bazel didn't work well for us.
Portability. A test to get the answer to "what the fuck am I running on and what does it support" is more portable and robust than thousands of "flavours" manually configured in /etc/whatamieven.conf
The author misses that the build-time magic for the xz exploit is not in the m4 file but in an obfuscated, compressed, encrypted binary disguised as a test file that alters the build process at multiple stages (configure and build).
A better argument can be made that the act of compiling a binary / obfuscating / minifying code instead of interpreting code directly is the fault.
That means it's totally normal to ship all kinds of really crazy looking stuff, and so when someone noticed that and decided to use that as their mechanism for extracting some badness from a so-called "test file" that was actually laden with their binary code, is it so surprising that it happened? To me, it seems inevitable.
Yeah, no. The author is well aware of how and why autotools are not awesome but also with the background of why they exist.
> A better argument can be made that the act of compiling a binary / obfuscating / minifying code instead of interpreting code directly is the fault.
I can't decide if you've never worked in systems software or are just trying to be hyperbolic. Given that it's HN, I'll assume the best. But who do you think would do the interpreting? The priests at Delphi?
configure is a mistake. Building a project shouldn’t generate input files that are dependent on the system state. That’s what C projects from the 90s do and it’s genuinely awful.
What is the alternative? Even if you're very organized and abstract away all the stuff into neat platform-specific modules, you'll still have those system-dependent inputs in the build system. No matter what happens it will still have to pick which of those modules to include in the build and it will most likely do so via some target variable whose default value is whatever it autodetects the host system as.
FWIW the sandbox-disabling bit in xz was in CMake logic, not autoconf. (Or at least the obvious one is in the CMake part; maybe the autoconf is separately backdoored.)
Yeah, this autotools thing is just a chance for people to whine about a favorite hated technology[1]. Clever attackers (and this was a very clever attacker) can hide stuff in anything. Anyone who thinks that CMake or Bazel or Python or whatever can't hide misbehavior is just fooling themselves.
[1] And even then it's more hatred of M4 than it is anything in autoconf per se.
Oh sure, and that's true enough. Working with autoconf[1] is a giant pain for Kids Today weaned on entirely different idioms[2]. And it's the kind of thing you can't really avoid[3] if you're going to be doing work on these traditional Linux packages. And that sucks, and it makes it look like Linux is a weird kingdom filled with dinosaurs. And that sucks because you still have to work in Linux. So you hate the proximate cause.
But the boring truth is that configuration management is just a hard problem and realistically you'd hate it anyway.
[1] Not so much automake/libtool, which are configured by straightforward idioms and mostly harmless.
[2] Though if you've got some C preprocessor fu, it doesn't look quite so alien.
[3] Though it also needs to be pointed out that 90% of the problem autoconf was originally intended to solve (gratuitously incompatible Unix variants) is basically gone now. You either write to straight ISO-C/POSIX if you're doing simple stuff, work through well-established abstraction libraries with fixed APIs, or you include well-known platform headers (Linux or otherwise) for the complicated bits. No more nonsense testing like "Hm... what header do I need for strncat()?"
I can't tell what the disabling element is - is it the variable rename from HAVE_LINUX_LANDLOCK_H to HAVE_LINUX_LANDLOCK? That seems the most suspicious to me, an uneducated bystander. Or, does the snippet just never compile?
One of the worst things about Autoconf is the way it is deployed in the GNU Project's own programs.
Typically, GNU project programs assume that anyone working on the program (cloning the repo, looking for bugs, making changes, ...) has the Autotools installed. Not just any Autotools but the exact version the program wants.
Only the "release tarball" of a GNU program contains the generated ./configure script that can run on a machine that doesn't have Autotools installed; from a git clone you have to do some "make bootstrap" step.
So to make a source code release, they can't just tag a code baseline and serve that; they have to generate the release source!
See the stupidity? The source code that is being released has to be compiled from the real source.
Now this makes life hell if you have to, say, git bisect to find a commit that introduces a problem. To be sure you're building each bisect step correctly, you have to distclean everything and re-run the autoconf bootstrapping, then ./configure and so on. The old code you're checking out may call for different Autotools versions from the trunk head.
Everything you say is technically true, but I can't remember a time when just having the latest autotools from my distribution packages wasn't sufficient to build a GNU package.
That's either a lucky coincidence, or use of a pre-generated ./configure script, not Autotools.
If you're actually running the steps to regenerate the Makefile.in from Makefile.am, and configure from configure.ac, and all that, then you run into the version or version range checks.
For instance, I have a working repo of GNU Make. In the generated Makefile there are lines like:
The generated `missing` script is a wrapper for programs such as automake-1.16; it prints an error if the program is missing. This version requirement comes from the file bootstrap.conf:
Does autotools really matter here? Did folks at the distros really audit the tainted package contents against the source repo and decide that the additional contents must be legitimate?
In all likelihood there's no process to do such an audit? Let's assume such a process did exist, forcing attackers to take a different strategy. Could the attack have hidden in plain sight in the source repo anyway?
Autotools matters just as much as the obfuscated bash, the tarball, and email-only workflows; it's a symptom. We settle for bad tools out of convenience/necessity (status quo bias, time pressure, perverse incentives - i.e. it's a hobby project, free and open source, yet half the world depends on it, so half the world prefers it to remain free even if it remains amateurish).
Younger people may not remember one of the early alternatives, libiberty (now at least somewhat improved as gnulib), but as an idea it was even worse: if your system doesn't provide a GNU compatible function call, it wrote one that hijacked whatever your system did have. It's still bumbling around there in the internals of gcc, silently patching every system it builds on.
...But I like seeing hundreds of lines of "yes"es and success messages fly by at lightning speed before the inevitable compilation failure for the first line of actually incompatible (and non-trivial-to-fix) code. It makes me feel like I actually have a shot at compiling liboldafgnuglibxx3 from 2004 and that my efforts aren't futile.
Autoconf has never really made sense to me in the last twenty years or so. At what point does a Makefile-based project start thinking "hmm, we should upgrade to autoconf to solve this"? And what is "this" that they can't solve any other way? Seems like that point never occurs anymore. Either projects start with autoconf or they don't.
To me, the point is when something I write has to reach for BLAS or LAPACK and every single user seems to want it to work with a different library (ATLAS, OpenBLAS, MKL, Accelerate.framework, vanilla BLAS, cuBLAS)… Then, combine that with the various FFT library flavours, wherever the fuck hdf5 and NetCDF are, and which goddamned version of MPI they want to use. Not to mention some computers where they insist on using Intel's compilers. Or, dog save us, Cray.
Getting this to work is a major pain, we cannot expect the users to figure it out themselves, and we cannot list all the possible combinations in a Makefile.
We tried several fancy build systems, and every single one breaks in more or less subtle ways.
autoconf is a mess for developers, but when it's working it is great for users who want to build software from source. Fantastic feedback and help at the command line to enable/disable features and dependencies. Much friendlier than cmake, at least.
> So, okay, fine, at some point it made sense to run programs to empirically determine what was supported on a given system.
That stopped making sense several decades ago, when FOSS began to be heavily used for embedded systems, where you are crosscompiling. Autoconf supports cross-compiling, but very badly. Programs not prepared with crosscompiling in mind will do stupid things at configure time that require a compiled program to be executed. For that reason, distros ended up using Qemu for cross builds.
> What I don't understand is why we kept running those stupid little shell snippets and little bits of C code over and over. It's like, okay, we established that this particular system does <library function foobar> with two args, not three. So why the hell are we constantly testing for it over and over?
Autoconf does have a cache system for this. I think it's only used within one package, not globally. There may be some knobs and levers to share autom4te.cache stuff between projects.
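Concretely, the knobs that already exist look like this (the shared cache path is just an example):

    ./configure -C                                # alias for --cache-file=config.cache
    ./configure --cache-file=../shared.cache      # point several builds at one cache file
    CONFIG_SITE=/etc/config.site ./configure      # pre-seed answers from a site-wide file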
Even when the configuration is not cached for the purposes of the test itself, obviously the information is cached in the generated config.h and config.make type files; once the program is configured, each time you recompile it, it's not testing for whether foobar has two args, not three; it's just referring to HAVE_TWO_ARG_FOO or whatever.
There could be a global cache for autoconf tests, or at least the standard ones built into autoconf. Each test would have to have a unique ID, which changes whenever the test changes between autoconf versions.
The cache would have to distinguish targets. Packages building for the build machine itself must not share configuration results with packages built for the target.
Even if it were done right in these regards, there is a risk of introducing bugs due to package build interaction through such a shared cache. A clean build of any package is not really clean if it refers to material from a previous build, possibly of another package. I can see distro builds opting out of something like this, and they would be the primary audience, since it is distro builds that are hurt by hundreds of packages executing essentially the same tests over again.
Autotools, for all its ugliness, solved a critical problem: getting GNU software up and going on a variety of Unix systems each of which supported a different mix of features.
But it's 2024, people. If you port to Linux, macOS, and Windows, that's cross-platform enough to capture 99% of your userbase, and everything after that is diminishing returns. Still running DEC Ultrix on Alpha, a CPU architecture that's been out of manufacture for literal decades? Maintain the support yourself if it's so important.
In conclusion, use cmake or Meson. Let autotools die. Kill them if you have to.
A huge part of the problem is relying on the system. Project repos should include their dependencies. The Linux model of global shared libraries is, imho, bad and wrong.
Building a project shouldn’t be complicated. There should be extremely minimal branching and if checks.
Ahh the eternal cycle of static vs dynamic linking swings around again.
"I need common.lib, it gets linked into my application at compile-time"
"why does every binary include this common.lib code? Let's pull that out to a shared module so we only have one version of it! See? that saved 35739475749Kb of storage! Everything is smaller and sleeker now!"
"which version of common.lib does this application need? Hmm, but that other application needs an older version. I'll have to hack some way of having multiple versions of the common.lib available"
"ahh, this application needs version 5.2.1 but that dependency needs version 4.11.6! Two incompatible versions in the same application! Who designed this thing?"
"right, let's compile whatever versions of common.lib we need into our binary. It'll make life easier and to be honest there are so many versions of common.lib around that we might as well have a separate version for every application"
"Ah, small problem, I have to compile a binary for my application for each variant of each platform because common.lib actually varies per platform... hmm, wonder if I can link to the system version of that lib at run-time..."
> Ahh the eternal cycle of static vs dynamic linking swings around again.
Somewhat. Although it's agnostic to static vs dynamic linking. The philosophy goes a bit deeper than that.
> See? that saved 35739475749Kb of storage!
Storage capacity hasn't been relevant for over a decade.
Besides, we foolishly replaced "use shared modules to save storage space" with "build multi-gigabyte docker images because it's the only way to reliably launch a simple program without crashing on startup".
> Storage capacity hasn't been relevant for over a decade.
> Besides, we foolishly replaced "use shared modules to save storage space" with "build multi-gigabyte docker images because it's the only way to reliably launch a simple program without crashing on startup".
But... you just said it still is :) By two orders of magnitude now.
> Storage capacity hasn't been relevant for over a decade.
Bullshit. It's especially visible with games. I cannot install more than 3-4 big games on my 512GB drive because apparently all game developers believe that "storage capacity isn't relevant".
Let me rephrase: storage capacity with respect to dynamic/shared libraries hasn't been relevant for over a decade.
AAA video games have massive storage requirements primarily due to textures, and sometimes audio. Which is a totally different conversation. It's not a bunch of redundant static libraries causing your pain.
$100 will get you an 8TB spinning drive, or a 1TB SSD, possibly 2TB. You can also get an external SSD for the same price. Large game sizes are a real and annoying issue. But even four-year-old consoles launched with a 1TB SSD.
That is almost all assets and has almost nothing to do with static vs shared. In fact, making things shared usually makes them worse, because you miss out on whole-program LTO.
Ah, the bad old days, when we checked DLLs into our cvs repos and every six months the greybeard on the team would bring a case of Mountain Dew into the office on a weekend and try upgrading the dependencies, sometimes successfully, before everyone else would arrive at work on Monday morning. Fun stuff.
vcpkg, conan, and FetchContent fix all the issues you discuss in that blog, with the added benefit of not being insane. You dismiss package managers out of hand, and you shouldn't.
conan makes it very hard to build a project offline and with a fixed set of inputs, it makes things worse in important ways if you care about reproducibility.
> Have you ever had a build fail because of a network error on some third-party server? Commit your dependencies and that will never happen.
author's "ideal" solution:
> First, the user clones a random GenAI repo. This is near instantaneous as files are not prefetched. The user then invokes the build script. As files are accessed they're downloaded. The very first build may download a few hundred megabytes of data.
Seems like contradiction to me? Do you want your builds to be offline or online?
I have a feeling that the real problem is that (1) author never worked with build systems which supply their own compiler, like buildroot or hermetic bazel and (2) author only downloaded dependencies from third-party servers.
You can solve all the problems in the blog post by (1) changing build system to download compilers and libraries and (2) adopting some sort of "binary storage" (something as simple as private ftp server + small text file checked into repo with path and checksum + few scripts to upload and download). And it would not require any new major code investment, nor new tooling, nor new VCS.
The key idea is that all dependencies should always come from your local binary server, never from the third-party server. This is a bit of PITA as many modern tools default to downloading stuff from third-party servers, and it could be hard to convince them to look at your local server instead. But once you do it, you get reliability + reproducibility + sustainability, without having to reinvent the world.
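The "few scripts" really can be that small; something like the following fetch-and-verify loop (the server URL and lockfile format are made up for illustration):

    # deps.lock: one "<path> <sha256>" pair per line, checked into the repo
    while read -r path sha256; do
        mkdir -p "third_party/$(dirname "$path")"
        curl -fsSL "https://artifacts.example.internal/$path" -o "third_party/$path"
        echo "$sha256  third_party/$path" | sha256sum -c - || exit 1
    done < deps.lock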
As a Xoogler, one of the things I miss the most was srcfs/CitC (client in the cloud). It worked exactly as you were describing and also brought some very useful features, including transparent read access to other users' clients and time-machine-like capabilities. After having every single edit saved for you, doing periodic git commits feels like such a chore.
That post is derived from my experience in video games and at Meta. There is a better way, and it already exists in some places! I'm shocked at how resistant people are to the idea that current popular workflows aren't actually that good and some places already do it better.
Admittedly, my feels are that we socially tend to overemphasize the brutalist shades of Darwinism, of natural selection as a competitive pressure.
Evolution, though, is a much broader game of adaptation, of finding niches. It's about capturing energy, and about having diverse & creative means to adapt & succeed, lest one find oneself trapped, only able to survive in an overly narrow environment. Evolution's greater story is more about over-specialization being doom than it is about refining the exact perfect match; the perfect matches are amazing & elegant but rarely able to stick around.
I don't really know my stance here. But it does feel like a call for authority. Why are we still diverse? Why still have complexity? Can't someone free us of the diversity of options, crown a champion, and decree how it will be? That sort of resembles Darwinism in the small, but in the large, we get change & evolution from having non-static environments, from having many ways to shape and be shaped, all about.
That shift from dynamic to static systems, when evolution peaks, is usually not far from when stasis turns to ossification & rot, decay & the fall. We probably should clean up some & not support a million options, for our own sake, but this notion the past had of supporting a deeply rich & diverse technical ecosystem is one I find enormously beautiful.
Anyhow, I forget how often autoconf actually needs to be re-run. I think many projects would work fine for a long time using the same generated conf across many, many builds? There's a lot of focus here on the silliness of re-running configure again and again, but I'm not sure when it's technically required; it seems more like habit & ritual than a need.
If you're building something really portable, you still need autoconf. Anything that has to run on a BSD or a Mac needs these tools.
The article comes across as a bit of "blame the tools" where the tools aren't really at fault, it's the culture which has failed here. We seem to have never graduated from "wow it actually works" levels of Linux appreciation that isn't that different from the 1990s. Those distributions that most of us rely on (whether directly or indirectly) - Debian, RHEL - need to up their game somehow on vetting the packages they accept.
We also need to stop just downloading .deb packages from the internet and installing them and preferring distros that look at security as a serious business including the packaging.
Blaming autoconf is just silly. Sure it can be annoying, is mired in the past and perhaps seen as an anachronism now. But it's not to blame for supply chain attacks any more than a C compiler can be blamed for supply chain attacks.
The problem is that a configure script takes a look at the system, right now, as configured by the end user, even with their home directory and whatever they've shoved into env variables like LD_LIBRARY_PATH and then builds the software.
At a minimum you need to have /etc, /usr/local/etc and ~/etc databases of "union mounted" layered configuration information, along with the ability to add and subtract arbitrary other locations where software might get installed. And it all needs to get updated with every new change to packages in those locations.
As someone else pointed out in a comment, this is sort of the very unrealized promise of pkg-config.
You can't do it with just a static file/directory in /etc that defines the distro; that isn't even half the problem once you start installing stuff in /usr/local that you want to link against.
This is a pretty good example of a Chesterton's Fence. You have to first fully understand why the fence is there before you knock it down.
I'm finding myself in the unenviable position of defending autotools.
> They were a reflection of the way the kernel, C libraries, APIs and userspace happened to work. Short of that changing, the results wouldn't change either.
These things do change, especially as toolchain and OS vendors realize their quirks that cause portability friction.
there's absolutely nothing wrong with the functionality (it's essential in fact), it's that maybe, maybe, we should purge shell scripts from our hearts. (or at least reserve them for ad-hoc one-off in-situ time-conscious pragmatic programs, instead of letting a billion bash-lines beat its little shell-song behind every .deb,.rpm & etc.)
// and of course 'should' is more like "it would be great if someone would rewrite it in Rust"
Is anyone doing and publicizing the git clone vs. tarball work in other projects? I suppose if something has been found, it might be too soon to disclose publicly.
The major downside of git-clone build workflows for an autotools project is that your build environment and the developer's may have different enough versions of autotools that macro incompatibility introduces more problems for you to solve. I've noticed this less recently (in, say, the last 5 years), but it still comes up occasionally. A distributed tarball, by contrast, has the "developer blessed" autoreconf run, declared good by way of releasing the tarball.
Without that process the options are committing generated files (boo) and burdening those that would build from a git clone instead of a distribution tarball with the demands of details from the development environment. That's intentionally a little vague because of the variety of things that can ultimately affect the autotools output.
All of this is kind of the design of the autotools build process, and could very well be a reason to take a harder look at it and the tradeoffs inherent in its use.
Ultimately autotools isn't doing anything magical, so it's not unusual for cross-platform package managers like vcpkg to simply ignore all that and write their own clean CMakeLists.txt.
In this particular case xz-utils actually did support CMake for building liblzma, so that's what you use if you don't care about the command line tools.
It's also worth appreciating that in the case of the xz project, the CMake flow is secondary other than for Windows. While I have no evidence one way or another, I wonder if the period in the CMake test that disabled Linux Landlock was actually intentional. Since it disabled it for the CMake build and not the autotools build, it seems like it could have been accidental, unless some significant liblzma consumer builds with CMake and would have otherwise had Landlock enabled.
Note that this sort of comparison isn't necessarily trivial. I have code where running `foo --version` out of the git repository will print "foo @@VERSION@@" because that gets substituted with the actual version number as part of the release process, for example.
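For reference, that kind of placeholder is usually filled in by a trivial release-time substitution, something along these lines (file names are hypothetical):

    # release step: stamp the real version into the shipped artifact
    VERSION=1.2.3
    sed "s/@@VERSION@@/$VERSION/g" foo.in > foo
    chmod +x foo

Which is exactly why a git checkout and a release tarball can legitimately differ, and why naive diffs between them need some interpretation.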
A first step could be performance benchmarking a tarball release vs something built from a git clone, maybe? Just a guess since that would be similar to how this was found.
I am so grateful and happy I only have to write little programs for this one machine that only I run. Free cookies, fortune and fame for the real programmers who made this possible
The sendmail configuration file is one of those files that looks like someone beat their head on the keyboard. After working with it... I can see why! -- Harry Skelton
If you redid all that work, and managed to avoid m4, it would still be a win!
I believe m4 was used because some Unix shells at the time did not have functions. All shells have functions now -- it's in POSIX.
If you have shell functions, then you can have modularity without concatenating blocks of code. You can make a library of functions, and then allow people to reuse them.
It could just be
source lib_configure.sh
detect_foo
detect_bar
etc.
POSIX shell can do everything that m4 can, and it's MORE deployed, not less. (Related: some people missed that m4 never runs on the end user machine which compiles the source code -- only on the machine that makes the tarball from the repo.)
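A minimal sketch of that library-of-functions idea, reusing the hypothetical names from above (and the POSIX `.` spelling instead of bash's `source`):

    # lib_configure.sh -- reusable detection helpers
    detect_foo() {
        command -v foo >/dev/null 2>&1 && echo 'HAVE_FOO=1' || echo 'HAVE_FOO=0'
    }
    detect_bar() {
        command -v bar >/dev/null 2>&1 && echo 'HAVE_BAR=1' || echo 'HAVE_BAR=0'
    }

    # configure -- the per-project script is just composition
    . ./lib_configure.sh
    { detect_foo; detect_bar; } > config.vars

No macro expansion step, nothing generated at release time, and the library is readable by anyone who knows sh.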
DJB followed this with qmail and djbdns, whose config scripts were plain bash. I remember at the time a lot of people got upset - "it can never be portable". But it compiles and runs everywhere it's supported, so maybe we're OK with not having detection for AmigaOS or something.
All the good programmers use plain shell -- DJB, Fabrice Bellard, Xavier Leroy for OCaml, etc. :)
They don't copy and paste stuff they don't understand
(The DJB scripts were almost certainly plain sh and not bash, since I think his machines were BSD. At that time bash wasn't so ubiquitous -- there was more diversity in shells. I did read a bunch of his scripts many years ago.)
If that’s this OCaml, it has a configure.ac file in the root directory, which looks suspicious for an Autotools-free package: https://github.com/ocaml/ocaml
It's literally the only way to be absolutely certain a specific feature is available. Version checks, shibboleth strings like __GLIBC__, documentation, all of these things can be subverted or lied to. In the end, verifying the desired behavior is present is the only correct move.
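In its hand-rolled form that's just "compile a tiny probe and see what happens"; a sketch, using strlcpy as an arbitrary example of a function that version checks routinely get wrong:

    # Does this toolchain actually provide strlcpy? Build a probe and find out.
    printf '#include <string.h>\nint main(void){char b[4];strlcpy(b,"x",sizeof b);return 0;}\n' > probe.c
    if ${CC:-cc} probe.c -o probe 2>/dev/null; then
        echo '#define HAVE_STRLCPY 1' >> config.h
    else
        echo '/* no strlcpy on this system */' >> config.h
    fi
    rm -f probe.c probe

If the compile or link fails, the feature isn't there, no matter what the documentation or the version macros claim.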
In 17 years of professional dev I have never, ever worked on a project that required such dark arts.
glibc is a special evil. zig, which can compile C/C++, solves that conundrum. Hopefully someday the core glibc project fixes itself.
Food for thought: cross-compilation should be a first-class feature of any build system. If you're depending on a pile of arbitrary system state, you're doing it wrong.
zig can compile for any target from any target. It’s beautiful. And not nearly as difficult as people seem to think.
Congrats on the clean living, but it remains the norm regardless. Both CMake and Autotools use this approach.
C and C++ aren't the only languages for which this is true, by the way. It is de rigueur these days to have versioned releases for the purposes of feature gating, but there's half a century of code out there written in languages without that, and explicit feature testing will be with us as long as that code is.
Anyway, native cross-compilation is also not a language feature. Plan 9's C compilers can target any supported architecture from any other supported architecture with no special configuration.
Autotools, CMake, and "compile and see" are all artifacts of a highly diverse Unix ecosystem. Tools like these are the price we pay for heterogeneity. Flexing to accommodate so many subtly different platforms is a big part of what brought things like GNU and LLVM to prominence, so it's not a good idea to dismiss it as a bad thing.
> it remains the norm regardless. Both CMake and Autotools use this approach
Yes, it is the norm (in certain circles, but not others). No, it doesn't have to be that way. CMake and Autotools are both genuinely awful. The world would be a better place if Bazel/Buck2/similar were the norm.
I find it extraordinarily bizarre how defensive Linux people get when I say "the status quo is actually bad, but it doesn't have to be!".
> there's half a century of code out there written in languages without that, and explicit feature testing will be with us as long as that code is.
I think you radically overestimate how difficult of a problem this is. It doesn't have to be this way. I promise you!
> native cross-compilation is also not a language feature.
Correct. That's why I called it a feature of a build system. C/C++ standards famously do not even attempt to describe a build system. This has resulted in many elements that really ought to be language features to actually be implementation defined.
> so it's not a good idea to dismiss it as a bad thing.
I didn't say Autotools and CMake weren't awful, I said that approach exists for a reason. You seem not to understand; maybe it's my poor communication skills. And I don't appreciate being called "Linux people," because I am not one, thanks. Not sure why personal attacks are in play here.
The thing you're missing about Bazel, Buck2, Zig, and all these things that do not require the compile-testing approach is that they are not portable. They support Windows, MacOS, and Linux by special-casing each platform, and even the Linux support is restricted to a few mainstream distros. This is a perfectly valid technical decision, and one I support, but it was not possible for decades.
Now people take Linux for granted, but until relatively recently you had dozens of almost-compatible Unix clones from dozens of vendors, of which Linux was but one. All of them had to be, essentially, written from scratch or based on a BSD release, with the differentiating features implemented on top by a given vendor. Since different developers had different interpretations and priorities for "Unix compatibility," this led to a combinatorial explosion of possible software configurations even just at the libc level, much less the kernel API. The Single Unix Specification and POSIX standards were assembled to try to correct this, but there was no way to go back and un-differentiate all these Unix clones. So developers targeted SUS or POSIX, and mostly got what they needed, but something had to account for all the little differences that remained. This was the purpose of Autotools, and CMake was written because Autotools sucked.
So, these things you're praising which avoid that whole scene, are able to do so because, for better or worse, Linux won. The other Unix clones have either completely died out (SCO Unix etc), are dying out quickly (Solaris etc), or have stabilized into their own distinct platforms (FreeBSD, MacOS, etc).
It has nothing to do with the build system. It has everything to do with platform consistency. That's why Plan 9 can do it without any build system beyond mk (a weird clone of Make), and why the FAANG companies can crank out build systems as get-me-promoted projects -- they're targeting three stable operating systems, instead of an uncountable number of weird almost-compatible monsters.
But again, from this terrible Unix scenery emerged a remarkably robust and portable set of tools (gcc, the gnu coreutils, etc) which became the de facto standard across all the Unix clones. People liked them because they worked the same on every Unix, and because (thanks to Autotools) they could build on every Unix. Because they had this versatile build system, when Linux came around the GNU tools were the easiest to port. Because they were the main tools running on Linux, and people had experience with them from other Unix clones, Linux adoption was quick and easy. And so it grew in popularity...
So yes, I mean "because of." It's easy to hate on complicated old rickety shit like Autotools, and I will be happy when I never have to deal with it again, but pretending it has no value and typing in "ewww" is needlessly dismissive and denies the actual importance this code had in getting us to a place where we can live without it.
First of all, thanks for engaging thoughtfully! The "Linux people" comment was a bit rude, sorry. If it helps it was meant as a general observation and not super targeted at you. Anyhow.
> they are not portable. it was not possible for decades.
I could be missing something, but I don't think this is true at all. The definition of "portable" can be quite fuzzy.
Supporting many different platforms is easy. Coming from gamedev, the target platform list looks something like: Windows, macOS, SteamOS, Linux, Android, iOS, PS4, PS5, XSX, Switch. A given codebase may have supported another 10 to 20 consoles depending on how old it is. Supporting all the different consoles is quite a bit more work than supporting all the niche *nix flavors.
Building anything "new" probably requires building it at least three times. You make mistakes, hit edge cases, learn what's important, and eventually build something that's pretty good. It's a process. One perspective is "the approach exists for a reason". Another perspective is "damn, we made a ton of mistakes, I really wish we knew then what we know now!".
> I said that approach exists for a reason
My core thesis here is that the reason is a bad one. It was a mistake. Folks didn't know better back then. That's ok. No shade on the individuals. But it doesn't mean it's a good design. In a parallel universe maybe the better design would have been made before a bad one. Such is life.
> They support Windows, MacOS, and Linux
Publicly. Internally Buck2 supports dozens of weird hardware platforms and OS variants. Adding support for a new hardware or OS platform doesn't have to be hard!
I swear compiling computer programs doesn't have to be hard! Build systems don't have to try to compile and see if it fails or not!
As one example, Windows handles linking against a dynamic library much, much better than GCC/Clang on Linux. All you need is a header and a thin static import lib that only contains function stubs. GCC made a mistake by expecting a fully compiled .so to be available. The libc headers across all the different weird Unix variants are 99% identical. Supporting a new variant or version should be as simple as grabbing the headers and an import lib. Zig jumps through hoops to pre-produce exactly that. It'd be even better if libc headers were amalgamated into a single header. Two files per variant. Easy peasy.
"The libc headers across all the different weird Unix variants are 99% identical."
This is both true and not applicable. The headers are nearly identical. The behaviors of the systems are not. Functions with the same name do different things, functions have different names but have slightly-incompatible shims with the name you expect, the headers have the function prototypes but those functions are stubbed to NOPs in the actual library, and a million other things. These differences, furthermore, are not documented. All of the platforms you listed are meticulously documented and many of them have SDKs available. This, again, is not something that existed for many decades.
If libc headers being close were so meaningful, Widevine binaries would work on musl libc. They do not. Modern build systems work around these incompatibilities by declaring incompatible platforms unsupported and ignoring them. That's a perfectly valid business decision that makes everyone's lives easier! But it's not the only approach, and other approaches are valid too.
Meanwhile, if you want complex software to accurately and correctly support a broad array of platforms, you use Autotools to compile-and-see what behavior you're dealing with, and then something like libtool to polyfill the differences. Doing this is not the result of ignorance, but the result -- borne of hard experience -- of trying to get code working on divergent undocumented platforms.
I don't really understand your point in the last paragraph. Neither GCC nor Clang handle linking. GNU ld or gold or some linker is invoked to do that.
> if you want complex software to accurately and correctly support a broad array of platforms, you use Autotools to compile-and-see what behavior you're dealing with, and then something like libtool to polyfill the differences. Doing this is not the result of ignorance, but the result -- borne of hard experience -- of trying to get code working on divergent undocumented platforms.
Yes, when you have differences in behavior you need to create an abstraction layer that behaves in a single, unified way.
"The Right Thing" is to do that once ever for a given target. Write a bootstrap tool if you want. Or let one person spend one week testing and implementing shims for a brand new platform. Which is what you need to do anyways for embedded platforms that can't host the compiler. In any case don't force every user to run overly complex and brittle scripts when the result never changes for a given target.
Oh hey, this is exactly what the source article argues! At least I'm not alone.
> Neither GCC nor Clang handle linking. GNU ld or gold or some linker is invoked to do that.
Bleh. ld is part of the GNU toolset. Pretend I said "standard Linux linking behavior". As you said, Linux's popularity stems from a collection of tools that work together nicely. The point I tried and failed to make is "compiling and linking doesn't have to be hard". Implementing an abstraction layer for a new platform isn't particularly hard either. Probably best to forget I said it.
Yeah, it's definitely a Chesterton's Fence problem; GP is dismissing Autotools because he has never had to deal with the problem which Autotools solves.
Zig definitely has the right approach, but its targets are still a bit limited, so saying it's not difficult is maybe ignoring gcc's extreme portability, which zig hasn't yet had to deal with.
The build system should specify whether it wants glibc or musl and what version. Relying on current system state is one of the reasons that glibc is such a monumental pain in the ass. It should be trivial to target an arbitrary version of glibc. (zig build system achieves this).
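With zig that declaration is literally part of the target triple; roughly (exact triple spellings depend on the zig version, so treat this as a sketch):

    # same source file, explicit libc flavor and version, built from any host
    zig cc -target x86_64-linux-gnu.2.28 -o hello-glibc hello.c
    zig cc -target aarch64-linux-musl    -o hello-musl  hello.c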
The way C projects from the 90s work is you locally run ./configure and it generates a whole bunch of headers, if not .c files, full of crap based on local system state. Then you run a build script. "The Right Thing" is for the repo files to contain a superset and for the build system to pass the tiny handful of defines necessary to select the target platform. For example glibc, musl, PlayStation4, NintendoSwitch, etc.
It's very, very easy to do. musl does things right. glibc is the only bad citizen stuck in the 90s. The difference between the libc headers across the kajillion target platforms is exceedingly minimal.
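In other words, the repo carries every platform branch and the build line just picks one; schematically (the macro names here are hypothetical):

    # the same tree builds for any target; only the selector changes
    cc -DPLATFORM_GLIBC=1 -c platform.c   # Debian-ish build
    cc -DPLATFORM_MUSL=1  -c platform.c   # Alpine-ish build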
Autoconf is most useful when the build isn't relying on current system state! That's its most useful characteristic. If the goal was to build against the current system, autoconf probably would never have existed. Building against the current system is a side-effect of doing proper cross-compilation builds.
> "The Right Thing" is for the repo files to contain a superset and for the build system pass the tiny handful of defines necessary to select the target platform. For example glibc, musl, PlayStation4, NintendoSwitch, etc.
For common platforms, sure... that works well and that is sort of how my Makefiles are written.[1]
But to cover the domain that autoconf/configure.sh covers ... well that is not feasible. Glibc might have 3 parameters when compiling against Linux, but 2 when compiling against Solaris. Linux itself might have some `#define`s when compiling on x86_64, and a different set when compiling on RISC-V. The target distribution might have different parameters when compiled for Debian vs when compiled for Alpine.
It's why gcc uses hyphenated target triples, roughly `ISA`-`vendor`-`kernel/OS`-`ABI`. So you see things like `x86_64-w64-mingw32` and `sparc-unknown-linux-gnu`. The combinatorial explosion of ISA + kernel + ABI + distro + user libs is much, much harder to manage for a portable project that has many dependencies than simply putting up with autoconf.
I mean, I don't even like autoconf (which is why I avoid it), but I gotta say, autoconf actually does solve those problems especially for cross-compiling, because then the entire chain of libraries getting linked in will be guaranteed to be the correct ones, for the correct ABI, on the correct target ISA, on the correct target kernel, on the correct target distribution ... and if something doesn't match, you'll know before Make is even run. This is especially important with C++, which needs another value in the tuple for seh/sjlj, and you cannot detect which of the two you need until you run the resulting program and it generates an exception.
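To be fair, the declare-the-target-up-front style does exist in autoconf: a canonical cross build spells out the triples, and every check is then answered for the --host rather than for the machine doing the building. A sketch:

    # cross build for Windows from a Linux box
    ./configure --build=x86_64-pc-linux-gnu --host=x86_64-w64-mingw32
    make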
It's just possible that, in 2024, we don't actually need the level of portability that autoconf brings to cross-compilation anymore - build it against Windows/Linux/BSD on x86_64/ARM and you cover about 50% of useful scenarios.
Those that need the remaining 50% probably already know what they are doing and can tweak the Makefiles and/or source code to build for their target.
[1] I do minimal detection and expect the user to set certain variables if they don't want the defaults.
There are super common scenarios with Autoconf, CMake, et al. which are just not provided as readily available examples or recipes, and the whole time you work in these ecosystems you're scratching your head asking why the hell no one has written these things down in literal decades.
No, I’m not switching build systems because everything people use as dependencies for Real Work™ relies on you using the incumbents.
They don’t provide those templates either, anyway. Mind boggling.