Defending XML-based Build Systems - Jonathan Pryor's web log

« mdoc Repository Format History | Main | HackWeek V »

Defending XML-based Build Systems

Justin Etheredge recently suggested that we Say Goodbye to NAnt and MSBuild for .NET Builds With IronRuby. Why? because they're based on XML.

He goes on to mention several problems with XML-based build systems, principally:

It's not code (unless you're using XSLT, and having maintained XSLTs if you need to write them you have my condolences...)
Lose existing tooling: editors, debuggers, libraries, etc.
Limit creation of custom rules.
Require that we write custom tools to workaround the limitations of XML build systems.

His solution: use Ruby to describe your build process.

My reaction? No, no, for the love of $deity NO!

Why? Three reasons: GNU Autotools, Paul E. McKenney's excellent parallel programming series, and SQL.

Wait, what? What do those have to do with build systems? Everything, and nothing.

The truly fundamental problem is this: "To a man with a hammer, everything looks like a nail" (reportedly a quote from Mark Twain, but that's neither here nor there). In this case, the "hammer" is "writing code." But it's more than that: it's writing imperative code, specifically Ruby code (though the particular language isn't the problem I have, rather the imperative aspect).

Which is, to me, the fundamental problem: it's a terrible base for any form of higher-level functionality. Suppose you want to build your software in parallel (which is where Paul McKenney's series comes in). Well, you can't, because your entire build system is based on imperative code, and unless all the libraries you're using were written with that in mind...well, you're screwed. The imperative code needs to run, and potentially generate any side effects, and without a "higher-level" description of what those side effects entail it can't sanely work.

Want to add a new file to your build (a fairly common thing to do in an IDE, along with renaming files?) Your IDE needs to be able to understand the imperative code. If it doesn't, it just broke your build script. Fun!

OK, what about packaging? Well, in order to know what the generated files are (and where they're located), you'll have to run the entire script and (somehow) track what files were created.

Want to write an external tool that does something hitherto unknown? (As a terrible example, parse all C# code for #if HAVE_XXX blocks so that a set of feature tests can be automatically extracted.) Well, tough -- you have to embed an IronRuby interpreter, and figure out how to query the interpreter for the information you want (e.g. all the source files).

etc., etc.

My problem with imperative languages is that they're not high-level enough. McKenney asks what the good multicore programming languages are; the answer is SQL because it's dedicated ~solely to letting you describe the question but leaves the implementation of the answer to the question up to the SQL database. It's not imperative, it's declarative (at least until you hit esoteric features such as cursors, but in principal you can generally stick to a declarative subset).

OK, so I want a higher-level language to describe targets and dependencies, and supports faster builds. To a large degree, make(1) supports all that, and it's the basis of Autotools. Surely I like that, right?

The problem with autotools is that it's a mixture of declarative and imperative code, with Unix shell scripts forming the backbone of the imperative code (aka the target rules), and these are inherently Unix specific. (Possibly Linux specific, much to my consternation.) Plus, the format is virtually unreadable by anything other than make(1), what with all the language extensions...

So why XML?

Because it's not code, it's data, which (somewhat) lowers the barrier of entry for writing external tools which can parse the format and Do New Things without needing to support some language which might not even run on the platform you're using.

Because it's easily parseable AND verifiable, it's (somewhat) safer for external automated tools to manipulate the file without screwing you over "accidentally" -- e.g. adding and removing files from the build via an IDE.

Because custom rules are limited, there is a smaller "grammar" for external tools to understand, making it simpler to write and maintain them. It also encourages moving "non-target targets" out of the build system, simplifying file contents (and facilitating interaction with e.g. IDEs).

Am I arguing that XML-based build systems are perfect? Far from it. I'm instead arguing that small, purpose-specific languages can (and often are) Good Things™, particularly if they permit interoperability between a variety of tools and people. XML allows this, if imperfectly. An IronRuby-based build system does not.

Posted on 26 Apr 2010 | Path: /development/ | Permalink