Re-Introducing monodocer - Jonathan Pryor's web log

Re-Introducing monodocer

In the beginning... Mono was without documentation. Who needed it when Microsoft had freely available documentation online? (That's one of the nice things about re-implementing -- and trying to stay compatible with -- a pre-existing project: reduced documentation requirements. If you know C# under .NET, you can use C# under Mono, by and large, so just take an existing C# book and go on your way...)

That's not an ideal solution, as MSDN is/was slow. Very slow. Many seconds to load a single page slow. (And if you've ever read the .NET documentation on MSDN where it takes many page views just to get what you're after... You might forget what you're looking for before you find it.) A local documentation browser is useful.

Fortunately, the ECMA 335 standard comes to the rescue (somewhat): it includes documentation for the types and methods which were standardized under ECMA, and this documentation is freely available and re-usable.

The ECMA documentation consists of a single XML file (currently 7.2MB) containing all types and type members. This wasn't an ideal format for writing new documentation, so the file was split up into per-type files; this is what makes up the monodoc svn module (along with many documentation improvements since, particularly for types and members that are not part of the ECMA standard).

However, this ECMA documentation import was last done many years ago, and the ECMA documentation has improved since then. (In particular, it now includes documentation for many types/members added in .NET 2.0.) We had no tools to import any updates.

Monodocer

Shortly after the ECMA documentation was originally split up into per-type files, Mono needed a way to generate documentation stubs for non-ECMA types within both .NET and Mono-specific assemblies. This was (apparently) updater.exe.

Eventually, Joshua Tauberer created monodocer, which both creates ECMA-style documentation stubs (in one file/type format) and can update documentation based on changes to an assembly (e.g. add a new type/member to an assembly and the documentation is updated to mention that new type/member).

By 2006, monodocer had (more-or-less) become the standard for generating and updating ECMA-style documentation, so when I needed to write Mono.Fuse documentation I used monodocer...and found it somewhat lacking in support for generics. Thus begins my work on improving monodocer.

monodocer -importecmadoc

Fast-forward to earlier this year. Once monodocer could support generics, we could generate stubs for all .NET 2.0 types. Furthermore, ECMA had updated documentation for many core .NET 2.0 types, so...what would it take to get ECMA documentation re-imported?

This turned out to be fairly easy, with support added in mid-May to import ECMA documentation via a -importecmadoc:FILENAME parameter. The problem was that this initial version was slow; quoting the ChangeLog, "WARNING: import is currently SLOW." How slow? ~4 minutes to import documentation for System.Array.

This might not be too bad, except that there are 331 types in the ECMA documentation file, documenting 3797 members (fields, properties, events, methods, constructors, etc.). 4 minutes per type is phenomenally slow: at that rate, importing every type would take roughly 22 hours.

Optimizing monodocer -importecmadoc

Why was it so slow? -importecmadoc support was originally modeled after -importslashdoc support, which works as follows: look up every type and member in System.Reflection order, create an XPath expression for each member, and execute that XPath query against the documentation we're importing. If we get a match, import the found node.
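The lookup pattern described above can be sketched roughly as follows. This is an illustration in Python, not monodocer's actual C# code: the XML layout, the element and attribute names (`Type`, `FullName`, `Member`, `MemberName`), and the `import_by_query` helper are all simplified assumptions, and the member list stands in for what System.Reflection would report.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified stand-in for the ECMA documentation file.
ECMA_XML = """<Libraries>
  <Types Library="BCL">
    <Type FullName="System.Array">
      <Members>
        <Member MemberName="Length"><Docs><summary>Gets the element count.</summary></Docs></Member>
        <Member MemberName="Clone"><Docs><summary>Creates a shallow copy.</summary></Docs></Member>
      </Members>
    </Type>
    <Type FullName="System.Object">
      <Members>
        <Member MemberName="ToString"><Docs><summary>Returns a string.</summary></Docs></Member>
      </Members>
    </Type>
  </Types>
</Libraries>"""

# Stand-in for walking an assembly via reflection: (type, member) pairs in
# whatever order the assembly reports them, unrelated to the XML file's order.
ASSEMBLY_MEMBERS = [
    ("System.Object", "ToString"),
    ("System.Array", "Clone"),
    ("System.Array", "Length"),
    ("System.Array", "Rank"),  # in the assembly, but absent from the docs
]


def import_by_query(xml_text, members):
    """Run one path query per member against a fully loaded document."""
    root = ET.fromstring(xml_text)  # the whole document is held in memory
    imported = {}
    for type_name, member_name in members:
        # Every query starts again from the document root.
        path = (f"./Types/Type[@FullName='{type_name}']"
                f"/Members/Member[@MemberName='{member_name}']/Docs")
        docs = root.find(path)
        if docs is not None:
            imported[(type_name, member_name)] = docs.findtext("summary")
    return imported


result = import_by_query(ECMA_XML, ASSEMBLY_MEMBERS)
```

With only three documented members the cost is invisible, but each query re-scans the tree from the root, so the work grows with both the number of members and the size of the document.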

The slowdown was twofold: (1) we loaded the entire ECMA documentation into an XmlDocument instance (XmlDocument is a DOM interface, and thus copies the entire file into memory), and (2) we were then accessing the XmlDocument randomly.

The first optimization is purely algorithmic: don't import documentation in System.Reflection order, import it in ECMA documentation order. This way, we read the ECMA documentation in a single pass, instead of randomly.

As is usually the case, algorithmic optimizations are the best kind: it cut down the single-type import from ~4 minutes to less than 20 seconds.
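To make the change of iteration order concrete, here is a minimal sketch in Python (again, not monodocer's C# code). The XML layout, the names, and the `import_in_document_order` helper are assumptions for illustration; the set of known members stands in for what the assembly exposes.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified stand-in for the ECMA documentation file.
ECMA_XML = """<Libraries>
  <Types Library="BCL">
    <Type FullName="System.Array">
      <Members>
        <Member MemberName="Length"><Docs><summary>Gets the element count.</summary></Docs></Member>
        <Member MemberName="Clone"><Docs><summary>Creates a shallow copy.</summary></Docs></Member>
      </Members>
    </Type>
    <Type FullName="System.Object">
      <Members>
        <Member MemberName="ToString"><Docs><summary>Returns a string.</summary></Docs></Member>
      </Members>
    </Type>
  </Types>
</Libraries>"""

# Members the assembly actually has, stored in a set for constant-time lookup.
KNOWN_MEMBERS = {
    ("System.Object", "ToString"),
    ("System.Array", "Clone"),
    ("System.Array", "Length"),
    ("System.Array", "Rank"),  # in the assembly, but absent from the docs
}


def import_in_document_order(xml_text, known_members):
    """Walk the documentation once, front to back, and keep what matches."""
    root = ET.fromstring(xml_text)
    imported = {}
    for type_node in root.iter("Type"):
        type_name = type_node.get("FullName")
        for member_node in type_node.iter("Member"):
            key = (type_name, member_node.get("MemberName"))
            # The lookup is now against the assembly's members, not the XML.
            if key in known_members:
                imported[key] = member_node.findtext("Docs/summary")
    return imported


result = import_in_document_order(ECMA_XML, KNOWN_MEMBERS)
```

The roles are inverted: the large document is traversed exactly once, and the cheap lookup happens against the in-memory set of assembly members instead.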

I felt that this was still too slow, as 20s * 331 types is nearly 2 hours for an import. (This is actually faulty reasoning, as much of that 20s time was to load the XmlDocument in the first place, which is paid for only once, not for each type.) So I set out to improve things further.

First was to use an XPathDocument to read the ECMA documentation. Since I wasn't editing the document, I didn't really need the DOM interface that XmlDocument provides, and some cursory tests showed that XPathDocument was much faster than XmlDocument for parsing the ECMA documentation (about twice as fast). This improved things, cutting single-type documentation import from ~15-20s to ~10-12s. Not great, but better.

Convinced that this still wasn't fast enough, I went to the only faster XML parser within .NET: XmlTextReader, which is a pull-parser lacking any XPath support. This got a single-type import down to ~7-8s.
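For readers unfamiliar with pull-style parsing, the sketch below shows the general idea using Python's `xml.etree.ElementTree.iterparse` as a rough analogue of XmlTextReader. It is not monodocer's code, and the XML layout, names, and `import_streaming` helper are assumptions for illustration. Without path queries, the code has to track its own position (here, the current type) as events arrive.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified stand-in for the ECMA documentation file.
ECMA_XML = b"""<Libraries>
  <Types Library="BCL">
    <Type FullName="System.Array">
      <Members>
        <Member MemberName="Length"><Docs><summary>Gets the element count.</summary></Docs></Member>
        <Member MemberName="Clone"><Docs><summary>Creates a shallow copy.</summary></Docs></Member>
      </Members>
    </Type>
    <Type FullName="System.Object">
      <Members>
        <Member MemberName="ToString"><Docs><summary>Returns a string.</summary></Docs></Member>
      </Members>
    </Type>
  </Types>
</Libraries>"""

KNOWN_MEMBERS = {
    ("System.Object", "ToString"),
    ("System.Array", "Clone"),
    ("System.Array", "Length"),
}


def import_streaming(xml_bytes, known_members):
    """Consume parse events one at a time instead of querying a loaded tree."""
    imported = {}
    current_type = None  # state we must track ourselves without path queries
    events = ET.iterparse(io.BytesIO(xml_bytes), events=("start", "end"))
    for event, elem in events:
        if event == "start" and elem.tag == "Type":
            current_type = elem.get("FullName")
        elif event == "end" and elem.tag == "Member":
            key = (current_type, elem.get("MemberName"))
            if key in known_members:
                imported[key] = elem.findtext("Docs/summary")
            elem.clear()  # drop the member's children once processed
        elif event == "end" and elem.tag == "Type":
            elem.clear()  # release the finished type's subtree
            current_type = None
    return imported


result = import_streaming(ECMA_XML, KNOWN_MEMBERS)
```

The trade-off is the one noted above: the streaming version does less work per node and keeps less in memory, but the matching logic that an XPath expression expressed in one line becomes hand-written state tracking.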

I feared that this would still need ~45 minutes to import, but I was running out of ideas so I ran a full documentation import for mscorlib.dll to see what the actual runtime was. Result: ~2.5 minutes to import ECMA documentation for all types within mscorlib.dll. (Obviously the ~45 minute estimate was a little off. ;-)
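The estimates quoted in this post all come from the same naive calculation: multiply a single-type import time by the 331 types in the ECMA file. A quick sketch of that arithmetic, using only the figures given above (the `naive_total_minutes` helper is a hypothetical name):

```python
TYPE_COUNT = 331  # types in the ECMA documentation file


def naive_total_minutes(seconds_per_type, type_count=TYPE_COUNT):
    """Assume every type costs the full single-type time, load included."""
    return seconds_per_type * type_count / 60


estimates = {
    "initial version (~4 min per type)": naive_total_minutes(4 * 60),
    "ECMA document order (~20 s per type)": naive_total_minutes(20),
    "XmlTextReader (~8 s per type)": naive_total_minutes(8),
}

for label, minutes in estimates.items():
    print(f"{label}: ~{minutes:.0f} minutes")
```

This gives roughly 1324 minutes (about 22 hours), 110 minutes, and 44 minutes respectively. The model overestimates because it charges the one-time cost of loading the documentation to every type, which is why the measured full import was so much shorter than predicted.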

Conclusion

Does this mean that we'll have full ECMA documentation imported for the next Mono release? Probably not. There are still a few issues with the documentation import where it skips members that ideally would be imported (for instance, documentation for System.Security.Permissions.FileIOPermissionAttribute.All isn't imported because Mono provides a get accessor while ECMA doesn't). The documentation also needs to be reviewed after import to ensure that the import was successful (a number of bugs have been found and fixed while working on these optimizations).

Hopefully it won't take me too long to get things imported...

Posted on 15 Jul 2007 | Path: /development/mono/ | Permalink