DocBook sucks…

I wrote my original grammar guide in clean XHTML by hand. The beauty of XHTML validation is that it’ll automatically catch your boneheaded tag errors. Also, I was already comfortable with HTML, CSS, DOM, and JavaScript, so I could easily tweak the content exactly how I wanted it, including mouse-over popups and practice exercises. Finally, having it all in XHTML made it incredibly easy to move around and share. There was no database or language runtime to install; all you needed to do was put the files on a web server. The clean markup made it easier for others to modify for the language translations as well.

The only drawback is that it is not easy to port into other formats. The PDF version is not very good and RTF is pretty much out of the question. But that’s OK because it was pretty much built for the web anyway and that’s where it will stay.

For the textbook, I wanted it to be not just for the web but for a variety of formats, including (as the word “textbook” suggests) a printed book. That’s why I went with DocBook, which seemed to serve my purposes. Unfortunately, now I need an XSLT processor and have to mess with XSL to make any kind of major tweak, which is a huge pain. Still, since the original document is XML, it’s portable and shareable. I also really enjoy being able to easily reorder content because I’m still trying to figure out how to arrange everything.

Unfortunately, I’m finding some annoying issues with DocBook that stem from the purist mentality that absolutely no formatting should be in the document itself. Can you believe that there is no built-in support for freakin’ line breaks?? So when I want to write a dialogue, I have two options. I can use “literallayout”, which preserves whitespace exactly as typed, so I have to mess with the whitespace and completely ruin my prettily formatted XML. Or I can add my own custom tag and XSL template, which means it’s no longer DocBook and I have to carry around my customized XSL forever.

Another example is the complete lack of support for strike-through text. Apparently I should use Subversion or something to track revision changes. This is a perfect example of purists thinking they know better than you. But have they considered that maybe I want to show readers what doesn’t belong in a sentence and should be deleted? Nobody can imagine all the uses people have for various formatting, so they shouldn’t try to second-guess what you need it for.

For now, I’m not going to let it bother me and just concentrate on the content. Worst case, I can always port the stuff to whatever I want by hand. Or maybe I can just run a cleanup Perl script at the end. I don’t even want to think about PDF conversion right now. I evaluated FOP at Hitachi when they needed documents with dynamic data and formatting customized for each company. I told them to forget about it and go with Big Faceless Java PDF Library. Even with a crazy name like that, it’s probably the smartest thing I ever did there. The funny part is that it took a multimillion-dollar company MONTHS to license something that costs several hundred dollars. We almost released the tax modules with “DEMO” written in large letters across the back of all the documents.
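As a rough idea of what the cleanup script mentioned above might look like, here is a minimal sketch in Python rather than Perl. It assumes, purely for illustration, that line breaks and strike-through were marked in the source in some way that shows up in the generated HTML as `<span class="br"></span>` and `<span class="strike">...</span>`. Those marker names and the `cleanup` function are hypothetical; they are not part of DocBook or its stylesheets.

```python
import re

# Hypothetical markers (assumptions for illustration only): an empty
# <span class="br"></span> stands for a line break, and
# <span class="strike">text</span> stands for struck-through text.
BREAK_MARKER = re.compile(r'<span class="br">\s*</span>')
STRIKE_SPAN = re.compile(r'<span class="strike">(.*?)</span>', re.DOTALL)


def cleanup(html: str) -> str:
    """Rewrite the placeholder spans into plain HTML formatting tags."""
    # Turn each empty break marker into a real line break.
    html = BREAK_MARKER.sub("<br/>", html)
    # Wrap struck-through text in <del>. The non-greedy match stops at the
    # first closing </span>, so spans nested inside a strike span are not
    # handled by this sketch.
    html = STRIKE_SPAN.sub(r"<del>\1</del>", html)
    return html


if __name__ == "__main__":
    sample = (
        '<p>A: Hi<span class="br"></span>'
        'B: Hello, <span class="strike">you is</span> you are late.</p>'
    )
    print(cleanup(sample))
    # prints: <p>A: Hi<br/>B: Hello, <del>you is</del> you are late.</p>
```

A regex pass like this is fragile compared with a proper XSL customization, but it keeps the source files free of custom templates, which is the trade-off being weighed here.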

I don’t know. Maybe I should try LaTeX or something? I know next to nothing about it except that you have to compile your document!

9 thoughts on “DocBook sucks…”

  1. Go to, and get their XHTML to PDF software. It’s great, I love it.

  2. Oh, be forewarned, Prince is free to use as an individual, but if you actually want to publish a book without breaking the law, the licensing fee is several hundred bucks.

  3. Go with LaTeX. It has its own problems, but it’s much easier to personalize the markup language and the output.

  4. Maybe I’ll take a look at it. I assume that as a typesetting system, it’s only for print, i.e., PDF and PostScript? Would I be able to make an HTML or RTF version, for example?

  5. I have never used it with Japanese fonts, but I would also suggest LaTeX, with a grain of salt:
    – it is a very powerful tool that will let you do anything you want;
    – the language is based on a quite readable text format, so differences between revisions are human-readable;
    – it can “generate” PostScript, PDF, RTF (latex2rtf), HTML (hevea, tth, tex4ht…), DocBook (tex4ht), and others, using appropriate tools;
    – it is extended by several “packages”, and you may create your own (to define macros, preferred layout, etc.)

    The drawbacks are:
    – you have to compile it (and once you are dealing with many files, such as several sources and images, a Makefile is a good thing :P)
    – sometimes it is just painful to find out how to do something you thought trivial at first (the worst problem of LaTeX, IMHO)
    – the packages are generally made to work with the PS back-end; for example, over-striking characters are implemented through the use of a dedicated package, and I am not sure how it will behave when exporting to HTML (the worst problem for your needs, IMHO 🙂 )

    Ultimately, you might find that XHTML simply fits your needs… (Silly idea: what about converting your XHTML to LaTeX?)

  6. Thanks for the breakdown. It sounds like I should stop wasting time with technology (every option has its good and bad points) and just concentrate on the actual content. I’ll stick with DocBook and figure something out at the end.

  7. On further thought, I’ve decided to go with LaTeX. I’ve decided to focus on this as a printed PDF book instead of being wishy-washy about HTML. And LaTeX seems to be the way to go for professional layout.
