DocBook sucks…

I wrote my original grammar guide in clean XHTML by hand. The beauty of XHTML validation is that it’ll catch your boneheaded tag errors automatically. Also, I was already comfortable with HTML, CSS, DOM, and JavaScript, so I could easily tweak the content exactly how I wanted it, including mouse-over popups and practice exercises. Finally, having it all in XHTML made it incredibly easy to move around and share. There was no database or language runtime to install; all you needed to do was put the files on a web server. The clean markup also made it easier for others to modify for the language translations.

The only drawback is that it is not easy to port into other formats. The PDF version is not very good and RTF is pretty much out of the question. But that’s OK because it was built for the web anyway and that’s where it will stay.

For the textbook, I wanted it to be not just for the web but for a variety of formats, including (as the word “textbook” suggests) a printed book. That’s why I went with DocBook, which seemed to serve my purposes. Unfortunately, now I need an XSLT processor and have to mess with XSL to make any kind of major tweak, which is a huge pain. Still, since the original document is XML, it remains portable and shareable. I also really enjoy being able to reorder content easily because I’m still trying to figure out how to arrange everything.

Unfortunately, I’m finding some annoying issues with DocBook that stem from the purist mentality that absolutely no formatting should be in the document itself. Can you believe that there is no built-in support for freakin’ line breaks?? So when I want to write a dialogue, I have two choices. I can use “literallayout”, which means I have to mess with the whitespace and completely ruin my prettily formatted XML. Or I can add my own custom tag and XSL template, which means it’s no longer DocBook and I have to carry around my customized XSL forever.
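One possible middle ground, sketched here in Python purely as an illustration: keep a custom tag in the source for readability, then convert it to stock “literallayout” in a preprocessing step so that the standard stylesheets still apply. The element names `dialogue` and `line` are made up for this example and are not part of DocBook.

```python
import xml.etree.ElementTree as ET


def dialogue_to_literallayout(xml_text):
    """Replace each hypothetical <dialogue> of <line> children with one <literallayout>."""
    root = ET.fromstring(xml_text)
    # Snapshot the tree first so replacing children does not disturb iteration.
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag != "dialogue":
                continue
            layout = ET.Element("literallayout")
            # One line of dialogue per output line; the newlines are the "line breaks".
            layout.text = "\n".join(
                (line.text or "").strip() for line in child.findall("line")
            )
            layout.tail = child.tail  # keep whatever followed the original element
            parent[index] = layout
    return ET.tostring(root, encoding="unicode")


source = (
    "<section><para>Intro</para>"
    "<dialogue><line>A: Hello</line><line>B: Hi</line></dialogue>"
    "</section>"
)
print(dialogue_to_literallayout(source))
```

The source stays nicely indented, and the whitespace-sensitive markup only exists in the generated file. The cost is an extra build step, which is its own kind of baggage.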

Another example is the complete lack of support for strike-through text. Apparently I should use Subversion or something to track revision changes. This is a perfect example of purists thinking they know better than you. But have they considered that maybe I want to show readers what doesn’t belong in a sentence and should be deleted? Nobody can imagine all the uses people have for various kinds of formatting, so they shouldn’t try to second-guess what you need it for.

For now, I’m not going to let it bother me and will just concentrate on the content. Worst case, I can always port the stuff to whatever I want by hand. Or maybe I can just run a cleanup Perl script at the end. I don’t even want to think about PDF conversion right now. I evaluated FOP at Hitachi when they needed documents with dynamic data and formatting customized for each company. I told them to forget about it and go with Big Faceless Java PDF Library. Even with a crazy name like that, it’s probably the smartest thing I ever did there. The funny part is that it took a multimillion dollar company MONTHS to license something that costs several hundred dollars. We almost released the tax modules with “DEMO” written in large letters across the back of all the documents.

I don’t know. Maybe I should try LaTeX or something? I know next to nothing about it except that you have to compile your document!

CJK in Ubuntu 8.04

Encouraged by your comments about getting CJK input to work in Linux, I decided to give it another shot over the weekend. I was pleasantly surprised at the much improved support since my last attempt (ver 7.10). Check out this SCIM documentation and compare the instructions for 8.04 vs 7.10. In the newest version, everything is done via the Language Support menu in a few clicks. In prior versions, you had to install packages and edit config files by hand, which is pretty much where I failed last time.

So I have to give Ubuntu credit here. It really is turning into a more fully-featured and intuitive OS with every new release. Now they just have to do something about the default fonts. I wish I had saved a screenshot, but the Japanese fonts out of the box really are horrendous. The kana and kanji don’t even line up properly! So unless you want to punish your eyes, you still need to download Microsoft fonts as described here. If the fonts are freely downloadable, you would think including them in the distro would be the easiest thing in the world. Maybe there are distribution issues, or the developers don’t know enough about Japanese to see how bad the fonts currently are.

So I’m using all three OSs now! Here’s my current setup:
Dell Inspiron 530: Windows XP Home and Ubuntu 8.04 dual boot
Panasonic Let’s Note Light W5: Windows XP Professional (Japanese)
MacBook Pro (Loaned from work): OS X 10.5

By the way, the mouse precision and acceleration are horrible in OS X. I just stick to the trackpad because the mouse feels like it’s moving through molasses.

Which OS do you like?

This post has almost nothing to do with Chinese, Japanese, or Korean but hey, it’s called “Tae Kim’s Blog” remember? I can write whatever I want, Ha Ha!

OS X

I requested and recently got a MacBook Pro for my work laptop, and so far I’m really liking it! I especially like the fact that I can automatically rotate all my Suzumiya Haruhi wallpapers every hour.

I just wish uTorrent and Notepad++ were available for OS X. I suppose I can just use vim for general text editing. I haven’t used Xcode extensively yet but at first glance, it looks like it has a ways to go before it can compete as an IDE.

Linux

I tried Ubuntu briefly and it was nice and all but I refuse to use an OS that has such poor multilingual support and ugly Asian fonts. I guess there are not many Linux users who need to use English and another CJK language at the same time. Vote for my “Better Multilingual support and CJK fonts” idea on Ubuntu brainstorm if you’re in the same boat as me.

In any case, until I can just add the languages in a menu, have an input editor that doesn’t drive me insane, and fonts that don’t make my eyes bleed, I’m not switching.

Windows

Windows 2000 was my favorite version and I reluctantly switched to XP when my newer computers didn’t have compatible drivers. XP is not glamorous but it certainly does everything I need especially with Google Pinyin. My favorite Windows-only apps include: uTorrent, Notepad++, WinSCP, K-Lite Codec Pack, WinRAR, ImgBurn, and DVD Shrink. I recently bought a Dell desktop with XP while they were still offering the option and so it will be my main OS for many more years.

I haven’t tried Vista yet and have no plans to unless my work requires it. I refuse to use an OS that requires at least 1 GB of RAM and 40 GB of hard disk space on my current systems. I mean, 32-bit Windows can’t even address more than 4 GB of RAM, and each process only gets 2 GB of that by default!! (And I hear 64-bit is a whole other can of worms.)

Which OS are you currently using, and have you thought about switching? According to Google Analytics, 86% of you use Windows, 8% Mac, and 5% Linux. Among Windows users, 78% use XP while 19% use Vista.



システム開発における用語 (Terminology in System Development)

I remember that when I was trying to get a computer job in Japan, I tried to learn some computer terminology because I was worried that I wouldn’t understand any of the technical words in Japanese. Unfortunately, I couldn’t find a site on the Internet that covered them. And so, in an effort to improve the usefulness of the Internet by .00001%, here’s a (hopefully) informative post for the up-and-coming programmers wanting to work in Japan.

General Terminology

オブジェクト指向 (しこう) – object-oriented
継承 (けいしょう) – inheritance
カプセル化 (か) – encapsulation (black box programming)
抽象クラス (ちゅうしょう) – abstract class
変数 (へんすう) – variable
固定値 (こていち) – constant
値 (あたい) – value
閾値 (しきいち) – threshold (often used in validation, program limits, and the like)
関数 (かんすう) – function
メソッド – method (Java/C# functions)
引数 (ひきすう) – parameter
戻り値 (もどりち) – return value
文字列 (もじれつ) – string
配列 (はいれつ) – array
スレッド – thread (no, it’s not a sled)
マルチスレッド – multi-threaded
同期 (どうき) – synchronous
同期化 (どうきか) – synchronize
非同期 (ひどうき) – asynchronous
静的 (せいてき) – static
動的 (どうてき) – dynamic
実行する (じっこうする) – to execute

Design-related Terminology

定義書 (ていぎしょ) – a document that defines something (ex: XML定義書)
クラス図 (ず) – class diagram
基本設計 (きほんせっけい) – basic design (broad level)
詳細設計 (しょうさいせっけい) – specific design
仕様 (しよう) – specifications (what your program is supposed to do)
仕様書 (しようしょ) – written specifications
要件 (ようけん) – requirements
見積もる (みつもる) – to make an estimate
見積もり (みつもり) – estimate

If there are other terms you’re curious about, just let me know and I’ll add them to the list.

Lang-8+Twitter=Awesome!

What are you doing?
何してんの?
你在做什么?
뭘 하고 있어?

No matter what language you’re speaking in, this is a question you’re answering all the time. So naturally, your conversation skills should improve if you learn how to answer this question in your studies. And what better way to practice than by using Twitter, a service built entirely for this purpose? As they describe it, “Twitter is a service… to communicate and stay connected through the exchange of quick, frequent answers to one simple question: What are you doing?” It seems to be exactly the thing for some quick language practice. You can even set it to bug you if you don’t update it for 24 hours.

So I decided to give it a try by signing up and posting some stuff in Chinese. So far, the experience has been very positive and I even put the latest status on my blog sidebar under “Quick Update”. Answering the simple question, “What are you doing?” motivated me to look up lots of new and useful grammar and vocabulary while helping me apply the stuff I already knew. In addition, the 140-character limit helps keep me focused and motivated. I find it much easier to write a quick sentence or two in Twitter than in journals (Lang-8) or blogs, where there is more pressure to write something significant.

One thing I did before I started was to make sure I had ready access to update whenever I felt like it. Unfortunately, updating from my phone was not an option since my phone can only send English messages. (I’ll try not to rant here on the poor state of mobile technology in the US where you don’t even get a freaking email address for your phone let alone multilingual messaging!!!!!) Since I check all my stuff on iGoogle all the time anyway, I added BeTwittered, a Twitter gadget for the iGoogle homepage. There are lots of other options that might make more sense depending on your habits, but you’ll definitely want to put it somewhere you’ll see it all the time.

This is all fine and dandy but the major problem I have is that nobody reads my Twitter updates. Granted, they’re not all that interesting but it would sure be nice to have native Chinese speakers read them and reply with their comments. In turn, I can do the same for them if they’re learning English or Japanese. Hmm… does this sound familiar? Yes in fact, I have a whole list of friends who fit those criteria in my Lang-8 account. Wouldn’t it be cool if Lang-8 had Twitter integration?! What if you and your friend entered your Twitter account information into your profiles and Lang-8 automatically set the appropriate followers based on your and your friend’s native and target languages? It certainly seems possible based on the Twitter API.
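To make the matching idea concrete, here is a minimal sketch in Python. It only works out who should follow whom; it does not call the Twitter API or anything on Lang-8. The `Member` class, its fields, and the handles are all hypothetical, and the rule assumed here is simply “a native speaker of a language follows anyone who is learning it”.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    twitter_handle: str   # hypothetical profile field
    native: str           # native language code, e.g. "zh"
    targets: frozenset    # language codes the member is learning


def suggest_follows(members):
    """Return (follower, followed) handle pairs among a group of friends."""
    pairs = []
    for writer in members:
        for reader in members:
            # A reader is useful to a writer if the reader is a native
            # speaker of one of the languages the writer is practicing.
            if reader is not writer and reader.native in writer.targets:
                pairs.append((reader.twitter_handle, writer.twitter_handle))
    return pairs


friends = [
    Member("learner_a", "en", frozenset({"zh", "ja"})),
    Member("learner_b", "zh", frozenset({"en"})),
    Member("learner_c", "ja", frozenset({"zh"})),
]
for follower, followed in suggest_follows(friends):
    print(follower, "follows", followed)
```

With these three made-up friends, `learner_b` and `learner_c` would both follow `learner_a`, `learner_a` would follow `learner_b` in return, and `learner_b` would also follow `learner_c`.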

Until Lang-8 decides to introduce such a feature, if you speak Chinese, please follow my Twitter account! In exchange, I promise to follow yours. (I wish I could write this in Chinese but it’s too hard and I’m too lazy right now.)

In any case, if you are a Twitter user and you’re using it for language practice, leave a comment with your Twitter link! I write Japanese updates as well so feel free to follow me if it sounds interesting to you.

Update
I didn’t know this but to reply to somebody, you have to start your Twitter message with @[username]. You can tell this is an organic feature and not fully designed as it will reply only to the user’s latest Twitter update. If you want to reply to an older message, you’re out of luck.
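The convention boils down to a check on the very start of the message. A rough sketch in Python follows; the function name is made up, and this is only an illustration of the rule as described above, not how Twitter itself implements it.

```python
import re

# A reply is assumed to be any message that begins with "@" plus a username.
_REPLY = re.compile(r"^@(\w+)")


def reply_target(message):
    """Return the username a message replies to, or None if it is not a reply."""
    match = _REPLY.match(message)
    return match.group(1) if match else None


print(reply_target("@some_user 谢谢!"))
print(reply_target("hello @some_user"))
```

Only the first message counts as a reply here, because the @[username] has to come first.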

Link: My Twitter account

Ruby tags considered harmful

For those of you unfamiliar with the ruby tag, it is an HTML tag that adds tiny readings over kanji. 「ルビ」 is traditionally used in print for archaic kanji or when the author wants to indicate a non-standard reading for the kanji. However, on the net, ruby tags are being abused everywhere I see them. Here’s a simple benchmark (with a neat acronym to make it “official”) for determining whether you’re abusing the ruby tag.

Ruby Abuse Benchmark (RAB)

1. Do you use ruby tags for every kanji?

2. Do you use ruby tags for any kanji that most Japanese people can read?

3. Do you use ruby tags?

If you answered “Yes” to any of the questions above, you are abusing the ruby tag.

This abuse happens most often on sites that are intended for people learning Japanese. For example, this site about the JLPT or Japanese language blogs like the one you’re reading now. I don’t use ruby tags though. Even sites for kids stay away from ruby and just use Hiragana instead. Here’s why you should stay away from them too.

The Technical Reason

Ruby is only included in the XHTML 1.1 specification, which has been around forever and still hasn’t gained much traction. The HTML 4.01 and XHTML 1.0 Transitional DTDs are still being used by the majority of websites that care about standards. This means that if you want to use a schema that the majority of the web is using, <ruby> won’t validate.

Plus, the markup is terribly hard to read and write. Take a look at these markup examples. Imagine doing that for every kanji. Your Japanese text will be indecipherable and an incredible pain to edit.
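To get a feel for how much markup that is, here is a small Python sketch that generates the ruby markup for one short line of dialogue. The helper names `ruby` and `annotate` are invented for this illustration, and the markup pattern is the simple ruby/rp/rt form.

```python
from html import escape


def ruby(base, reading):
    """Wrap one word in simple ruby markup with <rp> fallback parentheses."""
    return (
        "<ruby>" + escape(base)
        + "<rp>(</rp><rt>" + escape(reading) + "</rt><rp>)</rp></ruby>"
    )


def annotate(segments):
    """segments is a list of (text, reading) pairs; reading is None for plain text."""
    return "".join(
        ruby(text, reading) if reading else escape(text)
        for text, reading in segments
    )


line = [
    ("田中", "たなか"), (": はい、", None),
    ("元気", "げんき"), ("です。", None),
    ("早坂", "はやさか"), ("さんは?", None),
]
plain = "".join(text for text, _ in line)
marked_up = annotate(line)
print(marked_up)
print(len(plain), "characters of text became", len(marked_up), "characters of markup")
```

Even with a script doing the typing, the result is what you would have to read and maintain in the page source.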

The Practical Reason

Because XHTML 1.1 hasn’t gained much traction, a majority of browsers don’t support ruby. The only one I’m aware of that does is IE and in today’s world where up to 30% of your visitors might not be using IE, IE-only is not practical.

People without Ruby support will see this.

田中(たなか): はい、元気(げんき)です。早坂(はやさか)さんは?

Terrible, just terrible. It’s totally unreadable. Plus, even if you DID have Ruby support, the text is far too small. It’s a lose-lose situation. The correct use of ruby is to show the readings of a few archaic words that the author assumes will not be readable by his audience or when he wants to expand on the word. It is NOT intended to be used for every kanji. The print is too small for the people who need it and distracting for the people who don’t. Also, it can become a crutch allowing people to never actually read and learn the kanji.

So, even if you can install something such as an extension to make ruby tags work, it’s just not a good idea.

Alternatives

1. CSS mouse-over popups: It’s one simple span tag and it works in all major browsers. It’s also more versatile because you can add more information such as English definitions, etc.

HTML: <span title="たべる – to eat" class="popup">食べる</span>
Appears as: 食べる

I suggest adding a visual highlight so that the reader can easily see which part of the text has a popup, or whether there is a popup at all (the highlight is not supported by some older browsers). You can easily do this by adding some CSS like the following to your stylesheet.

span.popup:hover {
    text-decoration: none;
    color: rgb(159,20,26);
}

Plus, you can easily see the readings for only the words you need, removing the distracting ruby text and preventing the furigana from becoming a crutch.

Here’s a recent convert and look at all the positive comments he’s gotten.

2. Make a list of the vocabulary at the beginning or end of the page so that the reader has something to refer to.

3. Suggest additional tools such as WWWJDIC, 理解.com, moji, and rikaichan so that people can learn to teach themselves. (You know, the whole teach a man to fish thing.)
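If you generate your pages with a script, the span from the first alternative is easy to produce. The following is a minimal Python sketch under that assumption; the `popup` function and its parameters are made up for illustration, and it simply emits the same kind of span shown earlier.

```python
from html import escape


def popup(word, reading, meaning=None):
    """Build a title-attribute popup span for one word."""
    title = reading if meaning is None else f"{reading} – {meaning}"
    # escape() also escapes double quotes, so the title stays a valid attribute value.
    return f'<span title="{escape(title)}" class="popup">{escape(word)}</span>'


print(popup("食べる", "たべる", "to eat"))
```

Escaping matters once you start putting English definitions in the title, since a stray quote or ampersand would otherwise break the attribute.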

Conclusion

I think the first method is good for static resources like my guide to Japanese grammar but when you don’t have the time to add readings and definitions manually all the time (like this blog), you can’t beat the third method. Plus, it helps your readers read any online Japanese text instead of just your own. In the end, whatever method you use, it certainly beats the hell out of writing this for every word that uses kanji.

<ruby>日本語<rp>(</rp><rt>にほんご</rt><rp>)</rp></ruby>

Ah!!! My eyes!!