2008-04-22

OOXML - catching up slowly, slowly, ever so slowly...

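You don't know OOXML? Do you have MS Office 2007 installed? (Don't worry, I won't ask whether it is a legal copy; I know you love illegally using inferior paid software...) Then do you know that Microsoft (through some rather irregular channels) got this new file format turned into an ISO standard? Happy, aren't you? They're great, aren't they?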

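But if they are so great, why can't they even follow their own specification? (A "mere" 6000 pages.) Checking a file produced by Office 2007 yields 122,000 error messages, 17MB in total! It looks like no software on this planet follows this standard yet; not even Microsoft, who pushed it so hard, supports its own specification.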

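The "funny" thing is that they should have checked this kind of thing before making OOXML an ISO standard, not after it has passed. Now they "hope that Microsoft will fix Office 2007". I also hope that someone will wash my motorbike tomorrow. I don't know yet why that person would need to wash it for me, but "I hope"...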

2008-04-21

Poot, poot, poot, poot, pootle...

I doubt that anyone can help me with this (as usual), but I'll try anyway, so at least you can see what entertaining problems are brightening my day: I have been trying to set up a Pootle server for a while. Basically, Pootle is very nice software, allowing very comfortable (though not as stylish as Launchpad) software localization. But...

Pootle comes along with its own web server. Although it is written in Python, setting up a Pootle server is not done by copying a few files into a web server's hunting grounds. (though that should be possible somehow) I couldn't get its requirements fulfilled on Ubuntu 7.04, but it was in the repositories in 7.10, so I set up a machine with that. That machine also runs ISPConfig, supposed to ease account management.

There is a bug in Pootle's package in Gutsy, and the recommended fix did not work for me, but at least Pootle would run... That machine is running Apache, plus a second Apache on port 81 for ISPConfig. Now Pootle comes with its own server on port 8080, and it wants to be executed by root.

Originally I wanted to "put" Pootle into a user account, but the docs are a bit "fuzzy" here: The scripts and archive files they mention do not exist on Ubuntu, and there is no hint as to which files I really need or how I could collect all necessary files manually. This is one problem.

Another problem is that I need to redirect port 8080 from that account (if I get it to work one day) to port 80, for which I probably need mod_proxy. The Pootle docs seem to suggest a forward proxy (though with a few settings not recommended by the mod_proxy docs), while in my understanding a reverse proxy should be used. But again, no idea how to set that up properly...

So why am I creating all that trouble for myself? Short answer: Because I've gone mad. Long answer: I would like to set up a localization site. I would like to use it for localizing Postnuke and some other FOSS. I know that my students will not be interested in this, but rumours claim there are people outside our school... And with localized Chinese versions, a few nice programs may become a bit more popular here in Taiwan...

So, you can't help me either? Never mind, I got used to that, I'll see that I get it done myself...

2008-04-20

Lifelong learning - but not for me!

A while ago I was not amused about a few things happening around me. This actually happens quite often, but most times I just try to ignore it. That time, however, things went a bit beyond that thin red line that should not be crossed, and I was not very calm when I wrote an article about all that. I have calmed down in the meantime, and as usual I try to see things from their funny side. So I deleted the old article and wrote this new one...

I am not sure if you have heard of "lifelong learning". It is all the rage in Taiwan now - at least officially. The government supports it, because they hope that way they will get wiser citizens. The education industry supports it, because it means more money for them. 終身學習 is the term used in Chinese, and a slightly different, but more sarcastic translation could be "learn until you die". (death by learning?)

Being immersed in the education industry, I find it remarkable how members of this industry, while urging everyone else to "learn" (whatever that means) until their last penny is spent, themselves refuse to learn even those things they would need to do their job properly, not to mention anything else. Let me give you an example.

Almost every person working in an administrative office in our "organization" demands to have a computer on their desk. (or right under it, to save space and collect more dust) But it is almost impossible to find any user of an administrative computer willing to learn anything related to the use of that computer. This approach has a number of advantages for the user in question:

- Getting something done takes longer, so the person seems very busy while not actually doing a lot.
- A lot of work can be rejected by simply saying "I don't know how to do that." or "That's computer-related work. That's not our responsibility."

So when an office has to key a lot of student information into the administrative system, they can come and say "Hey, we may be responsible for these student data, but we are using a computer to key these data in, we are keying them into a database on a network, so all of these computer thingies clearly show: This is something the computer centre should do! We need the CC to support us, they should key this in!"

...and their wish was fulfilled... I am not joking here, I was there when that happened and the guy sitting next to me at the CC had to key in all those data. Cunning, isn't it? Do you now understand why it is so important not to learn?

You might think that those users will at least know basic computer usage, so at least their daily work gets done. Right... That's why they file a trouble ticket with the CC, requesting a student helper to be sent over to install "Office", because their computer is missing Word, Excel and Powerpoint. The problem is, all administrative computers got their software installed by copying an image to their harddisk, and that image contained both OpenOffice and Microsoft Office.

So our student helper went over to their office, clicked on "Start - Programs - Microsoft Office" and there they were. "Oh, there they are..." Right, why should you click on "Start" if you want to start using that computer? And how should you know that after only eleven years? Microsoft introduced this menu only in 1995, with Windows 95. If nobody told you, how should you know? But of course you "can" Windows...

Windows... So, suppose you have a computer only used for playing audio and video files, which are usually brought along on flashdisks. What would you have to do to play such a file? Start the computer, plug your flashdisk in, select the action you want, and then doubleclick on your file. Hmm...

How about a computer running Linux? Linux would have the advantage of offering desktops in multiple languages, so people who do not understand Chinese are still able to navigate around and (even more importantly) can read system messages. What steps are necessary here? Pressing the power button, plugging your flashdisk in and then (Gnome will automatically show the contents of your flashdrive) doubleclick on your file.

That's three simple steps, two of which would be necessary on any computer anyway. But would you believe how much energy people are willing to invest into a fight so they don't have to "use" Linux? ("Use" here means the three steps above. This is not an administrative computer, it belongs to an AV lab.) And the reason they give? Well, they don't want to "learn another system". Of course... Learning to click on an icon is very difficult indeed and takes years of practice...

Now, you may ask if I shouldn't give them the "age bonus", because learning becomes more and more difficult at higher ages. Indeed I would, but... What if the people refusing to plug their flashdisks into a Linux machine are younger than me? What if they are considered "highly educated" and "intelligent"?

My mother (who naturally is even a bit older than me) is in contrast not considered "highly educated". But strangely, she never had trouble using HC CAOS, Amiga Workbench, DOS, Geos, Windows (in various incarnations), Linux (with all kinds of desktops, including fvwm2), OS/2 or now OS X. If she wasn't more than 9000km away, maybe I could ask her to "teach" some icon clicking around here...

But why do I worry, when some Taiwanese do not even need to be able to write Chinese on a computer. Yes, that's right. "I can't write Chinese on a computer. But I want one. And I want a student helper to do all the work." And this too was granted...

So, the next time someone tells me I have to use Zhuyin IME, I will just smile and leave that person standing there. After all, a lot of people around here want other IMEs installed, always claiming "This is the only IME I can use." If you can, so can I.

To be honest, in some way I can understand the "I do not and will not understand this" fellows. When I started administrative work at the CC, right when I started teaching at this organization, I barely knew that you needed to tell your computer where to find a DNS server, and I had heard of proxies. But other than that I could only fix the "usual" computer (mostly hardware and Windows) problems.

But I had to learn. Because nobody else around me knew, I had to understand DNS. (That's when I found out how crappy Bind is.) I had to understand proxies, Apache and more. And with every new piece I learned, my work (but unfortunately never my pay) increased. I tried to draw the line at web sites - and "failed". Would you believe that 4000 Taiwanese (some of them with a sheet of paper claiming they are experts in this field) expect one German interpreter to set up a web server and create one (or more) web sites - including content, of course, and that content has to be in Chinese and of course has to be made out of thin air? So one German is expected to write content in Chinese for 4000 Taiwanese who do not quite know how to do this themselves? Yes, I too found that quite interesting at that time...

OK, so because it is text that will be used on the web, people cannot write it - because that's a "homepage", not just some regular text, and "I can not homepages." Right. But they should be able to create documents properly, with fields, paragraph styles, perhaps even as a document template. Or not?

At one point, the CC planned to do training "for OpenOffice". We had convinced the people in charge that OpenOffice was good for us, because it was free of cost. I know, I know, there are more costs involved. But what do you expect from people buying bubble jet printers "because they are cheaper"? So we (I) did some text processing training, disguised as "OpenOffice training".

Every office had to send at least one delegate. (order from high above) Some sent a student helper, because their own time was too precious. I had 33 names on the list, 17 showed up the first time. There were five sessions, and the fifth time my audience consisted of: three people, one of them a student helper.

Well, maybe they all knew about all that already? Hmm, then why have I not seen a single document created by any of our offices using a field or styles? Maybe it's because of the weather, or because of the mosquitoes, but maybe it's just because nobody knows how to do it.


Apologies if you expected something constructive or helpful in this article, but I did state in the beginning that this was a rant... OK, feeling a bit better now, time for something constructive again...

2008-04-16

Translation software: OmegaT (translation memory)

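You could spend a good deal of money on Trados, but maybe you first want to find out whether this kind of software is of any use to you, so you don't want to spend much at the start. Or you are a student and simply don't have much money to spend. Or you have found out that free software is actually pretty good and already does what you need.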

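Anyway, here I want to introduce OmegaT, translation software that I think is really quite good. It is Java software, so if your computer has no Java installed, you cannot run it. If you are not sure whether Java is installed, you can download the package that includes Java. The current official version has no installer yet; you just download the ZIP file and extract its contents to a suitable place. On Windows, you start it by clicking OmegaT.bat.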

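After starting it, we first have to create a new project. A project only suits one language direction (e.g. English to Chinese), but it can contain several documents. The documents of a project share the translation memory and the glossary. When you create a project, the software wants to know a few settings: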


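Most important are the source language and target language settings. Because a sentence carries a reasonably complete meaning, we had best also select sentence-level segmenting. (You'll see in a moment what this is; trust me for now, OK?) For now you'd better not go into "Segmentation" - please trust me on this one too. Everything there is already set up, and the likelihood that you need to change it is low. Besides, you should get to know this software better before you consider modifying these settings.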
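To make "sentence-level segmenting" a little more concrete, here is a minimal Python sketch. It is not how OmegaT does it (OmegaT has its own rules behind that "Segmentation" button); the rule used here, "a segment ends at . ! or ? followed by whitespace", the function name and the sample text are all just assumptions for illustration.

```python
import re

# Assumed rule (not OmegaT's): a segment ends at ., ! or ? followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text):
    """Split a paragraph into sentence-level segments."""
    return [s for s in _SENTENCE_END.split(text.strip()) if s]


# Made-up sample paragraph.
paragraph = (
    "The first pig built a house of straw. "
    "The wolf blew it down! Where did the pig go?"
)
segments = split_sentences(paragraph)
for number, segment in enumerate(segments, start=1):
    print(f"Segment {number:04d}: {segment}")
```

Each of the three sentences becomes its own segment, and that is the unit the translation memory will later remember and compare. A rule this naive would of course trip over abbreviations like "e.g.", which is one reason to leave the real settings alone.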

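You will notice that OmegaT wants to create some further directories below the project directory. Normally there is no need to change these. Source files holds the original (source language) documents; these will not be modified. Translation memory holds the translation memory files, and in glossary you can put your own glossary files. The documents we have finished translating will appear in translated files (target). Once the settings are done, our project of course needs documents.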


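I have already added a source language document: pigs.txt. (I guess you know this story. In a moment you will understand why I am using it here.) OmegaT can handle plain text files, HTML, ODT, and recently it seems to have started supporting XLIFF and OOXML as well. (Grrrrrrr...)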


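Afterwards the main window may look like the one above. Not very exciting, it seems: no 3D, no enemies, no other guns to choose from... Moreover, the fuzzy matches window and the glossary window (not visible at the moment) are both empty, so do I still have to do the translating myself?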

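Yes. At first, this kind of software will not be much help to you. But the longer you use it, the more it helps. For now, we had best happily note that it seems to have separated out the title in the source document. The title is the first segment. Between "Segment 0001" and "end segment" we can write the target language version; above it (in green) there will always be the source language version.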


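Here you can see that inside the segment, both the source and the target language sentence are present. I have finished writing the target language version and can now delete the original sentence. I could actually delete it right at the start, since the source language version above cannot be deleted anyway, but I personally prefer to remove the original sentence only after translating. Still, up to here there seems to be no particular "action"; the software has not helped me at all yet. All right, let's keep translating...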


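Ooops, the software has finally woken up! What happened here? The sentence I am translating now is fairly similar to another sentence I translated before. OmegaT considers the earlier sentence 44% similar to the current one; the words in blue are the places where it differs from the current sentence. Hmm, and then?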
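For the curious, here is a rough sketch of how such a percentage and the "blue words" could be computed. I am assuming a word-by-word comparison with Python's difflib; OmegaT's real scoring may well work differently, and the two sample sentences are made up, so don't expect the same numbers as in the screenshot.

```python
from difflib import SequenceMatcher


def compare(previous, current):
    """Return (similarity in percent, words of `previous` that differ from `current`)."""
    prev_words = previous.split()
    cur_words = current.split()
    matcher = SequenceMatcher(None, prev_words, cur_words, autojunk=False)
    differing = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # "replace" and "delete" mark words of the earlier sentence that
        # have no counterpart in the current one (the "blue" words).
        if tag in ("replace", "delete"):
            differing.extend(prev_words[i1:i2])
    return round(matcher.ratio() * 100), differing


# Made-up sample sentences.
percent, differing = compare(
    "The first little pig built his house of straw",
    "The second little pig built his house of sticks",
)
print(f"{percent}% similar, differing words: {differing}")
```

Seven of the nine words line up, so this particular pair scores 78% with "first" and "straw" flagged. The point is only that the score is a mechanical comparison of words; the software has no idea what the sentences mean.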

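Then I have a choice: I can ignore the software's suggestion. (Don't forget that you have this option.) I can insert the earlier target language sentence into the current segment. (Ctrl-I) Or I can replace the source language sentence in the current segment with the earlier target language sentence. (Ctrl-R)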

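However, 44% is not a very high number, and judging by how large the blue part is, some manual work is still needed whichever option we choose. But we have really only just started, so let's keep translating!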


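Voila! This is more interesting. OmegaT has found two sentences that resemble the one to be translated. The most similar one is of course ranked first, because if I want to use a sentence from the memory, it should be my first choice. Only two words differ and the similarity is 80%, which is quite high. This time I personally choose to replace: I press Ctrl-R to put the first target language sentence into the segment and then change the two differing words.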

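Maybe now you understand why I picked exactly this story: it has some repeated sentences in which only a few words differ, so we get to see the effect this quickly. The longer I use OmegaT, the more of my translated sentences it will remember, giving me more, and more convenient, choices. By the way: although the software remembers a lot of sentences for you, I would also suggest that you memorize its keyboard shortcuts. Only then does the work get really fast...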

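So far we seem not to have seen the glossary window at all. Well, OmegaT is unfortunately not perfect software. This doesn't count as a bug, it is just a user interface issue, but... You might accidentally close one or two of the three windows and then not know how to bring them back, so you might not be able to work normally. Don't worry: just go to Options - Restore Main Window, and it will restore the window layout.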


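Now we can finally see the glossary window too, and it seems to have an opinion as well. OmegaT can use a glossary, a simple dictionary. Such a file is basically a simple text file; each line holds one source language term and one target language term, separated by a tab, like this: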


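You can choose the name of such a file yourself, but the file extension has to be "utf8". If you compare this rather small glossary with the suggestions in the glossary window above and with the sentence about to be translated, you should notice that the software does not mention a translation for bricks. The reason is simple: my glossary only contains the singular brick, but the sentence contains the plural bricks. So you had better watch out for this. (Even though your native language does not particularly make that distinction...)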
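Here is a small sketch of why bricks falls through. It assumes the glossary is matched word by word and exactly, which is simply what the behaviour described above looks like from the outside; I have not checked how OmegaT really does its lookup. The glossary entries, the sentence and the function names are made up for illustration.

```python
def load_glossary(lines):
    """Parse tab-separated lines: source term <TAB> target term."""
    glossary = {}
    for line in lines:
        line = line.rstrip("\n")
        if "\t" not in line:
            continue  # skip empty or malformed lines
        source, target = line.split("\t")[:2]
        glossary[source.strip().lower()] = target.strip()
    return glossary


def lookup(glossary, sentence):
    """Return the glossary entries whose source term occurs as a word in the sentence."""
    hits = {}
    for word in sentence.lower().split():
        word = word.strip(".,!?\"'")
        if word in glossary:  # exact match only: "bricks" != "brick"
            hits[word] = glossary[word]
    return hits


# Made-up glossary content, as it would appear in a .utf8 file.
glossary_lines = ["pig\t豬\n", "wolf\t狼\n", "brick\t磚頭\n"]
glossary = load_glossary(glossary_lines)
hits = lookup(glossary, "The third pig built his house of bricks.")
print(hits)
```

Only pig is found; the plural bricks does not match the singular entry. In this sketch the fix would be to add the plural form as its own line in the glossary file.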

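Of course I can stop midway and shut down the software and the computer. The next time I start OmegaT, I only need to open the earlier project, and it will tell me the progress of each document: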


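When you have finished translating a document, go to Project - Create Translated Documents, and the software will write the target language document into the target language directory (target). The original document is still in the source directory, unmodified.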

Handy, isn't it? And you don't have to spend tens of thousands of NT dollars...

2008-04-12

Conference fun (NKFUST)

It's always fun to see a school with "technology" in its name running into problems with said technology; our school (After almost seven years, General Affairs is still unable to let the bell ring on time - or at all...) is no exception. But this time the topic was translation and interpreting (even though they called it "interpretation"), not technology in language teaching, so let's ignore that.

This was the first time some of my students went with me, and although one of them had pretty high expectations and was a bit disappointed later, I still think the day was not wasted for them. I was in fact surprised to see so many students on the list, even from more northern schools. (Though, funnily, they had not put my name on that list...)

There were bits of theory presented that were not too closely related to my job, and maybe next time I should really bring a camera and snap photos of attendees sunk in deep self-reflection (as long as I am not too busy self-reflecting myself), but there were a few presentations that kept even my students wide awake.

Something I really like about conferences at NKFUST: Foreigners are not considered idiots speaking only their native language, and people do not care what language is used in presentations, questions and answers. If you want to attend such a conference or a certain presentation, you have to be able to understand more than one language, otherwise you're at the wrong party, period.

2008-04-07

Translation software: Translation memory

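In Taiwan, when someone mentions "translation software", most people probably think of one program: Dr. Eye. It's not bad - as a dictionary. And a dictionary is just what it is, an electronic dictionary. I used it myself a few years ago and found then that it had quite a lot of technical (e.g. electronics) terms, and quite accurate ones.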

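As an electronic dictionary on a computer, Dr. Eye can offer some "conveniences": automatically translating the words of a text into another language. Because of this convenience, some teachers of translation don't seem to like Dr. Eye much. They feel that this way it is not the student who did the translating. But Dr. Eye is just a somewhat more convenient dictionary; it cannot replace a person.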

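Around me, though, quite a few people (teachers and students alike) do not see it that way. They think that with a "better translation software" they would turn into great translators! Dream on... If that really worked, why would we still teach translation? Computer technology is not that advanced yet. If you can't do it yourself, it doesn't matter what software you use.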

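But (electronic) dictionaries are not the only software useful for translation (or should I say "written translation"?). I think "translation memory" is more important. Your teacher never mentioned such a thing? Hmm, (in Taiwan) teachers of microcontrollers also seem to rarely mention chips other than the 8051 (e.g. AVR or the MSP series). Anyway, I'll introduce this thing to you now, and then you decide for yourself whether to use it.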

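Translation memory will not make you feel that you have suddenly become Superman. If I don't know a word, I can look it up in a dictionary, and it will tell me a version in the other language. So I suddenly "know" something I didn't know before. Someone who doesn't know much will of course not even notice that the version the dictionary just gave him is completely wrong; he still gets that "Superman feeling". Translation memory won't fool you like that.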

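If you want to use translation memory, you have to translate your text inside this software. At first you won't notice any benefit or help, but the software keeps learning. It records every sentence you translate - source language and target language. Then it keeps checking whether the sentence you are translating right now resembles a sentence it recorded earlier.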

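If the software finds a similar sentence (usually it also tells you how similar, in percent), it reminds you of the sentence you translated before and of its translation. Now you can judge for yourself whether to manually "imitate" the identical parts and translate the differing parts anew, or to replace the current sentence directly with the sentence the software suggested - and afterwards fix whatever needs fixing.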
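The record-and-compare cycle described above can be sketched in a few lines of Python. This is a toy under stated assumptions, not the algorithm of any real product: the class and method names are invented, similarity is measured word by word with difflib, the 40% threshold is arbitrary, and the English-German sentence pairs are made up.

```python
from difflib import SequenceMatcher


class TranslationMemory:
    """Toy translation memory: stores sentence pairs and suggests similar ones."""

    def __init__(self):
        self.entries = []  # (source sentence, target sentence) pairs

    def record(self, source, target):
        """Remember a translated sentence pair."""
        self.entries.append((source, target))

    def suggest(self, source, threshold=40):
        """Return (percent, old source, old target) tuples, best match first."""
        words = source.split()
        results = []
        for old_source, old_target in self.entries:
            matcher = SequenceMatcher(None, old_source.split(), words, autojunk=False)
            percent = round(matcher.ratio() * 100)
            if percent >= threshold:  # arbitrary cut-off for "similar enough"
                results.append((percent, old_source, old_target))
        results.sort(key=lambda item: item[0], reverse=True)
        return results


tm = TranslationMemory()
tm.record("The window is open.", "Das Fenster ist offen.")
tm.record("The door is open.", "Die Tür ist offen.")
suggestions = tm.suggest("The door is closed.")
for percent, old_source, old_target in suggestions:
    print(f"{percent}%  {old_source}  ->  {old_target}")
```

The memory only ever hands back what the translator put in, ranked by similarity. Deciding whether a 75% match is worth reusing, and fixing the part that differs, remains the translator's job.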

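The longer you use this kind of software, the more helpful it gets. It "learns" your style: if you translate something wrong, it "learns it wrong" too. But don't worry, you can of course teach it the correct version later. So why do I think this kind of software is important for translation? It won't do what you can't do yourself, but it will make your work easier and easier.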

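Moreover, if you translate texts on the same topic (be it technical documents or a series of novels), the software knows how you previously translated certain "special" terms, phrases or sentence patterns (which perhaps can't be found in any dictionary at all), so this time you can use the same approach. If you really understand translation, you know that there are often several ways to translate something. Translation memory helps you avoid using a different version in the target language each time.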

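You may now ask why this kind of software works on complete sentences instead of offering you, like Dr. Eye, every single word you translated before. The reason is simple: whether we interpret or translate, we do not translate word by word; we transfer the meaning a person expressed in one language into another language. In the target language we may have to use different words, even completely different sentence patterns; what matters is the meaning. And the smallest unit that provides a reasonably clear meaning is the sentence.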

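Having said so much, I still haven't named any concrete software. All right: if you have money, or you really want to make a living from translation, you can consider buying Trados. A lot of "professional" translators use it, but it is really "not cheap". In Taiwan, I have heard of quotes above NT$50,000, including training. And for the humanities-track ("社會組") people: if you don't spend the money on training, you need not spend it on the software either - you won't know how to use it properly.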

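However good this software may be, it still has some limitations: you have to use Windows, and you have to use Word. Many humanities-track ("社會組") people do not understand technical matters very well and simply use the software that advertising tells them to use (Windows and Word), so there is some other software of this kind for them: Wordfast (shareware) and Wordfisher (freeware). But I still prefer software that doesn't restrict me that much.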

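If you want to use free, open, multi-platform software, have a look at OmegaT. It is standalone software and can handle several file formats. If you want to learn more about it, you can poke around in it yourself, or wait until I have finished writing my next article...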