Sunday, September 12, 2010

Internationalization and Localization

With the release of PEBL Version 0.11, support for non-English character sets is greatly improved.  This includes better handling of accented characters as well as non-Western scripts.

First, for many years, I've included the free Bitstream font series called Vera.  It was hard-coded as the default in many of the test battery experiments and in helper functions like GetSubNum().  What I never knew is that Vera has very poor handling of accented characters.  A project was started to create better international character support for the Vera typefaces; the result is the DejaVu fonts.  According to the DejaVu website, they cover:

  • Latin (including European and African alphabets, IPA, ...)
  • Greek (including polytonic)
  • Cyrillic
  • Georgian
Following scripts aren't available in all the styles:
  • Armenian
  • Hebrew
  • N'ko
  • Tifinagh
  • Lao
  • Canadian Aboriginal Syllabics
  • Ogham
  • Arabic

For PEBL Version 0.11, I made a few changes to help support DejaVu.  First, I created three global variables, called gPEBLBaseFont, gPEBLBaseFontMono, and gPEBLBaseFontSans.  By default, these are set to the appropriate DejaVu font file names.  So, to create a new font, you can still name a font file directly in the MakeFont() command, but you can also pass one of these variables instead.  With this change, most accented characters in European languages should work by default.
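As a rough illustration of the paragraph above, here is a minimal sketch that builds a font from gPEBLBaseFont and displays a label with accented characters.  The sample text, colors, sizes, and coordinates are arbitrary choices of mine, and the script file itself is assumed to be saved as UTF-8.

```pebl
define Start(p)
{
  ## Open the experiment window with a black background.
  gWin <- MakeWindow("black")

  ## Build a 24-point font from the default base font.
  ## Arguments: font file, style (0 = plain), size,
  ## foreground color, background color, anti-aliasing flag.
  fg   <- MakeColor("white")
  bg   <- MakeColor("black")
  font <- MakeFont(gPEBLBaseFont, 0, 24, fg, bg, 1)

  ## Arbitrary sample text containing accented characters.
  label <- MakeLabel("crème brûlée, Ärger, żółć", font)
  AddObject(label, gWin)
  Move(label, 300, 200)
  Draw()

  ## Keep the window open until a key is pressed.
  WaitForAnyKeyPress()
}
```

Swapping gPEBLBaseFont for gPEBLBaseFontMono in the MakeFont() call should give the same label in the monospaced variant.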

But suppose you want to use a different default font, maybe for a language DejaVu doesn't handle?  You can set gPEBLBaseFont to the name of that font file, add the file to the directory the script is in, and use that instead:
 gPEBLBaseFont <- "MyFont.ttf"

This will also change the font used for things like GetSubNum() and EasyLabel, which is handy.

Notice that DejaVu does not handle the so-called "CJK" scripts: Chinese, Japanese and Korean.  PEBL also includes two specialty fonts to help support those languages.  These are "fireflysung.ttf", which handles many Chinese characters, and "UnBatang.ttf", which handles Korean (hangul), Japanese (katakana, hiragana, kanji) and supposedly some Chinese characters as well.  These take up a lot of room, so I'm looking for a single relatively small font that can handle all three.  UnBatang seems to have some problems with Chinese, though.

It gets most of the characters, but shows large squares for some Chinese characters.  If anyone knows a nice GPL font that has good support for all the major CJK character sets, please let me know.
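To try the bundled CJK fonts directly, something like the following sketch should work.  It assumes the two font files named above can be found by MakeFont() under exactly those names; the greetings, sizes, and positions are just placeholder choices, and the script must be saved as UTF-8.

```pebl
define Start(p)
{
  gWin <- MakeWindow("black")
  fg <- MakeColor("white")
  bg <- MakeColor("black")

  ## One font per script, using the file names given in the post.
  chineseFont <- MakeFont("fireflysung.ttf", 0, 32, fg, bg, 1)
  koreanFont  <- MakeFont("UnBatang.ttf", 0, 32, fg, bg, 1)

  ## Placeholder greetings: "hello" in Chinese and in Korean.
  chineseLabel <- MakeLabel("你好", chineseFont)
  koreanLabel  <- MakeLabel("안녕하세요", koreanFont)

  AddObject(chineseLabel, gWin)
  AddObject(koreanLabel, gWin)
  Move(chineseLabel, 300, 150)
  Move(koreanLabel, 300, 250)
  Draw()

  WaitForAnyKeyPress()
}
```

A character that the chosen font lacks will typically show up as an empty square, so this is also a quick way to check coverage for your own text.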

Setting language from the launcher or command line

One of the command-line options added to Version 0.11 was --language.  Put a two-letter code after --language (or in the text box in the launcher), and two things happen.  First, the code you entered becomes accessible via the gLanguage global variable.  Second, if you use 'ko', 'cn', or 'jp', gPEBLBaseFont is set to a font that can display that language.
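Inside a script, gLanguage can then be used to pick the text to display.  The sketch below is only an illustration, not how the test battery does it: the messages are my own, and it assumes the code arrives in gLanguage in lowercase exactly as typed.

```pebl
define Start(p)
{
  gWin <- MakeWindow("black")

  ## Default to English, then override for the codes we have text for.
  msg <- "Press any key to begin"
  if(gLanguage == "ko")
  {
    msg <- "아무 키나 누르세요"
  }
  if(gLanguage == "cn")
  {
    msg <- "请按任意键"
  }

  ## gPEBLBaseFont should already point at a suitable font for ko/cn/jp.
  fg   <- MakeColor("white")
  bg   <- MakeColor("black")
  font <- MakeFont(gPEBLBaseFont, 0, 28, fg, bg, 1)

  label <- MakeLabel(msg, font)
  AddObject(label, gWin)
  Move(label, 300, 200)
  Draw()

  WaitForAnyKeyPress()
}
```

Setting the English text first means an unrecognized code simply falls back to English rather than leaving msg undefined.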

Some of the test battery tests use the two-letter gLanguage code to select the proper language.  Currently, this works for BCST, Bechara's gambling task, and the Tower of London test; however, language support for these tasks is minimal.  So far, user-contributed translations exist for Polish, Chinese, and Korean.  Here's how to help:

In the iowa/ and bcst/ folders, there is another folder called 'translations'.  It contains the default translations for two versions of the task (keyboard and mouse-driven).  Open one of these in a text editor and save a copy in the same folder, using your own two-letter language code.  Translate all of the text in the file.  One thing you should be aware of: line breaks are important in this file.  Only put line breaks at the points where they appear in the original file, even for long instructions.

Translating a new experiment
Only a few of the test battery tests have translation files like this.  I can modify others to support translation if there is demand, but the first thing to do is to edit the .pbl file you care about, translating the instructions and feedback labels.  If you send it to me, I will create a way to select the proper language.  Make sure you save the file as UTF-8 if it uses extended character sets.