Sunday, September 12, 2010

Internationalization and Localization

With the release of PEBL Version 0.11, support for non-English character sets is greatly improved. This includes better handling of accented characters and of non-Western scripts.

Fonts
For many years, I've included the free Bitstream font family called Vera. It was hard-coded as the default in many of the test battery experiments and in helper functions like GetSubNum(). What I never knew is that Vera has very poor handling of accented characters. The DejaVu project was started to extend the Vera typefaces with better international character support. According to the DejaVu website, the fonts cover:

  • Latin (including European and African alphabets, IPA, ...)
  • Greek (including polytonic)
  • Cyrillic
  • Georgian
Following scripts aren't available in all the styles:
  • Armenian
  • Hebrew
  • N'ko
  • Tifinagh
  • Lao
  • Canadian Aboriginal Syllabics
  • Ogham
  • Arabic


For PEBL Version 0.11, I made a few changes to help support DejaVu. First, I created three global variables, called gPEBLBaseFont, gPEBLBaseFontMono, and gPEBLBaseFontSans. By default, these are set to appropriate DejaVu font names. So, to create a new font, you can still just name the font file in the MakeFont() command, but you can also pass one of these variables instead. Now, most accented characters in European languages should work by default.
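As a minimal sketch of what that looks like in a script (assuming MakeFont() takes its usual six arguments: font file, style, size, foreground color, background color, and an anti-aliasing flag), the global can simply stand in for a hard-coded filename. The sample text, colors, and coordinates here are arbitrary choices for illustration:

```pebl
define Start(p)
{
  win <- MakeWindow("grey")

  ## Foreground and background colors for the font.
  fg <- MakeColor("black")
  bg <- MakeColor("grey")

  ## gPEBLBaseFont holds the name of the default font file, so it can be
  ## passed wherever a font filename would normally go.
  font <- MakeFont(gPEBLBaseFont, 0, 28, fg, bg, 1)

  ## A few accented characters; save this script as UTF-8.
  label <- MakeLabel("café, señor, Zürich", font)
  AddObject(label, win)
  Move(label, 320, 240)

  Draw()
  WaitForAnyKeyPress()
}
```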

But suppose you want to use a different default font, maybe for a language DejaVu doesn't handle?  You can set gPEBLBaseFont to the name of that font file, add the file to the directory the script is in, and you can use that instead:
 gPEBLBaseFont <- "MyFont.ttf"

This will also change the font used for things like GetSubNum() and EasyLabel, which is handy.
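A minimal sketch of that override in a full script, assuming the global in question is gPEBLBaseFont (the base font variable introduced above) and that "MyFont.ttf" is only a placeholder for a font file sitting in the same directory as the script:

```pebl
define Start(p)
{
  ## Placeholder filename: replace it with a real font file that is
  ## saved next to this script. Set it before creating any labels.
  gPEBLBaseFont <- "MyFont.ttf"

  win <- MakeWindow()

  ## EasyLabel is expected to pick up the base font set above.
  label <- EasyLabel("Sample text", 200, 200, win, 22)

  Draw()
  WaitForAnyKeyPress()
}
```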

Note that DejaVu does not handle the so-called "CJK" scripts: Chinese, Japanese, and Korean. PEBL also includes two specialty fonts to help support those languages. These are "fireflysung.ttf", which handles many Chinese characters, and "UnBatang.ttf", which handles Korean (hangul), Japanese (katakana, hiragana, kanji), and supposedly some Chinese characters as well. These take up a lot of room, so I'm looking for a single, relatively small font that can handle all three. UnBatang seems to have some problems with Chinese, though, as can be seen below:


It gets most of the characters, but shows large squares for some Chinese characters. If anyone knows a nice GPL font that has good support for all of the major CJK character sets, please let me know.


Setting language from the launcher or command line

One of the command-line options added in Version 0.11 is --language. Put a two-letter code after --language (or in the text box in the launcher), and two things happen. First, the code you entered becomes accessible via the gLanguage global variable. Second, if you use 'ko', 'cn', or 'jp', gPEBLBaseFont is set to a font that can display that language.

Some of the test battery tests use the two-letter gLanguage code to select the proper language. Currently, this works for BCST, Bechara's gambling task, and the Tower of London test; however, language support for these tasks is minimal so far. Initially, user-contributed translations exist for Polish, Chinese, and Korean. Here's how to help:

In the iowa/ and bcst/ folders, there is another folder called 'translations'. It contains the default translations for two versions of the task (keyboard-driven and mouse-driven). Open one of these in a text editor and save a copy in the same folder, naming it with your own two-letter language code. Translate all of the text in the file. One thing you should be aware of: line breaks are important in this file. Only put line breaks at the points where they appear in the original file, even for long instructions.
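To show why the filename and the line breaks matter, here is a rough sketch of how a script could pick up a translation file. This is not the code the battery tasks actually use. The labels-XX-mouse.txt naming pattern, the lowercase conversion, and the fallback to English are assumptions for illustration, and FileExists() may not be available in every PEBL version:

```pebl
define Start(p)
{
  ## Build the filename from the two-letter code,
  ## e.g. "pl" gives translations/labels-pl-mouse.txt (assumed pattern).
  fname <- "translations/labels-" + Lowercase(gLanguage) + "-mouse.txt"

  ## Fall back to the English file if no translation is present.
  if(not FileExists(fname))
  {
    fname <- "translations/labels-en-mouse.txt"
  }

  ## FileReadList returns one list element per line of the file, which
  ## is why a translated file needs its line breaks in the same places
  ## as the original.
  lines <- FileReadList(fname)
  Print("Read " + Length(lines) + " lines from " + fname)
}
```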


Translating a new experiment
Only a few of the Test Battery tests have translation files like this. I can modify others to support translation if there is demand, but the first thing to do is to edit the .pbl file you care about, translating the instructions and feedback labels. If you send it to me, I will create a way to select the proper language. Make sure you save the file as UTF-8 if it uses extended character sets.

15 comments:

Taha Yusuf said...

Hi Sir,
I would like to thank you for the amazing application.
I am an Arabic user, how can I use the application?

Taha Yusuf said...

Thanks

Shane Mueller said...

Taha,

I did a bit of testing, with somewhat promising (but not ideal) results. Arabic is right-to-left, and the PEBL text layout function doesn't know this, but the current 0.11 should handle the fontface properly. Try saving the following to a text file and running it:

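define Start(p)
{
  win <- MakeWindow()
  ## Arabic test string; the script file must be saved as UTF-8.
  arabic <- "هذا هو الاختبار. واحد اثنين ثلاثة"
  ## Show the string at position (200,200) in a size-22 font.
  label <- EasyLabel(arabic,200,200,win,22)
  Draw()
  WaitForAnyKeyPress()
}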

(The Arabic was translated with Google.) In PEBL, the Arabic turns up backward, because PEBL only handles left-to-right layout. So, if you are able to reverse the text in the translated strings, the result should turn out fine, but I can't know for sure. This is not ideal, obviously, but it is workable. The only really problematic part is text entry, which only a few of the battery tests use other than for entering a subject code, and maybe that is something that you can deal with. For the next version, 0.12, I'll try to include a text-reversing function that will make for an easier workaround.

Taha Yusuf said...

Shane,

Many thanks for your reply.
I tried this, actually, but I got small squares instead of text. I can handle it if the text needs to go from left to right, but the problem is that it doesn't produce the right output. I think it may be a font issue, so I tried copying in some Arabic .ttf files, but I still have the same problem.

Shane Mueller said...

This worked for me in Linux using 0.11, although with earlier versions I got the squares issue as well, so make sure you are using 0.11 (in earlier versions, the default font did not support Arabic glyphs). DejaVuSans.ttf should support Arabic, though, and you say you can't get things to work with other Arabic fonts either. It could be that you are saving the file in some Unicode format PEBL can't handle. Make sure you save as UTF-8, not just 'unicode' and not UTF-16. I won't be able to test on Windows for another week or so, but I'll try some things then.

Taha Yusuf said...

Many Thanks, I think this will help me so much.

Shane Mueller said...

Taha,

I was able to make arabic display properly. You need to use UTF-8, not just unicode, which is typically utf-16.

You can download an example at

http://obereed.net/arabic.zip

Shane

Taha Yusuf said...

I will try
Thanks again

Taha Yusuf said...

Shane,

Thank you so much for your help. I ran your example and it worked for me from left to right like you said, but I actually need to change one of the tests that already ships with PEBL.
I need to translate the Card Sorting test, but when I try to change the test into Arabic it gives me undefined symbols.
Finally, what do you mean by "You need to use UTF-8, not just unicode, which is typically utf-16. "? How do I save a file as UTF-8?

Thanks and sorry if i am bothering.

Shane Mueller said...

The fact that you got the test program working is a great start.

First, try translating one of the files in the translations/ subdirectory, and send me the file when you are done. It is likely (because the text probably needs to be 'reversed' on a line-by-line basis, and not as a whole) that it will eventually need to be imported back into the original cardsort.pbl file. Another option might be to translate just the labels and other one-line things, and make screenshots of the more complex instructions from a word processor. You should probably email me directly from this point on, and I'll try to help you make something that works.

Unknown said...

Dear Shane,

I would like to translate the BCST and Stroop into Slovak, and then use them in research. But if I use non-English characters, the result is just squares, etc. Even the manual did not help me figure out how to make the DejaVu fonts the default. Can you provide a description, preferably step by step? Thank you very much in advance.

Ivan

Shane Mueller said...

Ivan,

I'm assuming you have already found the labels-en-mouse.txt files for the BCST, and are translating that. It probably needs to be saved in UTF-8 for PEBL to display it correctly--I'm guessing that is your problem. Send me a copy of your current translation file and I'll probably be able to help (my email is in the cardsort.pbl file).

For the stroop task, it is complicated to translate for a number of reasons, including the fact that it looks for matches to the name of the color, which will change in different languages. I intend eventually to make it possible to translate, but I'm not sure when I'll get to it. I have a new version that is easier to translate, and is a much better test overall, and it will be released in the next release. Again, if you contact me, I'll send you a copy and give instructions on how to do it.

Shane

Unknown said...

Dear Shane,

I would like to use the ANT test and I have two small problems. First, there is a character in the task (a "star" that cues the location of the target stimulus) that's not displayed properly during the actual task. During the instruction phase I see the star, but later, during the practice phase and test phase, I see some other weird characters instead of the star. I guess it has something to do with encoding, but I have no idea what to do. Second, I would like to use the script with Hungarian instructions. I translated the texts and labels and saved the file in UTF-8 format, but I couldn't find your email address to send it to you as you suggested.
Is there a fast way to do the translations and to correct for the 'star' character?

Your help is very appreciated.
Bests,

Emese Hallgató

Unknown said...

(i just successfully translated the script and i am using a different character in UTF-8 instead of the star. now everything works properly. No need to do anything for me.:)

bests,
Emese Hallgató

Unknown said...

Emese,

Please send your translated script to pebl-lists@lists.sourceforge.net. This will make it available to future users.

Shane