Tutorial: International Character Sets
This tutorial by Chris Wagg explains international character sets. The example program shows how to select different character sets.

 

Click here to download an example program.

If you want to display international text, all you have to do is assign it to a control's Text or Caption property, right? Wrong!

That method might work for most languages of the Western Hemisphere, but it won't work for the rest of the world. To display some international text properly, you need to manipulate the seldom-used Character Set property (.Charset) of the seldom-used Font Object (Standard OLE Types library). Using the Font Object is tricky at first, because setting an unsupported property value usually does not raise an error. Some programmers hate this, but it's beneficial: if a bad font name or font size were applied blindly, the whole system could crash! Instead, the Font Object tries to be as accommodating as possible. The trick is to set a property, then verify that the property was actually set.
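
The set-then-verify idea can be sketched roughly as follows. TrySetFont is a hypothetical helper, not part of the example project, and it assumes the control passed in exposes a Font property. Whether a rejected value raises an error or is silently replaced may vary with the font and the Windows version, so the sketch allows for both.

```vb
' Hypothetical helper (not part of the example project).
' Sets a control's font name and character set, then reads both back
' to check whether the Font object really accepted them.
Public Function TrySetFont(ByVal ctl As Control, _
                           ByVal fontName As String, _
                           ByVal charsetValue As Integer) As Boolean
    On Error Resume Next              ' a rejected value may or may not raise an error
    ctl.Font.Name = fontName          ' set the name first, then the charset
    ctl.Font.Charset = charsetValue
    If Err.Number <> 0 Then
        Err.Clear
        Exit Function                 ' function result stays False
    End If
    On Error GoTo 0

    ' Verify: compare what was requested with what the object now reports.
    TrySetFont = (StrComp(ctl.Font.Name, fontName, vbTextCompare) = 0) _
                 And (ctl.Font.Charset = charsetValue)
End Function
```

A call such as TrySetFont(Text1, "Arial", 161) would then return True only if the text box reports both the requested font and the requested character set afterwards (Text1 is an assumed control name).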

While almost every object in VB supports font properties, you need the Font Object to set and read the Font's Character Set properly (it is not exposed anywhere else). Simply setting the .FontName property will not work in all cases. Using the Font Dialog (common dialog) has limitations too: while it lets you pick the Character Set (thinly disguised as "Script" at the bottom of the dialog), it doesn't let you set or return the value without resorting to fancy API tricks (who has this trick? I want it!).

Character Sets, simply put, are display mappings between single-byte or double-byte values and the actual character glyphs seen on the screen or printer. Fonts are complex and beyond the scope of this tutorial, but most glyphs are stored as vector outlines (strokes) or raster images. The "Western" Character Set is the default for most Windows installations and covers, basically, North and South America and parts of Western Europe. Western handles most Western European languages written in the Latin alphabet, such as English, French, German, Spanish, and Italian.

But what about all those other countries and languages that are not considered Western? There are provisions for Greek, Turkish, Central European, Russian (Cyrillic), Hebrew, Arabic, the Baltic states, Chinese, Japanese, Korean, etc. Most likely there's a Character Set for almost every country, language, and written script. But a Character Set is not only a language or a country. It is also a computer's representation of what the characters look like: DOS, Symbol, and Mac are included too. Why Mac? Much early desktop font technology was developed on Macintosh computers and many fonts were initially ported from the Mac, so a Mac Character Set is included for the fonts that need it.
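
For reference, the character sets named above correspond to numeric values defined in the Windows GDI headers (the *_CHARSET constants). The enum below is only a convenience sketch: the enum name and member names are made up for illustration, while the numbers are the standard Windows values. It belongs in the declarations section of a module.

```vb
' Illustrative enum; the names are hypothetical, the values are the
' standard Windows *_CHARSET constants.
Public Enum CharsetValue
    csWestern = 0               ' ANSI_CHARSET
    csDefault = 1               ' DEFAULT_CHARSET
    csSymbol = 2                ' SYMBOL_CHARSET
    csMac = 77                  ' MAC_CHARSET
    csShiftJis = 128            ' SHIFTJIS_CHARSET (Japanese)
    csKorean = 129              ' HANGUL_CHARSET
    csChineseSimplified = 134   ' GB2312_CHARSET
    csChineseBig5 = 136         ' CHINESEBIG5_CHARSET
    csGreek = 161               ' GREEK_CHARSET
    csTurkish = 162             ' TURKISH_CHARSET
    csHebrew = 177              ' HEBREW_CHARSET
    csArabic = 178              ' ARABIC_CHARSET
    csBaltic = 186              ' BALTIC_CHARSET
    csCyrillic = 204            ' RUSSIAN_CHARSET
    csCentralEuropean = 238     ' EASTEUROPE_CHARSET
    csDos = 255                 ' OEM_CHARSET
End Enum
```

With the enum in place, a line such as Text1.Font.Charset = csGreek reads more clearly than a bare 161 (Text1 is an assumed control name).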

Included is a project to manipulate the Font Object's Character Set (see Charsets.frm). Once you see and understand the differences, you'll have taken your first step toward internationalization. Mind you, Character Sets are simplistic and not the complete answer: for example, Russia has hundreds of different dialects and languages, and Cyrillic is not used for all of them. Character Sets are also only part of the puzzle. Other pieces include Unicode characters (in different flavors), Wide characters, and Double-Byte characters; Locales, Code Pages, and Sort Orders; text display methods (right-to-left versus left-to-right); IME handlers; encoding methods; etc.


Okay, I'll stop yakking and show you some real examples:

The "Western" character set is not ideal for some text:

    The Greek Alphabet:
    ÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÓÔÕÖ×ØÙ

    That's not Greek! Here's how to display it properly!

    The Greek Alphabet:
    ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩ
    αβγδεζηθικλμνξοπρστυφχψω

    (Upper case + lower case)
    Αα, Ββ, Γγ, Δδ, Εε, Ζζ, Ηη, Θθ, Ιι, Κκ, Λλ, Μμ, Νν, Ξξ, Οο, Ππ, Ρρ, Σσ/ς, Ττ, Υυ, Φφ, Χχ, Ψψ, Ωω

    Alpha, Beta, Gamma, Delta, Epsilon, Zeta, Eta, Theta, Iota, Kappa, Lambda, Mu, Nu, Xi, Omicron, Pi, Rho, Sigma, Tau, Upsilon, Phi, Chi, Psi, Omega

Much better. All I did was change the Character Set to Greek. Very simple, if you know what to do. See the Charset.frm example for some code.
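
As a rough illustration of that change (a minimal sketch, not the code from the example project), the following assumes a form with a TextBox named Text1, a system whose default code page is Western, and a font such as Arial that includes Greek glyphs. The value 161 is the Windows GREEK_CHARSET constant.

```vb
' Sketch for a form with one TextBox named Text1 (an assumed control name).
Private Sub Form_Load()
    Const GREEK_CHARSET As Integer = 161   ' value from the Windows GDI headers

    With Text1.Font
        .Name = "Arial"                    ' assumed to include Greek glyphs
        .Charset = GREEK_CHARSET           ' switch the mapping from Western to Greek
    End With

    ' On a Western system these five byte values display as the first five
    ' accented letters of the garbled line above; with the Greek character
    ' set they display as Alpha through Epsilon.
    Text1.Text = Chr$(&HC1) & Chr$(&HC2) & Chr$(&HC3) & Chr$(&HC4) & Chr$(&HC5)
End Sub
```

Commenting out the .Charset line should bring back the accented Western letters, which makes the difference easy to see side by side.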

Now let's put all this in perspective for international programming. Here are some resource strings and the proper character sets to use:

    English~ (Western Character Set)
    What is this?

    Basque~ (Western Character Set)
    Zer da hau?

    French~ (Western Character Set)
    Qu'est-ce que c'est?

    Spanish~ (Western Character Set)
    ¿Qué es esto?

    Portuguese~ (Western Character Set)
    O Que É Isto?

    German~ (Western Character Set)
    Direkthilfe

    Greek~ (Greek Character Set)
    Τι είναι

    Turkish~ (Turkish Character Set)
    Bu Nedir

    Russian~ (Cyrillic Character Set)
    Что это такое?


FAQs

Q: Are these all the Character Sets available?
A: No! There are also ChineseBig5, ChineseSimplified, ShiftJis (Japanese), etc. I didn't include them since they didn't seem to work with this example. If you know the value, you can enter it directly into the combo box without modifying the code. For example, the value for ShiftJis is 128.

Q: The "Mac" Character Set doesn't seem to work. What's up?
A: Test and play with more than one font. Mac works on my system when using the fonts "Algerian", "Elephant", etc.

Q: I can't display all the Character Sets shown. How do I?
A: Not every font supports every character set. Times New Roman, Courier New, Arial, and Tahoma seem to support the most character sets. Experiment and play.

Q: I have those fonts, but I still can't display all the charsets shown. Is there another way?
A: It depends on which version of 32-bit Windows you run. Most character sets should work on all versions, and newer versions have more support. It also depends on which versions of Office and IE are installed, since they add underlying support files and fonts. Other software adds this support too.

Q: I don't have or don't want IE or Office. What's another way to get charset support?
A: Go to Control Panel > Add/Remove Programs > Windows Setup > Multilanguage Support and add as many of the available languages as you want.

Q: I can display all or most charsets shown, but when I go to the Control Panel, it shows no Multilanguage support installed. What's up?
A: Like I said above, some software, such as IE, adds support for the underlying files. Too bad it doesn't show up as being installed.


 
Copyright © 1997-2001 Rocky Mountain Computer Consulting, Inc.   All rights reserved.
www.vb-helper.com/tut11.htm Updated