By
Adam Skutt (mailto:askutt@wnec.edu),
Stephan Windischmann (mailto:windi@arslinux.com),
Amit Gurdasani (mailto:amit@arslinux.com),
Joe Sweeney (mailto:joe@hopelost.net)
We're back. Did you miss us?
You shouldn't have to in the future, as we strongly believe we have the
infrastructure in place to dish up fresh servings on a weekly basis.
This week,
Linux.Ars looks at internationalization and localization of the Linux desktop,
something at which the system shines, as well as Ghost for Unix, a portable hard
drive imaging program. Additionally, everyone's favorite retailer is getting
even deeper into the Linux game.
Several of
the GNOME Project (http://www.gnome.org/)
servers were compromised last week, leaving various services unavailable. All
critical GNOME web sites and the main FTP archive are running again; only minor
sites, such as
art.gnome.org, still remain unavailable. As a result of this, the release of
GNOME 2.6 was
delayed until today (http://www.gnomedesktop.org/article.php?sid=1713&mode=thread&order=0&thold=1),
even though no code has been compromised. The initial discovery of the intrusion
is detailed
here (http://mail.gnome.org/archives/gnome-announce-list/2004-March/msg00113.html).
Updates about the intrusion can be found in this
post (http://mail.gnome.org/archives/gnome-hackers/2004-March/msg00019.html)
to the gnome-hackers mailing list.
The world's
largest retailer,
Wal-Mart (http://www.walmart.com/),
has begun selling Microtel PCs bundled with
Sun Microsystems' Java Desktop System (http://wwws.sun.com/software/javadesktopsystem/),
Sun's Linux distribution. There are several models available, ranging from
US$298 to US$698. The US$398
Microtel SYSWM8003 (http://www.walmart.com/catalog/product.gsp?cat=3951&dept=3944&product_id=2592735&path=0%3A3944%3A3951%3A41937%3A86796%3A132690)
comes with an AMD Athlon XP 2400+ processor, 128MB of memory, a CD-ROM drive, a
40GB hard drive and Sun's StarOffice software suite, but no monitor. The US$698
SYSWM8006 (http://www.walmart.com/catalog/product.gsp?cat=3951&dept=3944&product_id=2592739&path=0%3A3944%3A3951%3A41937%3A86796%3A132690)
has an Intel P4 processor, 256MB of memory, an 80GB hard drive and a
CD-RW/DVD-ROM combination drive. It should be noted that these are not the only
Linux PCs that Wal-Mart sells, as it also ships PCs with
LindowsOS installed (http://www.walmart.com/catalog/product_listing.gsp?cat=96356&path=0%3A3944%3A3951%3A41937%3A96356)
and
Lycoris Desktop/LX installed (http://www.walmart.com/catalog/product_listing.gsp?cat=106560&path=0%3A3944%3A3951%3A41937%3A106560).
Wal-Mart seems determined to be the lowest-cost PC retailer around, and if they
can convince customers that not having Windows XP is no problem, they could be
the ones spearheading the adoption of Linux on the desktop.
With software and hardware getting cheaper and easier to access, computing is
becoming increasingly international in scope, and demand is growing for the
ability to compute in non-English languages and non-Roman scripts. The past few
years have seen releases from commercial operating system and productivity
software vendors gain support for input, display and printing that complies with
national standards for scores of locales. Fortunately for us, Linux has
excellent multilingual support.
Internationalization (i18n, for I–18 letters–N) and localization (l10n, for L–10
letters–N) are terms used to describe the typical efforts involved in getting a
piece of software to speak different languages.
Internationalization refers to the ability of software to deal with input and
output in various locales: the software presents an interface that can handle
the characters of the language used in the user's locale, and items such as
date and time formats, digit grouping, currency units and units of measurement
follow the conventions of that locale.
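To make the idea concrete, here is a minimal Python sketch of locale-dependent formatting. The CONVENTIONS table is hand-written for illustration (it is an assumption of this example, not read from the system's locale database), and the function names are hypothetical. Real software would normally get these values from the C library's locale data.

```python
import datetime

# Hand-written conventions for two locales (illustrative only; a real
# application would query the system's locale database instead).
CONVENTIONS = {
    "en_US": {"group": ",", "decimal": ".", "date": "%m/%d/%Y"},
    "de_DE": {"group": ".", "decimal": ",", "date": "%d.%m.%Y"},
}

def format_number(value, locale_name, decimals=2):
    """Format a number using the digit grouping of the given locale."""
    conv = CONVENTIONS[locale_name]
    # Python's own format uses "," for grouping and "." for the decimal point.
    text = f"{value:,.{decimals}f}"
    # Swap separators through a placeholder so the two replacements
    # do not overwrite each other.
    text = text.replace(",", "\0").replace(".", conv["decimal"])
    return text.replace("\0", conv["group"])

def format_date(day, locale_name):
    """Format a date using the date layout of the given locale."""
    return day.strftime(CONVENTIONS[locale_name]["date"])

if __name__ == "__main__":
    day = datetime.date(2004, 3, 31)
    for name in CONVENTIONS:
        print(name, format_number(1234567.891, name), format_date(day, name))
```

The same number and date come out as 1,234,567.89 and 03/31/2004 for en_US, but as 1.234.567,89 and 31.03.2004 for de_DE.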
Localization
is a related concept. It refers to the ability of software to provide a user
interface in the language specified by the locale. Usually, this is accomplished
by translating all the text that the software presents into the languages that
the software supports, and depending on the locale, choosing the appropriate
translation to present to the user.
A locale
usually encompasses the specific dialect of a language used in a region (often a
country), occasionally specifying the character set used for the script,
which standardizes the representation of the alphabet, numerals, diacritic marks
and symbols used in text written in the language.
Increasingly, the character set of choice is Unicode. An encoding is a mapping
from the machine's numerical representation of a character to the textual
character itself (not necessarily the actual glyph displayed, for glyphs can
result from the combination of letters, diacritics and the like). Certain
Unicode-based encodings are more popular than others, such as UTF-8 (a
variable-length encoding in which the first 128 code points are encoded exactly
as in ASCII, while the first 256 Unicode code points match the ISO 8859-1 Latin
1 character set used for most Western European languages) and UCS-2 (a 16-bit
encoding of a subset of Unicode used pervasively by Windows NT and derivatives).
In Linux, the most popular Unicode encoding is UTF-8. Other encodings tend to be
popular in certain locales; for instance, in the US and many Western European
nations, ISO 8859-1 (Latin 1) and ISO 8859-15 (Latin 9) are popular; in Taiwan
and China, the Big5 and GB2312 encodings, respectively, are widely used; and in
Japan, the EUC-JP and Shift-JIS encodings are frequently used. The reasons for
using non-Unicode character sets are varied; for instance, the national
encodings may be richer than the Unicode representation of the script, or the
use of the character set may be deeply entrenched.
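The difference between these encodings is easy to see by encoding the same text several ways. The following Python sketch is only an illustration: the sample strings are our own, and UTF-16 (little-endian) stands in for UCS-2, since the two agree for the characters used here.

```python
# Sample strings written with escapes so the code points are unambiguous:
# plain ASCII, Latin with an accent, and the word "Hindi" in Devanagari.
SAMPLES = {
    "ascii": "cafe",
    "latin": "caf\u00e9",
    "devanagari": "\u0939\u093f\u0928\u094d\u0926\u0940",
}

def encoded_sizes(text):
    """Return the byte length of text in each encoding, or None when the
    encoding's character set cannot represent the text at all."""
    sizes = {}
    for codec in ("utf-8", "iso-8859-1", "utf-16-le"):
        try:
            sizes[codec] = len(text.encode(codec))
        except UnicodeEncodeError:
            sizes[codec] = None
    return sizes

if __name__ == "__main__":
    for label, text in SAMPLES.items():
        print(label, len(text), "code points", encoded_sizes(text))
```

ASCII text takes one byte per character in UTF-8, the accented letter takes two, and each Devanagari code point takes three. ISO 8859-1 cannot represent the Devanagari string at all. Note also that the six Devanagari code points are displayed as fewer glyphs, which is the distinction between characters and glyphs drawn above.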
In Linux, there is no standardized method for developers to internationalize or
localize their applications; the method used depends on the software's user
interface toolkit, its licensing and so on. For instance, GTK+ and GNOME
applications frequently use the GNU gettext library (LGPL-licensed), a
convenient framework for incorporating and maintaining translations of the
application's text into various languages, and the Pango library
(LGPL-licensed) to lay out text in the Unicode character set.
Applications using the Qt widget toolkit can use Qt's built-in means for dealing
with translations, or (in the case of applications using the KDE framework) can
use gettext. Applications such as MULE for XEmacs have their own mechanisms for
internationalization and localization. Conversions between encodings can be
accomplished with the iconv library (LGPL-licensed).
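For a feel of what these two jobs look like in code, here is a small Python sketch. It uses Python's standard gettext module, which reads GNU gettext message catalogs, rather than the C libraries named above. The domain name "demo-app" and the catalog directory are hypothetical, so the lookup deliberately falls back to the untranslated string. The convert_encoding helper is our own stand-in for what iconv does.

```python
import gettext

def load_translator(domain, localedir, languages):
    """Return a lookup function for translated messages.

    With fallback=True, a missing catalog yields a translator that simply
    returns the original string instead of raising an error.
    """
    catalog = gettext.translation(
        domain, localedir=localedir, languages=languages, fallback=True
    )
    return catalog.gettext

def convert_encoding(data, source, target):
    """Re-encode bytes from one character encoding to another."""
    return data.decode(source).encode(target)

if __name__ == "__main__":
    # Hypothetical domain and path: no catalog exists, so the English
    # source string comes back unchanged.
    _ = load_translator("demo-app", "/nonexistent/locale", ["es"])
    print(_("Open file"))

    # "cafe" with an acute accent, ISO 8859-1 bytes in, UTF-8 bytes out.
    print(convert_encoding(b"caf\xe9", "iso-8859-1", "utf-8"))
```

Marking every user-visible string with a lookup call such as _() is the usual gettext convention; the translators then supply one catalog per language without touching the code.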
However, as
far as end users are concerned, things are much simpler. On a system-wide
scale, the locale can be set by fiddling with a number of environment variables
in the configuration file of your favorite shell (e.g. /etc/bash.bashrc for
bash, /etc/csh.cshrc for csh and tcsh, /etc/zshenv for zsh, /etc/profile for sh,
ksh and pdksh, and so on) and in configuration files for various components,
such as /etc/gdm/gdm.conf for the GNOME Display Manager (the graphical login on
GNOME systems). On a per-user scale, these settings can be made in your shell's
configuration file, e.g. .bashrc for bash, .cshrc for csh/tcsh, .zshrc for zsh,
and so on. If you log in graphically, it might also help to set it in your .xsession
(graphical login script) if you have one. There are a number of available knobs
to turn:
Usually, just setting the LANG variable (for many applications) and the LANGUAGE
variable (for software such as GNOME) is sufficient.
# My language is Spanish as is written in the U.S., using the Unicode
# character set, in the UTF-8 encoding.
LANG=es_US.UTF-8
LANGUAGE=es_US.UTF-8
export LANG LANGUAGE
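A locale name such as es_US.UTF-8 packs several fields together, and more than one variable can supply it. The Python sketch below is an illustration with hypothetical function names: it splits a locale name into its parts and applies the POSIX lookup order, in which LC_ALL overrides the per-category variable (LC_MESSAGES, LC_TIME and so on), which in turn overrides LANG. The LANGUAGE variable is a GNU gettext extension for choosing translations and is not modeled here.

```python
import re

# language[_territory][.codeset][@modifier]
LOCALE_RE = re.compile(
    r"^(?P<language>[A-Za-z]+)"
    r"(?:_(?P<territory>[A-Za-z0-9]+))?"
    r"(?:\.(?P<codeset>[^@]+))?"
    r"(?:@(?P<modifier>.+))?$"
)

def parse_locale(name):
    """Split a locale name into language, territory, codeset and modifier."""
    match = LOCALE_RE.match(name)
    if match is None:
        raise ValueError(f"not a locale name: {name!r}")
    return match.groupdict()

def effective_locale(env, category="LC_MESSAGES"):
    """Return the locale in effect for a category, given a dict of
    environment variables. Unset and empty variables are skipped."""
    for variable in ("LC_ALL", category, "LANG"):
        value = env.get(variable)
        if value:
            return value
    return "C"  # the default locale when nothing is set

if __name__ == "__main__":
    env = {"LANG": "es_US.UTF-8", "LC_TIME": "en_US.UTF-8"}
    print(effective_locale(env, "LC_MESSAGES"))
    print(effective_locale(env, "LC_TIME"))
    print(parse_locale("es_US.UTF-8"))
```

To inspect a real session, pass dict(os.environ) as env.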
The locales
you intend to use must be generated first; to do this, you edit /etc/locale.gen
and run the locale-gen utility. Here's a sample /etc/locale.gen:
en_US ISO-8859-1
en_US.UTF-8 UTF-8
es_US.UTF-8 UTF-8
Running
locale-gen results in this output:
root@athena:~# /usr/sbin/locale-gen
Generating locales...
en_US.ISO-8859-1... done
en_US.UTF-8... done
es_US.UTF-8... done
Generation complete.
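If you maintain /etc/locale.gen by hand, a quick sanity check can catch malformed lines before you run locale-gen. The Python sketch below assumes the format used on Debian-style systems, where each entry is a locale name followed by whitespace and a character set, and where # starts a comment. The function name and sample text are our own.

```python
def parse_locale_gen(text):
    """Return (locale, charset) pairs from the contents of a locale.gen file.

    Raises ValueError for any entry that is not exactly two fields.
    """
    entries = []
    for line in text.splitlines():
        # Drop comments and surrounding whitespace; skip blank lines.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(f"expected '<locale> <charset>', got: {line!r}")
        entries.append((fields[0], fields[1]))
    return entries

SAMPLE = """\
# Locales to generate
en_US ISO-8859-1
en_US.UTF-8 UTF-8
es_US.UTF-8 UTF-8
"""

if __name__ == "__main__":
    for locale_name, charset in parse_locale_gen(SAMPLE):
        print(locale_name, charset)
```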
To configure the system for text input in a certain language using a particular
keyboard layout, you can use XKB, the X Keyboard extension, with XFree86. To do
this, edit your XF86Config or XF86Config-4
file, usually found in /etc/X11 or /usr/X11R6/lib/X11. Alternatively, you can use
the setxkbmap tool.
There are
various XKB settings that can be set:
There are
other settings; for more information, see the
XFree86.org documentation (http://www.xfree86.org/current/XKB-Config.pdf)
on XKB. The available choices for these and other settings can be found in the
file /usr/X11R6/lib/X11/xkb/xfree86.lst.
The
configuration looks like this:
Section "InputDevice"
    Identifier "Keyboard1"
    Driver "Keyboard"
    # We want the US keyboard layout with an optional Arabic
    # keyboard layout. (You can specify multiple layouts -- up
    # to four -- only with XFree86 4.3.0 or later.)
    Option "XkbLayout" "us,ar"
    # 104-key PC keyboard with the right-hand Windows Logo key
    # mapped to the Compose key to combine letters and accents.
    Option "XkbModel" "pc104compose"
    # We want to use the Alt-Shift key combination
    # to switch languages. We also want to swap the left-hand
    # Ctrl and Caps Lock keys.
    Option "XkbOptions" "grp:alt_shift_toggle+ctrl:caps_ac"
EndSection
You can try
out the settings in the current session using the setxkbmap utility.
setxkbmap -layout us,ar -model pc104compose -option grp:alt_shift_toggle+ctrl:caps_ac
Users of
languages where it isn't easy to use a keyboard layout for text entry
(especially Chinese, Japanese and Korean) can frequently use input method
editors using the XIM (X Input Method) API. There is a
good HOWTO (http://www.suse.de/~mfabian/suse-cjk/)
on this topic.
Of course,
in order to be able to view text in a particular language, you need a font that
provides glyphs for that language in the character set of your choice. One of
the most easily obtainable Unicode fonts that has support for several scripts is
the
GNU Freefont collection (http://savannah.nongnu.org/download/freefont/).
Another set of fonts that carries most Latin glyphs as well as scripts such as
Arabic, Hebrew and Cyrillic is Microsoft's
core fonts (http://corefonts.sourceforge.net/)
for the web. There are various web sites dedicated to information about fonts
available (http://www.alanwood.net/unicode/fonts.html)
for many languages in different encodings.
Putting all
of this together, it is possible to have a desktop environment in one's native
language (even if that language isn't English) by making a few settings. For
instance, the following screenshot shows a recent GNOME 2.5 snapshot (mostly) in
the Hindi language (locale hi_IN.UTF-8, XKB layout dev):
[Screenshot: Hindi-gnome.png]
You might
notice that the quality and extent of the translation varies from software to
software and translator to translator. Localization is a painstaking procedure,
and not all translations are alike in quality and availability.
If you'd
like to participate in localization efforts for your language, several
open-source software projects have internationalization and localization
projects that could use your help.
KDE (http://i18n.kde.org/),
GNOME (http://developer.gnome.org/projects/gtp/)
and
OpenOffice.org (http://l10n.openoffice.org/)
all have localization projects; there is also the
Free Software Translation Project (http://www2.iro.umontreal.ca/~gnutra/po/HTML/).
Modifying or
designing software to allow for internationalization is beyond the scope of this
write-up. However, there are
several (http://graal.ens-lyon.fr/~mquinson/l10n.html)
good (http://handhelds.org/~zecke/apidocs/qt/unicode.html)
resources (http://mail.gnome.org/mailman/listinfo/gtk-i18n-list)
available.
A note:
While Mozilla the web browser has excellent multilingual support, it tends to
fall down a bit on displaying complex scripts such as Thai and several Indic
scripts, such as Devanagari. There is a
Bugzilla report filed (http://bugzilla.mozilla.org/show_bug.cgi?id=215219)
and a patch in the works, with a
patched binary (ftp://ftp.mozilla.org/pub/mozilla.org/mozilla/releases/mozilla1.6/contrib/mozilla-i686-pc-linux-gnu-gtk2-pango.tar.gz)
of Mozilla available for download. This patched binary works well for the
complex scripts, but falls down on right-to-left support, which the regular
unpatched Gecko engine gets right. We hope that these issues will be resolved
soon.
GNOME tweaks
While many folks are content with the way their GNOME desktop works, others
among us are tweakers at heart, and are never content with what's served to us.
Others find some components of the desktop unsatisfactory. The Metacity window
manager, for example, is a bit anemic in features, especially compared to the
likes of Sawfish and Enlightenment, which let the user do things such as match
windows dynamically based on their X11 window class and set special properties.
We've got a few GConf settings (set with the gconf-editor tool
or the gconftool-2 tool) to help you out.
Ghost for Unix: Portable Hard Drive Imaging for Unix
[Image: G4u-welcome.gif]
Ghost for
Unix, or
g4u (http://www.feyrer.de/g4u/),
is a NetBSD-based bootable floppy or CD-ROM that allows you to clone your hard
disks for backup or to do mirrored installations. The floppy and CD offer two
functions. One uploads a compressed image of the local hard disk to an FTP
server. The other retrieves that image via FTP, uncompresses it and writes it
back to the disk. The network configuration is obtained via DHCP.
Because the hard disk is copied as a raw, compressed image, g4u can be used
with any filesystem and any Linux distribution. Backups of entire local disks
as well as individual partitions are supported. Since g4u reads the disk bit by
bit, from the first bit to the last, a full-disk image includes the MBR, the
partition table and the partitions themselves. g4u works with both IDE and SCSI
drives of any size and geometry. The default compression is gzip (Deflate) at
level 9, but this can be changed to any level from 1 (fast, little compression)
up to 9 (slow, maximum compression). Requirements include an empty 1.44MB
floppy disk or an empty CD, an FTP server with sufficient free space for the
hard drive images and a DHCP server. You can download the
floppy image here (http://www.feyrer.de/g4u/g4u-1.14.fs)
and the
CD-ROM image here (http://www.feyrer.de/g4u/g4u-1.14.iso).
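The core idea, reading a disk as a plain stream of bytes and compressing it on the way out, can be sketched in a few lines of Python. This is not g4u's own code, just an illustration of the technique under our own assumptions: the function names are hypothetical, and in-memory buffers stand in for the disk device and the FTP connection.

```python
import gzip
import io
import shutil

CHUNK_SIZE = 1024 * 1024  # copy one mebibyte at a time

def compress_image(source, destination, level=9):
    """Copy raw bytes from source to destination, gzip-compressing them.

    level runs from 1 (fast, little compression) to 9 (slow, maximum).
    """
    with gzip.GzipFile(fileobj=destination, mode="wb", compresslevel=level) as gz:
        shutil.copyfileobj(source, gz, CHUNK_SIZE)

def restore_image(source, destination):
    """Decompress a gzip image from source and write the raw bytes back."""
    with gzip.GzipFile(fileobj=source, mode="rb") as gz:
        shutil.copyfileobj(gz, destination, CHUNK_SIZE)

if __name__ == "__main__":
    # A stand-in "disk": an empty boot sector followed by repetitive data.
    disk = bytes(512) + b"partition data " * 1000

    image = io.BytesIO()
    compress_image(io.BytesIO(disk), image, level=9)
    print(len(disk), "bytes raw,", len(image.getvalue()), "bytes compressed")

    restored = io.BytesIO()
    restore_image(io.BytesIO(image.getvalue()), restored)
    print("round trip intact:", restored.getvalue() == disk)
```

Because the copy never interprets the bytes, it does not matter which filesystem or operating system is on the disk. The flip side is that unused areas of the disk are read and compressed along with everything else.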