The Domain Name System (DNS) in its present form supports only names
consisting of letters, digits, and the hyphen from the English writing
system. These are a subset of the ASCII character set.
There is, however, a strong desire in the global Internet community to
support more than just English characters in DNS names. In particular, the
desire is to internationalize the names used on the web in Uniform Resource
Locators (URLs), or Web site addresses.
To achieve this, the DNS must be able to map a name written in non-English
characters to an IP address. Similarly, browsers must accept non-English
characters as input. And when a browser sends a query to resolve a
non-English URL, every system involved in the resolution must be able to read
the non-English characters and look them up in its own database.
This is obviously not an easy task. One of the main reasons is that
different languages in the world use different scripts. There are languages
like German whose script is very close to that of English, but there are also
languages like Kannada, Hindi or Tamil whose scripts have no relation to it.
There are also languages like Chinese or Japanese where a word may actually
be a visual concept derived from several picture forms. In many languages,
text is also not built up from left to right but from right to left, as in
Arabic, or from top to bottom, as in Chinese.
Also, characters in many languages are not just single letters but
combinations of one or more letters, or a letter with an extension.
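
The points about combined characters and writing direction can be seen
directly in Unicode character data. The following is a minimal sketch using
Python's standard unicodedata module; the helper name describe and the sample
characters are chosen here for illustration and are not from the original
discussion.

```python
import unicodedata


def describe(text):
    """Return (code point, general category) for each code point in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.category(ch)) for ch in text]


# Devanagari KA followed by VOWEL SIGN I: drawn as one unit on screen,
# but stored as two separate code points (a letter plus a mark).
syllable = "\u0915\u093f"
print(len(syllable), describe(syllable))

# A precomposed "e with acute" decomposes into a base letter plus a
# combining mark, i.e. "a letter with an extension".
decomposed = unicodedata.normalize("NFD", "\u00e9")
print(describe(decomposed))

# Direction is a per-character property: "L" means left-to-right,
# "AL" means Arabic letter (right-to-left).
print(unicodedata.bidirectional("A"), unicodedata.bidirectional("\u0627"))
```

Categories beginning with "M" are marks that attach to a preceding letter,
which is why counting code points does not always match what a reader sees
as one character.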
For a script to be used on a computer, its characters must be put in order
and each assigned a number, creating a coded character set. ASCII, for
example, is a coded character set in which uppercase "A" is assigned the
number 65. Unicode is another example of a coded character set, meant to be a
universal character set that covers all the major scripts of the world.
Because of this, Unicode is the coded character set of choice for
internationalized DNS names. At present Unicode is still being updated with
new scripts and new characters.
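
The idea of a coded character set can be tried out in a few lines. This is
only an illustrative sketch in Python; the helper name code_points and the
sample characters (one Kannada letter, one Tamil letter, one Chinese
character) are assumptions made for the example.

```python
import unicodedata


def code_points(text):
    """Return the number assigned to each character in text."""
    return [ord(ch) for ch in text]


# Uppercase "A" has the same number in ASCII and in Unicode.
print(code_points("A"))
print(chr(65))

# Unicode assigns a number and a name to characters from many scripts.
for ch in ["A", "\u0c95", "\u0ba4", "\u4e2d"]:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```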
The Internationalized Domain Name system, which provides for non-English
domain names, is presently at a testing stage while the standards are being
developed. The approach is for the browser first to convert the non-English
character string to Unicode and then to feed it through a transformation
process that produces an ASCII-encoded string, so that the existing DNS
systems can continue to work on the present technical standards.
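
The transformation from a Unicode name to an ASCII-encoded string can be
sketched as follows. Note the assumption: this sketch does not implement
RACE. It uses the "idna" codec that ships with Python's standard library,
which is based on Punycode, a different ASCII-compatible encoding
standardized after this article was written. The principle is the same: the
DNS only ever sees ASCII. The function names and the sample name are
hypothetical.

```python
def to_ascii_hostname(hostname: str) -> str:
    """Convert a Unicode host name to an ASCII-only form the DNS can carry."""
    # The "idna" codec encodes each dot-separated label on its own and
    # leaves labels that are already plain ASCII unchanged.
    return hostname.encode("idna").decode("ascii")


def to_unicode_hostname(ascii_hostname: str) -> str:
    """Reverse the transformation for display to the user."""
    return ascii_hostname.encode("ascii").decode("idna")


name = "b\u00fccher.example"          # a German word with a non-ASCII letter
encoded = to_ascii_hostname(name)
print(encoded)
print(to_unicode_hostname(encoded))
```

Because the encoded form uses only letters, digits and hyphens, existing
name servers need no change; only the software at the two ends has to know
about the transformation.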
An example of how different characters in Chinese, Japanese and Korean are
mapped to ASCII strings through what is called RACE (Row-based ASCII
Compatible Encoding) is shown in the table below.
Indian language domain registrations are also available at
http://global.networksolutions.com/en_US/name-it/ml-index.jhtml
Presently, domain name registrations are available in more than 350
languages. However, not all of the technical difficulties that keep users
from easily resolving a language domain name have been removed.
A few of the technical problems that have come up
during this test implementation and the suggested solutions are as
follows.
- Resolution problems:
Problem: By default, Internet Explorer sends URLs as UTF-8. The
conversion to UTF-8 is not always 100% accurate, which can cause problems
resolving the domain names.
Suggested Solution: Try turning off the option to send URLs as UTF-8. This
can be done in the "Internet" Control Panel on the "Advanced" Tab.
Problem: Internet Explorer on Chinese Windows (Simplified and
Traditional) has difficulty with domain names with odd numbers of
characters (3, 5, 7, etc.).
Suggested Solution: Turn off the option to send URLs as UTF-8. This can be
done in the "Internet" Control Panel on the "Advanced" Tab.
Problem: When surfing to a web site with a multilingual domain
name, the page isn't displayed or a DNS error occurs.
Suggestion 1: Make sure the "http://www." and ".cc" are entered
using the ASCII Character Set, not multilingual characters.
Suggestion 2: Try putting a "www." before the multilingual
characters. Some versions of some browsers need this.
Problem: Internet Explorer parses certain double-byte character
set domain names incorrectly. This may affect Chinese, Japanese, and
Korean domain names. According to Microsoft, this issue only affects
certain versions of IE.
Suggested Solution: If you think you are experiencing this problem, check out
this article on Microsoft's Web site.
- Proxy Server Issues:
Problem: Users who are behind non-8-bit-clean proxy servers
will not be able to resolve multilingual domain names. The proxy server
will filter out the multilingual characters as invalid.
Suggested Solution: Upgrade the proxy server to an 8-bit compliant server.
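
Both the UTF-8 option and the proxy issue come down to which bytes travel on
the wire. The sketch below is a simplified model in Python, not a description
of how any particular browser or proxy works: strip_high_bit_bytes is a
hypothetical stand-in for a proxy that is not 8-bit clean, and the sample
names are made up. The ASCII-encoded form again comes from Python's built-in
"idna" codec, which is an assumption of this example rather than the RACE
encoding.

```python
def is_7bit(data: bytes) -> bool:
    """True if every byte fits in 7 bits, i.e. is plain ASCII."""
    return all(b < 0x80 for b in data)


def strip_high_bit_bytes(data: bytes) -> bytes:
    """Hypothetical model of a non-8-bit-clean proxy: drop bytes >= 0x80."""
    return bytes(b for b in data if b < 0x80)


# The same character gives different bytes under different encodings,
# so both ends must agree on whether UTF-8 is being used.
han = "\u4e2d"
print(han.encode("utf-8"), han.encode("gb2312"))

# A multilingual name sent as UTF-8 contains 8-bit bytes and is damaged
# by the model proxy; the ASCII-encoded form passes through untouched.
name = "b\u00fccher.example"
raw = name.encode("utf-8")
ace = name.encode("idna")
print(is_7bit(raw), strip_high_bit_bytes(raw))
print(is_7bit(ace), strip_high_bit_bytes(ace))
```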
Language Domain Names are here to stay. But are they going to introduce new
legal complications? We shall explore that in the next article.
Naavi
June 6, 2002