Since the beginning of the World Wide Web in the early 1990s, English has been the lingua franca of web technologies. Supporting other languages has received little attention, because most developers have treated English as the default in everything from character encodings to browser user interfaces. Now that the number of internet users is growing faster than ever, the web clearly needs standardized solutions that let it scale to thousands of different languages and locales.
- In Africa alone, there are more than 2000 spoken languages. More than 100 of these have over one million speakers (Denis Gikunda, Google Africa)
- In India, there are 22 official languages out of 123 major languages and more than 2000 dialects (Swaran Lata, Dept. of Information Technology, Government of India)
- In some areas of the world, internet access will be available before literacy (Max Froumentin, World Wide Web Foundation)
- The number of internet users is currently about 1.97 billion, with 444.8% growth over the past ten years (http://www.internetworldstats.com/stats.htm)
The MultilingualWeb project is exploring standards and best practices that support the creation, localization and use of multilingual web-based information. At the project's first workshop, titled "MultilingualWeb - Where Are We?", project partners and industry representatives, ranging from browser vendors to large multilingual websites (including Google, Facebook and Mozilla), gathered in Madrid to discuss the current state of standardization in language interoperability.
So, let's use UTF-8
While character encoding is the most obvious issue in supporting multiple languages, the problems go much deeper and touch on local writing conventions. For example, in some languages text flows right-to-left (e.g. Arabic) or can even be written vertically (e.g. Chinese, Japanese, Korean). Add the requirement of producing a portable layout with CSS, and a whole set of new issues appears, showing that multilingual interoperability is indeed a big challenge.
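To make the distinction concrete, here is a minimal Python sketch showing that UTF-8 encodes Latin, Arabic, Devanagari and Japanese text alike, while the direction of the text is a separate property of the characters. The sample words and the helper names `describe` and `base_direction` are hypothetical choices for this illustration; the direction lookup uses the standard library's `unicodedata.bidirectional` and is only a simplified version of the first-strong-character rule from the Unicode Bidirectional Algorithm (UAX #9).

```python
import unicodedata

# Hypothetical sample words for this sketch, written as escapes so the
# exact code points are unambiguous.
SAMPLES = {
    "English": "web",
    "Arabic": "\u0648\u064a\u0628",                   # "ويب" (web)
    "Hindi": "\u0939\u093f\u0928\u094d\u0926\u0940",  # "हिन्दी" (Hindi)
    "Japanese": "\u30a6\u30a7\u30d6",                 # "ウェブ" (web)
}


def base_direction(text: str) -> str:
    """Return 'rtl' or 'ltr' from the first strong character, else 'neutral'.

    Simplified take on the first-strong rule of UAX #9: explicit
    embeddings, overrides and isolates are ignored.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):  # Hebrew-type and Arabic-letter classes
            return "rtl"
        if bidi_class == "L":
            return "ltr"
    return "neutral"  # e.g. only digits, spaces or punctuation


def describe(text: str) -> dict:
    """Summarize how a string is stored in UTF-8 and which way it flows."""
    encoded = text.encode("utf-8")
    return {
        "code_points": len(text),
        "utf8_bytes": len(encoded),
        "round_trips": encoded.decode("utf-8") == text,
        "direction": base_direction(text),
    }


if __name__ == "__main__":
    for language, word in SAMPLES.items():
        print(f"{language:9} {describe(word)}")
```

Every sample survives the UTF-8 round trip, yet only the Arabic one reports a right-to-left base direction, so encoding alone says nothing about layout. The Hindi sample also hints at rendering complexity: its six code points include a virama (U+094D) that a font must combine into a conjunct. Vertical writing does not show up in the characters at all; it is chosen in the layout layer, for instance with the CSS `writing-mode` property.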
In his talk, Richard Ishida of the W3C (World Wide Web Consortium) gave a tour of emerging web standards for multilingual interoperability on the web. Some of the exciting new developments include WebFonts, CSS3, bidirectional text support and Unicode locale extensions.
Max Froumentin of the World Wide Web Foundation offered further insight into aspects of future internet use that English-speaking audiences easily overlook. Namely, a significant proportion of "the remaining 5 billion" people who do not yet use the web 1) are not literate, 2) do not speak English, and 3) will soon be able to access the web through mobile devices. Despite these limitations, the ability to communicate through the internet is having a major impact on how local governments are organized.
In fact, it is estimated that within five years most web use will take place on mobile devices. There is thus huge potential for mobile web technologies.
Roberto Belo-Rovella of the BBC World Service is all too familiar with the problems of internationalization, as his organization publishes news on the web in 32 languages. It's quite staggering that, despite the availability of standards like Unicode, the BBC still publishes text embedded in images for some languages, such as Hindi, to work around poor support for Hindi characters in mobile phone browsers.
Two sides to a standard
In his keynote talk, Reinhard Schäler of the Rosetta Foundation reminded the audience that standardization is usually a coin with two sides. Standards are indeed necessary for interoperability, and they facilitate technological development, since anyone can provide an improved alternative to an existing solution as long as the interface is standardized. But standards can also be restrictive, outdated and impractical. To be useful, a standard has to be widely adopted, and developers are reluctant to comply with standards that lag behind technological development.
Standardization is also a form of authority, and one should ask what gives a standards organization its legitimacy. Ishida's attitude to this is humble:
"It's your web, not ours [W3C]. Now go and make the most out of it!".