Wednesday, April 20, 2011

Content on the Multilingual Web

The Internet World Stats site is a fascinating source of information for predicting trends in web usage by language and locality. Its statistics were cited frequently in Pisa, where the W3C-coordinated MultilingualWeb project held its second workshop, on the theme of multilingual content on the web.

To appreciate the demand for new multilingual web technologies, one only needs to look at Internet penetration rates by geographic region (see figure). In 2010, 42% of all Internet users were already in Asia, even though Internet penetration there had reached only 21.5%. Since the basic building blocks of web technology (browsers, content markup syntax) were originally developed for English, there is clearly much work to be done on web standards that enable language interoperability, from the nuts-and-bolts level of character encoding all the way up to statistical machine translation.

While the MultilingualWeb project itself is a small one, the European Commission is planning substantial investments in multilingual technologies in the near future. Specifically, project officer Kimmo Rossi estimated that by 2015 there would be a total of 50 ongoing projects in the area of multilingual technologies. The EC also recently conducted a Eurobarometer survey on how Europeans use languages online. One of the findings was that 44% of respondents feel they miss important information online because they don't understand the language the content is written in. Stay tuned for the full report of the survey on Eurobarometer's site.

Expertise << popularity ?

Standardization of multilingual web content often boils down to a compromise between sound engineering practice and the popularity of existing ad hoc solutions. The developers of elegant standards (such as XLIFF 2.0, presented by David Filip) are often racing against the clock to finish their work before de facto standards (however good or bad) take over.

"Web standardization is a popularity contest. Facebook can get any non-standard garbage into browsers just by being popular." (anonymous browser developer in Pisa)

The increasing adoption of HTML5 is a promising development for multilingual support. Recent changes in HTML5 include the removal of the charset attribute from the a and link elements and updates to the ruby annotation syntax. Richard Ishida of the W3C invites everyone to participate in the development of the standard.

Charles McCathieNevile (presenting on behalf of Marcos Caceres) gave a tutorial on how the W3C Widget specification can be used to package a web application that can operate across device platforms. Simply by writing a small configuration file and adding the content into a zip archive I was able to create my own web widget in a matter of minutes. (Since when has implementing standards been this much fun?)
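For readers who want to try the same thing, here is a minimal sketch of that packaging step in Python, assuming the W3C Widget Packaging and XML Configuration format: a config.xml whose root widget element is in the http://www.w3.org/ns/widgets namespace, placed at the root of a zip archive together with the start page (conventionally saved with a .wgt extension). The widget id, name, description and file names below are hypothetical examples, not the ones used in the tutorial.

```python
import zipfile
from typing import List

# Hypothetical configuration document: the id, name and description are
# illustrative placeholders. The root <widget> element uses the W3C widgets
# namespace, and <content> points at the widget's start page.
CONFIG_XML = """<?xml version="1.0" encoding="UTF-8"?>
<widget xmlns="http://www.w3.org/ns/widgets"
        id="http://example.org/hello-widget"
        version="1.0">
  <name>Hello Widget</name>
  <description>A minimal example widget.</description>
  <content src="index.html"/>
</widget>
"""

# Hypothetical start page referenced by <content src="index.html"/>.
INDEX_HTML = """<!DOCTYPE html>
<html lang="en">
<head><meta charset="utf-8"><title>Hello Widget</title></head>
<body><p>Hello from a packaged web widget!</p></body>
</html>
"""


def build_widget(path: str) -> List[str]:
    """Write config.xml and the start page into a zip archive at `path`.

    config.xml is stored at the root of the archive. Returns the names of
    the files in the finished archive so the caller can check its contents.
    """
    with zipfile.ZipFile(path, "w", compression=zipfile.ZIP_DEFLATED) as wgt:
        wgt.writestr("config.xml", CONFIG_XML)
        wgt.writestr("index.html", INDEX_HTML)
    # Reopen the archive read-only to list what was actually written.
    with zipfile.ZipFile(path) as wgt:
        return wgt.namelist()


if __name__ == "__main__":
    print(build_widget("hello.wgt"))
```

The resulting hello.wgt is just an ordinary zip file, which is much of the appeal: no special tooling is needed beyond a text editor and an archiver.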

Crowdsourcing and love brands

One of the recurring topics of the workshop was crowdsourcing translation on the web. Crowdsourcing (i.e. distributing a task to an online community) was presented as an approach for both commercial translation services and volunteer translation efforts. Pål Nes (of Opera Software) discussed some of the challenges Opera has faced in crowdsourcing its translations and stressed that crowdsourcing has its own costs and is not suitable for time-critical tasks.

Chiara Pacella (of Facebook) presented Facebook's success in crowdsourcing the translation of its user interface. The site's overwhelming popularity is reflected in an active translation community (the French translation of the interface was effectively finished within hours). So could anyone follow Facebook's example and let the community do their localization work (for free, did I mention)? Could Microsoft use the community to translate MS Office?

"People translate Facebook for free because they love Facebook. People use Facebook because they want to. People use MS Office because they need to." (Eliot Nedas of XTM International)
There's certainly a dose of wisdom in these comments. It's quite possible that the most effective way to disseminate new technologies and standards is to make them easy and fun to work with.
