One of the terms buzzing around Localization World in Seattle last week was “just in time” (JIT) translation versus translating everything “just in case” someone, somewhere, sometime needs it. JIT proponents — Lionbridge’s CEO Rory Cowan among them — maintain that many words are translated, but never read. They suggest that by translating words at the point of demonstrable need or demand, companies can save money, time, and resources over translating “everything.”
Interesting concept, but it is largely irrelevant for most companies. The reality is that few companies can afford to translate everything. In fact, we estimate that 99.44% of information in most firms, governments, and other organizations is never translated — we call that “zero translation” (ZT). What most companies do translate is often little more than they are obligated to translate: 1) basic information like “who we are, what we sell, how to contact us,” 2) whatever documentation is required by regulations such as the European Union’s directives for medical devices, and 3) whatever the courts tell them to translate.
The big challenge for these firms is figuring out what beyond these should get translated, and the painful reality for language service providers is that most companies don’t have budgets for any more human translation (HT). That’s where machine translation (MT) comes into play as a way of reducing the mountain of ZT content. Gates’ speech and Language Weaver’s MT announcements both play to this need:
- In a speech at Princeton University in New Jersey (exit 9, New Jersey Turnpike), Bill Gates noted that “we’re at the point today that within the domain of text about computers we can do automatic translation with basically the same level of quality of a human translator. We, over the last five years, have been trying this out by taking the articles that we hand translate, have half of those done by the machine, half of them done by a human, and then seeing what the response is, do people like it better or use it more, or are they really indistinguishable. Over this five-year period we’ve driven that to where now we’ve actually have achieved that goal, where those things are indistinguishable.” MT output equals HT output! Stop the presses! Unfortunately, only Microsoft can use its MT program so we’ll have to take Gates’ word on this — but Albanian and Mongolian users of Office Vista may be able to tell us in 2008 about the quality of their Excel documentation.
- Meanwhile, Language Weaver announced Version 4 of its statistics-based MT system, in which users can add their own data to extend its statistical model into other markets and knowledge domains. A feedback loop lets these users confirm that the engine got it right, a function that Language Weaver’s own engineers provide for the company’s black-box and OSINT work (OSINT being open-source intelligence gleaned from newspapers and other public documents). From our discussions with MT developers, we have found that the most critical elements for improving the output of statistics-based MT are the incorporation of ever more corpora, the addition of corpora and lexicons that tweak the engine to the linguistic requirements of different markets or domains, and a way to validate the correctness of the output. Language Weaver claims that this new version delivers on all three counts, and thus goes mano a mano against rules-based commercial MT products from IBM, SDL, and Systran (which recently announced a new distribution agreement with Nuance, née ScanSoft). As evidence of its commercial thrust, the display at Language Weaver’s Localization World stand in Seattle featured Spanish examples rather than Arabic, a good example of “localizing” its message to the audience of commercial medical device, automotive, and software companies at the conference.
JIT, JIC, MT, HT, ZT, OSINT: hmm, we think we might have an abbreviation abuse problem here. In any case, we believe that companies will employ MT to chip away at the mountains of ZT information. In actual usage, most machine translation is just-in-time, so maybe the JIT advocates are right about the timing but mistaken about who (or what) will actually translate the materials that have so far gone untranslated.