Google uploaded a video that explains how Google’s machine translation service works. It’s fascinating to see how much Google Translate has improved in the past 4 years and how many Google services use it.
Here’s the full text of the video:
“Google Translate is a free tool that enables you to translate sentences, documents and even whole websites instantly. But how exactly does it work? While it may seem like we have a room full of bilingual elves working for us, in fact all of our translations come from computers. These computers use a process called ‘statistical machine translation’ — which is just a fancy way to say that our computers generate translations based on patterns found in large amounts of text.
But let’s take a step back. If you want to teach someone a new language you might start by teaching them vocabulary words and grammatical rules that explain how to construct sentences. A computer can learn a foreign language the same way – by referring to vocabulary and a set of rules. But languages are complicated and, as any language learner can tell you, there are exceptions to almost any rule. When you try to capture all of these exceptions, and exceptions to the exceptions, in a computer program, the translation quality begins to break down. Google Translate takes a different approach.
Instead of trying to teach our computers all the rules of a language, we let our computers discover the rules for themselves. They do this by analyzing millions and millions of documents that have already been translated by human translators. These translated texts come from books, organizations like the UN and websites from all around the world. Our computers scan these texts looking for statistically significant patterns — that is to say, patterns between the translation and the original text that are unlikely to occur by chance. Once the computer finds a pattern, it can use this pattern to translate similar texts in the future. When you repeat this process billions of times you end up with billions of patterns and one very smart computer program. For some languages however we have fewer translated documents available and therefore fewer patterns that our software has detected. This is why our translation quality will vary by language and language pair. We know our translations aren’t always perfect but by constantly providing new translated texts we can make our computers smarter and our translations better. So next time you translate a sentence or webpage with Google Translate, think about those millions of documents and billions of patterns that ultimately led to your translation – and all of it happening in the blink of an eye.”
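To make the idea of "patterns found in translated text" more concrete, here is a minimal sketch in Python. It is not Google's system, which the video does not describe in any detail. It is a toy illustration that assumes a tiny made-up English-French corpus and uses the Dice coefficient (a simple co-occurrence score chosen here for illustration) to guess which words translate each other. The names `PARALLEL_CORPUS`, `learn_patterns` and `best_translation` are hypothetical.

```python
from collections import Counter
from itertools import product

# Toy sentence pairs, invented for illustration only.
# A real system would read millions of human-translated documents.
PARALLEL_CORPUS = [
    ("the house", "la maison"),
    ("the car", "la voiture"),
    ("a house", "une maison"),
    ("a car", "une voiture"),
]


def learn_patterns(corpus):
    """Score every (source word, target word) pair by how often they co-occur.

    Uses the Dice coefficient: 2 * joint / (source count + target count).
    Pairs that nearly always appear together score close to 1.0.
    """
    src_counts = Counter()
    tgt_counts = Counter()
    pair_counts = Counter()

    for src_sentence, tgt_sentence in corpus:
        # Count each word once per sentence pair.
        src_words = set(src_sentence.split())
        tgt_words = set(tgt_sentence.split())
        src_counts.update(src_words)
        tgt_counts.update(tgt_words)
        pair_counts.update(product(src_words, tgt_words))

    return {
        (s, t): 2 * joint / (src_counts[s] + tgt_counts[t])
        for (s, t), joint in pair_counts.items()
    }


def best_translation(word, scores):
    """Return the target word with the highest score, or None if unseen."""
    candidates = {t: score for (s, t), score in scores.items() if s == word}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


if __name__ == "__main__":
    scores = learn_patterns(PARALLEL_CORPUS)
    for word in ("house", "car", "the", "a"):
        print(word, "->", best_translation(word, scores))
```

Even with four sentence pairs, "house" co-occurs with "maison" every time but with "la" only half the time, so the score separates them. With fewer translated documents there are fewer co-occurrences to count, which is one way to see why quality varies by language pair.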
Google Translate added 5 new languages: Armenian (6.7 million speakers), Azerbaijani (20-30 million speakers), Basque (about 660,000 speakers), Georgian (4 million speakers) and Urdu (70 million speakers), but don’t expect to read high-quality translations because the language models are still in alpha. “Translation to and from ALPHA languages may not work as well as other languages, as these systems are still in early stages of development,” explains Google Translate’s FAQ.
In a recent article from Spiegel, Google’s Franz Och said that “the databases for 296 other languages are in development”. Right now, Google Translate supports 57 languages, while Microsoft’s Bing Translator has support for 30 languages.
These languages are available while still in alpha status. You can expect translations to be less fluent than for our other languages, but they should still help you understand the multilingual web. We are working hard to “graduate” these new languages out of alpha status, just as we did some time ago with Persian. You can help us improve translation quality as well. If you notice an incorrect translation, we invite you to click “Contribute a better translation”. If you are a translator, then you can contribute translation memories with the Translator Toolkit. This helps us build better machine translation systems, especially for languages that are not well represented on the web.
Collectively, Armenian, Azerbaijani, Basque, Georgian and Urdu have roughly 100 million speakers.
One of the most annoying issues with Google Translate is that it’s very difficult to copy the translated version of a web page. Translate a web page, copy some text, paste it in a text editor and you’ll notice that, before each translated phrase, there’s the original version of the phrase.
Fortunately, you can properly copy some text from a translated page if you use the translation feature from Google Toolbar or from Google Chrome. Microsoft’s translation service has a more flexible interface and it doesn’t mix the translated text with the original text.
Google Translate has made the text-to-speech feature more useful by adding 27 new languages: Afrikaans, Albanian, Catalan, Chinese (Mandarin), Croatian, Czech, Danish, Dutch, Finnish, Greek, Hungarian, Icelandic, Indonesian, Latvian, Macedonian, Norwegian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Swahili, Swedish, Turkish, Vietnamese and Welsh.
Google used the open-source speech synthesizer eSpeak, but I’m sure that this is just a placeholder until Google manages to obtain better results. “You may notice that the audio quality of these languages isn’t at the same level as the previously released languages. Clear and accurate speech technology is difficult to perfect, but we will continue to improve the performance and number of languages that are supported,” says Google’s Fergus Henderson.
Initially, the feature was only available for English, but 4 other languages have been added in the past two months: Haitian Creole, French, Italian and German.
To try Google’s TTS service, go to Google Translate, type some text, translate it and click on the audio icon to listen to the translation.
One of the popular features of Google Translate is the ability to hear translations spoken out loud (“text-to-speech”) by clicking the speaker icon beside some translations, like the one below.
We rolled this feature out for English and Haitian Creole translations a few months ago and added French, Italian, German, Hindi and Spanish a couple of weeks ago. Now we’re bringing text-to-speech to even more languages with the open source speech synthesizer, eSpeak.
By integrating eSpeak we’re adding text-to-speech functionality for Afrikaans, Albanian, Catalan, Chinese (Mandarin), Croatian, Czech, Danish, Dutch, Finnish, Greek, Hungarian, Icelandic, Indonesian, Latvian, Macedonian, Norwegian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Swahili, Swedish, Turkish, Vietnamese and Welsh.
You may notice that the audio quality of these languages isn’t at the same level as the previously released languages. Clear and accurate speech technology is difficult to perfect, but we will continue to improve the performance and number of languages that are supported.
Google offers three services for translating Microsoft Office documents, but none of them works well. You can upload documents to Google Translate, but the output is an HTML file that can’t be properly saved and that doesn’t include images. There’s also Google Translator Toolkit, which has a buggy document converter. Google Docs has a translation feature, but the document is first converted to HTML and the original formatting isn’t always preserved.
DocTranslator is a service that translates Microsoft Office documents using Google Translate while preserving the layout of the original file. Unlike Google Docs, DocTranslator is not limited to Microsoft Word documents smaller than 500 KB, and it works for Excel spreadsheets and PowerPoint presentations as well. DocTranslator uses a Java applet to upload the files and it translates them using Google Translate’s API.
“Its benefits are that you can save the documents in their original file format and also maintain the original layout (fonts, tables, columns, spacing, etc.). It basically replaces the text of the file while keeping everything else,” explained a user of the service.
The following screenshots show a document translated in Google Docs and the same document translated using DocTranslator and opened in Microsoft Word.
Google Translate has come a long way since Google switched to its own machine translation service, back in 2007. Google now supports 52 language pairs and “the databases for 296 other languages are in development,” according to a Spiegel article.
Google Translate is more useful, now that it’s integrated with many Google services. Here’s a list of the best uses of Google Translate in other Google services:
1. Translate web pages dynamically using Google Chrome or Google Toolbar for Internet Explorer and Firefox. Browsers can now detect web pages in foreign languages and automatically translate the text, without opening a new page.
2. Google Translate for Android is a free application that uses voice input and a text-to-speech engine that reads the translation.
3. Google Goggles for Android is a free visual search application, which is now able to extract text from photos and translate it. “On a recent trip to Japan, Franz Och [Google’s head of translation services], who doesn’t speak Japanese, was able to decipher restaurant menus and even read local news — using his mobile phone, which provided him with the translations within seconds,” reports Spiegel.
4. Cross language search is a feature of Google Search that lets you find web pages written in other languages. When you click on “translated search” in Google’s left sidebar, Google finds the most appropriate languages for your query, translates your query and shows the results translated into your language.
5. Google Translator Toolkit is a great way to translate documents using Google’s machine translation service as a starting point.
6. Gmail Translate is a Labs feature that lets you translate messages written in foreign languages.
7. YouTube lets you translate captions, which is quite useful for videos that have closed captions in one language.
8. Google Talk’s translation bots help you translate messages from a conversation. “Google Talk can help you with quick translations, or even translate your chats in real-time. All you need to do is chat with one of the Translation Bots. You can also get your conversation translated by inviting a bot to a group chat with a friend.”
9. Picasa Web Albums has a great feature that translates comments into your language.
10. Google Translate OneBox shows the translation of a text if you start your query with “translate”. The OneBox is displayed at the top of Google’s search results page.
Google Translate is also used to translate feeds in Google Reader, documents in Google Docs, reviews in Google Maps and YouTube’s search results.
Today, we’re taking another step to make automatic translation easier. Now, if Google Toolbar’s default language is set to one of our supported languages, you can use our new Word Translator feature to hover over a word with your mouse and get an automatic instant translation. If you want Toolbar to translate into a different language, you can change it in the Toolbar Options menu.
Entire page translations are great if you have little knowledge of a given language. However, if you’re a multi-lingual user who just needs certain words translated, hovering is a lot quicker than searching word-by-word on Google Translate.
Here is an example of the word “vitesse” (speed) translated from French to German:
The new Word Translator feature is available for Internet Explorer and Firefox. And if you use Google Chrome, automatic page translation is already built in, and we’re working to build more Translate features.
We hope this helps you browse pages in non-native languages faster, regardless of your language proficiency. Install the latest Toolbar version and give it a try!
Posted by Dmitry Gozman, Software Engineer