Translation/Computer/Mayer-Schönberger: teaching computers to translate means not only teaching them the rules, but also the exceptions.
Algorithms/Big Data/Mayer-Schönberger: the more data are available, the more the data outweigh the choice of algorithm. This can be seen in the way computers learn to process and translate everyday language.
With larger amounts of data, an older, simpler algorithm even outperformed more sophisticated ones. (1)
Google Translate: from 2006 on, Google Translate did not rely on neatly translated pairs of text pages, but on a larger and much messier body of data: the global Internet and more. Existing translations of official documents, such as those of the United Nations and the EU, were also taken into account, as were translations from books scanned for Google Books. Instead of the millions of carefully translated sentences used by Candide (2), there were now billions of pages of varying quality. See also the history of machine translation. (3)
For Google Translate: see (4), (5).
Algorithms/Translation/Big Data/Mayer-Schönberger: the new automatic translations did not work better because better algorithms were available, but because larger amounts of data were taken into account.
The material also contained all kinds of uncorrected errors and incomplete words. But the fact that the new corpus was a million times larger outweighed these disadvantages.
1. Michele Banko and Eric Brill, “Scaling to Very Very Large Corpora for Natural Language Disambiguation,” Microsoft Research, 2001, p. 3 (http://acl.ldc.upenn.edu/P/P01/P01-1005.pdf).
2. Adam L. Berger et al., “The Candide System for Machine Translation,” Proceedings of the 1994 ARPA Workshop on Human Language Technology, 1994 (http://aclweb.org/anthology-new/H/H94/H94-1100.pdf).
3. History of machine translation: Yorick Wilks, Machine Translation: Its Scope and Limits (Springer, 2008), p. 107. Candide’s millions of texts versus Google’s billions of texts: Och interview with Cukier, December 2009.
4. Alex Franz and Thorsten Brants, “All Our N-gram are Belong to You,” Google blog post, August 3, 2006 (http://googleresearch.blogspot.co.uk/2006/08/all-our-n-gram-are-belong-to-you.html).
5. Halevy, Norvig, and Pereira, “The Unreasonable Effectiveness of Data.”
_____________
Explanation of symbols: Roman numerals indicate the source, Arabic numerals indicate the page number. The corresponding books are indicated on the right-hand side. ((s)…): Comment by the sender of the contribution. The note [Author1]Vs[Author2] or [Author]Vs[term] is an addition from the Dictionary of Arguments. If a German edition is specified, the page numbers refer to this edition.
Viktor Mayer-Schönberger and Kenneth Cukier, Big Data: A Revolution That Will Transform How We Live, Work, and Think, New York 2013