With Hadoop, Amazon Adds A Web-Scale Data Processing Engine To Its Cloud Computer (KDnuggets News 09:07, item 47, Briefs)

Briefs

With Hadoop, Amazon Adds A Web-Scale Data Processing Engine To Its Cloud Computer

by Erick Schonfeld on April 2, 2009, TechCrunch

Slowly but surely, Amazon keeps adding capabilities to its cloud computing services. What started out as pay-by-the-drink storage (S3) and computational processing (EC2) now includes a simple database (SimpleDB), a content delivery network (CloudFront), and computer-to-computer messaging (SQS). And today Amazon added a web-scale data processing engine with Amazon Elastic MapReduce, which is built on Hadoop. (Hadoop is a framework for processing data stored in file systems and databases.)

This is a big deal because it lets developers take better advantage of the massive computing power Amazon has to offer and create applications that process huge reservoirs of data (conveniently stored in Amazon S3) in parallel. MapReduce is the name of the data processing framework Google created to index and search the Web. It literally breaks up huge computational tasks and spreads the pieces across different servers. This is called mapping the data. Once each processor is done with its portion of the math problem, it sends the result back so that all the partial answers can be combined and "reduced" into one final answer.
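To make the map and reduce steps concrete, here is a minimal single-machine sketch in Python. It is not Hadoop's or Amazon's API. Word counting is an assumed example task, and the function names (`split_into_chunks`, `map_chunk`, `reduce_counts`, `word_count`) are hypothetical. On a real cluster each chunk would go to a different server. Here the chunks are simply processed one after another.

```python
from collections import Counter
from functools import reduce


def split_into_chunks(items, n_chunks):
    """Break the input into roughly equal pieces, one per (hypothetical) worker."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def map_chunk(lines):
    """Map step: compute a partial answer (word counts) for one chunk."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def reduce_counts(left, right):
    """Reduce step: combine two partial answers into one."""
    return left + right


def word_count(lines, n_chunks=3):
    chunks = split_into_chunks(lines, n_chunks)
    # On a cluster these map calls would run in parallel on separate servers.
    partials = [map_chunk(chunk) for chunk in chunks]
    return reduce(reduce_counts, partials, Counter())


documents = [
    "the cloud stores the data",
    "the servers process the data",
    "partial answers are combined",
    "the final answer is reduced",
]
totals = word_count(documents)
```

Because each partial count depends only on its own chunk, the map calls could run in any order or at the same time, which is the property that lets the work spread across many servers.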