
Microsoft plan for Hadoop and big data


 
  
Hadoop is a central part of Microsoft's data strategy. Eventually, Microsoft wants to create a purely open source Hadoop on Windows.


O'Reilly, by Edd Dumbill, 25 January 2012

Microsoft has placed Apache Hadoop at the core of its big data strategy. It's a move that might seem surprising to the casual observer, being a somewhat enthusiastic adoption of a significant open source product.

The reason for this move is that Hadoop, by its sheer popularity, has become the de facto standard for distributed data crunching. By embracing Hadoop, Microsoft allows its customers to access the rapidly growing Hadoop ecosystem and take advantage of a growing talent pool of Hadoop-savvy developers.

Microsoft's goals go beyond integrating Hadoop into Windows. It intends to contribute the adaptations it makes back to the Apache Hadoop project, so that anybody can run a purely open source Hadoop on Windows.

Microsoft's Hadoop distribution
The Microsoft distribution of Hadoop is currently in the "Customer Technology Preview" phase. This means it is undergoing evaluation in the field by groups of customers. Release is expected toward the middle of 2012, though the timing will be influenced by the results of the technology preview program.

Microsoft's Hadoop distribution is usable either on-premises with Windows Server, or in Microsoft's cloud platform, Windows Azure. The core of the product is in the MapReduce, HDFS, Pig and Hive components of Hadoop. These are certain to ship in the 1.0 release.

As Microsoft's aim is for 100% Hadoop compatibility, it is likely that additional components of the Hadoop ecosystem such as ZooKeeper, HBase, HCatalog and Mahout will also be shipped.

Figure: Microsoft plans for Hadoop. How Hadoop integrates with the Microsoft ecosystem. (Source: microsoft.com.)

Additional components integrate Hadoop with Microsoft's ecosystem of business intelligence and analytical products:

  • Connectors for Hadoop, integrating it with SQL Server and SQL Server Parallel Data Warehouse.
  • An ODBC driver for Hive, permitting any Windows application to access and run queries against the Hive data warehouse.
  • An Excel Hive Add-in, which enables the movement of data directly from Hive into Excel or PowerPivot.
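
To make the ODBC route concrete, here is a minimal Python sketch of how an application might query Hive through an ODBC data source. It is an illustration under stated assumptions, not Microsoft's documented usage: it relies on the third-party pyodbc package, and the function names and the DSN are hypothetical. Enabling autocommit is also an assumption, based on Hive ODBC drivers generally not supporting transactions.

```python
def build_connection_string(dsn: str, user: str = "", password: str = "") -> str:
    """Assemble an ODBC connection string from a DSN and optional credentials."""
    parts = [f"DSN={dsn}"]
    if user:
        parts.append(f"UID={user}")
    if password:
        parts.append(f"PWD={password}")
    return ";".join(parts)


def run_hive_query(dsn: str, query: str, max_rows: int = 10) -> list:
    """Run a HiveQL query over ODBC and return at most max_rows rows."""
    # pyodbc is a third-party package; it is imported lazily so that the
    # connection-string helper above remains usable without it.
    import pyodbc

    # Assumption: the Hive driver does not support transactions,
    # so the connection is opened in autocommit mode.
    conn = pyodbc.connect(build_connection_string(dsn), autocommit=True)
    try:
        cursor = conn.cursor()
        cursor.execute(query)
        return cursor.fetchmany(max_rows)
    finally:
        conn.close()
```

With a data source configured under a hypothetical name such as HiveSample, calling run_hive_query("HiveSample", "SELECT * FROM weblogs LIMIT 10") would return the first rows of an (equally hypothetical) weblogs table.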
