GigaOM, Derrick Harris Jun. 27, 2011
Yahoo will be spinning off a separate company, called HortonWorks, focused on the development and commercialization of Apache Hadoop. The official announcement likely will come tomorrow or Wednesday to coincide with Yahoo's annual Hadoop Summit, but rumors have been circulating for months, and I confirmed the news today with a source familiar with the project.
Because Yahoo originated the Hadoop technology, its official entry into this space should play a big role in shaping how the market for Hadoop-based products evolves.
Yahoo's HortonWorks (as in the Dr. Seuss book "Horton Hears a Who," a reference to the elephant logo that Apache Hadoop bears) will consist of a small team of Yahoo's Hadoop engineers and will focus on developing a production-ready product based on the Apache Hadoop project, the set of open source tools designed for processing huge amounts of unstructured data in parallel. It's a natural step for Yahoo, which uses Hadoop heavily within its own web operations and has contributed approximately 70 percent of the code to Apache Hadoop since the project's inception.
By incorporating next-generation features and capabilities, HortonWorks hopes to make Hadoop easier to consume and better suited for running production workloads. Its products, which likely will include higher-level management tools on top of the core MapReduce and file system layers, will be open source, and HortonWorks will try to maintain a close working relationship with Apache. The goal is to make HortonWorks the go-to vendor for a production-ready Hadoop distribution and support, but also to advance Yahoo's oft-repeated mission of making the official Apache Hadoop distribution the place to go for core software. Earlier this year, Yahoo discontinued its own Hadoop distribution, recommitting all that code and all its development efforts to Apache.