Did Spark Really Kill Hadoop?
A comprehensive survey conducted by iDatalabs shows where these two Data Science technologies are headed.
By Julia Cook, iDatalabs
Apache Hadoop, built by Yahoo for engineers and data scientists, is showing its age. Once praised for its wealth of potential, it's suffered at the hands of swifter products, often from its own ecosystem. "[Apache] Spark killed Hadoop," H2O.ai founder Sri Ambati told Datanami earlier this year. It seems that, despite the success of do-everything competitors like Google and Microsoft, the Big Data space calls for specifics.
In the past couple of years, however, Hadoop betrayed few signs of decay. In a 2015 survey, AtScale reported “76% of those who already use Hadoop plan on doing more within the next 3 months.” About half of these respondents claimed they derived value from the program. Gartner and BlueData assessed that any decrease in adoption was because of the user’s learning curve.
Though Hadoop's brand seems synonymous with confusion, adoption has not slowed for this decade-old offering. And Spark has by no means taken over. So has the newer product lived up to the hype? What follows are our findings:
Market Penetration by Industry
It's no surprise that a product created for experts would stay in its lane. Spark, however, boasts a meaningful distribution across industries, thanks perhaps to a proliferation of Big Data principles in all kinds of markets. So while Spark may have a wider spread, Hadoop still dominates its intended user base.
Major Geographic Markets
Worldwide, we see competitor Informatica taking center stage, with a more meaningful presence in Europe and the Americas, and an overall market share of 32%. In the two and a half years we’ve tracked it, Informatica has grown 50% in the cloud market and diversified across industries as well, seizing the lead in Higher Education. Just last week Gartner cited Informatica as a Leader in its 2017 Magic Quadrant for Master Data Management Solutions. Hadoop (and, consequently, Spark) remains limited to the markets where it has already succeeded.
Adoption Trends by Customer Company Size
Nor has there been a proliferation of Spark among enterprise customers. Given that most companies in the world are small (1-50 employees), Spark doesn't appear to be the only choice for companies of any size. It's emerged as a helpful offering to those already using Hadoop, not as a selling point for the product at large. That said, it's not limited to one kind of customer, as Hadoop may have been a decade prior.
Who is Using Hadoop with Spark?
Our records show legacy tech companies. Mainstays like eBay, Verizon, HP and Amazon mingle with UnitedHealth, Ciena, Epsilon, Pronix and Booz Allen. Certainly Hadoop has not been abandoned on a grand scale. Rather, customers wield Spark as an introduction to the system, using the program to vault Hadoop’s hurdles.
Curious about the top players in Big Data, or other hot tech spaces? iDatalabs' system gauges the probability that a company is using a certain product. Access these insights and more on over 10,000 tech targeting pages.
Further Information on iDatalabs: At iDatalabs we use current and historic data to create predictive models for the future. Crawling both structured and unstructured data points (like job postings, LinkedIn skills, and testimonials), we create a fuller picture of who’s using what tech where. We refresh our system every couple of weeks and filter points that don’t meet our standard for accuracy, then build models that improve as they operate.