We discuss the startup Elephant Scale, DIY Hadoop learning, the best free online resources for learning Hadoop, getting a good job in Big Data, and the experience of authoring the book Hadoop Illuminated (available for free).

Sujee Maniyam has been developing software for 15 years. He is a hands-on expert on Hadoop, NoSQL and Cloud technologies. He consults and teaches Big Data technologies. Sujee has authored a few open source projects and has contributed to the Hadoop project. He is an author of the open source Hadoop book ‘Hadoop Illuminated’.

Sujee is the founder of ‘Big Data Gurus’ meetup in San Jose, CA. He has presented at various meetups and conferences.

Here is my interview with him:

Anmol Rajpurohit: Q1. What inspired you to launch Elephant Scale? What does Elephant Scale do?

Sujee Maniyam: Elephant Scale is a boutique company that offers expert consulting and training around the Big Data ecosystem. We focus exclusively on Big Data technologies (Hadoop, NoSQL, Cloud, etc.).

Elephant Scale was founded by Mark Kerzner and Sujee Maniyam. Both Mark and Sujee are veterans of the Big Data space. We provide enterprise support for FreeEed (freeeed.org), an open source eDiscovery engine. We are seeing a lot of interest in and adoption of FreeEed, and we are in the process of enhancing it.

We are a very open source friendly company. Take a look at our GitHub: github.com/elephantscale

And we have written an open source book on Hadoop, ‘Hadoop Illuminated’: hadoopilluminated.com

AR: Q2. Based on your extensive training experience, what strategy do you recommend for DIY learning Hadoop and related skills?

SM: First off, let me say Hadoop is NOT easy. It is one thing to get a ‘hello world’ running on a laptop; it is completely another to run a production-ready cluster.

There are so many resources now available to learn Hadoop.  Some of them are online and free! Here are some pointers (in no particular order):

  1. Read some good books

    ‘Hadoop Book’ by Tom White is a pretty good intro text.
    ‘Hadoop Operations’ by Eric Sammer is a very practical guide for admins.
    I will also do some self-promotion for our own Hadoop book. It is called ‘Hadoop Illuminated’, and it is online and completely free: http://hadoopilluminated.com/

  2. Hadoop takes a lot of practice

    Get a Hadoop sandbox from either Cloudera or Hortonworks. This is a virtual machine with all components installed, configured and ready to use. Start playing with the various components (Pig, Hive, HBase, etc.).

  3. To learn Hadoop I’d recommend our open source labs

    github.com/elephantscale/HI-labs. We use these labs for our training, so they are substantial and we keep them up-to-date.  The labs are open and available on GitHub for all.

And finally, I’d like to mention a program that I am involved in: Insight Data Engineering (insightdataengineering.com). It is a free, intensive, 6-week, full-time training fellowship designed to train data engineers.

AR: Q3. Once someone has learned Hadoop skills through self-learning, how should one approach the goal of getting a good job?

SM: There is a skills chasm in Big Data. There is a huge demand for experienced Big Data developers. However, someone who has just learned Hadoop will have to ‘prove himself’ to land a ‘good job’.

One technique I recommend is this: once you have learned the basics of Hadoop, try to solve a substantial real world problem. Find a data set and try to solve an interesting problem with it.

We have a list of publicly available big data sets here:


It is even better if you can do this as an open source project. This will go a long way in helping you with your interview process.

Also, getting certified may not be a bad idea. Hadoop certifications offered by Cloudera and Hortonworks are pretty affordable. And having a certification might add some weight to your resume (especially if your real world expertise is light).

AR: Q4. How and when did you get inspired to write books? What were your thoughts behind making them available for free?

SM: Hah :-) Both Mark and I were approached by publishers to write a book on Hadoop. We thought it would be interesting to write a book on Hadoop, but do it open source, out in the open.

And that is what we did.

Since we were completely in charge of the content, we could write it at our own pace. And since it is a ‘living book’ we can have chapters like ‘Hadoop Use Cases’ and ‘Big Data Eco System’ -- we keep adding to these chapters.

We wrote the book to make Hadoop accessible to a wider audience, not just the deeply technical. We have been getting a lot of ‘thank you’ emails from all around the world, which tells us we did something right :)

We like to think of the book as our little contribution to the Hadoop project. Plus, the book has also given us name recognition. Sometimes we will meet a prospective customer and they will tell us that they enjoyed reading our book! Pretty good :-)

The entire book is freely available here: hadoopilluminated.com

And the book content is open source: github.com/elephantscale/hadoop-book

We have released it under a Creative Commons license (same as MIT OpenCourseWare).

The second and last part of the interview will be published soon.