Interview: Sujee Maniyam, Elephant Scale on Why Open Source is So Important for Big Data

We discuss the importance of contributing to Open Source, Big Data skills for business managers, Big Data predictions, key qualities sought in data engineers, career advice and more.

Sujee Maniyam has been developing software for 15 years. He is a hands-on expert on Hadoop, NoSQL and Cloud technologies. He consults on and teaches Big Data technologies. Sujee has authored a few open source projects and has contributed to the Hadoop project. He is an author of an open source Hadoop book called ‘Hadoop Illuminated’.

Sujee is the founder of ‘Big Data Gurus’ meetup in San Jose, CA. He has presented at various meetups and conferences.

Here is the first part of the interview.

Here is the second and last part of my interview with him:

Anmol Rajpurohit: Q5. What are the advantages of using and contributing to Open Source? How can one get started on contributing to Open Source?

Sujee Maniyam: Most of the cutting edge technologies are open source. And as big data developers, we use these open source technologies routinely. So on a ‘do good’ level, contributing to open source is a wonderful way to give back.

Plus, open source contributions are a great addition to one’s resume. They really set a developer apart from the pool of others. Contributing to open source tells me that you are passionate about technology and can take initiative.

Most open source contributions take up a lot of personal time. So it demonstrates one’s passion for technology (so that people are not just doing this gig because it pays well :-) )

I see that companies have started looking at the GitHub profiles of applicants. If you have a solid track record in open source contributions, you will definitely land your ‘dream job’.

I highly recommend that developers start contributing to open source. The best way to get started is to join an existing project that you are using and care about. Most projects welcome new developers, and the veterans really help out the newbies. For example, the Cassandra project has a ‘low hanging fruits’ tag in its bug tracking system :-) Start with these and move forward.

AR: Q6. What kind of Big Data related skills should the business managers focus on? Is there any minimal technical understanding they need to have?

SM: I encourage managers to learn the fundamentals. They don’t necessarily need to learn to code. But they need to understand the complexities of Big Data systems, so they can manage their teams effectively.

I expect managers to have good domain specific knowledge (e.g. wireless networking, security, etc.). Managers should develop a sense of how to apply Big Data technologies to solve their domain specific problems. I believe this will really help with their careers.

AR: Q7. What do you personally think about the future of Big Data? Your predictions?

SM: There is still so much hype around Big Data. But now companies are starting to adopt big data technologies (Hadoop, NoSQL) a little more. Big Data technologies will continue to mature and become easier to use -- the enterprise features in Hadoop 2 are a good example.

I also see Hadoop becoming a ‘distributed Operating System’ upon which a lot of other applications will be built. Things like Spark & Storm running on top of Hadoop (YARN) are great examples.

And we will see more ‘real time’ or ‘near real time’ applications on Hadoop (think Spark, Storm, etc.).

AR: Q8. If you were to hire a data scientist or data analyst, what kind of interview questions would you ask? What are the soft skills that you will look for?

SM: I know more about ‘Data Engineering’ than ‘Data Science’, so let me speak from that perspective.

A good data engineer
  • Is an excellent programmer
  • Is a skilled admin
  • Has a ‘get it done’ attitude

I look for people with broad experience in programming plus admin -- generalists rather than specialists. When things don’t work right, you need to jump in, scan through the log files and figure out what is going on. Plus, being handy with system / performance monitoring tools is key (e.g. why is this machine so slow?). And they should be very comfortable with scripting (shell or Python, etc.).

The soft skills I look for are
  • Get it done attitude (don’t whine about how something isn’t working... either make it work, or find an alternative)
  • Pleasant to work with (life is too short to put up with jerks)
  • Willing to lend a hand to others (even if that is not part of his / her job)

AR: Q9. What is the best advice you have got in your career?

SM: Network like crazy :-)

AR: Q10. On a personal note, we are curious to know what keeps you busy when you are away from work?

SM: I have two young kids … enough said :-)