Interview: Christophe Toum, Talend on Why Big Data Needs Big Governance

We discuss the priority of data governance for Big Data initiatives, the impact of the increasing shift towards Hadoop and NoSQL, data quality, current trends, the talent crunch, advice and more.

Christophe Toum, Data Governance Product Manager at Talend, is responsible for the roadmap of Talend's Data Quality and Master Data Management products. Christophe originally joined Talend as Director of Data Quality and Master Data Management, and has been extensively involved in Talend's product strategy and focus toward Big Data. Christophe has over 15 years of experience in Data Management software. Prior to working with Talend, Christophe acquired extensive technical knowledge working as a developer, consultant, architect and presales engineer at various companies including Ardent/Ascential, Oracle and IBM.

Here is my interview with him:

Anmol Rajpurohit: Q1. Why should data governance be included in the list of priority tasks for Big Data initiatives?

Christophe Toum: At Talend we believe Big Data without governance will quickly become a big problem. With the hype around Big Data, people tend to dump everything in Hadoop or NoSQL without the formal control and quality processes they have adopted for their ERP or other operational data sources.

Quite the contrary: Big Data needs even more governance. The new sources of Big Data, typically web logs and social channels, tend to be less structured, and they arrive at high volume and high speed. Also, new streaming frameworks are emerging in Hadoop to allow near real-time data analysis and reduce the latency of business decisions. Without proper data governance in place, we think it is very likely our users and customers will make bad decisions.

AR: Q2. What kind of changes are required in data governance policies in order to adapt to the increasing shift from traditional data management to Big Data technology (such as NoSQL and Hadoop)?

CT: It is typical for Big Data to ingest data external to the Enterprise, including open data. Controlling who can access and use this data, what data is verified and trusted, by whom and how, is a big deal. The usage is also critical. This data can be extremely sensitive: there are privacy issues, opt-in/opt-out, regulations specific to each geography, etc. As a matter of fact, I think Big Data without tight governance can be not only damaging to the business; to put it bluntly, it can actually send people to jail.

AR: Q3. How does the Talend Platform for Big Data help measure, monitor and improve Data Quality?

CT: Talend reduces the learning curve when adopting Big Data by offering the same experience as with traditional sources. Therefore the same Talend Profiler you use on traditional files and databases now lets you assess the quality of your Hadoop sources and measure it over time. We also let you parse and standardize data on Hadoop, which is key to working with semi-structured data. Finally, the Talend Platform features advanced fuzzy matching algorithms to help mash up the disparate sources you usually find in Big Data.

AR: Q4. What data governance trends do you see currently in the Big Data projects across industries?

CT: Data in general and Big Data in particular seem to be moving up the stack, from IT to the business users. A trend we see is self-service data discovery, data transformation/processing and even mashup/blending without the direct involvement of IT. The actual consumers of the data tend to take matters into their own hands in order to have the agility the business requires. To regain control and apply the proper data governance policies, IT needs to deploy and support a platform that offers non-IT people enough simplicity, flexibility and productivity for them to willingly give up their ad-hoc tools and scripts.

AR: Q5. Is "talent crunch" a real problem in Big Data?

CT: It is definitely an issue, because the skills required to really take advantage of Big Data are rare and expensive. However, at Talend our goal is precisely to make complex things simpler. What we want is to lower this barrier to entry and open Big Data to less specialized people. This is why the Talend Platform for Big Data addresses two issues that make Big Data complex:
  • We provide easy-to-use connectors to cope with the profusion of distributions, frameworks, languages and databases the generic term "Big Data" so conveniently hides.
  • Our Studio lets you design data transformations graphically, and the code generator deploys the native code for Hadoop under the covers so you do not have to be a Hadoop expert to find new insight into Big Data. Instead, you can focus on the business logic and reuse the widely available ETL competencies.

AR: Q6. If you were a fresher starting in the analytics industry today, how would you shape your career?

CT: Like Billy Beane and Peter Brand with the Oakland Athletics, I would strive to make data "speak". The best way to get a job right now may be to get the Big Data skills the market craves, but I think what you need to secure a career in the analytics world is the ability to translate data into business opportunities.

AR: Q7. On a personal note, we are curious to know what keeps you busy when you are away from work? How do you manage work-life balance?

CT: My family is what keeps me busy outside work. I am also a pretty bad tennis player, but that does not stop me from hitting the court at least once a week. Despite its incredible growth, Talend has not lost its French roots of "art de vivre", and it gives me the flexibility I need to get the work done and be happy in my personal life.