DataReview interview with me on KDnuggets, Data Mining, and Data Science

I recently gave an interview to DataReview, a Ukrainian site, where we talked about the origins of KDnuggets, the history of Data Mining, interesting problems I worked on, and typical problems faced by young data scientists.

By Gregory Piatetsky, @kdnuggets, Aug 20, 2014.

DataReview is an information and education portal focusing on data analysis, business intelligence, and big data. It is based in Ukraine, and ably run by Yevhen (Eugene) Dwortsyn, entrepreneur and IT specialist, and editor-in-chief Larisa Shuriga, a journalist and IT enthusiast. DataReview publishes mainly in Russian, but also translates some of its content into English.

I recently gave them an interview - we talked about how I arrived at data mining, the difference between data mining and KDD, the history of data mining, interesting problems I solved with the help of data mining, typical problems faced by aspiring data scientists, and more.

Here is the full interview in English:
Gregory Piatetsky: Overfitting Is the Cardinal Sin of Data Science
and in Russian: Григорий Пятецкий: Переподгонка — «смертный грех» для аналитика.

Here are some excerpts (illustrated by my photo in a "Data Miner" hat):

Gregory Piatetsky with a Data Miner hat

DataReview: Data scientists are some of the most in-demand specialists in the IT market. What tasks do they solve? What challenges do they face? DataReview has addressed these questions to one of the pioneers of data analysis, the founder of the KDD concept, the president of KDnuggets, Gregory Piatetsky-Shapiro.

Larisa Shuriga, DataReview: Gregory, you are known as one of the best specialists in data analysis. How did you realize that this area is your calling?

GP: Thank you, Larisa, but you are much too kind. There are now many thousands of excellent data scientists, and I am glad if I am considered somewhere among them.

I am probably best known as one of the pioneers in this field. I organized the first 3 workshops/meetings on Knowledge Discovery and Data Mining (KDD-89, 91, 93), co-edited the first 2 books in this field (1991 and 1996), helped launch KDD Cup - the first large data mining competition in 1997, co-founded the SIGKDD association - ACM group for Knowledge Discovery and Data Mining in 1999, and served as SIGKDD chair from 2005 to 2009.

LS: How did you arrive at this field?

GP: As a child, I loved science fiction, especially A. & B. Strugatsky, Isaac Asimov, and Stanislaw Lem, and that, along with a mathematical inclination inherited from my father Ilya Piatetski-Shapiro - one of the leading mathematicians in Moscow - led me to study computers and become interested in Artificial Intelligence and Machine Learning.

My PhD at New York University was on the application of machine learning methods to database optimization, and my first job was working with databases. So perhaps working with databases and being interested in Machine Learning naturally led me to try to combine the two, which led to my work on knowledge discovery in data.

I described my journey to data mining in my chapter in Journeys to Data Mining: Experiences from 15 Renowned Researchers, Mohamed Medhat Gaber (Editor), Springer, 2012.

LS: Could you, please, explain the difference between data mining and KDD?

GP: Of course I did not invent data mining - analyzing facts and finding patterns is probably one of the basic human traits. Statisticians have been working on data analysis for centuries.

Regarding the different names of this field - Data Mining, Knowledge Discovery, Predictive Analytics, Data Science - here is a very brief history.

In the 1960s, statisticians used terms like "Data Fishing" or "Data Dredging" to refer to what they considered a bad practice of analyzing data without a prior hypothesis. The term "Data Mining" appeared in the database community around the 1990s. I coined the term "Knowledge Discovery in Databases" (KDD) for the first workshop on the same topic (1989), and this term became popular in the academic and research community. The KDD conference, now in its 21st year, is the top research conference in the field, and there are also KDD conferences in Europe and Asia.

However, because the term "data mining" is easier to understand, it became more popular in the business community and the press.

In 2003, the term "data mining" acquired a bad image in the US because of its association with a US government program called TIA (Total Information Awareness), which was closed by the US Senate after protests by privacy advocates. In 2006, the term "Analytics" jumped to great popularity, driven by the introduction of Google Analytics (Dec 2005). According to Google Trends, "Analytics" became more popular than "Data Mining", as measured by Google searches, around 2006, and has continued to climb ever since.

The term "Data Science" appeared in early 2000, but has been used in its current meaning only since 2012, and we can see a huge demand for "Data Scientist" jobs on a popular platform for jobs. ...

Read more in the full interview

Gregory Piatetsky: Overfitting Is the Cardinal Sin of Data Science