Data Science in 30 minutes, Artificial General Intelligence, and Answers to your Questions

I recently was on a "Data Science in 30 Minutes" webcast, but there were interesting ideas and questions we did not have time to cover adequately. Here is a summary.

I recently had a great opportunity to be on a webcast Data Science in 30 Minutes with Michael Li, CEO and founder of The Data Incubator. The Data Incubator has a great program for Data Science Fellowship, Hiring, and Training - check it out if you want to become a Data Scientist.

We had an interesting discussion, and the time was too short to present many ideas and observations in sufficient depth. There were also interesting questions which I did not have time to answer but will answer below. This post has several sections:
  • My Journey to Data Science
  • Trends in AI and Machine Learning
  • Artificial General Intelligence and Intuition
  • Questions and Answers
Here is the video of the webcast.

My Journey to Data Science

Michael asked me how I came to Data Science.

As a kid, I was very fascinated by science fiction and loved stories about robots.

This probably motivated me to learn computers, and in my first year at college I spent several weeks of my free time writing a program to play battleships. I used APL - which was a very advanced (for that time) array programming language. I was soundly defeated by my own program in the very first game, and became more interested in creating programs than playing them.
Gregory Piatetsky, 1990

I joined GTE Laboratories in the Boston area in 1984, shortly after receiving my PhD in computer science, and worked on applying Machine Learning methods to large databases. A few years later, I attended a workshop called "Expert Database Systems". The workshop was interesting, but the concept of Expert Database Systems seemed very fuzzy to me and I thought we could focus on something more useful. In a GTE project where we worked on federated database systems, I found that I could speed up one query by several orders of magnitude if I knew that a certain constraint (of the type "if X=a then Y=b") was always true. How could we find such rules automatically?
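To make the question concrete, here is a minimal sketch of what "finding such rules automatically" could look like on a small in-memory table. It is only an illustration, not the method we used at GTE: the function name `find_exact_rules`, the `min_support` parameter, and the sample rows are all hypothetical.

```python
from collections import defaultdict
from itertools import permutations


def find_exact_rules(rows, min_support=2):
    """Find rules "if x_col = x_val then y_col = y_val" that hold for every matching row.

    rows is a list of dicts sharing the same keys. min_support is a
    hypothetical cutoff: a rule must be backed by at least that many rows.
    """
    columns = list(rows[0].keys())
    rules = []
    for x_col, y_col in permutations(columns, 2):
        y_values_seen = defaultdict(set)  # x value -> all y values seen with it
        row_counts = defaultdict(int)     # x value -> number of rows with it
        for row in rows:
            y_values_seen[row[x_col]].add(row[y_col])
            row_counts[row[x_col]] += 1
        for x_val, y_vals in y_values_seen.items():
            # The rule is exact only if a single y value ever appears with x_val.
            if len(y_vals) == 1 and row_counts[x_val] >= min_support:
                rules.append((x_col, x_val, y_col, next(iter(y_vals))))
    return rules


# Made-up sample data, for illustration only.
rows = [
    {"region": "east", "carrier": "A", "plan": "basic"},
    {"region": "east", "carrier": "A", "plan": "premium"},
    {"region": "west", "carrier": "B", "plan": "basic"},
    {"region": "west", "carrier": "A", "plan": "premium"},
]
rules = find_exact_rules(rows)
for x_col, x_val, y_col, y_val in rules:
    print(f"if {x_col}={x_val} then {y_col}={y_val}")
```

A query optimizer that knows such a rule always holds can skip checking the implied column, which is the kind of shortcut that produced the speed-up described above. On real databases the hard part is doing this search efficiently and deciding which rules are meaningful rather than accidental.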

Being young, energetic, and naive, I decided that I could organize my own workshop. The term "Data Mining" seemed very prosaic, so I called it Knowledge Discovery in Data (KDD) to emphasize both the "Knowledge" and the "Discovery" part. The first KDD workshop was held in 1989 and it attracted many great researchers. I organized 2 more workshops and in 1994 one of my best ideas was to recruit Usama Fayyad, who was then a fresh PhD, and Sam Uthurusamy, a researcher at GM who advised Usama on his PhD work. They both attended the previous workshops and agreed to run the next workshop in 1994. In 1995 the workshop became a separate KDD conference as part of AAAI, and in 1998 Won Kim helped us create a separate group within ACM: ACM SIGKDD, which now organizes the KDD conferences and has related activities to support Data Science research.

After over 20 years of KDD Conferences I am very pleased to see that KDD remains the leading research conference in the field, based on citations and other indices.

In 1993, after the third KDD workshop, I started KDnuggets News, an e-newsletter focused on data mining and knowledge discovery. The first issue went to 50 researchers who attended the workshop. Today, KDnuggets has over 200,000 subscribers and followers across email, Twitter, Facebook, LinkedIn, and RSS. The website received over 500,000 visitors a month in Q4 2017 and is a very popular site for analytics, data science, and Machine Learning news, software, jobs, courses, education and more.

The conference and KDnuggets were only a part-time activity for me for many years. I started my career as a researcher at GTE Labs in Waltham, MA and worked there for 12 years. In 1997 I joined a startup that was doing analytics / data mining consulting for the financial industry. I was a chief scientist there and managed a team of about 10 people. Our smallish startup was bought by a big startup that went quickly to over $1B valuation (a "unicorn"), but stayed at this high valuation very briefly. Before we could do anything stupid with all those stock options, the big startup crashed and burned in the dotcom crash of 2001, and went from a billion dollars to nearly zero.

I left that startup a few months before it crashed, and since 2001 I have been publishing KDnuggets as a full-time activity. Initially I was also doing data mining consulting, but in the last few years, thanks to huge interest in Big Data and later Data Science, Machine Learning, and AI, KDnuggets became so popular that I stopped consulting and only publish KDnuggets.

There is too much content on the web so our mission is to find a few interesting stories, opinions, and tutorials in the intersecting fields of AI, Big Data, Data Science, and Machine Learning. I am helped by Matthew Mayo, who does excellent work as a second (besides me) full-time editor of KDnuggets. We also have several interns that help us publish KDnuggets and do original data journalism research.

For people interested in the history of the field, I recommend the book Journeys to Data Mining: Experiences from 15 Renowned Researchers, by Mohamed Medhat Gaber (Editor), 2012, Springer, which includes my chapter, Journeys of Knowledge Discovery, and personal stories from other leading researchers in the field.

Trends and Observations

Some trends I see for 2018 and beyond:
  • AI and Machine Learning capabilities are growing fast, but AI Hype is exceeding AI reality
  • Enterprise AI is becoming a topic for consideration, but beware of AI Hype
  • Deep Learning will continue its triumphal march
  • Capsule Networks - the latest idea from Hinton - promise to improve upon Convolutional networks for image recognition, especially when the image is rotated or transposed.
  • GDPR - the European General Data Protection Regulation - will come into effect on May 25, 2018, and will have a significant impact on Data Science operations for European firms and for global firms like Google or Facebook that work with personal data.
  • The term Citizen Data Scientist was introduced by Gartner, but I remain very skeptical of citizen data scientists. Data Science can either be fully automated - this is the direction taken by DataRobot, H2O, and other firms - or require training and expert Data Scientists. Would you trust an airplane to be flown by a citizen pilot?
  • Reinforcement Learning will be key to the next level of AI Capabilities, and that will involve Predictive Learning - agents that learn from their actions. See my interview with Rich Sutton, father of Reinforcement Learning
  • Transfer Learning is the key unsolved problem for Machine Learning and Deep Learning. Humans can learn from one example because they have accumulated tremendous knowledge about related situations. There is already some progress with One-shot learning in Computer vision and zero-shot machine translation.
See also KDnuggets series of Predictions for 2018

Artificial General Intelligence (AGI)

Progress in AI and technology seems inexorable and almost all AI and Machine Learning researchers think that AGI is likely to emerge (unless humanity destroys itself in some nuclear war).

When will AGI be here? Ray Kurzweil, a noted futurologist, says AGI will be achieved by 2045 (and Singularity will follow shortly thereafter).

Many experts give similar estimates. AGI in 20 to 50 years was the median estimate in a recent KDnuggets Poll.

A note of caution about such predictions is that people are better at seeing trends than at predicting timing. For example, people can see that a line on a chart is increasing, but are very bad at estimating the slope of the line.

Will singularity follow AGI? I have no idea, but we can have a preview of super-intelligence already achieved in one area, and we will not be able to understand how it thinks.

Chess was long considered a good testbed for AI - until computers mastered it. IBM worked for several years to prepare the Deep Blue computer that beat world champion Garry Kasparov in 1997. Last year, until December 2017, the strongest chess player in the world was a program called Stockfish, which knew all human openings and searched tens of millions of positions, ~20 moves ahead.

In 2016 the Google DeepMind program AlphaGo defeated Lee Sedol, one of the best Go players in the world. In 2017, DeepMind generalized from Go to multiple games and developed a program called AlphaZero, which achieved superhuman abilities in 3 games: Go, Chess, and Shogi.

AlphaZero started with no chess knowledge (hence the name) - just rules of the game. It used Deep Learning, Reinforcement Learning, and massive compute power. It played itself and after 4 hours and a few million games, it reached a superhuman level of chess and was able to beat Stockfish convincingly.

AlphaZero still searches a lot of moves using Monte Carlo Tree Search, but it searches a thousand times fewer positions than Stockfish. However, by playing millions of games against itself, it developed a far superior position evaluation function - what we can call intuition.

I am a moderately strong chess player myself, and I was astonished by AlphaZero games. Here, for example, is a position from Game 10.

AlphaZero vs Stockfish, Dec 2017, Game 10, after 18 moves. White to play.

Here almost every human would play 19. Ng4, saving the knight under attack.

AlphaZero, playing White, made a stunning move 19. Re1, sacrificing the knight for an attacking position.

Stockfish, which calculates all the moves to a depth of 20 or so, did not see the danger and took the sacrifice. If a chess grandmaster made a move like 19. Re1, we would call it a beautiful game and praise the player's intuition, so it is fair to say that AlphaZero developed its own intuition.

Watch also the great commentary on this game by IM Daniel Rensch.

The example of AlphaGo Zero and AlphaZero demonstrates that computers can achieve superhuman ability in a narrow field, and their moves (decisions) will in many cases be different from human moves (decisions). In Go, AlphaGo Zero was applied to grandmaster games after it completed training, and it chose the same best move as human grandmasters only 40% of the time, vs 50% for AlphaGo, which was trained using those human games.

If AGI is achieved, it is doubtful that humans will always understand how AGI makes its decisions or be able to predict its moves.


Questions and Answers

Here are several questions from the audience of the webinar, most of which I did not have time to answer.

Q1: Business Intelligence / Data Mining / Predictive Analytics / Data Science / Machine Learning / Artificial Intelligence. What are the differences?

GP: Business Intelligence is primarily concerned with analysis of existing data and generating reports. Data Mining, Predictive Analytics, and Data Science are just different names for the same field.

It is interesting that the "popular" name for this field keeps changing:
  • Data Mining, 1970s-1980s
  • KDD (Knowledge Discovery in Data), 1990s
  • Predictive Analytics, 2000s
  • Data Science, 2012-
Machine Learning has a large overlap in methods with Data Science but is not the same as Data Science. ML also includes situations where there is an active agent that can learn from its actions (Reinforcement Learning), while Data Science works with existing data.

AI is a much broader field than Machine Learning (e.g., including computer vision, robotics, and hardware aspects), but Machine Learning is the core capability for AI.

See a more detailed explanation in The Data Science Puzzle, Explained.

Q2: I am a Master's student in Mathematics and Business Intelligence. What is your advice to me to join the Data Science field, please? Thanks for advance.

GP: Here are steps to join Data Science - do as many as you can:
  • Languages: Learn Python, R, SQL, maybe Scala
  • Tools: Learn how to use data science, visualization, and Deep Learning tools
  • Textbooks: Read introductory textbooks to understand the fundamentals, especially of statistics, probability, algebra and calculus (for Deep Learning)
  • Education: watch webinars, take courses, and consider a certificate or a degree in data science / Machine Learning
  • Data: Check available data resources and find something to analyze
  • Code: contribute to some open source Data Science / Machine Learning projects
  • Competitions: Participate in Kaggle or other data science competitions
  • Interact with other data scientists, via social networks, groups, meetups, and conferences
See more details in my post 7 Steps for Learning Data Mining and Data Science

Q3: With the inherent difficulties of social science research, what are your thoughts about using data science in social sciences?

GP: Data Science is especially important in social sciences in two cases:
  • when dealing with small data
  • when dealing with very large data
The first case of small datasets (dozens or hundreds of test subjects) is the situation of traditional Data Science. The pressure to publish and modern data analysis tools make such studies prone to overfitting and mistaking random noise for true findings. This was pointed out by John P. A. Ioannidis in his landmark paper Why Most Published Research Findings Are False (PLoS Medicine, 2005).

When such "studies" are repeated, the results may change. The phenomenon of old results no longer holding has been so widespread that a few years ago some journalists came up with crazy theories about "cosmic habituation" or "the decline effect" - the idea that the laws of the universe seem to change when you try to repeat an experiment. Ha!

The explanation is much simpler - the researchers were doing too much data mining without proper controls and overfitting the data.

See also my post The Cardinal Sin of Data Mining and Data Science: Overfitting.
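The overfitting trap with small data is easy to reproduce. The sketch below is a toy simulation, not taken from any real study: the sample sizes, the random seed, and all names are made up. It generates an outcome for 20 "subjects" and 500 candidate predictors, all of them pure random noise, then picks the predictor that correlates best with the outcome.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)              # fixed seed so the run is repeatable
n_subjects, n_features = 20, 500    # hypothetical study size


def noise():
    """One column of random measurements, unrelated to anything else."""
    return [rng.gauss(0, 1) for _ in range(n_subjects)]


outcome = noise()
features = [noise() for _ in range(n_features)]

# "Data mining without controls": test every feature, report only the best one.
correlations = [pearson(f, outcome) for f in features]
best = max(range(n_features), key=lambda i: abs(correlations[i]))
best_r = correlations[best]

# "Replication": since the feature is noise, a repeat study is just fresh noise.
replication_r = pearson(noise(), noise())

print(f"best of {n_features} noise features: r = {best_r:+.2f}")
print(f"same 'finding' in a repeat study:  r = {replication_r:+.2f}")
```

With this many candidates and so few subjects, the best correlation looks impressive even though nothing real is there, and there is no reason for it to survive a repeat study. No change in the laws of the universe is required.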

The second case of dealing with web scale data is even more important.

Facebook, Google, and other web giants are constantly testing their algorithms, in effect conducting many large-scale experiments on their users every day. In 2014, Facebook reported on a study where it changed the user feeds of 700K users to be more "positive" or more "negative" and found that this affected their moods - users became sadder or happier.

In 2016, as everyone knows, Facebook was at the center of fake news manipulation.

Facebook and other social media are now criticized for creating an addiction cycle, manipulating users to watch kittens instead of helping them do something useful with their lives - see for example Tristan Harris's site Time Well Spent.

Partly in response to such critique, Facebook announced in Jan 2018 changes in the news feed to put what friends and family say ahead of news publishers. We can only hope that this will improve the quality of time spent on Facebook.

Social scientists have an unprecedented opportunity to study web-scale data on human interactions, and I hope they use it responsibly.

Q4: Data privacy is generally seen as the opposite of data mining, i.e. data privacy laws hinder data science functions. What are your views about this? Can they both go hand in hand?

GP: With people leaving so many digital crumbs through their social media presence, email, web searches, DNA, etc., complete privacy is probably a thing of the past. Even if so-called Personally Identifiable Information (PII), like name, social security number, and date and place of birth, is removed from the data, a person can sometimes be identified from other data points.

For example, the Netflix Prize datasets had anonymous ratings from several hundred thousand people. Researchers managed to match some users who rated many obscure movies with non-anonymous users who had a similar set of ratings on IMDB, which caused the cancellation of the 2nd Netflix Prize.
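The matching idea can be sketched in a few lines. This is a deliberately simplified illustration, not the researchers' actual method: it compares sets of (movie, rating) pairs using Jaccard similarity, and the user ids, names, movie titles, and the 0.5 threshold are all invented.

```python
def jaccard(a, b):
    """Size of the intersection divided by size of the union of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_match(anon_ratings, public_profiles, threshold=0.5):
    """Return (name, score) for the most similar public profile.

    Returns None when even the best score is below the (hypothetical) threshold.
    """
    scored = [(jaccard(anon_ratings, ratings), name)
              for name, ratings in public_profiles.items()]
    score, name = max(scored)
    return (name, score) if score >= threshold else None


# "Anonymized" ratings: user ids carry no personal information.
anonymous = {
    "user_1701": {("Obscure Film A", 5), ("Obscure Film B", 2), ("Obscure Film C", 4)},
    "user_2042": {("Blockbuster X", 4), ("Blockbuster Y", 4)},
}

# Public, non-anonymous ratings, e.g. posted under a real name on a review site.
public = {
    "alice": {("Obscure Film A", 5), ("Obscure Film B", 2),
              ("Obscure Film C", 4), ("Blockbuster X", 3)},
    "bob": {("Blockbuster X", 5), ("Blockbuster Y", 3)},
}

matches = {uid: best_match(ratings, public) for uid, ratings in anonymous.items()}
for uid, match in matches.items():
    print(uid, "->", match)
```

In this toy data, the user who rated several obscure films is linked to a named profile, while the user who rated only popular films is not. That is why ratings of obscure movies are so revealing: few other people share them.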

Now, Facebook and other sites have so much information about their users that they can learn their political, sexual, and other preferences with very high accuracy, and others can buy or get this information.

Privacy via anonymity has existed only for a limited time in human history, when people started to live in large cities. When people lived in a village, everyone mostly knew what everyone else did - there was little privacy.

Perhaps in the near future, when we all live in the same digital village, privacy will become a virtue in the sense that even if you know everything about your neighbor, it will be a virtue to behave as if you don't know what they don't want you to know.

What do you think? Please comment below.