The Evolution of the Data Scientist
We trace the evolution of Data Science from ancient mathematics to statistics and early neural networks, to present successes like AlphaGo and self-driving cars, and look into the future.
Evolution might seem an unusual word to describe the advancement of the Data Scientist. After all, evolution is defined as: “The way in which living things change and develop over millions of years”. I’m certainly not claiming that Homo erectus could code. However, what we can clearly see is that there is evolution in the methods, processes and technology used by a Data Scientist. Many would contest the true beginnings of statistical modelling, but fewer would argue about what that evolutionary lifecycle looks like. From the early days of scratching numbers into papyrus to the modern day punching of numbers into a keyboard, Data Science has come a long way. The technology may have changed, and the methods may also have changed, but what hasn’t changed, going back past the industrial revolutions of the 19th and 20th centuries, past the Renaissance, and as far back as the dawn of humankind, is that we’ve always sought to leverage mathematics and statistics to improve the world around us.
Data Science in the form we know it today has only been around since the new millennium, when statisticians who felt that they had a unique set of skills chose to separate themselves from traditional mathematicians and computer scientists. In its purest form, however, Data Science started out as statistics in the 9th century A.D., when Iraqi mathematician Al-Kindi used his own method of statistical analysis for cryptanalysis, also known as code breaking. His work is credited as the first recognised example of frequency analysis, and led the way for other thinkers.
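To give a feel for what frequency analysis involves, here is a minimal modern sketch in Python. It is not a reconstruction of Al-Kindi's own method: it assumes English text, a simple Caesar (shift) cipher, and that the most common plaintext letter is "e". The function names are hypothetical and chosen purely for illustration.

```python
import string
from collections import Counter


def letter_frequencies(text):
    """Return (letter, relative frequency) pairs, most common first."""
    letters = [ch for ch in text.lower() if ch in string.ascii_lowercase]
    counts = Counter(letters)
    total = len(letters)
    return [(letter, n / total) for letter, n in counts.most_common()]


def caesar_shift(text, shift):
    """Shift every ASCII letter by `shift` places, leaving other characters alone."""
    out = []
    for ch in text:
        if ch in string.ascii_lowercase:
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        elif ch in string.ascii_uppercase:
            out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            out.append(ch)
    return "".join(out)


def guess_caesar_shift(ciphertext, assumed_top_letter="e"):
    """Guess the shift by assuming the most frequent cipher letter
    stands for `assumed_top_letter` (an assumption that suits English)."""
    most_common_letter = letter_frequencies(ciphertext)[0][0]
    return (ord(most_common_letter) - ord(assumed_top_letter)) % 26


# Made-up message, used only to demonstrate the idea.
plaintext = "meet me near the green tree at seven"
ciphertext = caesar_shift(plaintext, 3)
shift = guess_caesar_shift(ciphertext)
recovered = caesar_shift(ciphertext, -shift)
```

The guess only works when the message is long enough for its letter counts to resemble those of the underlying language; on very short or unusual messages the most frequent letter may well not be "e".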
During the 1300s, Florentine banker Giovanni Villani used his extensive records and knowledge of Florence, including its population, geography, trade and education, to build a comprehensive guide to the city, which has since been described as the first use of statistics for philanthropic ends. In the 17th century, John Graunt and William Petty created the first life table after studying the population of London. Using only the rates of mortality of London as a marker, Graunt and Petty were able to calculate that the population of London was somewhere around 384,000 people, and that the average family size in London during the 17th century was 8. These figures are regarded as extraordinarily accurate, given that there was no national census in place at the time and that groups moved in and out of the major cities almost every day, with many residents having no fixed abode.
In the 20th century, statistics became a recognised and prominent field, being used to help quantify the increasingly diverse societies of the 1900s. Some of this work was led by Karl Pearson and Francis Galton, two revered mathematicians who studied societal diversity in terms of height, weight, race, hair colour and more. Galton contributed his knowledge of deviation, correlation and regression analysis, while Pearson pioneered the ‘Pearson product-moment correlation coefficient’ and the ‘Pearson distribution’. The correlation coefficient in particular became key in measuring the degree of linear dependence between two variables.
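For readers who have not met it, the product-moment correlation coefficient divides the covariance of two variables by the product of their standard deviations, giving a value between -1 and 1. The plain Python sketch below illustrates that textbook definition; the function name and the sample height and weight figures are made up for the example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the mean (the 1/n factors cancel).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical measurements, invented purely for illustration.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [52, 58, 69, 77, 90]
r = pearson_r(heights_cm, weights_kg)
```

A value of r close to 1 means the points lie near a rising straight line, a value close to -1 means they lie near a falling one, and a value near 0 means there is little linear relationship, although a non-linear one may still exist.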
This research was continued by Ronald Fisher, who is credited with writing the textbooks that defined the academic discipline of statistics. His most famous work, the 1918 paper “The Correlation between Relatives on the Supposition of Mendelian Inheritance”, became one of the cornerstones of statistical academic research at universities all over the world. Fisher also divided opinion with his work “The Genetical Theory of Natural Selection”, which looked to prove evolutionary theory using statistics.
The first real examples of the integration of statistics and computers were pioneered by Marvin Minsky and Arthur Samuel, two men who are arguably the forefathers of Machine Learning. Minsky created the first randomly wired neural network, code named SNARC, in 1951, while Samuel, who joined IBM in 1949, designed a self-learning checkers program for IBM’s first commercial computer, the IBM 701. From here on in, the tide began to change, with computers sharing the driving seat with humans in the advancement of statistical analysis.
Fast-forward to the modern day, and the profile of the Data Scientist looks incredibly different. One of the Data Scientists who best represents the modern day landscape is Andrew Ng. Andrew is the Chief Scientist at Baidu, a Stanford professor and a pioneer of Deep Learning, one of the newest advancements in the world of Machine Learning. During his time at the head of Google Brain, he and his team developed some of the most intricate and complex deep neural networks in the world. He has taken this research to Baidu, where he is helping to design their Minwa AI platform, which specialises in image recognition powered by trained Deep Learning algorithms. His research will help to power many of the vision-based artificial intelligence applications you will see around the world.
Another example of the modern day Data Scientist is Demis Hassabis. Demis is a former chess prodigy and neuroscientist who is head of Google DeepMind, a British AI firm. The computer programs his team has produced have been taught to tackle vintage video games and traditional board games with no human input, again using Deep Learning. Its AlphaGo program recently beat the European champion of the Chinese board game ‘Go’, and has just beaten the world champion. Finally, Sebastian Thrun is a former head of Google X and was the pioneering mind behind the Google self-driving car project, which has seen Google’s fleet drive over one million autonomous miles around California. He also worked on introducing probabilistic techniques into robotics, which have since been implemented in commercial products such as robot vacuum cleaners. It is amazing to think that these three leaders, who are at the top of their respective fields and are the future of Data Science, have an average age of just 42.
The role of the Data Scientist has unequivocally evolved since the beginnings of statistics more than 1,100 years ago. Despite the term only existing since the turn of this century, it has already been labelled ‘The Sexiest Job of the 21st Century’, which, understandably, has created a queue of applicants stretching around the block. Data Scientists have evolved because advancement is the difference between surviving and dying. As technology becomes better, so does its autonomy. The market is currently awash with predictions and concerns about how far autonomy and self-learning robots will go. The only thing that is certain is that it’s going to be very exciting to observe. How will the Data Scientist of tomorrow look? Will the advancement of technology, and of self-learning robots that learn to model and code, mean that Data Scientists themselves will be replaced? Now that would be a cruel, ironic twist of fate!