20 AI, Data Science, Machine Learning Terms You Need to Know in 2020 (Part 1)

2020 is well underway, and we bring you 20 AI, data science, and machine learning terms we should all be familiar with as the year marches onward.

In the past, KDnuggets has covered collections of key terms, including those for machine learning, deep learning, big data, natural language processing, and more. As we get into a new year, and as we have not published any collections of key terms in the recent past, we thought it would be a good idea to highlight some AI, data science, and machine learning terms that we should all now be familiar with in the constantly evolving landscape.

As such, these terms are a combination of some more recently emerging concepts and existing concepts whose importance may have grown of late. The definitions are a combined effort from the KDnuggets team, including Gregory Piatetsky, Asel Mendis, Matthew Dearing, and myself, Matthew Mayo.

See also 20 AI, Data Science, Machine Learning Terms You Need to Know in 2020 (Part 2)

And so without any further ado, here are the first 10 terms you need to know, with the second 10 coming next week, giving us a total of 20 terms to know for 2020.




AutoML

Automated machine learning (AutoML) spans the fairly wide range of tasks which could reasonably be thought of as being included within a machine learning pipeline.

An AutoML "solution" could include the tasks of data preprocessing, feature engineering, algorithm selection, algorithm architecture search, and hyperparameter tuning, or some subset or variation of these distinct tasks. Thus, automated machine learning can now be thought of as anything from solely performing a single task, such as automated feature engineering, all the way through to a fully-automated pipeline, from data preprocessing, to feature engineering, to algorithm selection, and so on.

To put it another way — and to be honest, my favorite way — if, as Sebastian Raschka has described it, computer programming is about automation, and machine learning is "all about automating automation," then automated machine learning is "the automation of automating automation." Follow me, here: programming relieves us by managing rote tasks; machine learning allows computers to learn how to best perform these rote tasks; automated machine learning allows for computers to learn how to optimize the outcome of learning how to perform these rote actions.

This is a very powerful idea; while we previously have had to worry about tuning parameters and hyperparameters, manually engineering features, performing algorithm selection, and the like, automated machine learning systems can learn the best way to tweak these processes for optimal outcomes by a number of different possible methods.

"Regular" programming is data and rules in, answers out; machine learning is data and answers in, rules out; automated machine learning involves automating the optimization of some set of constraints to "best" get from data and answers to rules, defining "best" with any metric you like.
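The three-way contrast above can be made concrete with a toy example. This is only a minimal sketch: the data, the function names (`rule`, `fit_line`, `auto_fit`), and the choice to automate just the learning rate are all assumptions made for illustration, and real AutoML systems search far larger spaces.

```python
import random

# Toy data following y = 3x + 2 plus a little noise (made up for illustration).
random.seed(0)
xs = [i / 10 for i in range(40)]
ys = [3 * x + 2 + random.gauss(0, 0.1) for x in xs]


# "Regular" programming: data and rules in, answers out.
def rule(x):
    return 3 * x + 2


# Machine learning: data and answers in, rules (here, a slope and intercept) out.
def fit_line(xs, ys, learning_rate, steps=500):
    """Learn slope w and intercept b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def mse(params, xs, ys):
    """Mean squared error of the line described by params on the given data."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Automated machine learning: automate a choice we would otherwise tune by hand,
# using validation error as the definition of "best".
def auto_fit(xs, ys, candidate_rates):
    train_x, train_y = xs[::2], ys[::2]    # even-indexed points for training
    valid_x, valid_y = xs[1::2], ys[1::2]  # odd-indexed points for validation
    results = []
    for rate in candidate_rates:
        params = fit_line(train_x, train_y, rate)
        results.append((mse(params, valid_x, valid_y), rate, params))
    return min(results)  # smallest validation error wins


best_error, best_rate, (w, b) = auto_fit(xs, ys, [0.001, 0.01, 0.1])
print(f"chosen learning rate: {best_rate}, learned rule: y = {w:.2f}x + {b:.2f}")
```

Swapping validation error for another metric changes what "best" means without changing the structure of the search.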


Bayesian

The Bayesian methodology allows us to apply probability distributions to model the real world and update our beliefs as new data becomes available to us. For years statisticians have generally relied on the frequentist approach. The Bayesian approach is suitable for evaluating a hypothesis when only a small amount of data is available, an amount that may not be considered significant in the eyes of a frequentist.

This explanation by Brandon Rohrer is a great, simple example of how the Bayesian approach works:

Imagine you are at the movies and a fellow moviegoer drops their ticket. You want to get their attention. This is what they look like from behind. You can’t tell their gender, only that they have long hair. Do you call out “Excuse me ma’am!” or “Excuse me sir!” Given what you know about men’s and women’s hairstyles in your area, you might assume that this is a woman. (In this oversimplification, there are only two hair lengths and genders.) Now consider a variation of the situation where this person is standing in line for the men’s restroom. With this additional piece of information, you would probably assume that this is a man. This use of common sense and background knowledge is something that we do without thinking. Bayesian inference is a way to capture this in math so that we can make more accurate predictions. -Brandon Rohrer
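The reasoning in the quote can be written as a single application of Bayes' rule. The sketch below is illustrative only: the probabilities (half of women and 4% of men having long hair, and 98 men for every 2 women in the restroom line) are assumed numbers that do not appear in the text above, and the function name is hypothetical.

```python
def posterior_woman_given_long_hair(prior_woman, p_long_given_woman, p_long_given_man):
    """Bayes' rule: P(woman | long hair) from a prior and two likelihoods."""
    prior_man = 1.0 - prior_woman
    # Total probability of seeing long hair in this crowd.
    evidence = p_long_given_woman * prior_woman + p_long_given_man * prior_man
    return p_long_given_woman * prior_woman / evidence


# Assumed likelihoods (illustrative only): 50% of women and 4% of men have long hair.
P_LONG_WOMAN = 0.5
P_LONG_MAN = 0.04

# In the theater audience, assume an even split, so the prior is 0.5.
in_theater = posterior_woman_given_long_hair(0.5, P_LONG_WOMAN, P_LONG_MAN)

# In the men's restroom line, assume 98 men for every 2 women, so the prior is 0.02.
in_restroom_line = posterior_woman_given_long_hair(0.02, P_LONG_WOMAN, P_LONG_MAN)

print(round(in_theater, 3))        # 0.926
print(round(in_restroom_line, 3))  # 0.203
```

The evidence (long hair) is identical in both cases. Only the prior changes, and that alone is enough to flip the most likely conclusion.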


BERT

BERT stands for Bidirectional Encoder Representations from Transformers, and is a pretraining technique for natural language processing. What sets BERT apart from other language representations is the application of bidirectional training to the existing Transformer attention model. BERT pretrains deep bidirectional representations of unlabeled text data on both left and right contexts, resulting in a language model which can be fine-tuned with only a single added layer. BERT has achieved state-of-the-art performance on a number of NLP tasks, including question answering and inference. Both BERT and the Transformer were developed by Google.

It should be intuitive that training a language model bidirectionally on text as opposed to left to right (or right to left) would result in some better sense of language "understanding" and word meaning. Bidirectionality allows for the learning of word meaning based on the entirety of its surroundings, as opposed to making determinations based on what can be gleaned from "reading" from one direction up to the point of a given word occurrence. Therefore, words with different meanings in different contexts can be treated separately and a better representation of their contextual meaning captured (think "bank" of a river versus a "bank" where you keep your money).

Practically, BERT can be used to extract features from text in the form of word or sentence embeddings, or the BERT models can be fine-tuned on additional data for specific tasks such as question answering or text classification. BERT is available in a few different sized models (numbers of parameters), and has inspired an additional family of BERT-related models, such as RoBERTa and DistilBERT.

For a full treatment and practical tutorial using BERT, see Chris McCormick and Nick Ryan's fantastic write-up.


CCPA

The California Consumer Privacy Act (CCPA), which went into effect on Jan 1, 2020, has important implications for businesses that collect personal data and, by implication, for businesses that analyze and process such data. It is similar in intent to GDPR but offers California consumers stronger protections. CCPA allows any California consumer to demand to see what information a company has on them, along with a full list of third parties with whom this information has been shared. California consumers can also access their personal data, say no to the sale of their personal data, and request that a company delete any part of the personal information it holds about them.

It applies to any business that collects consumers' personal data, does business in California, and satisfies at least one of the following:

  • Has annual gross revenues over $25 million
  • Buys or sells the personal information of 50,000 or more California consumers or households
  • Earns over 50% of its annual revenue from selling California consumers' personal information
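The applicability test above is a conjunction of two conditions plus at least one of three thresholds, which can be sketched as a small function. This is a simplified illustration of the criteria exactly as listed here, not a reading of the statute itself; the `Business` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Business:
    # Hypothetical fields chosen to mirror the criteria listed above.
    collects_personal_data: bool
    does_business_in_california: bool
    annual_gross_revenue_usd: float
    california_records_bought_or_sold: int  # consumers or households
    revenue_share_from_selling_data: float  # fraction between 0 and 1


def ccpa_applies(business):
    """Return True if the business meets the criteria as summarized above."""
    if not (business.collects_personal_data and business.does_business_in_california):
        return False
    return (
        business.annual_gross_revenue_usd > 25_000_000
        or business.california_records_bought_or_sold >= 50_000
        or business.revenue_share_from_selling_data > 0.5
    )


small_data_broker = Business(True, True, 2_000_000, 80_000, 0.1)
local_bakery = Business(True, True, 400_000, 0, 0.0)
print(ccpa_applies(small_data_broker))  # True
print(ccpa_applies(local_bakery))       # False
```

Note that a small company can still be covered: the hypothetical data broker falls well under the revenue threshold but crosses the 50,000-record line.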

For more information, see the Wikipedia entry on CCPA.

Data Engineer

Data engineers are the ones responsible for optimizing and managing the storage and retrieval of an organization’s data. A data engineer would set out the road map on how best to acquire the data and create databases for storing it. They would typically deal with cloud services to optimize data storage and create algorithms to make sense out of the data. The role of a data engineer is highly technical and requires advanced knowledge of SQL, database design, and computer science.

Data engineers are increasingly expected to be cloud certified, so that they can create databases in the cloud and handle large, complex datasets in a cloud environment while scaling and optimizing data retrieval.


Deepfakes

Deepfakes are fake images, video, or audio that have been created using advanced deep learning and Generative Adversarial Network (GAN) technology. This technology is so advanced that the results are very realistic and very hard to identify as fake. A well-known example is a deepfake using Obama's image and voice.

Deepfakes first became prominent in the context of porn, with popular celebrities' faces superimposed on top of adult videos, but recently the technology has progressed with apps like FakeApp and more recent open-source alternatives like FaceSwap and DeepFaceLab.

For voice, it used to require several minutes of speech, but recent technology can generate a convincing voice imitation from only a few seconds of speech. In a first-of-its-kind cybercrime, in Sep 2019 a company was tricked into paying $243K by scammers who used deepfake technology to imitate the CEO's voice.

Deepfakes have already been used in political disinformation campaigns for creating fake images for bot profiles, but are likely to be used in 2020 to spread disinformation about candidates using their fake voice and images.

There is now an arms race between creators of deepfakes and web companies that try to identify them. Facebook, together with several other companies, has announced a $10M contest to produce technology to identify deepfakes. Stay tuned, and don't automatically trust everything you see on the web: check its source.

Deployment/Productionizing models

In this age of machine learning, deep learning, and AI, the final objective of the process is to deploy the model and put it in the hands of the final consumer. There are many services available for deploying a model through the web, such as Heroku, AWS, Azure, GCP, and GitHub. Different providers have different pricing and provide slightly different services. Deploying models and putting them into production also requires, to a certain extent, some knowledge of front-end and back-end development in order to work effectively in teams.

Many models are now being deployed using cloud computing providers due to their ease of scaling to millions of users while being able to monitor the cost of scaling to such a level. A model in production allows an organization to monetize it and create better value for their customers.
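At its core, putting a model behind a web service means wrapping prediction in a function that accepts a request and returns a response. The sketch below shows only that wrapper, using the standard library. The "model" is a pair of hard-coded weights standing in for a real trained artifact, and `handle_request` is a hypothetical name; in practice this logic would sit behind a web framework or a cloud provider's serving layer.

```python
import json

# Hypothetical "trained model": fixed weights standing in for a real artifact.
WEIGHTS = [0.4, -0.2]
BIAS = 0.1


def predict(features):
    """Score one example with the stand-in linear model."""
    return BIAS + sum(w * x for w, x in zip(WEIGHTS, features))


def handle_request(body):
    """Map a JSON request body to an (HTTP status code, JSON response body) pair."""
    try:
        payload = json.loads(body)
        features = payload["features"]
        if len(features) != len(WEIGHTS):
            raise ValueError("wrong number of features")
        prediction = predict([float(x) for x in features])
        return 200, json.dumps({"prediction": prediction})
    except (KeyError, TypeError, ValueError) as error:
        # Malformed JSON, missing keys, and bad values all become a 400 response.
        return 400, json.dumps({"error": str(error)})


status, response = handle_request('{"features": [1, 2]}')
print(status, response)
```

Keeping the request handling separate from the model makes it possible to test the service logic without starting a server.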

Graph Neural Networks

Data Scientists are swimming in data. Piles of data. Some data can be raw and unorganized as it streams in through a firehose. Other data can be neat and orderly (or curated to be so), formatted within manageable dimensions. With these “Euclidean” data sets, such as text, images, and videos, machine learning has provided much success in applications for text generation, image manipulation, and facial recognition. Pair deep learning models run on a GPU or two with a mountain of training data, and the possibilities for discovering hidden patterns and meaningful features in the data appear infinite.

What about data that is more interrelated? Data can be connected to one another through dependent relationships. Interactions between users might impact purchasing decisions on e-commerce platforms. Chemical interactions for drug discovery are mapped out through complex interconnections of reactions. Social networks are formed and devolve through ever-changing, irregular, and unordered relationships. The human brain is built on individually communicating cells connected through an intertwined ball of spaghetti.

These sorts of data relationships can be modeled as graphs with data points represented as nodes and relationships encoded through interconnecting links. Traditional machine learning approaches, including deep learning, need to be generalized further to compute within a non-Euclidean, graph-based space. While some related work was performed earlier, the notion of a graph neural network (GNN) was defined by Marco Gori and team in 2005, followed by more research that expanded into the development of graph versions of recurrent and convolutional neural networks. Deep learning research is now actively working to apply the GNN approach to data-as-spaghetti sources, and this is an area of study that should be watched closely in 2020.
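To give a feel for what computing on a graph means, here is a minimal sketch of one round of neighborhood aggregation, the basic operation many GNN variants build on. It is a deliberate simplification and not the 2005 formulation: a real GNN would use learned weight matrices, whereas this uses a fixed `self_weight`. The graph, the features, and the function name are all made up for illustration.

```python
# Nodes and their links (an undirected graph stored as adjacency lists).
graph = {"a": ["b", "c"], "b": ["a"], "c": ["a", "d"], "d": ["c"]}

# One small feature vector per node.
features = {
    "a": [1.0, 0.0],
    "b": [0.0, 1.0],
    "c": [1.0, 1.0],
    "d": [0.0, 0.0],
}


def message_passing_step(graph, features, self_weight=0.5):
    """Blend each node's features with the mean of its neighbors' features."""
    updated = {}
    for node, neighbors in graph.items():
        dim = len(features[node])
        if neighbors:
            neighbor_mean = [
                sum(features[n][i] for n in neighbors) / len(neighbors)
                for i in range(dim)
            ]
        else:
            neighbor_mean = [0.0] * dim
        combined = [
            self_weight * own + (1 - self_weight) * mean
            for own, mean in zip(features[node], neighbor_mean)
        ]
        # ReLU nonlinearity, as a learned layer would typically apply.
        updated[node] = [max(0.0, value) for value in combined]
    return updated


after_one = message_passing_step(graph, features)
after_two = message_passing_step(graph, after_one)
print(after_one["a"])  # [0.75, 0.5]
print(after_two["d"])  # [0.625, 0.5]
```

After one step a node only knows about its direct neighbors. After two steps, node `d` has been influenced by node `a` even though they share no link, which is how information spreads across the graph as layers are stacked.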

MLOps & AIOps

Following the expansive success of DevOps, which merges the processes of software developers with IT service delivery in IT organizations, this term has been elevated to a present-day cultural buzzword. Not long after most buzzwords take root, new contexts or areas of applicability latch on to the hype.

Such is the case with the term MLOps, which represents the latest best practices for developing and deploying machine learning models through effective collaboration between data scientists and IT professionals. Working within a well-defined development lifecycle should be quite welcome for many data scientists, as formal and self-guided educational tracks tend to focus on the fundamentals of AI and ML, with less attention given to the requirements of production deployments.

Broadening this scope of leveraging artificial intelligence in the operations of an organization is AIOps, which pulls in any and all machine learning technologies to extract meaningful insights from IT systems. This approach combines the intelligence of humans with that of AI algorithms to help IT teams make better and faster decisions, respond to incidents in real time, and develop optimized applications that facilitate more effective or automated business processes. With Gartner’s prediction that only 30% of large enterprise CIOs will be exclusively using AIOps to improve operations by 2023, there will be much more to be seen from the evolution of AIOps across IT organizations.

Transfer Learning

Consider the following pair of issues that can arise while training machine learning models. First, often there is not enough training data available to sufficiently train a model. Second, even (and especially) if sufficient amounts of training data exist, the training process is often resource- and time-consuming.

If you factor in that machine learning models are generally trained on specific data for a particular task, and that resulting models are task-specific, the maximum potential of these models is often not realized. Once data and compute are leveraged to train a model, why not use this model in as many situations as possible? Why not transfer what has been learned to a new application? Could highly-optimized trained models not be additionally exploited for a wider assortment of tasks?

Transfer learning involves the leveraging of existing machine learning models for use in scenarios for which the models were not originally trained. Much as humans do not discard everything they have previously learned and start fresh each time they take up a new task, transfer learning allows a machine learning model to port the "knowledge" it has acquired during training to new tasks, extending the reach of the computation and expertise that fueled the original model. Simply put, transfer learning can save training time and extend the usefulness of existing machine learning models. It is also an invaluable technique for tasks where the large amounts of training data typically required for training a model from scratch are not available.

Considering time and compute consumption, transfer learning allows us to better maximize the usefulness of a model. Concerning the insufficiency of training data, transfer learning allows us to take pretrained models trained on potentially massive amounts of data and tweak them on the smaller amounts of task-specific data available. Transfer learning is an effective approach to managing two distinct potential shortcomings in machine learning model training, and so it should be no surprise that it is increasingly used.
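One common form of this idea is to freeze the pretrained part of a model and train only a small new "head" on the task-specific data. The toy sketch below illustrates that pattern under heavy assumptions: `pretrained_features` is a hand-written stand-in for layers that would really have been learned on a large dataset, and the five-example task is invented for illustration.

```python
def pretrained_features(x):
    """Stand-in for frozen layers learned on a large source dataset (assumed)."""
    return [x, x * x]


def train_head(inputs, targets, learning_rate=0.3, steps=2000):
    """Train only a small linear head on top of the frozen features."""
    feats = [pretrained_features(x) for x in inputs]  # computed once, never updated
    weights, bias = [0.0, 0.0], 0.0
    n = len(inputs)
    for _ in range(steps):
        # Prediction errors under the current head, computed before any update.
        errors = [
            sum(w * f for w, f in zip(weights, fs)) + bias - y
            for fs, y in zip(feats, targets)
        ]
        for j in range(len(weights)):
            grad = sum(2 * e * fs[j] for e, fs in zip(errors, feats)) / n
            weights[j] -= learning_rate * grad
        bias -= learning_rate * sum(2 * e for e in errors) / n
    return weights, bias


def predict(x, weights, bias):
    return sum(w * f for w, f in zip(weights, pretrained_features(x))) + bias


# New task with only five labeled examples: y = 2x^2 - x + 1.
small_x = [-1.0, -0.5, 0.0, 0.5, 1.0]
small_y = [2 * x * x - x + 1 for x in small_x]

head_weights, head_bias = train_head(small_x, small_y)
print(round(predict(0.25, head_weights, head_bias), 3))
```

Because the reused features already capture the structure the new task needs, five examples are enough to fit the head. Learning those features from scratch is the expensive part that transfer learning avoids repeating.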

This answer is adapted in part from my foreword to the book Transfer Learning with Python from Packt Publishing.