Dive into the Future with Kaggle’s AI Report 2023 – See What’s Hot

Hear from the world's largest data science and machine learning community about what they have learnt about the world of AI.

By Nisha Arya, Contributing Editor & Marketing and Client Success Manager on November 3, 2023 in Artificial Intelligence


Image by Editor

On May 12, 2023, Kaggle opened a competition inviting its community to help build a report summarizing the rapid advancements in AI over the past two years. The Kaggle community is a diverse group with a wide range of experience across AI.

Participants were asked to write an essay on a particular topic, for example Generative AI or AI ethics, covering the changes and developments of the past 2 years.

The report is here and is made up of the following sections:

Generative AI
Text Data
Image & Video Data
Tabular & Time Series Data
Kaggle Competitions
AI Ethics

So let’s dive into what we’ve learnt…

Generative AI

Generative AI has been a popular topic of conversation recently. The opening section dives into the rapid progress and applications of Generative AI in the past 2 years. We have seen advancements in text generation, image creation and music development, using tools and techniques such as GANs and LLMs.

This has only been possible thanks to larger datasets and improved hardware for training these algorithms. Although Generative AI is still in its early stages, the past year alone has shown how it is revolutionizing different industries. There are still ethical concerns to take into consideration, such as privacy, misinformation, and how these AI systems are used.

Have a further read in the different essays:

Text Data

With the hype around Generative AI and the emergence of large language models (LLMs), interest in Natural Language Processing (NLP) has risen sharply. Naturally, the next section of the Kaggle AI report focuses on NLP techniques and their use in various tasks such as summarisation and translation.

Looking back, early approaches to text-based tasks combined term-frequency-based feature engineering with non-neural machine learning methods. Today, models are trained on much larger datasets and learn word representations that they use to interpret text.

Using internet data as a training corpus has allowed these models to learn better and to perform better in areas such as transfer learning. Within Kaggle competitions, there has been a trend of fine-tuning publicly available models, which have been shown to surpass human-level performance.
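To make the term-frequency idea concrete, here is a minimal sketch of TF-IDF weighting in plain Python. It is an illustration only, not taken from the report: the function name `tf_idf`, the whitespace tokenizer and the sample sentences are all assumptions.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document using plain TF-IDF."""
    # Assumption: lowercase whitespace splitting is enough for this toy example.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        # Term frequency scaled by inverse document frequency.
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical sample documents.
docs = [
    "the model predicts the label",
    "the forest model uses trees",
    "trees split the data",
]
vectors = tf_idf(docs)
```

A word such as "the", which appears in every document, gets a weight of zero, while rarer words score higher. Vectors like these would then be passed to a non-neural model such as logistic regression or a tree ensemble.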

The following top essays focus on the emergence and recent techniques of LLMs:

Image & Video Data

Just as text data is used in tasks such as content generation, image and video generation has become very popular too. Computer vision has been around for a long time, but in recent years it has skyrocketed. We can now handle tasks such as object detection and more.

This section dives into model architectures as well as common practices used in computer vision, such as augmentation. Computer vision is used in a variety of industries, such as healthcare for medical imaging, but it still faces challenges in areas such as deep fakes, ethical and philosophical considerations, the limitations of multi-modal models and more.
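As a rough illustration of augmentation, the sketch below randomly mirrors an image so that a model sees slightly different versions of the same training example. It assumes an image is just a list of pixel rows; the helper names `horizontal_flip` and `random_augment` are hypothetical, and real pipelines typically use a dedicated image library and combine many more transformations.

```python
import random


def horizontal_flip(image):
    """Mirror an image (a list of pixel rows) left to right."""
    return [row[::-1] for row in image]


def random_augment(image, rng, flip_prob=0.5):
    """Return a flipped copy with probability flip_prob, else an unchanged copy."""
    if rng.random() < flip_prob:
        return horizontal_flip(image)
    return [row[:] for row in image]


# Hypothetical 2x3 grayscale image.
image = [
    [1, 2, 3],
    [4, 5, 6],
]
flipped = horizontal_flip(image)
# A seeded generator keeps the example reproducible.
augmented = random_augment(image, random.Random(0))
```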

We have models such as the Segment Anything Model (SAM) and YOLO (You Only Look Once) which have shown how generalized, open-source models can be adapted for different and unique tasks.

Dive into the advances in image and video data with these essays:

Tabular & Time Series Data

The next section dives into the historical significance of tabular and time series data. Neither has been as popular in the past few years, as they have not had the same impact as the deep learning revolution. However, they are still widely used and very effective, with trends in areas such as:

Unique approach for individual datasets/problems
Importance of data preprocessing and feature engineering
The dominance of gradient-boosted trees

Within the Kaggle community, these trends have been widely recognised, and the following essays dive into them as well as the unique challenges that tabular and time series data present.
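For readers new to gradient-boosted trees, the core idea can be sketched in a few lines: each new tree is fitted to the errors left by the trees before it. The example below is a toy version under stated assumptions (one feature, one-split "stumps", squared error, made-up data) and is not how production libraries are implemented; the names `fit_stump`, `fit_boosted_stumps` and `predict` are hypothetical.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split on xs that minimizes squared error."""
    best = None
    # Skip the largest value so the right-hand side is never empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def fit_boosted_stumps(xs, ys, n_rounds=50, learning_rate=0.1):
    """Boost stumps on squared error: each round fits the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        preds = [
            p + learning_rate * (left_value if x <= threshold else right_value)
            for x, p in zip(xs, preds)
        ]
    return base, learning_rate, stumps


def predict(model, x):
    """Sum the base value and every stump's scaled contribution."""
    base, learning_rate, stumps = model
    total = base
    for threshold, left_value, right_value in stumps:
        total += learning_rate * (left_value if x <= threshold else right_value)
    return total


def mse(preds, targets):
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)


# Hypothetical data with a jump between x = 4 and x = 5.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]
model = fit_boosted_stumps(xs, ys)
baseline_mse = mse([sum(ys) / len(ys)] * len(ys), ys)
boosted_mse = mse([predict(model, x) for x in xs], ys)
```

On this toy data the boosted model's training error ends up well below that of simply predicting the mean, which is the behaviour the technique is built around.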

Kaggle Competitions

Part of this report was also to analyze Kaggle competitions themselves, looking into how they have developed and what the community has observed about them in the past 2 years. Kaggle competitions have been widely popular over the years, as the community has used the platform to test their skills, build a portfolio and prepare for the real world.

One observed change is that techniques such as pseudo labeling, seed averaging, and hill climbing, which were once considered "tricks," have now become common practice. Kaggle competitions have become more competitive over the past 2 years, and competitions such as RSNA, Learning Agency and more are very popular.
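Of these techniques, hill climbing is perhaps the least self-explanatory. One common reading of the term, assumed here, is greedy ensemble selection: start from the best single model, then keep adding whichever model's predictions most improve the averaged validation score, and stop when nothing helps. The sketch below uses made-up validation predictions, and the names `hill_climb`, `model_a`, `model_b` and `model_c` are hypothetical.

```python
def mse(preds, targets):
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)


def average(pred_lists):
    """Average several prediction lists element by element."""
    return [sum(vals) / len(vals) for vals in zip(*pred_lists)]


def hill_climb(model_preds, targets, max_steps=20):
    """Greedily add models (with replacement) while validation MSE improves."""
    names = list(model_preds)
    best_name = min(names, key=lambda n: mse(model_preds[n], targets))
    chosen = [best_name]
    best_score = mse(model_preds[best_name], targets)
    for _ in range(max_steps):
        candidates = []
        for name in names:
            trial = average([model_preds[m] for m in chosen + [name]])
            candidates.append((mse(trial, targets), name))
        score, name = min(candidates)
        if score >= best_score:
            break  # No candidate improves the ensemble, so stop climbing.
        chosen.append(name)
        best_score = score
    return chosen, best_score


# Hypothetical validation targets and per-model predictions.
targets = [1.0, 2.0, 3.0, 4.0]
model_preds = {
    "model_a": [1.2, 2.2, 3.2, 4.2],  # slightly too high
    "model_b": [0.7, 1.7, 2.7, 3.7],  # slightly too low
    "model_c": [1.5, 2.5, 3.5, 4.5],  # much too high
}
chosen, score = hill_climb(model_preds, targets)
```

Because a model can be picked more than once, the list of chosen models acts as a set of implicit blend weights. Seed averaging follows the same averaging idea, applied to copies of one model trained with different random seeds.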

Dive into the winning tricks of Kaggle competitions:

AI Ethics

Ethics around AI is another area of concern, with many people having mixed feelings about the use and implementation of AI systems. Organizations are looking into the ethical principles of AI and creating new strategies to ensure that they can not only understand their AI systems but also monitor and mitigate risks.

This is not an academic study but a societal one. There are many opinions that are important for understanding the world of AI and how it can still be used whilst safeguarding society's values. We have seen organizations continuously audit their AI systems and adopt ethics-by-design.

Learn more about the challenges around AI and the impact it is having on society:

Wrapping it up

The Kaggle team has created a unique report that allows its community to share their opinions and experience of the world of AI and how it has changed in the last 2 years. Let us know if there was a particular section or essay you found very interesting!

Nisha Arya is a data scientist, freelance technical writer, and an editor and community manager for KDnuggets. She is particularly interested in providing data science career advice or tutorials and theory-based knowledge around data science. Nisha covers a wide range of topics and wishes to explore the different ways artificial intelligence can benefit the longevity of human life. A keen learner, Nisha seeks to broaden her tech knowledge and writing skills, while helping guide others.