The First Half of 2023: Data Science and AI Developments
Six months of 2023 have gone by just like that. Here’s a recap of the major data science and AI advancements from the first half of 2023.
Photo by Tara Winstead
A lot has happened in the first half of 2023. There have been significant advancements in data science and artificial intelligence, so many that it’s been hard to keep up with them all. We can definitely say that the first half of 2023 has shown more rapid progress than we expected.
So rather than talking too much about how we’ve all been wowed by these innovations, let’s talk about them.
Natural Language Processing
I’m going to start off with the most obvious: Natural Language Processing (NLP). It is something that was building in the dark, and in 2023 it has come to light.
These advancements were demonstrated by OpenAI’s ChatGPT, which took the world by storm. Since its release in late November 2022, ChatGPT has moved from GPT-3.5 to GPT-4, and now we’re expecting GPT-5. OpenAI has also released plugins that aim to improve people's day-to-day lives, as well as the workflows of data scientists and machine learning engineers.
And as we all know, after ChatGPT was released, Google released Bard, which has been picked up by individuals and businesses alike. Bard has been competing with ChatGPT for the best chatbot position, providing similar services such as helping machine learning engineers with their tasks.
In the midst of the release of these chatbots, we have seen large language models (LLMs) appear out of thin air. Large Model Systems Organization (LMSYS Org), an open research organization founded by students and faculty from UC Berkeley, aims to make large models more accessible to everyone by co-developing open datasets, models, systems, and evaluation tools. It created Chatbot Arena, a benchmark platform that ranks LLMs using crowdsourced head-to-head comparisons.
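To give a feel for how a head-to-head benchmark can turn votes into a leaderboard, here is a minimal sketch of an Elo rating update, the kind of scheme Chatbot Arena's early leaderboard was based on. This is not LMSYS's actual code: the model names, the votes, the starting rating of 1000, and the K-factor of 32 are all assumptions made up for illustration.

```python
from collections import defaultdict


def expected_score(rating_a, rating_b):
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update_elo(ratings, winner, loser, k=32):
    """Update both ratings in place after one head-to-head vote."""
    exp_win = expected_score(ratings[winner], ratings[loser])
    delta = k * (1.0 - exp_win)  # a surprising win moves the ratings more
    ratings[winner] += delta
    ratings[loser] -= delta


def rank_models(battles, initial=1000.0, k=32):
    """Replay (winner, loser) votes in order and return the final ratings."""
    ratings = defaultdict(lambda: initial)
    for winner, loser in battles:
        update_elo(ratings, winner, loser, k)
    return dict(ratings)


# Hypothetical votes: each tuple is (winner, loser) of one anonymous battle.
battles = [
    ("model_a", "model_b"),
    ("model_a", "model_c"),
    ("model_b", "model_c"),
    ("model_a", "model_b"),
]

ratings = rank_models(battles)
for name, rating in sorted(ratings.items(), key=lambda item: -item[1]):
    print(f"{name}: {rating:.1f}")
```

Because every update adds to the winner exactly what it takes from the loser, the total rating across all models stays constant; only the relative ordering changes as more votes come in.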
AutoML
So now people are getting used to chatbots that answer questions for them and make their work and personal lives much easier. But what about data analysts and machine learning specialists?
Well, they’ve been using AutoML, a powerful set of tools that lets data professionals such as data scientists and machine learning engineers automate data preprocessing, hyperparameter tuning, and more complex tasks such as feature engineering. With the advancements in data science and AI, we have naturally seen a high demand for data and AI specialists. However, because progress is moving so rapidly, we are seeing a shortage of these professionals. Being able to explore, analyze, and predict from data in an automated way will therefore contribute to the success of a lot of companies.
Not only will it free up time for data specialists, but it will also give organizations more time to expand and be more innovative in other areas.
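To make the idea of automated hyperparameter tuning concrete, here is a minimal sketch of random search, one of the simplest strategies this kind of automation can build on. The toy dataset, the tiny linear model, the search ranges, and every function name below are assumptions made up for illustration; real AutoML tools are far more sophisticated.

```python
import random


def make_data(n=200, seed=0):
    """Toy regression data: y = 3x + 0.5 plus a little noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    ys = [3.0 * x + 0.5 + rng.gauss(0, 0.1) for x in xs]
    return xs, ys


def train_linear(xs, ys, lr, epochs):
    """Fit y = w*x + b with plain batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = 2 / n * sum((w * x + b - y) * x for x, y in zip(xs, ys))
        grad_b = 2 / n * sum((w * x + b - y) for x, y in zip(xs, ys))
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def mse(params, xs, ys):
    """Mean squared error of the linear model on (xs, ys)."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def random_search(train, valid, n_trials=20, seed=1):
    """Sample hyperparameters at random and keep the best validation score."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        config = {
            "lr": 10 ** rng.uniform(-3, -0.5),  # log-uniform learning rate
            "epochs": rng.choice([10, 50, 100]),
        }
        params = train_linear(*train, **config)
        score = mse(params, *valid)
        if best is None or score < best["score"]:
            best = {"config": config, "score": score, "params": params}
    return best


xs, ys = make_data()
train = (xs[:150], ys[:150])
valid = (xs[150:], ys[150:])

best = random_search(train, valid)
print("best config:", best["config"])
print("validation MSE:", round(best["score"], 4))
```

The point of the sketch is the loop in `random_search`: the trial-and-error that a person would otherwise do by hand is handed over to the machine, which is the time saving described above.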
Generative AI
If you were around for the outburst of chatbots, you will have seen the words ‘Generative AI’ being thrown around. Generative AI is capable of generating text, images, or other forms of media based on user prompts. Just like the advancements above, generative AI is helping people across different industries with tasks that make their lives easier.
It has the ability to produce new content, take over repetitive tasks, work on customized data, and generate pretty much anything you want. If generative AI is new to you, you will want to learn about Stable Diffusion, one of the most widely used open-source text-to-image models. If you are a data scientist or data analyst, you may have heard of PandasAI, the generative AI Python library. If not, it is an open-source toolkit that integrates generative AI capabilities into Pandas for simpler data analysis.
But with these generative AI tools and software being released, the question arises: Are Data Scientists Still Needed in the Age of Generative AI?
Deep Learning
Deep learning is continuing to thrive. With the recent advancements in data science and AI, more time and energy are being pumped into research in the field. As a subset of machine learning concerned with algorithms and artificial neural networks, it is widely used in tasks such as image classification, object detection, and face recognition.
As we’re experiencing the 4th industrial revolution, deep learning algorithms are allowing machines to learn from data in a way loosely inspired by how humans do. We are seeing more self-driving cars on the roads, fraud detection tools, virtual assistants, healthcare predictive modeling, and more.
2023 has showcased deep learning at work in automated processes, robotics, blockchain, and various other technologies.
Edge Computing
With all of this happening, you must think these computers are pretty tired, right? To keep up with the advancements in AI and data science, companies require computers and systems that can support them. Edge computing brings computation and data storage closer to the sources of data. When working with these advanced models, edge computing enables real-time data processing and allows for smooth communication between devices.
For example, when LLMs were being released seemingly every two seconds, it was obvious that organizations would require effective systems such as edge computing to be successful. Google also published details of its TPU v4 this year, an accelerator built to handle the high computational needs of machine learning and artificial intelligence.
Due to these advancements, we are seeing more organizations move from the cloud to the edge to fit their current and future requirements.
Ethical AI and Data Science
A lot has been happening, and it’s been happening in a short period of time. It’s becoming very difficult for institutions such as governments to keep up. Governments around the world are asking: how do these AI applications affect the economy and society, and what are the implications?
People are concerned about bias and discrimination, privacy, transparency, and security in these AI and data science applications. So what are the ethical aspects of AI and data science, and what should we expect in the future?
We already have the EU AI Act proposing a framework that groups AI systems into four risk levels: unacceptable, high, limited, and minimal. OpenAI CEO Sam Altman testified about the concerns and possible pitfalls of the new technology before a US Senate committee on Tuesday, May 16th. Although a lot of advancements are happening in a short period of time, a lot of people are concerned. Over the next 6 months we can expect a few more laws to be passed, and more regulations and frameworks to be put into place.
Wrapping it up
If you haven’t been keeping up with AI and data science over the last 6 months, I hope this article has given you a quick breakdown of what’s been going on. It will be interesting to see how these advancements are embraced over the next 6 months, and whether we can ensure the responsible and ethical use of these technologies along the way.
Nisha Arya is a Data Scientist, Freelance Technical Writer and Community Manager at KDnuggets. She is particularly interested in providing Data Science career advice or tutorials and theory based knowledge around Data Science. She also wishes to explore the different ways Artificial Intelligence is/can benefit the longevity of human life. A keen learner, seeking to broaden her tech knowledge and writing skills, whilst helping guide others.