Introducing Superalignment by OpenAI

OpenAI’s new dedicated team to steer and control AI systems, to look after the people of the future against superintelligence.

Introducing Superalignment by OpenAI
Image by Author


OpenAI has been in the media a lot, not only because of the release of ChatGPT, GPT-3, and GPT-4. But also surrounding the ethical concerns of AI systems like ChatGPT to the socioeconomics of today's world. 

CEO Sam Altman has addressed the safety around AI several times, such as at a US Senate committee and said:

"I think if this technology goes wrong, it can go quite wrong...we want to be vocal about that. We want to work with the government to prevent that from happening."

With that being said, the team at OpenAI have taken matters into their own hands. Many people are concerned with superintelligence, an AI system that is so intelligent that it surpasses human minds. Some believe that technology could solve a lot of the world's current problems, however with very little information or understanding around it - it is difficult to weigh the pros against the cons. 

It may be too soon to talk about superintelligence, but it is definitely a conversation that needs to be had. The best approach to take is to manage these potential risks earlier on before they become a bigger problem that cannot be handled. 


OpenAI’s Approach


OpenAI has stated that they do not currently have a solution for superintelligent AI, however, it is something that they are working on with their new team Superalignment. They are currently using techniques such as reinforcement learning from human feedback, which heavily relies on humans to supervise AI. However, there are concerns about the future challenges of humans not being able to reliably supervise AI and the need for new scientific breakthroughs to handle this. 

With that being said, OpenAI is looking at building a human-level automated alignment researcher that will be able to learn from human feedback and assist humans in evaluating AI, as well as being able to solve other alignment problems. OpenAI has dedicated 20% of the compute that they have secured to date to this effort, to iteratively align superintelligence.

In order for the superalignment team to be successful in this, they will need to:


1. Develop a Scalable Training Method


They aim to leverage other AI systems to help assist in evaluating other AI systems, along with being able to better understand how models generalize oversight, which humans can’t supervise.


2. Validate the Resulting Model


In order to validate the results of the alignment of the systems, OpenAI plans to automate searches for problematic behavior to refine the robustness of the model, as well as automated interpretability. 


3. Stress Test the Entire Alignment Pipeline


Testing, testing, testing! OpenAI plans to test its entire alignment process by deliberately training misaligned models. This will ensure that the techniques used will be able to detect any form of misalignment, specifically the worst kind of adversarial testing. 

OpenAI has already gone through preliminary experiments, which have shown good results. They aim to progress on these using useful metrics and the continued work of studying models. 


Wrapping it up


OpenAI aims to create a future in which AI systems and humans can live harmoniously without one another feeling endangered. The development of the superalignment team is an ambitious goal, however, it will provide evidence to the wider community about the use of machine learning and being able to create a safe environment.
Nisha Arya is a Data Scientist, Freelance Technical Writer and Community Manager at KDnuggets. She is particularly interested in providing Data Science career advice or tutorials and theory based knowledge around Data Science. She also wishes to explore the different ways Artificial Intelligence is/can benefit the longevity of human life. A keen learner, seeking to broaden her tech knowledge and writing skills, whilst helping guide others.