Behind OpenAI Codex: 5 Fascinating Challenges About Building Codex You Didn’t Know About

Some ML engineering and modeling challenges encountered during the construction of Codex.

OpenAI Codex


A couple of weeks ago, OpenAI astonished the artificial intelligence (AI) world with the release of Codex, a massive model that can translate natural language into code. Codex can effectively generate code end to end from basic language instructions. If you don’t believe me, you should watch this video, which can be considered one of the best AI demos of all time 😉

Video Credit: OpenAI


A lot has been written about Codex’s capabilities since its initial launch.

However, I have been more intrigued by the small requirements that become incredibly relevant when building a model of this magnitude. Deep diving into Codex, I found a few interesting things that I thought would be good to highlight:


1. Codex is proficient in about a dozen languages but it was trained for Python

I found this incredibly insightful. OpenAI's original goal was to make Codex proficient in Python, but it turns out that the model picked up other languages during the pretraining process. This speaks to the unique capabilities of pretrained language models.


2. Testing Codex was more than tricky

The AI community has been amazed by the research behind Codex, but I think the engineering side has been just as impressive. One aspect that particularly intrigued me was testing. How in the world do you test live code without taking massive risks? It turns out that the OpenAI team put a ton of work into building very sophisticated sandboxes to test the outputs from Codex in isolation.
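To make the idea concrete, here is a minimal sketch of running generated code away from the main process. It is not OpenAI's sandbox, whose internals are not described here. The function name `run_untrusted` and its parameters are hypothetical, and a child process with a timeout only guards against crashes and infinite loops; a real sandbox would also restrict file system, network and system call access.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_untrusted(candidate: str, test_code: str, timeout_s: float = 5.0) -> bool:
    """Return True if the candidate code plus its tests exit cleanly.

    Hypothetical helper for illustration only: it isolates the code in a
    child process and a throwaway directory, nothing more.
    """
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "candidate.py"
        script.write_text(candidate + "\n\n" + test_code + "\n")
        try:
            result = subprocess.run(
                # -I starts Python in isolated mode (ignores user site
                # packages and PYTHON* environment variables).
                [sys.executable, "-I", str(script)],
                cwd=workdir,
                capture_output=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # Runaway code (for example an infinite loop) counts as a failure.
            return False
        # A failed assertion or any uncaught exception gives a non-zero code.
        return result.returncode == 0


if __name__ == "__main__":
    generated = "def add(a, b):\n    return a + b"
    tests = "assert add(2, 3) == 5"
    print(run_untrusted(generated, tests))
```

The design choice worth noting is that the generated code never runs inside the evaluator's own interpreter, so a crash or hang in the candidate cannot take the test harness down with it.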


3. Matching semantics to code is far from trivial

Training a model on all the source code in the world sounds cool, but it's far from trivial. After all, not all code is created equal. Code on GitHub can be poorly documented, while notebooks can carry rich semantic information. Similarly, code snippets on Stack Overflow come with richer levels of semantic information. Mapping code sections to language semantics was one of the challenges of building Codex.
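One simple way to picture the mapping between language and code is to pair each function's docstring with its source. The sketch below does that with Python's standard `ast` module. It is an illustration under my own assumptions, not the pipeline OpenAI used, and `docstring_pairs` is a hypothetical name.

```python
import ast
from typing import List, Tuple


def docstring_pairs(source: str) -> List[Tuple[str, str]]:
    """Return (docstring, function source) pairs for documented functions."""
    tree = ast.parse(source)
    pairs = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            doc = ast.get_docstring(node)
            if doc:
                # Undocumented functions carry no language signal, so skip them.
                pairs.append((doc, ast.get_source_segment(source, node)))
    return pairs


if __name__ == "__main__":
    sample = (
        "def add(a, b):\n"
        '    """Add two numbers."""\n'
        "    return a + b\n"
        "\n"
        "def undocumented(x):\n"
        "    return x\n"
    )
    for description, code in docstring_pairs(sample):
        print(description)
        print(code)
```

Poorly documented code yields few or no pairs with this approach, which is one way to see why sources with richer descriptions are more useful for matching semantics to code.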


4. Codex still struggles with task decomposition

If you think about how programmers work, we tend to decompose a problem into smaller tasks and produce code for those. It turns out that Codex is great at the latter but still struggles with problem decomposition. This shouldn’t be surprising if we consider that problem decomposition requires very complex cognitive skills.


5. Supervised Fine-Tuning was a huge part of building Codex

Code on the internet appears at all sorts of levels of completeness, documentation, syntactic richness, etc. Training a model on such diverse code sets can produce unreliable results. For that reason, OpenAI had to undertake a massive supervised fine-tuning effort.

These are some of the aspects of Codex that are not especially well known but that have been major contributors to the success of the first version of the model. Codex's success was due both to advanced ML research and to massive ML engineering and infrastructure efforts.

Bio: Jesus Rodriguez is currently a CTO at Intotheblock. He is a technology expert, executive investor and startup advisor. Jesus founded Tellago, an award-winning software development firm focused on helping companies become great software organizations by leveraging new enterprise software trends.

Original. Reposted with permission.