DeepMind has made software-writing AI that rivals average human coder

AI company DeepMind has built a tool that can create working code to solve complex software challenges


2 February 2022

DeepMind, a UK-based AI company, has taught some of its machines to write computer software, and the resulting system performs almost as well as an average human programmer when judged in competition.

DeepMind claims that its new AlphaCode system can solve software problems that require a combination of logic, critical thinking and the ability to understand natural language. The tool was entered into 10 rounds on the programming competition website Codeforces, where human entrants test their coding skills. In these 10 rounds, AlphaCode placed at about the level of the median competitor. DeepMind says this is the first time an AI code-writing system has reached a competitive level of performance in programming contests.

AlphaCode was created by training a neural network on a large number of code samples, sourced from the software repository GitHub and from previous entrants to competitions on Codeforces. When it is presented with a novel problem, it creates a massive number of candidate solutions in both the C++ and Python programming languages. It then filters and ranks these into a top 10. When AlphaCode was tested in competition, humans assessed these solutions and submitted the best of them.
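The generate, filter and rank idea can be illustrated with a toy sketch. This is not DeepMind's code: the candidate functions, the example tests, the probe inputs and the helper name `filter_and_rank` are all hypothetical, and filtering by example tests and grouping by behaviour are assumptions made for illustration rather than details given above.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Candidate = Callable[[int], int]


def filter_and_rank(
    candidates: Sequence[Candidate],
    examples: Sequence[Tuple[int, int]],
    probe_inputs: Sequence[int],
    top_k: int = 10,
) -> List[Candidate]:
    """Keep candidates that pass the example tests, then pick up to top_k.

    Assumption: survivors are grouped by their outputs on probe inputs,
    and one representative is taken from each group, largest group first.
    """
    # Step 1: filter. Discard anything that crashes or gives a wrong answer.
    survivors = []
    for cand in candidates:
        try:
            if all(cand(x) == expected for x, expected in examples):
                survivors.append(cand)
        except Exception:
            continue

    # Step 2: group survivors that behave identically on the probe inputs.
    def signature(cand: Candidate) -> tuple:
        outputs = []
        for x in probe_inputs:
            try:
                outputs.append(cand(x))
            except Exception:
                outputs.append(None)
        return tuple(outputs)

    groups: Dict[tuple, List[Candidate]] = {}
    for cand in survivors:
        groups.setdefault(signature(cand), []).append(cand)

    # Step 3: rank groups by size and take one candidate from each.
    ranked = sorted(groups.values(), key=len, reverse=True)
    return [group[0] for group in ranked[:top_k]]


# Toy problem: "double the input". Hypothetical candidates stand in for
# model-generated programs.
def double_mul(x): return x * 2
def double_add(x): return x + x
def double_abs(x): return 2 * abs(x)   # right only for non-negative inputs
def square(x): return x * x            # fails the example tests
def crashes(x): return x // 0          # raises ZeroDivisionError

picks = filter_and_rank(
    [double_mul, double_add, double_abs, square, crashes],
    examples=[(2, 4), (3, 6)],
    probe_inputs=[-1, 0, 5],
)
print([f.__name__ for f in picks])
```

In this toy run, `square` and `crashes` are filtered out, and the two equivalent doubling functions fall into one group, so only one of them takes up a slot in the final shortlist.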

Generating code is a particularly thorny problem for AI because it is difficult to assess how near to success a particular output is. Code that crashes and so fails to achieve its goal could be a single character away from a perfectly working solution, and multiple working solutions can appear radically different. Solving programming competitions also requires an AI to extract meaning from the description of a problem written in English.
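A small, made-up example shows both halves of that difficulty. The function names below are hypothetical; the first version crashes because of one extra character, while the other two look nothing alike yet both work.

```python
from typing import List


def total_crashing(xs: List[int]) -> int:
    """Sum a list, but with one extra '=' in the loop condition."""
    total, i = 0, 0
    while i <= len(xs):  # should be '<'; the extra '=' reads past the end
        total += xs[i]   # raises IndexError on the final iteration
        i += 1
    return total


def total_loop(xs: List[int]) -> int:
    """The same function with the single character removed."""
    total, i = 0, 0
    while i < len(xs):
        total += xs[i]
        i += 1
    return total


def total_builtin(xs: List[int]) -> int:
    """A correct solution that looks radically different."""
    return sum(xs)


try:
    total_crashing([1, 2, 3])
except IndexError:
    print("crashed: one character from correct")

print(total_loop([1, 2, 3]), total_builtin([1, 2, 3]))
```

Measured by text alone, the crashing version is closer to `total_loop` than `total_builtin` is, which is why similarity to a known answer is a poor guide to correctness.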

Microsoft-owned GitHub created a similar but more limited tool last year called Copilot. Millions of people use GitHub to share source code and organise software projects. GitHub trained a neural network on that code to build Copilot, enabling the tool to solve similar programming problems.

But the tool proved controversial, as many claimed it could directly plagiarise this training data. Armin Ronacher at software company Sentry found that it was possible to prompt Copilot to suggest copyrighted code from the 1999 computer game Quake III Arena, complete with comments from the original programmer. This code cannot be reused without permission.

At Copilot’s launch, GitHub said that about 0.1 per cent of its code suggestions may contain “some snippets” of verbatim source code from the training set. The company also warned that it is possible for Copilot to output genuine personal data such as phone numbers, email addresses or names, and that its suggestions may include “biased, discriminatory, abusive, or offensive outputs” or security flaws. It says that code should be vetted and tested before use.

AlphaCode, like Copilot, was first trained on publicly available code hosted on GitHub. It was then fine-tuned on code from programming competitions. DeepMind says that AlphaCode doesn’t copy code from previous examples. Judging by the examples DeepMind provided in its preprint paper, it does appear to solve problems while copying only slightly more code from its training data than humans already do, says Riza Theresa Batista-Navarro at the University of Manchester, UK.

But AlphaCode seems to have been so finely tuned to solve complex challenges that the previous state of the art in AI coding tools can still outperform it on simpler tasks, she says.

“What I noticed is that, while AlphaCode is able to do better than state-of-the-art AIs like GPT on the competition challenges, it does comparatively poorly on the introductory challenges,” says Batista-Navarro. “The assumption is that they wanted to do competition-level programming problems, to tackle more challenging programming problems rather than introductory ones. But this seems to show that the model was fine-tuned so well on the more complicated problems that, in a way, it’s kind of forgotten the introductory level problems.”

DeepMind wasn’t available for interview, but Oriol Vinyals at DeepMind said in a statement: “I never expected ML [machine learning] to achieve about human average amongst competitors. However, it indicates that there is still work to do to achieve the level of the highest performers, and advance the problem-solving capabilities of our AI systems.”

