Davis Blalock, a computer science graduate student at the Massachusetts Institute of Technology (MIT), told Science magazine that some of the gains may not exist at all.
Blalock and his colleagues compared dozens of approaches to improving neural networks (software architectures that loosely mimic the brain) and found that it wasn’t obvious what the state of the art even was.
The researchers evaluated 81 pruning algorithms, programs that make neural networks more efficient by trimming unneeded connections. All claimed superiority in slightly different ways, but they were rarely compared properly, and when the researchers tried to evaluate them side by side, there was no clear evidence of performance improvements over a 10-year period.
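To make the idea concrete, here is a minimal sketch of magnitude pruning, the simplest and most common strategy in that literature: connections whose weights are near zero contribute little to the output, so they are zeroed out. The function name and the 90% pruning fraction are illustrative choices for this example, not details taken from any of the 81 papers.

```python
import numpy as np

def magnitude_prune(weights, fraction=0.9):
    """Zero out the smallest-magnitude weights in a layer.

    A toy version of magnitude pruning: weights close to zero
    contribute little, so removing them shrinks the network with
    (ideally) little loss in accuracy.
    """
    flat = np.abs(weights).ravel()
    k = int(fraction * flat.size)
    if k == 0:
        return weights
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    mask = np.abs(weights) > threshold            # keep only larger weights
    return weights * mask

# Example: prune roughly 90% of a random 4x4 weight matrix.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))
print(magnitude_prune(w, fraction=0.9))
```

Real pruning pipelines typically fine-tune the surviving weights afterward to recover accuracy, and part of Blalock’s complaint is that papers rarely report such steps consistently enough to allow fair comparison.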
The result, presented in March at the Machine Learning and Systems conference, surprised Blalock’s Ph.D. adviser, MIT computer scientist John Guttag, who says the uneven comparisons themselves may explain the stagnation. “It’s the old saw, right?” Guttag said. “If you can’t measure something, it’s hard to make it better.”
Researchers are waking up to the signs of shaky progress across many subfields of AI. A 2019 meta-analysis of information retrieval algorithms used in search engines concluded the “high-water mark … was actually set in 2009.” Another 2019 study reproduced seven neural network recommendation systems of the kind used by media streaming services.
It found that when much simpler, non-neural algorithms developed years before were fine-tuned, six of the seven failed to outperform them, revealing “phantom progress” in the field.
In another paper posted on arXiv, Kevin Musgrave, a computer scientist at Cornell University, examined loss functions, the part of an algorithm that mathematically specifies its objective. Musgrave compared a dozen of them on image retrieval tasks and found that, contrary to their developers’ claims, accuracy had not improved since 2006. “There’s always been these waves of hype,” Musgrave says.
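As an illustration, below is a minimal sketch of the triplet loss, one of the classic metric-learning objectives in the family Musgrave compared: it pulls an image’s embedding toward a same-class example and pushes it away from a different-class one. The toy embeddings and margin value are assumptions for the example, not figures from his paper.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Classic triplet loss: encourage the anchor embedding to sit
    closer to a same-class (positive) example than to a
    different-class (negative) one, by at least `margin`.
    """
    d_pos = np.linalg.norm(anchor - positive)  # distance to same class
    d_neg = np.linalg.norm(anchor - negative)  # distance to other class
    return max(0.0, d_pos - d_neg + margin)

# Toy 3-D embeddings: the anchor is already much closer to the
# positive than to the negative, so the loss is zero.
a = np.array([1.0, 0.0, 0.0])
p = np.array([0.9, 0.1, 0.0])
n = np.array([0.0, 1.0, 0.0])
print(triplet_loss(a, p, n))
```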
Gains in machine-learning algorithms can come from fundamental changes in their architecture, loss function, or optimization strategy (how they use feedback to improve). But subtle tweaks to any of these can also boost performance, says Zico Kolter, a computer scientist at Carnegie Mellon University who studies image-recognition models trained to be immune to “adversarial attacks” by a hacker.
An early adversarial training method known as projected gradient descent (PGD), in which a model is simply trained on both real and deceptive examples, seemed to have been surpassed by more complex methods. But in a February arXiv paper, Kolter and his colleagues found that all of the methods performed about the same when a simple trick was used to enhance them.
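For the curious, a minimal sketch of the PGD attack step on a deliberately tiny linear model is shown below; adversarial training then mixes the perturbed inputs into each ordinary training batch alongside the clean ones. The linear model, logistic loss, and hyperparameter values here are toy assumptions for illustration, not Kolter’s setup.

```python
import numpy as np

def pgd_attack(x, y, w, eps=0.1, alpha=0.02, steps=10):
    """Projected gradient descent on a toy linear classifier.

    Repeatedly nudge the input in the direction that increases the
    loss, then project back into an eps-ball around the original
    input. Model: score = w . x, with logistic loss on label
    y in {-1, +1}.
    """
    x_orig = x.copy()
    x_adv = x.copy()
    for _ in range(steps):
        score = w @ x_adv
        # Gradient of log(1 + exp(-y * score)) with respect to x_adv.
        grad = -y * w / (1.0 + np.exp(y * score))
        x_adv = x_adv + alpha * np.sign(grad)               # ascend the loss
        x_adv = np.clip(x_adv, x_orig - eps, x_orig + eps)  # project back
    return x_adv

# Generate one "deceptive example" from a random input.
rng = np.random.default_rng(0)
w = rng.normal(size=5)
x = rng.normal(size=5)
print(pgd_attack(x, y=1, w=w))
```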
[Chart: Old dogs, new tricks. After modest tweaks, old image-retrieval algorithms perform as well as new ones, suggesting little actual innovation.]