Improved algorithms may be more important for AI performance than faster hardware
A new MIT study finds that algorithmic improvements are more beneficial than powerful hardware in AI, at least at a certain point. …
When it comes to AI, algorithmic innovations are substantially more important than hardware, at least where the problems involve billions to trillions of data points. That's the conclusion of a team of scientists at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL), who conducted what they claim is the first study on how fast algorithms are improving across a broad range of examples.
Algorithms tell software how to make sense of text, visual, and audio data so that it can, in turn, draw inferences from that data. For example, OpenAI's GPT-3 was trained on webpages, ebooks, and other documents to learn how to write papers in a humanlike way. The more efficient the algorithm, the less work the software has to do. And as algorithms are enhanced, less computing power should be needed, in theory. But this isn't settled science. AI research and infrastructure startups like OpenAI and Cerebras are betting that algorithms will have to increase in size substantially to reach higher levels of sophistication.
The CSAIL team, led by MIT research scientist Neil Thompson, who previously coauthored a paper showing that algorithms were approaching the limits of modern computing hardware, analyzed data from 57 computer science textbooks and more than 1,110 research papers to trace the history of where algorithms improved. In total, they looked at 113 "algorithm families," or sets of algorithms that solved the same problem, that had been highlighted as most important by the textbooks.
The team reconstructed the history of the 113 families, tracking each time a new algorithm was proposed for a problem and making special note of those that were more efficient. Covering the period from the 1940s to now, the team found an average of eight algorithms per family, of which a couple improved in efficiency.
For large computing problems, 43% of algorithm families had year-on-year improvements that were equal to or larger than the gains from Moore's law, the principle that the speed of computers roughly doubles every two years. In 14% of problems, the performance improvements vastly outpaced those that came from improved hardware, with the gains from better algorithms being particularly meaningful for big data problems.
Growing evidence
The new MIT study adds to a growing body of evidence that the size of algorithms matters less than their architectural complexity. For example, earlier this month, a team of Google researchers published a study claiming that a model much smaller than GPT-3, called fine-tuned language net (FLAN), bests GPT-3 by a large margin on a number of challenging benchmarks. And in a 2020 survey, OpenAI found that since 2012, the amount of compute needed to train an AI model to the same performance on classifying images in a popular benchmark, ImageNet, has been decreasing by a factor of two every 16 months.
There are findings to the contrary. In 2018, OpenAI researchers released a separate analysis showing that from 2012 to 2018, the amount of compute used in the largest AI training runs grew more than 300,000 times with a 3.5-month doubling time, exceeding the pace of Moore's law. But assuming algorithmic improvements receive greater attention in the years to come, they could solve some of the other problems associated with large language models, like environmental impact and cost.
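To put the quoted rates on a common footing, the doubling and halving times can be converted to per-year factors. The following is a minimal Python sketch, not part of the MIT or OpenAI analyses; the function names `annual_factor` and `months_to_reach` are illustrative, and the calculation assumes smooth exponential change at the stated doubling time.

```python
import math


def annual_factor(doubling_months: float) -> float:
    """Factor of change per year for a quantity that doubles
    (or halves) once every `doubling_months` months."""
    return 2 ** (12 / doubling_months)


def months_to_reach(total_factor: float, doubling_months: float) -> float:
    """Months needed to accumulate `total_factor` growth at a fixed doubling time."""
    return math.log2(total_factor) * doubling_months


# Moore's law as described in the article: doubling every two years.
moore = annual_factor(24)

# ImageNet training efficiency: compute needed halves every 16 months.
efficiency = annual_factor(16)

# Largest training runs: compute used doubles every 3.5 months.
compute = annual_factor(3.5)

print(f"Moore's law:          {moore:.2f}x per year")
print(f"Algorithm efficiency: {efficiency:.2f}x less compute per year")
print(f"Largest-run compute:  {compute:.1f}x per year")

# How long a 300,000-fold increase takes at a 3.5-month doubling time.
print(f"300,000x growth: {months_to_reach(300_000, 3.5) / 12:.1f} years")
```

Under these assumptions, a 16-month halving time works out to a faster yearly rate than a two-year doubling, while a 3.5-month doubling time compounds far faster than either, reaching a 300,000-fold increase in a little over five years.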
In June 2019, researchers at the University of Massachusetts at Amherst released a report estimating that training and searching a certain model produces roughly 626,000 pounds of carbon dioxide emissions, equivalent to nearly 5 times the lifetime emissions of the average U.S. car. GPT-3 alone used 1,287 megawatt-hours of electricity during training and produced 552 metric tons of carbon dioxide emissions, a Google study found, about the same amount as the electricity usage of 100 average homes over a year.
On the expenses side, a Synced report estimated that the University of Washington's Grover fake news detection model cost $25,000 to train; OpenAI reportedly racked up $12 million training GPT-3; and Google spent around $6,912 to train BERT. While AI training costs dropped 100-fold between 2017 and 2019, according to one source, these amounts far exceed the computing budgets of most startups and institutions, let alone independent researchers.
"Through our analysis, we were able to say how many more tasks could be done using the same amount of computing power after an algorithm improved," Thompson said in a press release. "In an era where the environmental footprint of computing is increasingly worrisome, this is a way to improve businesses and other organizations without the downside."