Bolster AI Models Using Facebook’s Dynabench Benchmarking

Unprecedented progress in artificial intelligence (AI) depends on repeated testing against vast swaths of data.

Owing to their ability to detect complex patterns in massive amounts of data, deep learning models have become important tools for complex data science tasks such as natural language processing (NLP) and image classification.

Benchmark testing: what it is

In an ever-changing world of technology, benchmark testing plays a critical role in measuring how capable AI systems really are, exposing their weaknesses, and guiding the development of stronger, smarter models.

From MNIST and ImageNet to GLUE, benchmarks have played a significant role in driving AI research forward. They give the community a concrete target to aim for, quantitative measures for comparing model performance, and a common ground for exchanging ideas.

However, whenever a new benchmark is introduced, it saturates quickly: at the rate AI is advancing, existing benchmarks reach their ceiling almost as soon as the next NLP model is released.

For largely historical reasons, these benchmarks are static. They were time-consuming and expensive to collect, and putting humans and models in the loop together was impractical because the models of the time made it too difficult.

As a result, AI researchers have to spend more and more time developing new benchmarks just to keep pushing AI performance forward.

Introducing Facebook Dynabench Benchmarking

It is time to rethink how we benchmark machine learning models.

To address this challenge, AI researchers at Facebook released Dynabench, a platform for data collection and benchmarking. Its approach puts both humans and state-of-the-art (SOTA) AI models in a loop: humans try to fool the models, and the cases where the models slip up become the seed of a new dataset.

This technique is called dynamic adversarial data collection, and Dynabench makes it easy to demonstrate how humans can fool AI. According to Facebook, how often that happens is a better indicator of a model’s quality than today’s static benchmarks.
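To make the loop concrete, here is a minimal sketch in Python of how dynamic adversarial data collection works. It is not the Dynabench API: the stub classifier, the example sentences, and the helper names are assumptions made purely for illustration.

```python
# A minimal sketch of dynamic adversarial data collection.
# NOTE: this is NOT the Dynabench API; the stub classifier and the
# example sentences below are hypothetical, for illustration only.

def model_predict(text: str) -> str:
    """Stand-in for a SOTA sentiment model (a naive keyword rule)."""
    positive_words = {"great", "love", "excellent", "masterpiece"}
    tokens = set(text.lower().split())
    return "positive" if tokens & positive_words else "negative"

def collect_adversarial_examples(human_examples):
    """Keep the examples on which the human fools the model.

    Each item is (text, human_label). The examples the model gets
    wrong are exactly the ones worth adding to the next dataset.
    """
    new_dataset = []
    for text, human_label in human_examples:
        if model_predict(text) != human_label:
            new_dataset.append((text, human_label))  # model was fooled
    return new_dataset

if __name__ == "__main__":
    attempts = [
        ("What a masterpiece of boredom.", "negative"),  # sarcasm
        ("I love how it never works.", "negative"),      # irony
        ("This movie is great fun.", "positive"),        # straightforward
    ]
    fooled = collect_adversarial_examples(attempts)
    print(f"Model fooled on {len(fooled)} of {len(attempts)} attempts:")
    for text, label in fooled:
        print(f"  {label!r}: {text}")
```

The examples the model gets wrong are exactly the ones worth keeping: they become the next, harder round of training and evaluation data.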

Rich analytical performance models are often hard to build because of the many interactions among job interference, application complexity, network topology, and node components. In such situations a machine learning-based performance model helps: trained on application runs, it can uncover interactions between system and application that would otherwise stay unknown. Here, too, benchmarking plays a critical role in evaluating which model is the right one for bolstering AI performance.

As Facebook researcher Douwe Kiela puts it, reliance on faulty benchmarks stunts AI growth: “You end up with a system that is better at the test than humans are but not better at the overall task. It’s very deceiving because it makes it look like we’re much further than we actually are.”

As a result, the Dynabench metric better reflects how AI models perform in the situations that matter most: interacting with people, whose complex behavior cannot be captured by a fixed set of data points.

The challenges with current static benchmarks:

👉 They force the AI community to concentrate on one specific task and metric, when the real question is how well an AI system functions while people are interacting with it.

👉 A new benchmark must be neither too easy nor too hard; getting that balance wrong makes it likely to become outdated soon after release.

👉 They contain annotation artifacts and inadvertent biases. Modern machine learning algorithms are remarkably effective at exploiting such biases in benchmark datasets, so researchers must guard against overfitting to a specific dataset.

Now is the right time to improve the way AI researchers do benchmarking.

How Facebook Dynabench Benchmarking Improves AI Models

Dynabench lets AI researchers determine exactly how good today’s NLP models really are. As a by-product, the process yields data that can be used to train further models.

The core idea behind Dynabench is to leverage human creativity to challenge the models.

Machines are still nowhere close to comprehending language the way humans do. In Dynabench, a language model might be asked to classify the sentiment of a review; hyperbole and other quirks of language can easily fool it. Human annotators therefore keep adding such adversarial examples until the model can no longer be fooled.

In this way, unlike traditional benchmarking, humans stay in the loop for every bit of progress the machine makes.
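As a hedged illustration of the sentiment scenario above, the sketch below probes an off-the-shelf sentiment model from the Hugging Face transformers library (an assumption here, not a Dynabench component) with a few hyperbolic and sarcastic sentences invented for this example. Whether any given sentence actually fools the model is something you would have to verify by running it.

```python
# Probing a sentiment classifier with hyperbole and sarcasm.
# Assumes the Hugging Face `transformers` library is installed;
# the candidate sentences are hypothetical and may or may not
# actually fool the default model.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")  # default English sentiment model

candidates = [
    ("This film is so bad it's a work of art.", "NEGATIVE"),
    ("Oh great, another two hours of my life I'll never get back.", "NEGATIVE"),
    ("Not a single boring minute in it.", "POSITIVE"),
]

for text, human_label in candidates:
    pred = classifier(text)[0]  # {'label': 'POSITIVE'/'NEGATIVE', 'score': ...}
    fooled = pred["label"] != human_label
    status = "FOOLED" if fooled else "ok"
    print(f"[{status}] model={pred['label']} ({pred['score']:.2f}) "
          f"human={human_label} | {text}")
```

In a Dynabench-style round, the sentences that earn the FOOLED tag are the ones that would feed the next cycle of data collection.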

As AI engineers, researchers, and computational linguists start using Dynabench to improve their AI models, the platform will accurately track which examples fool the models into making wrong predictions.
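What that tracking might look like in practice is sketched below; the record format, the model names, and the “fool rate” statistic are illustrative assumptions, not Dynabench’s actual data model.

```python
# Hypothetical bookkeeping for which examples fool which models.
# The record format, model names, and "fool rate" statistic are
# illustrative assumptions, not Dynabench's actual data model.
from collections import defaultdict

# Each record: (model_name, example_id, was_the_model_fooled)
records = [
    ("roberta-large", "ex-001", True),
    ("roberta-large", "ex-002", False),
    ("bert-base",     "ex-001", True),
    ("bert-base",     "ex-003", True),
]

attempts = defaultdict(int)
fooled = defaultdict(int)
for model, example_id, was_fooled in records:
    attempts[model] += 1
    fooled[model] += int(was_fooled)

for model in sorted(attempts):
    rate = fooled[model] / attempts[model]
    print(f"{model}: fooled on {fooled[model]}/{attempts[model]} attempts ({rate:.0%})")
```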

What’s next?

Facebook’s Dynabench stands to improve current benchmarking practices, producing models that make fewer mistakes and carry fewer harmful biases.
