MGB Challenge

The challenge

The Multi-Genre Broadcast (MGB) Challenge is an evaluation of speech recognition, speaker diarization, dialect identification and lightly supervised alignment using TV recordings and YouTube data.

The speech data is broad and multi-genre, spanning the whole range of TV output, and represents a challenging task for speech technology.

MGB-1

The first edition of the Multi-Genre Broadcast (MGB-1) Challenge is an evaluation of speech recognition, speaker diarization, and lightly supervised alignment using TV recordings in English.

The speech data is broad and multi-genre, spanning the whole range of TV output, and represents a challenging task for speech technology.

In 2015, the challenge used data from the British Broadcasting Corporation (BBC). It was an official challenge of the 2015 IEEE Automatic Speech Recognition and Understanding Workshop.

MGB-2

The second edition of the Multi-Genre Broadcast (MGB-2) Challenge is an evaluation of speech recognition and lightly supervised alignment using TV recordings in Arabic.

The speech data is broad and multi-genre, spanning the whole range of TV output, and represents a challenging task for speech technology.

In 2016, the challenge featured two new Arabic tracks based on TV data from Aljazeera. It was an official challenge at the 2016 IEEE Workshop on Spoken Language Technology.

MGB-3

The third edition of the Multi-Genre Broadcast Challenge (MGB-3, speech recognition in the wild) is an evaluation of speech recognition and five-class dialect identification using YouTube recordings in dialectal Arabic.

MGB-3 uses 16 hours of multi-genre data collected from different YouTube channels.

In 2017, the challenge featured two new Arabic tracks based on TV data from Aljazeera as well as YouTube recordings. It was an official challenge at the 2017 IEEE Automatic Speech Recognition and Understanding Workshop.

MGB-5

The fifth edition of the Multi-Genre Broadcast (MGB-5) Challenge is an evaluation of speech recognition and 17-class dialect identification using YouTube recordings in dialectal Arabic.

MGB-5 uses 16 hours of multi-genre data collected from different YouTube channels.

In 2019, the challenge featured two new Arabic tracks based on YouTube recordings. It was an official challenge at the 2019 IEEE Automatic Speech Recognition and Understanding Workshop.

You can sign up to receive announcements at lists.inf.ed.ac.uk/mailman/listinfo/mgb-challenge.

Why participate?

We hope that participation in the challenge will give:

the opportunity for labs around the world to use large quantities of captioned TV data, to build systems with immediate real-world applications
the chance to compare the best research systems against other labs using exactly comparable data and conditions
a platform on which to evaluate novel approaches to adaptation, diarization and alignment.