MGB Challenge

The challenge

The Multi-Genre Broadcast (MGB) Challenge is an evaluation of speech recognition, speaker diarization, dialect detection and lightly supervised alignment using TV recordings in English and Arabic.

The speech data is broad and multi-genre, spanning the whole range of TV output, and represents a challenging task for speech technology.

In 2015, the challenge used data from the British Broadcasting Corporation (BBC). It was an official challenge of the 2015 IEEE Automatic Speech Recognition and Understanding Workshop. In 2016, the challenge featured two new Arabic tracks based on TV data from Aljazeera. It was an official challenge at the 2016 IEEE Workshop on Spoken Language Technology. In 2017, the challenge covered Aljazeera Arabic and YouTube recordings for multi-genre speech-to-text and dialect identification. It is an official challenge at the 2017 IEEE Automatic Speech Recognition and Understanding Workshop.

The third version of the challenge will again run as an official challenge of the IEEE Automatic Speech Recognition and Understanding Workshop and will feature both English and Arabic data. Unfortunately, due to licensing issues, the English training and test sets will not be the same as those in the 2015 challenge.

The Arabic task for MGB-3 is now ready. You can sign up to receive announcements at lists.inf.ed.ac.uk/mailman/listinfo/mgb-challenge.

Why participate?

We hope that participation in the challenge will give:

the opportunity for labs around the world to use large quantities of captioned TV data to build systems with immediate real-world applications;
the chance to compare the best research systems against those of other labs, using exactly comparable data and conditions;
a platform on which to evaluate novel approaches to adaptation, diarization and alignment.