The challenge

The Multi-Genre Broadcast (MGB) Challenge is an evaluation of speech recognition, speaker diarization, and lightly supervised alignment using TV recordings in English and Arabic.

The speech data is broad and multi-genre, spanning the whole range of TV output, and represents a challenging task for speech technology.

In 2015, the challenge used data from the British Broadcasting Corporation (BBC). It was an official challenge of the 2015 IEEE Automatic Speech Recognition and Understanding Workshop. In 2016, the challenge featured two new Arabic tracks based on TV data from Aljazeera. It was an official challenge at the 2016 IEEE Workshop on Spoken Language Technology.

The third version of the challenge will again run as an official challenge of the IEEE Automatic Speech Recognition and Understanding Workshop and will feature both English and Arabic data. Unfortunately, due to licensing issues, the English training and test sets will not be the same as those in the 2015 challenge.

We are not yet ready to distribute the data for the 2017 Challenge, but you can sign up to receive announcements at

Why participate?

We hope that participation in the challenge will offer:

  • the opportunity for labs around the world to use large quantities of captioned TV data to build systems with immediate real-world applications
  • the chance to compare the best research systems against other labs using exactly comparable data and conditions
  • a platform on which to evaluate novel approaches to adaptation, diarization and alignment.

Register now!