The challenge

The Multi-Genre Broadcast (MGB) Challenge is an evaluation of speech recognition, speaker diarization, and lightly supervised alignment using TV recordings in English and Arabic.

The speech data is broad and multi-genre, spanning the whole range of TV output, and represents a challenging task for speech technology.

In 2015, the challenge used data from the British Broadcasting Corporation (BBC) and was an official challenge of the 2015 IEEE Automatic Speech Recognition and Understanding Workshop. In 2016, the challenge will feature two new Arabic tracks based on TV data from Aljazeera. Unfortunately, due to ongoing licensing issues, we will not be able to re-run the English MGB Challenge in 2016, but we hope to run a second version in future years.

Why participate?

We hope that participation in the challenge will give:

  • the opportunity for labs around the world to use large quantities of captioned TV data to build systems with immediate real-world applications
  • the chance to compare the best research systems from different labs using exactly comparable data and conditions
  • a platform on which to evaluate novel approaches to adaptation, diarization and alignment.

Register now!