MLCommons and Hugging Face team up to release massive speech data set for AI research

MLCommons, a nonprofit AI safety working group, has teamed up with AI dev platform Hugging Face to release one of the world’s largest collections of public domain voice recordings for AI research.

The data set, called Unsupervised People’s Speech, contains more than a million hours of audio spanning at least 89 different languages. MLCommons says it was motivated to create it by a desire to support R&D in “various areas of speech technology.”

“Supporting broader natural language processing research for languages other than English helps bring communication technologies to more people globally,” the organization wrote in a blog post Thursday. “We anticipate several avenues for the research community to continue to build and develop, especially in the…

Source link

Leave a Comment