Researchers from MIT, the MIT-IBM Watson AI Lab, IBM Research, and elsewhere have developed a new technique for analyzing unlabeled audio and visual data that could improve the performance of machine-learning models used in applications like speech recognition and object detection. The work, for the first time, combines two architectures of self-supervised learning, contrastive learning and masked data modeling, in an effort to scale machine-learning tasks like event classification in single- and multimodal data.
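The article does not give implementation details, but the core idea of combining the two self-supervised objectives can be illustrated with a toy sketch: a contrastive term that pulls matched audio/video clip embeddings together, plus a masked-reconstruction term computed only on hidden patches. All function names, shapes, and the weighting `lam` below are hypothetical, not taken from the researchers' actual model.

```python
import numpy as np

def info_nce(audio_emb, video_emb, temperature=0.07):
    """Contrastive (InfoNCE-style) loss: matched audio/video rows are positives."""
    # L2-normalize each embedding, then compare all pairs via dot products.
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    v = video_emb / np.linalg.norm(video_emb, axis=1, keepdims=True)
    logits = a @ v.T / temperature
    # Cross-entropy with the diagonal (the matched pairs) as the targets.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

def masked_reconstruction(patches, reconstructed, mask):
    """Masked-modeling loss: mean squared error only on the masked patches."""
    diff = ((patches - reconstructed) ** 2).mean(axis=-1)
    return (diff * mask).sum() / mask.sum()

def combined_loss(audio_emb, video_emb, patches, reconstructed, mask, lam=1.0):
    """Weighted sum of the contrastive and masked-reconstruction objectives."""
    return (info_nce(audio_emb, video_emb)
            + lam * masked_reconstruction(patches, reconstructed, mask))
```

In this sketch, training a model on both terms at once would encourage embeddings that are discriminative across modalities (from the contrastive term) while still retaining enough detail to reconstruct hidden input regions (from the masked term).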
Scaling audio-visual learning without labels (miragenews.com)
A new multimodal machine-learning technique from the MIT-IBM Watson AI Lab blends two self-supervised learning methods to learn more similarly to humans.