Microsoft’s new research focuses on improving the image-encoding module. When combined with vision-language (VL) fusion modules such as OSCAR and VIVO, Microsoft’s newest VL system posted strong results on some of the most competitive artificial intelligence (AI) benchmarks, including visual question answering (VQA), Microsoft COCO Image Captioning, and novel object captioning (nocaps).
The tech giant also highlighted that VinVL significantly surpasses human performance on the nocaps leaderboard as measured by CIDEr (Consensus-based Image Description Evaluation).
To achieve these results, Microsoft trained its VinVL object-attribute detection model on a large object detection dataset of 2.49 million images annotated with 1,848 object classes and 524 attribute classes. Microsoft built the dataset by merging four public object detection datasets: COCO, Open Images, Objects365, and Visual Genome (VG).
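To give a rough sense of what merging several detection datasets into one shared class vocabulary can involve, here is a minimal Python sketch. It is not Microsoft's actual pipeline, which the article does not describe. The function name `merge_detection_datasets`, the `aliases` mapping, and the toy annotations are all hypothetical and exist only for illustration.

```python
def merge_detection_datasets(datasets, aliases=None):
    """Merge per-dataset annotations into one list with a shared class vocabulary.

    datasets: dict mapping a dataset name to a list of (image_id, class_name) pairs.
    aliases:  optional dict mapping a normalized class name to a canonical one
              (a hypothetical stand-in for whatever label unification is used).
    """
    aliases = aliases or {}
    merged = []
    for source, annotations in datasets.items():
        for image_id, class_name in annotations:
            # Normalize spelling so "Dog " and "dog" count as the same class.
            canonical = class_name.strip().lower()
            canonical = aliases.get(canonical, canonical)
            # Prefix ids with the source so ids from different datasets cannot collide.
            merged.append((f"{source}/{image_id}", canonical))
    # Build one index over the union of all class names.
    vocabulary = sorted({name for _, name in merged})
    class_to_index = {name: i for i, name in enumerate(vocabulary)}
    return merged, class_to_index


# Toy data only; real datasets carry bounding boxes and many more labels.
toy = {
    "coco": [("1", "Person"), ("2", "dog")],
    "vg": [("1", "man"), ("3", "Dog ")],
}
merged, class_to_index = merge_detection_datasets(toy, aliases={"man": "person"})
```

In this toy run the four annotations collapse to two shared classes, which shows why a merged dataset can end up with fewer classes than the sum of its sources.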