Language models like GPT-4 and Claude are powerful and useful, but the data they are trained on is a closely guarded secret. The Allen Institute for AI (AI2) aims to reverse this trend with a new, huge text dataset that is free to use and open to inspection. AI2 researchers argue that because the language model they are building is meant to be freely used and modified by the AI research community, the dataset used to train it should be equally open.
EXCLUSIVE: It comes after Express.co.uk reported that the Duke of Sussex's HRH titles were belatedly removed from his profile page on the Royal Family's website, after he lost them back in 2020.