For decades, a humble text file governed the behavior of web scrapers. But as the AI industry grows, the social contract of robots.txt is falling apart.
OpenAI has deployed its GPTBot web crawler, which gathers content from the internet to further train the company's AI models, a practice that has drawn controversy before.
The large language model behind ChatGPT was built by scraping vast amounts of freely available internet content, a fact that OpenAI readily acknowledges. The company is now providing instructions on.
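To make the robots.txt mechanism concrete, here is a minimal sketch of how a rule aimed at one crawler can be evaluated, using Python's standard `urllib.robotparser` module. The robots.txt contents, the `example.com` URL, the `ExampleBot` name, and the `is_allowed` helper are all illustrative assumptions, not taken from any real site or from OpenAI's documentation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: refuse the crawler that identifies itself as
# "GPTBot" and leave the site open to every other user agent.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/article"  # placeholder URL
    for agent in ("GPTBot", "ExampleBot"):  # ExampleBot is a made-up name
        print(agent, is_allowed(ROBOTS_TXT, agent, page))
```

A check like this only reports what the file asks for. Nothing in robots.txt technically stops a crawler from ignoring it, which is why the arrangement amounts to a social contract.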