Major Publishers Opt Out of Apple’s AI Training Tool
Less than three months after Apple introduced a tool for publishers to opt out of its AI training, numerous renowned news outlets and social platforms have seized the opportunity. WIRED confirms that Facebook, Instagram, Craigslist, Tumblr, The New York Times, The Financial Times, The Atlantic, Vox Media, the USA Today network, and Condé Nast are among the organizations opting to exclude their data from Apple’s AI training.
Applebot-Extended: A New Approach to Data Usage
The new tool, Applebot-Extended, is an extension of Apple’s web-crawling bot that lets website owners instruct Apple not to use their data for AI training. The original Applebot launched in 2015 to power Apple’s search products, such as Siri and Spotlight. Its purpose recently expanded to include collecting data for training AI models.
Applebot-Extended is presented as a way to respect publishers’ rights. Blocking it does not stop the original Applebot from crawling a website, so the site’s content can still appear in Apple search products while being excluded from AI training. Publishers block Applebot-Extended by updating their robots.txt files, a long-standing method for managing web crawlers.
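The robots.txt opt-out described above can be illustrated with a short Python sketch. The robots.txt content, the example.com URL, and the helper name is_allowed are illustrative assumptions, not taken from any publisher's actual file. The sketch uses the standard library's urllib.robotparser, whose matching rules may differ in detail from how Apple interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example publisher: it opts out of AI
# training via Applebot-Extended and sets no rule for the original Applebot.
ROBOTS_TXT = """\
User-agent: Applebot-Extended
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Placeholder URL, used only for illustration.
URL = "https://example.com/news/story"

print("Applebot-Extended allowed:", is_allowed(ROBOTS_TXT, "Applebot-Extended", URL))
print("Applebot allowed:", is_allowed(ROBOTS_TXT, "Applebot", URL))
```

Under these assumptions, the AI-training agent is refused while the search crawler is still permitted, which is the split the article describes.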
Adoption and Compliance
Because the tool is so new, relatively few websites block Applebot-Extended so far. An analysis by Originality AI found that around 7 percent of the high-traffic websites it sampled were blocking it, and most of those were news and media outlets. A separate analysis by Dark Visitors reached similar findings. Data journalist Ben Welsh noted that while 53 percent of the news websites he surveyed block OpenAI’s bot, only about 25 percent block Applebot-Extended.
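A tally like the ones cited above could be computed roughly as follows. This is a minimal sketch, not the method used by Originality AI, Dark Visitors, or Ben Welsh. The site names and robots.txt contents are invented sample data, the helper names are hypothetical, and GPTBot is used as the user agent for OpenAI's crawler. A site is counted as blocking an agent only if its robots.txt disallows that agent from the site root.

```python
from urllib.robotparser import RobotFileParser


def blocks_agent(robots_txt: str, user_agent: str) -> bool:
    """True if the robots.txt text disallows user_agent from the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, "/")


def share_blocking(robots_by_site: dict[str, str], user_agent: str) -> float:
    """Percentage of sites whose robots.txt blocks user_agent."""
    if not robots_by_site:
        return 0.0
    blocked = sum(blocks_agent(text, user_agent) for text in robots_by_site.values())
    return 100.0 * blocked / len(robots_by_site)


# Invented sample data: four hypothetical sites and their robots.txt contents.
SAMPLE = {
    "site-a.example": (
        "User-agent: Applebot-Extended\nDisallow: /\n\n"
        "User-agent: GPTBot\nDisallow: /\n"
    ),
    "site-b.example": "User-agent: GPTBot\nDisallow: /\n",
    "site-c.example": "User-agent: *\nAllow: /\n",
    "site-d.example": "",  # no rules at all
}

for agent in ("Applebot-Extended", "GPTBot"):
    print(f"{agent}: {share_blocking(SAMPLE, agent):.1f}% of sample sites block it")
```

A real survey would also need to fetch each site's robots.txt and decide how to treat missing files, redirects, and partial disallow rules, none of which this sketch attempts.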
Differences in blocking decisions among news publishers may be influenced by licensing deals, in which publishers allow a company’s bots in exchange for compensation. For instance, Condé Nast unblocked OpenAI’s bots after announcing a partnership.
The Battle Over AI Training Data
The ongoing struggle over AI training data has put robots.txt files in the spotlight. Some outlets, like Vox Media, block AI scraping tools from companies with which they have no commercial agreement. Others, such as The New York Times, emphasize legal prohibitions against unauthorized use of their content.
The future of AI and data licensing remains uncertain, but changes in robots.txt files may provide early indicators of new partnerships and agreements. As Originality AI founder Jon Gillham observes, the fight for AI training data is unfolding publicly in these seemingly mundane text files.