The social network Reddit continues to fight web bots that harvest the platform’s content for free to train neural networks. Over the past few weeks, the Reddit administration has adjusted its robots.txt file, which tells bots whether or not they may crawl sections of the site, with the result that community content and user comments no longer display correctly in many search engines. Currently, only Google’s system correctly returns search results for the latest Reddit posts. Other search engines, such as Bing or DuckDuckGo, handle similar requests incorrectly, either failing to find the pages users are looking for or displaying only part of them.
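To illustrate the mechanism involved: robots.txt is a plain-text file at a site’s root that crawlers are expected to honor. The snippet below is a hypothetical sketch (not Reddit’s actual file) showing how a site could allow Google’s crawler while disallowing all other bots:

```text
# Hypothetical example — not Reddit's actual robots.txt.
# Googlebot is allowed to crawl the whole site...
User-agent: Googlebot
Allow: /

# ...while every other crawler is blocked site-wide.
User-agent: *
Disallow: /
```

Compliance with robots.txt is voluntary, which is why sites that want stronger guarantees also pursue contractual agreements or technical blocking.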
The Google Agreement and Its Implications
The situation with Google appears unique because of previously reached agreements, under which the search giant will pay Reddit $60 million a year to use the site’s content to train its own AI algorithms. However, Reddit has denied that this deal determined which companies are permitted to use the platform’s content for training neural networks. A Reddit representative commented, “This is completely unrelated to our recent partnership with Google. We negotiated with several search engines. We could not reach an agreement with everyone because some are unable or unwilling to make any promises regarding their use of Reddit content, including for training artificial intelligence.”
Reddit’s Bold Move to Protect Content and Attract Investors
For a site as large as Reddit, blocking major search engine web bots is a bold move, but an expected one, notes NIX Solutions. Over the past year, the site administration has become much more active in protecting user-published content, seeking to open a new source of income and attract investors. Reddit raised the price of API access for third-party developers and also threatened to block Google’s crawler if the company did not stop using the platform’s content for free to train its neural networks. We’ll keep you updated as this story continues to evolve.