Recent reports have opened a new chapter in the ongoing dispute over Perplexity's web scraping practices. Cloudflare has alleged that Perplexity's bots use evasive tactics to get around restrictions set by websites, effectively 'stealth crawling' to gather data even where robots.txt files disallow them and firewall rules block them.
Perplexity, known for its AI-driven search engine, identifies its declared crawlers as 'PerplexityBot' and 'Perplexity-User.' However, Cloudflare's investigation suggests that its bots may also be masquerading as generic browsers, presenting a user-agent string that mimics Google Chrome on macOS. Because website owners typically key their blocks to the declared bot names, this tactic allows the crawlers to circumvent them.
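To illustrate why a disguised user agent slips past name-based rules, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt content, the example.com URL, the Chrome user-agent string, and the is_allowed helper are all assumptions made for illustration; none of them come from Cloudflare's report or from Perplexity.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: disallow the two declared Perplexity agents,
# allow everyone else.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Disallow: /

User-agent: *
Allow: /
"""

# Illustrative browser-like user-agent string (an assumption, not a
# string taken from any report).
GENERIC_CHROME = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/124.0.0.0 Safari/537.36"
)


def is_allowed(user_agent: str, url: str = "https://example.com/article") -> bool:
    """Return True if the robots.txt above permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("PerplexityBot", "Perplexity-User", GENERIC_CHROME):
        print(f"{agent[:40]!r}: allowed={is_allowed(agent)}")
```

In this sketch the two declared names are refused, while the browser-like string falls under the wildcard rule and is allowed. robots.txt is also only advisory: it works when a crawler identifies itself honestly and chooses to comply, which is why site owners pair it with firewall rules.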
The issue is not new for Perplexity. In 2024, multiple websites reported unauthorized access despite explicit prohibitions in their robots.txt files. The company initially attributed this to third-party crawlers but later partnered with publishers to share ad revenue as a form of restitution.
Cloudflare has responded by removing Perplexity from its list of verified bots and implementing measures to detect and block these stealth crawlers. Even so, the fight against unauthorized data scraping continues, and it resembles a game of whack-a-mole.