Cloudflare vs Perplexity – AI Web Scraping
The AI Era
Web data has become the most essential resource in the AI economy, triggering a legal battle over its ownership and use. This dispute is forcing courts and regulators to define the terms and ethical limits under which AI companies can legally access, copy, and utilise this massive store of copyrighted material.
Cloudflare’s Allegations:
Cloudflare, which provides security and performance services for millions of websites, has made a serious accusation against Perplexity. Cloudflare alleges that Perplexity is ignoring standard web protocols, such as robots.txt files, to scrape content from websites that have specifically forbidden it.
According to Cloudflare’s detailed investigation, Perplexity’s bots allegedly used sophisticated and deceptive tactics to bypass these rules, including:
- Identity Obscuring: The bots would change their user agent to impersonate legitimate web browsers, like Chrome on macOS, to appear as a normal user rather than a data-scraping bot.
- Rotating IP Addresses: The bots reportedly cycled through different IP addresses and even entire network providers to evade detection and blocks put in place by website owners.
- Ignoring Directives: Even when website owners explicitly blocked Perplexity’s known bot, Cloudflare’s data suggests the company deployed “stealth” crawlers to continue accessing the content.
Cloudflare says it backed up these findings with controlled tests: it created new, restricted domains and found that Perplexity’s AI could still summarise content it should not have been able to access.
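To make the robots.txt point concrete, here is a minimal sketch in Python of how a well-behaved crawler checks a site's rules before fetching a page. The robots.txt content, the example URL, and the browser user-agent string are all made up for illustration; PerplexityBot is used as the example of a declared crawler name. The sketch also shows why the alleged disguise matters: robots.txt rules are matched against whatever name the client announces, and nothing in the file itself enforces them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site: one named AI crawler is
# barred from everything, every other agent is allowed.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""

# Illustrative browser-style user agent (Chrome on macOS), not taken from any log.
BROWSER_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)

URL = "https://example.com/article"


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if ROBOTS_TXT permits this user agent to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# A crawler that announces itself honestly is told to stay out.
declared = is_allowed("PerplexityBot", URL)

# The same check with a browser-style name falls through to the "*" rule.
disguised = is_allowed(BROWSER_UA, URL)

print(f"declared crawler allowed:  {declared}")
print(f"disguised crawler allowed: {disguised}")
```

In other words, robots.txt only works when crawlers identify themselves truthfully and choose to honour the answer, which is why the allegations about changed user agents and rotating IP addresses go to the heart of the dispute.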
Perplexity’s Response: ‘Just a misunderstanding’
In its official response, Perplexity dismissed Cloudflare’s report, arguing that the evidence was flawed. Its main points are:
- Denial of Ownership: Perplexity claims that some of the bots identified by Cloudflare were not under its control, and that it often relies on third-party services for web fetching.
- User-Driven Activity: The company makes a crucial distinction between large-scale automated crawling (which it says it does not do for training) and “user-driven fetching.” It argues that when a user asks a question, the AI bot is simply acting on the user’s behalf, and therefore should not be treated as a malicious scraper.
However, this argument has been met with scepticism from a number of commentators, who point out that “user-driven” activity still bypasses a publisher’s ability to monetise its content through ads or subscriptions.
A Shifting Economic Model: The Era of Free Data is Over
For years, the internet has run on an ad-supported model, where content creators provided information for free in exchange for ad revenue generated by human visitors. AI companies, however, can scrape this content and provide a summary without ever sending a human to the original source, effectively cutting publishers out of the loop.
This is where Cloudflare’s new initiatives come in. The company is actively pushing for a new model where content has explicit value to AI. They have introduced services like “Pay Per Crawl,” which allows website owners to:
- Charge AI crawlers for access to content. Websites can set a price, and AI bots that wish to use their data must pay.
- Block AI bots by default. Cloudflare has made it easier for sites to block most AI crawlers, forcing a conversation about compensation.
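The two options above boil down to a simple decision for each incoming AI crawler request: serve it, refuse it, or ask for payment. The Python sketch below is only an illustration of that logic under stated assumptions. The class, function, and field names are hypothetical, the bot name is invented, and it does not reproduce how Cloudflare's product actually works. The one real ingredient is HTTP status 402 Payment Required, a standard status code that suits this kind of response.

```python
from __future__ import annotations

from dataclasses import dataclass
from decimal import Decimal
from http import HTTPStatus


@dataclass(frozen=True)
class CrawlPolicy:
    """Hypothetical per-site policy; the field names are illustrative only."""

    known_ai_bots: frozenset
    # Price per request in the site's chosen currency.
    # None means the site blocks AI bots outright and offers no paid access.
    price_per_request: Decimal | None


def decide(
    policy: CrawlPolicy,
    bot_name: str,
    max_price_offered: Decimal | None,
) -> HTTPStatus:
    """Return the status a site might send back for one request."""
    if bot_name not in policy.known_ai_bots:
        # Not an identified AI crawler: serve the page as usual.
        return HTTPStatus.OK
    if policy.price_per_request is None:
        # Block-by-default: no paid route on offer.
        return HTTPStatus.FORBIDDEN
    if max_price_offered is not None and max_price_offered >= policy.price_per_request:
        # The crawler has agreed to pay at least the asking price.
        return HTTPStatus.OK
    # The crawler offered nothing, or too little: tell it payment is required.
    return HTTPStatus.PAYMENT_REQUIRED


paid_site = CrawlPolicy(frozenset({"ExampleAIBot"}), Decimal("0.01"))
blocked_site = CrawlPolicy(frozenset({"ExampleAIBot"}), None)

print(decide(paid_site, "ExampleAIBot", None))             # no offer made
print(decide(paid_site, "ExampleAIBot", Decimal("0.02")))  # offer covers the price
print(decide(blocked_site, "ExampleAIBot", Decimal("1")))  # site offers no paid access
```

Note that the whole scheme depends on being able to tell which requests come from AI crawlers in the first place, which is why undeclared or disguised bots undermine it.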
This move marks a potential end to the “free-for-all” data harvesting that has fuelled the AI boom. With major publishers already suing AI firms and platforms like Reddit now charging for API access, the message is clear: if AI companies want to build their models on the web’s content, they will have to start paying for it.
Update – June 2026
Since this post was written, the situation has escalated considerably.
- Cloudflare de-listed Perplexity as a verified bot and added rules to block its stealth crawlers.
- Reddit filed a federal lawsuit after a forensic honeypot post (accessible only to Google) was reproduced by Perplexity within hours.
- Amazon followed with its own legal action over Perplexity’s browser agent, and in March 2026 a federal judge issued a court order blocking it.
- Most recently, CNN sued in May 2026 over the alleged scraping of 17,000+ pieces of content.
Perplexity now faces active legal action from publishers including the New York Times, News Corp, Reddit, Amazon, and CNN, while others such as Time and Le Monde have opted for licensing deals instead.
Further reading:
- https://blog.cloudflare.com/perplexity-is-using-stealth-undeclared-crawlers-to-evade-website-no-crawl-directives/
- https://techcrunch.com/2025/08/04/perplexity-accused-of-scraping-websites-that-explicitly-blocked-ai-scraping/
- https://news.bloomberglaw.com/ip-law/news-outlets-perplexity-ai-suits-strike-at-existential-threat
- https://www.cnbc.com/2026/03/10/amazon-wins-court-order-to-block-perplexitys-ai-shopping-agent.html
- https://variety.com/2026/biz/news/cnn-sues-perplexity-alleging-copyright-infringement-1236760987/