Recent lawsuits by Dow Jones, the New York Post, the New York Times and Amazon against AI search engine Perplexity highlight how automated extraction has become a boardroom crisis affecting fair competition and fiduciary duty. AI policy researcher and data protection manager Areejit Banerjee explores how OWASP is redefining scraping risk from “server load” to “value extraction” that erodes ROI on data assets, why technical defenses operate without a clear legal backstop, and how boards should deploy layered countermeasures, including limiting exposed value, making automated use harder and instrumenting abnormal access patterns, while waiting for federal reform.
Web scraping began as a tool for search indexing, but it has since mutated into a global extraction industry. Market research estimates the web-scraping market currently sits at $1.03 billion and is projected to nearly double to $2 billion by 2030. For boards, compliance officers and chief information security officers (CISOs), this is no longer a purely technical problem; it is a governance issue that affects fair competition, fiduciary duty and the credibility of the organization’s data-protection commitments.
Technological defenses have produced an arms race, and we now face a strategic crisis. As automation scales, we are witnessing the rise of a “free-rider” dynamic: One side invests capital to build, curate and verify high-quality data infrastructure, while automated actors appropriate that value at zero cost. In effect, if you are building data products today, you are subsidizing your competitor’s product.
This imbalance destabilizes competition and discourages innovation. As recent federal policy discussions have highlighted, US law has not kept pace with automated harvesting techniques, leaving high-value data assets exposed to industrial-scale extraction.
From nuisance to litigation
This “free-rider” problem is now flooding the US court system. Dow Jones, the New York Post and the New York Times have all filed major lawsuits against AI search engine Perplexity, alleging copyright infringement and data theft. Concurrently, Amazon has also taken legal action against Perplexity. The core issue in these cases is the use of “agentic” browsers. Unlike traditional bots, agents simulate human user behavior to bypass terms of service and technical protections against automated scraping. This makes traditional perimeter defenses, such as CAPTCHA and basic rate limiting, far less effective on their own.
hiQ Labs v. LinkedIn narrowed what counted as “unauthorized access” under the Computer Fraud and Abuse Act (CFAA) for public data, which weakened the legal backstop for bot blocking long before Perplexity. That gap is why the Perplexity lawsuits feel like a last resort: When your technical filters fail, the law does not give you a clean way to argue “this is infrastructure theft.”
The result is a regulatory gray zone. While platforms can still attempt to block bots technically, the legal deterrent is gone. Companies are left managing relentless exploitation with no clear recourse when technical filters fail.
It’s about ROI, not just bandwidth
The industry’s understanding of the threat is finally shifting from “server load” to “value extraction.”
OWASP’s Automated Threats project is updating its definition of scraping to reflect this reality, recognizing that the primary symptom is not just network lag but the erosion of return on investment (ROI) for high-quality data infrastructure.
This distinction is critical. When a competitor scrapes your pricing, inventory or proprietary content, they are not just using your bandwidth; they are eroding the ROI of your data assets. This dynamic means the original platform can no longer recover the substantial investments made to assemble and sustain its dataset.
A federal framework
Technical defenses can slow attackers, but as long as federal law treats industrial-scale harvesting as a gray area, the free-rider problem persists. For boards and compliance leaders, this means today’s controls are operating without a clear legal backstop. A modernized federal framework could close that gap by:
- Redefining “unauthorized access”: Treats automated access as “unauthorized” whenever it ignores published access rules, such as robots.txt or terms of service (see the robots.txt sketch after this list).
- Establishing “data misappropriation”: Recognizes large-scale stripping of investment-heavy datasets as asset misappropriation rather than a contractual dispute.
- Creating a unified standard: Replaces today’s patchwork of state rules with a single federal standard aligned with emerging international views on scraping and intellectual property.
- Preserving research exceptions: Maintains narrow, documented carve-outs for bona fide research and interoperability.
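One reason the first item is tractable is that “published access rules” already exist in machine-readable form. The sketch below is a minimal illustration using Python’s standard urllib.robotparser; the site URL, bot name and target path are placeholders, not references to any real service. It shows how an automated client can check robots.txt before fetching; under the proposed definition, proceeding against a disallow rule would count as unauthorized access.

```python
from urllib.robotparser import RobotFileParser

# Placeholder site and bot identity, for illustration only.
ROBOTS_URL = "https://example.com/robots.txt"
USER_AGENT = "ExampleResearchBot/1.0"
TARGET = "https://example.com/data/listings"

rp = RobotFileParser()
rp.set_url(ROBOTS_URL)
rp.read()  # fetch and parse the site's published access rules

if rp.can_fetch(USER_AGENT, TARGET):
    print(f"{USER_AGENT} may fetch {TARGET}")
else:
    # Under the proposed definition, continuing past this point
    # would be "unauthorized access", not a mere contract question.
    print(f"{USER_AGENT} is disallowed from {TARGET}")
```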
A layered approach
While that kind of reform works its way through Washington (if it ever does), boards and CISOs still have to keep their data products defensible today. OWASP’s handbook confirms that scraping is not solved by a single control. Instead, application owners are advised to deploy a coordinated set of countermeasures:
- Limit exposed value: Expose only the data fields needed for legitimate use and rely on aggregation, truncation, masking, anonymization or encryption wherever possible (a minimal sketch combining this with rate limiting follows the list).
- Make automated use harder: Vary how content and URLs are delivered, set explicit scraping requirements and build test cases that simulate abusive collection patterns.
- Identify and slow automation: Use fingerprinting, reputation and behavioral signals to spot non-human usage, then apply rate limits, delays or stronger authentication to high-risk access.
- Instrument and formalize the response: Log and monitor abnormal access patterns and back technical measures with contracts, playbooks and information-sharing with peers and emergency response teams.
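To make these controls concrete, the sketch below shows two of them working together: exposing only approved, truncated fields, and applying a sliding-window rate limit keyed on a simple client fingerprint, with abnormal access logged for the response playbook. It is a minimal illustration under stated assumptions, not OWASP reference code: the field names, truncation length, thresholds and the IP-plus-User-Agent fingerprint are placeholders that a real deployment would replace with richer signals.

```python
import hashlib
import logging
import time
from collections import defaultdict, deque

# Assumption: only these fields are needed for legitimate use of the public API.
PUBLIC_FIELDS = {"title", "summary", "updated_at"}

def limit_exposed_value(record: dict) -> dict:
    """Return only approved fields, truncating free text so bulk
    extraction recovers less of the underlying asset's value."""
    out = {k: v for k, v in record.items() if k in PUBLIC_FIELDS}
    if isinstance(out.get("summary"), str):
        out["summary"] = out["summary"][:280]
    return out

class BehavioralRateLimiter:
    """Sliding-window rate limiter keyed on a client fingerprint.
    Abnormal access is logged so the response playbook has evidence."""

    def __init__(self, max_requests: int = 60, window_seconds: int = 60):
        self.max_requests = max_requests
        self.window = window_seconds
        self.history = defaultdict(deque)
        self.log = logging.getLogger("scrape-defense")

    def _fingerprint(self, ip: str, user_agent: str) -> str:
        # Illustrative fingerprint; production systems add TLS, header and
        # behavioral signals rather than relying on IP + User-Agent alone.
        return hashlib.sha256(f"{ip}|{user_agent}".encode()).hexdigest()[:16]

    def allow(self, ip: str, user_agent: str) -> bool:
        key = self._fingerprint(ip, user_agent)
        now = time.monotonic()
        hits = self.history[key]
        while hits and now - hits[0] > self.window:
            hits.popleft()
        hits.append(now)
        if len(hits) > self.max_requests:
            # Instrument the response: abnormal patterns are recorded, not just blocked.
            self.log.warning("possible scraping: client %s made %d requests in %ds",
                             key, len(hits), self.window)
            return False
        return True

# Example wiring in a hypothetical request handler:
# limiter = BehavioralRateLimiter()
# if limiter.allow(request_ip, request_user_agent):
#     payload = limit_exposed_value(full_record)  # serve only the reduced view
```

In practice, functions like these would sit behind an API gateway or WAF; the point is the layering, not any individual threshold.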
For boards and compliance leaders, the key is not to manage each control directly but to ensure that scraping risk is explicitly in scope for data-protection governance, that these kinds of layered measures are being implemented and that the organization can explain to regulators, customers and investors how it is protecting its data infrastructure against free-rider abuse.
Earlier in 2025, I described a layered-defense approach that treats scraping mitigation as a stacked system: make it harder for automated actors to get in, harder for them to operate at scale and harder for them to convert stolen output into competitive value. That philosophy aligns closely with the OWASP guidance: multiple, coordinated controls that raise the cost of extraction while we wait for a federal “data misappropriation” standard to give defenders a legal backstop that matches the technical reality.
Innovation requires boundaries
We cannot build a robust AI economy on a foundation of infrastructure theft. If the free-rider problem remains unchecked, we risk a market where no one invests in data quality because no one can protect it.
The solution is not to ban automation but to govern it. As AI reshapes the nature of work, we must protect the data infrastructure that makes these models effective. Preserving the value of high-quality data is critical to the sustained growth of the industry. By defining “data misappropriation” at the federal level, we can safeguard legitimate research and interoperability while ensuring that the companies building the digital future can sustain the infrastructure that supports it.