Web scraping is useful for many businesses, with applications ranging from price comparison to market research and analysis. However, it also raises an uncomfortable legal question: Can you actually do it without getting sued? The answer is not straightforward. Web scraping sits in a legal gray zone. It is not automatically illegal in the U.S., but it is not risk-free either. Understanding what determines whether web scraping is legal is critical before deploying any scraping strategy.
What is Web Scraping?
Web scraping is the automated collection of data from websites using software that simulates human browsing. Instead of manually copying information, scripts extract data such as prices, listings, reviews, or business details, typically at scale. Businesses often use scraping for competitive intelligence, lead generation, trend monitoring, content analysis, and financial modeling. The technology itself is neutral. Legal risk arises from the type of data collected and the rules governing access to it.
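To make the mechanics concrete, here is a minimal sketch in Python of what "extracting data" means in practice. It parses a small, made-up HTML snippet using only the standard library. The markup, the class names (product, name, price), and the function name extract_products are illustrative assumptions, not taken from any real site.

```python
from html.parser import HTMLParser

# Hypothetical markup for illustration; real pages will be structured differently.
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="name">Widget</span><span class="price">$19.99</span></li>
  <li class="product"><span class="name">Gadget</span><span class="price">$5.00</span></li>
</ul>
"""


class ProductParser(HTMLParser):
    """Collects the text of <span class="name"> and <span class="price"> elements."""

    def __init__(self):
        super().__init__()
        self.products = []   # one dict per <li class="product">
        self._field = None   # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "li" and cls == "product":
            self.products.append({})
        elif tag == "span" and cls in ("name", "price") and self.products:
            self._field = cls

    def handle_data(self, data):
        # Only keep text that sits inside one of the two spans of interest.
        if self._field:
            self.products[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


def extract_products(html):
    """Return a list of {"name": ..., "price": ...} dicts found in the HTML."""
    parser = ProductParser()
    parser.feed(html)
    return parser.products


products = extract_products(SAMPLE_HTML)
```

Note that this sketch collects only factual fields (a product name and a price). The legal questions discussed in this article depend on what is collected and how it is accessed, not on the parsing step itself.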
Key Legal Considerations That Determine Legality
In the U.S., there are four key areas of law that a business must consider when assessing whether its web scraping practices are lawful.
Copyright Law
Facts are not protected by copyright. As such, product prices, names, and basic specifications can generally be scraped without infringing copyright. However, issues may arise when a business scrapes copyrighted material, such as articles, images, or databases, and then republishes or commercializes it. Scraping copyrighted material may not be illegal in itself, but how you use it matters. If the content is used solely for internal analysis or is limited to non-core excerpts, the use may qualify as fair use. However, if you republish it for profit, it may constitute copyright infringement.
Contract Law
Contract law applies through the website’s terms of service. If a site requires users to explicitly agree to terms that prohibit automated scraping, violating those terms could expose your business to breach-of-contract claims.
Data Protection Laws
State laws, such as the California Consumer Privacy Act (CCPA), govern the collection and use of personal information. While collecting publicly available data may be lawful, scraping personal information can lead to compliance issues. If your business scrapes personal data without consent or another lawful basis, it risks violating data privacy regulations, and it may be challenging to argue after the fact that you had a lawful reason for doing so.
The Computer Fraud and Abuse Act (CFAA)
The CFAA was originally designed to combat hacking. However, it has increasingly been invoked against web scraping as well. Courts have debated whether web scraping constitutes unauthorized access and, therefore, violates the CFAA. In the most widely cited case, hiQ Labs v. LinkedIn, the Ninth Circuit concluded that accessing publicly available data generally doesn’t violate the CFAA. That said, scraping behind logins, bypassing technical blocks, exceeding authorized access, or ignoring cease-and-desist notices can quickly change the analysis.
So, Is Web Scraping Legal?
In the U.S., scraping publicly accessible, non-personal, factual data without breaching website terms or security measures is often lawful. However, scraping personal data, copyrighted material, or systems with restrictions increases the risk of falling on the wrong side of the law. Before starting any scraping project, it’s advisable to seek legal advice to prevent legal risks from becoming legal liability.
The Computer Fraud and Abuse Act and Web Scraping
The primary federal statute invoked against web scrapers is the Computer Fraud and Abuse Act (CFAA), 18 U.S.C. § 1030, which prohibits accessing a computer “without authorization” or in a manner that “exceeds authorized access.” For years, the dominant approach in courts was to treat a website’s Terms of Service as defining the scope of authorized access—meaning that violating ToS by scraping could constitute a federal crime or create civil liability. This view was effectively rejected by the Ninth Circuit in hiQ Labs, Inc. v. LinkedIn Corp., 938 F.3d 985 (9th Cir. 2019), which held that scraping publicly available data from LinkedIn’s website likely did not violate the CFAA because the data was publicly accessible to anyone without login credentials. The court reasoned that “authorization” under the CFAA addresses access restrictions comparable to locked gates—ToS violations alone do not transform public data access into unauthorized computer access.
The Supreme Court reinforced this narrower reading of the CFAA in Van Buren v. United States, 593 U.S. 374 (2021). The Court held that “exceeds authorized access” means accessing information off-limits for a particular user on a computer system they are authorized to use—not using authorized access for an unauthorized purpose. Van Buren was a criminal case involving a police officer who queried a law enforcement database for personal reasons, but the Court’s reasoning—that the CFAA’s access restrictions apply to technical access barriers, not use restrictions—significantly limits CFAA-based claims against scrapers who access only publicly visible data. After Van Buren, website terms of service violations alone are unlikely to constitute CFAA violations, though technical circumvention of access controls (rate limiting, IP blocks, CAPTCHAs, authentication requirements) remains a different matter.
The hiQ v. LinkedIn Saga: What It Actually Decided
The hiQ v. LinkedIn litigation is among the most significant web scraping cases in U.S. legal history, and its resolution has been closely watched by data aggregators, social platforms, and practitioners. After the Ninth Circuit’s 2019 decision largely favored hiQ’s right to scrape LinkedIn’s public profiles, the Supreme Court vacated the ruling in light of Van Buren and remanded. On remand in 2022, the Ninth Circuit again sided with hiQ and affirmed the preliminary injunction, concluding that hiQ had raised serious questions about whether scraping publicly accessible data amounts to access “without authorization” under the CFAA, a conclusion it found consistent with Van Buren’s narrower standard.
The case ultimately settled in late 2022, leaving the Ninth Circuit’s holdings intact but without a final Supreme Court ruling on web scraping’s legality under the CFAA. The practical takeaway from the litigation: scraping publicly available data that requires no login, no circumvention of access controls, and no bypass of technical barriers is unlikely to violate the CFAA after Van Buren. But scraping data behind a login wall (even a free one), scraping after receiving a cease and desist letter, or circumventing technical measures like rate limits or CAPTCHAs significantly increases legal risk. The CFAA rulings also do not resolve copyright, contract, or state law claims, which are analyzed separately.
Copyright and Database Protection in Web Scraping
Even when scraping does not violate the CFAA, it may infringe copyright. Under Feist Publications, Inc. v. Rural Telephone Service Co., 499 U.S. 340 (1991), raw factual data is not copyrightable, but the selection, arrangement, and coordination of a database may be. A scraper that copies an entire website—its layout, its textual content, its creative categorization of information—may infringe the copyright in the compilation even if individual data points are unprotectable facts. Courts analyze whether the scraped material includes protected expression and whether the scraping constitutes reproduction of a copyrightable compilation.
AI training presents new copyright dimensions in scraping litigation. Several pending cases, including The New York Times Co. v. Microsoft Corp., No. 23-cv-11195 (S.D.N.Y.), and Raw Story Media v. OpenAI, allege that scraping web content to train large language models constitutes copyright infringement and, in some cases, violation of the DMCA’s prohibition on removing or altering copyright management information under 17 U.S.C. § 1202. While courts have not definitively ruled on whether AI training on scraped data constitutes fair use, the volume of litigation and the involvement of major publishers suggest that the scraping-for-AI-training landscape will be substantially different in two to three years than it is today.
State Law and Contract Claims Against Scrapers
Beyond federal law, scrapers face potential liability under state law theories. Trespass to chattels—a common law tort recognizing liability for intentional interference with another’s personal property—has been invoked against scrapers in cases like eBay, Inc. v. Bidder’s Edge, Inc., 100 F. Supp. 2d 1058 (N.D. Cal. 2000), where the court granted a preliminary injunction based on the theory that automated scraping bots occupied server resources in a manner that constituted trespass. However, Intel Corp. v. Hamidi, 30 Cal. 4th 1342 (2003), subsequently limited the trespass to chattels theory by requiring proof of actual harm to the chattel, not just unauthorized use. Post-Hamidi, trespass claims against scrapers generally require evidence that the scraping materially impaired server performance.
Contract-based claims are the most durable scraping cause of action for website operators. A user who creates an account and agrees to terms of service that prohibit scraping enters into a contract with the operator. If that user then scrapes in violation of those terms, the operator has a breach of contract claim regardless of whether the CFAA applies. The breach of contract claim also supports injunctive relief without the need to prove CFAA violations or trespass to chattels. For scrapers who access a site through a clickwrap or browsewrap ToS agreement, the terms—if enforceable—represent a contractual prohibition on scraping that survives the post-Van Buren limitation of CFAA claims.
Privacy Law Constraints on Scraping Personal Data
Web scraping that collects personal information about individuals creates privacy law exposure even when scraping publicly available pages. Under the CCPA, “personal information” includes information that is “reasonably capable of being associated with” a particular consumer or household—a definition that covers names, email addresses, and social media profile information even when publicly posted. A business that scrapes personal information of California residents for commercial purposes without a lawful basis or appropriate disclosures may be engaged in a “collection” of personal information that triggers CCPA obligations even if the source data was publicly visible.
GDPR imposes even stricter constraints. Scraping personal data of EU residents without a valid legal basis—consent, contract, legitimate interest, legal obligation, vital interests, or public task under GDPR Article 6—is a GDPR violation regardless of whether the data was publicly available. The European Data Protection Board (EDPB) issued guidance in 2023 confirming that publicly available data is not exempt from GDPR simply because it is visible online, and that scraping it for commercial purposes generally requires a legitimate interest assessment demonstrating that the interest outweighs the data subjects’ privacy interests. Several EU data protection authorities have fined companies for scraping publicly available personal data without a lawful basis.
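One way to put this into practice is to flag scraped records that appear to contain personal information so they can be held for legal review before any commercial use. The Python sketch below is only an illustration under stated assumptions: the field names in PERSONAL_FIELDS and the function name needs_privacy_review are hypothetical, and a simple pattern match cannot determine whether data is "personal information" under the CCPA or "personal data" under GDPR. It will miss many cases and is not a substitute for a legal assessment.

```python
import re

# Deliberately simple email pattern; an assumption for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical field names that would often hold personal information.
PERSONAL_FIELDS = {"name", "email", "phone", "profile_url"}


def needs_privacy_review(record):
    """Return the sorted keys of a scraped record that look like personal information."""
    flagged = set()
    for key, value in record.items():
        if key.lower() in PERSONAL_FIELDS:
            flagged.add(key)                      # flagged by field name
        elif isinstance(value, str) and EMAIL_RE.search(value):
            flagged.add(key)                      # flagged by content
    return sorted(flagged)


price_record = {"product": "Widget", "price": "$19.99"}
review_record = {"reviewer": "Contact jane@example.com", "Email": "x", "rating": 5}

clean_flags = needs_privacy_review(price_record)
review_flags = needs_privacy_review(review_record)
```

A record with an empty result is not thereby cleared for use. The check is meant only to route obviously sensitive records to a human reviewer.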
Best Practices for Legally Defensible Web Scraping
Businesses that rely on web scraping for competitive intelligence, market research, price comparison, or data aggregation should follow several practices to minimize legal risk. First, review and respect robots.txt files. While these do not have the force of law, courts and regulators may treat disregard of robots.txt as evidence of bad faith. Second, review the target site’s Terms of Service before scraping; if the terms prohibit scraping, do not scrape through an authenticated session. Third, avoid circumventing technical access controls, including CAPTCHAs, IP blocks, and rate limiting, because circumvention significantly elevates CFAA risk. Fourth, do not scrape data that requires authentication or login.
Fifth, if you scrape personal information of individuals, analyze your CCPA and GDPR obligations before using the data commercially. Sixth, do not store or use scraped data in ways that would infringe compilation copyrights: use the underlying facts, not the creative selection and arrangement of the database. Seventh, consult legal counsel before using scraped data to train AI models, given the unsettled state of copyright law in this area. Eighth, maintain documentation of your scraping practices, the legal analysis underlying your approach, and any communications with target site operators. Documentation of good faith and legal analysis is important if your practices are later challenged.
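Several of these practices can be built into the scraper itself. The Python sketch below, using the standard library’s urllib.robotparser, shows one possible approach: it checks robots.txt before each request, waits between requests, stops at the first response that signals an access restriction rather than trying to work around it, and keeps a log of what it did. The user agent string, the default delay, the set of "stop" status codes, and the fetch function passed in by the caller are all assumptions made for illustration, not requirements drawn from any statute or case.

```python
import time
from urllib import robotparser

USER_AGENT = "example-research-bot"   # hypothetical bot identifier
DEFAULT_DELAY = 5.0                   # seconds; an arbitrary conservative assumption
STOP_STATUSES = {401, 403, 429}       # login required, forbidden, rate limited


def load_rules(robots_txt):
    """Parse the text of a robots.txt file that has already been downloaded."""
    rules = robotparser.RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


def delay_for(rules, user_agent=USER_AGENT):
    """Use the site's Crawl-delay if it declares one, else a default."""
    delay = rules.crawl_delay(user_agent)
    return float(delay) if delay is not None else DEFAULT_DELAY


def polite_crawl(urls, rules, fetch, sleep=time.sleep):
    """Fetch permitted URLs one at a time and halt at the first blocking response.

    `fetch` is a caller-supplied function (hypothetical) returning (status, body).
    """
    pages, log = {}, []
    delay = delay_for(rules)
    for url in urls:
        if not rules.can_fetch(USER_AGENT, url):
            log.append((url, "skipped: disallowed by robots.txt"))
            continue
        status, body = fetch(url)
        log.append((url, f"status {status}"))
        if status in STOP_STATUSES:
            break  # treat as a signal to stop, not to retry or switch IPs
        pages[url] = body
        sleep(delay)
    return pages, log


# Usage with a made-up robots.txt and a stand-in fetch function (no network access).
ROBOTS_TXT = "User-agent: *\nDisallow: /account/\nCrawl-delay: 2\n"
FAKE_RESPONSES = {
    "https://example.com/products/1": (200, "page one"),
    "https://example.com/products/2": (429, ""),
    "https://example.com/products/3": (200, "page three"),
}
waits = []
pages, log = polite_crawl(
    ["https://example.com/account/settings", *FAKE_RESPONSES],
    load_rules(ROBOTS_TXT),
    fetch=FAKE_RESPONSES.get,
    sleep=waits.append,
)
```

The log returned by polite_crawl is one simple form of the documentation described above. Following robots.txt and backing off when blocked does not by itself make a scraping project lawful, and the Terms of Service, copyright, and privacy questions still need separate review.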
If your business uses web scraping for data collection—or if your website is being scraped without authorization—contact the internet law attorneys at Revision Legal through the form on this page or call (855) 473-8474. Our internet law practice advises businesses on CFAA compliance, data collection strategy, copyright in databases, and enforcement against unauthorized scrapers nationwide.