Web scraping: browse wrap agreements and available protections

Technological evolution has made web scraping a fundamental tool for data analysis and competitive intelligence. However, this practice raises important legal questions, particularly when it conflicts with the terms of use of the websites being scraped.

Web scraping statistics

  • 42% of internet traffic is generated by automated bots (Source: Imperva, 2024).
  • 89% of companies have reported web scraping attempts on their digital assets (Source: Cybersecurity Industry Report, 2023).
  • 78% of Fortune 500 companies use web scraping techniques for competitive monitoring (Source: Forrester Research, 2023).

Beginning with a technical analysis of the phenomenon, this article examines the nature and effectiveness of browse wrap agreements, explores national and international jurisprudence on the subject, and provides practical guidelines for those intending to engage in scraping activities, as well as for those who need to protect their web content from unauthorized extraction.

What is web scraping

Web scraping is the automated, systematic extraction of data from websites. In practice it relies on three main technical approaches:

  • HTML parsing: involves the automated analysis of a web page’s source code to extract specific information from its HTML structure. This technique allows for the identification and collection of structured data such as prices, product descriptions, or technical specifications.
  • Headless browsing: simulates human browsing using automated browsers without a graphical interface, enabling access to dynamic content generated by JavaScript and other client-side technologies.
  • Crawling: represents a more systematic approach that combines automated browsing and indexing, allowing recursive exploration of entire site structures and the archiving of relevant data.
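
As an illustration of the first approach, HTML parsing, the following minimal sketch uses only the Python standard library. The sample markup, the "price" class name and the PriceParser and extract_prices names are assumptions made for this example; a real page would have its own structure, and whether extraction is permitted depends on the legal considerations discussed in this article.

```python
from html.parser import HTMLParser


class PriceParser(HTMLParser):
    """Collects the text of every element whose class list contains 'price'."""

    def __init__(self):
        super().__init__()
        self._in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if "price" in classes:
            self._in_price = True

    def handle_endtag(self, tag):
        # The sample markup has no nested tags inside a price element,
        # so any closing tag ends the current price.
        self._in_price = False

    def handle_data(self, data):
        if self._in_price and data.strip():
            self.prices.append(data.strip())


# Hypothetical page fragment, hard-coded so the example needs no network access.
SAMPLE_HTML = """
<ul>
  <li><span class="name">Widget</span> <span class="price">19.99</span></li>
  <li><span class="name">Gadget</span> <span class="price">5.00</span></li>
</ul>
"""


def extract_prices(html):
    """Return the price strings found in the given HTML text."""
    parser = PriceParser()
    parser.feed(html)
    return parser.prices


print(extract_prices(SAMPLE_HTML))
```

Headless browsing and crawling build on the same idea: the former renders the page first so that JavaScript-generated content is available to parse, and the latter repeats the fetch-and-parse step across the links found on each page.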

The widespread use of these practices raises legal questions in several domains. Intellectual property protection is a primary issue, especially when scraping involves copyrighted content or structured databases. There is also a contractual dimension: violating a site's terms of use may give rise to liability, especially in a commercial context. Equally relevant is privacy, since many scraping activities directly or indirectly involve personal data subject to the GDPR.

Legal Precedents

National and European jurisprudence has begun defining the boundaries of legality for these practices. The Court of Justice of the European Union, in its June 3, 2021, judgment (C-762/19, CV-Online Latvia SIA v. Melons SIA), provided key insights on database protection, particularly clarifying the conditions for sui generis protection and the limits of systematic data extraction.

At the national level, the legal landscape has evolved through significant rulings, which tend to protect the rights of database holders against unauthorized data extraction. The Italian Supreme Court, in judgment no. 6639/2013, clarified that systematic data extraction can infringe sui generis rights, which grant database holders exclusive control over content use, particularly when the data is used for commercial purposes. Similarly, the Court of Milan, in ruling no. 3514/2019, affirmed that unauthorized use of data obtained through scraping may constitute unfair competition, as it yields a competitive advantage by leveraging others’ investments. The Court of Rome, in judgment no. 5202/2020, reiterated that massive data extraction from a protected database without consent can infringe copyright and sui generis rights and justify claims for compensation for economic loss. These rulings show that Italian jurisprudence is leaning towards stringent protection of databases, especially where the data is the result of significant investment in time and resources.

Browse Wrap Agreements

The validity of website terms of use is central to assessing the legality of web scraping activities, especially regarding the distinction between different presentation and acceptance methods.

Browse wrap agreements are a unique form of online contract characterized by terms of use accessible through hyperlinks, without requiring explicit user consent. These differ significantly from clickwrap agreements, where users are asked to actively express consent, typically by ticking a box or clicking an acceptance button.

The legal nature of browse wrap agreements raises significant questions about their enforceability: the premise that mere site navigation implies acceptance of the terms conflicts with general principles of Italian civil law, particularly the need for clear expression of intent by both parties.

Courts in different jurisdictions have addressed the issue with varying approaches. The Irish High Court, in the notable case Ryanair v. Billigfluege.de GmbH (2010), adopted an expansive interpretation, recognizing the enforceability of clearly visible terms accessible via hyperlinks, even in the absence of explicit acceptance. This approach is based on the assumption that visibility and accessibility of the terms create user awareness of the site’s conditions of use.

Conversely, the Court of Milan, in its June 4, 2013, ruling, adopted a more cautious approach: passive browsing of a website does not by itself give rise to a valid contractual obligation, which instead requires a clearer and more informed expression of user consent. This position is grounded in Italian civil law, which mandates clear consent for contract formation.

This interpretative divergence highlights the complexity of adapting traditional contract law principles to new forms of digital interaction, underscoring the need to balance the practicality of e-commerce with informed consent protections for users.

Legal Implications and Risk Assessment

The absence of uniform judicial guidance on web scraping presents a complex scenario in terms of liability. The analysis highlights two main areas of risk: contractual liability and tort liability.

If browse wrap agreements are considered binding, unauthorized web scraping may constitute a contractual breach, exposing the scraper to claims for damages based on the failure to adhere to site terms of use. Damage calculations would consider both the direct harm from data extraction and potential reputational harm or loss of business opportunities.

Even without a valid contractual obligation, a site owner may seek redress under Article 2043 of the Italian Civil Code, requiring proof of an unlawful act, actual and quantifiable damage, causation between scraping and harm, and the unjust nature of the harm.

Operational Guidelines

Risk management requires a tailored approach for different parties involved in web scraping activities.

Companies engaging in web scraping

Implementing web scraping solutions requires careful review of target site terms of use and due diligence on content ownership and protection. Activities should be supported by rigorous documentation of compliance measures. Key elements to consider include:

  • preliminary legal and technical risk assessment;
  • protocols respecting target server resources;
  • documentation of activities and protective measures.
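
One way to make "protocols respecting target server resources" concrete is to consult the site's robots.txt file and space out requests. The sketch below is an illustration only: the robots.txt content, the "example-bot" user agent, the example.com URLs and the build_policy and plan_requests names are assumptions made for this example. Honoring robots.txt is a widely used technical convention; it does not by itself settle the contractual or database-right questions discussed above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, inlined so the example runs without network access.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_policy(robots_txt):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def plan_requests(policy, user_agent, urls, default_delay=1.0):
    """Return the URLs the policy allows and the delay (seconds) between requests."""
    # crawl_delay() returns None when robots.txt sets no delay for this agent.
    delay = policy.crawl_delay(user_agent) or default_delay
    allowed = [url for url in urls if policy.can_fetch(user_agent, url)]
    return allowed, float(delay)


policy = build_policy(ROBOTS_TXT)
allowed, delay = plan_requests(
    policy,
    "example-bot",
    ["https://example.com/products", "https://example.com/private/data"],
)
print(allowed, delay)
```

A scraper would then fetch only the allowed URLs and wait for the returned delay between requests; logging each decision also supports the documentation requirement listed above.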

Website owners

Protecting web content requires a multi-faceted strategy that combines contractual and technical elements. Contractually, it is advisable to implement highly visible clickwrap agreements with clear definitions of content usage terms. Protective measures should include:

  • access control and user verification systems (e.g., CAPTCHAs, IP blocking);
  • systematic monitoring of suspicious access;
  • development of APIs as a controlled alternative to scraping.
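
Systematic monitoring of suspicious access can be sketched as a sliding-window counter per client address. This is a minimal illustration, not a complete bot-detection system: the AccessMonitor name, the thresholds and the IP address (taken from a range reserved for documentation) are assumptions made for this example, and any real deployment that logs IP addresses must itself account for GDPR obligations.

```python
from collections import defaultdict, deque


class AccessMonitor:
    """Flags clients that exceed max_requests within window_seconds."""

    def __init__(self, max_requests=100, window_seconds=60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client IP -> timestamps of recent requests

    def record(self, client_ip, timestamp):
        """Record one request; return True if the client is over the threshold."""
        hits = self._hits[client_ip]
        hits.append(timestamp)
        # Discard requests that have fallen out of the time window.
        while hits and hits[0] <= timestamp - self.window_seconds:
            hits.popleft()
        return len(hits) > self.max_requests


# Example with deliberately small, hypothetical limits: 3 requests per 10 seconds.
monitor = AccessMonitor(max_requests=3, window_seconds=10.0)
flags = [monitor.record("203.0.113.7", t) for t in (0.0, 1.0, 2.0, 3.0)]
print(flags)
```

A flagged client could then be shown a CAPTCHA, throttled, or directed to an API, depending on the site's policy.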

Practical experience suggests that combining technical protections with commercial solutions, such as developing APIs, allows for controlled data access, potentially as part of strategic partnerships for content distribution.

In the e-commerce sector, many marketplaces have recognized that price monitoring through scraping, previously considered problematic, could be managed through structured commercial agreements. They have developed partnership programs with controlled API access, defining data usage terms, access frequency limits, quality standards, and economic conditions for collaboration.

This approach has transformed a potentially contentious situation into a regulated partnership, benefiting both parties.

Conclusions

The legal framework surrounding web scraping remains complex and varies across jurisdictions. Both sides should adopt a conservative approach to risk management: companies that scrape should conduct a comprehensive legal review before starting, and website operators should implement technical and contractual protections.

Because this field is still evolving, it is important to monitor legal developments continuously and to prioritize transparency and compliance in a rapidly changing digital landscape.
