Ghost leads: Behind the scenes of B2B visitor identification

In our recent comparative test of the B2B visitor recognition platforms from Leadfeeder, Leadinfo and SalesViewer, we came across a worrying phenomenon: High nominal recognition rates often go hand in hand with significant error rates. While SalesViewer impressed with its high level of data precision in the test, companies suddenly appeared in the Leadinfo and Leadfeeder dashboards that had in fact never visited our website. But how do such “ghost leads” actually come about technically? In this forensic deep dive, we explore which detection mechanisms running in the background can cause these errors. We look at the complex data flows involved and show how the industry is trying to de-anonymize anonymous traffic.

The home office test: When employees become Italian companies

The starting point for our forensic analysis was a controlled test scenario: 18 employees from the Publicare team accessed specific blog posts on our website from their home offices at precisely defined times. Since these visits came from ordinary dynamic IP addresses (Telekom, Vodafone, etc.), the B2B tools should simply have ignored them. (Read the detailed report of our side-by-side test here)

However, the result was different: Leadfeeder incorrectly registered four of these home office accesses as visits from third-party companies. Leadinfo falsely assigned two of the targeted visits to existing companies. SalesViewer, on the other hand, correctly declined to classify these visits as identifiable B2B companies and therefore did not produce any ghost leads. This raises the question: Which fragile data sources and algorithms do some providers use that lead to such serious errors?

The technical basis: Static IPs and the OSI layer

The classic anchor point of every B2B identification is the IP address. If organizations use static IP addresses that are permanently registered in the databases of the regional Internet registries, SaaS solutions can assign them directly to the company and location via a reverse IP lookup. The problem starts with employees working from home who surf using dynamic IP addresses from regular Internet service providers. A simple lookup returns only the provider. Even with fundamentally static IP address pools, a reverse lookup does not always return the desired answer. To allocate this traffic to companies anyway, visitor recognition solutions must rely on far deeper and often error-prone methods.
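
The difference between the two cases can be illustrated with a minimal Python sketch. It classifies the host name returned by a reverse lookup and works offline. The suffix list and the function name `classify_ptr` are assumptions made for this illustration and do not reflect how any of the tested vendors work.

```python
from typing import Optional

# Illustrative suffixes of ISP access pools. This list is an assumption
# for the sketch, not a list used by any vendor.
ISP_POOL_SUFFIXES = (".t-ipconnect.de", ".vodafone-ip.de")


def classify_ptr(hostname: Optional[str]) -> str:
    """Classify a reverse-DNS (PTR) name as 'unknown', 'isp' or 'candidate'."""
    if not hostname:
        return "unknown"  # the lookup returned no PTR record at all
    name = hostname.lower().rstrip(".")
    if name.endswith(ISP_POOL_SUFFIXES):
        return "isp"  # only the access provider is known, not a company
    return "candidate"  # might belong to a company, still needs validation


print(classify_ptr("p5b1a2c3d.dip0.t-ipconnect.de"))
print(classify_ptr("gateway.example.com"))
```

For a live lookup, the host name could come from `socket.gethostbyaddr(ip)` in the standard library, which raises `socket.herror` when no reverse entry exists.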

Outdated IP lists and the phenomenon of “data erosion”

The incorrect detections in our home office test suggest that parts of the industry are using purchased IP address databases that have grown over many years in order to make allocations even for dynamic addresses.

The main problem here is so-called “data erosion.” Statistics show that around 16 percent of IPv4 addresses change their location or allocation in an average month. If a dynamic IP address — whether as a result of old form entries or leaks from third-party sources — has been mistakenly assigned to a company in the past, it often remains stuck in these systems as an apparent “company IP” for years. Cheap providers obtain their data partly from WHOIS entries, in which companies do not document their network usage in real time. If algorithms rely on such outdated or unclean databases, this would provide a plausible explanation for our test result. The dynamic IP address of a home office employee is only assigned to an end user for a short time, but the tool persistently shows an outdated company from the database.
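
To get a feeling for how quickly such a database ages, the following Python sketch compounds the monthly change rate quoted above. It assumes, purely for illustration, that the rate is constant and hits entries independently each month. The function name `share_still_valid` is hypothetical.

```python
MONTHLY_CHANGE_RATE = 0.16  # figure quoted in the text above


def share_still_valid(months: int, monthly_change_rate: float = MONTHLY_CHANGE_RATE) -> float:
    """Share of IP-to-company entries that are still correct after `months`.

    Assumes a constant, independent monthly change rate (a simplification).
    """
    if months < 0:
        raise ValueError("months must not be negative")
    return (1.0 - monthly_change_rate) ** months


for months in (1, 6, 12):
    print(months, round(share_still_valid(months), 3))
```

Under these simplifying assumptions, only roughly one in eight entries would still be correct after a year without revalidation.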

Further inaccuracies arise from anonymization measures such as truncating IP addresses. Many general analytics solutions and data sources delete the last digits of an IP address so that it can no longer be clearly assigned. In some cases, such imprecise data nevertheless appears to be used as a basis for identification. When the last byte of an IPv4 address is deleted, 256 possible addresses remain. Any recognition based on such data therefore has up to 255 “neighbors” to which the visit could just as well belong and which can be falsely assigned.
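
The arithmetic can be checked with Python's standard `ipaddress` module. The sketch below is only an illustration. The helper name `candidate_block` is hypothetical, and the address comes from a range reserved for documentation.

```python
import ipaddress


def candidate_block(ip: str) -> ipaddress.IPv4Network:
    """Return the /24 block that remains once the last byte of `ip` is dropped."""
    return ipaddress.ip_network(f"{ip}/24", strict=False)


block = candidate_block("203.0.113.57")
print(block)                     # the truncated address as a network
print(block.num_addresses)       # addresses that share the truncated prefix
print(block.num_addresses - 1)   # "neighbors" of the real visitor
```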

Identity Graphs and the Home Office Puzzle

Another technical explanation for the absurd assignments in our home office test lies in the possible use of so-called “identity graphs.” Since the pure IP lookup for dynamic addresses comes to nothing, providers could draw on purchased databases that store billions of links between different identifiers. To do this, they would have to cooperate with ad tech networks or specialist publishers. The principle: When a user logs in to a partner site (e.g. an IT portal) with their company email, the identity graph links this login to the current dynamic IP address. If a completely different user, who has since been assigned the same dynamic IP by the Internet provider, visits the website some time later, the outdated identity graph entry takes effect. The system applies the stale historical link and falsely assigns the visit to the company of the first user.
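
The stale-link problem can be sketched in a few lines of Python. The data model below (`Link`, `resolve`, the `max_age` parameter) is a hypothetical simplification for illustration and not the schema of any real identity graph.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Optional


@dataclass
class Link:
    company: str        # company derived from the login email domain
    seen_at: datetime   # when the login was observed on this IP


def resolve(graph: Dict[str, Link], ip: str, now: datetime,
            max_age: Optional[timedelta] = None) -> Optional[str]:
    """Return the company linked to `ip`, or None if unknown or too old."""
    link = graph.get(ip)
    if link is None:
        return None
    if max_age is not None and now - link.seen_at > max_age:
        return None  # link is older than the allowed window: do not attribute
    return link.company


now = datetime(2024, 6, 1, 12, 0)
# A login on a partner site 30 days ago linked this dynamic IP to a company.
graph = {"198.51.100.23": Link("Example S.r.l.", now - timedelta(days=30))}

print(resolve(graph, "198.51.100.23", now))                     # stale link is used
print(resolve(graph, "198.51.100.23", now, timedelta(days=1)))  # link is rejected
```

Without an age limit, the second user inherits the first user's company. A short validity window trades recognition rate for precision.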

Cloud proxies and manipulated headers

Network intermediaries are another source of error that could contribute to the significant number of false detections. Cloud security solutions used in large organizations (such as Zscaler, Netskope, and Palo Alto Networks) route requests from thousands of companies via bundled, shared IP pools. To still recognize the true origin, B2B tools read HTTP headers such as “X-Forwarded-For,” which document the chain of IP addresses along the request path. However, since this header can be easily manipulated and enterprise firewalls sometimes inject their own headers, identification tools must decide which IP to trust. If a recognition algorithm blindly relies on the first entry or evaluates proxy IPs incorrectly due to outdated databases or blurred assignments, visitors are permanently attributed to a completely wrong company.
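
One common defensive pattern is to read the header from right to left and skip only proxies that are explicitly trusted. The Python sketch below illustrates that idea. The function name `client_ip` and the trusted ranges are assumptions for the example, not a description of any vendor's logic.

```python
import ipaddress
from typing import Iterable


def client_ip(xff: str, remote_addr: str, trusted_proxies: Iterable[str]) -> str:
    """Pick the most plausible client IP from an X-Forwarded-For chain."""
    nets = [ipaddress.ip_network(n) for n in trusted_proxies]

    def trusted(addr: str) -> bool:
        return any(ipaddress.ip_address(addr) in net for net in nets)

    if not trusted(remote_addr):
        return remote_addr  # header was sent by an untrusted peer: ignore it
    hops = [h.strip() for h in xff.split(",") if h.strip()]
    for hop in reversed(hops):  # rightmost entries were added last
        try:
            if not trusted(hop):
                return hop  # first address not under our control
        except ValueError:
            break  # malformed entry: stop trusting the rest of the chain
    return remote_addr


header = "198.51.100.7, 203.0.113.9, 10.0.0.5"
print(header.split(",")[0].strip())                     # naive: first entry
print(client_ip(header, "10.0.0.1", ["10.0.0.0/8"]))    # right-to-left walk
```

The first entry is whatever the client or an upstream device chose to send, which is why relying on it blindly is fragile.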

Deeper into the protocol bag of tricks: ETags and HTTP/3

Since conventional tracking cookies are increasingly blocked or rejected by users via the consent banner, the tracking industry is sometimes switching to low-level protocol features as a hidden cookie replacement. It stands to reason that, in the race for the highest detection rates, such gray area methods are also being tested on the market by various providers.

To save bandwidth, for example, browsers revalidate cached resources using ETags or timestamps. When a SaaS tool evaluates on the server side whether a resource is still in the user's cache, these headers can act as a unique identifier. Modern protocols such as HTTP/3 (QUIC) also use so-called “connection IDs.” These make it possible to maintain a connection even if the user changes networks, which makes them a powerful anchor for preserving the continuity of a visit even when the IP address changes.
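
How a cache validator can double as an identifier is easiest to see in a toy server handler. The following Python sketch is a deliberately simplified illustration. The function `handle_request` and its return shape are hypothetical, and nothing here is taken from a real product.

```python
import uuid
from typing import Dict, Tuple


def handle_request(headers: Dict[str, str]) -> Tuple[int, Dict[str, str], str]:
    """Return (status, response headers, visitor id) for a tracked resource."""
    tag = headers.get("If-None-Match")
    if tag:
        # The browser revalidates its cached copy and echoes the ETag back,
        # which hands the server the identifier it issued earlier.
        return 304, {"ETag": tag}, tag.strip('"')
    visitor_id = uuid.uuid4().hex  # first visit: mint a unique validator
    return 200, {"ETag": f'"{visitor_id}"'}, visitor_id


first = handle_request({})
second = handle_request({"If-None-Match": first[1]["ETag"]})
print(first[0], second[0], first[2] == second[2])
```

No cookie is set at any point. The identifier lives in the browser cache and disappears only when the cache is cleared.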

Browser fingerprinting: Uniqueness by reading out the hardware

In order to recognize devices completely without cookies, identification systems can use client-side scripts for browser or device fingerprinting. This involves reading out a combination of system features that are often unique in their sum. The attributes read out include hardware configurations (processor cores, memory), the list of locally installed system fonts, or canvas rendering, which measures subtle differences between graphics cards. The entropy of this fingerprint enables extremely precise attribution.

However, from a data protection perspective, this process is highly sensitive. Since fingerprinting usually takes place in the background and without the user's knowledge, the necessary transparency is often lacking. According to the European GDPR and the German TDDDG, reading information from the device, even without the use of classic cookies, is usually subject to explicit consent unless it is technically necessary. The problem is therefore not only the risk that outdated identity graphs falsely link a visit to an unrelated company, but also the encroachment on digital sovereignty, as users have little opportunity to effectively prevent or control this “silent” tracking.
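
The way individual attributes combine into one identifier can be sketched as a hash over a canonical attribute set. The Python example below is an illustration only. The attribute keys and the function name `fingerprint` are hypothetical, and real scripts collect these values in the browser.

```python
import hashlib
import json
from typing import Any, Dict


def fingerprint(attributes: Dict[str, Any]) -> str:
    """Derive a stable identifier from a set of device attributes."""
    # Sorting the keys makes the result independent of collection order.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


device_a = {"cores": 8, "memory_gb": 16,
            "fonts": ["Arial", "Helvetica"], "canvas_hash": "ab12"}
device_b = dict(device_a, fonts=["Arial", "Helvetica", "Futura"])

print(fingerprint(device_a)[:16])
print(fingerprint(device_a) == fingerprint(device_b))
```

A single additional font already yields a different identifier, which is why the combination of many mundane attributes can single out one device.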

Conclusion: Why the human factor remains decisive

The technological analysis shows that the identification of B2B visitors is highly complex. While allocation via static IP addresses works very well for genuine company networks, the industry's algorithmic workarounds inevitably lead to massive problems with remote workers or shared networks.

The potential use of purchased IP address databases and identity graphs, error-prone proxy evaluations, or aggressive protocol trackers could explain why some solutions produce an astonishing number of “false positive” detections. From our point of view, more conservative, high-precision data validation is the safer route and the one we recommend, even though it nominally identifies fewer companies at first. The following applies to B2B sales: Blind trust in automated identification algorithms is by no means a substitute for human validation and a focus on genuine quality.

