Data Collection
Access to a large amount of high-quality data enables more effective tracking of consumers and improves ad targeting and ad attribution capabilities. As these functions are valuable to advertisers and publishers, improved targeting and attribution capabilities from increased access to data can give rise to a competitive advantage in the supply of display advertising services.
There is a widening divide in the volume and scope of data collected for use within the closed ecosystems of large, advertising-funded digital platforms (such as Google and Facebook) compared to the more fragmented data collected by other market participants. This means that large platforms can fuel their ad tech services with a much broader range of data than other ad tech providers, advertisers and publishers. Google has a particularly significant data advantage due to its ability to collect reliable first-party data from a wide range of consumer-facing services, which is supplemented by an extensive network of trackers on third-party websites and apps.
There are currently no close substitutes for the datasets held by large advertising-funded digital platforms, which combine numerous consumer-facing services, a large network of third-party trackers, and access to a range of unique identifiers that link different datasets together.
On the open internet, outside the walled gardens of large digital platforms, data collection is much more fragmented. A key difference is that these other market participants typically collect first-party data directly from a much narrower subset of consumers (if any) and supplement it with third-party data from numerous different sources. Matching user IDs across these data sets can be inefficient, as the data collected relates to separate but potentially overlapping groups of individuals and may use a range of different identifiers, data formats, and rules.
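To illustrate why matching across fragmented data sets can be inefficient, the following minimal Python sketch joins a hypothetical first-party data set to a hypothetical third-party data set. The records, the function names, and the choice of a normalized, hashed email address as the join key are all assumptions made for illustration; they do not describe any particular provider's method.

```python
import hashlib

def normalize_email(email):
    """Lowercase and trim an email so formatting differences do not block a match."""
    return email.strip().lower()

def hashed_key(email):
    """Hypothetical join key: SHA-256 of the normalized email address."""
    return hashlib.sha256(normalize_email(email).encode("utf-8")).hexdigest()

# Hypothetical first-party records, keyed by email address.
first_party = [
    {"email": "Ana@example.com", "last_purchase": "garden hose"},
    {"email": "ben@example.com", "last_purchase": "laptop"},
]

# Hypothetical third-party records: one carries an email, the other only a cookie ID.
third_party = [
    {"email": " ana@example.com ", "segment": "Home Improvement"},
    {"cookie_id": "c-123", "segment": "Pets"},
]

def match_records(first, third):
    """Merge records for individuals present in both data sets and count the rest."""
    index = {hashed_key(r["email"]): r for r in first}
    matched, unmatched = [], 0
    for record in third:
        email = record.get("email")
        key = hashed_key(email) if email else None
        if key in index:
            matched.append({**index[key], "segment": record["segment"]})
        else:
            # No shared identifier, so this record cannot be linked.
            unmatched += 1
    return matched, unmatched

matched, unmatched = match_records(first_party, third_party)
```

In this sketch only one of the two third-party records can be linked, because the other uses a different type of identifier. That is the kind of loss the paragraph above describes.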

First-party vs. third-party data

There are two main ways of collecting data. Data can be collected directly from a consumer (also referred to as first-party data) or indirectly collected from an intermediary (third-party data). The same data can be first-party data or third-party data, depending on how it is collected. For example, a consumer’s browsing history on a publisher’s website is first-party data when directly collected by the publisher, but will become third-party data if it is provided by the publisher to another party such as an ad tech provider.
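The distinction can be sketched in a few lines of Python. The rule below (data is first-party for the party that collected it and third-party for anyone else who receives it) and the party names are simplifying assumptions for illustration only.

```python
def classify(collected_by, held_by):
    """Label data from the holder's perspective (illustrative rule only)."""
    return "first-party" if collected_by == held_by else "third-party"

# Hypothetical record: browsing history collected directly by a publisher.
browsing_history = {"collected_by": "publisher"}

publisher_view = classify(browsing_history["collected_by"], "publisher")
ad_tech_view = classify(browsing_history["collected_by"], "ad_tech_provider")
```

Here the same record is labeled differently depending on who holds it, which mirrors the browsing-history example above.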

First-party data

Advertisers may directly collect first-party data from their interactions with customers (e.g. visits to the advertiser's website, past purchases, participation in loyalty programs, or mailing lists). Advertisers may also supplement their own data sources by obtaining third-party data from other sources such as data providers.
Publishers may also directly collect first-party data from consumers' interactions with their own properties (e.g. users' browsing data, newsletter subscriptions, participation in competitions, and any log-in data). Publishers may similarly supplement their own data with third-party data sources to assist with optimizing the sale of their ad inventory.

Third-party data

Ad tech providers can indirectly collect a range of third-party data from:
  • advertisers (e.g. customer demographic information or target audience)
  • publishers (e.g. data on the users visiting the publisher's website)
  • third-party data providers (e.g. audience segments), and
  • other ad tech providers as a part of carrying out their roles in the ad tech supply chain (e.g. issuing bid requests, reporting ad sales).

Data collected directly from third-party sites and apps through technology

Platforms also provide a range of services and tools that third-party providers may use on their websites and apps. These include, among others, analytics tools such as Google or Facebook Analytics, advertising services such as Google AdSense, and social products. Through these tools, platforms can collect data relating to consumers' activities on third-party sites and apps, such as existing user or device identifiers and consumers' interactions with those sites.
Advertisers and publishers can allow platforms to collect observed and volunteered data directly from their own online services through technologies such as Software Development Kits (SDKs), pixel tags, and cookies. For example, Facebook partners can install such code on their websites or apps in order to better assess the effectiveness of existing advertising campaigns, to target potential customers with future ads more accurately, and to obtain other insights about their user base. The code installed by partners provides information about consumers' activities on their website or app – including information about the device, websites visited, etc. – whether or not the consumer has a Facebook account or is logged into Facebook. In a similar way, advertiser and publisher websites can also install Google Analytics, which provides measurement data on how consumers are engaging with content and ads. Through Google Analytics and other tools, Google collects a wide range of data about consumers and how they interact with third-party sites and apps.
In addition, many websites and apps make use of platforms' SDKs to provide social sharing buttons, such as Facebook's Like and Share buttons and Twitter's Tweet button, to encourage existing consumers to share on platforms and attract new consumers. Through these buttons, websites and apps send the platform additional data about those consumers' activities on that website or app.

Data collected through sign-in functionality on third-party sites or apps

Platforms collect data when consumers sign into an app or website using their sign-in functionality, whereby consumers can securely sign into third-party apps or websites without having to create, authenticate and remember new usernames and passwords (e.g. Log in with Facebook).

Categories of data collection

Volunteered data

Information that is intentionally provided by the data subject. For example, in a social media platform context, this includes information that consumers provide when creating or updating their profiles (e.g. date of birth, gender, email address, mobile phone number, declared interests), but also their posts, photos, comments, etc. In search, it includes users' search queries.

Observed data

Information that is recorded about the person and what they do. Examples include consumers' browsing history, time spent and clicks performed on a webpage, time of the day of log-in and log-out, groups joined, and friendships on social networking platforms. Observed data also includes data derived from users' devices (device data), such as type of device (e.g. desktop vs. mobile), operating system and its version, browser, and IP address.
Market participants can also collect observed data when consumers are not actively using their services. Depending on the privacy permissions set by the consumer, mobile applications, for example, can be set to record and send to the platform the device's location at regular time intervals even if the application is running in the background.

Inferred data

Information about the person that is not directly provided by or observed from them, but is derived or deduced from volunteered and observed data. This process combines volunteered and observed data about one consumer and about other consumers to infer additional information about that one consumer. For example, a user's IP address can be used to infer their location. In turn, this can be combined with census demographic information to infer characteristics such as education, income, and ethnicity. Empirical research shows that it is possible to infer a large number of user attributes with satisfactory levels of accuracy, including some complex ones such as personality traits.
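The two-step inference in the example above (IP address to location, then location to demographic characteristics) can be sketched as follows. The IP ranges are reserved documentation addresses, and the region names and census-style figures are invented for illustration; real systems would rely on geolocation databases and statistical models that the text does not describe.

```python
import ipaddress

# Hypothetical mapping from IP ranges to regions (documentation-only address ranges).
IP_RANGES = {
    ipaddress.ip_network("203.0.113.0/24"): "Region A",
    ipaddress.ip_network("198.51.100.0/24"): "Region B",
}

# Hypothetical census-style aggregates per region.
CENSUS = {
    "Region A": {"median_income_band": "high", "share_with_degree": 0.6},
    "Region B": {"median_income_band": "medium", "share_with_degree": 0.3},
}

def infer_region(ip):
    """Step 1: infer a coarse location from an observed IP address."""
    address = ipaddress.ip_address(ip)
    for network, region in IP_RANGES.items():
        if address in network:
            return region
    return None

def infer_profile(ip):
    """Step 2: attach area-level statistics to the individual as inferred attributes."""
    region = infer_region(ip)
    if region is None:
        return {}
    return {"region": region, **CENSUS[region]}

profile = infer_profile("203.0.113.42")
```

Note that the inferred attributes are area-level averages assigned to an individual, so they are probabilistic guesses rather than facts about that person.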
Platforms typically group these user profiles into audiences characterized by a specific intent, demographic characteristics, and interests, and these audience segments are then offered to advertisers as bases for targeted advertising. Any given individual can be a member of multiple audience segments. There are very many audience segments, some of which can be very granular, and advertisers can use combinations of segments to achieve highly targeted advertising.
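One way to picture audience segments is as sets of user IDs, where combining segments corresponds to intersecting those sets. The sketch below rests on that assumption; the segment names echo examples used in this document, while the user IDs and function names are hypothetical.

```python
# Hypothetical audience segments, each a set of user IDs.
segments = {
    "Homeowners": {"u1", "u2", "u4"},
    "Pets": {"u2", "u3", "u4"},
    "Female 25-34": {"u2", "u5"},
}

def combine(segment_names, all_segments):
    """Return the users who belong to every named segment (set intersection)."""
    sets = [all_segments[name] for name in segment_names]
    return set.intersection(*sets) if sets else set()

def memberships(user, all_segments):
    """List every segment a given user belongs to, in alphabetical order."""
    return sorted(name for name, members in all_segments.items() if user in members)

target = combine(["Homeowners", "Pets", "Female 25-34"], segments)
user_segments = memberships("u2", segments)
```

Each additional segment narrows the target audience, which is how combinations of broad segments can produce highly targeted advertising.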
The most common audience segments offered by advertising platforms are demographic segments, such as Female, 25-34 years old, Education Status: Bachelor's Degree, Homeownership Status: Homeowners, Marital Status: In a Relationship, and Parental Status: Not A Parent, and a large variety of interest-based segments, such as Home Improvement, Pets, and Computer Hardware.
Search advertising platforms offer in-market audience segments based on the user's recent search queries, which are particularly valuable to advertisers, as they signal that a consumer is actively considering (or in the market for) a purchase.
Some platforms also offer advertisers the ability to create custom audiences using their own first-party data that they supply to the platform (also known as retargeting), and some platforms additionally offer to find individuals who are similar to the advertisers' existing customers (also known as similar audiences, lookalike targeting, or lookalike audiences).
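As a rough illustration of the lookalike idea, the sketch below scores prospects by cosine similarity to the average profile of an advertiser's existing customers. Cosine similarity, the interest vectors, the threshold, and all names are assumptions chosen for simplicity; the source does not describe how platforms actually compute similarity.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def centroid(vectors):
    """Component-wise average of a collection of vectors."""
    vectors = list(vectors)
    return [sum(column) / len(vectors) for column in zip(*vectors)]

# Hypothetical interest vectors: [home_improvement, pets, computer_hardware]
customers = {"c1": [1.0, 0.0, 1.0], "c2": [1.0, 0.0, 0.5]}
prospects = {"p1": [0.9, 0.1, 0.8], "p2": [0.0, 1.0, 0.0]}

def lookalikes(customers, prospects, threshold=0.9):
    """Return prospects whose profile is close to the average customer profile."""
    seed = centroid(customers.values())
    return sorted(p for p, vec in prospects.items() if cosine(seed, vec) >= threshold)

similar = lookalikes(customers, prospects)
```

In this toy data, only the prospect whose interests resemble the existing customers passes the threshold.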