Powering Crumbs with the Fractal Protocol

Core value proposition: incentivized data provisioning

As shown below, the Fractal Protocol could incentivize different actors — but the core value proposition for eyeo is being incentivized to provide data.
While FCL lives on Ethereum, these incentives will be sponsored by Fractal; once we're live on Polkadot, most of them will result from progressively minting an additional 400M tokens over a period of 30+ years.
Building on blockchain allows us to design economic incentives to reward participants who further the system’s objectives, and penalize those who harm it. Besides incentivizing the maintenance of a decentralized commons, cryptoeconomics also plays a key role in spurring adoption — mitigating the cold-start problem facing many multi-sided networks. We intend to disproportionately reward early adopters of the Protocol, whose participation plays a key role in kickstarting the Protocol's flywheel.
By staking FCL behind the data they contribute, eyeo stands to especially benefit from early protocol adoption. These funds are eyeo's to use as they please — e.g. to fund further product development or marketing, or to build their own incentive program and pay websites for first-party data. These rewards can equally be reaped — and possibly shared with eyeo — by browsers and other native integrators of their technology.

Decentralizing storage and improving cohort modeling

We believe this model can be improved with the protocol, for example by:
  • increasing data availability and accuracy;
  • avoiding data loss in pseudonymization;
  • delegating storage and cohort modeling responsibilities;
  • decentralizing storage to mitigate honeypot risk;
  • improving the quality of cohort modeling;
  • supporting cross-device identities.
We will start by describing the new actors involved, and then describe data and payment flows.

User Agents

These are online services that act on behalf of users. They speak a common protocol; each user has one user agent, and each user agent belongs exclusively to one user. They are composed of three parts (sketched below):
  • A unique identifier: anonymous (e.g. a UUID), pseudonymous (e.g. hashed email address) or otherwise (e.g. phone number).
  • User data: e.g. personal data, browsing history, purchase history.
  • Data sharing rules: a set of rules describing e.g. the pricing and allowed purpose for data sharing.
Since each user agent isolates user data, we no longer have a single honeypot where data can leak from.
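For illustration, here is a minimal TypeScript sketch of these three parts. The type and field names are assumptions made for this document, not an existing API.

```typescript
// Illustrative sketch only: names and shapes are assumptions, not an existing API.

type UserIdentifier =
  | { kind: "anonymous"; uuid: string }            // e.g. a random UUID
  | { kind: "pseudonymous"; hashedEmail: string }  // e.g. a hash of the email address
  | { kind: "identified"; phoneNumber: string };

interface UserData {
  personal?: Record<string, string>;                               // e.g. age range, language
  browsingHistory: { url: string; visitedAt: number }[];
  purchaseHistory: { item: string; amountCents: number; at: number }[];
}

interface DataSharingRule {
  purpose: "cohort-modeling" | "curation" | "analytics";  // allowed purpose for sharing
  minPriceFCL: number;                                     // price floor for this purpose
  fields: (keyof UserData)[];                              // which data may be shared
}

interface UserAgent {
  id: UserIdentifier;
  data: UserData;
  rules: DataSharingRule[];
}
```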
In the context of the Web, User Agent is technical lingo for Browser. The idea was that a browser would assist the user by intermediating their relationship with the Web — for example, by blocking ads, modifying website colors for the colorblind, or enforcing parental controls.
I like User Agents in our context because of this idea: since browsers didn't really live up to this ideal and mostly serve their own interests, they don't deserve to be called user agents, so we reclaim the term. But this is likely to generate confusion, so we probably need a better name.
From the outside, user agents are APIs for negotiating access to user data. From the inside, they could be either online servers (in the spirit of Solid Pods) or edge computing applications living in user devices (like PolyPoly polyPods). These can be provisioned by the user themselves, but it's more likely they'd be provisioned by Fractal and Crumbs on their behalf (hinting at Crumbs being a Data Union).
Down the road, in an open market, this would be facilitated by an array of service providers (IaaS providers such as Digital Ocean, PaaS providers like Heroku, or SaaS providers like Zapier), incentivized by the protocol to build user agents with different pricing models and data collection and sharing defaults. Since users (or Crumbs) can vote with their feet by choosing which providers to use, we expect a competitive market to emerge, leading to several optimal solutions for different classes of user preferences.

Cohort Modelers

These are online services that use (possibly proprietary) modeling techniques to choose the cohort that best fits a user, most likely measured in terms of expected conversion.
Since Crumbs can choose among different services based on cohort quality, we expect a competitive market to emerge, leading to several optimal solutions for different classes of user data and target cohorts.
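As a rough sketch, a cohort modeler could expose an interface like the one below; the names are hypothetical and the ranking logic stands in for whatever (possibly proprietary) technique a modeler actually uses.

```typescript
// Hypothetical cohort modeler interface; real modelers would plug in their own techniques.

interface CohortCandidate {
  cohortId: string;
  expectedConversionRate: number; // the modeler's estimate for this user in this cohort
}

interface CohortModeler {
  // Given shared user data, return candidate cohorts with conversion estimates.
  model(sharedData: unknown): Promise<CohortCandidate[]>;
}

// Crumbs could pick among competing modelers based on observed cohort quality.
async function bestCohort(modeler: CohortModeler, sharedData: unknown): Promise<string> {
  const candidates = await modeler.model(sharedData);
  if (candidates.length === 0) throw new Error("modeler returned no cohorts");
  candidates.sort((a, b) => b.expectedConversionRate - a.expectedConversionRate);
  return candidates[0].cohortId;
}
```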

Data Curators

Robots can install Crumbs too — and spam the system with data in pursuit of ad revenue and protocol incentives. In order to improve data quality, we introduce data curators. These curators are incentivized by the protocol (see Data Curation for how this could work) to issue their best guesses as to whether user data is to be trusted.
This process of data curation could give cohort modelers a sense of how reliable the data they use is.
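One hedged way to picture a curator's output is as a signed attestation about a data point, which modelers can aggregate into a reliability score. The shapes and the aggregation below are assumptions, not a specification.

```typescript
// Hypothetical curator attestation; how scores are computed and rewarded is out of scope here.

interface CurationAttestation {
  dataPointId: string; // reference to the (hashed) data point on-chain
  curator: string;     // curator's address or public key
  trusted: boolean;    // the curator's best guess: does this data come from a real user?
  score: number;       // confidence in that guess, between 0 and 1
  signature: string;   // signature over the fields above
}

// A cohort modeler could collapse attestations into a single reliability score in [0, 1].
function reliability(attestations: CurationAttestation[]): number {
  if (attestations.length === 0) return 0.5; // no signal either way
  const sum = attestations.reduce(
    (acc, a) => acc + (a.trusted ? a.score : -a.score),
    0,
  );
  return 0.5 + sum / (2 * attestations.length);
}
```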

Data and payment flows

Below we see an example of how Crumbs could leverage the protocol to achieve these goals.
Here, the website provides their first-party data to Crumbs (0). These could be, for example, purchases the user made, or their entire browsing history on the site. These data are combined with Crumbs' own observations (1).
Instead of sending these data to a centralized database, Crumbs sends them to the user's user agent of choice (2), which in turn negotiates data sharing requests with cohort modelers (3), which report the best cohorts back to Crumbs (4).
Below is an example of the payment flows for the model above.
These flows can be made to track value and foster incentives more precisely, if data provenance is tracked so that its originators can be compensated. For example, it's conceivable that websites are compensated directly by data modelers for their first-party data contribution.
Let's zoom in on the processes conducted by these new actors. Below is a slightly modified perspective.
  1. Crumbs pushes data to the user agent alongside its current operations.
  2. The user agent records the user's identity and data (properly anonymized and hashed) on-chain.
     1. It increases modeler certainty in data accuracy.
     2. It enables the operation of curators, who would vouch for or against said data.
     3. These data are only correlatable by the user agent.
  3. Data curators monitor the chain for these data, correlate them with permission from user agents, and e.g. assign a score.
  4. A cohort modeler posts a data request to the billboard: "I'll pay you X if you share your browsing history with me, provided you've read an article about dogs, purchased dog food, and visited dogs.com".
  5. Revenue-seeking user agents monitor these data requests and choose whether or not to meet them based on their rules (see the sketch after this list). As part of this negotiation, modelers can choose to consider data scored highly by curators.
  6. Cohort modelers could deal directly with DMPs, which in turn package and sell data to DSPs and SSPs. This would enable Crumbs to continue business as usual, while this ecosystem of new actors grows on its own.
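Here is a hedged sketch of steps 4 and 5: a billboard data request and the rule check a revenue-seeking user agent might run against it. Field names and the matching logic are illustrative assumptions; checking the request's conditions against actual user data is omitted for brevity.

```typescript
// Illustrative billboard data request and a user agent's decision to meet it.

interface DataRequest {
  modeler: string;        // who is asking
  offerFCL: number;       // "I'll pay you X"
  wantedFields: string[]; // e.g. ["browsingHistory"]
  conditions: {           // e.g. read an article about dogs, bought dog food
    visitedDomains?: string[];
    purchasedItems?: string[];
  };
}

interface SharingRule {
  allowedFields: string[];
  minPriceFCL: number;
}

function shouldShare(request: DataRequest, rules: SharingRule[]): boolean {
  // Share only if some rule covers every requested field at or above the asking price.
  return rules.some(
    (rule) =>
      request.offerFCL >= rule.minPriceFCL &&
      request.wantedFields.every((field) => rule.allowedFields.includes(field)),
  );
}
```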

Consistent identity and data across devices

Once a User Agent has been provisioned for a user, they could make all their Crumbs installs aware of this User Agent (or Crumbs could do it for them with an account registration system). This would allow Crumbs to build better cohorts for each person, considering all their data — instead of building cohorts for devices.

Privacy concerns

The protocol needs to ensure on-chain data about a single user is not tagged with a single persistent identifier, since this could lead to correlation and re-identification. Additionally, we're mindful that the presence of curators might cause a privacy unraveling effect. We could address these concerns in two ways:
  • Each time data is written to the chain, it is written against a derived identifier such that only the user agent can identify which data points belong to them (see the sketch after this list). This voluntary correlation would happen during the negotiation and curation processes.
  • Differential privacy techniques, such as Bayesian privacy, can be used at several points in this model: when data is sent to the user agent, when it's written to the chain, and when it's shared with the modeler.
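A minimal sketch of the first approach, assuming the user agent runs in a Node.js-style environment and holds a secret key that never leaves it; the identifiers and helper names are illustrative.

```typescript
import { createHmac, randomBytes } from "node:crypto";

// The secret is known only to the user agent.
const agentSecret = randomBytes(32);

// Each on-chain write gets a fresh identifier; without the secret (and the stored
// nonces), two identifiers belonging to the same user cannot be linked.
function derivedIdentifier(userId: string, writeNonce: string): string {
  return createHmac("sha256", agentSecret)
    .update(`${userId}:${writeNonce}`)
    .digest("hex");
}

// During negotiation or curation, the user agent can voluntarily prove which
// identifiers are its own by re-deriving them from the nonces it kept.
const exampleId = derivedIdentifier("user-123", randomBytes(16).toString("hex"));
```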

Getting started

This is a model that can work in parallel with Crumbs' current operation, until the quality of available services reaches a point where Crumbs can disable their own storage and modeling if they choose to do so.
To this end, Fractal would handle the initial provisioning of user agents and cohort modelers, possibly through grants for development teams and partnerships with related organizations.
In terms of user journey, nothing changes initially other than a privacy policy update to enable this new model. When a user downloads the extension and sets it up, Crumbs could ping Fractal's user agent API, causing a user agent to be deployed on behalf of this new user, a reference to which would be stored in Crumbs. We may also want to give the user the chance to get access credentials to their user agent.
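As a purely hypothetical illustration of that ping, with a placeholder endpoint and payload (the real Fractal API may look nothing like this):

```typescript
// Hypothetical provisioning call; endpoint, payload, and response shape are assumptions.

interface ProvisionResponse {
  userAgentId: string;        // reference Crumbs would store locally
  accessCredentials?: string; // optionally handed over to the user later
}

async function provisionUserAgent(installId: string): Promise<ProvisionResponse> {
  const res = await fetch("https://user-agent-api.example.com/v1/provision", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ installId }),
  });
  if (!res.ok) throw new Error(`provisioning failed: ${res.status}`);
  return (await res.json()) as ProvisionResponse;
}
```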

Enabling privacy-preserving tracking

Several industry initiatives, in an effort to sidestep third-party cookies, are converging around a few proposals that operate similarly to Crumbs. These proposals are all moving targets, and there's an opportunity for Crumbs to stay ahead by delivering a sound alternative that meets the same goals while keeping the user in control of their data. How could our protocol support privacy-preserving tracking in Crumbs, as MaCAW proposes to do for PARAKEET?
Crumbs intermediates ad requests. When a website sends a request to an SSP, Crumbs intercepts it, anonymizes user data, and adds user cohorts. Much like similar solutions, this intermediation disables several advertising use cases, such as brand safety and frequency capping: without good information about the user's identity and their context, these use cases cannot function.
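A small sketch of this interception step, with assumed field names, just to make the transformation concrete:

```typescript
// Rough sketch: strip identifying fields from the ad request and attach cohorts instead.

interface AdRequest {
  url: string;
  userId?: string;
  ip?: string;
  cohorts?: string[];
}

function anonymizeAndTag(request: AdRequest, cohorts: string[]): AdRequest {
  const { userId, ip, ...rest } = request; // drop directly identifying fields
  return { ...rest, cohorts };             // forward cohort labels in their place
}
```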
DOVEKEY (for FLoC) and MaCAW (for PARAKEET) are two proposals that address this issue. They propose MPC (multiparty computation) schemes that would bring these use cases back in a privacy-preserving manner, by enabling them without revealing privacy-compromising information. Both proposals suggest creating a new mechanism, whether in the browser or elsewhere, to facilitate privacy-preserving ad execution using encryption. Crumbs believes these proposals aren't solid ground to work on, especially given the novelty of MPC and the potential latency issues it introduces to the process.
If Crumbs uses the protocol, one option is to simply allow these data to be shared if the user agrees. Since the user is in control of their agent, they're in control of the data flows and it would be up to them to allow or disallow them based on their rules. Here, decentralization plays a large role in pushing the decision to the user, ideally the ultimate arbiter of their data destiny.
An alternative would be for Crumbs to provide C2D (compute-to-data) functionality: if the data won't go to the algorithm, then the algorithm must come to the data. For example, instead of requesting context data to determine brand safety for a bid, the advertiser would send this algorithm to Crumbs to be run on local data without exfiltration. C2D is comparatively trivial to implement and would likely sidestep the latency issues.
Crumbs would expose an endpoint that the advertiser's DSP would use during the bidding process. The DSP would send an algorithm to Crumbs. Crumbs would run it against the current user context and data, and return the result of the computation to the DSP.
  1. Crumbs gets the user's context.
  2. Crumbs intercepts an ad request to the SSP.
  3. Crumbs anonymizes the ad request before sending it to the SSP.
  4. The SSP sends the anonymized request to a DSP with brand safety preferences.
  5. The DSP sends a brand safety algorithm (e.g. "is the website domain included in this list?", or "does the current website contain these words?") to Crumbs, which runs it against the current user context and data.
  6. Crumbs returns the result of the computation to the DSP.
  7. The DSP returns a bid for the ad to the SSP.
  8. The SSP accepts the bid and causes the ad to be served.
This way, Crumbs can enable a brand safety decision to be made online, during the bidding process. For this, Crumbs needs to provide an execution environment (likely JavaScript and/or WebAssembly). It might also benefit from using an algorithm or DSP allowlist, in order to prevent the execution of syphoning algorithms which just return user data.
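Below is a hedged sketch of that compute-to-data step. It represents the DSP's algorithm as a plain function and uses illustrative names throughout; a real implementation would execute untrusted code in a JavaScript or WebAssembly sandbox, gated by the allowlist mentioned above.

```typescript
// Sketch of a compute-to-data check: the DSP's algorithm runs against local
// context and only the boolean result leaves the device.

interface LocalContext {
  domain: string;
  pageText: string;
}

type BrandSafetyAlgorithm = (context: LocalContext) => boolean;

// Example of the kind of algorithm a DSP might send:
// "is the website domain included in this list?"
const domainBlocklistCheck: BrandSafetyAlgorithm = (ctx) =>
  !["unsafe.example", "unsuitable.example"].includes(ctx.domain);

function handleComputeRequest(
  algorithm: BrandSafetyAlgorithm,
  context: LocalContext,
): { brandSafe: boolean } {
  // Only the result is returned to the DSP, never the context itself.
  return { brandSafe: algorithm(context) };
}

// Example usage during bidding:
const result = handleComputeRequest(domainBlocklistCheck, {
  domain: "news.example",
  pageText: "…",
});
```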
Computation results could be verified using zero-knowledge proofs and/or written to the blockchain for non-repudiation. This would help with reputation management, and would lower latency during the RTB process by allowing for post-hoc verification.
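One simple way to picture such a record is a hash commitment over the algorithm, the input digest, and the result, written on-chain after the auction; the structure below is an assumption, and zero-knowledge proofs would be a stronger (but more involved) alternative.

```typescript
import { createHash } from "node:crypto";

// Minimal non-repudiation record: anyone holding the original inputs can
// recompute the hashes later and verify the computation after the auction,
// keeping the RTB path itself fast. Field names are assumptions.

interface ComputationCommitment {
  algorithmHash: string;
  inputDigest: string;
  resultHash: string;
}

const sha256 = (s: string) => createHash("sha256").update(s).digest("hex");

function commit(
  algorithmSource: string,
  inputDigest: string,
  result: boolean,
): ComputationCommitment {
  return {
    algorithmHash: sha256(algorithmSource),
    inputDigest,
    resultHash: sha256(`${inputDigest}:${result}`),
  };
}
```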