- Take a few moments to read the opening README.md file which provides a brief overview of this project.
- Interact with this and other articles I'll post here. I'm looking forward to hearing your thoughts, questions, and concerns.
Discussion Link: https://hachyderm.io/@davidshq/109881053596337824
Many businesses use tracking and analytics software on their sites to collect data about their visitors. This data can be used to improve the visitor experience - e.g., recommending products/services the visitor is likely to desire. It can also be used to increase a company's profitability and incidentally or intentionally be shared with third parties.
While privacy centric services have long been available it is only within the past few years that awareness of just how much data is being collected by sites has become widespread and the public has begun utilizing these services in larger numbers.
This has driven privacy conscious individuals to use services like DuckDuckGo (search), Brave (browser), and ProtonMail (email) to lessen the data company's collect about them.
Phoebe's unique value proposition is it's use of human augmentation to improve search results. This curation isn't meant to be performed by a small cadre of employees but by the majority of the user community (millions upon millions!).1 With this being so, how can Phoebe be privacy centric?
When a user first visits Phoebe we will collect as little information as possible to service their request and we will forget what we know as soon as possible. Phoebe will not use the activities of these individuals in ranking search results.
"No account required. But you might want one."
-- Mozilla Firefox.
A user can create an account without associating PII with it. That is, no name, email address, phone number, or other PII is required.
The user can then explicitly choose to optin to various features of Phoebe that result in varying levels of deanonymization.
For example, a user might decide they want to be able to remove some sites from the results returned by a query. In order to provide this functionality Phoebe will have to store information on which sites the user has removed. This information will not be used by Phoebe to rank search results. The user reveals something about themselves by removing specific sites but the level of privacy is still quite high.
A user might even decide they want to opt into contributing their curation activity to the ranking of search results. 2 This requires more personal disclosure but because there is no PII linked to the account privacy is still maintained.
Individuals may choose to provide PII to Phoebe in order to gain some authority in the system (e.g., email address, phone number, etc.) but this is not necessary to be a contributor and delivers only limited boosts in authority.
They may also choose to contribute to the search results but decide explicitly what they want made public with the rest of their curation remaining private.
They will be able to go back and review the changes they have made publicly/privately and change the status of whether that data is shared with Phoebe or not.
"I include a lot of personal health data, test files, what bacteria live in my poop (yes, really). Feel free to share all this. I don't care about my privacy."
-- Serge Faguet, founder of Mirror AI, Tokbox, and Ostrovok.
Individuals will also have the option to make their contributions automatically public but this is an explicit optin choice and they can remove their contributions in whole or part at any time.
Lines have to be drawn somewhere so that Phoebe is something that can be built and launched in a reasonable amount of time. For that reason some features that are good have been set aside in neat boxes I hope to open in the future.
One of these aspirations is to allow users to store their PII and curation data wherever they wish. For example:
- On your local machine3
- On a server you control
- On a third party storage service (e.g., Dropbox, Microsoft OneDrive, Google Drive, etc.)
- On a self-hosted solution
I am particularly interested in seeing how Phoebe could utilize Tim Berners-Lee, et al.'s Solid project which detaches data storage from applications and offers a higher level of interoperability between applications acting on shared data.
Discussion Link: https://hachyderm.io/@davidshq/109881053596337824
Footnotes
-
This scale may seem impossible but think of how many people are curating right now - on social media platforms (Facebook, Twitter, Mastodon, TikTok, etc.) and contributing to crowdsourced projects like Wikipedia. This scale is already being achieved by other projects. ↩
-
Anonymous accounts ability to contribute to search results is limited until they have demonstrated a high-level of quality over a sustained period of time to prevent abuse of the system. ↩
-
Even if someone chose to keep their curation data off of Phoebe's servers they could still actively contribute to curation. That is, they might explicitly allow Phoebe to read and utilize the data in it's anonymized and aggregate form. This raises some significant technical hurtles but is something I'd like to explore in the future. ↩