In a modern P2P protocol white paper, under a section titled Network Privacy, there is a passage that reads: “There is an inherent tradeoff in peer to peer systems of source discovery vs. user privacy.” I disagree with the statement, and with the impact the resulting design decisions have on privacy.
The paper defines a source as the IP:port pairing of a peer that has access to the data you want.
(I really wish we wouldn’t build P2P networks directly on top of IP addresses. We have better overlay tech…)
To clarify some terms, Content Discovery and Source Discovery are two different (but related) problems. Content Discovery is the problem of finding out that some content exists. Source Discovery is the problem of knowing where to get it. To use an example from the old web, Content Discovery is often solved by Google. Source Discovery is simply a direct connection to a URI.
In the P2P world, if I have a document identifier, the problem becomes: how do I translate that identifier into the actual document?
(In old web terms: I have a URI, I do DNS resolution to get an IP address, I initiate an HTTP connection, and the server sends me the content.)
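The old-web pipeline can be sketched as a toy resolver. This is a simulation, not real networking: the DNS table, IP address, and content store below are hypothetical stand-ins, but the shape of the lookup is the point — every URI resolves to exactly one authoritative source.

```python
from urllib.parse import urlparse

# Simulated DNS zone and origin server (illustrative values only).
DNS = {"example.org": "93.184.216.34"}
SERVERS = {"93.184.216.34": {"/paper.pdf": b"%PDF..."}}

def fetch(uri: str) -> bytes:
    parts = urlparse(uri)
    ip = DNS[parts.hostname]   # step 1: DNS resolution -> one IP
    server = SERVERS[ip]       # step 2: connect to the single origin
    return server[parts.path]  # step 3: HTTP GET returns the content

content = fetch("http://example.org/paper.pdf")
```

Note that source discovery is trivial here: only the DNS resolver and the one origin server ever learn what you asked for.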
P2P tech complicates the solution because, often, a document identifier no longer belongs to a single server; it has been published to the network and could be living anywhere, in multiple places, hosted by many different, dynamic peers.
And thus, many P2P systems state that this constraint makes privacy hard. The reasoning goes: because a peer must ask other peers whether they know where to get a document, the more peers it asks, the more peers learn that it asked for that document. This widening metadata problem is, to many, intractable.
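That widening-metadata argument can be made concrete with a toy flood-based lookup. The topology and names below are hypothetical, and real systems usually use structured DHTs rather than naive flooding, but the leak pattern is the same: every peer the query touches learns who is asking for what.

```python
def flood_query(neighbors, requester, doc_id, ttl=2):
    """Naive source discovery by flooding. Returns the set of peers
    that observed '(requester, doc_id)' after ttl hops."""
    observers = set()
    frontier = [requester]
    for _ in range(ttl):
        nxt = []
        for peer in frontier:
            for q in neighbors.get(peer, []):
                if q != requester and q not in observers:
                    observers.add(q)  # q now knows requester wants doc_id
                    nxt.append(q)
        frontier = nxt
    return observers

# Tiny hypothetical topology: each extra hop widens the metadata leak.
topo = {"me": ["a", "b"], "a": ["c", "d"], "b": ["d", "e"]}
leak1 = flood_query(topo, "me", "doc-123", ttl=1)  # {'a', 'b'}
leak2 = flood_query(topo, "me", "doc-123", ttl=2)  # {'a', 'b', 'c', 'd', 'e'}
```

Two hops is already enough for every reachable peer in this little network to hold a record of the request; in a real network the observer set grows with fanout and hop count.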
Of course we know that Freenet presented solutions that provided pretty strong guarantees for reader anonymity and publisher anonymity nearly 2 decades ago. We’ve known that we can do private source discovery in P2P networks for literal decades.
Further, we now have networks like Tor and i2p which present really neat peer addressing solutions that anonymize IP endpoints and protect publishers and readers almost out of the box. (with some caveats)
We know that metadata analysis is the thing that drives mass surveillance systems. Why in 2018 are we building new P2P networks that don’t offer any reasonable privacy guarantees against mass surveillance capable adversaries?!
A large number of people are talking about “the new web”, but it seems clear to me that these technologies and protocols have learned none of the security lessons we’ve been taught in the last few decades.
The point of P2P tech is to distribute trust. You can’t distribute trust without consent. You can’t consent without meaningful privacy. Privacy should be a foundational element in any P2P stack, not a challenge, or a footnote, or a “maybe we will get to this in the future”.
When I originally wrote this I was picking on a particular protocol. I’ve redacted that protocol from this copy because the leaders in that community responded really positively to critiques about the privacy issues with their protocol. I also made it clear in the original version that this isn’t a problem with just a single protocol, it’s a problem endemic to the new generation of P2P systems.
I don’t know why. Maybe developers think that anonymizing networks are slow and have limited bandwidth.
(This is partially true, but mostly a resource issue not a fundamental tech issue)
Maybe it’s a knowledge problem. Surveillance and privacy are marginalized issues that impact different communities unevenly. Solutions to these problems exist but may not be treated as high priority by those building the systems.
Maybe I’m being overly demanding of a bunch of community-led, open source projects, asking them to consider use cases that they don’t have the bandwidth to consider.
I want to live in a world where we have a diverse set of P2P systems. I want these projects to thrive. Fundamentally I think that is the only way we can hope to achieve decentralization of trust and a free and open internet ecosystem that resists censorship and surveillance.
But we have to build privacy into these systems from the ground up, at the first platform layer, not the application layer; application-layer privacy does not work.
I’m not sure if the current generation of systems can have privacy built into them. My experience and intuition say probably not. Privacy is really hard to layer onto a system after it has been designed.
I’ll leave this with the following thought:
Privacy is not an optional design element. When you refuse to build privacy into a system you are further marginalizing populations, enforcing censorship and encouraging surveillance.
When you refuse to build privacy into a system you are stating that you believe that only certain types of people should be able to use your system, and only for certain things. You might not intend that, but that is fundamentally the result.