
Using proximity to mitigate the Sybil attack #4481

Closed
3 tasks
synctext opened this issue May 3, 2019 · 43 comments
@synctext
Member

synctext commented May 3, 2019

We utilize the inherent fraud-resilience of proximity to stop unlimited fake identity creation

Close proximity is a property that is easy to validate. Even online, it is possible to prove that somebody is less than 500 km away. This enables us to distinguish close acquaintances from far-away strangers.

Simple methods, such as sending a large random number and expecting it to be returned quickly, offer a tamper-proof proximity measure. The laws of physics and the finite speed of light prevent tampering and can prove that a peer is located nearby.
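As a minimal illustration (not part of the original proposal), a challenge-response probe against a peer that echoes UDP datagrams could bound the distance as sketched below; all names and constants here are assumptions:

```python
import os
import socket
import time

# Light in fiber travels at roughly two thirds of c, ~200,000 km/s.
FIBER_SPEED_KM_PER_S = 200_000.0

def distance_upper_bound_km(sock: socket.socket, peer: tuple) -> float:
    """Send a fresh random nonce and time the echo.

    Physics only bounds distance from above: a peer can add delay to
    look farther away, but it can never answer faster than light
    allows, so a small round-trip time is proof of proximity.
    """
    nonce = os.urandom(32)           # unpredictable, so the echo cannot be precomputed
    start = time.perf_counter()
    sock.sendto(nonce, peer)
    reply, _ = sock.recvfrom(64)
    rtt = time.perf_counter() - start
    if reply != nonce:
        raise ValueError("peer echoed the wrong nonce")
    # One-way distance is at most half the round trip at propagation speed.
    return (rtt / 2.0) * FIBER_SPEED_KM_PER_S
```

An RTT of 5 ms, for instance, bounds the peer to within roughly 500 km of the prober, matching the figure above.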

This master thesis will create a mathematical model which reflects the success of proximity-based defenses for a given strength of the Sybil attack.

We make a realistic assumption about attacker capabilities. It is far harder to create fake identities in close proximity to a victim than at an arbitrary location on the Internet. We assume that attackers have finite resources and some financial constraints. The goal is to prove strong mathematical properties of defenses and attacks. In a generic cooperative agent model, agents build trust after several successful interactions. We believe that a successful strategy to build a cooperative cluster in an overwhelmingly defector-dominated world is to take some initial risk when meeting strangers. By adding our proximity mechanism it is possible to build highly effective mechanisms to identify cooperators in close proximity, plus test and discard non-cooperators, and mitigate (Sybil) attackers. We further assume cooperators can gossip about the collaboration levels of others using a tamper-proof data structure, for instance our TrustChain IETF Internet-Draft.

Seuken proved that it is impossible to defend against the Sybil attack in the general case in any system which has single-report responsiveness. This thesis work is meticulously designed to bypass that impossibility result and create a meaningful, effective, and even highly efficient system without single-report responsiveness. We have looked into restricting communication with strangers using proximity for years, see #2541. Prateek Mittal from Princeton exploited the temporal dynamics of the Sybil attack for a defense. This work also builds upon our prior work by Pim Otte.

Desired outcome:

(still tentative, we just see how far we get with this master thesis direction)

@synctext synctext added this to the Backlog milestone May 3, 2019
@alexander-stannat

Reading List (done):

Trust, Reputation and Accounting Mechanisms

Evolutionary Game Theory

To read:

I have written a pretty comprehensive introduction to the trust problem and have been thinking about the different existing trust and reputation mechanisms, such as

  • BarterCast
  • Netflow
  • Eigentrust
  • DropEdge
  • TrustRank
  • etc

I wrote down all necessary requirements that our mechanism will have to fulfill and devised a mathematical framework for these. In the coming week I will finish the framework for trust and incorporate elements from Nowak's papers about the development of trust and reputation and evolutionary game theory, so that we can send the first draft to Japan.

For the geographic proximity I have thought about this:
When an agent (A) in the network suspects that one of its peers (S) might be a Sybil controlled by another peer (B), A wants to determine whether S is, in fact, the same machine as B. To do this, A pings both B and S and, depending on how long the pings take to return, decides whether its suspicion is correct. To prevent S or B from spoofing the ping by artificially delaying their responses, A can collude with other trusted peers in the network and send out multiple pings from different locations. The goal is to geographically surround S with A's trusted partners, which prevents S from lying about its location. The resulting ping times are cross-referenced to determine whether S and B are in the same place.

There is a slight problem with this, though: how many peers does A have to work with, and where should these peers be located, so that no credible delay exists with which S can convince everyone that S and B are not in the same place? I will perform some testing over the weekend and come up with some formulas in the next sprint.
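A toy version of the cross-referencing decision (vantage names and jitter tolerance invented), relying on the fact that a peer can add delay but never remove it:

```python
def could_be_same_machine(rtts_s: dict, rtts_b: dict, jitter_ms: float = 5.0) -> bool:
    """Cross-reference RTTs (in ms) measured from several trusted vantage points.

    A peer can always *add* delay but never answer faster than its true
    position allows.  If S really were B's machine, no vantage point could
    measure S as clearly faster than B; a single vantage point that sees S
    answer faster than B (beyond jitter) proves the two are distinct hosts.
    Slower answers prove nothing, since S may be stalling on purpose.
    """
    return not any(rtts_s[v] < rtts_b[v] - jitter_ms for v in rtts_b)

# Example: two probes see near-identical times for S and B,
# so co-location cannot be ruled out.
print(could_be_same_machine({"ams": 12.1, "nyc": 95.0},
                            {"ams": 11.8, "nyc": 94.6}))   # True
```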

@devos50
Contributor

devos50 commented May 18, 2019

How many peers does A have to work with, and where should these peers be located, so that no credible delay exists with which S can convince everyone that S and B are not in the same place?

I would say this highly depends on the trust requirements of the application in which you deploy this system. It would be nice to have a model that determines: peer S is a Sybil controlled by B with X% confidence. This is influenced by the number of peers you select for the "ping test" and the variance in ping times.

@synctext
Member Author

synctext commented May 20, 2019

Fascinating discussion outcome:

  • emergent effect: only cooperators are long-term successful
    • specifically aligned the incentives.
    • applied evolutionary game theory
  • triangulation of an exact location is possible, but it is an unacceptable violation of privacy for the majority of Internet users
  • close proximity is impossible to fake, it is indeed a very restrictive mechanism
  • goal is to create a foolproof mechanism that is guaranteed to always work
  • Transitive exploration of trust
    • our mechanism is very restrictive in the bootstrap phase
    • Strategy: build cooperative cluster in an overwhelmingly defectors-dominated world
    • do some initial risk taking when meeting strangers
    • after the bootstrap phase, we continue to meet strangers
      • reliable cooperators introduce us to their reliable cooperators.
      • generalized close proximity: either close to us or to our reliable cooperators
      • Emergent effect: successful clusters of cooperation grow, link with other clusters, and merge into a single connected cooperative cluster.
  • close attacker can generate an infinite number of Sybils
    • Fundamentally impossible to stop them initially
    • Each Sybil requires an Internet point of attachment to communicate (IP address)
    • We require another mechanism to protect us: IPv4 address blocks
  • Close attacker protection mechanism
    • Like normal city neighborhoods and houses, the Internet not only has a sense of proximity, but also the concept of a block of houses, called Internet address blocks
    • stop infinite identity creation by close attackers within your neighborhood
    • attackers always control a limited amount of Internet address blocks
    • Maintain a reputation score for each Internet address block (IPv4 range); see the sketch after this list
    • Prefer to meet strangers in higher-reputed neighborhoods within your proximity (like preferring to travel to the safe North over the unsafe South)
  • to illustrate the effectiveness of repeated interaction and evolutionary survival: http://netlogoweb.org/launch#http://netlogoweb.org/assets/modelslib/Sample%20Models/Social%20Science/Unverified/Prisoner's%20Dilemma/PD%20Basic%20Evolutionary.nlogo
  • Like postal codes/zip codes, the Internet has neighborhoods; see the Internet address block allocation:
    [figure: IPv4 address block allocation]
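A toy illustration of the per-block bookkeeping sketched above (the /16 granularity and the scoring rule are assumptions for the sketch):

```python
import ipaddress
from collections import defaultdict

class NeighborhoodReputation:
    """Keep one reputation score per IPv4 /16 block, the 'neighborhood'."""

    def __init__(self):
        self.scores = defaultdict(float)

    @staticmethod
    def block(ip: str) -> str:
        # The enclosing /16 plays the role of the block of houses.
        return str(ipaddress.ip_network(ip + "/16", strict=False))

    def record(self, ip: str, cooperated: bool) -> None:
        self.scores[self.block(ip)] += 1.0 if cooperated else -1.0

    def rank_strangers(self, ips: list) -> list:
        # Prefer meeting strangers from higher-reputed neighborhoods.
        return sorted(ips, key=lambda ip: self.scores[self.block(ip)], reverse=True)
```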

@synctext
Member Author

Initial NetLogo simulation of 90% defectors and 10% cooperators, showing the evolutionary survival of the blue cooperators versus the red defectors.
[animation: NetLogo simulation run]

@synctext
Member Author

More on the social side of "social proximity":
Online exchanges mimic the close ties once formed through face-to-face exchanges in villages, but on a much larger and unconfined scale. In other words, technology is reinventing old forms of trust

from the book: What's Mine Is Yours: The Rise of Collaborative Consumption, plus mentioned here.

@cuileri

cuileri commented Jun 13, 2019

And more on the trustworthiness-proximity relation, from The concept of decentralized and secure electronic marketplace, 2008:

In traditional commerce, the trust between buyers and sellers is based on such things as societal laws and customs, and on the intuition people tend to develop about each other during interpersonal interactions. **The trustworthiness of these factors is based, to a large extent, on the geographical proximity between buyers and sellers.** It is the physical venue of the trade that is subject to specific trading laws, which may be enforced by the local police; and it is the eye contact between the trading partners that may induce a degree of trust between them.

@MeadowHillSoftware

What if B controls a botnet?

@synctext
Member Author

synctext commented Jun 26, 2019

Generic scientific problem formulation attempt draft

We demonstrate a strategy for cooperation in which only cooperators are long-term successful, given some assumptions. Our strategy is an "evolutionarily stable strategy" (ESS), meaning that, if adopted by a population in a given environment, it is impenetrable and cannot be invaded by any alternative strategy.

We assume the existence of a secure identity function. The agent population is not assumed to be static; new agents continuously enter the system. The secure identity function ensures agents can't defect in one round, re-enter the system with a new identity, and defect in future rounds. The secure identity function creates a probability that previous defections can be linked to the misbehaving agent. In general, each agent in our system can rely on the secure identity function to test a certain property of another agent which is unequivocally true and resilient to strategic manipulation. (This protects against botnets, as @MeadowHillSoftware correctly remarked.)

We assume the ability of agents to gossip about successful acts of cooperation. We define this as goodness_gossip(). Any interaction based on gossip and trust is exploitable. Our strategy relies on mitigating the effect of defection using the secure function.

Each agent uses a reciprocity_function(secure_function(), goodness_gossip()) to accept or reject requests for cooperation. Depending on both the outcome of the secure function and the gossip, the agent can classify others as pure strangers, low-trust agents, or long-term cooperators.
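A minimal sketch of such a classification (the thresholds and argument types are assumptions, not part of the model):

```python
def reciprocity_function(secure_ok: bool, gossip_score: float,
                         direct_successes: int) -> str:
    """Classify a peer as a pure stranger, low-trust agent, or
    long-term cooperator, based on the secure function's outcome and
    the aggregated goodness gossip from trusted collaborators."""
    if not secure_ok:
        return "reject"                   # fails the manipulation-resistant test
    if direct_successes >= 10 and gossip_score > 0.0:
        return "long-term cooperator"
    if direct_successes > 0 or gossip_score > 0.0:
        return "low-trust agent"
    return "pure stranger"                # eligible only for initial risk taking
```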

The core of each agent's strategy is to build a personalized community of trusted collaborators. Each agent is either building this community (bootstrap phase) or maintaining it (bootstrapped). Specifically, each agent continuously maintains a small community (e.g. 50) of other agents with which it has built up a long-term reciprocal relationship. This community is different for each agent. We make the cardinal assumption that these long-term cooperators also truthfully report on the cooperative behavior of others. These collaborators are used to conduct goodness_gossip().

The uniqueness of our strategy is the cryptographic protection of goodness_gossip(). We assume the existence of a mechanism which guarantees that the original message cannot be modified or tampered with. Agents cannot hide gossip: they have to truthfully report all their interactions, or collude with other agents to hide them. When interacting with honest agents, successful gossip manipulation is impossible.

Our mechanism is very restrictive in the bootstrap phase: reliable cooperators introduce us to their reliable cooperators. After the bootstrap phase, we are continuously introduced to strangers through gossip. We then expand our network to non-reciprocal agents and transition into a network reciprocity strategy. Emergent effect: successful clusters of cooperation grow, link with other clusters, and merge into a single connected cooperative cluster.

ToDo: mathematically prove certain attacks to be unexploitable.

Any exchange of value based on trust is exploitable

@ichorid
Contributor

ichorid commented Jun 27, 2019

After the bootstrap phase, we are continuously introduced to strangers through gossip. We then expand our network to non-reciprocal agents and transition into a network reciprocity strategy.

Sounds like a "proof of good deed" or "proof of public service".

@synctext
Member Author

Thnx for explaining your token cap idea:

  • simple model (exploitable)
  • second model: with token limits (give newcomers initial benefit)
  • third model: add the neighborhood principle and bad neighborhoods

Emergent property: certain regions have run out of "trust" and show no kindness to strangers, while other regions still trust strangers.

@alexander-stannat

Current idea:

Leech-Caps:
It is our goal to enable newcomers to the network to consume some data before having to contribute. A node that wants to bootstrap is eligible for an amount of data limited by an upper bound, which we call the leech-cap. Every agent in the network has a personal cap on how much they are willing to seed to an unknown node with which they have no transaction history yet. A node can leech up to the cap and no more thereafter. To prevent excessive leeching by a group of new nodes, this cap decreases after every contribution made. This way we can prevent uncooperative behaviour through whitewashing and Sybil attacks.
The leech-cap of node i could be determined through an exponential function; the formula might look something like the decay sketched below.
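Purely as an assumed illustration (the constants and the exact shape are guesses, not the original formula):

```python
import math

MAX_CAP_MB = 512.0   # assumed ceiling on what anyone seeds to strangers
DECAY = 0.01         # assumed decay rate per MB already donated

def leech_cap(donated_to_strangers_mb: float) -> float:
    """Cap shrinks exponentially with every contribution already made,
    so a burst of fresh identities quickly exhausts what a single
    seeder is willing to give away to unknowns."""
    return MAX_CAP_MB * math.exp(-DECAY * donated_to_strangers_mb)
```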

Transitive Caps:
A node's leech-cap is also partially determined by the leech-caps of its neighbours (contributors only). This way, if any excessive consuming without contributing occurs, nodes in the affected region will collectively reduce their caps, rendering such an attack less beneficial. The effect the caps of nodes have on the caps of their neighbours is determined by the weight of the connecting edges.

We therefore obtain a recursive formula for a continuous process of cap updating in the graph. This makes Sybil attacks, without any significant work put into the network by the attacker, disadvantageous.

Maliciously increasing honest nodes' caps:
An attacker might decide to perform some work to increase the caps of honest nodes in the network such that other identities controlled by the attacker can benefit from the increased cap size. This is prevented by an upper bound that the leech-cap cannot exceed. Hence, any work performed in this way has a diminishing return in the amount of work gained. This makes strongly beneficial Sybil attacks infeasible.

Question:
A question that still needs to be answered is: what can be done to prevent the leaking of leech-caps into a particular region of the graph? When a node leeches from another node, some of the contributor's cap wanders over to the leecher. If the leecher does not provide any work for the network thereafter, this cap never 'moves'. If this happens a lot, the network might lose all its cap and could then stagnate.

Idea:
Maybe introduce another kind of cap, called the seed-cap. While the leech-cap determines how much a node should be willing to contribute, the seed-cap determines how much a node is entitled to receive from the network. When a node requests some work, the seeder determines the requester's seed-cap and its own leech-cap and then provides no more than the minimum of the two. This could prevent excessive leaking of leech-cap out of the honest part of the network and provide some additional robustness against Sybil attacks.
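The resulting allocation rule is then a one-liner (names assumed):

```python
def grant(requested_mb: float, requester_seed_cap: float, my_leech_cap: float) -> float:
    # Serve no more than the minimum of what the requester is entitled
    # to (seed-cap) and what I still risk on unknowns (leech-cap).
    return min(requested_mb, requester_seed_cap, my_leech_cap)
```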

Geographic Proximity:
The seed-cap could be distributed among newcomers based on geographic proximity, i.e. if there is a very large number of bootstrapping nodes in a particular region, then the seed-cap is split up among these nodes. This means an attacker that creates multiple identities does not gain any additional work, because the work they are entitled to is distributed among the multiple identities.

Note that this is not a mature idea yet and still needs to be given some thought.

@ichorid
Contributor

ichorid commented Jun 28, 2019

There are two kinds of resources: those that can be stored (stockpiled) and those that can't. One cannot store bandwidth. Owning a PC that has some bandwidth available to it at all times is like owning a small personal hydro power plant: I don't care who uses my electricity as long as I (first) and my friends (second) have priority in using it at all times. I'm completely fine with giving my "free" power to strangers if no one else wants it, because I can't store it. At any moment, I cannot consume more bandwidth than my uplink allows, thus I can't "save up" bandwidth for the future. Basically, this is a post-scarcity economy.

We don't need accounting. We need priority queues based on reciprocity.

@synctext
Member Author

synctext commented Jul 1, 2019

We need priority queues based on reciprocity.

Excellent point. Many users have monthly caps, so bandwidth is finite. Unused bandwidth can be given away. Still, the effort of leaving your laptop running overnight can be a burden.

@cuileri

cuileri commented Jul 29, 2019

We now have this fraud-resilient overlay idea:

Fraud-Resilient Overlay

We want to create an overlay network which makes it difficult for Sybils to gain a majority in a specific node's neighborhood.

Motivation:

  • Sybils form close proximity regions
  • Latency is difficult to fake (?? to be discussed ??)
  • Limiting the number of neighbors from a specific distance (region) decreases the chance of a Sybil region gaining the majority in the neighborhood of a single node.

Input: Network Layer
Output: Overlay Network where a node's direct neighbors are as distant as possible from each other.
Requirement: A discovery mechanism
Procedure:

  • Using discovery mechanism, learn about as many nodes as possible.
  • Group the nodes according to their distances (ping-pong time, latency).
  • Select nodes from each group fairly (w.r.t groups) and randomly (within group)
  • Use a b-matching algorithm to find the overlay network.
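A rough sketch of the grouping and fair selection steps above (the bucket width and data shapes are assumptions):

```python
import random
from collections import defaultdict

def pick_neighbors(latency_ms: dict, k: int, bucket_ms: float = 20.0) -> list:
    """Group discovered peers into latency buckets, then draw round-robin
    across buckets (fair w.r.t. regions) and uniformly at random within
    each, so no single latency region dominates the neighbor set."""
    buckets = defaultdict(list)
    for peer, rtt in latency_ms.items():
        buckets[int(rtt // bucket_ms)].append(peer)
    for group in buckets.values():
        random.shuffle(group)                 # random within a region
    chosen, order = [], sorted(buckets)
    while len(chosen) < k and any(buckets[b] for b in order):
        for b in order:                       # one pick per region per pass
            if buckets[b] and len(chosen) < k:
                chosen.append(buckets[b].pop())
    return chosen
```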

Notes - Todos:

  • Find another (possibly bidimensional) distance measure: the current idea of measuring distance returns a scalar value. (Two very distant nodes which are at the same distance from another node A are considered by A to be in the same region.)
  • Adapt a distributed b-matching algorithm to this case. Use randomness to create edge weights in such a way that, when the edges are ordered according to their weights, the order visits the regions fairly, one by one.
  • Find and prove the probability of nodes in a Sybil region gaining the majority in one's neighborhood.
  • Find and prove the probability (for a Byzantine node) of getting past a "double-spend detection" mechanism built on top of the fraud-resilient overlay.

Related Works:

Where to use:
The fraud-resilient overlay will help the upper-layer mechanism to have the following features and guarantees:

  • Incentives for cooperation: Given the fraud-resilient overlay network, nodes select their counterparties only from their two-hop neighborhood. A node has to cooperate with its neighborhood to develop its identity.
  • Information hiding detection: (relies on the gossiping between peers)
  • Double-spending detection on Trustchain: (relies on the gossiping between peers)
  • Eventual-consistency of Trustchain: It is now our intuition that, assuming fraud-resilient overlay and a gossiping mechanism on top of this overlay, we can prove eventual consistency of Trustchain.
  • Risk-aware partner selection: Nodes have a threshold value for the maximum amount of tokens they can trade with a specific node, at a specific time. The threshold can change according to positive/negative observations about the trading partner. (Leech-cap and transitive-cap idea of Alexander)
  • Initial drawings of model:

[figure: fraud-resilient overlay model]

@grimadas
Contributor

Some minor comments on the model:

  • Latency can be faked, but only in one direction: a peer can artificially slow itself down, but that would have an effect on the upper layers.
  • Random walks are also helpful for avoiding Sybil regions (?)

@alexander-stannat

Comments about my discussions with Hisashi Ohtsuki at Sokendai University in Japan:
I introduced Hisashi and his team to the trust problem in the Tribler context with a presentation. The idea was to introduce the problem from the perspective of evolutionary biology and evolutionarily stable strategies that induce cooperation in a population. Hisashi and Martin Nowak identified a set of 8 strategies, which they labelled the leading eight, that induce cooperation and deter defectors. They introduced a set of axioms which provide a framework for the properties such strategies need to satisfy in order to foster cooperative communities.

Hisashi's idea was to introduce the notions of reputation scores in combination with behavioural strategy sets. Given a fixed but arbitrary reputation score, agents choose a particular strategy.

Example Reputation Scores:

[table: example reputation scores]

Example Behaviour Strategies

[table: example behaviour strategies]

In Hisashi's and Nowak's case, reputation values are binary, whereby 0 is a bad reputation and 1 is a good reputation. The set of strategies is binary as well, with C standing for "cooperate" and D for "defect".

A behaviour strategy is a map p which determines whether a node with a given reputation value wants to cooperate with another node, based on both reputation values, i.e. p : {0,1} x {0,1} -> {C, D}. For example, p(0,1) = C means that if a node has reputation 0 it wants to help a node with reputation 1.

A reputation dynamic is a function d that assigns a new reputation value to a node that has made a decision whether to cooperate or defect with another node, i.e. d : {0,1} x {0,1} x {C, D} -> {0,1}.

This yields 2^8 = 256 reputation dynamics and 2^4 = 16 behavioural strategies.

Here Hisashi introduces the concept of the relative payoff of a behavioural strategy given a reputation dynamic. He then identifies the 8 most lucrative behaviour strategies, called the "leading eight". It is our goal to find a continuous equivalent to this reputation dynamic which additionally belongs to the leading eight. We explored a number of different graph-theoretical centrality measures: EigenCentrality, k-cores, shortest paths, Katz centrality, betweenness centrality, the Shapley value, etc. We came to the conclusion that none of these metrics are truly Sybil-resistant and that an inverse approach might be smarter, i.e. finding some kind of impossibility result. So far, all of our attempts have been heuristic. Hisashi advised me to take a more theoretical approach and set up a more strictly defined set of axioms for such a dynamic function. Hisashi acknowledged that the problem is not as closely related to evolutionary biology as it first appears, as non-unique identities change the nature of the problem significantly.

I'm currently devising such a set of axioms oriented along the idea of the leading eight:

[table: candidate (d, p) tuples]

The highlighted tuple (d, p) would be the optimal one in this case. So far, I have come up with 4 axioms: Cooperate, Punish, Apologise, Forgive:

  1. Cooperating with reputable people yields an increase in reputation
  2. Punishing, i.e. defecting from a low-reputation peer, does not decrease one's own reputation (it might even increase it)
  3. Apologies are allowed, i.e. a low-reputation peer can still cooperate with other peers in an attempt to recover from its bad reputation
  4. Nodes that have redeemed themselves are forgiven by the rest of the network

If we choose a strict enough set of axioms we can maybe come up with a nice impossibility result.

There are a number of differences between our model and the model in Hisashi's work, namely:

  • Reputation values should not be binary, but continuous
  • A reputation value should be aggregated from a node's entire transaction history and not just its most recent transaction
  • Identities are not unique (Sybil attacks)

@synctext
Member Author

synctext commented Aug 12, 2019

Focus&brainstorm for thesis direction:

  • prisoners dilemma problem
    • multiple rounds
    • large population
    • lot of defectors
    • lots of prior work
  • assumption: complete network knowledge
  • focus on a variant of above problem, usable/valid for computer science
    • stranger policy
    • dissemination of outcomes and limited information storage
    • tit-for-tat is blind to other people's prior interactions
    • reputation systems require broad knowledge of others (e.g. personalised pagerank)
  • compare:
    • blind tit-for-tat
    • pagerank
    • informed tit-for-tat (sketched after this list):
    • repeat prior 1-hop cooperators
    • history of potential interaction partners is known
    • push out defectors due to their history
    • it is simple & effective; hopefully it prevents lazy freeriding
    • idea we propose: tamper-proof and immutable record keeping (e.g. Trustchain)
      • not local information; not global information; now we have 1-hop information
      • key issue: how to select strangers
    • extend with sybil attack problem or not?
      • limit stranger access (any non 1-hop agent)
      • bounded resources given to sybils
      • introduce secure attribute: age, ping time, proof-of-work, passport-identity, etc.
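A toy sketch of the informed tit-for-tat rule compared above (the record format and the clean-record rule are assumptions):

```python
def informed_tit_for_tat(peer: str, my_history: dict, published: dict) -> bool:
    """Decide whether to cooperate with `peer`.

    Blind tit-for-tat only mirrors the last move it saw itself.  The
    informed variant additionally consults the peer's tamper-proof,
    immutable interaction records (1-hop information, e.g. TrustChain
    blocks) and pushes out known defectors before any first-hand
    experience exists.
    """
    if peer in my_history:                        # repeat prior 1-hop outcome
        return my_history[peer][-1] == "cooperated"
    past = published.get(peer, [])                # the peer's published history
    # A true stranger (empty record) gets the initial risk taking;
    # any recorded defection pushes the peer out.
    return all(move == "cooperated" for move in past)
```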

@synctext
Member Author

Please upload your current research notes here + add a brief 1-pager with possible research directions, given the discussion(s) we had (e.g. something where strangers get lower priority).

@alexander-stannat

Research Notes, thesis draft and possible research directions have been added here

@synctext
Member Author

Remarks while going through the current thesis draft.
Section 1.5 possibly focus on "1.5 fully self-organising systems"
Very few academically pure self-organising systems have been successfully engineered.
The file sharing system BitTorrent and the cybercurrency Bitcoin are two leading examples of systems without any single point of failure. etc.
(see terminology abuse, like, "serverless")
Possibly find more appealing "Figure 1", https://commons.wikimedia.org/wiki/File:P2P_Topology.jpg

"2 Problem Description: Sybil Attacks and Misreports"
[dramatic angle] Dealing with scarce resources and our planetary limits is one of the cardinal questions of our times. Freeriding on the public good is a key open scientific problem. All natural resources, like fresh air, pristine forests, and abundant fisheries, can be exploited.
In the digital world a similar freeriding problem exists. etc. The central problems we address in this thesis are the Sybil attack and misreports. We explain in detail the background and ...

4.1 We improve upon the notation and model introduced by [PIMOTTE] and [SEUKEN].
(my advice would be: copy where possible; improve where you can; needs approval from Dion)
Specifically, we add the novelty of a trust graph to the work graph. This enables a model of information availability. This enables us to reason with partial information and lack of information. Split certainty and uncertainty.

Definition 5.1
Our model introduces a mathematical notation for the TrustChain IETF Internet-Draft. Indexes refer to hashpointers, etc.
Recall that in TrustChain, elements of these sequences are 'linked' through hashpointers, and a transaction corresponds to a single block. Engineering details such as digital signatures are covered in other parts of our model and assumptions (or something). [please remove the 14-point bullet list]

"6.3.1 Allocation Policies"
This could be the road towards the key thesis contribution???
What is more fitting/grand: allocation strategy or allocation policy

Define freeriding please

Definition 5.5 (Misreport transactions).
Definition 6.5 (Misreport trust)

Define "stranger" and initial risk taking; people out of the 'shadow of the graph', with no direct observation. Or another approach to a definition: if you know their interaction history, they are no longer strangers. If a stranger shares their interaction history, this action transforms them into a non-stranger.

Come up with some nice proof. Suggestion: bounded stealing, with parameter S, by strangers under this overly strict allocation policy :-)

@alexander-stannat

Thesis draft updated.

@alexander-stannat

alexander-stannat commented Aug 23, 2019

Good meetings yesterday and today. A quick summary of my model (a more comprehensive discussion can be found in my thesis draft):
In yesterday's meeting we introduced the idea of a third graph structure, based on the geographic locations of agents. We now have three graphs: a proximity graph, an interaction graph, and a trust graph. The latter two are old concepts. The proximity graph is introduced with the sole purpose of nullifying Sybil attacks:

The concept of work graph is repeated below:
[figure: work graph definition]

We begin by introducing what we call a similarity vector (it might be renamed later). This is a vector, defined for every node in the subjective work graph, consisting of attributes pertaining to the physical location of the node in question:

[figure: similarity vector definition]

The idea behind the similarity vector is that, by default, Sybil nodes will all have very similar values. The individual components should relate to ping times from the seed node (as well as from certain established nodes, perhaps), location in the network tree (traceroutes), IP addresses, etc. Note that we want to keep this as generic as possible for now.

Next, the proximity graph is derived from the similarity vectors as follows:
[figure: proximity graph definition]

This means that every set of nodes in the network that are very close to one another, by the standard of their similarity vectors, is grouped into one neighbourhood. Note that some false positives might arise, but no false negatives, i.e. some non-Sybils will be grouped into the same neighbourhood as Sybils, but no Sybils should be overlooked. Later on, our goal will be to optimise our similarity vector in such a way as to minimise the number of false positives through use of a clever metric.

Now, using the proximity graph we redefine the subjective work graph as follows:
[figure: collapsing nodes definition]

This means that all nodes that are considered Sybils by our similarity vector are collapsed into one single node, whereby all edges between the potential Sybil nodes are dropped and we obtain a "proxy node" representing all Sybils as one. If this is implemented effectively, all fake edges between malicious nodes are removed.
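A toy version of the neighbourhood grouping and node collapsing (the metric, threshold, and graph encoding are all assumptions):

```python
import math

def neighborhoods(vectors: dict, eps: float) -> list:
    """Greedily group nodes whose similarity vectors (ping times,
    traceroute position, IP features, ...) lie within eps of a group
    representative; false positives are tolerated, missed Sybils are not."""
    groups = []
    for node, vec in vectors.items():
        for group in groups:
            if math.dist(vec, vectors[group[0]]) <= eps:
                group.append(node)
                break
        else:
            groups.append([node])
    return groups

def collapse(edges: set, groups: list) -> set:
    """Collapse each neighbourhood into a single proxy node: edges inside
    a group are dropped, edges crossing its boundary are re-attached to
    the group's representative."""
    rep = {n: g[0] for g in groups for n in g}
    return {(rep[u], rep[v]) for (u, v) in edges if rep[u] != rep[v]}
```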

In the next step we can rely on already-established reputation mechanisms, such as max-flow, netflow, hitting time, etc.

So instead of looking like this:
[figure: one-layer trust model]
our new model would look as follows:
[figure: two-layer trust model]

We would call this a "two-layer trust model", in contrast to our previous "one-layer trust models", i.e. trust models that are based only on the unmodified work graph.

In the next week, we will have to come up with reasonable metrics for the similarity vectors, as well as determine the best possible reputation mechanisms to apply to the modified work graph.

@synctext
Member Author

synctext commented Aug 29, 2019

Thesis raw .PDF link ongoing discussion:

  • you are a theory person, therefore we simply assume the availability of a mechanism to produce a misreport-proof proximity graph.
  • Rapid depletion of attack edges is key, keep the 2nd order depletion effect in mind (e.g. 2-hop friendlies).
    • possibly document this attack and give it a name
    • Proof: having depletion means you are vulnerable to a strongly beneficial attack
    • Your proof idea: with sufficient depletion you are protected against an n-th order strongly beneficial attack
  • after your two-layer trust model is documented in your thesis, possibly we document a very restrictive mechanism which is proven to be Sybil-proof:
    • assume everybody can help each other (repeated PD game, large populations) {file sharing relaying}
    • long-term tit-for-tat with memory
    • proof-of-work, no negative balance
    • do not allow any transitive trust
      • only 1-hop interactions
      • everybody else is a stranger
      • strangers need to build a positive balance first (e.g. become small 1-hop interactors)

@alexander-stannat

Summary of my progress and my meeting with Dion:
I discovered a mistake in Seuken & Parkes. Theorem 1 states that if an accounting mechanism satisfies three conditions (symmetry, independence of disconnected nodes, single-report responsiveness), there always exists a strongly beneficial (passive) sybil attack. Their proof is based on the following subjective work graph:

[figure: subjective work graph from the proof of Theorem 1]

The argument is that, while no work is required to go from the above graph to the one below, node k gains some reputation through the added edge (j,k). If the weight of the edge is large enough, the required inequality holds, and hence the attack must be strongly beneficial.

However, this is actually incorrect, as the work that corresponds to the edge (i,j) is not taken into account at all. Seeing as this attack edge is an indispensable part of a sybil attack, I found myself disagreeing with the assertion of this theorem. I think that this kind of attack constitutes at most a weakly beneficial sybil attack, which is a pretty big difference.

I introduced two new definitions, parallel-report responsiveness and serial-report responsiveness, based on which I have come up with two theorems that actually prove the existence of strongly beneficial sybil attacks under these slightly more restrictive conditions.

[definition: parallel-report responsiveness]

From this definition, in combination with the former definitions, we can derive the existence of a strongly beneficial sybil attack.

[theorem: existence of a strongly beneficial sybil attack]

The second definition is given as follows:

[definition: serial-report responsiveness]

From this we can derive the theorem:

[theorem: existence of a strongly beneficial sybil attack under serial-report responsiveness]

The proofs are on pages 20 and 23 here. In the future we might be able to prove that any accounting mechanism which does not allow parallel- or serial-report responsiveness is strongly sybil resistant. In order to do this, we need to define the reward of a sybil attack more accurately. So far, I have come up with these definitions:

[definition: sybil attack reward in terms of work consumed]

and

[definition: sybil attack reward in terms of reputation gained]

However, the upper definition is given in terms of work consumed, while the lower one is given in terms of reputation gained. The two are not directly comparable, and it is our goal to find a function that translates an increase in reputation into an expected amount of additional data that can be consumed.

Over the course of the next week I will try to come up with such a function. However, this is likely going to be rather difficult, as it is not deterministic but probabilistic. Depending on the allocation policy, a node will receive some data if it is the node with the highest reputation among all nodes in the choice set. However, it does not have any influence over the nodes it is "up against". Hence, an increase in reputation is only related to an increase in the probability of being served, making this a rather difficult problem to solve.

Finally, I discussed the model of physical proximity with Dion, and he suggested that, in order to avoid excessive free-riding by Sybil agents (through being assigned to the same neighbourhood as a large number of honest nodes), I should somehow limit the number of nodes in a neighbourhood of the proximity graph. I will think about this problem as well over the coming week.

@synctext
Member Author

Goal of thesis: a theoretical proof of full Sybil resilience. We obtain this goal by imposing very harsh and restrictive conditions.

Definition: Samaritans are nodes which give away resources for free to total strangers. After a node has acted as a samaritan towards a stranger, the two are no longer strangers.

Definition: Positive net contribution. A particular node only collaborates with another node if the sum of all historical exchanges remains positive.

Definition: Remoteness. Unless you have a positive net contribution, we consider you a stranger and will not interact.

(now we have long-term tit-for-tat, very well studied stuff)

Definition: Custodian responsibility. Custodian responsibility is an alternative form of transitive trust. Taking custodian responsibility means introducing a stranger to a node towards which you have a positive net contribution. The amount of custodian responsibility you take is subtracted from your positive net contribution and assigned to the stranger. The stranger then becomes a samaritan through "vouching". (Not incentive compatible, but we'll get there.)

Desired theorem:
Every accounting system which satisfies the remoteness, custodian responsibility, etc. conditions is misreporting-proof.

Definition: zero-cheating exposure is loosely defined as the guarantee that you can never be cheated. You are very paranoid and unforgiving, and you only engage in interactions with a positive net contribution. You never interact with strangers, only with samaritans; you only interact with strangers under custodian responsibility. Thus there exists an initial investment, proof-of-work, or attack edge. Only a restrictive, weak, minimal form of transitive trust exists (add more weasel words). Open question: how do we restrict this or add incentives? Nodes primarily engage in direct reciprocity and ensure a positive net contribution.

Desired theorem:
Every accounting system which satisfies the "zero-cheating exposure" condition is Sybil attack proof.
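A minimal sketch of the bookkeeping these definitions imply (all names and thresholds invented for illustration):

```python
from collections import defaultdict

class ZeroCheatingLedger:
    """Paranoid accounting: interact only on a positive net contribution."""

    def __init__(self):
        self.balance = defaultdict(float)    # peer -> net contribution received from them

    def will_interact(self, peer: str) -> bool:
        # Remoteness: without a positive net contribution you are a
        # stranger, and this policy never serves strangers.
        return self.balance[peer] > 0.0

    def record(self, peer: str, received: float, given: float) -> None:
        self.balance[peer] += received - given

    def vouch(self, sponsor: str, stranger: str, amount: float) -> bool:
        # Custodian responsibility: part of the sponsor's positive net
        # contribution is subtracted and assigned to the stranger.
        if self.balance[sponsor] < amount:
            return False
        self.balance[sponsor] -= amount
        self.balance[stranger] += amount
        return True
```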

@synctext
Member Author

@grimadas Please define the social profit incentive we just came up with.

@grimadas
Contributor

One idea to take the custodian responsibility concept further: a peer with a high reputation/balance can vouch for (invest in) another peer, using part of its reputation/balance.
This gives peers with a low balance a chance to increase their reputation, while high-reputation peers can see it as a profit opportunity. That might increase total social capital.

@ichorid
Contributor

ichorid commented Sep 13, 2019

ISPs provide space for rent in their datacenters, directly connected to the backbone. It is possible to strategically place a single node in each of these backbone facilities. Such a network would allow the attacker to emulate almost any latency mapping.

@alexander-stannat

Progress Report:

I have solved the problem mentioned above, in which it was our goal to express the investment/cost of a Sybil attack in terms of the work the attacker would be able to consume after the attack had been carried out. However, this was a problematic definition, as determining this return is non-deterministic: it depends on the reputation scores of the sybil nodes, the reputation scores of all other nodes, the allocation policies of the nodes queried for work, and the probability of other nodes being in the choice set of the queried node.

I suggested a model with rounds. In each round, a proportion q of the honest nodes decides to query another honest node for a fixed amount of data d. The honest node to be queried is chosen uniformly at random. Hence, for an honest node i in a network of N honest nodes, the number of nodes in its choice set follows the binomial distribution

$P(|C_i| = k) = \binom{N-1}{k} \left(\frac{q}{N-1}\right)^k \left(1 - \frac{q}{N-1}\right)^{N-1-k}.$

Now, whether a sybil node j will be served by the honest node i is given by the probability of j being the node with the highest accounting value in the choice set, i.e., writing $S_i(\cdot)$ for the accounting value from node i's perspective,

$P\left(S_i(j) > \max_{k \in C_i \setminus \{j\}} S_i(k)\right).$

Now, we define the random variable $X_{i,j}^{(n)}$ as the indicator function which is equal to one if node j is the node with the highest accounting value in the choice set of i at round n. The amount of work node j may be able to consume from the point of the Sybil attack onwards is then

$W_j = \sum_{n} \sum_{i} d \cdot X_{i,j}^{(n)}.$
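For intuition, this rounds model can also be estimated by a quick Monte Carlo simulation; the sketch below and all its parameter choices are illustrative only, not part of the thesis model:

```python
import random

def simulate_sybil_return(scores: dict, sybil: str,
                          q: float = 0.1, d: float = 1.0,
                          rounds: int = 1000) -> float:
    """Each round, a fraction q of honest nodes (plus the sybil) queries
    one uniformly chosen honest node, which serves only the requester
    with the highest accounting value in its choice set.  Returns the
    total work the sybil manages to consume."""
    honest = [n for n in scores if n != sybil]
    consumed = 0.0
    for _ in range(rounds):
        choice_set = {n: [] for n in honest}
        requesters = random.sample(honest, int(q * len(honest))) + [sybil]
        for r in requesters:
            target = random.choice([n for n in honest if n != r])
            choice_set[target].append(r)
        for server, asking in choice_set.items():
            if asking and max(asking, key=scores.get) == sybil:
                consumed += d
    return consumed
```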

Now, this model may return a decent value for the return of a Sybil attack. However, it involves a number of significant restrictions. Hence, we came up with another type of model, which might be more informative and easier to evaluate. Instead of trying to determine the return of a Sybil attack in units of work consumed, we may try to determine the investment of a Sybil attack in terms of accounting values. Recall the earlier definition of a Sybil attack investment:

[definition: sybil attack investment]

We redefine this as follows. Recall the work graph after the Sybil attack, which contains all attack edges as well as all sybils:

[figure: work graph after the Sybil attack]

From this we derive a new, third work graph, in which all sybils (including the attacker themselves) are collapsed into a single node, i.e. all internal edges of the sybil region are dropped and all outgoing and incoming edges are connected to the attacker j:

[figure: collapsed work graph]

Now, we determine the investment of the Sybil attack as the reputation the attacker has legitimately received, i.e.

[definition: investment as legitimately received reputation]

This is a much cleaner and easier definition of Sybil attack investment, which is now easy to compare to the return of a Sybil attack; from here on out we can easily determine the ratio

$\dfrac{\text{return of the attack}}{\text{investment in the attack}}.$

A question arises now: are these two definitions equivalent? Is a Sybil attack that is strongly beneficial under the former definition automatically strongly beneficial under the latter as well? The immediate answer is no. There are sybil attacks in which the amount of reputation that can be gained is bounded while the amount of work that can be consumed as a consequence is not. There are also sybil attacks in which the amount of work that can be consumed is finite while the aggregated reputation is infinite. Hence, whether or not a sybil attack is beneficial highly depends on which definition of sybil attack profit (and investment) we choose.

It is now our goal to find restrictions on the definition of accounting mechanisms that lead to an equivalence between the two concepts. This will be worked on in the days to come.

@cuileri

cuileri commented Sep 23, 2019

Impressive progress. Here are some comments and questions:


However, this was a problematic definition, [...] depends on the reputation scores of the sybil nodes, the reputation scores of all other nodes, the allocation policies of the nodes queried for work and the probability of other nodes being in the choice set of the node queried.

Seuken seemed to overcome this by assuming that "all of this (adding sybils to the network, making multiple false reports about the created sybils, etc.) happens in one step". What I understand here is that this is the "marginal" gain of an action of the attacker, assuming all other things remain unchanged. His reasoning seems weak, as he only says "we do not need a dynamic, multi-step analysis to study sybil attacks", and I couldn't find any other intuition for it. However, I wouldn't say offhand that the "marginal" gain/cost approach is wrong. What do you think? Can't we use the same assumption?


What is q in the following formula?

This distribution would work if you assume all the nodes who are to evaluate (choose/not choose) the honest node have the same number of neighbors.


Now, whether a sybil node will be served by the honest node i, is given by the probability of it being the node with highest accounting value in the choice set

It seems you assumed that the honest node will select its transaction party under the condition of "being the node with the highest accounting value in the choice set". Is there any motivation for this? Why not use a lottery biased by the accounting value?


Instead of trying to determine the return of a Sybil attack in units of work consumed, we may try to determine the investment of a Sybil attack in terms of accounting values.

Interesting approach. It would be nice to see the difference of approaches on a simple illustrative example.

@synctext
Member Author

synctext commented Oct 3, 2019

  • LATEST .PDF of master thesis now 56 pages long (@grimadas please read and comment if possible, today I have a 14:00 meeting with @alexander-stannat; all welcome)
  • https://www.coindesk.com/the-blockchain-paradox, thesis could mention Nobel prize theory in intro
  • please add a "proof of work" model before an identity is trusted. A superior idea to the "custodian responsibility": you simply need to work (relay) for X amount of units to gain the trust of a single individual. Identities thus obtain some cost, and Sybil attacks are no longer profitable: each fake identity needs to invest X work units to gain the trust of a single victim. Is there any related theoretical work on this? Is this deceptively simple idea mentioned in classical game theory?

@synctext
Member Author

synctext commented Oct 4, 2019

[figure: similarity vector, from the thesis]
Great progress! (Took me a while to read in detail.) Your work has now reached "scientific quality", I believe.

Note that language is everything. Similarity vector: nice one. Proximity graph: you can make it a stronger concept by giving it a stronger name, like clone graph, used to estimate suspicious nodes likely controlled by a single entity. Something less strong does not communicate the strength that this concept represents: twin graph, coupling graph, equality graph, or duplicates graph. Clone is a nice negative word that destroys individualism.

Please add the "Beta reputation system" from 2002, credited as being the first such proposal in history. How is your model more evolved and powerful?
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.60.5461&rep=rep1&type=pdf
How does your work differ from belief discounting?

[figure: belief discounting]

Finally, please start considering getting your work in article form before the end of this year. We might be able to get into Auckland, New Zealand; see 2019 workshop version https://sites.google.com/view/trust2019/

@alexander-stannat

Presentation on Friday went very well.

You can find the PowerPoint here.

@synctext
Member Author

Mid-term presentation review. Result: happy. Swap in "We seek a generic incentive mechanism for human cooperation" and discuss this first; then file sharing? Slide 10: cite the related work in a small font? Slide 19: the challenge of open systems is that fake identities are cheap; there is no central control point or Internet police.


@synctext
Member Author

synctext commented Nov 20, 2019

LATEST thesis is now 58 pages long. Solid progress:

Corollary 6.1. Every accounting mechanism SM that satisfies independence of disconnected nodes, symmetry, single-report responsiveness and weak parallel-report responsiveness, has a strongly beneficial sybil attack.
Definition 6.17 (Convergence of parallel reports).
7. A Mathematical Framework for Physical Proximity

Good idea now to make each chapter thesis ready.

Key Chapters:

  • 8: A two-layered Trust Model.
  • 9: A mathematical Framework for Trust and Reputation Mechanisms

Chapter 14 now contains great stuff: "Trust and Reputation are equivalent to the flow of water between containers (equivalently, capacitors). In an interaction, reputation will flow from the higher container to the lower one." In December this might be included in some form in some chapter(s):

  • a perfect cooperation algorithm does not exist
    • either a weakly or strongly beneficial sybil attack exists
    • or there is no informativeness
  • Positive bilateral balance; lack of informativeness (problem with cooperation with strangers)
  • Netflow (needs an Alpha to make it work)
  • Drop-Edge mechanism
  • Personalised pagerank
  • personalised hitting time.
  • priority-based allocation using the "choice set"; not strongly Sybil-proof, but very practical!
  • Perfect algorithm tentatively called 2-hop T4T
    • is slow to bootstrap
    • runs parallel, connects with numerous strangers (2-hop) only
    • always net-positive bilateral connections
    • strong Sybil attack resilience ????

@synctext
Member Author

synctext commented Jan 8, 2020

LATEST thesis is now 67 pages long. Solid progress:

  • Focus is now on finishing the Heavy Math, text polish later
  • ongoing polishing ("What about overlaps and loops? Needs to be re-evaluated...")
  • integrate chapter 2,3 and 4 possibly?
  • Chapter focus: Requirements for sybil-proof Accounting Mechanisms
  • Please consider whether you have created a circular definition at the core of the heavy math: Sybils and non-Sybils execute various algorithms, and the circular algorithm relies on execution by non-Sybils to detect other non-Sybils.
  • 9 A mathematical Framework for Trust and Reputation Mechanisms. Note this actually starts with clean slate math definitions of allocation policy and Sybils.
  • the master defense is within 6 weeks after the green light


@alexander-stannat

Yes, please note that the current final thesis is found here.

I transitioned to the official TU Delft thesis template a few months ago. Sorry if this caused any confusion. I will make sure to "tidy up" the repository in the days to come.

@alexander-stannat

Thesis finished! Final product can be found here. Thanks a lot everyone.

@alexander-stannat

Due to some merge conflicts my final thesis repository has now been moved here.

@synctext
Member Author

synctext commented Jul 14, 2020

Let's make this the main weekly update issue. Target for the coming period: a scientific publication. Also some experimental work with a single Python script that can handle 120 million blocks?
Grander vision of real-world reputation, going even beyond the master thesis.
ToDo: 24 July, write a 10-line-ish goal for your 3-month project here please. Deploy a real algorithm?

@qstokkink
Contributor

@synctext this thesis has been completed. Can we close the issue?

8 participants