Tuesday, November 3, 2009

Looking Up Data in P2P Systems


H. Balakrishnan, F. Kaashoek, D. Karger, R. Morris, I. Stoica, "Looking Up Data in P2P Systems," Communications of the ACM, V. 46, N. 2, (February 2003).


One line summary: This paper provides an overview of the various DHT-based data lookup mechanisms for peer-to-peer systems.

Summary

This paper begins by introducing the lookup problem in peer-to-peer (P2P) systems, which is that of finding data in a large P2P system containing a dynamic set of nodes in a scalable manner without introducing any centralization or hierarchy. The paper focuses primarily on solutions based on the distributed hash table (DHT) abstraction, after briefly outlining the problems with solutions involving a central database or hierarchical distributed storage structures. The main benefits of a DHT-based solution are that it has few constraints, including that all data items be uniquely identified with a numeric key and that nodes store data for other nodes. A DHT implements only a lookup operation that takes as input the identifying key of a data item and returns the location of the node that is responsible for that data item. A DHT lookup algorithm must address the following: mapping keys to nodes in a load-balanced way, forwarding lookups to the appropriate nodes, providing a distance function that describes the closeness of keys to each other and nodes to each other, and adaptively build routing tables.

The various DHT-based lookup algorithms reviewed in this paper include CAN, Chord, Kademlia, Pastry, Tapestry, and Viceroy. These vary in the data structures they use to provide O(log N) lookups, their routing dimensions, emphasis, distance functions, various operation costs, degree of fault tolerance and concurrency, routing, and security measures. For example, Chord maintains a skiplist-like data structure, while Pastry, Tapestry, and Kademlia use a tree-based structure. In contrast, CAN uses a d-dimensional Cartesian coordinate space as its abstraction. With respect to distance functions, Chord uses distance in the skiplist that is unidirectional and asymmetric, while Pastry uses prefixes and numeric distances and so is symmetric but not unidirectional, and Kademlia uses an XOR-based function that is both unidirectional and symmetric. Topics for future work in this area include analyzing operation costs when the system is large and joins and departures are frequent, improving routing latency, providing security mechanisms, and implementing efficient indexing and keyword search.

Critique

There isn’t much to say in terms of a critique since this paper is just a brief introduction to and description of various algorithms, but I did find it helpful and thought the writing was nicely structured, so I would keep it in the syllabus.

3 comments:

  1. You seem to be a bit behind in your blogging. Any thoughts on Chord, I3, etc.?

    ReplyDelete
  2. Will try to be all caught up by the end of today!

    ReplyDelete