Tuesday, December 1, 2009

BotGraph: Large Scale Spamming Botnet Detection


Y. Zhao, Y. Xie, F. Yu, Q. Ke, Y. Yu, Y. Chen, E. Gillum, "BotGraph: Large Scale Spamming Botnet Detection," NSDI'09, (April 2009).


One line summary: This paper describes BotGraph, a system for detecting web account abuse attacks that uses EWMA to catch aggressive signups and a user-user graph to find correlations that distinguish bot-users from normal users.

Summary

This paper presents a system for detecting web account abuse attacks. These are attacks in which spammers use botnets to create many fake email accounts through free web email service providers (such as Gmail and Hotmail) from which to send spam. To detect these attacks, the authors provide an algorithmic means of finding correlations among bot-users and distinguishing them from real users, as well as a mechanism for efficiently analyzing large volumes of data to reveal such correlations. The authors name their approach BotGraph. The goal of BotGraph is to capture the spamming email accounts used by botnets.

BotGraph works by constructing a large user graph and analyzing the underlying correlations. The key observation is that bot-users share IP addresses when they log in and send emails. There are two main components of BotGraph – the first tries to detect aggressive signups and the second tries to detect the remaining stealthy bot-users based on their login activities. To detect aggressive signups the authors use a simple Exponentially Weighted Moving Average (EWMA) algorithm that detects sudden changes in signup activity. The rationale for this is that in the normal case, signups should happen infrequently at a single IP address, or at least be roughly consistent over time, whereas a sudden increase in signups may indicate that the IP address is associated with a bot. To detect stealthy bot-users, the authors build a user-user graph, in which each vertex is a user and the weight of an edge between two users is based on the number of common IP addresses from which the two users logged in. The authors reason that if aggressive signups are limited, each bot-user will need to log in and send emails multiple times from different locations. This results in shared IP addresses because, first, the number of bot-users is often much larger than the number of bots, and second, botnets have a high churn rate, so bot-users will be assigned to different bots over time. Since normal users also share IP addresses when dynamic IP addresses and proxies are used, the authors exclude this case by counting multiple shared IP addresses in the same autonomous system (AS) as one shared IP address. Once this graph is constructed, the authors show that bot-user groups distinguish themselves from normal user groups by forming giant connected components in the graph. The authors then describe a recursive algorithm for extracting the set of connected components from the graph. They prune the resulting set of components to remove normal user groups, and group the remaining components by bot-user group, using two statistics: the percentage of users who sent more than three emails per day and the percentage of users who sent out emails of similar size.
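To make the signup-detection idea concrete, here is a minimal sketch of this kind of EWMA-based change detection. It is not the authors' code; the per-IP daily signup counts, the smoothing factor, and the threshold are all assumptions for illustration.

```python
# A minimal sketch of EWMA-based signup anomaly detection (assumed parameters).
def ewma_anomalies(signup_counts, alpha=0.3, threshold=20):
    """Flag days where observed signups greatly exceed the EWMA prediction.

    signup_counts: list of daily signup counts for one IP address (assumed input).
    alpha: smoothing factor (assumed value).
    threshold: minimum jump over the prediction to flag (assumed value).
    """
    anomalies = []
    prediction = signup_counts[0] if signup_counts else 0.0
    for day, observed in enumerate(signup_counts):
        if observed - prediction > threshold:
            anomalies.append(day)  # sudden signup spike: likely bot-controlled IP
        # EWMA update: prediction_t = alpha * x_t + (1 - alpha) * prediction_{t-1}
        prediction = alpha * observed + (1 - alpha) * prediction
    return anomalies

# Example: a quiet IP address that suddenly signs up dozens of accounts per day.
print(ewma_anomalies([1, 0, 2, 1, 60, 55]))  # -> [4, 5]
```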

The authors next describe how they constructed their large user-user graph from 220 GB of Hotmail login data. They use Dryad/DryadLINQ, though they suggest other programming environments for distributed, data-parallel computing might also work. They describe two methods. The first involves partitioning the data by IP address and using the well-known map and reduce operations, whereas the second involves partitioning the data by user ID and using some special optimizations. They find the second method is better suited to their application and also much faster: 1.5 hours on 240 machines for method 2 compared to over 6 hours for method 1. However, as method 2 has increasing communication overhead, it may be less scalable. The authors describe a number of optimizations as well. The authors evaluated their approach on two month-long datasets. BotGraph detected approximately 21 million bot-users in total. It was able to detect 76.84% and 85.80% respectively of known spammers with the graph approach and 85.15% and 8.15% respectively with the EWMA approach. The authors perform a false positive analysis using naming patterns and signup dates and estimate a false positive rate of 0.44%.
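As a rough picture of what the graph-construction step computes (setting aside the distributed execution), here is a sketch of deriving edge weights from login records. The record layout, the AS-level deduplication, and the minimum edge weight are assumptions for illustration, not the authors' DryadLINQ code; the all-pairs comparison is exactly the part that makes a distributed implementation necessary at Hotmail scale.

```python
from collections import defaultdict
from itertools import combinations

def build_edges(login_records, min_weight=2):
    """login_records: iterable of (user_id, ip_address, as_number) tuples (assumed layout)."""
    user_ases = defaultdict(set)
    for user, ip, asn in login_records:
        # Shared IPs within one AS count only once, to discount dynamic IPs and proxies.
        user_ases[user].add(asn)

    edges = {}
    for u, v in combinations(sorted(user_ases), 2):
        weight = len(user_ases[u] & user_ases[v])  # number of shared login ASes
        if weight >= min_weight:
            edges[(u, v)] = weight
    return edges
```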

Critique


I liked this paper a lot. I’m not sure how much it has to teach about networking per se, not that that’s necessarily a bad thing. It was good that the authors spent some time discussing potential countermeasures attackers might take against their approach. One thing that many other students have mentioned, and that I’m not sure was mentioned in the paper, is that BotGraph will not detect bot-users that are all behind one IP address (e.g. NAT); since it is becoming increasingly common in countries like China for very large numbers of machines to sit behind a NAT, this is something to note. Overall this paper was very good and I think it should be kept in the syllabus. I would be interested to see future work along these lines that uses additional features for correlation and methods other than graphs to detect correlations, as the authors mention.

Not-a-Bot: Improving Service Availability in the Face of Botnet Attacks


R. Gummadi, H. Balakrishnan, P. Maniatis, S. Ratnasamy, "Not-a-Bot: Improving Service Availability in the Face of Botnet Attacks," NSDI'09, (April 2009).


One line summary: This paper presents a system called Not-A-Bot (NAB) that distinguishes human-generated traffic from bot-generated traffic by attesting to human-generated traffic at the clients and verifying those attestations at the servers, in order to mitigate such problems as spam, DDoS attacks, and click fraud.

Summary

This paper presents a system called Not-A-Bot (NAB) for identifying human-generated web traffic as distinguished from bot-generated web traffic. The motivation for this is that bots are responsible for a large amount of spam, distributed denial-of-service (DDoS) attacks, and click fraud, so being able to determine if an email or request is human-generated as opposed to bot-generated would help mitigate these problems. NAB consists of an attester and a verifier. When an attestation is requested by an application for a particular request or email, the attester determines if that request or email was indeed generated by a human, and if so, it attaches a signed statement to it that verifies that it was sent by a human. The verifier runs on the server and checks whether the request or email has an attestation and if it is valid. If it does, the application may choose, for example, to prioritize the request, or increase the score of an email so it is more likely to get through a spam filter. Otherwise it may consider the request or email as more likely to have come from a bot.

NAB assumes that applications and the OS are untrusted and so relies on a Trusted Platform Module (TPM) to load the attester code to ensure that it is trusted. As mentioned, the NAB attester decides to grant an attestation if it determines that a human generated the associated request or email. It does this by guessing, using as a heuristic how recently the last keyboard or mouse activity was observed before the attestation request. If the attestation was requested within a certain amount of time since the last keyboard or mouse activity, the attester grants the attestation. An attestation is non-transferable and is bound to the content of the request it is generated for. An attestation covers the entire content of the application-specific payload and is responder-specific, and where appropriate, challenger-specific. The mechanism for attesting web requests and email in the common case is straightforward. The only complicated case is script-generated email, which requires deferred attestations. The verifier is straightforward and, as mentioned, implements an application-specific policy. The authors provide several example policies for spam, DDoS, and click-fraud mitigation.
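To illustrate the flavor of the attester's heuristic and of content binding, here is a minimal sketch. The activity window, the HMAC construction, and the shared key are assumptions for illustration; in NAB the attestation is produced by attester code whose signing key is protected via the TPM, not by a plain shared-secret HMAC.

```python
import hashlib
import hmac
import time

ACTIVITY_WINDOW_SECONDS = 1.0                      # assumed value, not NAB's
ATTESTER_KEY = b"stand-in-for-tpm-protected-key"   # placeholder key for the sketch

def grant_attestation(request_payload, last_input_time, now=None):
    """Return a signed attestation over the payload, or None if there was no
    recent keyboard or mouse activity."""
    now = time.time() if now is None else now
    if now - last_input_time > ACTIVITY_WINDOW_SECONDS:
        return None  # no recent human input: refuse to attest
    # Bind the attestation to the exact request content so it cannot be
    # transferred to a different (e.g. bot-generated) request.
    return hmac.new(ATTESTER_KEY, request_payload, hashlib.sha256).digest()

def verify_attestation(request_payload, attestation):
    """Server-side check that the attestation matches this exact payload."""
    expected = hmac.new(ATTESTER_KEY, request_payload, hashlib.sha256).digest()
    return hmac.compare_digest(expected, attestation)
```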

The authors next describe their evaluation of NAB. They evaluate the attester with respect to TCB size, CPU requirements, and application changes. They evaluate the verifier with respect to the extent to which it mitigates attacks and the rate at which it can verify attestations. They find that the attester is 500 SLOC out of a 30,000 SLOC total TCB, that the worst-case latency for generating an attestation is 10 ms on a 2 GHz Core 2 processor, and that modifying two programs to include attestations required less than 250 SLOC each. For the verifier, they find that the amount of spam can be reduced by 92% with no false positives, that it can reduce the peak processing load seen at mail servers, that it can filter out 89% of bot-generated DDoS activity while not filtering out human-generated requests, and that it can identify click-fraud activity with more than 87% accuracy without filtering out human-generated clicks.

Critique

I didn’t really think that much of this paper. One criticism is that though adding NAB to applications wouldn’t be technically difficult, as the authors explain, you would still have to make sure a lot of applications (e.g. email clients, web browsers, servers, etc.) actually included it, and then make sure a lot of hosts ran the versions of the applications that included NAB, because it seems it wouldn’t be nearly as useful if not all the clients used it. In their evaluation, all of the client programs ran NAB, but if that weren’t the case it would be less effective. Another criticism, or perhaps point of confusion, is with regard to their deferred attestations, which are supposed to be used for script-generated emails. I don’t see why attackers couldn’t leverage this to generate attestations for their own spam emails or whatever else they wanted. Another criticism is that with their simple heuristic (guessing that an activity is human-generated if it is within a certain amount of time of keyboard or mouse activity), bots can still generate attestations for their own traffic by, as the authors say, harvesting human activity. This is probably still sufficient for generating large amounts of spam and such. They would probably be better off granting attestation requests based on the specific mouse or keyboard activity associated with the request, but in the paper they claim that strategy is too complex to implement.

Skilled in the Art of Being Idle: Reducing Energy Waste in Networked Systems


S. Nedevschi, J. Chandrashekar, J. Liu, B. Nordman, S. Ratnasamy, N. Taft, "Skilled in the Art of Being Idle: Reducing Energy Waste in Networked Systems," NSDI'09, (April 2009).


One line summary: This paper examines the value of using proxies to handle idle-time traffic for sleeping hosts with the goal of reducing wasted energy consumption in networked end-systems; it does this by analyzing and classifying traffic to see what can be ignored or automatically handled and by examining several potential proxy designs.

Summary

This paper examines the problem of reducing wasted energy consumption in powered-on but idle networked end-systems such as desktops in home and office environments. It discusses various solutions, and in particular examines the value of using a proxy to handle idle-time traffic on behalf of sleeping hosts. The main problem is that while vendors have built in hardware support for sleep (S-states) to reduce power consumption while idle, surveys of office buildings indicate that the vast majority of machines are fully on while idle, instead of taking advantage of these sleep states. One reason for this is that a sleeping machine loses its network presence, i.e. it cannot send or receive network messages, and another reason is that users or administrators occasionally want to be able to schedule tasks to run during these idle times. Both of these reasons cause users not to make use of sleep states. This paper thus tries to answer the following questions: (1) Is the problem worth solving? (2) What network traffic do idle machines see? (3) What is the design space for a proxy? (4) What implications does proxying have for future protocol and system design?

To begin to answer these questions, the authors first collect network and user-level activity traces from 250 client machines belonging to Intel employees and attempt to classify the traffic. They first classify each packet as being either broadcast, multicast, or unicast, then whether it is incoming or outgoing. They find outgoing traffic tends to be dominated by unicast while incoming traffic is made up of significant proportions of all three. They estimate the potential for sleep in four scenarios: (a) ignore broadcast and wake for the rest, (b) ignore multicast and wake for the rest, (c) ignore both broadcast and multicast, and (d) wake for all packets. They find that broadcast and multicast are mainly responsible for reducing the amount of potential sleep time and that doing away with just one of broadcast or multicast is not effective. The authors next classify the traffic based on protocol type, and evaluate each protocol on two metrics: total volume of traffic, and something the authors call half-sleep time. A high half-sleep time means that protocol’s packets could be handled by waking the machine up, whereas a low half-sleep time means that to achieve useful amounts of potential sleep time the proxy would have to handle them. They find that the bulk of broadcast traffic is for address resolution and service discovery, and a lot of other broadcast traffic is from router-specific protocols. Broadcast traffic allows for very little sleep in the office, but significantly more in the home. A proxy could easily handle most of these broadcast protocols. Multicast traffic is mostly caused by router protocols and is often absent or greatly reduced in homes as compared to offices. All router traffic is ignorable. From their analysis of unicast traffic, they speculate that it might be possible to ignore or eliminate much of unicast traffic. Finally, they classify traffic into one of three categories regarding the need to proxy that traffic: don’t wake, don’t ignore, and policy-dependent, and into one of three categories regarding the difficulty of proxying that traffic: ignorable/drop, handle via mechanical responses, and require specialized processing.

Next, they present four proxy designs. The first ignores all traffic classified as ignorable and wakes the host for the rest. The second ignores all traffic classified as ignorable, responds to traffic listed as capable of being handled by mechanical responses, and wakes the machine for the rest. The third does the same as the second except that it wakes up only for traffic belonging to a certain set and drops any other incoming traffic. Lastly, the fourth does the same as the third except that it also wakes up for a certain set of scheduled tasks. They find that the simplest proxy, proxy 1, is inadequate for office environments and nearly inadequate for home environments, but that proxy 3 achieves a good amount of sleep time in all scenarios – more than 70% of the idle time. They also find that the effectiveness of proxy 2 depends a great deal on the environment. Given this, the best trade-off between complexity of design and power savings depends on the environment. The authors also note that since scheduled wake-ups are infrequent, the impact on sleep is minimal, so proxy 4 performs practically the same as proxy 3. Finally, they offer a basic proxy architecture that could serve as a framework for building the different proxy designs they considered, and to demonstrate the feasibility of building a proxy, they implemented a simple prototype in Click. The authors end by speculating about how systems could be redesigned to make them more power-aware, thereby simplifying the implementation of proxies, making proxies more effective, or eliminating the need for proxies altogether.
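The difference between the designs is easiest to see as a per-packet decision rule. Here is a minimal sketch of proxy 3's logic; the traffic classes and class names are hypothetical examples, not the paper's taxonomy or its Click implementation.

```python
# A minimal sketch of the third proxy design's decision logic (assumed classes).
IGNORABLE = {"router-multicast", "stp"}            # safe to drop while the host sleeps
MECHANICAL = {"arp", "icmp-echo", "igmp"}          # the proxy can answer these itself
WAKE_SET = {"ssh", "smb", "scheduled-backup"}      # traffic worth waking the host for

def proxy3_decision(packet_class):
    if packet_class in IGNORABLE:
        return "drop"
    if packet_class in MECHANICAL:
        return "respond"        # e.g. answer the ARP request on the sleeping host's behalf
    if packet_class in WAKE_SET:
        return "wake-host"      # e.g. send a wake-on-LAN magic packet, then forward
    return "drop"               # proxy 3 drops all other incoming traffic

# Proxy 2 differs only in the default: unclassified traffic wakes the host
# instead of being dropped.
```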

Critique

One thing I really liked about this paper is how the authors analyzed and classified network traffic before considering proxy design. In retrospect it seems absolutely necessary for guiding the design of a proxy. It was also just informative in general to look at the traffic traces from the perspective of which packets are ignorable, which can be handled automatically, and which are actually “important”. I also liked how they examined several points in the proxy design space and compared them. Overall I thought this was a very thoughtful and well organized paper and I think it should stay in the syllabus.

Cutting the Electric Bill for Internet-Scale Systems


A. Qureshi, R. Weber, H. Balakrishnan, J. Guttag, B. Maggs, "Cutting the Electric Bill for Internet-Scale Systems," ACM SIGCOMM Conference, (August 2009).


One line summary: In this paper the authors suggest a new method for reducing the energy costs of running large Internet-scale systems by rerouting traffic on an hourly basis to data centers in regions where the price of energy is cheaper.

Summary

This paper examines a new method for reducing energy costs of running large Internet-scale systems. This method is based on the observations (1) that electricity prices vary on an hourly basis and are not well correlated across different geographic locations and (2) that large distributed systems already incorporate request routing and replication. Considering this, the problem the authors want to solve is: given a set of datacenters or server clusters spread out geographically, map client requests to clusters such that the total electricity cost in dollars of the system is minimized, possibly subject to constraints such as staying under a maximum response time.

The authors first do an empirical market analysis. They verify that prices are not well correlated at different locations, and also note that the price differentials between any two locations generally vary hourly and somewhat unpredictably, suggesting that a pre-determined assignment is not optimal. They examine the overall differential distribution and conclude that there does exist an opportunity to exploit these differentials for savings. To evaluate their method, they use traffic data from Akamai to derive a distribution of client activity and cluster sizes and locations. They then use a simple model to map cluster-traffic allocations and hourly market prices to energy costs. They handle the issue of bandwidth costs (i.e. changing assignments of clients to clusters could increase bandwidth costs) by estimating the 95th percentile from the data and constraining their reassignments such that this 95th percentile is not increased for any location. To estimate network performance in their model, they use the geographic distance between client and server. They model the energy consumption of a cluster to be roughly proportional to its utilization. The authors assume the system is fully replicated and that the optimization for cost happens every hour. They then use a simple discrete time simulator that steps through the Akamai usage statistics. At each time step a routing module with a global view of the network allocates clusters, and from this they model each cluster’s energy consumption and use observed hourly market prices to calculate expenditures.
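To give a feel for the kind of allocation the routing module makes each hour, here is a greedy sketch. The inputs, the distance cap, and the greedy strategy are assumptions for illustration; the paper's routing module also respects the 95th-percentile bandwidth constraint and optimizes more carefully.

```python
# A minimal sketch of price-aware request routing for one hour (assumed inputs).
def route_requests(client_demand, cluster_prices, distances, capacity,
                   max_distance_km=1500):
    """Assign each client's hourly requests to the cheapest cluster within a
    distance cap, subject to cluster capacity (all inputs are hypothetical).

    client_demand:  {client: requests this hour}
    cluster_prices: {cluster: electricity price this hour}
    distances:      {(client, cluster): km}
    capacity:       {cluster: max requests this hour}
    """
    remaining = dict(capacity)
    assignment = {}
    for client, demand in client_demand.items():
        nearby = [c for c in cluster_prices
                  if distances[(client, c)] <= max_distance_km
                  and remaining[c] >= demand]
        # Fall back to any cluster with room if nothing nearby has capacity
        # (assumes total cluster capacity can absorb the demand).
        candidates = nearby or [c for c in cluster_prices if remaining[c] >= demand]
        cheapest = min(candidates, key=lambda c: cluster_prices[c])
        assignment[client] = cheapest
        remaining[cheapest] -= demand
    return assignment
```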

The authors are able to show that existing systems can reduce energy cost by 2% without significant increase in bandwidth or reduction in client performance. They also find that savings rapidly increase with energy elasticity, which is the degree to which energy consumed by a cluster depends on the load placed on it. Lastly, they find that allowing increased distances between the client and server leads to increased savings.

Critique

This paper was somewhat thought-provoking and brought up some interesting issues. However, what I didn’t really enjoy about this paper is that their model is forced to make so many simplifying assumptions about so many things, such as network performance, cluster power usage, and bandwidth costs, to name a few, that I’m not sure how valid their actual findings turn out to be or how much credence to give them. Along this vein, it would be interesting to know to what extent their results depend on the traffic data they used. Another assumption I didn’t really like is that they assume complete replication in the system in question, so that any request can be routed anywhere. I personally can’t guess how true that is in practice, but my inclination is that it is not necessarily true for all Internet-scale systems, though it may be true for many. Furthermore, while in theory the basic idea seems like it would work, the authors do hint at a few complicating factors (that are not necessarily purely computer science problems), and it’s hard to know what others might arise in actually trying to implement something like this. For instance, the authors assume that rerouting traffic in this way would not increase bandwidth prices, but that may not be true, and companies are averse to increasing their bandwidth costs. As another example, renegotiating existing contracts to take this new method into account (between companies and cluster operators if the company rents co-located space, and between companies and the utility companies) may be difficult in reality. Given all the assumptions the authors are forced to make, and other complicating details that they may or may not be aware of, it’s hard to judge whether or not something like this would be feasible in reality. And, if everybody did this, it does make you wonder how that would change things, since there is likely feedback between the number of requests routed to an area and the price of energy there. So in summary, although the authors didn’t really have a choice but to try to make good assumptions or estimates of the various unknowns in their model, I couldn’t help but be frustrated by this aspect of the paper. Regardless, I was still impressed by their model and simulation and found it interesting.

A Reliable Multicast Framework for Light-weight Sessions and Application Level Framing


S. Floyd, V. Jacobson, S. McCanne, C-G Liu, L. Zhang, "A Reliable Multicast Framework for Light-weight Sessions and Application Level Framing," ACM SIGCOMM Conference, (August 1995).


One line summary: This paper describes Scalable Reliable Multicast, a minimal framework from which applications designers can build multicast functionality suitable to their application; it also describes some analytical results pertaining to request/repair algorithms in multicast, along with several simulations.

Summary

This paper describes a reliable multicast framework called Scalable Reliable Multicast (SRM). SRM builds off of the principles of Application Level Framing (ALF), which “explicitly includes an application’s semantics in the design of that application’s protocol”, and Light-Weight Sessions (LWS), which centers on a “light-weight rendezvous mechanism based on the IP multicast distribution model” with receiver-based adaptation. Thus, SRM is designed to meet only the minimal definition of reliable multicast so as to not force on applications unnecessary overhead for functionality that they do not need. It is also designed to adhere to the core principles of TCP/IP in that it only requires the basic IP delivery model and dynamically adjusts control parameters based on observed performance, much like TCP. The authors argue that receiver-based reliability is more appropriate for SRM than sender-based reliability because the fate-sharing-based coupling of unicast does not generalize well to multicast (due to such factors as the ACK implosion effect and the need for the sender to maintain state about each of the receivers in the receiver set), and because the vocabulary of unicast conventions migrates poorly to multicast. SRM attempts to serve as a skeleton common to scalable, reliable multicast applications, which supply the framework with details such as a namespace, policies and mechanisms for apportioning bandwidth, etc.

The authors go on to describe a network conferencing tool that provides a distributed whiteboard, called wb, which builds off of SRM. In wb, users are members, each of whom has a globally unique identifier which is used to label pages they create and edit. Wb assumes that all data has a unique name, that the name always refers to the same data, that source-IDs are persistent, that IP multicast datagram delivery is available, and that all participants join the same multicast group. The paper describes wb’s instantiation of SRM. It then describes more generally request/repair algorithms for several simple topologies, including chains, stars, and bounded degree trees. In the context of these algorithms, it defines deterministic suppression of duplicate messages as well as probabilistic suppression. They find, via simulations, that their algorithm using fixed timer parameters performs well in random or bounded degree trees when every node in the tree is a member of the multicast group. They use this to motivate the development of an adaptive request/repair algorithm that adjusts timer parameters as a function of delay as well as of duplicate requests or repairs in recent recovery exchanges. They demonstrate trade-offs between low delay and a low number of duplicates.
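The request/repair timers are the heart of the suppression idea: a receiver delays its request by an amount proportional to its distance from the source plus a random component, and cancels it if it hears an equivalent request first. Here is a minimal sketch; the constants and the class interface are assumptions for illustration, not the paper's simulator.

```python
import random

C1, C2 = 2.0, 2.0  # timer constants; the paper's adaptive algorithm tunes such values

def schedule_request(distance_to_source):
    """Delay before sending a repair request: a deterministic part (C1 * d)
    plus a probabilistic part uniform in [0, C2 * d]."""
    return C1 * distance_to_source + random.uniform(0, C2 * distance_to_source)

class Receiver:
    def __init__(self, distance):
        self.distance = distance   # estimated distance to the data source
        self.pending = {}          # data name -> scheduled request send time

    def on_loss_detected(self, name, now):
        self.pending[name] = now + schedule_request(self.distance)

    def on_request_heard(self, name):
        # Someone closer (or luckier) asked first: suppress our duplicate request.
        self.pending.pop(name, None)
```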

Critique

I didn’t really like reading this paper. I don’t really like simulations, for one. It also annoys me that the authors chose what they admit are arbitrary values for some of the parameters in their algorithm. Nevertheless, however simplified the topologies in their analysis section are, I would probably agree that it is good to have some mathematical results for many problems, in this case those pertaining to request/repair algorithms in multicast, in order to provide some intuition to people implementing actual systems and to help guide their choices. In that vein, I would be interested in knowing more about any work that directly builds off of or uses their idea of a framework for scalable, reliable multicast, and in learning exactly how this particular paper was useful. On an unrelated note, it also very seriously annoyed me that the graphs in this paper lacked axis or data labels of any kind.

Scalable Application Layer Multicast


S. Banerjee, B. Bhattacharjee, C. Kommareddy, "Scalable Application Layer Multicast," ACM SIGCOMM Conference, (August 2002).


One line summary: This paper describes NICE, a new application-layer multicast protocol intended primarily for low-bandwidth, data streaming applications with large receiver sets.

Summary

This paper describes a new application-layer multicast protocol, called NICE, for low-bandwidth, data streaming applications with large receiver sets. The problem with network-layer multicast, as opposed to application-layer multicast, is that it has not been widely adopted by most ISPs, so much of the Internet is incapable of native multicast. In application-layer multicast, network infrastructure is left unchanged, while the multicast forwarding functionality is implemented at end-hosts. Application-layer multicast protocols are less efficient than network-layer multicast protocols because they may send the same packet over the same link multiple times. Thus, two good metrics by which to judge application-layer multicast protocols are stress and stretch. Stress is defined per link and is the count of the number of identical packets sent by a protocol over each underlying link in the network. Stretch is defined per multicast group member and is the ratio of the path length from the source to the member along the overlay used by the multicast protocol to the length of the direct unicast path.

The NICE protocol works by logically arranging the multicast group members into a hierarchy. Each host in the hierarchy belongs to a layer, and the hosts in each layer are partitioned into a set of clusters, where each cluster has a cluster leader that is the graph-theoretic center of the cluster. The size of each cluster is bounded between k and 3k-1, and clusters consist of hosts that are close to each other. Hosts are organized into layers according to the rule that all hosts are part of the lowest layer, L(0), then all the cluster leaders of all the clusters in layer L(i) join layer L(i+1). A host is allowed to belong to only a single cluster at any layer. If a host is present in layer L(i) it must be a cluster leader in each of layers L(0), ..., L(i-1). If a host is not present in a layer L(i) it cannot be present in any layer L(j) where j > i. Lastly, there are allowed to be at most log_k(N) layers, and the highest layer has only a single member. The supercluster of any host x is defined as the cluster in the next highest layer to which x’s cluster leader y belongs.

The hierarchy is then used to define different overlay structures for control messages and data delivery paths. In NICE, the control topology is a clique and the data topology is a star, but it is possible to choose other topologies. The NICE protocol assumes the existence of a special host that all hosts know of called the Rendezvous Point (RP), and the RP is always the leader of the cluster at the highest layer of the hierarchy. The NICE protocol consists of three main components: (1) initial cluster assignment when a new host joins, (2) periodic cluster maintenance and refinement, and (3) recovery from leader failures. In initial cluster assignment, the joining host contacts all the members in the highest layer to identify the member closest to it, then contacts the lower-level cluster of that closest member to find the closest member there, and so on until the host finds its L(0) cluster. This process involves O(k log N) query-response pairs. In this process, the center of some clusters may change, so a new cluster leader must be chosen. To aid in cluster maintenance and refinement, each member of a cluster sends periodic heartbeat messages to the other members of its cluster. There is also a method for splitting and merging clusters to keep them within the size bounds, as well as a method for refining attachments in case a host is not able to locate the closest cluster in a layer when joining. Leader failures are dealt with using a remove message if the leader is able to do a graceful leave, or by detecting failures using the heartbeat messages and selecting a new leader.
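Here is a minimal sketch of that join descent; the helper functions (measure_rtt, cluster_of) and the way cluster membership is exposed are assumptions for illustration, not the NICE implementation. Each step probes O(k) cluster members and there are O(log_k N) layers, which is where the O(k log N) query cost comes from.

```python
def join(new_host, rendezvous_point, measure_rtt, cluster_of, top_layer):
    """Find the L(0) cluster a new host should join.

    measure_rtt(a, b): estimated network distance between hosts a and b (assumed helper).
    cluster_of(host, layer): members of host's cluster at the given layer (assumed helper).
    """
    layer = top_layer
    candidates = cluster_of(rendezvous_point, layer)
    while layer > 0:
        # Probe the members of the current cluster and descend toward the closest one.
        closest = min(candidates, key=lambda member: measure_rtt(new_host, member))
        layer -= 1
        candidates = cluster_of(closest, layer)
    return candidates  # the L(0) cluster that new_host should join
```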

To evaluate NICE, the authors first simulated their protocol and compared it to three other schemes: multi-unicast, native IP multicast (CBT), and the Narada application-layer multicast protocol. Their main findings are that NICE data paths have stretch comparable to Narada, that the stress on links and routers is lower in NICE, especially as the multicast group size increases, that the failure recovery of the two schemes is comparable, and that this performance is achievable with orders of magnitude lower control overhead for group sizes greater than 32. The authors also implemented the NICE protocol and tested their implementation in a wide-area testbed over a period of a month. They observe a maximum packet loss of 1% as members join and leave the group at random and an average control overhead of less than 1 Kbps for groups of size 100.

Critique

I liked this paper and it was informative to read about application-layer multicast as opposed to network-layer multicast. Even though the performance at the application layer is, from what I can see, pretty much unavoidably worse than at the network layer, if the authors are right that ISPs do not tend to support native multicast then that is a good enough reason to use application-layer multicast despite the performance hit. I thought this paper did a nice job of describing how the NICE hierarchy is built, but the authors could have done a better job explaining the control and data paths used. Overall, as it appears that there are many application-layer multicast protocols that have been developed, this one seemed like as fine a choice as any to read about, though it’s hard to say without having read most of the other application-layer multicast protocol papers. It might be nice to read a survey paper like we did for DHT-based lookup protocols that summarizes the different kinds of multicast protocols and their differences.

Thursday, November 12, 2009

X-Trace: A Pervasive Network Tracing Framework


R. Fonseca, G. Porter, R. H. Katz, S. Shenker, I. Stoica, "X-Trace: A Pervasive Network Tracing Framework," NSDI'07, (April 2007).


One line summary: This paper describes a diagnostic tracing framework for distributed or networked applications called X-Trace that is intended to be used as a debugging tool.

Summary

This paper describes a tracing framework for distributed or networked applications called X-Trace. X-Trace is intended to be used as a diagnostic tool. It works by tagging each application task with metadata including an identifier, and each network operation originating from that task is tagged with that task identifier, along with data relevant to that operation. From a trace of these tags, it is possible to construct a task tree, which can be used to debug networked applications. In X-Trace, the trace request is sent in band, while the collected trace data itself is sent out of band back to the requesting source. The authors implemented X-Trace and tested the performance of a web server and database back-end with X-Trace enabled. They observed a small decrease in performance, but feel that this is justified given that X-Trace is meant to be used for debugging. The authors also provide a number of usage scenarios.
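Here is a minimal sketch of the metadata-propagation idea, with a plain list standing in for the out-of-band report channel; the class and field names are assumptions for illustration, not the X-Trace library's API.

```python
import uuid

REPORTS = []  # stand-in for the out-of-band report channel back to the requester

class XTraceMetadata:
    def __init__(self, task_id=None, parent_op=None):
        self.task_id = task_id or uuid.uuid4().hex
        self.parent_op = parent_op

    def next_op(self, label):
        op_id = uuid.uuid4().hex[:8]
        # Out-of-band report: which task, which operation, and which operation caused it.
        REPORTS.append({"task": self.task_id, "op": op_id,
                        "parent": self.parent_op, "label": label})
        # In-band metadata to attach to any downstream request this operation makes.
        return XTraceMetadata(self.task_id, op_id)

# Example: an HTTP request that fans out to a cache lookup and a database query.
root = XTraceMetadata().next_op("http-request")
root.next_op("cache-lookup")
root.next_op("db-query")
# REPORTS now contains enough (op, parent) edges to rebuild the task tree offline.
```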

Critique

I didn’t think that this paper was a particularly great idea. It doesn’t seem that difficult of a concept to come up with, but it does seem very unlikely that it would actually become widely used because it requires modifications to applications and network protocols. It also would not be very useful if partially deployed, as the authors mention. Other drawbacks that the authors mention include security concerns, non-tree request structures that can’t be captured, the difficulty in distinguishing lost reports from other problems and in constructing causal connections if reports are lost, the extra traffic caused by X-Trace, and the difficulty in ascertaining whether X-Trace is actually a useful tool in helping system administrators and other users. The authors’ treatment of these issues did not really convince me that X-Trace was still a worthwhile consideration. I suspect that alternative, and potentially already available, diagnostic tools and methods could be just as useful, if not more so, especially considering the potential difficulty in deploying X-Trace. I think something that requires no modifications to existing applications and protocols is vastly more appropriate.

NetFPGA: A Tool for Network Research and Education


G. Watson, N. McKeown, M. Casado. "NetFPGA: A Tool for Network Research and Education," 2nd Workshop on Architecture Research using FPGA Platforms (WARFP), (February, 2006).


One line summary: This paper describes a hardware platform called NetFPGA that is intended to allow students and researchers to quickly deploy and debug new networking hardware in an operational network.

Summary

This paper describes NetFPGA, a hardware platform that enables the development and deployment of networking hardware. The original goal was to allow students to gain experience with the link and physical layers of networking. It later became interesting for use in network research. NetFPGA went through two versions, with the second improving upon the first in a number of ways. It has been used in several courses and is currently being tested for use in two research projects, Rate Control Protocol (RCP) and shunting for intrusion detection.

Critique

NetFPGA sounds like it would be fun to play with and I wish I could have taken a course that uses it. However I’m not sure why this paper is in the syllabus. I don’t really think it adds anything very useful.


A Policy-aware Switching Layer for Data Centers


D. Joseph, A. Tavakoli, I. Stoica, "A Policy-aware Switching Layer for Data Centers," ACM SIGCOMM Conference, (August 2008)


One line summary: This paper presents a new layer-2 mechanism for supporting middleboxes in datacenters called the PLayer that incorporates pswitches, which allow administrators to specify the sequence of middleboxes a given type of traffic should traverse.

Summary

This paper presents PLayer, a new layer-2 design for data centers consisting of policy-aware switches called pswitches. The main goal of the PLayer is to provide a mechanism to support the use of middleboxes in datacenters. The problems with using middleboxes in datacenters today are that (1) they are hard to configure so as to ensure traffic traverses the correct sequence of middleboxes without physically changing the network topology or overloading other layer-2 mechanisms, which is difficult and error-prone, (2) network inflexibility makes introducing changes difficult and can result in fate-sharing between middleboxes and other traffic flows, and (3) some traffic is often forced to traverse some middleboxes unnecessarily and load balancing across multiple instances of a middlebox is difficult. The PLayer seeks to address these problems by providing correctness, flexibility, and efficiency. It does this by adhering to two main design principles: (1) separate policy from reachability and (2) take middleboxes off the physical network path.

The PLayer consists of pswitches, which forward frames according to an administrator-specified policy. Policies define the sequence of middleboxes that a given type of traffic should traverse and are of the form “[start location, traffic selector] → sequence”, where a traffic selector is a 5-tuple consisting of source and destination IP addresses and port numbers and the protocol type. These policies get translated into rules of the form “[previous hop, traffic selector] → next hop”. Administrators specify policies at a centralized policy controller, which disseminates those policies to each pswitch. A centralized middlebox controller monitors the liveness of middleboxes and keeps the pswitches informed of failures. The PLayer requires minimal infrastructure changes because only switches need to be replaced; it uses encapsulation and is incrementally deployable. It allows the use of unmodified middleboxes and servers by ensuring packets reach middleboxes and servers only in the proper Ethernet format, using only nonintrusive techniques to identify previous hops, and supporting the various middlebox addressing requirements. It supports the use of non-transparent middleboxes, that is, middleboxes that modify frame headers or content, through the use of per-segment policies. The PLayer can be enhanced through the use of stateful switches, which store hash tables in order to track the processing of a flow. PLayer guarantees correctness under network, policy, and middlebox churn, as demonstrated in the paper. The end of the paper includes a description of the implementation of pswitches in Click and a validation of their functionality. It also includes a performance test, and as expected, pswitches achieve less TCP throughput and increased latency, as do middleboxes deployed with them. It concludes with a summary of the limitations of the PLayer. Briefly, these include the introduction of indirect paths since the PLayer directs traffic to off-path middleboxes, difficulty in traffic classification and policy specification, incorrect packet classification when the 5-tuple is insufficient for correct classification, failure of the PLayer if middleboxes are incorrectly wired, the existence of some policies the PLayer cannot support, and the added complexity of pswitches.
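To make the policy-to-rule translation concrete, here is a minimal sketch; the function names, the selector encoding, and the exact-match lookup are assumptions for illustration, not the pswitch implementation (real selectors are matched as 5-tuples with wildcards).

```python
# A minimal sketch of compiling a middlebox-sequence policy into per-hop rules
# of the form [previous hop, traffic selector] -> next hop (hypothetical format).
def compile_policy(start_location, selector, middlebox_sequence, final_dest):
    """Example: compile_policy("core-router", ("*", "10.0.1.0/24", "*", 80, "tcp"),
                               ["firewall", "load-balancer"], "web-servers")"""
    hops = [start_location] + middlebox_sequence + [final_dest]
    rules = []
    for prev_hop, next_hop in zip(hops, hops[1:]):
        rules.append({"previous_hop": prev_hop,
                      "selector": selector,
                      "next_hop": next_hop})
    return rules

def forward(frame_prev_hop, frame_selector, rules):
    """What a pswitch does per frame: look up the matching rule and forward."""
    for rule in rules:
        if rule["previous_hop"] == frame_prev_hop and rule["selector"] == frame_selector:
            return rule["next_hop"]
    return None  # no policy match: fall back to default layer-2 forwarding
```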

Critique

I thought the PLayer seemed like a good idea, and very well-motivated. I don’t think we should have necessarily read this version of the paper, however. Some sections, such as the formal analysis and the appendix, didn’t really add much toward understanding the PLayer, but rather just made the paper an annoyingly long read. With respect to the PLayer itself, it seems that if it could overcome its poor performance when compared to current datacenter switches, it could become pretty competitive. Or, at least the idea of allowing administrators to specify which traffic should traverse which middleboxes in a flexible way requiring minimal changes to existing infrastructure seems like an idea that could be very applicable, and perhaps companies such as Cisco would do well to incorporate some of those ideas into their products.

Internet Indirection Infrastructure


I. Stoica, D. Adkins, S. Zhuang, S. Shenker, S. Surana, "Internet Indirection Infrastructure," ACM SIGCOMM Conference, (August 2002).


One line summary: This paper introduces the Internet Indirection Infrastructure, an overlay architecture for the Internet that is designed to make it easy to implement otherwise difficult to build applications such as anycast, service composition, multicast, and mobility.

Summary

This paper proposes an overlay framework for the Internet called the Internet Indirection Infrastructure (i3) that provides a communication abstraction to facilitate the implementation of applications such as multicast, anycast, and mobility. These are typically difficult to implement in the Internet, which was originally designed to provide unicast point-to-point communication between fixed locations. Many current solutions that provide functionality for one of these applications aren’t general enough to support the others. i3 seeks to resolve that, primarily by adding a layer of indirection that decouples senders from receivers. Senders and receivers thus need not be aware of each other’s existence or location. This is achieved by a mechanism in which senders send packets to logical identifiers and receivers express interest in receiving packets destined to that identifier by inserting triggers. Thus, packets are pairs (id, data) and triggers are pairs (id, addr). IDs are m bits long and longest prefix matching is used to match triggers to IDs. A set of servers in the overlay network that makes up i3 is responsible for storing these triggers and forwarding packets to interested receivers. Each ID maps to a unique server. A generalization of i3 allows stacks of identifiers to be used in order to send packets to a series of identifiers, like source routing.
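Here is a minimal sketch of the rendezvous abstraction at a single i3 server; it ignores the Chord-based partitioning of IDs across servers and i3's requirement of an exact match on a prefix of the ID, and the bit-string IDs and class interface are assumptions for illustration.

```python
class I3Server:
    def __init__(self):
        self.triggers = []  # list of (trigger_id, receiver_addr)

    def insert_trigger(self, trigger_id, receiver_addr):
        self.triggers.append((trigger_id, receiver_addr))

    @staticmethod
    def common_prefix(a, b):
        n = 0
        while n < len(a) and n < len(b) and a[n] == b[n]:
            n += 1
        return n

    def send(self, packet_id, data, deliver):
        """Deliver (packet_id, data) to the receivers whose triggers best match it."""
        if not self.triggers:
            return  # no matching trigger: the packet is dropped
        best = max(self.common_prefix(packet_id, t) for t, _ in self.triggers)
        for trigger_id, addr in self.triggers:
            if self.common_prefix(packet_id, trigger_id) == best:
                deliver(addr, data)

# Multicast falls out naturally: several receivers insert triggers with the same id.
server = I3Server()
server.insert_trigger("10110010", "host-A")
server.insert_trigger("10110010", "host-B")
server.send("10110010", b"hello", lambda addr, data: print(addr, data))
```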

The authors provide examples of how i3 might be used. These include service composition, heterogeneous multicast, server selection, and large scale multicast. They then discuss several design and performance issues. One consideration is that the overlay network on which i3 is built must be robust, efficient, scalable, and stable. Thus, the authors’ prototype implementation of i3 is built on top of Chord. This provides many of the desired properties just mentioned. Some other advantages of i3 including those inherited from the Chord overlay network include that it is incrementally deployable and can provide support for legacy applications. Some of the main concerns regarding i3 are with respect to routing efficiency, hot spots, and security. The authors address each in turn and suggest possible ways to address these concerns. Many of these solutions rely on the concept of private versus public triggers. Public triggers are knowable by all end hosts in the system and are used to exchange private triggers with which subsequent communication occurs. This concept can be used to address routing efficiency concerns. The main problem here is triangle routing: the server on which a trigger is stored and through which packets are forwarded may be far away from either of the end hosts. To address this the authors suggest that end hosts probe servers to find the closest ones and choose private triggers that will be stored on these nearby servers. One security concern, eavesdropping, can also be addressed by the use of private triggers. Hot spots can be addressed by replicating triggers across more than one server. Trigger hijacking can be prevented by inserting two triggers that chain together such that the shared part of each trigger is only known to the host itself. DoS attacks can be addressed by a series of mechanisms, including challenging hosts that insert triggers to verify they intended to insert the trigger, having servers use Fair Queuing, and implementing loop detection. The authors argue that these measures make security in i3 no worse than in today’s Internet. The authors provide some simulation results, and as expected, there is some latency stretch associated with routing over i3. They also experiment on their prototype, examining the latency for trigger insertion, packet forwarding, internal i3 routing, and throughput.

Critique

I liked how the authors of this paper built upon many of the research ideas we’ve read about and really integrated a lot of the “received wisdom” into their system (e.g. indirection good). I thought they did a nice job of trying to address every potential criticism of i3 by suggesting possible solutions. However, in practical terms, I think the performance and security issues associated with i3 are in reality too much work to fix and the stigma or concern over them too great for i3 to ever actually be used. But I’m getting pretty used to the idea that most research ideas that are published will never actually see real use, at least in the form in which they are initially published, even if they seem like the greatest thing ever invented. I suppose a lot of times it is enough that the underlying ideas and contributions make it out there somehow, whether as components of or inspiration for other systems.

DNS Performance and the Effectiveness of Caching


J. Jung, E. Sit, H. Balakrishnan, "DNS Performance and the Effectiveness of Caching," IEEE/ACM Transactions on Networking, V. 10, N. 5, (October 2002).


One line summary: This paper studies the Domain Name System with respect to client-perceived performance and the effectiveness of caching by analyzing several sets of real trace data including DNS and TCP traffic.

Summary

This paper examines DNS performance from the perspective of clients by analyzing DNS and related TCP traffic traces collected at MIT and the Korea Institute of Science and Technology (KAIST). It seeks to answer two questions: (1) what the performance perceived by clients in terms of latency and failures is, and (2) how varying the TTL and cache sharing impacts the effectiveness of caching. The paper first provides an overview of DNS. It then explains the methodology. The study uses three sets of traces that include outgoing DNS queries and responses and outgoing TCP connections. It collects various statistics on these traces for analysis, including, for a given query, the number of referrals involved, the lookup latency, and whether the query was successful. The dominant type of query the authors observed was A, at approximately 60%, followed by PTR at approximately 30% and MX and ANY making up a roughly equal share of the remainder. Only 50% of the queries were associated with TCP connections and 20% of the TCP connections were not preceded by DNS queries.

The rest of the paper is organized with respect to the two main questions it attempts to answer. First, they examine client-perceived performance. They find median lookup latencies of 85 and 97 ms. 80% of lookups are resolved without any referrals, and the number of referrals increases the overall lookup latency. About 20-24% of lookups receive no response. Of the answered lookups, 90% involved no retransmissions. Queries that receive no referrals generate much more wide-area traffic than those that do. These last three facts suggest that many DNS name servers are too persistent in their retry strategies. They also find that most of the negative responses are due to typos and reverse lookups for addresses that don’t have a host name, and that negative caching does not work as well as it should, probably because the distribution of names causing negative responses is heavy tailed. They find 15-18% of lookups contact root or gTLD servers and 15-27% of such lookups receive negative responses. Next they examine the effectiveness of caching. Because the distribution of domain name popularity is very long tailed, many names are accessed just once, so having a low TTL doesn’t hurt because caching doesn’t really help for those names. However, not caching NS records would greatly increase loads on root and gTLD servers, so TTL values should not be set too low for these. Also due to the distribution of domain name popularity, sharing caches does not increase the hit rate very much after a certain point, because while some names are very popular, the rest are likely to be of interest to just one host. They also found that increasing the TTL value past a few minutes does not greatly increase the cache hit rate because most cache hits are produced by single clients looking up the same server multiple times in a row. This summarizes most of their main findings.

Critique

I always really like papers that examine systems like DNS in this way because I think it’s interesting that even though we built it, we don’t always understand and aren’t able to predict the kinds of observations that we later make in studies like this. As always, it would be interesting again to go back and do a similar study now that about a decade of time has passed and see what turns up. One thing in this paper that I am confused about is how the authors could have found that such a significant fraction of lookups never receive an answer. I don’t think they provide much in the way of an explanation for this, if I remember correctly. Are these unanswered lookups just due to dropped packets? Because that seems like a lot of dropped packets, and I guess I find that surprising, although I have no real basis for thinking so.

Development of the Domain Name System


P. Mockapetris, K. Dunlap, "Development of the Domain Name System," ACM SIGCOMM Conference, 1988.


One line summary: This paper describes the design and deployment of the Domain Name System and reflects on its surprises, successes and shortcomings from the perspective of some years after its initial introduction.

Summary

This paper discusses the design and development of the Domain Name System (DNS), candidly reflecting on its history and future. DNS provides the naming service for the Internet that primarily maps hostnames to IP addresses, among other things. Prior to the development and adoption of DNS, the method for disseminating the mappings of hostnames to IP addresses was the HOSTS.TXT file, centrally maintained on a server at SRI and distributed to machines on the Internet via file transfers. As the number of hosts on the Internet grew it became clear that this approach was not scalable. Thus another was needed, and the one developed was the DNS. The design goals of the DNS were (1) it must provide the same information as the old HOSTS.TXT system, (2) it must be able to be maintained in a distributed manner, (3) it must have no obvious size limits for names and associated data, (4) it must interoperate across the Internet, and (5) it must provide reasonable performance. In the light of these constraints, it had to be extensible and avoid forcing a single OS, architecture, or organization onto its users. Initially, the design of DNS tried to balance between being very lean and being completely general.

DNS is a hierarchical system. The main components of this system are name servers and resolvers. Name servers contain mappings and answer queries. Resolvers provide the client interface, including algorithms to find a name server to query for the information desired by the client. The DNS name space is a tree, with each node having an associated label; the name of a node is the concatenation of all the labels on the path from the root of the tree to that node. Labels are case-insensitive and the zero-length label is reserved for the root. DNS decouples the tree structure from implicit semantics in order to provide applications with more choices. Each name in the DNS has an associated set of resource records (RRs), each with a type and class field along with data. Some of the more common types include A records that map hostnames to IP addresses, PTR records that provide a reverse map from IP addresses to hostnames, and NS records that map zones to name servers. A zone is a contiguous section or subtree of the namespace that is controlled by a specific organization; for example, UC Berkeley controls berkeley.edu. The controlling organization is responsible for delegating subzones, maintaining the data of that zone, and providing redundant name servers for that zone. In addition to distributing data through these zones, DNS name servers and resolvers also cache RRs, which are removed from the cache after their TTL expires.
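As a toy illustration of the hierarchy and of TTL-based caching (collapsing the actual name server/resolver message exchange into a single in-memory table), here is a sketch; the zone data, record layout, and cache structure are all assumptions for illustration.

```python
import time

ZONES = {  # hypothetical authoritative data: zone -> list of (type, ttl_seconds, value)
    ".":             [("NS", 86400, "edu-servers")],
    "edu.":          [("NS", 86400, "berkeley-ns")],
    "berkeley.edu.": [("A", 3600, "128.32.0.1")],
}
CACHE = {}  # (name, rtype) -> (expiry_time, value)

def resolve(name, rtype="A"):
    key = (name, rtype)
    if key in CACHE and CACHE[key][0] > time.time():
        return CACHE[key][1]  # cache hit: no traffic generated
    # Walk the delegation chain from the root toward the query name; in real DNS
    # the NS records found at each step tell the resolver which server to ask next.
    labels = name.rstrip(".").split(".")
    for i in range(len(labels), -1, -1):
        zone = ".".join(labels[i:]) + "." if i < len(labels) else "."
        for rr_type, ttl, value in ZONES.get(zone, []):
            if zone == name and rr_type == rtype:
                CACHE[key] = (time.time() + ttl, value)  # cache until the TTL expires
                return value
    return None  # would be a negative response in real DNS

print(resolve("berkeley.edu."))  # -> 128.32.0.1 (and now cached)
```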

In addition to describing the basics of DNS, the paper describes its early implementation status and its first deployment in the Berkeley network. This was apparently painful but necessary. The paper then goes on to describe a number of surprising issues in the operation of the DNS. The first was that the assumption that the form and content of the information to be stored in the DNS were already well defined turned out to be a poor one. A second is related to performance. The performance of the underlying network was initially much worse than the designers of DNS anticipated, and as a result DNS query performance was poor. Also, they found it difficult to make reasonable measurements of DNS performance due to the interference of unrelated effects. The third surprise was the high number of negative responses to queries, leading to the need for caching such responses, which is referred to as negative caching. There were a number of very successful aspects of DNS as well. These include the choice of using a variable-depth hierarchy, the organization-oriented structure of names, the use of UDP to access name servers, the feature that allows a name server to include additional data in response to a query if it sees fit, allowing it to anticipate some requests, the use of caching, and the agreement to use organizationally structured domain names for mail addressing and routing. Lastly, shortcomings of the DNS include difficulty in defining new classes and types of RRs, difficulty in upgrading applications to use DNS, and difficulty in conveying to system administrators the correct way of configuring and using DNS. The paper concludes by offering several pieces of advice and describing some future directions for DNS.

Critique

I really liked this paper; I almost always enjoy papers that provide historical perspective and reflection on well-known systems, like this one and the “design of the DARPA Internet protocols” paper that we read at the beginning of the semester. I especially thought that a lot of the lessons learned and resulting advice contained in this paper were just good system-building advice, plain and simple. Some particular gems I liked include, “Documentation should always be written with the assumption that only the examples are read,” “It is often more difficult to remove functions from systems than it is to get a new function added,” “The most capable implementors [sic] lose interest once a new system delivers the level of performance they expect; they are not easily motivated to optimize their use of others’ resources or provide easily used guidelines for administrators that use the systems,” and “Allowing variations in the implementation structure used to provide service is a great idea; allowing variation in the provided service causes problems.”

Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications


I. Stoica, R. Morris, D. Karger, F. Kaashoek, H. Balakrishnan, "Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications," ACM SIGCOMM Conference, 2001.


One line summary: This paper presents Chord, a protocol for a distributed lookup service built on a distributed hash table abstraction.

Summary

This paper presents a distributed lookup protocol called Chord that is simple and has provable correctness and performance properties. It is based on the distributed hash table (DHT) abstraction. It addresses the problems of load balancing, decentralization, scalability, availability, and flexible naming. Chord specifies how to map key/value pairs to storage nodes. It does this through the use of consistent hashing. This works as follows: each of the nodes participating in Chord is assigned an identifier based on the hash of its IP address. Each key also has an identifier that is the hash of the key. The identifiers form an identifier circle that has 2^m places in it, where m is the length of the identifier. The identifier of a node determines its place in the circle. A key is assigned to the first node that has an identifier equal to or following it on the circle. The node that the key is assigned to is responsible for storing the corresponding key/value pair. Consistent hashing is used for this because it theoretically ensures that key/value pairs are roughly evenly distributed across nodes.


Each node in the circle stores a pointer to its successor so that queries for a key/value pair can be routed to the correct node. Each node also stores a finger table in which entry i contains a pointer to the node that is at least 2^(i-1) places ahead of it on the circle. As an optimization, each node can also store a pointer to its predecessor. The result of this setup is that a query for a key/value pair takes O(log N) steps to answer, where N is the number of nodes. When a node joins Chord it is logically placed in the spot on the circle to which its identifier maps. The predecessors and fingers of the new node and the other nodes are updated, and all key/value pairs with identifiers equal to or following the new node’s identifier are transferred from the node where they were previously stored to the new node. To join the circle, a node need only know the identity of one other node already in Chord. From that node it can learn its place on the circle. Chord periodically runs a stabilization procedure in which each node verifies its successor. To deal with failures, each Chord node keeps a list of its n successors so that it can find a new one when its old one fails.
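The finger-table idea can be sketched as below. This is not the paper's code, and for clarity the search is done locally over a known node list, whereas real Chord forwards the query between nodes:

    from bisect import bisect_left

    def successor(sorted_node_ids, ident):
        """First node identifier equal to or following `ident` on the circle."""
        idx = bisect_left(sorted_node_ids, ident)
        return sorted_node_ids[idx % len(sorted_node_ids)]

    def finger_table(sorted_node_ids, n, m):
        """Entry i points to successor(n + 2^(i-1)) on the circle, for i = 1..m."""
        return [successor(sorted_node_ids, (n + 2 ** (i - 1)) % (2 ** m))
                for i in range(1, m + 1)]

    def in_interval(x, a, b):
        """True if x lies in the open circular interval (a, b)."""
        return a < x < b if a < b else (x > a or x < b)

    def closest_preceding_finger(fingers, n, key):
        """Scan the finger table backwards for the finger closest to, but still
        before, the key; repeating this at each hop roughly halves the remaining
        distance, which is where the O(log N) lookup cost comes from."""
        for f in reversed(fingers):
            if in_interval(f, n, key):
                return f
        return n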


The paper next provides an evaluation of Chord. The authors first find that due to the non-uniform distribution of node identifiers, keys are not evenly mapped across the nodes, so they suggest a workaround involving virtual nodes. They next show that the average path length traversed by a query is ½ log N, where N is the number of nodes. They find that Chord experiences virtually no lookup failures due to node failures and very few lookup failures (~3%) for lookups that occur during stabilization. Lastly, they find that lookup latency grows slowly with the number of nodes.


Critique

I liked reading this paper because I’ve seen Chord used in a number of other systems. Briefly, I have only two main criticisms of this paper. First, the load balancing or distribution of keys among Chord nodes without using virtual nodes is disappointing but foreseeable. Second, it might be nice if Chord provided some optional automatic or built-in way of replicating keys across nodes. It seems straightforward to do, so it would be nice if it were provided automatically instead of being left to the user.


Tuesday, November 3, 2009

Looking Up Data in P2P Systems


H. Balakrishnan, F. Kaashoek, D. Karger, R. Morris, I. Stoica, "Looking Up Data in P2P Systems," Communications of the ACM, V. 46, N. 2, (February 2003).


One line summary: This paper provides an overview of the various DHT-based data lookup mechanisms for peer-to-peer systems.

Summary

This paper begins by introducing the lookup problem in peer-to-peer (P2P) systems, which is that of finding data in a large P2P system containing a dynamic set of nodes in a scalable manner without introducing any centralization or hierarchy. The paper focuses primarily on solutions based on the distributed hash table (DHT) abstraction, after briefly outlining the problems with solutions involving a central database or hierarchical distributed storage structures. A main benefit of a DHT-based solution is that it imposes few constraints, the principal ones being that all data items be uniquely identified with a numeric key and that nodes be willing to store data for other nodes. A DHT implements only a lookup operation that takes as input the identifying key of a data item and returns the location of the node that is responsible for that data item. A DHT lookup algorithm must address the following: mapping keys to nodes in a load-balanced way, forwarding lookups to the appropriate nodes, providing a distance function that describes the closeness of keys to each other and nodes to each other, and building routing tables adaptively.
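Viewed as an interface, the abstraction is tiny, which is part of its appeal. A rough sketch of what a DHT node has to expose, with naming of my own choosing rather than anything from the paper:

    from abc import ABC, abstractmethod

    class DHTNode(ABC):
        """Minimal interface a DHT lookup algorithm must provide."""

        @abstractmethod
        def lookup(self, key: int) -> str:
            """Return the address of the node responsible for `key`."""

        @abstractmethod
        def join(self, bootstrap_address: str) -> None:
            """Join the overlay via any existing member, building routing
            state adaptively as nodes come and go."""

        @abstractmethod
        def distance(self, a: int, b: int) -> int:
            """Algorithm-specific notion of closeness between identifiers."""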

The various DHT-based lookup algorithms reviewed in this paper include CAN, Chord, Kademlia, Pastry, Tapestry, and Viceroy. These vary in the data structures they use to provide O(log N) lookups, their routing dimensions, emphasis, distance functions, various operation costs, degree of fault tolerance and concurrency, routing, and security measures. For example, Chord maintains a skiplist-like data structure, while Pastry, Tapestry, and Kademlia use a tree-based structure. In contrast, CAN uses a d-dimensional Cartesian coordinate space as its abstraction. With respect to distance functions, Chord uses distance in the skiplist that is unidirectional and asymmetric, while Pastry uses prefixes and numeric distances and so is symmetric but not unidirectional, and Kademlia uses an XOR-based function that is both unidirectional and symmetric. Topics for future work in this area include analyzing operation costs when the system is large and joins and departures are frequent, improving routing latency, providing security mechanisms, and implementing efficient indexing and keyword search.
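The distance-function differences are concrete enough to illustrate. Below is a rough comparison of a Chord-style clockwise ring distance and a Kademlia-style XOR distance in a deliberately tiny identifier space; this is only an illustration, not code from any of the systems:

    M = 8                                      # tiny identifier space for illustration

    def ring_distance(a, b, m=M):
        """Chord-style: clockwise distance from a to b; unidirectional but
        asymmetric, since d(a, b) generally differs from d(b, a)."""
        return (b - a) % (2 ** m)

    def xor_distance(a, b):
        """Kademlia-style: bitwise XOR; symmetric (d(a,b) == d(b,a)) and
        unidirectional in the sense that for any a and distance d there is
        exactly one b with d(a, b) == d."""
        return a ^ b

    a, b = 23, 200
    print(ring_distance(a, b), ring_distance(b, a))   # differ: asymmetric
    print(xor_distance(a, b), xor_distance(b, a))     # equal: symmetric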

Critique

There isn’t much to say in terms of a critique since this paper is just a brief introduction to and description of various algorithms, but I did find it helpful and thought the writing was nicely structured, so I would keep it in the syllabus.

Friday, October 30, 2009

Active Network Vision and Reality: Lessons from a Capsule-Based System


D. Wetherall, "Active Network Vision and Reality: Lessons from a Capsule-Based System," 17th Symposium on Operating Systems Principles," (December 1999).


One line summary: This paper examines some of the points of debate concerning active networks by describing the experience of designing and implementing an active network toolkit called ANTS; it explores some of the benefits and potential uses of ANTS and also mentions the practical difficulties ANTS raises concerning performance and security.

Summary

This paper studies active networks by designing and implementing the ANTS active network toolkit. The paper first defines active networks as a new approach to network architecture in which customized code for applications is executed inside the network as opposed to at the end-hosts. They are interesting because of their potential to be used for creating new Internet services, but controversial due to performance and security concerns. In this paper the authors contrast their experience with ANTS with the original vision behind active networks, specifically as it pertains to the following three areas: (1) the capsule model of programmability, (2) the accessibility of that model, and (3) the applications that can be constructed with that model. The central components of ANTS are capsules, which users send through the network like packets but which carry code that is executed at intermediate nodes along the path; these intermediate nodes are programmable routers called active nodes. Capsules implement a custom forwarding routine and so direct themselves and subsequent packets through the network using the routine they implement. The reference implementation of ANTS was written in Java. Any user can develop an application with ANTS, which provides an API for querying node state, routing via shortest paths, and placing data in a temporary soft-store. Once the code is written, it is signed by a trusted authority, put in a directory service, and registered with a local active node for distribution.
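The capsule model is easier to picture with a toy example. The real implementation is in Java and its API is richer, but a rough Python rendering of a capsule that carries its own forwarding routine, with node and method names invented for illustration, might look like this:

    class ActiveNode:
        """Toy stand-in for an active node: routing state plus a small soft-store."""
        def __init__(self, name, routes):
            self.name = name
            self.routes = routes      # destination -> next-hop node name
            self.soft_store = {}      # temporary per-service state, evicted under pressure

        def next_hop(self, destination):
            return self.routes.get(destination)

    class MulticastJoinCapsule:
        """Capsule whose forwarding routine leaves 'breadcrumb' state at each node
        it crosses, so later data capsules can follow the reverse path back to the
        subscriber (a multicast-like use of the soft-store)."""
        def __init__(self, source, destination):
            self.source = source
            self.destination = destination

        def evaluate(self, node, arrived_from):
            # custom per-capsule processing executed *inside* the network
            node.soft_store[("join", self.source)] = arrived_from
            if node.name == self.destination:
                return None                          # reached the target node
            return node.next_hop(self.destination)   # keep forwarding toward it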

The paper then considers the three areas pertaining to the vision behind active networks in turn. (1) With respect to capsules, the authors argue that it is feasible to carry capsules by reference and load them on demand and that their intrinsic processing overhead is low, but that since node capabilities are heterogeneous, it is important that not all nodes be slowed by capsule processing, resulting in only a partially active network. Even so, forwarding performance with capsules remains quite poor. (2) With respect to the question of who can use active networks to introduce new code into the network, security is obviously a major concern. The authors demonstrate that they were able to isolate code and state in ANTS in a way that is comparable to or better than the static protocols used today, but they handle the problem of global resource allocation using a certification mechanism, which has several drawbacks. (3) Lastly, in terms of services that ANTS can be used to introduce, such services tend to be ones that are typically difficult to deploy, such as multicast, anycast, and explicit congestion notification. ANTS can most compellingly be used for rapid deployment and experimentation, but the authors have yet to find a “killer app” that necessitates the use of ANTS. In order to use ANTS, a service must work under the following constraints: it must be able to be expressed using a restricted API, and it must be compact, fast, and incrementally deployable. The authors offer other observations given their experiences, touching on several points that they note all protocols must deal with in some way. Their observations are that ANTS can help to enable systematic changes to protocols in the network, that it handles heterogeneous node capabilities in a clean way, that contrary to common concerns, ANTS does not generally violate the end-to-end argument, and that changes to network services must be localized in order to be easily deployable.

Critique

One thing I liked about ANTS is that it is one solution that at least talks about and attempts to address the issue of changing current Internet protocols and introducing new ones. It is sad how much networking research never actually gets put into use simply because of the difficulty of making changes or additions to the existing architecture. However, while it’s a nice idea, in practice the performance and security issues are too big to imagine ANTS ever being widely used. One thing that annoyed me about this paper is how several times the author tried to write off the poor performance of ANTS by blaming it on Java, while at the same time crediting many of the security benefits to the use of Java. It seemed like the author was trying to have it both ways: some performance problems could be overcome by not using Java, but not using Java or something like it may also introduce greater security problems. It was for the best that the paper was written from the perspective of taking a closer look at some of the criticisms that people have raised concerning the original vision behind active networks, rather than arguing that ANTS should actually be used in the real world, since the performance and potentially the security problems are just too severe to suggest otherwise. I approve of the author framing the motivation and contribution in the way he did.

Resilient Overlay Networks


D. Andersen, H. Balakrishnan, F. Kaashoek, R. Morris, "Resilient Overlay Networks," 18th Symposium on Operating Systems Principles, (December 2001).


One line summary: This paper describes the Resilient Overlay Network architecture, which is intended to run at the application layer over the existing Internet infrastructure to provide path failure detection and recovery, as well as mechanisms for applications to choose their own routing metrics and implement expressive routing policies.

Summary

This paper describes Resilient Overlay Network (RON), an application-layer overlay that exists on top of the existing Internet. The goal of RON is to allow Internet applications to detect and recover from path failures and performance degradation quickly. That is, a RON seeks to (1) provide failure detection and recovery within 20 seconds, (2) integrate routing more tightly with applications by letting applications choose their own routing metrics, and (3) allow expressive routing policies relating to acceptable use policies and rate controls.

A RON consists of a set of nodes deployed at various locations on the Internet, and each path between two RON nodes is represented as a single virtual link. An application that communicates with a RON node is a RON client that interacts with the RON node across an API. Each client instantiates a RON router that implements a routing protocol and a RON membership manager that keeps track of the list of RON members. By default, RON maintains information about three metrics for each virtual link: latency, loss rate, and throughput. It obtains this information via probes and shares it with all RON nodes by storing it in a central performance database. It also uses these probes to detect path outages. RON uses the data collected from these probes and stored in the performance database in a link-state routing protocol to build routing tables according to each metric. When it detects a path outage, RON uses single-hop indirection to route over a new path. RON provides policy routing by allowing administrators to define types of traffic allowed on certain links, and it implements this policy routing using packet classification methods. RON manages membership via either a static mechanism that reads the list of members from a file or a dynamic mechanism that uses announcement flooding and stores the discovered members in soft state at each node. One important note about RONs is that they are intended to consist of only a small number of nodes (roughly 50 or fewer) because many of the mechanisms used will not scale, but a RON can still provide the intended benefits with only a small number of nodes.
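A sketch of the core routing decision (pick, for a given metric, either the direct Internet path or the best single-hop detour through another RON node), assuming probe results have already been summarized per virtual link; the names and data are mine, not the paper's:

    def best_path(src, dst, nodes, latency):
        """latency[(a, b)] is the probed latency of the virtual link a->b.
        Return the better of the direct path and the best single-hop indirection."""
        best = ([src, dst], latency[(src, dst)])
        for hop in nodes:
            if hop in (src, dst):
                continue
            cost = latency[(src, hop)] + latency[(hop, dst)]
            if cost < best[1]:
                best = ([src, hop, dst], cost)
        return best

    # usage with made-up probe data (ms); the direct A->C link is degraded
    latency = {("A", "C"): 300, ("A", "B"): 40, ("B", "C"): 35}
    print(best_path("A", "C", ["A", "B", "C"], latency))   # (['A', 'B', 'C'], 75)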

The authors evaluated RON by deploying two wide-area RONs, one with 12 nodes (RON1) and one with 16 nodes (RON2). They found that RON1 recovered from path outages 100% of the time and RON2 60% of the time. (The lower figure for RON2 is in part due to a site becoming completely unreachable.) RON is able to detect and share information about a virtual link failure in an average of 26 seconds. It does this with an overhead of 30 Kbps of active probing. RON was also able to route around flooded links and improve performance in a small number of cases. The authors were also able to show that RON routes are relatively stable, with the median length of time for a route to persist being about 70 seconds.

Critique

RONs are a somewhat interesting idea, and I do think the problem they address – that of allowing applications to recover from path failures, because BGP is very slow at doing so, and to choose their own routing metrics – is compelling. However, there are several big issues with RONs, the two most obvious being poor performance and inability to scale. With respect to the first, the IP forwarder that the authors implemented could only forward packets at a maximum rate of 90 Mbps. Also, RON was only able to select paths with better performance with respect to the default metrics a very small amount of the time. RON also introduced a fair amount of overhead due to its use of flooding, which also relates to its inability to scale. However, I still think many of the ideas behind RONs are useful.

Tuesday, October 27, 2009

On Inferring Autonomous System Relationships in the Internet


L. Gao, "On Inferring Autonomous System Relationships in the Internet," IEEE/ACM Transactions on Networks, V. 9, N. 6, (December 2001), pp. 733-745.


One line summary: This paper provides heuristics for classifying relationships between autonomous systems in the Internet into provider-customer, peering, and sibling relationships, and evaluates the resulting inferences using AT&T internal data as well as the WHOIS lookup service.

Summary

This paper proposes a method for classifying relationships between Autonomous Systems (ASs) into provider-customer, peering, and sibling relationships. These relationships are based on routing and commercial contractual relationships between administrative entities. The classification is tested using BGP routing table data collected via RouteViews. It is verified using AT&T internal information about its ASs and their relationships with neighboring ASs, as well as by using the WHOIS lookup service. The paper first reviews BGP routing and policies. BGP-speaking routers in an AS enforce import policies to determine which paths to include and use in their routing tables, and export policies to determine which paths to advertise to neighbors. The relationships between an AS and its neighbors determine what export policies the AS enforces. In general, ASs follow the selective export rule, which basically states that an AS will not export its provider or peer routes to its providers or peers, but it will export customer, provider, and peer routes to its siblings and customers. This rule leads to the property that the AS path of a BGP routing table entry will be valley-free. For a valley-free AS path there is an uphill portion of the path that consists only of customer-to-provider or sibling-to-sibling edges and a downhill portion that consists only of provider-to-customer or sibling-to-sibling edges. The uphill and downhill portions may be connected by a peer-to-peer link. From these definitions it is possible to define maximal uphill and maximal downhill paths. In addition, the uphill top provider is defined as the AS with the highest degree, or largest size, among all the ASs in the maximal uphill path, and the downhill top provider is defined analogously. The overall top provider is the AS with the highest degree among the uphill and downhill top providers.
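The valley-free property is mechanical enough to check in a few lines once the edges of a path have been labeled. A sketch, using my own encoding of the edge types rather than the paper's notation:

    def is_valley_free(edge_types):
        """edge_types lists the relationship of each consecutive AS pair along a path:
        'c2p' (customer-to-provider), 'p2c', 'p2p' (peering), or 's2s' (sibling).
        A valid path is: uphill (c2p/s2s)*, at most one p2p, then downhill (p2c/s2s)*."""
        phase = "uphill"
        for e in edge_types:
            if e == "s2s":
                continue                         # sibling edges allowed anywhere
            if phase == "uphill":
                if e == "c2p":
                    continue
                phase = "peered" if e == "p2p" else "downhill"
            elif phase == "peered":
                if e != "p2c":
                    return False                 # only downhill edges after the peak
                phase = "downhill"
            else:                                # downhill
                if e != "p2c":
                    return False
        return True

    print(is_valley_free(["c2p", "c2p", "p2p", "p2c"]))   # True
    print(is_valley_free(["p2c", "c2p"]))                 # False: a valley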


Given these considerations, the basic heuristic used to classify edges between ASs is as follows: it looks through the AS path of each BGP routing table entry and defines the highest-degree AS in the path as the top provider. The consecutive AS pairs before the top provider are then either customer-to-provider links or sibling-to-sibling links, and those after the top provider are either provider-to-customer links or sibling-to-sibling links. A refined version of the heuristic attempts to reduce the possibility of incorrect inference from routing misconfigurations by only concluding that a transit relationship exists if a certain number of routing table entries imply it. The final version of the algorithm attempts to identify peering relationships by identifying potential peers based on similarity of size (i.e., it assumes the degrees of two ASs that have a peering relationship do not differ by more than a certain factor, R, that must be fine-tuned) and concludes that two ASs have a peering relationship if neither of the two provides transit for the other.
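A sketch of the basic (unrefined) version of this heuristic, assuming AS degrees have already been computed from the routing tables; the function and variable names are mine:

    from collections import defaultdict

    def annotate_edges(as_paths, degree):
        """Basic heuristic: in each AS path, the highest-degree AS is the top provider;
        pairs before it are provisionally customer->provider, pairs after it
        provider->customer.  Returns, per edge, how often each direction was implied."""
        votes = defaultdict(lambda: {"c2p": 0, "p2c": 0})
        for path in as_paths:
            top = max(range(len(path)), key=lambda i: degree[path[i]])
            for i in range(len(path) - 1):
                a, b = path[i], path[i + 1]
                if i < top:
                    votes[(a, b)]["c2p"] += 1    # uphill side of the top provider
                else:
                    votes[(a, b)]["p2c"] += 1    # downhill side
        return votes

    # edges implied in both directions end up classified as sibling links; the
    # refined version additionally requires a minimum number of supporting entries
    # before inferring a transit relationship.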


The author then compares the inferences of the algorithms with AT&T internal information. The algorithm tends to infer customer relationships correctly most of the time and peering relationships correctly a large part of the time, but tends to infer sibling relationships incorrectly. The author gives several reasons for the incorrectly inferred sibling relationships, including router misconfigurations, unusual AS relationships, and heuristic inaccuracy.


Critique

I didn’t like this paper as much as the last one. I thought the author did a nice job of laying out the reasoning behind inferring the AS relationships, but it could have been explained in fewer words; I also thought too much space was spent on reviewing BGP routing and policies. The heuristics didn’t do very well and they seemed pretty basic, so work that improves upon these methods would be interesting to see. I question the heuristics’ reliance on the degree or size of an AS as a way of classifying relationships, for instance, in determining peering relationships. And related to this, as the author says, the necessity of a parameter like R for determining how different in size two ASs have to be to be disqualified from consideration as peers is unfortunate. I am curious as to why the heuristic did so poorly in classifying sibling relationships but not the others. Although reasons were given, they weren’t entirely satisfying, and they could certainly be explored in more depth.


On Power-Law Relationships of the Internet Topology


M. Faloutsos, P. Faloutsos, C. Faloutsos, "On Power-Law Relationships of the Internet Topology," ACM SIGCOMM Conference, (September 1999).


One line summary: This paper provides three power laws that describe fundamental properties of the Internet topology, as well as several novel metrics and methods for estimating them, all of which are useful for developing and analyzing new Internet protocols and creating artificial models of the Internet for simulation purposes.

Summary

In this paper, the authors discover three power laws that describe various graph properties of the Internet topology. According to the authors, these power laws are very useful in three respects: (1) they aid in the design of efficient network protocols that can take advantage of the knowledge of the topological properties of the Internet provided by the power laws, (2) they can help researchers create more accurate artificial models of the Internet for simulation purposes, and (3) they can allow researchers to derive estimates of various topological parameters that will be useful for protocol analysis and speculation about future Internet topology. Power laws are expressions of the form y ∝ x^a, which reads “y is proportional to x to the power a,” where a is a constant and x and y are measures of interest. The important observation of the authors’ work is that there is some such exponent for each graph instance, although the exponents can change over time.

To derive the power laws, the authors used three Internet instances collected at six-month intervals over a period of one year, 1998. These graphs are at the inter-domain level, where each node is a domain or an AS. They also use another graph from 1995 that is at the router level. The authors note that previously used metrics tend to be based on average, minimum, and maximum values, which are not good descriptors of skewed distributions such as those found in studying Internet topology. They also note that it would be useful to be able to characterize a certain aspect of a graph or plot using just one number, to facilitate comparisons. With this in mind, the authors propose several graph metrics for use in defining their power laws: f(d) is the frequency of outdegree d, i.e., the number of nodes with outdegree d; r is the rank of a node, i.e., its index in the sequence of all nodes ordered by decreasing outdegree; P(h) is the total number of pairs of nodes within h or fewer hops, including self-pairs and counting all other pairs twice; and NN(h) is the average number of nodes in a neighborhood of h hops. They also define λ as an eigenvalue of the graph and i as the order of that eigenvalue.


Given this, we can summarize the authors’ power laws. The first power law (rank exponent) states that the outdegree d of a node is proportional to its rank r to the power of a constant. The authors provide a method for estimating this constant. The difference in this constant between the interdomain graphs and the router graph suggests that it can distinguish between graphs of different types. Intuitively, the authors state that this law captures the trade-off between the gain and cost of adding an edge from a financial and functional point of view. The second power law (outdegree exponent) states that the frequency f of an outdegree d is proportional to the outdegree d to the power of a constant. The value of this constant is practically the same across all graphs, which suggests that this law describes a fundamental property of the Internet topology and that it could be useful for testing the realism of a graph intended to represent Internet topology. This power law indicates that the distribution of outdegree is not arbitrary and that lower degrees are more frequent. The third power law (eigen exponent) states that the eigenvalues of a graph are proportional to their order to the power of a constant. Again, as with the first law, this constant is nearly the same for the interdomain graphs but different for the router graph, suggesting that it could be useful for distinguishing among different types of graphs.
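Since each law is a straight line on a log-log plot, the exponents are just slopes of least-squares fits. For instance, the rank exponent could be estimated roughly as below; this is a sketch of the general technique, not the authors' exact procedure:

    import math

    def rank_exponent(outdegrees):
        """Fit log(d) ~ R * log(rank): sort outdegrees in decreasing order, regress
        log(degree) on log(rank), and return the slope R (expected to be negative)."""
        ranked = sorted(outdegrees, reverse=True)
        xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
        ys = [math.log(d) for d in ranked]
        n = len(xs)
        mean_x, mean_y = sum(xs) / n, sum(ys) / n
        cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        var = sum((x - mean_x) ** 2 for x in xs)
        return cov / var

    # usage with a synthetic degree sequence roughly following d ~ rank^-0.8
    degrees = [round(1000 * (r ** -0.8)) or 1 for r in range(1, 200)]
    print(rank_exponent(degrees))   # roughly -0.8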


Lastly, the authors also describe their novel way to characterize graph connectivity, stated as an approximation (hop-plot exponent): the total number of pairs of nodes P(h) within h hops is proportional to the number of hops h to the power of a constant. This constant also allows for distinguishing among different families of graphs using a single number, and the authors give several examples. They also show how this approximation is useful for estimating how many hops are required to reach a sufficiently large part of the network. In conjunction with this approximation, the authors also provide definitions for the effective diameter of a graph and for the average neighborhood size. The effective diameter is the distance that any two nodes are from each other with high probability. The average neighborhood size is a common parameter used in analyzing the performance of network protocols, and the authors demonstrate the superiority of their estimation method. Finally, the authors conclude by giving several examples of how these power laws could be useful and by arguing that the power laws must characterize fundamental properties of the Internet due to the statistical unlikelihood of their several graphs obeying these laws simply by chance.
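For a small graph, the hop-plot quantity P(h) can be computed directly with breadth-first search (counting self-pairs once and all other pairs twice, as defined above); the hop-plot exponent then comes from a log-log fit of P(h) against h, just as for the other laws. A sketch with invented names:

    from collections import deque

    def pairs_within(adj, h):
        """P(h): number of node pairs within h hops, counting self-pairs and counting
        ordered pairs, so every unordered pair is counted twice."""
        total = 0
        for src in adj:
            dist = {src: 0}
            queue = deque([src])
            while queue:
                u = queue.popleft()
                if dist[u] == h:
                    continue                      # do not expand beyond h hops
                for v in adj[u]:
                    if v not in dist:
                        dist[v] = dist[u] + 1
                        queue.append(v)
            total += len(dist)                    # src itself plus nodes within h hops
        return total

    # usage on a toy 4-node path graph a-b-c-d
    adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
    print(pairs_within(adj, 1))   # 4 self-pairs + 2*3 adjacent pairs = 10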

Critique

It was a nice change of pace to read a paper with a more theoretical bent. I mostly liked this paper. It would be interesting to try to derive the exponents for the power laws using graphs of today’s Internet and see how they compare to those in this paper, or to see whether the power laws even still hold now, especially considering the trend of content providers directly connecting to customers and circumventing the large backbone providers, as we discussed in class. One critique of this paper is that the authors should have used more data. Another thing I would have liked to see is more visual examples of graphs with the kinds of properties the paper highlights via the power laws.

Sunday, October 18, 2009

White Space Networking with Wi-Fi like Connectivity


P. Bahl, R. Chandra, T. Moscibroda, R. Murty, M. Welsh, "White Space Networking with Wi-Fi like Connectivity", ACM SIGCOMM Conference, (August 2009).

One line summary: This paper describes a system for UHF white space wireless networking called WhiteFi that addresses the three main challenges unique to white space networking: spatial variation, temporal variation, and spectrum fragmentation in the available white space channels.

Summary

This paper presents a system for UHF white space wireless networking called WhiteFi. White spaces are the unused portions of the UHF spectrum that have recently been opened for use by unlicensed devices, subject to the constraint that such devices not interfere with incumbent uses such as TV broadcasts and wireless microphone transmissions. As a consequence of this, white space networks differ in three major ways from traditional wireless networks: spatial variation, spectrum fragmentation, and temporal variation. White space networking involves spatial variation in the availability of portions of the white space spectrum because the presence of incumbents varies over the wide area as well as on a finer scale. It involves spectrum fragmentation due to the presence of incumbents occupying certain UHF channels; fragmentation varies by location, and this implies the need for using variable channel widths. Lastly, white space networking involves temporal variation largely because of the use of wireless microphones, which can be turned on and off at any time; white space devices must switch channels when they detect a wireless microphone in the channel they are using.

WhiteFi is supported on the KNOWS hardware platform, which consists of a PC, a scanner, and a UHF translator. Two key features of this platform are support for variable channel widths and a primary user signal detection algorithm called Signal Inspection before Fourier Transform (SIFT). Three key components of WhiteFi that build upon this are a novel spectrum assignment algorithm, SIFT-based discovery of white space wireless access points (APs), and a chirping protocol that permits indication of disconnection from a channel due to the appearance of an incumbent, without interfering with the incumbent. The spectrum assignment algorithm is adaptive and client-aware, picking a channel and a channel width that are free for all clients. It uses a spectrum map to indicate the presence of incumbents, an airtime utilization map to indicate the degree of utilization of each UHF channel, and control messages between the clients and the AP containing these maps to share the necessary information. It uses this information along with channel probes to compute the multichannel airtime metric (MCham), which is roughly a measure of the aggregate bandwidth of a given selection, and which it uses as the basis for channel selection. AP signals are detected by sampling bands of the UHF spectrum using a software-defined radio and performing an efficient time-domain analysis of the raw signal using SIFT. Sudden disconnections due to the appearance of an incumbent on a channel that an AP-client pair is using for communication are dealt with using a chirping protocol, which involves sending beacons about the white spaces now available over a backup channel.
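The shape of the channel-selection step can be sketched abstractly: for each candidate (channel set, width) pair that is incumbent-free for every client, score it using something like an airtime-weighted bandwidth estimate and pick the best. The exact MCham formula and data structures are in the paper; the computation and names below are my simplification, not WhiteFi's code:

    def select_channel(candidates, spectrum_maps, utilization):
        """candidates: list of (channels_tuple, width_mhz) options.
        spectrum_maps: per-client sets of UHF channels occupied by incumbents.
        utilization[ch]: estimated fraction of airtime already in use on channel ch."""
        best, best_score = None, float("-inf")
        for channels, width in candidates:
            # client-aware: every channel must be incumbent-free for *all* clients
            if any(ch in occupied for occupied in spectrum_maps for ch in channels):
                continue
            free_airtime = min(1.0 - utilization.get(ch, 0.0) for ch in channels)
            score = width * free_airtime          # crude stand-in for the MCham metric
            if score > best_score:
                best, best_score = (channels, width), score
        return best

    # usage: two candidates; the second client sees a wireless microphone on channel 23
    candidates = [((21, 22), 10), ((23,), 5)]
    spectrum_maps = [set(), {23}]
    print(select_channel(candidates, spectrum_maps, {21: 0.3, 22: 0.1, 23: 0.0}))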

In the evaluation of WhiteFi, several things are demonstrated. The first is that SIFT accurately detects packets over varying channel widths even with high signal attenuation, missing at most 2% of packets in the worst case. The second is that the AP discovery algorithms are effective. The next is that WhiteFi correctly handles disconnections using its chirping protocol. The last is that WhiteFi’s channel selection algorithm adapts quickly and makes near-optimal selections to operate on the best available part of the spectrum. This last point is demonstrated via simulation.

Critique

This was one of my favorite papers that we’ve read so far (along with the classics). To me at least this seemed like an awesome feat of engineering, and it’s obviously very useful technology, so it’s exciting to think of it becoming available. I actually quite enjoyed reading it. Obviously there will be a lot of interesting work in the future that will build off of what they did here. I checked to confirm my suspicion that it won the Best Paper Award at SIGCOMM this year, so I like to think that somewhat legitimizes my raving about it.

Saturday, October 17, 2009

Interactive WiFi Connectivity For Moving Vehicles


A. Balasubramanian, R. Mahajan, A. Venkataramani, B. N. Levine, J. Zahorjan, "Interactive WiFi Connectivity For Moving Vehicles", ACM SIGCOMM Conference, (August 2008).


One line summary: This paper describes a protocol for providing WiFi connectivity to moving vehicles called ViFi.

Summary

This paper introduces ViFi, a protocol for providing WiFi connectivity to moving vehicles with minimal disruptions. ViFi achieves this by allowing a vehicle to use several base stations (BSes) simultaneously so that handoffs are not hard handoffs. The paper begins by evaluating six different types of handoff policies on their VanLAN testbed at the Microsoft Redmond campus: (1) RSSI—the client associates with the base station with the highest received signal strength indicator, (2) BRR—the client associates with the base station with the highest beacon reception ratio, (3) Sticky—the client doesn’t disassociate from a base station until connectivity is absent for some defined period of time, then chooses the one with the highest signal strength, (4) History—the client associates with the base station that has historically provided the best performance at that location, (5) BestBS—the client associates, in a given second, with the base station that will provide the best performance in the next second, and (6) AllBSes—the client opportunistically uses all base stations in the vicinity. The first five policies use hard handoffs. The last is an idealized method that represents an upper bound. The metrics used to judge these policies are aggregate performance and periods of uninterrupted connectivity. Sticky performs the worst, while RSSI, BRR, and History perform similarly. AllBSes is by far the best. The authors also observe gray periods in which connection quality drops sharply and unpredictably.
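To make a couple of the policies concrete, the per-interval association decision might look roughly like this; this is my simplification with invented parameter names, since the paper evaluates the policies over real traces rather than specifying them this way:

    def pick_bs_brr(beacon_counts, beacons_expected):
        """BRR policy: associate with the BS with the highest beacon reception ratio."""
        return max(beacon_counts, key=lambda bs: beacon_counts[bs] / beacons_expected)

    def pick_bs_sticky(current_bs, silence, rssi, timeout):
        """Sticky policy: keep the current BS until it has provided no connectivity
        for `timeout` seconds, then switch to the BS with the strongest signal."""
        if current_bs is not None and silence.get(current_bs, timeout) < timeout:
            return current_bs
        return max(rssi, key=rssi.get)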

The authors explain that the effectiveness of AllBSes stems from the fact that the vehicle is often in the range of multiple BSes and that packet losses are bursty and roughly independent across senders and receivers. They design ViFi as a practical version of AllBSes. In ViFi, there is an anchor base station and a set of auxiliary base stations. The identities of these are embedded in the beacons that the vehicle broadcasts periodically, along with the identity of the previous anchor. An auxiliary base station attempts to relay packets that have not been received by the destination. This is done according to an estimated probability that is computed following these guidelines: (1) relaying decisions made by other potentially relaying auxiliaries must be accounted for, (2) auxiliaries with better connectivity to the destination should be preferred, and (3) the expected number of relayed transmissions should be limited. ViFi also allows for salvaging, in which new anchors contact the previous anchor for packets that were supposed to be delivered to the vehicle but could not be delivered before the vehicle moved out of range. ViFi uses broadcasts and implements its own retransmissions. It doesn’t implement exponential backoff because it assumes collisions will not be a problem. To evaluate ViFi the authors deploy it on VanLAN and do a trace-driven simulation using measurements from another testbed called DieselNet. The most important findings according to the authors are: (1) the link-layer performance of ViFi is close to ideal, (2) ViFi improves application performance twofold over current handoff methods, (3) ViFi places little additional load on the vehicle-base station medium, and (4) ViFi’s mechanism for coordinating relays has low false positive and false negative rates.
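The relaying decision can be sketched in the spirit of those guidelines: each auxiliary relays an unacknowledged packet with a probability chosen so that, accounting for the other auxiliaries, the expected number of relayed transmissions stays bounded. The exact formulation is in the paper; the computation below is a simplified stand-in with invented names:

    def relay_probability(my_delivery_prob, others_delivery_probs, expected_relays=1.0):
        """Simplified stand-in for ViFi's computation: scale this auxiliary's relaying
        probability by its own delivery probability and normalize by the total delivery
        probability of all auxiliaries that overheard the packet, so the expected
        number of relayed transmissions stays near `expected_relays`."""
        total = my_delivery_prob + sum(others_delivery_probs)
        if total == 0:
            return 0.0
        return min(1.0, expected_relays * my_delivery_prob / total)

    # usage: this auxiliary has a good link (0.9); two others are weaker (0.4, 0.2)
    print(relay_probability(0.9, [0.4, 0.2]))   # 0.6: better-connected auxiliaries preferred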

Critique

I didn’t find this paper very exciting, but I still thought there were a lot of good things about it. The idea behind ViFi isn’t all that novel since it has been used in cellular networks. I liked that in this paper the authors first examined a variety of techniques and gave results for those. One thing I am confused about is how they implemented AllBSes to test it, considering they argued that it is impractical to implement and represents the ideal. In general, the paper seemed to have a lot of evaluation in it, which I liked. Although ViFi did perform well in the settings in which it was tested, overall I’m not sure how practically useful it is. For example, the authors assume that collisions are not an issue. If ViFi were to become successful and widely deployed it’s possible that collisions might become a problem, and they don’t even attempt to consider situations in which there might be collisions. Also, I can only imagine people wanting to connect to the Internet if they are going fairly long distances, but as the authors mention in the paper, WiFi deployments may not be broad enough, and lack of coverage between islands may make the system unusable. There are several other deployment challenges too, and not just those mentioned in the paper. I just can’t imagine it being desirable in many settings. I don’t see cities wanting to invest in this, and I’m not sure you could get a lot of other people besides the city government to freely provide access. It could be useful to deploy many ViFi base stations along a commuter rail line, though in that case, since you know the speed, location, and schedule of the moving vehicles, you could probably do even better than ViFi.