Monday, September 21, 2009
VL2: A Scalable and Flexible Data Center Network
A. Greenberg, J. R. Hamilton, N. Jain, S. Kandula, C. Kim, P. Lahiri, D. A. Maltz, P. Patel, S. Sengupta, "VL2: A Scalable and Flexible Data Center Network," ACM SIGCOMM 2009, (August 2009).
One line summary: This paper presents VL2, a data center network architecture that provides layer 2 semantics, high capacity, and performance isolation between services using flat addressing, Valiant Load Balancing, and a directory service for mappings, and runs over a Clos topology.
Summary
This paper presents a data center network architecture called VL2. The goals of VL2 are to achieve uniform high capacity, performance isolation between services, and layer 2 semantics, as well as agility (the ability to assign any server to any service). VL2 is built from low-cost switches arranged into a Clos topology and uses Valiant Load Balancing (VLB) to spread traffic across the available paths and to adapt to volatility in traffic and failure patterns. Before designing VL2, the authors conducted a traffic analysis by instrumenting a production data center that supports data mining. Their major findings were that (1) the majority of flows are small, (2) more than half the time, a machine has around ten concurrent flows, (3) the variability in traffic patterns is not easily summarized and there is a large number of representative traffic matrices, (4) traffic patterns change nearly constantly and unpredictably, (5) most failures are small in size, and (6) the main causes of downtime are network misconfigurations, firmware bugs, and faulty components, and there is no obvious way to eliminate failures from the top of the data center topology hierarchy.
Briefly, VL2 operates as follows. VL2 is built on a Clos network topology. It uses two different kinds of addresses: location-specific IP addresses (LAs) and application-specific IP addresses (AAs). All switches and interfaces have LAs. Each server has an AA that remains the same no matter where the application is located and is assigned when the server is provisioned to a service. All servers within a service believe they are in the same subnet. The AA of a server is associated with the LA of the ToR switch to which the server is connected. Switches run a link-state routing protocol over these LAs. For routing, VL2 uses VLB to distribute traffic across intermediate switches and ECMP to distribute traffic across equal-cost paths. A VL2 agent at each server encapsulates outgoing packets in a packet whose header carries the LA of the destination ToR. A directory service stores AA-LA-MAC address mappings. The directory service consists of a number of read-optimized directory servers that cache AA-LA mappings and a smaller number of write-optimized replicated state machine servers that reliably store them. The directory service performs lookups, updates, and reactive cache updates.
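To make the addressing scheme concrete, here is a minimal Python sketch of what the agent's encapsulation step might look like. The addresses are made up, and a plain dictionary stands in for the directory service; this is an illustration of the idea, not the paper's implementation.

```python
import random

# Hypothetical AA -> (ToR LA, MAC) mappings, as the directory service would store them.
DIRECTORY = {
    "10.0.0.5": ("20.1.1.1", "aa:bb:cc:00:00:05"),
    "10.0.0.9": ("20.1.2.1", "aa:bb:cc:00:00:09"),
}

# LAs of intermediate switches used for Valiant Load Balancing.
INTERMEDIATE_LAS = ["30.0.0.1", "30.0.0.2", "30.0.0.3"]

def encapsulate(src_aa, dst_aa, payload):
    """Wrap an AA-addressed packet for delivery over the LA network.

    The agent resolves the destination ToR's LA via the directory,
    then picks a random intermediate switch to bounce through (VLB).
    """
    tor_la, _mac = DIRECTORY[dst_aa]                # directory lookup (cached in practice)
    intermediate = random.choice(INTERMEDIATE_LAS)  # randomized per flow for VLB
    return {
        "outer_dst": intermediate,  # first hop: a random intermediate switch
        "inner_dst": tor_la,        # inner header: destination ToR's LA
        "orig": {"src": src_aa, "dst": dst_aa, "payload": payload},
    }

pkt = encapsulate("10.0.0.5", "10.0.0.9", b"hello")
print(pkt["inner_dst"])  # 20.1.2.1
```

Because applications only ever see AAs, the server can be re-provisioned under a different ToR by updating the directory mapping, which is what gives VL2 its layer 2 semantics.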
The authors' evaluation demonstrates that VL2 provides uniform high capacity, VLB fairness, performance isolation, and fast convergence after link failures, and that its directory service is scalable and resilient, with high availability and throughput. Of these experiments, I thought the most interesting was the all-to-all data shuffle stress test showing that VL2 achieves high utilization and fairness between flows.
Critique
In terms of criticisms, here are several notes I made about this paper:
I liked that before designing VL2 they did an analysis of data center traffic. The results were somewhat surprising to me, which made them interesting. However, for this analysis the authors instrumented a data mining workload, and a data mining workload is probably not representative of most traffic in data centers in general.
I didn’t understand why they exclude UDP traffic from their evaluations and use only TCP flows; I don’t know if this is realistic. They make a somewhat hand-wavy claim that techniques such as STCP for making UDP “TCP friendly” are well known and can be applied, but I am not convinced it is really that easy. Also in their evaluations, they state that they rely on TCP to rate-limit each flow to its fair share of the bottleneck, but many of the previous papers we read suggest that TCP is not very good at fairness. Another thing is that in the data shuffle evaluation, they claim a conventional design would take 11x longer but don’t explain where they got that figure. Finally, they claim several times throughout the paper that agility is a major goal of VL2, but they don’t run an experiment showing what happens when a server is migrated, as was done in the PortLand paper. By agility they don’t necessarily mean ease of VM migration, but this still would have been interesting to examine.
Some miscellaneous points: in one part of the paper, they claim that VL2 enables fine-grained path control by adjusting the randomization used in VLB, but they don’t elaborate further on this. Also, the performance of VL2 appears to depend on traffic obeying the hose model. I don’t know what the hose model is, but is it safe to assume all data center traffic will obey it?
A general point not just about this paper in particular but about all of these papers on data center networks is that it seems that every paper only uses evaluations that highlight the good features of the solution suggested in that paper, so it is hard to compare solutions. It would be nice if there were at least some standard experiments or similar experiments that could facilitate comparison, analogous to the sets of benchmarks used for processors in architecture.
Tuesday, September 8, 2009
Analysis of the Increase and Decrease Algorithms for Congestion Avoidance in Computer Networks
D.-M. Chiu, R. Jain, "Analysis of the Increase and Decrease Algorithms for Congestion Avoidance in Computer Networks," Computer Networks and ISDN Systems, 17 (1989), pp. 1-14.
One line summary: In this paper the authors algebraically analyze a subset of algorithms for congestion avoidance with respect to a number of criteria, including efficiency, fairness, distributedness, and convergence; they are able to conclude that the optimal algorithm uses additive increase and multiplicative decrease.
Summary
In this paper, the authors analyze decentralized algorithms for congestion avoidance based on binary feedback signals, focusing on what they call increase/decrease algorithms. They focus on an abstract system in which there is one bottleneck resource used by n users. They assume that the binary feedback signal indicates whether this resource is overloaded or underloaded, and that the signal is sent by setting a bit in the packet header. They also assume that the feedback and control loop is synchronous across all users. The authors focus on linear control functions. In this context then, increase/decrease algorithms are merely those in which one signal indicates that the resource is underloaded and the load should be increased, and the other signal means the opposite. Examples of such algorithms include multiplicative increase/additive decrease and additive increase/multiplicative decrease.
The criteria by which the authors select controls are efficiency, fairness, distributedness, and convergence. Efficiency is measured by how close the total load on a resource is to the point where any further load would cause a decrease in response time and throughput, i.e. the point of maximum utilization and throughput and lowest response time. Fairness is measured by a function the authors define that attempts to capture how equal the allocation of the resource is across all users. Distributedness means that the users have only limited knowledge of the system, in particular, whether the resource is overloaded or underloaded. Lastly, convergence is a measure of both responsiveness, or how quickly the system reaches equilibrium, and smoothness, or how large the oscillations around the equilibrium are. They choose to examine distributed linear control functions that converge to efficient and fair states.
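The fairness function the authors define is what is now commonly called Jain's fairness index, F(x) = (Σ x_i)² / (n · Σ x_i²). A quick Python sketch:

```python
def fairness_index(allocations):
    """Jain's fairness index: F(x) = (sum x_i)^2 / (n * sum x_i^2).

    Ranges from 1/n (one user gets the whole resource)
    up to 1 (all users get exactly equal shares).
    """
    n = len(allocations)
    total = sum(allocations)
    return total * total / (n * sum(x * x for x in allocations))

print(fairness_index([1, 1, 1, 1]))  # 1.0  -- perfectly equal
print(fairness_index([4, 0, 0, 0]))  # 0.25 -- maximally unfair (1/n)
```

A nice property of this index is that it is continuous, so a slightly unequal allocation scores slightly below 1 rather than being declared simply "unfair."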
In order to analyze the set of possible control functions, the authors represent system state transitions in an n-dimensional vector space (representing the load for each of n users) across discrete time slots. The ith user’s load at time t is x_i(t), and the load at the next time interval, x_i(t+1), is a function of the control scheme used, the load at time t, and the system feedback. For example, it could be that x_i(t+1) = a + b*x_i(t) when the binary signal indicates the resource is underloaded. In this function, a is the additive component and b the multiplicative one. With this representation, they are able to deduce algebraically a number of propositions. The first is essentially that the decrease policy should be multiplicative and the increase policy should have an additive component and can optionally have a multiplicative component with coefficient no less than one. The second proposition is that both the increase and the decrease can have both additive and multiplicative components, provided that a special step is taken when certain convergence restrictions would be violated. The third and most conclusive proposition is that for feasibility and optimal convergence, the increase should be additive and the decrease should be multiplicative.
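As an informal sanity check on the third proposition, here is a small Python simulation of synchronous AIMD under binary feedback; the parameters a=1, b=0.5 and the capacity are made up for illustration. The intuition it exercises: additive increase leaves the gap between two users unchanged, while multiplicative decrease shrinks it, so unequal starting allocations drift toward equal shares.

```python
def aimd_step(loads, capacity, a=1.0, b=0.5):
    """One synchronous round of additive increase / multiplicative decrease.

    The binary feedback is 'overloaded' when total load exceeds capacity:
    then every user decreases multiplicatively (x <- b*x);
    otherwise every user increases additively (x <- x + a).
    """
    if sum(loads) > capacity:
        return [b * x for x in loads]
    return [x + a for x in loads]

# Two users starting from very unequal allocations.
loads = [10.0, 1.0]
for _ in range(200):
    loads = aimd_step(loads, capacity=12.0)

# By now the allocations oscillate around the fair, efficient point.
print(loads)
```

Each multiplicative decrease halves the difference between the two users while each additive increase preserves it, so after a couple hundred rounds the two loads are essentially equal, matching the paper's convergence-to-fairness argument.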
At the end of the paper, the authors describe what nonlinear control functions for congestion avoidance might look like, but reason that because such control is too sensitive to various system parameters, it sacrifices robustness. On this note, they go on to enumerate a few practical considerations in choosing a congestion avoidance algorithm. The first is that it should be as independent of hardware and software parameters as possible, to reduce the complexity of configuration and make configuration errors less likely. The second is that resources and allocations must be measured in integers, which should be taken into account when choosing an algorithm. The last is ease of implementation. Given these considerations in addition to the result of their earlier analysis, they conclude that a linear control function with additive increase and multiplicative decrease is optimal.
Critique
I didn’t like everything about this paper. I thought that it was a bit too abstract, and I think in general it is no trivial matter to translate abstract results into real working solutions. However, given that there appear to be so many different algorithms for congestion avoidance, of which they only examine a subset, their analysis does provide some solid guidance in choosing among these, especially if you accept that the conclusions they draw from their analysis of their simplified system still completely hold in the much more complex environment of real systems. Since real TCP uses additive increase/multiplicative decrease, it seems that this could be a safe bet.
I’d be curious to see if algorithms with control functions other than additive increase/multiplicative decrease have been tested in practice. The authors mention that this paper is just one in a series of papers in which they examine a range of mechanisms for congestion avoidance, so an interesting exercise might be to take some of the algorithms they look at in their other papers and actually test them against additive increase/multiplicative decrease, and also maybe examine some algorithms that use nonlinear control functions.
One thing I did really like about this paper is their use of graphs. Whereas some of the algebra was at times a bit tedious or difficult to follow, I thought the graphs helped them provide a nice, clear, and more intuitive explanation for how the control functions worked and how they converged to fairness and efficiency. Although these graphs were further simplified on top of their simplifying assumptions, in that they represented the case with only two users, they were still pretty much key in helping me to understand some of their conclusions.