Monday, September 21, 2009

PortLand: A Scalable Fault-Tolerant Layer 2 Data Center Network Fabric


R. N. Mysore, A. Pamboris, N. Farrington, N. Huang, P. Miri, S. Radhakrishnan, V. Subramanya, A. Vahdat, "PortLand: A Scalable Fault-Tolerant Layer 2 Data Center Network Fabric", ACM SIGCOMM, (August 2009).

One-line summary: This paper presents PortLand, a layer 2 network fabric consisting of a set of Ethernet-compatible routing and addressing protocols tailored for data centers with multi-rooted tree topologies.

Summary

This paper presents PortLand, a network fabric for data centers with layer 2 semantics. The goals of PortLand are that it be scalable, easy to manage, flexible, fault-tolerant, and that it support virtual machine migration. The authors motivate their approach by first describing the problems with traditional Ethernet-, LAN-, and VLAN-based solutions. These problems are essentially the same ones that were mentioned in the SEATTLE paper, so refer to the SEATTLE post below for a discussion of them. One main difference between PortLand and other data center network fabric solutions is that PortLand is designed for use over a specific topology: the fat tree or similar multi-rooted tree topologies. Central to the design of PortLand is the fabric manager (FM), which maintains soft state about network configuration information such as the topology. Also key is that in PortLand, each end host has a pseudo-MAC (PMAC) address, of which the host is unaware, that encodes the host's location in the topology. These PMACs are mapped to the hosts' actual MAC (AMAC) addresses.
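To make the PMAC idea concrete, here is a minimal sketch of how a location-encoding PMAC could be packed. The paper describes PMACs as 48-bit addresses of the form pod.position.port.vmid (16, 8, 8, and 16 bits respectively); the helper below simply packs those fields and is illustrative, not the authors' implementation.

```python
def make_pmac(pod, position, port, vmid):
    """Pack a 48-bit PMAC laid out as pod.position.port.vmid (16/8/8/16 bits)."""
    assert pod < 2**16 and position < 2**8 and port < 2**8 and vmid < 2**16
    value = (pod << 32) | (position << 24) | (port << 16) | vmid
    # Render as a colon-separated MAC string.
    return ":".join(f"{(value >> shift) & 0xff:02x}" for shift in range(40, -1, -8))

# Example: the first VM behind port 3 of edge switch 1 in pod 2.
print(make_pmac(pod=2, position=1, port=3, vmid=1))  # -> 00:02:01:03:00:01
```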


Forwarding in PortLand is done using PMACs, and host-connected edge switches perform the AMAC-to-PMAC header rewriting so that hosts remain unaware of the PMACs. The AMAC, PMAC, and IP address mappings are maintained in the FM, which responds with this information to ARP requests that edge switches unicast to it. This special handling of ARP messages is called proxy-based ARP in the paper. Switches learn their location in the topology using the Location Discovery Protocol (LDP), which requires no administrator configuration. After the switches establish their location, they use updates from their neighboring switches to populate their forwarding tables. Forwarding in PortLand is provably loop-free because it observes up-down semantics: once a packet has been forwarded down a level in the topology, it is never forwarded back up. Multicast groups in PortLand are mapped to core switches using a hash function, and all multicast packets are forwarded to the core switch that the group hashes to. The FM is responsible for installing the proper forwarding state in core and aggregation switches to ensure that every host in a multicast group receives that group's multicast packets. LDP also provides switch failure detection, and failure recovery is aided by the FM, which tracks the state of switches.
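As a rough illustration of the proxy-ARP path described above, the sketch below models the fabric manager as a simple in-memory IP-to-PMAC table queried by an edge switch. The class and method names are hypothetical, and the broadcast fallback the paper uses when the FM has no mapping is not modeled.

```python
class FabricManager:
    """Toy model of the FM's soft-state IP -> PMAC mapping."""
    def __init__(self):
        self.ip_to_pmac = {}

    def register_host(self, ip, pmac):
        # Edge switches report newly seen hosts so the FM can answer ARPs.
        self.ip_to_pmac[ip] = pmac

    def resolve(self, target_ip):
        # In the paper, a miss falls back to a broadcast query; omitted here.
        return self.ip_to_pmac.get(target_ip)


class EdgeSwitch:
    def __init__(self, fm):
        self.fm = fm

    def handle_arp_request(self, target_ip):
        # Intercept the ARP request and unicast it to the fabric manager
        # instead of flooding it; reply with the PMAC so hosts never see AMACs.
        pmac = self.fm.resolve(target_ip)
        return {"ip": target_ip, "mac": pmac} if pmac else None


fm = FabricManager()
fm.register_host("10.2.1.3", "00:02:01:03:00:01")
print(EdgeSwitch(fm).handle_arp_request("10.2.1.3"))
```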


The authors conclude by evaluating PortLand in a testbed of 20 4-port switches and 16 end hosts. They measure convergence time in the presence of link failures for both UDP and TCP flows, as well as multicast convergence when a link to a core switch fails. They also measure scalability and VM migration. In all of these experiments they evaluate PortLand alone, without comparing it to other solutions.


Critique

I didn't really like this paper. One thing I didn't like is that the authors claim to restrict the amount of centralized knowledge in PortLand, but they never explain what happens when the central FM goes down. They discuss several fault cases, but not this one. An FM failure would clearly be catastrophic, so you would think they would address it. They also claim that the FM is amenable to a distributed realization, but they don't elaborate, so this is not very convincing. The FM doesn't seem very scalable either, especially if it isn't distributed somehow, and distributing it seems like it would be more complicated than they admit.


I was also struck by how often they mentioned special failure cases. Maybe this is always the case for these kinds of systems and other papers just don't emphasize it the way the authors do here, but having so many cases where PortLand could fail and special measures need to be taken makes it seem somewhat fragile.


Another obvious criticism is that PortLand isn't applicable to general topologies; the authors assume that the fat tree or a similar topology is the obvious best one, so in some sense their solution depends on that assumption remaining true. They also mention at one point that their implementation uses a separate control network for communication between the FM and the switches. I feel they should have been more upfront about this, and perhaps evaluated PortLand without the separate control network as well.


I also thought that the authors were slightly unfair and misleading in their description of SEATTLE. They make a lot of criticisms of SEATTLE and TRILL, but they don't compare PortLand to either in their experiments. While a direct comparison might have been difficult for many reasons, it would have been nice if they had at least attempted experiments similar to those in the SEATTLE and TRILL papers. It doesn't seem fair that they didn't.


One thing that is good about PortLand is that, as the authors claim, it is implementable using existing hardware and software, although maybe that’s true of most of the alternatives as well.

