Sunday, October 4, 2009
Modeling Wireless Links for Transport Protocols
Andrei Gurtov, Sally Floyd, "Modeling Wireless Links for Transport Protocols," ACM SIGCOMM Computer Communications Review, Volume 34, Number 2, (April 2004).
One line summary: This paper discusses modeling wireless links, including problems with current models and how the assumptions of a model can affect the evaluation of transport protocols using that model.
Summary
This paper discusses modeling wireless links, especially for the purpose of evaluating transport protocols. The paper first briefly describes the three main types of wireless links: cellular, wireless LANs, and satellite links. It then discusses common topologies found with wireless links, as well as the typical kinds of traffic that run over them. It states that common performance metrics for wireless links include throughput, delay, fairness, dynamics, and goodput. The paper goes on to give four reasons why better models are needed, along with supporting examples for each: (1) some current models are not realistic, (2) some current models are realistic but explore only a small fraction of the parameter search space, (3) some current models are overly realistic, and (4) many models are not reproducible. The paper then describes specific characteristics of wireless links: error losses and corruption, delay variation, packet reordering, on-demand resource allocation, bandwidth variation, and asymmetry in bandwidth and latency. It also discusses the effect of queue management and node mobility on transport protocols. The paper then argues that it is not necessarily the case that transport protocols must adapt to wireless links, or that wireless link layer protocols must accommodate all transport protocols, but rather that designers of each should take into account the characteristics of the other and their interplay. The paper discusses four areas where it is not clear how link layer and transport layer protocols should interact: bit error detection and correction, packet reordering, delay variation, and cross-communication between layers.
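As a concrete aside (my own sketch, not something from the paper): the bursty error losses and corruption it describes are often modeled with a two-state Markov chain, the Gilbert-Elliott model, which might look like this:

```python
import random

def gilbert_elliott(n_packets, p_gb=0.01, p_bg=0.3,
                    loss_good=0.001, loss_bad=0.3, seed=0):
    """Simulate bursty packet loss with a two-state Markov
    (Gilbert-Elliott) model: a 'good' state with rare losses and a
    'bad' state with frequent losses. The transition probabilities
    p_gb and p_bg control how long loss bursts last."""
    rng = random.Random(seed)
    state = "good"
    losses = []
    for _ in range(n_packets):
        loss_p = loss_good if state == "good" else loss_bad
        losses.append(rng.random() < loss_p)
        # possibly transition to the other state
        if state == "good" and rng.random() < p_gb:
            state = "bad"
        elif state == "bad" and rng.random() < p_bg:
            state = "good"
    return losses

losses = gilbert_elliott(10000)
print(sum(losses) / len(losses))  # overall loss rate
```

The transition probabilities and loss rates here are made up; picking and reporting such parameters carefully is exactly the kind of thing the authors complain is rarely done.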
Critique
I didn’t really like this paper. It’s not that I disagree with anything the authors said; many of the things they pointed out are well known, and they say so. It is useful that they aggregated all this information and presented it in a nicely organized and logical way, and it was somewhat informative. I’m just surprised that it was published at a conference when the authors didn’t really implement anything, prove anything, or present many results. I would expect to find this sort of material in a textbook instead.
Thursday, October 1, 2009
A Comparison of Mechanisms for Improving TCP Performance over Wireless Links
H. Balakrishnan, V. Padmanabhan, S. Seshan, R. H. Katz, "A Comparison of Mechanisms for Improving TCP Performance over Wireless Links," IEEE/ACM Transactions on Networking, (December 1997).
One line summary: This paper compares several end-to-end, link layer, and split connection schemes for improving the performance of TCP over wireless links, and concludes that link layer schemes incorporating TCP awareness and selective acknowledgments, as well as end-to-end schemes using Explicit Loss Notification and selective acknowledgments, were the most promising.
Summary
This paper compares a number of schemes for improving TCP performance in wireless networks using an experimental testbed consisting of a wired and a wireless link. The approaches compared fall under three categories: end-to-end, link layer, and split connection. In the end-to-end category, six specific protocols were tested: the de facto TCP implementation, TCP Reno (labeled E2E); TCP New Reno (labeled E2E-NEWRENO), which improves upon TCP Reno by remaining in fast recovery after a partial acknowledgement so it can recover from losses at a rate of one packet per RTT; two TCP schemes using selective acknowledgments (SACKs), labeled E2E-SMART and E2E-IETF-SACK; and two schemes using Explicit Loss Notification (ELN), labeled E2E-ELN and E2E-ELN-RXMT. Four link layer schemes were tested: a base link layer algorithm, LL; a link layer algorithm that uses SACKs, labeled LL-SMART; and two additional schemes that add TCP awareness to the first two, labeled LL-TCP-AWARE and LL-SMART-TCP-AWARE. Two split connection schemes were tested: one labeled SPLIT, modeled after I-TCP, and another that adds SACKs to the first, called SPLIT-SMART.
The authors perform simple initial experiments on the mechanisms in each category. Of the link layer schemes, the authors conclude that the ones that use knowledge of TCP semantics perform the best. Of the end-to-end protocols, the results suggest that TCP New Reno is better than TCP Reno and that a scheme making use of both SACKs and ELN would probably perform the best. Of the split-connection schemes, SPLIT-SMART is better, but the performance of these schemes is overall not as good as those in the other categories. Using the most promising schemes from their initial experiments, the authors then test these schemes’ reaction to burst errors and their performance at different error rates. Overall, the authors drew four main conclusions: (1) of all the schemes investigated, a link layer protocol with TCP awareness and selective acknowledgements performs the best, (2) splitting the end-to-end connection is not necessary for good performance and violates end-to-end semantics, (3) selective acknowledgements are especially useful over lossy links when losses occur in bursts, and (4) end-to-end schemes are not as good as link layer schemes but are promising because they do not require any support from the link layer or intermediate nodes.
Critique
The comparisons and overviews of the different techniques for improving TCP over wireless were informative. One thing that is a bit questionable is that the authors themselves implemented each of the protocols. The reason I mention this is that in class I remember Prof. Katz telling us about a company that claimed to be able to distinguish between different implementations of TCP because each had a unique signature of sorts in the way it operated. This suggests that two different implementations of the same protocol aren’t necessarily the same. It may have been a fairer comparison to use the “standard implementation” of each protocol, but if this was not possible in all cases, then implementing them all is probably better anyway. On second thought, this method still works for the sake of comparing techniques, if not implementations, and you probably wouldn’t want the results to be implementation-dependent anyway. This paper was the first time I had heard about split-connection protocols, and I have to say, they don’t seem like a very good idea at all, though that could be because of the way they were presented and tested. It confused me that they still used TCP over the wireless link in the split-connection schemes; that didn’t seem very useful. One other thing that confused me is why, at one point in a footnote, the paper states that they only measure “receiver throughput”, when at an earlier point they state they measure “throughput at the receiver”. I think that in this case they intended the same meaning by these two phrases, but generally I would think these two phrases don’t mean the same thing. I also found the charts hard to read. Lastly, I think an interesting thing to study next would be a comparison of these protocols over more complex wireless topologies, as the authors mention in their section on future work.
Monday, September 28, 2009
Understanding TCP Incast Throughput Collapse in Datacenter Networks
Y. Chen, R. Griffith, J. Liu, A. Joseph, R. H. Katz, "Understanding TCP Incast Throughput Collapse in Datacenter Networks," Workshop on Research in Enterprise Networks (WREN'09), (August 2009).
One line summary: This paper examines the problem of TCP incast throughput collapse by attempting to reproduce the results of previous papers and examining in depth the behavior observed, as well as developing a quantitative model that attempts to explain it.
Summary
This paper examines the TCP incast problem and (1) shows that it is a general problem occurring in a variety of network environments, (2) reproduces the experiments from prior work, and (3) proposes a quantitative model to explain some of the observed behavior. The authors describe the difference between the fixed-fragment and variable-fragment workloads used in previous papers. In variable-fragment workloads, the amount of data sent by each sender decreases as the number of senders increases. They justify their use of a fixed-fragment workload as being more representative. They perform their experiments in two different test environments, avoiding the use of simulation. Their first step is to verify that the incast problem can be replicated, which they do. They then test a number of modifications, including decreasing the minimum TCP RTO, randomizing the minimum TCP RTO, setting a smaller multiplier for the RTO exponential backoff, and randomizing the multiplier value. Their results suggest that reducing the minimum TCP RTO value was the most helpful modification, while the majority of the other modifications were unhelpful.
They performed experiments in which they used a fixed-fragment workload for different minimum RTO values, varying the number of senders and measuring goodput. They found that there are three regions of interest in the resulting graph: (1) the initial goodput collapse, (2) goodput increase, and (3) another region of slower goodput decrease. They then did a similar experiment, only in this case they tested different RTO values with and without delayed ACKs enabled. Unlike in previous papers by other researchers, they found that disabling delayed ACKs resulted in suboptimal behavior. They hypothesize that this is because disabling delayed ACKs causes the TCP congestion window to be over-driven, and they examine some internal TCP state variables to verify this hypothesis. They also conclude that this suboptimal behavior is independent of the type of workload. The results of their experiments were very different from the results in previous papers. They conclude that this is in part due to the difference in workloads (fixed-fragment versus variable-fragment), but also due to the difference in the network test environments.
They propose a simple quantitative model to describe the behavior in the case of delayed ACKs enabled and a low-resolution timer, using the fixed-fragment workload. It comes close to describing the observed behavior for the 200 ms minimum RTO timer; however, it does not come as close for the lower 1 ms minimum RTO timer. Their model reveals that goodput is affected both by the minimum RTO value and by the inter-packet wait times between packet transmissions, and that for larger RTO values, reducing the value helps, while for smaller RTO values, controlling the inter-packet wait time is what matters. The authors provide a number of refinements to their model, which they use to explain several details observed in the graphs from their earlier experiments.
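To illustrate the shape of their argument (this is my own toy with made-up parameters, not the authors' fitted model), goodput can be approximated as total data divided by the ideal transfer time plus serialized timeout penalties:

```python
def goodput_mbps(n_senders, fragment_bytes=256 * 1024,
                 link_mbps=1000.0, min_rto_s=0.200, n_timeouts=2):
    """Toy goodput model: total data over (ideal transfer time +
    serialized RTO penalties). It shows why shrinking the minimum RTO
    matters when timeout idle time dominates the transfer itself."""
    total_bits = n_senders * fragment_bytes * 8
    ideal_s = total_bits / (link_mbps * 1e6)       # time with no timeouts
    total_s = ideal_s + n_timeouts * min_rto_s     # add timeout stalls
    return total_bits / total_s / 1e6

coarse = goodput_mbps(8, min_rto_s=0.200)  # 200 ms minimum RTO
fine = goodput_mbps(8, min_rto_s=0.001)    # 1 ms minimum RTO
print(coarse, fine)
```

Under this toy, the 200 ms timer caps goodput at a few percent of the link while the 1 ms timer nearly recovers it; and once the RTO penalty shrinks enough, the inter-packet wait terms the authors add in their refinements are what dominate instead.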
Critique
There are many things I appreciated about this paper. At first I thought it was going to be redundant and not contribute much beyond what the paper we read before this did, but it is clear that the authors’ comparison of their results with those of the previous paper revealed a lot of interesting things. This paper brings up many important details and influencing factors that were not examined in the previous paper and that weren’t immediately obvious from reading it. It was also just nice to see them try to reproduce the experiments and results from the earlier paper; I don’t think I’ve seen that elsewhere, even though it seems like it would be good to make this standard practice.
I’m not sure I completely buy some of the explanations in this paper, but I appreciate that the authors tried to explain the behavior they observed so thoroughly. For instance, I’m not sure I follow their possible explanation for why randomizing the minimum and initial RTO values is unhelpful, but that could very likely be because I don’t understand it. Another thing that is not clear to me is whether, when they examined enabling delayed ACKs, they adjusted the delayed ACK timer away from the default 40 ms or left it alone. I wonder if they would have drawn other conclusions had they tried adjusting the delayed ACK timer along with the minimum RTO timer. I thought their attempt at a model was an impressive undertaking, but I’m not so sure the results of this attempt were very good.
Sunday, September 27, 2009
Safe and Effective Fine-grained TCP Retransmissions for Datacenter Communication
V. Vasudevan, A. Phanishayee, H. Shah, E. Krevat, D. G. Andersen, G. R. Ganger, G. A. Gibson, B. Mueller, "Safe and Effective Fine-grained TCP Retransmissions for Datacenter Communication," ACM SIGCOMM Conference, (August 2009).
One line summary: This paper suggests reducing or eliminating the minimum RTO and enabling fine-grained RTT and RTO measurement and calculation on the order of microseconds as a way of eliminating the problem of TCP incast collapse in data centers.
Summary
This paper presents a solution to the TCP incast problem. TCP incast collapse results in severely reduced throughput when multiple senders try to send to a single receiver. The preconditions for TCP incast collapse are: (1) the network has high bandwidth and low latency, with switches that have small buffers, (2) the workload is a highly parallel barrier-synchronization request workload, and (3) the servers return a small amount of data per request (high fan-in). An example scenario in which TCP incast collapse occurs is one in which a client sends a request for data that has been partitioned across many servers. The servers all try to respond with their data and overload the client’s buffers. Packets are lost and some servers experience a TCP retransmission timeout (RTO). But the default minimum retransmission timeout is 200 ms, so the client must wait at least this long to receive the remaining data, even though the client’s link may be idle for most of this waiting time. This results in decreased throughput and link utilization. The authors postulate that in order to prevent TCP incast collapse, the RTO must operate at a granularity closer to the RTT of the underlying network, which in datacenters is hundreds of microseconds or less.
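Some back-of-the-envelope arithmetic (my numbers, not the paper's testbed figures) shows why this mismatch is so damaging:

```python
# Rough arithmetic on the RTO/RTT mismatch in a datacenter.
rtt_us = 100                       # typical datacenter round trip, ~100 us
min_rto_us = 200 * 1000            # default 200 ms minimum RTO
print(min_rto_us / rtt_us)         # the RTO is ~2000x the network RTT

block_bits = 256 * 1024 * 8        # one 256 KB server response
serialize_ms = block_bits / 1e9 * 1e3   # ~2 ms to send it at 1 Gbps
stall_ratio = (min_rto_us / 1000) / serialize_ms
print(stall_ratio)                 # one timeout idles the link ~100x longer
                                   # than sending the block takes
```

So a single timeout costs roughly two orders of magnitude more time than the transfer it stalls, which is exactly the idle-link effect described above.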
The authors demonstrate via a series of simulations and experiments in real clusters that allowing a minimum RTO on the order of microseconds, or eliminating the minimum RTO altogether, improves throughput and helps avoid TCP incast collapse. They also experiment with desynchronizing retransmissions by adding a randomized component. They advocate such desynchronization in data centers but note that it is likely unnecessary in the wide area, since different flows have different RTTs and thus different RTOs. The authors explain that it is not enough to lower the minimum RTO; it is also necessary to enable measurement and computation of the RTT and RTO at a fine-grained level. They describe their implementation of fine-grained RTT and RTO computation using Linux high-resolution timers, which involved modifications to the kernel and the TCP stack.
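A minimal sketch of what fine-grained RTO computation looks like (the srtt + 4*rttvar form is the standard TCP estimate; the jitter knob is my hypothetical stand-in for their randomized component, not their exact scheme):

```python
import random

def rto_us(srtt_us, rttvar_us, min_rto_us=0, jitter=0.0, rng=random):
    """Fine-grained RTO sketch: the standard srtt + 4 * rttvar estimate,
    kept in microseconds instead of being clamped to a 200 ms floor.
    The optional jitter fraction desynchronizes senders whose flows
    timed out in lockstep."""
    rto = max(min_rto_us, srtt_us + 4 * rttvar_us)
    return rto * (1.0 + jitter * rng.random())

print(rto_us(100, 20))                      # ~180 us: tracks the network
print(rto_us(100, 20, min_rto_us=200_000))  # clamped to 200 ms regardless
```

The second call shows the problem being fixed: with the default floor, the computed estimate is irrelevant and every timeout costs the full 200 ms.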
The authors then discuss whether eliminating the minimum RTO and allowing microsecond retransmissions is appropriate for the wide area or whether such techniques should be limited to the data center. They note two potential conflicts: spurious retransmissions when the network RTT suddenly spikes, and delayed ACKs acting as an additional timeout mechanism in certain situations because the delayed ACK timer is set higher than the RTO. They argue that problems arising from these conflicts don’t occur often and are mitigated by newer TCP features. They perform some wide-area experiments to test this claim and find that spurious retransmissions aren’t an issue, and that while better performance can be achieved by disabling delayed ACKs, leaving them enabled with a high timer only slightly harms performance.
Critique
I liked this paper and I think it should remain on the syllabus. I don’t have many criticisms, although I did find it strange that in their wide-area experiments, two different servers experienced almost exactly the same distribution of flows; this suggests that something funny might be going on with those experiments. I thought the proposed solution in this paper was good; it would be perfect if enabling fine-grained RTT and RTO measurement and calculation didn’t require special kernel hacking. Perhaps in the future the OS will provide such capabilities automatically. In the meantime, I thought their suggestion of setting the minimum RTO to the lowest value possible was at least a nice practical idea. I also liked that they did both simulations and experiments on a real cluster.
Tuesday, September 15, 2009
Congestion Control for High Bandwidth-Delay Product Networks
D. Katabi, M. Handley, C. Rohrs, "Congestion Control for High Bandwidth-Delay Product Networks," ACM SIGCOMM Conference, (August 2002).
Summary
This paper describes an alternative transport protocol to TCP called the eXplicit Control Protocol (XCP). One major problem with TCP is that as the bandwidth-delay product of links in a network increases, TCP becomes oscillatory and unstable, degrading performance regardless of the congestion control or queueing scheme used at the routers. TCP can also be unfair to connections using high-latency links. In addition, TCP unnecessarily delays the transfer of short flows, which make up most of the flows in the Internet. Thus, this paper presents XCP as an alternative to TCP. Some of the main concepts underlying XCP are the decoupling of utilization control from fairness control and the placement of state in the packets as opposed to the routers. The paper demonstrates that XCP is stable for any link capacity, feedback delay, or number of sources, and that the parameters of XCP are constants independent of the environment. It achieves high utilization, small queues, and near-zero drops in any environment. Other advantages of XCP are that it allows for easy implementation of service differentiation schemes, it distinguishes error losses from congestion losses, it facilitates detection of misbehaving sources, and its performance is an incentive for deployment. It is also TCP-friendly.
The basic operation of XCP is as follows: An XCP sender maintains a congestion window (cwnd) and an estimate of round-trip time (RTT) just like in TCP. The XCP protocol defines a special congestion header carried in each packet used to communicate a flow’s state and feedback. This header contains a cwnd field set to the sender’s current cwnd, a RTT field set to the sender’s RTT estimate, and a feedback field, initially set to the sender’s demand and later reset by the router to provide feedback to the sender. The XCP router has an efficiency controller (EC) and a fairness controller (FC). These compute estimates of the average RTT of all flows over the link and make a single control decision every average RTT, which is the control interval. The EC computes the desired increase or decrease in the number of aggregate bytes to transmit in the next control interval. These bytes are allocated via the feedback field of the packet header. Exactly how they are divided up among the flows is decided by the FC, which uses an additive-increase multiplicative-decrease (AIMD) mechanism to converge to fairness – if an aggregate increase is desired, it allocates so that the increase in the throughput of all flows is the same, and if a decrease is desired, it allocates so that the decrease in throughput of a flow is proportional to its current throughput. In addition, the FC periodically shuffles allocation among 10% of the flows at a link in order to speed convergence to fairness.
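The heart of the EC fits in a few lines; the constants alpha = 0.4 and beta = 0.226 are the stability values from the paper, though the sketch itself is mine:

```python
def aggregate_feedback(avg_rtt, capacity, input_rate, queue,
                       alpha=0.4, beta=0.226):
    """XCP efficiency controller: desired change (in bytes) in aggregate
    traffic over the next control interval (one average RTT).
    phi = alpha * d * S - beta * Q, where S is the spare bandwidth
    (capacity - input rate, bytes/s) and Q the persistent queue (bytes)."""
    spare = capacity - input_rate
    return alpha * avg_rtt * spare - beta * queue

# Underutilized link: positive feedback tells senders to speed up.
print(aggregate_feedback(avg_rtt=0.01, capacity=1.25e8,
                         input_rate=1.0e8, queue=0))
# Standing queue at full utilization: negative feedback drains it.
print(aggregate_feedback(avg_rtt=0.01, capacity=1.25e8,
                         input_rate=1.25e8, queue=50_000))
```

The FC then divides this aggregate feedback among flows via the per-packet feedback field, using the AIMD allocation rules described above.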
This paper uses a fluid model to prove that XCP is stable independently of delay, capacity, and number of sources. It then compares XCP to TCP Reno over Random Early Discard (RED), Random Early Marking (REM), Adaptive Virtual Queue (AVQ), and Core Stateless Fair Queueing (CSFQ) queueing policies. The queueing policy used with XCP almost never mattered because XCP almost never drops packets. These simulations show that XCP maintains near-optimal utilization even in the face of capacity increases, feedback delay, increases in the number of flows, and short web-like traffic, whereas TCP’s performance degrades. Also, XCP converges to fairness quickly, is not biased against long-RTT flows like TCP, and is robust to high variance in the RTT distribution even though it uses the average RTT. Moreover, XCP is able to dampen oscillations and respond quickly to sudden increases and decreases in traffic. Toward the end of the paper the authors describe how XCP can easily be used to provide differential service according to the shadow prices model and to detect misbehaving sources. They also demonstrate that XCP is TCP-friendly.
Critique
I thought this paper was nice to read. It seems like their fluid model approach from control theory led to a very clean design and analysis of XCP. I tend to agree with their discussion about why the congestion signal shouldn’t be to drop packets and it needn’t even be binary. I also thought the extension to provide differential bandwidth allocation using the shadow prices model was good, especially as it seemed so straightforward to implement. I could see where that would be a commercially desirable feature of XCP.
I thought their argument that XCP could be used to facilitate policing agents that detect misbehaving sources was interesting but somewhat lacking. It might be nice to see this in action, or to take a more detailed look at XCP from a security angle and see if there are vulnerabilities that aren’t obvious at first glance. This paper made me wonder why XCP isn’t supported in the Internet today. I’d be interested to see if anything ever came from this paper or if it was just a thought experiment.
Tuesday, September 8, 2009
Analysis of the Increase and Decrease Algorithms for Congestion Avoidance in Computer Networks
D-M Chiu, R. Jain, "Analysis of the Increase and Decrease Algorithms for Congestion Avoidance in Computer Networks," Computer Networks and ISDN Systems, 17 (1989), pp 1-14.
One line summary: In this paper the authors algebraically analyze a subset of algorithms for congestion avoidance with respect to a number of criteria, including efficiency, fairness, distributedness, and convergence; they are able to conclude that the optimal algorithm uses additive increase and multiplicative decrease.
Summary
In this paper, the authors analyze decentralized algorithms for congestion avoidance based on binary feedback signals, focusing on what they call increase/decrease algorithms. They focus on an abstract system in which there is one bottleneck resource used by n users. They assume that the binary feedback signal indicates whether this resource is overloaded or underloaded, and that the signal is sent by setting a bit in the packet header. They also assume that the feedback and control loop is synchronous across all users. The authors focus on linear control functions. In this context then, increase/decrease algorithms are merely those in which one signal indicates that the resource is underloaded and the load should be increased, and the other signal means the opposite. Examples of such algorithms include multiplicative increase/additive decrease and additive increase/multiplicative decrease.
The criteria by which the authors select controls are efficiency, fairness, distributedness, and convergence. Efficiency is measured by how close the total load on a resource is to the point where any further load would cause a decrease in response time and throughput, i.e. the point of maximum utilization and throughput and lowest response time. Fairness is measured by a function the authors define that attempts to measure how equal the allocation of the resource is across all users. Distributedness means that the users have only limited knowledge of the system, in particular, whether the resource is overloaded or underloaded. Lastly, convergence is a measure both of responsiveness, or how quickly the system reaches equilibrium, and of smoothness, or how large the oscillations around the equilibrium are. They choose to examine distributed linear control functions that converge to efficient and fair states.
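The fairness function they define is the now-widely-used Jain fairness index, which is compact enough to write out:

```python
def fairness(loads):
    """Jain fairness index from this paper:
    F(x) = (sum x_i)^2 / (n * sum x_i^2).
    Equals 1 when all allocations are equal and falls toward 1/n
    as one user grabs everything."""
    n = len(loads)
    return sum(loads) ** 2 / (n * sum(x * x for x in loads))

print(fairness([1, 1, 1, 1]))  # 1.0: perfectly fair
print(fairness([4, 0, 0, 0]))  # 0.25: one user has it all
```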
In order to analyze the set of possible control functions, the authors represent system state transitions in an n-dimensional vector space (representing the load for each of n users) across discrete time slots. The ith user’s load at time t is x_i(t), and the load at the next time interval, x_i(t+1), is a function of the control scheme used along with the load at time t and the system feedback. For example, it could be that x_i(t+1) = a + b*x_i(t) if the binary signal indicates the resource is underloaded; in this function, a is the additive component and b is the multiplicative one. With this representation, they are able to deduce algebraically a number of propositions. The first is essentially that the decrease policy should be multiplicative, while the increase policy should have an additive component and can optionally have a multiplicative component with coefficient no less than one. The second proposition is that both the increase and the decrease can have both additive and multiplicative components, provided that a special step is taken when certain convergence restrictions would be violated. The third and most conclusive proposition is that for feasibility and optimal convergence, the increase should be additive and the decrease should be multiplicative.
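To see the third proposition in action, here is a toy synchronous-feedback loop for two users (my own simplification of their two-user vector-space picture):

```python
def aimd_two_users(x1, x2, goal=1.0, a=0.05, b=0.5, steps=200):
    """Two users share one bottleneck with synchronous binary feedback:
    additive increase while total load <= goal (underloaded),
    multiplicative decrease when overloaded. Increases preserve the
    difference x1 - x2 while each decrease halves it, so allocations
    converge to the fair line x1 == x2 while oscillating around the
    efficient load."""
    for _ in range(steps):
        if x1 + x2 <= goal:       # underloaded: additive increase
            x1 += a
            x2 += a
        else:                     # overloaded: multiplicative decrease
            x1 *= b
            x2 *= b
    return x1, x2

print(aimd_two_users(0.9, 0.1))   # starts very unfair, ends nearly equal
```

This is exactly the trajectory their two-user graphs show: the state spirals in toward the intersection of the fairness and efficiency lines.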
At the end of the paper, the authors describe what nonlinear control functions in congestion avoidance algorithms might look like, but reason that because such control is too sensitive to various system parameters, its robustness is decreased. On this note, they go on to enumerate a few practical considerations in choosing an algorithm for congestion avoidance. The first is that it should be as independent of hardware and software parameters as possible, in order to reduce the complexity of the configuration task and make configuration errors less likely. The second consideration is that resources and allocations must be measured in integers, so this should be taken into account when choosing an algorithm. The last practical consideration they mention is ease of implementation. Given all these considerations, in addition to the result of their earlier analysis, they conclude that an algorithm using a linear control function with additive increase and multiplicative decrease is optimal.
Critique
I didn’t like everything about this paper. I thought that it was a bit too abstract, and I think in general it is no trivial matter to translate abstract results into real working solutions. However, given that there appear to be so many different algorithms for congestion avoidance, of which they only examine a subset, their analysis does provide some solid guidance in choosing among these, especially if you accept that the conclusions they draw from their analysis of their simplified system still completely hold in the much more complex environment of real systems. Since real TCP uses additive increase/multiplicative decrease, it seems that this could be a safe bet.
I’d be curious to see if algorithms with control functions other than additive increase/multiplicative decrease have been tested in practice. The authors mention that this paper is just one in a series of papers in which they examine a range of mechanisms for congestion avoidance, so an interesting exercise might be to take some of the algorithms they look at in their other papers and actually test them against additive increase/multiplicative decrease, and also maybe examine some algorithms that use nonlinear control functions.
One thing I did really like about this paper is their use of graphs. Whereas some of the algebra was at times a bit tedious or difficult to follow, I thought the graphs helped them provide a nice, clear, and more intuitive explanation for how the control functions worked and how they converged to fairness and efficiency. Although these graphs were further simplified on top of their simplifying assumptions, in that they represented the case with only two users, they were still pretty much key in helping me to understand some of their conclusions.
Monday, September 7, 2009
Congestion Avoidance and Control
V. Jacobson, M. Karels, "Congestion Avoidance and Control," ACM SIGCOMM Conference, (August 1988).
One line summary: This paper describes several important algorithms for congestion control in TCP, including slow-start, exponential backoff, multiplicative decrease additive increase, and RTT variance estimation; the paper also provides results that demonstrate that implementations of TCP using the algorithms described in the paper are better than older implementations.
Summary
The authors note that due to an explosive increase in size, computer networks have increasingly experienced congestion problems. Seven algorithms added to TCP are meant to help deal with this: round-trip time (RTT) variance estimation, exponential backoff, slow start, an aggressive receiver ACK policy, dynamic window sizing, Karn’s algorithm, and fast retransmit. The authors describe the first five of these algorithms. According to the authors, these algorithms stem from the observation that a TCP connection should obey a “conservation of packets” principle, which basically states that for a connection in equilibrium, the sender should not place a new packet into the connection until an old packet leaves. There are three instances in which this principle will not be upheld: when the connection fails to reach equilibrium initially; when, after equilibrium is reached, a new packet is sent before an old one leaves; and when resource limitations along the connection path make equilibrium unattainable. The authors address each of these in turn.
The first failure occurs when a connection is being started or restarted. The authors note that a TCP connection is a self-clocking system, which helps make it stable; however, this also means that in order to get packets flowing there must be ACKs to clock them out, but to get ACKs there must be packets flowing. The authors thus describe the slow-start algorithm as a method for “starting the clock” and gradually increasing the number of packets flowing. The slow-start algorithm is as follows: for each connection, store a variable called the congestion window that will be used to determine the number of packets to send. When starting or restarting, set the congestion window to 1 packet. Every time a packet is acknowledged, increase the congestion window by 1. To select how many packets to send at a time, choose the minimum of the congestion window and the receiver’s advertised window. The result of this algorithm is that if no packets are lost or delayed, the congestion window doubles every RTT, and the sender never sends packets at a rate faster than twice the maximum rate on the connection path. The authors find that this algorithm greatly decreases the number of retransmits and increases effective bandwidth.
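The doubling behavior is easy to trace in a few lines (a sketch of the algorithm as described, not the BSD implementation):

```python
# Tracing slow start: cwnd starts at 1 packet; each ACK adds 1,
# so the window doubles every round trip until loss occurs or the
# receiver's advertised window caps it.
cwnd = 1
for rtt in range(5):
    print("RTT", rtt, "cwnd", cwnd)
    cwnd += cwnd  # one ACK arrives per in-flight packet; each ACK adds 1
print("after 5 RTTs:", cwnd)  # 32: exponential growth from a cold start
```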
The authors next explain that the second failure must be due to a sender’s faulty retransmit timer. This timer should estimate the RTT, which increases as load increases, in order to determine when a packet should be retransmitted. One mistake in doing this is failing to estimate the variance in the RTT; the authors present an easy-to-calculate method for estimating this variance in their appendix. Another mistake is not choosing the correct backoff after a packet is retransmitted; the authors argue that only exponential backoff schemes have any hope of working in the general case. The authors demonstrate that a timer estimating RTT with their method approximates the RTT more accurately than the method previously used in TCP.
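The appendix's estimator is simple enough to sketch; the gains g = 1/8 and h = 1/4 and the srtt + 4*rttvar timeout are the values that became standard in implementations (my assumption for this sketch, not a quote from the paper):

```python
def rtt_update(srtt, rttvar, sample, g=0.125, h=0.25):
    """Jacobson's mean/deviation estimator: exponentially weighted
    averages of the RTT and of its deviation. The retransmit timeout
    becomes srtt + 4 * rttvar, so it widens when the RTT gets noisy
    instead of firing spuriously."""
    err = sample - srtt
    srtt = srtt + g * err                     # smoothed RTT estimate
    rttvar = rttvar + h * (abs(err) - rttvar) # smoothed mean deviation
    return srtt, rttvar, srtt + 4 * rttvar

srtt, rttvar = 100.0, 10.0
for sample in (100, 180, 120, 300):  # a jittery path
    srtt, rttvar, rto = rtt_update(srtt, rttvar, sample)
print(srtt, rttvar, rto)  # the RTO grows well above srtt under jitter
```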
Lastly, the authors point out that if the timers are doing a good job of estimating the RTT, then a timeout probably indicates a lost packet rather than a broken timer. Lost packets are typically a sign that the network is congested and connections are hitting resource limits, which is the third failure case. Connection endpoints must have a way of responding to congestion, decreasing the amount they send when the network is congested and increasing it when spare capacity becomes available, in order to utilize network resources properly while avoiding congestion. To this end, the authors introduce an additive-increase/multiplicative-decrease algorithm. The decrease happens much faster than the increase because it is easy for the network to become congested and much harder for it to become uncongested. The algorithm works as follows: on a timeout, the congestion window is set to half its current size (multiplicative decrease). For each packet acknowledgement received thereafter, the congestion window (cwnd) is increased by 1/cwnd (additive increase). When choosing the number of packets to send, the sender again uses the minimum of the congestion window and the receiver's advertised window. The authors note that congestion avoidance and slow start are two entirely different algorithms, but they should be implemented together, which is usually done by having separate slow-start and congestion-avoidance phases. The authors show that their congestion avoidance approach results in fewer retransmits and higher bandwidth utilization.
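The congestion-avoidance phase on its own can be sketched in a few lines (a hedged illustration, with slow start and the phase-switching logic omitted; function names are mine):

```python
def on_ack(cwnd):
    """Additive increase: grow by 1/cwnd per ACK, i.e. roughly one
    extra packet per round-trip time."""
    return cwnd + 1.0 / cwnd

def on_timeout(cwnd):
    """Multiplicative decrease: halve the window on a timeout, which
    is treated as a congestion signal. Never drop below one packet."""
    return max(1.0, cwnd / 2)

def send_window(cwnd, recv_window):
    """Packets actually allowed in flight: the congestion window,
    capped by the receiver's advertised window."""
    return min(int(cwnd), recv_window)

cwnd = 8.0
cwnd = on_timeout(cwnd)      # congestion detected: 8 -> 4
cwnd = on_ack(cwnd)          # cautious linear probing: 4 -> 4.25
print(send_window(cwnd, 64)) # prints 4
```

The asymmetry is the point: a halving undoes many round trips of linear growth, which matches the paper's observation that congestion sets in quickly but drains slowly.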
Critique
The algorithms the authors propose in this paper later made it into newer implementations of TCP, so in that sense they were very successful. Their results demonstrate that the algorithms were generally an improvement over previous implementations. However, as the authors point out, there are still improvements to be made. For instance, none of their solutions ensure fair sharing of network bandwidth, so they propose this as an area of future work. (Then again, according to Kurose and Ross, additive-increase/multiplicative-decrease was later shown by Chiu and Jain in 1989, a year after this paper was published, to converge to an equal sharing of bandwidth, so perhaps this work did not need to be done after all.)
Also, in certain situations, TCP’s exponential backoff and congestion avoidance can cause serious performance problems. Regular TCP can be improved for high-performance networks, and there are extensions to TCP for this purpose. Investigation reveals that some of the later changes to TCP’s congestion avoidance algorithms were suggested in order to better handle long fat networks (LFNs) and high-latency links. Lastly, some of the assumptions the authors make about the network are not always valid. For instance, the authors state that packets are lost due to either network congestion or packet damage, but that the latter occurs so infrequently that a timeout can almost always be taken as a sign of network congestion. In wireless networks, however, this assumption does not generally hold, so TCP’s congestion avoidance can in that case result in link underutilization. A modified mechanism that takes this into account would thus be appropriate for TCP over wireless networks.
I thought this paper was good, and obviously the algorithms described in it are important, as they are now an integral part of TCP. Congestion control is a very important service, without which the Internet could not function well, if at all. I was somewhat confused as to how many of the suggested algorithms were original contributions, since some of the footnotes seem to indicate that very similar algorithms had first been suggested elsewhere. The authors’ tendency to reference a handful of papers I hadn’t read, without briefly explaining what was in them, inhibited my understanding. And as a matter of personal taste, I did not like that so much information was crammed into the footnotes and appendices; to me this made the paper less readable, although aside from this it was well written.
[EDIT: From my comment about fairness as examined by Chiu it's probably clear that I read this paper before I read the other one. ;-) ]
Monday, August 31, 2009
The Design Philosophy of the DARPA Internet Protocols
D. D. Clark, "The Design Philosophy of the DARPA Internet Protocols," ACM SIGCOMM Conference, (August 1988).
One line summary: This paper examines the underlying goals and logic that motivated the design of the DARPA Internet Architecture.
Summary
The primary goal of the DARPA Internet Architecture was to develop an effective technique for the multiplexed utilization of existing interconnected networks. The multiplexing technique selected was packet switching, and it was assumed that networks would be connected by packet switches called gateways. The secondary goals were sevenfold and included, in order of importance, resiliency to partial failures, support for multiple types of communications services, accommodation of a variety of networks, capacity for distributed management, cost effectiveness, easy host attachment, and accountability.
These goals directly influenced the resulting design. For instance, as a consequence of the first goal, the Internet architects chose as protection against failure an approach called fate-sharing, which has several ramifications, among them, that intermediate nodes have no essential information about ongoing connections passing through them. As a consequence of the second goal relating to support for a variety of services, the designers tried to make TCP as general as possible, but found that it was too difficult to build the wide range of services needed for such support into one protocol and that more than one transport service would be necessary. As a consequence of the third goal that the Internet should operate over a wide variety of networks, the architecture had to make very few assumptions about the underlying capabilities of the network. Here, the author uses a sort of end-to-end-argument to explain why certain services were engineered at the transport level.
The author admits that of the seven secondary goals, the first three discussed above had the largest impact on the design of the Internet, but the last four were less effectively met. This fact itself has also had a large impact on the state of the Internet today. For example, the author explains that some of the biggest problems with the Internet today have to do with insufficient capacity for distributed management.
Critique
This paper is important because it provides insight into the beginnings and evolution of the Internet. Given its complexity, such historical perspective is useful for addressing existing problems in the Internet and designing new protocols and applications. Since hindsight is 20/20, there are a number of now obvious criticisms that could be made of the designers of the Internet. Several are made in the paper pertaining to the failure to fully achieve the last four secondary goals, as well as to the difficulty in providing guidance to implementers, especially with respect to performance. It is good that the author pointed out these shortcomings.
I would imagine that some of these criticisms, such as the one related to distributed management, have been addressed since the paper was written, while others remain an issue. One major oversight on the part of the designers of the Internet, as is often pointed out, was their failure to consider the possibility of malicious users on the network, how these affect the security and stability of the whole Internet, and what provisions should be taken to counteract them. This is somewhat ironic since, as the paper says “the network was designed to operate in a military context.” Today, so-called cybersecurity is an area of intense concern to the military, yet perhaps if the original designers had had some foresight into this issue, it would be less of a problem.
As a final note, I found it interesting that in the conclusion the author speaks of “the next generation of architecture” and potential alternative building blocks to the datagram. We still rely on a version of TCP/IP today, albeit an improved one. It has proven to be very challenging, if not entirely infeasible, to change the underlying architecture of the Internet at this point; the push to switch to IPv6 is one example of the difficulties involved. Perhaps this is another failing of the designers. I'd be interested to know more about what the author had in mind here.