Showing posts with label power consumption. Show all posts
Tuesday, December 1, 2009
Skilled in the Art of Being Idle: Reducing Energy Waste in Networked Systems
S. Nedevschi, J. Chandrashekar, J. Liu, B. Nordman, S. Ratnasamy, N. Taft, "Skilled in the Art of Being Idle: Reducing Energy Waste in Networked Systems," NSDI'09, (April 2009).
One line summary: This paper examines the value of using proxies to handle idle-time traffic for sleeping hosts with the goal of reducing wasted energy consumption in networked end-systems; it does this by analyzing and classifying traffic to see what can be ignored or automatically handled and by examining several potential proxy designs.
Summary
This paper examines the problem of reducing wasted energy consumption in powered-on but idle networked end-systems such as desktops in home and office environments. It discusses various solutions, and in particular examines the value of using a proxy to handle idle-time traffic on behalf of sleeping hosts. The main problem is that while vendors have built in hardware support for sleep (S-states) to reduce power consumption while idle, surveys of office buildings indicate that the vast majority of machines are fully on while idle, instead of taking advantage of these sleep states. One reason for this is that a sleeping machine loses its network presence, i.e., it cannot send or receive network messages; another is that users or administrators occasionally want to be able to schedule tasks to run during these idle times. Both of these reasons cause users not to make use of sleep states. This paper thus tries to answer the following questions: (1) Is the problem worth solving? (2) What network traffic do idle machines see? (3) What is the design space for a proxy? (4) What implications does proxying have for future protocol and system design?
To begin to answer these questions, the authors first collect network and user-level activity traces from 250 client machines belonging to Intel employees and attempt to classify the traffic. They first classify each packet as being either broadcast, multicast, or unicast, then as incoming or outgoing. They find that outgoing traffic tends to be dominated by unicast, while incoming traffic comprises significant proportions of all three. They estimate the potential for sleep in four scenarios: (a) ignore broadcast and wake for the rest, (b) ignore multicast and wake for the rest, (c) ignore both broadcast and multicast, and (d) wake for all packets. They find that broadcast and multicast are mainly responsible for reducing the amount of potential sleep time and that doing away with just one of broadcast or multicast is not effective. The authors next classify the traffic based on protocol type, and evaluate each protocol on two metrics: total volume of traffic, and something the authors call half-sleep time. A high half-sleep time means that protocol’s packets could be handled by waking the machine up, whereas a low half-sleep time means that to achieve useful amounts of potential sleep time the proxy would have to handle them. They find that the bulk of broadcast traffic is for address resolution and service discovery, and that much of the remaining broadcast traffic comes from router-specific protocols. Broadcast traffic allows for very little sleep in the office, but significantly more in the home. A proxy could easily handle most of these broadcast protocols. Multicast traffic is mostly caused by router protocols and is often absent or greatly reduced in homes as compared to offices. All router traffic is ignorable. From their analysis of unicast traffic, they speculate that it might be possible to ignore or eliminate much of unicast traffic.
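The sleep-potential estimate for the four scenarios can be sketched as follows. This is a minimal illustration (not the authors' code): the trace, the trace duration, and the per-wake awake penalty are all hypothetical, and the real analysis works over full packet traces with user-activity data.

```python
# Sketch: estimate the fraction of an idle period a host could sleep,
# given a packet trace, under each of the paper's four scenarios.
# All numbers here are made up for illustration.

def potential_sleep(packets, duration, ignore=(), wake_cost=5.0):
    """packets: list of (timestamp, kind), kind in
    {'broadcast', 'multicast', 'unicast'}.
    The host wakes for any packet whose kind is not in `ignore`,
    staying awake wake_cost seconds per wake-up (overlaps merged).
    Returns the fraction of `duration` spent asleep."""
    wake_times = sorted(t for t, kind in packets if kind not in ignore)
    awake = 0.0
    busy_until = 0.0
    for t in wake_times:
        start = max(t, busy_until)           # merge overlapping awake windows
        awake += max(0.0, (t + wake_cost) - start)
        busy_until = max(busy_until, t + wake_cost)
    return 1.0 - min(awake, duration) / duration

trace = [(10, 'broadcast'), (12, 'multicast'), (300, 'unicast')]
for scenario, ignored in [('wake for all', ()),
                          ('ignore broadcast', ('broadcast',)),
                          ('ignore multicast', ('multicast',)),
                          ('ignore both', ('broadcast', 'multicast'))]:
    print(scenario, round(potential_sleep(trace, 600, ignored), 3))
```

Running the four scenarios over the same trace, as above, is what lets the paper compare how much sleep each filtering policy would buy.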
Finally, they classify traffic into one of the following three categories regarding the need to proxy that traffic: don’t wake, don’t ignore, and policy-dependent, and into one of the following three categories regarding the difficulty in proxying that traffic: ignorable/drop, handle via mechanical responses, and require specialized processing.
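The two-axis classification could be represented as a simple lookup, sketched below. The protocol names and their assignments here are illustrative examples only, not a reproduction of the paper's full per-protocol table.

```python
# Sketch of the paper's two-axis traffic classification.
# Axis 1 (policy): does the host/proxy need to act on this traffic?
# Axis 2 (difficulty): how hard is it for a proxy to handle?
# Entries below are illustrative, not the paper's complete table.

POLICY = {
    'ARP-for-other-hosts': 'dont_wake',         # safe to ignore
    'incoming-TCP-SYN':    'dont_ignore',       # must be dealt with
    'NetBIOS-name-query':  'policy_dependent',  # depends on deployment
}

DIFFICULTY = {
    'ARP-for-other-hosts': 'ignorable_drop',
    'ARP-for-this-host':   'mechanical_response',  # canned reply suffices
    'incoming-TCP-SYN':    'specialized_processing',
}

def classify(proto):
    """Return (policy, difficulty), defaulting to the conservative case
    for protocols not in the (hypothetical) tables."""
    return (POLICY.get(proto, 'policy_dependent'),
            DIFFICULTY.get(proto, 'specialized_processing'))
```

Keeping the two axes separate matters because "must not be ignored" does not imply "hard to proxy": ARP requests for the host's own address must be answered, yet a proxy can answer them mechanically.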
Next, they present four proxy designs. The first ignores all traffic classified as ignorable and wakes the host for the rest. The second ignores all traffic classified as ignorable, responds to traffic listed as capable of being handled by mechanical responses, and wakes the machine for the rest. The third does the same as the second except that it wakes up only for traffic belonging to a certain set and drops any other incoming traffic. Lastly, the fourth does the same as the third except that it also wakes up for a certain set of scheduled tasks. They find that the simplest proxy, proxy 1, is inadequate for office environments and nearly inadequate for home environments, but that proxy 3 achieves a good amount of sleep time in all scenarios – more than 70% of the idle time. They also find that the effectiveness of proxy 2 depends a great deal on the environment. Given this, the best trade-off between complexity of design and power savings depends on the environment. The authors also note that since scheduled wake-ups are infrequent, their impact on sleep is minimal, so proxy 4 performs practically the same as proxy 3. Finally, they offer a basic proxy architecture that could serve as a framework for building the different proxy designs they considered, and to demonstrate the feasibility of building a proxy, they implemented a simple proxy prototype in Click. The authors end by speculating about how systems could be redesigned to make them more power-aware, thereby simplifying the implementation of proxies, making proxies more effective, or eliminating the need for proxies altogether.
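The per-packet logic of the four designs nests cleanly, which a short sketch makes concrete. This assumes each packet arrives pre-classified as 'ignorable', 'mechanical', or 'other', and that proxies 3 and 4 are configured with a set of services worth waking for; none of these names come from the paper's implementation.

```python
# Sketch of the four proxy designs' per-packet decisions. Each design
# strictly extends the previous one. 'in_wake_set' says whether the
# packet matches the (hypothetical) whitelist used by proxies 3 and 4.

def proxy_decision(packet_class, design, in_wake_set=False):
    if packet_class == 'ignorable':
        return 'drop'                       # all designs drop ignorable traffic
    if design >= 2 and packet_class == 'mechanical':
        return 'respond'                    # proxy answers on the host's behalf
    if design >= 3:
        # wake only for whitelisted services; drop everything else
        return 'wake' if in_wake_set else 'drop'
    return 'wake'                           # designs 1 and 2 wake for the rest
```

Proxy 4's scheduled wake-ups are timer-driven rather than packet-driven, so they do not appear in this per-packet function; that is also why it performs almost identically to proxy 3 on the traffic side.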
Critique
One thing I really liked about this paper is how the authors analyzed and classified network traffic before considering proxy design. In retrospect it seems absolutely necessary for guiding the design of a proxy. It was also just informative in general to look at the traffic traces from the perspective of which packets are ignorable, which can be handled automatically, and which are actually “important”. I also liked how they examined several points in the proxy design space and compared them. Overall I thought this was a very thoughtful and well-organized paper and I think it should stay in the syllabus.
Labels:
Chandrashekar,
enterprise networks,
Liu,
Nedevschi,
Nordman,
power consumption,
Ratnasamy,
Taft
Cutting the Electric Bill for Internet-Scale Systems
A. Qureshi, R. Weber, H. Balakrishnan, J. Guttag, B. Maggs, "Cutting the Electric Bill for Internet-Scale Systems," ACM SIGCOMM Conference, (August 2009).
One line summary: In this paper the authors suggest a new method for reducing the energy costs of running large Internet-scale systems by rerouting traffic on an hourly basis to data centers in regions where the price of energy is cheaper.
Summary
This paper examines a new method for reducing energy costs of running large Internet-scale systems. This method is based on the observations (1) that electricity prices vary on an hourly basis and are not well correlated across different geographic locations and (2) that large distributed systems already incorporate request routing and replication. Considering this, the problem the authors want to solve is: given a set of datacenters or server clusters spread out geographically, map client requests to clusters such that the total electricity cost in dollars of the system is minimized, possibly subject to constraints such as staying under a maximum response time.
The authors first do an empirical market analysis. They verify that prices are not well correlated at different locations, and also note that the price differentials between any two locations generally vary hourly and somewhat unpredictably, suggesting that a pre-determined assignment is not optimal. They examine the overall differential distribution and conclude that there does exist the opportunity to exploit these differentials for savings. To evaluate their method, they use traffic data from Akamai to derive a distribution of client activity and cluster sizes and locations. They then use a simple model to map prices and cluster-traffic allocations to energy prices. They handle the issue of bandwidth costs (i.e. changing assignments of clients to clusters could increase bandwidth costs) by estimating the 95th percentile from the data and constrain their reassignments such that this 95th percentile is not increased for any location. To estimate network performance in their model, they use the geographic distance between client and server. They model the energy consumption of a cluster to be roughly proportional to its utilization. The authors assume the system is fully replicated and that the optimization for cost happens every hour. They then use a simple discrete time simulator that steps through the Akamai usage statistics. At each time step a routing module with a global view of the network allocates clusters, and from this they model each cluster’s energy consumption, and use observed hourly market prices to calculate expenditures.
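The hourly optimization step can be sketched as a greedy assignment: route demand to the cheapest-energy cluster that is close enough, filling clusters in price order. This is a toy under strong assumptions spelled out in the docstring; the prices, capacities, and distance bound are invented, and the paper's simulator additionally respects the 95th-percentile bandwidth constraint, which is omitted here.

```python
# Toy sketch of one hour of price-aware request routing. Assumes energy
# (and thus cost) is proportional to assigned load, and ignores the
# paper's bandwidth-cost constraint. All numbers are made up.

def route_hour(demand, capacity, prices, dist, max_dist):
    """demand: load from one client region; capacity: {cluster: max load};
    prices: {cluster: $/unit-energy this hour}; dist: {cluster: km to client}.
    Greedily fills the cheapest clusters within max_dist.
    Returns (assignment, total_cost)."""
    eligible = sorted((prices[c], c) for c in capacity if dist[c] <= max_dist)
    assignment, cost, remaining = {}, 0.0, demand
    for price, c in eligible:
        load = min(remaining, capacity[c])
        if load > 0:
            assignment[c] = load
            cost += price * load        # cost proportional to load served
            remaining -= load
    return assignment, cost

# Example: with 'west' cheaper this hour, it fills first.
plan, dollars = route_hour(
    demand=150,
    capacity={'east': 100, 'west': 100},
    prices={'east': 0.05, 'west': 0.03},
    dist={'east': 500, 'west': 800},
    max_dist=1000)
```

Re-running this each hour with fresh prices, as the simulator does with observed market data, is what turns the hour-to-hour price differentials into savings; loosening `max_dist` corresponds to the paper's finding that allowing greater client-server distances increases savings.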
The authors are able to show that existing systems can reduce energy cost by 2% without significant increase in bandwidth or reduction in client performance. They also find that savings rapidly increase with energy elasticity, which is the degree to which energy consumed by a cluster depends on the load placed on it. Lastly, they find that allowing increased distances between the client and server leads to increased savings.
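The energy-elasticity result can be made concrete with a simple power model, where elasticity is the fraction of a cluster's power that scales with load. The functional form and constants below are my illustration of the idea as described, not a model taken from the paper.

```python
# Illustrative power model: a cluster draws a fixed idle baseline plus a
# load-proportional component. 'elasticity' in [0, 1] is the fraction of
# peak power that tracks load; 1.0 means fully load-proportional.
# With low elasticity, an idle cluster still burns most of its peak power,
# so shifting load to cheap regions saves little.

def cluster_power(load, peak_power, elasticity):
    """load in [0, 1]; returns instantaneous power draw."""
    idle = peak_power * (1.0 - elasticity)
    return idle + peak_power * elasticity * load
```

Under this model, a fully elastic cluster (elasticity = 1.0) draws nothing when idle, so moving its load to a cheap-power region saves the entire energy bill for that load; at low elasticity the idle baseline is paid regardless of where requests go, which is why the paper's savings grow rapidly with elasticity.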
Critique
This paper was somewhat thought-provoking and brought up some interesting issues. However, what I didn’t really enjoy about this paper is that their model is forced to make so many simplifying assumptions about so many things, such as network performance, cluster power usage, and bandwidth costs, to name a few, that I’m not sure how valid their actual findings turn out to be or how much credence to give them. Along this vein, it would be interesting to know to what extent their results depend on the traffic data they used. Another assumption I didn’t really like is that they assume complete replication in the system in question, so that any request can be routed anywhere. I personally can’t guess how true that is in practice, but my inclination is that it is not necessarily true for all Internet-scale systems, though it may be true for many. Furthermore, while in theory the basic idea seems like it would work, the authors do hint at a few complicating factors (that are not necessarily purely computer science problems), and it’s hard to know what others might arise in actually trying to implement something like this. For instance, the authors assume that rerouting traffic in this way would not increase bandwidth prices, but that may not be true, and companies are averse to increasing their bandwidth costs. As another example, in reality, renegotiating existing contracts to take this new method into account – between companies and cluster operators, if the company rents space in a co-location facility, and between companies and utility companies – may be difficult. Given all the assumptions the authors are forced to make and other complicating details that they may or may not be aware of, it’s hard to judge whether or not something like this would be feasible in reality. And, if everybody did this, it does make you wonder how that would change things, since there is likely feedback between the number of requests routed to an area and the price of energy there.
So in summary, although the authors didn’t really have a choice but to try to make good assumptions or estimates of the various unknowns in their model, I couldn’t help but be frustrated by this aspect of the paper. Regardless, I was still impressed by their model and simulation and found it interesting.
Labels:
Balakrishnan,
data center,
power consumption,
Qureshi,
Weber