Tuesday, December 1, 2009

Cutting the Electric Bill for Internet-Scale Systems


A. Qureshi, R. Weber, H. Balakrishnan, J. Guttag, B. Maggs, "Cutting the Electric Bill for Internet-Scale Systems," ACM SIGCOMM Conference, (August 2009).


One line summary: In this paper the authors suggest a new method for reducing the energy costs of running large Internet-scale systems by rerouting traffic on an hourly basis to data centers in regions where the price of energy is cheaper.

Summary

This paper examines a new method for reducing energy costs of running large Internet-scale systems. This method is based on the observations (1) that electricity prices vary on an hourly basis and are not well correlated across different geographic locations and (2) that large distributed systems already incorporate request routing and replication. Considering this, the problem the authors want to solve is: given a set of datacenters or server clusters spread out geographically, map client requests to clusters such that the total electricity cost in dollars of the system is minimized, possibly subject to constraints such as staying under a maximum response time.
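To make the problem statement concrete, here is a minimal sketch of price-aware request routing. It is a greedy heuristic written for illustration and is not claimed to be the authors' router. The `Cluster` class, the `route` function, the distance table, and all numbers are hypothetical, and geographic distance stands in for the response-time constraint.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    name: str
    price: float      # current electricity price, $/MWh (made-up values)
    capacity: int     # requests the cluster can absorb this hour
    load: int = 0     # requests assigned so far


def route(demand, clusters, distance_km, max_km):
    """Greedily send each client region's requests to the cheapest
    cluster that is within max_km and still has spare capacity."""
    assignment = {}
    for region, requests in demand.items():
        remaining = requests
        # Candidate clusters within the distance bound, cheapest first.
        nearby = sorted(
            (c for c in clusters if distance_km[region][c.name] <= max_km),
            key=lambda c: c.price,
        )
        for c in nearby:
            if remaining == 0:
                break
            take = min(remaining, c.capacity - c.load)
            if take > 0:
                c.load += take
                assignment.setdefault(region, {})[c.name] = take
                remaining -= take
        if remaining:
            raise ValueError(f"no capacity within {max_km} km for {region}")
    return assignment


# Hypothetical two-cluster example.
clusters = [Cluster("east", 60.0, 100), Cluster("west", 35.0, 80)]
distance_km = {
    "boston": {"east": 300, "west": 4000},
    "denver": {"east": 2500, "west": 1500},
}
demand = {"boston": 50, "denver": 90}
plan = route(demand, clusters, distance_km, max_km=3000)
```

In this toy setup Boston can only reach the east cluster within the distance bound, while Denver fills the cheaper west cluster first and spills the rest to the east. Loosening `max_km` gives the router more cheap options, which is the trade-off between savings and client distance that the paper explores.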

The authors first do an empirical market analysis. They verify that prices are not well correlated across locations, and also note that the price differential between any two locations generally varies hourly and somewhat unpredictably, suggesting that a pre-determined assignment is not optimal. They examine the overall distribution of differentials and conclude that there is an opportunity to exploit them for savings. To evaluate their method, they use traffic data from Akamai to derive a distribution of client activity and cluster sizes and locations. They then use a simple model to map prices and cluster-traffic allocations to energy costs. They handle the issue of bandwidth costs (i.e., changing assignments of clients to clusters could increase bandwidth costs) by estimating the 95th percentile of bandwidth usage from the data and constraining their reassignments such that this 95th percentile is not increased for any location. To estimate network performance in their model, they use the geographic distance between client and server. They model the energy consumption of a cluster to be roughly proportional to its utilization. The authors assume the system is fully replicated and that the optimization for cost happens every hour. They then use a simple discrete-time simulator that steps through the Akamai usage statistics. At each time step, a routing module with a global view of the network allocates traffic to clusters; from this allocation they model each cluster’s energy consumption and use observed hourly market prices to calculate expenditures.
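The bandwidth constraint can be illustrated with a short sketch. It assumes the 95th percentile is taken over per-cluster bandwidth samples, as in percentile-based bandwidth billing. The function names, the nearest-rank percentile method, and the sample data are my own choices, not details from the paper.

```python
def percentile_95(samples):
    """Nearest-rank 95th percentile of a list of bandwidth samples."""
    ordered = sorted(samples)
    # 1-based nearest rank, ceil(0.95 * n), computed with integers.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def respects_bandwidth_cap(baseline, proposed):
    """True if rerouting does not raise any cluster's 95th percentile."""
    return all(
        percentile_95(proposed[name]) <= percentile_95(baseline[name])
        for name in baseline
    )


# Hypothetical samples: 100 measurements for one cluster.
baseline = {"east": list(range(1, 101))}

# Bursts confined to the top 5% of samples leave the percentile unchanged.
bursty_ok = {"east": list(range(1, 96)) + [500] * 5}

# One more burst sample pushes the 95th percentile up.
bursty_bad = {"east": list(range(1, 95)) + [500] * 6}

ok = respects_bandwidth_cap(baseline, bursty_ok)
bad = respects_bandwidth_cap(baseline, bursty_bad)
```

Under this assumption a router has some room to send short bursts of extra traffic to a cheap location, since samples above the 95th percentile do not move it, but sustained rerouting would violate the constraint.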

The authors are able to show that existing systems can reduce energy cost by 2% without a significant increase in bandwidth costs or reduction in client performance. They also find that savings rapidly increase with energy elasticity, which is the degree to which the energy consumed by a cluster depends on the load placed on it. Lastly, they find that allowing increased distances between the client and server leads to increased savings.
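A toy calculation shows why elasticity matters so much. The sketch below assumes a linear power model in which a cluster draws a fixed idle fraction of its peak power plus a part that scales with utilization. That model, the function names, and all prices and loads are illustrative assumptions rather than the paper's actual parameters.

```python
def cluster_power_kw(utilization, peak_kw, idle_fraction):
    """Linear power model: idle draw plus a load-dependent part."""
    return peak_kw * (idle_fraction + (1.0 - idle_fraction) * utilization)


def hourly_cost(utilizations, prices, peak_kw, idle_fraction):
    """Total cost in dollars for one hour; prices are in $/kWh."""
    return sum(
        cluster_power_kw(u, peak_kw, idle_fraction) * p
        for u, p in zip(utilizations, prices)
    )


def savings_pct(idle_fraction, peak_kw=1000.0):
    """Percent saved by shifting load toward the cheaper site."""
    prices = [0.10, 0.04]      # expensive site, cheap site (made-up $/kWh)
    baseline = [0.5, 0.5]      # load split evenly
    shifted = [0.2, 0.8]       # load moved toward the cheap site
    before = hourly_cost(baseline, prices, peak_kw, idle_fraction)
    after = hourly_cost(shifted, prices, peak_kw, idle_fraction)
    return 100.0 * (before - after) / before


# A lower idle fraction means a more elastic (energy-proportional) cluster.
for idle_fraction in (0.0, 0.35, 0.65):
    print(f"idle fraction {idle_fraction:.2f}: "
          f"savings {savings_pct(idle_fraction):.1f}%")
```

With the same shift in load, the savings shrink as the idle fraction grows, because an idle cluster at the expensive site still pays for most of its power. This matches the direction of the paper's finding, though the magnitudes here mean nothing.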

Critique

This paper was somewhat thought-provoking and brought up some interesting issues. However, what I didn’t really enjoy about this paper is that their model is forced to make so many simplifying assumptions about so many things, such as network performance, cluster power usage, and bandwidth costs, to name a few, that I’m not sure how valid their actual findings turn out to be or how much credence to give them. In this vein, it would be interesting to know to what extent their results depend on the traffic data they used. Another assumption I didn’t really like is that they assume complete replication in the system in question, so that any request can be routed anywhere. I personally can’t guess how true that is in practice, but my inclination is that it is not necessarily true for all Internet-scale systems, though it may be true for many. Furthermore, while in theory the basic idea seems like it would work, the authors do hint at a few complicating factors (that are not necessarily purely computer science problems), and it’s hard to know what others might arise in actually trying to implement something like this. For instance, the authors assume that rerouting traffic in this way would not increase bandwidth prices, but that may not be true, and companies are averse to increasing their bandwidth costs. As another example, in reality it may be difficult to renegotiate existing contracts to take this new method into account, both between companies and cluster operators (if the company rents space in a co-located facility) and between companies and the utility companies. Given all the assumptions the authors are forced to make and other complicating details that they may or may not be aware of, it’s hard to judge whether or not something like this would be feasible in reality. And, if everybody did this, it does make you wonder how that would change things, since there is likely feedback between the number of requests routed to an area and the price of energy there.
So in summary, although the authors didn’t really have a choice but to try to make good assumptions or estimates of the various unknowns in their model, I couldn’t help but be frustrated by this aspect of the paper. Regardless, I was still impressed by their model and simulation and found it interesting.
