
Sunday, October 18, 2009

White Space Networking with Wi-Fi like Connectivity


P. Bahl, R. Chandra, T. Moscibroda, R. Murty, M. Welsh, "White Space Networking with Wi-Fi like Connectivity", ACM SIGCOMM Conference, (August 2009).

One line summary: This paper describes a system for UHF white space wireless networking called WhiteFi that addresses the three main challenges unique to white space networking: spatial variation, temporal variation, and spectrum fragmentation in the available white space channels.

Summary

This paper presents a system for UHF white space wireless networking called WhiteFi. White spaces are the unused portions of the UHF spectrum that have recently been opened for use by unlicensed devices, subject to the constraint that such devices not interfere with incumbents such as TV broadcasts and wireless microphone transmissions. As a consequence, white space networks differ in three major ways from traditional wireless networks: spatial variation, spectrum fragmentation, and temporal variation. White space networking involves spatial variation in the availability of portions of the white space spectrum because the presence of incumbents varies over the wide area as well as on a finer scale. It involves spectrum fragmentation because incumbents occupy certain UHF channels; fragmentation varies from place to place, which implies the need for variable channel widths. Lastly, white space networking involves temporal variation, largely because wireless microphones can be turned on and off at any time, and white space devices must switch channels when they detect a wireless microphone in the channel they are using.

WhiteFi is supported on the KNOWS hardware platform, which consists of a PC, a scanner, and a UHF translator. Two key features of this platform are support for variable channel widths and a primary user signal detection algorithm called Signal Inspection before Fourier Transform (SIFT). Three key components of WhiteFi build upon this: a novel spectrum assignment algorithm, the use of SIFT for discovering white space wireless access points (APs), and a chirping protocol that signals disconnection from a channel due to the appearance of an incumbent, without interfering with that incumbent. The spectrum assignment algorithm is adaptive and client-aware, picking a channel and channel width that are free for all clients. It uses a spectrum map to indicate the presence of incumbents, an airtime utilization map to indicate the degree of utilization of each UHF channel, and control messages between the clients and the AP containing these maps to share the necessary information. It uses this information, along with channel probes, to compute the multichannel airtime metric (MCham), which is roughly a measure of the aggregate bandwidth of a given selection, and which serves as the basis for channel selection. AP signals are detected by sampling bands of the UHF spectrum using a software-defined radio and performing an efficient time-domain analysis of the raw signal using SIFT. Sudden disconnections due to the appearance of an incumbent on the channel an AP-client pair is using are handled by the chirping protocol, which involves sending beacons about the white spaces now available over a backup channel.
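The client-aware selection idea can be sketched roughly as follows. This is a minimal Python illustration of the flavor of the algorithm, not the paper's implementation: the channel numbers, map representations, and scoring function are all my assumptions, and the score is only a crude stand-in for MCham.

```python
# Sketch of WhiteFi-style client-aware channel selection. Each node (AP
# or client) reports a spectrum map (set of UHF channels free of
# incumbents) and an airtime utilization map (fraction of time each
# channel is busy). All names and values here are illustrative.

UHF_CHANNELS = range(21, 52)   # hypothetical UHF channel numbers
WIDTHS = (1, 2, 4)             # candidate channel widths, in channels

def candidate_blocks():
    """Enumerate contiguous (start, ..., end) blocks of UHF channels."""
    for w in WIDTHS:
        for start in UHF_CHANNELS:
            block = tuple(start + i for i in range(w))
            if block[-1] in UHF_CHANNELS:
                yield block

def free_for_all(block, spectrum_maps):
    """A block is usable only if every node sees it incumbent-free."""
    return all(ch in m for m in spectrum_maps for ch in block)

def airtime_score(block, utilization):
    """Crude stand-in for MCham: wider blocks score higher, but busy
    channels contribute less free airtime."""
    return sum(1.0 - utilization.get(ch, 0.0) for ch in block)

def pick_channel(spectrum_maps, utilization):
    """Pick the usable block with the best estimated free airtime."""
    usable = [b for b in candidate_blocks()
              if free_for_all(b, spectrum_maps)]
    return max(usable, key=lambda b: airtime_score(b, utilization),
               default=None)

# Example: two clients; channel 23 is occupied by a wireless microphone
# at the first client, so any block containing 23 is ruled out.
maps = [{21, 22, 24, 25, 26}, {21, 22, 23, 24, 25, 26}]
util = {21: 0.8, 22: 0.1, 24: 0.1, 25: 0.2, 26: 0.7}
print(pick_channel(maps, util))  # → (24, 25)
```

Note how the microphone at just one client fragments the usable spectrum for everyone, which is exactly why the assignment must intersect all clients' spectrum maps rather than relying on the AP's view alone.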

The evaluation of WhiteFi demonstrates several things. First, SIFT accurately detects packets over varying channel widths even with high signal attenuation, missing at most 2% of packets in the worst case. Second, the AP discovery algorithms are effective. Third, WhiteFi correctly handles disconnections using its chirping protocol. Last, WhiteFi's channel selection algorithm adapts quickly and makes near-optimal selections, operating on the best available part of the spectrum; this last point is demonstrated via simulation.

Critique

This was one of my favorite papers that we’ve read so far (along with the classics). To me at least this seemed like an awesome feat of engineering, and it’s obviously very useful technology, so it’s exciting to think of it becoming available. I actually quite enjoyed reading it. Obviously there will be a lot of interesting work in the future that builds off of what they did here. I checked to confirm my suspicion that it won the Best Paper Award at SIGCOMM this year, so I like to think that somewhat legitimizes my raving about it.

Wednesday, September 16, 2009

Detailed Diagnosis in Enterprise Networks


S. Kandula, R. Mahajan, P. Verkaik, S. Agarwal, J. Padhye, P. Bahl, "Detailed Diagnosis in Enterprise Networks," ACM SIGCOMM Conference, (August 2009).


One line summary: This paper presents NetMedic, a diagnostic system for enterprise networks that uses network history to infer the causes of faults without application specific knowledge, representing components as a directed graph and reasoning over this structure; the paper shows NetMedic is able to pinpoint fault causes with greater specificity than previous diagnostic systems.

Summary

This paper describes a diagnostic system for enterprise networks called NetMedic. NetMedic approaches the problem of diagnosing faults as one of inference. The goal of NetMedic is to identify the likely causes of a fault with as much specificity as possible, and to do so with minimal application specific knowledge. It does this by modeling the network as a dependency graph and then using history to detect abnormalities and likely causes. The nodes of this graph are network components such as application processes, machines, configurations, and network paths. There is a directed edge from node A to node B if A impacts B, and the weight of this edge represents the magnitude of that impact. Each component has a state consisting of many variables. The abnormality of a component at a given time is determined from its history and used to compute the edge weights. The authors describe various extensions to make this process more robust to large and diverse sets of variables. After these weights are obtained, causes are ranked such that more likely causes have lower ranks.
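The history-based ranking idea can be sketched as follows. This is a minimal Python illustration under my own assumptions; the component names, the abnormality formula, and the scoring are all invented for the example and are not NetMedic's actual computation.

```python
# Sketch of NetMedic-style history-based fault ranking. Components are
# nodes in a directed graph; an edge A -> B carries a weight estimating
# how strongly A's state impacts B's. Everything here (names, numbers,
# the scoring formula) is illustrative.
import statistics

def abnormality(history, now):
    """Deviation of a component's current value from its history, in
    rough standard deviations, squashed into [0, 1]."""
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history) or 1.0  # avoid divide-by-zero
    return min(abs(now - mean) / stdev / 3.0, 1.0)

def rank_causes(edges, abnormal, target):
    """Rank likely causes of a fault at `target`: components that are
    themselves abnormal and have a strong edge into the target come
    first (rank 1 = most likely culprit)."""
    scores = {src: w * abnormal[src]
              for (src, dst), w in edges.items() if dst == target}
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {comp: i + 1 for i, comp in enumerate(ordered)}

# Example: a web app slows down; only the CPU hog's counter history
# looks abnormal, so it should be ranked as the top cause.
histories = {"cpu_hog.exe": ([5, 6, 5, 6], 60),
             "config_change": ([1, 1, 1, 1], 1),
             "net_path": ([10, 12, 11, 11], 12)}
abnormal = {c: abnormality(h, now) for c, (h, now) in histories.items()}
edges = {("cpu_hog.exe", "webapp"): 0.8,
         ("config_change", "webapp"): 0.5,
         ("net_path", "webapp"): 0.6}
print(rank_causes(edges, abnormal, "webapp"))
```

The key point the sketch captures is that edge weight alone is not enough: a strongly connected but historically normal component (the config change above) should rank below a component whose current behavior deviates sharply from its own past.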

The authors implement NetMedic on the Windows platform, using the Windows Performance Counter framework as their source of data. They claim to be developing a prototype for Linux as well. They evaluate NetMedic in comparison to a coarse diagnosis method loosely based on previous methods. They evaluate in a live environment and a controlled environment, but it is unclear how realistic even the live environment is, as they inject the faults they are trying to detect. In one of their evaluations, for 80% of the faults the median rank of the true cause is 1 in NetMedic, meaning NetMedic correctly identifies the culprit in these cases. They also demonstrate the benefit of their extensions by comparing them with a version of NetMedic that has application specific information hand-coded into it; their extensions perform well in this comparison. They also study how NetMedic does when diagnosing two simultaneous faults, and study the impact of the length of history used.

Critique

This paper was interesting to read. After reading the first section or two, the first question that comes to mind is how they manage without knowing application specific details, because it is pretty obvious that this method isn’t workable in the general case. They have a clever way of getting around this in the analysis phase, but they do admit that in the data collection phase of their experiments they utilize application specific information about where configuration data is stored, though they claim to be working on a way to get around this. There is one part in the section on implementation where they talk about handling some counters differently from others because they represent cumulative values, and it made me wonder how they determine which counters fall into this category. Does that not count as having application specific information? They talk about automatically detecting cumulative variables earlier in the paper when discussing their extensions, such as aggregate relationships across variables, but the example they give with the counter (the number of exceptions a process has experienced) doesn’t seem to fall into the same category as those discussed in the extensions section.

Since NetMedic would be used by network administrators, and some things about a network stay the same all the time (such as what kinds of applications are running), it might be interesting to see how NetMedic could be enhanced if administrators had the option of providing some application specific details, and whether there are ways NetMedic could leverage this. I’m not really sure how or if that would work, but it is something to consider.

I wasn’t particularly impressed by their evaluation. It would be more compelling if they had more data from real-world situations instead of constructed situations with injected faults. It might also have been informative if they had shown results measuring metrics other than the rank of the correct cause, although I can’t think of another metric off the top of my head. Also, they compared their system against one that was “loosely based” on systems such as Sherlock and Score, and they don’t really discuss that system much, so it is questionable whether this is a fair comparison. Lastly, the evaluation in which they show NetMedic identifies a virus scanning program or sync utility as abnormal doesn’t seem like something to brag about. I’m not sure I understand this section or why this is a good thing, since presumably virus scanning is an acceptable activity. In this section, they claim to be showing how NetMedic can help with naturally occurring faults, but I’m not sure they actually accomplish that. I could be entirely misunderstanding this section.

It might be interesting to explore using variables as nodes in the graph instead of just components, though this would probably make it much harder to scale. It would also be interesting to see how NetMedic performs when a fault is due to a confluence of factors and not just one culprit alone.