Showing posts with label Wetherall. Show all posts
Friday, October 30, 2009
Active Network Vision and Reality: Lessons from a Capsule-Based System
D. Wetherall, "Active Network Vision and Reality: Lessons from a Capsule-Based System," 17th Symposium on Operating Systems Principles, (December 1999).
One line summary: This paper examines some of the points of debate concerning active networks by describing the experience of designing and implementing an active network toolkit called ANTS; it explores some of the benefits and potential uses of ANTS and also discusses the practical difficulties ANTS raises concerning performance and security.
Summary
This paper studies active networks through the design and implementation of the ANTS active network toolkit. The paper first defines active networks as a new approach to network architecture in which customized application code is executed inside the network rather than at the end hosts. Active networks are interesting because of their potential for creating new Internet services, but controversial because of performance and security concerns. The author contrasts his experience with ANTS against the original vision behind active networks in three areas: (1) the capsule model of programmability, (2) the accessibility of that model, and (3) the applications that can be constructed with that model. The central component of ANTS is the capsule, which users send through the network like a packet but which contains code that is executed at intermediate nodes along the path; these nodes are programmable routers called active nodes. Capsules implement a custom forwarding routine and so direct themselves and subsequent packets through the network using that routine. The reference implementation of ANTS was written in Java. Any user can develop an application with ANTS, which provides an API for querying node state, routing via shortest paths, and placing data in a temporary soft-store. Once the code is written, it is signed by a trusted authority, placed in a directory service, and registered with a local active node for distribution.
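To make the capsule model concrete, here is a minimal Python sketch of capsules being processed hop by hop. It is not the ANTS API (the reference implementation is in Java); the names `Capsule`, `ActiveNode`, `counting_forward`, and `send` are hypothetical, and the shortest-path next hops are assumed to be precomputed. A node that does not know a capsule's routine simply falls back to default forwarding.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Capsule:
    routine: str          # hypothetical name of the forwarding routine to run
    dest: str             # destination node name
    payload: bytes = b""


@dataclass
class ActiveNode:
    name: str
    next_hops: Dict[str, str]                                # dest -> neighbor (assumed precomputed)
    routines: Dict[str, Callable] = field(default_factory=dict)
    soft_store: Dict[str, int] = field(default_factory=dict)  # temporary per-node state

    def process(self, capsule: Capsule) -> Optional[str]:
        """Return the next node for the capsule, or None when it stops here."""
        routine = self.routines.get(capsule.routine)
        if routine is None:
            # This node does not run the capsule's code: default forwarding.
            return self.next_hops.get(capsule.dest)
        return routine(self, capsule)


def counting_forward(node: ActiveNode, capsule: Capsule) -> Optional[str]:
    """Illustrative routine: count capsules per destination, then forward."""
    key = f"seen:{capsule.dest}"
    node.soft_store[key] = node.soft_store.get(key, 0) + 1
    if node.name == capsule.dest:
        return None  # delivered
    return node.next_hops.get(capsule.dest)


def send(nodes: Dict[str, ActiveNode], start: str, capsule: Capsule,
         max_hops: int = 16) -> List[str]:
    """Walk the capsule through the network and return the path it took."""
    path = [start]
    current = start
    for _ in range(max_hops):  # hop limit guards against routing loops
        nxt = nodes[current].process(capsule)
        if nxt is None:
            break
        path.append(nxt)
        current = nxt
    return path


# Three nodes in a line; "b" has no routine registered for this capsule.
nodes = {
    "a": ActiveNode("a", {"c": "b"}, {"count": counting_forward}),
    "b": ActiveNode("b", {"c": "c"}),
    "c": ActiveNode("c", {}, {"count": counting_forward}),
}
path = send(nodes, "a", Capsule("count", "c"))
```

In this sketch only nodes "a" and "c" touch their soft-store, while "b" forwards the capsule like an ordinary packet.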
The paper then considers in turn the three areas pertaining to the vision behind active networks. (1) With respect to capsules, the author argues that it is feasible to carry capsule code by reference and load it on demand, and that the intrinsic processing overhead of capsules is low. Because node capabilities are heterogeneous, however, it is important that not all nodes be slowed by capsule processing, which results in only a partially active network. Moreover, the forwarding performance of capsules is very poor. (2) With respect to the question of who can use active networks to introduce new code into the network, security is obviously a major concern. The author demonstrates that code and state can be isolated in ANTS in a way that is similar to or better than the static protocols used today, but the problem of global resource allocation is handled using a certification mechanism, which has several drawbacks. (3) Lastly, the services that ANTS can be used to introduce tend to be ones that are typically difficult to deploy, such as multicast, anycast, and explicit congestion notification. ANTS can most compellingly be used for rapid deployment and experimentation, but the author has yet to find a “killer app” that necessitates the use of ANTS. In order to use ANTS, a service must work under the following constraints: it must be expressible using a restricted API, and it must be compact, fast, and incrementally deployable. The author offers other observations based on this experience, touching on several points that all protocols must deal with in some way. These observations are that ANTS can help enable systematic changes to protocols in the network, that it handles heterogeneous node capabilities in a clean way, that, contrary to common concerns, ANTS does not generally violate the end-to-end argument, and that changes to network services must be localized in order to be easily deployable.
Critique
One thing I liked about ANTS is that it at least discusses and attempts to address the problem of changing current Internet protocols and introducing new ones. It is sad how much networking research never actually gets put into use just because of the difficulty of making changes or additions to the existing architecture. However, while it’s a nice idea, the performance and security issues are too big to imagine ANTS ever being used in practice. One thing that annoyed me about this paper is that the author several times tried to write off the poor performance of ANTS by blaming it on Java, while at the same time crediting many of the security benefits to the use of Java. It seemed like the author was trying to have it both ways, saying that some performance problems could be overcome by not using Java but failing to mention that not using Java or something like it may also introduce greater security problems. It was for the best that the paper was written from the perspective of taking a closer look at some of the criticisms that people have raised concerning the original vision behind active networks, rather than arguing that ANTS should actually be used in the real world, since the performance and potentially the security problems are just too severe to suggest otherwise. I approve of the way the author framed the motivation and contribution.
Thursday, September 3, 2009
Understanding BGP Misconfiguration
R. Mahajan, D. Wetherall, T. Anderson, "Understanding BGP Misconfiguration," ACM SIGCOMM Conference, (August 2002).
One line summary: In this paper the authors identify and analyze BGP misconfiguration errors, measure their frequency, classify them into a number of types, examine their causes, and suggest a number of mechanisms for reducing these misconfigurations.
Summary
This paper provides a quantitative and systematic study of BGP misconfiguration. The authors classify misconfigurations into two main types: origin misconfiguration and export misconfiguration. Origin misconfiguration occurs when an AS inadvertently advertises an IP prefix and it becomes globally visible. Export misconfiguration occurs when an AS fails to filter a route that should have been filtered, thereby violating the policies of one or more of the ASs in the AS path. They identify a number of negative effects of such misconfigurations, including increased routing load, connectivity disruption, and policy violation. In order to measure and analyze misconfigurations, the authors collected data from 23 peers in 19 ASs over a period of three weeks. They examine new routes and assume that those that don’t last very long are likely due to misconfigurations or failures, and so select these for investigation. As part of their investigation they used an email survey of the operators of the ASs involved, as well as a connectivity verifier to determine the extent of disruptions. They note that their method underestimates the number and effect of misconfigurations for various reasons.
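The selection step, picking out new routes that do not last long, can be sketched as a simple filter. This is only an illustration: the `RouteObservation` record, the `short_lived` helper, and the one-day default threshold are assumptions made for the example, not details taken from this summary.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class RouteObservation:
    prefix: str            # e.g. "192.0.2.0/24"
    origin_as: int         # AS that originated the route
    first_seen: datetime   # when the new route first appeared
    last_seen: datetime    # when it was withdrawn or replaced


def short_lived(routes: List[RouteObservation],
                max_lifetime: timedelta = timedelta(days=1)) -> List[RouteObservation]:
    """Keep only routes whose lifetime is below the (assumed) threshold.

    These are candidates for misconfiguration or failure, not confirmed cases.
    """
    return [r for r in routes if r.last_seen - r.first_seen < max_lifetime]


t0 = datetime(2002, 1, 1, 12, 0)
observed = [
    RouteObservation("192.0.2.0/24", 64500, t0, t0 + timedelta(minutes=40)),
    RouteObservation("198.51.100.0/24", 64501, t0, t0 + timedelta(days=5)),
]
candidates = short_lived(observed)
```

Candidates found this way still need follow-up, which is the role the operator survey and the connectivity verifier play in the study.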
The authors first describe their results for origin misconfiguration. They classify the new routes that are potential results of origin misconfiguration into three categories: self-deaggregation of prefixes, announcement of a new route whose origin is related to the origin of the old route via their AS paths, and announcement of a new route with a foreign origin (one unrelated to that of the old route). They observe that the number of incidents in each of these three categories is roughly the same, with self-deaggregation being slightly more common. They note, however, that the success rate for identifying origin misconfigurations differs across the categories, as some incidents that were classified as origin misconfigurations were actually the result of failures. Some interesting conclusions they draw from their analysis are that at least 72% of new routes seen by a router in a day are the result of misconfiguration; that 13% of incidents cause connectivity disruptions, mainly through new routes of foreign origin; that misconfigurations play a small role in connectivity disruptions compared to failures; and that 80% of misconfigurations are corrected within an hour, often sooner if the misconfiguration disrupts connectivity. The authors next examine export misconfigurations. They note that such misconfigurations do not tend to cause connectivity problems directly, and that most incidents involved providers rather than peers. The authors also examine the effect of misconfigurations on routing load, and conclude that in the extreme case, load can spike to 60%.
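The three categories of new routes can be illustrated with a small classifier. This is one interpretation under stated assumptions: "related" is taken to mean that one route's origin AS appears in the other route's AS path, and the `Route` type and `classify` function are hypothetical names for this sketch.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Route:
    prefix: str
    as_path: Tuple[int, ...]  # last element is the origin AS

    @property
    def origin(self) -> int:
        return self.as_path[-1]


def classify(new: Route, old_routes: List[Route]) -> str:
    """Classify a new route against older routes covering the same addresses."""
    new_net = ipaddress.ip_network(new.prefix)
    covering = []
    for old in old_routes:
        old_net = ipaddress.ip_network(old.prefix)
        # subnet_of() requires matching IP versions, so check that first.
        if old_net.version == new_net.version and new_net.subnet_of(old_net):
            covering.append((old, old_net))
    if not covering:
        return "unclassified"  # no older route to compare against

    for old, old_net in covering:
        # Same origin announcing a more specific piece of its own prefix.
        if old.origin == new.origin and new_net.prefixlen > old_net.prefixlen:
            return "self-deaggregation"
    for old, _ in covering:
        # Assumed meaning of "related": one origin is on the other's AS path.
        if new.origin in old.as_path or old.origin in new.as_path:
            return "related-origin"
    return "foreign-origin"


old = [Route("10.0.0.0/16", (1, 2, 3))]
labels = [
    classify(Route("10.0.1.0/24", (1, 2, 3)), old),
    classify(Route("10.0.1.0/24", (1, 2)), old),
    classify(Route("10.0.1.0/24", (7, 9)), old),
]
```

A label from a classifier like this marks only a potential misconfiguration, since some short-lived routes in each category turn out to be caused by failures.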
The authors identify a number of causes of misconfiguration, which they classify into slips and mistakes. Slips and mistakes turn out to be roughly equally responsible for misconfiguration. The mistakes they identify in origin misconfiguration include initialization bugs, reliance on upstream filtering, and use of old configurations. Slips in origin misconfiguration include accidents (such as typos) in specifying redistribution, attachment of the wrong community attribute to prefixes, hijacks, forgotten filters, incorrect summaries, unknown errors, and miscellaneous problems. They also identify three mistakes that cause export misconfiguration: prefix-based configuration, a bad ACL or route map, and initialization bugs. Lastly, they identify a number of causes of short-lived new routes that are not misconfigurations, including failures, testing, migration, and load balancing.
Lastly, the authors suggest a number of ways to reduce misconfigurations. These include improving router CLIs, implementing transactional semantics for configuration changes, developing and supporting high-level configuration tools, developing configuration checkers, and building database consistency mechanisms. They also describe a protocol extension to BGP called S-BGP, which would prevent about half of the misconfigurations they observed.
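As a rough illustration of two of these suggestions, transactional semantics and configuration checkers, the following Python sketch stages changes on a copy of the configuration and commits them only if every check passes. The dictionary layout, the `check_export_filters` rule, and the `ConfigTransaction` class are all hypothetical; real routers expose nothing like this API.

```python
import copy
from typing import Callable, Dict, List


def check_export_filters(config: Dict) -> List[str]:
    """Hypothetical checker: every neighbor must name an export filter."""
    return [
        f"neighbor {name} has no export filter"
        for name, settings in config.get("neighbors", {}).items()
        if not settings.get("export_filter")
    ]


class ConfigTransaction:
    """Stage edits on a copy; commit all of them or none of them."""

    def __init__(self, running: Dict, checks: List[Callable[[Dict], List[str]]]):
        self.running = running
        self.checks = checks
        self.candidate = None

    def __enter__(self) -> Dict:
        self.candidate = copy.deepcopy(self.running)
        return self.candidate

    def __exit__(self, exc_type, exc, tb) -> bool:
        if exc_type is not None:
            return False  # editing failed: running config is left untouched
        errors = [e for check in self.checks for e in check(self.candidate)]
        if errors:
            raise ValueError("; ".join(errors))  # reject the whole change set
        self.running.clear()
        self.running.update(self.candidate)      # commit
        return False


running = {"neighbors": {"AS100": {"export_filter": "customers-only"}}}

try:
    with ConfigTransaction(running, [check_export_filters]) as cfg:
        cfg["neighbors"]["AS200"] = {}  # forgotten filter
    rejected = False
except ValueError:
    rejected = True

with ConfigTransaction(running, [check_export_filters]) as cfg:
    cfg["neighbors"]["AS200"] = {"export_filter": "peers-only"}
```

The first change set is rejected as a whole, so the forgotten filter never reaches the running configuration; the second passes the check and is committed.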
Critique
In general, I thought this was an entertaining read, especially as I could relate to how difficult router CLIs are to use and how easy it is to make mistakes in configuring BGP, having had to do this in a previous networks class. It is unfortunate that because misconfigurations are hard to identify, the authors’ methodology was necessarily limited. I’d be interested to see if other newer techniques have been or could be developed to do this and similar sorts of analysis. Due to the weaknesses in their methodology, I’m not sure how meaningful some of the figures and percentages they derive actually are, but they do still provide some interesting insights, especially if they are correct in arguing that their study provides a lower bound. That said, I still think their approach was clever, given the difficulties.
I particularly like their suggestions for reducing some of the causes of misconfigurations. User interface design improvements seemed to me to be the most obvious thing to do. In general, I wonder why many of their suggested solutions, which have been used in many other contexts and computer systems, have not been used in router configuration. Although the authors do briefly discuss some of the barriers to implementing such solutions, it still surprises me that harried system administrators haven’t risen up and demanded that at least some of the more obvious steps be taken sooner. Maybe the potential for missteps makes things more interesting for them; it’s hard to say. I think investigating some of the improvements they suggest would be an interesting area for research, although they did seem to imply at one point that industry can be a barrier to the adoption of some of these improvements, which might be too frustrating to deal with.