Thursday, November 12, 2009

DNS Performance and the Effectiveness of Caching


J. Jung, E. Sit, H. Balakrishnan, "DNS Performance and the Effectiveness of Caching," IEEE/ACM Transactions on Networking, V. 10, N. 5, (October 2002).


One line summary: This paper studies the Domain Name System with respect to client-perceived performance and the effectiveness of caching by analyzing several sets of real trace data including DNS and TCP traffic.

Summary

This paper examines DNS performance from the perspective of clients by analyzing DNS and related TCP traffic traces collected at MIT and the Korea Advanced Institute of Science and Technology (KAIST). It seeks to answer two questions: (1) what performance clients perceive in terms of latency and failures, and (2) how varying the TTL and sharing caches affect the effectiveness of caching. The paper first provides an overview of DNS. It then explains the methodology. The study uses three sets of traces that include outgoing DNS queries and responses and outgoing TCP connections. It collects various statistics on these traces for analysis, including, for a given query, the number of referrals involved, the lookup latency, and whether the query was successful. The dominant type of query the authors observed was A, at approximately 60%, followed by PTR at approximately 30%, with MX and ANY making up roughly equal shares of the remainder. Only 50% of the queries were associated with TCP connections, and 20% of the TCP connections were not preceded by DNS queries.

The rest of the paper is organized with respect to the two main questions it attempts to answer. First, they examine client-perceived performance. They find median lookup latencies of 85 and 97 ms. 80% of lookups are resolved without any referrals, and lookup latency increases with the number of referrals. About 20-24% of lookups receive no response. Of the answered lookups, 90% involved no retransmissions. Queries that receive no answer generate much more wide-area traffic than those that do. These last three facts suggest that many DNS name servers are too persistent in their retry strategies. They also find that most of the negative responses are due to typos and reverse lookups for addresses that don’t have a host name, and that negative caching does not work as well as it should, probably because the distribution of names causing negative responses is heavy-tailed. They find 15-18% of lookups contact root or gTLD servers and 15-27% of such lookups receive negative responses.

Next they examine the effectiveness of caching. Because the distribution of domain name popularity is very long-tailed, many names are accessed just once, so a low TTL doesn’t hurt much: caching cannot help for names that are never looked up again. However, not caching NS records would greatly increase the load on root and gTLD servers, so TTL values should not be set too low for these. Also due to the distribution of domain name popularity, sharing caches does not increase the hit rate very much after a certain point, because while some names are very popular, the rest are likely to be of interest to just one host. They also found that increasing the TTL past a few minutes does not greatly increase the cache hit rate, because most cache hits are produced by single clients looking up the same server multiple times in a row. This summarizes most of their main findings.
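To make the TTL argument concrete, here is a minimal sketch of a trace-driven cache simulation in Python. It is not the authors' simulator: the `hit_rate` function, the single shared cache, the fixed TTL applied to every name, and the toy trace are all assumptions made up for illustration. It only shows the mechanism, namely that when repeat lookups for a name arrive in short bursts, a modest TTL already captures most of the hits, and names looked up once never hit regardless of TTL.

```python
from typing import Dict, Iterable, Tuple

# A lookup is (timestamp in seconds, queried name). Hypothetical format,
# not the trace format used in the paper.
Lookup = Tuple[float, str]


def hit_rate(trace: Iterable[Lookup], ttl: float) -> float:
    """Fraction of lookups served from one shared cache with a fixed TTL.

    Assumes the trace is sorted by timestamp. A cache hit does not extend
    the entry's lifetime; the entry expires ttl seconds after the miss
    that populated it.
    """
    expires: Dict[str, float] = {}
    hits = 0
    total = 0
    for now, name in trace:
        total += 1
        if expires.get(name, float("-inf")) > now:
            hits += 1
        else:
            # Miss: pretend we resolved the name and cache it.
            expires[name] = now + ttl
    return hits / total if total else 0.0


# Toy trace (invented): one popular name queried in bursts,
# plus two names that are each queried exactly once.
trace = [
    (0.0, "popular.example"),
    (1.0, "popular.example"),
    (2.0, "popular.example"),
    (5.0, "rare-a.example"),
    (400.0, "popular.example"),
    (401.0, "popular.example"),
    (900.0, "rare-b.example"),
    (5000.0, "popular.example"),
]

if __name__ == "__main__":
    for ttl in (0, 60, 600, 86400):
        print(f"TTL {ttl:>6}s -> hit rate {hit_rate(trace, ttl):.3f}")
```

On this toy trace, going from a 60-second TTL to a one-day TTL adds only a couple of hits, and the two one-off names miss at every setting. The real study of course draws its conclusions from the MIT and KAIST traces, not from synthetic data like this.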

Critique

I always really like papers that examine systems like DNS in this way because I think it’s interesting that even though we built it, we don’t always understand it and aren’t able to predict the kinds of observations that we later make in studies like this. As always, it would be interesting to go back and do a similar study now that about a decade has passed and see what turns up. One thing in this paper that I am confused about is how the authors could have found that such a significant fraction of lookups never receive an answer. I don’t think they provide much in the way of an explanation for this, if I remember correctly. Are these unanswered lookups just due to dropped packets? Because that seems like a lot of dropped packets, and I guess I find that surprising, although I have no real basis for thinking so.
