Abstract
The Internet has witnessed explosive growth over the last few decades, steadily evolving into a worldwide communication medium capable of supporting a myriad of applications. While several efforts have been undertaken to improve the reliability of best-effort Internet communication, their adoption has been virtually nonexistent due to the lack of incentive for change and the presence of heterogeneous networks that are not controlled by a single entity. Moreover, the Internet is rapidly evolving into a flatter structure composed of large organizations, or clouds, which hampers any effort to retrofit the existing Internet.
In this dissertation, we study two of the most important components of the Internet infrastructure, namely routing and the Domain Name System (DNS). We aim to find predictability in Internet routing, specifically in the existence of Internet routes to prefixes, i.e., collections of IP addresses. We hypothesize that the Internet under the Border Gateway Protocol (BGP), the de facto interdomain routing protocol, while seemingly unpredictable, has a structure whereby prefix similarity can be exploited to successfully predict the availability of Internet routes and route failures. We build data-mining-based prediction models using real-world routing data and find that this is indeed the case: the future availability of a prefix can be predicted by observing it for a limited time period and applying the learned models. We also formulate BGP molecules, which are sets of Internet prefixes that have a similar propensity to become unreachable from portions of the Internet, i.e., to fail. We use these molecules in four failure prediction schemes, among which a hybrid scheme achieves 91% predictability of failures with 99.3% coverage of prefixes in the Internet.
We study how DNS, as part of the Internet infrastructure, has evolved by investigating cloud-based DNS, which results from moving DNS services to the cloud. We perform a case study of a recently launched cloud-based DNS, namely Google external DNS. We develop a novel technique for geolocating the data centers of cloud providers and use it to show that a query to Google DNS may not be redirected to the geographically closest Google data center. We also study the retrieval of Akamai-hosted content through cloud-based DNS and find that clients perceive worse performance than when they use local DNS to retrieve the same content. We investigate the reasons for this poor performance and explore the design space of methods that allow clients to use cloud-based DNS systems for content retrieval. Client-side, cloud-side, and hybrid approaches are presented and compared, with the goal of achieving the best client-perceived performance. Our work yields valuable insight into Akamai’s DNS system, revealing previously unknown features.
Finally, we present our vision of the evolution of the current Internet into the future cloud-based Internet, specifying how clouds interact with one another. We posit that while the cloud offers several advantages for hosting services, blindly using the cloud for every service can cause poor performance. Instead, a carefully balanced approach can usher in a smooth transition from current Internet systems to the cloud-based Internet of tomorrow.