8.1. THE INTERNET 8-3
Modern life is difficult to imagine without the Internet. What started in the late 1960s as a simple network of a handful of computers has grown into an immensely complex communication infrastructure that connects hundreds of millions of computers and continues to grow. The Internet as a computer network is often taken to be the same as the World Wide Web (or simply the Web), yet the two are fundamentally different. In this chapter we start by taking a look at computer networks, in particular the Internet. Second, we’ll dive a bit into what are known as overlay networks. These networks are characterized by the fact that an (often very large) group of computers maintains its own communication network and as such forms a special type of subnetwork using the Internet as its foundation. Thirdly, we’ll pay attention to the World Wide Web and explain where and how it differs from the Internet.
8.1 The Internet
The Internet as a communication network consists of a huge collection of computers connected to each other. The organization of the Internet essentially follows a hierarchical structure consisting of home networks, computer networks in organizations, networks that are owned by Internet Service Providers, and backbone networks, among other types of computer networks. They are all connected together, often using the same infrastructure as used for telephony. Connections may occur through guided media (i.e., wires), but we are increasingly seeing wireless connections for communication as well. In addition, the communication devices vary tremendously: ultra-small networked sensors, smartphones, laptop computers and workstations, servers, routers, and supercomputers. One may wonder how it is even possible to say anything sensible about the structure of the Internet. To answer this question, let’s first consider some of the basics and then move on to the phenomenon of interconnected networks.
8.1.1 Computer networks
Small-area networks
There are different ways of characterizing networks, but one that is convenient for our discussion here is simply looking at the physical diameter of a computer network. Typically, networks that span areas up to at most, say, a few hundred meters are characterized by a relatively high density of networked computers, also referred to as hosts. Hosts send packets to each other through the network that connects them. These networks differ from ones that span large areas, in the sense that routing plays a less prominent role. Routing a packet from a source host A to its destination host B means that the packet is required to follow a communication path from A to B. Typically, such paths are set up using one of the shortest-path algorithms we discussed in Chapter 5. Without going into further details, setting up or finding a route in a small-area network is relatively easy. Moreover, these small-area networks are generally owned and managed by a single administrative organization.
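As a rough illustration of what finding a route means, the sketch below computes a minimum-hop path between two hosts. It uses breadth-first search as a stand-in, which is an assumption made here for brevity and not necessarily one of the algorithms from Chapter 5. The function name `shortest_route`, the node names, and the links are all hypothetical.

```python
from collections import deque

def shortest_route(links, source, destination):
    """Return a minimum-hop path from source to destination, or None."""
    # Build an undirected adjacency map from the list of links.
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    # Breadth-first search, remembering for each node where we came from.
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == destination:
            # Walk back along the recorded predecessors to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in sorted(neighbors.get(node, ())):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None  # destination is not reachable

# Hypothetical small-area network: hosts A and B, switches S1 and S2, router R1.
links = [("A", "S1"), ("S1", "R1"), ("R1", "S2"), ("S2", "B"), ("S1", "S2")]
route = shortest_route(links, "A", "B")
print(route)
```

With so few nodes and links, the search finishes almost immediately, which matches the observation that route finding in a small-area network is relatively easy.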
To get an impression of what we’re dealing with, Figure 8.1 shows the typical organization of a small-area network. Such a network consists of several local-area networks, or LANs, each typically being a collection of 10-100 computers connected by means of what is known as a switch. The switch ensures that a packet addressed to one of its connected computers is forwarded to that computer.
[Figure 8.1 labels: Internet; switches; routers, including R1; security gateway; firewall; server group; LAN 1; LAN 2; LAN 3]
Figure 8.1: A typical example of a small-area network, consisting of a collection of connected local-area networks.
Addresses
LANs can be connected to each other by directly connecting their respective switches, effectively leading to a larger LAN. In addition, it is common practice to connect LANs through internal routers, which we will explain shortly. What is important for our discussion is that each networked host has an address. Having an address allows us to send data packets from one
host to another. If we concentrate on the most common case for modern networks, there are two types of addresses we need to distinguish. First, each host has a world-wide unique identifier in the form of a 48-bit number. This so-called MAC address comes with the host when it is manufactured (or, more precisely, is associated with a host’s network hardware). When a host is connected to a port of a switch (see Figure 8.2), the switch can automatically discover the host’s MAC address and subsequently associate that specific port uniquely with the address. As a consequence, when a host with MAC address MA1 (connected to port P1) requests a packet to be forwarded to the host with MAC address MA2 (connected to port P2), the switch uses the port identifiers to forward the packet from port P1 to P2, and thus implicitly from address MA1 to address MA2.
[Figure 8.2 label: port (to/from host)]
Figure 8.2: A 16-port switch as used in local-area networks.
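The port-based forwarding just described can be sketched as a small table that maps MAC addresses to ports. The class name `Switch` and its method are hypothetical. The text does not say what happens when the destination is still unknown; sending the packet out on all other ports, as done below, is an assumption about how such learning switches commonly behave.

```python
class Switch:
    """A minimal sketch of a switch that learns which MAC address sits on which port."""

    def __init__(self, num_ports=16):
        self.num_ports = num_ports
        self.port_of = {}  # MAC address -> port number

    def receive(self, in_port, src_mac, dst_mac):
        """Handle one packet and return the list of ports it is sent out on."""
        # Discover: remember that the sender is reachable through in_port.
        self.port_of[src_mac] = in_port

        out_port = self.port_of.get(dst_mac)
        if out_port is None:
            # Assumption: an unknown destination is sent to every other port.
            return [p for p in range(1, self.num_ports + 1) if p != in_port]
        if out_port == in_port:
            # Destination is on the port the packet came from; nothing to forward.
            return []
        return [out_port]

switch = Switch(num_ports=4)
first = switch.receive(1, "MA1", "MA2")   # MA2 not yet known
reply = switch.receive(2, "MA2", "MA1")   # MA1 was learned on port 1
second = switch.receive(1, "MA1", "MA2")  # MA2 was learned on port 2
print(first, reply, second)
```

After one packet in each direction, the switch forwards traffic between MA1 and MA2 using only ports P1 and P2.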
More important, however, is the fact that a host can be assigned an IP address, where IP stands for Internet Protocol. Unlike a MAC address, which is persistent, meaning that it cannot be changed, an IP address needs to be explicitly assigned when a host is connected to a network. Address assignment can be done manually or automatically, and can be done statically or dynamically. For example, in some cases a separate address assignment service is used to hand out IP addresses with an associated lease time. When a lease expires, the host will need to get a new IP address¹.
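To make the idea of a lease concrete, here is a minimal sketch of an address assignment service. It models only the bookkeeping (a pool of addresses, a lease time, and expiry), not the message exchange of any real protocol. The class `AddressService`, the addresses, and the explicit `now` parameter used in place of a clock are all assumptions made for illustration.

```python
class AddressService:
    """Hands out IP addresses from a pool, each with a lease that expires."""

    def __init__(self, pool, lease_time):
        self.free = list(pool)        # addresses not currently leased
        self.lease_time = lease_time  # lease duration in seconds
        self.leases = {}              # host -> (ip address, expiry time)

    def expire(self, now):
        """Return the addresses of all expired leases to the pool."""
        for host, (ip, expiry) in list(self.leases.items()):
            if expiry <= now:
                del self.leases[host]
                self.free.append(ip)

    def request(self, host, now):
        """Assign or renew an address for host; return None if the pool is empty."""
        self.expire(now)
        if host in self.leases:
            ip, _ = self.leases[host]  # lease still valid: renew it
        elif self.free:
            ip = self.free.pop(0)
        else:
            return None
        self.leases[host] = (ip, now + self.lease_time)
        return ip

service = AddressService(["10.0.0.1", "10.0.0.2"], lease_time=60)
a = service.request("h1", now=0)
b = service.request("h2", now=10)
c = service.request("h3", now=20)  # pool exhausted
d = service.request("h3", now=61)  # the lease of h1 has expired by now
print(a, b, c, d)
```

In this sketch a host whose lease has run out must call `request` again, and it may well receive a different address than before.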
A host with IP address IA1 normally uses that address to send a packet to a destination, say a host with IP address IA2. In contrast to MAC addresses, an IP address can be used to truly route packets through a communication network. In this case, routers are represented as the nodes of such a network, and physical links between routers as its edges. In essence, whenever a host wants to send a packet, it needs to make sure that the packet gets to a router, which will then take care of the rest. To this end, it simply sends the packet using the MAC address of a locally accessible router as its destination. From there on, it’s the router’s job to forward the packet toward its destination.
¹ The mechanism just described is generally implemented by means of a so-called DHCP server, where DHCP stands for Dynamic Host Configuration Protocol.
[Figure 8.3 layout: network identifier followed by host identifier, 32 bits in total]
Figure 8.3: The structure of an IP address, consisting of a network identifier and a host identifier.
To avoid routers having to discover routes to every individual host, a simple aggregation takes place by splitting an IP address into two parts: a network identifier and a host identifier, as shown in Figure 8.3. In the following we will not distinguish among the different types of IP addresses and consider only the ones that are made up of 32-bit numbers. We assume that 16 bits have been reserved for the network identifier and 16 for the host identifier. This means that there can be at most 2^16 = 65,536 different networks, each having at most 2^16 hosts. Whenever a company wants to create a network, it needs to be assigned one or several network identifiers. These identifiers are assigned by a global organization, and will therefore need to be requested. Stepping over many practical matters, in our example network from Figure 8.1, we would need at least three network identifiers: one for the server group, one for LAN #1, and one for the connected LANs #2 and #3. When taking routing decisions, a router considers only the network identifier and completely ignores the host identifier. So, for example, when router R1 from Figure 8.1 receives a packet addressed to a host on LAN #2, it looks only at the network identifier in that address and subsequently forwards the packet to the switch of LAN #3, which will then take over the responsibility of getting that packet to its destination.
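The 16/16 split and the forwarding decision of a router such as R1 can be sketched as follows. The concrete addresses (10.1.x.x for the server group, 10.2.x.x for LAN #1, and 10.3.x.x for the connected LANs #2 and #3) are invented for this example, as are the function names and the contents of the forwarding table.

```python
import ipaddress

NETWORK_BITS = 16
HOST_BITS = 16

def split_address(dotted):
    """Split a 32-bit IP address into (network identifier, host identifier)."""
    value = int(ipaddress.IPv4Address(dotted))
    network_id = value >> HOST_BITS
    host_id = value & ((1 << HOST_BITS) - 1)
    return network_id, host_id

def next_hop(forwarding_table, destination):
    """Pick the next hop using only the network identifier of the destination."""
    network_id, _host_id = split_address(destination)
    return forwarding_table.get(network_id)

# Number of distinct network identifiers, and of hosts per network.
max_networks = 2 ** NETWORK_BITS
max_hosts = 2 ** HOST_BITS

# Hypothetical forwarding table for R1, keyed by network identifier.
r1_table = {
    split_address("10.1.0.0")[0]: "server group",
    split_address("10.2.0.0")[0]: "switch of LAN 1",
    split_address("10.3.0.0")[0]: "switch of LAN 3",  # LANs 2 and 3 share one identifier
}

print(max_networks, max_hosts)
print(split_address("10.3.0.42"))
print(next_hop(r1_table, "10.3.0.42"))
```

Note that `next_hop` never inspects the host identifier: two packets for different hosts on the same network take the same table entry, which is what keeps the table small.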
It turned out that the total number of available network identifiers in the Internet was not enough to support its growth. Therefore, alternative schemes and technical solutions are being used to ensure that each host can be assigned an IP address. Nevertheless, the basic approach just described, namely that each host is addressed by means of a pair of <network, host> identifiers, has been left unaltered. This observation is important as routers take decisions on where to forward packets using only network identifiers.
Other small-area networks
Besides these small-area networks, there are two other types of networks worth mentioning. The first one is formed by home networks, which typically consist of one to several end-user computers, along with networked devices such as set-top boxes for digital TV, Internet-enabled telephones,
and multimedia centers. These types of networks are growing fast in terms of what they offer to end users. Typically, we are seeing that many domestic appliances are becoming network aware, if only to smoothly regulate energy consumption. In addition, many home networks facilitate installation of sensors for monitoring purposes (think of burglar systems, networked smoke and fire detectors, surveillance cameras, and so forth). A home network generally has only a single IP address associated with it, which is subsequently shared between all the devices. It is beyond the scope of this text to explain how this sharing is realized. What is important is that a home network from the outside is often indistinguishable from a single networked computer: both have a globally unique IP address.
Secondly, there are also (wireless) access networks, whose sole purpose is to allow devices to connect to the Internet. Typically, access networks support wireless connection setups to mobile devices. When making use of such a network, a device is usually provided with a dynamically assigned IP address whose network identifier is inherited from the access network.
By keeping track of which device was assigned which IP address, packets are routed to the access network from where a router or switch can forward the packet to its destination.
Large-area networks
Small-area networks form what is known as the edge of the Internet: networks beyond which packets are no longer forwarded. In practice, we see these small-area networks being connected to larger networks owned by organizations that make it their business to provide many end users and organizations access to the Internet, or that offer services to transmit packets across the Internet. These Internet Service Providers, or simply ISPs, generally span much larger geographical areas than small-area networks. In contrast to the small-area networks discussed previously, routing plays an important role.
The smallest large-area networks consist of the access networks we just discussed (and in this sense, there is usually not a clear-cut distinction between small and large-area networks). Examples include modern wireless access networks that span a whole neighborhood or even a city. In addition, there are many local ISPs that not only provide Internet access, but also basic services such as e-mail.
These so-called tier-3 networks have what is known as a peering relationship with tier-2 networks. A peering relationship between networks N1 and N2 may occur when N1 has a router that is connected through a direct link with a router of N2. Such routers are also known as border gateways, as they allow for traffic to flow into and from the network, that is, they operate at the border of a network. Tier-2 networks are often connected to other tier-2 networks, allowing packets to cross larger areas. As said, routing plays a prominent role in these cases. Regional ISPs, such as those covering a (small) country, are typical examples of tier-2 networks.
Finally, we distinguish tier-1 networks, which provide the backbone of the Internet. End users usually never connect directly to tier-1 networks.
Instead, these backbones provide services and routing capabilities only to tier-2 networks. Note that there may be several tier-1 networks operating in the same area. This allows regional ISPs to choose which network they will use. In fact, ISPs may change their peering relationships without end users even noticing.
8.1.2 Measuring the topology of the Internet
All of the networks we discussed so far are usually each managed by a separate administrative unit. This is certainly the case for large-area networks.
For small-area networks, we often see that the networks are still managed separately (as is typically the case for corporate local-area networks), or management is partly delegated to end users (as with home networks).
Roughly speaking, a collection of networks that fall under the regime of the same administration and that follow the same policy regarding how to route packets is known as an autonomous system, or simply AS. By connecting autonomous systems, we essentially obtain the structure of the Internet. In other words, the Internet can be represented as a graph where a vertex represents an autonomous system, and an edge the fact that two autonomous systems have a peering relationship. As of this writing, there are more than 25,000 autonomous systems.
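The graph view just described can be sketched with a plain adjacency map. The AS numbers and peering relationships below are made up for illustration, and `build_as_graph` is a hypothetical helper name.

```python
def build_as_graph(peerings):
    """Build an undirected graph: vertex = AS number, edge = peering relationship."""
    graph = {}
    for a, b in peerings:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

# Hypothetical peering relationships between four autonomous systems.
peerings = [(1, 2), (2, 3), (2, 4), (3, 4)]
graph = build_as_graph(peerings)

# The degree of a vertex is the number of ASes it peers with.
degree = {asn: len(neighbors) for asn, neighbors in graph.items()}
print(graph)
print(degree)
```

Once the topology is in this form, simple properties such as vertex degrees can be read off directly, which is the kind of measurement this section is concerned with.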
The AS topology
Discovering what is known as the AS topology of the Internet is on the surface relatively easy provided certain details are not taken into account (and which we will indeed skip for now). Each autonomous system is assigned a unique number called its AS number. Note that this assignment is done through a central authority, as is the case for assigning network addresses.
Each AS announces which networks fall under its regime by essentially advertising <AS number, network identifier> pairs. Such announcements are made by the AS’s border gateways discussed previously, and are picked up by the respective neighboring border gateway of an adjacent AS. As an example, assume that AS 1 manages a network with identifier nid. A border gateway connecting AS 1 to AS 2 may send the pair <AS 1, nid> to AS 2. At that point, AS 2 will have discovered a route to network nid. AS 2, in turn,