Emulab - Behind the Scenes

Introduction

This document is a collection of notes about what goes on behind the scenes; we hope it will be helpful for more advanced use of the Testbed.

A brief note on terminology: the NS input file describes a virtual topology. The physical topology is the hardware (switches, PCs, wires) that actually exists. The physical topology is configured to emulate the virtual topology as transparently as possible.

Physical topology

The physical topology at the Utah Testbed is built around a large Cisco switch. A number of PCs are connected directly to this switch; each PC has four connections to it. In addition, each PC has a single wire that connects to the outside world. This wire, the control link, is used to communicate with the PC without interfering with any experimental traffic or topology.

In addition to the PCs, a number of shark shelves are connected to the switch. Each shark shelf holds 8 sharks, each connected by a single 10 Mbps wire to a switch on the shelf. That switch in turn is connected to the central Cisco switch. Each shelf switch is also connected to the outside world for control traffic.

The key to emulating virtual links is VLANs. Basically, a VLAN is a collection of ports on the switch. Broadcast traffic is contained within the VLAN, and all other traffic is likewise only permitted to other ports in the VLAN. A basic link is emulated by creating a VLAN on the switch with two ports in it. For example, say node A port 1 and node B port 1 are connected in the virtual topology. These node/ports correspond to ports on the switch, say 1-1 and 1-5. To emulate this link a VLAN with ports 1-1 and 1-5 is created. Thus traffic from node A port 1 will travel to the switch and then to node B port 1. No traffic, even broadcast traffic, will go anywhere else.
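
To make this concrete, here is a small illustrative sketch in Python (not Emulab's actual switch-configuration code) of how a virtual link maps onto a VLAN; the wiring of node/ports to switch ports is an assumed example:

    # Illustrative only -- the node/port to switch-port wiring is assumed.
    switch_port = {
        ("nodeA", 1): "1-1",
        ("nodeB", 1): "1-5",
    }

    def vlan_for_link(end1, end2):
        """A direct virtual link becomes a VLAN holding exactly the two
        switch ports behind its endpoints."""
        return {switch_port[end1], switch_port[end2]}

    print(vlan_for_link(("nodeA", 1), ("nodeB", 1)))   # e.g. {'1-1', '1-5'}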

This method works great for 100Mbps links with negligible latency (currently defined as less than 2ms) and negligible loss (no more than that of 40' of Ethernet cable). However, the virtual topology will often contain a link with different characteristics. To emulate such a link, another PC is allocated to the experiment and placed in the middle of the link. This PC runs special software that can impose bandwidth limitations, latency, and packet loss on the link. Since the PC has four experimental connections, it can interpose on two virtual links in this fashion.

Example: Say we have the example above, except the link between node A port 1 and node B port 1 is defined to have 150ms latency. A PC, delay1, is allocated and the link is passed through it. So we have, say, a link from node A port 1 to node delay1 port 1 and a link from node delay1 port 2 to node B port 1. The delay node, delay1, will then be configured to forward traffic between ports 1 and 2, imposing the extra latency. This setup now requires two VLANs, one containing the switch ports for (node A port 1, node delay1 port 1), and another containing the switch ports for (node delay1 port 2, node B port 1). Two VLANs are required because we must force traffic coming from either node A or B to pass through the delay node so it is properly shaped; if all ports were in the same VLAN (and thus in the same broadcast domain), node A port 1 and node B port 1 would be able to talk directly to each other.
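
Continuing the illustrative sketch from above (again, not Emulab's actual code), the shaped link splits into two VLANs around the delay node; the switch ports chosen for delay1 are assumptions:

    switch_port = {
        ("nodeA", 1): "1-1",
        ("nodeB", 1): "1-5",
        ("delay1", 1): "2-3",   # assumed switch ports for the delay node
        ("delay1", 2): "2-4",
    }

    def vlans_for_delayed_link(end1, end2, delay="delay1"):
        """Two VLANs, one per side of the delay node.  A single VLAN would
        let end1 and end2 talk directly, bypassing the traffic shaping."""
        return [
            {switch_port[end1], switch_port[(delay, 1)]},
            {switch_port[(delay, 2)], switch_port[end2]},
        ]

    print(vlans_for_delayed_link(("nodeA", 1), ("nodeB", 1)))
    # e.g. [{'1-1', '2-3'}, {'2-4', '1-5'}]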

LANs

In the simplest case LANs are just like undelayed links, except that the VLAN will have more than two ports in it. For example, if node A port 1, node B port 1, and node C port 1 are all in a LAN, and these node/ports correspond to ports 1-1, 1-5, and 1-9, then a VLAN with ports 1-1, 1-5, and 1-9 is created.
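
In the same illustrative style as the sketches above, an undelayed LAN is simply a VLAN built from the switch port of every member:

    switch_port = {("nodeA", 1): "1-1", ("nodeB", 1): "1-5", ("nodeC", 1): "1-9"}

    def vlan_for_lan(members):
        return {switch_port[m] for m in members}

    print(vlan_for_lan([("nodeA", 1), ("nodeB", 1), ("nodeC", 1)]))
    # e.g. {'1-1', '1-5', '1-9'}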

Things get much more complicated when we add delay and link LANs together (such as a LAN spread over a continent).

Let's start with a basic delayed LAN. This is a collection of nodes in the same broadcast domain with some link restrictions (bandwidth, latency, loss), each node seeing the same link behavior. To emulate this situation a delay node is allocated for each node in the LAN. The delay nodes are all put in a LAN, and each node is connected to its delay node. A packet between any two nodes in the LAN will then pass through two delay nodes. Thus each delay node imposes half the latency and "half" the loss rate. The loss rate is not actually half; rather, it is chosen so that the probability of losing a packet passing through both delay nodes is the specified loss rate for the LAN. Let L be the loss rate for the LAN. Then the loss rate for each delay node works out to be 1-sqrt(1-L).
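
As a quick sanity check on that formula, here is a small Python calculation (illustrative only) for a LAN with a 5% loss rate:

    import math

    L = 0.05                       # example LAN-wide loss rate
    p = 1 - math.sqrt(1 - L)       # loss rate configured on each delay node
    combined = 1 - (1 - p) ** 2    # loss seen through two delay nodes

    print(p)         # ~0.0253 per delay node
    print(combined)  # ~0.05, i.e. the requested LAN loss rate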

The situation is further complicated when we link nodes into LANs. In this way it is possible to have nodes in a LAN with different delay characteristics. The model behind this case is that the link characteristics of a LAN come from the transmission medium (wires, etc.) and not from the switch. Thus when traffic travels across the LAN it hits the link characteristics of two hops (one to the switch and one from it). So in the current case traffic from the node should hit the link to the LAN and then a single LAN hop (switch to node). This is emulated by creating a delay node between the node and the LAN. Thus traffic between the linked-in node and another node in the LAN will go through the link's delay characteristics and "half" of the LAN characteristics. This is not always what is wanted, so tweaking the delay characteristics of the link may be necessary to get the right behavior. As a final note, it is possible to have an undelayed link into a delayed LAN. In this case no extra delay node is created, so traffic between the node and a node in the LAN will hit only a single delay node, which enforces "half" of the LAN behavior.
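
Following that model, the loss seen by traffic between the linked-in node and a LAN member can be worked out as in this illustrative sketch (the 1% and 5% figures are just example values):

    import math

    def loss_seen_by_linked_node(link_loss, lan_loss):
        """Full link loss plus "half" of the LAN loss (the destination
        node's delay node), per the model described above."""
        half_lan = 1 - math.sqrt(1 - lan_loss)
        return 1 - (1 - link_loss) * (1 - half_lan)

    print(loss_seen_by_linked_node(0.01, 0.05))   # ~0.035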

The final complication is LAN-to-LAN links. Imagine a corporate LAN that is at two sites with a long link across the continent. This can be emulated by generating two LANs and linking them together with a high-latency link. To emulate this a delay node is created with one port in the VLAN of the first LAN and the other port in the VLAN of the second LAN. This delay node has the delay characteristics of the LAN-to-LAN link. This is consistent with the above model in that traffic between a node in the first LAN and a node in the second LAN will travel to the switch, through the link to the other switch, and then to the destination. Thus the total delay will be "half" of the first LAN, the link, and "half" of the second LAN. This is not always what is wanted. For example, say you wanted to emulate the same setup except that the link between the LANs was not at the switches but rather at some gateway. In this case traffic should see the full delay of both LANs rather than only "half". To account for this, the link between the LANs should have half the latency of each LAN added to it and a loss rate of 1-(1-Aloss)*(1-Bloss)*(1-loss)/(sqrt(1-Aloss)*sqrt(1-Bloss)), which simplifies to 1-sqrt(1-Aloss)*sqrt(1-Bloss)*(1-loss), where Aloss is the loss rate of the first LAN, Bloss the loss rate of the second LAN, and loss the loss rate of the LAN-to-LAN link.
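
The same can be checked numerically; this illustrative sketch (with made-up example loss rates) configures the LAN-to-LAN delay node and verifies that the end-to-end loss matches the gateway model:

    import math

    def gateway_link_loss(a_loss, b_loss, link_loss):
        """Loss rate for the LAN-to-LAN delay node so that traffic sees the
        full loss of both LANs plus the link (the "gateway" model)."""
        return 1 - math.sqrt(1 - a_loss) * math.sqrt(1 - b_loss) * (1 - link_loss)

    a_loss, b_loss, link_loss = 0.05, 0.02, 0.01
    x = gateway_link_loss(a_loss, b_loss, link_loss)

    # Path: "half" of LAN A, the compensated link, "half" of LAN B ...
    end_to_end = 1 - math.sqrt(1 - a_loss) * (1 - x) * math.sqrt(1 - b_loss)
    # ... should equal the full loss of LAN A, the link, and LAN B.
    target = 1 - (1 - a_loss) * (1 - b_loss) * (1 - link_loss)

    print(end_to_end, target)   # both ~0.0783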

Sharks

Sharks are greatly limited in their connectivity. They must be leaf nodes, and they must be part of an undelayed 10Mbps (switched) LAN with at most 8 sharks in it. Note that each shark can use up to 10Mbps (combined experimental and control traffic); this bandwidth is per-node, not shared among all the sharks on a shelf. The uplinks are 100Mbps, so all 8 sharks can send up to 10Mbps each toward the Cisco switch without saturating the uplinks.

Virtual to physical translation

The conversion between the virtual and physical topology is an NP-complete problem in the general case. A simulated annealing process is used to do the translation. This algorithm works quite well, but it is nondeterministic. As a result, between runs of the same NS file the actual physical machines and ports used may change. This is made as transparent as possible; however, it is impossible to hide the port changes. Be aware that the actual interfaces used on each node for a given link may change between runs.
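
For intuition only, here is a toy simulated-annealing sketch in Python. It is not Emulab's actual mapper, and the cost function is a stand-in, but it shows why two runs over the same input can legitimately return different assignments:

    import math
    import random

    virtual_nodes = ["A", "B", "C"]
    physical_nodes = ["pc1", "pc2", "pc3", "pc4", "pc5"]

    def cost(assignment):
        # Stand-in cost; the real mapper scores resources such as switch
        # bandwidth and node features.
        return sum(int(pc[2:]) for pc in assignment.values())

    def anneal(temp=10.0, cooling=0.95, steps=500):
        assignment = dict(zip(virtual_nodes,
                              random.sample(physical_nodes, len(virtual_nodes))))
        current = cost(assignment)
        for _ in range(steps):
            # Propose moving one virtual node to an unused physical node.
            vnode = random.choice(virtual_nodes)
            free = [p for p in physical_nodes if p not in assignment.values()]
            candidate = dict(assignment)
            candidate[vnode] = random.choice(free)
            delta = cost(candidate) - current
            # Accept improvements always, worse moves with some probability.
            if delta < 0 or random.random() < math.exp(-delta / temp):
                assignment, current = candidate, cost(candidate)
            temp *= cooling
        return assignment

    print(anneal())   # may differ from run to run, like the real mapping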

IP address assignment

IP addresses will be automatically generated for all nodes for which you do not explicitly set them.

In the common case the IP addresses on either side of a link must be in the same subnet. Likewise, all IP addresses on a LAN should be in the same subnet. Automatically generated IP addresses will conform to this requirement. If part of a link or LAN is explicitly specified with the commands below, then the remainder will be automatically generated under the same subnet, assuming a netmask of 255.255.255.0 (class C).

IP address assignment is deterministic and tries to fill lower addresses first, starting at 2. Except in the partial specification case (see above), all automatic IP addresses are in the network 10.1.*.*.
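
The scheme can be pictured with a short illustrative sketch (not Emulab's actual assignment code; the per-link subnet numbering used here is an assumption):

    def assign_ips(lans):
        """lans: one list of node names per link or LAN.  Each gets its own
        10.1.N.0/24 subnet, and hosts are numbered from .2 upward."""
        addresses = {}
        for subnet, members in enumerate(lans, start=1):
            for host, node in enumerate(members, start=2):
                addresses[(node, subnet)] = f"10.1.{subnet}.{host}"
        return addresses

    # A two-node link and a three-node LAN:
    print(assign_ips([["nodeA", "nodeB"], ["nodeA", "nodeC", "nodeD"]]))
    # {('nodeA', 1): '10.1.1.2', ('nodeB', 1): '10.1.1.3',
    #  ('nodeA', 2): '10.1.2.2', ('nodeC', 2): '10.1.2.3', ('nodeD', 2): '10.1.2.4'}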

If you choose to specify addresses, and don't conform to the same subnet requirements, you will need to set up the proper routing tables on all of your nodes to make the traffic go to the right places. You will always need to set up routing between any nodes that are not directly connected, even for automatically generated IP addresses.