Linktest

Linktest Tutorial

Linktest is an online validation test of the network characteristics in an Emulab experiment. It's a check that Emulab set up the network as you requested. If you care about the fidelity of your network, you should set linktest to run on all or some swapins, and you should read this entire tutorial.

Quick Start

Temporarily, while we are validating and tuning linktest, on every swapin or modify:
(1) We ignore users' requests to set the linktest level below 3; i.e., we always run linktest, at level 3 or greater.
(2) Although the activity log still reports all linktest results, including failures, we email failures only to testbed-ops, not to the user.
(If you explicitly invoke linktest from the "Run Linktest" page or on ops, you get whatever level you request.)

To run linktest, select a Linktest test level from the dropdown on the Begin an Experiment page.

Level 1 - Connectivity and Latency: For fastest response, select level 1. Through ping, this will check link-level connectivity and link latencies.
Level 2 - Plus Static Routing: In addition to the above, tests static routing (if applicable).
Level 3 - Plus Link Loss: Check link loss characteristics. Test levels are cumulative, so this option also includes connectivity, latency, and static routing (if applicable).
Level 4 - Plus Bandwidth: If bandwidth is important in your experiment, select level 4. Be warned that the bandwidth test takes up to 20 seconds per link. Also note that not all bandwidths can be accurately measured. If that happens, a warning will be placed in the log file for each link that is out of range for bandwidth testing.

If you select a test level other than zero, Linktest will run after the experiment completes its swapin. If a problem is found, testbed-ops is automatically notified and a message will appear in the activation log. You will also receive an email message to ensure that you are aware of the problem. If no problem is found, no notification is sent. A failure in linktest will not cause the swapin to fail! If traffic shaping parameters are of critical importance to your experiments, make sure you take a closer look if linktest reports failures!

Limitations (Important, Please Read)

Not all bandwidths can be accurately measured, and linktest will skip links that it knows will give false results (e.g., slow or lossy links). Please check the output, and be sure to test those links yourself if your results depend on total accuracy.
As with any automated testing procedure, we have to balance the desire for accuracy with the possibility of false positives. To reduce the number of false positives, we allow for a small amount of fudge on any link. If your results are dependent on total accuracy, then you should test your links yourself!
When using "endnodeshaping" (so-called linkdelays), latency is less accurate because of the 1ms clock resolution at which the kernel runs. At worst, latency can be off by up to 3ms, say for a roundtrip ping packet.
Linktest can take a long time on large experiments. Even on very small experiments (5-10 nodes), doing the full bandwidth test can add 2-3 minutes. You should probably not do bandwidth tests at swapin on any experiment over 20 nodes unless you are prepared to wait a long time for the experiment to swap in. If you decide you have waited long enough, you can use the Stop Linktest menu option on the Show Experiment page. This will cancel linktest and allow the swapin to complete normally.

Understanding Linktest

Linktest is an end-to-end validation test for Emulab experiments. It verifies that experiment nodes are up, that they are reachable by static routes (when applicable), and that traffic shaping on delay nodes matches the experiment NS script.

Linktest works by reading a data file of all of the links and their attributes, and then invoking external measurement tools -- namely ping, Rude and Crude, and iperf. Linktest compares the results against margins of error calculated in advance to identify major errors in configuration.

Linktest runs on each experiment node. The Linktest daemon waits for a custom event instructing it to begin testing. When it receives the event, it invokes the Linktest script to conduct the actual tests. The script invokes external processes to validate links and log any errors found. If a node detects an error, it writes an explanatory message to the experiment tbdata/linktest directory. Otherwise, no messages appear in the directory after Linktest completes its run.

Linktest uses test levels to select which tests to perform. Test levels are cumulative, so selecting a higher test level ensures that the lower-level tests are also run. Test levels are ordered by the time they take to complete, so Level 4 - Bandwidth takes the most time and Level 1 - Connectivity and Latency takes the least.

Read more about each test level in the following sections:

Level 0 - Do not run Linktest

The default test level is Level 0 - Do not run Linktest. Use this level to leave Linktest turned off, performing no validation of experiment links after swapin.

Level 1 - Connectivity and Latency

Each Linktest node on a lan or direct link pings the node on the other side of the link. From the responses, the node detects whether the link is up and the latency of the link. Linktest compares the measured latency with the expected latency of the link, adjusting for known delay crossing the testbed backplane. If the measured latency is outside the 99% confidence interval for latencies at that setting, Linktest reports an error.

Note: To avoid reporting too many false positives, Linktest looks at the actual delta and reports an error only if the measured latency is more than 0.5 milliseconds below the desired value, or more than 3.5 milliseconds above it.
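
If you test latency yourself, the delta rule in the note can be expressed as a small check. This is a sketch under stated assumptions: it models only the 0.5 ms / 3.5 ms thresholds, not the confidence interval or the backplane adjustment, and the function name is hypothetical.

```python
def latency_within_tolerance(measured_ms, expected_ms,
                             below_ms=0.5, above_ms=3.5):
    """Apply the delta rule from the note above (hypothetical helper).

    An error is reported only if the measured latency is more than
    `below_ms` under, or more than `above_ms` over, the expected value.
    """
    delta = measured_ms - expected_ms
    return -below_ms <= delta <= above_ms
```

With a requested latency of 20 ms, a measurement of 23 ms passes this check while 24 ms or 19.4 ms does not.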

Level 2 - Static Routing

If the routing mode of the experiment is static, each Linktest node pings the remainder of nodes in the experiment. If any node cannot be reached, the Linktest node reports an error.

Level 3 - Loss

Each Linktest node on a lan or direct link with loss > 0 sends a burst of packets to the node on the other side of the link using Rude and Crude, a real-time packet emitter and collector. If the percentage of packets lost is outside the 99% confidence interval for normally-distributed loss at that setting, Linktest reports an error.
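
The tutorial does not say how the 99% confidence interval is computed. One common way to get such an interval, shown here purely as an assumption, is the normal approximation to the binomial distribution for independent packet losses. The function names and the packet count are hypothetical, and Linktest's actual calculation may differ.

```python
import math

# Two-sided 99% quantile of the standard normal distribution.
Z_99 = 2.5758


def loss_interval(loss_rate, packets_sent):
    """Approximate 99% interval for the observed loss fraction.

    Assumption (not from the Linktest source): losses are independent,
    so the observed fraction is roughly normal around `loss_rate`.
    """
    sigma = math.sqrt(loss_rate * (1.0 - loss_rate) / packets_sent)
    low = max(0.0, loss_rate - Z_99 * sigma)
    high = min(1.0, loss_rate + Z_99 * sigma)
    return low, high


def loss_within_interval(packets_lost, packets_sent, loss_rate):
    """True if the observed loss fraction falls inside the interval."""
    low, high = loss_interval(loss_rate, packets_sent)
    return low <= packets_lost / packets_sent <= high
```

Under this approximation, a link configured for 10% loss and tested with 1000 packets would be accepted with 100 packets lost and flagged with 150 lost.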

Level 4 - Bandwidth

Each Linktest node on a lan or direct link uses iperf to measure the bandwidth of the link, provided that the link is >= 1 Mbps and does not have a link loss value. If the measured bandwidth is outside the margin of error, Linktest reports an error. The Bandwidth test adds up to 20 seconds per distinct link in the experiment; 10 seconds in each direction. Linktest attempts to run tests in parallel whenever possible, but topologies such as a star will lead to longer runtimes since Linktest allows only one sender or receiver to run on a node at a time.
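
To estimate how long the bandwidth test might take, the rules above can be sketched as follows. The function names and the (bandwidth, loss) tuple layout are hypothetical, and the result is a serial upper bound; Linktest runs tests in parallel where it can, so the real time is often shorter.

```python
def bandwidth_testable(bandwidth_mbps, loss):
    """A link is bandwidth-tested only if it is >= 1 Mbps and has no loss."""
    return bandwidth_mbps >= 1.0 and loss == 0


def worst_case_bandwidth_seconds(links, seconds_per_direction=10):
    """Serial upper bound on bandwidth test time.

    `links` is a list of (bandwidth_mbps, loss) tuples, a hypothetical
    layout chosen for this sketch. Each eligible link is tested once in
    each direction.
    """
    eligible = [link for link in links if bandwidth_testable(*link)]
    return len(eligible) * 2 * seconds_per_direction
```

For three links of which one is lossy and one is below 1 Mbps, only one link is tested, for a bound of 20 seconds.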

Note: To avoid reporting too many false positives, Linktest looks at the actual delta and reports an error only if the measured bandwidth is more than 5% below the desired value, or more than 1% above it.
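
The bandwidth thresholds in the note can be checked the same way when you measure a link yourself. This is a minimal sketch with a hypothetical function name; it models only the 5% / 1% rule.

```python
def bandwidth_within_tolerance(measured, expected,
                               below_frac=0.05, above_frac=0.01):
    """Apply the 5%-below / 1%-above rule from the note (hypothetical helper).

    `measured` and `expected` must use the same unit, e.g. Mbps.
    """
    lower = expected * (1.0 - below_frac)
    upper = expected * (1.0 + above_frac)
    return lower <= measured <= upper
```

For a 100 Mbps link, a measurement of 96 Mbps passes this check, while 94 Mbps or 102 Mbps does not.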

Advanced Topics

To run Linktest after experiment swapin, you may use Emulab's Web Interface, or you may manually invoke the script run_linktest.pl on ops. You may also examine Linktest output in its log directory. Read about these options in the following sections:

Running Linktest from the Web Interface

If you go to the "Show Experiment" page for your experiment, you will see an option called "Run Linktest" in the auxiliary menu for the experiment. Clicking on that link will take you to the "Run Linktest" page, where you can select a level and then start linktest by clicking the Start button. Once linktest starts running, you can stop it by clicking the Stop button. Please be patient; linktest can take a long time to run. Eventually, you will be notified of its results in the window below the Start/Stop button. No email is sent.

Running Linktest on Ops

Use run_linktest.pl to run Linktest on ops. The option "-e" is mandatory for specifying the project and experiment id. Running with "-q" will run all tests except the bandwidth test. Running without "-q" runs all tests. Running with "-o" allows you to specify an output directory for log messages. Invoke run_linktest.pl with no options for a help message. Example:

run_linktest.pl -e utahstud/simple -q -o /tmp/linktest.log

You may also specify the test level using the "-l" option. Example:

run_linktest.pl -e utahstud/simple -l 1

After Linktest completes, run_linktest.pl prints out errors found during the run, if any. For scripting, run_linktest.pl returns 0 if no errors were found, or a nonzero value if at least one error was found. No email is sent.
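
Because the exit status distinguishes success from failure, run_linktest.pl can be driven from a script. The following is a minimal sketch, assuming run_linktest.pl is on your PATH on ops. The helper names are hypothetical, only the options described above are used, and the tutorial does not say whether "-q" and "-l" may be combined.

```python
import subprocess


def build_linktest_command(project, experiment,
                           level=None, quick=False, output_dir=None):
    """Build an argument list using only the options described above.

    Hypothetical helper. Whether -q and -l may be combined is not stated
    in the tutorial, so pass at most one of `quick` and `level`.
    """
    cmd = ["run_linktest.pl", "-e", f"{project}/{experiment}"]
    if quick:
        cmd.append("-q")          # all tests except bandwidth
    if level is not None:
        cmd += ["-l", str(level)]
    if output_dir is not None:
        cmd += ["-o", output_dir]
    return cmd


def linktest_passed(cmd):
    """Run the command; exit status 0 means no errors were found.

    Raises FileNotFoundError if run_linktest.pl is not on PATH.
    """
    return subprocess.run(cmd).returncode == 0
```

For example, linktest_passed(build_linktest_command("utahstud", "simple", level=1)) runs the level 1 tests and returns True if no errors were reported.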

Linktest Log Directory

Linktest logs the results and any error reports in the tbdata/linktest directory for each experiment. By default this directory's path is /proj/<myproj>/exp/<myexpt>/tbdata/linktest/ .
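
To inspect the log directory from a script, the default path can be built from the project and experiment names. This is a sketch assuming the default location given above; the function names are hypothetical, and the names and format of the files inside the directory are not described in this tutorial.

```python
from pathlib import Path


def linktest_log_dir(project, experiment):
    """Default Linktest log directory for an experiment, per the text above."""
    return Path("/proj") / project / "exp" / experiment / "tbdata" / "linktest"


def linktest_reports(project, experiment):
    """List the files in the log directory, or [] if it does not exist."""
    log_dir = linktest_log_dir(project, experiment)
    if not log_dir.is_dir():
        return []
    return sorted(p for p in log_dir.iterdir() if p.is_file())
```

For the experiment utahstud/simple, linktest_log_dir returns /proj/utahstud/exp/simple/tbdata/linktest.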