Emulab FAQ: Testbed Operations: What do I do with nodes in "hwdown"?

The nodes in question may have been in reloading so long that something bad happened (e.g., the disk image server died).

In that case, you can free all the nodes from the reloading experiment:

	nfree emulab-ops reloading

which has the effect of placing them into reloadpending and then back into reloading to try again. Along the way it ensures that the necessary server daemons are really running. So now 13 of the nodes have reloaded and are free again.

You might also check the nodes in hwdown to see if they can be freed. To have any chance of being reloaded successfully, a node must:

  • respond to ping
  • be accessible via ssh

So what I usually do is slogin as root from the boss node (which is "trusted" by all the nodes). If that works and the node is running FreeBSD (do "uname"), then you can probably nfree the node and it will reload/free just fine.
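The ping, ssh, and uname checks above can be strung together in a small script run from boss. This is only a sketch under some assumptions: check_node is a hypothetical helper name, ssh stands in for slogin, and the PING_CMD and SSH_CMD variables (with their timeout options) are illustrative defaults rather than anything Emulab ships.

```shell
#!/bin/sh
# Hypothetical helper, not part of Emulab: report whether a node in hwdown
# looks like a reasonable candidate for nfree.
# PING_CMD and SSH_CMD are assumed defaults; override them to suit your site.
PING_CMD=${PING_CMD:-"ping -c 1"}
SSH_CMD=${SSH_CMD:-"ssh -o BatchMode=yes -o ConnectTimeout=5"}

check_node() {
    node=$1

    # Requirement 1: the node must respond to ping.
    if ! $PING_CMD "$node" >/dev/null 2>&1; then
        echo "$node: no ping, hook up VGA or serial"
        return 1
    fi

    # Requirement 2: the node must accept a root login from boss.
    os=$($SSH_CMD "root@$node" uname 2>/dev/null) || {
        echo "$node: pings but ssh failed"
        return 1
    }

    # The node should be running FreeBSD before it is freed.
    if [ "$os" = "FreeBSD" ]; then
        echo "$node: FreeBSD, candidate for nfree"
        return 0
    fi
    echo "$node: running $os, not FreeBSD"
    return 1
}
```

Calling check_node with a node name prints a one-line verdict and returns 0 only when all three checks pass. It deliberately does not run nfree itself, so the final decision stays with the operator.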

If the ping fails, you will have to hook up a VGA monitor and/or serial line to see what is happening.

If a node appears to be coming up in Linux when it is in the reloading experiment, it may have a bad PXE boot floppy. Clone a floppy from another machine and try again.

If a node is in reloading and running FreeBSD, check /tmp/frisbee.out to see what it says. If it shows a bunch of dots, it is probably in the process of reloading (do "ps" to see). If it says something else, there might be a hardware problem.
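The look at /tmp/frisbee.out can be scripted roughly as follows, to be run on the node itself. This is a sketch resting on an assumption: the exact format of frisbee.out is not described here, so the function simply treats a run of five or more dots on the last line as "in progress". frisbee_status is a hypothetical name, and the threshold of five is arbitrary.

```shell
#!/bin/sh
# Hypothetical helper: make a rough guess at reload state from frisbee.out.
frisbee_status() {
    log=${1:-/tmp/frisbee.out}

    # Nothing to inspect if the file is missing or empty.
    if [ ! -s "$log" ]; then
        echo "missing or empty"
        return 2
    fi

    # Assumption: reload progress appears as a run of dots on the last line.
    if tail -n 1 "$log" | grep -q '\.\.\.\.\.'; then
        echo "dots: probably reloading"
        return 0
    fi

    echo "no dots: possible hardware problem"
    return 1
}
```

A "probably reloading" answer is still worth confirming with "ps", since the dots only show that the reload started, not that it is still running.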