Hardwaremail

<pre class="wiki">This file is just an unedited transcript of mail between
Utah and ISI. It includes detailed information about
various hardware features and models, mostly Cisco switches.

It won't be that easy to follow since it is missing some
context, but it contains a lot of valuable information.

Be sure also to see the file www/doc/hw-recommend.html
for more basic information, especially the "Switch" section.

----------------------------------------------------------------------
From ricci@cs.utah.edu Mon Oct 27 17:25:49 2003
Date: Mon, 27 Oct 2003 17:25:49 -0700
From: Robert P Ricci &lt;ricci@cs.utah.edu&gt;
To: Bob Braden &lt;braden@ISI.EDU&gt;
Cc: testbed-ops@emulab.net, deter-isi@ISI.EDU, lepreau@cs.utah.edu
Subject: Re: Hardware configuration for Emulab clone
In-Reply-To: &lt;200310272219.OAA28834@gra.isi.edu&gt;; from braden@ISI.EDU on Mon, Oct 27, 2003 at 02:19:38PM -0800
Lines: 113

You may already know many of these things.

Thus spake Bob Braden on Mon, Oct 27, 2003 at 02:19:38PM -0800:
&gt; 2) DETER will purchase 4 additional 1000bT Ethernet interfaces for each
&gt; node. Ideally, the 2 64bit/33MHz PCI slots of the hosts will be
&gt; populated with dual 1000bT interface cards.

Unless these two PCI slots are on independent busses, you can probably
expect to drive no more than two gigabit interfaces full speed, and only
at half-duplex. The theoretical PCI bandwidth on 64/33 PCI is, I
believe, not much more than 2Gbps. But, of course, you'll probably have
trouble generating more than 2Gbps of traffic on a PC anyway.
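
A rough check of that arithmetic, as a minimal sketch. It assumes the nominal 33 MHz clock, one 64-bit transfer per clock, and no bus overhead; the function name is only for illustration.

```python
def pci_peak_gbps(bus_width_bits: int, clock_mhz: float) -> float:
    """Theoretical peak of a parallel PCI bus: one transfer of bus_width_bits per clock."""
    return bus_width_bits * clock_mhz * 1e6 / 1e9

peak = pci_peak_gbps(64, 33)          # 2.112 Gbps, shared by every card on the bus
# A gigabit interface at full duplex needs 1 Gbps in each direction.
one_way_ifaces = int(peak // 1)       # interfaces the bus can feed in one direction only
full_duplex_ifaces = int(peak // 2)   # interfaces the bus can feed in both directions
print(f"{peak:.3f} Gbps peak: {one_way_ifaces} interfaces one way, "
      f"{full_duplex_ifaces} at full duplex")
```

Real throughput will be lower, since arbitration and descriptor traffic also use the bus.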

&gt; 3) DETER will purchase a Cisco 6513 switch with a supervisor blade and
&gt; 6 blades of 48 1000bT ports. That will support 4 x 64 Ethernet ports
&gt; to the nodes.

This is a very complicated issue. My views on this come from our
perspective on Emulab, in which we want to guarantee (or at least be
pretty darn sure) that there won't be any artifacts due to switch
limitations. I don't know if your goals are as stringent, and of course
there's always budget limitations... (Reading Steve Schwab's comments
farther down, it looks like you guys also want guaranteed bandwidth.)

Are the 48-port gigabit modules you're looking at WS-X6548-GE-TX? This is
the only 48-port GigE module I'm aware of from Cisco. If it's something
else, from the WS-X67 series, for example, a whole different set of
issues applies.

From my understanding of things (which came from reading some Cisco
documents, and from talking to an ex-Cisco engineer), this module is
very oversubscribed. It has a single 8Gbps (full-duplex) connection to
the switching fabric. I was told by the Cisco engineer that these
modules are 8x oversubscribed, though the math doesn't quite add up on
that (48 ports into an 8Gbps line would seem to be 6x oversubscribed.)
So, there may be some other bottleneck in it.

In the documentation I have about the architecture of the 65xx series,
it claims that 'SFM Single-Attached Fabric-Enabled Card's (which I think
all WS-X65 modules are), have a 16Gbps bus internally. Meaning that
you're not going to get more than 8 full-duplex, full-speed gigabit
flows out of them. If you were told that they have 80Gbps backplanes, I
can't say for sure that's wrong, but I would certainly double-check that
number. The white paper I'm referring to is online at:
http://www.cisco.com/en/US/customer/products/hw/switches/ps708/products_white_paper09186a0080092389.shtml
... in particular, I believe Figure 6 and the text below it are relevant
to the 48-port GigE modules.

So, it seems that you're not going to be able to use all this equipment
at full speed. If you want to save some cash on bandwidth you won't be
able to use, you might consider switching some of your GigE equipment to
100Mbps Ethernet.

Our conclusion was that the WS-X6516-GE-TX modules were the most
economical choice to get close to guaranteed bandwidth, though not _too_
close - they have 16 GigE ports, so they're 2x oversubscribed.
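
The oversubscription figures above can be reproduced with a small sketch. It assumes every port runs at 1 Gbps and that the 16-port module also has an 8 Gbps fabric connection, which is what the 2x figure implies; the helper name is hypothetical.

```python
def oversubscription(ports: int, port_gbps: float, fabric_gbps: float) -> float:
    """Ratio of aggregate port bandwidth to the module's fabric connection."""
    return ports * port_gbps / fabric_gbps

print(oversubscription(48, 1, 8))   # 6.0  (48-port module, single 8 Gbps connection)
print(oversubscription(16, 1, 8))   # 2.0  (16-port WS-X6516-GE-TX)
```

This only covers the fabric connection; any other bottleneck inside the module, such as the one hinted at by the 8x figure, is not modeled.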

Another possibility would be to build in some links that don't go
through a switch at all - just connect up some of the nodes directly.
There's an obvious loss of flexibility, though it's clearly more
economical. Our software theoretically supports this, though we haven't
tried anything like it recently.

&gt; 4) The control plane on the Emulab cluster will be offloaded to cheaper
&gt; unmanaged switch ports. The PXE boot-capable 10/100 interface of each
&gt; node will be connected with the boot server machine using multiple 48
&gt; port 1U switches on a separate LAN. Examples of the switch would be a
&gt; 3Com 2800 series unmanaged switch. For the first 64 machines of the
&gt; cluster, DETER would purchase 2 such switches.

It's pretty important that this set of switches supports multicast. Many
unmanaged switches simply treat multicast like broadcast. This could be
pretty disastrous when loading disk images, which consumes a whole lot
of bandwidth. Check to see if these switches support IGMP snooping to
create multicast groups.
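
For background, a snooping switch learns which ports want a group from the IGMP membership reports that hosts send when they join. The sketch below shows a receiver joining a group with the standard socket API. The group address and port in the example are hypothetical, and this is not Emulab's disk-loading code.

```python
import socket

def build_mreq(group: str, iface_ip: str = "0.0.0.0") -> bytes:
    """Pack the ip_mreq structure: multicast group address + local interface address."""
    return socket.inet_aton(group) + socket.inet_aton(iface_ip)

def join_group(group: str, port: int, iface_ip: str = "0.0.0.0") -> socket.socket:
    """Open a UDP socket and join `group`; the kernel then sends the IGMP membership report."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    build_mreq(group, iface_ip))
    return sock

# Example (hypothetical group and port):
# sock = join_group("239.1.2.3", 5000)
# data, sender = sock.recvfrom(65535)
```

On a switch without snooping, every port receives this traffic whether or not the attached host has joined.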

&gt; 5) DETER will purchase remote power strips and console terminal muxes.
&gt; The DETER project would appreciate suggestions from the ISD staff for
&gt; which equipment models to buy.

We use Cyclades serial expander boxes in one of our servers - by putting
them in a PC, we get very good control over who is allowed to access
which ones, when. We use Cyclom Ze boxes:
http://www.cyclades.com/products/8/z_series
... which let you get 128 serial lines into one PC.

We use two types of power controllers - 8-port APC Ethernet-connected
controllers, and 20-port serial controllers from BayTech. Since you'll
have serial lines, we recommend the BayTechs, because they are cheaper
per-port. The ones we have are RPC-27s:
http://www.baytech.net/cgi-private/prodlist?show=RPC27

&gt; Our first idea was to use the same 6513 chassis and add 5
&gt; blades of 48 1000bT port to it. This would provide complete
&gt; symmetry among all the 128 nodes. However, there is some
&gt; doubt about the difficulty of wiring 4 x 128 ports to one
&gt; 6513. It may therefore be better to purchase a second 6513
&gt; chassis for Phase 1b.

We've managed to fill up a couple 6509s with 48-port modules. Not easy,
but we managed it.

&gt; ??Is there any limitation on Emulab support of the planned 6513 switch
&gt; configuration??

Nope, our software should support it just fine.

--
/-----------------------------------------------------------
| Robert P Ricci &lt;ricci@cs.utah.edu&gt; | &lt;ricci@flux.utah.edu&gt;
| Research Associate, University of Utah Flux Group
| www.flux.utah.edu | www.emulab.net
\-----------------------------------------------------------

From ricci@cs.utah.edu Tue Oct 28 10:46:59 2003
Date: Tue, 28 Oct 2003 10:46:59 -0700
From: Robert P Ricci &lt;ricci@cs.utah.edu&gt;
To: Bob Lindell &lt;bob@jensar.us&gt;
Cc: Bob Braden &lt;braden@ISI.EDU&gt;, testbed-ops@emulab.net, deter-isi@ISI.EDU,
lepreau@cs.utah.edu
Subject: Re: [Deter-isi] Re: Hardware configuration for Emulab clone
In-Reply-To: &lt;A29CA1D8-0910-11D8-BB39-000393DC7572@jensar.us&gt;; from bob@jensar.us on Mon, Oct 27, 2003 at 10:33:25PM -0800
Lines: 37

Thus spake Bob Lindell on Mon, Oct 27, 2003 at 10:33:25PM -0800:
&gt; WS-X6748-GE-TX Cat6500 48-port 10/100/1000 GE Mod: fabric enabled, RJ-45

Hmm, it looks to me like this module was not available at the time I was
investigating GigE on Ciscos. It's got a different architecture than the
one I was assuming you were talking about, so many of the things I said
yesterday don't apply. I'm a bit confused though, because the data
sheets do list it as having two 20Gbps connections to the switch fabric.
But, the architecture white papers about the 6500 series clearly label
the switch fabric connectors as being 8Gbps each (with some slots having
dual connectors.) So, hopefully this means that they are able to drive
those busses at a higher rate than originally spec'ed, and that the
whitepaper is just out of date. But, it could also mean that the 20Gbps
numbers are just marketing - it could mean, for example, that the
internal buses have 40Gbps of total bandwidth, but that the module only
gets 16Gbps (full duplex) to the fabric module. If you can get your
salesperson to put you in touch with an engineer, that would probably be
the best way to find out what the truth about this matter is. If you
find anything out, we'd definitely be interested to hear it, because we
might consider these newer modules for our own gigabit expansion.

From the fact that your specs now list a 6509 instead of a 6513, I'm
guessing you already know this, but the 6513 can only handle 5 modules
with dual switch fabric interfaces. Essentially, the maximum number of
fabric connections is 18 - so the 6509s have two to every slot, while the
6513s have 5 slots with dual connectors, and 8 with a single connector.
So, if you plan to fill a switch with these dual-ported modules, you can
get better density in a 6509. If you were going to put in some 10/100
modules with a single fabric connection, you could still do this in a
6513.
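
The slot arithmetic can be tabulated with a short sketch. The layouts are taken from the description above; the sketch ignores the slot the supervisor occupies, and the names are hypothetical.

```python
# Slot layout as described above: (number of slots, fabric connectors per slot)
CHASSIS = {
    "6509": [(9, 2)],
    "6513": [(5, 2), (8, 1)],
}

def fabric_connections(layout):
    """Total fabric connectors in the chassis."""
    return sum(slots * connectors for slots, connectors in layout)

def dual_capable_slots(layout):
    """Slots that can host a module with two fabric interfaces."""
    return sum(slots for slots, connectors in layout if connectors >= 2)

for name, layout in CHASSIS.items():
    print(name, fabric_connections(layout), dual_capable_slots(layout))
# 6509 18 9
# 6513 18 5
```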

--
/-----------------------------------------------------------
| Robert P Ricci &lt;ricci@cs.utah.edu&gt; | &lt;ricci@flux.utah.edu&gt;
| Research Associate, University of Utah Flux Group
| www.flux.utah.edu | www.emulab.net
\-----------------------------------------------------------

From ricci@cs.utah.edu Wed Oct 29 13:31:57 2003
Date: Wed, 29 Oct 2003 13:31:57 -0700
From: Robert P Ricci &lt;ricci@cs.utah.edu&gt;
To: Stephen_Schwab@NAI.com, John Mehringer &lt;mehringe@isi.edu&gt;
Cc: braden@ISI.EDU, testbed-ops@emulab.net, deter-isi@ISI.EDU,
lepreau@cs.utah.edu
Subject: Re: [Deter-isi] Re: Hardware configuration for Emulab clone
In-Reply-To: &lt;613FA566484CA74288931B35D971C77E13429A@losexmb1.corp.nai.org&gt;; from Stephen_Schwab@NAI.com on Wed, Oct 29, 2003 at 11:27:45AM -0800
Lines: 30

Thus spake Stephen_Schwab@NAI.com on Wed, Oct 29, 2003 at 11:27:45AM -0800:
&gt; Could we just add two blades (possibly cheap ones) to our 6509 and use
&gt; those, with VLAN support, for our split-out control nets. That way we
&gt; would also get the multicast support we need at boot time?

I think this is probably not a good idea. Routing needs to be done
between these segments. So, this would mean enabling some layer 3 and
above features on the experimental network switches. This has the
potential to interfere with experimental net traffic in unexpected ways
- as an example, we found out that our switches were checking TCP
checksums and discarding packets with bad ones. This, despite the fact
that we had no layer 4 services enabled at all on the switch. Turning on
layer 3 services on the experimental net is probably just asking for
trouble. I would think that it's also a security risk - bugs in the IOS
that runs on the MSFC card when doing routing in a Cat6k are now exposed
to the experimental net, so it could be possible to exploit one and find
a way out.
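
For reference, the checksum TCP carries is the 16-bit one's-complement Internet checksum of RFC 1071. The sketch below computes it over a plain byte string. A real TCP checksum also covers a pseudo-header, and exactly what the switch verified is not known, so this only illustrates the arithmetic.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum (RFC 1071), as used by IP, UDP and TCP."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length input with a zero byte
    total = sum(int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2))
    while total > 0xFFFF:                    # fold carries back into the low 16 bits
        total = (total % 0x10000) + (total // 0x10000)
    return 0xFFFF - total                    # one's complement of the folded sum

payload = bytes.fromhex("0001f203f4f5f6f7")  # example words from RFC 1071
csum = internet_checksum(payload)
print(hex(csum))                             # 0x220d
# A receiver checksums the data together with the transmitted checksum;
# a result of 0 means no error was detected.
print(internet_checksum(payload + csum.to_bytes(2, "big")))   # 0
```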

As for the idea of buying multiple unmanaged switches, the problem with
unmanaged switches is that you're going to want to be able to cut off
access to the outside world for nodes on which you're going to be trying
out worms, etc. An unmanaged switch isn't going to give you the ability
to do this.

--
/-----------------------------------------------------------
| Robert P Ricci &lt;ricci@cs.utah.edu&gt; | &lt;ricci@flux.utah.edu&gt;
| Research Associate, University of Utah Flux Group
| www.flux.utah.edu | www.emulab.net
\-----------------------------------------------------------

From mailnull@bas.flux.utah.edu Wed Oct 29 13:50:27 2003
From: Jay Lepreau &lt;lepreau@cs.utah.edu&gt;
To: Stephen_Schwab@NAI.com, John Mehringer &lt;mehringe@isi.edu&gt;, braden@isi.edu,
testbed-ops@emulab.net, deter-isi@isi.edu
Subject: Re: [Deter-isi] Re: Hardware configuration for Emulab clone
In-Reply-To: &lt;20031029133157.R51103@cs.utah.edu&gt;; from Robert P Ricci on Wed, 29 Oct 2003 13:31:57 MST
Date: Wed, 29 Oct 2003 13:49:24 MST
Lines: 18

Here's another datapoint from Kentucky's experience:
After we and they struggled for days or weeks to use some cheap 29xx
(?) router for this purpose, always running into unexplained glitches,
we suggested they toss it and use a PC running FreeBSD as a router, at
least to get going. Worked great.

I don't think it's going to be fast or secure enough for you long
term, but it will get you off the ground. However, I would want to
hear Rob's comments. I'm sure there are small Cisco or other vendors'
routers that would work... but which ones?

Aside: I noticed in your equip list you had "MSFC memory". Not sure
that is correct, as an MSFC is the daughter card that is required to
turn a 65xx switch into a router.

We keep MSFC's out of our switches, partly to save money, but partly
to make triple sure that some higher layer stuff doesn't get turned
on by accident. These Ciscos are complex.

From ricci@cs.utah.edu Wed Oct 29 14:14:21 2003
Date: Wed, 29 Oct 2003 14:14:21 -0700
From: Robert P Ricci &lt;ricci@cs.utah.edu&gt;
To: Jay Lepreau &lt;lepreau@cs.utah.edu&gt;
Cc: Stephen_Schwab@NAI.com, John Mehringer &lt;mehringe@isi.edu&gt;,
braden@isi.edu, testbed-ops@emulab.net, deter-isi@isi.edu
Subject: Re: [Deter-isi] Re: Hardware configuration for Emulab clone
In-Reply-To: &lt;200310292049.NAA05680@fast.cs.utah.edu&gt;; from lepreau@cs.utah.edu on Wed, Oct 29, 2003 at 01:49:24PM -0700
Lines: 31

Thus spake Jay Lepreau on Wed, Oct 29, 2003 at 01:49:24PM -0700:
&gt; I don't think it's going to be fast or secure enough for you long
&gt; term, but it will get you off the ground. However, I would want to
&gt; hear Rob's comments. I'm sure there are small Cisco or other vendors'
&gt; routers that would work... but which ones?

I think you want a router with at least 4 ports - one to connect to the
outside world, one to connect to the private VLAN, one to connect to the
public VLAN, and one to the nodes' control network interfaces. You
_could_ combine the private and public VLANs, but, as outlined in the
document I sent, this makes boss (which needs to be fairly secure, since
it's the source of all configuration information and commands) more
open to attack from ops, a machine on which we traditionally give all
users shells. Since you're building a security testbed, I would think
you'd want to keep the infrastructure as safe from attack as possible,
and not make this shortcut.

To actually get any security out of this arrangement, you'll need a
router that can do firewalling. I believe all Cisco IOS routers can do
this, but my experience with the router side of Cisco is very limited,
so you'd have to check with a sales rep about this.

Yeah, if you have to be budget-conscious, a PC could do this job. As you
suggest, I would only view this as a temporary thing, though.

--
/-----------------------------------------------------------
| Robert P Ricci &lt;ricci@cs.utah.edu&gt; | &lt;ricci@flux.utah.edu&gt;
| Research Associate, University of Utah Flux Group
| www.flux.utah.edu | www.emulab.net
\-----------------------------------------------------------

From lepreau@fast.cs.utah.edu Mon Oct 27 23:52:02 2003
From: Jay Lepreau &lt;lepreau@cs.utah.edu&gt;
To: Bob Braden &lt;braden@ISI.EDU&gt;, deter-isi@ISI.EDU
cc: ricci@flux.utah.edu, testbed-ops@emulab.net
Subject: Re: Hardware configuration for Emulab clone
In-Reply-To: &lt;20031027172549.X95279@cs.utah.edu&gt;; from Robert P Ricci on Mon, 27 Oct 2003 17:25:49 MST
Date: Mon, 27 Oct 2003 23:51:48 MST
Lines: 32

Bob:
&gt; 5) DETER will purchase remote power strips and console terminal muxes.
&gt; The DETER project would appreciate suggestions from the ISD staff for
&gt; which equipment models to buy.


We use Cyclades serial expander boxes in one of our servers - by putting
them in a PC, we get very good control over who is allowed to access
which ones, when. We use Cyclom Ze boxes:
http://www.cyclades.com/products/8/z_series
... which let you get 128 serial lines into one PC.

Our software supports multiple terminal servers. We actually run
serial lines in two servers now, since we have &gt;128 hosts.

We use two types of power controllers - 8-port APC Ethernet-connected
controllers, and 20-port serial controllers from BayTech. Since you'll
have serial lines, we recommend the BayTechs, because they are cheaper
per-port. The ones we have are RPC-27s:
http://www.baytech.net/cgi-private/prodlist?show=RPC27

I dis-recommend anything except the above two, although probably
others from the same vendors would be ok. That is because this type
of device can be idiosyncratic and cost you and us time. In
particular, the RPCs have little operating systems inside them with
idiosyncrasies and we had to evolve our software to cope.
Eg, we had to batch power requests because they have N second dead
times after processing a command. Don't want to go through the
same trial and error with another vendor/device.
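
The batching idea can be sketched as follows. This is not Emulab's actual power-control code: the request format, the `send` callback, and the dead-time value are all hypothetical, since the mail only says the controllers have "N second" dead times.

```python
import time
from collections import defaultdict

def batch_requests(requests):
    """Group (controller, outlet, action) requests into one command per controller/action."""
    batches = defaultdict(list)
    for controller, outlet, action in requests:
        batches[(controller, action)].append(outlet)
    return {key: sorted(outlets) for key, outlets in batches.items()}

def run_batches(batches, send, dead_time_s):
    """Send each batch, waiting out the dead time before talking to the same controller again."""
    last_sent = {}
    for (controller, action), outlets in batches.items():
        wait = last_sent.get(controller, float("-inf")) + dead_time_s - time.monotonic()
        if wait > 0:
            time.sleep(wait)
        send(controller, action, outlets)
        last_sent[controller] = time.monotonic()

reqs = [("rpc1", 3, "cycle"), ("rpc1", 7, "cycle"), ("rpc2", 1, "off")]
batches = batch_requests(reqs)
print(batches)   # {('rpc1', 'cycle'): [3, 7], ('rpc2', 'off'): [1]}
```

Grouping outlets into one command means each controller's dead time is paid once per batch rather than once per node.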

All our hardware is listed on our site, with URLs to the vendor's pages.
http://www.emulab.net/docwrapper.php3?docname=hardware.html

From mailnull@bas.flux.utah.edu Tue Oct 28 00:26:14 2003
From: Jay Lepreau &lt;lepreau@cs.utah.edu&gt;
To: Bob Braden &lt;braden@ISI.EDU&gt;
Cc: testbed-ops@emulab.net, deter-isi@ISI.EDU
Subject: Re: Hardware configuration for Emulab clone
In-Reply-To: &lt;200310272219.OAA28834@gra.isi.edu&gt;; from Bob Braden on Mon, 27 Oct 2003 14:19:38 PST
Date: Tue, 28 Oct 2003 00:25:24 MST
Lines: 31


&gt; A candidate for the Boot Server/Data Logger Equipment would be:
&gt; ...

The boot server is our so-called "boss" (as in "master") server, and
you should make sure all the devices on it will work with FreeBSD. A
port of the Emulab servers to Linux could be done... but it won't be
by us. It would greatly complicate maintenance and upgrades and QA.

We also have a so-called "users" server (for user login accounts and
terminal service) and a logically separate fileserver (but that has
always been the same as the "users" machine, so there would probably
be small glitches in splitting that off). "users" is also a FreeBSD
machine; porting it to Linux would probably be much easier than boss,
and users would find it friendlier.

Arguments can be made both ways about the security of having
logins on a persistent server. But Emulab currently needs it,
including for a few non-login related things.

&gt; An understanding of the needed modifications to Emulab software will
&gt; become more evident as the project progresses. For example, it is very
&gt; plausible that Emulab will need to be modified to allow the ability to
&gt; mirror traffic from a given link(s) in the emulated topology to a given
&gt; piece of monitoring equipment that can perform protocol analysis or
&gt; data logging at link rate.

In fact, that's a good example. When people need that, we provide
it manually. Would be nice to provide more generally, but there
hasn't been sufficient demand. OTOH, what is easy to use often
determines what gets used.

From mailnull@bas.flux.utah.edu Tue Oct 28 00:45:41 2003
From: Jay Lepreau &lt;lepreau@cs.utah.edu&gt;
To: Bob Braden &lt;braden@ISI.EDU&gt;, bob@jensar.us, Stephen_Schwab@NAI.com
Cc: testbed-ops@emulab.net, deter-isi@ISI.EDU
Subject: Re: Hardware configuration for Emulab clone
In-Reply-To: &lt;200310272219.OAA28834@gra.isi.edu&gt;; from Bob Braden on Mon, 27 Oct 2003 14:19:38 PST
Date: Tue, 28 Oct 2003 00:44:51 MST
Lines: 46



&gt; [which blades to get]
&gt; ...
&gt; This would provide complete
&gt; symmetry among all the 128 nodes.
&gt; ...
&gt; We are generally trying to obtain as much homogeneity as
&gt; possible, but in the near term we won't need the maximum
&gt; capacity so we can compromise to save money.

As I said in our phone call, strong homogeneity of nodes wrt their
links (link symmetry) is not generally needed, as Emulab abstracts over
that, and experimenters don't specify large completely uniform topologies.
They do care that nodes themselves (eg CPUs) be homogeneous.
The only downside of modest link asymmetry is that the mapper will
take a little longer to run, and it will be harder to "approximate
the mapping in your head," which is sometimes handy.

For Dummynet, you probably do want an even number of links of the same
speed on each node.

Steve S:
&gt; In any
&gt; event, any time our topology carries enough traffic to saturate
&gt; the VLANs on the switch, the illusion of multiple simulated
&gt; networks is going to break. Over-provisioning the switch is
&gt; one way to avoid having to worry about how this affects the
&gt; correctness of our experiments. But if we have to worry about
&gt; this, then so be it.]

We've talked about changing the switch model fed to our resource
mapper to be hierarchical, ie adding a "blade" with higher intra-blade
BW than inter-blade. I would think this wouldn't be hard, but I think
Rob said it could be. If that was done, then we could accurately
conservatively allocate resources.
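
A minimal sketch of what such a hierarchical check might look like. It is not the resource mapper's actual model: it assumes a link between two ports on the same blade never touches the fabric, and all names and numbers are hypothetical.

```python
from collections import defaultdict

def blade_fabric_load(links, port_blade):
    """links: iterable of (port_a, port_b, gbps).
    Returns the Gbps each blade would push across the fabric."""
    load = defaultdict(float)
    for a, b, gbps in links:
        if port_blade[a] != port_blade[b]:   # only inter-blade links use the fabric
            load[port_blade[a]] += gbps
            load[port_blade[b]] += gbps
    return dict(load)

def fits(links, port_blade, fabric_gbps_per_blade):
    """Conservative check: no blade may exceed its fabric connection."""
    loads = blade_fabric_load(links, port_blade).values()
    return all(fabric_gbps_per_blade >= gbps for gbps in loads)

port_blade = {"a1": 1, "a2": 1, "a3": 1, "a4": 1, "b1": 2, "b2": 2}
links = [("a1", "a2", 1.0), ("a3", "b1", 1.0), ("a4", "b2", 1.0)]
print(blade_fabric_load(links, port_blade))   # {1: 2.0, 2: 2.0}
print(fits(links, port_blade, 8.0))           # True
```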

However, Cisco BW probably depends on packet size.
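
Packet size matters because forwarding engines are limited in packets per second as well as bits per second. As a sketch, the maximum frame rate of an Ethernet link follows from the frame size plus the 8-byte preamble and 12-byte inter-frame gap that every frame costs on the wire.

```python
def max_pps(link_bps: float, frame_bytes: int) -> float:
    """Maximum Ethernet frames per second in one direction.
    Each frame also occupies 8 bytes of preamble and a 12-byte inter-frame gap."""
    return link_bps / ((frame_bytes + 20) * 8)

for size in (64, 512, 1518):
    print(size, round(max_pps(1e9, size)))
# 64 1488095
# 512 234962
# 1518 81274
```

A gigabit port carrying minimum-size frames therefore asks for roughly 18 times as many forwarding decisions as the same port carrying full-size frames.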

Bob Lindell:
&gt; Either way, 48 GE ports is 48Gb/s FD. That will slightly over
&gt; subscribe the blade to backplane interface.


What blade to backplane BW have you been told?
How sure are you?

From mailnull@bas.flux.utah.edu Tue Oct 28 22:56:31 2003
From: Stephen_Schwab@NAI.com
Subject: shedding some light on the new Cisco 720Gb/s switch fabric
Date: Tue, 28 Oct 2003 21:55:37 -0800
To: &lt;deter-isi@isi.edu&gt;, &lt;testbed-ops@emulab.net&gt;
Lines: 27

Hi,

I think I see the confusion -- it appears that Cisco dropped a new
switch fabric into the 6500s by putting the switch fabric on the
supervisor module.

If you search through this web page:

http://www.cisco.com/en/US/products/hw/switches/ps708/products_data_sheet09186a00800ff916.html

you can find a reference buried where it describes the switch fabrics.

I can't quite see how they wire this beast -- perhaps they physically
re-cable the slot connectors from the internal 256 Gb/s switch fabric
to the 720 Gb/s switch fabric on the supervisor module.

There is a reference somewhere else to the auto-sensing/auto-switching
capabilities of the 720 Gb/s switch fabric -- so if you happen to plug
in older 16 or 8 Gb/s blades, the switch fabric will still talk to
them.

The WS-X6748-GE-TX blades are definitely designed to talk to the 720
Gb/s switch fabric. The way we will use them, it is unlikely that
more than 24 Gb/s will ever be sourced or sinked on a blade, so 40
Gb/s will be enough headroom.

But there is a gotcha: we didn't plan to order any WS-F6700-DFC3A
daughter cards!

The way Cisco gets the packet processing rate up is to decentralize
the forwarding onto daughter cards -- each of these dCEF (distributed
Cisco Express Forwarding) cards is co-located on a blade, and the
supervisor module downloads forwarding rules to all the dCEFs. It is
really unclear to me what happens if you try to forward all those
packets from 6 blades through a single supervisor module 720s CEF
engine (the MSFC3 PFC3A daughter card). The performance is listed as
400 Mpps with dCEF, but the table doesn't have any numbers for
centralized CEF.

However, I think we should just go ahead and get 6 of the
WS-X6748-GE-TX blades and try out CEF. That will give us 288
10/100/1000 ports, allowing us to support up to 72 machines. If we
find we are over-subscribing something, we can decide whether to
redistribute across more 6509s, or add the dCEF modules.
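
The port budget and the per-blade headroom mentioned above work out as follows. The sketch assumes four experimental interfaces per machine, as in the original hardware plan.

```python
blades, ports_per_blade, ports_per_machine = 6, 48, 4
total_ports = blades * ports_per_blade           # ports across all six blades
machines = total_ports // ports_per_machine      # machines with four ports each

# Headroom per blade if at most 24 Gb/s is ever sourced or sunk on it
fabric_gbps, expected_gbps = 40, 24
headroom = fabric_gbps / expected_gbps
print(total_ports, machines, round(headroom, 2))   # 288 72 1.67
```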

The alternative is to just use 100BaseT in the first 64 PCs, and plan
to upgrade to gigabit later, on the assumption that the price will
drop. If we did that, we could just buy the 256 Gb/s switch fabric
also -- in fact, we would just be buying the 6509 configurations that
Utah's emulab uses.

--Steve



From ricci@cs.utah.edu Wed Oct 29 10:59:55 2003
Date: Wed, 29 Oct 2003 10:59:55 -0700
From: Robert P Ricci &lt;ricci@cs.utah.edu&gt;
To: Stephen_Schwab@NAI.com
Cc: deter-isi@isi.edu, testbed-ops@emulab.net
Subject: Re: shedding some light on the new Cisco 720Gb/s switch fabric
In-Reply-To: &lt;613FA566484CA74288931B35D971C77E13428C@losexmb1.corp.nai.org&gt;; from Stephen_Schwab@NAI.com on Tue, Oct 28, 2003 at 09:55:37PM -0800
Lines: 71

Thus spake Stephen_Schwab@NAI.com on Tue, Oct 28, 2003 at 09:55:37PM -0800:
&gt; I think I see the confusion -- it appears that Cisco dropped a new
&gt; switch fabric into the 6500s by putting the switch fabric on the
&gt; supervisor module.

This is how the 'older' (CEF256) fabric modules work too - we have a
6513 with fabric-enabled cards, and it also has a fabric module. It
seems that one of the main things they've done with the Sup720 is put
the fabric module into the supervisor. This is very nice, since in our
6513, these are separate modules, taking up two slots.

I finally found that whitepaper I've been talking about - it's at:
http://www.cisco.com/en/US/products/hw/switches/ps708/products_white_paper09186a0080092389.shtml

If you look at Figure 3 (more readable in the PDF version linked from
the top of the page), it clearly labels the crossbar connectors as being
8Gbps. But, the more I'm seeing, the more I'm convinced that this
whitepaper is just out of date, and the new modules really can drive
those connectors at a higher rate.

&gt; There is a reference somewhere else to the auto-sensing/auto-switching
&gt; capabilities of the 720 Gb/s switch fabric -- so if you happen to plug
&gt; in older 16 or 8 Gb/s blades, the switch fabric will still talk to
&gt; them.

Good to hear it! The older fabric modules, we found out the hard way,
don't interoperate with other types of modules.

&gt; The WS-X6748-GE-TX blades are definitely designed to talk to the 720
&gt; Gb/s switch fabric. The way we will use them, it is unlikely that
&gt; more than 24 Gb/s will ever be sourced or sinked on a blade, so 40
&gt; Gb/s will be enough headroom.

Yeah, I agree, that sounds like plenty.

&gt; However, I think we should just go ahead and get 6 of the
&gt; WS-X6748-GE-TX blades and try out CEF. That will give us 288
&gt; 10/100/1000 ports, allowing us to support up to 72 machines. If we
&gt; find we are over-subscribing something, we can decide whether to
&gt; redistribute across more 6509s, or add the dCEF modules.

Adding the dCEF modules would probably be preferable - since
interconnecting the 6509s will cost quite a bit (presumably, you're
going to want to connect the switches with links at least an order of
magnitude faster than what's on the PCs). Might as well have fewer
switches, that you can fill up all the way.

&gt; The alternative is to just use 100BaseT in the first 64 PCs, and plan
&gt; to upgrade to gigabit later, on the assumption that the price will
&gt; drop. If we did that, we could just buy the 256 Gb/s switch fabric
&gt; also -- in fact, we would just be buying the 6509 configurations that
&gt; Utah's emulab uses.

Our experience so far has been that Cisco prices don't drop - looks to
us like their model is to just leave prices alone, and price new modules
when they come out. Which leads to some weird pricing anomalies - IIRC,
last time I looked, single-port 10Gbit modules were more expensive than
the newer 4-port 10Gbit modules.

Also, for most of our switches, we actually don't use the fabric at all
- we're just using the 32Gbps backplane bus, which is fine for a switch
full of 10/100 ports. But obviously fabric is the way to go from here on
out, with gigabit and such coming.


--
/-----------------------------------------------------------
| Robert P Ricci &lt;ricci@cs.utah.edu&gt; | &lt;ricci@flux.utah.edu&gt;
| Research Associate, University of Utah Flux Group
| www.flux.utah.edu | www.emulab.net
\-----------------------------------------------------------

From mailnull@bas.flux.utah.edu Wed Oct 29 14:01:24 2003
From: Jay Lepreau &lt;lepreau@cs.utah.edu&gt;
To: Stephen_Schwab@NAI.com
Cc: deter-isi@isi.edu, testbed-ops@emulab.net
Subject: Re: shedding some light on the new Cisco 720Gb/s switch fabric
In-Reply-To: &lt;613FA566484CA74288931B35D971C77E13428C@losexmb1.corp.nai.org&gt;; from Stephen_Schwab@NAI.com on Tue, 28 Oct 2003 21:55:37 PST
Date: Wed, 29 Oct 2003 14:00:36 MST
Lines: 22


&gt; However, I think we should just go ahead and get 6 of the
&gt; WS-X6748-GE-TX blades and try out CEF. That will give us 288
&gt; 10/100/1000 ports, allowing us to support up to 72 machines. If we
&gt; find we are over-subscribing something, we can decide whether to
&gt; redistribute across more 6509s, or add the dCEF modules.

&gt; The alternative is to just use 100BaseT in the first 64 PCs, and plan
&gt; to upgrade to gigabit later, on the assumption that the price will
&gt; drop. If we did that, we could just buy the 256 Gb/s switch fabric
&gt; also -- in fact, we would just be buying the 6509 configurations that
&gt; Utah's emulab uses.

A reasonable and probably better alternative is to use 2 Gbit and 2
100Mbit on each machine. Your PC won't drive 4 Gbit interfaces anyway.

Or perhaps:
16 nodes with 4G lines
32 nodes with 2G + 2 100Mbit

That is the sort of expansion we were going to do.
Emulab will handle the resource assignment just fine.
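
For sizing the switch, the port counts implied by that mix can be tallied with a short sketch (variable names are hypothetical):

```python
# (node count, gigabit ports per node, 100 Mbit ports per node)
mix = [(16, 4, 0), (32, 2, 2)]
nodes = sum(n for n, _, _ in mix)
gig_ports = sum(n * g for n, g, _ in mix)
fast_ports = sum(n * f for n, _, f in mix)
print(nodes, gig_ports, fast_ports)   # 48 128 64
```

That is 128 gigabit ports instead of the 192 needed if all 48 nodes had four gigabit interfaces.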

From mailnull@bas.flux.utah.edu Wed Oct 29 16:28:09 2003
From: Jay Lepreau &lt;lepreau@cs.utah.edu&gt;
To: Stephen_Schwab@NAI.com, deter-isi@isi.edu, testbed-ops@emulab.net
Subject: Re: shedding some light on the new Cisco 720Gb/s switch fabric
In-Reply-To: &lt;200310292100.OAA05793@fast.cs.utah.edu&gt;; from Jay Lepreau on Wed, 29 Oct 2003 14:00:36 MST
Date: Wed, 29 Oct 2003 16:27:17 MST
Lines: 14

I had said:
&gt; Or perhaps:
&gt; 16 nodes with 4G lines
&gt; 32 nodes with 2G + 2 100Mbit

I was thinking you had 48 total. Do some obvious adaptation
for the 64 you do have.

One thing you will discover is that people do lots of LAN experiments,
which will only use one interface on a node. I suspect that will
extend to some degree to your testbed, too. For one thing, that is a
reasonable way to model the Internet, with different latency/bw
characteristics on each node's interface to the LAN (modeling a
last mile bottleneck).

From ricci@cs.utah.edu Wed Oct 29 11:01:42 2003
Date: Wed, 29 Oct 2003 11:01:42 -0700
From: Robert P Ricci &lt;ricci@cs.utah.edu&gt;
To: Stephen_Schwab@NAI.com
Cc: deter-isi@isi.edu, testbed-ops@emulab.net
Subject: Re: NAMs for the 6509s -- required or optional?
In-Reply-To: &lt;613FA566484CA74288931B35D971C77E13428D@losexmb1.corp.nai.org&gt;; from Stephen_Schwab@NAI.com on Tue, Oct 28, 2003 at 09:57:27PM -0800
Lines: 15

Thus spake Stephen_Schwab@NAI.com on Tue, Oct 28, 2003 at 09:57:27PM -0800:
&gt; We don't have any NAMs (Network Analysis Modules) in our
&gt; configuration. Do we need the NAMs for anything?

They are certainly not required. We suspect that they could be
tremendously useful for experimenters. But, we have yet to use ours at
all (lack of time, not lack of interest). So, clearly, you can get by
okay without them.

--
/-----------------------------------------------------------
| Robert P Ricci &lt;ricci@cs.utah.edu&gt; | &lt;ricci@flux.utah.edu&gt;
| Research Associate, University of Utah Flux Group
| www.flux.utah.edu | www.emulab.net
\-----------------------------------------------------------

From braden@ISI.EDU Wed Nov 19 14:10:00 2003
From: Bob Braden &lt;braden@ISI.EDU&gt;
Date: Wed, 19 Nov 2003 13:09:21 -0800 (PST)
To: lepreau@cs.utah.edu, ricci@cs.utah.edu, deter-isi@ISI.EDU
Subject: Performance figures
Cc: braden@ISI.EDU
Lines: 28



I have been trying to digest the performance figures that we have
been bandying about. Comments/corrections appreciated.

                 PC            1000 bT       Cisco Line Card
                                             (48 ports @ 1000 bT)
_________________________________________________________________

Max bit rate     ~&lt; 0.5 Gbps   ~&lt; 1 Gbps     40 Gbps FD

Max pkts per     ~&lt; 1 Mpps     ~&lt; 2 Mpps     30 Mpps * (FD??)
sec (pps)



*Note: Rises to 48 Mpps with dCEF (distributed Cisco Express Forwarding)
cards; cost $27K for 6 line cards.

According to these figures, we might be over-subscribed if we spread
each PC across 4 line cards. OTOH, if we plug each PC into a single
line card, the head room is about a factor of 2 - 3.

Does this make sense???

Bob
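
The headroom estimate in the mail above can be reproduced with a little
arithmetic. This is only a sketch of one possible reading of the table,
not the author's stated calculation: it assumes the "PC" column is the
total a single PC can generate across all of its interfaces, that each PC
has 4 experimental interfaces, and that packets per second is the binding
limit. All names below are hypothetical.

```python
# Upper bounds quoted in the table above, in Mpps.
PC_MAX_MPPS = 1.0          # assumed: one PC, all interfaces combined
NIC_MAX_MPPS = 2.0         # one 1000bT interface
CARD_MAX_MPPS = 30.0       # one 48-port line card
CARD_MAX_MPPS_DCEF = 48.0  # the same card with dCEF

PORTS_PER_CARD = 48
NICS_PER_PC = 4            # assumed: 4 experimental interfaces per PC


def headroom(capacity_mpps, offered_mpps):
    """Capacity divided by offered load; below 1.0 means oversubscribed."""
    return capacity_mpps / offered_mpps


# Case A: every port on the card is driven at the interface limit.
load_ports = PORTS_PER_CARD * NIC_MAX_MPPS        # 96 Mpps

# Case B: all 4 interfaces of a PC land on one card, so the card serves
# 12 PCs and each PC is limited by what the PC itself can generate.
pcs_per_card = PORTS_PER_CARD // NICS_PER_PC      # 12 PCs
load_pcs = pcs_per_card * PC_MAX_MPPS             # 12 Mpps

print(headroom(CARD_MAX_MPPS, load_ports))        # 0.3125
print(headroom(CARD_MAX_MPPS, load_pcs))          # 2.5
print(headroom(CARD_MAX_MPPS_DCEF, load_pcs))     # 4.0
```

Under these assumptions, case B gives the "factor of 2 - 3" mentioned in
the mail, and case A shows how a card could be oversubscribed if the ports
rather than the PCs were the limit.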


From mailnull@bas.flux.utah.edu Wed Nov 19 15:13:30 2003
From: Jay Lepreau &lt;lepreau@cs.utah.edu&gt;
To: Bob Braden &lt;braden@ISI.EDU&gt;
Cc: ricci@cs.utah.edu, deter-isi@ISI.EDU, testbed-ops@flux.utah.edu
Subject: Re: Performance figures
In-Reply-To: &lt;200311192109.NAA04791@gra.isi.edu&gt;; from Bob Braden on Wed, 19 Nov 2003 13:09:21 PST
Date: Wed, 19 Nov 2003 15:13:30 -0700
Lines: 48

Added testbed-ops so you get more, more informed people.
Some quick remarks, not complete. I have not digested your numbers.

---------------
From: Bob Braden &lt;braden@ISI.EDU&gt;
To: lepreau@cs.utah.edu, ricci@cs.utah.edu, deter-isi@ISI.EDU
Subject: Performance figures
Cc: braden@ISI.EDU


I have been trying to digest the performance figures that we have
been bandying about. Comments/corrections appreciated.

                      PC             1000 bT       Cisco Line Card
                                                   (48 ports@1000 bT)
_____________________________________________________________________

Max bit rate          ~&lt; 0.5 Gbps    ~&lt; 1 Gbps     40 Gbps FD


Max pkts per          ~&lt; 1 Mpps      ~&lt; 2 Mpps     30 Mpps * (FD??)
sec (pps)

PCs can fwd a lot more than 1Mpps in polling mode if pkts are short.

*Note: Rises to 48 Mpps with dCEF (distributed Cisco Express Forwarding)
cards; cost $27K for 6 line cards.

According to these figures, we might be over-subscribed if we spread
Which device is oversubscribed?
each PC across 4 line cards. OTOH, if we plug each PC into a single
line card, the head room is about a factor of 2 - 3.
You can't set up a very interesting topology with just one link per machine!
Without our virtual network stuff, that is.
But there are reasons people often want a "real" dedicated network link.


Does this make sense???

Bob
------------

Get PCI-X busses on your PCs. You need that if you run Gbit.

Lots of expts won't be running Gbit!! Most of the Internet
is *lots* slower than that.
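
The PCI-X advice above follows from the theoretical peak of a parallel
PCI bus, which is simply bus width times clock rate. The sketch below
compares a few common bus variants against the 2 Gbps that one gigabit
port needs to run full duplex (1 Gbps in each direction over the shared
bus). The function and variable names are hypothetical, the clock rates
are nominal, and real throughput will be lower than these peaks because
of bus protocol overhead.

```python
def bus_peak_gbps(width_bits, clock_mhz):
    """Theoretical peak of a parallel bus: width (bits) times clock (MHz)."""
    return width_bits * clock_mhz / 1000.0


# Nominal (width in bits, clock in MHz) for common bus variants.
BUSES = {
    "PCI 32-bit/33MHz": (32, 33.33),
    "PCI 64-bit/33MHz": (64, 33.33),
    "PCI 64-bit/66MHz": (64, 66.66),
    "PCI-X 64-bit/133MHz": (64, 133.33),
}

# One gigabit port at full duplex moves 1 Gbps in each direction,
# and both directions share the same bus.
FULL_DUPLEX_GIGE_GBPS = 2.0

for name, (width, clock) in BUSES.items():
    peak = bus_peak_gbps(width, clock)
    ports = int(peak / FULL_DUPLEX_GIGE_GBPS)
    print(f"{name}: {peak:.2f} Gbps peak, {ports} full-duplex gigabit port(s)")
```

On these nominal figures a 64-bit/33MHz bus peaks at a little over 2 Gbps,
which matches the estimate given earlier in this thread, while
64-bit/133MHz PCI-X leaves room for several gigabit ports.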


From ricci@cs.utah.edu Wed Nov 19 15:43:57 2003
Date: Wed, 19 Nov 2003 15:43:57 -0700
From: Robert P Ricci &lt;ricci@cs.utah.edu&gt;
To: Bob Braden &lt;braden@ISI.EDU&gt;
Cc: deter-isi@ISI.EDU, testbed-ops@flux.utah.edu,
Jay Lepreau &lt;lepreau@cs.utah.edu&gt;
Subject: Re: Performance figures
In-Reply-To: &lt;200311192213.hAJMDULQ067267@bas.flux.utah.edu&gt;; from lepreau@cs.utah.edu on Wed, Nov 19, 2003 at 03:13:30PM -0700
Lines: 26

Note: I have not verified your numbers.

Thus spake Jay Lepreau on Wed, Nov 19, 2003 at 03:13:30PM -0700:
&gt; According to these figures, we might be over-subscribed if we spread
&gt; each PC across 4 line cards. OTOH, if we plug each PC into a single
&gt; line card, the head room is about a factor of 2 - 3.
&gt;
&gt; Does this make sense???

So, it sounds like you avoid oversubscription, according to your numbers,
in the latter case because, though it has 4 1000Mbps interfaces, a PC
cannot saturate all of them? To me, this would suggest that you could
save a lot by not putting more 1000Mbps interfaces on a PC than it can
handle, and giving some of them a mix of 1000Mbps and 100Mbps
interfaces.

But yes, if your PCs will be your limiting factor in traffic generation
(which is certainly believable), this seems to me like a reasonable way
to avoid overloading the switch.

--
/-----------------------------------------------------------
| Robert P Ricci &lt;ricci@cs.utah.edu&gt; | &lt;ricci@flux.utah.edu&gt;
| Research Associate, University of Utah Flux Group
| www.flux.utah.edu | www.emulab.net
\-----------------------------------------------------------

From mailnull@bas.flux.utah.edu Wed Nov 19 15:53:22 2003
From: Jay Lepreau &lt;lepreau@cs.utah.edu&gt;
To: Robert P Ricci &lt;ricci@cs.utah.edu&gt;
Cc: Bob Braden &lt;braden@ISI.EDU&gt;, deter-isi@ISI.EDU, testbed-ops@flux.utah.edu
Subject: Re: Performance figures
In-Reply-To: &lt;20031119154357.I534@cs.utah.edu&gt;; from Robert P Ricci on Wed, 19 Nov 2003 15:43:57 MST
Date: Wed, 19 Nov 2003 15:53:22 -0700
Lines: 10

&gt; To me, this would suggest that you could save a lot by not putting more
&gt; 1000Mbps interfaces on a PC than it can handle, and giving some of
&gt; them a mix of 1000Mbps and 100Mbps interfaces.

Yes. I think I've recommended before that ISI do 2 Gbit and 2 100Mbit
lines on each machine.

Note that Gbit and FE NICs are not much different in price; it's the switch
ports that really cost. So buy all Gbit NICs and run some at 100.
Gives you later flexibility.

----------------------------------------------------------------------
</pre>