Multicast server with multiple NICs connected causing slowdown
Quick background:
Having issues with multicasting images out from our HP Proliant DL380G4 server and began looking into the issue.
Ran several tests with an HP DL380G4 server running Windows 2003 R2 fully updated with GSS 2.5 patched to 2165.
Test systems PXE boot to a WinPE 2.1 environment. If multicast is run with only one of the onboard NICs connected, multicasting speeds average around 506-560 MB/min (from the Ghost console display). If multicast is run with both onboard NICs connected, average multicasting speeds drop to around 320 MB/min.
The question at this point is what is the Ghost multicast session doing that would cause this type of issue? From what I understand, the multicast session should be initiated only on the Network Connection dedicated to this type of traffic. The second NIC is on a different IP range and is even plugged into a separate switch, and the issue remains.
Running the server with only one NIC connected in the Production environment is not a valid option at this point.
Comments
In principle, it's not doing anything different.
The GhostCast server shouldn't be affected by (and isn't, in our environments) how many NICs are present and installed on a machine. While it is gathering up all the clients for a multicast session, it discovers which interface each client's packets arrive on, and during the transmission phase of the session it then requests that Windows send the multicast frames only out of the specific NICs that it detected as being actively used by a client.
Of course, that's all done at the Windows Sockets API level - it's the Windows TCP/IP stack that handles most everything else, and of course vendor-specific drivers underneath that. Since GhostCast isn't normally affected in any way by multiple NICs being present, presumably there's something unusual about your environment but having never seen a case of this first-hand I wouldn't know what might cause it. Network slowdowns can even have causes beyond a single machine; for instance with Gigabit Ethernet when the frame-based flow control mechanisms are active, a suitably configured switch could elect to throttle the outbound bandwidth on a NIC.
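For anyone curious what "requesting that Windows send the multicast frames out a specific NIC" looks like at the sockets level, here is a minimal sketch in Python. It is not GhostCast's code; it only illustrates the standard IP_MULTICAST_IF socket option, and the function name, addresses and port in the comments are made up for illustration.

```python
import socket


def make_multicast_sender(interface_ip: str, ttl: int = 1) -> socket.socket:
    """Create a UDP socket whose outbound multicast leaves via interface_ip.

    interface_ip is the IPv4 address assigned to the NIC that should carry
    the traffic; "0.0.0.0" leaves the choice to the OS routing table.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # Limit how many router hops the multicast datagrams may cross.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    # Pin outbound multicast to the interface that owns interface_ip.
    sock.setsockopt(
        socket.IPPROTO_IP,
        socket.IP_MULTICAST_IF,
        socket.inet_aton(interface_ip),
    )
    return sock


# Hypothetical usage (addresses and port are assumptions, not from this thread):
#   sender = make_multicast_sender("10.0.0.5")        # deployment NIC address
#   sender.sendto(b"image chunk", ("224.77.1.0", 6666))
```

Once that option is set, the choice of NIC is no longer up to the application; everything after the send call is handled by the TCP/IP stack and the driver.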
Probably the only thing that you can really look at easily to rule out the obvious is to find out whether any multicast image traffic is being sent out of the second NIC port; a packet capture with a tool like Wireshark should be able to determine whether that's the case and if so, a detailed examination of the inbound traffic should help determine if there's any evidence as to why.
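If the capture is exported as simple (interface, destination IP) pairs, the check can be scripted. This is only a rough sketch: the record format and the interface labels are assumptions for illustration, not something any capture tool produces by default.

```python
from collections import Counter
import ipaddress


def multicast_counts(records):
    """Count packets with a multicast destination, grouped by interface.

    records is an iterable of (interface_label, destination_ip) pairs,
    for example rows taken from a capture exported to CSV.
    """
    counts = Counter()
    for iface, dst in records:
        # IPv4 multicast destinations fall in 224.0.0.0/4.
        if ipaddress.ip_address(dst).is_multicast:
            counts[iface] += 1
    return counts
```

Any non-zero count for the second NIC's label would mean multicast frames really are leaving on that port.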
Nigel, Forgot to mention
Nigel,
Forgot to mention that. Ran Network Monitor on both adapters. All traffic is going out across the Deployment connection and nothing over the 2nd NIC.
We run Altiris on the same server and multicast speeds from this program are equal to the speeds from Ghost using one network connection. What are the main differences between Altiris and Ghost multicasting?
Just so you know, all tests run have been with the same switch, same network cables and same laptops (Dell Latitude D630). Only thing that has changed would be the 2nd NIC connection.
Probably not different in the application layer
Right; well, that's what I would expect, in which case it's hard to see why the cause of the slowdown would be due to the application code. As far as GhostCast is concerned, it's not doing anything differently either - it should in principle be issuing exactly the same sequence of API calls to the Windows kernel whether the second NIC is connected or not. This would imply that the most likely cause of the performance regression lies inside the Windows kernel somewhere. Since the Windows TCP/IP code doesn't show this kind of performance regression from a second NIC normally, that tends to suggest something might be up in the NIC drivers, especially since if memory serves the ProLiant series run an unusual variant of the Broadcom family.
The NetXtreme II chips in those are odd beasts: not only do they have their own special teaming support, there is also a special TCP/IP Offload Engine in the chipset which is designed for teaming. I wonder if running the two halves of the chip both active but non-teamed is affecting the use of the TOE in this chipset or something of that nature. It's really hard to tell more without having an environment that shows the problem at hand, however.
> What are the main differences between Altiris and Ghost multicasting?
They are quite different designs; RDeploy doesn't have anything like the GhostCast server, and actually only retrieves images unicast over TCP/IP, either from a network share or HTTP server. RDeploy's "multicast" instead consists of the machines on a single subnet trying to discover whether they are trying to retrieve the same image and then sharing it with each other using a subnet-only dynamic group. There's no particular benefit to that over Ghost's approach performance-wise, but it does mean that without a dedicated server the Altiris management platform (which is completely dependent on mapped network shares) doesn't need to treat image deployment separately from any other kind of job it runs on a client, whereas the GSS server treats image deployment quite differently in order to set up the server side of things. So, if you captured traffic on your RDeploy "server", you'd just see normal unicast traffic, whereas with GSS the server actually sends the multicast traffic itself.