Ghost Solution Suite

  • 1.  Multicast Issues

    Posted Nov 30, 2011 09:11 AM

    Hi all, I work for a school board and we have Ghost Solution Suite 2.5 running independently at all our school sites.  Multicasting has always been hit and miss, so I have been tasked with tracking down the issue.  After reading many of these forums it was evident that we were missing IGMP snooping on our network switches.  I configured that at site 1 and the multicast would still randomly freeze.  More investigation led me to believe that we had no IGMP querier on the network.  Our routers do not support IGMP querier - they are Cisco models with firmware from before multicast routing was supported.  I installed a new D-Link 1210-52 switch at site 1, enabled its IGMP querier state, and enabled IGMP snooping on it and the additional D-Link 1252 switch.  I was able to push a 20GB compressed image to 30 clients in 40 minutes - more than acceptable.

    Moving on to site 2, I used the same config: one D-Link 1210-52 with IGMP querier state enabled and one 1252 with IGMP snooping enabled.  When I try to push an image to 27 clients it either hangs with 3MB pushed (a couple of machines just sit waiting for the GhostCast session to start), or if the session actually starts it randomly hangs.  Unicast works fine, but not fast enough.  I am fresh out of ideas and looking for some suggestions.  At both sites the Ghost server is an XP VM - same hardware, same config.

    Couple questions:

    If I have IGMP snooping on my main switches but a user has added an 8-port mini switch from Future Shop somewhere on my network, will that kill the multicast, even if there are no multicast clients connected to it?

    Will wireless access points affect the multicast transmission?

    Is there a way to test my network for multicast ability to make sure that is the issue?

    Any other ideas or suggestions would be greatly appreciated as I am ready to pull my hair out.  

    Thanks all!!!!!

    Paul O'Connor



  • 2.  RE: Multicast Issues

    Posted Nov 30, 2011 06:39 PM

    If I have IGMP snooping on my main switches but a user has added an 8-port mini switch from Future Shop somewhere on my network, will that kill the multicast, even if there are no multicast clients connected to it?

    It shouldn't do; a benefit of having IGMP snooping support in your backbone switches (although of course it's beneficial all the way to the endpoints) is that they filter the multicast traffic such that if a switch port isn't actively subscribed to a group it doesn't see any of the traffic at all.

    So, as long as the consumer-grade switch is on a central switch port that is being filtered upstream, it can't cause any trouble - there won't be any offered load of GhostCast traffic impinging on the switch, and no back-flow effects. And indeed, it should normally be fine even if it does get involved in a multicast session.
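
    If you want to see that subscription behaviour for yourself, a minimal Python receiver like the sketch below (the group and port are made-up test values, nothing Ghost-specific) is enough: joining the group is what makes the operating system emit the IGMP membership report that a snooping switch watches for, so you can run it on a machine hanging off the mini switch and check whether the group shows up in the main switch's multicast table.

        import socket
        import struct

        GROUP = "239.1.2.3"   # arbitrary test group, nothing Ghost-specific
        PORT = 5007           # arbitrary test port

        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", PORT))
        # Joining the group is what triggers the IGMP membership report that a
        # snooping switch uses to decide which ports should receive the traffic.
        mreq = struct.pack("4s4s", socket.inet_aton(GROUP), socket.inet_aton("0.0.0.0"))
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
        while True:
            data, sender = sock.recvfrom(65535)
            print(sender[0], data.decode(errors="replace"))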

    Consumer-grade switches (and unmanaged office switches) do vary a lot in terms of how well or how badly they handle multicast traffic hitting them. The biggest issue with them generally comes from adapting between speeds, because Ethernet today isn't the same as the Ethernet of yore.

    In 10Mbps Ethernet, the cable (the "ether" in the name) was a shared resource which the nodes had a protocol for sharing - CSMA/CD, carrier-sense multiple access with collision detection. With the introduction of twisted-pair cables, initially passive "hubs" were used to emulate the single shared cable, but switching meant that every cable became electrically independent and so collisions became a thing of the past. When 100Mbps Ethernet was introduced, collision detection was deprecated but still supported, but when 1Gbps Ethernet was introduced, all of the collision mechanisms of 10Mbps Ethernet were gone and an entirely different approach to controlling flow between multiple levels of switch was standardised.

    This causes complications when switches adapt speeds; from 1Gbps down to 100Mbps is fine, but going all the way down to 10Mbps gets strange because not only is there a 100:1 speed difference, the entire model of how the network works is different.

    This matters because what many machines do when going into low-power modes (so they can be woken up by the network) is to negotiate a very low Ethernet link speed. This is where things get hard.

    If you have a consumer-grade switch at the edge of your network, it will treat multicast as broadcast. That's fine, nothing bad happens. But if that switch also has a port which is running at low speed, you can have a 1Gbps incoming traffic flow hit a 10Mbps port. And because it's a consumer-grade device, almost anything can happen - it could choose to just discard the traffic hitting that port, in which case all will be fine, or it could choose to discard almost the *entire* incoming 1Gbps flow, effectively throttling the active 1Gbps ports connected to it down to a 10Mbps receive rate.

    And this then means that the central Ghost server tries to adapt to that; an essential part of Ghost's multicast being robust is adapting to the different speeds of the different machines.

    Now, one of the things we were actively discussing adding to Ghost in the 2008 timeframe was changing it so that GhostCast could have a lower bound on speed; if a node tried to go slower than a minimum speed due to an effect like the one above, it would get kicked out of the multicast session (and ideally, the session would split into a completely parallel one so the node on a faulty switch could still complete). However, at the time Symantec had mostly disinvested from Ghost, so we were considering this for 3.0 but hadn't had the resources to attempt an experimental implementation until we'd finished the main GSS 3.0 features, and then in 2009 GSS development was cancelled and the development team laid off anyway.

    In principle Symantec could still add support for this, but part of the management thinking in leaving the Ghost business to rot was that Deployment Solution would be replacing Ghost Solution Suite, and as Deployment Solution doesn't have the capability to use traditional GhostCast (it actually could have, using an unreleased tool I wrote, but the folks on the DS side didn't really care that much), it likely won't be addressed unless and until the DS customer base calls it out as a priority.

    Will wireless access points affect the multicast transmission?

    That's a slightly trickier question, because there's even more variability in equipment on the wireless side. The big thing is that access points still run awfully slowly, with transmission being higher-latency as well as outright slower in raw throughput than wired networks. But on the other hand even consumer-grade wireless equipment functions more like a managed router on the wireless side rather than a switch, so whether the really pathological behaviour of some consumer-grade switches emerges there is ... well, harder to say in principle. 

    Even an old WRT54G I have here at home running stock 2005 firmware has some multicast support, including the ability to block multicast outright from the wired to the wireless side; the only issue I'm aware of there is, as with the wired networks, speed adaptation. With a much wider variability in actual transmission speed on the wireless side (and a mix of a/b/g/n devices around), multicast on wireless is not all that friendly to Ghost-type protocols, so my intuition would be to configure the access points simply not to carry multicast traffic from the wired side.

    Is there a way to test my network for multicast ability to make sure that is the issue?

    Not easily. The problem with multicast is that often due to things like IGMP snooping, casual tests will pass - until the switches time out their IGMP group membership and shut off ports. And problems caused by the combination of consumer-grade switches and machines going into low-power states have the annoying property that whether someone somewhere has their laptop lid open or closed at any moment can affect this.

    Basically, diagnosing these kinds of network problems is best done by a combination of traffic analysis using packet captures with a tool like Wireshark and the old Mk. I eyeball (a lot of problems really do show up clearly even in the blinkenlights on the switches once you know what you are looking at). One advantage you have is in learning how things *should* look, since you've got sites which are working which provide you a baseline, not just for the GhostCast traffic but for the kind of supervisory chatter (IGMP query frames and the like) that needs to be going on in a healthy network.
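
    If you want something more active than just watching captures, you can also generate a test multicast stream of your own and see whether it keeps flowing. Below is a sender sketch to pair with the receiver sketch earlier in this post (again, the group, port and payload are arbitrary test values, not anything GhostCast uses). Run the sender on the segment where the GhostCast server sits, run the receiver on a few client ports, and leave it going for longer than the switches' IGMP timeouts - that's exactly the window where casual tests pass and real imaging sessions fall over.

        import socket
        import struct
        import time

        GROUP = "239.1.2.3"          # must match the receiver sketch above
        PORT = 5007
        TTL = struct.pack("b", 1)    # keep the test traffic on the local segment

        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, TTL)
        seq = 0
        while True:
            sock.sendto(("mcast-test %d" % seq).encode(), (GROUP, PORT))
            seq += 1
            time.sleep(1)

    If the receivers go quiet a couple of minutes into the test while the sender is still running, you're looking at the same group-timeout behaviour that freezes GhostCast sessions.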

    [ While I personally don't mind helping folks out with looking at GhostCast traffic samples, we're in talks to have my current project - a rather different approach to Ghost for deployment and backup, including on the network side - acquired by a vendor, which will make things a bit more complicated for me. ]



  • 3.  RE: Multicast Issues

    Posted Dec 08, 2011 11:05 AM

    Thanks very much for the reply. 

    I have used some of the tools you mentioned here to do some investigative work and here is an update on our problem.

    When we use a GhostCast Server session and boot the machines using PXE or USB boot sticks, they boot fine and connect to the session, but only some machines seem to join the multicast group entry on my switch.  If we boot all the machines using PXE to the console and run an image task, all machines show up in the multicast table on the switch, but they were timing out before the session ended.  I increased the host timeout in the IGMP settings on the switch and was able to get a lab of 30 machines to image a 20GB image in about 45 minutes.  Very acceptable.

    So now comes the problem: I appear to have solved part of it, but I really still have no idea why it is happening, so I can't fix it completely.

    Is there any reason why machines booted to a GhostCast session would not join the multicast group, but machines that are booted to the console would?

    Thanks again for your assistance, it is greatly appreciated.

    Paul



  • 4.  RE: Multicast Issues

    Posted Dec 08, 2011 09:01 PM

    The main thing that comes to mind which would affect that is that in normal interactive use, the Ghost client also uses multicast to discover the GhostCast server hosting a session name. In the console environment the same overall process is followed, but because the management client already knows the specific IP address of the management server, it passes that along to Ghost (via the -jaddr command-line switch) and the session-lookup query is sent unicast. When Ghost is run standalone, the query is instead sent to the 224.77.0.0 multicast group, which all the GhostCast server instances subscribe to so they can receive those queries and match the looked-for session against their own session name.
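
    Just to illustrate the pattern (this is not the real GhostCast wire protocol - the port and payload below are invented for the example, and only the 224.77.0.0 group comes from the description above), standalone discovery looks roughly like this in Python: fire a query at the well-known group and wait for whichever subscribed server answers back unicast. The -jaddr case simply skips the multicast step because the server address is already known.

        import socket
        import struct

        GROUP = "224.77.0.0"                 # lookup group mentioned above
        PORT = 9999                          # hypothetical port, for illustration only
        QUERY = b"looking for session FOO"   # hypothetical payload

        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.settimeout(5)
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, struct.pack("b", 4))
        sock.sendto(QUERY, (GROUP, PORT))
        try:
            reply, server = sock.recvfrom(65535)
            print("server answered from", server[0])
        except socket.timeout:
            print("no answer - the group query never reached a subscribed server")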

    [ Multicast is a great technology in several respects, and location-independent lookup like this is at least as important a use of it as for efficient bulk distribution, which is why multicast/anycast are such a big part of IPv6 ]

    While under normal circumstances there should be no difference between managed and unmanaged cloning, it's possible that there's something in your environment which is negatively affecting the visibility of the 224.77.0.0 session from the point of view of the clients, especially if you have multiple levels of switching.

    Under normal circumstances what should happen is that the device generating IGMP queries is a router, and all those queries should propagate out; more importantly, that router should itself be subscribed to an all-routers multicast group so that a) when host IGMP subscriptions occur they reach it, and b) when lower-tier devices like switches receive multicast traffic for groups they don't know about, it still gets forwarded to the upstream router.

    Without knowing how your switches are configured in detail, a possible source of trouble would be if the switches at the client tier of your network aren't forwarding the queries to 224.77.0.0 because they aren't seeing the subscription that GhostCast issues at the server end of the network in order to receive those client queries, and for whatever reason aren't forwarding them on to any other device which does see the initial group subscriptions (whereas the IGMP subscriptions the clients issue en masse to join a specific GhostCast transfer clearly are being seen where they need to be).
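
    One way to check what is actually being joined on a given segment - say on a mirror of an uplink port, or on the server's own switch - is a few lines of Python with scapy (an assumption on my part; run it as root/administrator on a machine that can see the traffic, or use any capture tool you prefer). For an IGMPv2 membership report the IP destination is the group being joined, so a report for 224.77.0.0 is easy to spot.

        # needs scapy (pip install scapy) and root/admin rights to capture
        from scapy.all import sniff, IP

        def show(pkt):
            if IP in pkt:
                # for IGMPv2 reports the IP destination is the group being joined
                print(pkt[IP].src, "->", pkt[IP].dst)

        sniff(filter="igmp", prn=show, store=False)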



  • 5.  RE: Multicast Issues

    Posted Jan 27, 2012 11:05 AM

    I know this is very old, but I have spent the better part of the last 2 months troubleshooting this and I am closer, but still not quite there.  Here is where I am at.  I have been able to enable multicast on the Ethernet interface of our router and have enabled IGMP query on the router.  I have IGMP snooping enabled on the D-Link 1210-52 and 1252 switches, all updated to the latest firmware.  I am running the Ghost SS 2.5 console on a VM on an ESXi 4.0 box.  The PCs involved are Lenovo 6072 and Lenovo 7360 models with Intel PRO/1000 NICs.  I have updated Windows PE with the latest Intel driver for the client NICs, and I have also tried the DOS version of the drivers for the task, with the same results.

    I have 2 issues that occur when I try to run a multicast.  If I run an image task from the console, the multicast entry table on the switch does not seem to include all client machines so the task fails - the image "starts" but never actually starts, and then the clients time out.  If I cancel the task a couple of times and restart it, it seems to eventually build the multicast table and the image starts.  Speed is great, no network flooding - seems to be great.  Then it reaches the host timeout value in the IGMP settings of the switch, the multicast entry table loses ports included in the multicast, and the session stops and eventually times out.  If I increase the host timeout value in the IGMP settings of the switch to longer than the image session takes, it will finish with great speed.

    From my research this would indicate that I had an IGMP query problem, but when I run Wireshark I am seeing the IGMP query report every minute.  I also see the IGMP join requests and membership reports, and I don't see any IGMP leave requests, so I am not sure why it is timing out.  These symptoms occur on multiple school networks.  We never image outside the LAN; it is always done within the internal school network.  Just wondering if I am missing something here, or if you have any further advice for me.  Here are the router settings:

     

    FastEthernet0/0 is up, line protocol is up
      Internet address is 10.0.65.1/24
      IGMP is enabled on interface
      Current IGMP version is 2
      CGMP is disabled on interface
      IGMP query interval is 60 seconds
      IGMP querier timeout is 120 seconds
      IGMP max query response time is 10 seconds
      Last member query response interval is 1000 ms
      Inbound IGMP access group is not set
      IGMP activity: 57 joins, 50 leaves
      Multicast routing is enabled on interface
      Multicast TTL threshold is 1
      Multicast designated router (DR) is 10.0.65.1 (this system)
      IGMP querying router is 10.0.65.1 (this system)
      Multicast groups joined (number of users):
          224.0.1.40(1)
    Serial0/0 is up, line protocol is up
      Internet protocol processing disabled
    Serial0/0.1 is up, line protocol is up
      Internet protocol processing disabled
    BVI1 is up, line protocol is up
      Internet address is 10.0.201.58/30
      IGMP is disabled on interface
      Multicast routing is disabled on interface
      Multicast TTL threshold is 0
      No multicast groups joined
     
    Thanks in advance for your help
     
    Paul


  • 6.  RE: Multicast Issues

    Posted Jan 31, 2012 09:17 PM

    If I run an image task from the console, the multicast entry table on the switch does not seem to include all client machines so the task fails.

    Well, that sounds like a switch problem. All the clients on the directly-connected ports will be sending join messages, and so the switch should be handling them. If the switch isn't seeing the join messages from every client then that's just flat-out wrong.

    Unfortunately there's not much I can glean from your description to guess what the root problem is, having never personally seen a network which exhibits this particular collection of problems.

    Pretty much the only thing I'm reminded of is a common fault in bargain-basement switches like these D-Links: the multicast group table is very small. Once upon a time (not these days, but typical of hardware from the early 2000s and some budget equipment) it was common for multicast handling in switches to use a special piece of content-addressed memory to map a group address to a port list, and this memory tended to be quite small to help it be fast relative to the CPU speeds of the time in order to forward packets at full rate. Worse, every time you create a VLAN it chops the capacity of this CAM in half.

    So, some years ago it was quite common for customers to create networks that didn't work through careless over-use of VLANs, which forced the multicast group table in the switches to become too small to be usable - the CAM would have what looked like a generous 32 entries to start with, but then customers would create a dozen VLANs so that each VLAN was left with only 2 entries, and those would be taken by the standard groups like the all-routers and all-hosts groups, leaving no internal memory space in the switch for any other multicast groups. As each IGMP message for a different group came in, the switch would evict entries it had only just seen from the CAM to make room for the new ones.

    From my research this would indicate that I had an IGMP query problem, but when I run Wireshark I am seeing the IGMP query report every minute

    If you're using Cisco routers they will be generating IGMP queries; standard IPv4 multicast was designed in the late 1980s for routers (this being before switches even existed). So IGMP was generated and consumed as a protocol between endpoints (a.k.a. hosts) and routers, with the routers generating queries to ensure the freshness of their tables.

    Then, separately, the routers would exchange information between themselves to collectively build up a map of the subscriptions and flows to figure out what flows to send to each other - this was eventually standardised through the PIM protocol, and generally on any routing device if you enable PIM you'll automatically get IGMP queries being generated because if you're asking the router to understand multicast, IGMP support is implied.

    If your switches aren't correctly maintaining group memberships despite these queries being generated, then there's something pretty seriously wrong. As noted above, that was a common problem in badly misconfigured customer networks in the mid-2000s because of the internal memory limit, but I haven't seen anything like that reported in some years.

    According to the product manual the D-Link 1210-52 has a 256-entry multicast group limit (so it's built on an early-2000s kind of switch design, as you'd expect for a 100Mbps unit with a slow CPU and not much memory), so in principle that should be adequate for normal levels of VLAN use.
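
    One other quick sanity check, assuming the switches follow the standard IGMPv2 timers from RFC 2236: a snooping device should keep a port in a group for roughly robustness x query-interval + max-response-time before aging it out. Plugging in the intervals from the router output earlier in the thread (a back-of-the-envelope calculation, not anything D-Link documents):

        # Standard IGMPv2 group membership interval (RFC 2236), using the
        # values shown in the router output earlier in this thread
        robustness = 2        # protocol default
        query_interval = 60   # "IGMP query interval is 60 seconds"
        max_response = 10     # "IGMP max query response time is 10 seconds"

        print(robustness * query_interval + max_response, "seconds")   # 130

    Any switch-side host timeout shorter than that can age ports out of a group between refreshes even when every host is answering the queries, so it's worth checking what the D-Links' host timeout actually defaults to.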

     



  • 7.  RE: Multicast Issues

    Posted Jan 31, 2012 11:53 PM

    Thanks again for taking the time to look into my problem. I will be imaging a school tomorrow and will grab some Wireshark info during the transmission to see if I can gain some further information to add. We are using a single VLAN and imaging 30 machines at a time. Speeds are great when it works; when I crank up the host timeout value of IGMP to longer than the image takes, I average around 420MB/min, which is quite acceptable - I can push a 20GB image to 30 machines in under an hour. I am just not sure why the querier is not functioning properly.

    I have expressed my concern with the quality of the D-Link switches in the past, but unfortunately I don't have purchasing power. Maybe I will try and get them to replace the switches at one site to see if it makes a difference. Is there an issue with running the Ghost console from an ESXi VMware machine? They are Cisco 2620 routers with IOS ver 12.1. We had no IGMP query reports on the network until I enabled PIM Dense-mode which enabled multicast support. Then I see the IGMP query report. I had the same symptoms when I enabled the IGMP query function on the D-Link switches: machines would time out when we hit the host timeout value. It has me baffled. Thanks again and will update tomorrow if I come across anything new....


  • 8.  RE: Multicast Issues

    Posted Feb 01, 2012 04:55 PM

     

    We had no IGMP query reports on the network until I enabled PIM Dense-mode which enabled multicast support

    Right, that's as it should be. PIM implies IGMP, and all routers with PIM support will just generate queries by default.

    Incidentally, one of the many unusual things about these D-Links (including why a company which is totally about bulk-manufacturing consumer-grade equipment is even making a device like this) is just how many configuration options they expose for IGMP. They just aren't needed if the implementation is sound, and you should never have to touch most of them, since the defaults in the more-than-20-year-old spec still work just fine today; most switches should pretty much need nothing except an "on" or "off" setting.

    Machines would time out when we hit the host timeout value. It has me baffled.

    Me too, but it's surprising how often IGMP support is actually just outright broken. Just last year I helped someone running several sites with Dell PowerConnect switches - which you would think would be decent, as Dell sells these to the enterprise market - and the IGMP support on the switches was just flat out not working; queries were being sent and hosts were responding to the queries with subscription refreshes, but the switches were still just timing them out, as in your case.

    In that case, although I tried to help the customer by analysing packet traces, what turned out to be the decisive thing was that Dell's firmware on these models was simply broken unless you used IGMPv1 - once the querier version was dialled back so that the hosts were making subscription reports using IGMPv1, suddenly the switches started playing the game, which is crazy as the differences between v1 and v2 are microscopic (unlike v3).

    That's another thing I guess you can experiment with; IGMP version control is pretty simple, in that all hosts start out using the most recent version of IGMP they support (which is normally IGMPv3, as that's been around for well over a decade). However, if they ever see an IGMP query, the hosts dial themselves back so that their responses (and future announcements) use the same version as the query.

    That's usually the first way to tell whether the host/querier relationship is working; most routers issue IGMPv2 for some reason, so IGMPv3 reports usually only come from hosts which don't have visibility of the query frames. In the Dell PowerConnect case, dialling the querier back to IGMPv1 meant that the hosts issued IGMPv1 responses, and this then evaded whatever bug in the switch firmware was stopping them from working.

    That's something you can at least try; dialling the querier back to use IGMPv1 should show the hosts responding in kind in the packet traces, which helps to validate that everything outside the switches is playing the game properly.
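
    If you'd rather not eyeball raw Wireshark output for that, a few lines of Python with scapy (an assumption on my part - run it as root/administrator on a machine that can see the IGMP traffic) will tally the message versions for you; the type codes are straight from the IGMP specs, the rest is just a sketch.

        # needs scapy (pip install scapy) and root/admin rights to capture
        from collections import Counter
        from scapy.all import sniff, IP

        NAMES = {0x11: "query", 0x12: "v1 report", 0x16: "v2 report",
                 0x17: "v2 leave", 0x22: "v3 report"}
        seen = Counter()

        def classify(pkt):
            if IP in pkt and bytes(pkt[IP].payload):
                igmp_type = bytes(pkt[IP].payload)[0]   # first byte of an IGMP message is its type
                seen[NAMES.get(igmp_type, hex(igmp_type))] += 1
                print(dict(seen))

        sniff(filter="igmp", prn=classify, store=False)

    Once the querier is dialled back to v1 you should see the "v1 report" counter climbing; if the hosts keep sending v2/v3 reports, they aren't seeing the queries at all.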

    but unfortunately I don't have purchasing power. Maybe I will try and get them to replace the switches at one site to see if it makes a difference

    It may have to come to that; even though the multicast standards are pretty damn simple and Ghost just uses standard IPv4 multicast in a pretty straightforward way, it seems to just be a magnet for problems.

    In 2008, not long before Symantec shut the product down, our QA manager was testing a couple of budget 48-port switches with a similar feature set to these D-Links (all ports 1Gbps of course, and yet these were still sub-NZ$1K devices, even here in New Zealand where we pay higher prices than the rest of the world for such gear). They were a Linksys and an HP, and both performed flawlessly even though they were aggressively priced. But then, IGMP snooping really *isn't* an exotic or remotely new feature and the vendors can easily test it properly using Ghost themselves, so they should be getting this right even in budget gear.

    Given that there are budget switches out there which work flawlessly when even really expensive "enterprise" gear doesn't, basically what I'd say is that before committing to any switch it's well worth taking the time (and twisting the vendor's arm to agree) to do some pre-purchase qualification using Ghost as a test platform. Switches just stay in the field too long, especially in schools, where they tend to come out of thin capital-expenditure budgets.