For the past several months, we've been looking into reports of Dell Optiplex 755 computers spontaneously rebooting. The reboots are random, and we've never found a way to trigger the behavior on demand. In parallel, we've had another issue: 755s hanging on login with a specific image. The image is hardware independent, yet it only hangs (although not reproducibly) on the 755s.
Because neither problem can be reproduced reliably, it's been a long road to troubleshoot. We have a test network, and the problems have not manifested there. What we've found is:
- The login hangs vanish if the Intel disk controller is switched from AHCI to 'Legacy IDE'.
- With the controller left in AHCI mode, the hangs also vanish if we disconnect the DVD drive.
So downgrading the controller to 'Legacy IDE' looked like a good workaround, at least for the hanging issue, and we were going to proceed by changing the BIOS configurations. I was troubled, though, by the lack of anything in the Intel AHCI release notes that could be linked to this issue.
However, today we found that last week Dell released an 'optional' update for the DVD drives in these machines, the GH30N 16x SATA DVD drives. This optional update resolves an 'auto reboot issue'. My first thought was: an auto-reboot fix is optional?? My second thought was: 'Ah... dodgy DVD firmware might be responsible for the login hangs too'.
So, over the next few days we'll be seeing if this updated firmware solves our months of woe. I hope it does.
Computers are complex beasts. Our test network did not have exactly the same hardware as our deployed set, and on top of that we offer model variations which simply can't all be kept on a test network.
So, it's important to keep the hardware variations within your purchased model ranges to a minimum. This will give you a higher chance of catching odd issues like this before you move to your pilot deployments. With fewer hardware variations, you'll also be able to keep on top of vendor announcements such as firmware updates, which is a daunting task if you allow a dozen hardware variations for users on each model.
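As a rough illustration of keeping an eye on this, here is a minimal sketch that counts the distinct hardware combinations per model in an inventory export. Everything in it is an assumption made for illustration: the record layout, the field names (`optical_drive`, `disk_mode`), the hosts, the 'other-drive' and 'Model B' values, and the limit of two variations. It is not taken from our actual tooling.

```python
from collections import defaultdict

# Hypothetical inventory records; field names and most values are illustrative only.
inventory = [
    {"host": "pc-001", "model": "Optiplex 755", "optical_drive": "GH30N", "disk_mode": "AHCI"},
    {"host": "pc-002", "model": "Optiplex 755", "optical_drive": "GH30N", "disk_mode": "Legacy IDE"},
    {"host": "pc-003", "model": "Optiplex 755", "optical_drive": "other-drive", "disk_mode": "AHCI"},
    {"host": "pc-004", "model": "Model B", "optical_drive": "other-drive", "disk_mode": "AHCI"},
]


def variations_by_model(records, keys=("optical_drive", "disk_mode")):
    """Map each model to the set of distinct component combinations seen for it."""
    variations = defaultdict(set)
    for record in records:
        # A missing field is recorded as "unknown" rather than raising an error.
        combo = tuple(record.get(key, "unknown") for key in keys)
        variations[record["model"]].add(combo)
    return dict(variations)


def models_over_limit(records, limit=2):
    """Return, sorted by name, the models with more distinct combinations than `limit`."""
    return sorted(
        model
        for model, combos in variations_by_model(records).items()
        if len(combos) > limit
    )


if __name__ == "__main__":
    for model in models_over_limit(inventory, limit=2):
        print(f"{model}: too many hardware variations to track comfortably")
```

A model flagged by something like this is one where the test network is least likely to match what is actually deployed, and where vendor firmware notices are easiest to miss.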
Finally, your pilot deployments need to be taken seriously, by both the pilot users and the administrators. Pilot deployments are not a rubber-stamping phase; they are your first real chance to see how your software/OS/hardware works in the real world. As such, the pilot deployments need to be examined in detail and assessed by the most pedantic staff you have. Otherwise, you run the risk of your full deployment becoming a bit of a nightmare.
I hope this little entry helps others who might be experiencing (or have experienced) oddities with the 755s.