
Happy International System Administrator Appreciation Day! And A Chance to Win...

Created: 29 Jul 2010 • Updated: 14 May 2014
Kimberley
+3

Wishing all of the admins out there a Happy System <Storage, Server, DBA> Admin Day from the Symantec Connect community team!!!! May your day be full of smoothly-running machines, fine-tuned software, and everyone in your office taking you out for lunch and generally appreciating your fabulousness.

Last year, we asked all of you for your best admin stories/nightmares. You really are a put-upon bunch - our condolences. We had so much fun reading them that we're doing the same thing this year. So, in appreciation for all the last-minute, late-night projects (not to mention the incomprehensible requests) that you've survived in the past year, tell us about your biggest meltdown, how you saved the day on a project, or send us a photo of your wildest wiring job in the comments below. We'll give 3 of you a passel of extra reward points based on nothing more than whether your story/photo made us laugh out loud, whether everyone else voted that you made them laugh out loud, or whether we just felt sorry for you. Completely subjective :)

So post your stories in the comments below before Monday, August 2, and remember to tell your boss that today is INTERNATIONAL SYSTEM ADMINISTRATOR APPRECIATION DAY - maybe you'll even get a free lunch out of it, too. Or here's a list of gift suggestions from the official SysAdminDay site, which you could conveniently forward to them. And here's a link to the Symantec Facebook page, where they're also sharing stories.

A special thanks to the folks behind the scenes here at our Symantec Connect site, who keep this beast well fed and tended.

Happy Sys/Storage/Server/DB Admin Day,
From all of the Symantec Connect Community Managers

Blog Author:
I've been working as a Community Manager for Symantec since 2008 and have had the great fortune to watch this community grow and develop over the past few years. Thanks to all who participate!

Comments (10)

CraigV

Well, I have actually had 3 of them!!! All 3 were catastrophic site failures, each requiring a restore of the entire site's data.
I'll only bore you with 1 of them...

Details
VM site running ESX 3.5 on an HP StorageWorks MSA 2000 G1 (no), with 2 HP ProLiant DL385 G5s as hosts, and Backup Exec 2010 running on a separate server acting as the Virtual Center (HP ProLiant DL165 G5).

Anyway, the site was going down for maintenance due to untidy cabling. Change Control had been put through, but nobody consulted me...the storage and backup engineer...about the correct procedure to shut down the site. The VMs and hosts were duly powered down correctly, and the site rep then promptly pulled the power from both controllers. Normally this wouldn't be a problem; since it was an MSA 2000, though, it was, and the RAID array promptly lost 6 of the 10 drives, causing catastrophic failure. ALL VMs lost; ALL data lost.

I happened to also be on standby...the first time (out of 2) that I had to do the same thing! What followed was a complete rebuild of the storage and all the VMs, and then a restore of the data. Unfortunately, this was while my in-laws were visiting us for a weekend, so I was pretty much out of action.

What it DID prove was the following:

1. 100Mbps switches don't lend themselves to good restore speeds. We are rolling out 1Gbps switches to the site as a result.
2. It proved that our DR procedures for this particular site were working, and we could show this to the customer.
3. It proved that using Backup Exec 2010, we were able to restore the site's data successfully.

The long and short of it...it proved BE worked, and worked well; that we need to look into cleverer ways of protecting our VM infrastructure (amazing how many times you can raise the issue and have it ignored, but lose a site...!!!); and lastly, that a storage/backup engineer's job really is thankless.

Thanks for making this day an unofficial holiday!

Alternative ways to access Backup Exec Technical Support:

https://www-secure.symantec.com/connect/blogs/alte...

+2
Jason1222

HOLIDAY!  I'm going home! Oh yeah, WE can't.

Oh boy, oh boy, oh boy...  Where to begin, and which horror story would you guys find the most appealing?

Okay, I know. And I really wish I had pictures, but we dropped the camera, and well... no one was courageous enough to go swimming for it.
* * * * * * * *
Let me give you some background on WHERE exactly we are located.

We are a relatively small company, only about 100 employees here at production time. We have a parent company, where there are several thousand.
We have a beautiful view of the mountainside and are about 50 miles from the closest "major city". Really, we're in a town. If you like to ski/snowboard, in winter you walk outside, cross the street and hop onto a chairlift. Waterslides in the summertime, golf down the road, and your closest neighbour... well, let's say you don't walk there...

That being said, here is the kicker.  **Not for the faint of heart**

We are on a downslope from the neighbouring town. We do not have access to the 'local aqueduct system'. So what has happened is we have had a septic tank built, in which sit 2 bilge pumps (I guess that's what you would call them - we just call them sh*t pumps). Because we are usually over 100 employees, men and women, it is not really feasible to have the septic tank emptied every other week. We quite literally "pump the sh*t" (and everything that goes with it) up the hill. It's only about 450 yards on a 30 degree slope into the neighbouring town's aqueduct system. We have some women that work here as well. I mentioned we have 2 pumps.

One VERY HOT day, we get alerted about "flooding" in one of the 3 server rooms. The medium-sized one, where none of our storage is. Rather, the firewall, phone lines, AC units (for that room alone), and a lot of video equipment that runs through the buildings. About 2 miles worth of cabling in a 20" floating floor. Some electrical outlets and so forth.

So, we get alerted to the "flooding alarm" in the server room. I live the furthest away and my boss is already here (director of IT). His words to me were, and I quote: "we are in deep sh*t, hurry up..." Nothing could have prepared me for what I was about to witness. The night before, one of our female employees disposed of some feminine products, causing the first of the 2 pumps to fail. The second one took over as it was supposed to. The alarm did not go off to indicate the pump failure.
Lo and behold, later that evening the second pump got blocked by a combination of hand tissues and, of all things, a Wacom tablet pen?? *Still scratching my head about that one*

Back to the flooding. The room is 24 feet by 19 feet over a 20" floating floor. The wires go through the floor in PVC pipes. And it was flowing everywhere... We were literally knee deep in sh*t in 94 degree weather... Waiting for the sanitation guys to fix our pumps. Honest to god, a ShopVac will pick up anything. Trip after trip to the "dumping ground", hour after hour of the most foul smell you can think of.

It took 2 ShopVacs and a sanitation truck pumping from the other side a full day (not just a work day) to get it cleaned up. To this day, we still have a stained floor and stained wiring. We did the best we could, but there was no way we were cleaning the cabling inch by inch.

We have been talking about redoing the wiring for months... No one wants to pull it out and see what we didn't get that stayed in the piping...

A very sh*tty day.

+9
michael cole

You are my hero. I *WAS* going to post my story of a node room with a roof leak, and finding out halfway through that it was the ladies' toilets upstairs raining down on our heads, but you, sir, take the prize utterly.

Scotland salutes your bravery...

Michael Cole

Remote Product Specialist

Business Critical Services

+3
deepak.vasudevan

We should be grateful to the helpdesk team who put in odd hours to keep the systems up and running, ticking hale and healthy, vibrating with a secure heartbeat.

I thank Symantec for marking System Administrators Day with an article as well as a forum post.

+5
JamesEwing

Back in 1999, I built a SQL cluster on Microsoft Cluster Server for Digital Convergence (remember the CueCats that you got free from Radio Shack?) (they no longer exist, so I get to use the real name in the story).
 
The cluster was backed by a 2TB HP MSA (it filled an entire rack back then) and was fully redundant (or so I thought). I was given two separate 30-amp power distribution units for the system, side A and side B from the generator-backed power plant. Everything went as planned and the system went into production a week later. A month after installation I received a panicked call at 10am during a big marketing push (a $1MM ad for the CueCat was running during the Super Bowl). Both racks were totally dark.
 
It turned out that the two hefty 30-amp power distribution units (mounted in the racks) had been run up from the raised floor. Under the floor they were plugged into little white power strips (you know, the $2.99 ones you get at Walmart?). Well, after running for a while it must have taxed one of the strips a bit too much and taken the strip on side A over the edge, and when redundant power kicked in it increased the load on side B and the other one tripped.
 
After running big extension cords straight to the PDUs, we booted the system. The SAN had lost almost an entire shelf of drives in its RAID 10 array. I guess MSAs don't like having their power turned off suddenly (much like the poor guy above who's had this happen 3 times).
Rebuilding the array without the bad drives and then restoring the data took 10 hours to complete (gotta love the good ole days of slow tape), but we were up in time for the Super Bowl.
 
Ironically, we expected over a million hits that day, and ended up getting just over 200,000.  Cue that one up to not investigating the simplest of possible weak links.
 
James Ewing
Serial Entrepreneur and Virtualization Architect
Enterprise Leadership Consultants
je@entlead.com

+3
Davinci_uk

Had quite a few over the years (nothing as bad as sewage, thankfully!); ones that stick in the mind include:

Many years ago, a few of us were floorwalking slow performance and freezing for 500 users (in the same building) using TS/Citrix - well over a hundred users on each floor, all open plan - the really arsey type of user as well. To spread the load we reconfigured some of the DR servers at a nearby site (which have a very similar naming convention to the prod servers) as a quick fix while a long-term solution could be planned. We also disabled logins to some of the prod ones that had a fault. I asked the assistant working with us at the time to shut down the servers with the application issues, to stop any confusion and to stop the helpdesk enabling logins again, which they were so good at doing even when told not to!

At the time users were having everything from slow logins, to freezing, to kick-outs, and were generally moaning, grumbling, being as sarcastic as they could when we asked them anything, and generally frustrated at us getting them to log out and back into different servers...etc - we took this hostility for about an hour when all of a sudden a few people started tutting, saying they had just been thrown out - only a few on the floor, so we assumed it was the same freezes that had been going on. Then another bank of users, and more and more - something wasn't right. Directors running out of their offices to come up and see what's happening - you just want to run and jump out of the window :-). I quickly went to the management console and some of the good servers now had no users connected?? I could no longer ping them - WTF??

I turned to the assistant helping us to double-check which servers he had shut down - he had only shut down the very servers we were shifting users onto!!! He got the server numbers right, but at the time there were only 2 characters different between the prod and DR server names! He just looked at me and went a little pale. We quickly used the remote cards to get the servers back, did some quick talking and indirectly blamed the network (which at the time was the cause of 9 out of 10 problems), and we were on our way - us IT folk have to look out for each other, you know!

The users went from being annoyed at things being a little slow, to completely helpless at being thrown out/unable to log back in, to being annoyed again, which they were now happier being :-)  The problem was solved a few days later, but that moment when you are under pressure anyway and disaster strikes is horrible - all you can do is sigh, roll your sleeves up and crack on!

Others that spring to mind (all of which really happened) include:

- Tripping over some terrible cabling that powered down a £mil mainframe - stopping payments from being processed for a few hours!
- Rebuilding a head office Exchange server back to back over 32 hours (they did pay for pizza)
- A helpdesk call trying to help a user use their mouse for the first time (we're going back a little, to a PC rollout!!) - they were holding and moving their mouse actually 'on' the screen: "I can't see no pointer, my mouse is on top of it" - doh!
- A colleague being asked to turn their laptop on at airport security, suddenly realising what they were doing with it last and not being able to do anything about it > their educational video started to resume playing!! No reaction from the female security staff, and YES, it was a colleague!!
- Users pulling the power cable out of a server every time they wanted to go out for a smoke by the fire exit.
- NEVER be in a server room when the carbon dioxide system goes off - really!

It has been many years since I worked in support or service delivery, but it's good to look back and be able to laugh! Everyone should appreciate the hell we have to go through!
 

+1
Kimberley

Thanks to all of you for sharing your stories! CraigV, JamesEwing and Jason1222, by popular vote and as agreed upon by the community managers, we're awarding you 100 reward points each, with Jason getting an additional 50 points to put towards his 'dropped camera' fund (which will make perfect sense if you've read the post).

Hope you all had a great Sys Admin Appreciation Day. Keep the stories coming, and remember, the next contest of wild stories is only 361 days away, and next year we're going to want pictures!

Best,
The Connect Community Managers Team

Thanks for participating in the community!

+1
CraigV

AWESOME! Thanks Kimberley!
Congrats to the rest... :)

Alternative ways to access Backup Exec Technical Support:

https://www-secure.symantec.com/connect/blogs/alte...

+3
Jason1222

Thank you and glad you guys got a "good chuckle".

Still working on the mental scarring... LOL

You know, we never did find that camera...  hahaha

-1
jesternl

Thanks for the laughs, some fun stories... I was too late to add my own heap of misery :)

-1