2010 (2896.9) and 11D (7170.63) RALUS Agent Consistently Crashes
Hello all, and thanks for looking at my post. Symantec personnel, PLEASE look into this.
Let me start off by saying that this issue already occurred on RHEL 5.4. We then bought a new machine, did a fresh install and upgraded to RHEL 5.5, and the issue got worse. Of course, there is no going back.
My issue is that the RALUS agent crashes nearly every day, although the backup will actually complete if you keep restarting the agent and re-running the job.
We recently upgraded from RALUS 11D (7170.63) to 2010 (2896.9) in an attempt to fix this issue.
I have seen “glibc invalid free” errors and segfaults across all versions of RALUS, but it used to be an intermittent issue. Since the upgrades to libstdc++, the kernel and glibc, it has only gotten worse.
Let's start off with the relevant system information.
Dell R710 with 32GB Multi-bit ECC Memory
2x Xeon 6-core processors @ 1333 MHz bus
500GB RAID 1 local storage via PERC 6/i
8 SAN-attached LUNs with nested mount paths under "/opt" (using 2x Emulex LightPulse Fibre Channel 12002 HBAs over multipath, on ext3 file systems with GPT disk labels)
Note: this issue happens on non-SAN-attached servers as well.
Oracle Enterprise Linux 5 Update 5 (5.5). This is a PAID RHEL distribution made by Oracle for Oracle Database Servers
Oracle 10G2 Latest Patches as of today
Linux xxxxxxxxxxxxxx 2.6.18-22.214.171.124.4.el5 #1 SMP Thu Apr 8 18:35:38 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
dmesg output from the last 4 days:
- beremote: segfault at 0000000074706f3b rip 00000000001fb860 rsp 00000000f41f62ac error 4
- beremote: segfault at 000000002f2f2f8b rip 00000000f4c0ea6d rsp 00000000f40f9de0 error 4
- beremote: segfault at 0000000074706f3b rip 00000000001fd055 rsp 00000000f416f3ac error 4
- beremote: segfault at 0000000074706f3b rip 00000000001fb860 rsp 00000000f41572ac error 4
- beremote general protection rip:3bc9873005 rsp:40bc1cf0 error:0
- beremote: segfault at 000000690000008a rip 0000003bc987078f rsp 0000000043eda800 error 6
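For anyone who wants to read these lines more closely, here is a rough Python sketch that pulls the fields out of the "segfault at" lines and decodes the "error" value using the standard x86 page-fault error code bits (bit 0: protection violation vs. page not present, bit 1: write vs. read, bit 2: user vs. kernel mode). The function names are my own, the regex only assumes the line format shown above (it skips the "general protection" line), and I am assuming the kernel prints the error value in hex. The last helper shows the low four bytes of the faulting address as text; an address that reads as printable characters can sometimes mean string data was used as a pointer, but that is only a heuristic.

```python
import re

# Assumed format: the "segfault at" lines shown above, with an optional [pid].
SEGFAULT_RE = re.compile(
    r"(?P<proc>[^\s\[:]+)(?:\[\d+\])?: segfault at (?P<addr>[0-9a-f]+) "
    r"rip (?P<rip>[0-9a-f]+) rsp (?P<rsp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
)


def decode_error(code):
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "page": "protection violation" if code & 1 else "not present",
        "access": "write" if code & 2 else "read",
        "mode": "user" if code & 4 else "kernel",
    }


def parse_segfault(line):
    """Return a dict describing one segfault line, or None if it does not match."""
    match = SEGFAULT_RE.search(line)
    if match is None:
        return None
    info = decode_error(int(match.group("err"), 16))
    info["process"] = match.group("proc")
    info["address"] = int(match.group("addr"), 16)
    info["rip"] = int(match.group("rip"), 16)
    info["rsp"] = int(match.group("rsp"), 16)
    return info


def address_text(addr):
    """Show the low 4 bytes of an address in memory (little-endian) order.

    Non-printable bytes are replaced with '.'.
    """
    raw = (addr & 0xFFFFFFFF).to_bytes(4, "little")
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)
```

Feeding each dmesg line to parse_segfault() gives a quick summary of whether a crash was a read or a write to an unmapped page, and address_text() can be run on the "address" field of the result.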
As you can see from the “file” output below, “beremote”, which is the main Backup Exec (RALUS) executable, was built for a VERY old kernel (GNU/Linux 2.4.0). My suspicion is that it was also built against correspondingly old versions of glibc, libstdc++ and the like, and that it does not get along with the newer ones. I’m not sure why Symantec is keeping this build around, but it is causing huge issues with newer distributions.
beremote: ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.4.0, dynamically linked (uses shared libs), for GNU/Linux 2.4.0, stripped
Honestly I can’t see why more people aren’t having this issue.
That said, in my experience the content being backed up seems to be more of a factor than the newer glibc/libstdc++/kernel versions.
For example, if I back up 100GB of archive data (DBF, tar.gz or other binary data), the agent does just fine. But when I attempt to back up my Oracle install (mainly the Java components) or “/etc/”, the agent crashes with “general protection”, “segfault” or “glibc invalid free” errors. This seems to become a bigger issue when there are lots of (system-created) symlinks that point to other symlinks.
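In case it helps anyone narrow this down, here is a small Python sketch for listing symlinks whose direct target is itself a symlink. The function name is my own and this has nothing to do with how RALUS walks the tree; it is only a way to find candidate directories to exclude from a test job and see whether the crashes follow them.

```python
import os


def find_chained_symlinks(root):
    """Yield (link, target) pairs for symlinks under root that point at another symlink."""
    # followlinks=False: report symlinked directories but never descend into them.
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                continue
            target = os.readlink(path)
            # Relative link targets are resolved against the link's own directory.
            if not os.path.isabs(target):
                target = os.path.join(dirpath, target)
            if os.path.islink(target):
                yield path, target
```

Looping over find_chained_symlinks("/etc") and printing each pair gives a list of link-to-link entries. Comparing the counts between a selection that backs up cleanly and one that crashes the agent would show whether chained symlinks really are the common factor.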
Can someone please offer any advice on why this may be happening and any possible workarounds/patches etc?