2-node VCS/CFS: du command takes 10x longer on one node!
I have set up a 2-node VCS/CFS cluster. Both nodes are identical HP blades (460G6, I think), and storage is on EMC Symmetrix arrays. The servers are split between data centers; cluster communication goes over a SONET ring and storage traffic over DWDM. Both nodes run Red Hat 5u4 64-bit with identical packages and kernel parameters, and both were built from the same kickstart files. Each blade has 2 QLogic HBAs, and DMP manages the multipathing.
Storage is mirrored by VxVM across the data centers as 2 plexes, with one plex on the array(s) in each data center.
iozone profiling shows nearly identical I/O rates from both nodes.
du -hsc on the shared filesystem is the only issue I'm currently seeing. The shared filesystem currently holds 66,000 files. du takes about 3 minutes on one node but anywhere from 30 to 48 minutes on the other, with no other activity occurring. I've tried changing the master, but it doesn't seem to matter.
I have strace output if that would help. My analysis shows that lstat is much slower on the slow node (about .001 on average), and that multiplied by 66,000 comes close to the time disparity. I also noticed that an lstat immediately following a getdents call is much, much slower than subsequent lstat calls.
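To compare the two nodes on equal terms, the strace output from each can be summarized the same way. What follows is a minimal sketch, assuming Python 3.6+ (not the RHEL 5 default, so it may need to be installed or run on another machine) and `strace -T` output in which file status calls appear as `lstat` and directory reads as `getdents` or `getdents64`. The script and its `summarize`/`report` functions are hypothetical and not part of any Veritas tooling. It reports the mean lstat time overall, and separately for lstat calls that come immediately after a getdents call.

```python
import re
import sys
from statistics import mean

# Matches one completed syscall line from `strace -T` output, e.g.
#   lstat("dir/file", {st_mode=S_IFREG|0644, st_size=42, ...}) = 0 <0.001204>
# optionally preceded by a PID (strace -f) and/or a timestamp (-t/-tt/-ttt).
# Split "<unfinished ...>" / "<... resumed>" lines are not matched and are skipped.
LINE_RE = re.compile(
    r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?"   # optional PID prefix
    r"(?:\d+(?::\d+)*(?:\.\d+)?\s+)?"  # optional timestamp
    r"(\w+)\(.*<(\d+\.\d+)>\s*$"       # syscall name ... <elapsed seconds>
)


def summarize(lines):
    """Return (all lstat times, lstat times right after getdents), in seconds."""
    all_lstat = []
    after_getdents = []
    prev = None  # name of the previous matched syscall
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        name, elapsed = m.group(1), float(m.group(2))
        if name == "lstat":
            all_lstat.append(elapsed)
            if prev in ("getdents", "getdents64"):
                after_getdents.append(elapsed)
        prev = name
    return all_lstat, after_getdents


def report(path):
    with open(path) as f:
        all_lstat, after = summarize(f)
    if all_lstat:
        print(f"lstat calls: {len(all_lstat)}, mean {mean(all_lstat):.6f} s, "
              f"total {sum(all_lstat):.1f} s")
    if after:
        print(f"lstat right after getdents: {len(after)}, "
              f"mean {mean(after):.6f} s")


if __name__ == "__main__":
    report(sys.argv[1])
```

Hypothetical usage: on each node run `strace -f -tt -T -o du.trace du -hsc <mountpoint>`, then `python3 lstat_summary.py du.trace`, and compare the two reports. The "total" line shows how much of the wall-clock difference lstat alone accounts for.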
Has anyone seen this behavior?