
2-node VCS CFS - du command takes 10x longer on one node!

Created: 13 Apr 2010 • Updated: 27 Oct 2010 | 4 comments

Hi All,

I have set up a 2-node VCS/CFS cluster. Both nodes are identical HP blades (460 G6, I think), and storage is on EMC Symmetrix arrays. The servers are split between data centers; communication is over a SONET ring, with storage going over DWDM. Both nodes run RedHat 5u4 64-bit with identical packages and kernel parameters, and both were built from the same kickstart files. Each blade has two QLogic HBAs, with DMP managing the multipathing.

Storage is divided into two plexes mirrored by VxVM across the data centers; each data center holds one plex on its local array(s).

iozone profiling shows I/O rates are virtually identical on both nodes.

du -hsc on the shared filesystem is the only issue I'm currently seeing. The shared filesystem currently holds 66,000 files. du takes about 3 minutes on one node, but anywhere from 30-48 minutes on the other, with no other activity occurring. I've tried changing the master, but it doesn't seem to matter. I have strace output if that would help; my analysis shows that lstat on the slow node is much slower (~0.001 s on average), and that times 66,000 files gets close to the time disparity. I also noted that an lstat immediately following a getdents call is much, much slower than the subsequent lstat calls.
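For anyone wanting to reproduce this analysis, here is a hedged sketch of how the lstat timings can be aggregated from strace output. The mount point /shared and the trace filename are assumptions; adjust for your cluster. The sum_lstat helper is hypothetical, and the demo runs it on a two-line sample in lieu of a real trace.

```shell
# Capture per-syscall wall times of du on the slow node (assumed mount
# point /shared); strace -T appends each call's duration like <0.000912>:
#   strace -T -e trace=lstat du -hsc /shared 2> du.strace

# Sum the lstat latencies to see whether lstat alone explains the gap.
sum_lstat() {
  awk -F'[<>]' '/^lstat/ { n++; t += $2 }
                END { printf "calls=%d total=%.3fs avg=%.6fs\n", n, t, t/n }' "$1"
}

# Demo on a two-line sample in lieu of a real trace:
printf 'lstat("/shared/a", ...) = 0 <0.001000>\nlstat("/shared/b", ...) = 0 <0.003000>\n' > /tmp/du.strace
sum_lstat /tmp/du.strace   # calls=2 total=0.004s avg=0.002000s
```

Comparing the avg figure from both nodes makes the per-call disparity explicit, rather than eyeballing individual strace lines.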

Has anyone seen this behavior?

Thanks, Danté

4 Comments

Kimberley:

I touched base with Tech Support on your post, and they suggested that you may want to open a case. They'll also need more information, such as which version you're using.

Best,
Kimberley

Thanks for participating in the community!

mgeiser:

Yes, it would be best if you opened a case with the SFCFS team at Symantec support on this issue. That way, we can gather all of the necessary data to troubleshoot your issue.

Warm Regards,

Michael

Jawahar Mohan:

ls -l on a large directory in a CFS can be taxing, because ls -l acquires a lock for each inode, which means a lot of traffic over the private interconnect. Is the slow node always the one in the other data center, reading the primary plex across the link rather than from its local arrays? Since the nodes are in different data centers, I believe the private networks also run over the DWDM/SONET ring. If so, check the round-trip time between the nodes across the private network. You can use the iperf tool to measure bandwidth and round-trip time by plumbing up the private interconnects. When you open a ticket, also provide a high-level network and SAN layout and the throughput details.
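The interconnect check above can be sketched as follows. The hostname node2-priv is an assumption for the peer's plumbed private-link address (use your actual LLT/private IP), and the est helper is a hypothetical back-of-envelope calculator, not part of any Veritas tooling.

```shell
# Round-trip time first: per-inode lock traffic is latency-bound, so even
# a fraction of a millisecond of extra RTT multiplies across 66,000 files.
#   ping -c 20 node2-priv
#
# Throughput with iperf: run "iperf -s" on one node, then from the other:
#   iperf -c node2-priv -t 30
#
# Rough estimate of RTT-bound lock cost from the ping summary line, e.g.
# "rtt min/avg/max/mdev = 0.112/0.420/1.030/0.200 ms":
est() {
  # $1 = file with ping output, $2 = file count;
  # multiplies average RTT (ms) by the file count, printed in seconds
  awk -v files="$2" -F'[/ ]' '/^rtt/ { printf "%.1fs\n", $8 * files / 1000 }' "$1"
}
# Usage:  est ping.out 66000
```

If the estimate from the slow node's RTT lands in the same ballpark as its du runtime, that points at interconnect latency rather than the storage path.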