
INFO: task vx_worklist_thr

Created: 27 Jul 2012 • Updated: 03 Aug 2012 | 18 comments
Zahid.Haseeb's picture
This issue has been solved. See solution.

Environment

Operating system version = Red Hat Enterprise Linux 6.2

Storage Foundation version = 6.0 with RP1

Replication is one-to-one between two nodes, using local disks

Messages in the dmesg log

Hostnamexxx kernel: INFO: task vx_worklist_thr:1712 blocked for more than 120 seconds.
Hostnamexxx kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
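For anyone who wants to see which tasks were reported and for how long, here is a minimal sketch that pulls the task name, PID, and timeout out of hung-task lines like the ones above. The function name `hung_tasks` is made up for illustration, and the pattern assumes the exact message wording shown in this post.

```shell
# Print "name pid seconds" for each hung-task report read from stdin.
# Assumes the kernel message format quoted above; other kernels may word it differently.
hung_tasks() {
  sed -n 's/.*INFO: task \(.*\):\([0-9][0-9]*\) blocked for more than \([0-9][0-9]*\) seconds.*/\1 \2 \3/p'
}

# Typical use on a live system (left commented out so the sketch has no side effects):
#   dmesg | hung_tasks
#   cat /proc/sys/kernel/hung_task_timeout_secs   # current threshold in seconds
```

This only summarizes what the kernel already logged. It does not explain why the thread was blocked.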

Feedback required

  • I found the following technote (TN). Does this message mean that the kernel cannot keep up with the volume of requests, or does it mean something else?

           http://www.symantec.com/business/support/index?page=content&id=TECH186855

A few more questions

  • Can replication between the two nodes cause the error above?
  • Because of this, my replicated volume with its VxFS file system hung, and I had to reboot the primary node because the mounted volume was inaccessible. dmesg logs this as an INFO message. Shouldn't it be an ERROR?

Gaurav Sangamnerkar's picture

Zahid,

vx_worklist is a very common VxFS thread, so guessing at the cause from this high-level message alone would not be right.

The technote you mentioned does not necessarily apply here unless you have a full panic string that exactly matches the one in the technote.

To answer the rest of your questions as well:

I would recommend collecting a full crash dump while the hang is in progress and having it analyzed by Symantec support, since this could be specific to your environment. As mentioned, this is a VxFS thread, so it is possible that VVR is not involved at all. Again, rather than guess, raise a case with Symantec and get the data analyzed.

G

PS: If you are happy with the answer provided, please mark the post as solution. You can do so by clicking link "Mark as Solution" below the answer provided.
 

Zahid.Haseeb's picture

Thanks for your kind words, Gaurav. Is there a technote on how to collect the dump?

Any comment will be appreciated. Mark as Solution if your query is resolved
__________________
Thanks in Advance
Zahid Haseeb

zahidhaseeb.wordpress.com

Gaurav Sangamnerkar's picture

I usually refer to

http://docs.redhat.com/docs/en-US/Red_Hat_Enterpri...

or

http://www.dedoimedo.com/computers/crash.html

G


Zahid.Haseeb's picture

Thanks again for going out of your way to help, Gaurav.

So that means that after configuring kdump we have to wait for the problem to occur again before we can collect the dump?


Gaurav Sangamnerkar's picture

Yes, configure kdump so that it is ready.

Once the issue occurs, collect a crash dump, open a case with support, upload the logs to Symantec, and get them analyzed.

It is very important to collect the dump while the issue is occurring; otherwise the RCA may not be conclusive.

G


Zahid.Haseeb's picture

One more thing I would like to add. Our system did not actually crash; it hung, as I said, and we had to reboot it. Since it did not crash, I assume everything relevant should have been written from memory to dmesg. If dmesg is enough, can we examine the dmesg data related to the problem ourselves? (We have already opened a case with Symantec and are working on it; I will share the result.)


Gaurav Sangamnerkar's picture

No, dmesg doesn't have enough information. A crash dump is a full dump of kernel memory. Also, if you hit an issue and the server is hung, there is a chance that dmesg could not be written either.

A full crash dump gives a complete view of what was happening inside the kernel.

If the server is hung, now or in the future, I would trigger a crash dump so that you have the right evidence to find the root cause of the issue.

G


Zahid.Haseeb's picture

Sorry for the incomplete information, Gaurav.

By "hung" I meant that both drives under SFHA hung, and we could not mount, unmount, or access them. That is why I said the data in memory could have been written to the dmesg logs.

(We have three drives: the first is the OS disk (basic disk), and the second and third are the replicated data drive and the SRL.)


Gaurav Sangamnerkar's picture

Even then, dmesg is not sufficient to see what is happening inside the kernel. A crash dump is what you need.

G


Zahid.Haseeb's picture

I checked the system and found that the /var/crash folder exists, which suggests that core dumps are already enabled, but I did not find anything in the /var/crash directory.

I think the system neither crashed nor hung completely (as I said, only two drives could not be accessed, mounted, or unmounted), and that is why the crash directory has nothing in it.
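The presence of /var/crash does not by itself show where, or whether, kdump would write a dump. As a rough sketch, the configured location can be read from the `path` directive in /etc/kdump.conf. The function name `kdump_dump_path` is made up for illustration, and the fallback to /var/crash assumes the usual RHEL default when no `path` line is set.

```shell
# Print the directory kdump is configured to save dumps in.
# Reads the "path" directive from a kdump.conf-style file; commented-out lines are ignored.
# Assumption: /var/crash is used when no "path" directive is present.
kdump_dump_path() {
  conf_file=${1:-/etc/kdump.conf}
  if [ ! -r "$conf_file" ]; then
    echo "cannot read $conf_file" >&2
    return 1
  fi
  awk '$1 == "path" { p = $2 } END { print (p != "" ? p : "/var/crash") }' "$conf_file"
}

# Typical use (commented out so the sketch has no side effects):
#   ls -l "$(kdump_dump_path)"
```

If a separate dump target (for example another file system or a remote host) is configured in the same file, the path is interpreted relative to that target, so an empty local /var/crash would not be surprising.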


Gaurav Sangamnerkar's picture

As you said, the system didn't crash on its own, so you may want to force a crash of the server to collect a crash dump while the issue is occurring.

Refer to the doc links I gave in my earlier posts.

G


Zahid.Haseeb's picture

I did not understand. Could you please elaborate?


Gaurav Sangamnerkar's picture

What I mean is that the system will not necessarily crash every time. In a system hang, to find the exact RCA, you can manually force a crash dump so that you get a full dump that the vendor can analyze.

Some more useful links:

http://www.symantec.com/docs/TECH147736

http://www.redhat.com/archives/rhl-list/2005-April...

G


SOLUTION
Zahid.Haseeb's picture

As you suggested, we opened a support case, but it was closed without any conclusion. Now I would like your advice: if I increase the maximum number of inodes on the VxFS file system, would that help avoid extended inode operation problems in the future?

For now we have nothing in the system logs to conclude the case, and we cannot simply sit and wait for a recurrence to get a forced core dump. We have to take some preventive measures, as the server is in production.


Gaurav Sangamnerkar's picture

As the RCA is inconclusive at this stage, there is nothing specific you can do as a preventive measure. You can go through generic recommendations such as applying patches.

You don't even know that it's an inode-related issue, so what would you achieve by increasing inodes?

As you have already opened the case, check what support suggests.

G


Zahid.Haseeb's picture

Thanks, Gaurav, for your kind words. Per the link below, if both additional drives hang in the future and cannot be accessed or unmounted, I will force a dump at that time using the commands below. Kindly correct me if I am wrong.

http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Deployment_Guide/s2-kdump-configuration-testing.html

echo 1 > /proc/sys/kernel/sysrq
echo c > /proc/sysrq-trigger


Gaurav Sangamnerkar's picture

Yes. However, this will work only once crash dump is correctly configured, as explained in the links in my previous posts.
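A quick way to check that precondition before relying on a forced crash might look like the sketch below. It is only an approximation: it checks that the running kernel was booted with a `crashkernel=` reservation and that a crash kernel is loaded, as reported by /sys/kernel/kexec_crash_loaded. The function name `kdump_ready` and its messages are made up for illustration, and it does not test that a dump can actually be written.

```shell
# Report whether the basic kdump preconditions appear to be met.
# Both file arguments are optional and default to the live system paths.
kdump_ready() {
  cmdline_file=${1:-/proc/cmdline}
  loaded_file=${2:-/sys/kernel/kexec_crash_loaded}

  # Memory for the crash kernel must have been reserved at boot.
  if ! grep -q 'crashkernel=' "$cmdline_file" 2>/dev/null; then
    echo "no crashkernel= reservation on the kernel command line"
    return 1
  fi

  # The crash kernel must be loaded, otherwise a forced panic saves no dump.
  if [ "$(cat "$loaded_file" 2>/dev/null)" != "1" ]; then
    echo "crash kernel is not loaded"
    return 1
  fi

  echo "kdump looks ready"
}

# Typical use (commented out so the sketch has no side effects):
#   kdump_ready && echo "safe to rely on a forced crash dump"
```

If either check fails, forcing a crash with the sysrq trigger would most likely just panic the server without saving a dump.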

G


Zahid.Haseeb's picture

Yes, definitely :)
