Video Screencast Help
Search Video Help Close Back
to help
Not able to make it to Vision this year? Get a sampling in the Best of Vision on Demand group.

{CWoC 2010} aila 0.5.0 Samples: Using the Query Shell, Part II

Updated: 21 Apr 2011
ludovic_ferre's picture
0 0 Votes
Login to vote

So, we are reaching the final part of the aila 0.5.0 documentation. Using the query shell part II.

To start this second round of detailed look at the query-shell we'll use a real life scenario here on Symantec Connect: the xmlbasehttpcallback problems [https://www-secure.symantec.com/connect/forums/seeing-lot-basexmlhttpcallback-exceptions].

In short, the problem is that some Altiris Agent are sending empty xml string to the Task Server in infinite loop.

Here's what we can see from the IIS log files, and how we can drill into the root cause to find the affected clients by IP address or ResourceGuid.

 ludovic@smallisbeautiful:~/PROD_003/SW-SRC/altiris-ns-tooling/aila/payload$ aila -f iis_6_logs/ex110418.log -qs -n0
Current line #: 1180000
Program read 206128068 bytes from 1183545 lines

LOG FILE ANALYSIS:: SUMMARY

Mime type results:
<< data truncated here >>

Altiris Agent requests results:
<< data truncated here >>

IIS Web-applications results:
    Webapp= /Altiris/NS/Agent/, dir hits =   129775
    Webapp= /Altiris/NS/      , dir hits =     2007
    Webapp= /Altiris/IRA[1]/  , dir hits =     4410
    Webapp= /Altiris/Packages/, dir hits =    19743
    Webapp= /Altiris/CTS[4]/  , dir hits =  1008779
    Webapp= /Altiris/TaskMgmt/, dir hits =    13882
    Webapp= /Altiris/         , dir hits =     1942
    Webapp= Other             , dir hits =     2999

We can see on the web-app statistics that the ClientTaskServer is causing most of the workload, with a whooping 1 million hits (for 1.1M requests on the log file).

At this rate we are almost assured that the detailed statistics will show us the top hitters ip address and computer guid:

LOG FILE ANALYSIS:: DETAILED

IIS status code results:
    IIS Status code= Success  (1xx,2xx), hits count =  1180327
    IIS Status code= Redirected   (3xx), hits count =      162
    IIS Status code= Client error (4xx), hits count =     1041
    IIS Status code= Server error (5xx), hits count =     2007

IIS sub-status code results:
    Sub Status code= 0, hits count =  1182562
    << data truncated here >>

IIS Win32 status code results:
    Win32 Status code= Win32 Success    , hits count =  1174061
    Win32 Status code= Win32 Failure    , hits count =     9476

24 hour hit counters:
    Hits counted during hour  7 to  8  was   31947
    Hits counted during hour  8 to  9  was  106662
    Hits counted during hour  9 to 10  was  126182
    Hits counted during hour 10 to 11  was  129480
    Hits counted during hour 11 to 12  was  144856
    << data truncated here >>

Ip cache top 20 entries:
    14901, 192.168.10.51
    12661, 169.254.3421
    10866, 192.168.20.58
    10451, 192.168.2.134
    10233, 192.168.72.154
    << data truncated here >>

Guid cache top 20 entries:
    14762, 965c234c-9e45-4d19-8f83-bb9970f401d5
    10845, 62bd92f5-b1bb-41eb-899d-90e08ee2dbcb
    10350, 6a52a043-1594-41d5-bd17-669533ff6e83
    10162, 83cbe1f5-7c2d-4071-976f-fb522f699f23
    << data truncated here >>

But for the purpose of drilling into the tool functionalities and to share some troubleshooting tips we'll go onto the query-shell to verify what we are seeing (confirm a hunch or double checking that what we think is backed by hard evidences).

So we'll drill into the CTS stats to see how bad the basexmlhttpcallback has an impact on the server.

So first let's get the filter help, to see what options are available.

Enter a command (help or ?) or exit to quit:
aila-qs/:filter help

Currently supported commands are:

    print: display the current filter specifications

    clear: reset the filter to default value (c-ip, no string)
    
    not: switch (false -> true, true -> false) the exclude entries matching
        the current filter from exec commands
    
    not <field name> <value>: set an exclude filter explicitly

    
    help <field name>: to get help on the filter field specified, for exam-
        ple use 'filter help uri-webapp' to get the help message for the
        uri-webapp filter.

    <field name> <value>

        Global filter:
            top n
            from n

        Standard filter:
            c-ip <client_ip>
            uri-agent [0..6]
            uri-webapp [0..10]
            uri-mime [0..3]
            guid <guid>
            iis-status n
            win32-status n
            time-taken n

    The 2 global filters can be combined with the standard filter, so you
    can specify where to start from, how much data to look at and other keys
    keys.

Given we're interested in the ClientTaskServer details we can get the syntax using 'filter help uri-webapp' from the shell root:

aila-qs/:filter help uri-webapp

    Field name: uri-webapp
    Values:
         0: Altiris Agent         1: NSCAP
         2: NS Console             3: Resource Manager
         4: Patch Inventory Rule     5: SWD Packages
         6: SWD Portal             7: Client Task Agent
         8: Client Task Server         9: Task Management
        10: Other Altiris apps        11: Other directories

    Sample: 'filter uri-agent 6' to match Software Portal requests only.

The identifier for the CTS web-app is 8, so we can craft the filter and verify it using the following commands:

aila-qs/:filter uri-webapp 8
aila-qs/:filter print
uri-webapp 8

Now that our filter is defined as we need, we can stat the store with the filter applied, so we can see information specific to this web-app only:

aila-qs/:exec stat store
Processed 1008779 requests from the store.

LOG FILE ANALYSIS:: SUMMARY

Mime type results:
    File type= aspx , page hits=  1008779

Altiris Agent requests results:

IIS Web-applications results:
    Webapp= /Altiris/CTS[4]/  , dir hits =  1008779

[1] IRA is an abbreviation of InventoryRuleManagement/Agent
[2] SWD is an abbreviation of SWD/SWDWin32/SWPortal
[3] CTA is an abbreviation of ClientTaskAgent
[4] CTS is an abbreviation of ClientTaskServer

LOG FILE ANALYSIS:: DETAILED
    << data truncated here >>

24 hour hit counters:
    Hits counted during hour  7 to  8  was   20263
    Hits counted during hour  8 to  9  was   59116
    Hits counted during hour  9 to 10  was  116957
    Hits counted during hour 10 to 11  was  113101
    Hits counted during hour 11 to 12  was  130859
    << data truncated here >>

This shows quite a lot of incoming requests from the task agent.

Next then, we can take a look at a sample set of requests. To do this we use the "from" keyword, which as the name indicates allow the exec commands to start at a specific location.

Here we will start at ~10:00, using the start position at request 300,000 (using the 24 hour hit counter data above). Given we'll print the output to the console we also use the "top" keyword to avoid flooding the console.

aila-qs/:
aila-qs/:filter top 50
aila-qs/:filter from 300000
aila-qs/:exec print store
300000, 192.168.17.196, 166, /Altiris/CTS[4]/, (null), aspx, 20110328, 101545, Success  (1xx,2xx), 0, 0, 5d1d341b-86fc-4d45-956b-2a466fee0f1f
300001, 192.168.15.102, 166, /Altiris/CTS[4]/, (null), aspx, 20110328, 101545, Success  (1xx,2xx), 0, 0, 325b9b88-8ab3-4fb2-a083-c23e8c7427c1
300002, 192.168.18.186, 166, /Altiris/CTS[4]/, (null), aspx, 20110328, 101545, Success  (1xx,2xx), 0, 0, d888e779-a0b7-40ac-b265-37d6b43a7934
300007, 192.168.67.68, 165, /Altiris/CTS[4]/, (null), aspx, 20110328, 101545, Success  (1xx,2xx), 0, 0, e10f5d20-d8fd-47f2-86a3-8660e7aef929
<< data truncated here >>
300075, 192.168.72.154, 166, /Altiris/CTS[4]/, (null), aspx, 20110328, 101547, Success  (1xx,2xx), 0, 0, 5377e478-49ee-40b3-8c42-e744c5873fc3
aila-qs/:quit
Brought to you by {Connect Winter of Code}

Now to be 100% accurate we could filter the input file to ensure it only contains TaskManagement requests. This is done with the Unix tool grep (available by default on cygwin).

But to parse the resulting file with aila we need to make sure the resulting file contains the header (which contains the w3c schema details needed by aila).

So we start creating the file with the 'head' command and then we append to it the matches from grep:

ludovic@smallisbeautiful:~/PROD_003/SW-SRC/altiris-ns-tooling/aila/payload$ head -n 4 iis_6_logs/ex110418.log > cts_only
ludovic@smallisbeautiful:~/PROD_003/SW-SRC/altiris-ns-tooling/aila/payload$ grep -i "ClientTaskServer" iis_6_logs/ex110418.log >> cts_only

And off we go again with aila.

ludovic@smallisbeautiful:~/PROD_003/SW-SRC/altiris-ns-tooling/aila/payload$ aila -f cts_only -n0 -qs
Current line #: 1010000
Program read 167569311 bytes from 1010226 lines

LOG FILE ANALYSIS:: SUMMARY

Mime type results:
    File type= aspx , page hits=  1010222

Altiris Agent requests results:

IIS Web-applications results:
    Webapp= /Altiris/CTS[4]/  , dir hits =  1008779
    Webapp= /Altiris/TaskMgmt/, dir hits =     1443

 << data truncated here >>

Ip cache top 20 entries:
    14763, 192.168.10.51
    10845, 192.168.20.58
    10350, 192.168.2.134
    10162, 192.168.14.154
    10158, 192.168.72.154
    <<data truncated here>>

Guid cache top 20 entries:
    14762, 965c234c-9e45-4d19-8f83-bb9970f401d5
    10845, 62bd92f5-b1bb-41eb-899d-90e08ee2dbcb
    10350, 6a52a043-1594-41d5-bd17-669533ff6e83
    10162, 83cbe1f5-7c2d-4071-976f-fb522f699f23
    10158, 5377e478-49ee-40b3-8c42-e744c5873fc3
    <<data truncated here>>

Enter a command (help or ?) or exit to quit:

As a sanity check I normally filter out the data I want to work with, to verify if there's anything else (false positive or any other kind of pollution). That we use the same filter as before, with the addition of the 'not' keyword:

Enter a command (help or ?) or exit to quit:
aila-qs/:filter not uri-webapp 8
aila-qs/:filter print
uri-webapp not 8
aila-qs/:exec stat store
Processed 1443 requests from the store.

LOG FILE ANALYSIS:: SUMMARY

Mime type results:
    File type= aspx , page hits=     1443

Altiris Agent requests results:

IIS Web-applications results:
    Webapp= /Altiris/TaskMgmt/, dir hits =     1443

[1] IRA is an abbreviation of InventoryRuleManagement/Agent
[2] SWD is an abbreviation of SWD/SWDWin32/SWPortal
[3] CTA is an abbreviation of ClientTaskAgent
[4] CTS is an abbreviation of ClientTaskServer

LOG FILE ANALYSIS:: DETAILED

IIS status code results:
    IIS Status code= Success  (1xx,2xx), hits count =     1443

IIS sub-status code results:
    Sub Status code= 0, hits count =     1443

IIS Win32 status code results:
    Win32 Status code= Win32 Success    , hits count =     1426
    Win32 Status code= Win32 Failure    , hits count =       17

24 hour hit counters:
    Hits counted during hour  7 to  8  was     119
    Hits counted during hour  8 to  9  was     344
    << data truncated here >>

This shows 1,443 entries that are not related to the Client Task Server. This should be explained by the grep command use, as it matches any requests with "TaskManagement", which is valid for the TaskManagement webapp.

A quick print out to the console allows us to verify this (so we can safely discard the false positive as a user fault, rather than a tool or server problem):

aila-qs/:filter top 20
aila-qs/:exec print store
669, 169.254.3421, 185, /Altiris/TaskMgmt/, (null), aspx, 20110328, 74421, Success  (1xx,2xx), 0, 0,
738, 169.254.3421, 185, /Altiris/TaskMgmt/, (null), aspx, 20110328, 74422, Success  (1xx,2xx), 0, 0,
<< data truncated here >>
1057, 169.254.3421, 185, /Altiris/TaskMgmt/, (null), aspx, 20110328, 74424, Success  (1xx,2xx), 0, 0,
1123, 169.254.3421, 185, /Altiris/TaskMgmt/, (null), aspx, 20110328, 74424, Success  (1xx,2xx), 0, 0,
aila-qs/:quit

We can satisfy ourselves that the errors are so marginal they shouldn't impact the ip or guid cache in any way. So we can go back to the shell and use the "print" commands to list the first 10,50 or 500 ip address or guid, as you find suitable.

Finally, should you need to get all the ip or guid from the aila analysis remember that the cache_dump.csv file is created by default, so you can use that directly too ;).

Should you need more samples or details on the tool and usage, please come back to me here on Connect!