Endpoint Management Community Blog

{CWoC} A Good Knock on the Performance Issue, But It Did Not Come from Any Indexing Yet

Created: 30 Dec 2009 • Updated: 30 Dec 2009
Ludovic Ferre

In my previous blog post (here) and my own reply to it, I was looking for a way to undo the negative impact a large cache had on performance.

It included some really wacky ideas (front and rear indexing, for example) and some better ones (storing IPs in their original form, i.e. as 32-bit unsigned integers). I spent a lot of time thinking about the best possible indexing (tree implementations, sorted lists, and block memory allocation to limit the calls to malloc). After a walk this morning (I went to the weekly street market to refill my basket with fresh produce) I concluded that performance comes first, so I decided to revert the cache store to the basic array used originally.

With that decided, I knew I would gain some performance over the linked list implementation, but I also wanted to build on the experience and implement some mechanism to reduce the overall time-to-delivery of a parsed file result.

So with the single array I took up the front/rear idea to implement a kind-of-sorted list: guids are stacked from the array head, and IP addresses are stacked from the array tail. I also moved to storing IP addresses in native form, converting the dotted-decimal representation to a 32-bit int before storing the value in lieu of a pointer in the array. This again saves calls to malloc, as well as processing time.
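A minimal sketch of that head/tail layout, under stated assumptions: the names `cache_t`, `slot_t`, `ip_to_u32`, `cache_find_or_add_ip` and `cache_find_or_add_guid` and the capacity `CACHE_SLOTS` are hypothetical and are not taken from the aila source. It only illustrates the idea of one array filled from both ends, with IPs stored by value rather than behind a pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define CACHE_SLOTS 8 /* hypothetical capacity, kept tiny for illustration */

/* One slot holds either a pointer to a guid string or a raw IPv4 value. */
typedef union {
    const char *guid;
    uint32_t    ip;
} slot_t;

typedef struct {
    slot_t slot[CACHE_SLOTS];
    size_t head; /* guids occupy indexes [0, head) */
    size_t tail; /* IPs occupy indexes [tail, CACHE_SLOTS) */
} cache_t;

static void cache_init(cache_t *c)
{
    assert(c != NULL);
    c->head = 0;
    c->tail = CACHE_SLOTS;
}

/* Parse "a.b.c.d" into a 32-bit value; returns 0 on success, -1 on error. */
static int ip_to_u32(const char *s, uint32_t *out)
{
    unsigned a, b, c, d;
    char extra;
    /* Exactly four fields must match; a fifth match means trailing junk. */
    if (sscanf(s, "%3u.%3u.%3u.%3u%c", &a, &b, &c, &d, &extra) != 4)
        return -1;
    if (a > 255 || b > 255 || c > 255 || d > 255)
        return -1;
    *out = ((uint32_t)a << 24) | ((uint32_t)b << 16) |
           ((uint32_t)c << 8) | (uint32_t)d;
    return 0;
}

/* Return the slot index of ip, stacking it at the tail if new; -1 if full. */
static long cache_find_or_add_ip(cache_t *c, uint32_t ip)
{
    for (size_t i = CACHE_SLOTS; i-- > c->tail; )
        if (c->slot[i].ip == ip)
            return (long)i;
    if (c->tail == c->head)
        return -1; /* the two stacks have met */
    c->slot[--c->tail].ip = ip; /* stored by value: no malloc needed */
    return (long)c->tail;
}

/* Return the slot index of guid, stacking it at the head if new; -1 if full.
 * The caller must keep the guid string alive; it is not copied here. */
static long cache_find_or_add_guid(cache_t *c, const char *guid)
{
    for (size_t i = 0; i < c->head; i++)
        if (strcmp(c->slot[i].guid, guid) == 0)
            return (long)i;
    if (c->head == c->tail)
        return -1;
    c->slot[c->head].guid = guid;
    return (long)c->head++;
}
```

In this sketch an IP lookup only walks the tail region and a guid lookup only walks the head region, so neither search pays for entries of the other kind.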

A final change made possible by the two-dimensional array (did I mention this before? One dimension holds the data or data pointer and the other holds the hit counter) was to drop the string cache entry structure (sce), so the processing is simpler and there are fewer operations in the current version of the code (I need to review my plans, but I think 0.2.3 is going to be the next tag :D).

Finally, the two-dimensional array with its hit counters gives us a good option to improve performance, and possibly to sort the array to some extent. I'm wondering how much would be gained by moving the top hitters to the bottom of the stack, so that a stack search would find them quicker.
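One cheap way to get such a partial ordering, shown here only as a sketch, is the classic transpose heuristic: on each hit, swap the entry with its neighbour on the search-start side if it now has more hits. This is a swapped-in technique for illustration and not necessarily what aila will end up doing. The names `ip_cache`, `ip_cache_hit`, `DATA`, `HITS` and `MAX_ENTRIES` are hypothetical, and the sketch assumes the search starts at index 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ENTRIES 2000 /* hypothetical capacity */

enum { DATA = 0, HITS = 1 }; /* column 0: value, column 1: hit counter */

typedef struct {
    uint32_t entry[MAX_ENTRIES][2];
    size_t   count;
} ip_cache;

/* Record one occurrence of ip. Returns the index where the entry ends up,
 * or -1 if the cache is full. */
static long ip_cache_hit(ip_cache *c, uint32_t ip)
{
    assert(c != NULL);
    for (size_t i = 0; i < c->count; i++) {
        if (c->entry[i][DATA] != ip)
            continue;
        c->entry[i][HITS]++;
        /* Transpose step: move one slot closer to the search start
         * when this entry has overtaken its neighbour. */
        if (i > 0 && c->entry[i][HITS] > c->entry[i - 1][HITS]) {
            uint32_t d = c->entry[i - 1][DATA];
            uint32_t h = c->entry[i - 1][HITS];
            c->entry[i - 1][DATA] = c->entry[i][DATA];
            c->entry[i - 1][HITS] = c->entry[i][HITS];
            c->entry[i][DATA] = d;
            c->entry[i][HITS] = h;
            return (long)(i - 1);
        }
        return (long)i;
    }
    if (c->count == MAX_ENTRIES)
        return -1;
    c->entry[c->count][DATA] = ip;
    c->entry[c->count][HITS] = 1;
    return (long)c->count++;
}
```

The appeal of this approach is that it costs at most one swap per hit and needs no separate sorting pass; the trade-off is that a heavy hitter only drifts toward the search start one slot at a time.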

Anyway, we will see, but I suspect I can spare some cycles for this: a big (cache) hitter located near the top of the stack when 2,000 entries are present can rapidly waste a million search cycles (only 1,000 hits are needed to waste a million cycles when the cached entry sits at position 1,000).
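The arithmetic behind that estimate can be written down as a small helper, assuming a linear scan that starts at index 0 and counting one comparison per slot visited. The function name `scan_cost` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Total comparisons for a linear scan from index 0:
 * an entry at index i costs (i + 1) comparisons for each of its hits. */
static unsigned long long scan_cost(const unsigned long *hits, size_t n)
{
    assert(hits != NULL || n == 0);
    unsigned long long total = 0;
    for (size_t i = 0; i < n; i++)
        total += (unsigned long long)hits[i] * (i + 1);
    return total;
}
```

With 1,000 hits on the entry at index 999 (position 1,000) this returns 1,000,000; with the same entry moved to index 0 it returns 1,000.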

Finally, some data on the current version of aila (on my 32-bit Xubuntu):

ludovic@xubuntu-laptop:~/dev/altiris-ns-tooling/aila$ time ./aila-linux-0.2.2-long --file ../DATA/ex091207.log > /dev/null

real    0m30.784s
user    0m30.282s
sys     0m0.484s

ludovic@xubuntu-laptop:~/dev/altiris-ns-tooling/aila$ time ./aila-linux-0.2.2-long --file ../DATA/ex091207.log > /dev/null

real    0m31.890s
user    0m31.274s
sys     0m0.620s

ludovic@xubuntu-laptop:~/dev/altiris-ns-tooling/aila$ time ./aila-linux-0.2.2-int --file ../DATA/ex091207.log > /dev/null

real    0m31.879s
user    0m31.278s
sys     0m0.600s

ludovic@xubuntu-laptop:~/dev/altiris-ns-tooling/aila$ time ./aila-linux-0.2.2-long --file ../DATA/ex091207.log > /dev/null

real    0m33.422s
user    0m32.594s
sys     0m0.780s

And here's the full output (no cache dump available for public view yet):

Program read 173584167 bytes from 988232 lines

Mime type analysis summary results:
File type= htm , page hits= 5660
File type= js , page hits= 1008
File type= css , page hits= 438
File type= aspx, page hits= 539084
File type= asmx, page hits= 31500
File type= other, page hits= 410534

Altiris Agent request analysis summary results:
Agent request= Reg Client, page hits= 113
Agent request= Get Policies, page hits= 43821
Agent request= Get Pkg Info, page hits= 31944
Agent request= Get Snapshot, page hits= 459178
Agent request= Post Event , page hits= 296685
Agent request= Other, page hits= 156483

IIS Web-applications analysis summary results:
Webapp= /Altiris/NS/Agent/, dir hits = 832048
Webapp= /Altiris/NS/NSCap/, dir hits = 127
Webapp= /Altiris/NS/, dir hits = 32722
Webapp= /Altiris/Resource/, dir hits = 196
Webapp= /Altiris/IRA[1]/, dir hits = 10041
Webapp= Others, dir hits = 113090
[1] IRA is an abbreviation of InventoryRuleManagement/Agent

Detailed IIS status code analysis results:
IIS Status code= Success (1xx,2xx), hits count = 972126
IIS Status code= Redirected (3xx), hits count = 15465
IIS Status code= Client error (4xx), hits count = 621
IIS Status code= Server error (5xx), hits count = 12

Detailed IIS status code analysis results:
Sub Status code= 0, hits count = 988223
Sub Status code= 9, hits count = 1

Detailed IIS Win32 status code analysis results:
Win32 Status code= Win32 Success, hits count = 967105
Win32 Status code= Win32 Failure > 0, hits count = 21119

24 hour hit counters:
Hits counted during hour 0 to 1 was 27845
Hits counted during hour 1 to 2 was 23114
Hits counted during hour 2 to 3 was 27810
Hits counted during hour 3 to 4 was 20752
Hits counted during hour 4 to 5 was 24887
Hits counted during hour 5 to 6 was 20665
Hits counted during hour 6 to 7 was 29071
Hits counted during hour 7 to 8 was 44988
Hits counted during hour 8 to 9 was 82585
Hits counted during hour 9 to 10 was 80390
Hits counted during hour 10 to 11 was 62526
Hits counted during hour 11 to 12 was 65265
Hits counted during hour 12 to 13 was 85627
Hits counted during hour 13 to 14 was 61712
Hits counted during hour 14 to 15 was 39595
Hits counted during hour 15 to 16 was 48961
Hits counted during hour 16 to 17 was 42308
Hits counted during hour 17 to 18 was 29446
Hits counted during hour 18 to 19 was 29471
Hits counted during hour 19 to 20 was 24499
Hits counted during hour 20 to 21 was 26200
Hits counted during hour 21 to 22 was 20509
Hits counted during hour 22 to 23 was 31691
Hits counted during hour 23 to 24 was 38307

Brought to you by {Connect Winter of Code}