Endpoint Management Community Blog

{CWoC} Post tag 0.2.0 review and preparation for the next tag (looking back & looking ahead)

Created: 27 Dec 2009 • Updated: 28 Dec 2009
Ludovic Ferre

As we did right after the previous tag (0.1.8) [1], it is time to take a break from coding to look back on what we have achieved and look ahead at what remains to be done before the next tag can be created.

Looking back

The following objectives were decided on (in two stages: the original post and a self-reply):

  1. Naming of the resulting program (generally v0xx)
  2. Naming of the main source file (currently prog.c)
  3. Externalization to purpose driven c files for major functions
  4. Implement a make file to handle the above changes?
  5. Logging out-source to a log_event function and handler
  6. Clean up code from global variables when possible
  7. Document the code and interfaces used as well as the flow
  8. Add a string cache

Steps 1-3 were completed easily. Step 4 is not yet required, as I have a compilation shell script in place; note that the script is under source control and uses gcc with -Wall so that compilation highlights any coding issues. Steps 5-8 are also complete, although the event logger functions are not as simple as they could be: they use a two-stage process. First, a string message is created with snprintf (printf'ing n chars into a variable), writing the message into the event logger's log_buff (so variables can be converted to strings easily thanks to printf's handling of variable-length parameters). Then log_event is called to output the message (the default destination is currently stderr).
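The two-stage process described above can be sketched as follows. This is a hypothetical illustration: the buffer size and the helper log_parsed_lines are assumptions, not the actual aila code; only log_buff and log_event are named in the text.

```c
#include <stdio.h>

/* Assumed buffer size for this sketch; the real value may differ. */
#define LOG_BUFF_SIZE 1024
static char log_buff[LOG_BUFF_SIZE];

/* Stage 2: output the prepared message (default is currently stderr). */
static void log_event(const char *msg)
{
    fprintf(stderr, "%s\n", msg);
}

/* Stage 1: snprintf the message into log_buff, letting printf's
 * variable-length parameter handling convert values to strings,
 * then hand the buffer to log_event. (Hypothetical helper name.) */
static void log_parsed_lines(int line_count)
{
    snprintf(log_buff, LOG_BUFF_SIZE, "parsed %d lines", line_count);
    log_event(log_buff);
}
```

The benefit of the split is that any caller can format arbitrary values into log_buff with one snprintf call, while the output destination stays in a single place (log_event) and can later be redirected without touching the call sites.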

Looking ahead

So what should we look at before our next tag, and also when will aila be feature complete -> i.e. which other features are desired or required?

Here's a quick list, with detailed answers (or suggestions, ideas and considerations) after:

  1. [n+] A Simple Linked List implementation (dynamic array)
  2. [n+] IP Address search in the IIS log files
  3. [++] Contextual data for guid and ip searches
  4. [n+] A Win32 branch to ensure code portability
  5. [++] Use of the linked list to store all data in memory
  6. [++] Creation of a shell to query in memory data

[n+] means this should be in before the next tagging takes place.
[++] means these are longer-term features, planned for future tags.

1. We need a simple linked list in order to stop using static arrays to store data. The problems related to statically dimensioned arrays are numerous, and can be seen in the string cache already: using a large array (8192 entries, aka 8 Ki) is quite wasteful if the parsed file doesn't contain many distinct entries (512 to ~1 Ki). This would be even worse if we plan for aila to handle 1,000,000 lines of log: even if we summarize each line to 32 bytes (and that's a stretch -> only 8 integers on 32-bit systems, 4 on 64-bit!) it would consume 32 MB of memory in all cases, even if the parsed file itself is only that size or less (highly plausible for many average NS implementations).

The simple linked list will be used as a never-ending array to which we can add entries at any time, without the requirement to size it before use. This should allow us to store parsed data (line summary and line detail pairs) in groups of 1,000 entries over n chunks (1,000 chunks in the case of a million lines, 10,000 for 10 million, etc.).

The simple linked list could be defined as:

  • A structure containing the HEAD entry (struct simple_list { void *, "HEAD" })
  • A function to add entries (int add_to_list( void * ) )
  • A function to initialize the list (int _init_list( struct simple_list * ) )

So we could use the list to accumulate data at a very low cost and retrieve it later (at the much higher cost of a full list traversal, something a standard array avoids with direct indexing). This could be used in the string cache to avoid the fixed-size array, and of course to store parse result chunks.
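A minimal sketch of the structure and functions listed above could look like this. The names simple_list, add_to_list and _init_list come from the bullet points; the node type and the tail pointer (kept so that appending stays cheap) are assumptions of this sketch, not the actual aila design.

```c
#include <stdlib.h>

/* One node of the list: an opaque data pointer plus the next link. */
struct list_node {
    void *data;
    struct list_node *next;
};

/* The structure containing the HEAD entry, per the definition above. */
struct simple_list {
    struct list_node *head;
    struct list_node *tail;  /* assumption: kept so appends are O(1) */
};

/* A function to initialize the list. */
int _init_list(struct simple_list *l)
{
    l->head = NULL;
    l->tail = NULL;
    return 0;
}

/* A function to add entries; no need to size anything before use. */
int add_to_list(struct simple_list *l, void *data)
{
    struct list_node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->data = data;
    n->next = NULL;
    if (l->tail != NULL)
        l->tail->next = n;
    else
        l->head = n;
    l->tail = n;
    return 0;
}
```

Appending is constant-time, so accumulating a million parse-result chunks costs no more per entry than the first one; only a traversal from head is needed when the data is read back.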

2. Implement a search to find IP addresses inside log entries. This shouldn't be too complex: even in the worst case an IP address is three dots interleaved in a sequence of 4 to 12 digits, so that is what we can match. What would be interesting is to store the IP address in its native form, i.e. a 32-bit unsigned integer, so we could store the value in a single int (on 32- and 64-bit platforms alike).

3. Contextual data gathering: this is planned for a future tag, but is required in order to make sense of the gathered data. Is the ip address related to the client, server or contained within the http parameters? Is the guid a resource guid, a package guid or any other guid? This should not be too hard to extract given the large data samples I have at hand (close to 1 GB of data).

4. A Win32 branch in order to provide native Win32 functionality, as most users won't have a Linux system at hand. This would also allow aila to run natively on production servers. Finally, cross-platform compiling can help clean up the code in some instances (and yes, it also brings some issues, hence the need to branch to get aila working on Win32 -> direct compilation currently works but we get some segmentation faults, most likely due to array bounds issues etc.).

5. As stated before, this is a long-term goal for aila. The vision is to give the user the option to run aila as a single parse-and-output program, or to run the parse and drop into a custom shell to query the gathered data and extract valuable information through the kind of iterative process only a human can drive (in order to pin-point problem areas, find the root cause, etc.).

6. This is the natural progression from step 5. There is more value in holding the parsed data in memory and querying it directly until the investigation is over than in running multiple parses with different command lines, especially considering that a 200 MiB file could be shrunk (parsed) down to about 20 MiB.

With that said there's only one thing to do now: go coding!