After spending a little bit of time Sunday morning coding a C program to handle the command line arguments (with mixed results), I built a similar program using .NET (and MonoDevelop) in less than 10 minutes, with a file parser in less than 20.
Okay, the prototype doesn't do anything that stunning, but it counts the lines and returns that value along with a count of the bytes found in all lines (so it should be a little less than the file size, given we don't count the EOL characters).
Last night I was drawing in order to work out how best to store statistics on the parsed lines in memory. The following data hierarchy came out of this design session:
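To make the counting concrete, here is a minimal sketch of the same idea. It is written in Python rather than the .NET of the actual prototype, and the function name `count_lines_and_bytes` is made up for the illustration. It assumes lines end in either `\n` or `\r\n`.

```python
import io


def count_lines_and_bytes(stream):
    """Return (line_count, content_bytes) for a binary stream.

    content_bytes leaves out the end-of-line bytes, so it ends up
    slightly smaller than the file size.
    """
    line_count = 0
    content_bytes = 0
    for raw in stream:  # binary streams yield one line at a time, split on \n
        line_count += 1
        # Drop the line terminator (assumed to be \r\n or \n) before counting.
        if raw.endswith(b"\r\n"):
            raw = raw[:-2]
        elif raw.endswith(b"\n"):
            raw = raw[:-1]
        content_bytes += len(raw)
    return line_count, content_bytes


# Hypothetical sample: 17 bytes in total, 3 of which are EOL bytes.
sample = io.BytesIO(b"alpha\nbeta\r\ngamma")
print(count_lines_and_bytes(sample))
```

For a real file, the same function should work on `open(path, "rb")`.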
The 3 data structures are explained here:
- Line data: summary information and statistics on the content (all values = 1)
- Chunks: an intermediate level that avoids keeping every LineData in memory. Keeping them all could be done, but a mid-tier is also useful for feeding data out (a progress bar of some kind, etc.)
- File data: the overall data for the file, containing a summary and statistics
In detail, the 3 structures contain the same information (just more or less aggregated), so all in all we only have one data structure (for now):
So next we'll be interested in how we can populate this data and do the aggregation at the chunk and file levels.
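One way to picture a single structure reused at all three levels is the sketch below. It is in Python, not the .NET of the prototype, and every name in it (`Stats`, `line_count`, `byte_count`, `merge`, `aggregate`, the chunk size) is an assumption made for the illustration rather than the actual design. A line-level record counts one line, and chunk and file records are just sums of the records below them.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    """Hypothetical record shared by the line, chunk and file levels."""
    line_count: int = 0
    byte_count: int = 0

    @classmethod
    def from_line(cls, text: bytes) -> "Stats":
        # At line level the record stands for exactly one line.
        return cls(line_count=1, byte_count=len(text))

    def merge(self, other: "Stats") -> "Stats":
        # Aggregation is a field-by-field sum.
        return Stats(self.line_count + other.line_count,
                     self.byte_count + other.byte_count)


def aggregate(lines, chunk_size=2):
    """Fold lines into chunk records, then chunk records into one file record."""
    chunks = []
    current = Stats()
    for index, text in enumerate(lines, start=1):
        current = current.merge(Stats.from_line(text))
        if index % chunk_size == 0:  # chunk is full: store it, start a new one
            chunks.append(current)
            current = Stats()
    if current.line_count:  # keep the last, partially filled chunk
        chunks.append(current)
    file_stats = Stats()
    for chunk in chunks:
        file_stats = file_stats.merge(chunk)
    return chunks, file_stats


chunks, file_stats = aggregate([b"alpha", b"beta", b"gamma"])
print(chunks)
print(file_stats)
```

Only the chunk records are kept here; each line record is thrown away as soon as it has been merged, which is the memory saving the chunk level is meant to provide.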