Dedupe ratio predictor

Created: 05 Aug 2013 | 3 comments
effiko

I am thinking of writing a small application in C++ (so that it is portable between Unix and Windows) that traverses the file tree to be backed up and computes an MD5 hash for each block, or for the whole file when it is smaller than a block. The block size would be a parameter.
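A minimal sketch of this hashing pass, written in Python for brevity rather than the C++ proposed above. The function name `hash_blocks` and the 4096-byte default block size are illustrative assumptions, not part of any existing tool. It uses fixed-size blocks; a file smaller than one block produces a single hash of its whole content.

```python
import hashlib
import os


def hash_blocks(root, block_size=4096):
    """Yield (path, block_index, md5_hex) for every block of every file under root.

    Hypothetical helper: fixed-size blocks, the last (short) block of a file
    is hashed as-is, and empty files yield nothing.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    index = 0
                    while True:
                        block = f.read(block_size)
                        if not block:
                            break
                        yield path, index, hashlib.md5(block).hexdigest()
                        index += 1
            except OSError:
                # Unreadable file (permissions, vanished during the walk): skip it.
                continue
```

Writing one hash per line (for example `print(md5_hex)` for each tuple) would give a plain text file that awk or a spreadsheet can consume.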

An incremental mode is not needed at first, since the tool only computes hashes, but it could be added in version 2 using a portable, free database.

All the hash values would then be processed with either Excel or awk to produce a histogram, which should give some idea of the expected dedupe ratio.
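As a rough illustration of that post-processing step, here is a sketch in Python (instead of Excel or awk) that turns a list of block hashes into a histogram and a ratio. The name `dedupe_estimate` and the definition "ratio = total blocks / unique blocks" are assumptions made for this example. The figure is only a first-order estimate, since it ignores compression and assumes the dedupe engine would chunk the data the same way.

```python
from collections import Counter


def dedupe_estimate(hashes):
    """Return (ratio, histogram) for an iterable of block hashes.

    ratio     = total blocks / unique blocks (1.0 means nothing dedupes)
    histogram = {n: number of distinct blocks that occur exactly n times}
    """
    occurrences = Counter(hashes)          # hash -> how many times it was seen
    total = sum(occurrences.values())
    unique = len(occurrences)
    ratio = total / unique if unique else 1.0
    histogram = dict(Counter(occurrences.values()))
    return ratio, histogram


# Six blocks, three distinct contents: "a" seen 3 times, "b" once, "c" twice.
ratio, histogram = dedupe_estimate(["a", "a", "a", "b", "c", "c"])
print(ratio)  # 2.0
```

A histogram dominated by blocks that occur only once suggests little to gain from deduplication; a long tail of frequently repeated blocks suggests the opposite.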

I wonder whether anybody has already done this, or whether you see something wrong with the idea.

I'd appreciate your input.

Comments

effiko

Thanks Nagalla,

The post you mentioned describes a post-backup analysis of images already in the DataDomain. What I have in mind is a program that predicts the dedupe ratio before any backup or dedupe engine is installed on the client's premises, more like a presales tool.

RamNagalla

Good idea... it would really help to predict the storage requirements as well, to some level.