I recommend testing IDM on various datasets, as the results may be quite surprising. You need a baseline level of confidence in how matches are scored before you can measure results and write effective policies.
Try fingerprinting the SDLP documents and generating a number of test files to help you identify desirable results.
Consider using the SDLP 11.1 Admin Guide, which is 1,231 pages long. Copy 500 pages to a new document. Copy 25 pages. Copy 300 pages and edit 4 or 5 paragraphs in the middle of the text. Export the PDF to RTF, DOC, TXT, etc.
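The variants above can be scripted instead of built by hand. This is a minimal sketch, assuming you have first exported the source PDF to plain text; the file names, the lines-per-page figure, and the "EDITED:" marker are all illustrative assumptions, not anything SDLP requires.

```python
# Hypothetical test-file generator: carve page-sized slices out of a plain-text
# export of a large source document so each slice can be tested against the
# IDM index. LINES_PER_PAGE is a rough stand-in for one printed page.

LINES_PER_PAGE = 50  # assumption; tune to match your export

def make_variant(lines, start_page, page_count, edit_pages=0):
    """Return `page_count` pages starting at `start_page`; optionally
    rewrite some lines in the middle to simulate light manual editing."""
    start = start_page * LINES_PER_PAGE
    chunk = lines[start:start + page_count * LINES_PER_PAGE]
    if edit_pages:
        mid = len(chunk) // 2
        for i in range(mid, min(mid + edit_pages * LINES_PER_PAGE, len(chunk))):
            chunk[i] = "EDITED: " + chunk[i]  # crude stand-in for paragraph edits
    return "\n".join(chunk)

def build_test_set(source_text):
    """Build the three variants described above from one plain-text export."""
    lines = source_text.splitlines()
    return {
        "copy_500_pages.txt":  make_variant(lines, 0, 500),
        "copy_25_pages.txt":   make_variant(lines, 0, 25),
        "copy_300_edited.txt": make_variant(lines, 0, 300, edit_pages=5),
    }
```

Submitting each generated file through a monitored channel then tells you what similarity score IDM actually reports for each variant.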
How many pages would have to be copied into a target document to meet or exceed 10% similarity?
What percentage similarity rating would be identified if you converted the PDF file to MS Word?
What sort of severity or threshold would you set for a 40% similar document? 90%?
What should an incident handler do with a document or documents that are similar but not exact?
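For the first question above, a back-of-the-envelope calculation is easy to sketch. Note the assumption: this treats similarity as scaling linearly with page count, which is a simplification, since IDM scores against indexed content rather than counting pages.

```python
import math

TOTAL_PAGES = 1231  # length of the SDLP 11.1 Admin Guide

def pages_for_similarity(total_pages, threshold):
    """Smallest whole number of pages meeting or exceeding `threshold`
    (a fraction of the source), assuming similarity scales with pages --
    an illustrative simplification, not how IDM actually computes scores."""
    return math.ceil(total_pages * threshold)

print(pages_for_similarity(TOTAL_PAGES, 0.10))  # → 124 pages for 10%
```

The same helper gives rough page counts for the 40% and 90% thresholds mentioned below, which is a useful sanity check when deciding severities.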
I like IDM and I think it is a valuable detection method, but it should by no means be relied on as the sole or primary method for data loss protection or control. It has unfortunate limitations (for example, it cannot be used on the endpoint). Once you answer the questions above, you will be in a much better place to create policies and controls.
If you have access to a large number of documents that are very similar in content, you may want to consider using VML. The use of VML is worthy of another post/forum.