Keyphrase rules in Symantec DLP have two parts: a keyword list and proximity. Keyword lists are good for very distinctive terms but should not be used for common words, as the matches will unnecessarily inflate and obscure the match count. Use proximity for the best combination of recall and precision with keyphrases.
Proximity is defined by two sets of expression lists (A and B). You can have as many combinations of expression lists as you like in a single ruleset. Also consider ANDing keyword expression lists for even greater precision.
I have personally tested a distance of 50 words for high recall but usually rely on a distance between 10 and 25 words. Tools such as nearest-neighbor or n-gram analysis can be used to identify keyword proximity combinations. Any future release should consider recommending words and distances.
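As a rough illustration of the n-gram style of analysis mentioned above, the following Python sketch counts which words appear within a fixed window of a seed keyword across a set of sample documents. It is not part of Symantec DLP; the function name `cooccurring_words`, the sample documents, and the simple `\w+` tokenizer are all assumptions made for the example.

```python
import re
from collections import Counter

def cooccurring_words(documents, seed, window):
    """Count words found within `window` tokens of `seed` (hypothetical helper)."""
    seed = seed.lower()
    counts = Counter()
    for doc in documents:
        # Assumption: a "word" is any run of letters, digits, or underscores.
        tokens = re.findall(r"\w+", doc.lower())
        for i, tok in enumerate(tokens):
            if tok != seed:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[tokens[j]] += 1
    return counts

# Made-up sample documents, for illustration only.
docs = [
    "salary report for employee 1001",
    "employee salary and bonus",
    "lunch menu for friday",
]
candidates = cooccurring_words(docs, "salary", 3)
print(candidates.most_common(5))
```

Words that show up often near the seed are candidates for the second expression list, and the window at which they start to appear gives a starting point for the distance value.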
The maximum distance value is 99. The higher the number, the greater the recall.
30 is the default value for EDM.
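To make the idea of two expression lists and a word distance concrete, here is a minimal Python sketch of proximity matching. It is an illustration only, not how Symantec DLP is implemented: the function `proximity_matches`, the sample lists, and the choice to measure distance as the difference between token positions are assumptions, and the product may count distance differently.

```python
import re

def proximity_matches(text, list_a, list_b, max_distance):
    """Return (term_a, term_b, distance) for list A / list B terms that are close together."""
    # Assumption: words are runs of letters, digits, or underscores, compared case-insensitively.
    tokens = re.findall(r"\w+", text.lower())
    set_a = {term.lower() for term in list_a}
    set_b = {term.lower() for term in list_b}
    positions_a = [i for i, tok in enumerate(tokens) if tok in set_a]
    positions_b = [i for i, tok in enumerate(tokens) if tok in set_b]
    matches = []
    for i in positions_a:
        for j in positions_b:
            # Assumption: distance is the difference in token positions, in either direction.
            if i != j and abs(i - j) <= max_distance:
                matches.append((tokens[i], tokens[j], abs(i - j)))
    return matches

# Hypothetical expression lists and sample text.
list_a = ["merger", "acquisition"]
list_b = ["confidential"]
sample = "The merger with Acme is confidential until the announcement next week"

print(proximity_matches(sample, list_a, list_b, 10))
print(proximity_matches(sample, list_a, list_b, 3))
```

With the larger distance the pair is reported, and with the smaller one it is not, which is the recall trade-off described above: widening the distance catches more pairs.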
Some proximity rulesets may need to look in only one direction (forward or reverse). If this is the case, they can be defined with a regular expression.
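One way such a one-directional rule could be written is sketched below, using Python's `re` module to build and check the pattern. The helper `forward_proximity_pattern` and the example terms are hypothetical, and the regular expression dialect accepted by the DLP product may differ from Python's, so treat the pattern as a starting point to adapt and test.

```python
import re

def forward_proximity_pattern(first, second, max_words_between):
    """Build a regex that matches `first` followed by `second` with a limited gap."""
    # Up to N intervening words, each preceded by at least one non-word character.
    gap = r"(?:\W+\w+){0,%d}?" % max_words_between
    pattern = (
        r"\b" + re.escape(first) + r"\b"
        + gap
        + r"\W+" + re.escape(second) + r"\b"
    )
    return re.compile(pattern, re.IGNORECASE)

# Hypothetical terms: "password" must come first, then "reset".
forward = forward_proximity_pattern("password", "reset", 3)

print(bool(forward.search("password for the admin reset")))  # three words in between
print(bool(forward.search("reset the password")))            # wrong direction
```

Swapping the two terms gives the reverse direction. Note that `\w+` treats a word such as "don't" as two words, which slightly changes how the gap is counted.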
I rely on proximity more than any other method in SDLP 11.x because it has the best combination of recall and precision in a single method, and it overcomes the significant drawback of plain old keyword lists, which get counted for every single occurrence.