Exact Data Match (EDM) Indexing Proximity Logic Demystified
The proximity logic for matching on multiple data elements with an Exact Data Match rule causes a lot of confusion, and I hope to demystify how it works with this article. I have always been a little unsure of it myself, so this post is backed by some basic testing that I performed to validate how token proximity actually works.
The current definition for the proximity setting on a detection server, EDM.SimpleTextProximityRadius, states:
Number of tokens to the left and to the right of the current token that are evaluated together when the proximity check is enabled.
The default value for this setting is 35.
A "token", as the term is used here, is not simply a character or a space, which seems to be the common misconception. Many people interpret the setting as "if data element 1 is within 35 characters of data element 2, a match will be detected, but if those elements are more than 35 characters apart, no match will be detected". This is not the case. A "token", in simple terms, can be thought of as a word or other string of characters separated by common delimiters such as a space or a tab. So what the setting really says is: "if both data elements fall within 35 tokens to the left and 35 tokens to the right of the token currently being read, a match is detected; if they do not fit in that window around any token, there is no match".
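To make the character-versus-token distinction concrete, here is a minimal Python sketch. It assumes that splitting on whitespace is a fair stand-in for tokenization; the detection engine's real tokenizer is not described in this article and may treat punctuation and other delimiters differently. The function name tokenize and the sample string are made up for illustration.

```python
def tokenize(text):
    """Split text on runs of whitespace (spaces, tabs, newlines).

    Assumption: whitespace splitting approximates the engine's tokenizer.
    """
    return text.split()


# Hypothetical sample: a tab and several spaces separate the words.
sample = "Smith\tpaid   with card 6011456734231982"
tokens = tokenize(sample)

# Distance measured in characters between the end of "Smith"
# and the start of the card number.
char_gap = sample.index("6011456734231982") - (sample.index("Smith") + len("Smith"))

# Distance measured in tokens: how many tokens sit between the two elements.
token_gap = tokens.index("6011456734231982") - tokens.index("Smith") - 1

print(tokens)
print("characters between:", char_gap)
print("tokens between:", token_gap)
```

Extra spaces or a tab change the character count but not the token count, which is why the proximity setting cannot be read as a character distance.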
Consider an EDM profile that contains Credit Card Numbers and Last Name, with a rule that is looking for both of those elements. If the data that is being evaluated looked like this:
"My last name is Smith and my current credit card number is 6011456734231982"
...then when this data is being evaluated, the detection engine will eventually look at the token "credit", and find that there is a Last Name that is 4 tokens to the left, and a Card Number that is 4 tokens to the right of the token being evaluated (the word "credit"), and as a result it will detect a match.
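The token counting in this example can be checked with a short Python sketch. It is only an illustration of the window idea, assuming whitespace tokenization; offsets_from and both_in_window are hypothetical helper names, not part of the product.

```python
def offsets_from(tokens, center, targets):
    """Return each target's signed distance in tokens from the center token.

    Negative values are to the left of the center, positive to the right.
    """
    c = tokens.index(center)
    return {t: tokens.index(t) - c for t in targets}


def both_in_window(tokens, center, targets, radius=35):
    """True if every target lies within `radius` tokens of the center token."""
    return all(abs(off) <= radius
               for off in offsets_from(tokens, center, targets).values())


text = "My last name is Smith and my current credit card number is 6011456734231982"
tokens = text.split()  # assumption: whitespace splitting stands in for tokenization
elements = ["Smith", "6011456734231982"]

offsets = offsets_from(tokens, "credit", elements)
print(offsets)                                    # Smith is 4 left, card number 4 right
print(both_in_window(tokens, "credit", elements))  # default radius of 35
```

With the default radius of 35, both elements sit comfortably inside the window around "credit". Passing radius=3 to both_in_window would put both elements just outside it.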
I find that a simpler way to think about this, rather than counting from the center token as illustrated above, is to multiply the EDM.SimpleTextProximityRadius by two and subtract one (to account for the current token being evaluated), and use the result as the maximum number of tokens that can sit between matching elements.
Max Tokens Between Elements to Detect a Match = (EDM.SimpleTextProximityRadius * 2) - 1 = (35 * 2) - 1 = 69
So with the default setting of 35 for this parameter, if there are 70 tokens between Last Name and Card Number, a match will not be detected. If there are 69 tokens or fewer between these elements, a match will be detected.
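As a sanity check on the arithmetic, the following Python sketch compares the shortcut formula with a brute-force model of the sliding window. The model is my own simplification (two elements match if some token position has both of them within the radius) and is not taken from the product's implementation. The function names are hypothetical.

```python
def max_tokens_between(radius):
    """Shortcut: (radius * 2) - 1 tokens may sit between two matching elements."""
    return 2 * radius - 1


def match_by_window(tokens_between, radius):
    """Brute-force model: is there a center token that has both elements
    within `radius` tokens of it?

    The first element is placed at position 0 and the second at
    position tokens_between + 1.
    """
    first = 0
    second = tokens_between + 1
    return any(abs(c - first) <= radius and abs(c - second) <= radius
               for c in range(first, second + 1))


RADIUS = 35  # default value of EDM.SimpleTextProximityRadius per the article

print(max_tokens_between(RADIUS))
for gap in (69, 70):
    print(gap, "tokens between ->", match_by_window(gap, RADIUS))
```

Under this model, 69 tokens between the elements still fit in a single window (centered on the 35th token between them), while 70 do not, which agrees with the formula.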
I hope you find this helpful in understanding this parameter better, and its effect on detection using Exact Data Match profiles. I welcome any comments or feedback.