DLP EndPoint Regular Expression Parser

Created: 23 Jun 2011 • Updated: 23 Jun 2011 | 4 comments
This issue has been solved. See solution.

I recently ran into an issue with a policy definition that used a Regular Expression to inspect content being “sent” from a PC to a USB-attached media device. The same RegEx is used in policies deployed to all of the other available Monitor and Prevent capabilities with no issue. With the assistance of Symantec Support, we discovered that the EndPoint parser does not support the (?i) ignore-case flag and instead requires a “simple match” that spells out both cases of each letter, e.g. [aA]. The existing documentation on DLP and RegEx is fairly light, and given this apparent difference in parsing there is a clear need for richer content covering RegEx support and best practices.
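
As an illustration of the [aA] workaround described above, here is a minimal Python sketch that rewrites a literal keyword into a pattern with both cases spelled out, so it does not depend on an inline ignore-case flag such as (?i). The helper name expand_case is hypothetical and not part of any Symantec tooling, and the output has only been checked against Python's own regex engine here, not against the EndPoint parser.

```python
import re
import string

def expand_case(literal: str) -> str:
    """Turn a literal keyword into a regex that matches any letter case
    without using an inline ignore-case flag.

    Hypothetical helper for illustration only; verify the generated
    pattern in the DLP console before relying on it.
    """
    parts = []
    for ch in literal:
        if ch in string.ascii_letters:
            # Spell out both cases explicitly, e.g. "a" -> "[aA]"
            parts.append(f"[{ch.lower()}{ch.upper()}]")
        else:
            # Escape everything else so it is matched literally
            parts.append(re.escape(ch))
    return "".join(parts)

print(expand_case("ab"))  # [aA][bB]
```

Only ASCII letters are expanded; any other character is escaped and matched literally. Long keywords produce long patterns, so this is probably best kept for short terms.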

  • Is there any documentation that outlines differences in RegEx parsers for each of the DLP agents (Network, EndPoint, File Discover, SharePoint Discover, etc.)?
  • Is there any documentation that addresses best practices and examples for building proper RegEx’s for each agent?
  • Does Symantec recommend any RegEx validation utilities when these types of complex expressions are needed for policy definition?

This information would be very valuable to the community as a whole.

xlloyd's picture

Hi John,

These regex guidelines can be found in the DLP Administration guide document that comes with the software. They don't provide a tutorial or anything but there is a syntax list.

In the online training for DLP they suggest RegexBuddy, but I found that this site works pretty well for me:
http://www.regextester.com/

I believe that it uses the Javascript engine but I'm really not 100% sure about that. Hopefully someone else can speak to that.

Regards
~Xavier

If this post has helped you, please vote up or mark as solution
xlloyd's picture

Look at this post in the Connect forum for more info. It seems I was right about the Java engine for all sections except Endpoint, which apparently uses the "Boost" engine. I've never heard of it and I'm not sure what tools can be used to test it. It may even have changed since that version for all I know. The post is a good read in terms of resources though. Check it out.

https://www-secure.symantec.com/connect/forums/exp...

Keith Reynolds - ExchangeTek's picture

Xavier is dead on about that.  Endpoint DOES use the Boost regex engine, whereas detection servers use the Java engine.  In MOST cases, these are pretty similar and that regextester.com site gives a pretty good representation of the results you'll see with both.

I recall running into problems with differences between the two regex implementations with a customer of mine some time ago.  It had to do with support for positive lookaheads (or maybe it was negative lookbehinds) in Boost (one or the other didn't seem to be supported), so be on the lookout for that if you're using those constructs in your regex.  To date I have never found a good online tester for the Boost implementation of Regular Expressions.
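
One way to act on that warning, given the lack of a Boost tester, is to scan policy patterns for the constructs this thread flags as engine-sensitive before deploying them to Endpoint. The sketch below is a rough Python lint. The list of checks is an assumption based only on what is mentioned in this thread (inline flags, lookaheads, lookbehinds), not an official compatibility matrix, and the function name portability_warnings is hypothetical.

```python
import re
from typing import List

# Constructs mentioned in this thread as possibly behaving differently
# between the Endpoint and detection-server parsers.
# Assumption: illustrative list only, not an official compatibility matrix.
CHECKS = {
    "inline flag": re.compile(r"\(\?[a-zA-Z-]+[:)]"),  # e.g. (?i) or (?i:...)
    "lookahead": re.compile(r"\(\?[=!]"),              # (?=...) or (?!...)
    "lookbehind": re.compile(r"\(\?<[=!]"),            # (?<=...) or (?<!...)
}

def portability_warnings(pattern: str) -> List[str]:
    """Return labels of constructs found in `pattern` that deserve a manual
    test on every agent type.

    This is a naive text scan: it does not parse the regex, so an escaped
    parenthesis such as \\(?i) can produce a false positive.
    """
    return [label for label, rx in CHECKS.items() if rx.search(pattern)]

print(portability_warnings(r"(?i)confidential"))    # ['inline flag']
print(portability_warnings(r"(?<!\d)\d{9}(?!\d)"))  # ['lookahead', 'lookbehind']
print(portability_warnings(r"\d{3}-\d{2}-\d{4}"))   # []
```

An empty result does not mean the pattern is safe on every agent. It only means none of the constructs named in this thread were found.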

Hint, and what I've been playing with lately...you might be able to accomplish what you want better with a Custom Data Identifier with a Custom Script Validator (available in V11 now).  The scripting language (basically a very limited implementation of Perl, described in the Custom Detection Guide) allows additional validation of the match built into the Custom DI, and would work consistently across Endpoint and Detection Servers.  It may even be more efficient than a Regex.

~Keith

SOLUTION
John.Armga's picture

This is invaluable information. Does anyone have any idea what version of Boost or Java is compiled into the specific agents?

The information at http://www.boost.org is much more detailed than what is provided by Symantec, but there are multiple releases, and they differ from one another.

Keith, I will start looking at the "Custom Data Identifier" and "Custom Script Validator", as a consistent implementation of rule interrogation would benefit environments with rich deployments of monitor and prevent rules across all the available agents.