Expression Prefix/Suffix

Created: 21 Apr 2010 • Updated: 29 Oct 2010 | 3 comments

What is everyone using as prefix and suffix on their expressions? At the moment I have

Prefix: (?<=(^|(?:[^)+\d][^-\w+])|\t))
Suffix: (?=(?:[^-\w])|$)

but I've been reading up on regular expressions, and the following looks a little better. A regex benchmark also shows a 2.6% improvement in processing speed.

Prefix: (?<=(^|(?:[^\p{P}\w])|\t))
Suffix: (?=(?:[^\p{P}\w]|$))
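To see how the two pairs differ in what they accept, and not only in speed, here is a minimal sketch using java.util.regex. The core pattern (a made-up nnn-nn-nnnn number), the class name, and the sample strings are assumptions for illustration; only the prefix and suffix strings come from the post.

```java
import java.util.regex.Pattern;

class BoundaryCompare {
    // Boundary pairs copied from the post, escaped for Java string literals.
    static final String OLD_PREFIX = "(?<=(^|(?:[^)+\\d][^-\\w+])|\\t))";
    static final String OLD_SUFFIX = "(?=(?:[^-\\w])|$)";
    static final String NEW_PREFIX = "(?<=(^|(?:[^\\p{P}\\w])|\\t))";
    static final String NEW_SUFFIX = "(?=(?:[^\\p{P}\\w]|$))";

    // Hypothetical core expression, for illustration only.
    static final String CORE = "\\d{3}-\\d{2}-\\d{4}";

    // True if prefix + CORE + suffix matches anywhere in the input.
    static boolean found(String prefix, String suffix, String input) {
        return Pattern.compile(prefix + CORE + suffix).matcher(input).find();
    }

    public static void main(String[] args) {
        String[] samples = {"id 123-45-6789 ok", "ssn:123-45-6789", "call 123-45-6789."};
        for (String s : samples) {
            System.out.printf("%-20s old=%b new=%b%n", s,
                    found(OLD_PREFIX, OLD_SUFFIX, s),
                    found(NEW_PREFIX, NEW_SUFFIX, s));
        }
    }
}
```

In this sketch the \p{P} pair rejects a value that touches any punctuation, such as a leading colon or a sentence-ending period, while the first pair accepts both. So the two pairs are not interchangeable, and any speed gain should be weighed against matches that would be missed.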

Any ideas or suggestions? I am hoping to make this a lot more efficient; adding the prefix/suffix makes processing about 2.5x slower.
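One rough way to check numbers like these outside DLP is a small timing loop in plain Java. This is only a sketch: the class and method names, the core pattern, and the sample text are all made up, and a naive System.nanoTime loop is not a rigorous benchmark (JIT warm-up and garbage collection skew it), so treat any result as indicative at best.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryTiming {
    // The \p{P}-based pair from the post, escaped for Java string literals.
    static final String PREFIX = "(?<=(^|(?:[^\\p{P}\\w])|\\t))";
    static final String SUFFIX = "(?=(?:[^\\p{P}\\w]|$))";

    // Hypothetical core expression, for illustration only.
    static final String CORE = "\\d{3}-\\d{2}-\\d{4}";

    // Counts non-overlapping matches of p in text.
    static int countMatches(Pattern p, String text) {
        Matcher m = p.matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    // Elapsed nanoseconds for scanning the text `rounds` times.
    static long timeNanos(Pattern p, String text, int rounds) {
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            countMatches(p, text);
        }
        return System.nanoTime() - start;
    }

    // Synthetic input: filler words with one whitespace-delimited hit per copy.
    static String sampleText(int copies) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < copies; i++) {
            sb.append("lorem ipsum 123-45-6789 dolor sit amet ");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Pattern bare = Pattern.compile(CORE);
        Pattern wrapped = Pattern.compile(PREFIX + CORE + SUFFIX);
        String text = sampleText(2000);

        // Warm-up passes so the JIT has a chance to compile the hot paths.
        timeNanos(bare, text, 20);
        timeNanos(wrapped, text, 20);

        System.out.println("bare    ns: " + timeNanos(bare, text, 50));
        System.out.println("wrapped ns: " + timeNanos(wrapped, text, 50));
    }
}
```

Running it against text that resembles your real traffic, rather than this synthetic filler, should give a more meaningful comparison.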

3 Comments

jgt10

DLP uses the Java RegEx engine.  Check the Java reference pages for information.

I'd also suggest the "Mastering Regular Expressions" book from O'Reilly.

Do you understand what the prefix and suffix patterns are looking for?  If not, go to the above resources.

Are they appropriate for your data?  You may not need the features of the "default" boundary expressions.  You can adjust those expressions to better match your data.
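As one illustration of adjusting the boundaries, suppose (purely as an assumption) that the values you care about are always delimited by whitespace or the start/end of the text. Negative lookarounds on \S would then be a simpler alternative. The core pattern and names below are hypothetical, and whether this is faster or appropriate depends on your data.

```java
import java.util.regex.Pattern;

class WhitespaceBoundary {
    // Assumed simpler boundaries: no non-whitespace character on either side.
    static final String PREFIX = "(?<!\\S)";
    static final String SUFFIX = "(?!\\S)";

    // Hypothetical core expression, for illustration only.
    static final String CORE = "\\d{3}-\\d{2}-\\d{4}";

    static final Pattern PATTERN = Pattern.compile(PREFIX + CORE + SUFFIX);

    // True if a whitespace-delimited occurrence of CORE exists in the input.
    static boolean found(String input) {
        return PATTERN.matcher(input).find();
    }

    public static void main(String[] args) {
        String[] samples = {"id 123-45-6789 ok", "123-45-6789", "x123-45-6789", "ssn:123-45-6789"};
        for (String s : samples) {
            System.out.println(s + " -> " + found(s));
        }
    }
}
```

This version is stricter than the default-style boundaries: it would skip a value followed directly by a comma or period, so it only makes sense if that never happens in the content being scanned.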

Is there a Data Identifier to cover what you are looking for?  If so, use it as it is much more efficient than a RegEx and does a lot more validation.

Why the eye towards performance? 

JGT

--
John G. Thompson
JOAT(MON)

Alex Foley

Does every feature of DLP use the same RegEx engine?  Do any of the scanners, etc., use a separate engine, or Network vs. Endpoint or Prevent vs. Network Monitor vs. Network Discover?

---
Alex Foley

jgt10

The only part of DLP that doesn't use the Java engine is the endpoint.  I believe that uses the Boost regex engine.

--
John G. Thompson
JOAT(MON)