New York Data Loss Prevention User Group

  • 1.  Expression Prefix/Suffix

    Posted Apr 21, 2010 03:24 PM

    What is everyone using as prefix and suffix on their expressions? At the moment I have


    Prefix: (?<=(^|(?:[^)+\d][^-\w+])|\t))
    Suffix: (?=(?:[^-\w])|$)

    but I've been reading up on expressions, and the following looks a little better. A regex benchmark also shows a 2.6% improvement in processing speed.

    Prefix: (?<=(^|(?:[^\p{P}\w])|\t))
    Suffix: (?=(?:[^\p{P}\w]|$))


    Any ideas/suggestions? I am hoping to make this a lot more efficient; adding the prefix/suffix makes processing 2.5x slower.
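
    Before comparing speed, it may be worth checking that the two pairs accept the same text. Below is a minimal sketch using java.util.regex; the core pattern (CORE), the class name, and the sample strings are assumptions made up for illustration and are not taken from any DLP policy.

```java
import java.util.regex.Pattern;

class BoundaryDemo {
    // Boundary pairs quoted from the post above.
    static final String PREFIX_A = "(?<=(^|(?:[^)+\\d][^-\\w+])|\\t))";
    static final String SUFFIX_A = "(?=(?:[^-\\w])|$)";
    static final String PREFIX_B = "(?<=(^|(?:[^\\p{P}\\w])|\\t))";
    static final String SUFFIX_B = "(?=(?:[^\\p{P}\\w]|$))";

    // Hypothetical core expression, used only for illustration.
    static final String CORE = "\\d{3}-\\d{2}-\\d{4}";

    // True if prefix + CORE + suffix matches anywhere in the input.
    static boolean found(String prefix, String suffix, String input) {
        return Pattern.compile(prefix + CORE + suffix).matcher(input).find();
    }

    public static void main(String[] args) {
        // Made-up sample inputs that exercise the boundaries.
        String[] samples = {
            "123-45-6789",
            "SSN: 123-45-6789 end",
            "SSN: 123-45-6789.",
            " (123-45-6789)",
            "x123-45-6789"
        };
        for (String s : samples) {
            System.out.printf("%-24s A=%b B=%b%n",
                    "\"" + s + "\"",
                    found(PREFIX_A, SUFFIX_A, s),
                    found(PREFIX_B, SUFFIX_B, s));
        }
    }
}
```

    On these samples the pairs disagree: the first pair accepts "SSN: 123-45-6789." and " (123-45-6789)", while the second rejects both, because "." and "(" fall in \p{P} and the second pair does not allow punctuation next to the match. Part of any measured speed difference may therefore come from matching different text.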



  • 2.  RE: Expression Prefix/Suffix

    Posted Apr 28, 2010 04:11 PM

    DLP uses the Java RegEx engine.  Check the Java reference pages for information.

    I'd also suggest the "Mastering Regular Expressions" book from O'Reilly.

    Do you understand what the prefix and suffix patterns are looking for?  If not, go to the above resources.

    Are they appropriate for your data?  You may not need the features of the "default" boundary expressions.  You can adjust those expressions to better match your data.

    Is there a Data Identifier to cover what you are looking for?  If so, use it as it is much more efficient than a RegEx and does a lot more validation.

    Why the eye towards performance? 

    JGT


  • 3.  RE: Expression Prefix/Suffix

    Posted Apr 28, 2010 04:27 PM

    Does every feature of DLP use the same RegEx engine?  Or do some components (the scanners, for example) use a separate engine, and does it differ between Network and Endpoint, or among Prevent, Network Monitor, and Network Discover?



  • 4.  RE: Expression Prefix/Suffix

    Posted Apr 28, 2010 09:44 PM

    The only part of DLP that doesn't use the Java engine is the endpoint.  I believe that uses the Boost regex engine.