Data Loss Prevention


Regex to detect multi-line matches

RonCaplinger, Oct 04, 2016 10:03 AM

  • 1.  Regex to detect multi-line matches

    Posted Sep 28, 2016 11:33 AM

    I am trying to create a regex that catches a match spanning two lines, where a number is causing false positives for Social Security numbers:

    name: pi
    value: 123456789

    I have tried "/name: ..\nvalue: \d{9}/m", but it is not catching the above text. The website regexr.com shows this should work. Is there something I'm missing? Does DLP not support the Perl flag "/m"?
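For reference, the flag is not what lets a pattern span two lines. Below is a minimal sketch that uses Python's `re` module as a stand-in for the DLP engine (an assumption; DLP's own regex engine may behave differently). A literal `\n` in the pattern matches a line break with no flag at all. `re.MULTILINE`, Python's equivalent of Perl's `/m`, only changes where `^` and `$` can match. The sample text is the two lines from the post.

```python
import re

text = "name: pi\nvalue: 123456789"

# Pattern from the post as a Python raw string; Perl's /.../m delimiters are dropped.
pattern = r"name: ..\nvalue: \d{9}"

plain = re.search(pattern, text)                # no flag: \n still matches the line break
multi = re.search(pattern, text, re.MULTILINE)  # re.MULTILINE is Python's equivalent of /m

# What MULTILINE actually changes: whether ^ can match right after each newline.
starts_default = re.findall(r"^\w+:", text)
starts_multiline = re.findall(r"^\w+:", text, re.MULTILINE)

print(bool(plain), bool(multi))  # True True
print(starts_default)            # ['name:']
print(starts_multiline)          # ['name:', 'value:']
```

Perl's `/s` (Python's `re.DOTALL`) is the flag that lets `.` match a newline. Neither flag is needed when the newline is written explicitly in the pattern.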



  • 2.  RE: Regex to detect multi-line matches

    Posted Sep 28, 2016 11:38 AM

    Hello,

    Just tested it in http://www.regextester.com/ as well, and it works. How are you generating the detection?

    BR,



  • 3.  RE: Regex to detect multi-line matches

    Posted Sep 28, 2016 12:07 PM

    I am posting the "name...value..." text to dlptest.com over both HTTP and HTTPS.

    My goal is to exclude these false-positive incidents, because those 9-digit numbers are being detected as SSNs. But first I need the regex to detect them, so that I can then use it in the exclude rule.



  • 4.  RE: Regex to detect multi-line matches

    Trusted Advisor
    Posted Sep 29, 2016 02:28 AM

    Hello Ron,

    Try this regexp:

    name:\s\w+(\r\n|\n)value:\s\d{9}

    A newline can be encoded differently depending on where your source comes from. I have tested this on a detection server and it works.

    It may behave differently on an endpoint, as the regexp engine is not the same.

    Regards
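To illustrate the point about newline encoding, here is a small sketch, again using Python's `re` as a stand-in for the detection server's engine (an assumption). The suggested pattern accepts both a bare LF and a CRLF line break. The original `\n`-only attempt fails on CRLF, because a `\r` sits where `\n` is expected. The two sample strings are built from the text in the first post.

```python
import re

# Pattern suggested above: accept either a CRLF or a bare LF line break.
suggested = re.compile(r"name:\s\w+(\r\n|\n)value:\s\d{9}")
# Original attempt: assumes a bare LF between the two lines.
lf_only = re.compile(r"name: ..\nvalue: \d{9}")

samples = {
    "LF (Unix-style)": "name: pi\nvalue: 123456789",
    "CRLF (Windows-style)": "name: pi\r\nvalue: 123456789",
}

for label, text in samples.items():
    print(f"{label}: suggested={bool(suggested.search(text))}, "
          f"lf_only={bool(lf_only.search(text))}")
# LF (Unix-style): suggested=True, lf_only=True
# CRLF (Windows-style): suggested=True, lf_only=False
```

`\r?\n` is an equivalent, slightly shorter way to write `(\r\n|\n)`.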



  • 5.  RE: Regex to detect multi-line matches

    Posted Oct 03, 2016 03:26 PM

    Stephane,

    I have tried your suggestion, but it is not being detected by my Web Prevent or Email Prevent servers. Each day, multiple users generate about 20 or more incidents from web browsing, and the rule is not detecting them. I also tried emailing the above text, and it was not identified. Anything spanning multiple lines seems to be missed. If I change the rule to watch for the keyword "name: pi" or the regex "value:\d{9}", many incidents are caught. But as soon as I add \n or \r\n, as in "pi\nvalue: \d{9}", the rule is ignored and no incident is created on either the Email Prevent or the Web Prevent server.



  • 6.  RE: Regex to detect multi-line matches

    Trusted Advisor
    Posted Oct 04, 2016 01:55 AM

    Hi Ron,

    Which DLP version do you have?

    For me it works on Email Prevent v14.0.1, at least for detection (I did not try using it as an exception); see the screen capture.

    Did you check that there really is a newline between the two lines, using Notepad++ or any other text editor that can show all characters (including non-printing ones such as tab and newline)?

    Regards
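If no editor with a "show all characters" mode is at hand, another quick check is to print the text with its control characters made visible. Below is a minimal Python sketch. `show_invisible` is a hypothetical helper, and the sample string is invented rather than taken from the actual incident. Python's built-in `repr()` gives a similar view using escape sequences.

```python
def show_invisible(text: str) -> str:
    """Replace CR, LF and TAB with visible markers, keeping a real line break after LF."""
    markers = {"\r": "[CR]", "\n": "[LF]\n", "\t": "[TAB]"}
    return "".join(markers.get(ch, ch) for ch in text)


sample = "name: pi\r\nvalue:\t123456789"

print(repr(sample))            # 'name: pi\r\nvalue:\t123456789'
print(show_invisible(sample))
# name: pi[CR][LF]
# value:[TAB]123456789
```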



  • 7.  RE: Regex to detect multi-line matches

    Posted Oct 04, 2016 09:03 AM

    Are you trying to use the regex in a Data Identifier or in a policy rule? A policy rule is where I would use it.



  • 8.  RE: Regex to detect multi-line matches

    Posted Oct 04, 2016 10:02 AM

    I'm also using 14.0.1, running on RHEL 6. I don't know whether there really is a newline between them; the data comes from a Web Prevent incident and is displayed in the Message Body in Enforce.

    [Screenshot: DLPFalsePositive.jpg]

    I've also tried sending an email with the same text, and it isn't caught either. So I'm wondering whether there is an advanced setting on your detection servers that you have already changed to allow multiline matches, which I may still have at its default. Is that possible?



  • 9.  RE: Regex to detect multi-line matches

    Posted Oct 04, 2016 10:03 AM

    It's in a policy rule.



  • 10.  RE: Regex to detect multi-line matches

    Trusted Advisor
    Posted Oct 04, 2016 10:32 AM

    Try opening the "Original Message" instead: the Message Body is interpreted by the DLP UI, so the original text may contain HTML tags that are not shown there.



  • 11.  RE: Regex to detect multi-line matches

    Posted Oct 04, 2016 11:20 AM

    I tried the "Original Message" link in an incident and this makes it even more confusing.  This is what DLP shows as matching:

    [Screenshot: DLPFalsePositive1.jpg]

    Here's what is in the Message Body:

    [Screenshot: DLPFalsePositive2.jpg]

    And in the original message, the matched text above is not displayed the same way at all. The second instance of the 9-digit number is preceded by "pi%3D", but the text before and after it doesn't contain the word "value" ahead of the number, even though that is what the first screenshot above shows as matching. In fact, the word "value" doesn't appear anywhere in the original message.

    [Screenshot: DLPFalsePositive3.jpg]



  • 12.  RE: Regex to detect multi-line matches
    Best Answer

    Trusted Advisor
    Posted Oct 05, 2016 02:23 AM

    Hi Ron,

    %3D is the URL-encoded form of the equals sign (=).

    This means the name/value layout is just an interpretation by the DLP UI, not the original text analyzed by Web Prevent.

    "name" is the variable name (pi), and "value" is the value of that variable.

    So you could try to exclude "pi=" followed by nine digits: pi\=\d{9}

    If needed, I can run more tests on my platform. Just share a sample message with me (by IM if you prefer), with any confidential parts removed.

    Regards
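To make the %3D point concrete, here is a minimal sketch using Python's `urllib.parse.unquote` and `re`. The raw string is a hypothetical fragment in the shape described above, not real traffic, and this thread does not establish whether DLP decodes URL-encoded content before matching. A literal `pi\=\d{9}` only matches once `%3D` has been decoded to `=`, which is why a pattern may need to accept both forms.

```python
import re
from urllib.parse import unquote

# Hypothetical raw fragment: the '=' after "pi" is URL-encoded as %3D.
raw = "id=42&pi%3D123456789&ts=1475600000"
decoded = unquote(raw)  # percent-decoding turns %3D back into '='

literal_equals = re.compile(r"pi\=\d{9}")

print(decoded)                               # id=42&pi=123456789&ts=1475600000
print(bool(literal_equals.search(raw)))      # False: the raw text contains %3D, not '='
print(bool(literal_equals.search(decoded)))  # True once decoded
```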



  • 13.  RE: Regex to detect multi-line matches

    Posted Oct 11, 2016 10:49 AM

    Stephane,

    Your info on the %3D encoded character helped me find a regex that detects "pi=" in both forms I have seen in the ad network postback code:

    \&amp\;pi(\=|\%3D)\d{9}(\n|\&amp)

    Thanks!
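A quick way to sanity-check the final pattern outside DLP is to run it against a few strings. This sketch uses Python's `re` (an assumption; DLP's engine may differ), keeps the pattern above unchanged, and uses invented sample strings rather than real postback traffic.

```python
import re

# Final pattern from the thread, unchanged.
postback = re.compile(r"\&amp\;pi(\=|\%3D)\d{9}(\n|\&amp)")

samples = [
    "&amp;pi=123456789&amp;cb=1",  # literal '=' followed by another &amp;-separated parameter
    "&amp;pi%3D123456789\n",       # URL-encoded '=' with the number ending the line
    "&amp;pi=1234567890&amp;",     # ten digits: the 9th digit is not followed by \n or &amp
    "ssn: 123456789\n",            # 9-digit number without the pi= prefix
]

results = [bool(postback.search(s)) for s in samples]
for s, matched in zip(samples, results):
    print(repr(s), matched)
# results == [True, True, False, False]
```

The trailing `(\n|\&amp)` group acts as a right-hand boundary, so a longer run of digits after `pi=` is not excluded by accident. The flip side is that a number at the very end of the content, with nothing after it, would not match either. If the capture groups are not needed, `(?:...)` can replace the plain parentheses in engines that support non-capturing groups.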