Data Loss Prevention

  • 1.  Writing regex expressions for CSV files in Endpoint DLP

    Posted Oct 06, 2011 01:02 PM

    I've successfully written a regex expression for detecting comma-delimited address lists and included it in a test policy. The policy is successfully detecting comma-delimited address lists in Word files and in text files ending with the .txt extension, but that's all.

     The policy is very simple. It contains two rules: the protocol/endpoint monitoring rule with removable media and CD/DVD discs selected, and the second rule with my regex expression.

    I have tried a couple of ways to create a CSV file containing exactly the same test cases, and the resulting files are not detected by my policy. The first was simply changing the file extension from .txt to .csv. In the second, I imported the text file into Excel and then exported it as a CSV file. There are no visible differences in the address list entries inside the exported CSV file.

    This leads me to wonder if the endpoint agent is somehow treating the .csv file differently.  Has anyone else seen this behavior?  How is this problem solved?



  • 2.  RE: Writing regex expressions for CSV files in Endpoint DLP

    Posted Oct 06, 2011 02:13 PM

    Hi Bill,

    Can you share the regex you used please?



  • 3.  RE: Writing regex expressions for CSV files in Endpoint DLP

    Posted Oct 06, 2011 03:01 PM

    Format for lists: Phone Number, Title, First Name, Last Name, Address part 1, Address part 2, City, State, ZIP code or ZIP+4 with a dash

    \x28\d{3}\x29\d{7}                             # (Area code) seven-digit phone number
    (\s|,)                                         # White space or comma delimiter
    (M(R|S))                                       # Title (gender)
    (\s|,)                                         # White space or comma delimiter
    [\w\.\']{2,}                                   # First Name
    (\s|,)                                         # White space or comma delimiter
    [\w\.\']{2,}                                   # Last Name
    (\s|,)                                         # White space or comma delimiter
    \d{1,}                                         # Address1 is assumed to be numeric
    (\s|,)                                         # White space or comma delimiter
    ([\w\d\.\']{1,}(\s|,)){1,5}                    # Address2 is assumed to be one to five alphanumeric parts
    ([\w\.\']{2,}(\s|,)){1,3}                      # City is assumed to be one to three alphanumeric parts
    (A[LKSZRAP]|C[AOT]|D[EC]|F[LM]|G[AU]|          # State is assumed to be a two-letter uppercase code
    HI|I[ADLN]|K[SY]|LA|M[ADEHINOPST]|
    N[CDEHJMVY]|O[HKR]|P[ARW]|RI|S[CD]|
    T[NX]|UT|V[AIT]|W[AIVY])
    (\s|,)                                         # White space or comma delimiter
    (\d{5}-\d{4}|\d{5})                            # ZIP code or ZIP+4 with a dash

    Entire Regex expression:

    \x28\d{3}\x29\d{7}(\s|,)(M(R|S))(\s|,)[\w\.\']{2,}(\s|,)[\w\.\']{2,}(\s|,)\d{1,}(\s|,)([\w\d\.\']{1,}(\s|,)){1,5}([\w\.\']{2,}(\s|,)){1,3}(A[LKSZRAP]|C[AOT]|D[EC]|F[LM]|G[AU]|HI|I[ADLN]|K[SY]|LA|M[ADEHINOPST]|N[CDEHJMVY]|O[HKR]|P[ARW]|RI|S[CD]|T[NX]|UT|V[AIT]|W[AIVY])(\s|,)(\d{5}-\d{4}|\d{5})


    Test Cases

    (402)5551212,MR,Fred,Flintstone,4545,Rock Ridge Road Apt 5,Bedrock,NE,68101-4545
    (402)5551212,MR,Fred,Flintstone,4545,Rock Ridge Road Apt 5,Bedrock City,NE,68101
    (402)5551212,MR,Fred,Flintstone,4545,Rock Ridge Road Apt 5A,Bedrock City,NE,68101-4545
    (402)5551212,MR,Fred,Flintstone,4545,Rock Ridge Road Apt 5,New Bedrock City,NE,68101-4545



  • 4.  RE: Writing regex expressions for CSV files in Endpoint DLP

    Posted Oct 06, 2011 03:22 PM

    I think DLP treats a CSV file cell by cell, so it would be looking for that entire regex inside a single cell (i.e. between two commas).

    If you want this to pick up CSV files, you'll have to split each part of the regex that would fall in its own cell into a separate condition within a single detection rule. All the conditions in that rule are then joined by "AND", and your policy should match.

    You should still be able to match text files, even though you would no longer be matching on the comma delimiter. If you really want to match on the commas, you could repeat the rule you have now as a separate detection rule, which would be joined by "OR".
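To illustrate the per-cell idea, here is a rough Python simulation. It assumes, as suggested above but not confirmed by DLP documentation, that each condition is evaluated against individual cells and that conditions in one detection rule are ANDed. The names CELL_CONDITIONS, cells and rule_matches are hypothetical and do not correspond to any DLP API.

```python
import csv
import io
import re

# Hypothetical per-cell conditions: one regex atom per CSV column.
CELL_CONDITIONS = {
    "phone": re.compile(r"\x28\d{3}\x29\d{7}"),
    "title": re.compile(r"M(R|S)"),
    "state": re.compile(
        r"A[LKSZRAP]|C[AOT]|D[EC]|F[LM]|G[AU]|HI|I[ADLN]|K[SY]|LA|M[ADEHINOPST]|"
        r"N[CDEHJMVY]|O[HKR]|P[ARW]|RI|S[CD]|T[NX]|UT|V[AIT]|W[AIVY]"
    ),
    "zip": re.compile(r"\d{5}-\d{4}|\d{5}"),
}


def cells(text: str) -> list[str]:
    """Split CSV text into a flat list of cell values."""
    return [cell.strip() for row in csv.reader(io.StringIO(text)) for cell in row]


def rule_matches(text: str) -> bool:
    """AND semantics: every condition must fully match at least one cell."""
    values = cells(text)
    return all(
        any(pattern.fullmatch(value) for value in values)
        for pattern in CELL_CONDITIONS.values()
    )


row = "(402)5551212,MR,Fred,Flintstone,4545,Rock Ridge Road Apt 5,Bedrock,NE,68101-4545"
print(rule_matches(row))                          # True: every atom found in some cell
print(rule_matches(row.replace(",MR,", ",DR,")))  # False: title condition fails
```

Because each atom is matched against a whole cell, none of the patterns needs to contain a comma, which is the point of splitting the original single regex.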

    Hope this helps,
    ~Xavier



  • 5.  RE: Writing regex expressions for CSV files in Endpoint DLP

    Posted Oct 06, 2011 03:42 PM

    Will test this out. It might also help with Excel spreadsheet detections.  I'll post the results here.



  • 6.  RE: Writing regex expressions for CSV files in Endpoint DLP

    Posted Oct 06, 2011 04:39 PM

    I set up another rule with several of the unique regex atoms, namely phone number, title, and state. The revised policy now detects address lists in Word files, Excel spreadsheets, and text files ending in .txt, but still not in CSV files.

    Still much head scratching here.



  • 7.  RE: Writing regex expressions for CSV files in Endpoint DLP

    Posted Oct 06, 2011 04:52 PM

    Wow...this is really puzzling! What version of DLP are you running?

    I'm going to try to replicate it in my environment and see what happens.



  • 8.  RE: Writing regex expressions for CSV files in Endpoint DLP

    Posted Oct 06, 2011 05:16 PM

    OK, I just tested it on DLP 11 and it works as you would hope. It catches the file when I rename it with a .csv extension, as well as with .xls and .txt.

    If you are running an earlier version then that must be the problem. If you're running the same version, then you should contact support and have them look at it. It is a curious issue though.



  • 9.  RE: Writing regex expressions for CSV files in Endpoint DLP

    Posted Oct 07, 2011 10:09 AM

    Hi,

    CSV file type detection is built into DLP out of the box. Just create a new policy rule "Message Attachment or File Type Match", select "Comma Separated Values", and then add a compound rule with your regexes to find specific data in the CSV file.

    It will also help you if the format of the CSV changes for some reason.
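The compound-rule idea can be modeled the same way: the rule fires only when the file-type condition and the content condition both hold. This is a hypothetical Python sketch; looks_like_csv is a deliberately naive stand-in for DLP's built-in "Comma Separated Values" file-type detection, whose real logic is not described in this thread, and compound_rule is likewise an illustrative name.

```python
import re

# Content condition: the phone-number atom from the address-list regex.
PHONE = re.compile(r"\x28\d{3}\x29\d{7}")


def looks_like_csv(text: str) -> bool:
    """Naive stand-in for file-type detection: every non-empty line has the
    same, non-zero number of commas. Ignores quoting, so it is only a sketch."""
    counts = {line.count(",") for line in text.splitlines() if line.strip()}
    return len(counts) == 1 and counts.pop() > 0


def compound_rule(text: str) -> bool:
    """Both conditions must hold, as in a compound rule (AND)."""
    return looks_like_csv(text) and PHONE.search(text) is not None


csv_text = (
    "(402)5551212,MR,Fred,Flintstone,4545,Rock Ridge Road Apt 5,Bedrock,NE,68101-4545\n"
    "(402)5551212,MR,Fred,Flintstone,4545,Rock Ridge Road Apt 5,Bedrock City,NE,68101\n"
)
print(compound_rule(csv_text))                    # True
print(compound_rule(csv_text.replace(",", " ")))  # False: no longer CSV-shaped
```

Keeping the file-type check separate from the content regex means the regex can stay focused on the data itself, while the file-type condition decides which files it applies to.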

    Good Luck,

    Barnabas



  • 10.  RE: Writing regex expressions for CSV files in Endpoint DLP

    Posted Oct 17, 2011 11:58 AM

    Regrettably, Barnabas' suggestion did not work.



  • 11.  RE: Writing regex expressions for CSV files in Endpoint DLP

    Posted Oct 18, 2011 09:25 AM

    Which version are you running?