
Items to Export are Less Than Number of Search Hits

Created: 25 Jun 2014 • Updated: 11 Nov 2014 | 7 comments
This issue has been solved. See solution.

I posted on this topic before, but it has come up again. I am running EV and DA 9 SP3. The number of hits for a search (the search criteria are To/From only) is greater than the number of items for export. Does this have to do with the search setting "include items already in review"? It seems that this option would produce a hit for an item even if it is already in review. So I am trying to figure out why the number of items to export is less than the number of hits. Is it because of de-duplication? Is there a SQL query I can run to see what the difference is, whether de-duplication or something else?

Thanks!!



TonySterling

So if you run multiple searches with the option to include items already in review checked, some items can be counted as a hit more than once. In that case, the number of items exported would be less than the number of hits.
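A minimal sketch of the arithmetic Tony describes, assuming each search simply returns a set of item IDs (the search names and IDs below are hypothetical, not DA's data model): when overlapping searches all include items already in review, adding up their hit counts counts shared items once per search, while the case, and therefore the export, holds each item only once.

```python
# Hypothetical item IDs returned by two searches in the same case.
# Both were run with "include items already in review" checked,
# so items 3 and 4 are returned by both searches.
search_hits = {
    "search_1": {1, 2, 3, 4},
    "search_2": {3, 4, 5, 6},
}

# Adding up per-search hit counts counts shared items more than once.
total_hits = sum(len(items) for items in search_hits.values())

# The case holds each item only once, so the export sees the union.
items_in_case = set().union(*search_hits.values())
items_to_export = len(items_in_case)

print(f"Total hits: {total_hits}")
print(f"Items to export: {items_to_export}")
```

With these made-up sets the hits add up to 8, but only 6 distinct items would be exported.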

Not sure if there is a query but I will take a look to see what I can find.

tmurray1

Thanks for your reply. This case does already have items in review. So it sounds like the hit count can be inflated for subsequent searches and that the true number of items is realized during export.

tmurray1

Okay, I can test this by creating a new case and running the same search in that case. The hit count should equal the export count in that new case.

-Tammy

tmurray1

I created a new case, ran the search in that case, and found that the search hits are the same for both cases. I would have to accept the search and then export to compare the number of items for export, but I would think that number would be the same as the search hit count. It seems that the counts for subsequent exports change and will always be less than the search hits if an item exists in another accepted search. The end user wants all items exported, not the difference between the items already in review or wherever the difference is coming from. Anyway, I think this behavior is by design, and to work around it I would have to create a new case and run the various searches in that case. It seems like a workflow or business-process issue.

I'll update as I have more details.  If anyone has anything to add, please add away.

tmurray1

I created a new case, ran the search in that case, and found that the search hits are the same for both cases. I would have to accept the search and then export to compare the number of items to be exported, but I would think that number would be the same as the search hit count. It is just the counts for subsequent exports that change. The end user wants all items exported, not the difference between other items already in review or wherever the difference is coming from. Anyway, I think this behavior is by design, and to work around it I would have to create a new case and run the searches in that case.

Bilgore

I would think the difference could be attributed to the fact that search "hits" represent the number of times a term or phrase appears in the body of data, while export items represent the actual number of messages or documents in which the hits appear. For example, let's say I conduct a search for the word "dog" in my journal archive, and let's say the journal archive contains 10 million messages. The search results indicate 1,000 hits, the number of times the word "dog" appears in the body of data. Now, there may be some email messages where the word "dog" appears 2, 5, or 10 times. So the total actual number of email messages is not 1,000; it's some lesser amount, say 300 or 400.

If it does not work this way, then what you are saying is that DA creates multiple copies of items on export (that is really the only way the search hits would match the number of export items), unless you did a search that is extraordinarily specific. Otherwise, I think what I describe is happening. If anyone disagrees or can explain where I am wrong, please do so!

Thanks

Kenneth Adams

Sorry, Bilgore, but your theory about the hit counts is not correct. A search for the word 'dog' does not take into account how many times the word is present in an item. Each item found counts as 1 hit toward the total. So, 1,000 hits equals 1,000 items (an item is an email message with or without attachments, a file, an IM or fax message, etc.).
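To make the distinction concrete, here is a small sketch with invented messages (this is not DA code, and DA's indexing is not modeled): counting every occurrence of a term, as in Bilgore's theory, gives a different number than counting the items that contain the term at least once, which is what Ken says the hit count reports.

```python
import re

# Hypothetical messages; the text is invented for illustration.
messages = [
    "The dog barked at the other dog.",
    "No pets here.",
    "My dog, your dog, everyone's dog.",
]

def occurrences(text: str, term: str) -> int:
    # Whole-word, case-insensitive matches of the term in one message.
    return len(re.findall(rf"\b{re.escape(term)}\b", text, flags=re.IGNORECASE))

term = "dog"

# Per-occurrence count (the theory Ken corrects).
occurrence_count = sum(occurrences(m, term) for m in messages)

# One hit per matching item, however often the term appears in it.
item_hits = sum(1 for m in messages if occurrences(m, term) > 0)

print(occurrence_count)
print(item_hits)
```

Here the term occurs 5 times, but only 2 items match, so an item-based hit count would report 2.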

tmurray1, is your export set to exclude duplicates (i.e., similar items if Analytics is not enabled, or true duplicates if Analytics is enabled)? If the export is set to exclude duplicates, then your export count will be lower than your total item count.

Also, when you look in the Review tab for just that case, you should see X number of items in total with no facet selections and no Stacking option. If you then change the Stacking option to 'Similar', you should initially see fewer items, because duplicates will be stacked under a randomly selected 'primary'; you would need to expand that 'primary' in order to see the duplicates. The total count won't change, but the number of items you see will decrease if all duplicates are collapsed under their 'primary'.
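A rough sketch of what the 'Similar' stacking view does to the visible count, assuming each item carries some duplicate key (the `dup_key` field is hypothetical; how DA actually decides that items are similar or duplicates is not described in this thread): duplicates collapse into one stack per key, so fewer rows are shown while the total item count stays the same.

```python
from collections import defaultdict

# Hypothetical case items; dup_key stands in for however the product
# decides two items are duplicates (an assumption for illustration).
items = [
    {"id": 1, "dup_key": "A"},
    {"id": 2, "dup_key": "A"},
    {"id": 3, "dup_key": "B"},
    {"id": 4, "dup_key": "C"},
    {"id": 5, "dup_key": "C"},
    {"id": 6, "dup_key": "C"},
]

# Group items into stacks by duplicate key.
stacks = defaultdict(list)
for item in items:
    stacks[item["dup_key"]].append(item["id"])

# Ken says DA picks the primary randomly; the first item in each stack
# is used here only to keep the example deterministic.
primaries = {key: ids[0] for key, ids in stacks.items()}

total_items = len(items)    # unchanged by stacking
visible_rows = len(stacks)  # one collapsed row per stack

print(total_items, visible_rows)
```

The total stays at 6 while only 3 collapsed rows are visible, which mirrors the "total count won't change" behavior described above.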

Now, go to the Export tab within the case under the Cases tab and start populating the export properties. One option controls whether duplicates are excluded. If duplicates are not excluded, all duplicates will be exported, and the number of items to export should equal the total number of items that I referred to as X in the above paragraph.

As for SQL queries, the following should give you a count of total unique items (in this instance, unique would be a message and any attachments to that message counted as 1) in the case:

DECLARE @CaseID INT; SET @CaseID = 0;  -- replace 0 with the Case ID
SELECT COUNT(*) FROM tblIntDiscoveredItems WHERE CaseID = @CaseID;

To obtain the CaseID, hover your mouse pointer over the name of a search in the Searches sub-tab of the case. After a few seconds, a pop-up balloon will appear with the SearchID and CaseID on separate lines. Take the CaseID number, replace the 0 in the above query with it, then run the query. The number returned would be the number of items to be exported if no filters are applied to the export criteria.

Now, if any export criteria are applied, such as exporting only items marked as relevant or items that have been marked as reviewed, and/or excluding duplicates, the count of items to export will be lower than the total count of unique items.
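A minimal sketch of how export criteria shrink the count relative to the unique-item total returned by the query, using hypothetical item flags (`relevant`, `reviewed`, and `is_duplicate` are illustrative names, not DA column names, and DA's real filtering logic may differ): each criterion that is switched on can only remove items, never add them.

```python
from dataclasses import dataclass

@dataclass
class CaseItem:
    # Hypothetical flags; DA's real schema is not described in the thread.
    item_id: int
    relevant: bool
    reviewed: bool
    is_duplicate: bool

def items_to_export(items, only_relevant=False, only_reviewed=False,
                    exclude_duplicates=False):
    """Return the items that pass the chosen (hypothetical) export criteria."""
    selected = []
    for item in items:
        if only_relevant and not item.relevant:
            continue
        if only_reviewed and not item.reviewed:
            continue
        if exclude_duplicates and item.is_duplicate:
            continue
        selected.append(item)
    return selected

case_items = [
    CaseItem(1, relevant=True, reviewed=True, is_duplicate=False),
    CaseItem(2, relevant=True, reviewed=False, is_duplicate=False),
    CaseItem(3, relevant=False, reviewed=True, is_duplicate=True),
    CaseItem(4, relevant=True, reviewed=True, is_duplicate=True),
    CaseItem(5, relevant=False, reviewed=False, is_duplicate=False),
]

no_filters = len(items_to_export(case_items))
no_dups = len(items_to_export(case_items, exclude_duplicates=True))
relevant_no_dups = len(items_to_export(case_items, only_relevant=True,
                                       exclude_duplicates=True))
print(no_filters, no_dups, relevant_no_dups)
```

With no criteria the export matches the unique-item total (5 here); excluding duplicates drops it to 3, and also requiring relevance drops it to 2.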

I know this information is a bit lengthy.  I just want to provide as much information as I think is needed to best explain the possible causes of the behavior you are seeing in the export counts versus total item counts.

Kind regards,

Ken

Ken Adams

Backline Support for CA, DA, ACE, UCE, PSTD, ARMS, EVDC
US Support Region

SOLUTION