
Single Instancing

Created: 17 Aug 2011 • Updated: 02 Sep 2011 | 12 comments
This issue has been solved. See solution.

I am testing EV9 SP2, in particular the single instancing component.  I have set up a vault store to Share within Vault Store.  I created an email with 3 attachments (1.3MB, 4.1MB, 2.3MB, total 7.7MB) and sent it to myself and 2 others. 

On the partition, it has created 3 x DVSCC, 3 x DVSSP and 1 x DVS components of the email, as expected.  But it has done this 4 times (I assume 3 recipients plus 1 sent item).  While the file sizes are smaller (the DVSSP files are 1.9MB, 1.1MB and 708KB; EV compression, I assume), the total for these files is 16MB.  Should this not have stored one DVSSP item of each and then made references in the fingerprint database?  Does it take time to remove the shared parts from the partition (I thought it was instant at archive time)?

What am I missing?

Attached a screen grab of the partition.
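
For reference, the numbers above can be checked with a quick calculation. This is only a sketch of the arithmetic: it assumes 1 MB = 1000 KB, and it assumes the remainder of the 16MB is made up of the DVS and DVSCC files, which the post does not break down.

```python
# Sizes of the three DVSSP files reported in the post, in KB.
# Assumption: 1 MB = 1000 KB, purely to keep the arithmetic in integers.
DVSSP_KB = [1900, 1100, 708]
COPIES_SEEN = 4             # 3 recipients + 1 sent item
REPORTED_TOTAL_KB = 16000   # "the total for these files is 16MB"

one_copy_kb = sum(DVSSP_KB)                   # what sharing should have stored
all_copies_kb = one_copy_kb * COPIES_SEEN     # what ended up on the partition
duplicated_kb = all_copies_kb - one_copy_kb   # space that sharing would have saved
# Remainder of the reported total; presumably the DVS and DVSCC files.
other_files_kb = REPORTED_TOTAL_KB - all_copies_kb

print(f"one set of DVSSP files : {one_copy_kb} KB")
print(f"four sets on disk      : {all_copies_kb} KB")
print(f"duplicated data        : {duplicated_kb} KB")
print(f"left for DVS/DVSCC     : {other_files_kb} KB")
```

So roughly 14.8MB of the 16MB is DVSSP data, of which only about 3.7MB would be needed if the parts had been shared.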

Thanks,


Rob.Wilcox's picture

Was the vault store previously set to no-sharing?  Did you restart the storage service after changing the sharing level?

Sortid's picture

No, it was created with that sharing level. 

JesusWept3's picture

And what sharing level does it have?

Sortid's picture

Oops, thought I had responded.  It was set to share within Vault Store when it was first created.

Rob.Wilcox's picture

I have done a little bit of testing myself on this, and I do see similar behaviour:

I set up two new users, and a new Vault Store set to share within itself.

I sent one mail with 3 attachments from my first user to my second user, and then ran the archiving task for these two users (archiving only).  The attachments were text files of 2 MB, 145 KB and 311 KB.

When I'm done, I see:

2 x DVS files -- that's to be expected, if I remember correctly.  You always get one per item, as it contains the per-user information, as of old.

6 x DVSSP files -- I would have expected three.

I'll ask around to see if anyone else has observed this behaviour.

JesusWept3's picture

What are the files you are archiving, though?
I mean, something must be physically changing the files so the hashes don't match, right?
I know that will fail on Office documents, because Office tracks things such as print dates and whatnot.

Also, what version of Exchange is being used?
I'm wondering if Exchange 2010's lack of SIS is somehow messing things up as well.

Rob.Wilcox's picture

I was using plain text files.

And I now know the reasons.

It's down to timing, multi-threading, and the quantity of items in the archive.

So if you send a test mail with the 3 test txt attachments, and just run an archiving task against the test users, then because archiving is multi-threaded and these are users with little data, the data being written to disk isn't committed "enough" by the time another thread processes the other mailbox and checks whether the data can be shared.

 

So instead, a more realistic test is:

1. Send the test message.
2. Run the archiving task on the first mailbox.
3. Run the archiving task a minute or so later on the second mailbox.

When you do that you get 2 x DVS files and 3 x DVSSP files, as expected.
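
The difference between the two tests can be illustrated with a toy model. This is a minimal sketch, not how EV is actually implemented: `ToyPartition`, `fingerprint` and `dvssp_count` are hypothetical names, SHA-256 stands in for whatever fingerprinting EV really uses, and a barrier forces the "lookup before the other thread has committed" timing that the concurrent test hits by chance.

```python
import hashlib
import threading


def fingerprint(data: bytes) -> str:
    """Stand-in fingerprint (assumption: EV's real scheme is not modelled)."""
    return hashlib.sha256(data).hexdigest()


class ToyPartition:
    """Hypothetical model of a partition plus its fingerprint database."""

    def __init__(self) -> None:
        self.committed = set()    # fingerprints visible to other threads
        self.dvssp_written = 0    # number of sharable-part files on disk
        self._lock = threading.Lock()

    def archive(self, attachments, lookup_done=None) -> None:
        # Step 1: check which parts are not yet known to be shared.
        to_write = [a for a in attachments
                    if fingerprint(a) not in self.committed]
        # In the concurrent case, wait until the other thread has also
        # finished its lookup, so neither sees the other's data.
        if lookup_done is not None:
            lookup_done.wait()
        # Step 2: write the parts and commit their fingerprints.
        with self._lock:
            for a in to_write:
                self.dvssp_written += 1
                self.committed.add(fingerprint(a))


def dvssp_count(concurrent: bool) -> int:
    """Archive the same 3 attachments from two mailboxes; count DVSSP files."""
    attachments = [b"attachment one", b"attachment two", b"attachment three"]
    partition = ToyPartition()
    if concurrent:
        barrier = threading.Barrier(2)
        threads = [
            threading.Thread(target=partition.archive,
                             args=(attachments, barrier))
            for _ in range(2)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    else:
        partition.archive(attachments)   # first mailbox
        partition.archive(attachments)   # second mailbox, a minute later
    return partition.dvssp_written


print("both mailboxes at once:", dvssp_count(concurrent=True))
print("one after the other   :", dvssp_count(concurrent=False))
```

In this model the concurrent run writes 6 sharable parts and the staggered run writes 3, matching the DVSSP counts seen in the two tests.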

 

I've seen this in the dim and distant past when OSIS was first introduced with EV 8, and activity in Public Folders, which operate slightly differently.

 

Other options are a big chunky mailbox or three.  You'll get sharing then, because the data is all over the mailboxes, and it takes time to process etc.  The reason in these tests for no sharing, is multi-threaded, timing, lack of data, probably in that order :)

SOLUTION
JesusWept3's picture

So in theory, journaling could see this issue on quiet journal servers.
Say you are archiving three quiet journal mailboxes and the same message hits all three: could you end up taking 300% of the space?

Sortid's picture

That seems to be right.  If you send the email a few times it stores the attachments once, then creates an additional folder on the partition for each subsequent email but without the attachments. 

I know you can set multiple threads per server.  Does this correlate to only one per mailbox?  If so, when you are journaling, it should hit each item in turn, and not multiple items at the same time on the one journal mailbox.

JesusWept3's picture

By default it's 5 threads per task, whether it's the journal task, the archive task or whatnot.
You could have 3 journal mailboxes looked after by the same task, and since you can only have one thread per mailbox, you would theoretically have three mailboxes processing at the same time, with two threads spare, as opposed to processing them serially.

So if you have one task, three journal mailboxes, and the same item hitting all three mailboxes at the same time, you could see this situation quite often.
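
The thread arithmetic above can be written down as a small sketch. The function names are made up for illustration, and the worst case assumes that every mailbox being processed at that moment misses the shared part and stores its own copy.

```python
def task_concurrency(threads_per_task: int, mailboxes: int) -> tuple[int, int]:
    """Return (mailboxes processed at once, spare threads).

    Models the rule described above: one thread per mailbox at most.
    """
    busy = min(threads_per_task, mailboxes)
    return busy, threads_per_task - busy


def worst_case_storage_percent(threads_per_task: int, mailboxes: int) -> int:
    """Worst case if every concurrently processed mailbox stores its own copy.

    Assumption: none of the concurrent threads sees another's committed data.
    """
    busy, _spare = task_concurrency(threads_per_task, mailboxes)
    return busy * 100


busy, spare = task_concurrency(threads_per_task=5, mailboxes=3)
print(f"{busy} mailboxes at once, {spare} threads spare")
print(f"worst case: {worst_case_storage_percent(5, 3)}% of the shared size")
```

With the default of 5 threads and three journal mailboxes, that gives three mailboxes in flight, two spare threads, and the 300% figure mentioned earlier.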

Sortid's picture

So I guess it's only in certain circumstances that you could run into this issue: multiple journals being targeted at once, or multiple user mailboxes being hit without journaling enabled, or with journaling to a different, non-shared vault store.

Thanks for your help.