Ingesting PSTs: WAN Connected Sites in Cached Mode

Created: 12 Jul 2010 | 7 comments

I've searched for this subject but couldn't find anything relatively recent (meaning newer than 2 years or so), so I'm posting anew in the hope of getting some feedback with respect to the current (or upcoming...) version of EV.

We're looking to implement EV8 SP4 (or EV9, depending on required functionality), and our first project phase is to ingest PSTs across our enterprise and then prohibit further creation thereof. We have a central corporate HQ, and based on the testing I've done thus far the whole "Locate/Collect/Migrate" method should work for all users whose workstations and file/print servers are local to the Archive and our Exchange Cluster.

However, we have about 80 sites located all across the country, and users at these sites do NOT have their own Exchange server. Their mailboxes reside on our Exchange cluster at our data center, they all run their Outlook in Cached mode, and each remote site has its own file/print server. We have a hub/spoke WAN architecture, and each remote site connects back to the HQ Data Center via 384-768 Kbps lines, with a few topping out at 1.5 Mbps. Getting the PSTs for these users into the EV archive (and presenting them back to the remote desktops) is a problem for which I've been unable to find an efficient (and officially supported) solution, which is why I'm posting here.
Based on a script that we ran across our enterprise, we'll have a total of about 12,000 PSTs totaling about 3 TB (that's TERABYTES with a 'T') in volume to get off of these remote servers and desktops (we have ~7 TB total PST data). We've got about 6000 total users, with only ~20% of those being local to our HQ. Suffice it to say we have a challenge in front of us. We're currently using Exchange 2003 & most clients have Outlook 2003 (all remote users are in cached mode).

Because of our limited bandwidth to/from these sites, I can't see the locate/migrate method being a very efficient model. For these remote sites we're considering the Virtual Vault or some EV-managed cousin (via EVPM or similar) of Managed Folders (because our Exchange/Outlook version doesn't currently support them).
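
To put rough numbers on that concern, here is a back-of-the-envelope sketch in Python. It assumes the ~3 TB of remote PST data is spread evenly over the 80 sites (it almost certainly isn't), reads the line speeds as 384 Kbps, 768 Kbps and 1.5 Mbps, and ignores protocol overhead and any compression or deduplication. The function name and the `usable_fraction` parameter are purely illustrative.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(data_bytes, link_bps, usable_fraction=1.0):
    """Days needed to move data_bytes over a link of link_bps bits/second
    when only usable_fraction of the link may be used for the transfer."""
    seconds = data_bytes * 8 / (link_bps * usable_fraction)
    return seconds / SECONDS_PER_DAY


# Assumption (not measured): 3 TB (decimal) split evenly across 80 sites.
per_site_bytes = 3e12 / 80  # 37.5 GB per site

for label, bps in [("384 Kbps", 384_000),
                   ("768 Kbps", 768_000),
                   ("1.5 Mbps", 1_500_000)]:
    full = transfer_days(per_site_bytes, bps)
    quarter = transfer_days(per_site_bytes, bps, usable_fraction=0.25)
    print(f"{label}: {full:.1f} days at 100% of the link, "
          f"{quarter:.1f} days at 25%")
```

Under these assumptions, even a 768 Kbps line dedicated entirely to the transfer would need about 4.5 days per site, and about 18 days if the transfer were held to a quarter of the link.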

I've been working with our Symantec reps on this, and the only thing I've had presented to me is this:
1. Get the Outlook Add-in installed on user desktops & Mark the PSTs.
2. Run a successful 'Post-PST-Marking' Puredisk backup (our current process/tool for remote site backups) or copy all marked PSTs to an external HD and physically ship it back to our data center.
3. Stamp all remote PSTs as read-only and prohibit further creation in user Outlook profiles. Stop all further backups of PSTs.
4. Restore COPIES of remote PSTs to alternate location and ingest using the 'PST Migration Wizard'
5. Once ingested, await shortcut replication back to remote desktop & Outlook Profile

Here is where I have a huge problem with this:

My testing showed that by default no action will have been taken by EV components on the 'original' PSTs as found/marked in each user's mailbox. Thus, after the above process completes a duplicate PST/Archived-PST folder structure will be created. Since we're migrating COPIES of PSTs (albeit marked ones) the 'post-migration' action as set in the Wizard is taken on the COPY of the PST, not the original PST.

The Outlook Client Add-In isn’t smart enough to distinguish the archived PSTs ingested via the ‘PureDisk-Assisted’ (or any other Wizard-based) method from the ‘original’ PSTs. I can’t see how it should be, as what we’re doing is manually populating the mailbox archive with data from a completely different location than where it originally existed. We can’t expect the client to do things it isn’t designed to do, and this (IMHO) is definitely beyond its design.

So what’s the workaround? How do we do this, track it, and (most importantly) make it easy for our users? I can’t see this process working at all for our remote sites without a solution for this.

An efficient method MUST be developed to get rid of the original PST file pointers in users' Outlook Profiles as well as their actual physical existence on the desktops/servers. Otherwise, users will have duplicate PSTs or some period where their PST data is simply unavailable until the replication of the stubbed PST data back to their cached mailboxes has completed. If this were a LAN environment this problem wouldn't exist, but because we're in a WAN situation with these sites I'm stuck with it.

As I see it, the primary pain points with this approach would be:
1. Bandwidth Saturation over the WAN links, impacting all services to/from the sites
2. The excruciatingly long amount of time that this would take to complete
3. User resistance, complaints, and lack of access to their PST data

Symantec staff first provided me with this approach/solution, but when I posed the question to multiple staff as to whether the above is an OFFICIALLY supported Symantec solution, all I heard were crickets... I haven't heard anything besides "we're checking with our product team". If anyone can comment on this aspect it would be appreciated.
If there's a Best Practices doc out there somewhere I'd welcome that (mention was made of such in a post that was several years old).

We're also considering an Exchange 2010 upgrade (skipping Exch 2007), but of course I know all about the issues with that. No EV support until EV 9.0, and then only with Exchange 2010 SP1 (I hope), and most people are aware of the quagmire that exists with EV and Outlook 2010 (we'll be lucky to see that by end of CY 2010).

Any and all help/comments/recommendations are encouraged.

Thanks,

El Kabong

7 Comments

Peter Kozak

Hi El Kabong,

you might want to have a look at PST-Flightdeck...
http://www.evtools.net/products/pst-flightdeck

Regards,
Peter

QUADROtech Solutions AG - www.quadrotech-it.com

Rob.Wilcox

How about client side PST migration?
You can enable that for batches of users, and it will migrate chunks of PSTs per user over time.

EV-ASSIST

Indeed, client side migration also has the ability to prepopulate the Vault Cache, so that you don't have to bring everything across the wire again.

El-Kabong

I've read about the Client-side method, but our Symantec reps haven't said too much about it for us.  One thing about it that worries me is that the 'chunks' that are referred to are 10MB in size.  In a WAN situation that's still a large number and we've had circuits brought to a halt by users sending 10-15MB emails to an 'all users-Site-X' DL.  We have packet-shapers in place and Exchange traffic is not given a high priority, so time would still be a significant issue here.  Would we be able to make the existing PSTs read-only throughout this process, or does EV need some sort of write access to them?  Is the 'chunk' size configurable or cast in stone?
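
For scale, a quick sketch of what a single 10MB chunk costs on our links. It assumes 10MB means 10,000,000 bytes, reads the line speeds as 384 Kbps, 768 Kbps and 1.5 Mbps, and uses a made-up 25% share to stand in for whatever the packet-shapers actually leave for this traffic. The names are illustrative only.

```python
CHUNK_BYTES = 10 * 1_000_000  # assumed: 10 MB in decimal bytes


def chunk_seconds(chunk_bytes, link_bps, usable_fraction=1.0):
    """Seconds to send one chunk over a link of link_bps bits/second,
    using only usable_fraction of the link."""
    return chunk_bytes * 8 / (link_bps * usable_fraction)


for label, bps in [("384 Kbps", 384_000),
                   ("768 Kbps", 768_000),
                   ("1.5 Mbps", 1_500_000)]:
    full = chunk_seconds(CHUNK_BYTES, bps)
    shaped = chunk_seconds(CHUNK_BYTES, bps, usable_fraction=0.25)
    print(f"{label}: {full / 60:.1f} min per chunk unshaped, "
          f"{shaped / 60:.1f} min at a 25% share")
```

On a 384 Kbps line that works out to roughly three and a half minutes per chunk with the whole link to itself, and about 14 minutes at a 25% share.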

Further, regardless of the method, we will still have the Cached Mode issue to deal with.  The reality that each message must still make a round trip (or more) between the remote desktop/site and the archive cannot be ignored.   I've been told a bit about the 'Proactive Caching' WRT the Virtual Vault, and while it may be helpful for the PST migration, I'm not sure it will work for the overall archiving strategy that our legal/mgmt departments are wanting to take.  They want the archive to be a totally user-driven process, where each user decides for themselves what is important enough to be kept in the archive.  Couple that with making PSTs verboten and (knowing users...) you have a situation where they will not be selective about their record-keeping and instead decide to use the archive the same way they used their PSTs:  Store EVERYTHING.  If that happens, IMHO we could have the potential for a continually saturated pipe, impacted business processes, and VERY unhappy users.

If the 'pre-population' or 'Proactive Caching' is used, what form do the messages take when they are added to the Vault Cache?  Are they already stubbed/shortcutted (and if so, where/how does this process take place and where are these settings configured?), are they in their full original form, or what?  I would think that there would have to be some kind of reconciliation process between EV and the client to get the PSTs into the archive; update the client MAPI profile (including the VV folders); and then get rid of the original PSTs stored on the desktop.

Rob:  I've been searching through this forum enough to recognize your name, so I'll put the question to you:  Do you know whether the 'puredisk-assisted' (or external HD-assisted) method of PST ingestion that I described above is something that would be supported by Symantec?

Thanks for the input. 
El Kabong

Rob.Wilcox

Hi again,

I'm not sure how supported it is... but if I remember correctly the chunk size *can* be changed from 10 MB, upwards or downwards. Push, beg and holler at your Symantec person to get you the details.  I believe there is a good article on the forums that explains the full process of client side PST migration. It's not perfect, I will give you that, but one of its main purposes is to serve "remote" clients.  What happens in general terms is that the PST is split up into 10 MB chunks, and then at the end any additional data that was added to the PST in the meantime is also uploaded.  So the PST can be in use during the migration.
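
As a rough illustration of what the chunk size means in practice, the sketch below just divides a PST's size by the chunk size. It is plain arithmetic, not a description of how EV actually builds its chunks, and it does not model the final upload of data added during the migration. The 250 MB average comes from the figures earlier in the thread (about 3 TB across about 12,000 PSTs); the 5 MB and 20 MB alternatives are hypothetical settings.

```python
import math


def chunk_count(pst_bytes, chunk_bytes):
    """Number of chunks needed to cover a PST of pst_bytes."""
    return math.ceil(pst_bytes / chunk_bytes)


# Average PST size implied by the thread's figures (decimal units assumed).
avg_pst_bytes = 3e12 / 12_000  # 250 MB

for chunk_mb in (5, 10, 20):  # 10 MB is the default mentioned above
    n = chunk_count(avg_pst_bytes, chunk_mb * 1_000_000)
    print(f"{chunk_mb} MB chunks: {n} uploads for an average PST")
```

A smaller chunk does not reduce the total data that has to cross the WAN. It only changes how long each individual upload occupies the link.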

Proactive caching, aka trawling, takes place against the Outlook OST file.  The idea is that "stuff" that is "soon to be archived" is added to Vault Cache, so that when it's actually archived, it doesn't then have to be downloaded from the EV server to the Vault Cache.  The item that is trawled, or pre-emptively cached, or proactively cached (depending on your terminology) is the full item that existed in your OST file.  Like I say, it's to stop the roundtrip download from the EV server when it is actually archived.

Proactive caching doesn't touch PST files.

If your users drag and drop stacks of data into Virtual Vault from PSTs then that will trigger a PUSH of that data up to the EV server.  You can configure limits in your policy.

Regarding the final question ...  I have no idea what the Puredisk-assisted (or external HD-assisted) methods are.  I've not heard of them before.  With regards to external HD, if you can "somehow" get a stack of PSTs from each user copied to an external hard drive, transported to your EV data centre, and imported via the VAC or by search/locate/migrate, then that will work... I just think it's a *very* difficult job to get the PSTs on to the external hard drives in the first place.

The options are all yours..  pros and cons, and really this is the land of Symantec Consultancy (or a 3rd party)... I don't think there is a quick way to give a definite answer of "yes this way will work", but hopefully we've discussed some useful ideas at least?  (Keep the ideas/comments/questions coming too, if you need more input)

El-Kabong

Rob, thanks very much for your input. 

PSTs are my primary focus right now, so it's good to know that it's moot to pursue the Proactive caching for this task.  You also confirmed my suspicion about what would happen if users drag/drop PST data into their virtual vault folder.

Regarding Proactive caching, specifically your point that the item that is trawled, or proactively cached (depending on your terminology), is the full item that existed in your OST file:  If we're going the route of exclusively user-driven manual archiving (instead of automatic date/policy driven archiving), how does the full item get added to the archive? And since it's the full item that gets added, at some point does the stubbed version of the message come back and replace the full version in the VV Folder?  Does something take place between the server and the archive and then the stub is just pushed back?

As for what I have called my  'PureDisk/HD-Assisted Method,' this is something that one of our Symantec reps came up with (and actually named!) and presented to me.  I certainly hadn't heard of it before either, and it seemed to be an ad-hoc solution cobbled together with spit and duct tape.  I couldn't find anything about it in any of the docs or KBs.  That's why I asked again in this forum, and I presume that's also why I haven't heard back from any of our Symantec reps about this.  Yes, the difficulty/complexity level of such an undertaking is extraordinarily high. 

I agree about the services/consultancy.  We'll be sending out an RFP for this, and this posting is certainly helping me distill my points down to key issues that would need to be addressed by a services provider.
Thanks,
El Kabong

Rob.Wilcox

Regarding your question relating to user driven manual archiving.

When you have Offline Vault in EV 2007 or Vault Cache in EV 8 enabled, and you manually archive an item in the Outlook client, by default, the item is copied from the OST file to Vault Cache/Offline Vault... at the same time the EV server is "nudged" to go to the user's Exchange server and archive the actual item.  There may be an exchange of a small amount of data between all 3 concerned parties, but the idea is that when you manually archive something in the client which is, say, 15 MB in size, nothing of that size travels between the client and the EV server or the Exchange server.

Hope that helps,