How long will my index take to rebuild?
One of the more common questions when administering Enterprise Vault and dealing with indexes is just how long an index will take to rebuild. Unfortunately, as with anything else, the answer depends on other questions: what are the specs of the servers? What kind of storage and network do you have? How much downtime will there be for backups and such?
The list of questions becomes endless, and calculating how much time a rebuild will take is difficult at best. That’s not to say there aren’t indicators, however: from Enterprise Vault 2007 onwards, an estimated completion time for the rebuild is posted to the Enterprise Vault event log every so often.
Firstly, let’s take a look at the four types of index operations you can have.
1. Index Rebuild (Entire Index)
2. Index Rebuild (Index Volume)
3. Index Update
4. Index Repair
Before reading on, I would like to point out that an index rebuild (either of the entire index or of a single index volume) is only recommended when you have completely lost index data that can’t be retrieved from a backup, or when you have changed something such as the indexing level (from Brief to Full, for instance).
Updates and repairs should be the only index operations you normally have to perform.
1. Index Rebuild (Entire Index)
To perform an entire index rebuild, go into the Vault Admin Console, find the archive whose index you wish to rebuild, right-click it and go to Properties, then go to the Advanced tab and select the Rebuild Index button.
What this will do is delete all the index volumes for that user from the SQL tables, as well as the physical directories themselves from the index locations on your storage. The only one it will not delete is the first index volume; it will simply delete its contents.
From there, it will start re-indexing items from the first index sequence number in the database all the way through to the highest index sequence number in the database. This will cause EV to open each and every piece of archived email that user has.
Note that if you use secondary storage such as NetBackup, the times can increase, as the CAB files need to be recalled from tape to disk as an ARCHCAB, and the item then has to be extracted to an ARCHDVS or ARCHDVSCC, depending on your version of Enterprise Vault.
2. Index Rebuild (Index Volume)
In Enterprise Vault, it is possible for the index to “roll over” into separate index volumes: rather than having one large index for an entire archive, it splits the index into volumes when a certain amount of content has been indexed. This allows for quicker and simpler repairs.
There may, however, come a time when you want to rebuild one of these index volumes. To do this, open the Vault Admin Console, find the archive that you wish to look at, go to its properties, then select the Index Volumes tab. Right-click the volume you wish to rebuild and select the Rebuild option.
What this will do is delete the selected volume and those after it, and then rebuild from that volume’s first sequence number.
So for instance, let’s say you have the following:
Range: 1-10,000 | Status: Normal | Location : I:\Index_Location1
Range: 10,001 – 20,000 | Status: Failed | Location : I:\Index_Location2
Range: 20,001 – 30,000 | Status: Normal | Location : I:\Index_Location3
If you then rebuild the 2nd index volume, it would delete the 3rd location and clear the contents of the 2nd location. It would then start indexing from the lowest index sequence number for that volume.
In this situation, 1-10,000 (index volume 1) would be left completely alone, and the indexing would go through the ISNs from 10,001 all the way through to 30,000.
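To make the example concrete, here is a minimal Python sketch of which sequence numbers get re-indexed when a single volume is rebuilt. The IndexVolume class and plan_volume_rebuild function are hypothetical names used purely for illustration, not part of Enterprise Vault; they only model the behaviour described above.

```python
from dataclasses import dataclass


@dataclass
class IndexVolume:
    first_isn: int   # lowest index sequence number held by the volume
    last_isn: int    # highest index sequence number held by the volume
    status: str


def plan_volume_rebuild(volumes, selected):
    """Model a volume rebuild: volumes before `selected` are untouched,
    the selected volume and every later one are re-indexed."""
    kept = volumes[:selected]
    affected = volumes[selected:]
    return kept, affected[0].first_isn, affected[-1].last_isn


volumes = [
    IndexVolume(1, 10_000, "Normal"),
    IndexVolume(10_001, 20_000, "Failed"),
    IndexVolume(20_001, 30_000, "Normal"),
]

# Rebuild the 2nd volume (list position 1).
kept, start, end = plan_volume_rebuild(volumes, 1)
print(len(kept), start, end)  # 1 10001 30000
```

So one volume is left alone, and 20,000 items (ISNs 10,001 to 30,000) have to be opened and indexed again.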
3. Index Update
To perform an index update, go to the Vault Admin Console, expand the archives list, and open the properties of the archive whose index you wish to update. Go to the Index Volumes tab, right-click the index volume and select Update Index Volume.
Most commonly, updates are run against indexes that have “Failed” and that you wish to put back online.
The way the index update works is that it opens the index metadata, finds the highest index sequence number (ISN) in the index, and compares it to what is in the database for that index.
If the database says there are 20,000 items but the index has a highest ISN of 10,000, then it will simply start indexing items 10,001, 10,002 and so on until it has caught up.
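As a rough illustration of that comparison, the sketch below works out which ISNs an update would still need to process. The function name items_to_update is made up for this example, and the logic is only a model of the description above, not Enterprise Vault code.

```python
def items_to_update(db_highest_isn, index_highest_isn):
    """Return the ISNs that the database knows about but the index lacks."""
    return range(index_highest_isn + 1, db_highest_isn + 1)


pending = items_to_update(db_highest_isn=20_000, index_highest_isn=10_000)
print(pending[0], pending[-1], len(pending))  # 10001 20000 10000
```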
4. Index Repair
When an item fails to index, the failure is marked in the database and a file called IndexMissing.log is created in the index location itself. This log file lists the item that’s missing as well as the reason why it couldn’t be indexed.
The most common reason for an item being missing from the index is a storage problem that prevented the item from being read. For instance, if it is attempting to index item 10,001 and the storage goes offline, it will attempt to read the item three times; if it still cannot, it marks the item as failed to index and moves on to the next item.
To repair an index, open the Vault Admin Console, expand the archives list and right-click the archive whose index you wish to repair. Then go to the Index Volumes tab. Look at the Failed Items column: if Failed Items is 0 or Null, then the “Repair Index” option will be grayed out and you will not be able to repair the index.
Otherwise, if there is a number in there, you can right-click and go to Repair Index.
What this does is open the IndexMissing.log file and attempt to retrieve and index each item listed. Note, though, that if there is a number in the Failed Items column but no IndexMissing.log file exists, the volume will remain in a “Rebuilding” state when you attempt the repair, and will neither fail nor succeed.
If the IndexMissing.log file itself is missing, you can regenerate it using the following command:
indexcheck.exe -c MissingDocs -f <yourIndexFolder> -db yourSQLServer
What this will do is scan the index for any missing items and then regenerate the IndexMissing.log file. Note that if this is run against legacy indexes (indexes created by Enterprise Vault 5 and earlier), it is possible for it to mark every item in the index as missing.
Once it has run, the Failed Items column will have been updated and you can attempt to repair the index again. Keep an eye on the Event Viewer to determine whether items are failing to be recalled; this could be because they are missing from disk, or because they are corrupt.
That’s great, but how long will it take to complete these operations?!
Understanding what each operation does and how it interacts with Enterprise Vault is crucial to understanding how long it will take for the index to be up to date and operational.
Generally the quickest operations are repairs, as they have a strict set of items to add to the index, listed in the log files.
Updates are the next quickest, as a healthy index should never be too far behind the highest index sequence number, unless it’s been offline or failed for a very long time.
That being said, from Enterprise Vault 2007 onwards, if an index has failed, Enterprise Vault will attempt to bring it back online every 2 hours. If after a number of attempts it cannot bring it back online, it is up to the administrator to determine the cause and then run the update again.
But this brings us back to the question: how long should a rebuild take?
Well, a lot is made of the performance guides, where the virtues of emails indexed per second/minute/hour/day are extolled. But this is a little disingenuous, as indexing speed is really about the number of words that Enterprise Vault can index in a given amount of time.
Take for instance:
100 emails with 20 words each (such as “hi, hello, how are you?” etc.)
100 emails with 20 Word documents each (all containing 1,000 words each)
If you indexed all these items at the same time, would you expect the small emails to index at the same speed as the emails that have the Word documents?
In this example it is 2,000 words indexed vs 2,000,000 words indexed. That is where the biggest discrepancy comes in.
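The arithmetic behind those two totals, written out as a short Python sketch (the variable names are only for illustration):

```python
emails = 100

# 100 small emails of 20 words each.
small_total = emails * 20

# 100 emails, each carrying 20 documents of 1,000 words.
large_total = emails * 20 * 1000

print(small_total, large_total, large_total // small_total)  # 2000 2000000 1000
```

The item count is identical in both cases, but the second set contains 1,000 times as many words to index.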
The Updates.log file
This file is the single greatest tool for determining index activity and speeds.
Before we examine the file, let’s quickly cover a couple of things:
1. When an item is indexed, it is first held in memory
2. When it writes from memory to disk, this is called a MakeStable
3. The two ways it determines when it should commit to disk and makestable are
- When 2 million words are in memory
- When 15 minutes has elapsed
So if it has not accumulated 2 million words in memory, it will wait up to 15 minutes, at which point it “times out” and writes the contents to disk.
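The two triggers can be modelled in a few lines of Python. This is only a sketch of the rule as described here: the function name commit_reason is invented, and whether the real thresholds are inclusive or exclusive is an assumption on my part.

```python
WORD_LIMIT = 2_000_000      # words held in memory before a commit
TIMER_SECONDS = 15 * 60     # maximum wait between commits


def commit_reason(words_in_memory, seconds_since_last_commit):
    """Return why a MakeStable would fire, or None if it would not yet."""
    if words_in_memory >= WORD_LIMIT:          # assumed inclusive
        return "Words threshold exceeded"
    if seconds_since_last_commit >= TIMER_SECONDS:
        return "Write timer expiry"
    return None


print(commit_reason(2_011_955, 200))   # Words threshold exceeded
print(commit_reason(1_085_789, 900))   # Write timer expiry
print(commit_reason(500, 10))          # None
```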
An example of it breaching the 2 million word limit looks like the following:
2008-07-11 09:11:37.178 M Index commit (Makestable) Reason=Words threshold exceeded Items added=151 Words added=2011955
An example of it passing the 15 minute timeout looks like the following:
2008-07-11 04:30:19.780 M Index commit (Makestable) Reason=Write timer expiry Items added=3166 Words added=1085789
The two examples above make an interesting comparison. In the first, we only indexed 151 items, but it breached the 2,000,000 word limit, meaning these are large emails with either attachments or large message bodies. In the second, it indexed 3,166 items (about 21 times as many as the 151 items), yet only around half the number of words. This means that those 3,000 or so items were most likely simple, small emails.
Reviewing the Updates.log file can also indicate storage issues, where it may be taking a long time to recall items. For instance, take the following:
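If you want to pull these figures out of Updates.log without reading it by eye, something like the following Python sketch could help. The regular expression is modelled only on the sample lines shown in this article, so treat it as an assumption; other versions or log variants may need adjusting. The names COMMIT_RE and parse_commit are my own.

```python
import re
from datetime import datetime

# Pattern based on the sample lines in this article; the single-letter
# field after the timestamp (e.g. "M") is treated as optional.
COMMIT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s+(?:\S+\s+)?"
    r"Index commit \(Makestable\) Reason=(?P<reason>.+?) "
    r"Items added=(?P<items>\d+) Words added=(?P<words>\d+)\s*$"
)


def parse_commit(line):
    """Return a dict for a Makestable line, or None if it does not match."""
    m = COMMIT_RE.match(line.strip())
    if m is None:
        return None
    return {
        "time": datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S.%f"),
        "reason": m["reason"],
        "items": int(m["items"]),
        "words": int(m["words"]),
    }


samples = [
    "2008-07-11 09:11:37.178 M Index commit (Makestable) "
    "Reason=Words threshold exceeded Items added=151 Words added=2011955",
    "2008-07-11 04:30:19.780 M Index commit (Makestable) "
    "Reason=Write timer expiry Items added=3166 Words added=1085789",
]

commits = [parse_commit(s) for s in samples]
for c in commits:
    # Average words per item is a quick hint at how "heavy" the items are.
    print(c["reason"], c["items"], round(c["words"] / c["items"]))
```

For the two sample lines this works out to roughly 13,324 words per item against roughly 343 words per item, which is the large-versus-small email difference in a single number.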
2008-07-03 13:45:52.242 M Index commit (Makestable) Reason=Write timer expiry Items added=674 Words added=730150
The above shows that only 674 items and roughly 730,000 words were indexed in a 15 minute period. If it keeps that up, then in an hour it will have indexed 2,696 items with about 2,920,000 words.
By comparison, the earlier timer-expiry example, extrapolated the same way, would have indexed 12,664 items with 4,343,156 words in an hour.
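The hourly figures come from multiplying one 15 minute commit by four. A small Python sketch of that extrapolation follows; the helper name hourly_rate is hypothetical, and it assumes the commit really does represent a full 15 minute window, which only holds for "Write timer expiry" lines.

```python
def hourly_rate(items, words, window_minutes=15):
    """Scale one timer-expiry commit up to a full hour."""
    factor = 60 / window_minutes
    return round(items * factor), round(words * factor)


# The slow example (exact word count, before rounding to 730,000).
print(hourly_rate(674, 730150))     # (2696, 2920600)

# The earlier timer-expiry example.
print(hourly_rate(3166, 1085789))   # (12664, 4343156)
```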
So this really shows us the speed of the index that’s rebuilding, but it doesn’t tell us how long it’s going to take to rebuild. Well, every so often the index server will output an event that gives an estimated time of completion.
The event looks like this:
Event Type: Information
Event Source: Enterprise Vault
Event Category: Index Server
Event ID: 7305
Time: 10:28:14 AM
Index volume update in progress
Index Volume: 1BAAC719A2565F642A55F993AC5F3D2F71110000evsite/Volume:2 (Journal Archive)
Percentage completed: 52%
Estimated time of completion: 2/15/2010 16:33
Index Volume Path: I:\Index_Location1\1BAAC719A2565F642A55F993AC5F3D2F71110000evsite_10001
Job Id: Vol_2
Job Author: Indexing Service
Job Description: Update rollover index volume
Job elapsed time: 17:17:53 (hours:minutes:seconds)
Processing time: 17:15:09 (hours:minutes:seconds)
Number of items added: 193281
Number of items additions abandoned: 0
Number of items deleted: 0
So what does the above tell us?
Well, it’s showing that it has currently run for just over 17 hours and has indexed 193,281 items (an average of about 3 items per second). It is currently 52% complete and is estimated to complete on 15th February 2010.
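The items-per-second figure can be checked from the event fields with a few lines of Python. The helper hms_to_seconds is my own name, and the "naive remaining time" at the end is a simple straight-line assumption for illustration, not necessarily how Enterprise Vault calculates its own estimate.

```python
def hms_to_seconds(hms):
    """Convert an 'hours:minutes:seconds' string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


elapsed = hms_to_seconds("17:17:53")   # Job elapsed time from the event
items_added = 193281                   # Number of items added
percent_done = 52                      # Percentage completed

rate = items_added / elapsed
print(elapsed, round(rate, 2))         # 62273 3.1

# Straight-line guess: if 52% took this long, how long for the other 48%?
naive_remaining_hours = elapsed * (100 - percent_done) / percent_done / 3600
print(round(naive_remaining_hours, 1)) # 16.0
```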
But be aware that this date is an estimate: it could slip later than that date, or it could finish sooner. This is simply because the average is built from how many items there are to index; again, it does not know how many words there are to index.
So take for example 1,000,000 items to index. It’s been 2 days so far and it has indexed 500,000 of those items, so you would expect it to take 2 more days. However, if we then start coming across more complex items, we may index only 10% as many items in the same time, for the same number of words. Based on the item count the average has now slipped, and the ETA will give a later date than the previous event showed.
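Here is a small Python sketch of how an item-count estimate slips. It assumes a plain average (items done divided by time elapsed), which is my simplification of the behaviour described above; remaining_days is a hypothetical helper, and the third-day figure of 25,000 items is just 10% of the earlier 250,000 items per day.

```python
def remaining_days(total_items, done_items, elapsed_days):
    """Estimate days left from the average item rate so far."""
    rate = done_items / elapsed_days
    return (total_items - done_items) / rate


# After 2 days: 500,000 of 1,000,000 items done.
print(remaining_days(1_000_000, 500_000, 2))            # 2.0

# Day 3 hits complex items: only 25,000 more items indexed.
print(round(remaining_days(1_000_000, 525_000, 3), 2))  # 2.71
```

After day 2 the estimate was 2 more days, so you would hope for 1 day left after day 3. Instead the estimate has grown to about 2.7 days, even though the indexer handled just as many words on the third day.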
So if we look at another example, this is the index updates going full steam ahead, and you will see a large amount of items and a large amount of words being indexed in a short amount of time:
2008-07-10 01:51:28.816 Index commit (Makestable) Reason=Words threshold exceeded Items added=3491 Words added=2092674
2008-07-10 01:54:46.542 Index commit (Makestable) Reason=Words threshold exceeded Items added=3058 Words added=2010682
2008-07-10 01:59:12.990 Index commit (Makestable) Reason=Words threshold exceeded Items added=2871 Words added=2021586
You can see above that across these three commits, logged less than 8 minutes apart, it has indexed 9,420 items worth 6,124,942 words.
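The totals for those three lines can be added up with a short Python sketch. The values are copied from the log lines above; the tuple layout and variable names are just for illustration.

```python
from datetime import datetime

# (timestamp, items added, words added) from the three commits above.
commits = [
    ("2008-07-10 01:51:28.816", 3491, 2092674),
    ("2008-07-10 01:54:46.542", 3058, 2010682),
    ("2008-07-10 01:59:12.990", 2871, 2021586),
]

total_items = sum(items for _, items, _ in commits)
total_words = sum(words for _, _, words in commits)

# Time between the first and last commit being written.
fmt = "%Y-%m-%d %H:%M:%S.%f"
span = datetime.strptime(commits[-1][0], fmt) - datetime.strptime(commits[0][0], fmt)

print(total_items, total_words)  # 9420 6124942
print(span)                      # 0:07:44.174000
```

Note that the span only measures the gap between the first and last commit lines; the time spent filling memory before the first commit is not visible from these three lines alone.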
Bear in mind the average size of users’ archives: it’s possible that the majority of indexes are very quick to rebuild.
How can I determine where the indexes are located for these users?
Although the Index Volumes tab shows where an index location is, one of the more confusing things is telling exactly where a specific volume lives, especially when you have several index locations.
The following SQL query can be run to show the index locations, how many items are in each index volume, and the highest and lowest ISNs:
SELECT A.ArchiveName, (IRP.IndexRootPath + '\' + IV.FolderName) AS Folder, IV.FirstItemSequenceNumber AS FirstISN, IV.HighestItemSequenceNumber AS LastISN, IV.IndexedItems, IV.Rebuilding, IV.Failed, IV.FailedItems
-- The original FROM list was cut off after IRP; the Archive, Root and IndexVolume
-- tables below are inferred from the aliases A, R and IV used in the query.
FROM IndexRootPathEntry IRP, IndexVolume IV, Root R, Archive A
WHERE A.RootIdentity = R.RootIdentity
AND R.RootIdentity = IV.RootIdentity
AND IV.IndexRootPathEntryId = IRP.IndexRootPathEntryId
AND A.ArchiveName = 'your Archive Name'
Once the query returns, you can simply copy and paste the Folder value into Windows Explorer to open the index folder and locate the Updates.log file from there.
Although it can be extremely difficult, and in some circumstances impossible, to predict how long an index will take to rebuild, I hope the information above gives you enough understanding to judge how quickly an index is progressing and to estimate for yourself when it should complete.