One of life’s complexities in our ever-shrinking world is dealing with different cultures, habits and languages. In the software world, this includes dealing with different character sets and with right-to-left languages such as Hebrew or Arabic.
I want to talk in particular about shortcut creation in Exchange mailboxes and how Enterprise Vault decides whether to format the shortcut as right-to-left text.
The idea of a shortcut in an Exchange mailbox is to reduce the space used in the mailbox whilst still allowing the user access to the item in the archive. As such, we want that shortcut to be as small as possible, but still contain some useful information for the user. Customised shortcuts allow for a configurable number of characters to be included in the shortcut, and the obvious place to get these from is the plain text version of the message, i.e. just the text with no formatting.
So how do we decide if the shortcut should be right-to-left, or left-to-right? Given the international nature of many people’s inboxes, they may very well have a mix of left-to-right and right-to-left messages, so really we want to make the decision based on the specific message being archived. The answer is in the code page of the message being archived: we use the PR_MESSAGE_CODE_PAGE property of the message, and check this against a configurable list of code pages to see if it should be formatted right-to-left.
For example, a message with an English code page (typically 1252, Western European) will be formatted left-to-right, whereas a message using the Hebrew code page (1255) will be turned into a right-to-left shortcut. The list of code pages can be changed in the advanced settings of the Mailbox Policy. In the Archiving General section, look for the setting named "Code pages for right-to-left custom shortcuts".
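The lookup described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Enterprise Vault's code: the function name and the contents of the list are assumptions, and the real list is whatever is configured in the Mailbox Policy.

```python
# Hypothetical list for illustration only; the real one comes from the
# "Code pages for right-to-left custom shortcuts" policy setting.
RTL_CODE_PAGES = {1255, 1256}  # Windows Hebrew, Windows Arabic


def shortcut_direction(message_code_page, rtl_code_pages=RTL_CODE_PAGES):
    """Return 'rtl' if the message code page is in the configured list, else 'ltr'."""
    return "rtl" if message_code_page in rtl_code_pages else "ltr"


if __name__ == "__main__":
    print(shortcut_direction(1252))  # Western European code page
    print(shortcut_direction(1255))  # Hebrew code page
```

The point of the design is that the check is a single set lookup per message, so it adds almost nothing to the cost of creating a shortcut.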
In fact, this setting has been in place since way back in version 7, and for the last few years all has been well with this method. However, we have recently seen that messages created by newer clients may carry values for that property that do not match the content, which can result in shortcuts that look wrong. In one example, the message code page indicated an English message, but the content was actually Hebrew and formatted right-to-left. Because of the English code page, the shortcut was created left-to-right.
The problem with these messages is that the formatting may be buried deep within the HTML, and without substantial and expensive parsing of the message it is difficult to determine which way round the text should be presented. Remember that during customised shortcut creation, we only take the plain text version, with no formatting. In fact, if the HTML indicates right-to-left formatting but the only characters being retained are English, or vice versa, then the shortcut may never look correct anyway.
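To see how the retained text can disagree with the formatting of the whole message, here is a small illustrative sketch. It is not something Enterprise Vault does; it simply counts strongly left-to-right and right-to-left characters using Python's standard unicodedata module. The function name, the sample text and the 10-character limit are all made up for the example.

```python
import unicodedata


def strong_direction_counts(text):
    """Count characters with a strong left-to-right or right-to-left bidi class."""
    ltr = rtl = 0
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            ltr += 1
        elif bidi in ("R", "AL"):  # Hebrew letters are "R", Arabic letters are "AL"
            rtl += 1
    return ltr, rtl


if __name__ == "__main__":
    # Hypothetical message body: English words followed by Hebrew text.
    body = "Meeting notes " + "שלום עולם"
    excerpt = body[:10]  # pretend the shortcut keeps only 10 characters
    print(strong_direction_counts(body))
    print(strong_direction_counts(excerpt))
```

In this sample the full body contains Hebrew, but the excerpt kept for the shortcut contains only English characters, so no single direction is right for both.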
However, we have seen that another property on the message, PR_INTERNET_CPID, may give a different code page from PR_MESSAGE_CODE_PAGE. Although it will not be correct in every case, a forthcoming change will allow PR_INTERNET_CPID to be used instead of PR_MESSAGE_CODE_PAGE. Administrators will then be able to choose which code page property to check and decide which works best in their environment, without us introducing a whole new complex algorithm that would slow down shortcut creation considerably.
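As a rough sketch of what that choice means, assume a message is represented as a simple dictionary of property names to values. The dictionary, the function name and the parameter names are hypothetical, and the way the forthcoming setting is actually exposed may well differ.

```python
# Hypothetical list for illustration; the real one comes from the Mailbox Policy.
RTL_CODE_PAGES = {1255, 1256}  # Windows Hebrew, Windows Arabic


def direction_from_property(props,
                            code_page_property="PR_MESSAGE_CODE_PAGE",
                            rtl_code_pages=RTL_CODE_PAGES):
    """Pick the shortcut direction from whichever code page property is selected."""
    code_page = props.get(code_page_property)  # None if the property is absent
    return "rtl" if code_page in rtl_code_pages else "ltr"


if __name__ == "__main__":
    # A message like the one described earlier: the message code page says
    # English, while the Internet code page says Hebrew.
    message = {"PR_MESSAGE_CODE_PAGE": 1252, "PR_INTERNET_CPID": 1255}
    print(direction_from_property(message))
    print(direction_from_property(message, code_page_property="PR_INTERNET_CPID"))
```

The same message yields a different direction depending on which property is consulted, which is why the choice is left to the administrator.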