Hello all.
I have a really odd situation that is upsetting a lot of users. This is going to be somewhat long.
We have one BMG Control Center and four BMG Scanner appliances - two in Europe and two in the USA. All identically configured, all on version 8.0.12 now.
We have always noticed that occasionally some messages got stuck in the delivery queues with a "timed out while transmitting data" error. After upgrading to version 8 the wording changed to simply "time out".
Well, over the last couple of weeks we had a real rash of these incidents, with some important financial information not reaching the clients.
It seems that most of the target domains for which messages got stuck use Barracuda. But maybe this is just a coincidence.
We spoke to some of the IT guys on the receiving end and they told us "it looks like most of your messages get stuck from your European gateways and usually get delivered without problems from your USA gateways".
OK... I thought that maybe our European users were sending out a different type of message compared to our USA users?
So I changed our Exchange outbound routes to send all of the company's outbound mail via the USA Brightmails. Nothing got stuck.
I then temporarily smart-hosted the European Brightmails to use our USA Brightmails for outbound delivery and flushed the European queues - all the stuck messages got delivered to their destinations.
I asked our network engineers if there was something different between our EU vs USA network configuration or Internet provider configuration. They swear there is no difference.
I asked them to perform a network capture while one of the stuck messages in the European Brightmail queue was retrying, and we saw something interesting. The connection to the remote server (Barracuda) got established without problems. MAIL FROM, RCPT TO - no problem. DATA - no problem UNTIL our BMG started transmitting the headers, specifically the Thread-Index and References headers. This is when the remote mail server started shouting "ACK, FIN" a whole bunch of times and jammed the session closed.
I do have to say that the message had abnormally long Thread-Index and References headers. Looks like it was one in a long chain of replies.
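For anyone who wants to check whether a stuck message actually breaks the line-length rules: RFC 5322 says each line MUST be no more than 998 characters and SHOULD be no more than 78, excluding the CRLF. Below is a minimal Python sketch for testing a raw message against those limits. The function name `long_header_lines` and the sample message are made up for illustration and have nothing to do with BMG itself. It only measures physical header lines, so it cannot explain the EU vs USA difference on its own.

```python
# Limits from RFC 5322 section 2.1.1, counted without the trailing CRLF.
RFC5322_HARD_LIMIT = 998  # MUST NOT be exceeded
RFC5322_SOFT_LIMIT = 78   # SHOULD NOT be exceeded


def long_header_lines(raw: bytes, limit: int = RFC5322_HARD_LIMIT):
    """Return (header_name, line_length) for every physical header line
    longer than `limit`. Hypothetical helper, not part of any product."""
    # Headers end at the first empty line; everything after is the body.
    header_block = raw.split(b"\r\n\r\n", 1)[0]
    results = []
    current_name = None
    for line in header_block.split(b"\r\n"):
        if line[:1] in (b" ", b"\t"):
            # Folded continuation line: it belongs to the previous header.
            name = current_name
        else:
            name = line.split(b":", 1)[0].decode("ascii", "replace")
            current_name = name
        if len(line) > limit:
            results.append((name, len(line)))
    return results


if __name__ == "__main__":
    # Made-up sample: a References header written as one unfolded line.
    sample = (
        b"From: a@example.com\r\n"
        b"References: " + b"<id@example.com> " * 80 + b"\r\n"
        b"Subject: test\r\n"
        b"\r\n"
        b"body\r\n"
    )
    print(long_header_lines(sample))
```

To use it on a real message, save the stuck message in raw form, read the file in binary mode, and pass the bytes to `long_header_lines`. Passing `limit=RFC5322_SOFT_LIMIT` shows which headers are not folded at the recommended width.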
But anyway - if the long headers are a problem, why do these messages get sent out via our USA Brightmails just fine?