If you didn't read the first part of this post, you might want to start reading from here. This post is the wrap-up.
Previously: To eliminate Microsoft Exchange server idiosyncrasies, I ran a pure SMTP (client) to SMTP (server) test and got the same (bad) results that Stefan had reported. Although this test failed, it did narrow down the problem: it indicated to me that the error was most likely not within the Microsoft Exchange server and most likely originated at the client site. Further investigation also showed that the client had multiple mail servers defined, not all of which were actually working.
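For readers who want to reproduce that kind of test, a plain SMTP check can be sketched with Python's standard `smtplib`. This is a minimal sketch, not the test client used in part one: the helper names `build_test_message` and `send_direct` and the example addresses are hypothetical, and plain SMTP on port 25 with no STARTTLS or authentication is an assumption. Turning on `set_debuglevel(1)` prints the SMTP commands and replies, which is where a rejected or stalled delivery shows up.

```python
import smtplib
from email.message import EmailMessage


def build_test_message(sender: str, recipient: str) -> EmailMessage:
    """Build a minimal plain-text test message (hypothetical helper)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Plain SMTP delivery test"
    msg.set_content("Test message sent directly over SMTP, bypassing Exchange.")
    return msg


def send_direct(host: str, msg: EmailMessage, port: int = 25,
                timeout: float = 30.0) -> None:
    """Hand the message straight to one SMTP server (hypothetical helper).

    Assumes plain SMTP with no STARTTLS or authentication. Raises an
    OSError (smtplib.SMTPException is a subclass) if the server cannot
    be reached or rejects the message.
    """
    with smtplib.SMTP(host, port, timeout=timeout) as smtp:
        smtp.set_debuglevel(1)  # echo subsequent SMTP commands and replies to stderr
        smtp.ehlo()
        smtp.send_message(msg)
```

Usage would look like `send_direct("mx1.example.com", build_test_message("me@example.net", "someone@example.com"))`, with both host and addresses standing in for real ones. Because Exchange is not in the path, a failure here points away from the Exchange server.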
Back to the client
I spent another day or two trying to reach the administrators at the bigcmailsrvr site to go over my findings and work out a plan of action to fix the problem. When I told the email administrator about the non-working mail server entry in their configuration, his response was not what I expected. He said that the dummy (non-responding) mail server entry was intentional, and that a consultant had recommended configuring it that way. The rationale was that spammers go after the 'highest' pref server first, perhaps assuming that the highest-numbered (and therefore lowest-priority) server is used for internal messaging and that email coming into the network through it is less likely to be filtered or blocked. The admin said that the dummy entry knocked out 80 percent of their incoming spam.
I'm not an email expert, although I have worked with SMTP mail on and off for several years and am fairly comfortable with the protocol. However, this was the first time I had heard of using the mail server pref value in this way. The basic process is something like the following:
- The sending mail server wants to deliver an email to an outside (i.e., non-local) recipient, so it makes a DNS query for the MX records of the recipient's domain to find out where to connect.
- The DNS server returns one or more MX records to the mail server, where each MX record contains the Fully Qualified Domain Name (FQDN) for a mail server (note: the MX record data should not contain an IP address).
- Each MX record returned has a pref value. The sending mail server is supposed to connect to the server named in the MX record with the lowest pref value. Servers with higher pref values are tried, in ascending order, only if the preferred server does not respond.
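The selection rule in the list above can be sketched in a few lines of Python. This is an illustrative model under simplifying assumptions, not how Exchange or any particular mail server implements it: `MXRecord`, `order_mx`, `deliver` and the `try_connect` callback are hypothetical names, and a real mail server also queues and retries over hours rather than giving up. Shuffling servers that share the same pref value follows RFC 5321.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class MXRecord:
    pref: int      # lower value = more preferred
    exchange: str  # FQDN of the mail server, not an IP address


def order_mx(records: List[MXRecord],
             rng: Optional[random.Random] = None) -> List[MXRecord]:
    """Return records in the order a sender should try them.

    Ascending pref; records sharing a pref are shuffled so load
    spreads across them (as RFC 5321 asks senders to do).
    """
    if rng is None:
        rng = random.Random()
    shuffled = list(records)
    rng.shuffle(shuffled)
    # sorted() is stable, so equal-pref records keep their shuffled order
    return sorted(shuffled, key=lambda r: r.pref)


def deliver(records: List[MXRecord],
            try_connect: Callable[[str], bool],
            rng: Optional[random.Random] = None) -> Optional[str]:
    """Try each server in preference order; return the first that accepts.

    try_connect is a hypothetical callback standing in for an SMTP
    connection attempt. Higher-pref servers are only tried after every
    lower-pref server has failed.
    """
    for mx in order_mx(records, rng):
        if try_connect(mx.exchange):
            return mx.exchange
    return None  # nothing reachable: a real mail server would queue and retry
```

For example, with records at pref 10 (`mx1.example.com`), 20 (`mx2.example.com`) and 30 (`dummy.example.com`), `deliver(records, lambda host: host != "mx1.example.com")` returns `"mx2.example.com"`. A sender following this logic only reaches the pref-30 dummy after both real servers have failed.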
I pointed out to the administrator that only a few days earlier he had told me they did no filtering or blocking of email, and that this looked like a pretty big email filter to me. I eventually got him to remove the dummy mail server for a quick test, and all of the mail went through immediately. The administrator then restored the configuration with the dummy mail server intact (leaving our mail connection to them effectively blocked again).
After I explained the situation to Stefan and to Tony, the CFO, Tony managed to get his counterpart at the client site to 'persuade' his email administrators to get the connection working. Statistics can be so overrated. They may have stopped 80 percent of spam from coming in with their dummy mail server configuration, but I would bet that some of that 80 percent was really legitimate mail that was now being blocked, and whose senders did not have the time (or maybe the ability) to work out why it wasn't being delivered.
I had intended to call Microsoft about this anyway, to report the problem as an apparent bug in Exchange (after all, our Exchange server should have been using the mail server with the lowest pref value, not the highest). In the end, though, I did not call them. My email test client, which did not use Exchange, had also failed to deliver the test message, so I was not convinced that the bigcmailsrvr site didn't have something else misconfigured that was contributing to the problem.
With the immediate problem solved, I left it to the "C"-level execs to work out a mutual business arrangement: whether the permanent solution should be to keep the current configuration, set up some form of whitelist, or try something else.