Clever internet people! I have a weird network problem.
My local machine is a Linux box. It runs Exim, a mail transport agent. When I send an email, my mail user agent talks to Exim, and Exim passes the message on via SMTP to one of two remote smarthosts for further delivery.
For one of those two, but not the other, something times out when trying to make the SMTP connection. The message in the Exim logs is "SMTP timeout after initial connection" but I am not certain there has actually been an "initial connection" in any useful sense.
When it's trying to deliver such a message, netstat shows the connection to the relevant server as in state SYN_SENT.
Usually I think this would indicate some rather low-level network problem, like a broken firewall rule.
BUT
Yes, I have tried turning it off and then on again.
Anyone got any idea what might be happening?
=> More informations about this toot | More toots from gjm@mathstodon.xyz
@gjm Is it possible that exim supports STARTTLS, the Python library doesn’t by default, and the connection is timing out when TLS is attempted?
(If so, the obvious followup question will be why TLS is making the connection fail with a timeout rather than something more informative.)
=> More informations about this toot | More toots from mathew@universeodon.com
@mathew I'm explicitly invoking STARTTLS when I do it in Python, and I get a response from the server indicating that it worked. So I don't think it can be that.
=> More informations about this toot | More toots from gjm@mathstodon.xyz
@gjm Hmm, OK, next random guess: is one using IPv4 and the other IPv6 for whatever reason?
=> More informations about this toot | More toots from mathew@universeodon.com
@mathew @gjm i suspect some magic firewall rules on your box that are dropping packets depending on which process is sending them
=> More informations about this toot | More toots from fanf@mendeddrum.org
@fanf @mathew I cannot think of any reason why I would have such firewall rules on my box, and still less any reason why they would suddenly have appeared (or suddenly started applying to Exim where they hadn't before) a few hours ago. But I guess it's not impossible.
(I think I have no firewall rules at all on this box. There is a separate box between it and the internet with plenty of firewall rules on it, but obviously that box has no knowledge of what process on this box any packet is coming from. This box is running Ubuntu 22.04, I am fairly sure I have never knowingly configured a firewall on it. The default firewall on recent-ish Ubuntu is ufw, and ufw status
says Status: inactive
.)
=> More informations about this toot | More toots from gjm@mathstodon.xyz
@gjm @mathew hmm i no longer suspect firewall magic
does your mua submit to exim via smtp or exec(sendmail)? can you reproduce the problem by feeding exim a test message from a shell prompt? that would make it easier to turn on debug tracing so you can see if there’s anything suspicious in the logs
=> More informations about this toot | More toots from fanf@mendeddrum.org
@fanf @mathew MUA passes mail to MTA via SMTP.
If I do exim -i -t < mail-test.txt
where mail-test.txt
contains an RFC822 mail message, I get the same behaviour.
Unfortunately, exim -d+all -i -t < mail-test.txt
(run as root) doesn't generate debug info for any of the relevant stuff -- it hands the message off to a separate process for delivery.
However, once delivery has failed once, if I clear the retry database, stop my already-running exim process, and do exim -d+all -bd -q30m
(i.e., the same thing that usually runs in the background, but requesting debug output), the relevant-looking bits go like this (summarized because of character limit):
hostlist:
'smtp.pobox.com' IP 103.168.172.64 -1
'smtp.pobox.com' IP 103.168.172.49 -1
[stuff involving hints database; "no host retry record"; "no message retry record"; "retry-status = usable"]
103.168.72.64 in serialize_hosts? no (option unset)
delivering [messageID] to smtp.pobox.com [103.168.172.64] (...)
[checks hosts_require_dane and hosts_pipe_connect; no to both]
Connecting to smtp.pobox.com [103.168.172.64]:25 ... 103.168.172.64 in hosts_try_fastopen? yes (matched "*")
TFO mode sendto, no data: EINPROGRESS
connected
[stuff to do with $primary_hostname]
selecting on subprocess pipes
selecting on subprocess pipes
SMTP(Connection timed out)<<
SMTP(close)>>
LOG:MAIN
H=smtp.pobox.com [103.168.172.64]: SMTP timeout after initial connection: Connection timed out
The bit that takes all the time is where it says "selecting on subprocess pipes" twice.
(Some comments to follow.)
=> More informations about this toot | More toots from gjm@mathstodon.xyz
@fanf @mathew What strikes me here is the "fastopen" bit, so I'm going to try disabling that and see whether it makes anything better. (Initial indications are that it doesn't, but it's possible I screwed something up.)
=> More informations about this toot | More toots from gjm@mathstodon.xyz
@fanf @mathew I had screwed it up, but unfortunately after unscrewing (and checking that fastopen was then actually disabled) it fails in the same way as before.
The debug output looks much the same as it did before. The failure message it displays after the timeout is now "failed: Connection timed out (timeout=5m)" which isn't quite the same as before.
netstat still shows a connection in SYN_SENT state while it's busy timing out, as before.
Thanks for looking at this, btw!
=> More informations about this toot | More toots from gjm@mathstodon.xyz
@gjm @fanf At this point I’d be inclined to try WireSharking it to see exactly what’s going on.
Does Exim give you a full transcript of everything sent by the remote server? Wondering if there could be a clue there.
=> More informations about this toot | More toots from mathew@universeodon.com
@mathew @fanf I don't think it gets as far as anything being sent by the remote server.
I'm not an Exim expert but I'd have expected that telling it "turn on all debugging" would make it dump out any full transcripts it might be capable of dumping.
If I run tshark while making it retry one of my failed messages, the only things involving the relevant IP address are (1) querying a DNS server and getting those addresses back, and (2) the following lines which look to me as if my box is sending a SYN but it's never getting a response:
14 1.188041335 172.16.123.1 → 103.168.172.64 TCP 78 37218 → 25 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1 TSval=2465038006 TSecr=0 WS=128 TFO=R
16 2.191082237 172.16.123.1 → 103.168.172.64 TCP 74 [TCP Retransmission] [TCP Port numbers reused] 37218 → 25 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1 TSval=2465039010 TSecr=0 WS=128
19 3.215323691 172.16.123.1 → 103.168.172.64 TCP 74 [TCP Retransmission] [TCP Port numbers reused] 37218 → 25 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1 TSval=2465040034 TSecr=0 WS=128
20 4.239356867 172.16.123.1 → 103.168.172.64 TCP 74 [TCP Retransmission] [TCP Port numbers reused] 37218 → 25 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1 TSval=2465041058 TSecr=0 WS=128
35 5.263335677 172.16.123.1 → 103.168.172.64 TCP 74 [TCP Retransmission] [TCP Port numbers reused] 37218 → 25 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1 TSval=2465042082 TSecr=0 WS=128
and a few more retransmits looking very like the ones above.
=> More informations about this toot | More toots from gjm@mathstodon.xyz
@fanf @gjm Hmm, OK, that’s pretty baffling. Paging @rk in case he might have some ideas. If I think of anything in my sleep I’ll let you know.
=> More informations about this toot | More toots from mathew@universeodon.com
@mathew @fanf @rk Thanks!
I bet that I've somehow done some incredibly boneheaded thing, but at the moment I can't imagine what it might be.
=> More informations about this toot | More toots from gjm@mathstodon.xyz
@mathew @fanf @rk "Fixed" now by explicitly putting "port = 587" in Exim's SMTP smarthost transport config. I still don't know what changed (I don't think that was ever there before), and I suspect this isn't the right way to fix it, but it seems like it's working now.
=> More informations about this toot | More toots from gjm@mathstodon.xyz
@mathew @fanf Wait, I just noticed something extremely damning in those packets. It's trying to talk to port 25 on the remote server, not port 587. That's got to be wrong, and somehow it's got to be my fault, though I don't know what I did that would cause it. But this feels like something that should be solvable.
[EDITED to add:] No, I'm a moron, and the initial SMTP connection is meant to be on port 25. Right? I am so ignorant. I will try it with Python smtplib again and watch exactly what it does.
[EDITED again:] Nope, Python/smtplib connects on port 587 right from the start and never attempts port 25. This must be what's broken.
[EDITED a last time:] Aaaand the ghastly hack of adding "port = 587" to the smtp smarthost section of my Exim configuration makes it work again. I suspect I should be doing something less ghastly. I still don't know what made this break.
=> More informations about this toot | More toots from gjm@mathstodon.xyz
@gjm @fanf Plain text SMTP should be on port 25; STARTTLS on port 587; and “always on” implicit TLS on port 465.
My guess is that pobox used to allow port 25, maybe even with STARTTLS, for maximum compatibility with badly written customer SMTP clients. They probably stopped doing so because both plaintext SMTP and STARTTLS are considered deprecated at this point (per RFC 8314), and there just aren’t that many flavors of email client out there any more.
The other possibility that springs to mind is that there’s some config file that specifies a default port for exim, and a routine version update resulted in that config file getting changed.
=> More informations about this toot | More toots from mathew@universeodon.com
@mathew @fanf Yeah, sounds plausible. Though I'm fairly sure I didn't accidentally update exim, and looking at things like file modification times I'm now really quite sure I didn't.
=> More informations about this toot | More toots from gjm@mathstodon.xyz
@mathew Exim is using IPv4, or at least it explicitly lists IPv4 addresses for the things it tries to talk to (including in particular the server it's failing to talk to). For the Python, I tried connecting using the DNS name (which for all I know might be doing IPv6) but also using the explicit IPv4 address. Exim failed, Python worked. I'm pretty sure the above is good evidence that both were using IPv4.
(The DNS name actually has records for two different IPv4 addresses. I only tried one of them in Python, but I have observed Exim failing to talk to that same one.)
=> More informations about this toot | More toots from gjm@mathstodon.xyz This content has been proxied by September (3851b).Proxy Information
text/gemini