Just tracked down a really weird issue: when sending mail from my server, the Cyrillic letter х would be replaced by � and a newline.
I was inclined to blame it on other mail servers initially, but the issue turned out to be with my mail filters. And it’s of course due to the way this letter is encoded as two bytes in UTF-8: D1 85.
Doesn’t ring a bell? No, I didn’t get it either. Apparently, str.splitlines() in Python treats various exotic characters as line endings as well. One of these is U+0085, which is exactly what the 85 byte turns into when the raw mail bytes are decoded one byte per character (as a Latin-1 decode does). Back in 1973, the ISO 2022 standard apparently extended ASCII with C1 control codes, and this one stands for “Next Line” (NEL).
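A minimal reproduction of the mechanism might look like this. It assumes the filter decoded the raw mail bytes as Latin-1, split them with str.splitlines(), and joined the lines back together with "\n"; what the real filter does isn't shown in the post, and the function name `mangle` is hypothetical.

```python
def mangle(body: str) -> str:
    """Hypothetical filter round trip: UTF-8 bytes, decoded as Latin-1,
    split with str.splitlines(), rejoined with newlines, then decoded as UTF-8."""
    raw = body.encode("utf-8")        # "х" becomes b"\xd1\x85"
    # Assumption: Latin-1 maps every byte to the code point of the same value,
    # so byte 0x85 becomes the character U+0085 (NEL).
    text = raw.decode("latin-1")
    lines = text.splitlines()         # str.splitlines() treats U+0085 as a line break
    rebuilt = "\n".join(lines)        # NEL is gone, a real newline takes its place
    # The orphaned 0xD1 lead byte is now invalid UTF-8 and decodes to U+FFFD.
    return rebuilt.encode("latin-1").decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(repr(mangle("хорошо")))
    print(repr(mangle("plain ASCII\nstays intact")))
```

Under these assumptions, mangle("хорошо") returns "\ufffd\nорошо": the replacement character, a newline, then the rest of the word, while pure ASCII text passes through unchanged.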
Does anyone use C1 control codes these days? No idea. But I had to replace str.splitlines() with a re.split() call that only splits on \n and \r\n.
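The exact pattern isn't given, so this is a sketch assuming `\r?\n`, which matches precisely the two line endings mentioned and nothing else; `split_mail_lines` and `LINE_BREAK` are hypothetical names.

```python
import re

# Matches "\r\n" or a bare "\n" only: no U+0085, no U+2028, no lone "\r".
LINE_BREAK = re.compile(r"\r?\n")


def split_mail_lines(text: str) -> list[str]:
    """Split on LF and CRLF only, unlike str.splitlines()."""
    return LINE_BREAK.split(text)


if __name__ == "__main__":
    # "хорошо" as it looks after a Latin-1 decode: contains U+0085
    latin1_text = "хорошо".encode("utf-8").decode("latin-1")
    print(len(latin1_text.splitlines()))        # 2: split at U+0085
    print(len(split_mail_lines(latin1_text)))   # 1: left intact
    print(split_mail_lines("a\r\nb\n"))         # ['a', 'b', '']
```

Two behavioral differences from str.splitlines() are worth keeping in mind: a trailing line ending produces a final empty string, and a lone "\r" is no longer treated as a line break.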
Of course, the real issue is that I’m treating mails as strings rather than binary data. I vaguely remember there being a reason for this back when I wrote this code, probably BytesIO not allowing reading by lines. Well, maybe it’s just time to revisit that.
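Working on bytes sidesteps the problem: bytes.splitlines() only recognizes ASCII line boundaries (\n, \r\n and \r), and io.BytesIO does support line-by-line reading through readline() and iteration, splitting on b"\n" only. A minimal sketch with hypothetical function names, decoding each line only after it has been isolated:

```python
import io


def lines_from_bytes(raw: bytes) -> list[bytes]:
    # bytes.splitlines() breaks only at b"\n", b"\r\n" and b"\r",
    # so the 0x85 continuation byte inside "х" (b"\xd1\x85") is left alone.
    return raw.splitlines()


def lines_from_stream(raw: bytes) -> list[bytes]:
    # Iterating over a BytesIO yields lines that keep their trailing b"\n";
    # strip the line ending ourselves, handling both LF and CRLF.
    result = []
    for line in io.BytesIO(raw):
        if line.endswith(b"\n"):
            line = line[:-1]
            if line.endswith(b"\r"):
                line = line[:-1]
        result.append(line)
    return result


if __name__ == "__main__":
    raw = "Subject: test\r\n\r\nхорошо\r\n".encode("utf-8")
    for line in lines_from_stream(raw):
        print(line.decode("utf-8"))  # decode only once each line is isolated
```

The two variants agree on CRLF and LF input; the difference is that bytes.splitlines() still treats a lone b"\r" as a break, while the BytesIO version splits at b"\n" only.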
Posted by WPalant@infosec.exchange