Here's an interesting question for you:
Can RFC 2047-encoded text in the Subject line of an email contain encoded line break characters (i.e., ^J, a.k.a. 0x0A)?
I don't think they should. The point of RFC 2047 encoding is to carry non-ASCII characters, which would otherwise be perfectly legitimate in a Subject line, in headers that are traditionally restricted to ASCII. It isn't meant for encoding characters that would otherwise be illegal there, and line breaks are among those.
RFC 2047 itself doesn't give a definitive answer.
What do you think?
#email #MIME #SMTP #SysAdmin
=> More information about this toot | More toots from jik@federate.social
Why am I asking this? Because I have a filter that runs on my incoming email and decodes RFC 2047-encoded text that didn't need to be encoded, and I just discovered that it's breaking the formatting of some emails from LinkedIn, because those emails include LF characters in the encoded text. I'm going to fix the filter to work around the problem, but I'm curious to hear what other people think.
=> More information about this toot | More toots from jik@federate.social
For example:
Subject: =?UTF-8?Q?Carly_-_Director,_Franchise_Secur?=
 =?UTF-8?Q?ity_Strategy_posted:_One_of_the_f?=
 =?UTF-8?Q?irst_engineers_I_hired_found_me_o?=
 =?UTF-8?Q?n_LinkedIn._=0A=0AHere=E2=80=99s_what_he_did=E2=80=A6?=
See the two linefeed characters (=0A=0A) lurking there in the last encoded blob?
=> More information about this toot | More toots from jik@federate.social
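To see where those linefeeds come from, here's a minimal Perl sketch that decodes a single Q-encoded word by hand. `decode_q_word` is a hypothetical helper written for illustration: it assumes the word is well-formed and returns the raw octets, ignoring the charset label rather than decoding from UTF-8. Per RFC 2047, `_` in a Q-encoded word stands for a space and `=XX` for the octet with hex value XX, so `=0A=0A` comes out as two LF bytes.

```perl
use strict;
use warnings;

# Hypothetical helper (illustration only): decode one RFC 2047 "Q"-encoded
# word. Assumes a well-formed word; ignores the charset label and returns
# the raw octets instead of decoding them from UTF-8.
sub decode_q_word {
    my ($word) = @_;
    my ($charset, $text) = $word =~ /^=\?([^?]+)\?[Qq]\?([^?]*)\?=$/
        or die "not a Q-encoded word: $word\n";
    $text =~ tr/_/ /;                              # "_" stands for a space
    $text =~ s/=([0-9A-Fa-f]{2})/chr(hex $1)/ge;   # "=XX" is the octet 0xXX
    return $text;
}

# The last encoded word from the LinkedIn Subject header above.
my $decoded = decode_q_word('=?UTF-8?Q?n_LinkedIn._=0A=0AHere=E2=80=99s_what_he_did=E2=80=A6?=');

printf "%d linefeed(s) in the decoded text\n", ($decoded =~ tr/\n//);
```

The underscore translation runs before the `=XX` decoding on purpose: a literal underscore is encoded as `=5F`, and decoding the escapes first would wrongly turn it into a space.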
And here's the one line of code I had to add to my filter to fix the problem (preceded by a five-line comment explaining what it's for and why it works):
1 while ($value =~ s/\n([^ \t])/\n $1/g);
Yes, it's written in #Perl. I still write stuff in Perl when it's the right tool for the job. Simple tasks that revolve around text processing are more easily done in Perl than in any other language, IMO.
A 🏆 to the first person who can explain why the fix is a loop instead of a single search/replace.
=> More information about this toot | More toots from jik@federate.social
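One plausible reading of why inserting a space fixes things (an assumption, since the five-line comment isn't shown): RFC 5322 unfolding only removes a line break that is immediately followed by whitespace. Once every LF in the decoded value is followed by a space or tab, the whole value unfolds back into a single logical Subject line. Without the fix, the LF before `Here` survives unfolding, so the rest of the subject would land at the start of a new header line. The sketch below checks this with a hypothetical `unfold` helper that, as a simplification, also accepts a bare LF where RFC 5322 specifies CRLF.

```perl
use strict;
use warnings;

# Hypothetical helper (simplified): RFC 5322 unfolding removes a line break
# only when the next character is a space or tab. Real headers use CRLF;
# a bare LF is accepted here too, since that's what a filter often sees.
sub unfold {
    my ($header) = @_;
    $header =~ s/\r?\n(?=[ \t])//g;
    return $header;
}

# Sample value modeled on the decoded LinkedIn subject (straight apostrophe).
my $value = "n LinkedIn. \n\nHere's what he did";

my $before = unfold($value);   # neither LF is followed by whitespace
1 while ($value =~ s/\n([^ \t])/\n $1/g);
my $after = unfold($value);    # every LF is now a fold and gets removed

printf "before fix: %d line break(s) left after unfolding\n", ($before =~ tr/\n//);
printf "after fix:  %d line break(s) left after unfolding\n", ($after  =~ tr/\n//);
```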
@jik
If the ([^ \t]) matches a \n, then the regex engine will have already moved past it, so even though you've got the /g option, that particular \n won't get matched as the first character of your regex? But when you pass the string back in on the next loop, it will?
Basically, "because \n\n is possible".
This feels like a situation where one of those extended regex attributes that means something like "match but don't consume" might do the trick for a single pass solution, but I never learned those. 😁
=> More information about this toot | More toots from jztusk@mastodon.social
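jztusk's explanation is easy to check. In this Perl sketch (the sample string is modeled on the LinkedIn subject above), a single `s///g` pass fixes the first LF, but because `([^ \t])` consumes the second LF as its captured character, the scan resumes after it and the LF before `Here` is never tried as the start of a match. Repeating the substitution until it reports no changes, which is what `1 while (...)` does, catches it on the next pass.

```perl
use strict;
use warnings;

# Sample value modeled on the decoded LinkedIn subject.
my $input = "LinkedIn. \n\nHere's what he did";

# One global pass: ([^ \t]) captures (consumes) the second "\n", so the scan
# resumes after it and that "\n" is never tried as the start of a match.
(my $once = $input) =~ s/\n([^ \t])/\n $1/g;

# The loop from the thread: repeat until a pass makes no substitutions.
my $looped = $input;
1 while ($looped =~ s/\n([^ \t])/\n $1/g);

# A remaining "\n" followed by something other than space/tab means "broken".
print "single pass: ", ($once   =~ /\n[^ \t]/ ? "still broken" : "ok"), "\n";
print "loop:        ", ($looped =~ /\n[^ \t]/ ? "still broken" : "ok"), "\n";
```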
@jik
Okay, it was stuff like "positive lookahead assertion" that I was thinking of. But as far as I can understand them, that won't do what you want. Let's hope future you appreciates the extended comment current you wrote.
=> More information about this toot | More toots from jztusk@mastodon.social
@jztusk Yes, you win the 🏆 for the first and correct answer.
And although it hadn't occurred to me, I actually think you're right that a positive lookahead assertion would work here instead of the loop. I think I could have done:
$value =~ s/\n(?=[^ \t])/\n /g;
=> More information about this toot | More toots from jik@federate.social
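jik's lookahead version can be compared against the loop on a few made-up inputs. The `(?=[^ \t])` assertion looks at the character after each LF without consuming it, so consecutive LFs each remain available as the start of a match, and one `s///g` pass is enough. This is a quick sketch rather than an exhaustive proof; the sample strings are arbitrary.

```perl
use strict;
use warnings;

# Arbitrary test strings: runs of LFs, an already-folded LF, trailing LFs.
my @inputs = ("A\n\nB", "A\n\n\nB", "A\n B", "A\nB\n", "A\n\n");

my $mismatches = 0;
for my $input (@inputs) {
    # One pass with a lookahead: the character after "\n" is checked, not consumed.
    (my $lookahead = $input) =~ s/\n(?=[^ \t])/\n /g;

    # The original loop, for comparison.
    my $looped = $input;
    1 while ($looped =~ s/\n([^ \t])/\n $1/g);

    $mismatches++ unless $lookahead eq $looped;
}

my $msg = $mismatches == 0
    ? "lookahead matches the loop on every input"
    : "$mismatches mismatch(es)";
print "$msg\n";
```

Besides skipping the loop, the lookahead form also drops the `$1` capture, since the character after the LF is never removed from the string in the first place.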
@jik
Full confession time: I didn't fully understand the problem description, so I can't be sure this second one works 😬. I just kinda made a stab at it.
I'm like a programmer's rubber duck, but occasionally my quacks are even meaningful.
And thanks for the trophy. I put it in my profile.
=> More information about this toot | More toots from jztusk@mastodon.social
@jztusk @jik
Nice all around
(except for the silly website inserting =0A into subject lines)
=> More information about this toot | More toots from BRicker@fosstodon.org