Attacks

=> gemini://gemini.conman.org/boston/2025/01/03.1

What I find annoying is the lack of any type of attack as an example.

An "arbitrary code execution" vulnerability lets an attacker do whatever they want, and w3m has had several of these, and may have more. Granted, measures such as ASLR and W^X can make life more difficult for an attacker, and you might notice w3m crashing as the attackers try to get the stars to line up for their ROP gadget to work as you (or some automation) try to download a malicious page over and over. Or, you could get unlucky and they are now running whatever code they want, or reading all your files.

Defense in depth is a thing. Why bother with reserves when you have a Maginot line? Pledge and unveil are just the sort of mobile reserve to box in attackers who do manage to get a toehold in a process. Oh, you did not have any reserves? Well, that sucks.

What I would like to see how opening a text editor with the contents of an HTML could be attacked. What are the actual attack surfaces? And no, I won't accept “just … bad things, man!” as an answer. What, exactly?

Where is your formal verification for the lack of errors?

Otherwise, there is some amount of code executed to make that textarea work, all of which is the "actual attack surface". If you look at the CVE entries for w3m (never mind the code w3m uses from SSL, curses, iconv, intl, libc, etc.) one may find:

Format string vulnerability in the inputAnswer function in file.c in w3m before 0.5.2, when run with the dump or backend option, allows remote attackers to execute arbitrary code via format string specifiers in the Common Name (CN) field of an SSL certificate associated with an https URL.
w3m before 0.3.2.2 does not properly escape HTML tags in the ALT attribute of an IMG tag, which could allow remote attackers to access files or cookies.
Buffer overflow in w3m 0.2.1 and earlier allows a remote attacker to execute arbitrary code via a long base64 encoded MIME header.

Granted, this is a very short list compared to more poorly written browsers, and they're pretty old exploits. However, the flaws allowed "arbitrary code execution", which is exactly what it says on the tin, or let the attacker read cookies and files. Opinions certainly differ here, but I'd default to disallowing that sort of access should any future exploits be found, or added. Another common error is to screw up system(3), which runs strings through the shell. w3m's use of system(3) does not appear to be so poorly written, but many things are. More bugs have been, and doubtless will be, found in the SSL, libc, or other libraries that an attacker could use.
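As a rough illustration of why system(3) is easy to get wrong: it hands a single string to the shell, so a ";" or "$(...)" in remote input becomes shell syntax. A minimal sketch of the usual alternative, passing an argument vector to execvp(3) so no shell ever parses the input, might look like this. The helper name run_argv is hypothetical and is not taken from w3m.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run argv[0] with the given arguments and return its exit status,
 * or -1 on failure. No shell is involved, so characters such as ';'
 * or '$' inside an argument are passed along as plain data.
 * (run_argv is a hypothetical helper, not part of w3m.)
 */
int
run_argv(char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);             /* only reached if exec failed */
    }
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Called with { "test", "x; exit 9", "=", "x; exit 9", NULL }, the "; exit 9" is compared as text by test(1) rather than run as a command. This does not help if the program being executed itself mishandles its arguments.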

So, again, the attack surface is any code that executes whilst that textarea (and everything else) is spirited from computer A to computer B. w3m has about 55,000 lines of C, plus that of the various libraries and the kernel, minus whatever percentage isn't needed to download, decrypt, un-MIME, decompress, parse, display, interact with, and possibly submit whatever resources are involved. So the threat boils down to how much faith you have that there are not any relevant errors lurking in all the code involved. It's not like w3m or really much of anything is formally verified, so who knows? Too expensive. Maybe there aren't any bugs, or maybe there are some number of actual threats that have not been found (I strongly suspect that there are bugs), or have not been made known yet, or someone (quite possibly me) could add a new so-called regression. w3m does have a test directory, but it does not look very complete.

My stance is that there have been too many CVE, even after subtracting the fake CVE that some folks submit to pad their security credentials with. Why too many? I've had to explain to programmers in a well regarded CSE department recently why their code was… sub-optimal. Less polite words could be used. They were running remote, user-supplied strings through a system(3) call, and it took a few emails to convince them that this was kind of bad. Far too many more such examples can be found in the CVE database, and doubtless will be found in the future. Some programmers don't even take to the argument that they shouldn't throw random strings to system(3), and are presumably still shipping their code to who knows where to do who knows what. It's not like there's much market pressure to relegate poorly written software off to the dustbin of history.

Moreover, it's fairly simple to pledge and unveil a process to remove classes of system calls (such as executing other programs) or remove access to swathes of the filesystem (so an attacker will have a harder time running off with your SSH keys). There can be some sticky wickets, like how to best support editing textareas given that w3m is no longer allowed to execute arbitrary programs. What can my w3m do? Pretty much only write files to a temporary directory. Works great for me. Threats? They could fill up the tmp directory, or try to run that partition out of inodes, or read draft postings such as this. Much less bad than the w3m default of "read everything, run anything". Why wouldn't I want that peace of mind, that attackers are less likely to be able to run off with everything?
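For a sense of how little code is involved, here is a minimal sketch of the pledge(2) and unveil(2) calls. It is an assumption-laden illustration, not the actual w3m change: the function name restrict_process and the promise list are made up for the example, and a real browser would likely need more paths unveiled (certificates, configuration files, and so on). Outside OpenBSD the function is a no-op so that the sketch still compiles.

```c
#include <stddef.h>
#include <unistd.h>

/*
 * Restrict the current process so that only `dir` remains visible
 * (read, write, create) and program execution is no longer allowed.
 * Returns 0 on success, -1 on error.
 * The name and the promise list are illustrative assumptions.
 */
int
restrict_process(const char *dir)
{
#ifdef __OpenBSD__
    if (unveil(dir, "rwc") == -1)
        return -1;
    if (unveil(NULL, NULL) == -1)   /* lock: no further unveil calls */
        return -1;
    /* No "proc" or "exec" promise, so fork and exec are not permitted. */
    if (pledge("stdio rpath wpath cpath inet dns tty", NULL) == -1)
        return -1;
#else
    (void)dir;                      /* no pledge/unveil on this system */
#endif
    return 0;
}
```

A program would call something like restrict_process("/tmp") once, early, after any setup that needs wider access. A later violation of the pledge kills the process instead of letting it carry on.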

(Attackers can be stupid as well, and may content themselves with running some bitcoin nonsense and won't see how far the rabbit hole of their access goes. But why should I have faith that the attackers will only be stupid?)

This is why I feel such articles are bad—by not talking about actual threats they enfoce[sic] a form of “learned helplessness.”

How, exactly, is adding pledge and unveil to w3m "helplessness", and then iterating on that design as one gains more experience?

Actual threats are, again, remote code execution via buffer overflows, format string vulnerabilities, or memory allocation errors; see the CVE database for these and many more such attack classes, or roll d20 to disbelieve. The attackers, by the way, are not rolling to disbelieve the "bad things, man!" that led to arbitrary file access or arbitrary program execution because some programmer, once again, forgot to dot their t's and cross their i's somewhere in a code path involved. After supporting production systems for too many years, I do not trust programmers (nor myself) to not write errors, so I look to pledge and unveil by default, especially for "runs anything, accesses remote content" browser code.
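To make one of those classes concrete: a format string bug happens when attacker-controlled text is used as the format argument of a printf-family call, so that "%x" or "%n" in, say, a certificate Common Name gets interpreted. A minimal sketch of the safe form follows. The helper name format_label and the "CN: " prefix are hypothetical and not taken from the w3m source.

```c
#include <stdio.h>

/*
 * Copy an untrusted string into buf as data. The untrusted text is an
 * argument to "%s", never the format itself, so any "%n" or "%x" in it
 * is copied literally instead of being interpreted.
 * Returns whatever snprintf returns.
 * (format_label is a hypothetical helper.)
 */
int
format_label(char *buf, size_t len, const char *untrusted)
{
    /* The vulnerable form would be: snprintf(buf, len, untrusted); */
    return snprintf(buf, len, "CN: %s", untrusted);
}
```

The fix is a single character pair, which is rather the point: such errors are trivial to make and easy to overlook in review.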

Everything is dangerous and we must submit to onerous measures to keep ourselves safe. Sprinkling calls to pledge() [4] aren't the answer. Yes, it helps, but not thinking critically about security leads to a worse experience overall, such as having to manually edit a file which would still be subject to all three of the above attacks anyway.

Everything? I have lots of code (most of it, in fact) I have not added pledge to. Low threat, doesn't do browser-y things that set off all the alarm bells.

And how, exactly, is adding pledge and unveil onerous? It took much less time to add to w3m than writing this post did; most of the time for w3m was spent figuring out how to disable color support, kill off images, and to get the CFLAGS aright. It is almost zero maintenance once done and documented.

And how, exactly, am I not thinking critically about security? Do I just run w3m and pray that there are no exploits in all the executed code? Getting w3m up to at least the level Firefox and Chromium are at on OpenBSD is a no-brainer.

By the way, /usr/bin/vi -S is used to edit the temporary file. This does a pledge so that vi cannot run random programs. A simple iteration would be a vi variant (perhaps a modification of the -S flag, or a new flag) with unveil to limit filesystem access to only the temporary directories involved. Not that difficult. Meanwhile, the vi -S "stdio rpath wpath cpath fattr flock getpw tty" pledge does not let an attacker run any programs, nor make internet connections, so the security threats would involve malicious file deletion (they could wipe out your SSH keys, forcing time to be spent debugging what went wrong and restoring from backups), or, less worryingly, denial of service, resource consumption, or vi display glitches. Low threat, but also low cost to implement. To be precise, it would take less time and consume many fewer characters than went into this most onerous paragraph. (vi(1) on OpenBSD has a fair bit of historical bloat so there are non-zero odds of there being an error an attacker could tickle. In fact I have a workaround for a crash, but darned if I can figure out how to reliably trigger the condition and thus report something useful to a programmer. Maybe the crash is exploitable? Other editors have had their share of security issues, e.g. CVE-2022-45939 or CVE-2024-53920.)
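The reasoning about what that promise string allows can be checked mechanically. The sketch below uses a hypothetical helper, has_promise, which is not part of pledge(2), to look for whole words in the promise list quoted above. "exec", "proc" and "inet" are absent, so no running programs and no internet sockets, while "cpath" is present, and that promise covers calls such as unlink(2), hence the file deletion threat.

```c
#include <string.h>

/* Promise string quoted in the text for vi -S. */
static const char VI_S_PROMISES[] =
    "stdio rpath wpath cpath fattr flock getpw tty";

/*
 * Return 1 if `name` appears as a whole space-separated word in
 * `promises`, otherwise 0. (has_promise is a hypothetical helper for
 * reasoning about a promise list; it is not part of pledge(2).)
 */
int
has_promise(const char *promises, const char *name)
{
    size_t n = strlen(name);
    const char *p = promises;

    if (n == 0)
        return 0;
    while ((p = strstr(p, name)) != NULL) {
        int starts = (p == promises || p[-1] == ' ');
        int ends = (p[n] == '\0' || p[n] == ' ');

        if (starts && ends)
            return 1;
        p++;                    /* partial match, keep searching */
    }
    return 0;
}
```

For example, has_promise(VI_S_PROMISES, "exec") is 0 and has_promise(VI_S_PROMISES, "cpath") is 1.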

pledge and unveil for w3m? w3m is the very model of a high threat program: it by default makes network connections, has a lot of code, most or all of said code being neither verified nor tested, has access to all the files, can run all the programs. pledge is, again, low cost to implement, and can greatly limit the scope of what an attacker can do. A no-brainer, in other words.

Or maybe you could elaborate on what, exactly, you found onerous about pledge and unveil? I'm not a programmer, and did not find the interfaces difficult, nor most of my code nor what I run very difficult to adapt. You've probably seen code around here, but I'm more a sysadmin who can write a script or two. I would most definitely never try to write an HTML parser.

And to address the bit about parsing HTML—is parsing really that fraught with danger? All you need to parse HTML is to follow the explicit (and in excruciating detail) HTML5 specification [5]. How hard can that be?

It is rather easy to find CVE for errors in HTML parsing code, besides the "did not properly escape HTML tags in the ALT attribute" thing w3m was doing that led to arbitrary file access.

CVE-2021-23346, CVE-2024-52595, CVE-2022-0801, CVE-2021-40444, CVE-2024-45338, CVE-2022-24839, CVE-2022-36033, CVE-2023-33733, …
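The ALT attribute flaw mentioned above belongs to the "forgot to escape untrusted text" family. As a hedged sketch of what escaping looks like (not how w3m fixed its bug, and html_escape is a hypothetical name), the function below replaces the characters that are special in HTML text and attribute values, and refuses to write past the end of the output buffer.

```c
#include <stddef.h>
#include <string.h>

/*
 * Write `in` to `out` with the characters that are special in HTML
 * text and attribute values replaced by entities. Returns 0 on
 * success, or -1 if `out` (of size `outlen`) is too small. Nothing is
 * ever written past the end of `out`.
 */
int
html_escape(char *out, size_t outlen, const char *in)
{
    size_t used = 0;

    for (; *in != '\0'; in++) {
        char one[2] = { *in, '\0' };
        const char *rep;
        size_t n;

        switch (*in) {
        case '&':  rep = "&amp;";  break;
        case '<':  rep = "&lt;";   break;
        case '>':  rep = "&gt;";   break;
        case '"':  rep = "&quot;"; break;
        case '\'': rep = "&#39;";  break;
        default:   rep = one;      break;
        }
        n = strlen(rep);
        if (n >= outlen - used)     /* keep room for the terminator */
            return -1;
        memcpy(out + used, rep, n);
        used += n;
    }
    if (used >= outlen)
        return -1;
    out[used] = '\0';
    return 0;
}
```

Even this small function has two easy ways to go wrong, a missed character or a missed bounds check, and a full HTML parser has a great many more such places.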

The most recent patch for OpenBSD is, coincidentally, an XML parser bug in libexpat. So, from where I am, the answer is "pretty damn hard". Something software professionals still manage to sometimes screw up. This only serves to bolster my "better add some pledge and unveil to that thing, because who knows what the code allows for" argument.

Original URL: gemini://thrig.me/blog/2025/01/04/attacks.gmi