2024-12-23

update: Bad hyphens break documentation?

=> #software

Short version: the ASCII hyphen-minus '-' (0x2d) is good; the Unicode hyphen '‐', which od labels 'dle' because its last byte is 0x90, is bad on shell command lines. First reported here:

=> /en/2024/20241216-bad-hypen-breaks-man-page.gmi

The kind folks of my local Linux User Group suggested that the value of the well-known environment variable LANG might be responsible for this. In my Alpine Linux instance LANG is set to 'C.UTF-8', and resetting it to just 'C' makes the funny characters go away. So an alias might help:

alias man='LANG=C /usr/bin/man'

This is clearly a workaround for a phenomenon that is not properly understood.

thrig had some comments about "invisible things"

=> gemini://thrig.me/blog/2024/12/16/invisibles.gmi | local copy

They suggest another alias:

alias man='/usr/bin/man -Tascii'

Please note that there is no space between the option '-T' and its value 'ascii'. With a space, the default output driver of troff is used instead, resulting in (nicely formatted) PostScript. The source of the phenomenon is still not understood.
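To compare the two aliases without squinting at the terminal, one could count the lines that contain the bad hyphen. The following is only a sketch: the function name count_bad_hyphens is made up for this post, and it looks for exactly one character, U+2010, by its three UTF-8 bytes.

```shell
# Hypothetical helper: count the input lines that contain U+2010 (bytes e2 80 90).
count_bad_hyphens() {
    # printf with octal escapes is POSIX; LC_ALL=C makes grep compare raw bytes.
    LC_ALL=C grep -cF "$(printf '\342\200\220')"
}

# Self-check on two variants of a sample line.
printf 'text \342\200\220Dn\n' | count_bad_hyphens   # this line contains U+2010
printf 'text -Dn\n' | count_bad_hyphens || true      # grep exits 1 when nothing matches
```

Assuming man writes the same characters into a pipe as it does to the terminal, 'man wpa_supplicant | count_bad_hyphens' and 'LANG=C man wpa_supplicant | count_bad_hyphens' should then show whether a given setting makes a difference.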

Someone else suggested that this behaviour might arise from mandoc. However, I'm UTF-8 illiterate when it comes to programming. I also suspected that the modern terminal emulator foot was interfering with the stream of bytes to be shown.

However, Daniel Kalak reached out to explain a few more things.

Let's inspect the relevant bytes in the dump again:

$ echo -n 'text ‐Dn' | od -tx1a
0000000  74  65  78  74  20  e2  80  90  44  6e
          t   e   x   t  sp   b nul dle   D   n
0000012

The character in question is three bytes: 0xe2 0x80 0x90. The 'a' format of od interprets them as 7-bit ASCII bytes, dropping the highest bit of each and producing a misleading transcription. In this case od is the wrong tool: it cannot show that this is a single Unicode character. The three bytes are the UTF-8 encoding of U+2010, "HYPHEN". OK, that's actually something I kind of suspected, but did not manage to verify.
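The arithmetic behind both readings of those bytes fits into a few lines of shell. This is a sketch with two helper names invented for this post; it handles only three-byte UTF-8 sequences and does no validation of its arguments.

```shell
# Decode a three-byte UTF-8 sequence (1110xxxx 10yyyyyy 10zzzzzz) to a code point.
# Hypothetical helper; the arguments are the three byte values.
utf8_3byte_to_codepoint() {
    cp=$(( (($1 & 0x0f) << 12) | (($2 & 0x3f) << 6) | ($3 & 0x3f) ))
    printf 'U+%04X\n' "$cp"
}

# What the "a" format of od effectively does: keep only the low seven bits.
strip_high_bit() {
    printf '0x%02x\n' $(( $1 & 0x7f ))
}

utf8_3byte_to_codepoint 0xe2 0x80 0x90   # U+2010
strip_high_bit 0xe2                      # 0x62, ASCII "b"
strip_high_bit 0x90                      # 0x10, the control code DLE

# Byte-oriented views that leave the high bit alone:
printf 'text \342\200\220Dn' | od -An -tx1
printf 'text \342\200\220Dn' | od -An -c
```

The hex view (-tx1) was correct all along; only the named-character line underneath it was misleading.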

But there is more! Daniel kindly points to groff_man(7) or groff_man_style(7). In there I find the following snippet:

Option dashes are specified with the ‘\-’ escape sequence; this is an important practice to make them clearly visible and to facilitate cut-and-paste from the rendered man page to a shell prompt or text file.

Now, this looks a lot more like a problem in the source of the man page than a full-fledged flaw in the tool chain. Nice! So let's check out the code then ... well, yet another indirection. The man page is written in SGML and requires docbook2man ... so after a few installs and some editing of the Makefile we find this:

$ cd wpa_supplicant-2.10/wpa_supplicant/doc/docbook
$ grep -C1  Dnl80211 wpa_supplicant.sgml
wpa_supplicant -Dnl80211,wext -c/etc/wpa_supplicant.conf -iwlan0

The sgml source lists the line as programlisting, which at least looks plausible to my innocent eyes. Let's try to build this:

$ sed -i.bak -e 's/docbook2man/docbook-to-man/g' Makefile
$ make man
$ grep Dnl80211 wpa_supplicant.8
wpa_supplicant -Dnl80211,wext -c/etc/wpa_supplicant.conf -iwlan0

And there it is. The generated man page uses plain hyphens '-' and not the '\-' escapes that groff_man asks for.
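Just to illustrate what the escaped form would look like, here is a deliberately naive filter. It is an assumption of mine, not a proposed fix: the name escape_option_dashes is made up, and blindly rewriting roff source like this would also hit dashes in ordinary prose.

```shell
# Hypothetical filter: turn " -x" into " \-x", the escape groff_man asks for.
# Naive on purpose: it only touches a dash that directly follows a space.
escape_option_dashes() {
    sed 's/ -/ \\-/g'
}

printf '%s\n' 'wpa_supplicant -Dnl80211,wext -c/etc/wpa_supplicant.conf -iwlan0' |
    escape_option_dashes
```

Whether such a rewrite belongs in the SGML source, in the converter, or nowhere at all is exactly the open question.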

So, where exactly is the correct place to fix this?

Big thanks to Daniel for pointing me in the right direction! No, this is not solved yet, but free/libre software let me inspect it all the way to this point. Fantastic!

Cheers!

=> https://codepoints.net/U+2010 | https://w1.fi/releases/wpa_supplicant-2.10.tar.gz

=> Home
