Formatting Gemtext for Gopher [II]

==================================

Recently I have been contemplating mirroring my capsule on Gopher. To

do that I would need to recreate the landing page as a Gophermap but

what about the posts? Since they are plain text I initially thought

that I could perhaps just present the .gmi files directly. Sadly not.

characters), while Gemtext is not wrapped.

widespread compatibility in Gopher, 'pure' ASCII would be better.

still be improved when read directly.

In summary, I need to reformat Gemtext before serving the posts on

Gopher. On the plus side, since one of the big benefits of Gemtext is its

simplicity, I quickly realised I could do that with just a few lines

of shell script, using 'sed' and 'fmt'.

TL;DR

~ Sample Gemtext file:

/files/gmi2txt-sample.gmi

~ Sample Gemtext file - reformatted:

/files/gmi2txt-sample.txt

~ Shell script filter to reformat .gmi files [EDIT 2022-01-05 12:27: tweaked]:

https://gist.githubusercontent.com/ruario/3bd570d265ca5a42cb039092ed4f1299/raw/5323b3880c55d0f679eb24400b053865dfbb413c/gmi2txt.sh

[EDIT 2022-01-08] To use, make it executable, then pipe or redirect

the Gemtext in.

$ ./gmi2txt.sh < yourfile.gmi

Handling non-ASCII

------------------
Some Gopher servers and clients can actually handle UTF-8 but it is by

no means universal and likely fairly rare, at least on the (reader's)

client side, which I cannot control. I did recently note that if I

look at recent posts by Alex Schroeder on his Gopher site in either

VF-1 or Lagrange, I sometimes see characters like "EUR" and even the

odd emoji. Interestingly, if I browse the same site using a client like

Lynx, the characters get replaced--[EDIT 2022-01-05] Lynx can support

UTF-8: S. Comments - 2022-01-05 03:32. No doubt there is some 'magic'

going on server side, to work out what the client is capable of

and then do automatic replacements as needed.

~ 2021-12-25 Donations - Alex Schroeder:

gopher://alexschroeder.ch:70/0page/2021-12-25%20Donations

~ 2021-12-26 The confusing world of Reddit - Alex Schroeder:

gopher://alexschroeder.ch:70/0page/2021-12-26%20The%20confusing%20world%20of%20Reddit

So there are two ways I could handle UTF-8 characters.

The latter is perhaps not as daunting as it sounds, since I would be

making this for my own personal usage and thus only need to handle the

characters that I regularly use. The other nice thing with doing this

myself is that I can decide exactly what characters are replaced with

and I can create a uniform experience across all Gopher clients.

Simply piping through sed would allow me to convert a bunch of

characters, e.g. '-e s/[:D:D:D]/:D/g'. Yes some of the 'subtlety'

of those different emojis is lost but... 'does it matter?'. I could

try and think of a clever (ASCII only) emoticon for something like

'[shrugs]' or I could just do 's/[shrugs]/[shrug]/g'. Alternatively,

I may decide that my usage of emojis is largely for decoration and

wipe them out altogether (s/[:D:D:D[shrugs]]//g). If I do it myself,

I can also update and tweak these replacements and deletions going

forward as my usage and opinions on the matter change.
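A minimal sketch of the do-it-yourself replacement approach described
above. The function name 'to_ascii' and the exact character set are
illustrative assumptions, not the contents of the real script. It uses
one literal substitution per character instead of a bracket expression,
because bracket expressions containing multi-byte characters only
behave as intended in a UTF-8 locale.

```shell
#!/bin/sh
# Hypothetical helper: replace a small, personal set of non-ASCII
# characters with ASCII stand-ins. Extend the list as usage changes.
to_ascii() {
  sed -e 's/€/EUR/g' \
      -e 's/😀/:D/g' \
      -e 's/🤷/[shrug]/g'
}
```

Used as a filter, e.g. 'to_ascii < yourfile.gmi'. To delete a character
rather than replace it, leave the replacement empty (s/😀//g).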

Handling Gemtext

----------------
Gemtext is designed to be parsed at line level. [EDIT 14:58: clarified]

Seven of the eight line types (roughly equivalent to: <h1>, <h2>, <h3>,

<li>, <a>, <blockquote>, <pre>) start with a recognisable pattern,

so it is easy to detect and apply different formatting to each of

them. The last one (similar to: <p>) can start with any character but

this is detected by virtue of not being one of the others.
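The prefix-based detection can be sketched as a small classifier. This
is an illustration only: the function name 'classify_gemtext' and the
type labels are made up here, and lines inside a preformatted block get
their own label so that they are never mistaken for headings or links.

```shell
#!/bin/sh
# Hypothetical helper: print one line type per input line.
classify_gemtext() {
  pre=0
  while IFS= read -r line || [ -n "$line" ]; do
    # A line starting with three graves toggles preformatted mode.
    case $line in
      '```'*) pre=$((1 - pre)); echo "pre-toggle"; continue ;;
    esac
    if [ "$pre" -eq 1 ]; then
      echo "preformatted"
      continue
    fi
    # Longest prefixes are tested first so '###' is not seen as '#'.
    case $line in
      '###'*) echo "heading3" ;;
      '##'*)  echo "heading2" ;;
      '#'*)   echo "heading1" ;;
      '* '*)  echo "list" ;;
      '=>'*)  echo "link" ;;
      '>'*)   echo "quote" ;;
      *)      echo "text" ;;
    esac
  done
}
```

For example, 'classify_gemtext < yourfile.gmi' lists the type of every
line, which is handy for checking a file before converting it.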

-Heading: Level 1-

These lines start with a single '#'.

# My Post

I feel it looks clearer to remove this and underline them with '='.

My Post

=======

-Heading: Level 2-

These lines start with two '#' characters.

## Subsection

Again, a simple underline with '-' looks very clean and is arguably

more in keeping with Gopher.

Subsection

----------
-Heading: Level 3-

These lines start with three '#' characters.

### Sub Subsection

Starting and ending these lines with '-' retains the clean feel of the

other two heading types, while keeping them less prominent.

-Sub Subsection-
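The three heading styles above can be sketched with a single sed
script. This is one possible approach and not necessarily how the
linked script does it; the function name 'gmi_headings' is made up. The
hold space keeps the title while a copy of it is turned into the
underline. Note that this sketch does not skip preformatted blocks.

```shell
#!/bin/sh
# Hypothetical helper: restyle Gemtext headings read from stdin.
gmi_headings() {
  sed '
# Level 3: strip the marker and wrap the title in "-".
/^###/{
  s/^###[[:space:]]*//
  s/.*/-&-/
  b
}
# Level 2: title, then a line of "-" of the same length.
/^##/{
  s/^##[[:space:]]*//
  h
  s/./-/g
  x
  G
  b
}
# Level 1: title, then a line of "=" of the same length.
/^#/{
  s/^#[[:space:]]*//
  h
  s/./=/g
  x
  G
}
'
}
```

The 'b' commands jump to the end of the script, so a title that itself
begins with '#' is not processed a second time by a later rule.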

-Lists-

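Lists begin with '* ' (including the space).

* Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.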

I indent any wrapping that takes place. In addition, I add newline

spacing between each bullet. This makes multiline, wrapped bullet points

more readable (IMHO).

* Lorem ipsum dolor sit amet, consectetur adipiscing elit,

  sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
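A hanging indent like this can be sketched as follows. The post names
'sed' and 'fmt' as the tools used; this sketch swaps in the POSIX
'fold' utility instead, so the exact break points may differ from what
'fmt' produces. The function name 'wrap_list_item' and the width of 68
are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helper: wrap one Gemtext list line given as $1.
wrap_list_item() {
  # Fold at word boundaries, trim the trailing blank that fold leaves,
  # then indent every line after the first by two spaces.
  printf '%s\n' "$1" | fold -s -w 68 |
    sed -e 's/[[:space:]]*$//' -e '2,$s/^/  /'
  # Blank line to separate this bullet from the next one.
  echo
}
```

Folding at 68 columns means the indented continuation lines never
exceed 70 columns.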

-Links-

Links start with '=>' followed by the link and the

title/description. However, this can make them long and can 'bury'

the URL slightly.

=> gopher://example.com This is a cool site

I would like the URLs to stand out by being on their own line. This

is particularly important on Gopher where most clients do not extract

links embedded in pure text or make them directly 'clickable'. By having

them on their own line you get the next best thing, as you can quickly

select a complete line via a triple click, thus making them far easier

to copy and paste [EDIT 2022-01-05: clarified the benefit].

~ This is a cool site:

gopher://example.com
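A minimal sed sketch of this link rewrite. The function name
'gmi_links' is made up, and two details are assumptions rather than
something stated above: the URL is indented by two spaces, and a link
with no description is reduced to just the indented URL.

```shell
#!/bin/sh
# Hypothetical helper: move link URLs onto their own line.
gmi_links() {
  # Rule 1: "=> URL description" becomes "~ description:" plus the URL
  # on the next line (backslash-newline is a newline in the replacement).
  # Rule 2: "=> URL" with no description becomes the indented URL.
  sed -e 's/^=>[[:space:]]*\([^[:space:]]\{1,\}\)[[:space:]]\{1,\}\(.*\)$/~ \2:\
  \1/' \
      -e 's/^=>[[:space:]]*\([^[:space:]]\{1,\}\)[[:space:]]*$/  \1/'
}
```

Lines that do not start with '=>' pass through unchanged, so the filter
can sit anywhere in a pipeline.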

-Quotes-

Quotes start with '>'.

> Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.

Strictly speaking it would probably be 'most correct' to keep

these largely as they are, since this concept of quoting is widely

understood. Then I would only need to handle adding extra '>'

characters when wrapping. However, I am rather taken with the way

Lagrange (and several websites) display quotes, with a single opening

quote character and indented lines. This is actually not too hard

to replicate, using two grave characters '``' to simulate an opening

'curly' quote character and an extra newline after the quote to give

a bit of space before regular text continues.

``

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod

tempor incididunt ut labore et dolore magna aliqua.
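The quote layout can be sketched like this, again using 'fold' as a
stand-in for the wrapping step. The function name 'format_quote', the
width of 68 and the two-space indent are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helper: format one Gemtext quote line given as $1.
format_quote() {
  # Two graves on their own line stand in for an opening curly quote.
  printf ' ``\n'
  # Drop the leading ">", wrap, trim trailing blanks, indent by two.
  printf '%s\n' "$1" |
    sed -e 's/^>[[:space:]]*//' |
    fold -s -w 68 |
    sed -e 's/[[:space:]]*$//' -e 's/^/  /'
  # Extra newline before regular text continues.
  echo
}
```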

-Preformatted [EDIT 14:52: mistakenly skipped initially]-

Preformatted lines are slightly different from the others in that they

begin AFTER a line that starts with '```' and end BEFORE the next such

line. Here I will remove the three grave characters and just indent

the rest of the lines by two spaces, so that they do not align with

regular text and thus visually stand out.

They will then align with lists, links and quotes (which makes things

look neat and tidy) but can still be differentiated because they have no

leading characters ('*' or '~') and no ' ``' from the preceding line

(in the case of my proposed quote display).

I will not wrap them or even attempt to filter non-ASCII characters,

leaving them mostly pure. The only downside is the two leading spaces

may need to be removed from the start after copy and pasting them but

on the flip side this is relatively trivial to deal with in any decent

editor. Additionally, for many use cases (e.g. certain code types)

that could even be skipped (as they would be treated as non-functional

indentation).
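A sed address range is one way to sketch this treatment of
preformatted blocks. The function name 'gmi_pre' is made up, and the
sketch assumes the fence lines are properly paired.

```shell
#!/bin/sh
# Hypothetical helper: drop the fence lines and indent what is between.
gmi_pre() {
  sed '
# The range runs from one fence line to the next one.
/^```/,/^```/{
  # Delete the fence lines themselves.
  /^```/d
  # Indent everything between them by two spaces, otherwise untouched.
  s/^/  /
}
'
}
```

Everything outside a fenced block passes through unchanged, so no
wrapping or character replacement happens to the preformatted text.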

-Regular lines-

Anything that does not match one of the above starting sequences is a

regular line and needs only simple wrapping.
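Simple wrapping can be sketched with 'fold', which treats each input
line independently. That matters here because every Gemtext line is its
own paragraph and adjacent lines should not be joined together. The
function name 'wrap_text' and the width of 70 are assumptions.

```shell
#!/bin/sh
# Hypothetical helper: wrap regular lines from stdin at 70 columns.
wrap_text() {
  # -s breaks at blanks rather than mid-word; the sed call trims the
  # trailing blank that fold leaves at each break.
  fold -s -w 70 | sed -e 's/[[:space:]]*$//'
}
```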

Other changes

-------------
In addition to making the text better suited for display, I also need

to rewrite all internal (capsule specific) URLs and remove the

navigation links I add to the bottom of my posts but I think it makes

more sense to do that in an additional script, that I can pipe the

results of the first one through. [thinks]

Am I missing anything? Thoughts and comments are welcome!

P.S. As a bonus for those that made it this far.

~ This post converted--how meta is that?:

/files/2022-01-04_Formatting_Gemtext_for_Gopher.txt

An extra, even more basic example

---------------------------------
[EDIT 2022-01-05: I rewrote this whole section again with more clarity

and a warning]

Here is a more basic version of this script that does a bit of 'fancy'

wrapping to Gemtext (with indentation for links and lists, and extra

'>' characters for the additional newlines within quotes). Again the

idea would be for potential display on Gopher. It does not impose any

other significant formatting changes.

It is worth noting that due to wrapping and indentation, after this

conversion you no longer have valid Gemtext. Just lightly formatted

plaintext that superficially looks like Gemtext. You cannot permanently

alter your files like this with the intention of then serving the exact

same source over both Gemini and Gopher. Such files, served over Gemini

with a .gmi extension would likely have issues with unexpected wrapping,

and longer link lines and lists would display incorrectly.

Since there are no character replacements, it is assumed that a person

who might want to use something like this would avoid using large amounts

of non-ASCII, expect their readers to have a UTF-8 capable client,

or use some other server side character translation system like Alex's

[S. Handling non-ASCII].

~ Sample Gemtext file - wrapped:

/files/gmi2txt-sample-wrapped-only.txt

~ Shell script filter to wrap .gmi files [EDIT 2022-01-05 13:00: simplified]:

https://gist.github.com/ruario/4ec4bc02e820a42830e9a8dc05b042e7

Comments

--------
-2022-01-04 16:32 (UTC+1)-

Omar Polo (yumh):

~ Convertire text/gemini in testo semplice IT:

gemini://it.omarpolo.com/articoli/text-gemini-a-testo-semplice/

[i] Takes some of the concepts above and creates a new version.

-2022-01-05 03:32 (UTC+1)-

James Tomasino:

~ https://github.com/jamestomasino/dotfiles-minimal/blob/master/.profile#L135

``

You can tell lynx to use [utf-8]. :) It works with gopher sites too.

Nice [conversion] script, though. Good work

-2022-01-05 22:21 (UTC+1)-

Sandra Snan (Idiomdrottning):

``

How about turning them into Gopher maps? So hyperlinks could work.

Yes, I had considered this and I know that others, like Tomasino, always

do this. I also recall reading Solderpunk's "The true spirit of gopher"

post where he talks about how according to RFC1436 this is "semantic

abuse of gopher" and yet he also says, "building type 1 only gopher

holes pretty much just works, and it does offer a nicer user experience

[...]" (i.e. he is not actually critical of it)

~ The true spirit of gopher: P. 12 [>] 13 - Solderpunk:

gopher://zaibatsu.circumlunar.space:70/0/~solderpunk/phlog/the-soul-of-gopher.txt

In the end it is clear that this is what led him to Gemini and I

already have that in my Gemini blog. So for Gopher I think I would

like it to be intentionally different, as part of the reason for even

having a Gopher version. i.e. the Gemini version is true to Gemini and

the Gopher one more "faithful" to Gopher.

That said, whilst reading about these topics I did note that some modern

Gopher clients like VF-1 and GemiNaut make links in plain text directly

usable by pattern matching for obvious URLs.

~ Linkification in Gopher clients:

2021-12-14_Linkification_in_gopher_clients.gmi

~ Linkification in Gopher clients (Part 2) – OK, I am just stupid:

2021-12-16_Linkification_in_gopher_clients_Part2_I_am_stupid.gmi

Thus I decided that plain text for the Gopher articles works well

enough. Those who are more old school get what they expect and those

who are more modern are more likely to run a modern client that will

handle links for them anyway.

In addition, I did carefully think about how I would display links

in posts, 'S. Handling Gemtext - Links' includes, "By having them on

their own line you get the next best thing, as you can quickly select

a complete line via a triple click, thus making them far easier to copy

and paste."

In summary, is this the right way? I don't know but this is how I

generation briefly, so perhaps I will change my mind. ;)

-2022-01-06 13:57 (UTC+1)-

Sandra Snan (Idiomdrottning):

``

Please note that I did see "By having them on their own line you get

the next best thing, as you can quickly select a complete line via

a triple click, thus making them far easier to copy and paste." I

might be a sloppy reader, but not to that extent! <3

I just didn't think this was a counterargument to the gophermaps thing

(to the extent that the other stuff you bring up is). I don't have

a gopherhole of my own so it's not that I have that much of a say

about Gopher! <3

Fair enough and I hope I did not offend you. To be honest I am amazed

that anyone got through my boring post! ;)

I will add that I had some thoughts about going the "Gophermaps all

the way down" route and I am a fickle creature, so who knows, perhaps

I will change my mind and do just that. XD

-2022-01-08 14:46 (UTC+1)-

Szczezuja:

``

I have one lame question for your script:

[...]

How it should be used? There are no information in the source, I made

some tries without success.

Oh sorry, it is a filter [...] pipe or redirect the Gemtext in:

$ ./gmi2txt.sh < yourfile.gmi

(would print the converted text to screen)

or to save to a file:

$ ./gmi2txt.sh < yourfile.gmi > yourfile.txt

-2022-01-10 11:38 (UTC+1)-

~ Formatting Gemtext for Gopher - further tweaks [II]:

2022-01-10_Formatting_Gemtext_for_Gopher-further_tweaks.gmi
