Formatting Gemtext for Gopher [II]
==================================
Recently I have been contemplating mirroring my capsule on Gopher. To
do that I would need to recreate the landing page as a Gophermap but
what about the posts? Since they are plain text I initially thought
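that I could perhaps just present the .gmi files directly. Sadly not.
* Text on Gopher is conventionally hard wrapped (to a fixed number of
  characters), while Gemtext is not wrapped.
* Gemtext is UTF-8 but, for widespread compatibility in Gopher,
  'pure' ASCII would be better.
* Gemtext markup is readable as it is but the presentation could
  still be improved when read directly.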
In summary, I need to reformat Gemtext before serving the posts on
Gopher. On the plus side, since one of the big benefits of Gemtext is
its simplicity, I quickly realised I could do that with just a few
lines of shell script, using 'sed' and 'fmt'.
TL;DR
-----
~ Sample Gemtext file:
/files/gmi2txt-sample.gmi
~ Sample Gemtext file - reformatted:
/files/gmi2txt-sample.txt
~ Shell script filter to reformat .gmi files [EDIT 2022-01-05 12:27: tweaked]:
https://gist.githubusercontent.com/ruario/3bd570d265ca5a42cb039092ed4f1299/raw/5323b3880c55d0f679eb24400b053865dfbb413c/gmi2txt.sh
[EDIT 2022-01-08] To use, make it executable, then pipe or redirect
the Gemtext in.
$ ./gmi2txt.sh < yourfile.gmi
Handling non-ASCII
------------------
Some Gopher servers and clients can actually handle UTF-8 but it is by
no means universal and likely fairly rare, at least on the (reader's)
client side, which I cannot control. I did recently note that if I
look at recent posts by Alex Schroeder on his Gopher site in either
VF-1 or Lagrange, I sometimes see characters like "EUR" and even the
odd emoji. Interestingly, if I browse the same site using a client like
Lynx, the characters get replaced--[EDIT 2022-01-05] Lynx can support
UTF-8: S. Comments - 2022-01-05 03:32. No doubt there is some 'magic'
going on on the server side, to work out what the client is capable of
and then do automatic replacements as needed.
~ 2021-12-25 Donations - Alex Schroeder:
gopher://alexschroeder.ch:70/0page/2021-12-25%20Donations
~ 2021-12-26 The confusing world of Reddit - Alex Schroeder:
gopher://alexschroeder.ch:70/0page/2021-12-26%20The%20confusing%20world%20of%20Reddit
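So there are two ways I could handle UTF-8 characters.
* Use server software that does this kind of automatic replacement
  for me.
* Do the replacements myself, before the files are served.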
The latter is perhaps not as daunting as it sounds, since I would be
making this for my own personal usage and thus only need to handle the
characters that I regularly use. The other nice thing with doing this
myself is that I can decide exactly what characters are replaced with
and I can create a uniform experience across all Gopher clients.
Simply piping through sed would allow me to convert a bunch of
characters, e.g. '-e s/[:D:D:D]/:D/g' (in this plain-text copy, the
':D' and '[shrugs]' tokens inside the patterns stand in for the
original emoji characters). Yes, some of the 'subtlety' of those
different emojis is lost but... 'does it matter?'. I could try to
think of a clever (ASCII only) emoticon for something like '[shrugs]'
or I could just do 's/[shrugs]/[shrug]/g'. Alternatively, I may decide
that my usage of emojis is largely for decoration and wipe them out
altogether (s/[:D:D:D[shrugs]]//g). If I do it myself, I can also
update and tweak these replacements and deletions going forward as my
usage and opinions on the matter change.
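As a rough sketch of the 'do it myself' approach (this is not the
actual gmi2txt.sh; the function name 'to_ascii' and the handful of
characters chosen are assumptions made purely for illustration), a
filter could map a few known characters with sed and then delete any
remaining non-ASCII bytes with tr:

```sh
# Hypothetical helper: map a few UTF-8 characters to ASCII, drop the rest.
# The replacement table below is illustrative, not the author's real list.
to_ascii() {
  # One rule per character: in the C locale a bracket expression such as
  # [...] would match single bytes rather than whole UTF-8 characters.
  LC_ALL=C sed \
    -e 's/€/EUR/g' \
    -e 's/“/"/g' \
    -e 's/”/"/g' \
    -e 's/…/.../g' |
    # Keep tab, newline and printable ASCII; delete every other byte.
    LC_ALL=C tr -cd '\11\12\40-\176'
}
```

It would be used as a filter, e.g. 'to_ascii < yourfile.gmi'. Anything
without an explicit rule is silently removed, so new characters need a
rule added before they show up in any form on Gopher.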
Handling Gemtext
----------------
Gemtext is designed to be parsed at line level. [EDIT 14:58: clarified]
Seven of the eight line types (roughly equivalent to: <h1>, <h2>, <h3>,
<li>, <a>, <blockquote>, <pre>) start with a recognisable pattern,
so it is easy to detect and apply different formatting to each of
them. The last one (similar to: <p>) can start with any character but
this is detected by virtue of not being one of the others.
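To illustrate how little is needed for line-level detection, here is a
minimal sketch in plain shell. The function name 'gmi_line_types' and
the labels it prints are hypothetical; it only classifies lines and
does no formatting:

```sh
# Hypothetical helper: print one label per input line of Gemtext.
gmi_line_types() {
  pre=0
  while IFS= read -r line || [ -n "$line" ]; do
    # A line starting with three graves toggles preformatted mode.
    case $line in
      '```'*)
        pre=$((1 - pre))
        echo toggle
        continue ;;
    esac
    # Inside a preformatted block nothing else is interpreted.
    if [ "$pre" -eq 1 ]; then
      echo pre
      continue
    fi
    # Longest prefixes first, so '###' is not mistaken for '#'.
    case $line in
      '###'*) echo h3 ;;
      '##'*)  echo h2 ;;
      '#'*)   echo h1 ;;
      '* '*)  echo list ;;
      '=>'*)  echo link ;;
      '>'*)   echo quote ;;
      *)      echo text ;;
    esac
  done
}
```

Running 'gmi_line_types < yourfile.gmi' prints one label per line. The
final catch-all branch is the 'not one of the others' case described
above.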
-Heading: Level 1-
These lines start with a single '#'.
I feel it looks clearer to remove this and underline them with '='.
My Post
=======
-Heading: Level 2-
These lines start with two '#' characters.
Again, a simple underline with '-' looks very clean and is arguably
more in keeping with Gopher.
Subsection
----------
-Heading: Level 3-
These lines start with three '#' characters.
Starting and ending these lines with '-' retains the clean feel of the
other two heading types, while keeping them less prominent.
-Sub Subsection-
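The three heading styles can be sketched with a single sed call. This
is an illustration only, not the actual gmi2txt.sh: the function name
'gmi_headings' is hypothetical and, for brevity, the sketch does not
skip preformatted blocks.

```sh
# Hypothetical helper: restyle Gemtext headings for plain text.
gmi_headings() {
  sed '
    # Level 3: strip the marker and wrap the title in "-".
    /^###/{
      s/^###[[:space:]]*//
      s/.*/-&-/
    }
    # Level 2: print the title, then turn a copy of it into "-" underline.
    /^##/{
      s/^##[[:space:]]*//
      p
      s/./-/g
    }
    # Level 1: same idea, underlined with "=".
    /^#/{
      s/^#[[:space:]]*//
      p
      s/./=/g
    }
  '
}
```

The underline trick works because 'p' prints the title first and the
following 's/./=/g' replaces every character of the copy that is
printed at the end of the cycle, so the underline always matches the
title length.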
-Lists-
Lists begin with '* ' (including the space).
I indent any wrapping that takes place. In addition, I add newline
spacing between each bullet. This makes multiline, wrapped bullet points
more readable (IMHO).
* Lorem ipsum dolor sit amet, consectetur adipiscing elit,
  sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
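A minimal sketch of wrapping one bullet with a hanging indent follows.
Note the assumptions: it uses the POSIX 'fold' tool rather than 'fmt',
the default width of 70 is a guess, and the function name
'wrap_list_item' is hypothetical.

```sh
# Hypothetical helper: wrap one list line, indenting continuation lines.
# $1 is the Gemtext line ("* text"), $2 an optional total width.
wrap_list_item() {
  printf '%s\n' "$1" |
    # Drop the marker so it does not count twice towards the width.
    sed 's/^\*[[:space:]]*//' |
    # Leave room for the two-character prefix added below.
    fold -s -w "$(( ${2:-70} - 2 ))" |
    # fold keeps the blank it broke at, so trim it, then add prefixes.
    sed -e 's/[[:space:]]*$//' -e '1s/^/* /' -e '2,$s/^/  /'
  # Blank line after each bullet for readability.
  echo
}
```

For example, 'wrap_list_item "* some very long bullet text"' prints
the bullet with its continuation lines aligned under the text rather
than under the '*'.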
-Links-
Links start with '=>' followed by the link and the
title/description. However, this can make them long and can 'bury'
the URL slightly.
=> gopher://example.com This is a cool site
I would like the URLs to stand out by being on their own line. This
is particularly important on Gopher where most clients do not extract
links embedded in pure text or make them directly 'clickable'. By having
them on their own line you get the next best thing, as you can quickly
select a complete line via a triple click, thus making them far easier
to copy and paste [EDIT 2022-01-05: clarified the benefit].
~ This is a cool site:
  gopher://example.com
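The link rewrite can be sketched with sed alone. This is an
illustration, not the actual gmi2txt.sh: the function name 'gmi_links'
is hypothetical, and printing a link that has no title as just the
indented URL is my assumption.

```sh
# Hypothetical helper: put the link title first and the URL on its own line.
gmi_links() {
  # First rule: "=> URL title" becomes "~ title:" plus an indented URL line.
  # Second rule: a bare "=> URL" becomes just the indented URL.
  sed -e 's/^=>[[:space:]]*\([^[:space:]]\{1,\}\)[[:space:]]\{1,\}\(.*[^[:space:]]\)[[:space:]]*$/~ \2:\
  \1/' -e 's/^=>[[:space:]]*\([^[:space:]]\{1,\}\)[[:space:]]*$/  \1/'
}
```

The backslash followed by a real newline inside the replacement is the
portable way to emit a line break with sed. The two spaces at the
start of the next script line are intentional, as they become the
indentation of the URL.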
-Quotes-
Quotes start with '>'.
> Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
Strictly speaking it would probably be 'most correct' to keep
these largely as they are, since this concept of quoting is widely
understood. Then I would only need to handle adding extra '>'
characters when wrapping. However, I am rather taken with the way
Lagrange (and several websites) display quotes, with a single opening
quote character and indented lines. This is actually not too hard
to replicate, using two grave characters '``' to simulate an opening
'curly' quote character and an extra newline after the quote to give
a bit of space before regular text continues.
 ``
  Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod
  tempor incididunt ut labore et dolore magna aliqua.
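A minimal sketch of this quote style, under the same assumptions as
before ('fold' instead of 'fmt', a guessed default width of 70, and a
hypothetical function name 'gmi_quote'):

```sh
# Hypothetical helper: render one Gemtext quote line in the style above.
# $1 is the quote line ("> text"), $2 an optional total width.
gmi_quote() {
  # Two graves stand in for an opening curly quote.
  echo ' ``'
  printf '%s\n' "$1" |
    sed 's/^>[[:space:]]*//' |
    fold -s -w "$(( ${2:-70} - 2 ))" |
    # Trim the blank left by fold and indent every line by two spaces.
    sed -e 's/[[:space:]]*$//' -e 's/^/  /'
  # Extra newline to separate the quote from the text that follows.
  echo
}
```

Exactly where each line breaks depends on the width chosen, so the
output will not necessarily match the sample above line for line.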
-Preformatted [EDIT 14:52: mistakenly skipped initially]-
Preformatted lines are slightly different from the others in that they
begin AFTER a line that starts with '```' and end BEFORE the next such
line. Here I will remove the three grave characters and just indent
the rest of the lines by two spaces, so that they do not align with
regular text and thus visually stand out.
They will then align with lists, links and quotes (which makes things
look neat and tidy) but can still be differentiated because they have no
leading characters ('*' or '~') and no ' ``' on the preceding line
(in the case of my proposed quote display).
I will not wrap them or even attempt to filter non-ASCII characters,
leaving them mostly pure. The only downside is that the two leading
spaces may need to be removed after copying and pasting them but, on
the flip side, this is relatively trivial to deal with in any decent
editor. Additionally, for many use cases (e.g. certain code types)
that could even be skipped (as they would be treated as non-functional
indentation).
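Handling preformatted blocks is a natural fit for a sed address range.
A minimal sketch (the function name 'gmi_pre' is hypothetical, and this
is not the actual gmi2txt.sh):

```sh
# Hypothetical helper: drop the fence lines and indent what is between them.
gmi_pre() {
  sed '
    # The range runs from one fence line to the next fence line.
    /^```/,/^```/{
      # Delete the fence lines themselves...
      /^```/d
      # ...and indent everything else in the block by two spaces.
      s/^/  /
    }
  '
}
```

Lines outside a block pass through untouched, so this step could run
before any wrapping, provided the wrapping stage then leaves the
indented lines alone.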
-Regular lines-
Anything that does not match one of the above starting sequences is a
regular line and needs only simple wrapping.
Other changes
-------------
In addition to making the text better suited for display, I also need
to rewrite all internal (capsule specific) URLs and remove the
navigation links I add to the bottom of my posts. However, I think it
makes more sense to do that in an additional script that I can pipe
the results of the first one through. [thinks]
Am I missing anything? Thoughts and comments are welcome!
P.S. As a bonus for those that made it this far.
~ This post converted--how meta is that?:
/files/2022-01-04_Formatting_Gemtext_for_Gopher.txt
An extra, even more basic example
---------------------------------
[EDIT 2022-01-05: I rewrote this whole section again with more clarity
and a warning]
Here is a more basic version of this script that does a bit of 'fancy'
wrapping to Gemtext (with indentation for links and lists, and extra
'>' characters for the additional newlines within quotes). Again the
idea would be for potential display on Gopher. It does not impose any
other significant formatting changes.
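To give a feel for the quote handling in this lighter variant, here is
a minimal sketch that keeps the '>' marker on every wrapped line. It is
not the linked script: it uses 'fold', assumes a default width of 70,
and the function name 'wrap_quote_keep_marker' is hypothetical.

```sh
# Hypothetical helper: wrap one quote line, repeating "> " on each line.
# $1 is the quote line ("> text"), $2 an optional total width.
wrap_quote_keep_marker() {
  printf '%s\n' "$1" |
    sed 's/^>[[:space:]]*//' |
    # Leave room for the two-character "> " prefix.
    fold -s -w "$(( ${2:-70} - 2 ))" |
    sed -e 's/[[:space:]]*$//' -e 's/^/> /'
}
```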
It is worth noting that due to wrapping and indentation, after this
conversion you no longer have valid Gemtext, just lightly formatted
plain text that superficially looks like Gemtext. You cannot permanently
alter your files like this with the intention of then serving the exact
same source over both Gemini and Gopher. Such files, served over Gemini
with a .gmi extension, would likely have issues with unexpected wrapping,
and longer link lines and lists would display incorrectly.
Since there are no character replacements, it is assumed that a person
who might want to use something like this would avoid using large amounts
of non-ASCII, expect their readers to have a UTF-8 capable client,
or use some other server side character translation system like Alex's
[S. Handling non-ASCII].
~ Sample Gemtext file - wrapped:
/files/gmi2txt-sample-wrapped-only.txt
~ Shell script filter to wrap .gmi files [EDIT 2022-01-05 13:00: simplified]:
https://gist.github.com/ruario/4ec4bc02e820a42830e9a8dc05b042e7
Comments
--------
-2022-01-04 16:32 (UTC+1)-
Omar Polo (yumh):
~ Convertire text/gemini in testo semplice IT:
gemini://it.omarpolo.com/articoli/text-gemini-a-testo-semplice/
[i] Takes some of the concepts above and creates a new version.
-2022-01-05 03:32 (UTC+1)-
James Tomasino:
~ https://github.com/jamestomasino/dotfiles-minimal/blob/master/.profile#L135
``
You can tell lynx to use [utf-8]. :) It works with gopher sites too.
Nice [conversion] script, though. Good work
-2022-01-05 22:21 (UTC+1)-
Sandra Snan (Idiomdrottning):
``
How about turning them into Gopher maps? So hyperlinks could work.
Yes, I had considered this and I know that others, like Tomasino, always
do this. I also recall reading Solderpunk's "The true spirit of gopher"
post where he talks about how according to RFC1436 this is "semantic
abuse of gopher" and yet he also says, "building type 1 only gopher
holes pretty much just works, and it does offer a nicer user experience
[...]" (i.e. he is not actually critical of it)
~ The true spirit of gopher: P. 12 [>] 13 - Solderpunk:
gopher://zaibatsu.circumlunar.space:70/0/~solderpunk/phlog/the-soul-of-gopher.txt
In the end it is clear that this is what led him to Gemini and I
already have that in my Gemini blog. So for Gopher I think I would
like it to be intentionally different, as part of the reason for even
having a Gopher version. i.e. the Gemini version is true to Gemini and
the Gopher one more "faithful" to Gopher.
That said, whilst reading about these topics I did note that some modern
Gopher clients like VF-1 and GemiNaut make links in plain text directly
usable by pattern matching for obvious URLs.
~ Linkification in Gopher clients:
2021-12-14_Linkification_in_gopher_clients.gmi
~ Linkification in Gopher clients (Part 2) – OK, I am just stupid:
2021-12-16_Linkification_in_gopher_clients_Part2_I_am_stupid.gmi
Thus I decided that plain text for the Gopher articles works well
enough. Those who are more old school get what they expect and those
who are more modern are more likely to run a modern client that will
handle links for them anyway.
In addition, I did carefully think about how I would display links
in posts, 'S. Handling Gemtext - Links' includes, "By having them on
their own line you get the next best thing, as you can quickly select
a complete line via a triple click, thus making them far easier to copy
and paste."
In summary, is this the right way? I don't know but this is how I
generation briefly, so perhaps I will change my mind. ;)
-2022-01-06 13:57 (UTC+1)-
Sandra Snan (Idiomdrottning):
``
Please note that I did see "By having them on their own line you get
the next best thing, as you can quickly select a complete line via
a triple click, thus making them far easier to copy and paste." I
might be a sloppy reader, but not to that extent! <3
I just didn't think this was a counterargument to the gophermaps thing
(to the extent that the other stuff you bring up is). I don't have
a gopherhole of my own so it's not that I have that much of a say
about Gopher! <3
Fair enough and I hope I did not offend you. To be honest I am amazed
that anyone got through my boring post! ;)
I will add that I had some thoughts about going the "Gophermaps all
the way down" route and I am a fickle creature, so who knows, perhaps
I will change my mind and do just that. XD
-2022-01-08 14:46 (UTC+1)-
Szczezuja:
``
I have one lame question for your script:
[...]
How it should be used? There are no information in the source, I made
some tries without success.
Oh sorry, it is a filter [...] pipe or redirect the Gemtext in:
$ ./gmi2txt.sh < yourfile.gmi
(would print the converted text to screen)
or to save to a file:
$ ./gmi2txt.sh < yourfile.gmi > yourfile.txt
-2022-01-10 11:38 (UTC+1)-
~ Formatting Gemtext for Gopher - further tweaks [II]:
2022-01-10_Formatting_Gemtext_for_Gopher-further_tweaks.gmi