This page permanently redirects to gemini://dkalak.de/software/gmi2html/.
=> gmi2html.py
The script I present here reads Gemtext from stdin, converts it to HTML, and writes it to stdout. It also checks the input for prettiness. If it encounters ugly parts, it writes warnings to stderr. It exits 0 if nothing was written to stderr and 1 otherwise.
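The contract described above (stdin in, HTML on stdout, warnings on stderr, exit status derived from whether anything was warned about) can be sketched roughly as follows. This is a minimal sketch and not the actual gmi2html.py: the names run and convert_line are hypothetical, and the per-line conversion is a placeholder that only escapes the text and warns on trailing whitespace.

```python
import html
import sys


def convert_line(line, warn):
    """Placeholder conversion (assumption): escape the line, warn on trailing whitespace."""
    if line != line.rstrip():
        warn("trailing whitespace")
    return html.escape(line)


def run(stdin, stdout, stderr):
    """Convert line by line; return 0 if nothing was written to stderr, else 1."""
    warned = False

    def warn(message):
        nonlocal warned
        warned = True
        print(f"warning: {message}", file=stderr)

    for raw in stdin:
        print(convert_line(raw.rstrip("\n"), warn), file=stdout)
    return 1 if warned else 0


# A real script would finish with something like:
#     sys.exit(run(sys.stdin, sys.stdout, sys.stderr))
```

Passing the three streams as parameters is a choice made for this sketch only; it keeps the exit-status logic easy to try out with in-memory streams.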
The script is written in Python. Maybe I will rewrite it in C once I am proficient enough.
There are many Gemtext-to-HTML conversion tools. gemini.circumlunar.space lists some.
I have tried the following 2:
=> huntingb’s gemtext-html-converter (in Python) | Nicholas Johnson’s gemini2html (in C)
Both render adjacent text lines in the Gemtext input as separate p elements in HTML, and empty lines in the Gemtext input either as empty lines in HTML (which are insignificant) or as br tags outside of p elements. (I have also encountered other issues, like unescaped > characters and some files failing to be converted at all.) The web version of gemini.circumlunar.space seems to have separate p elements for adjacent text lines too, and empty p elements for empty lines.
I think that these approaches are semantically incorrect. Instead, I think that:
By “empty line”, I mean a line that contains either no characters or only whitespace characters. By “block” in Gemtext, I mean:
Every block in the Gemtext input is represented by a block element in HTML (p, ul, blockquote, pre, h1, h2, h3). (I don’t count li elements as block elements here, although they technically are.)
A quote block, stripped of the > character in each line, in turn contains paragraphs. They are represented by p elements inside the blockquote element in the HTML output.
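To illustrate the mapping from blocks to elements, here is a reduced sketch that handles only text lines, quote lines, and empty lines. It is not the script itself: it buffers all lines instead of 1 at a time, the function names are hypothetical, and joining the lines of a paragraph with a newline inside a single p element is an assumption.

```python
import html


def strip_marker(line):
    """Drop the leading > and at most one following space."""
    rest = line[1:]
    return rest[1:] if rest.startswith(" ") else rest


def paragraphs(lines):
    """Group adjacent non-empty lines; empty (or whitespace-only) lines separate groups."""
    group = []
    for line in lines:
        if line.strip():
            group.append(html.escape(line))
        elif group:
            yield group
            group = []
    if group:
        yield group


def render(lines):
    """Render runs of text lines as p elements and runs of > lines as blockquote."""
    out, i = [], 0
    while i < len(lines):
        quoted = lines[i].startswith(">")
        run = []
        # Collect the maximal run of lines of the same kind (quote or not).
        while i < len(lines) and lines[i].startswith(">") == quoted:
            run.append(strip_marker(lines[i]) if quoted else lines[i])
            i += 1
        body = ["<p>" + "\n".join(p) for p in paragraphs(run)]
        out += ["<blockquote>", *body, "</blockquote>"] if quoted else body
    return "\n".join(out)


print(render(["First line", "second line", "", "> Quoted one", ">", "> Quoted two"]))
```

The two adjacent text lines end up in one p element, and the quote block yields two p elements inside one blockquote element.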
The script reads 1 line from stdin into memory at a time, determines its kind, and writes the corresponding output to stdout and stderr. It not only distinguishes between lines belonging to different kinds of blocks, but also between:
Because only 1 line is buffered, the script doesn’t insert CSS margin statements in cases where Gemtext blocks aren’t separated by exactly 1 empty line, or where a quote block or the document begins or ends with an empty line. While margin-top statements are feasible with a single-line buffer, margin-bottom statements (e.g. if a paragraph is followed by empty lines at the end of a quote block) would require buffering the entire block in question until the number and status of the following empty lines (i.e. whether they are at the end of a quote block or the document) can be determined. There is also no obvious way to treat quote blocks of arbitrary length that contain only empty lines.
So instead of trying to handle those cases, the script assumes that the input is formatted in a “pretty” way that doesn’t require such handling, and writes warnings to stderr (without doing anything special) when the input is “ugly” (i.e. not pretty). Pretty input needs to meet the following criteria:
The script also issues ugliness warnings if any of the following criteria aren’t met:
lines mustn’t contain any text after the
.
These last criteria aren’t related to the question of CSS margins. Some of them are more or less expressions of personal taste, enforced for the sake of consistency. Those warnings don’t necessarily mean that the HTML output is broken (=> and > lines without a space after the => or > work just fine, for example). However, the script doesn’t strip away the leading or trailing whitespace that it warns against, so the output might look ugly.
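Two of the checks mentioned in the previous paragraph (a space after => or >, and no leading or trailing whitespace) could look roughly like this. It is a sketch under assumptions: ugliness_warnings is a hypothetical name, the warning texts are made up, and preformatted blocks, where leading whitespace is legitimate, are ignored entirely.

```python
def ugliness_warnings(line):
    """Return a list of warnings for a single Gemtext line (two checks only)."""
    warnings = []
    # Whitespace-only lines count as empty lines here and are not checked.
    if line.strip() and line != line.strip():
        warnings.append("leading or trailing whitespace")
    for marker in ("=>", ">"):
        if line.startswith(marker):
            rest = line[len(marker):]
            # A bare marker with nothing after it is not flagged by this check.
            if rest and not rest.startswith(" "):
                warnings.append(f"no space after {marker}")
            break
    return warnings


for sample in ("=>gemini://example.org/", "> quoted", "text "):
    print(repr(sample), ugliness_warnings(sample))
```

Like the script, the sketch only reports; it leaves the line itself unchanged.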
The script writes lists as ul elements and link lists as p elements with the different a elements separated by br tags.
The script doesn’t write optional closing tags (</p>, </li>). Opening and closing ul, blockquote, and pre tags are written on their own lines. Other opening and self-closing tags (p, br, li, h1, h2, h3) are written at the beginning of a line, other closing tags (h1, h2, h3) at the end of a line.
The script writes newlines in the HTML output for every input line it reads (even empty ones), so block elements are separated by (insignificant, but pretty) empty lines in the HTML output if the input is pretty.
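The layout rules for lists and link lists can be illustrated with a small sketch. The helper names render_links and render_list are hypothetical, the exact byte-for-byte output of gmi2html.py may differ, and the sketch assumes well-formed "=> URL label" and "* item" lines.

```python
import html


def render_links(lines):
    """Render adjacent => lines as one p element, with a elements separated by br tags."""
    anchors = []
    for line in lines:
        parts = line[2:].split(maxsplit=1)
        url = parts[0]
        # Without a label, the URL itself serves as the link text.
        label = parts[1] if len(parts) > 1 else url
        anchors.append(f'<a href="{html.escape(url)}">{html.escape(label)}</a>')
    # Each br tag starts a new output line; no closing p tag is written.
    return "<p>" + "\n<br>".join(anchors)


def render_list(lines):
    """Render adjacent * lines as a ul element without closing li tags."""
    items = ["<li>" + html.escape(line[2:]) for line in lines]
    # The ul tags sit on their own lines.
    return "\n".join(["<ul>", *items, "</ul>"])


print(render_links(["=> gemini://a.example/ A", "=> gemini://b.example/"]))
print(render_list(["* one", "* two"]))
```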
The output covers only the HTML code that you would put inside a body element. The body tags themselves and everything else that is needed for a valid HTML document are not included in the output.
You can pipe the output to fmt -s to get more consistent line lengths. Make sure that no line in a code block exceeds the maximum and goal line lengths of fmt. You can get the maximum code block line length of the output like so:
cat out.html | awk ' /<\/pre>/ { pre = 0 } pre == 1 { print } /<pre>/ { pre = 1 }' | wc -L
Correction (2023-02-01)
I read today that my statement above about adjacent text lines semantically representing a single paragraph actually goes against the Gemini specification. Use the software at your discretion.
EOF