A proposed scheme for parsing preformatted alt text

This post is a short write-up of a chat session we had today on the IRC #Gemini channel.

The conversation started with the question of whether gemtext could support tables. You can of course link to a table in a standard format such as CSV or TSV, but perhaps a client could render the table directly?

Maybe you could put CSV into the preformatted area and clients could optionally render it as a pretty table?

Although the discussion started from a question about table formatting, the proposal below is not about any particular table syntax. Rather, it aims to develop a community practice for conveying additional attributes in the alt text, which the Gemini spec permits to be non-empty, while still supporting the current practice in which the alt text is empty or just a simple label.

It would be helpful if this became part of the Gemini spec and the conventions were solidified, but it is not necessary at this stage.

Preformatted regions in Gemini

Preformatted areas in gemtext are opened and closed by lines beginning with three backticks (```):

here is some      preformatted       text
where white       space              is significant

Clients typically render such preformatted text in a fixed-width font, preserving all the original whitespace.

These preformatted regions have a number of uses, for example to show source code:

<!DOCTYPE html>
<html>
  <head>
    <title>Page title</title>
  </head>
  <body>
    <p>A very simple HTML 5 page.</p>
  </body>
</html>

or ASCII/Unicode art:

 _____  _       _       _      ___            _   
|   __||_| ___ | | ___ | |_   |  _| ___  ___ | |_ 
|   __|| || . || || -_||  _|  |  _|| . ||   ||  _|
|__|   |_||_  ||_||___||_|    |_|  |___||_|_||_|  
          |___|                                   

A key consideration, particularly for ASCII art, is how non-visual clients will render the content, since the picture is a graphical one, albeit constructed from characters and punctuation. Other Gemini agents, such as web crawlers, might also want to index this content for search purposes.

After the opening backtick delimiter there is room for text on the same line. This text has no specific meaning of its own; the Gemini spec says it should not be displayed, but it may be interpreted:

Any text following the leading "```" of a preformat toggle line which toggles preformatted mode on MAY be interpreted by the client as "alt text" pertaining to the preformatted text lines which follow the toggle line. Use of alt text is at the client's discretion.

This is a location where the content author can provide some "alt text" that can be interpreted, and can assist the processing and display of the preformatted content.
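
To make the toggle-line behaviour concrete, here is a minimal sketch in Python of how a client might collect each preformatted region together with the raw alt text from its opening line. The function name `preformatted_blocks` is hypothetical, and the sketch assumes that any text on a closing toggle line is ignored and that an unclosed region at the end of the page is dropped.

```python
from typing import Iterable, List, Tuple


def preformatted_blocks(lines: Iterable[str]) -> List[Tuple[str, List[str]]]:
    """Return (alt_text, body_lines) for each preformatted region."""
    blocks: List[Tuple[str, List[str]]] = []
    alt = None  # None means we are outside a preformatted region
    body: List[str] = []
    for line in lines:
        if line.startswith("```"):
            if alt is None:
                # Opening toggle: everything after the backticks is alt text.
                alt = line[3:].strip()
                body = []
            else:
                # Closing toggle: any trailing text is ignored (assumption).
                blocks.append((alt, body))
                alt = None
        elif alt is not None:
            # Inside a region: keep the line exactly as written.
            body.append(line)
    return blocks


page = [
    "# A page",
    "```A small table; content-type: text/csv",
    "a,b",
    "1,2",
    "```",
    "Some ordinary text.",
]
print(preformatted_blocks(page))
```

The alt text is returned unparsed, so a simple client can show or speak it as a plain label without doing anything further.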

Parsing the alt text - Bouncepaw's scheme

Bouncepaw wrote a piece proposing some options for parsing this alt text:

=> Bouncepaw: Extending gemtext's preformatted text

Bouncepaw's scheme proposes a number of different "types" to indicate the role of the content:

 ```type=table
 (preformatted content continues)
 ```

A valid point that came up in our IRC discussions was that we should support screen readers and legacy clients that may not want to parse the content further.

The following scheme builds on this idea of delimited alt text within Bouncepaw's proposal and attempts to make it more flexible and backwards compatible.

Proposal

The proposed scheme to parse the alt text is presented below. There are a number of design considerations it seeks to satisfy:

  1. Support screen readers and other clients that wish to extract a plain text description

  2. Low complexity, with a minimal and recognisable syntax

  3. Support multiple attributes if necessary

  4. No preconceptions about attribute names and values

The scheme is as follows:

 ```<description>(;<attribute: value>)

Essentially this is the delimitation scheme used by CSS declarations: attribute/value pairs separated by semicolons, with a colon separating each attribute from its value.

CSS is chosen because it is a well-established, human-friendly syntax that permits multiple attributes to be provided.

Remarks

 ```A description; attribute1: value 1 ; attribute2: value 2
 
 This is equivalent to:
   
    alt="A description"
    attribute1="value 1"
    attribute2="value 2"
    
 ```
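
As a rough illustration, the scheme could be parsed along the following lines. This is only a sketch under stated assumptions, not a reference implementation: the function name `parse_alt_text` is hypothetical, attribute names are lower-cased for convenience, and a segment without a colon is taken to be the unnamed alt attribute. One consequence of that last assumption is that a plain description containing a colon would be misread as a named attribute.

```python
from typing import Dict


def parse_alt_text(alt_text: str) -> Dict[str, str]:
    """Split alt text into attributes; an unnamed segment becomes 'alt'."""
    attrs: Dict[str, str] = {}
    for segment in alt_text.split(";"):
        segment = segment.strip()
        if not segment:
            continue  # tolerate empty segments such as a trailing ";"
        name, sep, value = segment.partition(":")
        if sep:
            # Named attribute: "name: value" (names lower-cased, an assumption).
            attrs[name.strip().lower()] = value.strip()
        else:
            # Unnamed segment: treat the first one as the alt attribute.
            attrs.setdefault("alt", segment)
    return attrs


print(parse_alt_text("A description; attribute1: value 1 ; attribute2: value 2"))
# {'alt': 'A description', 'attribute1': 'value 1', 'attribute2': 'value 2'}
```

Existing pages keep working under this sketch: an empty alt text yields no attributes, and a simple label with no semicolons or colons comes back as the alt attribute alone.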

Initial attributes

The following attributes are proposed as those that could be of immediate value.

alt

The first unnamed attribute is the alt attribute. It can appear elsewhere in the alt text expression using the named "alt: content" form, but it should normally come first, in which case the name is not needed. This keeps the scheme backwards compatible while still giving the alt text attribute a name.

content-type

This attribute indicates the type of text held in the preformatted region. This can assist clients, user agents and end users in correctly understanding and interpreting the content. In some cases, they may decide to render the content in one or more alternative ways.

It is not to be used to embed binary content of any kind, nor extended to arbitrary MIME types. The text encoding of the current page applies.

The MIME type value is not case sensitive. For example:

 ```here is a table in csv;content-type:text/csv

 ```here is some python that your client could show with syntax highlighting; content-type: application/x-python
 
 ```here is a graph that could be visualised using graphviz; content-type: text/vnd.graphviz
 

Here is a table example using tab delimited text (TSV).

 ```Here is a label about the table; content-type: text/tab-separated-values
 *    1    2    3
 1    2    3    4
 2    3    4    5
 3    4    5    6
 ``` 
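
A client that recognises the content-type attribute could then choose to draw the region as an aligned table. The sketch below is one possible approach, not part of the proposal: the function name `render_table` and the `DELIMITERS` mapping are hypothetical, the comparison is case-insensitive as described above, and the shorthand text/tsv is accepted alongside the registered text/tab-separated-values as a leniency. Unknown types fall back to the unchanged preformatted lines.

```python
import csv
from typing import List

# Delimiter per recognised content type (the mapping itself is an assumption).
DELIMITERS = {
    "text/csv": ",",
    "text/tab-separated-values": "\t",
    "text/tsv": "\t",  # unregistered shorthand, accepted for leniency
}


def render_table(body: List[str], content_type: str) -> List[str]:
    """Return the body as aligned table rows, or unchanged if not tabular."""
    delimiter = DELIMITERS.get(content_type.strip().lower())
    if delimiter is None:
        return body  # unknown type: show the preformatted text as-is
    rows = list(csv.reader(body, delimiter=delimiter))
    if not rows:
        return body
    # Pad short rows so that every row has the same number of cells.
    columns = max(len(row) for row in rows)
    rows = [row + [""] * (columns - len(row)) for row in rows]
    widths = [max(len(row[c]) for row in rows) for c in range(columns)]
    return [
        " | ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in rows
    ]


for line in render_table(["name,qty", "apple,10"], "Text/CSV"):
    print(line)
```

Because the fallback returns the original lines, a client that does not understand a given type loses nothing compared with today's behaviour.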

lang

This attribute indicates the language of the content, using a standard ISO 639-1 two-letter code, as used in HTML. This makes it possible to quote content in a language other than the one stated in the media type of the current page.

 ``` Some English content; lang:en
 Hello English Speaking World
 ```
 
 ```Un peu de Français; lang:fr
 Bonjour mes amis
 ```
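
One way a client, for instance a screen reader choosing a voice, might decide which language applies to a region is to prefer the block's own lang attribute and fall back to the lang parameter of the page's media type. The sketch below assumes the alt text has already been parsed into a dictionary of attributes; the function names `page_lang` and `block_lang` are hypothetical.

```python
from typing import Dict, Optional


def page_lang(meta: str) -> Optional[str]:
    """Extract the lang parameter from a META value such as 'text/gemini; lang=en'."""
    for param in meta.split(";")[1:]:
        name, sep, value = param.partition("=")
        if sep and name.strip().lower() == "lang":
            return value.strip() or None
    return None


def block_lang(attrs: Dict[str, str], meta: str) -> Optional[str]:
    """Prefer the block's own lang attribute, else the page language."""
    return attrs.get("lang", "").strip() or page_lang(meta)


print(block_lang({"alt": "Un peu de Français", "lang": "fr"}, "text/gemini; lang=en"))
print(block_lang({"alt": "A plain label"}, "text/gemini; lang=en"))
```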

Feedback

Let me know your thoughts and feedback by email or perhaps through a follow-up post of your own.

luke at marmaladefoo dot com



Original URL
gemini://gemini.marmaladefoo.com/blog/7-Sep-2020_Parsing_preformatted_alt_text.gmi