Gemini lets you tag pages with a language tag. These language tags, standardized in BCP 47, allow you to say "this document is in Korean" or "this document is in Mongolian, written in the Cyrillic script". They are expressed as short character strings: "ko" for the first one and "mn-Cyrl" for the second. They also allow you to indicate the country, for instance if this country has a specific spelling ("en-US" is English as written in the USA, "en-GB" is English as written in the United Kingdom). Many other indications are possible.
=> BCP 47 is the document standardizing language tags
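To make the structure of these tags concrete, here is a minimal Python sketch that splits a tag into its language, script and region subtags. It is a deliberate simplification and an assumption on my part, not a BCP 47 parser: the function name split_tag is made up for the illustration, and it ignores variants, extensions, private-use subtags and the other indications BCP 47 allows.

```python
def split_tag(tag):
    """Split a language tag into language, script and region.

    Simplified sketch: subtags are recognized by shape only
    (4 letters = script, 2 letters or 3 digits = region).
    Everything else (variants, extensions...) is ignored.
    """
    subtags = tag.split("-")
    result = {"language": subtags[0].lower(), "script": None, "region": None}
    for sub in subtags[1:]:
        is_script = len(sub) == 4 and sub.isalpha()
        is_region = (len(sub) == 2 and sub.isalpha()) or (len(sub) == 3 and sub.isdigit())
        if is_script and result["script"] is None and result["region"] is None:
            # Scripts are conventionally written in title case ("Cyrl")
            result["script"] = sub.title()
        elif is_region and result["region"] is None:
            # Regions are conventionally written in upper case ("GB")
            result["region"] = sub.upper()
    return result


for example in ("ko", "mn-Cyrl", "en-GB"):
    print(example, split_tag(example))
```

With the tags used above, "ko" yields only a language, "mn-Cyrl" a language and a script, and "en-GB" a language and a region.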
Very often, in the geminispace, we see language tags that are too specific. For instance, "it-IT" (Italian as written in Italy) is over-specific, since there is no place where people write Italian differently from the way it is written in its home country. Over-specification may create problems for search engines, statistical programs and humans, who may search for "it" and miss the pages tagged "it-IT".
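The search problem can be illustrated with a short Python sketch. The page names and tags below are invented for the example, and the matches function is only an assumption about how a careful program could compare tags (by prefix, in the spirit of the basic filtering described in RFC 4647); a naive program that compares strings exactly will miss the over-specified page.

```python
def matches(language_range, tag):
    """Return True if tag equals language_range or is more specific.

    Comparison is case-insensitive, and a prefix only counts when it is
    followed by "-", so "it" matches "it-IT" but would not match "ita".
    """
    r = language_range.lower()
    t = tag.lower()
    return t == r or t.startswith(r + "-")


# Hypothetical pages and their language tags
pages = {"a.gmi": "it", "b.gmi": "it-IT", "c.gmi": "en-GB"}

# Naive exact comparison: the page tagged "it-IT" is forgotten
exact = [page for page, tag in pages.items() if tag == "it"]

# Prefix comparison: both Italian pages are found
by_prefix = [page for page, tag in pages.items() if matches("it", tag)]

print(exact)
print(by_prefix)
```

Not every tool or human will do the prefix comparison, which is one more reason to avoid needless subtags.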
BCP 47, mentioned before, says:
- Use as precise a tag as possible, but no more specific than is justified. Avoid using subtags that are not important for distinguishing content in an application.
- For example, 'de' might suffice for tagging an email written in German, while "de-CH-1996" is probably unnecessarily precise for such a task.
I recommend that Gemini authors read the whole of section 4.1 of this document, "Choice of Language Tag".
For instance, if you manage a Gemini capsule in Portuguese where texts are available both in the Brazilian spelling and the Portuguese one, it makes sense to tag them "pt-BR" and "pt-PT". But such a distinction is not always relevant. There is more over-specification than under-specification in the geminispace (or on the Web).
The Lupa crawler gathers statistics about the language tags used in the geminispace. It displays both the full language tag and the first subtag (the one identifying the language).
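As an illustration of these two views, here is a small Python sketch that counts both the full tags and their first subtags. This is not Lupa's actual code, which I am not reproducing here: the function name count_tags and the sample list of tags are invented for the example.

```python
from collections import Counter


def count_tags(tags):
    """Count full language tags and their first (language) subtags."""
    full = Counter(tags)
    # The first subtag is everything before the first "-"
    primary = Counter(tag.split("-")[0].lower() for tag in tags)
    return full, primary


# Hypothetical sample of tags, as a crawler might collect them
sample = ["en", "en-US", "en-GB", "it", "it-IT", "fr"]
full, primary = count_tags(sample)

print(full.most_common())
print(primary.most_common())
```

In the first count, "it" and "it-IT" are two separate entries with one page each. In the second, they are merged under "it", which is what most readers of such statistics want to know.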
=> Lupa statistics