Ancestors

Toot

Written by Simon Cozens on 2024-11-27 at 10:32

I've always complained that checking a font's "language support" just by counting Unicode codepoints is unreliable and incomplete; the font has to behave correctly to support a language.

Shaperglot is my (and @googlefonts's) response to this problem. And now it has a web interface so you can drop your fonts on and get a report on what works, what doesn't and what can be improved.

Try it out at https://googlefonts.github.io/shaperglot/ (all processing is done on your local computer; nothing is uploaded.)

=> More informations about this toot | More toots from simoncozens@typo.social

Descendants

Written by Dan Burzo on 2024-11-27 at 13:09

@simoncozens I was looking at Romanian, wondering about the necessity of combining marks as independent codepoints to declare it supported. As long as “Ă” and “ă” exist, a combining breve is just a nice to have (maybe to form the historical ĕ, ĭ, ŭ)?

=> More informations about this toot | More toots from db@typo.social

Written by Simon Cozens on 2024-11-27 at 13:17

@db Good point. Our database doesn't make a distinction between "marks you need to form base characters" and "marks which can attach to base characters". I shall think about this!

=> More informations about this toot | More toots from simoncozens@typo.social

Written by Denis Moyogo Jacquerye on 2024-11-27 at 17:32

@db Are you making Unicode fonts?

=> More informations about this toot | More toots from moyogo@typo.social

Written by Dan Burzo on 2024-11-27 at 17:59

@moyogo oh, far from it! Just picking Simon’s brain in regards to a result I found surprising.

=> More informations about this toot | More toots from db@typo.social

Written by Denis Moyogo Jacquerye on 2024-11-27 at 18:11

@db Lol, sorry for being confusing. That was a rhetorical question.

In Unicode both ă U+0103 and ă U+0061 U+0306 are equivalent. Unicode Romanian text may use either. While the first form is more common, the second form occurs nonetheless. Unicode fonts should support both for proper language support.

A large Romanian corpus may contain around 25 second forms per million. This may vary depending on where the corpus comes from.
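
The equivalence described above can be checked with Python's standard `unicodedata` module. This is only a minimal sketch of canonical equivalence between the two spellings of ă; it says nothing about how a particular font or shaping engine treats them.

```python
import unicodedata

precomposed = "\u0103"   # ă as the single codepoint U+0103
decomposed = "a\u0306"   # a (U+0061) followed by COMBINING BREVE (U+0306)

# The two strings differ codepoint by codepoint...
same_codepoints = precomposed == decomposed

# ...but normalizing either one to the other's form makes them identical.
equal_after_nfc = unicodedata.normalize("NFC", decomposed) == precomposed
equal_after_nfd = unicodedata.normalize("NFD", precomposed) == decomposed

print(same_codepoints, equal_after_nfc, equal_after_nfd)
```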

=> More informations about this toot | More toots from moyogo@typo.social

Written by Denis Moyogo Jacquerye on 2024-11-27 at 18:13

@db See for example https://www.bing.com/search?q=%22s%C4%83%22 vs. https://www.bing.com/search?q=%22sa%CC%86%22

=> More informations about this toot | More toots from moyogo@typo.social

Written by Dan Burzo on 2024-11-27 at 18:15

@moyogo aah, gotcha. Thanks for the clarification, makes sense! I was so focused on inputting text that I forgot about the idea of existing corpora :-)

=> More informations about this toot | More toots from db@typo.social

Written by Rosetta on 2024-11-27 at 18:26

@db @moyogo regarding: “Unicode fonts should support both for proper language support.” I do not think Unicode says you ought to support both. I think it is good practice. But that is not what determines sufficient language support.

Again, this is the difference between a strict approach (useful in Google’s QA) and a more lenient approach that eliminates false negatives (Hyperglot).

=> More informations about this toot | More toots from rosetta@mastodon.design

Written by Rosetta on 2024-11-27 at 18:28

@db @moyogo These are the kind of conversations we could be having if @googlefonts focused on cooperation with existing projects rather than starting a new thingy of their own.

=> More informations about this toot | More toots from rosetta@mastodon.design

Written by Simon Cozens on 2024-11-28 at 08:46

@rosetta @db @moyogo I don't speak for @googlefonts but the licensing conditions of existing projects were unclear. But the good thing is we can still have conversations about what language support means none the less.

=> More informations about this toot | More toots from simoncozens@typo.social

Written by Rosetta on 2024-11-28 at 10:14

According to Dave, there are no licensing issues. I have responded to you on GitHub about that two years (?) ago.

=> More informations about this toot | More toots from rosetta@mastodon.design

Written by Simon Cozens on 2024-11-28 at 08:49

@rosetta @db @moyogo You're correct that Unicode doesn't say you should support both, but that's because Unicode doesn't really say anything about fonts. The important question here is what normalisation is done by the shaping engine. Different engines behave differently, which is why Denis' advice to support both is correct.
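
One rough way to see why supporting both forms matters: if a shaping engine does not normalise the text, the font has to cover whichever form arrives. The sketch below is a hypothetical helper (`uncovered_forms` is not part of Shaperglot or Hyperglot) that takes a plain set of codepoints standing in for a font's character map and reports what is missing for each normalization form. It is a codepoint-only check, so it cannot tell whether the mark would actually be positioned correctly.

```python
import unicodedata

def uncovered_forms(text, cmap_codepoints):
    """Return, for NFC and NFD, the sorted codepoints of `text` missing from the set."""
    available = set(cmap_codepoints)
    missing = {}
    for form in ("NFC", "NFD"):
        normalized = unicodedata.normalize(form, text)
        missing[form] = sorted({ord(ch) for ch in normalized} - available)
    return missing

# Hypothetical font: has a, s and precomposed ă, but no combining breve.
font_codepoints = {0x0061, 0x0073, 0x0103}

result = uncovered_forms("s\u0103", font_codepoints)
print(result)
```

With this made-up codepoint set, the precomposed spelling is fully covered while the decomposed spelling lacks U+0306, which is the situation Dan's original question was about.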

=> More informations about this toot | More toots from simoncozens@typo.social

Written by Rosetta on 2024-11-28 at 09:45

@simoncozens @db @moyogo because why would we fix shaping engines instead of rejecting thousands of fonts? I am not after correct or incorrect. You are doing a QA tool which is fine, but there are different notions of language support. That also pertains to my comment about discussion. Over and over we have seen people at @googlefonts asking the same questions we have already resolved in Hyperglot. It’s a waste of potential. Same with @benkiel’s comment. That is resolved in Hyperglot, too.

=> More informations about this toot | More toots from rosetta@mastodon.design

Written by Simon Cozens on 2024-11-28 at 09:47

@rosetta @db @moyogo @googlefonts @benkiel I think I can see how these "conversations" are likely to go.

=> More informations about this toot | More toots from simoncozens@typo.social

Written by Rosetta on 2024-11-28 at 10:16

@simoncozens @db @moyogo @googlefonts @benkiel frankly, what you have done is a fantastic feat of engineering. I only wish it was more collaborative.

=> More informations about this toot | More toots from rosetta@mastodon.design

Written by Simon Cozens on 2024-11-28 at 10:29

@rosetta @db @moyogo @googlefonts @benkiel Thank you. I don't know how much collaboration is possible in practice if there are different licensing terms, different goals, and different understandings of what language support means. (And as we move to Rust, different languages.) But I'm very open to collaborate technically where possible and, even where not, to have these kinds of discussions.

=> More informations about this toot | More toots from simoncozens@typo.social

Written by Pere Farrando on 2024-11-29 at 07:57

@db @simoncozens One would think that the point of combining marks is to be available, full stop. That is, beyond strictly orthographic requirements. To serve for future orthographic reforms or whatever might happen in the future.

=> More informations about this toot | More toots from PereFarrando@typo.social

Written by Ben Kiel on 2024-11-27 at 16:03

@simoncozens @googlefonts LOVE THIS, thank you! One question: not having ſ for French/German support feels extreme? Or am I missing something.

=> More informations about this toot | More toots from benkiel@typo.social

Written by Simon Cozens on 2024-11-27 at 16:07

@benkiel @googlefonts Yeah that does feel extreme.

=> More informations about this toot | More toots from simoncozens@typo.social

Written by Ben Kiel on 2024-11-27 at 16:09

@simoncozens @googlefonts 100% understand the challenge of “this glyph is used in the language at some point and is official, but no one in modern context uses this: it is for historical documents”

=> More informations about this toot | More toots from benkiel@typo.social

Written by Denis Moyogo Jacquerye on 2024-11-27 at 17:54

@benkiel @simoncozens @googlefonts the tool can only be as good as the data it uses. There’s plenty of nonsensical stuff in the data as some of it wasn’t collected for this purpose or other errors crept in where it was copied from. The quick fix in your example is to not count auxiliary as necessary.

=> More informations about this toot | More toots from moyogo@typo.social

Written by Simon Cozens on 2024-11-27 at 18:16

@moyogo @benkiel @googlefonts Indeed. 99% support with one missing auxiliary should be enough for you to say "Yeah, that's pretty much good enough" - and that's why you get a "does not fully support" instead of "does not support".

Maybe I should make the headline levels more positive: "does not support" / "supports" / "comprehensively supports"...

=> More informations about this toot | More toots from simoncozens@typo.social

Proxy Information
Original URL
gemini://mastogem.picasoft.net/thread/113554458565352940