I've always complained that checking a font's "language support" just by counting Unicode codepoints is unreliable and incomplete; the font has to behave correctly to support a language.
Shaperglot is my (and @googlefonts's) response to this problem. And now it has a web interface so you can drop your fonts on and get a report on what works, what doesn't and what can be improved.
Try it out at https://googlefonts.github.io/shaperglot/ (all processing is done on your local computer; nothing is uploaded.)
=> More informations about this toot | More toots from simoncozens@typo.social
@simoncozens I was looking at Romanian, wondering about the necessity of combining marks as independent codepoints to declare it supported. As long as “Ă” and “ă” exist, a combining breve is just a nice to have (maybe to form the historical ĕ, ĭ, ŭ)?
=> More informations about this toot | More toots from db@typo.social
@db Good point. Our database doesn't make a distinction between "marks you need to form base characters" and "marks which can attach to base characters". I shall think about this!
=> More informations about this toot | More toots from simoncozens@typo.social
@db Are you making Unicode fonts?
=> More informations about this toot | More toots from moyogo@typo.social
@moyogo oh, far from it! Just picking Simon’s brain about a result I found surprising.
=> More informations about this toot | More toots from db@typo.social
@db Lol, sorry for being confusing. That was a rhetorical question.
In Unicode both ă U+0103 and ă U+0061 U+0306 are equivalent. Unicode Romanian text may use either. While the first form is more common, the second form occurs nonetheless. Unicode fonts should support both for proper language support.
A large Romanian corpus may contain around 25 second forms per million. This may vary depending on where the corpus comes from.
=> More informations about this toot | More toots from moyogo@typo.social
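The equivalence described in the toot above can be checked with Python's standard `unicodedata` module. This is only a minimal sketch of Unicode canonical equivalence at the text level; the helper name `canonically_equivalent` is made up for illustration, and nothing here says how any particular shaping engine or font handles the two forms.

```python
import unicodedata

precomposed = "\u0103"   # ă as the single codepoint U+0103
decomposed = "a\u0306"   # a (U+0061) followed by COMBINING BREVE (U+0306)


def canonically_equivalent(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after NFD normalisation."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# The raw strings differ codepoint by codepoint.
print(precomposed == decomposed)

# After normalisation to a common form they compare equal.
print(canonically_equivalent(precomposed, decomposed))

# NFC composes the sequence; NFD decomposes the single codepoint.
print(unicodedata.normalize("NFC", decomposed) == precomposed)
print(unicodedata.normalize("NFD", precomposed) == decomposed)
```

Text that reaches a font may arrive in either form, which is the point being made about corpora: a font that only maps U+0103 depends on something upstream composing the sequence for it.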
@db See for example https://www.bing.com/search?q=%22s%C4%83%22 vs. https://www.bing.com/search?q=%22sa%CC%86%22
=> More informations about this toot | More toots from moyogo@typo.social
@moyogo aah, gotcha. Thanks for the clarification, makes sense! I was so focused on inputting text that I forgot about the idea of existing corpora :-)
=> More informations about this toot | More toots from db@typo.social
@db @moyogo regarding: “Unicode fonts should support both for proper language support.” I do not think Unicode says you ought to support both. I think it is good practice, but that is not what determines sufficient language support.
Again, this is the difference between a strict approach (useful in Google’s QA) and a more lenient approach that eliminates false negatives (Hyperglot).
=> More informations about this toot | More toots from rosetta@mastodon.design
@db @moyogo These are the kind of conversations we could be having if @googlefonts focused on cooperation with existing projects rather than starting a new thingy of their own.
=> More informations about this toot | More toots from rosetta@mastodon.design
@rosetta @db @moyogo I don't speak for @googlefonts but the licensing conditions of existing projects were unclear. But the good thing is we can still have conversations about what language support means nonetheless.
=> More informations about this toot | More toots from simoncozens@typo.social
According to Dave, there are no licensing issues. I have responded to you on GitHub about that two years (?) ago.
=> More informations about this toot | More toots from rosetta@mastodon.design
@rosetta @db @moyogo You're correct that Unicode doesn't say you should support both, but that's because Unicode doesn't really say anything about fonts. The important question here is what normalisation is done by the shaping engine. Different engines behave differently, which is why Denis' advice to support both is correct.
=> More informations about this toot | More toots from simoncozens@typo.social
@simoncozens @db @moyogo because why would we fix shaping engines instead of rejecting thousands of fonts? I am not after correct or incorrect. You are doing a QA tool which is fine, but there are different notions of language support. That also pertains to my comment about discussion. Over and over we have seen people at @googlefonts asking the same questions we have already resolved in Hyperglot. It’s a waste of potential. Same with @benkiel ’s comment. That is resolved in Hyperglot, too.
=> More informations about this toot | More toots from rosetta@mastodon.design
@rosetta @db @moyogo @googlefonts @benkiel I think I can see how these "conversations" are likely to go.
=> More informations about this toot | More toots from simoncozens@typo.social
@simoncozens @db @moyogo @googlefonts @benkiel frankly, what you have done is a fantastic feat of engineering. I only wish it was more collaborative.
=> More informations about this toot | More toots from rosetta@mastodon.design
@rosetta @db @moyogo @googlefonts @benkiel Thank you. I don't know how much collaboration is possible in practice if there are different licensing terms, different goals, and different understandings of what language support means. (And as we move to Rust, different languages.) But I'm very open to collaborate technically where possible and, even where not, to have these kinds of discussions.
=> More informations about this toot | More toots from simoncozens@typo.social
@db @simoncozens One would think that the point of combining marks is to be available, full stop. That is, beyond strictly orthographic requirements. To serve for future orthographic reforms or whatever might happen in the future.
=> More informations about this toot | More toots from PereFarrando@typo.social
@simoncozens @googlefonts LOVE THIS, thank you! One question: not having ſ for French/German support feels extreme? Or am I missing something.
=> More informations about this toot | More toots from benkiel@typo.social
@benkiel @googlefonts Yeah that does feel extreme.
=> More informations about this toot | More toots from simoncozens@typo.social
@simoncozens @googlefonts 100% understand the challenge of “this glyph is used in the language at some point and is official, but no one in modern context uses this: it's for historical documents”
=> More informations about this toot | More toots from benkiel@typo.social
@benkiel @simoncozens @googlefonts the tool can only be as good as the data it uses. There’s plenty of nonsensical stuff in the data as some of it wasn’t collected for this purpose or other errors crept in where it was copied from. The quick fix in your example is to not count auxiliary as necessary.
=> More informations about this toot | More toots from moyogo@typo.social
@moyogo @benkiel @googlefonts Indeed. 99% support with one missing auxiliary should be enough for you to say "Yeah, that's pretty much good enough" - and that's why you get a "does not fully support" instead of "does not support".
Maybe I should make the headline levels more positive: "does not support" / "supports" / "comprehensively supports"...
=> More informations about this toot | More toots from simoncozens@typo.social
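The tiered wording discussed in the last few toots (a missing auxiliary character downgrades the result rather than failing it) could be sketched roughly as follows. This is an illustration only, not Shaperglot's actual logic or data model: the function `support_level`, the split into `base` and `auxiliary` sets, and the sample character sets are all assumptions, and it checks codepoint coverage only, not shaping behaviour.

```python
def support_level(font_chars: set[str], base: set[str], auxiliary: set[str]) -> str:
    """Hypothetical three-tier verdict based on which characters are missing."""
    if base - font_chars:
        # Any missing base character is treated as a hard failure.
        return "does not support"
    if auxiliary - font_chars:
        # Only auxiliary characters are missing: a softer verdict.
        return "does not fully support"
    return "supports"


# Made-up sample data, not taken from any real language database.
base = set("abcdefghijklmnopqrstuvwxyz") | {"ä", "ö", "ü", "ß"}
auxiliary = {"ſ"}

font_without_long_s = set(base)
font_with_long_s = base | auxiliary
font_missing_eszett = base - {"ß"}

print(support_level(font_without_long_s, base, auxiliary))
print(support_level(font_with_long_s, base, auxiliary))
print(support_level(font_missing_eszett, base, auxiliary))
```

Keeping auxiliary characters in a separate set is what makes the "quick fix" mentioned above a data change rather than a code change: moving ſ out of the required set changes the verdict without touching the check.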