This is the first part of a short series on the MfGames Culture CIL API. It is currently alpha software, but I'm looking for critiques, opinions, and general feedback. All of my work for this is in the GitHub repository[1] in the `drem-0.0.0` branch. It is licensed under the MIT license.
=> 1: http://github.com/dmoonfire/mfgames-culture-cil/
This page is also a form of documentation by example.
When I started working on the culture logic, I decided to hang the code off as many standards as possible. I was very familiar with ISO 639[2]. ISO 639 is a standardized list of languages and codes to identify them. You can see these in various programs and places, such as `en` or `fr` (English and French respectively).
=> 2: http://en.wikipedia.org/wiki/ISO_639
=> Introduction
There are a few components to the ISO 639 code:

* A two-character code (`en` and `fr`).
* A three-character code (`eng`).
Actually, there are two versions of the three-character code, a bibliographic and a terminologic code. These are known as the `B` and `T` codes respectively. The bibliographic code is based on the English translation of the name while the terminologic is based on the language's name for itself.
For example, the bibliographic code for Armenian is `arm` while the terminologic is `hye`.
According to Wikipedia, the terminologic code is preferred over the bibliographic.
Also, `en` and `eng` identify the same language, but if you treat them simply as strings, they are different.
To my surprise, there is no dedicated object in the base library for C# for ISO 639 codes. There are some properties on `CultureInfo` in `System.Globalization`, but nothing that handles the equivalency of `en` and `eng`. And I haven't had a lot of success with creating non-standard languages (my `xmi` for Miwāfu) inside the framework.
There are some enum versions of the ISO code, but they don't have the flexibility to add custom languages.
Unable to find something already there, I created my own ISO 639 class for handling these codes. I called it `LanguageCode` because I didn't like how `Iso639` looked. It does ignore the other standards for languages right now, but I was thinking that `LanguageCode` could handle all of those as separate properties.
var english1 = new MfGames.Culture.Codes.LanguageCode("eng");
var english2 = new MfGames.Culture.Codes.LanguageCode("eng", "en");
Assert.AreEqual(english1, english2);
I set it up so that `ToString` returns the preferred three-character code.
var armenian = new LanguageCode("hye", "hy", "arm");
Assert.AreEqual("hye", armenian.IsoAlpha3);
Assert.AreEqual("hye", armenian.IsoAlpha3T);
Assert.AreEqual("arm", armenian.IsoAlpha3B);
Assert.AreEqual("hy", armenian.IsoAlpha2);
Assert.AreEqual("hye", armenian.ToString());
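For readers who want to see how this kind of equality could hang together, here is a minimal sketch. It is not the library's `LanguageCode`: the class name `SketchLanguageCode`, the constructor parameter names, and the choice to compare only on the terminologic three-character code are all assumptions made for illustration.

```csharp
using System;

// Hypothetical stand-in for illustration; NOT the library's LanguageCode.
public sealed class SketchLanguageCode : IEquatable<SketchLanguageCode>
{
    // Assumed argument order, modeled on the Armenian example: T code, two-character code, B code.
    public SketchLanguageCode(string alpha3T, string alpha2 = null, string alpha3B = null)
    {
        IsoAlpha3T = alpha3T ?? throw new ArgumentNullException(nameof(alpha3T));
        IsoAlpha2 = alpha2;
        IsoAlpha3B = alpha3B;
    }

    public string IsoAlpha3T { get; }
    public string IsoAlpha3B { get; }
    public string IsoAlpha2 { get; }

    // The preferred three-character code is the terminologic one.
    public string IsoAlpha3 => IsoAlpha3T;

    // Assumption: two codes are equal when their terminologic codes match,
    // regardless of which optional codes were supplied.
    public bool Equals(SketchLanguageCode other) =>
        other != null && string.Equals(IsoAlpha3T, other.IsoAlpha3T, StringComparison.Ordinal);

    public override bool Equals(object obj) => Equals(obj as SketchLanguageCode);

    public override int GetHashCode() => StringComparer.Ordinal.GetHashCode(IsoAlpha3T);

    public override string ToString() => IsoAlpha3T;
}

public static class Demo
{
    public static void Main()
    {
        var english1 = new SketchLanguageCode("eng");
        var english2 = new SketchLanguageCode("eng", "en");

        Console.WriteLine(english1.Equals(english2)); // True
        Console.WriteLine("en" == "eng");             // False
    }
}
```

Overriding `GetHashCode` alongside `Equals` keeps the sketch usable as a dictionary key, which matters if codes are later stored in lookup tables.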
## Memory

Memory is something I concern myself with. With a single code, you have:

* The pointer to the code
* The class overhead for `LanguageCode`
* Three pointers to strings
* Three strings in memory

Using interned strings for the codes means that the three pointers will remain, but at least I won't have a huge number of three- and two-character strings in memory.

## Singleton

I still wanted to reduce the memory pressure even further. To do this, I created a singleton class `LanguageCodeManager` which provides access to a single shared `LanguageCode` instance for each code.
var manager = LanguageCodeManager.Instance;
var english1 = manager.Get("eng");
var english2 = manager.Get("en");
var english3 = manager.GetIsoAlpha3("eng");
var english4 = manager.GetIsoAlpha3T("eng");
This way, you'll only have one instance of “English” regardless of how many pointers you use. Of course, if you also decide to manually create an English tag, it will still compare as equal to the singleton version even though it is a separate object.

I made `LanguageCodeManager` an injectable singleton to provide for customizations.
LanguageCodeManager.Instance = new LanguageCodeManager();
LanguageCodeManager.Instance.Add(new LanguageCode("xmi")); // Miwāfu
LanguageCodeManager.Instance.Add("xlo"); // Lorban
Assert.AreEqual(2, LanguageCodeManager.Instance.Count);
foreach (LanguageCode lc in LanguageCodeManager.Instance)
{
    Assert.IsNull(lc.IsoAlpha2);
}
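To tie the interning and singleton ideas together, the following is a rough sketch of a registry that hands back one shared object per language, whichever code is used to look it up. `SketchCode` and `SketchCodeRegistry` are hypothetical names, and the dictionary-based lookup is an assumption, not the actual `LanguageCodeManager` implementation. The only framework call relied on beyond the collections is `string.Intern`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration only; these are NOT the library's types.
public sealed class SketchCode
{
    public SketchCode(string alpha3, string alpha2 = null)
    {
        // string.Intern returns one shared string instance per distinct value.
        Alpha3 = string.Intern(alpha3);
        Alpha2 = alpha2 == null ? null : string.Intern(alpha2);
    }

    public string Alpha3 { get; }
    public string Alpha2 { get; }
}

public sealed class SketchCodeRegistry
{
    // Replaceable shared instance, mirroring the "injectable singleton" idea.
    public static SketchCodeRegistry Instance { get; set; } = new SketchCodeRegistry();

    private readonly Dictionary<string, SketchCode> byCode =
        new Dictionary<string, SketchCode>(StringComparer.Ordinal);

    public int Count { get; private set; }

    public void Add(SketchCode code)
    {
        // Count each language once, then index the same object under every code it answers to.
        if (!byCode.ContainsKey(code.Alpha3)) Count++;
        byCode[code.Alpha3] = code;
        if (code.Alpha2 != null) byCode[code.Alpha2] = code;
    }

    // Returns null when the code is unknown.
    public SketchCode Get(string code) =>
        byCode.TryGetValue(code, out var found) ? found : null;
}

public static class Demo
{
    public static void Main()
    {
        var registry = new SketchCodeRegistry();
        registry.Add(new SketchCode("eng", "en"));

        // Both lookups return the same object.
        Console.WriteLine(object.ReferenceEquals(registry.Get("eng"), registry.Get("en"))); // True

        // Interned strings with equal contents share one reference.
        var built = new string(new[] { 'e', 'n', 'g' });
        Console.WriteLine(object.ReferenceEquals(string.Intern(built), registry.Get("eng").Alpha3)); // True
    }
}
```

Because `Instance` has a public setter, a test can swap in an empty registry and add only the codes it needs.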
This also means that most methods that use language codes take a `LanguageCodeManager` as a parameter to facilitate testing and isolation. So far, I have found that this adds a bit of overhead to many functions, but I think it gives the flexibility needed; I'm in the process of converting most of those parameters to argument objects to simplify the process.

A newly constructed `LanguageCodeManager` does not have any of the ISO codes; it is an empty list of codes. To add the ones stored as a manifest resource, call `AddDefaults()`. The `LanguageCodeManager` initially assigned to `Instance` has these defaults already added.

## Why a class?

I decided to make `LanguageCode` a class, despite the overhead, mainly to make it easy to pass `null` in. Also, if I used a struct, the item would carry at least three string pointers everywhere it is used instead of a single one.

## Why not string?

The main reason I didn't just leave this as a string is type safety. I like passing in a language code when it is supposed to be a language code, without worrying about which of five different strings is supposed to be the three-character code. Or whether it is supposed to be a two-character code. Or something else.
var english = LanguageCodeManager.Instance.Get("eng");
var translation = GetTranslation(english, "bob");
## Special

There is one `LanguageCode` that doesn't fit with the ISO standard: “Canonical”. This has a code of `*` for all of the fields and is used to do the final matching or determine the canonical name of something.
var canonical = LanguageCode.Canonical;
Assert.AreEqual("*", canonical.IsoAlpha3);