For the last four years, I've been trying to write a program called Author Intrusion[1]. There were a number of reasons for this, but one of the biggest was that I couldn't find any program that handled dictionaries (really word lists, but a lot of people use the wrong name).
This morning, when I woke up, a random search took me on a long, winding journey that finally gave me an interim solution, one that is pretty solid until I can get Author Intrusion finished (which may be another four… decades or so).
As with any long-term writing project, I've created a large number of characters, groups, and locations. Most of them are based on a conlang while others just sounded cool. However, when I'm spell-checking my chapters, I need to have those names in the dictionary; otherwise, they'll continually show up as typos.
One common solution is to add those names to the program's dictionary. This works out pretty well, until the end of the project. Then, the hundreds of names are no longer relevant for the next series but still show up in suggestions for every project in the future.
My preferred novel-writing editor, Emacs[2], has the ability to have per-file word lists. This is called “LocalWords”, and it means that I can identify a list of valid words without adding it to my permanent dictionary. Of course, this means I have to keep copying that per-file list into each new chapter, which then gets the new words for the characters I've introduced in that chapter. And when I create the chapter after that, it keeps moving and growing.
=> 2: http://en.wikipedia.org/wiki/Emacs
Because I just finished the draft of Sand and Bone[3], I have built up a three-book collection of proper names. This list is at the top of every file, which means I have to scroll down a little to even see the title of the chapter.
Rutejìmo Chimípu Pidòhu Shimusògo Tateshyúso Pabinkúe Jìmo Mípu Dòhu Pidòhu's Desòchu Sòchu Mapábyo Kechikìma Hyonèku Opōgyo Chimípu's Gemènyo Mènyo Pábyo Kìma Mapábyo's Zotetsūchi Rutejìmo's Hyonèku's Gemènyo's Ryayusúki Wamifuko Nèku Hána Zúchi Mépu Nenemépu Shimusògo's Desòchu's Myunédo Shimusogo Karawàbi Wàbi Tsubàyo Bàyo Tsubàyo's Tejíko Palasaid Markon Tejíko's Mifuníko Yutsupazéso Yutsupazéso's Karawàbi's Nibonyāchu Jyotekàbi Yunujyoraze Byomími nibonyāchu ranuchyahāhi Mípu's shimusogo dépa alchemical dépa's Mifukiga Chobāni Rabedájyo Badenfumi Shigáto Porlin Kamanen Kakasaba Mioshigàma Pabinkue Mikáryo Mikáryo's tazágu Palarin Mistan rikunámi Ryachuikùo Tateshyúso's Nedorómi Chidomifu Kapōra Káryo Chyábi Ganifúma Ralador Markin Kidorīsi Mifúno Mafimára pyābi Mifuno Faríhyo mizonekima chyòre Rolan Madranir Kiríshi Som figaki tòra chyóre's shikāfu Tachìra's Chobìre's Wh Tachìra Monafuma Gidon Kormar Nigímo wabōryo Faríhyo's avian's Ríhyo Gímo ryodifūne Tsudakìmo Myobùshi Funikogo Ganósho Myobùshi's Gidorámi Pyatose myofūne Pyatòse Gichyòbi Higoryo Ríshi Jacin Torabin Kishifín's Makohūni's Tsu Rojikinomi Fimúchi Rojikinòmi Rapinbun Finol Pokīmu Waryōni Nyochizoma clanless Chizoki Miyóna Kyōti Tijikóse Chyobizo Nichikōse Tifukòmi Talsir Shifáni Milifor Krum Opōgyo's banyosiōu kojinōmi kojinōmi's Nyobichóhi Mifúno's helmed Kitópi Piròma Tópi Bakóki Bakóki's Nifùni Byochína Chobìre Midoshina Kafūma Korechyoki Baroshìko Tedoku Nuchikomu Machikimu Garènu Piròma's Kitópi's Nana dépas Kosobyo Kosòbyo nocked Fidochìma Foteramàsu Foteramasu chima Tsupòbi Dimóryo Fùni petabiryōchi Chína Techyomása Mioráshi Kosòbyo's Kidóri Atefómu's Kidóri's ambushers Tikói Menodàka Tateshyuso Kos Ràchyo Záji Gichyòbi's
That's a lot of names, including a couple that were removed for pacing. Almost none of them appear in the final chapter of Sand and Bone, but they were in one of the hundred or so chapters before it.
There is also no easy way of removing the Miwāfu names and passing them into the next story since those are pretty common across any story I have in the desert.
As far as I could tell, there were only two ways of handling all those names: put them in the permanent dictionary or shovel them along the chapters as I went.
About a year ago, I found out that Vim[4] had a setting that allowed multiple dictionaries, but I didn't want to grok a new writing environment when I had (high) hopes for getting Author Intrusion done.
=> 4: http://en.wikipedia.org/wiki/Vim_%28text_editor%29
This morning, I found a random link that led to another. Eventually, I came across Wcheck[5]. It looked like it had potential for resolving my dictionary problem, so I spent an hour or so trying it out.
=> 5: http://www.emacswiki.org/emacs/WcheckMode
In the end, I couldn't get it to work. But the process of trying gave me a little epiphany on what could work. Instead of changing the library, I decided to write a wrapper around aspell that intercepted the word checks and substituted my own lookups.
The results fell into place pretty easily. With a local.words file in the same directory as the chapters, my newly created caspell program loads it into memory. When Emacs asks it to check a word, it first looks to see if it knows about the word already and, if so, reports it as correct even if the base dictionary doesn't know about it.
Likewise, adding a word adds it to the local.words file, not the aspell personal dictionary.
The basic format of the file is pretty simple.
word nibonyāchu
word dépa
I originally went with “&” as the suggestion used in the pipe, but then I realized I could use readable words without too much of a problem. So, it became “word” and made things a lot easier to process.
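The lookup described above can be sketched roughly as follows. This is a minimal illustration in Python, not caspell's actual code: the names load_words and check are made up here, and the fallback argument stands in for whatever forwards the word to aspell. The replies follow the ispell pipe convention, where "*" means the word is fine and "# word offset" means a miss with no suggestions.

```python
from typing import Callable, Iterable, Set


def load_words(lines: Iterable[str]) -> Set[str]:
    """Collect the entries from lines of the form 'word <entry>'."""
    words = set()
    for line in lines:
        keyword, _, rest = line.strip().partition(" ")
        if keyword == "word" and rest.strip():
            words.add(rest.strip())
    return words


def check(word: str, local: Set[str], fallback: Callable[[str], str]) -> str:
    """Reply '*' for a locally known word, otherwise defer to the fallback.

    The fallback is a hypothetical stand-in for the real aspell process.
    """
    if word in local:
        return "*"
    return fallback(word)


def no_guess(word: str) -> str:
    """Pretend base dictionary that knows nothing: a miss with no guesses."""
    return "# {} 0".format(word)


local = load_words(["word nibonyāchu", "word dépa"])
print(check("dépa", local, no_guess))
print(check("depa", local, no_guess))
```

Keeping the fallback as a plain callable means the local list is always consulted first and the base dictionary is only asked about words the project doesn't claim.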
Getting the basic lookup was a nice little rush, but then I realized that I could return suggestions. That led to writing code that gave suggestions for “incorrect” words that I want to expand into real ones.
suggest Shimu = Shimusogo, Shimusògo
suggest shimu = Shimusogo
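As a rough sketch of how a suggest line could turn into a reply for Emacs (Python, with invented function names, and not taken from caspell itself): in the ispell pipe convention a miss with guesses is reported as "& word count offset: guess, guess". The offset of 0 below is a placeholder; a real wrapper would report where the word sits in the checked line.

```python
from typing import List, Tuple


def parse_suggest(line: str) -> Tuple[str, List[str]]:
    """Parse 'suggest <typed> = <a>, <b>' into (typed, [a, b])."""
    keyword, _, rest = line.strip().partition(" ")
    if keyword != "suggest" or "=" not in rest:
        raise ValueError("not a suggest line: " + line)
    typed, _, choices = rest.partition("=")
    return typed.strip(), [c.strip() for c in choices.split(",") if c.strip()]


def miss_reply(word: str, choices: List[str], offset: int = 0) -> str:
    """Format a miss with suggestions in the ispell pipe style."""
    return "& {} {} {}: {}".format(word, len(choices), offset, ", ".join(choices))


typed, choices = parse_suggest("suggest Shimu = Shimusogo, Shimusògo")
print(miss_reply(typed, choices))
```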
There is a certain mindset when things are working. It is easy to move on to the next bit of code, even though each result takes longer to develop. In this case, I decided to allow one file to include another. This pulls in the words and suggestions from other files but doesn't merge them together.
command include "../../sand-and-blood/chapters/local.words"
And then I had it. Dictionaries per file, per project, per world, and any other combination that I need. I'm planning on creating them over the next couple files, but I think it will let me chain dictionaries so book two will include book one's words. And book three will add book two's, which also includes book one's. And then Raging Alone[6] includes all three books.
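The chaining idea can be sketched like this. It is a hypothetical Python illustration, not the real caspell loader: load_chain is an invented name, include paths are assumed to be relative to the including file, and the loop guard is an addition. The combined set is used only for lookups; nothing is written back, so each file keeps just its own words.

```python
import tempfile
from pathlib import Path


def load_chain(path, seen=None):
    """Return the words from a local.words file plus everything it includes."""
    seen = set() if seen is None else seen
    path = Path(path).resolve()
    if path in seen:  # guard against two files including each other
        return set()
    seen.add(path)
    words = set()
    for line in path.read_text(encoding="utf-8").splitlines():
        keyword, _, rest = line.strip().partition(" ")
        if keyword == "word":
            words.add(rest.strip())
        elif keyword == "command" and rest.startswith("include "):
            target = rest[len("include "):].strip().strip('"')
            # Assumption: include paths are relative to the including file.
            words |= load_chain(path.parent / target, seen)
    return words


# Usage: a throwaway layout where book two includes book one.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "one").mkdir()
    (root / "two").mkdir()
    (root / "one" / "local.words").write_text(
        "word Rutejìmo\n", encoding="utf-8")
    (root / "two" / "local.words").write_text(
        'command include "../one/local.words"\nword Mapábyo\n',
        encoding="utf-8")
    words = load_chain(root / "two" / "local.words")

print(sorted(words))
```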
There was one more thing I ended up doing before I stopped. I use Emacs's abbrev-mode to do auto-corrections while writing. That way, I can type “Rute” and have it expand into “Rutejìmo” complete with accents. Same with various greetings, names, and locations.
As you can guess, I added that feature into the file too.
replace GS = Great Shimusogo
replace GT = Great Tateshyuso
This feature isn't built in, so I wrote a special mode for the program that takes a local.words and creates an abbrev.el file for the mode.
$ ls
local.words
$ caspell --emacs -p .
$ ls
abbrev.el local.words
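To give a feel for what that generation step might involve, here is a small Python sketch that turns replace lines into an Emacs abbrev table definition. It is only a guess at the shape of the output: the function names are invented, and the choice of text-mode-abbrev-table is an assumption, since the post doesn't say which table the generated abbrev.el actually fills.

```python
def elisp_string(text):
    """Quote text as an Emacs Lisp string literal."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def abbrev_file(lines, table="text-mode-abbrev-table"):
    """Build a define-abbrev-table form from 'replace <short> = <full>' lines.

    The table name is an assumption made for this sketch.
    """
    entries = []
    for line in lines:
        keyword, _, rest = line.strip().partition(" ")
        if keyword == "replace" and "=" in rest:
            short, _, full = rest.partition("=")
            entries.append((short.strip(), full.strip()))
    body = "\n".join(
        "    ({} {} nil :count 0)".format(elisp_string(s), elisp_string(f))
        for s, f in sorted(entries))
    return "(define-abbrev-table '{}\n  '(\n{}\n   ))\n".format(table, body)


out = abbrev_file([
    "replace GT = Great Tateshyuso",
    "replace GS = Great Shimusogo",
])
print(out)
```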
A larger example of the local.words for Raging Alone:
command include "../../sand-and-blood/chapters/local.words"
suggest Shimu = Shimusogo, Shimusògo
suggest shimu = Shimusogo
replace GS = Great Shimusogo
replace GT = Great Tateshyuso
word Badenfumi
word Basamiku
The entire thing is rewritten whenever I add a word to the dictionary. Each section (except for commands) is sorted so it always produces a consistent order. This makes source control easier to work with (always sort output for that reason; it saves a lot of time later).
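A minimal sketch of that rewrite step, assuming the section order shown in the example above (commands, suggestions, replacements, words). The render function is hypothetical, and Python's default sort compares by code point, which may not match the ordering the real program uses for accented names.

```python
def render(commands, suggests, replaces, words):
    """Serialize a local.words file with every section but commands sorted."""
    lines = list(commands)  # commands keep the order they were written in
    lines += ["suggest {} = {}".format(key, ", ".join(values))
              for key, values in sorted(suggests.items())]
    lines += ["replace {} = {}".format(key, value)
              for key, value in sorted(replaces.items())]
    lines += ["word {}".format(word) for word in sorted(words)]
    return "\n".join(lines) + "\n"


first = render(
    ['command include "../../sand-and-blood/chapters/local.words"'],
    {"shimu": ["Shimusogo"], "Shimu": ["Shimusogo", "Shimusògo"]},
    {"GT": "Great Tateshyuso", "GS": "Great Shimusogo"},
    {"Basamiku", "Badenfumi"},
)
# The same data supplied in a different order gives byte-identical output,
# which keeps diffs in source control limited to real changes.
second = render(
    ['command include "../../sand-and-blood/chapters/local.words"'],
    {"Shimu": ["Shimusogo", "Shimusògo"], "shimu": ["Shimusogo"]},
    {"GS": "Great Shimusogo", "GT": "Great Tateshyuso"},
    {"Badenfumi", "Basamiku"},
)
print(first)
```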
Once all the files were created and populated, I had to tell Emacs about the new program and how to hook up the abbreviations. This is done in the .emacs file. I have a hook for text mode that automatically configures what I need.
(defun my-text-hook ()
  (setq fill-column 99999)
  ;; Load the generated abbreviations from the chapter's directory.
  (setq abbrev-file-name
        (concat (file-name-directory (buffer-file-name)) "abbrev_defs.el"))
  (quietly-read-abbrev-file
   (concat (file-name-directory (buffer-file-name)) "abbrev.el"))
  (setq save-abbrevs nil)
  (abbrev-mode 1)
  ;; Spell-check through caspell, passing it the chapter's directory.
  (setq ispell-program-name "caspell")
  (setq ispell-personal-dictionary (file-name-directory (buffer-file-name)))
  (flyspell-mode 1)
  (visual-line-mode))

(add-hook 'text-mode-hook 'my-text-hook)
(add-hook 'markdown-mode-hook 'my-text-hook)
The key parts are the “ispell” lines for hooking up caspell. The “personal dictionary” uses the name of the text file ((buffer-file-name)), figures out the directory, and then passes it into caspell via the -p parameter.
The other bit is the “abbrev” lines, which look for abbrev.el in the same directory as the text file and use it. It seems to work and I'm pretty happy with the results so far.
Like almost everything else I write, I threw it up on GitHub[7] along with a few other programs I've been using. I'll document them eventually, but caspell is pretty functional as-is.
=> 7: https://github.com/dmoonfire/mfgames-writing-perl/
Categories:
=> Programming
Tags:
=> Author Intrusion | Emacs | Perl