In a comment on my last post about the J programming language, Peter Kotrčka mentioned a flaw in the simple parsing trick I had used. I was basically assuming that the numbers in my input were separated by exactly one newline (or other J “word”).
=> J programming language | Peter Kotrčka
I had spent some time trying to implement a parser that just looks for digits and ignores everything else, but failed. There’s an example of a sequential machine in their help pages, but I couldn’t get it to simply emit a list of numbers.
Peter made me return to that parser and I think I finally understood how it works! Thanks. 🙂
First, create an array mapping every input byte to a code. In this case, we create an array with 256 zeroes, and then we set the code to 1 for every digit.
```
m=: 256$0
m=: 1 (a.i.'0123456789')}m
```
The result looks something like this:
```
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
```
As you can see, there are a bunch of ones in there. 🙂
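For readers who don't read J, here is a rough Python equivalent of those two lines. It is only a sketch and assumes that a character's byte value is its index in J's alphabet (written a. in J), so the digit 0 lands at index 48:

```python
# Rough Python equivalent of:
#   m=: 256$0
#   m=: 1 (a.i.'0123456789')}m
# Assumption: a character's byte value is its index in J's alphabet a.
m = [0] * 256              # 256$0: one code per possible byte, all zero
for ch in "0123456789":
    m[ord(ch)] = 1         # amend: set the code to 1 at each digit's index

print(m.index(1), sum(m))  # position of the first digit, number of ones
```

This prints 48 and 10: the ones start at the byte value of “0” and there are exactly ten of them.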
Next we need a state transition table. It has three dimensions: the current state, the input code, and a pair holding the new state and the output code.
OK, so here’s how I built it:
```
NB.          state 0  state 1
s=: 2 2 2$ 0 0 1 1  0 3 1 0
```
Or graphically:
```
   <"1 s
┌───┬───┐
│0 0│1 1│
├───┼───┤
│0 3│1 0│
└───┴───┘
```
Rows are the current state, columns are the input code we’re looking at.
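To make the indexing explicit, here is the same reshape in Python. It is a sketch in which flat is a hypothetical name for the list of numbers; each innermost pair is the new state followed by the output code:

```python
# Python version of  s=: 2 2 2$ 0 0 1 1 0 3 1 0
# $ fills the shape in row-major order, so the flat index of
# s[state][code][k] is 4*state + 2*code + k.
flat = [0, 0, 1, 1, 0, 3, 1, 0]
s = [[[flat[4*r + 2*c + k] for k in range(2)] for c in range(2)]
     for r in range(2)]

for state, row in enumerate(s):
    print("state", state, row)   # each entry is [new state, output code]
```

This prints state 0 [[0, 0], [1, 1]] and state 1 [[0, 3], [1, 0]], matching the boxed display row by row.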
Can you see it? I started thinking about it like this:
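Take the input “42 16”. We start at index 0, in state 0, with j (the beginning of the current word) set to -1; in other words, no word has been started.
At “4” we look up [1 1] (current state 0, input code 1): we move to state 1, and output code 1 starts a word, so j becomes 0.
At “2” we look up [1 0] (current state 1, input code 1): we stay in state 1 and produce no output.
At the space we look up [0 3] (current state 1, input code 0): we go back to state 0, and output code 3 emits the word “42”.
And so on.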
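The same walk can be replayed mechanically. This Python sketch models the machine (it is not J's implementation) and only handles output codes 0, 1 and 3, the ones this table uses. For each character it records the current state, the input code, the table entry, and j afterwards:

```python
# Trace of the digit parser on "42 16", modelled on J's sequential machine.
# Assumption: only output codes 0, 1 and 3 occur with this table.
m = [0] * 256
for ch in "0123456789":
    m[ord(ch)] = 1
s = [[[0, 0], [1, 1]],    # state 0: non-digit -> [0 0], digit -> [1 1]
     [[0, 3], [1, 0]]]    # state 1: non-digit -> [0 3], digit -> [1 0]

def trace(text):
    state, j = 0, -1          # start in state 0 with no word begun
    rows, words = [], []
    for i, ch in enumerate(text):
        code = m[ord(ch)]
        new_state, out = s[state][code]
        if out == 1:          # output code 1: a word begins here (j=i)
            j = i
        elif out == 3:        # output code 3: emit text[j:i], begin no word
            words.append(text[j:i])
            j = -1
        rows.append((ch, state, code, [new_state, out], j))
        state = new_state
    return rows, words

rows, words = trace("42 16")
for row in rows:
    print(row)                # (char, state, input code, [new state, output], j)
print(words)                  # ['42']; "16" is still open (j is 3) at the end
```

The first three rows show exactly the [1 1], [1 0], [0 3] sequence from the walk-through.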
Output code 3 means we emit a word and do not begin a new one (j goes back to -1). This matters at the end of the input: we could have used output code 2, but that emits a word and begins a new one (j is set to the current index), so any trailing garbage at the end of the input would be turned into a word.
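Here is a Python sketch of the whole parser, including the end-of-input behaviour as this post describes it: a word that is still open when the input runs out is emitted. The leave parameter is a hypothetical knob that picks the output code used when a non-digit ends a number, so 3 and 2 can be compared side by side:

```python
# Sketch of the digit parser with J-style end-of-input handling: if a word
# is still open (j != -1) when the input ends, it is emitted as a last word.
# Only output codes 0-3 are modelled; `leave` is a hypothetical parameter.
def parse(text, leave=3):
    m = [0] * 256             # assumes byte-range characters (ord < 256)
    for ch in "0123456789":
        m[ord(ch)] = 1
    # `leave` is the output code used when a non-digit ends a number
    s = [[[0, 0], [1, 1]],
         [[0, leave], [1, 0]]]
    state, j, words = 0, -1, []
    for i, ch in enumerate(text):
        new_state, out = s[state][m[ord(ch)]]
        if out == 1:          # j=i: begin a word
            j = i
        elif out == 2:        # emit the word, begin a new one at i
            words.append(text[j:i])
            j = i
        elif out == 3:        # emit the word, begin no new word (j=-1)
            words.append(text[j:i])
            j = -1
        state = new_state
    if j != -1:               # flush whatever word is still open
        words.append(text[j:])
    return words

print(parse("42x16xx", leave=3))  # ['42', '16']
print(parse("42x16xx", leave=2))  # ['42', '16', 'xx']
```

With 3, the trailing xx is dropped; with 2, it survives as a third word. For 42x16xx1 both choices give the same three words, because the final 1 resets j before the input ends.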
And now the parser works for everything:
```
   (0;s;m) ;: '42x16xx1'
┌──┬──┬─┐
│42│16│1│
└──┴──┴─┘
```
You might be wondering about the 0 at the beginning of that parser definition. It tells the parser what to do with the words it finds: 0 means that the words end up in boxes, and 2 means you get the start index and length of each word, for example:
```
   (2;s;m) ;: '42x16xx1'
0 2
3 2
7 1
```
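The two result forms can be sketched in Python as well. This is not J's implementation, just a model of the same machine; seq_machine is a hypothetical name, and only f=0 (the words themselves, which J puts in boxes) and f=2 (start index and length) are covered:

```python
# Sketch of two result forms of the digit parser: f=0 returns the words,
# f=2 returns [start index, length] for each word. Hypothetical names.
def seq_machine(f, text):
    if f not in (0, 2):
        raise ValueError("only f=0 and f=2 are sketched here")
    m = [0] * 256
    for ch in "0123456789":
        m[ord(ch)] = 1
    s = [[[0, 0], [1, 1]], [[0, 3], [1, 0]]]
    state, j, spans = 0, -1, []
    for i, ch in enumerate(text):
        state, out = s[state][m[ord(ch)]]  # lookup uses the old state
        if out == 1:
            j = i                          # a word starts at index i
        elif out == 3:
            spans.append([j, i - j])       # word of length i-j starting at j
            j = -1
    if j != -1:
        spans.append([j, len(text) - j])   # word still open at the end
    if f == 2:
        return spans
    return [text[b:b + n] for b, n in spans]

print(seq_machine(0, "42x16xx1"))  # ['42', '16', '1']
print(seq_machine(2, "42x16xx1"))  # [[0, 2], [3, 2], [7, 1]]
```

The pairs match the J output above: 42 starts at 0, 16 at 3, and the final 1 at 7.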
At last I understand it!
#J #Programming
(Please contact me if you want to remove your comment.)
⁂
I installed j701 from the App Store.
=> j701
I was also happy to learn:
J is written in portable C and is available for Windows, Linux, Mac, iOS, Android and Raspberry Pi. J can be installed and distributed for free. The source is provided under both commercial and GPL 3 licenses.
=> source
– Alex Schroeder 2019-12-13 00:31 UTC
When installing j901 on Debian, I had to install the following:
I don’t know what I should have installed instead.
– Alex Schroeder 2019-12-13 17:55 UTC