I’m planning on replacing the code that’s used on the teletype to parse user input.
The current version
The current function parse uses strtok, strcmp and strtol. An input is split into tokens using strtok. If a token starts with a digit (or -), it is parsed as a number; otherwise it is strcmp'd against each entry in the ops table, one by one, until a match is found.
There are a few downsides to this:
- strtol is very permissive with inputs (e.g. it will parse 123ADD as 123). On the plus side, you can type hex and octal numbers in, e.g. ADD 0x100 0x10 will work on your teletype!
- Using strcmp to identify ops is probably a bit inefficient. I’m not sure how much difference it really makes though, as parsing isn’t done very often.
- strtok has hidden state! It’s definitely one of those “what were they thinking” kind of functions. Due to its simplicity, it really limits what we can do with the language, the most obvious case being the : separator, which currently requires spaces around it.
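To make the first point concrete, here is a minimal sketch of the strtol behaviour. The function names are made up for illustration, and base 0 is an assumption (it is what makes the 0x and leading-0 prefixes work); the real parse function may call strtol differently. The second function shows how the end pointer reveals the trailing junk that a bare call silently ignores.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

// Permissive: parsing stops at the first non-numeric character and the
// rest of the token is silently ignored, so "123ADD" yields 123.
// Base 0 (an assumption here) accepts decimal, 0x hex and leading-0 octal.
long parse_number_permissive(const char *token) {
    return strtol(token, NULL, 0);
}

// Strict: succeed only if the whole token was consumed as a number.
bool parse_number_strict(const char *token, long *out) {
    char *end = NULL;
    errno = 0;
    long value = strtol(token, &end, 0);
    if (end == token) return false;     // nothing was parsed at all
    if (*end != '\0') return false;     // trailing characters, e.g. "123ADD"
    if (errno == ERANGE) return false;  // value does not fit in a long
    *out = value;
    return true;
}
```

With these, parse_number_permissive("123ADD") gives 123, while parse_number_strict rejects the same token but still accepts 0x100.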
The new version
I’ve spent the last few days playing with Ragel and, I’ll be honest, my head hurts quite a bit now. Ragel can construct quite complex state machines in C (or other languages), and one of its uses is as a lexer (or scanner). Anyway, the plan is to use it in 2 ways, pretty similar to how the current code works.
- The first will tokenise the input, much as strtok did, but with the ability to recognise different delimiters. This function will be very liberal in what it allows as a token. That token will be handed over to…
- The second function will identify the token as either a number or an op, replacing the current use of strcmp. This function will be extremely conservative in what it allows to match, e.g. the entire token must be consumed to form a match.
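The split between the two functions can be sketched in plain, hand-written C. This is not Ragel output and not the planned implementation, just an illustration of the liberal/conservative division. All names (next_token, classify_token, OP_NAMES) are hypothetical, the op list is a two-entry stand-in, and only decimal numbers are handled.

```c
#include <ctype.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

typedef enum { TOKEN_NUMBER, TOKEN_OP, TOKEN_INVALID } token_kind_t;

// Hypothetical stand-in for the real ops table.
static const char *const OP_NAMES[] = { "ADD", "SUB" };
#define OP_COUNT (sizeof(OP_NAMES) / sizeof(OP_NAMES[0]))

// Stage 1 (liberal): anything between spaces is a token, and ':' is a
// token of its own even without surrounding spaces. All state lives in
// *cursor, so there is no hidden state. Returns the token length
// (0 at end of input) and points *start at its first character.
size_t next_token(const char **cursor, const char **start) {
    const char *p = *cursor;
    while (*p == ' ') p++;
    *start = p;
    if (*p == ':') {
        p++;
    } else {
        while (*p != '\0' && *p != ' ' && *p != ':') p++;
    }
    *cursor = p;
    return (size_t)(p - *start);
}

// Stage 2 (conservative): the whole token must be a decimal number or an
// exact op name. On success *value holds the number or the op index.
token_kind_t classify_token(const char *tok, size_t len, int *value) {
    size_t i = (len > 0 && tok[0] == '-') ? 1 : 0;
    if (i < len && isdigit((unsigned char)tok[i])) {
        int n = 0;
        for (; i < len; i++) {
            if (!isdigit((unsigned char)tok[i])) return TOKEN_INVALID;
            // Conservatively reject anything that could overflow an int.
            if (n > (INT_MAX - 9) / 10) return TOKEN_INVALID;
            n = n * 10 + (tok[i] - '0');
        }
        *value = (tok[0] == '-') ? -n : n;
        return TOKEN_NUMBER;
    }
    for (size_t op = 0; op < OP_COUNT; op++) {
        if (strlen(OP_NAMES[op]) == len && memcmp(OP_NAMES[op], tok, len) == 0) {
            *value = (int)op;
            return TOKEN_OP;
        }
    }
    return TOKEN_INVALID;
}
```

In this sketch, "ADD 1:SUB -2" splits into ADD, 1, :, SUB and -2 without needing spaces around the colon, and 123ADD comes out of the first stage as a single token that the second stage rejects. The caller is left to handle the : token itself.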
Challenges
The biggest downside will be the additional complexity of building and contributing to the teletype code. Everyone will need to install Ragel on their computers (FYI, brew install ragel, pacman -Sy ragel).
Another complication is telling Ragel which ops are available and how it should report which op it has found. Currently an op is referenced by its index in the ops table (stored as a 16-bit int). The simplest solution would be for Ragel to return a pointer directly to the op struct, but that would require storing a 32-bit pointer, doubling the size that a scene/script takes up in RAM/ROM. My current plan is to write a small C program that reads the op table and generates the Ragel file mapping op name to op index, e.g.
%%{
"ADD" => { return 0; };
"SUB" => { return 1; };
}%%
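A rough sketch of what that generator could look like, assuming the op table can be walked as an array of structs with a name field. The op_entry_t type, EXAMPLE_OPS and both function names are hypothetical stand-ins, and the output simply mirrors the fragment above rather than a complete Ragel file.

```c
#include <stddef.h>
#include <stdio.h>

// Hypothetical stand-in for the real op struct; only the name matters here.
typedef struct {
    const char *name;
} op_entry_t;

// Hypothetical two-entry table, in the same order as the fragment above.
const op_entry_t EXAMPLE_OPS[] = { { "ADD" }, { "SUB" } };

// Format one rule mapping an op name to its index in the table.
// Returns what snprintf returns (the length the full rule needs).
int format_rule(char *buf, size_t size, const char *name, size_t index) {
    return snprintf(buf, size, "\"%s\" => { return %zu; };", name, index);
}

// Write one rule per op, wrapped in the %%{ ... }%% markers.
// fputs does no format expansion, so "%%{" is written literally.
void write_ragel_rules(FILE *out, const op_entry_t *ops, size_t count) {
    char line[128];
    fputs("%%{\n", out);
    for (size_t i = 0; i < count; i++) {
        format_rule(line, sizeof line, ops[i].name, i);
        fprintf(out, "%s\n", line);
    }
    fputs("}%%\n", out);
}
```

Calling write_ragel_rules(stdout, EXAMPLE_OPS, 2) reproduces the four lines shown above. Because the index comes from the loop counter, the generated rules stay in step with the table order, which keeps the 16-bit index scheme intact.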
Another option is to keep using strcmp for the 2nd function.
Anyway, school run beckons…