Serialisation of Presets

Continuing the discussion from Teletype Firmware: USB Disk Mode Interface:

This is to continue the discussion on binary vs text serialisation formats for preset saving.

Potential future format changes

note: this is not a list of upcoming features, so please don’t interpret it as such; it’s merely a way to think through future issues

  • Timeline
  • Increasing the number of patterns from 4 to … 8?
  • Increasing the number of scripts
  • Increasing the number of lines in each script
  • Additional pattern variables (e.g. P.WRAP)
  • Saved variables
  • Reducing the number of scene description lines
  • Removal of an existing OP
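
To make one of these concrete: if the number of patterns grew from 4 to 8, a loader could copy however many patterns a file actually contains and default the rest, so older files keep loading. Below is a minimal C sketch. It assumes the patterns have already been parsed out of the file. `pattern_t`, `load_patterns`, `PATTERN_COUNT` and `PATTERN_LENGTH` are hypothetical names and sizes, not the Teletype source.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical limits: a future firmware with 8 patterns of 64 steps. */
#define PATTERN_COUNT 8
#define PATTERN_LENGTH 64

typedef struct {
    int16_t values[PATTERN_LENGTH];
    uint16_t len;
} pattern_t;

/* Copy as many patterns as the file provides (up to what this firmware
   supports) and reset the rest to empty defaults. Returns how many
   patterns from the file were dropped because there was no room. */
int load_patterns(pattern_t *dst, const pattern_t *src, int src_count) {
    int n = src_count < PATTERN_COUNT ? src_count : PATTERN_COUNT;
    memcpy(dst, src, (size_t)n * sizeof(pattern_t));
    for (int i = n; i < PATTERN_COUNT; i++) {
        memset(&dst[i], 0, sizeof(pattern_t));
    }
    return src_count > PATTERN_COUNT ? src_count - PATTERN_COUNT : 0;
}
```

The reverse case (a newer file with more patterns than an older firmware can hold) is reported rather than silently ignored, so the caller can decide whether to warn.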

Sorry to bother you on this. But I just want to 100% confirm that you’re happy with either a text based or a binary based solution.

Absolutely!

My preference is for whatever produces safe, clean, efficient code.

I’ve stated before that my preference is for text files. But I would personally feel that I could build a safer version using binary serialisation.

I’m quoting myself from the previous thread:

I think the testing framework and the corpus is more important than the format.


edit: I’ve started trying to list all the potential changes we might wish to make to the file format in the first post. I think it will be good to enumerate the possibilities so that we can think about how the file format would cope with them.

It’s absent any sort of test code yet, but here’s what my clean-slate implementation of a text-based deserializer for the existing file format looks like.

It’s big and clunky and makes me yearn for a library to handle the cruft. Not a final judgement, just a data point.
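
To give a sense of what the hand-written text approach involves, here is a minimal C sketch of just one piece of it: recognising section headers. It assumes headers of the form #1 to #8 (scripts), #M (metro), #I (init) and #P (patterns), and accepts either case; the enum and `classify_line` are hypothetical names, and this is not the implementation linked above.

```c
#include <ctype.h>

/* Hypothetical section ids. The header markers (#1..#8, #M, #I, #P) are
   assumed from the current text format and may not be exhaustive. */
typedef enum {
    SECTION_NONE = -1,    /* not a section header */
    SECTION_SCRIPT_1 = 0, /* script n maps to SECTION_SCRIPT_1 + (n - 1) */
    SECTION_METRO = 8,
    SECTION_INIT,
    SECTION_PATTERNS
} section_t;

/* Classify one line of a scene file. Trailing CR/LF is tolerated. */
section_t classify_line(const char *line) {
    if (line[0] != '#' || line[1] == '\0') return SECTION_NONE;
    /* Exactly one marker character, so "#10" is not mistaken for "#1". */
    if (line[2] != '\0' && line[2] != '\r' && line[2] != '\n') return SECTION_NONE;
    char c = (char)toupper((unsigned char)line[1]);
    if (c >= '1' && c <= '8') return (section_t)(SECTION_SCRIPT_1 + (c - '1'));
    if (c == 'M') return SECTION_METRO;
    if (c == 'I') return SECTION_INIT;
    if (c == 'P') return SECTION_PATTERNS;
    return SECTION_NONE;
}
```

Everything after a header (script lines, tab-separated pattern rows, the description lines before the first header) still needs its own hand-written handling, which is where most of the bulk comes from.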

From a non-technical perspective my main concern is maintaining ease of sharing scenes. As long as it is easy for people to post, share, and collaborate on TT scripts I’m happy with whatever makes sense.

Being able to maintain a library like the github Codex is important.

do you have any recommendations for a library that would facilitate this whole endeavor?

in case it is at all relevant or helpful…

so i’ve been working a lot with json in c/++ lately and have had some opportunity to play with different libraries. a nice thing about json’s ubiquity is that there are a lot of libraries to work with it, and there seem to be a lot of crazy russians intent on winning the prize for tiniest/fastest parser (hard to be both.)

the tiniest one i’ve found is jsmn (~400 LOC in a single header/source.) it also does pretty well in speed benchmarks up to some size like 4kB, after which performance drops off quickly. so, well suited to embedded applications. (here are some unofficial examples). also doesn’t require any memory allocation.

similar but with more test coverage is cJSON, weighing in at ~700 LOC.

both are pure ANSI C89.

(definitely not relevant to the embedded side, but maybe for host side: nlohmann::json is my favorite solution for modern c++ projects, an amazing feat of STL-like library design, and can produce CBOR and messagepack binary output.)

btw: messagepack is a good, suitable format. but AFAIK there aren’t any really suitable implementations for very small embedded platforms (weird.) the canonical msgpack is not huge but not tiny (maybe 2-3k LOC?) and does lots and lots of memory allocation.

I’m fine with JSON, as it has numerous benefits, and will yield to @zebra’s experience and your assessment of what would be best for the monome codebase / ecosystem.

I took (Teletype) out of the title because I believe that the goal is to unify preset format between monome devices. Please correct me if this assumption is wrong.

Serialization Options

  • Text
    • Current format for Teletype
    • Not fault-tolerant
  • Library-driven Binary
    • Inflexible
    • Requires off-device editor / translator
  • JSON
    • Flexible
    • Plenty of editors
    • Easy development of custom editor

Timeline

Teletype version 2.2 will be out within 1 month. If @scanner_darkly’s grid branch is ready at that point, it will likely constitute the next release.

The USB interface redesign will be the next addition, and it necessitates a rewrite of the preset parser at the very least. I could simply continue testing the new parser, but if we can decide on a new serialization format, I will include it at the same time.

If the goal is to unify serialization in monome, then the serialization library should exist in libavr32. After library selection, it would be best to expedite its coherent inclusion to libavr32 so that work in the teletype (and other) codebases can proceed with the implementation.


I admit ignorance of most of the monome ecosystem, so I don’t know what other products might want to leverage preset serialization. I do know that there’s some support to allow teletype to manage the presets of i2c slaves, so any such device would benefit from the unified format.

Excellent discussion…

An interesting aspect of this would be the ability to “upload” scripts to external TT “expanders”.
Meaning:

  • Define a script as external, with an i2c slave address specified
  • When changed, send serialized form of script over i2c
  • External “headless” TT expander (with its own I/O) can execute separately
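
A minimal sketch of the "send serialized form over i2c" step, assuming the serialized script has to be split into frames small enough for a single i2c transaction. The frame layout ([seq][total][len][payload]), the 32-byte limit, `frame_count` and `build_frame` are all hypothetical, not an existing monome i2c protocol.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical frame layout: [seq][total][len][payload...].
   FRAME_MAX is an assumed per-transaction limit, not a monome constant. */
#define FRAME_MAX 32
#define FRAME_HEADER 3
#define FRAME_PAYLOAD (FRAME_MAX - FRAME_HEADER)

/* Number of frames needed for `len` bytes; an empty script still takes one. */
size_t frame_count(size_t len) {
    size_t n = (len + FRAME_PAYLOAD - 1) / FRAME_PAYLOAD;
    return n == 0 ? 1 : n;
}

/* Fill `frame` with chunk `seq` of the serialized script and return the
   number of bytes to send, or 0 if `seq` is out of range or there are
   more frames than the one-byte header fields can describe. */
size_t build_frame(const uint8_t *data, size_t len, uint8_t seq,
                   uint8_t frame[FRAME_MAX]) {
    size_t total = frame_count(len);
    if (total > 255 || seq >= total) return 0;
    size_t offset = (size_t)seq * FRAME_PAYLOAD;
    size_t chunk = len - offset < FRAME_PAYLOAD ? len - offset : FRAME_PAYLOAD;
    frame[0] = seq;
    frame[1] = (uint8_t)total;
    frame[2] = (uint8_t)chunk;
    memcpy(frame + FRAME_HEADER, data + offset, chunk);
    return FRAME_HEADER + chunk;
}
```

Carrying both the sequence number and the total lets the expander notice a missing frame and reject the partial script instead of running it.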

Furthermore, a headless TT expander with its own scripts could implement weird things like lambdas etc. that might be more suited to real-time DSP.

Of course, the expander would have to parse/validate the code it receives, so there would be less feedback on syntax errors.

it is a good idea! this was a proposed feature for ansible. a sort of screen-less teletype that has similar i/o and would run standalone, but updatable via i2c.

not enough code-hours in the day.

I’m going to propose using JSON, as I have cycles to put in and the USB disk revamp is waiting on this.

Any objections, considering this impacts all this i2c scene recall discussion?

i’m good with JSON (i give a lot of weight to the opinion of those who are volunteering the code hours)

i’m curious

any progress on this?

Not much. My workstation has been much too cold to work at (wood stove, lol), and watching the world burn was keeping my insides warm.

I have a JSON document format and supporting documentation, but zero work on a parser.

Hi @sliderule, curious if you have any more info on the document format you could share. I’m interested in contributing to adding USB disk functionality to Ansible, and from the work on your branch on usb_disk_mode.c it looks like a lot of what remains is to work out the JSON serialization code.

I figure that one component needed to use the Ansible preset save/load feature would be a tool for loading a pre-USB-support ansible.hex image and writing the presets from the firmware dump to a JSON file, so that users can carry over their existing presets. I wrote a Python program that does this conversion on a hex file, with a goal in mind of making it simple to add support for reading presets from older firmware versions or alt firmware like Ansible Earthsea.
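
The converter described above is Python, but its first step is language-neutral: decoding Intel HEX records. Each line has the form :LLAAAATT<data>CC, and the checksum byte makes the sum of all decoded bytes 0 mod 256. A minimal C sketch with hypothetical names; it does not interpret extended address records (type 04), which a real tool would need in order to map records to full flash addresses before locating the preset data.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t count;     /* number of data bytes */
    uint16_t address;  /* 16-bit load offset */
    uint8_t type;      /* 0x00 data, 0x01 end of file, 0x04 extended linear address, ... */
    uint8_t data[255];
} hex_record_t;

static int hex_nibble(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Decode the two hex digits at s[0..1]; -1 on a bad digit. */
static int hex_byte(const char *s) {
    int hi = hex_nibble(s[0]), lo = hex_nibble(s[1]);
    return (hi < 0 || lo < 0) ? -1 : (hi << 4) | lo;
}

/* Parse one record such as ":0300300002337A1E". Trailing CR/LF is ignored.
   Returns 0 on success, -1 on malformed input or checksum mismatch. */
int parse_hex_record(const char *line, hex_record_t *rec) {
    size_t n = strcspn(line, "\r\n");
    if (n < 11 || line[0] != ':') return -1;
    int count = hex_byte(line + 1);
    /* ':' + count(2) + address(4) + type(2) + data(2*count) + checksum(2) */
    if (count < 0 || n != 11 + 2 * (size_t)count) return -1;
    uint8_t sum = 0;
    for (size_t i = 1; i < n; i += 2) {
        int b = hex_byte(line + i);
        if (b < 0) return -1;
        sum += (uint8_t)b;
    }
    if (sum != 0) return -1;
    rec->count = (uint8_t)count;
    rec->address = (uint16_t)((hex_byte(line + 3) << 8) | hex_byte(line + 5));
    rec->type = (uint8_t)hex_byte(line + 7);
    for (int i = 0; i < count; i++) rec->data[i] = (uint8_t)hex_byte(line + 9 + 2 * i);
    return 0;
}
```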

I also thought this kind of tool could prove useful for hashing out the exact JSON format to use in a language where that’s easier to prototype than with a low level embedded-friendly C library, as well as giving me an example document I can put on a USB stick and try to parse from Ansible. Right now the format this program writes is a very straight-ahead dump of all the structs/arrays for the Ansible apps, basically as they’re named in the source code. A couple questions I’ve been thinking about for the format include:

  • Would this simple approach to building the JSON objects cause problems for backward-compatibility, or otherwise pose difficulties for supporting this serialization format on other modules or apps? For the broader I2C ecosystem and a possible future desire to create presets that operate in a multi-module context?
  • The structure I started with lets each app have its own section of the JSON document, within which I suppose a given app can do whatever it wants. There’s also a shared section for scales and a meta section for preset version, etc. Should apps be versioned independently, or is firmware version sufficient/more desirable?
  • Is storage space a concern for all the string literals that the parser would need to match, to the point that strings like alt_note or meta_lswap should be abbreviated? If so, how best to trade this off with human-readability?
  • This is a pretty verbose format. If you’ve got an array of uint8_ts, each byte gets its own number in the JSON representation (edit: realized you can reduce this ~2x by just storing byte arrays as hex strings, ~3x if you’re willing to decode base64). The file I converted from my Ansible Earthsea dump is about 1.15 MB, with the Earthsea part using a little more than half of that - a little less than 10x larger than the nvram_data_t struct in ROM. This might make parsing/emitting individual values from structs on the target easier, and less prone to fiddly issues with byte order/packing, as well as making the format more human readable (though I probably wouldn’t want to do a whole lot of hand editing of a JSON file this size). Storage size is probably not a huge issue given the size of USB disks, but perhaps you don’t want to try to parse/stringify a document that size all at once. So maybe we need to select an incremental parser? (<-- now it’s a question!)
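
The hex-string packing mentioned in the last point could look like the minimal C sketch below: two characters per byte instead of up to four for a decimal value plus comma (e.g. "255,"). Function names are hypothetical, and sizing the output buffer is left to the caller.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write `len` bytes as lowercase hex; `out` must hold 2 * len + 1 chars. */
void bytes_to_hex(const uint8_t *in, size_t len, char *out) {
    static const char digits[] = "0123456789abcdef";
    for (size_t i = 0; i < len; i++) {
        out[2 * i] = digits[in[i] >> 4];
        out[2 * i + 1] = digits[in[i] & 0x0F];
    }
    out[2 * len] = '\0';
}

static int hex_digit(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode a hex string into at most `cap` bytes. Returns the byte count,
   or -1 for odd length, a non-hex character, or insufficient space. */
long hex_to_bytes(const char *hex, uint8_t *out, size_t cap) {
    size_t n = strlen(hex);
    if (n % 2 != 0 || n / 2 > cap) return -1;
    for (size_t i = 0; i < n / 2; i++) {
        int hi = hex_digit(hex[2 * i]);
        int lo = hex_digit(hex[2 * i + 1]);
        if (hi < 0 || lo < 0) return -1;
        out[i] = (uint8_t)((hi << 4) | lo);
    }
    return (long)(n / 2);
}
```

The trade-off is readability: "007fff" is compact but much harder to hand-edit than [0,127,255].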

As far as selecting a JSON library goes, I found this benchmark comparing a number of C and C++ options in various ways. Tables comparing all these libraries, sortable by specific metrics, are here. There’s also Frozen, which is not included in this benchmark but says it specifically targets embedded systems. I haven’t yet done much to filter this list down on criteria like how actively maintained they are or whether they support features like incremental parsing.

Apologies for a bit of a brain dump, I realize this is all kinda open ended but would love to get some feedback from folks with some experience working on the Monome firmware.

nice work!

wish i could contribute more time, but my quick-reaction feedback would be

benchmarks

code size and memory use will absolutely be more critical than execution time. (obvs)

need to select an incremental parser?

oh yes. i would also try to use one with only static memory allocation. just found a couple by searching:
https://zserge.com/jsmn.html
http://www.catb.org/~esr/microjson/microjson.html
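
To illustrate what "incremental" and "static memory only" mean together, here is a hand-written C sketch (not the jsmn or microjson API). It reads a flat JSON array of integers in 0 to 255 from chunks of any size, e.g. blocks read one at a time from a USB file, keeping all state in a fixed-size struct. The names and the 64-value capacity are hypothetical, and it handles only this one shape of input.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_VALUES 64  /* fixed capacity: no heap allocation */

typedef struct {
    uint8_t values[MAX_VALUES];
    size_t count;
    int current;  /* number being accumulated, or -1 between numbers */
    int error;
} u8_array_parser_t;

void parser_init(u8_array_parser_t *p) {
    p->count = 0;
    p->current = -1;
    p->error = 0;
}

/* Store the pending number, if any. */
static void parser_flush(u8_array_parser_t *p) {
    if (p->current < 0) return;
    if (p->count >= MAX_VALUES) {
        p->error = 1;
        return;
    }
    p->values[p->count++] = (uint8_t)p->current;
    p->current = -1;
}

/* Feed the next chunk. State carries over between calls, so a number
   split across two chunks (e.g. "3" then "4,") is read as 34. */
void parser_feed(u8_array_parser_t *p, const char *chunk, size_t len) {
    for (size_t i = 0; i < len && !p->error; i++) {
        char c = chunk[i];
        if (c >= '0' && c <= '9') {
            p->current = (p->current < 0 ? 0 : p->current * 10) + (c - '0');
            if (p->current > 255) p->error = 1;
        } else if (c == ',' || c == ']' || c == ' ' || c == '\t' || c == '\r' || c == '\n') {
            parser_flush(p);  /* lenient: whitespace also ends a number */
        } else if (c != '[') {
            p->error = 1;     /* anything else is rejected */
        }
    }
}

/* Finish parsing; returns the number of values read, or -1 on error. */
long parser_finish(u8_array_parser_t *p) {
    if (!p->error) parser_flush(p);
    return p->error ? -1 : (long)p->count;
}
```

Memory use is fixed at compile time regardless of file size, which is the property that matters most on the module.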

Would this simple approach to building the JSON objects cause problems for backward-compatibility, or otherwise pose difficulties for supporting this serialization format on other modules or apps?

i feel like once you have things in a structured format like JSON, you can just deal with compatibility problems as they come up with specific version<->version patches. it’s always going to be an annoyance but shouldn’t ever become a dealbreaker.
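
One way such version<->version patches could be organised is a chain of migration functions, applied one after another until a preset reaches the current version. A minimal C sketch; `preset_t`, its fields, and both migrations are hypothetical examples, not real format changes.

```c
/* Hypothetical in-memory preset; fields are illustrative only. */
typedef struct {
    int version;
    int pattern_count;
    int scale_root;
} preset_t;

typedef void (*migration_t)(preset_t *p);

/* v1 -> v2: hypothetical change that doubled the number of patterns. */
static void migrate_1_to_2(preset_t *p) { p->pattern_count *= 2; }

/* v2 -> v3: hypothetical change from a 1-based to a 0-based scale root. */
static void migrate_2_to_3(preset_t *p) { p->scale_root -= 1; }

#define FIRST_VERSION 1
#define CURRENT_VERSION 3

/* migrations[i] upgrades a preset from version FIRST_VERSION + i. */
static const migration_t migrations[] = { migrate_1_to_2, migrate_2_to_3 };

/* Returns 0 on success, -1 if the version is unknown (too old, or written
   by newer firmware than this one). */
int migrate_preset(preset_t *p) {
    if (p->version < FIRST_VERSION || p->version > CURRENT_VERSION) return -1;
    while (p->version < CURRENT_VERSION) {
        migrations[p->version - FIRST_VERSION](p);
        p->version++;
    }
    return 0;
}
```

Each format change then only needs one small function, and old files are upgraded step by step rather than every loader having to understand every past version.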

I don’t think @sliderule is that active on here anymore. IIRC he had a change in circumstance (nothing bad I should add) that meant he had less time for firmware dev.

great idea for a conversion utility!

re: versioning - i think it makes sense for each app to maintain its own version, and the parser should be able to skip parts it does not understand (so, if one app was updated and somehow it cannot be made backwards compatible it shouldn’t prevent any other apps from being able to read their data). this leaves solving the issue to whoever is maintaining a specific app which is preferable with multiple devs sharing the codebase. more care will be needed for shared or meta data obviously.
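
A minimal C sketch of that skip-what-you-don't-understand rule, assuming each app owns a top-level section that carries its own version number. The table layout, `dispatch_section`, `load_stub` and the "kria"/"mp" keys are hypothetical; a real parser might hand each loader a token range rather than raw text.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-app loader: gets the section's version and raw JSON
   text, returns 0 if the data was applied. */
typedef int (*section_loader_t)(int version, const char *body, size_t len);

typedef struct {
    const char *name;   /* key of the app's section, e.g. "kria" */
    int max_version;    /* newest section version this firmware understands */
    section_loader_t load;
} app_section_t;

typedef enum {
    LOAD_OK,
    LOAD_SKIPPED_UNKNOWN_APP,    /* section for an app this firmware lacks */
    LOAD_SKIPPED_NEWER_VERSION,  /* written by newer firmware; leave it alone */
    LOAD_FAILED
} load_result_t;

/* Route one top-level section to its app. Unknown or too-new sections are
   skipped rather than treated as errors, so other apps still load. */
load_result_t dispatch_section(const app_section_t *apps, size_t n_apps,
                               const char *name, int version,
                               const char *body, size_t len) {
    for (size_t i = 0; i < n_apps; i++) {
        if (strcmp(apps[i].name, name) != 0) continue;
        if (version > apps[i].max_version) return LOAD_SKIPPED_NEWER_VERSION;
        return apps[i].load(version, body, len) == 0 ? LOAD_OK : LOAD_FAILED;
    }
    return LOAD_SKIPPED_UNKNOWN_APP;
}

/* Stand-in loader for the example table: accepts any non-empty body. */
static int load_stub(int version, const char *body, size_t len) {
    (void)version;
    (void)body;
    return len > 0 ? 0 : -1;
}

const app_section_t example_apps[] = {
    { "kria", 2, load_stub },
    { "mp", 1, load_stub },
};
```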

i don’t think storage is an issue really, so readability would probably be preferred. but i wonder if a parser should just provide functions for typical data structures and property names would be set by the app itself? basically what i’m suggesting is each app implementing its own serializing code instead of a universal parser for any data structure (this also covers the case for calculated data members that don’t need to be stored).
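
A minimal C sketch of that per-app approach: a few shared helpers write keys, integers and byte arrays into a fixed buffer, and each app decides what to store and under which names, leaving out runtime or derived fields. `json_writer_t`, the `jw_*` helpers, `demo_app_t` and `demo_app_serialize` are hypothetical; keys are assumed to need no escaping, and the buffer must be at least 1 byte.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical shared writer: fixed buffer, no allocation, records overflow. */
typedef struct {
    char *buf;
    size_t cap;
    size_t len;
    int overflow;
    int need_comma;
} json_writer_t;

static void jw_printf(json_writer_t *w, const char *fmt, ...) {
    if (w->overflow) return;
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(w->buf + w->len, w->cap - w->len, fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t)n >= w->cap - w->len) {
        w->overflow = 1;  /* output truncated */
        return;
    }
    w->len += (size_t)n;
}

void jw_begin(json_writer_t *w, char *buf, size_t cap) {
    w->buf = buf;
    w->cap = cap;
    w->len = 0;
    w->overflow = 0;
    w->need_comma = 0;
    jw_printf(w, "{");
}

/* Keys are assumed to be plain identifiers that need no escaping. */
static void jw_key(json_writer_t *w, const char *key) {
    jw_printf(w, "%s\"%s\":", w->need_comma ? "," : "", key);
    w->need_comma = 1;
}

void jw_int(json_writer_t *w, const char *key, int value) {
    jw_key(w, key);
    jw_printf(w, "%d", value);
}

void jw_u8_array(json_writer_t *w, const char *key, const uint8_t *a, size_t n) {
    jw_key(w, key);
    jw_printf(w, "[");
    for (size_t i = 0; i < n; i++) jw_printf(w, "%s%u", i ? "," : "", (unsigned)a[i]);
    jw_printf(w, "]");
}

/* Returns the JSON length, or -1 if it did not fit. */
int jw_end(json_writer_t *w) {
    jw_printf(w, "}");
    return w->overflow ? -1 : (int)w->len;
}

/* Hypothetical app state: playhead is runtime-only, so it is not saved. */
typedef struct {
    uint8_t tempo;
    uint8_t steps[4];
    uint8_t playhead;
} demo_app_t;

int demo_app_serialize(const demo_app_t *s, char *buf, size_t cap) {
    json_writer_t w;
    jw_begin(&w, buf, cap);
    jw_int(&w, "tempo", s->tempo);
    jw_u8_array(&w, "steps", s->steps, 4);
    return jw_end(&w);
}
```

With tempo 120 and steps {1, 2, 3, 4}, `demo_app_serialize` writes {"tempo":120,"steps":[1,2,3,4]}; the playhead never reaches the file.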

i would definitely pack arrays (as a comma-separated list of values, a blob, or something like that), but in some way that still keeps them human-editable if possible.

incremental parser definitely makes sense. i also wonder if YAML would actually be better than JSON…