@trickyflemming can you try this firmware please

teletype-i2c-fix-dd2070a-dirty.zip (104.0 KB)

I’m trying to simplify the II transmission code on the Teletype.

At the moment there are 3 different ways to send an message over i2c. tele_ii, tele_ii_tx and tele_ii_tx_now.

tele_ii_tx and tele_ii_tx_now are identical, I think tele_ii_tx used to use the event queue and the i2c buffer. I’m going to delete tele_ii_tx_now as well as the i2c buffer code.

tele_ii is the original II sending function designed for use with the trilogy modules. It posts an event to the event queue which is then ultimately sent using the same i2c_master_tx function. I think it’s redundant to use the event queue to send II messages like this as we’re now moving to a situation where all code that sends i2c is run from the event queue already.

My plan is to update the one instance where tele_ii is used to instead use tele_ii_tx.

@tehn is this the correct way to send messages to Trilogy modules?

void op_simple_i2c(const void *data, scene_state_t *ss, 
                   exec_state_t *es, command_state_t *cs) {
    int16_t message = (intptr_t)data;
    int16_t value = cs_pop(cs);

    uint8_t address = message & 0xF0;
    uint8_t message_type = message & 0xFF;

    uint8_t buffer[3] = { message_type, value >> 8, value & 0xFF };

    tele_ii_tx(address, buffer, 3);
}

where data will be a constant from ii.h, e.g.

#define WW_PRESET 		0x11
#define WW_POS			0x12
#define WW_SYNC			0x13

My reading of a constant such as WW_PRESET, is that the first 4 bits define the address, but the entire value is the message. Only for Trilogy modules though. Right?

I should be able to test it by using the simulator to print the messages sent.

3 Likes

yes, your deconstruction of the trilogy-only simple ii message is correct.

and yes, there’s a lot of debugging leftovers still in there-- the old test i2c message queue, the tx vs tx_now, etc. what a mess.

i’ll try to reproduce the bug now also with that test built you posted.

2 Likes

TT makes its own 5v, off the 12v rail

The main change was to move the tele_tick call and the metro script call into the event loop (the timers now only generate events), as you suggested earlier.

I’ve run it for a while with the following results (each of which has taken > 30 mins to appear).

  1. Keyboard stopped working, front button and knob carried on. Metro kept running

  2. Came back from the shops to find that the metro had stopped on it’s own, even though the teletype thought it was still running, restarted it.

  3. An actual crash. Must have taken 2 hours though.

There have been a few other glitches, UI ones, and on occasion a different Ansible gate would fire.

My metro was set to 10, the fastest allowed. The only script set was M with TR.P 5 in it. I manually set the metro time. Pamela’s Workout was outputting triggers at 1/64 at 104 BPM into input 1.

IMO, the device is a lot less likely to crash with the changes. But it does get unhappy and a bit unpredictable.

i’ve been running for an hour, 10ms metro with MP clock-out driving TR 1 (seems around 25ms, visually)

haven’t seen any issues yet.

i was considering a method to thin out the event queue-- one approach with TR inputs is to have a status register. set the bit of a TR that’s been triggered, and create a new event only if the status register was previously empty. the event processor then runs scripts based on the status register and clears it upon completion (this would need to be interrupt protected). rationalization being, if TRs are firing faster than are getting processed in the event loop there’s a natural limiting as to not overflow the event queue.

2 Likes

Seems a little hackish maybe? We’d have to place teletype specific logic into the general purpose events logic.

Alternative idea. Add time stamps to events when they are posted. (Need to be wary of timer overflows though.) Downstream event handlers can use this information as they see fit. The trigger event handler could check and only allow a script to run if it’s been 10ms since that script was last triggered.

It does’t reduce the number of events, but it does help process them quicker when they are coming in too fast.

Could also be used to implement a BPM calculation on each trigger input (useful for clock multipliers) in downstream code rather than having to do it in libavr32.

Another idea. Multiple event queues with different properties. E.g. a UI queue that’s not time critical but where it’s important that events aren’t dropped (so that we can guarantee the state of the system). And a ‘musical’ queue for things that want to be done quickly, but where we will liberally drop events when the load gets too high.

Or as @scanner_darkly suggests, let the user know that the event queue had to drop events and gently advise them that they should reboot (due to non-determine system state) and try to ease back the load.


Back to i2c, I’ve modified the i2c_* functions in libavr32 to return the status. In the Teletype firmware I’m printing that out to serial when it’s non-zero. I think that’s probably worth leaving in. It’s also possible we could try to feed that info back to users too. The UI code has been split up a lot so it’s hopefully not too hard to do. We could also add a i2c tx icon to the activity bar that lights up for a second or so whenever there is an i2c transmission.

4 Likes

Since the conversation has turned to timers and interrupts, I figured it was appropriate to share this observation that I made the other day around the accuracy of M timing on the Teletype. Perhaps this is something that can be addressed in the midst of everything else. :slight_smile:

When comparing the Teletype’s M timer to the the independent TR metronomes on the TXo, I noticed that they would very quickly fall out of sync with each other and phase. I didn’t expect perfect sync - but the rapid onset of phasing suggested that one or the other was running at a different rate. Curious, I hooked up both to my scope. I set up identical scenarios on each device for a single trigger output: 25ms trigger pulses 500ms apart.

Looking at the trigger outputs side by side, the Teletype seems to be running a hair fast.

Teletype is Blue; TXo is Red.

I haven’t poked around in the TT’s code to figure out why - at a glance everything looked cool. I filed it away on my list - but figured I’d share the observation now since @sam is looking at that area of the code.


Thanks to everyone for all of there efforts on the TT and i2c platform. I’m feeling pangs of guilt that I’m so wrapped up in soldering expanders that I haven’t had much time to devote to the investigation. Hopefully you will forgive me when I finally ship the things. :wink:

2 Likes

Two things have changed that will make a difference to the M code.

  1. The display refresh was taken out of the timer callback. (This will improve metro jitter.)

  2. The metro code is no longer executed in the timer callback. (This will worsen metro jitter.)

Might be worth testing it again with the firmware posted in this thread, or once 2.0 beta 4 comes out. In particular I’d interested to know if it’s consistently slow, or if it’s jittery. Can your scope measure that?

1 Like

ok-- finally got a “crash” after about 6 hours

not a hard crash, however.

  • lost USB keyboard connectivity. replug fixed.
  • METRO script just stopped responding. M.ACT 0 and then M.ACT 1 resumed it.

well, at least the errors are different now

2 Likes

Same bugs as me then. Very interesting. They might have nothing to do with i2c though…

If I get a chance tomorrow I’ll set the same experiment up without any i2c, just pulsing a built in output.

1 Like

Updated and testing now. Thanks for pushing out a test firmware so quickly! I’ll report back if I get a crash. 30 minutes has been my threshold for calling it stable. If I make it past that, do you want me to run it until I experience the same Metro and keyboard bugs that you and @tehn are experiencing?

2 Likes

ok, i’m running the same test without i2c now.

2 Likes

1 hour 15 minutes in: no crash yet. This is great progress :smile:
I’ll report back when the Metro or keyboard bugs manifest themselves.

2 Likes

2.5 hours in. No keyboard or metro issues yet.

3 Likes

ok i got a non-i2c metro “crash”

again fixed with a M.ACT cycle

2 Likes

Alright, 4 hours without an issue. I’m going to call it stable on my end and stop wasting electricity =)

4 Likes

Excellent. I’m back in the midst of the school holidays now that the long Easter weekend is over. I should get an hour or 2 later on today to: merge the i2c fixes into my 2.0 branch, post a new beta with those changes and open up a PR.

Once the PR is merged we can have a look for the keyboard and metro bugs. I wasn’t expecting them to be duplicated, but once they were, my gut reaction is that it’s some kind of integer overflow bug. Possibly in the timer code itself, possibly in some refactor I’ve made. Just had a quick look at the code and hold_key_count also looks like it has the potential to overflow too.


@trickyflemming if you get a chance, do you want to try increasing the complexity of the script? Maybe use the Jumpy Edges script?

1 Like

i doubt hold_key_count is the cause, even if it did overflow it would just continue incrementing it. my feeling is that this has something to do with timers. i thought perhaps the timer generating events for handler_keyTimer becomes corrupted but it also processes the front button presses and sounds like it still responds to that when keyboard and metro crash.

there is a separate timer however used to generate events for handler_HidTimer, maybe it becomes corrupted somehow, and the metro timer as well?

i wonder if ticksRemain overflows when it’s decremented, i don’t see how it’s possible but who knows, maybe there is some weird race condition… one way to prevent it would be to check if it’s greater than ticks after this line:

@sam could you give this a try? if this is indeed what happens it would result in the timer set to a really long delay which would explain the behaviour…

i really like the idea of separating critical events into their own queue.

1 Like

… timers becoming corrupted would also explain why M.ACT fixes the issue

1 Like