yes, your deconstruction of the trilogy-only simple ii message is correct.

and yes, there’s a lot of debugging leftovers still in there-- the old test i2c message queue, the tx vs tx_now, etc. what a mess.

i’ll also try to reproduce the bug now with that test build you posted.

2 Likes

TT makes its own 5V, off the 12V rail.

The main change was to move the tele_tick call and the metro script call into the event loop (the timers now only generate events), as you suggested earlier.

I’ve run it for a while with the following results (each of which has taken > 30 mins to appear).

  1. Keyboard stopped working; the front button and knob carried on. Metro kept running.

  2. Came back from the shops to find that the metro had stopped on its own, even though the Teletype thought it was still running; I restarted it.

  3. An actual crash. Must have taken 2 hours though.

There have been a few other glitches (UI ones), and on occasion a different Ansible gate would fire.

My metro was set to 10, the fastest allowed. The only script set was M with TR.P 5 in it. I manually set the metro time. Pamela’s Workout was outputting triggers at 1/64 at 104 BPM into input 1.

IMO, the device is a lot less likely to crash with the changes. But it does get unhappy and a bit unpredictable.

i’ve been running for an hour, 10ms metro with MP clock-out driving TR 1 (seems around 25ms, visually)

haven’t seen any issues yet.

i was considering a method to thin out the event queue-- one approach with TR inputs is to have a status register. set the bit of a TR that’s been triggered, and create a new event only if the status register was previously empty. the event processor then runs scripts based on the status register and clears it upon completion (this would need to be interrupt protected). the rationale being: if TRs are firing faster than they’re getting processed in the event loop, there’s a natural rate limit so the event queue doesn’t overflow.
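
a rough sketch of what i mean, in pseudo-firmware C-- the names here are placeholders, not the actual teletype/libavr32 code:

```c
#include <stdint.h>
#include <stdio.h>

#define NUM_TR_INPUTS 8

static volatile uint8_t trigger_status;  // one bit per TR input

static void irq_disable(void) { /* mask interrupts on real hardware */ }
static void irq_enable(void)  { /* unmask interrupts on real hardware */ }
static void post_trigger_event(void) { /* push one event onto the queue */ }

// called from the TR input interrupt: note the trigger, but only post an
// event if the register was previously empty. triggers arriving faster than
// the event loop can drain them just set more bits instead of flooding the
// queue.
void tr_interrupt(uint8_t input) {
    uint8_t was_empty = (trigger_status == 0);
    trigger_status |= (uint8_t)(1u << input);
    if (was_empty) post_trigger_event();
}

// called from the event loop: snapshot and clear the register with
// interrupts masked, then run the script for each input that fired.
void handle_trigger_event(void) {
    irq_disable();
    uint8_t fired = trigger_status;
    trigger_status = 0;
    irq_enable();

    for (uint8_t i = 0; i < NUM_TR_INPUTS; i++) {
        if (fired & (1u << i))
            printf("run script for TR %u\n", (unsigned)i);  // run_script(i) in the firmware
    }
}

int main(void) {
    // simulate a burst of triggers arriving before the event loop catches up
    tr_interrupt(0);
    tr_interrupt(3);
    tr_interrupt(3);
    handle_trigger_event();  // one queued event, runs scripts for TR 0 and TR 3
    return 0;
}
```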

2 Likes

Seems a little hackish maybe? We’d have to place Teletype-specific logic into the general-purpose events logic.

Alternative idea. Add timestamps to events when they are posted. (Need to be wary of timer overflows though.) Downstream event handlers can use this information as they see fit. The trigger event handler could check it and only allow a script to run if it’s been at least 10ms since that script was last triggered.

It doesn’t reduce the number of events, but it does help process them more quickly when they’re coming in too fast.

Could also be used to implement a BPM calculation on each trigger input (useful for clock multipliers) in downstream code rather than having to do it in libavr32.
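
A rough sketch of the idea, assuming events are stamped from a free-running tick counter (the names are illustrative, not the real event code). The usual unsigned-arithmetic trick takes care of the overflow worry, since the subtraction stays correct across a counter wrap:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint8_t type;
    int16_t data;
    uint32_t timestamp;  // tick count at the moment the event was posted
} event_t;

static uint32_t tick_count;   // hypothetical free-running 1ms counter
static uint32_t last_run[8];  // last run time per trigger input

// post side: stamp the event with the current tick count
static void stamp_event(event_t *e) {
    e->timestamp = tick_count;
}

// handler side: only allow the script to run if at least min_interval ticks
// have passed since the last run for this input. unsigned subtraction gives
// the correct elapsed count even after tick_count wraps around.
static bool should_run_script(const event_t *e, uint8_t input,
                              uint32_t min_interval) {
    uint32_t elapsed = e->timestamp - last_run[input];
    if (elapsed < min_interval)
        return false;
    last_run[input] = e->timestamp;
    return true;
}

int main(void) {
    event_t e = { .type = 1, .data = 0, .timestamp = 0 };
    tick_count = 5;
    stamp_event(&e);
    printf("%d\n", should_run_script(&e, 1, 10));  // 0: too soon (< 10 ticks)
    tick_count = 20;
    stamp_event(&e);
    printf("%d\n", should_run_script(&e, 1, 10));  // 1: enough time has passed
    return 0;
}
```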

Another idea. Multiple event queues with different properties. E.g. a UI queue that’s not time-critical, but where it’s important that events aren’t dropped (so that we can guarantee the state of the system), and a ‘musical’ queue for things that need to be done quickly, but where we liberally drop events when the load gets too high.
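
Something like this, perhaps (just a sketch of the posting side; the real code would also need interrupt protection around the queues):

```c
#include <stdbool.h>
#include <stdint.h>

#define QUEUE_SIZE 32

typedef struct {
    uint8_t type;
    int16_t data;
} event_t;

typedef struct {
    event_t buf[QUEUE_SIZE];
    uint8_t head;   // next slot to write
    uint8_t tail;   // next slot to read
    uint8_t count;
    bool drop_oldest_when_full;  // false for the UI queue, true for 'musical'
} queue_t;

static queue_t ui_queue    = { .drop_oldest_when_full = false };
static queue_t music_queue = { .drop_oldest_when_full = true };

// returns false only if the queue is full and not allowed to drop events
static bool post_event(queue_t *q, event_t e) {
    if (q->count == QUEUE_SIZE) {
        if (!q->drop_oldest_when_full)
            return false;                                  // UI queue: never lose events
        q->tail = (uint8_t)((q->tail + 1) % QUEUE_SIZE);   // musical queue: drop oldest
        q->count--;
    }
    q->buf[q->head] = e;
    q->head = (uint8_t)((q->head + 1) % QUEUE_SIZE);
    q->count++;
    return true;
}

int main(void) {
    event_t e = { .type = 1, .data = 42 };
    post_event(&music_queue, e);  // may silently drop the oldest under load
    post_event(&ui_queue, e);     // never drops; the caller must handle failure
    return 0;
}
```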

Or, as @scanner_darkly suggests, let the user know that the event queue had to drop events and gently advise them that they should reboot (due to the non-deterministic system state) and try to ease back the load.


Back to i2c: I’ve modified the i2c_* functions in libavr32 to return the status. In the Teletype firmware I’m printing that out to serial when it’s non-zero. I think that’s probably worth leaving in. It’s also possible we could feed that info back to users. The UI code has been split up a lot, so it’s hopefully not too hard to do. We could also add an i2c tx icon to the activity bar that lights up for a second or so whenever there is an i2c transmission.
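
Roughly this shape, for illustration only - the actual libavr32 function names and status codes may well differ, and printf here stands in for the serial debug output:

```c
#include <stdint.h>
#include <stdio.h>

// stand-in for the underlying TWI/i2c driver call; returns 0 on success,
// non-zero on failure (NACK, arbitration lost, timeout, ...)
static uint8_t i2c_driver_tx(uint8_t addr, const uint8_t *data, uint8_t len) {
    (void)addr; (void)data; (void)len;
    return 0;
}

// the i2c_* helper now propagates the driver status instead of returning void
uint8_t i2c_master_tx(uint8_t addr, const uint8_t *data, uint8_t len) {
    return i2c_driver_tx(addr, data, len);
}

// caller in the Teletype firmware: log any non-zero status to serial.
// this is also where an activity-bar error icon could be flagged for the user.
void send_ii(uint8_t addr, const uint8_t *data, uint8_t len) {
    uint8_t status = i2c_master_tx(addr, data, len);
    if (status != 0)
        printf("i2c tx failed, status %u\n", (unsigned)status);
}

int main(void) {
    uint8_t msg[] = { 0x00, 0x01 };
    send_ii(0x60, msg, sizeof msg);
    return 0;
}
```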

4 Likes

Since the conversation has turned to timers and interrupts, I figured it was appropriate to share an observation I made the other day about the accuracy of M timing on the Teletype. Perhaps this is something that can be addressed in the midst of everything else. :slight_smile:

When comparing the Teletype’s M timer to the independent TR metronomes on the TXo, I noticed that they would very quickly fall out of sync with each other and phase. I didn’t expect perfect sync - but the rapid onset of phasing suggested that one or the other was running at a different rate. Curious, I hooked up both to my scope. I set up identical scenarios on each device for a single trigger output: 25ms trigger pulses 500ms apart.

Looking at the trigger outputs side by side, the Teletype seems to be running a hair fast.

Teletype is Blue; TXo is Red.

I haven’t poked around in the TT’s code to figure out why - at a glance everything looked cool. I filed it away on my list - but figured I’d share the observation now since @sam is looking at that area of the code.


Thanks to everyone for all of their efforts on the TT and i2c platform. I’m feeling pangs of guilt that I’m so wrapped up in soldering expanders that I haven’t had much time to devote to the investigation. Hopefully you will forgive me when I finally ship the things. :wink:

2 Likes

Two things have changed that will make a difference to the M code.

  1. The display refresh was taken out of the timer callback. (This will improve metro jitter.)

  2. The metro code is no longer executed in the timer callback. (This will worsen metro jitter.)

Might be worth testing it again with the firmware posted in this thread, or once 2.0 beta 4 comes out. In particular I’d be interested to know if it’s consistently slow, or if it’s jittery. Can your scope measure that?

1 Like

ok-- finally got a “crash” after about 6 hours

not a hard crash, however.

  • lost USB keyboard connectivity. replug fixed.
  • METRO script just stopped responding. M.ACT 0 and then M.ACT 1 resumed it.

well, at least the errors are different now

2 Likes

Same bugs as me then. Very interesting. They might have nothing to do with i2c though…

If I get a chance tomorrow I’ll set the same experiment up without any i2c, just pulsing a built in output.

1 Like

Updated and testing now. Thanks for pushing out a test firmware so quickly! I’ll report back if I get a crash. 30 minutes has been my threshold for calling it stable. If I make it past that, do you want me to run it until I experience the same Metro and keyboard bugs that you and @tehn are experiencing?

2 Likes

ok, i’m running the same test without i2c now.

2 Likes

1 hour 15 minutes in: no crash yet. This is great progress :smile:
I’ll report back when the Metro or keyboard bugs manifest themselves.

2 Likes

2.5 hours in. No keyboard or metro issues yet.

3 Likes

ok i got a non-i2c metro “crash”

again fixed with a M.ACT cycle

2 Likes

Alright, 4 hours without an issue. I’m going to call it stable on my end and stop wasting electricity =)

4 Likes

Excellent. I’m back in the midst of the school holidays now that the long Easter weekend is over. I should get an hour or 2 later on today to: merge the i2c fixes into my 2.0 branch, post a new beta with those changes and open up a PR.

Once the PR is merged we can have a look for the keyboard and metro bugs. I wasn’t expecting them to be duplicated, but once they were, my gut reaction was that it’s some kind of integer overflow bug. Possibly in the timer code itself, possibly in some refactor I’ve made. Just had a quick look at the code, and hold_key_count looks like it has the potential to overflow too.


@trickyflemming if you get a chance, do you want to try increasing the complexity of the script? Maybe use the Jumpy Edges script?

1 Like

i doubt hold_key_count is the cause; even if it did overflow it would just wrap and continue incrementing. my feeling is that this has something to do with timers. i thought perhaps the timer generating events for handler_keyTimer becomes corrupted, but it also processes the front button presses, and it sounds like that still responds when the keyboard and metro crash.

there is, however, a separate timer used to generate events for handler_HidTimer-- maybe that one becomes corrupted somehow, and the metro timer as well?

i wonder if ticksRemain overflows when it’s decremented. i don’t see how that’s possible, but who knows, maybe there is some weird race condition… one way to prevent it would be to check if it’s greater than ticks after this line:

@sam could you give this a try? if this is indeed what happens, it would result in the timer being set to a really long delay, which would explain the behaviour…

i really like the idea of separating critical events into their own queue.

1 Like

… timers becoming corrupted would also explain why M.ACT fixes the issue

1 Like

Yeah, thinking about it, if we’ve got problems with both the metro script and the keyboard, then the timers are the common factor: both use them.

I’ll try and have a look this week. I think I’m going to want to clean the code up a little before I dive too deep (delete the comments, try and improve the readability a little).

One other thing, maybe change if(t->ticksRemain == 0) { to if(t->ticksRemain <= 0) {, just in case it does end up below 0. (edit: actually should probably check if ticksRemain is signed or not first!)

1 Like

unsigned, so something like if (t->ticksRemain == 0 || t->ticksRemain > t->ticks)

still feels like the probability of this happening is very low, especially for just these two timers while all the others seem fine, but it wouldn’t hurt to protect the timer code a bit. another option to confirm it’s timer-related would be to restart both of these timers when the front button is pressed, and once it gets into this state, try it and see if it helps.
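
something like this, roughly (just a sketch, not the actual libavr32 timer code-- the struct and function names are illustrative):

```c
#include <stdint.h>
#include <stdio.h>

// illustrative soft timer, not the real libavr32 struct
typedef struct {
    uint32_t ticks;          // period in ticks
    uint32_t ticksRemain;    // ticks left until the callback fires
    void (*callback)(void *caller);
    void *caller;
} softTimer_t;

// called once per tick for each active timer. ticksRemain is unsigned, so if
// some race ever pushes it past zero it wraps to a huge value; the extra
// "> ticks" check catches that and fires/reloads instead of waiting for it
// to count all the way back down.
static void process_timer(softTimer_t *t) {
    t->ticksRemain--;
    if (t->ticksRemain == 0 || t->ticksRemain > t->ticks) {
        t->ticksRemain = t->ticks;   // reload the period
        t->callback(t->caller);      // post the timer event
    }
}

static void metro_callback(void *caller) {
    (void)caller;
    printf("metro event\n");
}

int main(void) {
    softTimer_t metro = { .ticks = 10, .ticksRemain = 10,
                          .callback = metro_callback, .caller = 0 };
    for (int i = 0; i < 25; i++)
        process_timer(&metro);       // fires on ticks 10 and 20
    return 0;
}
```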