Teletype Firmware i2c Debugging


#201

Not my favorite question :confused: Yes, I’m trying to use it for music and not finding pathologic failure states (and that’s ignoring the deeper implications of what a “musical context” means in an open-source sequencing environment plus whether these aesthetic and computational boundaries should be defined in the documentation).

The patch that I started with is Teletype Studies #1 with an added Metro script for the TXo. The Metro speed is fast, but slower than its limit and slower than what @bpcmusic demonstrated in his tests.

The patch that we’ve whittled it down to is extremely minimal. Literally one line on a Metro script and nothing else (see @sam’s post above: Teletype Firmware i2c Debugging). It fails when a trigger is received on a blank script slot (really, any script slot, but we at least know it’s not because of an over-complicated script).


#202

I just want to underline that my message was not to suggest any sinister dealings or hidden flaws whatsoever. I had a bumpy road, who knows why. It could have been a hundred things.
I was merely conveying my excitement and encouragement for those who are working hard to resolve these problems. For the benefit of all of us.

In terms of “not being alone” when problems emerge: definitely.
There is always something strange happening in my case, which contains many modules from many manufacturers. Sometimes I shake my head in disbelief that it all seems to more or less work together, somehow!


#203

And to that point: you are absolutely not alone.
I see that exact behavior in my system as well.


#204

ha, yes i see the ambiguity in the question, apologies. my main interest, which was answered, was about the threshold of failure.

we’re dealing with what should be considered a pretty small processor (66mhz). it’s difficult to define the boundaries given the various ways you could possibly overrun the system. i’ve already made a sort of definition by not allowing the metro clock to go below 10ms. but there are certainly ways to bog the system. there are ways to bog any digital processing system. but unfortunately this takes time to learn-- ie, super long loops, fast metro, all 8 inputs firing scripts with extreme speed, script recursion. it’s much different than the common understanding of “too many plugins in my daw” but i think you’re correct-- we should put together some suggestions for ways to bog your TT (and hence, avoid them)

that said, the issue reported is not related to processing power, luckily. it seems like an interrupt priority issue.

@sam @bpcmusic @scanner_darkly

what do you think about scripts getting executed within the event loop? so metro and trigger inputs would simply queue a script event? it would quantize script execution to a 1ms clock (which i don’t see as a problem), but would certainly prevent any collisions (ie, a trigger firing a script while a metro script is mid-execution, which i think might be possible at this point-- not having looked at the code).
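
A rough sketch of that proposal, with made-up names throughout (none of this is taken from the firmware): the interrupt handlers only post a script number to a small queue, and the main loop is the only place a script actually runs. The depth counter is there purely to show that a trigger arriving mid-script gets queued rather than run on top of the script in progress.

```c
#include <stdbool.h>
#include <stdint.h>

#define QUEUE_LEN 8    /* arbitrary size for this sketch */
#define METRO_SCRIPT 8 /* hypothetical slot number for the metro script */

static volatile uint8_t queue[QUEUE_LEN];
static volatile uint8_t q_head, q_tail; /* head: next read, tail: next write */

static int running_depth; /* scripts executing right now */
static int max_depth;     /* deepest nesting seen; 1 means no overlap */
static int scripts_run;

/* Interrupt context: only record which script should run.
   On real hardware the index updates would need interrupt masking. */
bool post_script_event(uint8_t script) {
    uint8_t next = (uint8_t)((q_tail + 1) % QUEUE_LEN);
    if (next == q_head) return false; /* queue full: event is dropped */
    queue[q_tail] = script;
    q_tail = next;
    return true;
}

void metro_timer_isr(void) { post_script_event(METRO_SCRIPT); }
void trigger_isr(uint8_t input) { post_script_event(input); }

void run_script(uint8_t script) {
    running_depth++;
    if (running_depth > max_depth) max_depth = running_depth;
    /* Simulate a trigger arriving mid-script: it is queued, not run. */
    if (script == METRO_SCRIPT) trigger_isr(0);
    scripts_run++;
    running_depth--;
}

/* Main loop: drain the queue, one script at a time. */
void process_events(void) {
    while (q_head != q_tail) {
        uint8_t script = queue[q_head];
        q_head = (uint8_t)((q_head + 1) % QUEUE_LEN);
        run_script(script);
    }
}
```

Calling `metro_timer_isr()` followed by `process_events()` runs two scripts back to back, and `max_depth` never goes above 1.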


#205

It took me a bit to, hopefully, understand this right and I am very sorry if my above post came across as suspecting sinister dealings or murky affairs. I hope you can excuse that I am not a native English speaker and have to look words up quite a bit when I post something - as it seems, not always with success. :hushed:
So maybe the term “in disguise” has a connotation I did not mean to imply.
What I meant was simply that the issues had been discussed on channels other than the open forum, while I had the feeling that it was not easy to be taken seriously when reporting them, getting answers like ‘it is already fixed’ or being asked whether I even use it in a musical context.


#206

Seems like a good idea to me. Everything I’ve ever read about embedded computing says to always keep your interrupts as short as possible. We want the consequence of overloading your teletype to be that the timing goes out, not that it locks up or misbehaves, so keeping events sequential and not allowing them to run in parallel will make things a lot easier.

Trigger inputs already post to the event queue (set up in init_teletype.c).

How about I post a test version so that we can see if it fixes the i2c issues? The next beta of 2.0 should be out on Easter Monday, I could post a parallel version with the stuff in timers changed to post events to the event queue. If that fixes the crashes then we can think more deeply about the best way to code it (who knows about ‘priority queues’ then?)

I’m not sure that would be the case?? The event queue is processed continuously.


#207

likewise, i apologize for the phrase “musical context” which makes no sense-- i should’ve just asked if it was working in a non-stress-testing situation. i appreciate all of your feedback and reports!


#208

you’re correct. trigger scripts are already in the event system. and it should be trivial to move the metro scripts there. i’m now confused why/how the TR input interrupts could possibly be breaking the metro, however.

but it will be good to get metro scripts out of an interrupt…

ah, yes, you’re right. i was confusing issues.


#209

@Leverkusen
:slightly_smiling_face: this is an issue of digital communication, even among native speakers of English

https://youtu.be/naleynXS7yo


#210

(speculating here, i haven’t looked at that part of code closely, will try to do so in the next day or two) i agree that having trigger interrupts simply generate events that are then processed in the same queue as all the other events would make the system act in a more predictable manner and help eliminate some bugs possibly caused by race conditions (which are usually hard to detect/debug).

this would also ensure that all the scripts are atomic. i mentioned this in the other thread - when we get into this extreme territory we’ll get to the point where it either misses some triggers, or it doesn’t have the time to complete a script. which could be an option, where you choose which you would prefer, but it’s probably much simpler to just treat each script as atomic. we’d need to consider what to do if the queue fills up - should it miss some events until it can accept more?

@tehn - were you thinking of timer interrupts? actually, this makes me wonder… iirc timer interrupts don’t use events, this probably also applies to metro script? might make sense to also change metro to use events as well.

i would also like to look at the size of i2c read/write buffers and see if there is something there, but probably a good idea to approach things one at a time (really wish i wasn’t away from my studio so i could try and reproduce and try a couple of things…)


#211

I like this idea. I think the most frustrating thing about the whole crash situation is that the unit totally locks up.

Since the module typically sits at the center of a patch, this is a Bad Thing (especially if you’re a live performer!). The failure state of such a module should be exploitable, not catastrophic. Double triggers, missed scripts, timer slowdowns, etc. are all “fun” side effects that could lead to serendipity.

The Shnth is a good example of this. The sampling rate dynamically slows down when the processor is getting smacked. I saw @wednesdayayay mention this in another thread, with the side effect being described as “cough syrup”. An alternative solution could have been dropped samples, but that almost never sounds good. The sampling rate solution can sound great, thus making it a better choice for failure states.

Throwing away events is my vote, since I imagine the other solution would be to empty the existing queue (or maybe some priority-based ranking system). In my mind, that solution would behave somewhat like the existing Stack (S) command that fills up with up to 8 commands and ignores the rest. Basically, if the failure state behaves like one of the existing commands, that’s an ideal solution.


#212

A full message queue is a common problem in any concurrent system. There are basically 3 solutions:

  • drop something from your queue to make room for the new item
  • accept but ignore the message
  • have the sender retry later

The third option can work if the sender can maintain its own (often small, or single) message buffer while it waits for the receiver to make room.

However in my microcontroller software I almost always just ignore events if a queue is full. Most of the time it just means your external event rate has some kind of natural limit imposed on it by the event queue.
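
For illustration, here are the three options side by side on a deliberately tiny queue. The type and function names are invented for this sketch and are not from any Teletype code.

```c
#include <stdbool.h>
#include <stddef.h>

#define CAP 4 /* deliberately tiny so the queue fills quickly */

typedef struct {
    int items[CAP];
    size_t head;  /* index of the oldest item */
    size_t count; /* number of items stored */
} queue_t;

/* Option 1: drop the oldest item to make room for the new one. */
void post_drop_oldest(queue_t *q, int item) {
    if (q->count == CAP) {
        q->head = (q->head + 1) % CAP;
        q->count--;
    }
    q->items[(q->head + q->count) % CAP] = item;
    q->count++;
}

/* Option 2: accept the call but silently ignore the item when full. */
void post_ignore_new(queue_t *q, int item) {
    if (q->count == CAP) return;
    q->items[(q->head + q->count) % CAP] = item;
    q->count++;
}

/* Option 3: report failure so the sender can hold the item and retry. */
bool post_or_refuse(queue_t *q, int item) {
    if (q->count == CAP) return false;
    q->items[(q->head + q->count) % CAP] = item;
    q->count++;
    return true;
}

/* Remove the oldest item; returns false if the queue is empty. */
bool take(queue_t *q, int *out) {
    if (q->count == 0) return false;
    *out = q->items[q->head];
    q->head = (q->head + 1) % CAP;
    q->count--;
    return true;
}
```

Posting 1 to 5 into a four-slot queue leaves 2, 3, 4, 5 with the first option and 1, 2, 3, 4 with the second. With the third, the fifth post returns false and the sender decides what to do with it.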

You can also use a mix of strategies by having priorities of events from different senders. For example on the Teletype you probably won’t want to ignore keyboard events, since it means your UI would become unresponsive if the processor is loaded.
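
One simple way to get that kind of priority, sketched here with invented names and sizes, is to reserve a few queue slots that only keyboard events may use. The sketch tracks slot counts only, not the stored events themselves.

```c
#include <stdbool.h>

/* Hypothetical senders; the names are for illustration only. */
typedef enum { SENDER_TRIGGER, SENDER_METRO, SENDER_KEYBOARD } sender_t;

#define QUEUE_CAP 8       /* assumed queue size */
#define RESERVED_FOR_UI 2 /* assumed number of slots kept for the keyboard */

static int queued; /* number of events currently waiting */

/* Returns true if the event is accepted. Trigger and metro events are
   refused once only the reserved slots remain; keyboard events may use
   every slot. */
bool post_with_priority(sender_t sender) {
    int limit = (sender == SENDER_KEYBOARD) ? QUEUE_CAP
                                            : QUEUE_CAP - RESERVED_FOR_UI;
    if (queued >= limit) return false;
    queued++;
    return true;
}

/* The main loop calls this after it has handled one event. */
void event_handled(void) {
    if (queued > 0) queued--;
}
```

Under a flood of triggers the queue stops accepting them at six entries, and the last two slots stay free for key presses.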

Wish I had time (and a Teletype) to help sort this out in earnest, it’s right up my alley…


#213

A little knowledge to illuminate the discussion…

The event code is here:

It’s a ring buffer, with maximum size of 40. Any extra events are discarded. If you’ve ever had the FTDI cable connected up and managed to infinite loop or such, you’ll have seen the screen flooded with ‘event queue full’ messages (I have!).

The next 2.0 beta is nearly ready, I just need to document the new keyboard shortcuts. Once that’s done I’ll grab the screwdriver and move some modules over to near my computer and see if I can recreate the bug. My wife and daughter are off for a walk tomorrow, so I should get most of the day to investigate.


#214

also:

looks like the original intent was to use an event for tele_tick (which drives delayed commands and tr pulses).

looking where the metro script gets called…


#215

here it is, indeed runs on a timer, not using events but a timer callback:

as a side note, i think there is still a benefit to having timer callbacks do some (very light) work instead of simply placing events into the queue, for those cases where more accurate timing is important. but we could make sure that anything touching i2c is only done via events.


#216

it could be useful to expose it on the live screen as well, have an indicator showing that the queue is close to full or full.
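
A sketch of how such an indicator might be derived, assuming the queue is a ring buffer with a read index and a write index that keeps one slot empty to tell full from empty. The size of 40 is the figure quoted earlier in the thread; the "close to full" threshold and all names are assumptions.

```c
#include <stdint.h>

#define MAX_EVENTS 40            /* queue size quoted earlier in the thread */
#define NEARLY_FULL_THRESHOLD 30 /* assumption: warn at 75% occupancy */

typedef enum { QUEUE_OK, QUEUE_NEARLY_FULL, QUEUE_FULL } queue_status_t;

/* Number of events waiting, given the read and write indices. */
uint8_t queue_fill(uint8_t read_idx, uint8_t write_idx) {
    return (uint8_t)((write_idx + MAX_EVENTS - read_idx) % MAX_EVENTS);
}

/* Classify the fill level for display. With one slot kept empty, the
   queue is full when MAX_EVENTS - 1 events are waiting. */
queue_status_t queue_status(uint8_t read_idx, uint8_t write_idx) {
    uint8_t fill = queue_fill(read_idx, write_idx);
    if (fill >= MAX_EVENTS - 1) return QUEUE_FULL;
    if (fill >= NEARLY_FULL_THRESHOLD) return QUEUE_NEARLY_FULL;
    return QUEUE_OK;
}
```

The live screen could poll `queue_status()` on each redraw and show a marker for anything other than `QUEUE_OK`.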


#217

I think it is a good idea to have different layers of reliability, especially for timing-relevant things such as triggering or i2c sync commands. Having a CV update a bit late or even skip a change seems easier to cope with than everything crashing, losing synchronization or phase shifting a sequence by one (quarter) step.
On the other hand nearly everything on teletype is timing based and different musical approaches have different priorities.

Maybe it is possible to create safe spaces? Such as the Metro script being prioritized, or a priority order within scripts 1 - 8, so you could put the less relevant events in the upper script ranges, which might get skipped over at high rates, while the lower ones always get executed.


#218

here’s the relevant moment:

instead of run_script() we just generate an event

everything else basically needs to stay in place


#219

I think a priority queue might be worth investigating too. Also, many ‘timing critical’ calculations can be deferred if events are timestamped when they are created.
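
To illustrate the timestamp idea (a sketch with hypothetical names, assuming a free-running millisecond tick counter): the interrupt stores the tick count in the event, and the handler later works out how long the event waited and compensates.

```c
#include <stdint.h>

typedef struct {
    uint8_t  script;     /* which script to run */
    uint32_t created_ms; /* tick count when the interrupt fired */
} timed_event_t;

/* Called from the interrupt with the current tick count. */
timed_event_t make_event(uint8_t script, uint32_t now_ms) {
    timed_event_t e = { script, now_ms };
    return e;
}

/* How long the event waited in the queue. Unsigned subtraction gives the
   right answer even if the tick counter wrapped around in between. */
uint32_t event_latency_ms(const timed_event_t *e, uint32_t now_ms) {
    return now_ms - e->created_ms;
}

/* Example use: a pulse meant to last pulse_ms from the moment of the
   trigger is shortened by however long the event sat in the queue. */
uint32_t remaining_pulse_ms(const timed_event_t *e, uint32_t now_ms,
                            uint32_t pulse_ms) {
    uint32_t late = event_latency_ms(e, now_ms);
    return late >= pulse_ms ? 0 : pulse_ms - late;
}
```

An event created at tick 1000 and handled at tick 1003 reports a latency of 3 ms, so a 100 ms pulse would be scheduled for the remaining 97 ms.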

Maybe…, event_post returns its success state. Personally I’d say it’s not something recoverable though, more something to tell the user about so that they can report it as a bug.


#220

That was my plan, I was just going to create a kEventAppCustom event handler with a sub type enum to indicate metro and tick.
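
Roughly what that could look like, as a self-contained sketch. Only the name kEventAppCustom comes from this thread; the struct layout, the sub type names and the handler are stand-ins, not the firmware's actual definitions.

```c
#include <stdint.h>

/* Stand-ins for the firmware's event types (hypothetical layout). */
typedef enum { kEventAppCustom } event_type_t;
typedef enum { CUSTOM_METRO, CUSTOM_TICK } custom_subtype_t;

typedef struct {
    event_type_t type;
    int32_t data; /* carries the sub type for custom events */
} event_t;

static int metro_runs; /* counters stand in for the real work */
static int tick_runs;

/* What a timer callback would build and post instead of running a
   script directly. */
event_t make_custom_event(custom_subtype_t sub) {
    event_t e = { kEventAppCustom, (int32_t)sub };
    return e;
}

/* Handler invoked from the main loop for kEventAppCustom events. */
void handle_custom_event(const event_t *e) {
    switch ((custom_subtype_t)e->data) {
    case CUSTOM_METRO: metro_runs++; break; /* run the metro script here */
    case CUSTOM_TICK:  tick_runs++;  break; /* advance delays and TR pulses */
    }
}
```

The timer callbacks stay short because they only build and post the event; the sub type tells the single handler which piece of work to do.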


FYI it’s probably best to use this tree to examine code, there are very big differences in some places.

https://github.com/samdoshi/teletype/tree/2.0