Yes, of course - I am sorry if I was being stupid.
Just finished my Mouser order.
It was a bit tedious and I hope everything fits on the board when it arrives…
:slight_smile:

Another test (sorry!). You’ve got an Ansible connected up, right?

Can you modify a failing script to use the Ansible outputs instead of the TXo outputs and see what happens?
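
Something like this is the kind of swap I mean (purely a sketch, and assuming the Ansible is in Teletype mode so its trigger outputs show up as TR 5-8):

#M
L 5 8 : TR.TOG I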


@Leverkusen - bus boards passed with flying colors for me. I was able to turn off the inelegant hardware pull-up in one of my expanders and run a fairly large configuration solidly. I’m not sure what is going on with @trickyflemming right now - but he does have quite a lot of variables at play. I don’t think it is his bus - or at least not in the way that I’ve seen the bus fail. Those issues were very replicable and were all cured by the bus board for me.

@trickyflemming - how are you patching and triggering that script? Also, have you tried running it at a slower M? Perhaps you are overloading the TT slightly and it is going into an overrun condition over time. (@sam - I like your suggested test too!)

@sam: I’ll run the Ansible test later today.
@bpcmusic: I’ve posted the I and M scripts. For 1-4, they’re copied from Jumpy Edges. For the patch itself, I’m taking the first four TR outputs from the TO and plugging them back into the Teletype. Essentially, I & M set up a clock division patch on the TO, M & PARAM work together to control the speed of that patch, and then that patch is used to control Jumpy Edges.

My plan with learning the expanders was to go through the Teletype Studies a second time, but this time with an eye on how to use the expanders to build upon each concept.

@sam: How should I use the Ansible in your proposed test? I’m using the TO’s metronome capabilities to set up a clock, and the metronome speed is updated in the M script (at a 75 ms refresh). Should I just use, say, the clock outputs from Levels and ignore M? Should I use Kria and update KR.PERIOD from the TT’s PARAM knob?
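
For concreteness, the Kria option I have in mind would be something like this (just a sketch; I’m assuming KR.PERIOD takes a clock period in milliseconds and that scaling PARAM down like this gives a usable range):

#M
KR.PERIOD ADD 20 RSH PARAM 4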

Can you post a txt file of the script you are using, please?

tt00s.txt (762 Bytes)

Had to re-update to 2.0b to save >_<

Just connect the 4 gate outs from the TO expander to the TT script inputs 1-4. The order doesn’t matter. Any level of PARAM works, whether super fast or slow. It will crash within around 10 minutes; the quickest crash was about 20 seconds, I think.

I have free time tonight and tomorrow. Anything else that I should test?

Sorry for not getting back to you sooner; the school holidays are just starting here in the UK, and the last week of school before the break is always busier than you anticipate.

Anyway, first a little background… what we’d like to do is create the simplest script that fails, but we also want to be open-minded. It’s possible (probable, even) that the bug has nothing to do with i2c; my vote is either a race condition or memory corruption in either the Teletype or the TXo firmware.

So, what I’d like to do is start removing bits of script / functionality until we stop getting lock-ups.

This is the order of operations I’d like you to follow if possible (all based on the CRASHTEST script you posted).

  1. Make sure that CRASHTEST as is still fails.

  2. Remove all the cables from the TXo to the Teletype. Does the Teletype still lock up?

  3. Remove the contents of scripts 1-4 and test:

#M
X SUB 320 RSH PARAM 6
L 1 8 : TO.TR.M I MUL I X

#I
L 1 8 : TO.TR.M.ACT I 1
M 50
M.ACT 1

  4. Test with just 1 TO.TR.M.ACT:
#M
X SUB 320 RSH PARAM 6
TO.TR.M 1 MUL 1 X

#I
TO.TR.M.ACT 1 1
M 50
M.ACT 1

  5. Remove PARAM (one possible fixed value is sketched after this list):
#M
TO.TR.M 1 <what's a sensible value here?>

#I
TO.TR.M.ACT 1 1
M 50
M.ACT 1
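
(For that metro value, any fixed number should do; as a sketch, assuming TO.TR.M takes a period in milliseconds, something like:)

#M
TO.TR.M 1 500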

Can you do those and report back please?

Thank you for the detailed response!

Okay, I’ve tested the following so far. For each test, I’ve marked it as “stable” if it ran for 30 minutes without a lock-up.

CRASHTEST as-is: Crash. (Test 1)
Remove all cables from the Teletype: Stable (Test 2)
Remove all cables and remove scripts 1-4: Stable (Test 3)
Keep I, M, 1-4. Trigger scripts 1-4 from Varigate: Crash (Just to test a different external trigger source)
Keep I, M, 1-4. Trigger scripts 1 & 3 from Varigate: Crash (Since 1&2 and 3&4 have command overlap, I wanted to see what happened if only 1&3 were triggered)
Eliminate I & M, Keep 1-4. Trigger scripts from Varigate: Stable.

One test I’m running right now is keeping Scripts 1-4 and adding an M script that gets called every 50 ms. This script simply calls “TR.TOG 4”. So far, it has been stable for 15 minutes. I’ll update this thread later after running a few more tests. After this, I’m going to call TR.TOG on the Ansible, followed by TO.TR.TOG. I want to see if updating those from the M is unstable. Finally, I’ll run your proposed tests 4 and 5. Since Test 3 passed, I imagine that 4 and 5 will be stable, though.
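
For reference, the shape of that “TR.TOG 4” test (sketched here, not the exact script; scripts 1-4 are left as they are in CRASHTEST):

#I
M 50
M.ACT 1

#M
TR.TOG 4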

:astonished: wasn’t expecting that!

So, let’s start from CRASHTEST again…

Can you try the following sequence (again, we want to minimise our failure case)?

  1. Keep CRASHTEST as is. Start with all 4 cables connected, then on each test pass remove a cable until you have no cables left. At what point does it go from crashing to stable? (Please power cycle between tests.)

  2. Connect all 4 cables, but delete scripts 1-4.

And then, on a separate track, use the Varigate to trigger the Teletype. Start from CRASHTEST but modify M and I as follows:

#M
X SUB 320 RSH PARAM 6

#I
M 50
M.ACT 1

If that still crashes, then modify M to the following (keeping I as is):

#M
X 0

and then, if it still crashes, delete the contents of M (but keep I as is).

I hope that’s not too hard to follow.

Great. About to test that, but one more data point…

I just changed my M script to TR.TOG 5 (with the Ansible in Teletype mode) while using the rest of the Jumpy Edges script (no TO code in there… that was the only line in the M script). It crashed within ten seconds.

EDIT: As an aside, here’s my thought as to what is potentially happening. It seems that the Teletype is internally stable. I was smacking it pretty hard with the Varigate while the M script was flipping one of the main toggles. Every time I do anything involving the ii bus in the M script while also using the main TT trigger scripts, it seems to crash.

I think there may be some weird race condition involved with the M, ii, and trigger sampling schedulers, but only when using all three.

With my “No Triggers” test (M script triggering an ii message every 50 ms), things were stable.
With my “Lots of Triggers” test (M script triggering an internal message every 50 ms), things were stable.
As soon as I have triggers, M, and ii together, that’s when it seems to crash.

Okay, just ran all of the #1 tests. It crashes with any number of cables (4, 3, 2, or 1). So, I suppose it’s only stable with no cables there.

On to test #2.

EDIT
Test #2, part 1.
Deleting scripts 1-4 but leaving the cables in leads to a crash. This is making me believe more in my scheduler race condition theory.

ANOTHER EDIT:
I’m about to test the M thing for Test #2, although I think it’s pretty similar to the TR.TOG M test that I ran above. I looked at the STACK_SIZE and STACK_OP_SIZE defines at https://github.com/samdoshi/teletype/blob/2.0/src/state.h and noticed that they’re both defined as 8. I’m not too familiar with the Teletype code, so this may be way off. However, my M script is sending 8 messages to the ii bus each tick. Is it possible that there’s a situation where the various schedulers are trying to push to an already-full stack?

@trickyflemming really interesting results. Apologies for not being more interactive this weekend; I was building furiously and now am on my way to South America for a couple of days. @sam - thanks for your efforts here as well!!

So those stacks aren’t what you think.

STACK_OP_SIZE defines the maximum number of commands you can add to the S mod.

STACK_SIZE is how large the stack can grow inside the teletype language. 8 is plenty. For example, L 1 8 : TO.TR.M I MUL I X will use multiple different stacks: one for the PRE and one for each run of the loop. Loops from L are processed sequentially, not concurrently. In the given example the stack will never get larger than 2. The actual stack is defined in the linker script and is 8kb.

You’re right that there is an issue between the i2c code and triggers / M scripts. A few options (a.k.a. a bit of brainstorming):

  • The i2c code is not ‘thread safe’ enough. Maybe there are some critical bits that need masking from interrupts.

  • Possibly an issue with messages being sent and received out of order.

  • Long-running code in an interrupt (which will be masked by default?) causing the i2c internal rx or tx buffers to overflow.

  • Too many long-running processes (e.g. i2c), leading to the event queue overflowing or generally fubaring.


More simplifications, if possible: can you try the following 3 tests, all with empty scripts 1-8 but a fast Varigate trigger into input 1?

The I script is the same for all of them:

#I
M 50
M.ACT 1

  1. Pulse TXo:
#M
TO.TR.PULSE 1

  2. Pulse a non-existent TXo:
#M
TO.TR.PULSE 9 

  3. Pulse Ansible:
#M
TR.PULSE 5

If you remove the trigger, do the crashes stop?


I finished my powered i2c bus board tonight. Together with a simple script turning one trigger into four random CVs on outputs 1-4, it crashed within a minute:

I have two Ansibles, an ES, and an MP connected. When I started the Metro script, the Ansible was not in the right mode, which was no problem. I then switched it to Teletype output mode and it started to blink. But not for long.

I also have a lot of display errors with truncated lines. Like this:

:cry:

Just ran all of the tests: the three scripts, each with and without the trigger cable.

For all three with the cable inserted: failure.
For all three without the cable inserted: stability.

From all of these tests, I’ve noticed we have a set of three things:

  1. External triggers (i.e. gates plugged into the TT from other modules).
  2. The M script.
  3. Anything ii related.

Right now, any two of these co-exist peacefully on my TT. The moment all three are in action, I experience a lock-up within 5 minutes or so.
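
Concretely, the smallest combination that reliably fails here is essentially @sam’s test 1 above: empty scripts, a fast external trigger into input 1, and an M script sending a single ii message, i.e. roughly:

#I
M 50
M.ACT 1

#M
TO.TR.PULSE 1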

I can also report that the issues have changed a bit.

What I experience is that outputs simply become inactive while the whole system keeps working.

For example, the Ansible stops emitting the CV changes transmitted from the Teletype, but the trigger outputs still work as expected, and plugging in the grid still switches it to Kria. But there is no response to CV changes via scripts or in live mode. When I change the output numbers in the script from CV 5,6,7,8 to CV 1,2,3,4 while the script is running, and repatch the four cables from the Ansible CV outputs to the Teletype, everything works again.

I also experience something like this on the Teletype outputs occasionally, e.g. a trigger output just stays on and does not react to any commands.

It seems to me that this started with the newly built bus board, but I could imagine that it is actually about the Teletype firmware and I just did not use it that much while I was waiting to build the new bus board.

I concur with this. Sorry for not helping with the testing myself, but my Teletype is currently out of my main case and has no other modules alongside it. I’m going to move this over to the Teletype Firmware i2c Debugging thread and tag various people as this is almost definitely a software bug in the i2c implementation.

@Leverkusen are you also only seeing the display errors when using triggers together with an M script that transmits over the i2c bus?

Also, is the display corruption happening all the time, or just when using i2c? (FYI, display corruption like that is usually caused by memory corruption, which may be indicative of the type of i2c bug we’re dealing with.) @trickyflemming have you seen any display artefacts like that?

I’ve experienced this as well. You can see it in the video I shared the other day on the expander thread; the Ansible should be mirroring the CV values from the Teletype, but at some point the CV became frozen while the triggers continued to function:

I’m assuming this is an issue in the Ansible itself and not the Teletype as all other i2c messages are still reliably transmitting.

@trickyflemming and @sam - apologies as well for not testing myself. I’m in Argentina for business through tomorrow and have been away from my TT, so I haven’t been able to try your scenarios. It looks like you have done an excellent job of narrowing down the issue. I’ll work to replicate when I’m back in town. :slight_smile: