Does your teletype stop responding after an hour or two?

i can do it tonight - it’s not much work, and if you don’t mind giving it a try, this will actually help confirm the fix works. my main concern was having to test again, as it’s possible it might regress as i introduce more changes, so it made sense to save more extensive testing for later, once things are more finalized.

speaking of which, i should have a new version of TT in the next few days.


here is the earthsea version that includes the same tweaks as the above versions of teletype and meadowphysics - could you give this a try if you get a chance?

earthsea.hex (133.4 KB)


so when ES crashes this way, it sounds like it doesn’t affect TT? TT is still operational? is it the same behaviour you had with the previous version of ES?

and did it behave the same way previously as well?

sorry - want to make sure - the previous test was with the official ES firmware and the latest test is with the one posted here, and in both cases TT firmware was the latest one posted in this thread?

i have a theory about what might be causing this - could you give this version a try? unfortunately i have to fly blind as i can’t test remote commands with my ES (so there is a possibility remote commands will not work at all with this version).

earthsea.hex (133.0 KB)

this is semi-related to the i2c code, so it might help a bit with the overall investigation.


awesome, thanks for your help with testing!

making progress with TT - previously i had it pretty consistently freezing when using ansible remotes. with the latest tweaks (applied to both TT and ansible) it ran for almost 12 hours before locking up. at this point i’m not sure i can get it to where it won’t lock, period - that will require a much deeper dive into the interrupt/i2c code, i’m afraid - but it does appear to be quite a bit more stable. and this 12 hr test was an extreme one - a 10ms metro script reading from TXi and ansible and updating 4x TXo, TT CV and trigger outs, plus 4 trigger scripts being triggered at audio rates, doing i2c reads and writes as well as updating TT CV.

there is still some refactoring i’d like to do, such as improving screen refresh and perhaps making the keyboard respond better. i’d also like to try increasing the i2c rate and see what difference that makes.


tried increasing the i2c rate as an experiment. the current value is 132000.

if i change it to 400000, teletype seems to communicate fine with telex modules, but a single command to ansible locks it immediately. i lowered it to 264000 (2x the current rate) and a single command works, but if i do a lot of remote commands in a fast metro script it locks within a minute.

this makes sense - increasing the i2c rate increases the demand on the i2c line, which translates into a higher chance of failure (and failure is what i suspect causes the locking). i think what actually makes sense is to decrease the rate - i doubt this makes a big difference performance wise (i2c is pretty fast as it is), but if it increases stability it’s worth it. i’ve changed it to 100000 in my tt version and will continue testing with this value. it’s hard to judge how much of a difference this will make now that even an extreme test ran for almost 12 hours, but i think it should improve things overall (perhaps with a side effect of making things more stable even without extra i2c boards).
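
for reference, here’s roughly where this value lives in the ASF-style TWI init that libavr32 builds on - a minimal sketch, not the exact call site (the TT_I2C_RATE name and the chip address are placeholders):

```c
// minimal sketch of setting the i2c (TWI) master rate via the ASF driver;
// TT_I2C_RATE and the chip address are placeholders, not libavr32 names
#include "twi.h"

#define TT_I2C_RATE 100000  // was 132000; 400000 locked on a single
                            // ansible command, 264000 locked under load

static void init_i2c_master(unsigned long pba_hz) {
    twi_options_t opt = {
        .pba_hz = pba_hz,      // peripheral bus clock feeding the TWI divider
        .speed  = TT_I2C_RATE, // the rate discussed above
        .chip   = 0x50,        // placeholder follower address
    };
    twi_master_init(&AVR32_TWI, &opt);
}
```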

That should help anyway due to the increased allowance for rise time. The big downside is that any send/receive operations over i2c (e.g. TI.PARAM) will cause the Teletype to wait longer doing nothing.
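
To put rough numbers on that wait - a sketch assuming a query is ~5 bytes on the wire at ~9 clocks per byte (addressing and acks included), which is only an approximation:

```c
// rough cost of a blocking TI.PARAM-style query at various i2c rates;
// the 5-bytes / 9-clocks-per-byte wire model is an approximation
#include <stdio.h>

int main(void) {
    const double bytes_on_wire = 5.0;  // addr+cmd write, then addr + 2-byte read
    const double rates[] = { 100000.0, 132000.0, 264000.0, 400000.0 };
    for (int i = 0; i < 4; i++)
        printf("%6.0f Hz: ~%3.0f us per query\n",
               rates[i], 1e6 * bytes_on_wire * 9.0 / rates[i]);
    return 0;
}
// ~450 us at 100 kHz vs ~113 us at 400 kHz - real, but small per call
```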


I’m not quite back at it yet, but… I had one idea and need someone to test something for me. Could someone try setting very long slew times on the Teletype and/or Ansible and see if that impacts i2c rates and crashes?

i think we could get it more stable without affecting performance too much. i haven’t measured the actual impact but i doubt this will make a huge difference. at some point we can play with the rate and run some simple tests and compare.

i had another extreme test running for over 10 hours with no crash, and i just tried setting slew time on one of the outputs and it seems fine. this is a modified version though, so it might not test what you want (i’m testing changes to improve screen refresh and keyboard response, and i moved some of the work that was done with events/timers into the check_events loop, including CV updates - i think i’ll move this one back though). what did you have in mind?


been running another extreme test: a 2ms metro script (writing to 32 TXo outputs, reading from TXi and ansible, updating one CV and one trigger on TT itself) and 4 trigger scripts (updating 2 TXo outputs, one TT CV and one TT trigger output, reading from TXi and ansible) being triggered at audio rates.

it hasn’t crashed yet. this version has a fully responsive keyboard (even when using extreme trigger and metro rates) and no artifacts on screen refresh.


Basically, even though SPI is a lot faster than i2c (1 Mbps vs ~100 kbps), it’s still slow compared to the speed that the CPU runs at. CV slew updates are done inside the timer interrupt; maybe that’s not a good idea…

It sounds like your version has moved all the SPI code out of a timer callback anyway?

Anecdotally it seems as though Teletype is now more stable than Ansible, and I’ve been trying to think why. I think the big reason is probably the poor hygiene regarding swapping event handlers between modes, but I was also looking at the way that Ansible’s TT mode uses timers. That got me thinking about the SPI issue.


Excellent progress by the way!


not all the SPI code (i think there are other calls elsewhere), but i did move the screen refresh and the CV refresh code out of the timer callbacks / event queue. i think i’ll move CV refresh back to the timer callback though - it’s set to run every 6ms, and it’s probably the most timing-critical code, so the event loop is not a good place for it, since an event can in theory take longer than 6ms.
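
roughly the split i mean, sketched with placeholder names (the real event/timer plumbing in libavr32 differs):

```c
// sketch of the split: timing-critical CV slew stays in the timer
// callback, while screen refresh is only flagged there and drained in
// the main loop; all function names below are placeholders
#include <stdbool.h>

extern void update_cv_slews(void);   // placeholder DAC/slew update
extern void handle_next_event(void); // placeholder event dispatch
extern void refresh_screen(void);    // placeholder SPI framebuffer send

static volatile bool screen_dirty;

// runs every 6ms from a timer interrupt - must stay short and regular
static void cv_timer_callback(void) {
    update_cv_slews();
    screen_dirty = true;  // defer the (slow) SPI screen transfer
}

// main check_events loop - one iteration here can take longer than 6ms,
// which is fine for the screen but not for CV updates
static void check_events(void) {
    for (;;) {
        handle_next_event();
        if (screen_dirty) {
            screen_dirty = false;
            refresh_screen();
        }
    }
}
```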

it shouldn’t affect i2c anyway - timer interrupts have lower priority than i2c interrupts.
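
for reference, this is how those levels are expressed with the ASF interrupt controller - the handlers and the specific levels here are illustrative, not the actual registration code (on UC3, INT0 is the lowest level and INT3 the highest):

```c
// illustrative priority registration via the ASF interrupt controller;
// handler names and the exact levels chosen are assumptions
#include "intc.h"

extern void tc_irq_handler(void);   // timer tick handler (placeholder)
extern void twi_irq_handler(void);  // i2c handler (placeholder)

static void register_irqs(void) {
    // timer at a lower level...
    INTC_register_interrupt(&tc_irq_handler, AVR32_TC_IRQ0, AVR32_INTC_INT1);
    // ...so the i2c interrupt can preempt it mid-callback
    INTC_register_interrupt(&twi_irq_handler, AVR32_TWI_IRQ, AVR32_INTC_INT2);
}
```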

in what way? any specific issue? i’m running my tests with the ansible firmware updated to my version of libavr32 - i can test whether the issue is still present.


Nothing specific, just from my experience of it and from various threads. I guess that’s what I meant by “anecdotally”.

yes, in my initial testing, updating the teletype firmware made it more stable unless ansible commands were used - if i added any ansible commands it would still freeze pretty quickly. this was while using the official ansible firmware.

all my latest tests are done with ansible updated to my latest libavr32, and it seems much more stable now. but it’s entirely possible there are still some outstanding bugs in the ansible i2c code that are not libavr32 related.


here is a short video showing improved screen refresh and keyboard response - this is with a 2ms metro script and 4 triggers getting hit at audio rates. previously this would make it sluggish to the point where you could hardly type anything (which made it difficult to get out of this state, as it was hard to change the metro rate):


update:

with the latest fixes to improve screen and keyboard response i broke USB read/write; i made some additional changes and was able to fix it. i also moved CV updates back to a timer callback, which reintroduced issues with screen refresh, so i went through all the SPI code and protected it with irqs_pause / irqs_resume, which seems to have fixed the issue.
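
one plausible shape for that guard - a sketch only, the actual irqs_pause / irqs_resume bodies may differ (cpu_irq_save / cpu_irq_restore are the ASF primitives):

```c
// sketch of guarding an SPI transfer against interrupts; the real
// irqs_pause / irqs_resume implementation in libavr32 may differ
#include "interrupt.h"

extern void spi_send_framebuffer(void);  // placeholder SPI transfer

static void protected_screen_refresh(void) {
    irqflags_t flags = cpu_irq_save();   // irqs_pause
    spi_send_framebuffer();              // no timer/i2c IRQ can interleave
    cpu_irq_restore(flags);              // irqs_resume
}
```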

the latest code seems fine so far and i’ve been running an extreme test for almost 17 hours now with no crash.

at this point i don’t anticipate making any further changes, so my plan is to incorporate all the latest changes from the official 2.0 branch (including telex updates) and post beta versions for testing next week.


Anyone fancy trying to calculate how many milliseconds it takes to send a full screen refresh over SPI?

It might be useful to know.
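
A quick back-of-envelope, assuming a 128x64 screen and the ~1 Mbps SPI figure quoted earlier (the panel’s bit depth is a guess, so both 1 bpp and 4 bpp are shown):

```c
// full-frame refresh time over SPI, back-of-envelope only
#include <stdio.h>

int main(void) {
    const double spi_hz = 1e6;           // ~1 Mbps, per the figure above
    const double pixels = 128.0 * 64.0;  // assumed resolution
    for (int bpp = 1; bpp <= 4; bpp <<= 2)
        printf("%d bpp: %.1f ms\n", bpp, 1000.0 * pixels * bpp / spi_hz);
    return 0;
}
// 1 bpp: ~8.2 ms, 4 bpp: ~32.8 ms - either way, far too long to sit
// inside a timer callback
```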

:smiley:

[quote=“scanner_darkly, post:150, topic:7564”]
at this point i don’t anticipate making any further changes, so my plan is to incorporate all the latest changes from the official 2.0 branch (including telex updates) and post beta versions for testing next week.
[/quote]

Don’t forget to post betas for everything if you keep the i2c speed change!


yep, my plan is to post tt/ansible/trilogy firmwares all updated with the latest libavr32 - not just for the i2c rate update, but there are other important changes as well (most critical being the i2c interrupt level). anything using the older version will still be able to cause a crash.


@sam - all the latest changes are on github btw if you wanted to take a look! and i moved irqs_# methods from conf to src.

https://github.com/scanner-darkly/libavr32/tree/dev
https://github.com/scanner-darkly/teletype/tree/dev


Looks interesting… particularly the changes to check_event. I do think trying to find a better solution to the SPI issue than masking all interrupts might be worth exploring (but I don’t have the time to do it, so feel free to ignore me!).

Will try to digest once I’ve got rc1 out the door on Monday.

Are you going to squash the commits down before the PR? I take it the plan is still for this to make it into 2.1 right?

You’re going to have a lot of work coordinating the release of new code for all the modules… :cold_sweat: I feel for you.

(Won’t you need betas of TXi and TXo too? And JF…)
