Just some quick comments; I’ll try to read the replies more fully tomorrow. It’s bedtime here in the UK.

Yes, as @zebra says, the lower priority masking inside event_post only applies if it is called from an IRQ.

Consider the following:

  1. I manually run event_post.
  2. It disables IRQs at priority 2.
  3. A priority 1 IRQ happens, and inside that IRQ event_post is called.

Also, as it stands with beta 9, the UI is still really responsive, even with me masking all interrupts in the event code.

You have to start sending i2c at 10ms intervals, or use high audio rate triggers, to slow it down. And most importantly, it doesn’t crash when you do.

(fyi, at 100kbps you can send roughly 12.5 raw bytes per ms - are we actually using 100kbps??)
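
A quick back-of-the-envelope check (assuming standard i2c framing of 8 data bits plus an ACK bit per byte; start/stop conditions and address bytes add further overhead):

  100 kbit/s -> 100 bits per ms -> 12.5 bytes/ms of raw data bits, or ~11 bytes/ms counting ACK bits
  400 kbit/s -> 400 bits per ms -> ~44 bytes/ms counting ACK bits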

I think i2c bus recovery is a much bigger issue.

Plus some simple tweaks to the Teletype metro code to stop it completely taking over the event queue.


we should be at 400kbps, which is the max. but i see that we’re at 132kbps (??)

i2c bus recovery, yes. i suspect the ansible/trilogy are the issue, crashing while holding the i2c lines hostage.

yeah, still confused about masking then… so, the lower priority masking only happens inside an ISR?

regardless, we might still have cases where a non-masked interrupt tries to add an event while an ISR is in progress trying to do the same thing. so we either have to disable all interrupts, or, if there are interrupts we’d rather not disable, we have to make them stop using the event code.
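
roughly the race i’m worried about (event queue internals assumed - names taken from events.c, the actual code differs):

  // inside event_post, schematically:
  sysEvents[putIdx] = *e;               // (1) copy the event into the slot
  putIdx = (putIdx + 1) % MAX_EVENTS;   // (2) publish the new write index
  // if a non-masked interrupt fires between (1) and (2) and its handler also
  // calls event_post, both writers use the same putIdx - depending on the
  // interleaving one event is overwritten or a stale slot is left in the queue.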

also, looking at the latest timers_pause and timers_resume i think we might still have the following scenario (rough sketch after the list):

  • process_timers is called from the timer ISR, so the timer interrupts are disabled
  • one of the timer callbacks adds an event
  • event_post calls timers_pause; timer_pause_resume_nesting is 0, so it increments it and disables the timer interrupts again
  • event_post calls timers_resume, which decrements the count and enables the timer interrupts
  • process_timers goes through the rest of the timers - but at this point the timer interrupts are already enabled, which means process_timers can be called again while it’s already in progress…
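
a rough sketch of why the nesting counter doesn’t help here (function bodies guessed from the description above - the real code may differ):

  static volatile int timer_pause_resume_nesting = 0;

  void timers_pause(void) {
    if (timer_pause_resume_nesting == 0)
      cpu_irq_disable_level(APP_TC_IRQ_PRIORITY);   // mask the timer IRQ
    timer_pause_resume_nesting++;
  }

  void timers_resume(void) {
    timer_pause_resume_nesting--;
    if (timer_pause_resume_nesting == 0)
      cpu_irq_enable_level(APP_TC_IRQ_PRIORITY);    // unmask the timer IRQ
  }

  // the counter only tracks pause/resume pairs - it doesn't know we're already
  // inside the timer ISR, so the resume at the end of event_post unmasks the
  // timer IRQ while process_timers is still walking the timer list.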

i wonder if your initial idea of storing the state from cpu_irq_level_is_enabled would be more stable.
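
i.e. something like this inside event_post (assuming cpu_irq_level_is_enabled does what its name suggests):

  bool was_enabled = cpu_irq_level_is_enabled(APP_TC_IRQ_PRIORITY);
  cpu_irq_disable_level(APP_TC_IRQ_PRIORITY);
  // ... add the event to the queue ...
  if (was_enabled)
    cpu_irq_enable_level(APP_TC_IRQ_PRIORITY);      // only re-enable if it was enabled on entry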


i still feel i’d rather have a more responsive UI than process more triggers… even though it doesn’t crash (which is a great improvement on its own!), it’s not entirely intuitive to realize you need to lower the rate to get it to respond better again.

in events.c should putIdx, getIdx and sysEvents be declared volatile since they can be changed from an ISR?
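
i.e. something like this (types and queue size guessed from the names above - the real declarations will differ):

  static volatile int putIdx;                     // written by event_post, possibly from ISRs
  static volatile int getIdx;                     // advanced by the main loop as it consumes events
  static volatile event_t sysEvents[MAX_EVENTS];  // the queue itself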

I don’t think this is possible, because process_timers is only called from one place, inside the timer interrupt handler, which doesn’t clear the timer IRQ flag until after process_timers has finished running:

  process_timers();
  // clear interrupt flag by reading timer SR
  tc_read_sr(APP_TC, APP_TC_CHANNEL);

The only gotcha is if calling cpu_irq_enable_level(APP_TC_IRQ_PRIORITY) also clears the interrupt flag.

Probably… I’ll have a look. I’m a bit clueless when it comes to that stuff if I’m honest…


@scanner_darkly, I’ve read through your comments about changes to how the event queue works… I’m afraid I’m running out of time (and sanity) for bigger changes like that. The 2.0 thread was started on the 17th of March, and the coding goes back even further, to the 13th of November.

What I really want to do is make the simplest changes to get the system as stable as possible. Bigger changes will have to wait till 2.1.

It sounds like you’re up for doing some experiments though, here are some of the thoughts I’ve had…

  • Maybe the Teletype should get its own event code rather than the version shared with all the other modules. Priority queue?
  • AFAIK the AVR32 supports memory barriers… (git grep barrier in libavr32) which is the requirement for lock-free queues, etc. (I think) - rough sketch after this list. Probably impossible to do with a JTAG though.
  • I’m also wondering if init_teletype / init_trilogy / etc, should all be moved out of libavr32 into their respective modules’ repos.
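
A very rough illustration of the barrier idea (barrier() here stands in for whatever git grep barrier turns up in libavr32, or a plain compiler barrier like __asm__ __volatile__("" ::: "memory") - this is an assumption, not the existing API):

  // event_post, schematically, without disabling interrupts:
  sysEvents[putIdx] = *e;                 // write the event into its slot...
  barrier();                              // ...make sure that write is visible...
  putIdx = (putIdx + 1) % MAX_EVENTS;     // ...before the new index is published

That only shows the ordering a barrier enforces for a single producer though; a queue written from multiple interrupt levels would need more than this.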

Okay…

libavr32 changes have been updated in the following PR:

Ansible PR to update libavr32 and make changes to use new timers_pause and timers_resume functions (as well as the older msc changes):

This is the Ansible firmware hex built from that PR: ansible.zip (79.4 KB)

I have (briefly) tested Teletype, Kria and Meadowphysics modes. I can’t test the Arc ones.

@tehn can you test the Ansible firmware in more depth (either the zip or build it yourself from my repo)?

I have also built a version of the latest Teletype beta but using the new changes to libavr32, no problems to report either.

Huff. Time for a cup of tea.


ok! pull and testing coming up within the next couple hours.

yeah, that’s what i meant. i need to read up more on exactly when the flag is set/cleared and how the masking works…

re: volatile - this seems to be the recommendation. i don’t think it’ll make a difference (we’d probably see some issues already if it did) but it seems like a prudent thing to do. i can make this change when i experiment with doing keyboard and screen refresh on their own priority - i have an idea on how to do it without modifying the event code.

memory barriers - need to read up on this as well. i thought declaring volatile was enough, but now it looks like memory barriers might also be needed… going through http://stackoverflow.com/questions/26307071/does-the-c-volatile-keyword-introduce-a-memory-fence

i should be able to test your version tonight (almost done setting up the test case with a bunch of telex modules) and then can experiment with some of these ideas.

Continuing from here:

I think @scanner_darkly is now going to try to take us up to the next level of fast. It looks like he’s already made some decent progress too. Ludicrous speed ho!

Some thoughts… (apologies if this is all known to you, but it’s good info to share with everyone else).

I think you/we are going to need a proper testing protocol (luckily USB saving / loading works well now!).

I’d suggest:

  • audio rate triggers only (I was using a Dr Octature sine - AFAIK it will go well above hearing range; best to try both triggers and a waveform).
  • audio rate triggers sending i2c.
  • i2c in fast M
  • i2c in fast M plus audio rate triggers
  • i2c in fast M plus audio triggers sending i2c

Anything else? It might be a good idea to list the specifics of what you do too.

It’s a pain, but I think we’re at the stage where we might fix one thing and break another. I know it also seems redundant to do the lower-level tests (e.g. audio rate only), but I think you’ll hit different performance limits in each of the tests.

Can you make each change you do a separate commit? If we do get a regression it makes it easier to track down (and git bisect is magic).

Don’t forget to test with Ansible! We can deal with the Trilogy modules later.

You mentioned changing the i2c IRQ priority. Do you want to add a new I2C_IRQ_PRIORITY define to conf_tc_irq.h and then reference that in conf_twi.h? I think it will be helpful to keep the level definitions in one place.
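
Something along these lines (the values and what conf_twi.h currently does are placeholders, not the actual code):

  // conf_tc_irq.h - keep all IRQ level definitions in one place
  #define APP_TC_IRQ_PRIORITY  2   // existing (actual value may differ)
  #define I2C_IRQ_PRIORITY     1   // new, whatever level we settle on

  // conf_twi.h - reference I2C_IRQ_PRIORITY wherever the TWI IRQ level
  // is currently hard-coded, instead of a literal number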

Lastly I think it’s worth keeping the encapsulation of timers_pause / timers_resume, even if all they do is call cpu_irq_disable_level / cpu_irq_enable_level without the nesting stuff. You could also think about changing them to app_pause / app_resume if you think that makes more sense (and move them to a new file app.c), similar to how it was done in the Aleph code base.
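
At a minimum that would just be (keeping the names, without the nesting counter):

  void timers_pause(void)  { cpu_irq_disable_level(APP_TC_IRQ_PRIORITY); }
  void timers_resume(void) { cpu_irq_enable_level(APP_TC_IRQ_PRIORITY); }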


I did not follow the multiple threads about this over the last few days due to massive day-job demands and a lack of understanding of all the IRQ stuff you are discussing.

I have some time over the next few days and can follow the suggested protocol. Just in case I missed something, which firmware versions for Ansible and Teletype should I use?

Don’t worry, I feel like that too sometimes :stuck_out_tongue:

That’s more for the future changes that @scanner_darkly is going to make.

Most likely there will be a 2.0b10 out later today; that would be the best thing to test. (That won’t include the changes that @scanner_darkly is making though, they will come later, probably in 2.1.)

I would suggest trying to test all the features in 2.0, as I’d like to get a release made once the documentation is up to speed. Concentrate on sensible i2c testing rather than the 10ms stuff that’s going on here.

There is no new Ansible firmware to test, really. That came about because some of the changes I made to the lower-level library shared by all modules caused issues for Ansible, so it needed a fix for future releases. The zip file several posts above contains that firmware. In theory it should add some extra stability to Ansible, but it might not.


Okay, so just to be clear: I downloaded Ansible 1.5.0 and now replace the hex file inside that folder with the one from the zip file posted above, then update with it and wait for 2.0b10?

With beta9 on TT, metro at 25, metro toggling TXo TRs, and script 1 setting random TXo CVs, I can ratchet up the rate of triggering script 1 to ~1100Hz before it crashes. In terms of write performance I think that’s pretty amazing.


The Ansible firmware is only if you’re curious.

And yes, beta 10 coming later today. You can play with beta 9 now if you want. It’s pretty good as far as performance goes.

:smiley:

@scanner_darkly’s changes look like they go even further than that too.

Still, we’d love to get rid of all the crashes.

Okay, my curiosity led me to notice that Ansible becomes unresponsive in Teletype mode when putting out triggers at audio rate over i2c. Teletype itself remains unaffected. Ansible stays in that state even when I turn down the audio-rate input to Teletype, but it starts working again when I connect a grid to leave the mode, without needing to power cycle. Disconnecting the grid and switching back to Teletype mode works too.

I had M 25 running while doing this, and several audio-rate inputs on Teletype sending trigger outs. The trigger outs on Teletype kept working when Ansible froze.

EDIT: I just noticed that the front-panel buttons stay unresponsive even after all Teletype scripts are gone and Ansible is in Kria mode (I could program patterns but could not get to the config page without a power cycle).

Is this all with the Ansible firmware from the zip in this thread?

Next question, do the same bugs appear in 1.5.0?

Yes and no. It’s with the new firmware from above. I have one module with this installed now and one with v1.5.0 again. They both freeze, sometimes at the same moment, sometimes one takes a bit longer (I think the new one, but only by seconds). Both then react to switching to a grid app by hot-plugging the grid, but on 1.5.0 the grid becomes unresponsive, and unplugging it and then switching back to the Teletype app does not consistently make it responsive again - mostly a power cycle is needed. Not on the new one. The lack of front-panel responsiveness is not consistent: on the new one it sometimes works; for v1.5.0 I cannot say, since the grid is unresponsive.


Sounds like there is no difference between the zip version and 1.5.0.

Which is fine by me. The test version of Ansible was just to check that the changes hadn’t broken anything.

Thanks for testing.

agreed! i think we should concentrate on getting the 2.0 release ready, and if i manage to improve i2c / fast trigger stability it can go into 2.1.

for the i2c / fast triggers issue my main goal is to get it to the point where it’s absolutely stable. i hope that goal is achievable, but if not, perhaps we can at least strengthen it to the point where it crashes significantly less often and only under significantly higher load (which my testing suggests is possible). speed is not a concern - i’d rather have a system that is not capable of processing audio rate triggers (it was never designed for that) but is stable. (as a side note, there is an interesting question of whether triggers or timers should have higher priority - i’ll experiment with this as well.)

for now my plan is to continue experimenting with IRQs / events / timers / i2c to see if i can get it not to crash. with my set up i was able to get it to freeze very quickly with the previous beta version; the changes i posted made it take longer to crash (~10-15 min). the latest change i tried was giving i2c a lower priority (1 instead of 3). my theory was that the i2c interrupts getting masked caused i2c to hang, so my previous changes were aimed at making sure this doesn’t happen (only masking the timer and the trigger interrupts, since i2c doesn’t use the event code anyway). but changing it to a lower priority actually made it more stable - looks like the i2c code is much more resilient than i thought… it did freeze eventually - i’m not sure how long it took as i left it running overnight, but it ran for at least 1.5 hours, which is a huge improvement.

with that in mind i’m going to try doing complete masking again, going back to using cpu_irq_save / cpu_irq_restore, and see if that makes a difference. my plan is to make the same changes in ansible too, to eliminate the possibility of broken ansible i2c taking down tt as well.
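
the complete masking approach, schematically (cpu_irq_save / cpu_irq_restore are the ASF calls mentioned above; the queue internals are placeholders):

  // inside event_post:
  irqflags_t flags = cpu_irq_save();   // disable all interrupts, remember the previous state
  // ... write the event into sysEvents[putIdx] and advance putIdx ...
  cpu_irq_restore(flags);              // restore whatever the IRQ state was before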

agreed on the testing plan - but i think there is no point in starting testing until i manage to get it stable under heavy load in my own testing, or at least improve it significantly. i’ll post my progress here, and once i get it to the point where i’m happy i’ll start a new thread and ask for help with beta testing.

these are not incremental changes though, they’re all related - they have to be done in conjunction.

good idea, i’ll do this!

agreed on this as well, i’ll make it part of my changes!

in my tests (beta9, metro at 10, doing some i2c stuff in metro) i was able to get it to freeze pretty quickly even with below-audio-rate triggers…


:heart:

:heart: :heart:

As incremental as possible then?

I hate to bring up the dreaded resistance again… do you want to try with a smaller number of modules on the bus? You’ve got way more than most. Also, try building the latest master of Ansible.

I was running i2c to Ansible (only thing on the bus) at 2ms metro times.