Teletype Firmware i2c Debugging

As I’m working on my Teletype expanders and debugging my new operators, I’ve run into a few problematic behaviors with the ii bus. These issues extend beyond my expanders and affect the Ansible in its expanded mode - in fact, some of the things I’m seeing are specific to the Ansible. Both the TT and the Ansible are on the latest firmware.

I’ve posted some “unlisted” YouTube videos below that show some of the problems in action. I’ve been trying to solve the i2c read and timeout issues for a while now and am stuck at the limits of my experience. Any help would be appreciated - and will benefit the Telex modules and the Ansible.

i2c Read Causes the Teletype to Freeze

The Teletype tends to lock up when reading values over i2c. This happens sporadically when values are read on the command line or in a trigger script. It happens 100% of the time when the read is on the first line of a metronome event - and very quickly when the read is on a line further down in the script.

I’ve been able to replicate the behavior in previous versions of the firmware with my expanders and in the latest version (1.3) with my expanders and the Ansible. These problems have been plaguing the successful integration of the input expander (TXi) that I’ve been making for the Teletype.

This can be reproduced easily with the Ansible. Go to the metronome script and type CV 1 CV 5. If the metronome is running, your Teletype should instantly lock up. If you put another event before it, it might take a few pulses.

I can also get the Teletype to lock up by simply executing the command in other places repeatedly (command line or in scripts). It takes a lot more repetitions to get it to choke in those places.

Loop Actions to the Ansible Skip Values in Scripts

I’m seeing some very strange behaviors when trying to loop through events in scripts with the Ansible. I have not experienced this with any of my expanders; at this point it appears to be Ansible-specific.

L 5 8 : TR.PULSE I

Only pulses outputs 1 and 3 on the Ansible.

L 5 8 : CV I V RAND 10

Only shows random CV values on outputs 2, 3 and 4 on the Ansible.

i2c Calls to Nonexistent Modules Cause the Teletype to Freeze

This is one that happens whenever you attempt to execute a command for a module that is not connected. It is very easy to do with a numbering scheme for outputs and inputs that spans modules. One typo and your Teletype freezes and you lose all of your work.

I am not expert enough in the workings of i2c and the AVR library to find an easy solution for this. There is an endless loop in the i2c library that waits until there is an answer - locking up the TT in the process. I’ve played with attempts to hack the library to time out and break the loop. I was only partially successful and have not found a way to solve the problem and restore the TT to function after an errant command.
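
For what it’s worth, the general shape of the timeout hack I was attempting is sketched below. Everything here is illustrative - the function names are made-up stand-ins for whatever the i2c library actually exposes - and, as noted, the hard part is what to do after the abort, not the abort itself.

#include <stdint.h>

// Hypothetical stand-ins for what the i2c library exposes; these are
// not the actual libavr32 identifiers.
extern int  i2c_transfer_complete(void);   // nonzero once the slave has answered
extern void i2c_abort_transfer(void);      // would need to be written

#define I2C_WAIT_LIMIT 10000               // loop iterations; needs tuning on hardware

// Bounded version of the "wait until there is an answer" loop:
// returns 0 on success, -1 if the slave never answers.
static int i2c_wait_with_timeout(void) {
    uint32_t spins = 0;
    while (!i2c_transfer_complete()) {
        if (++spins > I2C_WAIT_LIMIT) {
            i2c_abort_transfer();          // stop the stalled transfer
            return -1;                     // caller must handle the failure
        }
    }
    return 0;
}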


Thanks again!

b

3 Likes

I think it might be a good idea to collect all the i2c / teletype remote bugs in one thread.

I can replicate the loop/pulse behaviour - it acts strangely with all combinations of loop points. A loop of 1 - 8 works as expected on the four teletype outs, but not on the four ansible outs.

While testing with another script, ansible suddenly started giving out pulses on an output that wasn’t even used in the script. Also, slew commands are not getting through at all when coming from a script - no problem when typed in live. I tried it with a Metro script and a 16-step pattern that affects pitch, slew and M.

I don’t know if this is connected to the loop behaviour, but teletype also skips single remote commands every now and then with all trilogy modules and ansible (all on the newest firmware). This has been known for about a year, but it seems that no solution has been found for it.

Would be great if this could be solved somehow. As soon as I get to something slightly more sophisticated, things get out of control, which is very frustrating.

EDIT: I am not sure about this, but it seems that the ansible trigger out (5), which fires from the script though it should not, emits pulses alternating with the one that should (6) and does…

1 Like

some sort of bug crept into the i2c code as i certainly tested all of the mentioned conditions and they worked fine on the initial release of ansible.

i’ll do a review of the code and post a functional review of the approach

1 Like

mostly for myself, but in case anyone wants to come along for this debug.


general trace of i2c below. from the lowest level to highest.

lib i2c init: libavr32/src/init_common.c

lib i2c functions: libavr32/src/i2c.c

tt i2c send, trilogy: teletype/module/main.c

messages are sent out after being put into a queue
this function is called by the event handler (a rough sketch of this queued-send pattern follows the trace below)

tt spawns ii events at line 1789, called from the teletype engine (ops)

tt i2c send, ansible:

line 845
messages are sent out after being put into a queue
this function is called by the event handler
this is different from the trilogy handler in that it handles variable length i2c messages

events queued from line 1796, called from teletype engine (ops)

line 1820 has a non-queued SEND NOW ii function

line 1824, rx is not queued. it is read immediately.

tt ops: teletype/src/ops/hardware.c

cv SEND

cv RECEIVE


ok-- now to tinker

3 Likes

@tehn thanks for looking into this! Let me know if I can be of any help.

Read Freezing

I have some untested, knuckle-headed theories on the II read in the M event as this has been plaguing my development of the expanders all year. It has been on my to-do list for a while now.

I’m doing essentially the same thing with the reads for the TXi 1.3 code - send a command as to what you would like to poll and then request the value. (I’m only using half of the buffer for the tx_now output setting - the “1” there is intentional.)

https://github.com/bpcmusic/teletype/blob/telex/src/ops/telex.c#L243
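
In rough outline the pattern looks like the sketch below - the helper names here are placeholders rather than the actual teletype functions (the linked telex.c is the real thing):

#include <stdint.h>

// Placeholders for the teletype ii plumbing; not the real function names.
extern void ii_tx(uint8_t addr, uint8_t *data, uint8_t len);
extern void ii_rx(uint8_t addr, uint8_t *data, uint8_t len);

// Poll one TXi input: tell the expander which value we want,
// then request it back as a signed 16-bit quantity (MSB first).
static int16_t txi_read_input(uint8_t addr, uint8_t input) {
    uint8_t cmd[2] = { input, 0 };   // buffer layout is illustrative
    uint8_t buf[2] = { 0, 0 };

    ii_tx(addr, cmd, 1);             // only one byte of the buffer is sent - intentional
    ii_rx(addr, buf, 2);             // then ask for the two-byte value back

    return (int16_t)(((uint16_t)buf[0] << 8) | buf[1]);
}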

Ansible Output Uncertainty in 1.3

I was tinkering around a bit more and identified the following behavior with my output expanders. I have four connected (including my earliest prototype that includes an LED layout goof) and am able to easily address all 16 CV or TR outputs. For these devices, no outputs are being skipped like in the earlier Ansible examples.

I am running into some limits on the number of II events that I can trigger per metronome event. I seem to be able to fire only one line of events - as shown in the video below (using my fork + branch of the TT code). I used to be able to put tons of commands in the M event in the pre-1.3 code that I had extended to support the TELEX devices.

Note: Ignore the LED layout on the far right module. As I said above, that was my first prototype that included a PWM output routing error that put the TR 4 LED below the four CV LEDs. Need to get it out of my demo rig - it always makes me cringe.

Just a little vote of support & encouragement here! I know some of these issues relate to the Just Type firmware for JF as well, so I’m excited to see things move forward.

On my side, I’ve noticed the TT can freeze when sending multiple commands to JF from the same script. The receive code is triggered by an interrupt, grabs the data on the i2c line, and dumps it into a buffer. That interrupt is on a higher-priority IRQ than the audio DSP loop, yet when the DSP is running a more intensive mode the freeze (on TT) is more likely - I can’t explain why!
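
For context, the receive side has roughly the shape sketched below - heavily simplified, with placeholder names rather than the actual JF source: grab the byte in the high-priority ISR, parse it later at a lower priority.

#include <stdint.h>

#define RX_BUF_LEN 32

static volatile uint8_t rx_buf[RX_BUF_LEN];
static volatile uint8_t rx_write;             // advanced only by the ISR
static volatile uint8_t rx_read;              // advanced only by the main loop

extern uint8_t i2c_read_data_register(void);  // placeholder for the hardware read

// high-priority i2c interrupt: do the minimum, just stash the byte
void i2c_rx_isr(void) {
    rx_buf[rx_write] = i2c_read_data_register();
    rx_write = (rx_write + 1) % RX_BUF_LEN;   // overflow handling omitted
}

// lower-priority work (between DSP blocks): drain the buffer and act on it
void i2c_rx_process(void) {
    while (rx_read != rx_write) {
        uint8_t b = rx_buf[rx_read];
        rx_read = (rx_read + 1) % RX_BUF_LEN;
        (void)b;                              // dispatch to the command parser here
    }
}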

Hope that’s not too OT.

2 Likes

Also words of encouragement from me, hoping for the fix!

I have experienced TT freezing with my trilogy on many occasions in the past, and I never thought that this might have been related to Metro functionality. But now that I think of it it could very well have been the case, especially with Earthsea, when I was hoping to clock it from TT…

I didn’t mention my half-baked, harebrained theory above for the freezing on read issue. My gut is telling me that the issue is related to a contention between interrupts which causes the TT to miss out on the receipt of the response and sit (relaxed) waiting for it.

I could be biased to think this because I’ve spent a lot of time looking here trying to figure out how to recover from calls to modules that aren’t connected:

Miss out on that return value or accidentally call a module/output that isn’t on the bus and it is nappytime … forever. That is, at least until you power cycle. I’ve been able to tweak that library file to keep the unit from locking up, but I’ve not figured out how to bring the i2c bus back to life in a way where it will function and not lock up on future calls. My ignorance of the AVR TWI implementation (and the all-over-the-place documentation) has kept me from solving this.
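
One thing still on my to-do list is the bus-clear trick from the i2c spec: take over SCL as a plain GPIO, clock it up to nine times until the stuck slave releases SDA, generate a STOP by hand, then re-initialize the TWI peripheral. A rough sketch of the idea is below - I haven’t tried this yet, and every function in it is a stand-in rather than a real AVR32 call.

#include <stdint.h>

// All of these are stand-ins for the real AVR32 GPIO/TWI calls.
extern void gpio_set_scl(int level);
extern void gpio_set_sda(int level);
extern int  gpio_read_sda(void);
extern void delay_us(uint32_t us);
extern void twi_reinit(void);          // re-run the normal TWI/i2c init

static void i2c_bus_recover(void) {
    // clock SCL until the stuck slave releases SDA (at most 9 pulses)
    for (int i = 0; i < 9 && !gpio_read_sda(); i++) {
        gpio_set_scl(0); delay_us(5);  // roughly 100 kHz half-periods
        gpio_set_scl(1); delay_us(5);
    }
    // generate a STOP by hand: with SCL low, pull SDA low, raise SCL, then raise SDA
    gpio_set_scl(0); gpio_set_sda(0); delay_us(5);
    gpio_set_scl(1); delay_us(5);
    gpio_set_sda(1); delay_us(5);      // SDA rising while SCL is high = STOP
    twi_reinit();                      // hand the pins back to the peripheral
}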

EXPERIMENT 1

This helps bolster my theory. I am able to read values for a much longer period of time if I disable the metronome and trigger the i2c input sampling from an external clock source. The TT will still lock up, but it takes a lot longer and happens more randomly than the near-instantaneous freeze when using the TT’s internal metronome interrupt. See the example video below.

I’m triggering SCRIPT 1 from the clock out of the 0-coast and am sampling its rise/fall generator via TI.IN 1 and pushing that value to CV 1.

I’ve monitored the TXi and it gets and responds to all of the requests for data up until the moment of the freeze.

1 Like

i2c reads need to happen conditionally-- we need to be checking if the i2c write (just prior) succeeded. otherwise the system will lock. should be an easy first fix. but there are other things lurking that are clearly causing trouble.
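
roughly the shape i have in mind, assuming the tx/rx helpers are reworked to report a status - all names here are illustrative, not the real ones:

#include <stdint.h>

// placeholders: checked versions of the ii send/receive helpers, 0 = ok
extern int ii_tx_checked(uint8_t addr, uint8_t *d, uint8_t len);
extern int ii_rx_checked(uint8_t addr, uint8_t *d, uint8_t len);

// read a value back from an expander; returns -1 on any bus failure
static int ii_read_value(uint8_t addr, uint8_t cmd) {
    uint8_t buf[2] = { 0, 0 };

    if (ii_tx_checked(addr, &cmd, 1) != 0)
        return -1;                    // write failed: skip the read entirely
    if (ii_rx_checked(addr, buf, 2) != 0)
        return -1;                    // read failed: report it, don't hang

    return (buf[0] << 8) | buf[1];
}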

2 Likes

So great to see you are having progress with this. I made a “yip” when I saw the notification email. :slight_smile:

Quick question: I’d played around with (and reconfirmed) that if I comment out the tele_ii_tx_now just before the read, it will still lock up immediately during the metro event and eventually during a triggered script. Is this the write you are referring to?

Thanks!

b

1 Like

been looking into this, got a picoscope recently and remembered it does i2c decoding so thought i’d try that. confirmed first it decodes properly with orca remotes. then tried reading CV from ansible. no freezing when triggering manually (red is SDA, blue is SCL):

^ this is the last part, the 2nd byte in the ansible response to a write request which is not shown here. it returns data properly (in this case 0F). then i tried triggering it repeatedly by using WIN+1 to trigger a script and holding the buttons so it would keep calling it, and in a few seconds it froze. the read request from TT looks fine, as does the first byte sent by ansible but on the 2nd byte here is what happens:

looks like in the first graph it properly lets SDA go high after SCL is high to signal the end of transmission but in the 2nd one something pulls SDA back down and it stays down.

i’ll try more scenarios tomorrow - let me know if you want me to check anything specifically, thought i’d post this first in case it proves helpful. i’ve been reading up on i2c and looking at the code over the holidays and so far it does seem to be some sort of race condition… but that’s just a SWAG at this point.

5 Likes

I started to experiment with trying to use a bus pirate board to snoop the i2c bus last night, but I don’t yet understand how to use the bus pirate properly, so it hasn’t yielded valuable results so far. If there is anyone out there with bus pirate experience, I’m all ears.

…I’ve started work on sketching out some new ansible arp ideas (configurable via tt) so I expect to be pounding on the ii code during testing. Hopefully another set of eyes on all things ii will be a positive. Now off to further familiarize myself with the code and the spec.

5 Likes

good data points. i’m on this also this week, but i must admit the issues are not straightforward so any extra opinions and investigation are most welcome

fyi, SDA/SCL pullups:

ansible: 0
trilogy: 0
tt: 10k

1 Like

also just noticed looking at the graphs there is no ACK from TT, not sure it’s related to the issues we’re seeing though, and don’t remember if it ACK'd the first byte or not - should’ve posted the full sequence, will do that once i get home.

1 Like

sorry, just an info dump at this point, going through the code and cross referencing it with the graphs.

CV 5 16383
seems fine, can’t figure out why the address is 20 when judging by the code it should be II_ANSIBLE_ADDR 0xA0


CV 5
also seems fine. TT sends II_ANSIBLE_CV | II_GET, then requests a read, and ansible sends 2 bytes. the only weirdness i see is a delay between the 2 bytes sent by ansible and TT not ACKing the 2nd byte:


CV 6 CV 5
the start is similar to the above, TT sends a request to read CV, ansible returns 2 bytes.

now the interesting part is that after TT receives the 2nd byte (same as doing CV 5) it starts a new transmission to set CV 6, but see how the 1st data package after the address package is stretched and it doesn’t transmit the clock for a while (either that or perhaps it’s ansible stretching the clock):

the behaviour with clock stretching seems pretty consistent.

2 Likes

one more observation: when trying L 5 8 : TR.PULSE I TT does send 4 transmissions. the first seems fine but in the last 3 i’m seeing similar clock stretching. interestingly enough L 5 8 : TR I 1 works (the outputs get set) even though i’m seeing same clock stretching, so might not be a factor in itself. and then if i try TR.PULSE after that outputs 1 and 3 pulse and 2 and 4 stay on.

@ngwese added your patch

pretty reliable read/writes by commenting out lines 95-102 in ansible_tt.c (on the ansible)

void ii_tt(uint8_t *d, uint8_t l) {
	// print_dbg("\r\ni2c:");
	// print_dbg_ulong(l);
	// print_dbg(" ");
	// for(int i=0;i<l;i++) {
	// 	print_dbg_ulong(d[i]);
	// 	print_dbg(" ");
	// }

but, i’m still seeing NACK on all the read requests also (with the analyzer). this is perplexing. i want to trust the ASF, but it’s worrying me.

but, it’s working, with the NACK and everything. but does not feel good.

i can still eventually overwhelm the event queue, but it takes a ton of at-once commands in succession.

hope to have more to report later.

1 Like

@tehn i’m assuming this is the patch you are talking about. i think a bunch of it is not needed so i’ll generate another more minimal PR instead.

i haven’t run into any problems yet writing from tt -> ansible during development. …going to implement a few more commands today which require more invasive changes, then take things out for a test drive and try to break it.


…i had that same feeling while trying to figure out why the usb stack was locking up back a month or so ago…

1 Like

Are the reads working on the TT’s Metro event for you with this change in the Ansible?

looking at the ASF code it appears NACK on reads is intentional to stop the slave from transmitting any more. which seems to go against i2c protocol but makes sense from a practical point of view (apologies if my interpretation is naive, i have a very limited knowledge of i2c).

  twi_masterBufferIndex = 0;
  twi_masterBufferLength = length-1;  // This is not intuitive, read on...
  // On receive, the previously configured ACK/NACK setting is transmitted in
  // response to the received byte before the interrupt is signalled. 
  // Therefor we must actually set NACK when the _next_ to last byte is
  // received, causing that NACK to be sent in response to receiving the last
  // expected byte of data.