Teletype Firmware i2c Debugging

i’m able to fix L 5 8 : TR.PULSE I by using multiple read buffers:

#define I2C_RX_BUF_COUNT 4
uint8_t rx_buffer_index = 0;

uint8_t rx_buffer[I2C_RX_BUF_COUNT][I2C_RX_BUF_SIZE];
static uint8_t rx_pos[I2C_RX_BUF_COUNT];

void twi_slave_rx( U8 u8_value )
{
  if (rx_pos[rx_buffer_index] < I2C_RX_BUF_SIZE) {
    rx_buffer[rx_buffer_index][rx_pos[rx_buffer_index]] = u8_value;
    rx_pos[rx_buffer_index]++;
  }
}

void twi_slave_stop( void )
{
  uint8_t counter = rx_buffer_index;
  if (++rx_buffer_index >= I2C_RX_BUF_COUNT) rx_buffer_index = 0;
  
  process_ii(rx_buffer[counter], rx_pos[counter]);
  rx_pos[counter] = 0;

this proves skipped commands are likely due to the fact that the buffer is overwritten while process_ii is executing. the solution can either be the above or having process_ii make a copy of the buffer data before it starts processing.

still getting freezing with CV 6 CV 5 in M script…

one more thing in the code above - added a check for rx_pos not to go over I2C_RX_BUF_SIZE

2 Likes

i haven’t gotten to that yet.

but as another data point, i can’t get non-metro command to lock up with a non-existent i2c address, ie calling II WW.POS 2 without a WW attached.

@bpcmusic it’d be crazy helpful if you could scope/analyze your extender to see if you get an ACK after a TT read request.

1 Like

well, what. ok. ah.

good find, i was thinking this but couldn’t rationalize it being an issue-- so what this suggests is the I2C interrupt (for rx) is jumping in and plowing the data before the previous interrupt’s process function has finished, augh.

i’ll stress-test with this fix and then jump into the metro issues once it feels solid (i’m keeping the metro issue separate for now)

3 Likes

It’s normal for a master receiver to indicate the end of reception by sending a NACK. It needs to do this so it can take control of the bus and issue the STOP condition. If it sent an ACK instead, the state machine of the transmitter slave may attempt to continue to drive the bus.

3 Likes

makes sense, didn’t see it mentioned in any of the i2c materials i read (admittedly didn’t go deep enough) so wondered if it was ASF interpretation of the protocol.

1 Like

Yeah, the actual I2C spec is kind of vague about it, but it’s mentioned (point 5):

There are five conditions that lead to the generation of a NACK:

  1. No receiver is present on the bus with the transmitted address so there is no device to respond with an acknowledge.
  2. The receiver is unable to receive or transmit because it is performing some real-time function and is not ready to start communication with the master.
  3. During the transfer, the receiver gets data or commands that it does not understand.
  4. During the transfer, the receiver cannot receive any more data bytes.
  5. A master-receiver must signal the end of the transfer to the slave transmitter.
3 Likes

at this point pretty sure the reason something like CV 6 CV 5 locks in Metro script is because it executes op_CV_get which writes to and reads from i2c immediately, then it executes op_CV_set which queues an i2c event. so it’s possible that the next op_CV_get (triggered by the clock interrupt handler) tries to execute while there is a transmission in progress initiated by the i2c event handler, so it ends up corrupting it and leaving TT hanging.

this is confirmed by the fact that if i add if (i2c_waiting_count) return; to tele_ii_rx it doesn’t freeze anymore (but it never seems to read either).

with the current architecture i can’t think of a way to queue reads since it can’t proceed until a read is completed, so the only option seems to make any i2c communications atomic. so i think @ngwese’s solution to disable interrupts while i2c transmission is in progress is probably the only way to handle it. either that or any interrupts should just create events and the actual execution should happen within check_events.

edit: another option is to cache CV values, have op_CV_get return cached values, queue reads as i2c events and have the event handler update the cache. this means that the values are not guaranteed to be up to date - more of an issue for TXi though, since for ansible TT can update its cache when it writes to ansible (unless CVs can change outside of TT - like maybe reading CV values produced by, say, Cycles?). either way, it’ll likely be a tug war between how accurate you want CV reads to be vs the timing accuracy of TT.

5 Likes

ok, how about this:

in libavr32/conf_tc_irq.h

#define SYS_IRQ_PRIORITY       1
#define APP_TC_IRQ_PRIORITY    2
#define UI_IRQ_PRIORITY        2

changed APP_TC to 2 (was 3)

i also removed the tele_tx_ii queue and have it sending immediately, same as tele_tx_ii_now

i’m able to do this:

M

TR 5 STATE 11

run with a metro at 30ms (this basically reads KEY 1 and outputs the state to TR 1). going well so far-- will come back in 20 minutes to see if it’s still alive.

needs further testing. i did see some squirrelly data corruption randomly in some packets (but very uncommonly)

should likely add more checks in the send/receive code in general.

update: ok it all crashed after 40 minutes, but i think the rx queuing for i2c slave (ansible) may help fix the issue-- as ansible was also crashed (which may have in-turn crashed tt)

3 Likes

for the rx_buffer fix do you want me to do a pull request? that was a quick PoC code, could also be tightened up a bit - number of read buffers should be equivalent to MAX_EVENTS, and perhaps each buffer should include a flag indicating whether it was read by process_ii already which could be used by twi_slave_rx to find the next free buffer to write to.

1 Like

if you could, yes-- that’d save me a bit of time. thanks!

2 Likes

Without knowing too much about the specifics of the Teletype code I would suggest that the I2C interrupts should get masked once the reception is complete (I guess that’s in twi_slave_stop ?). Then the contents of the buffer should be copied to a command queue which the main code checks for new commands. That’s how I’ve done it in the Intellijel modules, not just with I2C but any interrupt-driven comms.

If the copy of the buffer is fast enough you might not even need to mask interrupts, though it doesn’t hurt…

3 Likes

i’m using an array of buffers, this way i don’t need to copy, just update the index.

@tehn pull request:

didn’t make too many changes - i was confused about using MAX_EVENTS since process_ii doesn’t use the event queue. 8 buffers seems enough, i tested with a 10ms Metro. also just incrementing the index is the best/easiest way to find a free buffer anyway, so don’t need a separate flag.

2 Likes

I’ll throw it on my (crummy) scope after work today and try to capture the behavior with the TXo and TXi.

Remember that Teletype with the fried i2c that you fixed @tehn? That’s what happened last time I did this (for the same problem). To be fair, wine was involved. I will omit that ingredient this time. :wink:

2 Likes

if you get a chance do you want to give the buffer fix a try as well?

i was out yesterday. will try your pull-- i think that might actually finish the job (along with the re-prioritization)

@bpcmusic it’s less important now-- false alarm. but it’d be interesting to see if the stm32 is the same as the avr32

2 Likes

Regarding this, and @scanner_darkly’s earlier mention about the address seeming strange: On the JF I had to left-shift the address we created in TT by 1bit in order for them to match up in terms of i2c. Not sure whether it’s avr or stm who messed up the 7bit address thing. Apparently stm packs those 7bits into the 7MSBs vs avr’s 7LSBs.

1 Like

working wonderfully over here-- but of course could use some community stress testing. but i packed it up for general consumption:

thank you all so much for helping get to the bottom of this quickly.

5 Likes

Great, thanks a lot for the heavy work and commitment - It’s late in the evening but I will give it a first try now.

:relieved:

Ansible (Levels) freezes as soon as I patch a cable into input 1 (next preset) from Meadowpysics…repeatedly…

:cry:

It also freezes in Cycles when I patch a cable from input 2 to Meadowphysics. Now after freezing a few times the CV output of cycles is heavily stepped again while it has been continuous before (after doing the new update).