So great to see you're making progress with this. I made a “yip” when I saw the notification email. :slight_smile:

Quick question: I’d played around with (and reconfirmed) that if I comment out the tele_ii_tx_now just before the read, it will still lock up immediately during the metro event and eventually during a triggered script. Is this the write you are referring to?

Thanks!

b

1 Like

been looking into this, got a picoscope recently and remembered it does i2c decoding so thought i’d try that. confirmed first it decodes properly with orca remotes. then tried reading CV from ansible. no freezing when triggering manually (red is SDA, blue is SCL):

^ this is the last part: the 2nd byte in the ansible response to a write request (the request itself is not shown here). it returns data properly (in this case 0F). then i tried triggering it repeatedly by using WIN+1 to trigger a script and holding the buttons so it would keep calling it, and in a few seconds it froze. the read request from TT looks fine, as does the first byte sent by ansible, but on the 2nd byte here is what happens:

looks like in the first graph it properly lets SDA go high after SCL is high to signal the end of transmission, but in the 2nd one something pulls SDA back down and it stays down.

i’ll try more scenarios tomorrow - let me know if you want me to check anything specifically, thought i’d post this first in case it proves helpful. i’ve been reading up on i2c and looking at the code over the holidays and so far it does seem to be some sort of race condition… but that’s just a SWAG at this point.

5 Likes

I started experimenting with using a bus pirate board to snoop the i2c bus last night, but I don’t yet understand how to use the bus pirate properly, so it hasn’t yielded valuable results yet. If there is anyone out there with bus pirate experience I’m all ears.

…I’ve started work on sketching out some new ansible arp ideas (configurable via tt) so I expect to be pounding on the ii code during testing. Hopefully another set of eyes on all things ii will be a positive. Now off to further familiarize myself with the code and the spec.

5 Likes

good data points. i’m on this as well this week, but i must admit the issues are not straightforward, so any extra opinions and investigation are most welcome

fyi, SDA/SCL pullups:

ansible: 0
trilogy: 0
tt: 10k

1 Like

also just noticed, looking at the graphs, that there is no ACK from TT. not sure it’s related to the issues we’re seeing though, and i don’t remember if it ACK'd the first byte or not - should’ve posted the full sequence, will do that once i get home.

1 Like

sorry, just an info dump at this point, going through the code and cross-referencing it with the graphs.

CV 5 16383
seems fine. can’t figure out why the address shows as 20 when, judging by the code, it should be II_ANSIBLE_ADDR 0xA0


CV 5
also seems fine. TT sends II_ANSIBLE_CV | II_GET, then requests a read, and ansible sends 2 bytes. the only weirdness i see is a delay between the 2 bytes sent by ansible, and TT not ACKing the 2nd byte:


CV 6 CV 5
the start is similar to the above: TT sends a request to read CV, and ansible returns 2 bytes.

now the interesting part: after TT receives the 2nd byte (same as doing CV 5) it starts a new transmission to set CV 6, but see how the 1st data byte after the address byte is stretched and the clock isn’t transmitted for a while (either that, or perhaps it’s ansible stretching the clock):

the behaviour with clock stretching seems pretty consistent.
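
for reference, here’s roughly what CV 6 CV 5 turns into at the ii level based on the above - a get (write of the command + channel, then a 2-byte read) followed by a set (a single write carrying command, channel and the 16-bit value). the i2c_master_tx/i2c_master_rx names, the 0-based channel indexing, the MSB-first byte order and the throwaway wrapper function are assumptions for illustration only; the authoritative framing is in teletype’s ops and ansible_tt.c:

#include <stdint.h>

// rough sketch only: i2c_master_tx/i2c_master_rx stand in for the real master calls
static void cv6_gets_cv5(void) {
  uint8_t get_cmd[2] = { II_ANSIBLE_CV | II_GET, 4 };  // "CV 5" -> channel index 4 (assumed 0-based)
  i2c_master_tx(II_ANSIBLE_ADDR, get_cmd, 2);          // write phase of the get
  uint8_t rx[2];
  i2c_master_rx(II_ANSIBLE_ADDR, rx, 2);               // ansible answers with 2 bytes
  uint16_t value = (uint16_t)((rx[0] << 8) | rx[1]);   // assumed MSB-first

  uint8_t set_cmd[4] = { II_ANSIBLE_CV, 5,             // "CV 6" -> channel index 5,
                         value >> 8, value & 0xff };   // set to the value just read
  i2c_master_tx(II_ANSIBLE_ADDR, set_cmd, 4);          // the write where the stretched
                                                       // 1st data byte shows up
}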

2 Likes

one more observation: when trying L 5 8 : TR.PULSE I, TT does send 4 transmissions. the first seems fine, but in the last 3 i’m seeing similar clock stretching. interestingly enough, L 5 8 : TR I 1 works (the outputs get set) even though i’m seeing the same clock stretching, so it might not be a factor in itself. and then if i try TR.PULSE after that, outputs 1 and 3 pulse and 2 and 4 stay on.

@ngwese added your patch

pretty reliable read/writes by commenting out lines 95-102 in ansible_tt.c (on the ansible)

void ii_tt(uint8_t *d, uint8_t l) {
	// print_dbg("\r\ni2c:");
	// print_dbg_ulong(l);
	// print_dbg(" ");
	// for(int i=0;i<l;i++) {
	// 	print_dbg_ulong(d[i]);
	// 	print_dbg(" ");
	// }
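	// ... (remainder of ii_tt unchanged)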

but i’m still seeing a NACK on all the read requests (with the analyzer). this is perplexing. i want to trust the ASF, but it’s worrying me.

but, it’s working, with the NACK and everything. but does not feel good.

i can still eventually overwhelm the event queue, but it takes a ton of at-once commands in succession.

hope to have more to report later.

1 Like

@tehn i’m assuming this is the patch you are talking about. i think a bunch of it is not needed, so i’ll generate a more minimal PR instead.

i haven’t run into any problems yet writing from tt -> ansible during development. …going to implement a few more commands today which require more invasive changes, then take things out for a test drive and try to break it.


…i had that same feeling while trying to figure out why the usb stack was locking up back a month or so ago…

1 Like

Are the reads working on the TT’s Metro event for you with this change in the Ansible?

looking at the ASF code, it appears the NACK on reads is intentional, to stop the slave from transmitting any more. this seems to go against the i2c protocol but makes sense from a practical point of view (apologies if my interpretation is naive, i have very limited knowledge of i2c).

  twi_masterBufferIndex = 0;
  twi_masterBufferLength = length-1;  // This is not intuitive, read on...
  // On receive, the previously configured ACK/NACK setting is transmitted in
  // response to the received byte before the interrupt is signalled. 
  // Therefor we must actually set NACK when the _next_ to last byte is
  // received, causing that NACK to be sent in response to receiving the last
  // expected byte of data.

i’m able to fix L 5 8 : TR.PULSE I by using multiple read buffers:

#define I2C_RX_BUF_COUNT 4
uint8_t rx_buffer_index = 0;

uint8_t rx_buffer[I2C_RX_BUF_COUNT][I2C_RX_BUF_SIZE];
static uint8_t rx_pos[I2C_RX_BUF_COUNT];

void twi_slave_rx( U8 u8_value )
{
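  // append the incoming byte to whichever buffer is currently being filled (bounds-checked)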
  if (rx_pos[rx_buffer_index] < I2C_RX_BUF_SIZE) {
    rx_buffer[rx_buffer_index][rx_pos[rx_buffer_index]] = u8_value;
    rx_pos[rx_buffer_index]++;
  }
}

void twi_slave_stop( void )
{
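  // remember which buffer this transmission filled, then rotate to the next one so a
  // new transmission can start while this command is still being processed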
  uint8_t counter = rx_buffer_index;
  if (++rx_buffer_index >= I2C_RX_BUF_COUNT) rx_buffer_index = 0;
  
  process_ii(rx_buffer[counter], rx_pos[counter]);
  rx_pos[counter] = 0;

this strongly suggests skipped commands are due to the buffer being overwritten while process_ii is executing. the solution could either be the above, or having process_ii make a copy of the buffer data before it starts processing.
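
a minimal sketch of the copy-first option for comparison, using the original single rx_buffer/rx_pos (the copy here is done in twi_slave_stop right before calling process_ii, which amounts to the same thing, and assumes process_ii consumes the data synchronously):

#include <string.h>

// snapshot the bytes before handing them to process_ii, so an rx interrupt refilling
// rx_buffer can no longer corrupt the command that is still being interpreted
void twi_slave_stop(void)
{
  uint8_t local[I2C_RX_BUF_SIZE];
  uint8_t len = rx_pos;
  memcpy(local, rx_buffer, len);   // copy first...
  rx_pos = 0;                      // ...then immediately free the shared buffer
  process_ii(local, len);          // now safe to take as long as it needs
}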

still getting freezing with CV 6 CV 5 in M script…

one more thing about the code above - i added a check so rx_pos doesn’t go over I2C_RX_BUF_SIZE

2 Likes

i haven’t gotten to that yet.

but as another data point, i can’t get a non-metro command to lock up with a non-existent i2c address, i.e. calling II WW.POS 2 without a WW attached.

@bpcmusic it’d be crazy helpful if you could scope/analyze your extender to see if you get an ACK after a TT read request.

1 Like

well, what. ok. ah.

good find, i was thinking this but couldn’t rationalize it being an issue-- so what this suggests is the I2C interrupt (for rx) is jumping in and plowing the data before the previous interrupt’s process function has finished, augh.

i’ll stress-test with this fix and then jump into the metro issues once it feels solid (i’m keeping the metro issue separate for now)

3 Likes

It’s normal for a master receiver to indicate the end of reception by sending a NACK. It needs to do this so it can take control of the bus and issue the STOP condition. If it sent an ACK instead, the state machine of the transmitter slave may attempt to continue to drive the bus.

3 Likes

makes sense, i didn’t see it mentioned in any of the i2c materials i read (admittedly i didn’t go deep enough), so i wondered if it was the ASF’s interpretation of the protocol.

1 Like

Yeah, the actual I2C spec is kind of vague about it, but it’s mentioned (point 5):

There are five conditions that lead to the generation of a NACK:

  1. No receiver is present on the bus with the transmitted address so there is no device to respond with an acknowledge.
  2. The receiver is unable to receive or transmit because it is performing some real-time function and is not ready to start communication with the master.
  3. During the transfer, the receiver gets data or commands that it does not understand.
  4. During the transfer, the receiver cannot receive any more data bytes.
  5. A master-receiver must signal the end of the transfer to the slave transmitter.
3 Likes

at this point i’m pretty sure the reason something like CV 6 CV 5 locks up in the Metro script is that it executes op_CV_get, which writes to and reads from i2c immediately, then executes op_CV_set, which queues an i2c event. so it’s possible that the next op_CV_get (triggered by the clock interrupt handler) tries to execute while there is a transmission in progress initiated by the i2c event handler, so it ends up corrupting it and leaving TT hanging.

this is confirmed by the fact that if i add if (i2c_waiting_count) return; to tele_ii_rx it doesn’t freeze anymore (but it never seems to read either).

with the current architecture i can’t think of a way to queue reads, since execution can’t proceed until a read is completed, so the only option seems to be making any i2c communication atomic. so i think @ngwese’s solution of disabling interrupts while an i2c transmission is in progress is probably the only way to handle it. either that, or interrupts should just create events and the actual execution should happen within check_events.
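
to illustrate the atomic option, a minimal sketch assuming ASF’s cpu_irq_save()/cpu_irq_restore(), with i2c_master_tx as a placeholder for the existing blocking call and tele_ii_tx_atomic just a made-up name - caveat: this only works if the blocking transfer itself doesn’t need the masked interrupts to complete:

#include "interrupt.h"  // ASF interrupt utils: irqflags_t, cpu_irq_save()/cpu_irq_restore()

void tele_ii_tx_atomic(uint8_t addr, uint8_t *data, uint8_t l) {
  irqflags_t flags = cpu_irq_save();  // keep the clock/UI handlers from starting
                                      // a second transaction mid-transfer
  i2c_master_tx(addr, data, l);       // placeholder for the existing blocking tx
  cpu_irq_restore(flags);             // bus idle again, re-enable interrupts
}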

edit: another option is to cache CV values, have op_CV_get return cached values, queue reads as i2c events and have the event handler update the cache. this means the values are not guaranteed to be up to date - more of an issue for TXi though, since for ansible TT can update its cache whenever it writes to ansible (unless CVs can change outside of TT - like maybe reading CV values produced by, say, Cycles?). either way, it’ll likely be a tug of war between how accurate you want CV reads to be vs the timing accuracy of TT.
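
rough sketch of the caching idea - cv_cache, queue_ii_read/queue_ii_write and handler_ii_read_done are made-up names for illustration, not actual teletype functions:

#include <stdint.h>

static int16_t cv_cache[4];

int16_t op_CV_get(uint8_t ch) {
  queue_ii_read(II_ANSIBLE_CV | II_GET, ch);  // request a refresh, don't block on it
  return cv_cache[ch];                        // may be one refresh behind
}

void handler_ii_read_done(uint8_t ch, int16_t value) {
  cv_cache[ch] = value;                       // event handler updates the cache
}

void op_CV_set(uint8_t ch, int16_t value) {
  cv_cache[ch] = value;                       // writes keep the cache in sync
  queue_ii_write(II_ANSIBLE_CV, ch, value);   // queued as it is today
}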

5 Likes

ok, how about this:

in libavr32/conf_tc_irq.h

#define SYS_IRQ_PRIORITY       1
#define APP_TC_IRQ_PRIORITY    2
#define UI_IRQ_PRIORITY        2

changed APP_TC to 2 (was 3)

i also removed the tele_tx_ii queue and have it sending immediately, same as tele_tx_ii_now

i’m able to do this:

M

TR 5 STATE 11

run with a metro at 30ms (this basically reads KEY 1 and outputs the state to TR 1). going well so far-- will come back in 20 minutes to see if it’s still alive.

needs further testing. i did see some squirrelly data corruption randomly in some packets (but very uncommonly)

should likely add more checks in the send/receive code in general.

update: ok, it all crashed after 40 minutes, but i think the rx queuing for the i2c slave (ansible) may help fix the issue-- ansible had also crashed (which may have in turn crashed tt)

3 Likes

for the rx_buffer fix do you want me to do a pull request? that was quick PoC code and could be tightened up a bit - the number of read buffers should be equal to MAX_EVENTS, and perhaps each buffer should include a flag indicating whether it has already been read by process_ii, which twi_slave_rx could use to find the next free buffer to write to.
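
a rough sketch of that idea (with the slot rotation done at stop time rather than in twi_slave_rx, which has the same effect; the rx_slot_t/rx_slots/rx_current names are illustrative, MAX_EVENTS, I2C_RX_BUF_SIZE and process_ii as above, U8 is ASF’s byte typedef):

#include <stdbool.h>
#include <stdint.h>

typedef struct {
  uint8_t data[I2C_RX_BUF_SIZE];
  uint8_t len;
  volatile bool in_use;              // held from end-of-rx until process_ii is done
} rx_slot_t;

static rx_slot_t rx_slots[MAX_EVENTS];
static uint8_t rx_current = 0;

void twi_slave_rx(U8 u8_value)
{
  rx_slot_t *s = &rx_slots[rx_current];
  if (s->len < I2C_RX_BUF_SIZE) s->data[s->len++] = u8_value;
}

void twi_slave_stop(void)
{
  rx_slot_t *s = &rx_slots[rx_current];
  s->in_use = true;
  // rotate to the next slot that isn't still being processed; if every slot is busy
  // we stay put and the next transmission overwrites this one (same behaviour as now)
  for (uint8_t i = 1; i < MAX_EVENTS; i++) {
    uint8_t n = (rx_current + i) % MAX_EVENTS;
    if (!rx_slots[n].in_use) { rx_current = n; break; }
  }
  process_ii(s->data, s->len);
  s->len = 0;
  s->in_use = false;
}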

1 Like