I2C questions / discussions

#1

a thread to discuss various i2c related topics. for teletype i2c protocol see this thread: Teletype I2C protocol

(i’m going to try and get away from master/slave terminology here by using leader/follower instead - feel free to suggest alternate terminology!).

i was curious about this comment:

does it mean norns supports multi i2c leader environment? is that something that could be fairly easily ported to libavr32?

4 Likes

#2

i’m sorry, that was a really incomplete/inaccurate statement.

norns software communicates with its hardware peripherals through a mix of i2c, SPI and GPIO. but this is all done through generic linux drivers. the specific parts drivers are implemented as kernel overlays.

the philosophy in norns is to try and use standard *nix interfaces whenever possible. this was a lot of work for our linux wizards to implement but the hope is that it maximizes the usability and customization potential of the system.

but it also means there is very little possible overlap between norns code and libavr32 at the system / HAL level.

0 Likes

#3

specifically regarding multi-leader busses, i don’t think these are supported in the BCM2837 TWI peripheral
[ https://raspberrypi.stackexchange.com/questions/46200/does-the-raspberry-pi-support-i2c-multi-master-configurations ]

and i kind of think probably not in the linux kernel either.

but anyways norns isn’t set up to interface with hardware on that level. the TWI and gpio pins aren’t broken out anywhere. the idea is that all peripheral interaction will be over USB, and for i2c and CV this probably means through the crow (which will use an STM32 part.)

in general, i don’t know if requiring a multi-leader bus is going to be a realistic design pattern. it’s just not that widely supported. that doesn’t mean you can’t create arbitrary networks though - it’s not a problem for a device to switch between follower/leader on the fly, and “multi-master” is only needed when you literally have two devices trying to drive the bus at once. (in a certain digital modular i helped develop in the early 2000’s, there were patterns for what to do when two of a certain device existed on the TWI bus - one would enter leader mode and the other would stay in follower mode.)

so for example, TT and crow could arbitrate who gets to be leader or just be manually told.

i’m sure this will come up soon as work progresses on crow.

3 Likes

#4

thank you, this clarifies things a lot!

re: multi-leader, for teletype there are a couple of main scenarios:

  • 2 leaders both pushing to the same follower - say, teletype and faderbank both sending to er-301
  • followers temporarily switching to leader role to push updates

the 1st scenario just needs both leaders being able to account for the bus being taken by another leader, the 2nd scenario implies that teletype would have to be both a leader and a follower at the same time. is that something the spec defines? i suppose a custom protocol could also be implemented but that means it’d have to be implemented for all the environments we support (such as teensy).

0 Likes

#5

i think in both scenarios it’s a lot more straightforward if the devices with data are followers and get polled. faderbank has to start in follower mode, teletype has to see it and forward its values (ugh, sorry), or relinquish its leader-status.

[quote=“scanner_darkly, post:4, topic:13624”]
a custom protocol could also be implemented but that means it’d have to be implemented for all the environments we support (such as teensy).
[/quote]

well if we’re talking about a protocol on top of i2c, then that’s fine. it will be a little C module. usually “teensy” means arduino.

[quote=“scanner_darkly, post:4, topic:13624”]
both leaders being able to account for the bus being taken by another leader, the 2nd scenario implies that teletype would have to be both a leader and a follower at the same time. is that something the spec defines?
[/quote]

yes, these are situations the multimaster bus mode is designed to address - when devices want to switch between sending and receiving at arbitrary times. the problem is, it’s a lot of extra complexity at the circuit level when you are implementing those peripherals, and so a lot of parts makers just don’t support it, and it’s been sort of a death spiral for that functionality.

a) Being able to follow arbitration logic. If two devices start to communicate at the same time the one writing more zeros to the bus (or the slower device) wins the arbitration and the other device immediately discontinues any operation on the bus.

b) Bus busy detection. Each device must detect an ongoing bus communication and must not interrupt it. This is achieved by recognizing traffic and waiting for a stop condition to appear before starting to talk on the bus.

a) is the one that is a real pain to design for, if i understand correctly
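
for b), here is a minimal sketch of bus-busy tracking in C, assuming something samples both lines often enough to catch every edge. the type and function names are made up for illustration and don't come from any library. it relies only on the standard conditions: START is SDA falling while SCL is high, STOP is SDA rising while SCL is high.

```c
#include <stdbool.h>

/* Hypothetical bus monitor; names are illustrative only. */
typedef struct {
    bool scl;   /* last sampled SCL level */
    bool sda;   /* last sampled SDA level */
    bool busy;  /* true between a START and the following STOP */
} bus_monitor_t;

/* An idle bus has both lines pulled high. */
void bus_monitor_init(bus_monitor_t *m) {
    m->scl = true;
    m->sda = true;
    m->busy = false;
}

/* Feed one new sample of both lines into the monitor. */
void bus_monitor_sample(bus_monitor_t *m, bool scl, bool sda) {
    /* SDA may only change while SCL is high for START and STOP. */
    if (m->scl && scl) {
        if (m->sda && !sda) m->busy = true;   /* START: SDA falls */
        if (!m->sda && sda) m->busy = false;  /* STOP: SDA rises  */
    }
    m->scl = scl;
    m->sda = sda;
}

/* A leader should only issue its own START when this returns true. */
bool bus_monitor_may_start(const bus_monitor_t *m) {
    return !m->busy;
}
```

a real peripheral does this in hardware; a device that joins the bus mid-transfer has not seen the START, which is one reason this check alone is not enough.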

0 Likes

#6

polling would definitely be easier to implement, but i wonder about the performance impact (and the opposite side of that, latency). i2c interrupts are also given the highest priority right now (since it’s not able to recover from i2c failures, so making sure no other interrupts interfere with it) so they’ll mask everything else. could be okay but might also slow things down i guess? since timer interrupts will also be masked.

yeah i meant making changes to the protocol itself, assuming it wouldn’t support such scenarios.

i didn’t realize some of the spec was implemented in the hardware layer! does it mean that a multi leader mode is not feasible at all? or possible but would require reimplementing it in firmware?

0 Likes

#7

Props!

Also, just wanted to say thank you for the effort to bring a bit more thoughtful organizational approach to this expanding world of i2c!

4 Likes

#8

(first - take this all with a grain of salt - it’s been a while since i really looked at it.)

yeah, it’s typical for the arbitration logic to be implemented in silicon, it has tight timing requirements and is hard to do correctly just in software.

for example, last time i checked the Wire library is actually kind of broken for this mode, because it produces an error when a leader loses arbitration on a multi-leader bus. (losing arbitration is part of normal operation on such a bus; the correct action is to immediately go into follower mode - because maybe the arbitration winner is trying to send something to you - wait for a STOP condition, and retry with the same data. you can see right away why it’s a good idea for the peripheral circuit itself to have its own little TX memory queue and logic mechanism.)
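
a rough sketch of that "lose, wait for STOP, retry with the same data" loop in C. everything here is hypothetical: `i2c_port_t`, `try_send` and `wait_for_stop` stand in for whatever the real driver exposes, and the fake bus exists only so the loop can be exercised without hardware. the attempt limit is my own addition so a caller can never spin forever.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { TX_OK, TX_ARB_LOST, TX_ERROR } tx_status_t;

/* Hypothetical driver hooks; not the API of Wire, i2c_t3 or ASF. */
typedef struct {
    tx_status_t (*try_send)(void *ctx, uint8_t addr,
                            const uint8_t *data, size_t len);
    void (*wait_for_stop)(void *ctx); /* stay a follower until STOP is seen */
    void *ctx;
} i2c_port_t;

/* Treat lost arbitration as normal: back off, then resend the same bytes. */
tx_status_t send_with_retry(const i2c_port_t *p, uint8_t addr,
                            const uint8_t *data, size_t len,
                            unsigned max_attempts) {
    for (unsigned n = 0; n < max_attempts; n++) {
        tx_status_t s = p->try_send(p->ctx, addr, data, len);
        if (s != TX_ARB_LOST) return s;  /* success or a real error */
        p->wait_for_stop(p->ctx);        /* the winner may be talking to us */
    }
    return TX_ARB_LOST;                  /* give up and report it */
}

/* Fake bus for demonstration: loses arbitration a fixed number of times. */
typedef struct { unsigned losses_left; unsigned stops_waited; } fake_bus_t;

tx_status_t fake_try_send(void *ctx, uint8_t addr,
                          const uint8_t *data, size_t len) {
    fake_bus_t *b = ctx;
    (void)addr; (void)data; (void)len;
    if (b->losses_left > 0) { b->losses_left--; return TX_ARB_LOST; }
    return TX_OK;
}

void fake_wait_for_stop(void *ctx) {
    ((fake_bus_t *)ctx)->stops_waited++;
}
```

the point of the sketch is only that TX_ARB_LOST is handled separately from TX_ERROR instead of being lumped in with it.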

the uc3a and uc3b used in aleph / teletype work with multi-leader mode just fine. 8-bit avrs are not always great, and arduino libraries are not always great. i don’t know about the STM parts used in mannequins stuff or the NXP parts used in teensy.

the key to making i2c work with just two wires is that every output on both lines (SCL and SDA) must be open-drain (so that a follower can perform “clock stretching” by holding SCL low, and to make arbitration work*). this isn’t standard for GPIOs so you may or may not be able to implement i2c purely in firmware on a random mcu.

*the arbitration mechanism is pretty clever. say two leaders start transmitting at the exact same time. it won’t actually matter until they try to write different states to the SDA line. since the spec says it must be open-drain, any device can always force it to be low - so the leader currently trying to send a 1 can immediately check and see that SDA is low when it “should” be high - at this point that leader loses the arbitration. neatly, no data has been corrupted. (it will also work if one device is just slower - it will hold SDA low and win the arbitration that way.)
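
to make that concrete, here is a tiny simulation in C of two leaders sending at once. it is only a model of the wired-AND behavior (the line is high only if nobody pulls it low), with ideal timing and made-up names; it says nothing about how any real peripheral is built.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Result of a simulated arbitration between two leaders. */
typedef struct {
    int winner;       /* 0 or 1, or -1 if both sent identical data */
    size_t lost_bit;  /* bit index (MSB first) where the loser backed off */
} arb_result_t;

/* Both leaders clock out their bytes MSB first onto one shared SDA line. */
arb_result_t arbitrate(const uint8_t *a, const uint8_t *b, size_t len) {
    arb_result_t r = { -1, 0 };
    for (size_t i = 0; i < len * 8; i++) {
        bool bit_a = (a[i / 8] >> (7 - i % 8)) & 1;
        bool bit_b = (b[i / 8] >> (7 - i % 8)) & 1;
        /* Open-drain: SDA reads high only if both leaders release it. */
        bool sda = bit_a && bit_b;
        /* A leader that sent 1 but reads 0 has lost and stops driving. */
        if (bit_a != sda) { r.winner = 1; r.lost_bit = i; return r; }
        if (bit_b != sda) { r.winner = 0; r.lost_bit = i; return r; }
    }
    return r; /* identical data: neither leader ever notices the other */
}
```

for example, with `a = {0xA0, 0x12}` and `b = {0xA0, 0x10}` the first 14 bits match, leader a then sends a 1 while b sends a 0, so b wins and its bytes are on the bus untouched.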

2 Likes

#9

Definitely interesting discussions to be had. Shared leadership (hah!) is certainly a key design goal for crow, but also with future expansions to the W/ i2c environment. I’ll be spending some time with @tehn next few days, so will both be sure to put some thought cycles into this.

Will have a look into the deeper i2c capabilities of the STM parts next week. I know the library is very happy to switch between tx/rx, but also leader/follower, so it logically implies multi-master support. From the sounds of it though, actual hardware testing is clearly necessary!

//

This said, this conversation is certainly straying from ‘address space and conventions’. Perhaps we should split these into 2 separate discussions.

2 Likes

#10

of course now that i had the realization it makes perfect sense.

@bpcmusic mentioned faderbank causing tt to lock when both are running as leaders on the same bus. i’ll try running a test with 2 teletypes and see if that works.

that’s great to hear! i’ll be happy to help if needed but at this point i really need to familiarize myself with the spec before i’m able to contribute (and saving all my cycles right now to get 2.3 release ready). i’m hoping we could also improve the current libavr32 implementation so that it can recover from failures, this would be a big win regardless of multi-leader support. agreed we should probably split into a separate topic!

as a side note, thank you for going along with the proposed terminology, i’m not sure “leader/follower” is a good replacement but it’s at least an effort to get away from the dreaded m-s (although it’ll be understandably hard trying to use it when all the documentation and online discussions still use the old terminology).

1 Like

#11

by all means split the thread if it makes sense. seems to me important to lock down the device / library capabilities at hand before even thinking about protocols.

if faderbank is using the i2c_t3 library like telex, then this excerpt from their readme seems important:
[https://github.com/stevenvo/arduino-libraries/tree/master/i2c_t3]

I2C_AUTO_RETRY - this define is used to make the library automatically call resetBus() if it has a timeout while trying to send a START. This is useful for clearing a hung Slave device from the bus. If successful it will try again to send the START, and proceed normally. If not then it will exit with a timeout. Note - this option is NOT compatible with multi-master buses. By default it is enabled.

the resetBus() function of this library attempts to “unhang” follower devices by sending a bunch of clocks on SCL. and it will presumably regard a lost arbitration as a “hang.” this is indeed very broken behavior for multi-leader scenarios.

it looks to me like the basic approach is same as Wire lib (which makes sense) - basically waves away multi-leader support - when it’s in leader mode and a tx/rx doesn’t complete as expected, it just raises an error - so presumably expects the programmer to deal with lost arbitration somehow in the error callback. which seems like a mess.

emphasizing: a “multi-leader bus” is just one where all devices on it respect the protocol for bus arbitration. which is a lot of extra work so when it is supported there is usually some kind of mode switch for it in the i2c peripheral or driver or whatever.

for the teletype, i’ll have to look at the ASF i2c drivers again. multi-leader is explicitly supported at the peripheral level so guessing it will be straightforward.

1 Like

#12

I2C is pretty well defined already right?

I2C Primer

0 Likes

#13

There’s a big difference between specification and implementation. Not all specs are equally simple to implement. You don’t do work until you need to…

0 Likes

#14

yeah, not every implementation will implement the spec fully (and the spec defines both “must” and “should”, and some things are left intentionally open for interpretation).

having said that we are mixing topics a bit, I2C layer implementation in hardware/firmware, and teletype protocol on top of that. since most posts here are about the former it’s just easier to move teletype protocol into a new thread:

we can keep this thread for i2c questions/discussions.

1 Like

#15

ok so i did this a bit.

for sure the TWI peripheral in the avr32 performs arbitration correctly. page 236 of the datasheet is extremely informative:

most relevantly, there is an ARBLST bit in the status register that the user is supposed to check after each attempted write. when arbitration is lost, the peripheral saves the data that would have been transmitted and sets that bit. so far so good.

but, looking at the TWI module in the ASF (libavr32/asf/avr32/drivers/twi.h, twi.c), the C-language support seems incomplete. there is a TWI_ARBITRATION_LOST error code defined, but it’s never used and …AFAICT… the relevant status bit isn’t checked (i could be missing something.) so i don’t think it will work “out of the box” without adding to this driver.
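
what "checking the bit" might look like, as a hedged sketch in C. the bit positions below are assumptions for illustration only (use the masks from the part's own header), and the enum and function names are mine, not the ASF's. the function takes a status word as an argument so it can be tried off-target; on hardware you would pass in a single read of the TWI status register, since status flags are often cleared by reading.

```c
#include <stdint.h>

/* ASSUMED bit positions, for illustration; check the datasheet/header. */
#define SR_TXRDY  (1u << 2)  /* transmit holding register ready   */
#define SR_NACK   (1u << 8)  /* follower did not acknowledge      */
#define SR_ARBLST (1u << 9)  /* arbitration lost during the write */

typedef enum { WR_PENDING, WR_READY, WR_NACK, WR_ARB_LOST } write_status_t;

/* Classify one snapshot of the status register after a write attempt. */
write_status_t classify_write_status(uint32_t sr) {
    if (sr & SR_ARBLST) return WR_ARB_LOST; /* check first: not an error */
    if (sr & SR_NACK)   return WR_NACK;
    if (sr & SR_TXRDY)  return WR_READY;    /* safe to load the next byte */
    return WR_PENDING;                      /* keep polling */
}
```

a driver could map WR_ARB_LOST onto the existing TWI_ARBITRATION_LOST code and let the caller retry after the next STOP.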


looking at this stuff, (and for whatever reason it’s been bugging me today), it is all coming back like a bad dream. the problem with multi-leader is that checking the arbitration-lost state is inherently unreliable, because it relies on detecting when SDA is low but should be high. with the open-drain design, it’s all too easy to have a spurious false positive, due to integration times, wonky levels, &c - it works fine if you are just waiting for rising-edge interrupts on SCL/SDA, but there’s a good chance that SDA won’t quite be high enough at the exact moment when you read it after a write.

so when you do try and correctly perform lost-arbitration behavior, you open up a huge headache in the form of this error condition where the leader keeps giving up on writes because of spurious lost arbitration. this is why you get ugly kludges like the i2c_t3 library “bus reset.”

also, the odds of this feature working perfectly are much worse when you have an ad-hoc network of devices connected by DIY cabling and who-knows-what bus terminations…


so long story short, i’m skeptical that a multi-leader system can be reliable, and it’s not going to happen without some work on pretty much every platform. (which doesn’t mean it’s not worth trying.) in real systems, i have never seen it - there is always a workaround involving multiple potential leaders using some message protocol on the broadcast address to figure things out. (that’s probably all i can say without getting into protected IP danger zone.)


@corpusjonsey
it’s been said but i’ll emphasize: this kind of discussion is necessary exactly because the i2c spec is not a law of nature and implementations diverge. in fact the very first post is about the fact that avr32 devices have a more limited address space than the 10 bits dictated by the philips/NXP i2c spec.

there’s a historical reason for this particular mess: i2c is not a public standard, and philips tried to monetize it through licensing in the 80’s and 90’s. many manufacturers, including Atmel, chose to implement very similar but not identical protocols that are typically subsets of i2c by a different name - Atmel calls theirs TWI for instance, but it is the same thing in most respects.

so, this is how we navigate this perplexing territory: by sharing our real-world experience. which, taken collectively, is actually kinda extensive by this point…

8 Likes

#16

thank you - i feel like each one of your posts is saving me two weeks of research. really glad we have a thread to accumulate all this info.

2 Likes

#17

Was doing some ref manual scanning, and good news on the STM side is that multi-leader is implemented on the hardware level. The documentation suggests the handling of a lost arbitration is automatic and even if the losing-leader is the addressee of the winning command, it will still ACK and be able to receive. On the library side it’s not immediately apparent to me how such an arbitration loss is handled, but I’m confident it can be dealt with after we have a good test bed for it.

Personally I’m planning to get into the inter-W/ communication over the next 2-3 weeks, so I’ll use it as an opportunity to test a nice closed-system version. I figure getting 2 identical chips (and firmwares) to share the bus is a safer starting point than the full TT ecosystem.

Will report back, but if all fails a broadcast-leadership-request approach seems like a good backup plan.

4 Likes

#18

i was wondering about this, since we’d want to support this case if we go with the push workflow (which i think is still preferable to pull).


so, does arbitration only happen when 2 leaders somehow start at the same time? from what i’m reading, normally devices should be able to detect when the bus is busy (i assume either by tracking a start condition or any activity on the bus until a stop is received). how reliable is such detection? edit: sounds like there are some additional conditions that need to be accounted for: http://www.robotroom.com/Atmel-AVR-TWI-I2C-Multi-Master-Problem.html

i’m thinking, if it’s just the case when 2 leaders somehow start simultaneously, wouldn’t it be easier to address arbitration by both leaders issuing a stop and withdrawing and then trying again after a randomly selected interval?

i think it’s also worth thinking about it within the context of overall reliability, which i guess comes down to 2 scenarios. the first is some device stretching the clock indefinitely due to some bug in implementation - this is probably not recoverable without implementing a timeout in clock stretching in all devices. the second is some device pulling down SDA and missing the next clock, which creates a problem where the leader needs to somehow differentiate between a stuck follower and an arbitration in progress. that could also be addressed by a leader detecting it and issuing a stop (which hopefully will reset the follower) and then trying again at a later point.
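
a clock-stretching timeout could be as small as this sketch in C. it assumes the leader has already released SCL and can read the line back; `read_scl_fn` and the fake follower are hypothetical stand-ins so the logic can run without hardware, and the poll count is a placeholder for a real time limit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical hook that returns the current level of SCL. */
typedef bool (*read_scl_fn)(void *ctx);

/* After releasing SCL, wait for it to actually go high.
 * Returns false if a follower is still holding it low after max_polls. */
bool wait_scl_released(read_scl_fn read_scl, void *ctx, uint32_t max_polls) {
    for (uint32_t i = 0; i < max_polls; i++) {
        if (read_scl(ctx)) return true;  /* follower let go, carry on */
    }
    return false;                        /* treat as a hung bus */
}

/* Fake follower for demonstration: holds SCL low for a number of polls. */
typedef struct { uint32_t polls_low; } fake_follower_t;

bool fake_read_scl(void *ctx) {
    fake_follower_t *f = ctx;
    if (f->polls_low > 0) { f->polls_low--; return false; }
    return true;
}
```

what the leader does after a `false` result (issue a stop, report an error, retry later) is a separate decision; the sketch only makes sure the wait itself ends.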

0 Likes

#19

yes. and as described above, the spec is really quite clever. it assumes the worst case: that not only do two masters start at precisely the same instant, but they send data that is bit-for-bit identical, up to some bit. as soon as the data differs on one bit, or the devices drift out of sync, then the device sending 0 (or the slower device) wins arbitration.

the problem is the gulf between the spec and the real world. b/c of the open-drain requirement, the unspecified bus termination circuit, potentially long cable runs, &c, i2c clock and data edges are never very sharp. plus, many devices are designed to use either 5v or 3.3v high values. it all adds up to this:

(with variations) being a very, very common scenario.

this general business of “randomly retrying” is hard to get right and has no deterministic guarantees of actually working for every scenario. so engineers are correctly skeptical of such approaches. you could keep “retrying” forever and if you’re unlucky then your device is stalled.
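
for reference, the usual shape of a randomized backoff, sketched in C. all names and numbers are illustrative assumptions. the random window doubles per attempt up to a cap; pairing it with a hard limit on attempts does not make it deterministic, it only turns an endless stall into an error the caller can see.

```c
#include <stdint.h>

/* Small xorshift PRNG so the sketch does not depend on rand().
 * The state must be seeded with a nonzero value. */
uint32_t xorshift32(uint32_t *state) {
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Delay before retry number `attempt` (0-based): a random value in
 * [0, window), where window = base_us * 2^attempt, capped at cap_us. */
uint32_t backoff_delay_us(uint32_t *rng, unsigned attempt,
                          uint32_t base_us, uint32_t cap_us) {
    uint32_t window = base_us;
    if (window > cap_us) window = cap_us;
    for (unsigned i = 0; i < attempt; i++) {
        if (window > cap_us / 2) { window = cap_us; break; } /* no overflow */
        window *= 2;
    }
    if (window == 0) return 0;           /* avoid dividing by zero */
    return xorshift32(rng) % window;
}
```

two leaders would need different seeds (a device address or serial number, say), otherwise they pick the same delays and collide again every time.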


anyways, i think this is where we’re at so far, please correct me:

  • avr32 fully supports multi-leader arbitration in silicon. but the ASF driver doesn’t support it out of the box, AFAICT.

  • i’m assuming teensy parts also fully support it in silicon, because they are NXP parts and NXP owns i2c! but the i2c_t3 library definitely won’t work with multiple leaders out of the box, and i’m going to go out on a limb and guess that it’s not a simple matter of changing some #defines.

  • sounds like the STM situation is similar. i think most of these libraries just punt on this feature because it’s rarely used, has a lot of failure modes, and is very sensitive to circuit design decisions. (in other words, it’s a tech support nightmare for library providers.)


btw, it looks like ChibiOS has particularly full-fledged support for these features. might be worth taking a look…

1 Like

#20

retrying at a random interval would work with the assumptions that the bus is normally not too busy, each communication is fairly short, and all leaders are capable of properly detecting when the bus is busy. it feels like a simpler way to deal with arbitration than the official mechanism, which i agree is smart in that it keeps the winning communication uncorrupted without the need to resend, but relies on other really smart bits such as the losing leader still being able to respond to a communication that is already in progress (assuming that’s who the winner is trying to communicate with).

i realize though this and smarter ways were likely considered and for various reasons abandoned - and i appreciate you taking the time to offer detailed explanations.

random retrying wouldn’t address hanging devices holding the lines in any case, which i still think is the main problem. i’m much more surprised the above libraries don’t seem to support graceful recovery.

0 Likes