Hey @tehn - is the substitution that I suggested above proper?

I’ll give it a test Sat morning to see if it helps. Without the schematic, I’m a little nervous to just start arbitrarily trying things. :wink:

Thanks!!!

b

I’ve just been playing with my TXi and TXo. Initially I was getting lockups on non-existent addresses (via an accidental TI.IN 0). I had the bus board connected with 4 longish cables. Only 1 of the cables was connected to the TXi and then daisy-chained to a TXo; the other 3 were dangling loose, to be connected later.

Once I removed the bus board and connected directly to the TXi and onwards to the TXo, the crashes stopped. Reading through this thread, I’m guessing the increased capacitance of all the extra wires stopped the SDA line from being pulled high in time, so the missing acknowledgement couldn’t be interpreted as a NACK. (It’s possible I’ve got that all jumbled up though!)


Anyway if I could try and summarise where we’re at…

  1. We should probably add some sort of watchdog timer for when things go south.

  2. Dealing with an I2C timeout.
    What are the scenarios for this occurring? Is it just misbehaving devices and NACK failures with incorrect pull-ups? If so, are these really recoverable from? Or should we instead BSOD (or rather YSOD) with a message indicating what happened, perhaps with the last I2C packet sent. (A rough sketch of what a timeout wrapper might look like is below.)

  3. Pull up resistor value.
    Looks like there are 2 ways to go about this: in hardware (i.e. a new bus board) with either a configurable resistance or a universal value, or in software via each MCU’s configurable pull-ups.

Are they the 3 major points?
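On point 2, something like the following is roughly what I had in mind for a timeout wrapper. It’s totally untested and the names (i2c_start_write, i2c_busy, i2c_bus_recover, I2C_TIMEOUT_LOOPS) are placeholders, not the actual libavr32/ASF calls - the idea is just that a transfer gets a bounded wait instead of blocking forever, and on timeout we attempt a bus recovery and report failure to the caller:

```c
#include <stdbool.h>
#include <stdint.h>

// Placeholders -- not the real libavr32 API, just stand-ins for this sketch.
void i2c_start_write(uint8_t addr, const uint8_t *data, uint8_t len);
bool i2c_busy(void);
void i2c_bus_recover(void);

#define I2C_TIMEOUT_LOOPS 10000  // tune so this corresponds to a few ms at the CPU clock

// Returns false if the transfer never completed and the bus had to be recovered.
bool i2c_write_with_timeout(uint8_t addr, const uint8_t *data, uint8_t len) {
    i2c_start_write(addr, data, len);       // kick off the transfer
    uint32_t spins = 0;
    while (i2c_busy()) {                    // wait for completion...
        if (++spins > I2C_TIMEOUT_LOOPS) {  // ...but give up if it never finishes
            i2c_bus_recover();              // try to free a stuck bus, then re-init
            return false;                   // caller can skip or retry the command
        }
    }
    return true;
}
```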

here is how i would summarize it:

  • the hardware side of things - capacitance / noise / pull up values. from the article posted it sounds to me like the cable length / noise shouldn’t be a problem for our typical usage, we’re not talking long distances here, maybe 1ft at the most. a proper pull up value is definitely important, as i could get much more reliable results when using TXo firmware with the internal pull up resistor enabled. but instead of guessing perhaps we could just calculate / measure what it should be and start using it? assuming one value would work for different combinations of modules. but i think it would help greatly if we could just solve the hardware side of things first.

  • i2c libraries / recovering from corrupted communications - it looks like the avr library doesn’t have provisions for this. other libraries seem to have built-in mechanisms for detecting this and recovering from it. so we likely need to either use a different library for i2c or modify the existing one: add a timeout and a recovery mechanism (which in itself is not that straightforward, from what i read - see the sketch after this list). as a side note, it seems i can lock it up faster when doing a lot of reads from ansible than from TXi, and i think TXi uses a different i2c library - perhaps we can look at what’s different in the implementation.

  • the firmware code itself that uses i2c could be strengthened. this was the fix i did for ansible: increasing the number of read buffers, which prevents ansible from trying to process a command while the command itself is being overwritten by the next i2c transmission. some additional boundary checking can also be added to the remote command implementation.

  • even with everything fixed we can still run into scenarios where scripts put too much demand on i2c. if you have a 10ms metro script that does a lot of i2c reading/writing that can’t finish in 10ms what should it do? i think this should just be accepted as a limitation of the system (interesting question though, if a trigger comes in while a script is still executing should we ignore it? interrupt the script and start it again? make this a configurable option?).
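for reference, the “standard” recovery mechanism mentioned in the second bullet is usually described as: take the pins back as GPIO, clock SCL until the stuck slave releases SDA, issue a STOP, then re-initialise the peripheral. a very rough sketch - untested, and every helper name here is a placeholder rather than a real libavr32 call:

```c
// rough sketch only - helper names below are placeholders, not the actual libavr32 API
void twi_pins_to_gpio(void);   // SCL and SDA driven as (open-drain) GPIO
void twi_pins_to_periph(void); // hand the pins back to the TWI peripheral
void scl_write(int level);
void sda_write(int level);
int  sda_read(void);
void delay_us(int us);
void twi_reinit(void);

void i2c_bus_recover(void) {
    twi_pins_to_gpio();

    // a slave stuck mid-transfer holds SDA low; up to 9 clocks lets it
    // finish shifting out whatever byte it thinks it is still sending
    for (int i = 0; i < 9 && !sda_read(); i++) {
        scl_write(0);
        delay_us(5);
        scl_write(1);
        delay_us(5);
    }

    // generate a STOP condition: SDA low -> high while SCL is high
    sda_write(0);
    delay_us(5);
    sda_write(1);
    delay_us(5);

    twi_pins_to_periph();
    twi_reinit();  // re-initialise the TWI peripheral so normal transfers can resume
}
```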


Sounds like a good plan. Who here is able to do that? I’m totally out of my depth when it comes to the electrical stuff.

we need to reduce the pullup value in the case of so many modules it seems. it’s 10k, fixed, on the TT right now.

activating the internal pullups does not substantially bring down the resistance, since the internal pullup is basically 100k or so (i don’t remember the exact number) in parallel with the existing value. i’m not sure how much the stm32 pulls up. of course, if there are a bunch of modules all pulling up together, each additional pull lowers the total pullup resistance further.

we need the total pullup values to be around 2k i think-- given the possibility now of adding so many.
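(rough numbers, just to illustrate the parallel-resistor math: with the TT’s fixed 10k alone, adding one module’s ~100k internal pullup gives 1 / (1/10k + 1/100k) ≈ 9.1k, which is why it barely helps. to get from 10k down to ~2k you’d need roughly an extra 2.5k in parallel, since 1 / (1/10k + 1/2.5k) = 2k. the i2c spec’s rule of thumb is Rp(max) ≈ t_r / (0.8473 × C_bus), so the more modules and cable capacitance on the bus, the lower the pullup needs to be.)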

i’ll try building an ansible firmware (and tt) with all the pullups on, and see if there is improvement.

Here is what I experience:

For 1 Teletype + 1 TXo + 1 TXi, things are solid. You can hammer it hard and you can type incorrect addresses without locking up the i2c bus. Totally fantastic!

For a configuration of 1 Teletype + 4 TXo + 2 TXi + 1 Ansible without any changes to the i2c resistance, you will be able to send commands to the TXo and Ansible with no problems - as long as your addresses are correct. If you type an incorrect address, the Teletype will hang. If you try reading from the Ansible or TXi, it will work sometimes - but ultimately lock up (especially if you are reading from the M script). Crap.

I was able to make that massive configuration quite stable by activating the internal pull-ups on the TXi closest to the Teletype (and only that device). Then, it was back to the performance of the small configuration. No lock ups on writes to unknown addresses and no lock ups on reads. It is awesome!

In the “stable” states listed above, I could pound the hell out of the ecosystem in various unreasonable ways. You would start to see the throughput limitations in the extremes and could get things to start being wonky when you are trying to send and retrieve at a rate that is faster than the Teletype and bus can service. I agree with @scanner_darkly above that this should just be accepted as a limitation of the system.


I’m happy to turn on the pull-ups on the TT and see how that performs in the two scenarios. This would be the easiest solution as no one would have to modify their Teletype, install some sort of weird dongle or install different versions of the TX firmware depending on their bus configuration.

To do that, I’d love to have the answer to which to turn on. The values that @tehn provided above aren’t known by the compiler (PA09 and PA10). I’m fairly certain he meant A09 and A10, but I don’t know the pin configuration and am hesitant to just start trying stuff without that information.


Thanks so much for everyone’s interest in this. :slight_smile:

added pullups on by default for TT, pushed libavr32

here’s a hex (no presets) with pullups enabled

teletype-1.4.0-pullups.zip (80.0 KB)


If A09 and A10 don’t work, it might be AVR32_TWI_SDA_0_0_PIN and AVR32_TWI_SCL_0_0_PIN.

Those values are defined in avr32/include/avr32/uc3b0512.h located wherever you’ve got avr32-gcc installed
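If it helps, enabling them would presumably look something like this - untested, and exactly where it belongs in the TT init code is a guess; i2c_enable_internal_pullups is just a name I made up, while gpio_enable_pin_pull_up is the ASF gpio call:

```c
#include <gpio.h>  // ASF GPIO driver (provides gpio_enable_pin_pull_up)

// guess: call this once during init, before the TWI peripheral is configured
static void i2c_enable_internal_pullups(void) {
    gpio_enable_pin_pull_up(AVR32_TWI_SDA_0_0_PIN);
    gpio_enable_pin_pull_up(AVR32_TWI_SCL_0_0_PIN);
}
```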


It does not make a difference regarding clocking earthsea from teletype via i2c.

Don’t know if I got this right, but TR.TIME 10 100 still freezes teletype.

given that you have so many devices on your bus, it’d be helpful if you tried testing some commands with fewer devices (ie, only two ansibles) to see if they then work, to confirm that many-devices is the issue at hand for you.

I’ll give that firmware change a spin when I land in LA late tonight. Thanks for helping!!

b

In my crazy configuration, the firmware posted for 1.4 above will hang when calling TR for nonexistent outputs. That is without any of my expander pull-ups being active.

Dang.

ok, so it’s worse it seems, which makes no sense.

i’ll try pulling on the ansible and seeing how that goes.

@tehn
I am not sure if it’s worse or better now - I just experienced this:

First try with the new firmware, same as yesterday: TR.PULSE to a non-existent address freezes teletype. (just tt and two Ansibles)

Then I went back to the previous firmware and tried again, with the same outcome. If I shift one of the Ansibles into the requested address range via the button combination, the black teletype immediately starts again w/o a power cycle.

NOW I reflashed the new firmware and happily shifted one of the Ansibles across the address ranges, and no freeze! Even when TR.PULSE reaches for unknown addresses. The only difference from the first setting, apart from down- and upgrading, was manually shifting the address range of one Ansible.

Then I went back to the firmware before the new one and it still works: TR.PULSE in a metro script to all address ranges, and switching one Ansible back and forth through all ranges, with no freezes, just different TR outs flashing.

So is it that you have to use the address range adjustment once on the Ansible and then it can’t break anymore?

:open_mouth:

Keep in mind - I have 7-8 devices on my i2c bus. It is pretty extreme.

I went back to the latest 1.4.1 firmware without pull-ups and had the same problem. I then enabled pull-ups for just one expander (the closest TXi) and my bus is back to 100% stable.

[EDIT]

Just created a script that reads from 8 PARAMs and 8 INs, writes those values out to 16 different CV outputs and pulses 16 TRs. Running the script at 10ms intervals and having no issues. Manually calling to non-existent TR outputs while running does not lock the Teletype.


@Leverkusen - once, I accidentally had the pull-ups active on a couple of the expander modules. In that case, I found that the calls out would become unreliable. The TT wouldn’t lock - but some modules would stop responding. Resetting the firmware back to my “good” configuration made everyone happy.

…okay, more strange findings:

First: the older firmware does not work very well over a longer period, especially when you plug the arc into one of the Ansibles.

The new firmware seems to be more stable here. Teletype freezes, though, when the script (L 1 17 : TR.PULSE I) is still firing to an unknown address range while reading out the now active Cycles on that Ansible at the same time. Just switching to Levels unfreezes it again (still addressing an invalid address range with no problems).

Now to the strange finding: Switching Ansible to address range four (pressing preset and both keys) shows erratic behaviour. On the first trial with the above-mentioned script, each trigger fires only one of the trigger outs, then the next one, and so on - not all at the same time as on teletype and the first Ansible, but one after the other.
On a second trial to reproduce this, with a loop range where I goes up to 20, trigger outs 18, 19 and 20 fire, but 17 does not…

Unusual.

I can confirm now that everything that worked better with the new firmware and/or just two Ansibles no longer does with Earthsea and Meadowphysics back in the case. A TR.PULSE to an unknown address freezes teletype, as does a little more i2c activity.

:disappointed_relieved:

I’ve just tried a TXi firmware build with the pull up enabled that @bpcmusic kindly supplied me. That has fixed the lock ups on non-existent addresses. I haven’t given it a serious work out yet. Next up I’ll have a go at wiring up the remaining Monome modules (Trilogy + Ansible) and see how we get on.

Will new Teletypes be built with 2K instead of 10K?

Otherwise, what are the options for getting the overall resistance down to 2K? A hardware dongle with access to a 3.3V line? Or can we get away with enabling a few additional pull-ups in modules? (These would all be in parallel, right?)

Do we know the pull-up resistor values? I think for the Teletype and Ansible it is ~15K and for the Trilogy it is ~19K (source: page 37). What about the Teensy?

i could do that, but there’s still a pile of TT stock, and it’d be better to find a solution that works for all existent TT’s out there rather than having a split solution.

the test tt w/ pullups build was wrong. here’s a corrected version, which works well for me with 2 ansibles. anyone with a huge chain, please test this:

EDIT: here’s a version with the newest 1.4.1 features, i neglected to upstream merge first:

teletype-1.4.1-pullups.zip (83.1 KB)


So 10K in parallel with 15K would be 6K right? (oblig. xkcd)

Hopefully I’ll have time to give the new TT firmware a test tomorrow.
