(Teletype) Firmware build troubles with 2.2 (Fixed!)

I’ve been encountering problems building the firmware that I think might be related to code space.

While trying to track down why one commit wouldn’t run on the hardware, I started to add debug statements to the code to see what was happening. As I added debug statements, less of the code would actually run, to the point where it didn’t enter main() at all.

I suspect that something’s going on with linking, but I don’t know where to start. There’s no way we’re pushing the boundaries of flash storage yet, are we?

so it executes some code and then it just stops? and it stops in different places as you add more debug code? could be something very simple, i had a silly issue where in scene init i had an infinite loop… perhaps you have something similar and its execution is delayed, which would explain why it stops in different places?

don’t know enough to comment about linking but have you tried running make clean and then building again?

Yeah I always make clean. Adding debug statements eventually stopped main from being called.

Very strangely, rolling back a few commits and manually reapplying the diff fixed it so… ?

i did some poking around in the linker script, objdumps and flash module for the first time in a few years. posted some observations in the other thread. deleted them there and reiterating here in case it's helpful, or anyone can clarify / point out errors.

running avr32-size, my understanding is:

  • the .text section is the raw code size, the .data section includes constants in flash, and the .dec and .hex outputs are two different representations of the final amount in flash after optimization. [is this true?] right now that number is 0x66be0.

(updates for accuracy)

the total flash size for the uc3b0512 is (as the name implies) 0x80000 - you can see this in the linker script. (libavr32/src/link_uc3b0512.lds)

looks like the output of avr32-size includes stuff like the scene storage struct that is placed in nvram with a gcc attribute. (i guess

increasing the scene storage too much gives you a link error:

avr32/bin/ld: address 0x8009db02 of teletype.elf section `.flash_nvram' is not within region `FLASH'

that address 0x8009db02 is greater than 0x80000000 (start of flash) + 0x80000 (size of flash) = 0x80080000. we get this nice error because we used the nvram attribute and let the linker figure things out, instead of using the flash controller directly.
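to make the arithmetic behind that error concrete, here is a minimal sketch of the range check. the origin and length are the values quoted above; in_flash_region is a made-up helper name for illustration, not something from libavr32 or the linker.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values quoted in the post: flash starts at 0x80000000, size 0x80000. */
#define FLASH_ORIGIN 0x80000000u
#define FLASH_LENGTH 0x00080000u

/* Hypothetical helper: true if addr lies in [origin, origin + length). */
static bool in_flash_region(uint32_t addr) {
    return addr >= FLASH_ORIGIN && (addr - FLASH_ORIGIN) < FLASH_LENGTH;
}
```

in_flash_region(0x8009db02) is false because the region ends at 0x80080000, which is the condition the linker is reporting.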

it's possible that there are other “loopholes” at work that i’m missing. but i think we can trust the avr32-size output, which means there is still ~100k of flash available as of now (commit #8285656 on burnsauce’s fork.)

edit: i was wrong, that total includes .bss and .data, which aren’t in flash. so i think we actually have ~128K.

I wrote up some stuff on AVR32 ram and rom usage a while back. Basically a brain dump of the stuff I was trying to figure out at the time:

http://samdoshi.com/post/2016/10/avr32-ram-and-rom-usage/

thanks sam, i knew i was forgetting / misrepresenting some important stuff. that is a high-quality brain dump.

important clarification for anyone following along:

  • NVRAM isn’t used for text or rodata, it’s defined as a dedicated flash region for arbitrary usage. the linker script defines its size with __flash_nvram_size__ (default 256K, half of total flash) and places it at the end of the flash section. (this could always be changed if necessary.) nothing will go in nvram unless we specify it with an attribute or abuse the flash controller. on teletype i think it’s only used for scene storage.
  • the summary line (berkeley-style size report) includes both flash and allocated INTRAM. so it is actually not that useful if we just want to know how much space remains for code and rodata.

here’s example output from the build script:

SIZE    teletype.elf
teletype.elf  :
section                size         addr
.reset               0x200c   0x80000000
.rela.got               0x0   0x8000200c
.init                  0x1a   0x8000200c
.text               0x18718   0x80002028
.exception            0x200   0x8001a800
.fini                  0x18   0x8001aa00
.rodata              0x5550   0x8001aa18
.dalign                 0x4          0x4
.ctors                  0x8          0x8
.dtors                  0x8         0x10
.jcr                    0x4         0x18
.got                    0x0         0x1c
.data                0x1000         0x1c
.bss                 0x3dec       0x1044
.heap               0x111d0       0x4e30
[debug, comment omitted here ]
.stack               0x2000      0x16000
.flash_nvram        0x2ed82   0x80040000
.debug_ranges        0x5b38          0x0
Total             0x2b493f9

   text	   data	    bss	    dec	    hex	filename
0x1fea6	 0x1014	0x45d42	 420860	  66bfc	teletype.elf

(these reports are produced by calling avr32-size -Ax and -Bx respectively on teletype.elf.)
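as far as i know, in the berkeley-style report the dec and hex columns are the same number (text + data + bss), just printed in decimal and in hex. a quick sketch to check that against the figures above; berkeley_total is a made-up helper name for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: the Berkeley-style total is text + data + bss. */
static uint32_t berkeley_total(uint32_t text, uint32_t data, uint32_t bss) {
    return text + data + bss;
}
```

with the values from the report (0x1fea6, 0x1014, 0x45d42) this gives 0x66bfc, i.e. 420860 in decimal, matching both columns. since bss lives in RAM, the total overstates what is actually in flash.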

i’d expect that the free space left to us for instructions and rodata can be found by taking the ending address of the last flash section (.rodata, so 0x8001aa18 + 0x5550 = 0x8001ff68) and subtracting it from the starting address of nvram (0x80040000), which gives 0x20098 or ~128K. my last estimate was off because it included .bss and .data, which roughly adds up.

does that calculation seem right now? if so, it would probably be useful to have the makefile perform it somehow.
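the calculation above can be sketched in a few lines of C. this only restates the arithmetic under the assumption that .rodata is the last section before nvram; flash_free_bytes is a hypothetical helper, not something in the build scripts.

```c
#include <stdint.h>

/* Hypothetical helper: bytes left between the end of the last code/rodata
 * section and the start of the NVRAM region. Returns 0 if they overlap. */
static uint32_t flash_free_bytes(uint32_t last_addr, uint32_t last_size,
                                 uint32_t nvram_start) {
    uint32_t end = last_addr + last_size; /* first address past the section */
    return nvram_start > end ? nvram_start - end : 0;
}
```

flash_free_bytes(0x8001aa18, 0x5550, 0x80040000) returns 0x20098 (131224 bytes), which is the ~128K figure.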

oh, and an outstanding question i have is whether the linker will complain if we add too much stuff to .text, .data and/or .rodata. i’m 99.9% sure that it will but don’t have the patience to test it right now.

Have you tried removing some bits of unrelated code, and seeing if things start working again?

I solved the problem by creating diffs, rebasing backwards, and applying the diffs. There might have been something wrong in the files that were checked out. I was having issues with op_enum.h. I always make clean and run the enum py script.

Figure that it was something wrong with my git cache.

Alright, I’ve hit the same snag again. The codebase that I once built and released as Alpha 3 now does not operate on the module when I rebuild it.

Steps to reproduce:

Behaviour:

  • Firmware uploads, but teletype never starts

Notes:

  • I’ve tried this with 2 different toolchains, same result
  • I once built this firmware, as evidenced by the release of Alpha 3

Do you have the FTDI serial up and running? If so do you get any output at boot?

Naked, I get the boot message “// teletype” and the “clearing flash” message.

So I started adding debug statements to see what’s happening. I eventually tracked it down to handler_ScreenRefresh().

Here’s where I get stuck. The bottom of screen_refresh_live():

char s[8];
itoa(screen_dirty, s, 10);
print_dbg("\r\nlive mode dirty: ");
print_dbg(s);
return screen_dirty;

The serial console displays that dirty is 255. Back in handler_ScreenRefresh():

    switch (mode) {
        case M_PATTERN: screen_dirty = screen_refresh_pattern(); break;
        case M_PRESET_W: screen_dirty = screen_refresh_preset_w(); break;
        case M_PRESET_R: screen_dirty = screen_refresh_preset_r(); break;
        case M_HELP: screen_dirty = screen_refresh_help(); break;
        case M_LIVE: screen_dirty = screen_refresh_live(); break;
        case M_EDIT: screen_dirty = screen_refresh_edit(); break;
        case M_SCREENSAVER: screen_dirty = screen_refresh_screensaver(); break;
    }

    print_dbg("\r\nscreen_dirty: ");
    itoa(screen_dirty, s, 10);
    print_dbg(s);

I never see this debug line appear. Additionally, no further events are called.

you can use print_dbg_ulong to avoid having to use itoa.
what happens if you comment out screen_refresh_live?

If I comment out screen_refresh_live, everything works in that I can change to other modes and teletype operates correctly. Obviously live mode is never displayed.

No change in behaviour on the ulong change.

So… compiler bug? Linker bug? Stack overflow?

The variable display is the newest bit of the live screen refresh, try commenting that out?

If that does fix the issue, I’d suggest going through the pointer arithmetic in that bit of code line by line.

Looking at the code, is vars_prev uninitialised? (edit: doesn’t matter)

yeah, didn’t mean to suggest changing to print_dbg_ulong would fix the issue, just that it’s easier to use.

really weird, i’m doing much heavier updates in screen_refresh_live for the grid visualizer and haven’t seen any issues.

vars_prev is lazily initialized (D_LIST will be true when the variable display is enabled for the first time).

show_vars will be false on first run, so none of that code is relevant.

Commenting it out doesn’t work.

This bit has changed recently too:

        else if (show_welcome_message) {
            strcpy(s, TELETYPE_VERSION ": ");
            strcat(s, git_version);
            show_welcome_message = false;
        }

Try changing it to:

        else if (show_welcome_message) {
            strcpy(s, TELETYPE_VERSION ": ");
            // strcat(s, git_version);
            show_welcome_message = false;
        }

edit: I reckon it is this, it’s trying to include your tag in the version string, and it’s ending up too long.
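Whether or not this turns out to be the cause, a length-checked version of that string build is easy to sketch. This assumes snprintf is available in the firmware's C library, and format_version is a hypothetical helper name, not existing teletype code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes "<version>: <git>" into dst, truncating if
 * it does not fit. Returns true only if the whole string fit. */
static bool format_version(char *dst, size_t dst_size,
                           const char *version, const char *git) {
    int n = snprintf(dst, dst_size, "%s: %s", version, git);
    return n >= 0 && (size_t)n < dst_size;
}
```

Called as format_version(s, sizeof(s), TELETYPE_VERSION, git_version), an over-long tag gets cut off at the end of the buffer instead of writing past it.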

No luck, same behaviour.

I’m trying to look through the disassembly but it seems that the source lines indicated in between the assembly code don’t correlate. Is that normal?

I found it.

Defined int i in one place and size_t i in the same function.

Would have thought that this would not have been a problem given my understanding of variable scope, but I guess I’ve run into this before.
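The actual function isn't shown here, so this is only an illustration of one way an int i and a size_t i can coexist in a single function: an inner declaration shadowing an outer one. The function name is made up.

```c
#include <stddef.h>

/* Hypothetical function: the loop's `size_t i` shadows the outer `int i`,
 * so the two names refer to different objects. */
static int count_after_inner_loop(size_t n) {
    int i = 0;
    for (size_t i = 0; i < n; i++) {
        /* only the inner i is incremented here */
    }
    return i; /* the outer i was never modified */
}
```

Shadowing like this is legal C, but it makes it easy to read or write the wrong variable. GCC's -Wshadow option flags it at compile time.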