So I have a question that’s kind-of-tangentially-related to audio programming in C/C++: I’ve been hired by my university to develop an application for playing with various kinds of pattern-generating algorithms that work on 2D grids (spiking neural networks, cellular automata, various dynamical systems), connecting them to each other, and interacting with the outside world via OSC. We’re a small development team of two, currently at month one of the development cycle. I’m in charge of writing the backend, which is in C++. The front end is QML / Qt.

I’m having a hard time finding a reliable way to run the algorithms at a fixed frequency, without dropouts or jitter, not because the algorithms are especially heavy, but because scheduling the execution of a routine at precise time intervals is tricky. We aim to run the algorithms at 10–120 Hz. At the moment their execution happens on its own thread, in a function that calls itself back using QTimer::singleShot. This is a pretty terrible solution, as that timer isn’t built for a precision task like this, and only guarantees that the actual time interval will be at least equal to the requested one. Resizing the window causes intense lag, having the application lose focus for a while brings it to a grinding halt (a 30-millisecond timer suddenly turns into an 8-second timer), and generally having many things run in parallel makes it very jittery. I’ve been wondering whether there are techniques we could borrow from audio programming to have a thread execute code at a precise, reliable time interval.

This thing is open source, so I’ll be happy to share it here once we have a first version to release.

No time for a longer answer right now, but any callbacks from the UI thread happen in the UI thread context and will be blocked by other UI activity. What works better is for the background “realtime” threads to use the audio subsystem callbacks as their timer (e.g. buffer ready, or whatever exists within your audio framework), with the UI only reading the control values those threads compute. If your models are particularly heavy they can do some precomputation or parallel processing of the non-realtime stuff in yet another thread, using similar shared-memory patterns for signalling the results of their work to the realtime thread(s).

Think of each thread as async with respect to the others: if they depend on a precise linear flow of data then they can’t really be threaded, since they are locked together. But if the model does long-running processing that results in a change every x buffers on average, the audio thread can still happily clock out those x buffers before the model thread needs to complete its work and deliver the next batch.

Cross-thread communication is a lot like interprocess communication, and there are many strategies (and pitfalls) inherent in the process. But if you start thinking of it in async layers and visualizing the data flows, then you can start to see how, even though each thread is kicked off by its own timer (or mutex / semaphore / queue using the OS thread tools, etc.), they can still end up in sufficient sync. I can elaborate more later.

3 Likes

also curious about this as a definite non-expert here

I’m starting on a GUI-centered music thing soon and I’m considering building audio and GUI as separate applications communicating over stdin, as a seemingly reasonable way to avoid these sorts of problems. Though I might be inventing new issues when it comes to displaying waveforms.

Without knowing much about your program’s architecture, a single thread that serves as a global timer and triggers execution of each of your algorithms sounds like the way to go. This can be as simple as a while loop with a nanosleep, or a while+for loop with a nanosleep if you have to enumerate cycles within a period. You should avoid any blocking mechanisms or calls within this central timing loop.
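
A minimal sketch of such a loop, assuming a hypothetical step() that advances all of your algorithms by one tick (sleeping until an absolute deadline rather than for a fixed duration, so late wake-ups don’t accumulate into drift):

#include <atomic>
#include <chrono>
#include <thread>

std::atomic<bool> running{true};

void step();  // hypothetical: advances every algorithm by one tick

void timerLoop(double frequencyHz) {
    using clock = std::chrono::steady_clock;
    const auto period = std::chrono::duration_cast<clock::duration>(
        std::chrono::duration<double>(1.0 / frequencyHz));
    auto next = clock::now() + period;
    while (running.load(std::memory_order_relaxed)) {
        step();                               // do this tick's work
        std::this_thread::sleep_until(next);  // absolute deadline: no cumulative drift
        next += period;
    }
}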

If your timing needs are exceedingly strict, you may need a hardware clock solution that communicates to userspace from the kernel (and maybe a real-time operating system).

5 Likes

I had something like that in mind — If I can hook up to some sort of audio callback that fires at very regular intervals, that’s a really great starting point.

I’ve been wondering how feasible it would be to multithread this. There are many use cases where the models may get very CPU-taxing, and in some of these cases they might be running in parallel, not feeding into each other. Assuming things can be kept in sync (which is a pretty big assumption), this would be a good target for multithreading. We could even allow complex connection graphs to be completely flattened to parallel execution by adding unit delays everywhere (although I don’t really know how I feel about that). I’ve also considered an algorithm for coordinating the multithreading in connection graphs that have a mix of parallel and linear flow, using some kind of mutex-ed “todo” queue, a coordinating thread, and a set of worker threads. Maybe this is impractical or just overkill, and it certainly complicates the problem I have right now. The answer might differ depending on the desired refresh frequency.
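
Roughly, the “todo” queue I have in mind would look something like this (just a sketch; Node is a hypothetical stand-in for one generator in the connection graph):

#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>

// A coordinating thread pushes nodes whose inputs are ready; worker
// threads pop and compute them.
struct Node { std::function<void()> compute; };

class TodoQueue {
public:
    void push(Node* n) {
        { std::lock_guard<std::mutex> lock(m); todo.push(n); }
        cv.notify_one();
    }
    Node* pop() {  // blocks until work arrives or the queue is closed
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [this] { return closed || !todo.empty(); });
        if (todo.empty()) return nullptr;
        Node* n = todo.front();
        todo.pop();
        return n;
    }
    void close() {
        { std::lock_guard<std::mutex> lock(m); closed = true; }
        cv.notify_all();
    }
private:
    std::mutex m;
    std::condition_variable cv;
    std::queue<Node*> todo;
    bool closed = false;
};

// each worker thread: while (Node* n = queue.pop()) n->compute();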

Yes! Would love to hear more about that!

Each generator has its own set of properties, along with methods for computing its output, initializing it, resetting it, optionally connecting it to other generators, etc. A thread runs the UI, allowing the user to change the parameters of each generator. Another thread acts as a kind of compute engine: it is dedicated to executing the algorithms at regular time intervals, using the parameters each generator holds. (At the moment there is still a non-zero possibility of the UI thread changing parameters in a way that causes memory to be deallocated or values to be overwritten while the compute engine is reading them, resulting in a crash. We need to add some kind of buffer that only lets parameter changes through right before the compute engine starts doing its thing.)
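
What I’m picturing for that buffer is a published-snapshot scheme: the UI thread builds a complete parameter block and swaps it in atomically, and the compute engine only adopts it at the start of a tick. A sketch (Params is a hypothetical struct holding one generator’s parameters):

#include <atomic>
#include <memory>

struct Params { /* one generator's parameters */ };

class ParamExchange {
public:
    // UI thread: publish a new snapshot (replacing any unconsumed one).
    void publish(std::unique_ptr<Params> p) {
        Params* old = pending.exchange(p.release(), std::memory_order_acq_rel);
        delete old;  // safe: the compute thread never saw this snapshot
    }
    // Compute thread: call once at the start of each tick.
    std::unique_ptr<Params> take() {
        return std::unique_ptr<Params>(
            pending.exchange(nullptr, std::memory_order_acq_rel));
    }
private:
    std::atomic<Params*> pending{nullptr};
};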

I haven’t tried a simple sleep loop; it might work better than Qt’s timers, as I suspect those carry some extra signalling / event-queue overhead. I really should have done my homework and tried this already. I’ll get to work and report back!

Another pretty important detail is that we are aiming for a cross-platform release (macOS, Windows, and Linux), so this factors into our choice of libraries.

1 Like

agree with @murray. also i would guess that using a timer in itself is just fine unless your timing requirements are really strong. (you didn’t really say who is receiving the OSC.) but as pointed out, this isn’t really a problem with timer resolution per se.

in Qt, you want to create a timer with its type set to Qt::PreciseTimer, and move it to a dedicated QThread.
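
something like this (an untested sketch; Engine and its tick() are hypothetical stand-ins for your per-frame work):

#include <QObject>
#include <QThread>
#include <QTimer>

class Engine : public QObject {
    Q_OBJECT
public slots:
    void start() {
        // the timer must be created and started in the thread it will run on
        auto* timer = new QTimer(this);
        timer->setTimerType(Qt::PreciseTimer);
        connect(timer, &QTimer::timeout, this, &Engine::tick);
        timer->start(33);  // ~30 Hz
    }
    void tick() { /* compute one frame */ }
};

// wiring, in main():
//   auto* thread = new QThread;
//   auto* engine = new Engine;
//   engine->moveToThread(thread);
//   QObject::connect(thread, &QThread::started, engine, &Engine::start);
//   thread->start();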

your QML stuff is going to be one bottleneck, so timing-wise the main thread really must be isolated from the “realtime” stuff (no locks on shared resources, &c). use atomics to write parameters from QML, or look into things like boost::lockfree::spsc_queue to get data from the UI to the “realtime” thread.
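
e.g. something like this for the UI → realtime direction (a sketch; ParamChange is a made-up message type):

#include <boost/lockfree/spsc_queue.hpp>

struct ParamChange { int generatorId; int paramId; double value; };

// single producer (UI thread), single consumer (realtime thread)
boost::lockfree::spsc_queue<ParamChange,
                            boost::lockfree::capacity<1024>> paramQueue;

// UI thread:
//   paramQueue.push(ParamChange{id, param, value});  // returns false if full

// realtime thread, at the top of each tick:
void drainParams() {
    ParamChange c;
    while (paramQueue.pop(c)) {
        // apply c to the generator's parameter block: no locks, no allocation
    }
}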

the OSC tx side could itself be another bottleneck depending on how much data you’re sending on each tick.

in any case, the Tx should happen with the stored outputs from the previous tick, every time the timer thread raises a Tx event (which in this case could be something as simple as an atomic bool).
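
the atomic-bool version could be as dumb as this (a sketch; polling with a short sleep for brevity, where a semaphore or condition variable would be kinder):

#include <atomic>
#include <chrono>
#include <thread>

std::atomic<bool> txDue{false};

void timerThread() {
    for (;;) {
        std::this_thread::sleep_for(std::chrono::milliseconds(33));
        txDue.store(true, std::memory_order_release);
    }
}

void txThread() {
    for (;;) {
        if (txDue.exchange(false, std::memory_order_acquire)) {
            // send last tick's stored outputs over OSC here
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(1));  // crude poll
    }
}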

after tx’ing the last frame’s data, start working on the new frame. none of this should happen in Qt’s main thread. whether it’s more effective to have three threads:

  • timer
  • tx
  • work(ers)

or just

  • timer
  • tx/work

depends on stuff like parallelizability (?) and size of working memory.


i would probably consider just doing the timing and calculation stuff in a separate process, not using Qt at all for that. the simplest answer is to just use std::this_thread::sleep_for(std::chrono::nanoseconds(...)) on a dedicated timer thread. i would only get more complicated than this if it is shown to be necessary in a stripped-down test case with appropriate threading architecture. (the next level of complication would be to manage the timer thread’s priority, which is not a capability of the c++ standard library.)
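
(for reference, that next level looks something like this on POSIX systems; a sketch, and it may require elevated privileges:)

#include <pthread.h>
#include <sched.h>
#include <thread>

// raise a std::thread's priority via its native handle. POSIX-only;
// windows would use SetThreadPriority instead.
void makeTimeCritical(std::thread& t) {
    sched_param sp{};
    sp.sched_priority = sched_get_priority_max(SCHED_FIFO);
    // check the return value in real code; this can fail without privileges
    pthread_setschedparam(t.native_handle(), SCHED_FIFO, &sp);
}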

my knee-jerk reaction is that using an audio timer for this feels like overkill, especially because your sink is a network endpoint and you’re not going to get sub-millisecond accuracy there anyways. (of course the balance of that consideration changes if you are already running a low-level audio process.)

2 Likes

So it turns out I wasn’t using Qt timers properly. I was creating a QThread for running computations, reimplementing the run() method to do the maths and then call itself back through QTimer::singleShot, but I didn’t realize that the timer relied on a Qt event loop to fire. My thread didn’t run one, so the timer defaulted to the GUI thread’s event loop, and only the first execution of the computations actually used its own thread; the subsequent ones ran on the GUI thread. This explains the computations getting laggy when resizing the window, and the input locking up when the algorithms took too long to execute.

I’ve tried two ways of dealing with this problem.

#1 is just using nanosleep / usleep in a loop inside the reimplemented run() method of a QThread, as @murray suggested. This works better than what I had previously (it doesn’t block / get blocked by the GUI), but it still stalls entirely if the app loses focus (isn’t the topmost / active app) for a while. It also crashes after a random amount of time (30 s to 10 min).

#2 is running a second Qt event loop on a QThread, and properly implementing the solution I initially came up with. I believe this is also what @zebra suggests. This adds some overhead, but being able to use signals and slots could simplify inter-thread communication. This one also randomly crashes and stalls after a while.

I believe the crashes are caused by threading issues. However, I have mutex-protected all the sensitive sections where the GUI thread could change parameters of the algorithms while they are running in ways that would break them, and the crashes don’t happen when parameters are changed either. From what I gathered by printing things to the console, generator objects just seem to get randomly deleted right before the compute thread tries to calculate their output (the compute engine ends up calling a method through a pointer to a deleted object). There is probably something fundamental I’m misunderstanding about threading with Qt.

At the moment, the objects that hold the data for each generator live in the main GUI thread. My gut feeling is that they probably need to live on the compute engine thread, but then it becomes impossible to directly expose their properties to QML, as QML doesn’t support objects living in other threads. I would need to create some kind of proxy / facade for each generator that lives on the GUI thread and communicates with the actual objects in the compute engine thread through signals. Kind of an ugly and redundant design. Maybe there’s a way to streamline this using Qt meta-properties. Maybe this isn’t actually the issue here.

That still leaves the stalling issue to be solved.

Part of me is really attracted to that, because Qt seems to bring a bunch of unnecessary overhead (and weird thread-unsafe behaviour) into this. I’ve never had this kind of trouble with std::thread. Part of me dreads having to come up with a way of exchanging data between the two processes, as it will keep getting more complex and redundant as we add more generators and functionality.

I think I’ll give the 3-thread approach a try once I have the basics working. I don’t know a ton about OSC but it seems like separating the time-sensitive computations from IO tasks is probably a good idea and a relatively easy target for parallelization.

Good point! We don’t plan on running audio alongside this, so doing this would involve some extra work.

Ah, I missed that (since this is a lower-level audio programming thread… ahem :wink: ).

(To be clear: I thought your “tangentially related” reference meant that you had a Qt thread running alongside some audio… not that there was no audio at all.)

1 Like

Just wanted to say thanks to @equipoise, @murray, and @zebra for the design advice. It turns out QObjects generally don’t behave in a thread-safe manner, so I needed to create a kind of Facade class assigned to the GUI thread, which QML can use to indirectly interact with the Generator objects. It works without crashing now, and Qt::PreciseTimer seems to do the job, although I still have to figure out why it stalls after a while when the application loses focus. Implementing the OSC stuff on its own thread should be straightforward now that I know how to do threads properly with Qt, especially given that Qt’s signalling system is well suited for this.
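
For anyone curious, the facade boils down to something like this (a heavily simplified sketch; Generator stands in for our actual compute-side classes):

#include <QObject>

// lives in the compute thread; does the actual work
class Generator : public QObject {
    Q_OBJECT
public slots:
    void setRate(double r) { rate = r; }  // runs in the compute thread
private:
    double rate = 1.0;
};

// lives in the GUI thread; this is what QML sees
class GeneratorFacade : public QObject {
    Q_OBJECT
    Q_PROPERTY(double rate READ getRate WRITE setRate NOTIFY rateChanged)
public:
    double getRate() const { return rate; }
    void setRate(double r) {
        if (r == rate) return;
        rate = r;
        emit rateChanged(r);
    }
signals:
    void rateChanged(double r);
private:
    double rate = 1.0;
};

// wiring: because the Generator lives in another thread, this connection
// is automatically queued, so setRate runs safely in the compute thread:
//   QObject::connect(facade, &GeneratorFacade::rateChanged,
//                    generator, &Generator::setRate);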

1 Like

is this macOS by chance?

(assuming yes…)

since Mojave, doing realtime-bounded work in the background has been painful. that’s one reason i would divest the RT stuff from the UI stuff by putting it in a completely different process. and if you can, control the RT process using launchctl, making it a direct child of launchd.

if you do figure out a way of keeping the RT work thread healthy under Catalina, please do share!

1 Like

This is indeed macOS (Mojave 10.14.5). I don’t know about the behaviour of the program on other platforms.

I’ve tried researching the topic but it seems this is not a very well documented issue. I might just have the wrong search terms. It seems macOS has strong opinions on what threads should be allowed to run at any given time.

Knowing that splitting the code into two processes should likely solve the issue makes that option a lot more enticing. Do you know of any good library that would facilitate inter-process communication / shared memory without bloating our code too much? I’m worried things could get really complicated, because we will want to share multiple relatively large (100–1000ish elements) matrices / arrays of floats / doubles with the GUI at a good framerate, and because the required amount of shared memory wouldn’t be fixed, since we need to be able to add and remove generators.

You probably want to use QAudioOutput, where you can set the buffer size, sample rate, etc. Then you need to implement a QIODevice which will provide the audio output with samples when it requests them.
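
Something like this, roughly (a Qt 5 sketch; in pull mode the output reads from your QIODevice at a rate fixed by the sample rate and buffer size, so it doubles as a steady clock):

#include <QAudioFormat>
#include <QAudioOutput>
#include <QIODevice>
#include <cstring>

// a silent source: we only want the regular readData() callbacks
class TickSource : public QIODevice {
    Q_OBJECT
protected:
    qint64 readData(char* data, qint64 maxlen) override {
        memset(data, 0, size_t(maxlen));  // output silence
        emit tick();                      // trigger one frame of work
        return maxlen;
    }
    qint64 writeData(const char*, qint64) override { return 0; }
signals:
    void tick();
};

// setup:
//   QAudioFormat fmt;
//   fmt.setSampleRate(48000);
//   fmt.setChannelCount(1);
//   fmt.setSampleSize(16);
//   fmt.setCodec("audio/pcm");
//   fmt.setSampleType(QAudioFormat::SignedInt);
//   auto* src = new TickSource;
//   src->open(QIODevice::ReadOnly);
//   auto* out = new QAudioOutput(fmt);
//   out->setBufferSize(4800);  // 2400 samples ≈ 50 ms at 48 kHz mono
//   out->start(src);           // pull mode: out reads from src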

1 Like

It looks like what I want to do is disable timer coalescing («App Nap», as Apple calls it) for the process I’m running. I’m able to prevent the stalling by disabling it system-wide using sudo sysctl -w kern.timer.coalescing_enabled=0. It looks like there might be a way to do that for a single process using IOPMLib.h, according to this stackoverflow post. I’ll keep digging and report back!

1 Like

I’ll keep that in mind as an alternate option, this could probably work well too.

depends on tolerance for bloat and whether you need windows support.

i wish i knew a perfect answer, but have used all these:

  • POSIX shared memory (shm_open(), etc.), with a unix socket for signalling. this is actually pretty simple (see the sketch after this list), but presents challenges on windows.

  • boost::interprocess, which is robust but IMO more cumbersome.

  • gRPC, which is elegant in code but sort of an opinionated PITA when it comes to build system integration

  • zeromq and its cousin/fork nanomsg, which are socket abstractions. (they don’t give you a data protocol, but guarantee atomic and sequential messaging.) the main problem with these is that i’ve found the platform/language library implementations to vary somewhat in quality.

  • if it’s really macos-specific, you might be best served by using the native XPC and “distributed objects” systems, but that requires wrapping those bits in Swift or objc++.

i guess i’d recommend checking out the first option if windows isn’t on your plate, and seeing if it feels appropriate (mainly because you mentioned you have a relatively large amount of IPC data). but i’m kinda hoping others have better ideas.
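
a bare-bones sketch of the first option’s shared-memory side (error handling trimmed; “/gridshm” would be an arbitrary name, and on linux you’d link with -lrt):

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>

// producer (compute process): create, size, and map a named region.
// the GUI process does shm_open(name, O_RDONLY, 0) + mmap(PROT_READ).
float* createSharedMatrix(const char* name, size_t nFloats) {
    int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
    if (fd < 0) return nullptr;
    if (ftruncate(fd, off_t(nFloats * sizeof(float))) != 0) return nullptr;
    void* p = mmap(nullptr, nFloats * sizeof(float),
                   PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  // the mapping stays valid after the fd is closed
    return p == MAP_FAILED ? nullptr : static_cast<float*>(p);
}

// signal "new frame ready" over the unix socket rather than polling the
// memory; shm_unlink(name) when a generator is removed.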


thanks for the tip on “app napping.” i think i’ve tried relevant uses of NSProcessInfo in the past, but will keep it in mind as something to check again.

1 Like

Thanks for the insight! I’ll keep that in mind if I go ahead with a split process design. For now I’ll try to get app nap to behave and see how robust that is.

Found a solution to the App Nap problem! See this stackoverflow thread. It essentially involves a small Objective-C wrapper that makes the right calls to the OS to disable App Nap. Hopefully it solves some of the struggles you’ve had with realtime C / C++ on macOS!

I’ve added the code below in case it ever vanishes from stackoverflow, just to future-proof this thread.

AppNap.h

#pragma once
#if !defined(__cplusplus)
#define C_API extern
#else
#define C_API extern "C"
#endif
C_API void disableAppNap();
C_API void enableAppNap();

AppNap.m

#include "AppNap.h"
#include <Foundation/Foundation.h>

static id activity;  // activity token returned by beginActivityWithOptions:

void disableAppNap() {
    // declaring latency-critical, user-initiated activity opts the process
    // out of App Nap / timer coalescing until endActivity: is called
    activity = [[NSProcessInfo processInfo] beginActivityWithOptions:(NSActivityLatencyCritical | NSActivityUserInitiated) reason:@"Disable App Nap"];
}

void enableAppNap() {
    [[NSProcessInfo processInfo] endActivity:activity];
}

main.cpp

#include "AppNap.h"

int main(int argc, char *argv[]) {
    // ...

    // disable App Nap
    disableAppNap();

    // ...

    // enable App Nap
    enableAppNap();

    // ...
}
1 Like

thanks!

checked this out… but yeah, i’m already disabling app nap through NSProcessInfo.

what i’m seeing is something else, and it’s both less extreme and more aggravating.

it affects ProTools and Max/MSP in addition to my own audio apps. the OS appears to be throttling the CPU time allocated to these programs when they are backgrounded after long uptime. the artifact is rare and hard to reliably reproduce, and not all machines are equally affected, but the best way to induce it is by creating memory and GPU pressure. it seems to be a new thing with Mojave and Catalina; i can’t reproduce it on 10.13.

the effect consists of repeated buffer underruns, which stop when the program is foregrounded. the only workaround i’ve found is to split the audio threads into separate processes.

anyways, i suspect this wouldn’t be an issue for “soft-realtime” applications.

That sucks. I’m no macOS expert, but my intuition tells me there may be a way to trick the OS into thinking the process is active / being interacted with by periodically calling some of Foundation’s functions / signals.