Basically, norns is a computer that runs Lua code, while the sound itself is (usually) generated by a separate sound engine (currently these are written in SuperCollider).
So, for example, Passersby is a synth engine built in SuperCollider that gets called up by a user-written Lua script. It can be swapped for another synth engine, as long as the Lua code uses the parameters and functions that the new engine provides.
So! The modularity you’re speaking of is in the ability to swap sound engines in scripts, or to create your own script, either using an existing synth engine (such as Passersby) or building your very own!
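To make that a bit more concrete, here is a rough sketch of what picking an engine looks like from the Lua side. It assumes the PolyPerc engine and its `hz` command. The stub `engine` table, the `sent` list, and the `midi_to_hz` and `play_note` helpers are made up for illustration so the snippet also runs in plain Lua off-device; on norns the system provides `engine` for you.

```lua
-- Records what the stand-in engine was asked to play (illustration only).
sent = {}

-- On norns, `engine` is a global supplied by the system.
-- This hypothetical stub only exists so the sketch runs in plain Lua.
if engine == nil then
  engine = { hz = function(freq) sent[#sent + 1] = freq end }
end

-- Swapping engines starts with changing this name...
engine.name = "PolyPerc"

-- Hypothetical helper: MIDI note number to frequency in Hz.
local function midi_to_hz(note)
  return 440 * 2 ^ ((note - 69) / 12)
end

-- ...and then making sure every engine call below matches
-- a command the chosen engine actually offers.
function play_note(note)
  engine.hz(midi_to_hz(note))
end

-- norns calls init() when the script starts.
function init()
  play_note(69) -- A4
end
```

Different engines expose different commands, so swapping one in usually means touching both the `engine.name` line and the calls that talk to it.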
Now, there is also the extra fun aspect of norns called Softcut, which gives you 6 simultaneously accessible record/playback voices that can be manipulated as needed (a good example to look at is mlr, which generally has Softcut as its backbone).
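For a feel of what driving Softcut from a script looks like, here is a minimal sketch that sets one voice looping over the first two seconds of a buffer. The `softcut.*` calls are from the norns Softcut API as I understand it. The `setup_voice` function, the `calls` log, and the stub `softcut` table are hypothetical and only there so the snippet runs in plain Lua away from the device.

```lua
-- Log of calls made to the stand-in softcut (illustration only).
calls = {}

-- On norns, `softcut` is a global supplied by the system.
-- This hypothetical stub records every call so the sketch runs anywhere.
if softcut == nil then
  softcut = setmetatable({}, {
    __index = function(_, name)
      return function(...) calls[#calls + 1] = { name, ... } end
    end
  })
end

-- Hypothetical helper: make one voice loop over 0 to 2 seconds.
function setup_voice(voice)
  softcut.enable(voice, 1)      -- switch the voice on
  softcut.buffer(voice, 1)      -- point it at buffer 1
  softcut.level(voice, 1.0)     -- output level
  softcut.loop(voice, 1)        -- enable looping
  softcut.loop_start(voice, 0)  -- loop start, in seconds
  softcut.loop_end(voice, 2)    -- loop end, in seconds
  softcut.position(voice, 0)    -- jump to the loop start
  softcut.rate(voice, 1.0)      -- normal playback speed
  softcut.play(voice, 1)        -- start playing
end

setup_voice(1)
```

Changing `rate`, `position`, or the loop points on the fly is where most of the manipulation happens.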
If you’re curious how these engines are applied to a script, take a look over on the Library tab. Examples of scripts that use synth engines are Animator and Awake. There are many more, but these are a good start.
That answer your question?