Music data repositories (MusicXML, lead sheets etc.)

#1

Hope this isn’t too specialized/OT…

I’m basically looking for “lead sheets” [notes, key, harmony], preferably in MusicXML format. I did find an archived version of the old Wikifonia database (http://www.synthzone.com/files/Wikifonia/Wikifonia.zip), as well as a slightly smaller set with the chord/key features already extracted from the MusicXML following the paper https://arxiv.org/pdf/1712.01011.pdf, hosted at http://marg.snu.ac.kr/chord_generation/.
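
For reference, extracting these features yourself is straightforward with music21; here is a minimal sketch (the file name “leadsheet.xml” is a hypothetical stand-in for any Wikifonia file):

```python
# Minimal sketch: pull melody notes, key signature, and chord symbols out of a
# MusicXML lead sheet.  Assumes the music21 library; "leadsheet.xml" is a
# hypothetical file name.
from music21 import converter, harmony, key, note

score = converter.parse("leadsheet.xml")
part = score.parts[0].flatten()   # melody and chord symbols share one part in a lead sheet

# key signature(s) as written (the sounding key may still need to be estimated)
for ks in part.getElementsByClass(key.KeySignature):
    print("key signature:", ks)

# melody notes and chord symbols in score order
for el in part.notesAndRests:
    if isinstance(el, harmony.ChordSymbol):
        print("chord:", el.figure, "@ offset", el.offset)
    elif isinstance(el, note.Note):
        print("note:", el.nameWithOctave, "dur:", el.quarterLength)
```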

Some context: my goal is to build real-time key/mode identification (mainly Greek modes) as well as a basic harmony tracker (just core triads: major, minor, diminished, augmented, suspensions…).
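
For concreteness, the harmony-tracker part mostly reduces to classifying pitch-class sets against a handful of interval shapes. A toy sketch (plain Python, not the actual real-time external):

```python
# Toy sketch: label a set of MIDI pitches as one of the core triad qualities.
# Interval shapes are measured in semitones above a candidate root.
TRIAD_SHAPES = {
    (0, 4, 7): "major",
    (0, 3, 7): "minor",
    (0, 3, 6): "diminished",
    (0, 4, 8): "augmented",
    (0, 5, 7): "sus4",
    (0, 2, 7): "sus2",
}

def classify_triad(pitches):
    """Try each pitch class as the root; return (root_pc, quality) on a match."""
    pcs = sorted({p % 12 for p in pitches})
    for root in pcs:
        shape = tuple(sorted((pc - root) % 12 for pc in pcs))
        if shape in TRIAD_SHAPES:
            return root, TRIAD_SHAPES[shape]
    return None

print(classify_triad([60, 63, 67]))   # C, Eb, G -> (0, 'minor')
```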

The Wikifonia source represents a modest amount of data (~5000 files), more than enough for Bayesian methods (e.g. https://www.microsoft.com/en-us/research/project/mysong-automatic-accompaniment-vocal-melodies/) but, I fear, not enough for really effective deep learning approaches, which is ultimately what interests me. So if anyone has connections in the MIR community and knows of other “lead sheet” type sources, it would help!!! I’m happy to share any code or externals I’ve developed thus far [as soon as I do a badly needed cleanup].

I’m NOT looking to create a full auto-accompaniment system, least of all a “virtual Bach” etc. because that in the end would be boring, particularly so in the sense that simulations always fall short.

My actual goal [and this is obviously a personal project] is to incorporate key and harmonic progression as contextual variables in a mix/match constraint programming environment that allows a sequence to be morphed in various ways to accompany another sequence. If one of the sequences is “played” (on a keyboard, grid etc.) this sets up an interactive environment where one reshapes the second sequence within certain harmonic/voice leading/contrapuntal “rules” – which can certainly be fictional [think a very general kind of arpeggiator].

This follows exactly along the lines of what I was discussing in the “definition of music” thread, to suspend the truth-claims of theoretical/definitional elements [in this case very traditional, Western ones] and treat them instead as material to be mutated/recombined within the physicality of the interface. Basically a fully embodied and exteriorized, musi-fictional roleplaying environment.

Of course there’s no reason that one operator needs to be a human and the other machine, any combination is possible and more interesting things happen with actor-networks arranged in very complex chains. It’s all about what kinds of crazy structures emerge.

Traditional, anthropocentric notions of “composition” and “performance” disappear [nothing exists as “score”, as representation or preconceived idea; there’s also nothing that a performer can think in advance to express] and are replaced by a process of individuation, from which actualizations (recordings/renditions) are only “snapshots” – the process is always unfolding and never quite completes itself.

[prior work: A few months ago I created a key/mode identifier based on a probabilistic interpretation of the Krumhansl/Kessler algorithm and tried to extend it to all Greek modes, but it doesn’t quite work well enough (still too much latency), so I don’t use it that often; I fall back to the simpler interactive/reharmonizing transposers/arpeggiators. I think for what I do, embedding an intermediate layer of structure (harmonic progression) would somehow constrain possibilities in such a way that high-level structures (key/mode modulations etc.) become more identifiable. But I’m still deciding on the best machine learning approach, which depends on the amount of data that is actually available.]
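
For anyone curious what that roughly looks like, here is a rough sketch of the Krumhansl/Kessler-style correlation approach, extended to the seven diatonic (Greek) modes by rotating the published major-key profile. Treating a single rotated profile as a “mode profile” is a simplifying assumption for illustration, not what my actual external does:

```python
# Rough sketch: Krumhansl/Kessler-style key finding by correlating a pitch-class
# histogram against rotated profiles, extended to the seven diatonic (Greek)
# modes.  Profile values are the published K-K major-key weights; reusing one
# rotated profile for every mode is a simplifying assumption.
import numpy as np

KK_MAJOR = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                     2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
MODE_DEGREE = {"ionian": 0, "dorian": 2, "phrygian": 4, "lydian": 5,
               "mixolydian": 7, "aeolian": 9, "locrian": 11}

def estimate_key_mode(pc_histogram):
    """Return the (tonic_pc, mode) pair whose profile best correlates with the histogram."""
    hist = np.asarray(pc_histogram, dtype=float)
    best, best_r = None, -np.inf
    for mode, degree in MODE_DEGREE.items():
        mode_profile = np.roll(KK_MAJOR, -degree)   # profile of this mode with final = pc 0
        for tonic in range(12):
            r = np.corrcoef(hist, np.roll(mode_profile, tonic))[0, 1]
            if r > best_r:
                best, best_r = (tonic, mode), r
    return best

# e.g. feed it a running histogram of note durations per pitch class:
# estimate_key_mode(duration_histogram)
```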

1 Like
#2

i’ve seen a lot of music/probability studies using the Kern corpus
http://kern.ccarh.org/

not in musicxml though, you’d need to convert from the humdrum format, which is an encoding designed specifically for this kind of work
http://humdrum.ccarh.org/
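
fwiw music21 parses **kern directly, so the conversion can be as simple as this sketch (the folder name is hypothetical, point it at wherever the corpus lives):

```python
# Batch-convert Humdrum **kern files to MusicXML via music21.
# "kern_corpus" is a hypothetical folder of downloaded .krn files.
from pathlib import Path
from music21 import converter

for krn in Path("kern_corpus").glob("*.krn"):
    score = converter.parse(krn)
    score.write("musicxml", fp=krn.with_suffix(".musicxml"))
```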

1 Like
#3

Awesome, thanks so much! This is a lot more data for sure!!!

I also found through these links an interesting “bootstrap analyzer” called Melisma: http://www.link.cs.cmu.edu/music-analysis/ … this can be helpful if I need to get into the deep learning stuff (as a source of initial or fill-in labels).

#4

Is this something similar to what Synfire does? Are there open source alternatives?

Love the real-time idea, rooting for you!

1 Like
#5

Synfire does look similar in a number of respects, thanks! Something else of which I’ve been totally unaware!

Re: open source I don’t know of anything else, but I am definitely planning to open-source these tools (all Max externals) once they reach a certain maturity.

I should also mention that the as-yet-unimplemented “constraint programming” part is inspired by the great series of articles by lines member @josephbranciforte.

I’m excited for when I can move on to this part (I’m also looking into Torsten Anders’s Strasheela environment).

Upon a closer look at Synfire, though, my goals may be a bit different.

First, rather than fix up an existing performance to become more “virtuosic” (and thus fully constrained within, or normative to, an existing theory), my aim is to create musical theory-fictions and roleplaying environments, where theory is treated as material, on par with what the theory takes as its object (notes, rhythms, harmonies etc.). The outcome is a product of the environment as a whole and generally cannot be interpreted solely in terms of the theory, nor in terms of a performer’s intention. Being fundamentally non-dialectic, it is least of all an “innovation” upon theory.

By analogy, a performer’s initial intention is a “left-eye image”, the “role” defined by music-theoretic elements a “right-eye image”, and the result (after both agents overcome this metastability) resolves into a new structure, the “3D depth” – not an exclusive property of either agent. Something that emerges with its own informational structure.

Or more simply, think of two people having a conversation, where everything from the physical communication channel to the context that generates meaning is formed. The conversation begins with an initial metastability (we want to talk about different things), but this either resolves in new and unforeseen ways, or both parties become frustrated. Resolution is especially likely with in-person conversations.

All of these are reasons why real-time operation is absolutely necessary; emergence or individuation simply does not come about otherwise.

Second, in a technical sense, I’m focusing exclusively on binary operators, in which one sequence (say, what a performer “plays”) can reshape another sequence (say, a loop playing from a sequencer). This allows for complex networks of humans and machines. With unary operators (a sequence being transformed or “fixed up” into another sequence) only linear chains are possible, and I don’t see that being as interesting.
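
As a toy illustration of what a binary operator means here (the real externals are in C for Max; this is just a hypothetical sketch), take an operator that snaps each pitch of a running loop to the pitch classes the performer is currently holding:

```python
# Toy binary operator: the performer's held notes reshape the loop by snapping
# every loop pitch to the nearest pitch whose pitch class is currently held.
def reshape(loop_pitches, held_pitches):
    """(loop sequence, live sequence) -> new loop sequence."""
    if not held_pitches:
        return list(loop_pitches)
    allowed = sorted({p % 12 for p in held_pitches})

    def snap(p):
        base = p - (p % 12)
        candidates = [base + octave + pc for pc in allowed for octave in (-12, 0, 12)]
        return min(candidates, key=lambda q: abs(q - p))

    return [snap(p) for p in loop_pitches]

# e.g. a C major loop reshaped against a held D minor triad
print(reshape([60, 64, 67, 72], [62, 65, 69]))   # -> [62, 65, 65, 74]
```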

Third, since the outcome is uninterpretable in terms of the performer’s “intent”, or whichever elements of music theory are incorporated, the theory-fictional or roleplaying environment effectively folds elements taken to transcend musical practice back into immanence, which is also fundamentally an animal relation. The human animal cannot ever get outside of the environment to think it let alone master it; she can only act within it and assist it, hopefully responding simply and poetically to bring what is coming forth fully into its own. In many ways this introduces a fundamentally different kind of thinking than what is possible within traditional categories of “composition” and “performance”.

Hopefully, this is a project that affirms our post-humanity, but in ways that are diametrically opposed to the “billionaires living forever having drone-catered pool parties in space” transhumanism of Elon Musk, Kevin Kelly, Ray Kurzweil etc. in which technology intensifies all anthropocentric divisions (human/animal; Man/Nature; subject/object; mind/body etc.) towards their logical and most destructive ends.

#6

I dig it.

I’ve been using Synfire a bit (bit the bullet and paid the fee) and it’s pretty darn nifty. But I acknowledge that it’d be hard if not impossible to do all that Synfire does in realtime, and that realtime approaches are necessarily going to produce different perceptions and ultimately a different result.

Synfire is a composer’s tool, not an improviser’s tool. In the Venn diagram between composer and improviser there is much overlap, but it is not total. The overlap is probably where the most interesting stuff lives. But until the sophistication of tools such as those hinted at by @josephbranciforte’s work increases significantly, I think Synfire will continue to be an arrow in my quiver.

But it has a funky UI and a whole mess of novel-to-Synfire concepts you have to take on board. I’d definitely prefer lower level tools that can get to similar (and equally sophisticated) results.

Wish Max didn’t frustrate me so much. Wish music developers were more likely to program in text. Wish I had a pony.

Enough of my blather. Sounds like a really cool project, and I’m super curious to see where you take it.

#7

Totally, this is very hard. I wouldn’t be as eager to charge ahead if I didn’t already have something basic (interactive re-harmonization, arpeggiation, transposition) plus time-axis transformations (event-based looper tools) implemented in the context of binary operators. From there I can develop one extension at a time and see where things lead; maybe I get lucky and push one or two concepts across the line to true real-time (zero-latency) operation (I already tried and failed once…). Maybe the bar is lower in some sense: it’s not how closely the output tracks predictions based on music theory, but how creatively one can develop fictional theories and roles that make just enough sense to contribute to the environment.

Anyway, with the existing tools I’ve at least found that interesting structures emerge out of networks with three elements (more than three is difficult to manage). Overall I’m just trying to practice more consistently with them. I also face a dilemma: maybe I should just freeze development and work with what I have, do an album, and then go back to the tools.

Max indeed is frustrating, which is why I’m more and more using it just as a wrapper for custom externals. Still, there’s a lot of “utility” stuff it would be a waste of time to do in C.

Thanks much for the encouragement!

1 Like
#8

To me, one of the biggest advantages of Max is the immediacy of injecting these externals into your workflow. Yes, building a toolkit can take time and some frustration, but the real value lies in the sort of “UNIX-style” paradigm of quickly chaining together multiple individually simple tools to do a thing you need done now but might never need in exactly that way again. The “problem” is that Max can do so many things that it’s very tempting to try to make it do all of them, but it’s just as useful and applicable for simple transformations or data/sequence conversions as it is for building crazy full-blown frequency-domain modelling synthesizers (or even entire DAWs). Not trying to build the world with it is both a challenge and a freedom!

#9

reading this thread with interest.

was not previously aware of synfire. after a very superficial glance, i have a feeling of half skepticism / half excitement. i am happy that we are beginning to see attempts at products featuring embedded musical intelligence, yet i can imagine all types of terrible/confused music being made with such a program.

the development of structural rules for composition/improvisation is something that i believe should lie with the individual composer/improviser. i have no problem with outsourcing the number-crunching aspect to computers, but the question of “why” is still fundamental. why implement this rule instead of that rule? ultimately, there is no axiomatic bedrock on which to stand here. perception and subjectivity are essential to a deep understanding of musical function.

2 Likes
#10

Give the trial a whirl; I think you’ll find the tool has no opinion about “why”. That ball is firmly in the artist’s court.

#11

We have used Wikifonia successfully for a couple of deep learning models, including LSTM and BiLSTM modeling of chord and rhythm progressions, as well as melody generation. A significant amount of effort goes into preprocessing the data, though: MusicXML is a representation of written sheet music, so you need to clean the lead sheet a bit by explicitly unrolling repetitions, encoding F# and Gb as the same pitch, etc. And then there is the trick of data augmentation; you can transpose each lead sheet into whatever key you want by shifting it up and down 1 to 12 semitones, which increases the amount of data seen by your deep model by a factor of 24. Give it a shot!
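
A minimal sketch of that augmentation step, assuming music21 and a parsed lead sheet (the file name is hypothetical):

```python
# Transposition-based data augmentation: generate the 24 shifted copies
# (up and down 1 to 12 semitones) described above.
from music21 import converter

score = converter.parse("leadsheet.xml")   # hypothetical file name
augmented = [score.transpose(n) for n in range(-12, 13) if n != 0]
print(len(augmented), "transposed copies")  # 24
```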

2 Likes