AI in Music, Magenta

Has anyone here experimented with Magenta? I saw the announcement of it a few months ago, but haven’t worked with it.


Haven’t tried it yet, but this thread might come in handy for getting set up: !searchin/magenta-discuss/aws/magenta-discuss/wtlLx_pF6Ug/37tte32BAgAJ

I think most of the thread has been incorporated into the GitHub README. It’s also a pretty website; seems like Goog is serious about getting people to use it.

1 Like

The README explains how to install from scratch. The thread is about an Amazon AMI that allows for something closer to a one-click install on AWS. Also handy because Amazon offers CUDA-enabled instances (p2 instances), which is nice since most Macs don’t have dedicated GPUs.

1 Like

I’ve been working with Magenta since early Fall 2016.


How do you use it? Do you use it to generate ideas? Or let it build complete pieces? How does it fit into your workflow?

1 Like

Magenta has been on my radar for a while - I saw the presentation they gave at Moogfest last summer, and got to chat with Doug Eck briefly. Incredibly exciting, and I’ve got a ton of ideas I’d like to try out, but I feel like I’m way too much of a technological caveman to use it as it is now.

Looking forward to the inevitable consumer-facing VST versions of this in the future, though!

Is anybody here planning to work on that in the near future? I would be interested to see what people will do with Magenta and my toolkit: Dadamachines Automat Toolkit

I’ve already written to Adam from the Magenta team, but they seem to be quite busy.

Posted in the forum there: !topic/magenta-discuss/w4RATFIYT-0

1 Like

curious about the same

is there anything comparable out there? I don’t understand any of this yet and am reading up

Comparable to Magenta? Magenta is based on the TensorFlow machine learning framework. MXNet and Caffe are two other ML frameworks. Caffe is single-threaded; MXNet is highly parallelizable and can outperform TensorFlow. All three have implementations of style transfer, which has been the most common artistic demo of ML (this is what’s going on when you use a Prisma filter on a photo, for example).

Only the Google Magenta team has taken so much time to engage artists and musicians in exploring the humanities side of ML. I feel this is a real missed opportunity for the Amazon folks who are eager to see more MXNet adoption, and I’ve told them so. Kind of hard to instill humanities appreciation in folks who don’t naturally have it, unfortunately.

The demos here do a great job of demonstrating the potential. Be sure to check out the AI Duet in particular.

1 Like

very helpful

I’m gonna check all of that

1 Like

Probably the easiest way to test multiple ML frameworks is with this image:

You can use it with Amazon’s p2 instance types in EC2. P2 instances are CUDA GPU enabled, which you will need for this.

The image currently includes:

I’m afraid to ask, but I wonder what it would take to run this on Norns…

(I thought after watching this video from Google IO)

that video is 3 hours long… what is “this?” the Magenta kernel? tensorflow? nsynth? ML for music?

short answer: the Magenta kernel/OS (Fuchsia) works on pi and should work on norns.

long answer:
norns is a linux box with a fairly powerful Arm processor and is based on raspberry pi. you could definitely use it to run tensorflow or torch, and it is powerful enough to perform some classification tasks - here is someone’s project targeting a much slower Pi to perform image classification using DNNs. (of course the DNNs are trained elsewhere, with a lot of phat graphics cards!)

Arm has recently released an open-source Compute Library that includes an optimized general matrix multiply and other things that DNNs use. that should speed up classification tasks; i’m not sure how far along tensorflow or torch are with integrating this into their arm/linux ports.
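to make concrete why an optimized general matrix multiply (GEMM) matters here: the forward pass of a dense DNN layer is essentially one GEMM plus a bias and a nonlinearity. a minimal NumPy sketch (all shapes illustrative, not from any real model):

```python
import numpy as np

# A dense layer forward pass is just a general matrix multiply (GEMM)
# plus bias and nonlinearity -- the operation libraries like Arm's
# Compute Library optimize. Shapes here are purely illustrative.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 128))    # batch of 8 input vectors
W = rng.standard_normal((128, 64))   # layer weights
b = np.zeros(64)                     # layer bias

h = np.maximum(x @ W + b, 0.0)       # GEMM + bias + ReLU
print(h.shape)                       # (8, 64)
```

nearly all the compute in evaluating a trained model is spent in matmuls like `x @ W`, which is why a faster GEMM translates almost directly into faster classification on arm boards.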

unasked questions:
is it useful to run Fuchsia on a norns? i don’t know. seems like any significant DNN task wouldn’t leave much juice left for any kind of realtime audio processing or synthesis. if all you want is a small box to work with high-level realtime musical data in those environments and also USB-midi or networked OSC, then any small computer will do, and imho there’s no need for a much more expensive device specifically designed for music performance.

what about more general ML / predictive stuff? adaptive filters, PCA, linear regression, stochastic modelling &c &c?
machine learning is a big umbrella that includes a lot of numerical techniques besides DNNs.
the application of predictive modelling to music is itself a gigantic field that long predates the current wave of interest in DNNs. (here are my parents playing with a cello and an analog computer in 1980.) there are an infinite number of things you could do on a box like norns in this problem domain.
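as one concrete example of that pre-DNN toolbox, here is a minimal LMS adaptive filter sketch in NumPy. it learns to mimic an unknown 3-tap channel from a reference signal; the filter length and step size are illustrative, not tied to any particular musical application:

```python
import numpy as np

# Minimal LMS adaptive filter: learn an unknown 3-tap "channel" from a
# noise reference. Filter length and step size are illustrative.
rng = np.random.default_rng(1)
n = 2000
ref = rng.standard_normal(n)                       # reference signal
target = np.convolve(ref, [0.5, -0.3, 0.1])[:n]    # unknown channel output

taps = 8
w = np.zeros(taps)                                 # adaptive weights
mu = 0.01                                          # LMS step size
err = np.zeros(n)
for i in range(taps, n):
    x = ref[i - taps + 1:i + 1][::-1]              # newest sample first
    y = w @ x                                      # filter prediction
    e = target[i] - y                              # prediction error
    w += mu * e * x                                # LMS weight update
    err[i] = e

# error power drops as the filter converges on the channel
print(np.mean(err[:200]**2) > np.mean(err[-200:]**2))
```

this kind of thing runs comfortably in realtime on an arm board, which is the point: a lot of musically useful adaptivity costs far less than a DNN.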


Maybe I’m mistaking the line between NSynth and Magenta… is Magenta more about the classification part that powers NSynth? I think I mistakenly thought that NSynth was the hardware running Magenta, but it seems that isn’t quite right

magenta actually is just the name of google’s lab looking at creative applications of deep learning; it’s not a Thing

Magenta has developed a kernel and operating system for developing and running these applications, called Fuchsia. it can be built for Arm/pi/norns. (but like i implied, the range of suitable tasks for arm/pi/norns is limited. i’d assume it would be mostly evaluating pre-trained models for classification or stochastic synthesis.)

NSynth is two things:

  • mostly, a model and dataset for generating and classifying waveforms in tensorflow: git repo. i’m not sure, but guessing you could maybe get this to run on a pi in tensorflow with or without Fuchsia. i doubt you could do much in real-time with it but i could be wrong.

  • also, a prototype physical interface for that model. wot made all the hype videos and whatever.

i’d recommend reading more about the nsynth model, if you are really interested in using it. but my super-high-level summary would be:

  • the model consists of an autoencoder that conditions an autoregressive decoder, allowing a tensorflow developer to create audio waveform classifier/generators that work very similarly to image bitmap classifier/generators.
  • the dataset is just a big collection of individually sampled musical notes, classified by pitch, timbre and amplitude.
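to make that first bullet a little more concrete, here is a toy sketch of the data flow: an encoder summarizes a waveform into an embedding z, and an autoregressive decoder predicts each sample from the previous samples conditioned on z. the weights are random (untrained), so it produces nothing musical; the point is only the structure, not the real nsynth architecture:

```python
import numpy as np

# Toy sketch of the NSynth idea: encoder -> embedding z, then an
# autoregressive decoder conditioned on z. Weights are random and
# untrained; only the data flow is meaningful here.
rng = np.random.default_rng(0)
wave = np.sin(np.linspace(0, 8 * np.pi, 256))    # input waveform

def encode(x, dim=16):
    W = rng.standard_normal((x.size, dim)) / np.sqrt(x.size)
    return np.tanh(x @ W)                        # embedding z

def decode_step(history, z, W_h, W_z):
    # one autoregressive step: next sample from recent samples + z
    return np.tanh(history @ W_h + z @ W_z)

z = encode(wave)
ctx = 32                                         # receptive field length
W_h = rng.standard_normal(ctx) / np.sqrt(ctx)
W_z = rng.standard_normal(z.size) / np.sqrt(z.size)

out = list(wave[:ctx])                           # seed with real samples
for _ in range(256 - ctx):
    out.append(decode_step(np.array(out[-ctx:]), z, W_h, W_z))
print(len(out))                                  # 256
```

swapping z between two encoded sounds while decoding is, very roughly, what the nsynth "morphing" demos are doing.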

I found another Magenta project that listens for piano notes and transcribes them to MIDI. I think this is already possible with FFT analysis but I don’t think that’s what this Onsets and Frames technique uses.

I tried a few phrases in microphone recording mode. It makes errors, but it got the pitches of all the chords right.

Onsets and Frames does indeed work on the FFT (or, potentially(?), something functionally identical like the Hartley transform) - the first step is computing a log-Mel spectrogram, which then feeds into two ANN stacks, the first trained on onsets, the second on “voiced” sounds (aka frames, i guess).
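for anyone curious what that first stage looks like, here is a minimal sketch of a windowed FFT followed by log magnitude. the real pipeline also warps the frequency axis to the mel scale, which i've omitted to keep it short; the frame and hop sizes are illustrative, not the ones Onsets and Frames uses:

```python
import numpy as np

# Windowed FFT -> log magnitude, the front end of a transcription
# pipeline. (A real log-Mel spectrogram would also apply a mel-scale
# filterbank; omitted here for brevity.)
def log_spectrogram(audio, frame=2048, hop=512):
    window = np.hanning(frame)
    n_frames = 1 + (len(audio) - frame) // hop
    spec = np.empty((n_frames, frame // 2 + 1))
    for i in range(n_frames):
        seg = audio[i * hop:i * hop + frame] * window
        spec[i] = np.log1p(np.abs(np.fft.rfft(seg)))  # log magnitude
    return spec

sr = 16000
t = np.arange(sr) / sr
note = np.sin(2 * np.pi * 440 * t)        # one second of A4
S = log_spectrogram(note)
peak_hz = np.argmax(S[5]) * sr / 2048     # strongest bin, in Hz
print(peak_hz)                            # close to 440
```

the onset and frame networks then operate on frames like `S[i]`, so pitch detection happens in this frequency-bin space rather than on the raw waveform.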

using FFT (or other uniform filterbanks) is still the status quo for signal processing DNNs. but i recently saw a pretty wild new technique for working with time-domain signals directly, where the DNN comes up with its own basis functions on the fly. i think we’ll be seeing more of that soon.

1 Like

As I’ve recently been (very very clumsily) attempting the feature/model search process manually (and without great success) I can really appreciate the goals of ML 2.0.

A more detailed paper:


Unfortunately the case studies so far are not very inspiring.