
On Intelligent Generators

Author andre

Artificial intelligence has been around for decades, but only recently have neural networks, and deep learning in particular, become a hot topic again. I wanted to share a bit of insight into Cognitone's perspective on this and what we are currently doing in the way of new intelligent features for Synfire.

Your Phrases - Your Music

One of Synfire's unique strengths is that you can harvest any MIDI material, including your own takes, to collect a wide variety of musical expressions (phrases) for arbitrary re-use, transformation and combination. Playing around with the example phrases that ship with Synfire is already fun, but the prospect of building original music from your own breed of expressions is, of course, more compelling.

Once you get the hang of creating your own phrases, it quickly becomes straightforward routine. Above all, by working with your own material (wherever you got that from), your output is not limited by some hard-wired concept of musical style and structure, or by a random generator that does magical stuff you can't control.

Admittedly, to really enjoy this kind of freedom, you need to invest a little bit of work up front. Wouldn't it be nice if this were an assisted and largely automated workflow? Imagine the productivity, fun, instant reward and lucky surprises this would entail.

Generating Phrases

Motivated by this prospect, Cognitone is currently researching new technology that would help with the process of creating new original phrases, harmony progressions, parameters and possibly even entire songs, by introducing assisted and automated elements into the workflow (don't hold your breath for that complete random song feature, though).

Generators, anyone? Well, not so fast. In order to be effective, every music generator has to make autonomous decisions, whether its output is entirely synthetic (e.g. from fractals or functions) or a blend of previously trained patterns (e.g. the output of a convolutional neural network). Before the latest neural network boom, there were many symbolic generative concepts in AI, like generative grammars, finite automata, forward-inferencing rule systems and more. All of these can still be utilized for generating music.

Randomness, however, is only satisfying to some extent, because music isn't merely a sequence of notes. The often boring and not very catchy output of random music generators and automata has shown this for years.

Music is a Language

The saying goes that music is a language which is understood everywhere. In fact, it really is a full-blown language with a vocabulary, grammar and semantics. This also explains why randomness isn't cutting it: imagine randomizing the words or letters of a book. Not many people would want to read that. In order to generate meaningful results, a generator needs to account for the inherent rules of a language.
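To make the contrast concrete, here is a minimal sketch of a toy generative grammar next to a plain shuffle of the same notes. Everything in it is an assumption made up for illustration: the symbol names (phrase, question, answer, motif, cadence), the MIDI pitches and the expand function have nothing to do with how Synfire represents phrases internally.

```python
import random

# Hypothetical toy grammar. Keys are symbols, values are lists of alternative
# productions. Integers are MIDI pitches (terminals). None of this reflects
# Synfire's actual data model.
GRAMMAR = {
    "phrase": [["question", "answer"]],
    "question": [["motif", "motif_up"]],
    "answer": [["motif", "cadence"]],
    "motif": [[60, 62, 64]],
    "motif_up": [[62, 64, 65]],
    "cadence": [[62, 59, 60], [67, 62, 60]],
}


def expand(symbol, rng):
    """Recursively expand a symbol into a flat list of MIDI pitches."""
    if isinstance(symbol, int):
        return [symbol]
    production = rng.choice(GRAMMAR[symbol])
    notes = []
    for part in production:
        notes.extend(expand(part, rng))
    return notes


rng = random.Random(1)

# Grammar-driven output: the motif returns, and the phrase ends on a cadence.
structured = expand("phrase", rng)

# Same notes in random order: the "randomized book" from the text above.
shuffled = structured[:]
rng.shuffle(shuffled)

print("structured:", structured)
print("shuffled:  ", shuffled)
```

Both lists contain exactly the same pitches. Only the grammar-driven one keeps the repetition and the closing gesture that a listener would recognize as structure.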

Good composers have always defined their very own language of musical expression and built their works on that language. Music can be seen as a corpus of words, sentences and paragraphs of a language. This is what constitutes a style. And it's what makes a composer stand out.

Style Is Not Hard-Wired

Turning elements of a language (e.g. Coda & Reprise, Question & Answer) into permanent controls of a user interface might not be the best idea, unless the software is designed to handle a specific range of musical genres and eras and nothing else. There are simply too many different languages out there. What if my music is based on different elements?

Beyond Parametric Randomness

A general-purpose generator requires some customizable definition of a language, the many parameters of which can then be randomized within their range. The downside of this simple approach, however, is that Cognitone (or any other developer for that matter) can't possibly create hundreds, if not thousands, of such language definitions in advance. Huge upfront costs aside, probably no single person is able to understand and encode all the intricacies of every musical style. Even if someone did, chances are that no matter how many styles you create, there will be composers who still don't find what they are looking for.
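As a rough illustration of what such a parametric definition could look like, the following sketch describes a style as a table of parameter ranges and draws one random setting from it. The parameter names and ranges (notes_per_bar, max_leap_semitones, rest_probability) and the sample_parameters function are invented for this example and are not actual Synfire parameters.

```python
import random

# Hypothetical style definition: each parameter maps to an allowed (low, high)
# range. Names and ranges are invented for illustration only.
STYLE = {
    "notes_per_bar": (2, 8),
    "max_leap_semitones": (2, 7),
    "rest_probability": (0.0, 0.3),
}


def sample_parameters(style, rng):
    """Draw one random value per parameter, staying inside its range."""
    params = {}
    for name, (low, high) in style.items():
        if isinstance(low, int) and isinstance(high, int):
            params[name] = rng.randint(low, high)  # inclusive integer range
        else:
            params[name] = rng.uniform(low, high)  # continuous range
    return params


rng = random.Random(7)
params = sample_parameters(STYLE, rng)
print(params)
```

The sampling itself is trivial. The cost the paragraph above describes lies in the STYLE table: someone has to write and tune one such definition by hand for every style that is to be supported.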

What Cognitone is therefore currently doing is experimenting with technology that, if possible, lets users define their own musical language elements and structure, which Synfire will then use to generate phrases, parts and possibly entire songs. Ideally, these languages could be shared and modified, just like arrangements and other files.

Neural Networks

While we are at it, I should mention that I spent the better part of 2017 experimenting with convolutional neural networks (CNNs, deep learning), coming to the conclusion that they don't suit this purpose well. First, these networks were primarily designed for recognition tasks rather than generation. Second, training these networks on all imaginable styles our users might eventually want to compose in is a tremendous task. The required labeled data just doesn't exist, and creating it from scratch is a huge effort.

And third, since nobody wants to end up with cliché variations of similar output that thousands of others have also generated, you will want to have more fine-grained control over style, if not create your very own. There is no clear path towards this yet.

Deep learning is the hot thing currently. For the purpose of composing original music, however, it is only of limited use, at least in its current state.


Although there is no estimate yet as to if and when some of this will surface as a new feature, I thought it would be nice to let you know what's going on behind the scenes here, in addition to the day-to-day software engineering around our agenda that grinds its way forward.

(comments are welcome in this thread)