Forest Journey music cinematic

Author Ruchir

Hi All.

My latest composition, my first using Synfire 2.0.

I've mashed up some trailer video clips from one of my fav movies currently playing, like a rescore study. Grateful for your comments.


Updated video with new imagery, generated by Midjourney AI:



Mon, 2022-08-22 - 02:54 Permalink

You've captured the feel of the 'trailer' perfectly. Loved it, and that hook is a bit of an earworm.

Sat, 2022-08-27 - 12:37 Permalink

Thanks. I've now updated the imagery in this video using images generated by the new image AI Midjourney which is absolutely incredible.

Fri, 2022-09-02 - 19:10 Permalink

These pictures match the calm music much better. Well done.

Midjourney is indeed awesome. As is OpenAI's DALL-E 2. I wonder if deep learning will ever be able to acquire structural/procedural knowledge. As it works now, it's fantastic for visuals but pretty underwhelming for music composition. Music is many orders of magnitude more complex than images.

Fri, 2022-09-02 - 19:16 Permalink

Thanks. You're absolutely right!

Visual artists need to be really worried right now. Music artists less so.

Fri, 2022-09-02 - 20:53 Permalink

> Music is many orders of magnitude more complex than images.

Not sure whether music is so much more complex or whether simply far more research resources have been poured into image processing (as well as language processing).

    Fri, 2022-09-02 - 22:27 Permalink

    Certainly that's a factor.

    The generated images are dynamic interpolations and transformations (sort of) of pictures the networks have been trained on. The networks however have no clue about space, perspective, geometry, physics, cause and effect, etc. It's all an associative blend of 2D fragments they have been fed.

    Deep learning works much like the visual cortex in our brain. Almost every A.I. generated image I have seen so far reminds me exactly of what I see in lucid dreams. At times I've been able to control almost every detail at will and see the world transforming right before my (inner) eyes. When Deep Learning came up around 2015, I suddenly realized I was in fact "surfing my visual cortex". I find it amazing how tech is able to resemble this process. Still, it is a pretty mechanical and hard-wired one.

    The intelligence is not in the images. The intelligence is in the sentient being that is able to control them. Nobody knows how it works.

    Music on the other hand is serial discrete information not unlike DNA. It unfolds over time with recurring patterns in multiple dimensions (rhythm, melody, harmony, sound, narrative structure), all related to each other. A single wrong "pixel" is enough to ruin it. I doubt a neural network will be able to avoid the countless mistakes that ruin a piece of music. Blending image fragments is easy (2D + color). Blending music fragments is not (5D + time + discrete information). It may work for some styles (Ambient, Meditation, New Age).


    Fri, 2022-09-02 - 23:16 Permalink

    You've obviously thought a lot about this. Very interesting analysis. Are you really that sure that the mechanism that creates a multidimensional space in deep learning (in the case of MidJourney, 500 axes apparently) can't replicate the same sophistication for music?

    Sat, 2022-09-03 - 09:04 Permalink

    I think deep learning will be useful for partial aspects of music (it might help with figure recognition). Dreaming up a full structure that makes sense to humans is more a task for rules-based generative algorithms (classic pre-neural A.I. - the stuff KIM is doing). I can more readily see KIM generating entire passages of music, or even songs at some point.

    The fundamental challenge is to label enough data for training. Someone needs to sift through a million or so musical compositions (MIDI or XML) and label all their elements and properties. With crowd sourcing it might be doable, but the labels will be very subjective. Music doesn't depict real objects that people can agree upon. Everyone associates different ideas with music. Imagine a dozen people describing an abstract painting and you'll get a dozen different descriptions.

    Sat, 2022-09-03 - 09:22 Permalink

    What about all the YouTube music videos that have already been labelled?

    Instead of deep learning MIDI, why not let the AI skip that and deep learn the music textures etc. of an already recorded piece?

    Sat, 2022-09-03 - 12:28 Permalink

    Audio is hard to break down into a score and metadata.

    The general YouTube labels don't help much with deep learning either.

    You need labels that describe each measure of music in terms of pace, mood, instrumentation, timbre, style, rhythm, granularity, dynamics, whatnot. And then sections of music as intro, build-up, reprise, climax, etc. You need to use the same terms a composer would use to instruct the A.I. to compose something for them.
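    To make that a bit more concrete, here is a minimal sketch of what such labels could look like as data. All names in it (MeasureLabel, SectionLabel, measures_with_mood) and the example values are hypothetical, made up for illustration only; this is not an existing dataset format and not how Synfire or KIM store anything.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class MeasureLabel:
    """Hypothetical per-measure label, using terms a composer might use."""
    measure: int                 # 1-based measure number
    pace: str                    # e.g. "slow", "fast"
    mood: str                    # e.g. "calm", "tense"
    instrumentation: List[str]   # e.g. ["strings", "harp"]
    dynamics: str                # e.g. "pp", "ff"


@dataclass
class SectionLabel:
    """Hypothetical section-level label: intro, build-up, reprise, climax, ..."""
    role: str
    measures: List[MeasureLabel] = field(default_factory=list)


def measures_with_mood(sections: List[SectionLabel], mood: str) -> List[Tuple[str, int]]:
    """Return (section role, measure number) for every measure with the given mood."""
    return [(s.role, m.measure) for s in sections for m in s.measures if m.mood == mood]


# Made-up example annotations for a three-measure piece.
sections = [
    SectionLabel("intro", [
        MeasureLabel(1, "slow", "calm", ["strings"], "pp"),
        MeasureLabel(2, "slow", "calm", ["strings", "harp"], "p"),
    ]),
    SectionLabel("climax", [
        MeasureLabel(3, "fast", "tense", ["brass", "percussion"], "ff"),
    ]),
]

print(measures_with_mood(sections, "calm"))  # [('intro', 1), ('intro', 2)]
```

    Even this toy version shows the problem: someone has to decide what "calm" or "tense" means for every single measure, and two annotators will rarely agree.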

    Simple prompts like "Something like (insert latest hit here)" will merely give you a random jukebox with no control.

    I don't say DL is useless for music. It's just way more complex and subjective than images.


    Sat, 2022-09-03 - 12:44 Permalink

    The important difference:

    Images can be combined from elements by blending, overlaying, masking. There are existing filter methods for color grading, contrast handling, style-transfer, etc. 

    Blending elements of music is hard: different rhythms, clashing harmony, incompatible styles, frequency spectrums, conflicting roles of instruments, etc. It can be done at the symbolic level (like Synfire does), but doing it at the audio level directly is probably impossible.
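    As a rough illustration of why the symbolic level is easier to work with, here is a toy sketch that checks two note fragments for one kind of harmonic clash. It is an assumption-laden example, not Synfire's method: the fragment format (a dict mapping beat number to a list of MIDI pitches), the function name semitone_clashes, and the rule "a semitone apart, ignoring octaves, counts as a clash" are all made up for illustration.

```python
from itertools import product


def semitone_clashes(fragment_a, fragment_b):
    """Find notes that sound on the same beat and are a semitone apart (octaves ignored).

    Each fragment is a dict: beat number -> list of MIDI pitch numbers.
    Returns a list of (beat, pitch_a, pitch_b) tuples.
    """
    clashes = []
    # Only beats present in both fragments can clash.
    for beat in sorted(set(fragment_a) & set(fragment_b)):
        for pitch_a, pitch_b in product(fragment_a[beat], fragment_b[beat]):
            interval = abs(pitch_a - pitch_b) % 12
            if interval in (1, 11):  # minor second or major seventh
                clashes.append((beat, pitch_a, pitch_b))
    return clashes


# Made-up fragments: a C major chord against a C sharp on beat 0.
fragment_a = {0: [60, 64, 67], 1: [62]}
fragment_b = {0: [61], 1: [65]}

print(semitone_clashes(fragment_a, fragment_b))  # [(0, 60, 61)]
```

    With symbolic data this is a few lines. With raw audio you would first have to recover the notes, which is the hard part.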


    Sat, 2022-09-03 - 14:49 Permalink

    Perhaps. But the human mind can distinguish instruments and syncopated rhythms based on hearing them in other contexts, which is not too dissimilar from a neural net.
    What will be really interesting is when AI can use imagery to generate decent music, and music to generate decent imagery, in a culturally specific way (n.b. beyond just "hear the sounds of the black hole at the centre of our galaxy").

    Sat, 2022-09-03 - 15:09 Permalink

    > But the human mind can distinguish instruments and syncopated rhythms based on hearing them in other contexts

    Most people can't. It takes years of ear training and musical knowledge to do that. We are so used to it, it is easy to forget.

    But yes, audio-to-MIDI conversion is certainly what DL could be trained to do.

    Sat, 2022-09-03 - 15:17 Permalink

    But years of training for a human can be replicated by a neural net trained on vast internet based datasets. Worth reflecting on Google's LaMDA programme, which has been trained to discern context and focus. Some poor Google engineer also got put on administrative leave for thinking it's sentient.

    Sun, 2022-09-04 - 11:25 Permalink

    > vast internet based datasets

    If they exist. There is plenty of text and images on the net. But last time I checked, I couldn't find much for music, let alone based on audio. When people write about music online they rarely describe the music as such. I remember a research project that collected a catalog of music with metadata for some automation purpose. No idea if it could be useful for training. Music is compressed discrete data with lots of logical constraints, rather than a continuous space that can be filled with "remembered" fragments.

    Sun, 2022-09-04 - 11:38 Permalink

    IMHO, A.I. is overestimated. It has currently reached a level of awe that totally blinds us. These networks are good only at one specific task each. MidJourney and DALL-E include lots of hand-written processing stages. It's not that some super A.I. figures out anything by itself. No reason to be worried. Although there's a lot of potential for abuse.

    Sun, 2022-09-04 - 14:12 Permalink

    There are a couple of ML models that are generic, trained to do multiple different tasks concurrently (e.g. GATO). There is not much information about them beyond what they have achieved, and they are only proofs of concept so far. I should imagine the more advanced models and methods are for military use and secret.
    One of the problems with music is categorising it. I'm not talking about genre, I'm talking about what makes a 'great tune'. No two people will agree on that, so how can a model be taught it? Most ML models are trained by supplying examples together with the answers. There are some newer techniques, though, where models compete to 'fool' each other: each model tries to say whether the output of the other model is real or machine generated, so these techniques do not require answers with their training data. These latter systems might prove useful for producing realistic-sounding music, but I doubt they will ever create outstanding music, music that makes people cry, laugh, etc. When it comes to performing and recording music, humans (some, not me) have a way of introducing imperfections that, despite being mistakes, elevate the music to even greater heights. I doubt machines will develop that ability any time soon, if ever.


    Sun, 2022-09-04 - 14:41 Permalink

    My next video will be full of MidJourney generated AI art. I've already created the art, now I need to create the music. Usually it's the other way round!

    Sun, 2022-09-04 - 15:04 Permalink

    I'm currently making orchestral music all day while testing Synfire. Inspired by this discussion, I will do a quick video for fun. Which tool do you use for the video slideshow?

    Sun, 2022-09-04 - 20:51 Permalink

    Ah ok, it's a video editor. I hope Final Cut Pro X also has a convenience feature for slides. Can't spend too much time on this right now.

    I just posted a score and am eager to hear your ideas on what pictures to generate ;-)

    Sun, 2022-09-04 - 21:39 Permalink

    Just listened to it. Definitely a film noir, a detective story. Maybe the scientist is you and the mystery involves Kim, who is about to board the train. Little do the fellow passengers know that Kim is an advanced AI from the future, sent back in time to help Andre bring his Synfire to the masses.

    Here's the MidJourney:

    Mon, 2022-09-05 - 08:36 Permalink

    Oh, thanks for the flowers ;-)

    Time travel is one of my favorite genres. Travelers (Netflix) is a great recent example. 

    That picture is wonderful. I'm a bit disappointed with DALL-E 2. I should definitely try MidJourney. 

    Mon, 2022-09-05 - 11:04 Permalink

    Just joined the MidJourney Discord and must say: Wow. At the same time I'm kind of feeling depressed. This is overkill. After looking at only a few hundred images, it already got me to a point where I'm beginning to feel sick. Seems to be unhealthy. Like drugs. Us old farts are immune to this, but teens are not. It literally alters the development of their brain. If you have kids, you should be concerned.

    Societies are breaking apart from disinformation, greed and hate and instead of organizing and coming together, everybody that's youthful and energetic enough to make a difference seems to be escaping into paint-brushed fantasies and generating bizarre memes all night long. If technology continues to suck the energy out of the next generation with social media, excessive gaming and now this, it'll be a strong contender for the next nail in the coffin of humanity. 

    I hope that at some point people will get tired of inflationary virtual imagery and that it will give rise to a renaissance of bare-bones reality. If it's not too late by then.

    Mon, 2022-09-05 - 14:06 Permalink

    They said the same about the internet!

    I think we just need to see things differently (no pun intended). The most interesting thing about MidJourney compared to the other AI offerings is the open sourcing of the imagery in order to crowd source new images. This open approach is what hooked me in, where people can riff off each other's ideas. I wish we could do the same in modern music, but copyright, sample library license restrictions, EULAs, T & Cs all get in the way.

    Tue, 2022-09-06 - 14:20 Permalink

    > escaping into paint-brushed fantasies and generating bizarre memes all night long.

    It's not like you can do anything different with these kinds of embryonic A.I. networks. Try to make something more realistic and meaningful and you'll get terrible faces and nonsense architectures. When you make fantasy / random stuff instead, since it's not very clear what's being depicted, it's easier to make it work.

    It's stuff that seems crazy at first, but then you see the limitations, and so you're left with creating random/stupid images, or abstract ones; and the latter is what these A.I. are good for, IMHO.