
Generate prompts for Udio/Suno etc. to "render" Synfire compositions


The new AI-based song generators are not only good at - well - coming up with songs, but also quite convincing at interpreting music, both instrumental and with singing.
And they are getting better and better at taking instructions to produce exactly what one wants - not only lyrics, but also chords and scales one provides, uploaded melody snippets, or even entire compositions.

Would be interesting to have an "export" capability in Synfire that generates a prompt for those new AIs to "render" the composition with realistic instruments, voice, etc., while staying as close as possible to the composition one has in mind.
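Just to make the idea concrete, here is a minimal sketch in Python of what such an export might produce. The CompositionInfo structure and all of its field names are hypothetical - Synfire does not expose such an API - and the prompt wording is only a guess at what these generators would accept.

```python
# Hypothetical sketch: turning basic composition metadata into a text prompt.
# CompositionInfo and its fields are invented for illustration; this is not a
# Synfire API, and the prompt wording is a guess at what Udio/Suno-style
# generators might accept.

from dataclasses import dataclass, field
from typing import List


@dataclass
class CompositionInfo:
    title: str
    tempo_bpm: int
    key: str
    instruments: List[str]
    chord_progression: List[str] = field(default_factory=list)
    lyrics: str = ""


def build_prompt(info: CompositionInfo) -> str:
    """Assemble a plain-text prompt describing the composition."""
    parts = [
        f'"{info.title}", {info.tempo_bpm} BPM in {info.key}.',
        "Instrumentation: " + ", ".join(info.instruments) + ".",
    ]
    if info.chord_progression:
        parts.append("Chord progression: " + " - ".join(info.chord_progression) + ".")
    if info.lyrics:
        parts.append("Lyrics:\n" + info.lyrics)
    parts.append("Stay as close as possible to the given structure; realistic acoustic rendering.")
    return "\n".join(parts)


if __name__ == "__main__":
    demo = CompositionInfo(
        title="Evening Sketch",
        tempo_bpm=96,
        key="D minor",
        instruments=["piano", "strings", "soft female vocal"],
        chord_progression=["Dm", "Bb", "F", "C"],
        lyrics="(first verse here)",
    )
    print(build_prompt(demo))
```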

I guess this could be more realistic and useful than trying to beat the large multimodal models at their song-rendering game using classic sound libraries etc., which take a lot of effort before you get a lively interpretation out of them that does not sound like "MIDI".

Of course, nothing beats a recording with a real band, orchestra or singers. But not everybody has the opportunity, connections and financial means for that.

 

On a tangent:

It will also be interesting to see how AI attribution, copyright etc. evolve. I think most agree that AI involvement should somehow be mentioned. But I guess it does not make sense to keep this binary, yes or no; better to attribute specific contribution steps to either humans or machines. Like: melodic ideas and lyrics human, rest AI (which could include Synfire-assisted composition aids). Or: lyrics and score human, rendering AI.

The question would be what counts as "all human" then. Just pencil and paper? Does the need for AI attribution start at computer-aided transposition, orchestration aids, generation of melodic snippets? Or already at semi-automatic mastering plugins, autotune and drum machines? When does "tool" end and "contribution" start?
I think now, with AI at this stage, some open questions will have to be (re)clarified.


Sat, 2024-06-15 - 15:39 Permalink

Generate prompts

Actually it should be the other way around. Their audio quality and choice of sounds are really bad, and if it weren't for the vocals, the music would be minimalist, repetitive and boring. If anything, they should generate input for Synfire to render!

When I tested them, my prompts didn't make much of a difference. Details were simply ignored or contradicted. They are more like random radios with a search function. So your suggestion won't work.

What's more, these platforms are trained on millions of existing music productions without consent. A tsunami of faceless music clones threatens to devalue human expression and effort to the point of making it worthless. This predatory "disruption" is a serious threat to everything that makes our culture human.

I get how exciting it is to write a prompt and immediately listen to music. We all know how much work goes into music production. It feels like magic. But be careful what you wish for.

Coincidentally, since any reference to "AI" is already becoming a liability, I added a pledge to our site today:

Cognitone has been using AI long before the advent of LLMs, transformers and diffusion models that are now a threat to everything that makes our culture human. We will never swallow your music to replace you. We are committed to helping you create your own distinctive music, not the millionth variation of a statistical average. We are firmly opposed to the massive devaluation of human expression.

Tip: Listen to an Udio/Suno song more than once and check it again the next day. The vocals are impressive, but that's only the surface. After listening 3-4 times with some resting time in between, you notice the flaws. I've played a few songs that really impressed me (as a former songwriter) to a couple of teenagers, and they found them boring and off.

Without the awe and fascination about AI, the actual music doesn't even come close to the real thing.

 

Sat, 2024-06-15 - 15:48 Permalink

Think most agree that AI involvement should somehow be mentioned.

That's not as much an issue with music as it is with fake video/voices/news.

Use AI as much as you want if that gets your creative juices flowing. Just don't think that writing a prompt has anything to do with making music. Making music will never be as "simple" as these platforms claim. If it is that "simple", it's not you making the music. It's not your music.

Sat, 2024-06-15 - 15:52 Permalink

I think Udio is currently going in the direction of giving composers more control. Certainly still a lot of room...

I think it would be important to take the human input into account and only help with what the particular human cannot do themselves.

Initially it was said that machines would take over the tedious work so we can concentrate on what is fun. We wouldn't be doing ourselves a favor by training machines to take over what is fun, so that we have more time for the tedious parts...

That said, some of the stuff is really impressive. And the good ones in particular seem to be those with creative human input. E.g. this person came up with really interesting prompts and thus listen-worthy music - almost all of the songs that came out of this "collaboration" are, frankly speaking, interesting to good:
https://www.udio.com/songs/fCe8okeWAGJhaRzTrrGQ9X

Classical music is still less "in danger". But imho it is not only the voices that are good; the instruments have also become quite convincing at this point.
E.g. some of these: https://suno.com/playlist/8c8838f4-0498-4f7d-ad36-0c7e7fb1804a

(like "AI took Vivaldi's job" or the piano pieces. The counter tenor in Masterpieces of lost time is also quite convincing, the soprano not so much...)

Sat, 2024-06-15 - 16:18 Permalink

Yeah, as said: the tech is impressive, but the music is ... generic. I've heard all these songs before, many times, for decades. Random radio at 4:00 AM.

What sense does it make to generate something that every moderately talented musician can spontaneously improvise because it is already an established part of our common musical culture?

If you want to be noticed and heard, you need to find a form of expression that is distinctly different from the generic average. Only experimentation, curiosity, motivation and imagination get you there.

Sat, 2024-06-15 - 16:34 Permalink

On a positive note, let's hope that the new "competition" will inspire exactly that - the willingness to experiment, take risks and try new things. Currently a lot of producers and composers are very conservative, only pushing and repeating what they already know will sell - because it is similar enough to what has sold before.

This kind of composition will probably not be viable anymore once you can have a similar track for 2 cents a piece. So people will finally have to get ahead of their time again.

Sat, 2024-06-15 - 16:40 Permalink

But coming back to the original topic: this technology is looking very interesting (likely soon) for score-to-audio rendering. I wouldn't be surprised if one could upload the score of, e.g., a classical piece and get a faithful orchestral rendering of it by the end of the year. (Already possible to some degree, but the current systems don't distinguish much between composition and interpretation, so the result is more inspired by your composition than a rendition of it...)

Sat, 2024-06-15 - 17:07 Permalink

If by "this technology" you refer to AI in general, yes. Text-to-music transformers, not so much. Music is extremely subjective and defies verbal description. Hence the short prompts.

Adding (fake) emotion, articulation and expression to an otherwise static score is certainly something we could use. The challenge is a totally fragmented market with thousands of sound libraries, synths, effects, etc. There is no common protocol or API that an AI (or: Synfire) could use to control them.
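For illustration only, here is a toy sketch of the kind of "humanization" meant above: small random timing and velocity variation applied to otherwise static notes. The Note structure is invented for this example and is not tied to Synfire, any sound library, or any particular protocol - which is exactly the missing piece.

```python
# Toy illustration of "humanizing" a static score: small random offsets to
# timing and velocity. The Note structure is invented for this sketch and is
# not tied to Synfire or any specific sound library or protocol.

import random
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Note:
    start_beats: float    # onset position in beats
    duration_beats: float
    pitch: int            # MIDI note number
    velocity: int         # 1..127


def humanize(notes: List[Note],
             timing_jitter: float = 0.02,
             velocity_jitter: int = 8,
             seed: int = 42) -> List[Note]:
    """Return a copy of the notes with slight random timing/velocity variation."""
    rng = random.Random(seed)
    result = []
    for note in notes:
        shifted = max(0.0, note.start_beats + rng.uniform(-timing_jitter, timing_jitter))
        velocity = min(127, max(1, note.velocity + rng.randint(-velocity_jitter, velocity_jitter)))
        result.append(replace(note, start_beats=shifted, velocity=velocity))
    return result


if __name__ == "__main__":
    static_phrase = [Note(i * 1.0, 1.0, 60 + i, 80) for i in range(4)]
    for n in humanize(static_phrase):
        print(n)
```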

In the small niche market of music production, however, nobody is stupid enough to invest eight-figure sums in fundamental AI research and the establishment of new standards. It hasn't happened in decades.

Sat, 2024-06-15 - 23:01 Permalink

but the music is ... generic. I've heard all these songs before, many times, since decades. Random radio at 4:00 AM.

Well, then the radio stations can easily replace all their expensive playlists with AI-generated tracks. Nobody will even notice and it won't cost any license fees anymore. Isn't that great? 

Sun, 2024-06-16 - 13:18 Permalink

These platforms will eventually run out of money. A lot of expensive hardware and energy is required to run transformer models on audio data. Most platforms are targeting consumers, content producers, videographers, small businesses. That's deemed a big enough market by their investors, but I have my doubts.

Consumers are already streaming an abundance of expertly produced original music on demand (Spotify, etc.), so why should they pay for a random radio with faceless clones? And if I need a funny song for my kid's birthday party or a surprise tune for my love interest (or whatever), I won't subscribe for longer than a month.

And B2B customers can already choose from stock music libraries (Envato, etc.). What you hear is what you get: finished productions with decent quality (and real people getting paid). Customers can ask these producers to make a remix/variant for them and buy exclusive rights, while AI-generated stuff is legally insecure (probably public domain). And again, why keep a subscription for something you only need once in a while?

Same for songwriters and home producers. Why subscribe for longer than necessary? As soon as I've got my inspiration, I'll cancel. These raw song sketches need to be rebuilt in Synfire or a DAW anyway.

I have to admit that I am amazed by the generated vocals. But then again, any real singer can do it much better, and you'll need one anyway if you want a song to be performed and take off.

While we're at it, I'm very motivated to take a closer look at vocal rhythms and melodic movements. It should be possible to build a better factory for vocal lines that are more "pop" out of the box.