Until now, most generative music models have produced mono sound. This means MusicGen doesn't place any sounds or instruments on the left or right side, resulting in a less lively and exciting mix. The reason stereo sound has mostly been ignored so far is that generating stereo is not a trivial task.
As musicians, when we produce stereo signals, we have access to the individual instrument tracks in our mix and can place them wherever we want. MusicGen doesn't generate all instruments separately but instead produces one combined audio signal. Without access to these instrument sources, creating stereo sound is hard. Unfortunately, splitting an audio signal into its individual sources is a difficult problem (I've published a blog post about that), and the technology is still not 100% ready.
Therefore, Meta decided to incorporate stereo generation directly into the MusicGen model. Using a new dataset consisting of stereo music, they trained MusicGen to produce stereo outputs. The researchers claim that generating stereo adds no extra computing cost compared to mono.
Although I feel the stereo procedure is not described very clearly in the paper, my understanding is that it works like this (Figure 3): MusicGen has learned to generate two compressed audio signals (a left and a right channel) instead of one mono signal. These compressed signals must then be decoded separately before they are combined to build the final stereo output. The reason this process doesn't take twice as long is that MusicGen can now produce two compressed audio signals in roughly the same time it previously took to produce one.
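To make that last step concrete, here is a minimal NumPy sketch, purely my own illustration rather than Meta's actual implementation: two separately decoded channel waveforms are stacked into a single stereo array. The `decode` function is a hypothetical stand-in for EnCodec's decoder.

```python
import numpy as np

def decode(compressed: np.ndarray) -> np.ndarray:
    """Stand-in for EnCodec's decoder, which maps a compressed
    representation back to a waveform. For illustration we just
    treat the input as the waveform itself."""
    return compressed.astype(np.float32)

# Hypothetical compressed signals for the left and right channels,
# produced by the model in roughly the time one channel used to take.
left_codes = np.random.randn(48_000)
right_codes = np.random.randn(48_000)

# Each channel is decoded separately...
left = decode(left_codes)
right = decode(right_codes)

# ...and the two mono waveforms are combined into one stereo
# signal with the conventional (channels, samples) layout.
stereo = np.stack([left, right], axis=0)
print(stereo.shape)  # (2, 48000)
```

The key point is that stereo here is just two mono decodes plus a cheap stack, which is why the decoding stage adds no meaningful overhead.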
Being able to produce convincing stereo sound really sets MusicGen apart from other state-of-the-art models like MusicLM or Stable Audio. From my perspective, this "little" addition makes a huge difference in the liveliness of the generated music. Listen for yourselves (it may be hard to hear on smartphone speakers):
Mono
Stereo
MusicGen was impressive from the day it was released. However, since then, Meta's FAIR team has been continually improving their product, enabling higher-quality results that sound more authentic. When it comes to text-to-music models that generate audio signals (not MIDI etc.), MusicGen is ahead of its competitors from my perspective (as of November 2023).
Further, since MusicGen and all its related products (EnCodec, AudioGen) are open-source, they constitute an incredible source of inspiration and a go-to framework for aspiring AI audio engineers. Looking at the improvements MusicGen has made in only six months, I can only imagine that 2024 will be an exciting year.
Another important point is that with their transparent approach, Meta is also doing foundational work for developers who want to integrate this technology into software for musicians. Generating samples, brainstorming musical ideas, or changing the genre of your existing work: these are some of the exciting applications we're already starting to see. With a sufficient level of transparency, we can make sure we're building a future where AI makes creating music more exciting instead of being solely a threat to human musicianship.