Google researchers have created an artificial intelligence model called MusicLM that can generate music from text descriptions, much as DALL-E generates images from text prompts.
Currently, Google does not let ordinary users work with MusicLM, but the company has published several examples of the model in action. Among them are melodies that sound like full-fledged compositions, generated from paragraph-length descriptions specifying genre, mood, and even particular instruments. There are also 5-minute pieces created from just one or two words, such as "melodic techno". The examples additionally demonstrate a story mode, in which a single composition follows different descriptions for different segments. The model can even imitate human vocals. On the next page, you can see the music compositions created by MusicLM in its various operating modes.
For AI professionals, Google has prepared a research paper that explains in detail how the MusicLM model works. However, Google does not intend to open the system to the general public, notes NIX Solutions.
“We have no plans to release models at this time,” the document concludes, citing the risks of “potential misappropriation of creative content” (read: plagiarism) and potential cultural appropriation or misrepresentation.
Google says it is publishing a dataset of approximately 5,500 music-text pairs that can help train and evaluate other AI music systems.
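To make the idea of a music-text pair concrete, here is a minimal sketch of how such dataset entries might be represented and queried. The record type, field names, and example captions are illustrative assumptions, not the actual schema of Google's dataset.

```python
from dataclasses import dataclass

# Hypothetical record for one music-text pair: an audio clip identifier,
# a free-text caption describing the music, and the clip's time span.
# This schema is an assumption for illustration only.
@dataclass
class MusicTextPair:
    audio_id: str   # identifier of the audio clip
    caption: str    # free-text description of the music
    start_s: float  # clip start time in seconds
    end_s: float    # clip end time in seconds


def filter_by_keyword(pairs, keyword):
    """Return the pairs whose caption mentions the keyword (case-insensitive)."""
    return [p for p in pairs if keyword.lower() in p.caption.lower()]


# Example usage with made-up entries.
dataset = [
    MusicTextPair("clip_001", "A melodic techno track with a driving bassline", 30.0, 40.0),
    MusicTextPair("clip_002", "Calm acoustic guitar with soft vocals", 0.0, 10.0),
]
techno = filter_by_keyword(dataset, "techno")  # matches clip_001 only
```

A caption-keyword filter like this is one simple way such a dataset could be sliced for evaluating a text-to-music system on a particular genre or mood.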