Unlike other text-to-image competitors, Stable Diffusion is an open source project, so its code is available for anyone to use or adapt. “The great thing with Stable Diffusion is that the guy who set it up, Emad Mostaque, is very ideologically driven in terms of the accessibility of artificial intelligence systems, and the importance of interrogating biases within the data sets. So a core part of the ethos of the company behind Stable Diffusion is that they wanted it to be open source, meaning that individual users could take this framework, and then apply it to whatever they wanted, and change it, develop it,” Roach explains. “Within weeks, so many innovations were taking place. People were making all sorts of different APIs, different applications that could use this very, very powerful system, and it’s been wild since then. Every other week you’re seeing things, and thinking, well, that was the stuff of dreams, and this is now possible.
“What Stable Diffusion did has really changed the culture and the ideology behind what it means to have or to not have access to these tools, and radically opened up who can use them.”
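That openness is concrete: the trained weights themselves are published, so anyone with a capable machine can pull them down and generate images locally, or build new applications on top of them. A minimal sketch of what that looks like, assuming Hugging Face’s diffusers library and one of the publicly released checkpoints (the model ID and settings here are illustrative, not drawn from the article):

```python
# Hedged sketch: run the released Stable Diffusion weights locally,
# no hosted service required. Assumes the diffusers library and the
# public CompVis/stable-diffusion-v1-4 checkpoint.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")  # needs a GPU; "cpu" also works, far more slowly

# Text prompt in, image out – the whole interface.
image = pipe("an arpeggiated synthesizer, painted as a landscape").images[0]
image.save("out.png")
```

Riffusion is exactly this kind of adaptation: the same framework, fine-tuned to draw spectrograms of sound instead of pictures, which are then converted back into audio.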
In late December 2022, shortly after Riffusion launched, Roach became fascinated by its possibilities – he had no plans to make an album, but found the idea of creating music with AI too compelling not to. “I started experimenting with it, and realised that there was a lot that could be done,” he says. “I got obsessed with this thing, and made hours and hours of recordings, because it’s quite lo-fi as well – it’s just this website that would spit the stuff out. I was documenting it – this is another thing I’ve been doing sometimes, because these systems, you don’t necessarily own them and you don’t know if they’re going to be there tomorrow. So I was making all this stuff and recording it, to return to it later, already with that knowledge that it’d be about going back, editing and finding things. I suppose the other thing too is that the majority of what was being spat out by the system wasn’t what I was looking for; even with the prompts I would use, it would take a long time to find something that really pulled me into it.”

What is that something, in those snippets of generated sound, that made them just right? Were there specific prompts that worked, or failed? He demurs, and fair enough. Maybe the potential for text-generated audio using AI is greater than for text-generated images, because there is no standard of photorealism to compare the results to. Or maybe language is slippery and can’t be contained by the Venn diagram mentioned earlier. Music hits differently. Like the crate diggers zeroing in on treasure, you just know when it’s right. Or as Roach says, “I was talking with a friend recently about the pieces of music that were the most touching. Both of us were thinking about this. And I think there’s music that you can understand why you like it and you could talk about it forever.
“But I feel like the most touching things, there are no words to say why, it just does something to you. It just does something to you, you know? And you know there are these attributes you could point to and other pieces of music that are structurally and sonically similar, or where they sit in a historical continuum. But then for some reason, this part of this song, the way the textures and the melodies or whatever it is work together, it just hits you in a way that is pre-linguistic and it touches you as a squidgy sentient person for some reason.
“I feel like the [Riffusion] system can make things almost like pointing. Like, this is a free jazz drum solo, and this is an arpeggiated synthesizer, and this is a tabla. This is Rihanna. You can