I made this video for the excellent Frank&Beans.
The video is entirely AI-generated – the starting point was four photographs (two of Milo and two of Tim). I used a ‘supermodel’ version of an FFHQ StyleGAN2 model as a base. Using back-projection, I ‘found’ their faces inside the network and used these latent vectors as starting points for a series of audio-driven journeys through latent space.
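For the curious, the back-projection step looks roughly like this. It's a minimal PyTorch sketch, assuming a pretrained StyleGAN2-style generator object with a `synthesis` method and an average latent `w_avg`; those names, and the choice of the `lpips` perceptual loss, are illustrative rather than any particular repo's API.

```python
import torch
import lpips  # perceptual (LPIPS) distance, used as the image loss

def project(G, target, steps=500, lr=0.05, device="cuda"):
    """Optimise a W-space latent until G renders the target photo.

    target: the photo as an image tensor, already on `device`.
    """
    percept = lpips.LPIPS(net="vgg").to(device)
    # start from the average face, a stable place to begin the search
    w = G.w_avg.clone().to(device).requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        img = G.synthesis(w)                # render the current guess
        loss = percept(img, target).mean()  # how far from the photo?
        opt.zero_grad()
        loss.backward()
        opt.step()
    return w.detach()  # the 'found' face, ready for latent journeys
```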
The underlying animation is created from the vectors for Milo and Tim, alongside a series of non-existent human faces from the mind of the GAN – interpolated to the tempo of the track.
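The base cycle itself is simple to describe in code. Here's a sketch of tempo-locked interpolation, assuming each latent is a NumPy vector and each transition should span exactly one beat (the function name and parameters are my own):

```python
import numpy as np

def base_cycle(keyframes, bpm, fps=30):
    """Interpolate through latent keyframes, one transition per beat.

    keyframes: list of latent vectors (Milo, Tim, invented faces...).
    Returns one latent per video frame, looping back to the start.
    """
    frames_per_beat = int(round(fps * 60.0 / bpm))
    frames = []
    for a, b in zip(keyframes, keyframes[1:] + keyframes[:1]):
        for t in np.linspace(0.0, 1.0, frames_per_beat, endpoint=False):
            frames.append((1.0 - t) * a + t * b)  # straight lerp
    return np.stack(frames)
```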
From this base cycle, the audio stems were used to push the vectors in predefined directions, such as ‘age’, ‘gender’, ‘pitch’ and ‘yaw’, with the amount of movement determined by the audio.
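In code, that audio-driven push might look like the sketch below: `librosa` turns a stem into a per-frame loudness envelope, which then scales how far each frame's latent moves along a unit direction vector. The wiring and names are mine, a sketch rather than the exact pipeline.

```python
import librosa
import numpy as np

def drive(latents, stem_path, direction, strength=3.0, fps=30):
    """Push each frame's latent along `direction`, scaled by loudness."""
    y, sr = librosa.load(stem_path, sr=None, mono=True)
    hop = max(1, int(sr / fps))         # one envelope sample per frame
    env = librosa.feature.rms(y=y, hop_length=hop)[0]
    env = env / (env.max() + 1e-8)      # normalise loudness to 0..1
    env = np.resize(env, len(latents))  # match the frame count
    # direction: a unit vector in latent space, e.g. 'age' or 'yaw'
    return latents + strength * env[:, None] * direction[None, :]
```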
The vocal section is based on a series of VQGAN-CLIP images derived from a still of Milo and some choice prompts.
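VQGAN-CLIP is its own rabbit hole, but the core loop is short: decode a VQGAN latent to an image, embed it with CLIP, and nudge the latent towards the prompt's text embedding. The sketch below uses stand-in `vqgan` and `clip_model` objects rather than any particular repo's API.

```python
import torch

def vqgan_clip_step(z, vqgan, clip_model, text_features, opt):
    """One optimisation step: pull the decoded image towards the prompt."""
    img = vqgan.decode(z)                          # latent -> image
    image_features = clip_model.encode_image(img)  # CLIP image embedding
    # minimise cosine distance between image and prompt embeddings
    loss = 1 - torch.cosine_similarity(image_features, text_features).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```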

The resulting images were then animated from the vocal audio stem using Wav2Lip.
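Wav2Lip ships an inference script that takes a face (image or video) plus an audio file, so driving each still from the vocal stem is one call per image. A hedged example, with flags as in the public Wav2Lip repo at the time of writing and all paths as placeholders:

```python
import subprocess

# lip-sync one VQGAN-CLIP still to the vocal stem (paths are placeholders)
subprocess.run([
    "python", "inference.py",
    "--checkpoint_path", "checkpoints/wav2lip_gan.pth",
    "--face", "stills/milo_vqgan.png",
    "--audio", "stems/vocals.wav",
    "--outfile", "out/vocal_section.mp4",
], check=True)
```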
Et voilà! An entirely AI-generated music video, derived from four photos and audio stems!
Drop me a line if you’d like something in a similar vein…