In the last couple of years we have seen a proliferation of GANs in the scene we somewhat laughably refer to as ‘AI Art’. Often these GANs are trained on faces, and every few months another, higher-quality video of faces morphing into each other appears.
These models are the result of thousands of hours of processing time – to borrow a phrase from the crypto-currency community, they are a literal ‘proof of work’: trillions of calculations, rendered as a block of numerical weights, ready to be replicated and interrogated.
These products of digital labour are scattered all over the web.
Artistically, these models can be considered ‘found objects’.
Found object originates from the French objet trouvé, describing art created from undisguised, but often modified, objects or products that are not normally considered materials from which art is made, often because they already have a non-art function.
A box of infinite faces
For this work, the found object is the FFHQ StyleGAN model from Nvidia.
The model has been trained on the Flickr-Faces-HQ (FFHQ) dataset, which contains 70,000 faces and is frequently used as a benchmark for computational face-related tasks. This model is able to produce an infinite number of novel faces – as demonstrated by websites such as https://thispersondoesnotexist.com/ .
In addition to generating new faces, it is possible to ‘find’ faces that are similar to ‘real’ faces. We effectively ask the network to imagine a face, based on an existing photograph – somewhat akin to a police sketch artist recreating a face from the verbal description of a witness. It is not a direct reproduction of an image, but a ‘closest approximation’ derived from the latent space of the trained network.
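This ‘closest approximation’ is usually found by optimising a latent vector until the generator’s output matches the target photograph. As a hedged illustration – using a toy linear ‘generator’ in NumPy rather than the real StyleGAN network, with made-up dimensions – the projection loop looks roughly like this:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a generator: a fixed linear map from latent space
# to "image" space. The real StyleGAN generator is a deep network,
# but the projection idea is the same: descend on the latent code.
LATENT_DIM, IMAGE_DIM = 16, 64
G = rng.normal(size=(IMAGE_DIM, LATENT_DIM))

def generate(z):
    return G @ z

# The "real" photograph we want the network to imagine.
target = rng.normal(size=IMAGE_DIM)

# Gradient descent on the latent vector z, minimising the pixel-wise
# squared error between the generated image and the target.
z = np.zeros(LATENT_DIM)
lr = 0.005
for _ in range(500):
    residual = generate(z) - target
    grad = G.T @ residual  # gradient of 0.5 * ||G z - target||^2
    z -= lr * grad

loss = float(np.mean((generate(z) - target) ** 2))
```

Because the toy generator spans only a 16-dimensional slice of the 64-dimensional ‘image’ space, the loss never reaches zero – which mirrors why some projections land in the uncanny valley: the best point in the latent space may still be far from the photograph.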
For some images, this works very well.
For others, it produces horrors from the uncanny valley.
Mathematically speaking, these two attempts to find my face may be equally ‘successful’ – i.e. the algorithm has done its best. But to the human observer the second image is clearly ‘wrong’.
The human brain dedicates huge resources to the problem of recognising and reading emotional state from other faces. It’s unsurprising that we are highly tuned to spotting when we’re being duped.
Contrast this with the machine-imagined cats at https://thiscatdoesnotexist.com/ – we aren’t as well versed in spotting the nuances of the cat face, hence a slightly wonky cat is much less disturbing than a slightly wonky human face.
One could argue that the whole history of portraiture is about playing at the edge of the uncanny valley. A good portrait captures the character of the subject and tickles our brain in an aesthetically rewarding manner.
The journey of a portrait starts with the artist’s perception of the subject, which is transformed through expression and technique into a work that is, in turn, interpreted by the viewer.
Some say all portraits are self-portraits. That the artist is inevitably entwined with her work – perhaps GAN faces should be considered as self-portraits as well?
Portraits of the network that produced them.
Moving through face-space
The video for Cry by Godley & Creme hit the world’s 4×3 CRT screens in 1985. A simple concept, beautifully realised, and on a personal note, perhaps my first encounter with the concept of the face-morph.
The method used is relatively simple: a series of actors singing the song, straight to camera, blending into one another with video fades. A more sophisticated version of the concept appeared in Michael Jackson’s video for Black or White, released in 1991.
I have been using (less sophisticated, but somehow more disturbing) face-morphs as part of The Private Sector’s live AV for several years.
One of the features of the StyleGAN architecture is the ability to manipulate the faces across human-salient axes, such as age or gender.
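Under the hood, this kind of edit is just vector arithmetic in the latent space: a direction vector associated with, say, ‘age’ is added to a face’s latent code, and the result is fed back through the generator. A minimal NumPy sketch – the direction here is a random unit vector standing in for a learned ‘age’ axis, which the real workflow would obtain by other means:

```python
import numpy as np

rng = np.random.default_rng(1)
LATENT_DIM = 512  # StyleGAN's intermediate latent space is 512-dimensional

# Latent code of a projected face, plus a unit direction vector that
# (in the real model) would be learned to correlate with apparent age.
w = rng.normal(size=LATENT_DIM)
age_direction = rng.normal(size=LATENT_DIM)
age_direction /= np.linalg.norm(age_direction)

def edit(latent, direction, strength):
    """Move a latent code along a semantic axis by a given strength."""
    return latent + strength * direction

w_older = edit(w, age_direction, 3.0)     # nudge towards 'older'
w_younger = edit(w, age_direction, -3.0)  # nudge towards 'younger'
```

Feeding `w_older` back through the generator would yield the same face, aged – often, it seems, by adding a pair of glasses.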
Every time I see a GAN face morph, it makes me think of the Godley & Creme video, so I decided to see what would happen if the two met.
I split the Cry video into its constituent frames and asked the network to try and produce a portrait of each one.
I then asked it to produce a slightly older version of the same face. Often this seems to involve adding a pair of glasses.
Each frame of the video is a portrait ‘painted’ by the network, its output influenced by all that it has experienced before.
Human artists work in the same way, but our experience is infinitely broader than solely looking at the same 70,000 faces over and over again. However, as machine learning systems become more sophisticated and absorb other modalities, will the distinction remain so clear?
The source images from the video are black and white, with high contrast, and are frequently in the process of being faded one into the other. In many ways they are quite different from the FFHQ dataset the network was trained on – a nigh-on impossible task for it to understand, given its limited experience of the world. Yet it still produces a portrait, albeit sometimes with scant resemblance to the original.
These faces are plucked from the fuzzy edge of the network’s knowledge, testing the limits of its abilities.
This is the machine painting a dream of the Cry video. Enjoy.