As individuals, we have a personal identity. As members of a community, we have broader, shared identities, alongside our individual nuances. From these shared identities, tribes, communities and societal structures have emerged.
When meeting new people, we often ask ‘where are you from?’ (often followed by ‘what do you do?’) – looking for geographic cues to place them within our own landscape of partial knowledge and, all too frequently, prejudice.
The distinction between the ‘in group’ and the ‘out group’ has a strong evolutionary history. We are not built to deal with a tribe of 7.9bn. We are drawn to create social groups based on familial, geographical and cultural similarities.
From QAnon to the Labour Party, tribal boundaries are maintained explicitly and implicitly by their adherents. Often we find more energy is spent on identifying and demonising ‘the other’ than is spent looking inwards and finding cohesion.
These archetypes and their limits are maintained in the conscious and subconscious behaviour of their members. Often these groups align culturally, be it in dress, environment, practices or ritual.
The internet contains millions of examples of these tribes, alongside photographs and textual descriptions of their practices. A semi-ordered compendium of human culture. AI models trained on this data reveal a new kind of cultural truth, one derived directly from our online behaviours.
Image classification is a major task in the Machine Learning (ML) world; the ability to automatically spot objects and people in an image or video feed is used in a host of areas, ranging from traffic and border control to policing, internet search, and the photo app on your phone. To achieve this, networks are trained on huge libraries of image–label pairs – datasets laboriously created by humans, an often unseen aspect of ‘machine learning’.
“The ImageNet dataset, one of the largest efforts in this space, required over 25,000 workers to annotate 14 million images for 22,000 object categories.” – OpenAI
The CLIP model, from OpenAI, offers to take this process out of human hands. Images can be labelled automatically using the enormous ‘knowledge’ it contains about how we describe the world in words.
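The mechanics can be sketched without the model itself: CLIP encodes an image and each candidate phrase into a shared vector space, and the phrase whose embedding lies closest to the image (by cosine similarity) wins. A minimal sketch of that idea, with hand-made mock vectors standing in for CLIP's learned encoders:

```python
import math

# A dependency-free sketch of CLIP-style zero-shot labelling.
# Real CLIP produces these embeddings with learned neural encoders
# (encode_image / encode_text); here they are mock vectors.

def normalise(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalise(a), normalise(b)))

def softmax(scores):
    """Turn similarity scores into a probability distribution."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(image_embedding, label_embeddings):
    """Pick the label whose text embedding best matches the image."""
    labels = list(label_embeddings)
    scores = [cosine(image_embedding, label_embeddings[l]) for l in labels]
    probs = softmax(scores)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs[best]

# Mock embeddings: in practice these come from the CLIP encoders.
labels = {
    "a photo of a cat": [0.9, 0.1, 0.0],
    "a photo of a dog": [0.1, 0.9, 0.0],
}
image = [0.85, 0.2, 0.05]  # an image that "looks like" a cat

label, prob = classify(image, labels)
print(label)  # → a photo of a cat
```

Because the label set is just a list of phrases, nothing restricts it to ‘cat’ and ‘dog’ – any phrase can be scored against any image, which is what makes the experiments below possible.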
This may seem great for spotting the difference between ‘cats’ and ‘dogs’, but since the model is trained on everything, it can be used to classify and ‘find examples’ of whatever phrase you give it. In my previous work, The G(AN)8, I used this method to create fictional world leaders. The same method applies here, but rather than explicit individuals, I seeded the system with broader cultural labels.
The following images are visualisations of the phrase supplied – no further information is given, beyond the implicit learnings of the CLIP model. The system iteratively refines the face, trying to find a better and better match. Noise introduced into the training means a range of similar faces can be produced from the same prompt. The examples below are generally from the first response to the phrase.
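The refinement loop described above can be caricatured in a few lines. In the real system a generator's latent vector is adjusted so that the generated face scores ever higher against the prompt under CLIP; the toy below instead hill-climbs a plain vector towards a fixed stand-in ‘prompt embedding’, with injected noise meaning different runs land on different (but similar) results:

```python
import random

# Toy sketch of prompt-guided iterative refinement: start from noise,
# repeatedly nudge a candidate and keep the nudge if its match score
# against the prompt improves. The score here is just (negative)
# distance to a target vector, standing in for a CLIP match score.

def score(candidate, target):
    """Higher is better: negative squared distance to the target."""
    return -sum((c - t) ** 2 for c, t in zip(candidate, target))

def refine(target, steps=500, step_size=0.1, seed=0):
    rng = random.Random(seed)
    candidate = [rng.uniform(-1, 1) for _ in target]  # random start
    best = score(candidate, target)
    for _ in range(steps):
        # Propose a small random perturbation; accept it if it scores better.
        proposal = [c + rng.gauss(0, step_size) for c in candidate]
        s = score(proposal, target)
        if s > best:
            candidate, best = proposal, s
    return candidate, best

target = [0.3, -0.7, 0.5]  # stand-in for "the prompt's embedding"

# Different seeds converge to different but similar candidates -
# the "range of similar faces" produced from the same prompt.
a, score_a = refine(target, seed=1)
b, score_b = refine(target, seed=2)
```

The real optimisation uses gradients rather than random search, but the shape of the process – noise in, iteratively better match out – is the same.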
Historically, the commonality within national groups has had a strong genetic component – large-scale, long-distance world travel being a relatively new phenomenon, evolutionarily speaking.
On the one hand, these images tend to support our general assumptions about the populations of these countries – on the other, they effectively hide any underlying population diversity. In many images we see the suggestion of flags and national colours (e.g. ‘American woman’ and ‘Dutch man’) or backgrounds that echo the climate (e.g. snow behind ‘Russian woman’).
We can even get more specific and look at an individual city. Once again, the background seems to play a strong part in the signal – note the red buses behind ‘London man’, whilst ‘London woman’ seems to be inside the bus…
When meeting new people we often ask ‘What do you do?’. The response is used to trigger a swathe of assumptions and preconceptions which we have developed over the course of our lives.
When I am asked this question and reply ‘I’m an artist’, it is almost inevitably followed by ‘but how do you make a living?’ – which tells you both about the status of the artist within contemporary culture, and the primacy of wealth to indicate ‘worth’ in our late-capitalist society.
I’ve specifically left out gender in these labels, to see whether the network considers the role generally male or female.
‘Academic’ is particularly interesting: as the system searches for the best match, the face transforms from a woman into a middle-aged man. Female friends who work in academia will no doubt concur with this interpretation.
What is considered ‘beautiful’ differs wildly across cultures. Here, however, the system inevitably settles on typically Western standards. This undoubtedly reflects an imbalance of representation within the training set – there are far more pictures of thin, white women labelled ‘beautiful’ than of any other kind of face. As the internet spreads and evolves, this should begin to encompass other standards of beauty (but I wouldn’t hold your breath).
Often character traits are assigned, based on looks. With this process we can finally discover the archetypal ‘crazy guy’.
Subcultures and ‘tribes’ frequently develop within a culture, often around styles inspired by musicians or fandoms. Members of these tribes go out of their way to present a strong common visual image, signalling their affiliation – both an invitation to fellow tribe members and, sometimes, a warning to outsiders.
What does this mean?
These faces have been dreamt up from very simple text prompts, entire cultures neatly summarised with the power of AI.
The same systems can, of course, be used in reverse.
I gave it the official photographs of the current UK Cabinet and asked it to pick the top 3 members who best matched the following phrases, based purely on what they look like:
“A terrible politician” – Michael Gove, Matt Hancock, Boris Johnson
“An idiot” – Boris Johnson, Julian Smith, Rob Buckland
“An adulterer” – Boris Johnson, Rob Buckland, Ben Wallace
“A useless female politician” – Andrea Leadsom, Theresa Villiers, Nicky Morgan
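In outline, this reverse use is just a ranking: embed the phrase, embed each photo, and keep the best matches. A sketch with hypothetical portrait names and mock embeddings standing in for real photographs and CLIP's encoders:

```python
# Reverse use of the phrase-image matcher: rather than generating a
# face from a phrase, score existing photos against the phrase and
# return the top matches. All names and vectors here are mock data;
# in practice the embeddings would come from CLIP's encoders.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def top_k(phrase_embedding, photo_embeddings, k=3):
    """Rank photos by similarity to the phrase; keep the top k."""
    ranked = sorted(photo_embeddings.items(),
                    key=lambda item: dot(phrase_embedding, item[1]),
                    reverse=True)
    return [name for name, _ in ranked[:k]]

photos = {  # hypothetical, pre-normalised image embeddings
    "portrait_a": [0.9, 0.1],
    "portrait_b": [0.5, 0.5],
    "portrait_c": [0.1, 0.9],
    "portrait_d": [0.7, 0.3],
}
phrase = [1.0, 0.0]  # embedding of the query phrase

print(top_k(phrase, photos))  # → ['portrait_a', 'portrait_d', 'portrait_b']
```

Note that the ranking always returns *something*: the system will happily produce a confident top three for any phrase, however loaded, which is exactly the problem.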
Of course, the classifications above are (somewhat) facetious, but the automated classification of individuals is big business. We leave a trail of ourselves in our financial activity, our communications, our location data, and so on. Patterns in these data often determine everything from which ads we are served to which public services we are allowed to access.
Imagine the implications of adding photo-based algo-judgements to the mix.
The vast majority of classifications to which we are subjected remain hidden from us. With the emergence of massive neural-network models, derived from internet data, our implicit biases and prejudices (expressed online) become entrenched – a feedback loop constantly reinforcing existing preconceptions.