The Tate Collection recently released their metadata on GitHub, the first ‘big data’ set that has piqued my interest enough to download it and have a play.
Here we present the metadata for around 70,000 artworks that Tate owns or jointly owns with the National Galleries of Scotland as part of ARTIST ROOMS. Metadata for around 3,500 associated artists is also included.
For the record, I’m a big data skeptic. To me it seems to be a buzzword which translates as: “We’ve spent all this time and money collecting lots of disparate data, surely there’s something interesting in there if we look hard enough?”
I particularly like this definition:
by giladlotan via @doctorow
However, with that caveat, it was with a mixture of excitement and trepidation that I stepped into the data. The Tate data has all the expected attributes about the artworks: title, artist, date, media etc. More interestingly, there are hierarchical metadata associated with each artwork – effectively a tree of tags.
My first investigation was to see how the top level categories are represented in the data over time. Perhaps it would reveal an interesting shift in themes, showing the changing nature of artistic expression (and/or curatorial fashions).
Here’s a graph of numbers of artworks tagged with the top level metadata. Be aware this isn’t a true representation of the actual numbers of artworks, rather the number of tags (artworks can have multiple tags, or appear in multiple top level categories). There are interesting peaks around the early/mid 19th C. and the 1970s.
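As a rough sketch of how the two kinds of count differ (this is not the code behind the graphs, and the field names `year` and `top_level` are assumptions for illustration rather than the dataset's actual schema):

```python
from collections import Counter

# Hypothetical records: "year" and "top_level" are assumed field names,
# not the real structure of the Tate files.
artworks = [
    {"year": 1820, "top_level": ["nature", "places"]},
    {"year": 1820, "top_level": ["people"]},
    {"year": 1975, "top_level": ["abstraction", "people"]},
]


def count_tags_by_year(records):
    """Count (year, category) pairs; an artwork adds one per category it carries."""
    counts = Counter()
    for record in records:
        # set() guards against the same category being listed twice on one work
        for category in set(record["top_level"]):
            counts[(record["year"], category)] += 1
    return counts


def count_artworks_by_year(records):
    """Count each artwork exactly once, whatever its tags."""
    return Counter(record["year"] for record in records)


tag_counts = count_tags_by_year(artworks)
artwork_counts = count_artworks_by_year(artworks)
```

In this toy sample, three artworks produce five tag counts, which is why the tag graph overstates the number of works.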
Here are the actual numbers of artworks by year.
The graphs are very similar, suggesting that there is a consistent level of tagging for all artworks.
Here’s the same view, normalized to show the variation in proportion of main subject over time.
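One way to normalize, sketched here under the assumption that the counts are held as a mapping from (year, category) to a number, is to divide each count by the total for its year, so every year sums to 1:

```python
def normalise_by_year(tag_counts):
    """Turn raw (year, category) counts into each category's share of its year."""
    totals = {}
    for (year, _category), n in tag_counts.items():
        totals[year] = totals.get(year, 0) + n
    return {
        (year, category): n / totals[year]
        for (year, category), n in tag_counts.items()
    }


# Made-up counts, purely for illustration
example_counts = {
    (1820, "nature"): 3,
    (1820, "people"): 1,
    (1975, "abstraction"): 2,
}
shares = normalise_by_year(example_counts)
```

With these made-up numbers, "nature" takes three quarters of 1820 and "abstraction" all of 1975, regardless of how many works each year holds.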
What’s interesting in this view is that this is not about the quantity of artworks in each category, per se, but rather about the distribution of the tags. These metadata tags are human curated: someone looked at the artwork and made a judgement about a range of attributes. Some are simple enough, particularly for representational works – people, places, activities etc.
However, if we travel deeper into the tree, some of the categories are much more subjective, and these categories are often the most interesting to explore.
For example, the category ‘emotions and human qualities’ contains the following: fear, love, horror, despair, suffering, grief, shame, anger, innocence, strength, compassion, foolishness, happiness, sadness, wisdom, tenderness, guilt, shock, chastity, desire, humility, pride, nostalgia, contemplation, isolation, condescension, complacency, anxiety, vulnerability, psyche, hope, creativity, vitality, disillusionment, memory, concentration, inspiration, exhilaration, boredom, courage, muse, victim, hedonism, aggression, disgust, dignity, mischievousness, gratitude, serenity, heroism, avarice, laziness, devotion, frustration, anonymity, virtue, deceit, jealousy, pessimism, disbelief, hatred, triumph, antihero, narcissism, uncertainty, escapism, subconscious, gluttony, loyalty, pomposity and hypocrisy.
Imagine how it feels to look at an artwork and decide that it represents any of the above qualities. It seems quite difficult to me, but thankfully The Tate have invested the resources into the endeavour and we get to reap the benefits.
The dataset contains over fifteen thousand subjects, so it’s not immediately obvious how to approach the problem of navigating the data.
To get a feel for the data I built a rudimentary tool which allows you to drill down through the categories and find the artworks which match. It’s already proving to be a fascinating rabbit hole into the collection, throwing up interesting and exciting juxtapositions of works.
Click the image below to try it out.
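The drill-down idea itself is simple enough to sketch. What follows is not the tool's actual code: it assumes each artwork carries one or more paths through the tag tree, and the titles, the `paths` field and the "coast" tag are all invented for the example.

```python
# Hypothetical sample: each artwork lists the tag paths it sits under.
artworks = [
    {"title": "Work A", "paths": [("emotions and human qualities", "fear")]},
    {"title": "Work B", "paths": [("emotions and human qualities", "love"),
                                  ("places", "coast")]},
    {"title": "Work C", "paths": [("places", "coast")]},
]


def build_index(records):
    """Map every path prefix to the set of titles tagged at or below it."""
    index = {}
    for record in records:
        for path in record["paths"]:
            for depth in range(1, len(path) + 1):
                index.setdefault(path[:depth], set()).add(record["title"])
    return index


def children(index, prefix=()):
    """List the tags one level below the given prefix, in alphabetical order."""
    depth = len(prefix)
    return sorted(
        {path[depth] for path in index
         if len(path) == depth + 1 and path[:depth] == prefix}
    )


index = build_index(artworks)
top_level = children(index)
emotions = children(index, ("emotions and human qualities",))
coastal_works = sorted(index[("places", "coast")])
```

Each click in a drill-down interface then amounts to extending the prefix by one tag and looking up the matching works.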
This dataset is a machine-readable representation of the artistic space of the Tate Collection. There are meanings implicit in the hierarchy of labels of the artworks. Whilst machines that can truly ‘see’ artworks remain in the realm of science fiction, the augmentation of these artworks with human-curated metadata massively expands the ways in which a large collection can be automatically navigated.
The data effectively offers a new representational landscape overlaid on top of the collection. There are an infinite number of paths through this landscape. Normally our path through a collection is in the hands of the curator – works are grouped by artist, period, movement or other curatorial perspective and we are to an extent bound by their decisions.
With this metadata, we could curate our own collection based on our personal preferences and desires, or perhaps at the whims of an algorithmic curator. Don’t get me wrong: the role of the human curator can be integral to the artistic experience. However, these kinds of data open up a new realm of possibilities.
Playing with the dataset so far has prompted a number of ideas about how to auto-curate paths through the collection, but I’ve also become aware of the dangerously seductive nature of slicing and dicing something as complex as an art collection on the basis of metadata alone. I am navigating a space one level removed from the artworks, a space defined by the Information Architecture decisions of the designers, and the tagging decisions of the humans who actually entered the data.
I can already see how the contours of this landscape can lead to automated decisions about the relative relevance of one artwork over another. Are we looking at a future where poorly marked up artworks are effectively condemned to a dusty backroom gallery of the internet? And perhaps the art stars of the machine-readable future are those with the best tag clouds?
I’ve made an automated version of Sir Nicholas Serota out of the data.