
Exploring The Tate Collection

The Tate Collection recently released their metadata on github, the first ‘big data’ set that has piqued my interest enough to download it and have a play.

Here we present the metadata for around 70,000 artworks that Tate owns or jointly owns with the National Galleries of Scotland as part of ARTIST ROOMS. Metadata for around 3,500 associated artists is also included.

For the record, I’m a big data skeptic, to me it seems to be a buzzword which translates as: “We’ve spent all this time and money collecting lots of disparate data, surely there’s something interesting in there if we look hard enough?”.
I particularly like this definition:

[Image: ‘big data’ definition by giladlotan, via @doctorow]

However, with that caveat, it was with a mixture of excitement and trepidation that I stepped into the data. The Tate data has all the expected attributes for the artworks – title, artist, date, medium and so on – but, more interestingly, there is also hierarchical metadata associated with each artwork: effectively a tree of tags.
My first investigation was to see how the top level categories are represented in the data over time, perhaps it would reveal an interesting shift in themes, showing the changing nature of artistic expression (and/or curatorial fashions).
Here’s a graph of the number of artworks tagged with each top-level category over time. Be aware that this isn’t a true count of artworks; rather, it counts tags (an artwork can carry multiple tags, and so appear in several top-level categories). There are interesting peaks around the early/mid 19th century and the 1970s.
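The counting behind a graph like this is straightforward. Here’s a minimal sketch, assuming each artwork record carries a year and a nested `subjects` tree whose first level holds the top-level categories (my reading of the per-artwork JSON in the Tate repo; the real files have ids and many more fields):

```python
from collections import Counter

# Tiny inline sample mimicking the assumed per-artwork structure.
artworks = [
    {"year": 1820, "subjects": {"children": [
        {"name": "nature", "children": []},
        {"name": "people", "children": []}]}},
    {"year": 1820, "subjects": {"children": [
        {"name": "people", "children": []}]}},
    {"year": 1975, "subjects": {"children": [
        {"name": "abstraction", "children": []}]}},
]

# Count one tag per (year, top-level category) pair; an artwork with
# two top-level tags is counted twice, exactly as in the graph above.
tag_counts = Counter()
for work in artworks:
    for category in work["subjects"]["children"]:
        tag_counts[(work["year"], category["name"])] += 1

print(tag_counts[(1820, "people")])  # 2
```

Summing these counts per year (ignoring the category) gives the second graph, of raw artwork-tag totals.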
Here are the actual numbers of artworks by year.
The graphs are very similar, suggesting that there is a consistent level of tagging for all artworks.
Here’s the same view, normalized to show the variation in proportion of main subject over time.
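The normalised view just divides each year’s category counts by that year’s total, so every year sums to one. A sketch, with illustrative numbers standing in for the real counts:

```python
from collections import Counter

# Raw tag counts per (year, top-level category) -- same shape as the
# data behind the previous graphs; the values here are made up.
tag_counts = {
    (1820, "nature"): 30, (1820, "people"): 70,
    (1975, "nature"): 10, (1975, "abstraction"): 90,
}

# Total tags per year, then convert each count to a proportion of
# that year's total -- this is the normalised stacked view.
year_totals = Counter()
for (year, _), n in tag_counts.items():
    year_totals[year] += n

proportions = {key: n / year_totals[key[0]] for key, n in tag_counts.items()}

print(proportions[(1820, "people")])  # 0.7
```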
What’s interesting in this view is that it is not about the quantity of artworks in each category per se, but about the distribution of the tags. These metadata tags are human-curated: someone looked at the artwork and made a judgement about a range of attributes. Some are simple enough, particularly for representational works – people, places, activities and so on.
However, if we travel deeper into the tree, some of the categories become much more subjective, and these are often the most interesting to explore.
For example, the category ‘emotions and human qualities’ contains the following: fear, love, horror, despair, suffering, grief, shame, anger, innocence, strength, compassion, foolishness, happiness, sadness, wisdom, tenderness, guilt, shock, chastity, desire, humility, pride, nostalgia, contemplation, isolation, condescension, complacency, anxiety, vulnerability, psyche, hope, creativity, vitality, disillusionment, memory, concentration, inspiration, exhilaration, boredom, courage, muse, victim, hedonism, aggression, disgust, dignity, mischievousness, gratitude, serenity, heroism, avarice, laziness, devotion, frustration, anonymity, virtue, deceit, jealousy, pessimism, disbelief, hatred, triumph, antihero, narcissism, uncertainty, escapism, subconscious, gluttony, loyalty, pomposity and hypocrisy
Imagine how it feels to look at an artwork and decide that it represents any of the above qualities. It seems quite difficult to me, but thankfully The Tate have invested the resources into the endeavour and we get to reap the benefits.
The dataset contains over fifteen thousand subjects, so it’s not immediately obvious how to approach the problem of navigating the data.
To get a feel for the data I built a rudimentary tool which allows you to drill down through the categories and find the artworks which match. It’s already proving to be a fascinating rabbit hole into the collection, throwing up interesting and exciting juxtapositions of works.
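The core of such a drill-down tool is an index from subject paths to artworks, built by walking each artwork’s tag tree. A minimal sketch, again assuming a nested `name`/`children` structure (the titles and tree here are hypothetical examples, not real records):

```python
from collections import defaultdict

def walk(node, path=()):
    # Yield every path from the root down to each subject in the tree.
    path = path + (node["name"],)
    yield path
    for child in node.get("children", []):
        yield from walk(child, path)

# Hypothetical artworks with nested subject trees.
artworks = [
    {"title": "A", "subjects": {"name": "root", "children": [
        {"name": "emotions and human qualities", "children": [
            {"name": "despair", "children": []}]}]}},
    {"title": "B", "subjects": {"name": "root", "children": [
        {"name": "emotions and human qualities", "children": [
            {"name": "hope", "children": []}]}]}},
]

# Index: every subject path -> titles tagged at or below it, which is
# all a drill-down browser needs to show matches at each level.
index = defaultdict(list)
for work in artworks:
    for path in walk(work["subjects"]):
        index[path[1:]].append(work["title"])  # drop the synthetic root

print(index[("emotions and human qualities", "despair")])  # ['A']
```

Drilling down is then just lengthening the path key one level at a time.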
Click the image below to try it out.


This dataset is a machine-readable representation of the artistic space of the Tate Collection. There are meanings implicit in the hierarchy of labels of the artworks. Whilst machines that can truly ‘see’ artworks remain in the realm of science fiction, the augmentation of these artworks with human-curated metadata massively expands the ways in which a large collection can be automatically navigated.
The data effectively offers a new representational landscape overlaid on top of the collection. There are an infinite number of paths through this landscape. Normally our path through a collection is in the hands of the curator – works are grouped by artist, period, movement or other curatorial perspective and we are to an extent bound by their decisions.
With this metadata, we could curate our own collection based on our personal preferences and desires, or perhaps at the whims of an algorithmic curator. Don’t get me wrong, the role of the human curator can be integral to the artistic experience, however these kinds of data open up a new realm of possibilities.
Playing with the dataset so far has prompted a number of ideas about how to auto-curate paths through the collection, but I’ve also become aware of the dangerously seductive nature of slicing and dicing something as complex as an art collection on the basis of metadata alone. I am navigating a space one level removed from the artworks, a space defined by the Information Architecture decisions of the designers, and the tagging decisions of the humans who actually entered the data.
I can already see how the contours of this landscape could lead to automated decisions about the relative relevance of one artwork over another. Are we looking at a future where poorly marked-up artworks are effectively condemned to a dusty backroom gallery of the internet, and where the art stars of the machine-readable future are those with the best tag clouds?
I’ve made an automated version of Sir Nicholas Serota out of the data.

7 thoughts on “Exploring The Tate Collection”

  1. I have only recently started reading about big data technologies and applications and, frankly, it was way beyond me. This whole article was very interesting and I especially liked the last bit about the art stars being those with the best tag clouds.

  2. Full disclosure: I work at Tate as Lead Developer.
    This is a fascinating analysis of the Tate data. Thank you for digging so deep, especially into the subject index. This gives me a new way of approaching our information and I’d like to give it more time in the coming days and weeks. Drop me a line if you have any questions.
    Excellent work!

  3. Thank you for this post, it was really interesting! I work in a museum as a digital collection manager and we tag our collection as well, for the same reasons as Tate, I would guess, e.g. to make it possible to navigate an online representation of the collection. To your last questions/remarks/worries: I think that what you describe is not so different from the fate of art and literature as it has always been. The works of authors and artists have always been at the mercy of their public, and what lives on in our cultural canon is not so much a result of being the best representative of a period or culture, but more often of the whims of the public, the ways of history, and the politics and ideologies that govern our taste. I guess what I am trying to say is that, as always – whether we are tagging, curating an exhibition or writing a course syllabus – we need to be aware of our limitations, to make sure that the quality of our work is consistent, and to open a horizon that is wider than our personal preferences. And as users of data sets we need to know their limitations – consider, for instance, the very good project The Republic of Letters. My conclusion from this reflection is that we should welcome this new opportunity and investigate it well, rather than be alarmed by consequences that really aren’t so different from what has always happened in the world of art and literature. I guess the best thing we can do is keep our eyes wide open while working on as best we can 🙂 Thanks again, it’s fascinating to see what can be done with data sets like this!

Comments are closed.