Paglen Meets Agre
Imaging Systems and Grammars of Action
There’s a lot that does not meet the eye. We know it’s there, but we can’t see it. Things around the corner, for instance, or behind closed doors, or below the surface. Interestingly all of these expressions, beyond just describing limitations of our perceptual faculties, have become ways of expressing more existential views towards ‘what’s in the dark’. We project the near future as being ‘around the corner’; we are frustrated by not being privy to what goes on ‘behind closed doors’; we wonder whether we can actually understand a phenomenon without knowing what transpires ‘below its surface’.
So we imagine it. Before our mind’s eye, or literally, by drawing a picture, we visualize what we cannot actually see. Since the dawn of humanity, next to depicting what we actually witness, we have made images of what cannot be seen, or of which we have lost sight. Among the earliest surviving records of human(oid)s asserting their presence by leaving a pictorial mark are stencils made by Neanderthals spitting paint over a hand pressed against the wall of a cave. (1) The paint delineates the outlines of an individual’s hand – the hand itself is absent. Imagine this individual’s wonder, more than 66,000 years ago, after he took a step back and pondered the negative shadow of his own hand. It had been there, it had touched the rock – the image was not so much a picture of a hand, but visual testimony of the fleeting act of touching. It was a record. Call it a very early instance of activity-data capture.
For all but a select few researchers, this ancient record of a Neanderthal individual’s presence in a remote corner of a Spanish cave is visible only through photographs of it – enhanced photographs, at that. To see that there is an image at all, and that it actually represents an act of image-making more than 66,000 years old, takes an archaeologist’s trained eye. For the rest of us, it becomes visible only after some serious image processing. So we see an image of digitally processed photographic data, which can be experienced as a visual record of somebody having touched a cave’s wall, eons ago. The camera is merely one of the technical and craft media involved in ‘making’ this image, which also comprise the Neanderthal’s mouth, pigments and computer hardware and software.
Although this does not in itself mean that we are dealing with a ‘post-photographic’ or even a ‘multimedia’ image here, my description of it does suggest that we have meanwhile developed a rather expanded idea of what a ‘photograph’ is. Once a medium that could assert an exclusive claim to objectivity regarding the depiction of reality, it has now become one of many technical media providing data for visualizing anything imaginable, factual or fictional. The factuality of the enhanced photograph of an image sprayed by mouth tens of thousands of years ago depends not so much on the mere fact that it was photographed, but on the story that accompanies this photograph, supported by verifiable data about chemical conditions of the cave and its mineral surface, sound archaeological argumentation and transparent processing of the photographic data. In a sense, it flips an old adage, connected to the veracity of photographs, “seeing is believing”, to “believing is seeing”. Unless we believe the scientists’ narrative, all we’ll see is a photo of a cave wall with weird excrescences of mineral depositions.
In a series of short posts published in 2014 on Fotomuseum Winterthur’s website, artist and photographer Trevor Paglen elaborates his notion of “seeing machines”. And he ponders: “What happens if we think about photography in terms of imaging systems instead of images?” The phrase ‘imaging systems’ is interesting, because it shifts the view – and our understanding of the concept – of what (photographic) images are from the end result (pictures) to the processes that produce this result and render it meaningful. From the image as a product to “imaging” as a socio-techno-cultural process. In his posts, Paglen delineates what he calls an “expansive definition of photography”. In broad strokes he maps the terrain: “Seeing machines includes familiar photographic devices and categories like viewfinder cameras and photosensitive films and papers, but quickly moves far beyond that. It embraces everything from iPhones to airport security backscatter-imaging devices, from electro-optical reconnaissance satellites in low-earth orbit, to QR code readers at supermarket checkouts, from border checkpoint facial-recognition surveillance cameras to privatized networks of Automated License Plate Recognition systems, and from military wide-area-airborne-surveillance systems, to the roving cameras on board legions of Google’s ‘Street View’ cars.” Moreover, Paglen’s new definition contains not only the apparatus for ‘making’ images, but also the resulting images themselves, and the way they are interpreted, by either humans or other machines or algorithms. Thus, one might say, the definition of photography explicitly integrates narrative and rhetorical notions, which have always been part of photography’s discourse, but which were usually seen as contingent effects of the cultural use of photographs, rather than as constitutive of what photography is, as a technological medium.
In this expanded – or indeed ‘post-photographic’ – concept, a photograph does not merely produce meaning, but the image is also produced by what we project onto it, and by our understanding and use of the technologies and practices through which it becomes meaningful. Paglen introduces the notion of ‘scripts’ for this entanglement of technology and its cultural or economic or political or administrative use: “I think about a ‘script’ as the basic and obvious function of an imaging system, its ‘style’ of seeing, and the immediate relationships (between seer and seen, for example) it produces, and the obvious ways in which a seeing machine sculpts the world.” He describes, for instance, the way an automated number plate reader (ANPR) “wants to see” the world – by not just photographing cars’ number plates, but by connecting the data extracted from these images to information about the location of the vehicle, the owner, and public or private records that will make this data meaningful in specific ways. Thus, states Paglen, “seeing machines create cultural, economic, and political footprints on society at large.” The same goes for the common digital camera, I’d say, with its gridded viewfinder rectangle, which is part of a ‘script’ that also includes Insta memes and YouTube tutorials that instruct the camera’s users how the medium, and not just the technical tool, “wants to see the world”.
Grammars of action
Paglen’s notion of “scripts” closely resembles what computing and artificial intelligence scholar Philip Agre described as “grammars of action” within computer “capturing systems”. (2) Now Agre developed these concepts in the context of theoretical reflections on how computing systems deal with “information”, mostly within the framework of surveillance for administrative and business uses. But I think it is worthwhile to project these theoretical constructs onto how we see photographs and how we make and use them within narrative scripts which direct our interpretation of the pictures that at the same time serve as visible substantiation of the narrative.
In Agre’s terms, information is commonly seen as true – “that it corresponds in some transparent way to certain people, places, and things in the world”. It is, as the etymology of the term ‘data’ suggests, a ‘given’. Information, in order to be processed by computers, needs to have a certain structure, with rules that govern how each bit of information – each ‘given’ – becomes meaningful in relation to others, so together they align to something we can make sense of and which we can accept as being ‘true’. These structures are called grammars in analogy to how grammar structures the alignment and variation of words in a correct sentence. Agre develops the concept of “grammars of action” to describe how computers (i.e. engineers and coders) structure data representing actions in the ‘real world’. In order to make sense of data “captured” from connected sources (cameras, sensors, other computers) or inputs (records of time, movement, location etc.), the computer program needs to be able to recognize specific objects, variables and relations between data. So the program projects a “grammar” onto what it “sees.” For our purpose, one could say that in photography, for instance, resolution, contrast and focus, among other things, determine to a large extent what the camera sees. Things smaller than the film’s grains or the digital chip’s pixels, contrasts below the threshold of differences in density that these granules or pixels can render, etc. are not seen, just as things outside of the picture frame are not seen. Thus, the camera “wants to see” the world in a framed, specifically conditioned way that the photographer needs to comply with – a grammar of action that fundamentally conditions the way we make and see photographs. 
Capture, therefore, in this case means a lot more than just mechanically registering light onto a film: it anticipates all the actions a photographer must perform in order to make a viable photo, and projects the criteria by which this photo will be judged as a ‘good’ or ‘bad’ photo back onto the photographer.
Agre stresses the impact of such grammars by pointing to “a kind of mythology” that is often constructed around them, “according to which the newly constructed grammar of action has not been ‘invented’ but ‘discovered.’ The activity in question, in other words, is said to have already been organized according to the grammar.” Thus we say that photography, for instance, was merely ‘discovered’ as a technology for rendering the world as we already see it. Agre, on the other hand, would insist that it rather constitutes “a reorganization of the existing activity, as opposed to simply a representation of it”. This reorganization is what Paglen calls “script”. The scripts embedded within the very core of seeing machines reorganize the way we see the world, and impose this reorganized way of seeing on us, their users. In Agre’s context, the grammars of action that structure the way that computers make sense of data, are projected back onto the computer’s users. If the user’s input does not match the program’s expectations, the computer says “no”. Seeing machines do something similar in structuring the way they ‘see’ the world in a specific manner, which in turn reorganizes the way we experience it. A very funny example of this reorganization is Erik Kessels’ 2010 collection of attempts by amateurs to photograph their black dog. The camera said “no” and produced vaguely dog-shaped black holes in pictures that through this reorganization of the visible world become quite uncanny. (3)
Back to the cave painting. The digital photograph of the cave’s wall becomes a meaningful image for most of us only after applying a specific grammar to its data that prioritizes certain aspects of the data and discards or reduces others. This reorganizes the way we see the image to the extent that we can now see the outlines of a hand. The story that accompanies the image convinces us that these outlines constitute a hand stencil, left on the cave’s wall at least 66,000 years ago. Could this image – both the hand stencil itself and the enhanced photograph of it – have been produced by any other means? Yes. Theoretically, the whole configuration of mineral residue could have been of a purely chemical nature, without any interference of conscious acts by hominids. The chemical analysis and archaeological argumentation of the scientists make this theory very unlikely, though. More interestingly, the photo of the hand stencil could theoretically have been produced by other technologies. All kinds of sensors and scanning devices that ‘look for’ specific wavelengths of reflected light or emitted radiation, for instance, could produce a similar or perhaps even better image than the enhanced photograph. We have, in short, developed quite an impressive array of seeing machines beyond the traditional camera that we can use to translate any available data into images that can pass as representations or ‘likenesses’ of reality. Take the cave itself: we can combine ‘Lidar’ or ‘Terrestrial Laser Scanning’, digital cameras, GPS data and animation software to map an interactive 3D model of the cave’s interior relative to its underground location, and get the experience of walking through it – similar to photos or films of it, but also quite different.
An elaborate example of this ‘imaging’ of subterranean space was done by a team of the National School of Surveying of the University of Otago, New Zealand, in scanning and modelling the tunnels and quarries below the French town of Arras, built by New Zealand military engineers during World War 1. (4)
The laser scanner produces a huge array of dots, each with specific location data relative to the laser’s source. With the implementation of the correct grammars, these individual dots can be aggregated to ‘point clouds’ that can be rendered with a fine resolution as visual representation of 3D spaces.
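To make this ‘grammar’ a little more tangible, here is a minimal sketch in Python. It assumes, purely for illustration, that each laser return is recorded as a distance plus two angles relative to the scanner; the function names and the simple grid averaging are hypothetical and are not taken from the Arras project, whose actual processing is not described here.

```python
import math

def to_cartesian(distance, azimuth_deg, elevation_deg):
    """Turn one scanner return (range plus two angles) into an x, y, z point.

    The scanner itself sits at the origin (0, 0, 0); units follow `distance`.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    horizontal = distance * math.cos(el)  # projection onto the floor plane
    return (
        horizontal * math.cos(az),
        horizontal * math.sin(az),
        distance * math.sin(el),
    )

def build_point_cloud(returns):
    """Aggregate individual returns into a list of points: the 'point cloud'."""
    return [to_cartesian(d, az, el) for d, az, el in returns]

def voxel_downsample(points, cell):
    """Keep one averaged point per cubic cell of edge length `cell`.

    Whatever is smaller than a cell is no longer 'seen' by the rendering.
    """
    buckets = {}
    for p in points:
        key = tuple(math.floor(c / cell) for c in p)
        buckets.setdefault(key, []).append(p)
    return [
        tuple(sum(axis) / len(group) for axis in zip(*group))
        for group in buckets.values()
    ]

# Three hypothetical returns: (distance, azimuth, elevation)
cloud = build_point_cloud([(2.0, 0.0, 0.0), (2.0, 1.0, 0.0), (5.0, 90.0, 10.0)])
coarse = voxel_downsample(cloud, cell=0.5)
```

The choice of cell size is where the sketch becomes a grammar of action in Agre’s sense: it decides in advance which differences between dots count as detail and which are averaged away.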
The imaging system can be used to model buildings or entire cities, or in this case the Arras tunnels, and combined with carefully calibrated configurations of images shot by visible light cameras, these renderings can acquire a level of detail that makes them hard to distinguish from traditional photography. Or film – the location and distance data behind the renderings contained in the point cloud facilitate the visual experience of a seamless movement through the space. Thus the visible world has become transparent to our seeing machines. We can virtually walk through walls and mountains and oceans knowing that what we see closely corresponds to the actual material facts that make up the visual experience. This amounts to a renewed claim to veracity of digital imaging media: that their captured data have a one-to-one relationship with material reality. But there is an interesting – and I’d say fundamental – difference with the ‘objectivity’ of the traditional camera, whose recordings we once trusted as immediate reflections of the visible reality before the lens. In the expanded field of photography, we have to assess and understand a seeing machine’s grammar of action before we accept its veracity. How else could we decide whether what we see is augmented or enhanced reality (i.e. basically ‘true’), or merely a “virtual reality” that only exists as artistic fantasy (i.e. basically ‘fake’)?
Photography has become an expanded network of imaging systems, within which each system specializes in different ways of providing data for visually representing the measurable world or, in Paglen’s words, the way it “sculpts the world”. This has great consequences, not only for the way photography functions as technology for generating images and as a library of cultural ‘scripts’ for using them, but also for transmedial visual storytelling. The Arras story is a case in point. Next to a ‘conventional’ visual story about the tunnels, told via a combination of texts and some 10,000 hi-res photos, there is a ‘making of’ video that clarifies the way the tunnels were mapped and modelled, and the 100-gigapoint point cloud (i.e. a few dozen terabytes of data) which is used for the animated fly-through videos and for an interactive web viewer, in which visitors can virtually walk through the tunnel maze themselves. The ‘making-of’ video is not just peripheral to the narrative. I think it’s essential: it allows us to understand the grammars of action of the seeing machines used to make us experience the tunnels’ interiors. It makes us understand, and therefore accept, some weird glitches in the visual experience. Walls become semi- or totally transparent at some points, for instance, and we are not forced to follow the tunnels’ material trajectory. At the same time, this ‘grammar’ or ‘script’ helps us keep an overview of the entire network and its relation to the visible world above it. Obviously, the data can be used for a VR experience as well, which is now being developed for use in the Musée Carrière Wellington in Arras. From a viewpoint of transmedia storytelling, it is easily imaginable how such a variety of media and renderings of the huge data set can be used for making vivid audio-visual stories, with added archival material of WW1 and records of soldiers and locals who lived and fought in the region and used the tunnels for shelter.
With all this, we become bodiless entities, dematerialized beings floating through a semblance of the material world, which constitutes a rather dramatic reorganization of how we usually perceive the world. Understanding the grammars of action or scripts embedded in each of the media involved is not only important for viewers or ‘users’ or ‘experiencers’ of the visual narratives, but it is also crucial for makers. Because the way the employed seeing machines “sculpt the world” not only enhances certain aspects, it also reduces or discards certain others. The grammars of action that computers (and seeing machines) use to make sense of the chaotic complexity of raw data that they capture tend to simplify these data into manageable categories. The fundamental insight that whichever medium we use always leaves out much more than it shows has never been more relevant. For despite their claim to data-objectivity, or their mythology of truthfulness in Agre’s view, today’s networked seeing machines have internalized biases within their grammars of action, which were once mainly associated with human agency. The camera didn’t lie, although the photographer or editor could. Now that seeing machines, including ‘smart cameras’, do not simply capture visual data in the old sense, but interpret these data for us or other machines before ‘rendering’ them into something that the imaging system judges to be an accurate image, we cannot be so sure anymore. (5)
Machine vision
All of this prompts a redefinition, not only of photography, but of what we mean by terms like ‘likeness’ and ‘depiction’. How is a point cloud ‘like’ the reality it depicts? How does it sculpt the world? Such questions are triggered by other imaging systems as well, as Paglen has extensively shown in his work on surveillance systems and machine vision. A very instructive example from his recent work is the series of portraits he generated by using a machine learning system, in which he stored facial recognition models of people he had collaborated with. This means that the program will look for all kinds of features it ‘knows’ as being characteristic of each person’s face. Paglen then had another program bombard the facial recognition system with random polygons, which were either recognized as potential parts of the stored features or discarded. Going back and forth, the image gradually evolved into something that the facial recognition program identified as a representation of the given person. Paglen observes that we end up with “a kind of latent portrait from ‘inside’ the facial recognition software”. (6)
The resulting image is quite instructive: comparing the machine’s portraits with actual photos of the subjects suggests that it operates from a different grammar of action than we do. This raises the question: who formulated the ‘grammar’? Actually, this is the central question of Paglen’s oeuvre: “What kind of judgments are built into technical systems? Why are they made that way? Who are they benefiting and at whose expense do they come?” Paglen lets the code ‘speak for itself’. Another artist, David Birkin, sabotages the code by inserting elements that do not fit its grammar of action. In a series of works he ominously titled Embedded, he disrupted the computer code of digitized images taken in times of conflict. In the example here, he inserted the name of the photographer, Yosuke Yamahata, into the code of a photo that Yamahata took directly after the atomic bombing of Nagasaki. Yamahata died years later as a result of the radiation he was exposed to while taking this photo. The ‘ungrammatical’ text within the photo’s code produces a glitch in the image that works as a compelling aesthetic emphasis of the invisible forces at work within the reality that the picture depicts.
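The back-and-forth described above can be sketched as a simple accept-or-discard loop. What follows is a toy illustration in Python under loud assumptions, not Paglen’s actual software: the ‘recognizer’ is a hypothetical stand-in that merely compares an image to a stored template, and the random polygons are simplified to random rectangles on a small grid.

```python
import random

SIZE = 16  # toy image of 16 x 16 grey values between 0.0 and 1.0

def make_template():
    """Hypothetical stand-in for a stored face model: a bright square on dark ground."""
    return [[1.0 if 4 <= x < 12 and 4 <= y < 12 else 0.0 for x in range(SIZE)]
            for y in range(SIZE)]

def recognizer_score(image, template):
    """Assumed scoring rule: higher means 'looks more like the stored model'."""
    return -sum(abs(image[y][x] - template[y][x])
                for y in range(SIZE) for x in range(SIZE))

def random_rectangle(rng):
    """A random shape: position, size and grey value (rectangles instead of polygons)."""
    x0, y0 = rng.randrange(SIZE), rng.randrange(SIZE)
    w, h = rng.randint(1, 6), rng.randint(1, 6)
    return x0, y0, min(SIZE, x0 + w), min(SIZE, y0 + h), rng.random()

def draw(image, rect):
    """Return a copy of the image with the rectangle painted onto it."""
    x0, y0, x1, y1, value = rect
    candidate = [row[:] for row in image]
    for y in range(y0, y1):
        for x in range(x0, x1):
            candidate[y][x] = value
    return candidate

def evolve_portrait(steps=2000, seed=0):
    rng = random.Random(seed)
    template = make_template()
    image = [[0.5] * SIZE for _ in range(SIZE)]  # start from uniform grey
    start = best = recognizer_score(image, template)
    for _ in range(steps):
        candidate = draw(image, random_rectangle(rng))
        score = recognizer_score(candidate, template)
        if score > best:
            # The recognizer 'sees' more of the model: keep the shape.
            image, best = candidate, score
        # Otherwise the shape is discarded and the image stays as it was.
    return image, start, best

portrait, start_score, final_score = evolve_portrait()
```

Note that the loop never looks at a face; it only asks the recognizer whether it is more or less satisfied. Whatever image results is therefore a picture of the recognizer’s criteria rather than of a person.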
The ideology of images
Already in the first photograph ever, Nicéphore Niépce’s famous 1826 photo of the view from his window, the story around the image is an essential ingredient of the image itself – in essence, it makes the image visible. Compared to the Maltravieso cave photo that I discussed above, the image on the original metal plate onto which Niépce recorded it during a time span of some 8 hours, is as invisible as the Neanderthal’s hand. Most of us know Niépce’s image in the enhanced version made by photography historian Helmut Gernsheim after he rediscovered the plate in 1952. So, the ‘first’ photo as we know it, is an enhanced reproduction of a print of a photo taken at a specific angle under specific lighting conditions of a shimmering pewter plate with some vague shadows on it. Gernsheim, to put it differently, was an expert who understood the grammar of action embedded within Niépce’s seeing machine – a camera obscura and a pewter plate with a specific mix of chemicals, which under the right lighting conditions produced a specific optical effect. The story that Gernsheim’s enhanced version is a reliable representation of an image made over a century before, is based on more than immediate technical translation – again, if we have no means of verifying this story, or believing its argumentation, the image does not exist.
Agre argues that “capture is never purely technical but always sociotechnical in nature”. (8) Returning to his assessment of the rhetoric or mythology that often accompanies the construction of a grammar of action – that the grammar would be merely a newly ‘discovered’ and therefore reliable translation of how humans act in the real world anyway –, he warns us that “if the capture process is guided by some notion of the ‘discovery’ of a pre-existing grammar, then this notion and its functioning should be understood in political terms as an ideology”. This brings me back to my opening musings about how we express our existential views towards what’s in the dark. We imagine, we speculate, we test, we argue, we falsify… we conjure up stories that are meant to ‘capture’ reality, also its invisible and abstract aspects, which we interpret based on ideologies that connect all of these aspects. We have developed machines that allow us a view of the world as it actually is, or so we hope, by constructing a layer of abstract data on top of the visible world, and enmeshing it with everything we can see. It is as if we have finally succeeded in reliably rendering what is outside Plato’s cave. In the Greek philosopher’s allegory, we dwell in a confined cave with no means of ever getting out. All we see of the world as it essentially is, are shadows, which we take for ‘reality’ because we have no way of standing in the light of truth ourselves.
With Plato’s metaphor in mind, we can appreciate Paglen’s and Agre’s critical views of seeing machines and grammars of action as a critique of the ‘escape from the cave’ ideology that surrounds much of our new technology-enhanced vision of the world. At the same time, we can use this insight for hacking the available wealth of imaging systems and put them to unforeseen uses in narratives for which we tweak or rewrite their grammars of action. This, I think, is a major task for (trans)media makers today: employ the grammars of action implicit in the media you use not only to enhance your recipients’ experience, but also, and perhaps more importantly, to empower them to critically assess the scripted reality we all live in.
(1) Dirk L. Hoffmann et al., U-Th dating of carbonate crusts reveals Neandertal origin of Iberian cave art, in: Science 359 (2018), no. 6378, pp. 912–915. DOI: 10.1126/science.aap7778
(2) Philip E. Agre. Surveillance and Capture – Two Models of Privacy, in: The Information Society 10 (1994), no. 2, pp. 101–127. Quoted from: Noah Wardrip-Fruin and Nick Montfort (eds). The New Media Reader (2003), MIT Press, Cambridge and London, pp. 740–760. I was introduced to Agre by Sjoukje van der Meulen, when we co-wrote ‘Man as Aggregate of Data’ for AI & Society #34, 343–354 (2019).
(3) Erik Kessels, In Almost Every Picture #9, Amsterdam 2010
(4) LiDARRAS project, 2017. A collaboration between the National School of Surveying, Otago, New Zealand, the École Supérieure des Géomètres et Topographes (ESGT Le Mans, France), the city of Arras, the Museum Carriere Wellington and alumni from Otago’s School of Mines. See: www.otago.ac.nz/lidarras
(5) For quite some time already, ‘autocorrection’ software has been built into smartphone cameras, to the point that it started to annoy users. Such apps’ terminology (“beautification” and “slimming” for instance) testifies to the kind of sociocultural biases built into the cam’s scripts. In October 2020 Google announced they will try and be less judgmental. See for instance TechCrunch: https://tcrn.ch/3n2PojL or search “selfie face correction” for more details.
(6) Trevor Paglen: Instagram post, 30 September 2020.
(7) Trevor Paglen: Bloom. Video statement for Pace Gallery, London, September 2020
(8) Agre 1994, p. 748