
Early Modern Computer Vision

Leonardo Laurence Impett (leoimpett@googlemail.com), Bibliotheca Hertziana Max Planck Institute for Art History, Italy

Computer vision necessarily embodies a scientific theory of vision. Since the early 1980s, this has largely been a neuroscientific theory of human vision, chiefly with reference to David Marr’s posthumous Vision (Marr 1982). This metaphorical link has become even more explicit since Convolutional Neural Networks came to dominate image understanding in the 2010s, with commonplace analogies (e.g. Kietzmann et al. 2018) between artificial and human neurons and the hierarchical visual processing structure they entail.

This paper reports the prototyping of a computer vision which is instead founded on early modern theories of optics, vision and visual perception. In doing so, we create not only an experimental apparatus to investigate those theories, but also a computational lens through which to see the image production of the time. In that sense, it is an attempt to construct a computational version of Michael Baxandall’s ‘Period Eye’ (Baxandall 1988) - a lens through which we can escape our habitual ways of seeing images.

In Shadows and Enlightenment, three decades after he had introduced the notion of the Period Eye in Painting and Experience, Baxandall himself drew parallels between early modern theories of vision and computer vision; the possibility of a computer vision based on those early modern theories was implied, though no implementation was attempted. Baxandall’s rationale starts from Molyneux’s problem, a 17th-century thought-experiment which asks: could a person born blind, and taught only by touch to differentiate between spheres, cubes, etc., recognise these objects visually if suddenly given the ability to see?

Baxandall sees computer vision as a contemporary equivalent to Molyneux’s thought-experiments: “If one is going to draw on modern thinking about vision, it must first be clear that the newly sighted man of Molyneux has been displaced from his central part in the thought experiment. The exemplary figure addressing the cubes and spheres now is often an array of electronic sensors, feeding light measurements into one or another conformation of circuitry controlled by such-and-such a program” (Baxandall 1997, p.41).

The period in which Baxandall was writing Shadows and Enlightenment - the mid 1990s - coincided with a particular crisis in artificial intelligence. The failure of symbolic or rules-based intelligence, also known as GOFAI (Good Old-Fashioned Artificial Intelligence), led to an interest in alternative approaches: embodied computation, human-machine interaction, machine learning. Baxandall was not only aware of the two competing paradigms of artificial intelligence research in the mid 1990s - he saw them as embodying differing historical theories of vision. In discussing different algorithms that predict 3D shape from 2D shading, he differentiates between a “parallel distributed processor (or neural network or connectionist system) with a learning algorithm, not a serial symbolic system with a processing algorithm... There is no symbolic language, no pre-set procedure except the network structure itself” (ibid, p.46-47). Baxandall frames the tension between symbolic AI and machine learning as a continuation of Enlightenment debates around the Molyneux problem; going so far as to suggest that, as a solution, neural networks “would have pleased Condillac more than Locke” (ibid, p.160).

Baxandall then puts his oppositional thought-experiment into practice, in the reading of an early 18th century drawing by Tiepolo. Having discussed the relative mechanics of bottom-up and top-down computer vision, he sees Tiepolo’s drawings as containing “two models of perception in incomplete relation”. The central part is scale-free, and might as well be a crumpled paper bag; the head, single hand and single foot hint at a “schematic mental mannekin”; and neither of these two readings being quite resolved, there results a “persisting element of flicker between readings” (ibid, p.51).

This project aims to go beyond what Baxandall had hinted at in Shadows: not only to use different computer vision techniques as metaphorical thought-experiments, but to use their technical implementations as experimental apparatus. A first attempt at this can be seen in Figure 1: details from the Tiepolo example are processed by object detection and caption generation, and the peripheral elements (‘person’; ‘close up of a person’) are indeed more immediately schematisable than the scale-free robes (‘kite’; ‘close up of a half eaten sandwich’).

Figure 1. Detected objects and generated captions on Baxandall’s Tiepolo example, using convolutional neural networks (in the vein of Condillac). The detections are ‘Person: 63%’ (in green) and ‘Kite: 62%’ (in brown). Caption generator from Xu et al. (2015), code available at https://github.com/DeepRNN/image_captioning ; object detector from Huang et al. (2017), code available at https://github.com/tensorflow/models/tree/master/research/object_detection . (The author)

Taking our historical perspective beyond Locke and into the sixteenth century, we use Giovanni Paolo Lomazzo’s Temple of Painting (1590) as the scaffold on which to build our Early Modern Computer Vision. An influential text in Italian mannerism, it is well-ordered, scientifically explicit, and specifically directed towards the visual arts: it intersperses optical theory with practical recommendations for painting. In contrast to Molyneux’s imaginary subject, Lomazzo had practised as a painter before becoming blind.

Figure 2. Giovanni Paolo Lomazzo’s colour system (the author)

Lomazzo’s Temple of Painting (along with his earlier Trattato) contains a scientifically and artistically substantial theory of colour; which, according to Barasch, “make Lomazzo a turning point in the history of color concepts in the theory of art” (Barasch 1979, p.160). The backbone of this theory of colour is a colour scale, ranging between white and black, in a single sequence - as dictated by the Neoplatonic thought of the time (Marsilio Ficino and Gerolamo Cardano have similar scales).

We have no shortage of colour-systems today, but digital-display systems almost universally use three channels (or four, with opacity). The Neoplatonic imposition of a one-dimensional scale on three-dimensional colour-space can therefore be visualised as a single vector path from white to black - Figure 3 shows an example in the HSL (Hue-Saturation-Luminance) double-cone.

Figure 3. Lomazzo’s colour system as a path through the Hue-Saturation-Luminance double-cone (the author).
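As a rough illustration of how a one-dimensional scale can be laid through the double-cone, the sketch below places a handful of anchor colours in HSL, maps them to Cartesian coordinates inside the double-cone, and assigns each a position on the scale by normalised arc length. The anchor colours, their ordering, and the names `ANCHORS_HLS`, `double_cone_xyz` and `scale_positions` are assumptions made for this example; they are placeholders, not the sequence documented by Lomazzo or the code used for Figure 3.

```python
import colorsys
import math

# Hypothetical anchors as (hue, lightness, saturation), each in [0, 1].
# Placeholders for illustration only, not Lomazzo's documented sequence.
ANCHORS_HLS = [
    (0.0, 1.0, 0.0),    # white
    (1 / 6, 0.5, 1.0),  # a yellow
    (0.0, 0.5, 1.0),    # a red
    (2 / 3, 0.3, 1.0),  # a dark blue
    (0.0, 0.0, 0.0),    # black
]


def double_cone_xyz(h, l, s):
    """Map an HLS triple to Cartesian coordinates inside the HSL double-cone."""
    # The radius (chroma) shrinks to zero at the white and black apexes.
    chroma = s * (1.0 - abs(2.0 * l - 1.0))
    angle = 2.0 * math.pi * h
    return (chroma * math.cos(angle), chroma * math.sin(angle), l)


def scale_positions(anchors_hls):
    """Return each anchor's position on the 1-D scale (normalised arc length)."""
    points = [double_cone_xyz(*a) for a in anchors_hls]
    cumulative = [0.0]
    for p, q in zip(points, points[1:]):
        cumulative.append(cumulative[-1] + math.dist(p, q))
    total = cumulative[-1]
    return [c / total for c in cumulative]


positions = scale_positions(ANCHORS_HLS)
for t, hls in zip(positions, ANCHORS_HLS):
    r, g, b = colorsys.hls_to_rgb(*hls)
    print(f"t={t:.3f}  rgb=({r:.2f}, {g:.2f}, {b:.2f})")
```

Parameterising by arc length is only one possible choice; spacing the anchors equally along the scale would be an equally defensible reading of a sequence that the source describes only as running from white to black.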

Lomazzo specifically urges the painter to avoid stark contrasts in adjacent colours in his sequence: a colour is ‘friendly to one that stands next to it… while it is hostile to a color separated from it by other shades’ (Barasch 1979, p.183); contradicting Leon Battista Alberti’s earlier advice in De Pictura (1435) to set adjacent robes in contrasting colours, to give a greater impression of clarity.

Using the path in Figure 3 not only as a visualisation but as an interpolation, Lomazzo’s harmonic sequence can become a digital colour space in the technical sense: any colour image, including Lomazzo’s own work, can be translated to points in the Lomazzo colour-space (Figure 4, centre). The colour-harmonic relationships inherent to Lomazzo’s scale, as shown, would be incompatible with current notions of a ‘colour-wheel’: red and ultramarine, for instance, are almost adjacent.

Figure 4. Giovanni Paolo Lomazzo, Madonna e santi, chiesa di San Marco a Milano. Seen in original colour (left) and interpolated using the 1-D Lomazzo colour system (centre), then rendered back to RGB (right). (The author).
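One way such a translation could work is sketched below: each pixel is projected onto the nearest point of a piecewise-linear path, giving a single scalar position, and that position is then interpolated back to a colour on the path. This is a minimal sketch under stated assumptions, not the implementation behind Figure 4: the nearest-point projection, the use of RGB rather than HSL coordinates, the equal weighting of segments, the placeholder anchor colours, and the names `ANCHORS`, `to_scale` and `from_scale` are all introduced here for illustration.

```python
import numpy as np

# Hypothetical RGB anchors from white to black; placeholders only,
# not the colours or ordering documented by Lomazzo.
ANCHORS = np.array([
    [1.0, 1.0, 1.0],  # white
    [1.0, 1.0, 0.0],  # a yellow
    [1.0, 0.0, 0.0],  # a red
    [0.0, 0.0, 0.6],  # a dark blue
    [0.0, 0.0, 0.0],  # black
])


def to_scale(pixels, anchors=ANCHORS):
    """Project (N, 3) RGB pixels onto the path; return positions in [0, 1]."""
    starts, ends = anchors[:-1], anchors[1:]
    seg = ends - starts                               # (S, 3) segment vectors
    seg_len2 = np.sum(seg ** 2, axis=1)               # (S,) squared lengths
    rel = pixels[:, None, :] - starts[None, :, :]     # (N, S, 3)
    # Fractional position of the closest point on every segment, clamped.
    frac = np.clip(np.sum(rel * seg, axis=2) / seg_len2, 0.0, 1.0)
    closest = starts + frac[..., None] * seg          # (N, S, 3)
    dist2 = np.sum((pixels[:, None, :] - closest) ** 2, axis=2)
    best = np.argmin(dist2, axis=1)                   # nearest segment per pixel
    rows = np.arange(len(pixels))
    # Segments are weighted equally: segment k spans [k/S, (k+1)/S].
    return (best + frac[rows, best]) / len(seg)


def from_scale(t, anchors=ANCHORS):
    """Render scale positions in [0, 1] back to RGB by linear interpolation."""
    knots = np.linspace(0.0, 1.0, len(anchors))
    return np.stack(
        [np.interp(t, knots, anchors[:, c]) for c in range(3)], axis=-1
    )


# Example: three pixels that lie on the path, plus one (green) that does not.
pixels = np.array([[1, 1, 1], [1, 0, 0], [0, 0, 0], [0, 1, 0]], dtype=float)
positions = to_scale(pixels)
rendered = from_scale(positions)
```

For a full image stored as an (H, W, 3) array with values in [0, 1], the pixels can be flattened with `image.reshape(-1, 3)` before calling `to_scale`, and the rendered result reshaped back to (H, W, 3). The scalar positions correspond to the one-dimensional view (Figure 4, centre), and the output of `from_scale` to the view rendered back to RGB (Figure 4, right).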

Barasch, M. (1979). Light and color in the Italian Renaissance theory of art. New York: New York University Press.

Baxandall, M. (1988). Painting and experience in fifteenth century Italy: a primer in the social history of pictorial style. Oxford: Oxford University Press.

Baxandall, M. (1997). Shadows and enlightenment. New Haven, Connecticut: Yale University Press.

Huang, J., et al. (2017). Speed/accuracy trade-offs for modern convolutional object detectors. Proceedings of the IEEE conference on computer vision and pattern recognition. New Jersey: IEEE.

Kietzmann, T. C., McClure, P., & Kriegeskorte, N. (2018). Deep neural networks in computational neuroscience. bioRxiv, 133504.

Lomazzo, G. P. (1590). Idea del tempio della pittura. Milan: Paolo Gottardo Pontio.

Marr, D. (1982). Vision: A computational investigation into the human representation and processing of visual information. Cambridge, Massachusetts: MIT Press.

Xu, K., et al. (2015). Show, attend and tell: Neural image caption generation with visual attention. Proceedings of the 32nd International Conference on Machine Learning. ICML.