Interpreting hybrid images

How the brain interprets complex visual scenes is an enduring mystery for researchers. This process occurs extremely rapidly – the “meaning” of a scene is interpreted within 1/20th of a second, and, even though the information processed by the brain may be incomplete, the interpretation is usually correct.

Occasionally, however, visual stimuli are open to interpretation. This is the case with ambiguous figures – images which can be interpreted in more than one way. When an ambiguous image is viewed, a single image impinges upon the retina, but higher order processing in the visual cortex leads to a number of different interpretations of that image.

Only one of these interpretations is available to our conscious awareness at any one time. Repeated viewing of the image leads to perceptual reversal, whereby first one, and then the other, interpretation is perceived. For psychologists and neuroscientists, ambiguous figures provide a means by which the functioning of the human visual system can be investigated.

Salvador Dali’s 1940 painting Slave Market with the Disappearing Bust of Voltaire (top) is an example of an ambiguous figure. In this painting, the two nuns just left of centre can also be perceived as the bust of the French writer and philosopher Voltaire. When looking at the painting, our perception of the painting switches from one interpretation to the other.

In a study published in 2002, Lizann Bonnar, then at the University of Glasgow, and her colleagues investigated the stimuli which drive perception of the visual scene depicted in Dali’s painting. Participants were presented with a cropped greyscale version of the painting, consisting solely of the area containing the nuns. A “bubble” filter was used to enhance or obscure certain features of that part of the painting. The researchers found that the participants reported seeing the bust of Voltaire when the finer details of the painting were obscured, and reported seeing the nuns when large-scale features were obscured.

This experiment showed the importance of scale information in perception. The researchers specifically manipulated the spatial resolution of the painting, more commonly called its spatial frequency (that is, the periodicity with which image intensity changes). Large-scale features change little over a given distance, and therefore have a low spatial resolution, while fine-grained features change much more over the same distance, and so have a high spatial resolution.
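To make the idea of scale concrete, here is a minimal sketch, not taken from the study, that measures how many times intensity cycles across a single row of pixels. The two test patterns and the helper name `dominant_cycles` are assumptions made up for illustration; what the text calls low and high spatial resolution corresponds to few and many cycles respectively.

```python
import numpy as np

def dominant_cycles(signal):
    """Return the number of cycles across the row at which the spectrum peaks."""
    centred = signal - signal.mean()          # drop the constant (zero-frequency) term
    spectrum = np.abs(np.fft.rfft(centred))   # magnitude at each whole number of cycles
    return int(np.argmax(spectrum))

n = 256
x = np.arange(n) / n                          # position across the row, from 0 to 1
coarse = np.sin(2 * np.pi * 4 * x)            # intensity changes slowly: 4 cycles
fine = np.sin(2 * np.pi * 40 * x)             # intensity changes quickly: 40 cycles

print(dominant_cycles(coarse), dominant_cycles(fine))
```

A real photograph contains a mixture of many such components at once, which is why a filter can remove the fine ones and leave the coarse ones, or vice versa.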

In a second experiment, the participants were shown random noise patterns before the cropped greyscale painting. One group was shown a pattern with a high spatial resolution, the other a pattern with a low spatial resolution. Afterwards, the former reported seeing the bust of Voltaire, while the latter reported seeing the nuns. This showed that previous experience is an important factor in perception. The participants had selectively perceived the frequency channels presented to them before they viewed the image.

Aude Oliva, head of the Computational Visual Cognition Laboratory at the Massachusetts Institute of Technology, has been using a similar approach to gain a better understanding of the processing of information in the visual cortex.

For more than 10 years, Oliva and her colleagues have been creating and using hybrid images that consist of two superimposed images, both of which have been altered with specialized filtering software.

Using these filters, sharp facial features, such as wrinkles and other blemishes, are removed from one image, and coarse features, such as the shape of the mouth or nose, are removed from the other. The two images are then superimposed; because features with a high spatial frequency are visible only from up close, and those with low spatial frequencies are only visible from further away, superimposition of the two produces a single image whose perception changes as a function of viewing distance.

Thus, the hybrid is a single image with two stable percepts; at a given distance, only one of the images is visible, and it is this image that dominates processing in the visual system; the other image is perceived as something lacking internal organization (noise).
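The construction described above can be sketched in a few lines. This is an illustration under stated assumptions, not the software Oliva's group uses: the Gaussian filter shape, the `cutoff` value of 8 cycles per image, and the function names `gaussian_lowpass` and `make_hybrid` are all hypothetical choices. Two simple gratings stand in for the two photographs.

```python
import numpy as np

def gaussian_lowpass(image, cutoff):
    """Keep spatial frequencies below roughly `cutoff` cycles per image."""
    rows, cols = image.shape
    fy = np.fft.fftfreq(rows) * rows          # vertical frequency, cycles per image
    fx = np.fft.fftfreq(cols) * cols          # horizontal frequency, cycles per image
    radius_sq = fy[:, None] ** 2 + fx[None, :] ** 2
    gain = np.exp(-radius_sq / (2.0 * cutoff ** 2))   # 1 at zero frequency, falling off
    return np.real(np.fft.ifft2(np.fft.fft2(image) * gain))

def make_hybrid(far_image, near_image, cutoff=8.0):
    """Superimpose the coarse part of one image on the fine part of another."""
    coarse = gaussian_lowpass(far_image, cutoff)              # fine detail removed
    fine = near_image - gaussian_lowpass(near_image, cutoff)  # coarse shape removed
    return coarse + fine

# Stand-ins for two photographs: a coarse grating and a fine grating.
n = 128
yy, xx = np.mgrid[0:n, 0:n] / n
far_image = np.sin(2 * np.pi * 2 * xx)        # 2 cycles across the image
near_image = np.sin(2 * np.pi * 40 * yy)      # 40 cycles down the image
hybrid = make_hybrid(far_image, near_image)
```

With real photographs, both inputs would be greyscale arrays of the same shape, and the faces would need to be aligned before filtering. The cutoff controls the viewing distance at which the percept switches.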


Hybrid image of Marilyn Monroe and Albert Einstein. Aude Oliva, MIT

From up close, the image on the left is perceived as Albert Einstein, because only the sharp features are visible; but if you step a few metres away from the monitor, the blurred features become visible, and the image of Marilyn Monroe emerges.
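Why distance matters can be worked out with a little geometry. The sketch below converts a detail's frequency within the image into cycles per degree of visual angle, the unit in which visual acuity is usually expressed. The image width, the viewing distances and the function name `cycles_per_degree` are illustrative assumptions, not values from Oliva's experiments.

```python
import math

def cycles_per_degree(cycles_per_image, image_width_m, distance_m):
    """Frequency of a detail as seen by a viewer at the given distance."""
    # Visual angle subtended by the whole image, in degrees.
    angle_deg = math.degrees(2.0 * math.atan(image_width_m / (2.0 * distance_m)))
    return cycles_per_image / angle_deg

# Assumed example: a 10 cm wide image with detail at 40 cycles per image.
up_close = cycles_per_degree(40, 0.10, 0.5)   # viewed from 0.5 m
far_away = cycles_per_degree(40, 0.10, 5.0)   # viewed from 5 m
print(round(up_close, 1), round(far_away, 1))
```

Stepping back makes the image subtend a smaller angle, so the same detail is packed into more cycles per degree. Once that figure exceeds what the eye can resolve, the fine component drops out and only the coarse component remains visible.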

Oliva’s group has been using this and similar images to investigate the role of different frequency channels in image recognition, and the time course over which this process occurs. They have found that when participants were shown hybrid images for 30 milliseconds, they recognized only the low spatial resolution component of the image; when the images were displayed for 150 milliseconds, they recognized only the high spatial resolution component. In both cases, the participants were oblivious to the other interpretation of the image.

Participants were also shown hybrid images consisting of sad and angry faces (high and low spatial resolution, respectively), built from superimposed male and female faces. When the images were displayed for 50 milliseconds, and the participants were asked to determine the emotion of the face they had seen, they always reported seeing an angry face; but when asked to determine the sex of the person in the image, they reported seeing a male as often as they reported seeing a female, although the two faces had different spatial resolutions.

Thus, selection of frequency bands during fast image recognition appears to be flexible – in some cases, the brain picks out characteristics with a low spatial resolution, while in others, it discriminates those with a high resolution. It seems that the brain is adept at selecting the frequency band containing the most information relevant to a particular task. Again, the participants were unaware that the images they viewed contained information in the other frequency range.

The work carried out by Oliva’s group shows that the brain extracts large-scale features slightly earlier than fine-grained features. Large-scale features are processed within 50 milliseconds, giving an overall impression of the visual scene. The processing of fine-grained details begins slightly later, at around 100 milliseconds. The fine- and coarse-grained features are extracted separately, and processed in parallel through different channels, in successively higher order areas of the visual cortex. In a process called perceptual grouping, the information from the channels is then seamlessly recombined at visual cortical areas of the highest order to produce a coherent, and usually unambiguous, image.

7 thoughts on “Interpreting hybrid images”

  1. This is truly fascinating! I’ve always enjoyed playing with ambiguous images (I suppose the recently faddish Magic Eye pictures might qualify). Incidentally, I found that to see Marilyn Monroe in the lower image, I not only had to move back but also squint. Up close it didn’t matter; I got Albert either way.

  2. Pingback: Divided We Stand United We Fall

  3. Not sure anyone will ever read this comment, but I just finished writing the lit review for an experiment I’m working on involving hybrid images and I have to take the opportunity to ramble.

    What we can perceive is limited by our visual acuity at any given distance. When we are far away from an object, we see general shapes (low spatial frequencies) instead of details (high spatial frequencies) because our vision is not good enough to detect fine lines so far away. Squinting also decreases visual acuity, and although it does not do so in precisely the same manner as distance, squinting at an object allows for a good approximation of the spatial frequencies that would be perceived from far away.

    Consequently, if Einstein still looks like Einstein, you haven’t backed away far enough. 🙂 It is possible to create hybrid images that never really change percepts if the high spatial frequencies are too strong or if you combine some random object like an orange with a face (facial recognition is too strong in humans, partially due to separate processing areas), but this Marilyn-Einstein is one of the best hybrid images I’ve seen in terms of complete percept switch.

    I was trying to make some hybrid images the other day to use in my experiment, and honestly, these images can get pretty weird if you attempt to combine the wrong images. I was attempting to combine a picture of a koala and a face. Bad idea. I just ended up with Koala-Man at all percepts. It would have made a perfect villain for some comic book.

    End ramble.

  4. Illusion today connects the artist with the quantum physicist – because all of these people are examining some kind of illusion, the artist journeys outwards into what can be imagined (the structure of life), the quantum physicist and the astrophysicist journey inwards to reveal the hidden (the space of life). So I am not going to contemplate whether neurophilosophy is an art or a science. That is in the eye of the beholder.

    Every discipline is linked and limited by our understanding of the human mind. The visual quality of hybrid images serves as a poster child for the relevance of neuroscience from a scientific mindset, just as advertising does from a neuromarketing mindset. Hybrid images therefore also serve as a metaphor for the way disciplines are merging, such as science becoming neuroscience and marketing becoming neuromarketing, and then of course I arrive here at neurophilosophy.

    The simple act of observation therefore is enlightening so long as we do not get lost in the detail. Art for instance is about the unravelling of detail, just as knowledge is the accumulation of detail, so I look at these hybrid images in this context and in this context I try to serve my own intelligence as I am to understand what humanity means and is.

    I do this as an observer and as someone who is conducting a personal exploration so that I am not blinded by detail but the detail provides reflective intelligence and consequently working out the hybrid of having freedom and being free. (Just thinking out aloud)


  5. I can actually sit a normal distance from the monitor, cross my eyes and see both Marilyn Monroe and Albert Einstein from the picture. I think it has to do with me making the image out of focus while crossing my eyes. Also, since my vision is poor, I can remove my glasses at normal distance and see Marilyn Monroe.
