Keynote: Josh McDermott

Video recorded and edited by Pascal Cesaro.

Josh McDermott is a perceptual scientist studying sound and hearing in the Department of Brain and Cognitive Sciences at MIT, where he is an Associate Professor and heads the Laboratory for Computational Audition. His research addresses human and machine audition using tools from experimental psychology, engineering, and neuroscience. McDermott obtained a BA in Brain and Cognitive Science from Harvard, an MPhil in Computational Neuroscience from University College London, and a PhD in Brain and Cognitive Science from MIT, followed by postdoctoral training in psychoacoustics at the University of Minnesota and in computational neuroscience at NYU. He is the recipient of a Marshall Scholarship, a James S. McDonnell Foundation Scholar Award, an NSF CAREER Award, and the Troland Award from the National Academy of Sciences.

Computational auditory scene analysis as causal inference

A central computational challenge of everyday hearing, and of music perception, is the need to separate the distinct causes of sound in the world. The most commonly discussed version of this problem occurs with concurrent sound sources, often termed the ‘cocktail party problem’. However, analogous problems are posed by reverberation, in which the sound from a source interacts with the environment (via reflections) on its way to the ears, as well as by sound-generating object interactions, in which the physical properties of multiple objects jointly determine the sound. Dating back to Helmholtz, perceptual judgments have been considered the result of unconscious inference, in which our perceptual systems determine the most likely causes of sensory stimuli in terms of structures and events in the world. Despite the conceptual appeal of this view, perceptual inference has historically been difficult to instantiate in working computational systems for all but the simplest perceptual judgments. In this talk I will revisit the notion of scene analysis as inference, leveraging recent computational developments that make inference newly feasible and exploring neglected classes of everyday scene analysis problems along with classical auditory scene analysis.