Keynote: Geoffroy Peeters


Geoffroy Peeters

Geoffroy Peeters received his PhD in signal processing for speech processing in 2001 and his Habilitation (HDR) in Music Information Retrieval in 2013 from University Paris VI. From 2001 to 2018, he led research related to MIR at IRCAM. His research topics concern signal processing and machine learning (including deep learning) for the automatic analysis of music (timbre description, audio features, singing voice, source separation, beat/downbeat/rhythm estimation, chord/key/multi-pitch estimation, music structure/summary, audio identification, cover-version detection, auto-tagging), as well as evaluation methodologies and corpus creation. Since 2018, he has been a full professor in the Image-Data-Signal department of Télécom Paris, Institut Polytechnique de Paris, where he teaches these topics. He is the author of numerous articles and several patents in these areas, and co-author of the ISO MPEG-7 audio standard. He was co-general chair of the DAFx-2011 and ISMIR-2018 conferences, is a member of the DAFx board and of the IEEE Task Force on Computational Audio Processing, and was elected to the ISMIR board in 2016.


from shallow-MIR to deep-MIR: is that really what it looks like?

MIR usually stands for Music Information Retrieval but can be extended, in a broader sense, to Music Information Research. It is then the interdisciplinary research field that focuses on the processing of digital data related to music, including the gathering and organisation of machine-readable musical data, the development of data representations, and methodologies to process and understand those data. The field appeared around the year 2000, probably as an answer to the development of digital libraries and the massive amount of accessible music data (first through peer-to-peer networks, today through streaming). It initially (and naturally) took its roots in research fields such as musicology (computational musicology), perception and cognition of sound, musical acoustics, audio signal processing, and machine learning. In the first MIR systems, known as knowledge-driven (called "shallow" by the deep-learning community), the knowledge of these fields (such as harmonic rules, perceptual rules, or signal-processing algorithms) was encoded by a human in a computer. Progressively, data-driven approaches based on machine learning helped to acquire the knowledge for under-explored problems (such as finding the relationship between audio and music genre) by analyzing large music datasets. The climax of this trend is the recent "end-to-end" systems, in which deep neural networks (a.k.a. deep learning) are used to acquire all the knowledge, including the signal-processing part. Source separation, for example, is now easily achieved using a completely agnostic deep Wave-U-Net rather than building on the premises of Computational Auditory Scene Analysis. In this talk, we discuss this evolution: why it appeared, its pros and cons, and where it leads.