MUSICAL GESTURES AND AUDIO EFFECTS PROCESSING
(Presented at the DAFX98 conference in Barcelona)
Eric Metois - November 1998

The first annual Digital Audio Effects Processing conference (DAFX98) took place in the beautiful city of Barcelona. The call for papers was an opportunity for me to revisit some toy ideas that had emerged from my work at the Media Lab: the paper applies the concept of Musical Gestures (which came from my Ph.D. dissertation) to audio effects processing. As an embodiment of the general framework, I brought with me a sound example resulting from the processing of a recording of Ravel's Sonate pour violon et violoncelle. Harmonic structure likelihoods (standing in for the chosen set of musical gestures in this example) were estimated from the polyphonic recording and then used to add a synthetic layer of female voices, generated as part of the processing through a trivial wavetable procedure. The resulting audio is satisfying and illustrates that even a difficult problem such as real-time polyphonic tracking can be reasonably addressed by a soft analysis/control system which doesn't attempt to capture high-level musical intentions but rather confines itself to an expressive and humble set of measurements.

My 4-page paper is available in the conference's proceedings, or you can download it here:

metdafx98.pdf (PDF file - 204 kbytes).
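
For a concrete feel for the analysis side, here is a minimal sketch of how a harmonic structure likelihood might be measured on a short audio frame. This is not the paper's estimator: the function name, the geometric spacing of the 32 candidate fundamentals, and the five-harmonic comb are all illustrative assumptions; the paper defines the actual scheme.

    import numpy as np

    def soft_key_energies(frame, sr, n_keys=32, f_lo=100.0, f_hi=800.0, n_harm=5):
        # Hann-window the frame and take its magnitude spectrum.
        spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
        freqs = np.fft.rfftfreq(len(frame), 1.0 / sr)
        # Candidate fundamentals spaced geometrically over three octaves
        # (an assumption; the paper defines the actual key placement).
        keys = f_lo * (f_hi / f_lo) ** (np.arange(n_keys) / (n_keys - 1))
        energies = np.empty(n_keys)
        for i, f0 in enumerate(keys):
            # A crude harmonic comb: sum the magnitude at the bins nearest
            # to the first few harmonics of the candidate fundamental.
            bins = [np.argmin(np.abs(freqs - h * f0)) for h in range(1, n_harm + 1)]
            energies[i] = spectrum[bins].sum()
        return keys, energies

Running this on successive hops of a recording yields one energy trajectory per key; trajectories of this kind are the soft, low-level control signals the approach relies on.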

AUDIO EXAMPLE

The following audio example results from the processing of a recording of the first movement of Ravel's Sonate pour violon et violoncelle. The chosen musical gestures (harmonic likelihoods, or more specifically 32 soft keys) are extracted over a range of three octaves (between 100 and 800 Hz), following the general scheme outlined in the paper. As these musical gestures are estimated, they are fed as control parameters to a rudimentary wavetable synthesis module loaded with samples of female vocals. Recall that a soft key consists of two state variables: frequency and energy. In this example, these two state variables are mapped literally to the pitch and the loudness of the appropriate samples in the synthesis engine (a sketch of this mapping follows the audio link below). This process results in a synthetic choir that is then mixed with the original stream, leading to the following:

rav36sh.mp3
(MPEG-1 Layer III - 44.1kHz, 96kbps, joint stereo - 43 seconds / 507 kbytes)
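
To make the control mapping described above concrete, here is a minimal sketch of the soft-key-to-wavetable step: each key's frequency sets the playback rate of a looped vocal sample, and its energy envelope sets the loudness. Everything here (the function name, the single looped sample shared by all voices, the final normalization) is an illustrative assumption rather than the actual synthesis module behind the example.

    import numpy as np

    def render_choir(keys, energy_frames, vocal_sample, sample_f0, hop):
        # energy_frames has shape (n_frames, n_keys): one energy per key per hop.
        n_frames, n_keys = energy_frames.shape
        out = np.zeros(n_frames * hop)
        t = np.arange(len(out))
        for k in range(n_keys):
            # Pitch: read the looped sample faster or slower so that its own
            # pitch (sample_f0) lands on the key's frequency.
            rate = keys[k] / sample_f0
            phase = (t * rate) % len(vocal_sample)
            voice = vocal_sample[phase.astype(int)]
            # Loudness: hold each frame's energy constant for one hop.
            env = np.repeat(energy_frames[:, k], hop)
            out += voice * env
        return out / max(np.abs(out).max(), 1e-9)  # keep the mix in [-1, 1]

Mixing the returned signal with the original recording, after some gain matching, gives the kind of added synthetic layer heard in the file above.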