A Phase Vocoder in Matlab

Introduction

The Phase Vocoder [FlanG66, Dols86, LaroD99] is an algorithm for timescale modification of audio.  One way to understand it is as stretching or compressing the time-base of a spectrogram, changing the temporal characteristics of a sound while retaining its short-time spectral characteristics.  If the spectrogram is narrowband (analysis window longer than a pitch cycle, so the individual harmonics are resolved), then preserving the spectral characteristics implies preserving the pitch, which avoids the pitch drop you get from 'slowing down the tape'.  The only complication is that the phase associated with each bin in the modified spectrogram has to be 'fixed up' to maintain the dphase/dtime of the original, which ensures that successive windows line up correctly in the overlap-add reconstruction.
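
The phase 'fix-up' relies on the fact that the phase advance of a bin between two frames reveals the true frequency of the component in that bin.  The following is a minimal sketch of that measurement for a single test tone.  The variable names, the 440 Hz tone and the hand-built Hann window are illustrative assumptions, not part of the routines described on this page.

```matlab
% Sketch: recover the true frequency in one STFT bin from the phase
% advance between two successive frames (all names are illustrative).
sr  = 16000;                           % sample rate, Hz
n   = 1024;                            % window / FFT length
hop = n/4;                             % hop between frames, samples
f0  = 440;                             % test tone frequency, Hz
x   = cos(2*pi*f0*(0:(n+hop-1))'/sr);  % just enough signal for two frames
w   = 0.5*(1 - cos(2*pi*(0:n-1)'/n));  % periodic Hann window

X1 = fft(w .* x(1:n));                 % frame starting at sample 0
X2 = fft(w .* x(hop + (1:n)));         % frame starting one hop later

[mx, k] = max(abs(X1(1:n/2+1)));       % strongest bin (1-based index)

dphi = 2*pi*hop*(k-1)/n;               % advance expected at the bin centre
dp   = angle(X2(k)) - angle(X1(k)) - dphi;  % deviation from that expectation
dp   = dp - 2*pi*round(dp/(2*pi));     % wrap the deviation into [-pi, pi]
fest = (dphi + dp) * sr / (2*pi*hop);  % frequency implied by the advance, Hz
```

In a time-scaled spectrogram, it is this measured advance (dphi + dp), rather than the raw analysis phase, that would be accumulated from one output frame to the next.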

I first wrote a phase vocoder in 1990, which eventually became the 'pvoc' unit generator in Csound.  This implementation is a lot smaller and took much less time to debug!  It first calculates the short-time Fourier transform of the signal using 'stft'.  'pvsample' then builds a modified spectrogram array by sampling the original array at a sequence of fractional time values, interpolating the magnitudes and fixing up the phases as it goes along.  The resulting time-frequency array can be inverted back into a sound with 'istft'.  The 'pvoc' script is a wrapper that performs all three of these steps for a fixed time-scaling factor (greater than one to speed up; less than one to slow down).  The underlying pvsample routine would also support arbitrary timebase variation (freezing, reversal, modulation) if one wished to write a suitable interface to specify the time path.
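
The three steps described above can be sketched in one short script.  This is an illustrative reimplementation under stated assumptions (Hann analysis and synthesis windows, 75% overlap, a 2/3 gain to undo the summed squared windows, and a synthetic test tone).  It is not the source of stft, pvsample, istft or pvoc, and all variable names are hypothetical.

```matlab
% Minimal phase-vocoder sketch (illustrative; not the pvoc/pvsample source)
sr  = 16000;  n = 1024;  hop = n/4;
r   = 0.75;                                  % r < 1 slows down
x   = cos(2*pi*440*(0:sr-1)'/sr);            % 1 s test tone (assumed input)
w   = 0.5*(1 - cos(2*pi*(0:n-1)'/n));        % periodic Hann window

% 1. Analysis: one column per frame, keeping bins 0..n/2 only
nfr = 1 + floor((length(x) - n)/hop);
X   = zeros(n/2+1, nfr);
for c = 1:nfr
  F = fft(w .* x((c-1)*hop + (1:n)));
  X(:,c) = F(1:n/2+1);
end

% 2. Time path: fractional, 0-based frame positions, one per output frame.
%    Stopping at nfr-2 keeps column floor(t)+2 inside the array.
t = 0:r:(nfr-2);

% 3. Sample the spectrogram along t
dphi = 2*pi*hop*(0:n/2)'/n;                  % expected phase advance per hop
ph   = angle(X(:,1));                        % running output phase
Y    = zeros(n/2+1, length(t));
for m = 1:length(t)
  c0  = floor(t(m)) + 1;                     % left-hand analysis column
  tf  = t(m) - floor(t(m));                  % fractional position
  mag = (1-tf)*abs(X(:,c0)) + tf*abs(X(:,c0+1));  % interpolate magnitudes
  Y(:,m) = mag .* exp(1i*ph);
  dp = angle(X(:,c0+1)) - angle(X(:,c0)) - dphi;  % deviation from expected
  dp = dp - 2*pi*round(dp/(2*pi));           % wrap into [-pi, pi]
  ph = ph + dphi + dp;                       % advance phase by one output hop
end

% 4. Synthesis: inverse FFT, window again, overlap-add
y = zeros((length(t)-1)*hop + n, 1);
for m = 1:length(t)
  F   = [Y(:,m); conj(Y(n/2:-1:2, m))];      % rebuild conjugate-symmetric half
  seg = real(ifft(F));
  idx = (m-1)*hop + (1:n);
  y(idx) = y(idx) + (2/3) * w .* seg;        % 2/3 undoes the Hann^2 overlap sum
end
```

Here the time path t is a uniform ramp, which gives a fixed rate r.  In this sketch any other vector of fractional frame positions between 0 and nfr-2 could be substituted for t, which is presumably how a freeze or a modulated rate would be specified.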

Code

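These routines were developed on Matlab 5.0, but should work on any later version.  (The examples below use wavread, which has been replaced by audioread in recent Matlab releases.)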

Here's an example of how to use pvoc to slow down a soundfile of voice (sampled at 16 kHz) to 3/4 speed:

»[d,sr]=wavread('sf1_cln.wav');
»sr
sr =
           16000
»% 1024 samples is 64 ms at 16 kHz, a good window length
»y=pvoc(d,.75,1024);
»% Compare original and resynthesis
»sound(d,16000)
»sound(y,16000)
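
The choice of window length can be checked with a little arithmetic.  The sketch below assumes a voice pitch of 100 Hz, which is a typical round figure chosen for illustration and not a measurement of sf1_cln.wav.

```matlab
% Sketch: is a 1024-point window 'narrowband' for speech at 16 kHz?
sr = 16000;                 % sample rate, Hz
n  = 1024;                  % window length, samples
f0 = 100;                   % ASSUMED typical voice pitch, Hz (illustrative)

win_ms = 1000*n/sr;         % window duration: 64 ms
cycles = n*f0/sr;           % pitch cycles spanned by one window: 6.4
binhz  = sr/n;              % spacing between FFT bins: 15.625 Hz

fprintf('%.0f ms window, %.1f pitch cycles, %.3f Hz per bin\n', ...
        win_ms, cycles, binhz);
```

With several pitch cycles inside each window and a bin spacing well below the assumed pitch, individual harmonics fall in separate bins, which is the narrowband condition described in the introduction.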

Here's how to use phase vocoder time-scale modification followed by resampling to effect a pitch shift. In this case, we shift the pitch up by a major third (by extending duration with the phase vocoder, then resampling to the original length), then add it back to the initial sound to give harmonization:

»[d,sr]=wavread('clar.wav');
»e = pvoc(d, 0.8);
»f = resample(e,4,5); % NB: 0.8 = 4/5
»n = min(length(d),length(f)); % lengths can differ by a few samples
»soundsc(d(1:n)+f(1:n),sr)
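
The interval follows from the ratio: slowing down by 0.8 lengthens the sound by 5/4, and resampling by 4/5 restores the length while multiplying every frequency by 5/4.  The sketch below checks this on a synthetic tone.  It uses interp1 as a crude stand-in for resample (linear interpolation with no anti-aliasing filter), and the 440 Hz tone simply stands in for the time-stretched signal e; both are assumptions made so the example runs without extra toolboxes.

```matlab
% Sketch: resampling by 4/5 raises pitch by 5/4 (illustrative names)
sr    = 16000;
e     = cos(2*pi*440*(0:sr-1)'/sr);    % stand-in for the stretched signal
ratio = 5/4;                           % read one sample every 1.25 samples
semis = 12*log2(ratio);                % about 3.86 semitones (just major third)

% Crude resampling by linear interpolation; NOT equivalent to resample(),
% which also applies an anti-aliasing filter.
f = interp1((0:length(e)-1)', e, (0:ratio:length(e)-1)');

% Locate the spectral peak of the result
nf  = 8192;
wf  = 0.5*(1 - cos(2*pi*(0:nf-1)'/nf));     % Hann window
S   = abs(fft(wf .* f(1:nf)));
[mx, k] = max(S(1:nf/2+1));
fpk = (k-1)*sr/nf;                     % peak frequency, close to 550 Hz
```

A different interval would need a different ratio; for example, an equal-tempered shift of s semitones corresponds to 2^(s/12), which then has to be approximated by a ratio of small integers for resample.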

References

[FlanG66]
J. L. Flanagan and R. M. Golden, "Phase Vocoder," Bell System Technical Journal, November 1966, pp. 1493-1509.
http://www.ee.columbia.edu/~dpwe/e6820/papers/FlanG66.pdf

[Dols86]
Mark Dolson, "The phase vocoder: A tutorial," Computer Music Journal, vol. 10, no. 4, pp. 14-27, 1986.

[LaroD99]
Jean Laroche and Mark Dolson, "New Phase Vocoder Technique for Pitch-Shifting, Harmonizing and Other Exotic Effects," IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, Mohonk, New Paltz, NY, 1999.
http://www.ee.columbia.edu/~dpwe/papers/LaroD99-pvoc.pdf

There's also a recommended tutorial at Stephan Sprenger's DSP Dimension site.

History

2003-03-06 Added pitch shifting/harmonization example

2002-02-13 Revised version uses stft/istft for perfect reconstruction when r = 1. More stuff on page.

2000-12-11 First version of this page, after demo'ing in E4810.


Last updated: 2003/04/09 03:43:54

Dan Ellis <dpwe@ee.columbia.edu>