Frequency Domain Method using the Short Time Fourier Transform 

To process a signal in real time, it must be processed as it arrives. The signal is broken into frames (short segments), an FFT of each frame is taken (together these form a Short Time Fourier Transform), and processing is performed on that spectrum. Each frame is then reconstructed with the inverse FFT, and the overlapping frames are added together to produce the processed output signal. This is the basis for the frequency-domain pitch scaling method described below.
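The frame, transform, process, and overlap-add steps above can be sketched as follows. This is a minimal illustration and not the project's implementation: the Hann window, the 50% overlap, the 64-sample frame, and the names `fft`, `ifft` and `stft_process` are all assumptions made for the example. The per-frame processing step is left as an identity function so that the reconstruction can be checked.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def ifft(spec):
    """Inverse FFT via the conjugate trick."""
    n = len(spec)
    return [v.conjugate() / n for v in fft([s.conjugate() for s in spec])]

def stft_process(signal, frame_len=64, process=lambda spec: spec):
    """Window each frame, transform, process, invert, and overlap-add."""
    hop = frame_len // 2  # assumed 50% overlap
    # Periodic Hann window: overlapping copies at 50% overlap sum to 1.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        rebuilt = ifft(process(fft(frame)))
        for n in range(frame_len):
            out[start + n] += rebuilt[n].real
    return out

# With the identity process, the output matches the input everywhere
# except the first and last half frame, which only one window covers.
tone = [math.sin(2 * math.pi * 5 * n / 256) for n in range(256)]
rebuilt_tone = stft_process(tone)
```

Any spectral modification, such as the bin shifting described in this section, would be supplied as the `process` argument.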

An FFT of each frame is taken, and its resolution (number of frequency bins) is increased by upsampling and interpolation. An FFT can be viewed as a bank of narrow bandpass filters, with each bin representing one filter. The magnitude of each bin is shifted to a new position, calculated as the product of its current position and the scaling factor. Figure 20 shows how the original spectrum is modified to obtain a pitch-scaled waveform. The bin resolution has been increased by a factor of 10 and each original magnitude has been shifted by a scaling factor of 1.1, so bin 1 moves to 11, bin 2 to 22, and so on. The magnitudes between these shifted positions are filled in by interpolation; they are shown as the thinner lines in the bottom graph.

Figure 20: Modification of frequency spectrum to attain correct bin positions (scaling factor of 1.1)
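The bin repositioning illustrated in Figure 20 can be sketched in a few lines. This is a simplified illustration that works on magnitudes only. The function name `shift_bins`, the rounding to the nearest bin, and the use of linear interpolation are assumptions, since the text does not state which interpolation was used.

```python
def shift_bins(mags, upsample=10, scale=1.1):
    """Move bin k to round(k * upsample * scale) and interpolate between.

    mags must contain at least one magnitude.
    """
    positions = [round(k * upsample * scale) for k in range(len(mags))]
    out = [0.0] * (positions[-1] + 1)
    for k in range(len(mags) - 1):
        p0, p1 = positions[k], positions[k + 1]
        for p in range(p0, p1):
            # Assumed linear interpolation between neighbouring magnitudes.
            frac = (p - p0) / (p1 - p0)
            out[p] = mags[k] + frac * (mags[k + 1] - mags[k])
    out[positions[-1]] = mags[-1]
    return positions, out

# Bins 0, 1, 2 land on positions 0, 11, 22, as in the example above.
positions, shifted = shift_bins([0.0, 1.0, 0.5])
print(positions)
```

The original magnitudes appear unchanged at the new positions, and the entries in between correspond to the thinner interpolated lines in the figure.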

This method shows promise and, with further development, could produce suitable results. However, a few key problems were found when implementing it.

The initial FFT frame needs a high enough frequency resolution for harmonic scaling to succeed. If an FFT bin represents too wide a band of the spectrum, repositioning it does not give true harmonic scaling. Initial results showed that a long frame (300 ms) is required before the error between where the shifted frequencies land and where they should be becomes negligible. With a shorter frame, the error is not constant across the shifted frequency bins, producing a sound similar to that of a badly tuned guitar.
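To give a rough sense of why frame length matters, the spacing between FFT bins is the sample rate divided by the frame length in samples, which equals the reciprocal of the frame duration. The sketch below compares that spacing with the gap between adjacent semitones near the bottom of the guitar's range. The 82.41 Hz reference (open low E string in standard tuning) and the function names are assumptions chosen for illustration; they do not come from the measurements described above.

```python
def bin_width_hz(frame_ms):
    """Spacing between FFT bins for a frame of this duration (fs / N = 1 / T)."""
    return 1000.0 / frame_ms

def semitone_gap_hz(freq_hz):
    """Distance in Hz from freq_hz up to the next equal-tempered semitone."""
    return freq_hz * (2 ** (1 / 12) - 1)

LOW_E_HZ = 82.41  # assumed reference: open low E string, standard tuning

for frame_ms in (50, 100, 300):
    print(frame_ms, "ms frame:",
          round(bin_width_hz(frame_ms), 2), "Hz per bin,",
          round(semitone_gap_hz(LOW_E_HZ), 2), "Hz per semitone at low E")
```

Under these assumptions, a 50 ms frame has bins wider than a semitone at low E, while a 300 ms frame brings the bin spacing below that gap.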

This large frame length is problematic for a DSP implementation. Firstly, an entire FFT frame must be in memory before it can be processed. This introduces a delay of around 300 ms, which is not acceptable for a guitar processor intended for live use. Secondly, upsampling a frame of 8192 samples (about 300 ms at a 24 kHz sample rate) by a factor of at least 10 leads to memory issues.
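The cost of such a frame can be estimated with some simple arithmetic. The sketch below is only an estimate: the storage format (one real and one imaginary value per bin, 4 bytes each) is an assumption, and the word size and memory layout of the DSP actually used are not given here.

```python
def frame_cost(frame_len=8192, sample_rate=24000, upsample=10, bytes_per_value=4):
    """Return (buffering delay in ms, upsampled bin count, spectrum size in bytes)."""
    latency_ms = 1000.0 * frame_len / sample_rate
    bins = frame_len * upsample
    # Assumed layout: one real and one imaginary value stored per bin.
    spectrum_bytes = bins * 2 * bytes_per_value
    return latency_ms, bins, spectrum_bytes

latency_ms, bins, spectrum_bytes = frame_cost()
print(round(latency_ms, 1), "ms,", bins, "bins,", spectrum_bytes // 1024, "KiB")
```

With these assumed values, a single upsampled spectrum holds 81920 bins and occupies 640 KiB, before any working buffers are counted.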

Therefore, rather than investing too much time in a method that apparently could not be implemented properly on the available DSP, SOLA, a time-domain approach, was investigated to bypass the inherent difficulties of FFT methods.

 
