Frequency Domain Method using the Short Time Fourier Transform
To process a signal in real time, it must be processed as it arrives. The signal
is broken up into frames (short segments of the signal). An FFT of each frame is
taken (a Short Time Fourier Transform), and processing is performed on that FFT.
Reconstruction with the inverse FFT, followed by adding the overlapped frames,
produces the processed output signal. This is the basis for the method of pitch
scaling in the frequency domain described below.
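The frame, FFT, process, inverse FFT and overlap-add chain can be sketched as
follows. This is a minimal illustration rather than the implementation used in
this project: it assumes NumPy, and the function name `stft_process`, the frame
length, the hop size and the Hann window are all illustrative choices that are
not taken from the text.

```python
import numpy as np


def stft_process(signal, frame_len=1024, hop=256, process=None):
    """Frame -> FFT -> process -> inverse FFT -> overlap-add.

    frame_len, hop and the Hann window are assumptions made for this sketch.
    """
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(frame_len + 1)[:-1]  # periodic Hann window
    output = np.zeros(len(signal))
    weight = np.zeros(len(signal))
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        spectrum = np.fft.rfft(frame)                 # FFT of one frame
        if process is not None:
            spectrum = process(spectrum)              # frequency-domain processing
        frame_out = np.fft.irfft(spectrum, n=frame_len) * window
        output[start:start + frame_len] += frame_out  # add overlapped frames
        weight[start:start + frame_len] += window ** 2
    covered = weight > 1e-8
    output[covered] /= weight[covered]                # undo the window weighting
    return output
```

With `process` left as `None` the spectrum is untouched, so the output should
match the input away from the signal edges. A pitch scaling routine would be
passed in as `process`.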
An FFT of each grain is taken and the resolution (number of frequency bins) is
increased with upsampling and interpolation. An FFT can be seen as a series of
narrow bandpass filters, with each bin representing one filter. The magnitude of
each bin is shifted, and each bin's new position is the product of its current
position and the scaling factor. Figure 20 demonstrates how the original
spectrum is modified to obtain a pitch-scaled waveform. The bin resolution has
been increased by a factor of 10, and each original magnitude has been shifted
by a scale factor of 1.1, so bin 1 is moved to 11, bin 2 to 22 and so on.
Interpolation is performed between the original magnitudes; the interpolated
values are shown as the thinner lines in the bottom graph.

Figure 20: Modification of frequency spectrum to attain correct bin positions
(scaling factor of 1.1)
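The bin repositioning illustrated in Figure 20 might be sketched like this. The
function name `shift_bins`, the rounding to the nearest fine bin and the use of
linear interpolation are assumptions made for illustration; the sketch handles
magnitudes only and leaves phase aside.

```python
def shift_bins(magnitudes, scale=1.1, upsample=10):
    """Move original bin k to round(k * upsample * scale) on a finer grid,
    then linearly interpolate between the repositioned magnitudes."""
    n_fine = len(magnitudes) * upsample
    placed = []  # (fine-grid position, magnitude) pairs
    for k, mag in enumerate(magnitudes):
        pos = round(k * upsample * scale)
        if pos < n_fine:  # drop bins scaled past the end of the grid
            placed.append((pos, mag))
    fine = [0.0] * n_fine
    for (p0, m0), (p1, m1) in zip(placed, placed[1:]):
        for p in range(p0, p1):
            frac = (p - p0) / (p1 - p0)
            fine[p] = m0 + frac * (m1 - m0)  # interpolated value
    last_pos, last_mag = placed[-1]
    fine[last_pos] = last_mag
    return fine


fine = shift_bins([0.0, 1.0, 0.5, 0.25])
print(fine[11], fine[22], fine[33])  # original bins 1, 2, 3 at their new positions
```

With the default factors, original bins 1, 2 and 3 land on fine-grid positions
11, 22 and 33, matching the example in the text.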
The performance of this method is promising and, with further development, it
could produce suitable results. However, a few key problems were found in
implementing it.
The frequency resolution of the initial FFT frame must be high enough for
harmonic scaling to succeed. If an FFT bin represents too much of the frequency
spectrum, repositioning it will not perform true harmonic scaling. Initial
results found that a long frame size (300 ms) is required to make the error
between where the frequency positions are and where they should be negligible.
With any smaller frame size, the error in each shifted frequency bin is not
constant, leading to a sound similar to that of a badly tuned guitar.
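One rough way to see why short frames detune the result is to treat each
harmonic as sitting at the centre of its nearest FFT bin and measure how far the
scaled bin centre lands from the ideal scaled frequency. This is a simplified
model assumed for illustration, not the analysis carried out in this project;
the function name, the 110 Hz test note and the five harmonics are hypothetical
choices, while the 24 kHz sample rate and the 1.1 scale factor come from the
text.

```python
import math


def harmonic_errors_cents(f0, n_harmonics, frame_len,
                          sample_rate=24000, scale=1.1):
    """Pitch error (in cents) of each scaled harmonic when every harmonic
    is assumed to sit at the centre of its nearest FFT bin."""
    bin_width = sample_rate / frame_len  # Hz covered by one bin
    errors = []
    for h in range(1, n_harmonics + 1):
        target = h * f0 * scale                   # ideal scaled frequency
        nearest_bin = round(h * f0 / bin_width)   # bin holding this harmonic
        actual = nearest_bin * bin_width * scale  # where that bin ends up
        errors.append(1200 * math.log2(actual / target))
    return errors


for frame_len in (512, 2048, 8192):
    errs = harmonic_errors_cents(110.0, 5, frame_len)
    print(frame_len, [round(e, 1) for e in errs])
```

Under this model the errors differ from harmonic to harmonic and shrink as the
frame gets longer, which is consistent with the detuned sound described for
short frames.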
This large frame length is problematic for DSP implementation. Firstly, an
entire FFT frame needs to be in memory before it can be processed. This leads to
a significant 300 ms delay, which is not acceptable for a guitar processor unit
intended to be used live. Secondly, upsampling a frame of 8192 samples
(approximately 300 ms at a 24 kHz sample rate) by at least a factor of 10 leads
to memory issues.
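The scale of both problems can be estimated with a few lines of arithmetic. The
frame length, sample rate and upsampling factor are those given above; the
function name `frame_cost`, the 4 bytes per value and the choice to store a full
complex spectrum are assumptions made only for this estimate.

```python
def frame_cost(frame_len=8192, sample_rate=24000, upsample=10,
               bytes_per_value=4):
    """Estimate buffering delay and spectrum storage for one frame.

    bytes_per_value=4 (32-bit values) is an assumption, not a figure
    from the text.
    """
    delay_ms = 1000 * frame_len / sample_rate  # time needed to fill one frame
    upsampled_bins = frame_len * upsample      # bins after upsampling
    # One real and one imaginary value per bin of the complex spectrum.
    spectrum_bytes = upsampled_bins * 2 * bytes_per_value
    return delay_ms, upsampled_bins, spectrum_bytes


delay_ms, bins, size = frame_cost()
print(f"{delay_ms:.1f} ms, {bins} bins, {size / 1024:.0f} KiB")
```

Note that 8192 samples at 24 kHz last about 341 ms; 8192 is the next power of
two above the 7200 samples that 300 ms would need.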
Therefore, before investing too much time in a method that seemingly could not
be implemented properly with the DSP available, SOLA, the time domain approach,
was investigated to bypass the inherent difficulties associated with FFT
methods.