Time Domain Pitch Scaling using
Synchronous Overlap and Add
The need for FFTs can be
removed if all of the processing is performed in the time domain using the
synchronous overlap and add (SOLA) approach. This approach changes the pitch by
resampling grains, for which multirate techniques can be used. Each grain is
resampled, and the overlapping grains are then added together, which gives the
overlap and add method its name. Figure 21 shows how each grain is processed,
with resampling, correlation and windowing.

Figure 21: Block diagram of the
overlap and add approach
Grain Length and Synchronous Period
The grain length (GL,
in samples) is determined from the sample rate (FS) used
in the system. Experimental results showed that, for a wide range of
guitar signals, a grain length of between 150 ms and 200 ms maintains a
high quality of scaling with a low level of distortion. Equation 1 shows the
relationship with the sample rate for a grain length of 170 ms.
$GL = 0.17 \times FS$ (1)
Therefore, at a sampling rate of
24 kHz as implemented on the TI DSP, GL = 4080.
The synchronous period (SP)
determines the spacing between the start of each grain. This must be
considerably smaller than the grain length to ensure that there is adequate
overlap between grains. This overlap is required to reduce envelope distortion.
The overlap ratio (RO) is defined as
$RO = \frac{GL - SP}{GL}$ (2)
Good quality is achieved with RO ≥ 0.75, which results in
SP ≤ 1020 samples for the TI DSP
implementation.
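The arithmetic above can be checked with a short sketch. It assumes the overlap ratio is the overlapped fraction of a grain, (GL - SP) / GL, which is consistent with the figures quoted; the function names are illustrative and not taken from the TI DSP implementation.

```python
import math

def grain_length(fs_hz, grain_ms=170):
    """Grain length GL in samples for a sample rate in Hz, as in equation (1)."""
    return fs_hz * grain_ms // 1000

def overlap_ratio(gl, sp):
    """Overlap ratio RO, assumed here to be (GL - SP) / GL."""
    return (gl - sp) / gl

def max_sync_period(gl, min_ratio=0.75):
    """Largest synchronous period SP (in samples) that keeps RO >= min_ratio."""
    return math.floor(gl * (1.0 - min_ratio))

GL = grain_length(24000)
SP = max_sync_period(GL)
print(GL, SP, overlap_ratio(GL, SP))  # prints: 4080 1020 0.75
```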
Multirate Signal Processing
By performing an upsampling process
with interpolation, anti-alias filtering and then downsampling, any desired
pitch alteration to the grain can be achieved. Even though the length of each
grain is changed, overlapping and adding successive grains maintains the overall
length of the signal.
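As a rough illustration of how resampling changes the grain length, the sketch below resamples a grain by a rational factor up/down. It is a simplification: linear interpolation stands in for the interpolation filter, and the anti-alias filter described above is omitted, so it is not suitable for audio as written. The function name `resample_linear` is hypothetical.

```python
def resample_linear(grain, up, down):
    """Resample `grain` by the rational factor up/down using linear interpolation.

    Equivalent to upsampling by `up` with linear interpolation and then keeping
    every `down`-th sample. No anti-alias filter is applied (simplification).
    """
    if not grain:
        return []
    n_out = ((len(grain) - 1) * up) // down + 1
    out = []
    for k in range(n_out):
        pos = k * down / up          # position in the original grain
        i = int(pos)
        frac = pos - i
        nxt = grain[i + 1] if i + 1 < len(grain) else grain[i]
        out.append((1.0 - frac) * grain[i] + frac * nxt)
    return out

# A 4080-sample grain resampled by 17/18 becomes shorter.
shorter = resample_linear([0.0] * 4080, 17, 18)
print(len(shorter))  # prints: 3853
```

The grain length scales by up/down, so when the resampled grain is played back at the original sample rate its pitch scales by down/up.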
The ratio between semitones is
given by
$\text{ratio} = 2^{n/12}$ (3)
where n is the number of
semitones of deviation from the original note. Equation (3) allows us to
calculate the error introduced by using integer ratios in multirate processing.
For example, if the ratio 18:17 is used (upsample by a factor of 18, then
downsample by a factor of 17), this gives 1.0588. The exact ratio for one
semitone is 1.0595, from (3). This difference of 0.06% is not detectable.
However, if we wish to scale up by five semitones using the same upsampling
integer (18:23), the error in the frequency increases to 3%, which is
detectable. This means that for each semitone increase or decrease in pitch, an
integer ratio must be found which has an undetectable error. Table 1 lists the
smallest integer ratios that satisfy this requirement and that have been used
for the implementation.
TABLE 1. Integer ratios for
multirate processing. Percentage error is the error between the actual semitone
ratio from (3) and the integer ratio.
Semitone Shift | Ratio | Error (%)
+/- 1 | 17:18 | 0.060
+/- 2 | 8:9 | 0.226
+/- 3 | 11:13 | 0.625
+/- 4 | 4:5 | 0.794
+/- 5 | 3:4 | 0.113
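The errors in Table 1 can be reproduced with a few lines of code. This sketch uses the equal-tempered semitone ratio 2^(n/12) and assumes the percentage error is taken relative to the integer ratio, which matches the tabulated figures to three decimal places; the function names are illustrative.

```python
def semitone_ratio(n):
    """Exact frequency ratio for a shift of n semitones."""
    return 2.0 ** (n / 12.0)

def ratio_error_percent(n, small, large):
    """Percentage error of the integer ratio large/small against 2^(n/12).

    Assumption: the error is expressed relative to the integer ratio.
    """
    approx = large / small
    return abs(semitone_ratio(n) - approx) / approx * 100.0

# Integer ratios from Table 1, keyed by semitone shift.
TABLE_1 = {1: (17, 18), 2: (8, 9), 3: (11, 13), 4: (4, 5), 5: (3, 4)}

for n, (small, large) in TABLE_1.items():
    err = ratio_error_percent(n, small, large)
    print(f"+/- {n}  {small}:{large}  {err:.3f}")
```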
Multirate
signal processing is preferred over FFT-based methods because it is less
computationally expensive. In addition, the processing delay is smaller
than with the FFT, as the current grain does not have to be buffered in memory
before it is processed.
Grain Correlation and Adjustment
Once the first part of the grain
has been through the resampling process, it is cross-correlated with the
corresponding part of the previous grain to find the best position for the
grains to be overlapped. This minimises large phase changes, and reduces the
effect of waveform distortion through the adding of out of phase grains. Large
phase changes at grain boundaries will add unwanted high frequency components.
If GL is
calculated with equation (1), only a small correlation length is required to
obtain high quality results. From testing, this length needs to be approximately
4 ms, or around 100 samples when FS = 24 kHz.
However, if GL is
made significantly shorter than the value given by (1), then cross-correlation
becomes very important to the quality of the output. The length of the
correlation in this case needs to be inversely proportional to GL.
The correlation length also depends on the pitch of the signal, with lower
frequency notes requiring a longer length to find a good match.
Once the maximum point of the
cross-correlation function is found, this is used to align the current and
previous grains.
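The alignment search can be sketched as follows. This is a minimal, unnormalised cross-correlation over a limited lag range, written under the assumption that the search compares the head of the new grain against the overlapping tail of the previous one; the function and parameter names (`best_offset`, `corr_len`, `max_lag`) are hypothetical.

```python
import math

def best_offset(prev_tail, grain_head, corr_len, max_lag):
    """Return the lag (0..max_lag) that maximises the cross-correlation between
    `corr_len` samples of the new grain and the previous grain's tail."""
    if len(prev_tail) < max_lag + corr_len or len(grain_head) < corr_len:
        raise ValueError("not enough samples for the requested search")
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        score = sum(prev_tail[lag + i] * grain_head[i] for i in range(corr_len))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# Two sine waves with a 20-sample period, the first delayed by 7 samples.
period = 20
prev_tail = [math.sin(2 * math.pi * (j - 7) / period) for j in range(60)]
grain_head = [math.sin(2 * math.pi * i / period) for i in range(40)]
print(best_offset(prev_tail, grain_head, corr_len=40, max_lag=19))  # prints: 7
```

A practical version would probably normalise each score by the signal energy so that loud passages do not dominate the search.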
Grain Windowing
Each grain is windowed by a
triangular window before grain overlapping and adding. This gives a smooth
transition into and out of the grain, and reduces the high frequency noise
caused by phase changes at grain boundaries when the grains are added together.
The type of window makes little difference to the quality of the output. Other
windows such as Hamming and Blackman work just as well. Figure 22 shows the
tapering effect of using a triangular window function on a sine wave.

Figure 22: Grain windowing using a
triangular function

Figure 23: Comparison between
boxcar (rectangular) and triangular grain windowing
Figure 23 shows how the phase
offsets circled in the top subplot are removed by using triangular
windowing in the bottom subplot. If these phase spikes are not removed, they
add high frequency noise to the output.
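A triangular window of the kind described can be generated and applied as in the sketch below. It assumes a symmetric window that falls to zero at both ends of the grain; the source does not state the exact window definition used, and the function names are illustrative.

```python
def triangular_window(n):
    """Symmetric triangular window of length n, zero at both ends, peak of 1."""
    if n < 2:
        return [1.0] * n
    return [1.0 - abs(2.0 * i / (n - 1) - 1.0) for i in range(n)]

def apply_window(grain):
    """Taper a grain by multiplying it sample-by-sample with the window."""
    window = triangular_window(len(grain))
    return [sample * w for sample, w in zip(grain, window)]

print(triangular_window(5))       # prints: [0.0, 0.5, 1.0, 0.5, 0.0]
print(apply_window([2.0] * 5))    # prints: [0.0, 1.0, 2.0, 1.0, 0.0]
```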
Overlap and Add
The next step after windowing is to
add the grain to the previous grain with an offset. This offset is the value of
the synchronous period, SP, plus the offset determined from the
cross-correlation in section 5.3. This is illustrated by figure 24 on the
following page, where the third window of the output signal is the sum of G1C,
G2B and G3A.
Figure 24 illustrates the entire
SOLA process by breaking the input into grains and processing them step-by-step.
The areas that are shaded by diagonal lines are where the correlation between
grains takes place.

Figure 24: Process of generating
the output waveform using SOLA
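The overlap and add step can be sketched as below. It assumes that grain k starts at k * SP plus a non-negative correlation offset for that grain, which is one reading of the description above; the grains are expected to be resampled and windowed already, and the function name `overlap_add` is hypothetical.

```python
def overlap_add(grains, sp, offsets=None):
    """Sum grains into one output buffer.

    Grain k starts at k * sp + offsets[k] (assumed placement rule), where
    offsets holds the non-negative lags found by cross-correlation.
    """
    if not grains:
        return []
    if offsets is None:
        offsets = [0] * len(grains)
    starts = [k * sp + off for k, off in enumerate(offsets)]
    length = max(start + len(g) for start, g in zip(starts, grains))
    out = [0.0] * length
    for start, grain in zip(starts, grains):
        for i, sample in enumerate(grain):
            out[start + i] += sample
    return out

grains = [[1.0] * 4, [1.0] * 4, [1.0] * 4]
print(overlap_add(grains, sp=2))
# prints: [1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 1.0, 1.0]
```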