SOLA Results
This section examines the quality of the SOLA method by raising
and lowering the pitch of various signals. Shifts of +5 and -5 semitones are
used to generate the results, as these are the largest pitch changes that a
guitarist would need. Larger shifts are nevertheless achievable at reasonable
quality.
The effect of correlation length and overlap ratio is also investigated.
Scaling of Various Signals
All results have been generated
using GL = 4080, RO = 0.75 and FS
= 24000.
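As a rough illustration of what these settings imply, the sketch below converts semitone shifts into frequency ratios and works out the grain duration and hop size. It assumes that GL is the grain length in samples, RO is the overlap ratio between successive grains, and FS is the sampling frequency in Hz; the function names are hypothetical and are not taken from the original implementation.

```python
# Assumed reading of the parameters: GL = grain length in samples,
# RO = overlap ratio (fraction of a grain shared with the next one),
# FS = sampling frequency in Hz.
GL = 4080
RO = 0.75
FS = 24000


def semitone_ratio(semitones):
    """Frequency ratio for a shift of the given number of semitones
    in twelve-tone equal temperament."""
    return 2.0 ** (semitones / 12.0)


def grain_duration_ms(grain_len, fs):
    """Duration of one grain in milliseconds."""
    return 1000.0 * grain_len / fs


def hop_size(grain_len, overlap_ratio):
    """Samples between grain starts, assuming a fixed hop before any
    correlation-based adjustment (an assumption of this sketch)."""
    return round(grain_len * (1.0 - overlap_ratio))


if __name__ == "__main__":
    for st in (+5, -5):
        print(f"{st:+d} semitones -> ratio {semitone_ratio(st):.4f}")
    print(f"grain duration: {grain_duration_ms(GL, FS):.1f} ms")
    print(f"hop size: {hop_size(GL, RO)} samples")
```

Under these assumptions, +5 semitones corresponds to a frequency ratio of about 1.335 and -5 semitones to about 0.749, and each grain lasts 170 ms.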
Sine Waves
Figure 26 shows the effect of
scaling at +5 and -5 semitones on a 500Hz sine wave.

Figure 26: Pitch scaling of a sine
wave. Top is original 440Hz sine, middle is scaled by -5 semitones, bottom by +5
semitones
This figure demonstrates that the
envelope of the waveform has been largely maintained, although a slight
modulation is discernible. No distortion is audible.
Scaling of Guitar Samples
The results for single notes are
shown first, with chords following. Figure 27 below shows the effect of both
increasing and decreasing the pitch of an A note (high E string, 17th
fret). This note lies in the upper frequency range of the guitar. The
spectra show clean scaling with no visible distortion.

Figure 27: (a) Original spectrum of
note A, (b) scaled by +5 semitones, (c) scaled by -5 semitones
The drop D note shown in Figure
28 is representative of the lowest frequencies generated
by a guitar. Whilst increasing the pitch in (b) shows no distortion, decreasing
it in (c) causes the peaks to spread. This occurs because the correlation
length is too short to cover several fundamental periods, so it cannot give any
meaningful alignment between successive grains.

Figure 28: (a) Original spectrum of
note D, (b) scaled by +5 semitones, (c) scaled by -5 semitones
For scaling at -5 semitones, XL
= 50 samples. Increasing this to 200 samples removes the distortion, as
can be seen in Figure 29.

Figure 29: Reduction of distortion by
increased correlation length
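The alignment step that the correlation length controls can be sketched as a search for the lag at which the start of the incoming grain best matches a reference segment. This is a minimal illustration and not the implementation used to produce the figures: the function names, the use of normalized cross-correlation, and the search range are all assumptions, and corr_len plays the role of XL.

```python
import math


def normalized_correlation(a, b):
    """Normalized cross-correlation of two equal-length sequences."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0.0 else 0.0


def best_lag(reference, candidate, corr_len, max_lag):
    """Return the lag in [0, max_lag] at which corr_len samples of
    candidate best match the first corr_len samples of reference."""
    ref = reference[:corr_len]
    best, best_score = 0, -float("inf")
    for lag in range(max_lag + 1):
        score = normalized_correlation(ref, candidate[lag:lag + corr_len])
        if score > best_score:
            best, best_score = lag, score
    return best


if __name__ == "__main__":
    fs = 24000
    f0 = 73.42   # approximate pitch of a low D, in Hz
    shift = 40   # known misalignment in samples, chosen for the demo
    reference = [math.sin(2 * math.pi * f0 * n / fs) for n in range(400)]
    candidate = [math.sin(2 * math.pi * f0 * (n - shift) / fs)
                 for n in range(800)]
    print(best_lag(reference, candidate, corr_len=200, max_lag=100))
```

With a noise-free sine the search recovers the shift for almost any window length. Real guitar signals contain many partials and noise, which is where a window that is short relative to the fundamental period stops giving a reliable peak.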
Figure 30 gives the results for
pitch scaling a G chord. While distortion is present, it is not perceptible.

Figure 30: (a) Original spectrum of
chord G, (b) scaled by +5 semitones, (c) scaled by -5 semitones
In the processing of chord samples,
if correlation is not used, or the correlation length is very small compared
with the fundamental period, the result sounds as if the individual strings
that make up the chord are out of tune. This is caused by adding together
non-aligned grains, which distorts the wave envelope and produces a beating
effect similar to that heard when two notes of slightly different frequency
are played together.
Analysis of Overlap Rates
How closely the scaled envelope
matches the original envelope depends on the overlap ratio.
Figure 31 shows that as the ratio increases, the envelope more closely resembles
the original. The windowed grains are evident in (b), where each grain has
a triangular envelope.

Figure 31: (a) Original envelope, (b)
overlap of 2%, (c) overlap of 30%, (d) overlap of 75%
At 75% overlap, the envelope
distortion is reduced to a level that is not readily perceptible to the ear.
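The dependence of the envelope on the overlap ratio can be illustrated with an idealised model: identical triangular windows are overlap-added at a fixed hop, and the flatness of the summed envelope is measured. This sketch assumes a full-grain triangular window and ignores the correlation-based shifts that real SOLA applies; the function names and the flatness measure (minimum divided by maximum) are choices made for the example.

```python
def triangular_window(length):
    """Triangular window rising from 0 to 1 at the centre and back."""
    return [1.0 - abs(2.0 * n / length - 1.0) for n in range(length)]


def envelope_flatness(grain_len, overlap_ratio, grains=8):
    """Overlap-add identical triangular windows at a fixed hop and return
    min/max of the summed envelope in the steady-state region
    (1.0 means perfectly flat)."""
    hop = round(grain_len * (1.0 - overlap_ratio))
    window = triangular_window(grain_len)
    total = [0.0] * (hop * (grains - 1) + grain_len)
    for g in range(grains):
        start = g * hop
        for n, w in enumerate(window):
            total[start + n] += w
    # Ignore the edges, where not every overlapping grain is present.
    steady = total[grain_len:(grains - 1) * hop]
    return min(steady) / max(steady)


if __name__ == "__main__":
    for ro in (0.02, 0.30, 0.75):
        flat = envelope_flatness(4080, ro)
        print(f"{ro:.0%} overlap -> flatness {flat:.3f}")
```

In this model the flatness is about 0.04 at 2% overlap, 0.6 at 30% and essentially 1.0 at 75%, which is consistent with the trend described for Figure 31.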
Extended Scaling Using SOLA
This algorithm
can scale by much more than +/- 5 semitones. At around 12 semitones, the
output starts to sound artificial, as if a guitar is not generating the signal.
At 24 semitones, the signal is either too high or too low in frequency to be
particularly useful, and the output also sounds very hollow. The reason for
this is that all frequency components are being scaled.
Some components of a sound (known as formants) stay at the same frequency
regardless of the change in pitch of the instrument. These are usually generated
by the body of the instrument. To achieve fully accurate scaling, these
formants must not be scaled. The SOLA algorithm works so well for
electric guitar signals because the entire signal is generated from the strings
(except for any noise pickup from interfering sources). This eliminates the
formants contributed by the body of the guitar, although the strings still
generate a small number of formants. The effect of scaling these is most
noticeable when the semitone deviation is high, such as 12 or 24.
This formant
issue is why music and voice signals are not scaled very well by this version
of SOLA. The larger bandwidth of these signals also means that a very
long correlation length is required to align the grains successfully, which in
turn requires longer grains. When the grains become too long, the signal within
a grain is no longer stationary and
length distortion becomes apparent. A possible solution to this problem is to
split the signal into frequency bands and process each band individually,
removing the need for a long correlation length.
hichord.mp3
hisingle.mp3
lochord.mp3
losingle.mp3