An Electronic Capo for Guitar Pitch Scaling

A.R. McKinnon & D.G. Bailey

Abstract: There is a need for a guitarist to be able to alter the pitch of their instrument at the press of a button, without the need for a capo or re-tuning. Simple scaling of the pitch does not work, as the relationship between frequency and time results in the sound length being changed. This paper illustrates a windowed approach that uses Synchronous Overlap and Add (SOLA) with multirate signal processing, to enable real-time pitch scaling of a guitar signal on a DSP.

Keywords: Pitch Scaling, Synchronous Overlap and Add, Electronic Capo.

1   INTRODUCTION

Conventionally, a guitarist uses either retuning, or a capo device to raise or lower the pitch of a guitar. The capo simply depresses all of the strings at the desired fret location, effectively shortening the length of all strings by the same ratio. This allows a piece of music to be played in a different key. Because changing the position of the capo takes time, in most instances key changes cannot be performed quickly. This has created a need for a device that can alter the pitch of a guitar signal by semitones, which operates in real time, and performs without an appreciable delay.

This problem cannot be solved simply by altering the sample rate. When a signal is digitised at a certain sample rate, any resampling process will leave the new signal with a different length. The reciprocal relationship between time and frequency means that changing the frequency by a factor will change the length of the signal by the inverse of that factor. For scaling music this is unacceptable: the length needs to be the same as that of the original signal. This paper examines both time and frequency domain approaches to pitch scaling, with emphasis being placed on the Synchronous Overlap and Add method that has been implemented on a Texas Instruments TMS320C542 digital signal processor.

This idea of pitch scaling has been developed as part of a wider project based on the creation and implementation of guitar effects and processes. The scaling process can be chained with other digital effects such as distortion, wah wah and tremolo. This makes it possible to create a stand-alone guitar-processing unit with high functionality.

2   PITCH SCALING APPROACHES

Two approaches to scaling are considered here. Firstly, simple frequency shifting and its inherent harmonic distortion are examined. Following this, a method based on frequency band modulation is described.

2.1   Frequency Shifting

Scaling by frequency shifting is achieved in two parts. Initially, a Fast Fourier Transform (FFT) is performed on small lengths of the input guitar signal. These lengths are known as grains. This FFT results in a set of frequency bins that are centred an equal distance apart. Each contains a value that represents the magnitude of a frequency that is present in the signal. The next step is to move each magnitude to a new bin location. The size of the move determines the level of pitch deviation from the original.

Figure 1: Frequency shifting by a constant

Figure 1 above shows this approach. The original waveform is composed of three sine waves at 100, 300 and 500 Hz. The bottom part of the figure shows how each frequency component has been shifted by a constant offset of +100 Hz.

The problem with this approach is that the harmonic relationships in the original signal are not maintained when the shifting occurs. These relationships are what make music sound pleasant to the ear. When a guitar string vibrates, it generates a fundamental frequency, as well as multiple higher order harmonic overtones. In Figure 1, the fundamental (at 100 Hz) is doubled, but the overtones must also be doubled for the harmonic relationship to hold. Therefore, the correct values would be a 200 Hz fundamental, with 600 Hz and 1000 Hz overtones. As expected, any audio sample shifted in frequency with this method becomes highly distorted to the ear.
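The difference between shifting and scaling can be checked numerically. The following is a minimal sketch using the three partials of Figure 1; the function names are illustrative only and are not part of the implementation described in this paper.

```python
# Partials of the example waveform in Figure 1, in Hz.
partials = [100.0, 300.0, 500.0]

def shift(freqs, offset_hz):
    """Add a constant offset to every partial (frequency shifting)."""
    return [f + offset_hz for f in freqs]

def scale(freqs, factor):
    """Multiply every partial by a constant factor (pitch scaling)."""
    return [f * factor for f in freqs]

def harmonic_ratios(freqs):
    """Ratio of each partial to the lowest partial."""
    return [f / freqs[0] for f in freqs]

shifted = shift(partials, 100.0)  # 200, 400, 600 Hz
scaled = scale(partials, 2.0)     # 200, 600, 1000 Hz

print(harmonic_ratios(partials))  # 1 : 3 : 5 in the original
print(harmonic_ratios(shifted))   # 1 : 2 : 3, the relationship is broken
print(harmonic_ratios(scaled))    # 1 : 3 : 5, the relationship is kept
```

Only the scaled set keeps the 1:3:5 relationship of the original partials.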

Next, a method is proposed that preserves the aforementioned harmonic relationships.

2.2  Frequency Band Modulation

To overcome this problem, the envelope of the signal (conveyed by the width and shape of each frequency domain peak [1]) must be maintained. This can be achieved by using multiple bandpass filters to modulate individual frequency bands up or down in frequency. Each filtered band is at least as wide as the peaks in the frequency domain. Each band is modulated by a value that is determined by the scaling factor and the filter's centre frequency. Figure 2 shows how this method works. With this approach, each band is modulated by the correct amount to maintain the harmonic relationship, and the peaks stay the same width, maintaining the time relationship. For example, with a main scaling factor of two, a band with a centre frequency of 600 Hz will be modulated by a sine wave of 600 Hz to give a centre frequency of 1200 Hz.

This approach requires a large number of filters to get good accuracy in representing all of the frequencies present in a signal. Therefore, it requires significant computational resources to implement effectively. 

Figure 2: Block diagram of frequency band modulation

3   PITCH SCALING USING A WINDOWED APPROACH

The frequency shifting approach described in section 2.1 can be made to work if the grain size (the size of each window) is sufficiently short, and the windows overlap significantly. By keeping the windows short, the audible effect of the length distortion is minimised, and by having overlapped windows, any gaps in the signal caused by the window becoming shorter can be minimised. The windowed approach may be applied in either the frequency domain or the time domain.

3.1   Frequency Domain Approach

The idea of this method is similar to that of section 2.1. An FFT of each grain is taken and the resolution (number of frequency bins) is increased with upsampling and interpolation. An FFT can be seen as a series of narrow bandpass filters, with each bin representing one filter. The magnitude of each bin is shifted, in a similar way to that of section 2.1, but now each bin's new position is calculated as the product of its current position and the scaling factor. The effect is similar to the process described in section 2.2, although the bandpass filters will almost certainly be narrower than the peaks in the frequency domain. Figure 3 demonstrates how the original spectrum is modified to obtain a pitch scaled waveform. The bin resolution has been increased by a factor of 10, and each original magnitude has been shifted by a scale factor of 1.1. Bin 1 is moved to 11, bin 2 to 22 and so on. Interpolation is performed between the original magnitudes. The interpolated values are shown as the thinner lines in the bottom graph.

Figure 3:  Modification of frequency spectrum to attain correct bin positions (scaling factor of 1.1)

While this approach requires fewer resources to implement on DSP than the band modulation of section 2.2, it still requires both an FFT and inverse FFT on each grain.
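The bin remapping illustrated by Figure 3 can be sketched as follows. This is an illustrative sketch only: it works on a list of bin magnitudes, ignores phase, and assumes simple linear interpolation between the moved bins. The function name and parameters are hypothetical.

```python
def remap_bins(mags, upsample=10, factor=1.1):
    """Move original bin k to round(k * upsample * factor) and linearly
    interpolate the magnitudes between the moved bins.

    Assumes upsample * factor >= 1 so that moved bins never collide.
    """
    positions = [round(k * upsample * factor) for k in range(len(mags))]
    out = [0.0] * (positions[-1] + 1)
    for k in range(len(mags) - 1):
        p0, p1 = positions[k], positions[k + 1]
        for p in range(p0, p1):
            t = (p - p0) / (p1 - p0)  # 0 at the left bin, towards 1 at the right
            out[p] = (1 - t) * mags[k] + t * mags[k + 1]
    out[positions[-1]] = mags[-1]
    return positions, out

positions, spectrum = remap_bins([0.0, 1.0, 0.5, 0.25])
print(positions)  # bin 1 moves to 11, bin 2 to 22, bin 3 to 33
```

With the values used in Figure 3 (resolution increased by 10, scale factor 1.1), bin 1 lands on bin 11 and bin 2 on bin 22, and the bins in between hold interpolated magnitudes.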

3.2   Time Domain Approach

The need for the FFTs can be removed if all of the processing is performed in the time domain. In resampling each grain, multirate techniques can be used. Each grain is resampled, and overlapping grains are added together to give the overlap and add method [2]. Figure 4 shows how each grain is processed, with resampling, correlation and windowing.

Figure 4: Block diagram of the overlap and add approach

3.2.1 Grain Length and Synchronous Period. The grain length (GL, in samples) is determined from the sample rate (FS) used in the system. Experiments showed that, for a wide range of guitar signals, a grain length of between 150 ms and 200 ms maintains a high quality of scaling with a low level of distortion. Equation 1 shows the relationship with the sample rate for a grain length of 170 ms.

$$G_L = 0.17\,F_S \qquad (1)$$

Therefore, at a sampling rate of 24 kHz as implemented on the TI DSP, GL = 4080 samples.

The Synchronous period (SP) determines the spacing between the start of each grain.  This must be considerably smaller than the grain time to ensure that there is adequate overlap between grains. This overlap is required to reduce envelope distortion. The overlap ratio is defined as

$$R_O = \frac{G_L - S_P}{G_L} \qquad (2)$$

Good quality is achieved for RO ≥ 0.75, which results in SP ≤ 1020 samples for our TI DSP implementation.
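The figures above can be reproduced with a few lines of arithmetic. The sketch below assumes that the overlap ratio is the fraction of a grain shared with its successor, (GL - SP) / GL, which is consistent with the quoted limit of 1020 samples; the function names are illustrative.

```python
def grain_length(fs_hz, grain_seconds=0.170):
    """Grain length in samples for a given sample rate."""
    return round(fs_hz * grain_seconds)

def overlap_ratio(gl, sp):
    """Fraction of each grain shared with the next one (assumed definition)."""
    return (gl - sp) / gl

def max_synchronous_period(gl, min_overlap=0.75):
    """Largest SP (in samples) that still meets the minimum overlap."""
    sp = int(gl * (1.0 - min_overlap))
    while overlap_ratio(gl, sp) < min_overlap:  # guard against float rounding
        sp -= 1
    return sp

gl = grain_length(24000)
sp = max_synchronous_period(gl)
print(gl, sp, overlap_ratio(gl, sp))  # 4080 1020 0.75
```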

3.2.2 Multirate Signal Processing. By performing an upsampling process with interpolation, applying an anti-aliasing filter, then downsampling, any desired pitch alteration to the grain can be achieved [3]. Even though the length of each grain is changed, overlapping and adding successive grains maintains the overall length of the signal.
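As a rough illustration of resampling a grain by an integer ratio, the sketch below uses linear interpolation in place of a proper interpolation filter and omits the anti-aliasing filter altogether, so it is a simplification of the process described above rather than the DSP implementation. The function name is hypothetical.

```python
def resample_linear(grain, up, down):
    """Resample a grain by the ratio up/down using linear interpolation.

    The output has roughly len(grain) * up / down samples. Played back at
    the original sample rate, its pitch changes by the inverse factor.
    Simplification: no anti-aliasing filter is applied.
    """
    n_out = (len(grain) - 1) * up // down + 1
    out = []
    for i in range(n_out):
        # Position of output sample i, in units of 1/up input samples.
        j, frac = divmod(i * down, up)
        if j + 1 < len(grain):
            out.append(grain[j] + (grain[j + 1] - grain[j]) * frac / up)
        else:
            out.append(grain[j])  # last input sample, frac is zero here
    return out

ramp = [float(i) for i in range(18)]
longer = resample_linear(ramp, 18, 17)   # 19 samples
shorter = resample_linear(ramp, 17, 18)  # 17 samples
print(len(ramp), len(longer), len(shorter))
```

Because only the current and next input samples are needed for each output sample, this kind of processing can proceed as samples arrive.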

The ratio between semitones is given by

$$r = 2^{n/12} \qquad (3)$$

where n is the number of semitones by which the note is shifted. Equation (3) allows us to calculate the error introduced by using integer ratios in multirate processing. For example, if the ratio 18:17 is used (upsample by a factor of 18, then downsample by a factor of 17), this gives 1.0588. The exact ratio for one semitone is 1.0595, from (3). This difference of 0.06% is not detectable. However, if we wish to scale up by five semitones using the same upsampling integer (18:23), the error in the frequency increases to 3%, which is detectable. This means that for each semitone increase or decrease in pitch, an integer ratio must be found that has an undetectable error. Table 1 lists the smallest integer ratios that satisfy this requirement; these have been used in the implementation.

TABLE 1: Integer ratios for multirate processing. Percentage error is error between actual semitone ratio from (3) and integer ratio.

Semitone Shift    Ratio    Error (%)
+/- 1             17:18    0.060
+/- 2             8:9      0.226
+/- 3             11:13    0.625
+/- 4             4:5      0.794
+/- 5             3:4      0.113

Multirate signal processing is preferred over methods such as the FFT because it is less computationally expensive. In addition, the processing delay is smaller than with an FFT, as the current grain does not have to be buffered in memory before it is processed.
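The errors quoted in Table 1 can be checked against the equal-tempered semitone ratio 2^(n/12). The sketch below assumes that the percentage error is taken relative to the integer ratio rather than the exact ratio; that choice is an assumption, but it reproduces the tabulated values to three decimal places.

```python
def semitone_ratio(n):
    """Exact frequency ratio for a shift of n semitones."""
    return 2.0 ** (n / 12.0)

def ratio_error_percent(n, small, large):
    """Percentage error of the integer ratio large/small for n semitones,
    measured relative to the integer ratio (assumed convention)."""
    approx = large / small
    return abs(semitone_ratio(n) - approx) / approx * 100.0

# Semitone shift -> integer ratio, as listed in Table 1.
TABLE_1 = {1: (17, 18), 2: (8, 9), 3: (11, 13), 4: (4, 5), 5: (3, 4)}

for n, (small, large) in TABLE_1.items():
    err = ratio_error_percent(n, small, large)
    print(f"+/- {n}  {small}:{large}  {err:.3f}")
```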

3.2.3 Grain Correlation and Adjustment. Once the first part of the grain has been through the resampling process, it is cross-correlated with the corresponding part of the previous grain to find the best position for the grains to be overlapped. This minimises large phase changes, and reduces the effect of waveform distortion through the adding of out of phase grains. Large phase changes at grain boundaries will add unwanted high frequency components.

If GL is calculated with equation (1), only a small correlation length is required to obtain high quality results. From testing, this length needs to be approximately four milliseconds, or around 100 samples when FS = 24 kHz.

However, if GL is made to be significantly shorter than that of (1), then cross correlation becomes very important to the quality of the output. The length of the correlation in this case needs to be inversely proportional to GL. The correlation length also depends on the pitch of the signal, with lower frequency notes requiring a longer length to find a good match. Once the maximum point of the cross-correlation function is found, this is used to align the current and previous grains.
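The alignment step can be sketched as a search for the lag that maximises the cross-correlation between the start of the new grain and the matching region of the previous grain. This is an illustrative sketch, not the DSP code: it uses an unnormalised correlation, searches only non-negative lags, and the names `best_alignment`, `prev_tail` and `new_head` are hypothetical.

```python
import math

def best_alignment(prev_tail, new_head, max_lag):
    """Return the lag in 0..max_lag at which new_head best matches prev_tail.

    The score is the plain (unnormalised) cross-correlation, a simplification.
    """
    n = len(new_head)
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        segment = prev_tail[lag:lag + n]
        if len(segment) < n:
            break  # ran out of previous-grain samples
        score = sum(a * b for a, b in zip(segment, new_head))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# A sine with a 20-sample period, and a new grain that starts 7 samples in.
prev_tail = [math.sin(2 * math.pi * i / 20) for i in range(100)]
new_head = prev_tail[7:47]
lag = best_alignment(prev_tail, new_head, max_lag=19)
print(lag)  # 7
```

The correlation length here (40 samples) covers two full periods of the test tone, which is why the peak is unambiguous; a lower note with a longer period would need a longer correlation length, as noted above.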

3.2.4 Grain Windowing. Each grain is windowed by a triangular window before grain overlapping and adding. This gives a smooth transition into and out of the grain, and reduces the high frequency noise caused by phase changes at grain boundaries when the grains are added together. The type of window makes little difference to the quality of the output. Other windows such as Hamming and Blackman work just as well.

Figure 5: Comparison between boxcar (rectangular) and triangular grain windowing

Figure 5 shows how the phase offsets in the circles in the top subplot are removed by using triangular windowing in the bottom plot.

3.2.5 Overlap and Add. The next step after windowing is to add the grain to the previous grain with an offset. This offset is the value of the synchronous period, SP, plus the offset determined from the cross-correlation in section 3.2.3. This is illustrated by Figure 6, where the third window of the output signal is the sum of G1C, G2B and G3A.

Figure 6 illustrates the entire SOLA process by breaking the input into grains and processing them step-by-step. The areas that are shaded by diagonal lines are where the correlation between grains takes place.

Figure 6: Process of generating the output waveform using SOLA
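The windowing and overlap-add steps of sections 3.2.4 and 3.2.5 can be sketched together. The sketch assumes that grain k is placed at k * SP plus its own correlation offset, measured from the start of the output, and that offsets are non-negative; both are assumptions made for illustration, as are the function names.

```python
def triangular_window(n):
    """Triangular window that rises from 0 to 1 and falls back towards 0."""
    return [1.0 - abs(2.0 * i / n - 1.0) for i in range(n)]

def overlap_add(grains, sp, offsets=None):
    """Window each grain and add it into the output at k * sp + offsets[k]."""
    if offsets is None:
        offsets = [0] * len(grains)
    starts = [k * sp + offsets[k] for k in range(len(grains))]
    if min(starts) < 0:
        raise ValueError("grain start positions must be non-negative")
    out = [0.0] * max(s + len(g) for s, g in zip(starts, grains))
    for start, grain in zip(starts, grains):
        window = triangular_window(len(grain))
        for i, (x, w) in enumerate(zip(grain, window)):
            out[start + i] += x * w
    return out

# Five constant grains of 8 samples with SP = 2, i.e. an overlap of 75%.
grains = [[1.0] * 8 for _ in range(5)]
output = overlap_add(grains, sp=2)
print(output[6:10])  # steady-state region, each sample is 2.0
```

With this window definition and a 75% overlap, the window weights sum to a constant 2 once four grains overlap, so a fixed gain of 0.5 would restore the original level.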

4   RESULTS

This section examines the quality of the SOLA method by raising and lowering the pitch of various signals. +5 and –5 semitones are used to generate the results, as these are the maximum variation in pitch that a guitarist would need. However, extended variation above these values is achievable at a reasonable quality. The effect of correlation and overlapping rates is also investigated, with a final section on DSP implementation.

4.1  Scaling of Guitar Signals

All results have been generated using GL = 4080, RO = 0.75 and FS = 24000.

The ‘drop D’ note shown in Figure 7 is representative of the lowest frequencies generated by a guitar. Whilst increasing the pitch in (b) shows no distortion, decreasing it in (c) causes the peaks to spread. This is caused by the correlation length being too short to cover several fundamental periods, so that it cannot give any meaningful alignment between successive grains.

Figure 7: (a) Original spectrum of note D, (b) scaled by +5 semitones, (c) scaled by -5 semitones

For scaling at –5 semitones, the correlation length XL = 50 samples. By increasing this to 200 samples, the distortion is removed, as can be seen in Figure 8.

Figure 8: Reduction of distortion by increased correlation length

Figure 9 gives the results for pitch scaling a G chord. While distortion is present, it is not perceivable.

Figure 9: (a) Original spectrum of chord G, (b) scaled by +5 semitones, (c) scaled by -5 semitones

In the processing of chord samples, if correlation is not used, or the correlation length is very small compared with the fundamental period, the result sounds as if the individual strings that contribute to the chord are out of tune. The reason for this effect is the adding together of non-aligned grains, which distorts the wave envelope. This results in a beating effect similar to that heard when two notes of slightly different frequency are played.

4.2  Analysis of Overlap Rates

How closely the scaled envelope matches the original envelope depends on the overlap ratio. Figure 10 shows that as the ratio increases, the envelope more closely resembles the original. The windowed grains are evident in (b), with each grain consisting of a triangular envelope.

Figure 10: (a) Original envelope, (b) overlap of 2%, (c) overlap of 30%, (d) overlap of 75%

At 75% overlap, it is found that the envelope distortion is reduced to levels that are not significantly perceivable to the ear.
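The trend in Figure 10 can be approximated by summing triangular window weights at different overlap ratios and measuring how much the summed envelope ripples. This is a simplified model and not the measurement behind the figure: it assumes a constant-amplitude input, no correlation offsets, and an overlap defined as (GL - SP) / GL. The function name is hypothetical.

```python
def envelope_ripple(grain_len, overlap):
    """Ripple of the summed triangular windows, as a fraction of the peak.

    0.0 means a perfectly flat envelope; values near 1.0 mean the envelope
    drops almost to zero between grains.
    """
    sp = max(1, round(grain_len * (1.0 - overlap)))
    window = [1.0 - abs(2.0 * i / grain_len - 1.0) for i in range(grain_len)]
    total_len = grain_len * 4
    env = [0.0] * (total_len + grain_len)
    for start in range(0, total_len, sp):
        for i, w in enumerate(window):
            env[start + i] += w
    steady = env[grain_len:total_len]  # skip the fade-in and fade-out
    return (max(steady) - min(steady)) / max(steady)

for overlap in (0.02, 0.30, 0.75):
    print(overlap, round(envelope_ripple(400, overlap), 3))
```

In this model the ripple falls from almost the full envelope height at 2% overlap to essentially zero at 75% overlap, which matches the qualitative behaviour described above.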

4.3  DSP Implementation

This algorithm has been implemented on the Texas Instruments TMS320C542 low power Digital Signal Processor development unit. The device uses the TLC320AC01C analogue interface circuit to obtain a digital representation of the guitar signal, and to convert the processed digital signal back into an analogue form suitable for playing through a speaker. This device runs at a clock speed of 40 MHz, and has 10k of usable memory.

Because multirate processing can be performed as samples arrive, with only a small delay, entire grains do not have to exist in memory. This removes any problem with lack of memory that might have arisen with other methods, such as those utilising FFTs. For implementation, each grain that is contributing to the output signal at a given time needs only enough samples for the upsampling with interpolation, and the downsampling, to proceed. Cross-correlation adds a small delay, because this data needs to be buffered in memory. However, only the new and previous grains need this correlation data.

The delay achieved is small enough to be undetectable to the ear. A maximum delay of 42 ms occurs at +5 semitones scaling. A clock speed of 40 MHz provides ample computing power to process this algorithm in real time.

5   CONCLUSION

In this paper, an investigation into differing approaches of pitch scaling is carried out, with the Synchronous Overlap and Add (SOLA) method being implemented in real time on a Texas Instruments DSP. High quality scaling can be achieved by having a long enough grain window with a large overlap between grains. Results on the quality of this approach show that for a majority of guitar signals, undetectable scaling up to five semitones can be achieved. This algorithm could be a useful tool in conjunction with other guitar effects and processes, to form a guitar-processing unit that can be used in a live situation.

6   REFERENCES

[1] D.G. Bailey, October 1997, Detecting Regular Patterns Using Frequency Domain Self-filtering, Proceedings of the International Conference on Image Processing, Santa Barbara, California, Vol I, pp 440-443.

[2] P.H. Wong, O.C. Au, 1998, Fast Browsing of Speech Material for Digital Library and Distance Learning, Proceedings of the IEEE Int. Sym. on Circuits & Systems.

[3] S.K. Mitra, 1998, Digital Signal Processing, A Computer Based Approach, McGraw Hill, pp 671– 679.