November 2001 - January 2002 : An introduction to Time-Frequency (TF) analysis for TEOAE interpretation

1.  Introduction

        A number of handbooks that introduce us to the world of sound state that sound has three main characteristics: time, frequency and intensity. My intention here is to introduce these terms in as much depth as necessary, and then to proceed to the more sophisticated topic of time-frequency (TF) analysis and its interpretation in the context of TEOAEs.

        Time and frequency are closely connected terms. In a mathematical or engineering context, the information carried by a sound signal can be represented in one of two equivalent forms: in the time domain or in the frequency domain. Although the time-domain representation is the natural form, one can transform the information to the frequency domain and back to the time domain without any loss of information. This transformation considers only the mathematical properties of the information in the signal. For very simple periodic signals, the transition to the frequency domain allows a better inspection of the signal than the natural time representation: in this case the spectrum of the signal, being its frequency-domain representation, helps in evaluating the properties of the acoustic signal. On the other hand, if a source emits acoustic signals of variable frequency, or if there are transient segments in the acoustic signal, the frequency representation has only a formal and technical value.
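
        The lossless round trip between the two domains can be checked numerically. The following is only a minimal sketch using NumPy's FFT routines; the sampling rate and the two test tones (50 Hz and 120 Hz) are assumptions chosen for illustration and have nothing to do with real OAE data.

```python
import numpy as np

fs = 1000                                  # assumed sampling rate in Hz
t = np.arange(fs) / fs                     # one second of samples
# Simple periodic test signal: a strong 50 Hz tone plus a weaker 120 Hz tone
x = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)

spectrum = np.fft.rfft(x)                  # time domain -> frequency domain
x_back = np.fft.irfft(spectrum, n=len(x))  # frequency domain -> time domain

freqs = np.fft.rfftfreq(len(x), d=1 / fs)  # frequency of each spectral bin
peak_hz = freqs[np.argmax(np.abs(spectrum))]
round_trip_error = np.max(np.abs(x - x_back))

print("strongest component (Hz):", peak_hz)
print("largest round-trip error:", round_trip_error)
```

        For this stationary signal the spectrum shows the two tones directly, while the reconstruction error stays at the level of floating-point rounding.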

2. Properties of a simple periodic acoustic signal

        Let us go back to the fundamental features of sound, that is, time, intensity and frequency. I would like to focus only on the practical aspects of these phenomena, from the "engineering" point of view. This approach is particularly difficult when we try to describe time. A lot has been written about the term "time" in its philosophical and physical aspects (e.g. the theory of relativity); however, these descriptions and subtleties have little practical meaning when considering otoacoustic emission signals. It is more practical to follow a mathematical approach and say that time is a primitive term. For us, the most important feature of time is that it "runs" permanently.

        As time "runs" we can observe the changes around us. For otoacoustic emissions, our observations consist of recording the acoustic pressure changes produced by the activity of the outer hair cells in the inner ear. As the movement of the cells has two phases, expansion and contraction, the acoustic pressure changes its sign, being alternately positive and negative. It should be noted that in this example we are referring to the instantaneous values of the otoacoustic emission signals, as they are commonly presented in the "Waveform" panels of the majority of commercial software packages which run the acquisition of OAE responses.

        The instantaneous value reflects both the momentary phase of the pressure oscillation and the intensity of the otoacoustic emission. We are going to separate these two aspects and focus only on the intensity. In evaluating the intensity, the terms "signal envelope" and "signal amplitude" are very helpful. In order to define these terms, we need to start with an easy case from the class of simple periodic signals. For these signals, neither the envelope nor the amplitude changes in time. The envelope consists of two straight parallel lines limiting all possible values of the signal. The upper line links all maximum values (peaks) of the waveform and the lower line links the minima. The reason why both lines are straight for simple signals is that nothing changes in the signal except the phase of vibration. Amplitude is defined as one half of the difference between the upper and lower line levels of the envelope. For a large class of simple periodic signals, the upper line is the mirror reflection of the lower line, as both of them are placed symmetrically around the time axis. The otoacoustic emission signals roughly satisfy this condition. In such cases, we can define the amplitude as the distance of the upper envelope line from the baseline.
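
        As a small numerical illustration of these definitions, the sketch below builds a simple periodic signal and reads the amplitude from its two envelope levels. The tone frequency, amplitude and the small baseline offset are assumed values; the offset is included only to show that the "half of the difference" definition does not require symmetry around the time axis.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 2000, endpoint=False)   # one second, 2000 samples
baseline = 0.2                                    # assumed offset from the time axis
x = baseline + 1.5 * np.sin(2 * np.pi * 5 * t)    # simple periodic signal (5 Hz)

upper_level = x.max()        # level of the upper envelope line (the peaks)
lower_level = x.min()        # level of the lower envelope line (the minima)
amplitude = 0.5 * (upper_level - lower_level)

print("upper:", upper_level, "lower:", lower_level, "amplitude:", amplitude)
```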

3. More complex acoustic signals

        The above definition of the envelope is consistent only for less complicated waveforms. If one tries to apply these definitions to a wider, general class of signals, one will meet a lot of difficulties. One of the problems appears when many oscillations occur inside one period of the signal. Examples of this class of signals include the glottal vibrations or the simpler arterial pressure waveform with its dicrotic notch. These are examples of quasi-periodic signals, where many positive and negative peaks exist in the segment covering one period. Which peak is important for the envelope calculation? The answer is suggested in the statement defining the problem. If we know the period of the oscillations, we can take one maximum within each period for the upper line of the envelope and one minimum for the lower line. The two extreme deviations, positive and negative, are what matter in this concept.

        The above method provides single points from particular periods as data for the signal's envelope. At this point certain questions arise: What about the rest of the continuous time scale? Are the values for the rest of the time scale important, and how can they be evaluated? In order to answer these questions, let us recall that our intention is to get information about the sound intensity. The points which we are estimating, from the sequence of peaks of acoustic pressure, belong to a small subset of intensity samples. The full evaluation of the signal's envelope requires special processing methods.

        Through an evaluation of the sound intensity we aim to reconstruct the energetic characteristics of the vibrational source. For example, in a pendulum, the momentary angle of swing changes. Despite these variations there is a continuous mutual exchange between the forms of energy. In the lowest position, the pendulum ball has the largest speed and maximal kinetic energy. In contrast, when the ball reaches the highest position the kinetic energy is equal to zero, because all the energy is stored as potential energy in the gravitational field. The key property of this phenomenon is that the balance, being the sum of both forms of energy, is preserved, although the ball position changes in a continuous oscillation. The total energy stored in the vibrating object is a good measure of the intensity of the oscillations.
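
        The energy exchange in the pendulum can be followed numerically. The sketch below assumes an ideal, frictionless pendulum in the small-angle approximation; the mass, length and swing amplitude are arbitrary assumed values.

```python
import numpy as np

g, length, mass = 9.81, 1.0, 0.1       # assumed values in SI units
theta0 = 0.05                          # small swing amplitude in radians
omega = np.sqrt(g / length)            # angular frequency of small oscillations

t = np.linspace(0.0, 10.0, 2001)
theta = theta0 * np.cos(omega * t)                 # angle of swing
theta_dot = -theta0 * omega * np.sin(omega * t)    # angular velocity

kinetic = 0.5 * mass * (length * theta_dot) ** 2   # largest at the lowest position
potential = 0.5 * mass * g * length * theta ** 2   # largest at the highest position
total = kinetic + potential                        # stays constant over time

print("spread of total energy:", total.max() - total.min())
```

        The two forms of energy oscillate in opposite phase, while their sum, the measure of intensity used in the text, does not change.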

        The analogy with hair cells is direct, but in this context the oscillations have an active nature. This means that the vibrations are stimulated and energy is continuously provided to the cells. Here, the energy balance calculated over a longer time period shows that the provided energy compensates for any friction losses. We believe that the energy emitted by the hair cell vibrations is approximately proportional to the current energy of the cells. The observed otoacoustic emission differs from the emitted one because of transformations occurring on the way between the source (cochlea) and the external ear canal. Again, we believe that these transformations are passive and that they modify the OAE signal only slightly. If so, by investigating the energy of the "emitted wave" we can get access to information about the current intensity of the vibration of the source. In the ideal pendulum, the energy (intensity) of the oscillations is constant and a single measurement provides all characteristics of the phenomenon. The energy of the vibrations of the OAE source (or sources) depends on many factors and changes in time. These changes are the key object of our experimental interest. The amplitude of vibrations, as we defined it above, directly reflects the instantaneous power of the signal, that is, the intensity of the OAE. The reconstruction of the amplitude variations in time is an important task, leading indirectly to knowledge about the intensities of the source vibrations.

4. Signal processing methods

        Considering the statements above, we can say that the information carried by the sequence of peak values is important but fragmentary. In the paragraphs below, we are going to review methods that fill the gaps between the peaks to create the envelope and the amplitude. The most popular and traditional method is called peak rectification. The method derives from an early radio broadcasting technique, in which the signals were coded using amplitude modulation. The radio transmitter modulated a carrier of constant frequency by changing the intensity of the emitted radio waves. The task of the receiver was to reconstruct the amplitude and in this way to get information about the intensity. The analogy with our task is very close: the hair cells are the transmitters and our probe microphones are the receivers. Because the envelope is symmetrical, it is enough to take only the positive part of the waveform (rectification) and connect the peaks of the resulting signal. To perform such a connection, the device uses memorizing elements (capacitors) to store the value of the last peak.
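
        A software analogue of the peak rectifier might look like the sketch below. It is only an illustration: the function name peak_rectifier, the exponential "capacitor" discharge and all signal parameters (sampling rate, carrier, modulation, decay time) are assumptions, not a description of any particular device.

```python
import numpy as np

def peak_rectifier(x, fs, decay_time):
    """Half-wave rectify x and hold each peak on a slowly discharging 'capacitor'."""
    leak = np.exp(-1.0 / (decay_time * fs))   # discharge factor per sample
    rectified = np.maximum(x, 0.0)            # keep only the positive part
    envelope = np.empty_like(rectified)
    held = 0.0
    for i, value in enumerate(rectified):
        held = max(value, held * leak)        # recharge on a peak, else decay
        envelope[i] = held
    return envelope

fs, carrier_hz = 20000, 1000                  # assumed sampling rate and carrier
t = np.arange(4000) / fs
true_amplitude = 1.0 + 0.5 * np.sin(2 * np.pi * 10 * t)   # slow intensity change
x = true_amplitude * np.sin(2 * np.pi * carrier_hz * t)

envelope = peak_rectifier(x, fs, decay_time=0.005)
ratio = envelope[40:] / true_amplitude[40:]   # skip the start-up samples
print("envelope / true amplitude ranges from", ratio.min(), "to", ratio.max())
```

        The output touches the true amplitude at every carrier peak and sags between peaks, which is the saw-tooth ripple typical of this method.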

        The estimation of the instantaneous power offers a more faithful reflection of the amplitude. The values of the signal are squared (squared rectification), then the components that are a by-product of the fastest changes are removed by low-pass filtering. The square root of the resulting waveform is then taken. The signal obtained this way can be scaled to give a measure of the intensity.
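
        The three steps (squaring, low-pass filtering, square root) can be sketched as follows. The low-pass filter here is a plain moving average spanning one carrier period, which is an assumption made for simplicity; the function name and the test signal are likewise illustrative only. The factor of 2 under the square root is the scaling that converts mean power of a sinusoid back to its amplitude.

```python
import numpy as np

def square_law_envelope(x, smooth_len):
    """Square the signal, low-pass it with a moving average, take the square root."""
    squared = x ** 2                                        # squared rectification
    kernel = np.ones(smooth_len) / smooth_len               # simple low-pass filter
    mean_power = np.convolve(squared, kernel, mode="same")  # smoothed power
    return np.sqrt(2.0 * mean_power)                        # scale back to amplitude

fs, carrier_hz = 20000, 1000                  # assumed sampling rate and carrier
t = np.arange(4000) / fs
true_amplitude = 1.0 + 0.5 * np.sin(2 * np.pi * 10 * t)
x = true_amplitude * np.sin(2 * np.pi * carrier_hz * t)

envelope = square_law_envelope(x, smooth_len=fs // carrier_hz)
interior = slice(20, -20)                     # ignore the edges of the averaging window
max_error = np.max(np.abs(envelope[interior] - true_amplitude[interior]))
print("largest deviation from the true amplitude:", max_error)
```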

        The first method, the peak rectifier, is similar to a manual determination of the amplitude, which one can perform on paper by connecting successive peaks with a straight or slowly decaying line. This method does not give a smooth waveform and its accuracy is not high. The second method, using squared rectification, gives a smoother but not ideal representation. The dilemma is how to select the cutoff frequency of the low-pass filter. If we "cut" too little of the high-frequency components (cutoff too high), we will get false ripples. If we "cut" too much (cutoff too low), we will get inertial effects and the result will lag behind the real changes of intensity.

        Another method of estimating the amplitude is the reconstruction of the analytical signal using Hilbert filtration. This sentence introduces two new terms: Hilbert filtration and analytical signal. In order to explain them we need to go back to the pendulum analogy. Let us recall that in the pendulum there are two forms of energy: the kinetic, related to movement, and the potential, related to position. Sound also has two forms of energy, but with a microphone we record only one of the two associated quantities (particle velocity or pressure). In order to evaluate the energy of the acoustic signal one needs to know both. One solution would be to use a second type of microphone; however, such an approach is not very efficient. Instead of conducting complicated measurements, we can use a mathematical transformation to get the dual form of the signal. The transformation is called Hilbert filtration. The pair of signals consisting of the real measured waveform and the waveform obtained from the filtration can be used to create a complex signal called the analytical form. The amplitude of the analytical signal is calculated as the square root of the sum of the instantaneous powers (squared values) of both components. With this method the estimated amplitude provides correct information about the source of the signal, but only in cases when the signal is simple. Mathematical proofs have verified that the envelope determined by the amplitude of the analytical signal passes through all peaks of the real component. So it satisfies the assumption from which we started: it passes through the amplitude peaks. The advantage of this method is that there is no question of how to cut the high-frequency oscillations, because this method has no inertia.
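
        One common way to compute the analytical signal digitally is through the FFT: the negative-frequency half of the spectrum is set to zero and the positive half is doubled. The sketch below follows that approach; the function name analytic_signal and the amplitude-modulated test tone are assumptions for illustration, and the test signal is deliberately "simple" in the sense used above.

```python
import numpy as np

def analytic_signal(x):
    """Return x + j*H{x}, where H is the Hilbert transform, computed via the FFT."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0                       # keep the DC term unchanged
    if n % 2 == 0:
        weights[n // 2] = 1.0              # keep the Nyquist term unchanged
        weights[1:n // 2] = 2.0            # double the positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)  # negative frequencies are zeroed

fs, carrier_hz = 20000, 1000               # assumed sampling rate and carrier
t = np.arange(4000) / fs
true_amplitude = 1.0 + 0.5 * np.sin(2 * np.pi * 10 * t)
x = true_amplitude * np.sin(2 * np.pi * carrier_hz * t)

z = analytic_signal(x)
envelope = np.abs(z)                       # sqrt(real**2 + imag**2)
max_error = np.max(np.abs(envelope - true_amplitude))
print("largest deviation from the true amplitude:", max_error)
```

        For this test tone the envelope follows the true amplitude with no ripple and no lag, in contrast to the two rectification methods.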

        However, the Hilbert filtration has limitations. If the signal has many peaks within one period, the envelope will automatically pass through all such peaks, and the method will then give a lot of false ripples. The method is therefore limited to simple signals.


5. Frequency

        Let us try to define a term which we can call the "frequency component". But first, a few words about "frequency". The frequency is just the reciprocal of the duration of the signal period, that is, of the fragment of the waveform which regularly repeats in the signal. Not all signals are periodic, which means that we cannot characterize all signals with a frequency. To extend our consideration we will define a class of signals with variable frequency, where the period changes more or less systematically in time. The most popular example of such a signal is what we call a "chirp", shown in Figure 1. Although the amplitude and frequency of the chirp change in time, we can attribute values of amplitude and frequency to this signal at each instant. The simplest case of a frequency component is the sinusoidal tone. However, a waveform of similarly constant frequency and amplitude but non-sinusoidal shape is for us a composite signal with harmonics. In turn, each harmonic, having a frequency that is an integer multiple of the fundamental frequency, belongs to the class of frequency components.


Figure 1: TF representation of a Chirp signal
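
        The idea of frequency as the reciprocal of a slowly changing period can be tried out on a synthetic chirp. In the sketch below the period is measured between successive upward zero crossings; the start frequency and sweep rate are assumed values and the signal is not the one shown in Figure 1.

```python
import numpy as np

fs = 20000                                 # assumed sampling rate in Hz
f_start, sweep_rate = 100.0, 300.0         # assumed: 100 Hz rising by 300 Hz per second
t = np.arange(fs) / fs
chirp = np.sin(2 * np.pi * (f_start * t + 0.5 * sweep_rate * t ** 2))

# Locate upward zero crossings and refine each one by linear interpolation
idx = np.where((chirp[:-1] < 0) & (chirp[1:] >= 0))[0]
fraction = -chirp[idx] / (chirp[idx + 1] - chirp[idx])
crossing_times = (idx + fraction) / fs

periods = np.diff(crossing_times)          # duration of each successive period
frequency = 1.0 / periods                  # frequency = reciprocal of the period
mid_times = 0.5 * (crossing_times[:-1] + crossing_times[1:])
expected = f_start + sweep_rate * mid_times

print("first and last estimated frequency (Hz):", frequency[0], frequency[-1])
```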

        Now we have a couple of terms which will allow us to state the objective of time-frequency (TF) analysis. The goal of TF methods is to provide a picture of the signal as a set of frequency components. For each frequency component, we are interested in evaluating the variations of its amplitude and frequency. We assume that a particular frequency component may derive from a particular source or may be a harmonic. So, we believe that by performing a TF analysis of a signal like a TEOAE we obtain information about the sources, or emission generators, of the signal.

        The result of the TF analysis has a form in which signal intensities are mapped on a plane with time and frequency coordinates. In theory there are many variants of methods dedicated to TF analysis. However, the spectrogram, based on the Short-Time Fourier Transform (STFT) introduced by Gabor, is the most popular and natural method. The distribution is created by applying traditional spectral analysis to short, sliding segments of the signal. For successive time instants, a segment of the signal is gated around a given point on the time scale. The spectrum calculated for this time instant shows how power is distributed over frequencies. The time scale is scanned by sliding the gating window.
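
        The sliding-window procedure can be written in a few lines. The sketch below is a bare-bones spectrogram using a Hann gating window; the function name, window length, hop size and the chirp test signal are all assumptions chosen for illustration.

```python
import numpy as np

def spectrogram(x, fs, win_len, hop):
    """Power spectrum of successive gated segments (rows: time, columns: frequency)."""
    window = np.hanning(win_len)                         # gating window
    starts = np.arange(0, len(x) - win_len + 1, hop)     # sliding positions
    frames = np.array([x[s:s + win_len] * window for s in starts])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2     # spectrum of each segment
    times = (starts + win_len / 2) / fs                  # centre of each segment
    freqs = np.fft.rfftfreq(win_len, d=1 / fs)
    return times, freqs, power

fs = 8000                                                # assumed sampling rate
t = np.arange(fs) / fs
chirp = np.sin(2 * np.pi * (500.0 * t + 0.5 * 1500.0 * t ** 2))  # 500 Hz -> 2000 Hz

times, freqs, power = spectrogram(chirp, fs, win_len=256, hop=128)
ridge = freqs[np.argmax(power, axis=1)]                  # strongest frequency per segment
print("ridge starts near", ridge[0], "Hz and ends near", ridge[-1], "Hz")
```

        Following the strongest frequency in each segment recovers the rising track of the chirp, quantized to the frequency spacing allowed by the chosen window length.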

        The STFT approach is very natural and has a good physical interpretation. Unfortunately, the STFT gives results with low resolution. The time resolution is determined by the width of the sliding window, and the obtained spectrum is averaged over the time period of the "gated" segment. On the other hand, if one shortens the segment in order to increase the time resolution, one loses frequency resolution. The uncertainty principle says that the achievable frequency resolution is inversely proportional to the time resolution, and vice versa. For a signal as short as a TEOAE, dividing it into shorter segments results in a very smeared image of the energy distribution. A distribution using the wavelet transform offers the possibility of obtaining an image in which the time resolution is higher for high-frequency components and lower for low-frequency components. In general, however, a similar trade-off between the two domains exists here, as in spectrograms.
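
        The trade-off can be made concrete with a back-of-the-envelope calculation. The sketch below uses the common rule of thumb that the frequency resolution is roughly the reciprocal of the window duration (ignoring constants that depend on the window shape); the 20 ms recording length and the candidate window lengths are assumptions for illustration.

```python
def frequency_resolution_hz(window_ms):
    """Rule-of-thumb frequency resolution for a gating window of the given duration."""
    return 1000.0 / window_ms

record_ms = 20.0                                  # assumed length of a TEOAE recording
resolution = {}
for window_ms in (2.0, 5.0, 10.0):
    resolution[window_ms] = frequency_resolution_hz(window_ms)
    segments = record_ms / window_ms              # non-overlapping segments that fit
    print(f"window {window_ms:4.1f} ms -> about {resolution[window_ms]:5.0f} Hz "
          f"resolution, {segments:.0f} segments")
```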

        We can increase the resolution by a factor of two by applying the so-called Wigner-Ville distribution (WVD). The "cost" of this seemingly supernatural effect is that false components, called cross-terms, appear in the image. Additionally, the values of the energy distribution are not positive everywhere, which introduces difficulties in the interpretation of the results. A typical TF representation from a pre-term neonate is shown in Figure 2, generated with a custom-made software package, which will soon be available on the OAE Portal site.


Figure 2: TF representation of a TEOAE response, showing a large number of cross-terms
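
        Both side effects, cross-terms and negative values, already appear for a signal made of just two steady tones. The sketch below is a plain, unsmoothed discrete WVD written for illustration; it is not the software used for Figure 2, and the function name, sampling rate and tone frequencies are assumptions. The input is taken directly as a complex (analytical) signal.

```python
import numpy as np

def wigner_ville(z):
    """Discrete Wigner-Ville distribution of a complex (analytical) signal z.

    Returns w[time, k]; frequency bin k corresponds to k * fs / (2 * len(z)).
    """
    n = len(z)
    w = np.zeros((n, n))
    for t in range(n):
        max_lag = min(t, n - 1 - t)                    # lags that stay inside the signal
        lags = np.arange(-max_lag, max_lag + 1)
        kernel = np.zeros(n, dtype=complex)
        kernel[lags % n] = z[t + lags] * np.conj(z[t - lags])
        w[t] = np.fft.fft(kernel).real                 # kernel is Hermitian, so FFT is real
    return w

fs, n = 256, 256                                       # assumed sampling rate and length
t = np.arange(n) / fs
z = np.exp(2j * np.pi * 32 * t) + np.exp(2j * np.pi * 64 * t)   # tones at 32 and 64 Hz

w = wigner_ville(z)
freqs = np.arange(n) * fs / (2 * n)                    # frequency axis in Hz
t_min, k_min = np.unravel_index(np.argmin(w), w.shape)
time_average = w.mean(axis=0)

print("most negative value:", w.min(), "at", freqs[k_min], "Hz")
print("time-averaged energy at 32, 48, 64 Hz:",
      time_average[64], time_average[96], time_average[128])
```

        The most negative value sits at 48 Hz, halfway between the two real tones, where no signal component exists. Averaged over time this oscillating cross-term cancels, while the true components at 32 Hz and 64 Hz remain.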

        Various methods of smoothing have been proposed in order to lessen the effect of cross-terms. This concept has led to the Cohen class of smoothed distributions. In this family, the most prominent position is taken by the exponential distribution known as Choi-Williams. The majority of studies which have used time-frequency analyses have pointed out that each class of signals requires the selection of a specific distribution which yields optimal results for that class. One reason is the necessity to select the resolution to fit the particular characteristics of the studied signal. A more fundamental reason is that a given "waveform" may have more than one time-frequency representation. For example, a signal with variable amplitude and constant frequency may be interpreted (and generated) as a signal with amplitude modulation, or as the superposition of two tones with close frequencies and constant amplitudes. The same waveform may be generated one way or the other, and one distribution may give an image consistent with the first generator while another distribution may be consistent with the second. The usefulness of a particular method of analysis depends on how well it matches the original signal.
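
        The ambiguity between the two interpretations can be verified numerically: the sum of two close tones of constant amplitude is, sample for sample, a single tone at the mean frequency whose amplitude varies slowly. The sketch below checks this standard trigonometric identity; the two tone frequencies are assumed values.

```python
import numpy as np

fs = 8000                                   # assumed sampling rate in Hz
t = np.arange(fs) / fs
f1, f2 = 1000.0, 1010.0                     # two close tones (assumed values)

# Generator 1: superposition of two tones with constant amplitudes
two_tones = np.cos(2 * np.pi * f1 * t) + np.cos(2 * np.pi * f2 * t)

# Generator 2: one tone at the mean frequency with a slowly varying amplitude
carrier = np.cos(2 * np.pi * 0.5 * (f1 + f2) * t)
slow_amplitude = 2.0 * np.cos(2 * np.pi * 0.5 * (f2 - f1) * t)
modulated = slow_amplitude * carrier

max_difference = np.max(np.abs(two_tones - modulated))
print("largest difference between the two waveforms:", max_difference)
```

        Since both generators give the same samples, no analysis of the waveform alone can decide between them; a distribution can only favour one picture or the other.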

6. TEOAEs and TF analyses

       For TEOAE TF analysis I personally prefer to use the Wigner-Ville method modified with a specially fitted technique of regional smoothing. The method does not belong to the Cohen class of distributions, because the smoothing is applied selectively (a) to regions where the energy is negative and (b) to the borders of such negative-energy contours. With this method we can obtain significant improvements in the quality of the results without losing much resolution. We also know the disadvantages of such an approach. When the cross-terms are located in the same places as the real components, the smoothing is not efficient enough to bring out such information. Generally, though, the smoothed WVD method gives good results for TEOAE signals.

        An example of the Wigner-Ville distribution with regional smoothing is shown in Figure 3.


Figure 3: The TF representation of Figure 2, smoothed 500 times

        Data from a recent study we have conducted on neonatal subjects have indicated that on the time-frequency plane one can distinguish several categories of components, such as:


  • Horizontal lines with almost constant frequencies

  • Almost vertical lines covering a broad band of frequencies and a short time period

  • Lines with decreasing (falling) frequencies.


Figure 4: Various types of TEOAE components in a TF representation

        Our interpretation of these results leads to the conclusion that every component category might be generated by a different cochlear mechanism. The horizontal lines are related to spontaneous emission. We have observed that the dominant frequencies of these lines overlap with the peak frequencies of the spectrum of spontaneous emissions. We suspect that the vertical lines are traces of the acoustic artifact of the click, produced by stimulus reflections. We do not yet have a clear explanation for the decreasing-frequency components. We hypothesize that these components could reflect distortion product otoacoustic emissions, a hypothesis which needs additional investigation.