Contributor –  Shiksha Pandita


Speech enhancement is the process of applying audio signal processing algorithms to speech that has been degraded by noise (for example exhibition noise, white noise, or musical noise) in order to improve its perceptual quality and intelligibility.

There are two aspects to speech quality: the perceived overall quality and the speech intelligibility. Perceived overall quality is the listener's overall impression of how "good" the speech sounds. Speech intelligibility, on the other hand, is the accuracy with which the listener can understand what is being said.

In the usual signal model, y(n) is the noise-corrupted input signal, which consists of the clean speech signal x(n) and the additive noise signal d(n), that is, y(n) = x(n) + d(n). The speech enhancement algorithms studied here are: (i) spectral subtraction, (ii) Wiener filter, (iii) MMSE, (iv) log MMSE, and (v) the decision directed approach.
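To make the signal model concrete, the following is a minimal sketch of how a noisy test signal might be built from a clean signal and a noise signal at a chosen signal-to-noise ratio (SNR). The function names mix_at_snr and measure_snr_db are hypothetical helpers introduced only for illustration, and SNR is used here simply as a convenient measure of degradation; it is not a measure of perceived quality or intelligibility.

```python
import numpy as np

def mix_at_snr(clean, noise, target_snr_db):
    """Return y(n) = x(n) + g * d(n), with g chosen so the SNR equals target_snr_db."""
    clean = np.asarray(clean, dtype=float)
    noise = np.asarray(noise, dtype=float)
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    # Scale the noise so that clean_power / scaled_noise_power = 10^(SNR/10).
    scale = np.sqrt(clean_power / (noise_power * 10.0 ** (target_snr_db / 10.0)))
    return clean + scale * noise

def measure_snr_db(clean, noisy):
    """SNR in dB, treating (noisy - clean) as the noise component."""
    clean = np.asarray(clean, dtype=float)
    residual = np.asarray(noisy, dtype=float) - clean
    return 10.0 * np.log10(np.mean(clean ** 2) / np.mean(residual ** 2))
```

For example, mixing a sine wave (a stand-in for speech) with white Gaussian noise at 5 dB and then calling measure_snr_db on the result returns 5 dB up to rounding error.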

Spectral subtraction

Spectral subtraction is the simplest of these algorithms, is easy to compute, and was one of the earliest algorithms used for speech enhancement.

Basic principle

The spectral subtraction method uses a very simple principle to remove the unwanted signal from the desired signal. Assuming the noise is additive, an estimate of the clean signal spectrum is obtained by subtracting an estimate of the noise spectrum from the noisy signal spectrum. The noise spectrum can be estimated and updated during periods when speech is absent. This assumes that the noise is a stationary or slowly varying process, so that the noise spectrum does not change significantly between update periods. The inverse discrete Fourier transform of the estimated spectrum is then taken to obtain the enhanced signal.
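In symbols, one common magnitude-domain form of this principle can be written as follows. The notation is an assumption made for illustration: Y(ω) is the DFT of a noisy frame, |D̂(ω)| is the noise magnitude estimate averaged over speech-absent frames, and φ_y(ω) is the phase of the noisy frame, which is commonly reused unchanged because only the magnitude is estimated.

```latex
\[
|\hat{X}(\omega)| = |Y(\omega)| - |\hat{D}(\omega)|,
\qquad
\hat{x}(n) = \mathrm{IDFT}\left\{ |\hat{X}(\omega)| \, e^{j\phi_y(\omega)} \right\}
\]
```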


The main advantage of the algorithm is that it is computationally simple, as it only involves a forward and an inverse Fourier transform.


The method also has limitations:

1) The subtraction must be done carefully to avoid speech distortion.

2) If too much is subtracted, some speech information may be removed.

3) If too little is subtracted, much of the interfering noise remains in the signal.
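The principle and the trade-off in points 2) and 3) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: it uses non-overlapping rectangular frames for brevity (practical systems usually use overlapping windowed frames), it reuses the noisy phase, and the parameters alpha (subtraction factor) and floor (spectral floor that prevents negative magnitudes) are hypothetical names introduced here.

```python
import numpy as np

def estimate_noise_magnitude(noise_only, frame_len=256):
    """Average magnitude spectrum over frames assumed to contain no speech."""
    n_frames = len(noise_only) // frame_len
    frames = np.reshape(noise_only[:n_frames * frame_len], (n_frames, frame_len))
    return np.mean(np.abs(np.fft.rfft(frames, axis=1)), axis=0)

def spectral_subtraction(noisy, noise_mag, frame_len=256, alpha=1.0, floor=0.01):
    """Subtract a noise magnitude estimate from each frame of the noisy signal."""
    n_frames = len(noisy) // frame_len
    enhanced = np.zeros(n_frames * frame_len)
    for i in range(n_frames):
        start = i * frame_len
        spectrum = np.fft.rfft(noisy[start:start + frame_len])  # forward transform
        magnitude = np.abs(spectrum)
        phase = np.angle(spectrum)
        # Subtract the scaled noise estimate; keep a small fraction of the
        # noisy magnitude as a floor so the result never goes negative.
        cleaned = np.maximum(magnitude - alpha * noise_mag, floor * magnitude)
        # Recombine with the noisy phase and apply the inverse transform.
        enhanced[start:start + frame_len] = np.fft.irfft(
            cleaned * np.exp(1j * phase), n=frame_len
        )
    return enhanced
```

Raising alpha above 1 subtracts more and risks removing speech energy, while lowering it leaves more residual noise; with alpha set to 0 the function returns the noisy signal unchanged.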

Wiener filter

The Wiener filter removes the unwanted noise that has degraded the speech quality and made the signal difficult to use at the destination end. It derives the enhanced signal by optimizing a mathematically tractable error criterion.

Basic principle

The basic principle of the Wiener filter is to obtain an estimate of the clean signal from a signal that has been degraded by noise. The estimate is obtained by minimizing the mean square error (MSE) between the desired signal s(n) and the estimated signal ŝ(n). The Wiener filter is named after the mathematician Norbert Wiener, who first formulated and solved the filtering problem in the continuous domain. It is the optimal linear filter in the sense that it minimizes the mean square estimation error. The output signal-to-noise ratio (SNR) after noise reduction with the single-channel Wiener filter is always larger than or equal to the input SNR, for any filter length and for all possible speech and noise correlation matrices. The Wiener filter is a popular technique that has been used in many signal enhancement applications.
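In the frequency domain, and under the additional assumption that speech and noise are uncorrelated, the gain that minimizes the MSE at each frequency is the ratio of the speech power spectral density to the total (speech plus noise) power spectral density. The sketch below illustrates only that gain rule. The function names are hypothetical, and both power spectral densities are assumed to be already known or estimated, which is the hard part in practice.

```python
import numpy as np

def wiener_gain(speech_psd, noise_psd, eps=1e-12):
    """Per-frequency Wiener gain H = P_s / (P_s + P_d), always between 0 and 1."""
    speech_psd = np.asarray(speech_psd, dtype=float)
    noise_psd = np.asarray(noise_psd, dtype=float)
    # eps guards against division by zero when both PSDs are zero in a bin.
    return speech_psd / (speech_psd + noise_psd + eps)

def apply_wiener(noisy_spectrum, speech_psd, noise_psd):
    """Scale each bin of the noisy spectrum by the Wiener gain."""
    return wiener_gain(speech_psd, noise_psd) * np.asarray(noisy_spectrum)
```

Bins where speech dominates are passed almost unchanged (gain near 1), bins where noise dominates are strongly attenuated (gain near 0), and a bin with equal speech and noise power is halved.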


The Wiener filter has the following advantages:

(1) The filter is linear, which makes the analysis easy to handle.

(2) The filter can be finite impulse response (FIR) or infinite impulse response (IIR), but FIR filters are often used because they are inherently stable and the resulting solution is linear and computationally easy to evaluate.

(3) It controls the output error.

(4) It is straightforward to design.


The disadvantages of the Wiener filter are:

(1) Results are often too blurred.

(2) It is spatially invariant.

(3) It has a fixed frequency response at all frequencies, and the power spectral densities of the clean signal and the noise must be estimated prior to filtering.

MMSE (minimum mean square error)

Basic principle

The MMSE estimator does not assume a linear relationship between the observed data and the estimator, but it does need information about the probability distributions of the speech and noise DFT coefficients. It refers to estimation in a Bayesian setting with a quadratic cost function. The basic idea behind Bayesian estimation stems from practical situations where some prior information about the parameter to be estimated is available. For example, we may know the range of values that the parameter can assume, or we may have an old estimate of the parameter that we want to update when a new observation becomes available.
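Written out, the quadratic (Bayesian) cost leads to the conditional mean as the optimal estimate. The notation below is an assumption made for illustration: X_k is the clean speech spectral magnitude in frequency bin k and Y(ω_k) is the noisy DFT coefficient in the same bin.

```latex
\[
\hat{X}_k
= \arg\min_{\hat{X}} \; E\left\{ \left( X_k - \hat{X} \right)^2 \,\middle|\, Y(\omega_k) \right\}
= E\left\{ X_k \,\middle|\, Y(\omega_k) \right\}
\]
```

Evaluating this expectation is what requires the probability distributions of the speech and noise DFT coefficients.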


An advantage of the MMSE estimator over the Wiener estimator is that it does not assume a linear relationship between the observed data and the estimator. The price is that it requires knowledge of the probability distributions of the speech and noise DFT coefficients.


The MMSE-based method does not introduce musical noise, which makes it a good choice where musical noise is a problem, and it can therefore be used with shorter frame durations in the modulation domain.


Log MMSE

Basic principle

The basic principle of log MMSE is that it minimizes the mean square error of the log-magnitude spectra rather than of the magnitude spectra themselves. Although the squared error of the magnitude spectra is mathematically tractable, it may not be subjectively meaningful. A metric based on the squared error of the log-magnitude spectra may therefore be more useful for speech enhancement.
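As a formula, the log-spectral cost and the estimator that minimizes it can be written as follows. The notation is an assumption made for illustration: X_k is the clean speech spectral magnitude in frequency bin k, and Y(ω_k) is the noisy DFT coefficient in that bin.

```latex
\[
\text{minimize } E\left\{ \left( \log X_k - \log \hat{X}_k \right)^2 \right\}
\quad\Longrightarrow\quad
\hat{X}_k = \exp\left( E\left\{ \log X_k \,\middle|\, Y(\omega_k) \right\} \right)
\]
```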


It minimizes the squared error of the log-magnitude spectra, a metric that has been suggested to be more suitable for speech processing than the squared error of the magnitude spectra.


The method requires more complex mathematical calculations than the other methods.

Decision directed approach

Basic principle

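This method is based on the definition of the a priori SNR and its relationship with the a posteriori SNR. It estimates the a priori SNR from the noisy speech signal. The a priori SNR is the ratio of the clean speech spectral variance to the noise spectral variance, and the a posteriori SNR is the ratio of the noisy speech power to the noise spectral variance, where λ denotes a spectral variance.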


In the backward decision directed (BDD) approach, the conventional time order is reversed: the estimate of the a priori SNR for the current frame is made dependent on clean speech estimates from future frames. In contrast to the forward decision directed (FDD) approach, this requires a delay of several frames. In return, BDD gives less biased estimates of the a priori SNR at the beginning of speech sounds and overcomes echo artifacts at the offsets of speech sounds.


In the forward decision directed (FDD) approach, the estimate of the a priori SNR for the current frame depends on clean speech estimates from past frames. The estimate (denoted with a hat, ^) may therefore depend on clean speech estimates from a different speech sound. This leads to biased estimates of the a priori SNR and consequently to incorrect noise suppression, especially at the beginning of speech sounds, and moreover to echo artifacts at the offsets of speech sounds.
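The dependence on past clean speech estimates can be seen in a short sketch of the commonly used forward recursion, in which the a priori SNR of the current frame is a weighted sum of the previous frame's clean speech power estimate and the current a posteriori SNR minus one. Several choices below are assumptions made for illustration rather than part of the description above: the smoothing weight alpha, the lower bound xi_min, the use of a Wiener gain to form the clean speech estimate, the initialization of the first frame, and the function name itself.

```python
import numpy as np

def decision_directed_snr(noisy_power, noise_var, alpha=0.98, xi_min=1e-3):
    """Forward decision-directed estimate of the a priori SNR.

    noisy_power: array of shape (n_frames, n_bins) holding |Y_k(m)|^2.
    noise_var:   array of shape (n_bins,) holding the noise variance per bin.
    """
    noisy_power = np.asarray(noisy_power, dtype=float)
    noise_var = np.asarray(noise_var, dtype=float)
    n_frames, n_bins = noisy_power.shape
    xi = np.zeros((n_frames, n_bins))
    prev_clean_power = np.zeros(n_bins)
    for m in range(n_frames):
        gamma = noisy_power[m] / noise_var            # a posteriori SNR
        instantaneous = np.maximum(gamma - 1.0, 0.0)  # estimate from current frame only
        if m == 0:
            xi[m] = instantaneous                     # no past frame yet (assumption)
        else:
            xi[m] = alpha * prev_clean_power / noise_var + (1.0 - alpha) * instantaneous
        xi[m] = np.maximum(xi[m], xi_min)
        gain = xi[m] / (1.0 + xi[m])                  # Wiener gain (assumption)
        prev_clean_power = gain ** 2 * noisy_power[m]  # clean speech power estimate
    return xi
```

Because alpha is close to 1, the estimate follows the previous frame closely. This smoothing is the source of the bias at speech onsets and of the echo artifacts at speech offsets described above.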


The different filters each have their own advantages and disadvantages. In terms of simplicity, spectral subtraction is the simplest and lowest-cost method, but the quality of its output signal is not high. Other methods such as MMSE and log MMSE involve more complex mathematical calculations than spectral subtraction and the Wiener filter. The relative merits of all these methods also depend on the type of noise. There are two types of decision directed approach: forward decision directed (FDD) and backward decision directed (BDD), where BDD is used to overcome the drawbacks of FDD.