1. K. K. Paliwal's white noise Kalman filter
K. K. Paliwal's "A SPEECH ENHANCEMENT METHOD BASED ON KALMAN FILTERING" might be the first application of the Kalman filter to speech enhancement. He assumes that the speech can be represented by an autoregressive (AR) process and that the noise is white Gaussian noise. Based on that, he derives the algorithm given by equations 15 to 19 in his article.
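To make the model concrete, here is a minimal Python/NumPy sketch of a Kalman filter for an AR(p) signal observed in white noise. It is a textbook state-space recursion written under the two assumptions above, not a transcription of equations 15 to 19 from the article and not the Matlab code mentioned below. The function name `kalman_ar_filter`, its parameters, and the initial covariance are my own choices for illustration.

```python
import numpy as np

def kalman_ar_filter(y, a, var_w, var_v):
    """Estimate s from y, assuming
         s[k] = a[0]*s[k-1] + ... + a[p-1]*s[k-p] + w[k]   (AR model, var(w) = var_w)
         y[k] = s[k] + v[k]                                (white noise, var(v) = var_v)
    """
    a = np.asarray(a, dtype=float)
    p = len(a)
    # State x[k] = [s[k-p+1], ..., s[k]]; F is the companion matrix of the AR model.
    F = np.zeros((p, p))
    F[:-1, 1:] = np.eye(p - 1)   # shift the older samples up by one position
    F[-1, :] = a[::-1]           # last row predicts the newest sample
    x = np.zeros(p)
    P = np.eye(p) * (var_w + var_v)  # rough initial uncertainty (an assumption)
    out = np.empty(len(y))
    for k, yk in enumerate(y):
        # Prediction step: propagate the state and its covariance.
        x = F @ x
        P = F @ P @ F.T
        P[-1, -1] += var_w       # excitation noise only drives the newest sample
        # Update step: the observation matrix is H = [0, ..., 0, 1].
        gain = P[:, -1] / (P[-1, -1] + var_v)
        x = x + gain * (yk - x[-1])
        P = P - np.outer(gain, P[-1, :])
        out[k] = x[-1]           # filtered estimate of s[k]
    return out
```

The filter needs the AR coefficients `a` and the two variances as inputs, so its quality depends on how well those can be estimated.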
Esfandiar Zavarehei put the Matlab code of this algorithm on his website. (yay again!) However, his code assumes that the clean speech is available: it estimates the AR coefficients from the clean speech instead of from the noisy one. This raises another problem: how do we estimate the AR coefficients from noisy speech?
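For reference, estimating AR coefficients from a single frame can be sketched with the autocorrelation (Yule-Walker) method. This is a generic Python/NumPy illustration, not the code from the website mentioned above. The name `lpc_autocorr` and the sign convention (s[k] = a[0]*s[k-1] + ... + e[k]) are assumptions of this sketch.

```python
import numpy as np

def lpc_autocorr(frame, order):
    """Autocorrelation-method LPC.

    Returns (a, var_e) for the model s[k] = sum_i a[i-1] * s[k-i] + e[k],
    where var_e is the variance of the prediction residual e.
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Biased autocorrelation estimates r[0], ..., r[order].
    r = np.array([frame[:n - lag] @ frame[lag:] for lag in range(order + 1)]) / n
    # Yule-Walker equations: R a = r[1:], with R the Toeplitz matrix of r[0..order-1].
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:])
    var_e = r[0] - a @ r[1:]
    return a, var_e
```

Applied to clean speech this gives the coefficients the filter needs. Applied directly to noisy speech, additive white noise inflates the zero-lag autocorrelation `r[0]` by the noise variance, so the estimated coefficients are biased.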
2. Iterative AR coefficient estimation
A simple way to estimate the AR coefficients from noisy speech is to do it iteratively. This method is introduced in the article "FILTERING OF COLORED NOISE FOR SPEECH ENHANCEMENT AND CODING":
1. chop the noisy signal into frames,
2. estimate the AR coefficients and the variances by linear predictive coding (LPC) on these noisy frames,
3. Kalman filter the signal with these AR coefficients and variances to obtain the filtered signal frames,
4. run the LPC estimation again on the filtered frames to get new AR coefficients and variances,
5. repeat steps 3 and 4 several times to obtain an estimate of the clean speech.
Figure: Iterative AR coefficient estimation
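The five steps above can be sketched in Python/NumPy as follows. This is only an illustration under several assumptions that the steps do not spell out: frames are non-overlapping, the filter state is reset for every frame, the noise variance `var_v` is known in advance, the LPC residual variance is used directly as the excitation variance, and each pass re-filters the original noisy frame. All function and parameter names are hypothetical; the repository linked at the end of this post is the actual implementation.

```python
import numpy as np

def lpc_autocorr(frame, order):
    """Autocorrelation-method LPC: returns (a, residual variance)."""
    n = len(frame)
    r = np.array([frame[:n - lag] @ frame[lag:] for lag in range(order + 1)]) / n
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R += 1e-9 * (r[0] + 1.0) * np.eye(order)  # tiny diagonal loading for silent frames
    a = np.linalg.solve(R, r[1:])
    return a, max(r[0] - a @ r[1:], 1e-12)

def kalman_ar_filter(y, a, var_w, var_v):
    """Kalman filter for s[k] = sum_i a[i-1]*s[k-i] + w[k], y[k] = s[k] + v[k]."""
    p = len(a)
    F = np.zeros((p, p))
    F[:-1, 1:] = np.eye(p - 1)
    F[-1, :] = a[::-1]
    x = np.zeros(p)
    P = np.eye(p) * (var_w + var_v)  # rough initial uncertainty (an assumption)
    out = np.empty(len(y))
    for k, yk in enumerate(y):
        x = F @ x                    # predict
        P = F @ P @ F.T
        P[-1, -1] += var_w
        gain = P[:, -1] / (P[-1, -1] + var_v)
        x = x + gain * (yk - x[-1])  # update with the noisy sample
        P = P - np.outer(gain, P[-1, :])
        out[k] = x[-1]
    return out

def iterative_kalman_enhance(noisy, var_v, order=10, frame_len=256, n_iter=3):
    """Frame-wise iterative LPC + Kalman filtering (steps 1 to 5)."""
    noisy = np.asarray(noisy, dtype=float)
    enhanced = noisy.copy()
    for start in range(0, len(noisy), frame_len):      # step 1: framing
        y = noisy[start:start + frame_len]
        if len(y) <= order:
            continue                                   # too short to fit a model
        est = y                                        # first LPC uses the noisy frame
        for _ in range(n_iter):                        # step 5: repeat
            a, var_w = lpc_autocorr(est, order)        # steps 2 and 4: LPC
            est = kalman_ar_filter(y, a, var_w, var_v) # step 3: Kalman filter
        enhanced[start:start + frame_len] = est
    return enhanced
```

A call such as `iterative_kalman_enhance(noisy, var_v, n_iter=3)` returns a signal of the same length as the input. The `n_iter` parameter is the number of iterations discussed below.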
I tested this algorithm with the same noisy speech sample used in the previous article. The SNR of this sample is quite low, so it is a demanding test of the algorithm's quality. However, this algorithm can't remove the musical-tone artifacts, and when I increase the number of iterations to remove more noise, the intelligibility degrades rapidly.
There are some variations and extensions of this simple Kalman filter on the internet, but they all seem to need complex mathematical derivations and a lot of computation time, so I quickly lost interest in this method.
4. Matlab Code
I put the white-noise and colored-noise versions of the algorithm at https://github.com/ronggong/voiceenhance. Note that this implementation assumes the noise in the input signal is stationary.