WSEAS Transactions on Acoustics and Music


Print ISSN: 1109-9577
E-ISSN: 1109-9577

Volume 5, 2018




Weighted Multi-band Summary Correlogram (MBSC)-based Pitch Estimation and Voice Activity Detection for Noisy Speech

AUTHORS: Rashida Akhtar Rakhi, Humayan Kabir Rana, Md. Kislu Noman


ABSTRACT: Pitch estimation and voice activity detection (VAD), the task of classifying an acoustic signal stream into voiced and unvoiced segments, are crucial preprocessing tools for a wide range of speech applications. This paper proposes a weighted multi-band summary correlogram (MBSC)-based pitch estimation algorithm (PEA) together with VAD. The PEA performs pitch estimation and voiced/unvoiced (V/UV) detection via novel signal processing schemes designed to enhance the MBSC’s peaks at the most likely pitch period. The technique first computes an independent normalized autocorrelation function (NACF) for each channel or frame, which makes it relatively insensitive to phase changes across channels. It then filters these NACFs to remove a significant portion of the content outside the pitch range of 50-500 Hz, and finally derives an adaptive threshold from the filtered NACFs. This threshold acts as both a pitch position indicator and a voiced/unvoiced region detector. The accurate pitch period is obtained from the weighted MBSC. Among the algorithms evaluated, the proposed algorithm has the lowest gross pitch error (%GPE) for noisy speech in the evaluation set. The proposed PEA also achieves the lowest average voicing detection errors.

KEYWORDS: multi-band summary correlogram, empirical mode decomposition, normalized autocorrelation, voiced/unvoiced speech.
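
The NACF-and-threshold idea described in the abstract can be illustrated with a much simpler, single-band sketch. The code below is not the authors' weighted MBSC algorithm: it computes a normalized autocorrelation for one frame, searches only the lags that correspond to the 50-500 Hz pitch range, and uses a fixed threshold in place of the adaptive one. The function names `nacf` and `estimate_pitch`, the `threshold` value of 0.5, and the tie-breaking tolerance are all assumptions made for illustration.

```python
import math


def nacf(frame, lag):
    """Normalized autocorrelation of `frame` at one positive lag (range -1..1)."""
    n = len(frame) - lag
    a = frame[:n]
    b = frame[lag:lag + n]
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0.0 else 0.0


def estimate_pitch(frame, fs, fmin=50.0, fmax=500.0, threshold=0.5):
    """Return (f0_hz, voiced) for one frame sampled at `fs` Hz.

    Only lags inside the fmin..fmax pitch range are searched. The fixed
    `threshold` is an illustrative stand-in for an adaptive threshold.
    """
    lag_min = max(1, int(fs / fmax))
    lag_max = min(int(fs / fmin), len(frame) - 1)
    best_lag, best_val = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        val = nacf(frame, lag)
        # Small tolerance: among near-equal peaks keep the shortest lag,
        # which avoids picking a multiple of the true period.
        if val > best_val + 1e-6:
            best_lag, best_val = lag, val
    voiced = best_lag > 0 and best_val >= threshold
    return (fs / best_lag if voiced else 0.0), voiced


if __name__ == "__main__":
    fs = 8000
    # 50 ms synthetic frame containing a 200 Hz tone.
    tone = [math.sin(2 * math.pi * 200 * k / fs) for k in range(400)]
    print(estimate_pitch(tone, fs))
```

A frame whose strongest NACF peak in the search range falls below the threshold is reported as unvoiced with a pitch of 0.0. A multi-band method would compute such correlation functions per channel and combine them, which this sketch does not attempt.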

WSEAS Transactions on Acoustics and Music, ISSN: 1109-9577, Volume 5, 2018, Art. #3, pp. 20-27


Copyright © 2018. The author(s) retain the copyright of this article. This article is published under the terms of the Creative Commons Attribution License 4.0.
