By John Kane, Christer Gobl (auth.), Thomas Drugman, Thierry Dutoit (eds.)

This book constitutes the proceedings of the 6th International Conference on Nonlinear Speech Processing, NOLISP 2013, held in Mons, Belgium, in June 2013. The 27 refereed papers included in this volume were carefully reviewed and selected from 34 submissions. The papers are organized in topical sections on speech and audio analysis; speech synthesis; speech-based biomedical applications; automatic speech recognition; and speech enhancement.

Detection was speaker-based.

4 Results and Discussion

Blind clustering by MANOVA analysis was carried out to determine the relevance of each feature set. The results are given in Fig. 4 in terms of the first two canonical components from MANOVA (c2 vs. c1): (a) with speech MFCCs, two main clusters are observed that are clearly separated, apart from an overlapping region marked by dotted lines (6 errors); (b) vocal tract MFCCs reduce the number of errors (4), but the clusters are less separated; (c) glottal source MFCCs separate the clusters better, but the number of errors is larger (8); (d) when glottal pulse and vocal tract MFCCs are combined, the separation between clusters improves and the number of errors is lowest (3).
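The excerpt does not say how the canonical components or the error counts were obtained. As a rough illustration only, the sketch below assumes the usual MANOVA canonical analysis, in which the canonical directions v solve B v = λ W v, with B and W the between-group and within-group scatter matrices. It then counts an "error" whenever a sample lies closer to another group's centroid in the space of the first two components. The names `canonical_variates` and `count_errors`, and the nearest-centroid rule, are hypothetical choices made for this sketch, not the authors' procedure.

```python
import numpy as np


def canonical_variates(X, labels):
    """Project samples onto MANOVA-style canonical variates.

    X: array of shape (n_samples, n_features); labels: group id per sample.
    Returns the projected data (c1, c2, ... as columns) and the eigenvalues,
    both ordered by decreasing between/within-group variance ratio.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    grand_mean = X.mean(axis=0)
    n_features = X.shape[1]
    W = np.zeros((n_features, n_features))  # within-group scatter
    B = np.zeros((n_features, n_features))  # between-group scatter
    for g in np.unique(labels):
        Xg = X[labels == g]
        centred = Xg - Xg.mean(axis=0)
        W += centred.T @ centred
        diff = (Xg.mean(axis=0) - grand_mean)[:, None]
        B += len(Xg) * (diff @ diff.T)
    # Solve B v = lambda W v through the Cholesky factor W = L L^T,
    # which turns it into a symmetric eigenproblem with real eigenvalues.
    L_inv = np.linalg.inv(np.linalg.cholesky(W))
    evals, U = np.linalg.eigh(L_inv @ B @ L_inv.T)
    order = np.argsort(evals)[::-1]          # largest ratio first
    V = L_inv.T @ U[:, order]                # canonical directions
    return (X - grand_mean) @ V, evals[order]


def count_errors(Z, labels, n_components=2):
    """Hypothetical error count: samples nearer another group's centroid."""
    labels = np.asarray(labels)
    Z = Z[:, :n_components]
    groups = np.unique(labels)
    centroids = np.array([Z[labels == g].mean(axis=0) for g in groups])
    dists = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
    predicted = groups[np.argmin(dists, axis=1)]
    return int(np.sum(predicted != labels))
```

Working through the Cholesky factor keeps the eigenproblem symmetric, so the eigenvalues are real and the canonical directions are orthonormal with respect to W. If only two groups are compared, B has rank one, so only c1 carries between-group information and c2 reflects within-group spread alone.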

The methodology presented here discards f0 as a feature because its estimation is complicated, or even impossible, in unvoiced fragments, and because it is not a reliable cue in emotional or strongly prosodic speech. The approach instead obtains uncorrelated glottal and vocal tract components, which are parameterized as mel-frequency cepstral coefficients.

… in a gender-balanced database of running speech from 340 speakers.

Keywords: speech processing, joint-process estimation, speaker’s biometry, contextual speech information.
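The parameterization step can be illustrated with a standard MFCC computation applied separately to each component. This minimal sketch assumes that one-sided magnitude spectra for the glottal and vocal tract components are already available; the joint-process estimation that separates them is not shown. It also uses common textbook defaults that are assumptions, not values from the paper: 26 triangular mel filters, the 2595·log10(1 + f/700) mel scale, and 13 unnormalized DCT-II coefficients. The names `mfcc_from_spectrum` and `joint_features` are hypothetical.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters equally spaced in mel, shape (n_filters, n_fft//2 + 1)."""
    bin_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_filters + 2))
    fb = np.zeros((n_filters, bin_freqs.size))
    for i in range(n_filters):
        left, centre, right = edges[i], edges[i + 1], edges[i + 2]
        rising = (bin_freqs - left) / (centre - left)
        falling = (right - bin_freqs) / (right - centre)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def mfcc_from_spectrum(mag_spectrum, sample_rate, n_filters=26, n_coeffs=13):
    """MFCCs of one one-sided magnitude spectrum (length n_fft//2 + 1)."""
    mag = np.asarray(mag_spectrum, dtype=float)
    n_fft = 2 * (mag.size - 1)
    energies = mel_filterbank(n_filters, n_fft, sample_rate) @ mag**2
    log_e = np.log(energies + 1e-12)           # small floor avoids log(0)
    n = np.arange(n_filters)
    k = np.arange(n_coeffs)[:, None]
    dct = np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters))  # unnormalized DCT-II
    return dct @ log_e


def joint_features(glottal_spectrum, vocal_tract_spectrum, sample_rate):
    """Hypothetical combined vector: glottal MFCCs, then vocal tract MFCCs."""
    return np.concatenate([
        mfcc_from_spectrum(glottal_spectrum, sample_rate),
        mfcc_from_spectrum(vocal_tract_spectrum, sample_rate),
    ])
```

Concatenating the two coefficient vectors corresponds to combining glottal pulse and vocal tract MFCCs into a single feature set. With this formulation, a constant gain applied to a spectrum changes only c0 and leaves the higher coefficients untouched.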

