IJSTR

International Journal of Scientific & Technology Research


Volume 3, Issue 7, July 2014



Website: http://www.ijstr.org

ISSN 2277-8616



A Sinusoidal Noise Model Based Speech Synthesis For Phoneme Transition


AUTHOR(S)

H.M.L.N.K. Herath, J.V. Wijayakulasooriya

KEYWORDS

Index Terms: Speech synthesis; Phoneme; Sinusoidal noise model

ABSTRACT

Abstract: One well-known problem in speech synthesis is the occurrence of audible discontinuities at phoneme boundaries, which make synthetic speech sound unnatural. This paper presents a mathematical method based on a sinusoidal noise model that reconstructs the transition region from one phoneme to the next while requiring little storage. The parameters of the sinusoidal noise model were estimated and stored as polynomials, from which the transition waveform is reconstructed. For all transition regions considered in this experiment, the results show higher correlation values for lower-order polynomials, which also give a smaller capacity ratio. The same experiment was then repeated while varying the number of FFT coefficients: as the number of FFT coefficients increased, both the capacity ratio and the correlation coefficient increased. It was found that a signal very close to the original can be generated with a smaller number of FFT coefficients.
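The abstract does not give the model equations, the fitting procedure, or the definition of the capacity ratio, so the following is only a minimal Python sketch of the general idea under stated assumptions. It keeps only the deterministic (sinusoidal) part of the model and omits the noise part; it assumes per-frame amplitude and frequency measurements for each partial are already available (for example from FFT peak picking); it fits each parameter track with a least-squares polynomial over normalised time; and it resynthesises the transition by accumulating phase from the instantaneous frequency. The helper names (`polyfit`, `synthesize`, `correlation`, `capacity_ratio`) are hypothetical, and `capacity_ratio` uses an assumed definition (stored numbers divided by waveform samples replaced) that may differ from the paper's.

```python
import math


def polyfit(ts, ys, order):
    """Least-squares fit y ~ c[0] + c[1]*t + ... + c[order]*t**order.

    Solves the normal equations by Gaussian elimination with partial pivoting;
    adequate for the low orders used here, with t normalised to [0, 1].
    """
    n = order + 1
    a = [[sum(t ** (i + j) for t in ts) for j in range(n)] for i in range(n)]
    b = [sum(y * t ** i for t, y in zip(ts, ys)) for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    coeffs = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = b[i] - sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = s / a[i][i]
    return coeffs


def polyval(coeffs, t):
    """Horner evaluation; coeffs[k] multiplies t**k."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * t + c
    return acc


def synthesize(tracks, n_samples, fs):
    """Sum of sinusoids whose amplitude and frequency (Hz) follow polynomial tracks.

    tracks: list of (amp_coeffs, freq_coeffs, start_phase); time normalised to [0, 1].
    The noise component of the model is deliberately omitted in this sketch.
    """
    out = [0.0] * n_samples
    for amp_c, freq_c, phase in tracks:
        for n in range(n_samples):
            t = n / (n_samples - 1)
            out[n] += polyval(amp_c, t) * math.cos(phase)
            # Advance the phase by the instantaneous frequency over one sample.
            phase += 2.0 * math.pi * polyval(freq_c, t) / fs
    return out


def correlation(x, y):
    """Pearson correlation coefficient between two equal-length signals."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y))
    vx = sum((p - mx) ** 2 for p in x)
    vy = sum((q - my) ** 2 for q in y)
    return cov / math.sqrt(vx * vy)


def capacity_ratio(tracks, n_samples):
    """Hypothetical definition: stored numbers / waveform samples replaced."""
    # +1 per track for the stored start phase.
    stored = sum(len(amp) + len(freq) + 1 for amp, freq, _ in tracks)
    return stored / n_samples


if __name__ == "__main__":
    fs, n_samples = 8000, 800  # hypothetical 100 ms transition at 8 kHz
    # "Original" transition: one partial gliding 200 -> 300 Hz, amplitude 1.0 -> 0.5.
    original = synthesize([([1.0, -0.5], [200.0, 100.0], 0.0)], n_samples, fs)
    # Stand-in for per-frame measurements that would come from FFT analysis.
    frame_t = [k / 9 for k in range(10)]
    amps = [1.0 - 0.5 * t for t in frame_t]
    freqs = [200.0 + 100.0 * t for t in frame_t]
    fitted = [(polyfit(frame_t, amps, 1), polyfit(frame_t, freqs, 1), 0.0)]
    rebuilt = synthesize(fitted, n_samples, fs)
    print("correlation:", correlation(original, rebuilt))
    print("capacity ratio:", capacity_ratio(fitted, n_samples))
```

Raising `order` in the `polyfit` calls increases the number of stored coefficients, and hence this assumed capacity ratio, which is the kind of trade-off between polynomial order, storage, and correlation that the abstract reports.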

