Punjabi Speech Synthesis System for Android Mobile Phones

International Journal Of Engineering And Computer Science, Volume 3, Issue 9, September 2014

Jagmeet Kaur 1, Parminder Singh 2
1 Department of Computer Science and Engineering, SGI, Farrukh Nagar, Haryana, India
2 Department of Computer Science and Engineering, G.N.D.E.C, Ludhiana, Punjab, India

Abstract: Mobile phone usage is approximately 3.5 times greater than the usage of personal computers. Android has the biggest share among all smartphone operating systems, such as Symbian and Windows, because it imposes very few restrictions on developers building applications. Text-to-speech (TTS) synthesis is an application that reads written text aloud. TTS systems on Android are available for many languages, but not for Punjabi. Our present work develops a Punjabi text-to-speech synthesizer that produces output speech on a mobile device. While porting this TTS system to a resource-limited device such as a mobile phone, practical aspects such as application size and processing time are considered. The concatenative speech synthesis technique has been used, with phonemes as the smallest single units for concatenation.

Keywords: Android Operating System, Concatenative Synthesis, Phonemes, Speech Database.

1. INTRODUCTION
In the past decade, mobile devices have made rapid progress. A few years ago, the open-source Android platform became popular. An Android mobile phone also provides text-to-speech synthesis with a vocal interface, allowing users to have the text on their mobile screen read aloud and thereby reducing the use of the visual modality. Such a TTS application also helps users to read text while driving, jogging, etc. [2]. An Android TTS application is helpful for users with visual disabilities and for the illiterate masses [1]. Many speech synthesizers are already available for various languages, but building a speech corpus for languages other than English is a different task [3].
The problem in porting a TTS system to mobile devices is their limited storage and processing power. Gopi et al. developed a text-to-speech synthesizer (TTS) for the Android platform; the authors used the ESNOLA (Epoch Synchronous Non Overlap Add) based concatenative speech synthesis technique, with partnemes as the smallest units, to implement TTS for the Malayalam language [4]. Ahlawat and Dahiya developed English and Hindi TTS engines in an Android environment. For Hindi TTS, the authors presented a two-layer process: the input text is first obtained in Hindi, this Hindi data is then mapped into English, and the output speech is generated from it. The phonemes of the English language are used as the smallest units for concatenation [5]. Mhamunkar et al. presented an Android application that uses speech-to-text and text-to-speech conversion to search for the meaning of a word through voice. The application accepts a spoken word, searches for it in its mobile dictionary, and generates the meaning of the word as synthetic speech output [6]. A Bengali text-to-speech synthesizer for the Android operating system was developed by Mukherjee and Mandal using the ESNOLA-based concatenative speech synthesis technique with partnemes as the smallest units for concatenation [2]. Saychum et al. used hidden Markov model (HMM) based speech units for a bilingual TTS system on the Android operating system, which handles Thai and English text separately and then plays an audio file for it [7]. Text-to-speech synthesis on Android mobile phones for people who are blind or visually impaired was discussed by Shaik et al. [8]. An African text-to-speech engine was developed on a mobile platform for e-learning by Roux et al.; the authors used an HMM technique to read a portion of text aloud and generate the output speech [9].
A Thai speech synthesizer on a mobile device, based on Flite, a unit concatenation synthesizer, was implemented by Wongpatikaseree et al. [1]. Singh and Lehal developed a Punjabi TTS system which produced reasonably acceptable synthesized output on a PC [10]. However, it has not yet been implemented for resource-limited devices such as mobile phones. The goal of our research is to develop a Punjabi text-to-speech synthesizer that can produce synthesized output speech in real time on a mobile device. The concatenative speech synthesis technique has been used in order to achieve two qualities of the synthesized output speech: intelligibility and naturalness. Phonemes of the Punjabi language have been used as the basic unit. A Punjabi speech database has been developed containing the valid Punjabi phonemes, i.e. vowels (V) and consonant-vowel combinations (CV), with the corresponding sound filenames stored against them. The sounds for the V and CV combinations are recorded in wave files. The input text is first segmented into Punjabi phonemes; the phonemes are then searched in the database and the corresponding filenames are retrieved. The files are then looked up in the resource folder and played [11].

2. ANDROID ARCHITECTURE
On 5 November 2007, Google launched the mobile platform called Android for mobile devices such as PDAs, netbooks and smartphones [12]. Android is an open-source OS based on the Linux kernel, which acts as an abstraction layer between the hardware and the software stack [13]. All applications are written in the Java programming language, and Eclipse is the IDE used for developing Android apps. Google created Dalvik as the virtual machine environment that runs the compiled applications on mobile devices; each application runs in its own Dalvik VM instance rather than on the Java VM [14]. Since embedded systems are constrained in application size and processing time, the Dalvik Virtual Machine (DVM), which is optimized for low memory requirements, is an important feature [12].
The architecture of the Android operating system is divided into four layers: the first layer is the Application layer, the second is the Application Framework, the third is divided into two sub-layers, Libraries and Android Runtime, and the last is the Linux Kernel. Counting the two sub-layers separately, there are five layers in total [14]. The main features of the Android operating system are: free use and adaptation of the operating system by manufacturers of mobile devices; optimized use of memory with the DVM; high quality of audio-visual content; and quick and easy development of applications using development tools and a rich database of software libraries [15].

3. METHODOLOGY
The methodology followed for the project is as follows:
a) The concatenative synthesis technique is used to obtain natural-sounding synthetic speech. It involves taking real recorded speech, cutting it into segments, and concatenating the sound segments back together during synthesis to produce the desired output speech.
b) The user enters Punjabi text into the textbox, and the text is divided into words. This phase first analyzes the positions of vowels and consonants in a word and then segments the word into phonemes, i.e. vowels and consonant-vowel combinations.
c) The database of this Punjabi TTS consists of two fields: word and filename. The word field contains the phoneme and the filename field contains the name of its sound file. Database preparation consists of selecting sentences containing all the vowels, consonants and their combinations, recording these sentences, and finally marking the phoneme sounds in the recorded sound files. The SQLite DBMS is used to store the Punjabi phonemes and their corresponding sound files.
d) Each phoneme is searched in the database to retrieve its filename, which is then looked up in the application resources to find the corresponding sound wave file to be played. At the end, the phoneme sounds are concatenated to generate the sound corresponding to the input text.

4. DEVELOPING PUNJABI TTS SYSTEM
4.1 DATABASE PREPARATION
The following steps are followed for the preparation of the Gurmukhi Punjabi speech database.

4.1.1 VALID PUNJABI PHONEME SELECTION
For the development of this TTS system, phonemes are selected as the basic unit of concatenation. The reason for selecting phonemes as the basic speech units is that any word can be formed from phonemes while keeping the database relatively smaller than with any other method [10][11]. In the Punjabi language there are two types of phonemes, V and CV, where V and C represent vowel and consonant respectively. These gave rise to 380 phonemes with non-nasalized vowels and 380 phonemes with nasalized vowels, resulting in a total of 722 valid Punjabi phonemes, as shown in TABLE 1.

TABLE 1. VALID AND INVALID PHONEMES (rows by phoneme type: V (Non-Nasalized), V (Nasalized), CV (Non-Nasalized), CV (Nasalized), Total)

4.1.2 TEXT FOR RECORDING
For the analysis of phonemes, an unbiased Punjabi corpus of nearly four million words was carefully selected. For labeling the phoneme sounds, we selected words containing all the consonants, vowels and their valid combinations for recording. Examples: ਮਾ [ਮ(C) + ਆ(V)], ਜੀ [ਜ(C) + ਈ(V)].

4.1.3 RECORDING WORDS
The selected words have been recorded by a native female speaker of Punjabi. The quality of the synthesized speech depends on the quality of the recorded sound, and hence on the quality of the speech units extracted from it. Therefore, a professional female speaker of Punjabi was selected for recording. The recording was done in a studio with the following characteristics: Sampling Rate: Hz, Bit Depth: 16 bit, Channels: Mono.

4.1.4 LABELING OF THE PHONEMES
The next phase is to label the phoneme sounds in the recorded sound file.
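Once a phoneme's boundaries have been marked, its sound can be cut out of the recording. The sketch below is a minimal illustration, not the authors' tooling (they used Sound Forge for this step): it assumes the recording is available as raw, headerless 16-bit mono PCM bytes and that boundaries were noted in milliseconds. The class name PhonemeCutter is hypothetical, and the sampling rate is passed in as a parameter rather than assumed.

```java
import java.util.Arrays;

/**
 * Hypothetical helper: extracts one labeled phoneme from raw 16-bit mono PCM data.
 * Boundaries are given in milliseconds, as they might be noted down during manual labeling.
 */
final class PhonemeCutter {
    // 16-bit depth and a single (mono) channel: each sample frame is 2 bytes.
    private static final int BYTES_PER_SAMPLE = 2;

    private PhonemeCutter() {}

    /**
     * Returns the PCM bytes between startMs (inclusive) and endMs (exclusive).
     * sampleRateHz is a parameter because no particular sampling rate is assumed here.
     */
    static byte[] cut(byte[] pcm, int sampleRateHz, long startMs, long endMs) {
        if (startMs < 0 || endMs < startMs) {
            throw new IllegalArgumentException("invalid boundaries: " + startMs + ".." + endMs);
        }
        long startSample = startMs * sampleRateHz / 1000;
        long endSample = endMs * sampleRateHz / 1000;
        // Convert sample indices to byte offsets, clamping to the buffer so that a
        // boundary noted slightly past the end of the recording does not throw.
        int from = (int) Math.min(pcm.length, startSample * BYTES_PER_SAMPLE);
        int to = (int) Math.min(pcm.length, endSample * BYTES_PER_SAMPLE);
        return Arrays.copyOfRange(pcm, from, to);
    }
}
```

For example, a phoneme marked from 100 ms to 250 ms in a recording sampled at 16 kHz (an illustrative rate only) spans 2,400 samples, i.e. 4,800 bytes.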
Labeling the phoneme sounds for the database is a crucial and time-consuming task that must be done very carefully, because the naturalness of the synthetic speech produced by the TTS system depends on how precisely the phoneme boundaries have been marked. For this purpose, the sound editing software Sonic Foundry Sound Forge 10.0 has been used. The phoneme sounds have been labeled manually, one by one, after carefully listening to and analyzing the word sounds. The phoneme boundaries have been marked and noted down [11].

4.1.5 CREATION OF PUNJABI SPEECH DATABASE
The Punjabi speech database is an important part of the system and is optimized for a high-quality TTS system. To obtain naturalness in our application, we have used the concatenative technique for combining the sound files of the phonemes. As this TTS system is developed for portable devices running the Android operating system, a very lightweight SQLite database is used. The database of our TTS system has two fields: word and filename. The first field contains the phoneme itself and the second contains the sound filename. In total, 578 phonemes and their sound files have been recorded, and the size of our database is 13.1 MB.

4.2 TEXT NORMALIZATION
The Punjabi text normalization module processes the input text before it is passed on for TTS conversion. The input is raw text in which abbreviations, special symbols and numeric values are first searched for and then replaced with their expanded forms, so that they are spoken as full words. The normalized text is then segmented into words [1].

4.3 WORD SEGMENTATION INTO PHONEMES
Since the phoneme is the basic unit of concatenation, the Punjabi input text must first be segmented into words, and the words are then segmented into the phonemes that are stored in the database. For example, the word ਜਗਮੀਤ is segmented into four phonemes: ਜ (ਜ + ਅ) + ਗ (ਗ + ਅ) + ਮੀ (ਮ + ਈ) + ਤ (ਤ + ਅ).
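The V/CV segmentation of Section 4.3 can be sketched as a single left-to-right scan over Gurmukhi code points. This is a hypothetical sketch, not the paper's implementation: it assumes that database keys are the written Gurmukhi forms (for example ਮੀ for ਮ + ਈ) and that a nasalization mark (adak bindi U+0A01, bindi U+0A02 or tippi U+0A70) is stored as part of the unit it follows. For simplicity it drops characters it does not model, such as addak (U+0A71) and virama (U+0A4D), so conjuncts are not handled.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical V/CV segmenter for Gurmukhi text (a sketch, not the paper's code).
 * Character classes are simple ranges over the Gurmukhi Unicode block; unassigned
 * code points inside these ranges never occur in valid text.
 */
final class GurmukhiSegmenter {
    private static final char NUKTA = '\u0A3C';

    private GurmukhiSegmenter() {}

    // Independent vowels U+0A05 (A) .. U+0A14 (AU): each forms a V unit.
    static boolean isIndependentVowel(char c) {
        return c >= '\u0A05' && c <= '\u0A14';
    }

    // Consonants U+0A15 (KA) .. U+0A39 (HA) plus the additional letters U+0A59 .. U+0A5E.
    static boolean isConsonant(char c) {
        return (c >= '\u0A15' && c <= '\u0A39') || (c >= '\u0A59' && c <= '\u0A5E');
    }

    // Dependent vowel signs U+0A3E (AA) .. U+0A4C (AU); the virama U+0A4D is excluded.
    static boolean isVowelSign(char c) {
        return c >= '\u0A3E' && c <= '\u0A4C';
    }

    // Nasalization marks: adak bindi U+0A01, bindi U+0A02, tippi U+0A70.
    static boolean isNasal(char c) {
        return c == '\u0A01' || c == '\u0A02' || c == '\u0A70';
    }

    /** Splits one word into V and CV units; a consonant with no vowel sign carries the inherent vowel. */
    static List<String> segmentWord(String word) {
        List<String> units = new ArrayList<>();
        int i = 0;
        while (i < word.length()) {
            int start = i;
            char c = word.charAt(i++);
            if (isConsonant(c)) {
                if (i < word.length() && word.charAt(i) == NUKTA) i++;            // optional nukta
                if (i < word.length() && isVowelSign(word.charAt(i))) i++;        // explicit vowel sign
            } else if (!isIndependentVowel(c)) {
                continue; // addak, virama, punctuation, etc. are dropped in this sketch
            }
            if (i < word.length() && isNasal(word.charAt(i))) i++;                // nasalized V or CV
            units.add(word.substring(start, i));
        }
        return units;
    }

    /** Splits text on whitespace and segments every word into units. */
    static List<String> segment(String text) {
        List<String> units = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            units.addAll(segmentWord(word));
        }
        return units;
    }
}
```

Applied to ਜਗਮੀਤ, the scan yields the four units ਜ, ਗ, ਮੀ and ਤ, where the units without a vowel sign carry the inherent vowel ਅ. The code uses only \u escapes so that it compiles regardless of the source-file encoding.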
4.4 SEARCHING PHONEMES AND CONCATENATION
This phase is responsible for searching for each phoneme in the database and retrieving its sound filename. If the search is successful for a particular phoneme, the corresponding sound filename is returned from the database; if there is no entry for that phoneme in the database, it is skipped as an invalid phoneme. The sound file named by the returned filename is then located in the application resources and loaded into memory. At the end, the phoneme sounds are concatenated to generate the sound corresponding to the input text.

4.5 PORTING TTS TO ANDROID PLATFORM
The final phase of our TTS application is to port the TTS system from the desktop to the Android platform. A Gurmukhi font has been introduced in the application to render the typed Punjabi text. With the help of this font, the Punjabi letter corresponding to every key pressed on the mobile is displayed in the input text field. This text is then converted into Unicode values, which are the inputs to the TTS engine.

5. IMPLEMENTING PUNJABI TTS SYSTEM ON ANDROID
The minimum system specification for running the Punjabi TTS on Android is Android OS version 2.2 with 512 MB RAM. Android applications should be efficient because they run on mobile devices with limited computing power, storage and battery life. The size of our application is 10.3 MB. On an Android device, keeping database connections open and occupying memory all the time is very expensive. Therefore, database connections are closed as soon as the sound filename is retrieved, and memory is released as soon as the sound file has finished playing. The TTS system also lets the user generate Punjabi speech by typing a Punjabi word in English alphabet format: the input text is typed in English letters into the textbox.
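The lookup and concatenation of Section 4.4, including skipping phonemes that have no database entry, can be sketched in plain Java. Here a Map stands in for the SQLite word/filename table and a loader function stands in for reading a sound file from the application resources; the class and method names are hypothetical. The sketch also assumes every stored unit is headerless 16-bit mono PCM at one common sampling rate, so the units can be joined byte for byte and wrapped in a single WAV header before playback.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical concatenation step: phoneme -> filename -> PCM data, joined in order. */
final class PhonemeConcatenator {
    private PhonemeConcatenator() {}

    /** Concatenates the PCM data of every phoneme that has a database entry; others are skipped. */
    static byte[] concatenate(List<String> phonemes,
                              Map<String, String> phonemeToFilename, // stand-in for the SQLite table
                              Function<String, byte[]> loadPcm) {    // stand-in for resource loading
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String phoneme : phonemes) {
            String filename = phonemeToFilename.get(phoneme);
            if (filename == null) {
                continue; // no entry in the database: treated as an invalid phoneme and skipped
            }
            byte[] pcm = loadPcm.apply(filename); // assumes every stored filename resolves
            out.write(pcm, 0, pcm.length);
        }
        return out.toByteArray();
    }

    /** Wraps raw 16-bit mono PCM in a canonical 44-byte WAV (RIFF) header. */
    static byte[] toWav(byte[] pcm, int sampleRateHz) {
        ByteBuffer buf = ByteBuffer.allocate(44 + pcm.length).order(ByteOrder.LITTLE_ENDIAN);
        buf.put("RIFF".getBytes(StandardCharsets.US_ASCII));
        buf.putInt(36 + pcm.length);           // size of everything after this field
        buf.put("WAVE".getBytes(StandardCharsets.US_ASCII));
        buf.put("fmt ".getBytes(StandardCharsets.US_ASCII));
        buf.putInt(16);                         // fmt chunk size for PCM
        buf.putShort((short) 1);                // audio format 1 = linear PCM
        buf.putShort((short) 1);                // channels: mono
        buf.putInt(sampleRateHz);
        buf.putInt(sampleRateHz * 2);           // byte rate = rate * channels * 2 bytes
        buf.putShort((short) 2);                // block align = channels * 2 bytes
        buf.putShort((short) 16);               // bits per sample
        buf.put("data".getBytes(StandardCharsets.US_ASCII));
        buf.putInt(pcm.length);
        buf.put(pcm);
        return buf.array();
    }
}
```

On the device, the map lookup would be replaced by a query on the SQLite table, with the connection closed as soon as the filenames have been read, as described in Section 5. The resulting buffer could then be played, for example through Android's AudioTrack (raw PCM) or MediaPlayer (a written WAV file), and released once playback finishes.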
The Play Sound button, when clicked, generates the speech file corresponding to the text and plays the generated audio file. The application can be distributed to end users and installed on Android devices by connecting them to a PC through the USB port or via Bluetooth. Fig. 1 illustrates the Gurmukhi speech synthesis system and Fig. 2 illustrates the interface of our TTS application.

Figure 1. Flow chart of Punjabi TTS
Figure 2. Interface of TTS application

CONCLUSION
In this paper, we describe our implementation of a Punjabi speech synthesizer on an Android OS mobile device. Our aim is to develop a TTS (text-to-speech) application that can produce output speech in almost real time on Android-based smartphones. Our TTS system is based on the principle of concatenation, using phonemes as the speech unit. The speech database is 13.1 MB, with 578 phoneme sounds extracted from the recorded sound file, and the Punjabi TTS application is 10.3 MB. The developed system shows good results for segmenting words into phonemes and can segment text of any length into phonemes. The application mainly caters to the needs of visually impaired people who cannot read, helping them to know what is written. It also enables illiterate people, who do not know how to read, to access information in written form. It also helps book lovers who like to read while travelling but sometimes cannot; they can simply plug in earphones and listen to the text. It also helps people learn the Punjabi language. Although the presented work has these advantages, there are still gaps in the pronunciation mechanism, which can be addressed by improving the quality of the sound files using DSP techniques.

REFERENCES
[1] K. Wongpatikaseree, A. Ratikan, A. Thangthai, A. Chotimongkol and C. Nattee, A Real-time Thai Speech Synthesizer on a Mobile Device, Proc. 8th IEEE Conference on Symposium on Natural Language Processing (SNLP), Bangkok, 2009.
[2] S. Mukherjee and S.K.D. Mandal, A Bengali Speech Synthesizer on Android OS, Proc. 1st ACM Workshop on Speech and Multimodal Interaction in Assistive Environments (SMIAE), Jeju, Republic of Korea, 2012.
[3] S. Kiruthiga and K. Krishnamoorthy, Annotating Speech Corpus for Prosody Modeling in Indian Language Text to Speech Systems, International Journal of Computer Science Issues (IJCSI), 9(1), 2012.
[4] A. Gopi, P. Shobana, T. Sajini and V.K. Bhadran, Implementation of Malayalam Text to Speech Using Concatenative Based TTS for Android Platform, Proc. IEEE International Conference on Control Communication and Computing (ICCC), Thiruvananthapuram, Kerala, India, 2013.
[5] S. Ahlawat and R. Dahiya, A Novel Approach of Text To Speech Conversion Under Android Environment, International Journal of Computer Science & Management Studies (IJCSMS), 13(5), 2013.
[6] P.V. Mhamunkar, K.S. Bansode and L.S. Naik, Android Application to get Word Meaning through Voice, International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), 2(2), 2013.
[7] S. Saychum, A. Thangthai, P. Janjoi, N. Thatphithakkul, C. Wutiwiwatchai, P. Lamsrichan and T. Kobayashi, A Bilingual Thai-English TTS System on Android Mobile Devices, Proc. 9th IEEE International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTICON), Phetchaburi, 2012, 1-4.
[8] A.S. Shaik, G. Hossain and M. Yeasin, Design, Development and Performance Evaluation of Reconfigured Mobile Android Phone for People Who are Blind or Visually Impaired, Proc. 28th ACM International Conference on Design of Communication (SIGDOC), S. Carlos, 2010.
[9] J.C. Roux, P.E. Scholtz, D. Klop, C. Povlsen, B. Jongejan and A. Magnusdottir, Incorporating Speech Synthesis in the Development of a Mobile Platform for E-learning, Proc. 7th International Conference on Language Resources and Evaluation (LREC), Valletta, Malta, 2010.
[10] P. Singh and G.S. Lehal, Text-To-Speech Synthesis System for Punjabi Language, International Conference on Multidisciplinary Information Science & Technologies (INSciT), Merida, Spain, 2006.
[11] S. Luthra and P. Singh, Punjabi Speech Generation System based on, International Journal of Computer Applications (IJCA), 49(13), 2012.
[12] N. Gandhewar and R. Sheikh, Google Android: An Emerging Software Platform for Mobile Devices, International Journal on Computer Science and Engineering (IJCSE), NCICT Special Issue, 2010.
[13] J. Liu and J. Yu, Research on Development of Android Applications, Proc. 4th IEEE International Conference on Intelligent Networks and Intelligent Systems (ICINIS), Kunming, China, 2011.
[14] S. Primorac and M. Russo, Android application for sending SMS messages with speech recognition interface, Proc. 35th IEEE International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), Opatija, Croatia, 2012.
[15] B.R. Reddy and E. Mahender, Speech to Text Conversion using Android Platform, International Journal of Engineering Research and Applications (IJERA), 3(1), 2013.