Brief Project Description | Phonetic, phonological and prosodic characteristics and structures of the Sardinian language

The project entitled “Phonetic, phonological and prosodic characteristics and structures of the Sardinian language” is a research program aimed at laying the groundwork for a Text-To-Speech (TTS) synthesizer capable of producing spoken output from texts written in the Sardinian language.

Speech synthesis is widely used in ICT, especially in computers and mobile devices, in the automotive sector, in educational software and in assistive technologies for people with disabilities. Synthesized voices can be created by concatenating pieces of recorded speech (single words or phrases, phones or diphones) stored in a database. The concatenation is driven by the phonetic and prosodic information implemented in the TTS system. The work therefore involves several disciplines, such as linguistic investigation, speech analysis and computer programming.
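To make the concatenation principle concrete, the following is a minimal sketch, not the project's actual implementation. It assumes a hypothetical in-memory inventory that maps each diphone (the transition from one phone to the next) to a list of audio samples, uses `_` as a hypothetical silence symbol, and applies a naive per-unit duration factor as a stand-in for prosody. The transcription of the Sardinian word *domo* ("house") is simplified for illustration only.

```python
from typing import Dict, List, Optional, Tuple

Diphone = Tuple[str, str]

def to_diphones(phones: List[str]) -> List[Diphone]:
    """Pad the phone sequence with silence ('_', hypothetical) and split it into overlapping diphones."""
    padded = ["_"] + phones + ["_"]
    return [(padded[i], padded[i + 1]) for i in range(len(padded) - 1)]

def stretch(samples: List[float], factor: float) -> List[float]:
    """Naive duration change by index resampling.

    This also shifts pitch; real systems use techniques such as PSOLA instead.
    """
    n = max(1, round(len(samples) * factor))
    return [samples[min(int(i / factor), len(samples) - 1)] for i in range(n)]

def synthesize(
    phones: List[str],
    inventory: Dict[Diphone, List[float]],
    durations: Optional[List[float]] = None,
) -> List[float]:
    """Concatenate the stored diphone waveforms, one duration factor per diphone."""
    units = to_diphones(phones)
    if durations is None:
        durations = [1.0] * len(units)
    if len(durations) != len(units):
        raise ValueError("one duration factor per diphone is required")
    out: List[float] = []
    for unit, factor in zip(units, durations):
        if unit not in inventory:
            raise KeyError(f"missing diphone {unit[0]}-{unit[1]}")
        out.extend(stretch(inventory[unit], factor))
    return out

if __name__ == "__main__":
    phones = ["d", "o", "m", "o"]  # simplified transcription of "domo"
    # Hypothetical inventory: two dummy samples per recorded diphone.
    inventory = {u: [0.1, -0.1] for u in to_diphones(phones)}
    print(to_diphones(phones))
    print(len(synthesize(phones, inventory)))
```

The sketch shows why a complete diphone inventory matters: any phone transition that was never recorded makes the whole utterance fail, which is why the corpus must cover all the sound combinations of the language.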

The aim of the project was therefore to set up a system containing the phones, diphones and phonemes of the Sardinian language, together with descriptions of how these sounds are to be concatenated in order to obtain an intelligible voice output. This required two main research achievements: first, a primary phonetic description of the Sardinian language; second, the development of a method for building the linguistic archive. The main objectives were three: to acquire the necessary linguistic data and information; to build the database of the phonemes of the language; and to define the concatenation rules.

The activity consisted of the following tasks: (i) compilation of a linguistic corpus and phonetic and phonological inquiry into the language; (ii) identification and sampling of the individual linguistic phenomena; (iii) set-up of the development environment; (iv) database population and programming of the concatenation procedures; (v) creation of the demonstration prototype; (vi) dissemination of results.

The research tasks lasted 18 months, while the publication of results and the construction of the website took another 6 months. The activities were divided into 8 three-month periods (trimesters) and 4 work packages: WP1 – Preliminary Research, from the 1st to the 2nd trimester; WP2 – Data gathering and elaboration, from the 2nd to the 6th trimester; WP3 – Project summary and publication of results, from the 7th to the 8th trimester; WP4 – Project support activities, from the 1st to the 8th trimester.
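The trimester ranges above can be converted into calendar months to check that the schedule adds up. The short sketch below assumes months are numbered from 1 at the project start and that each trimester is exactly three months; the dictionary and function names are illustrative only.

```python
MONTHS_PER_TRIMESTER = 3

# Trimester ranges (first, last) as listed in the work plan.
WORK_PACKAGES = {
    "WP1 - Preliminary Research": (1, 2),
    "WP2 - Data gathering and elaboration": (2, 6),
    "WP3 - Project summary and publication of results": (7, 8),
    "WP4 - Project support activities": (1, 8),
}

def month_span(first_trimester: int, last_trimester: int) -> tuple:
    """Return the (first, last) month covered by a range of trimesters."""
    start = (first_trimester - 1) * MONTHS_PER_TRIMESTER + 1
    end = last_trimester * MONTHS_PER_TRIMESTER
    return start, end

if __name__ == "__main__":
    for name, (first, last) in WORK_PACKAGES.items():
        start, end = month_span(first, last)
        print(f"{name}: months {start}-{end}")
```

WP2 ends at month 18, matching the 18 months of research, and WP3 covers months 19 to 24, matching the further 6 months, for a total of 8 × 3 = 24 months.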

The deliverables of the project are the following: 1) a linguistic database containing the texts and recorded material; 2) a set of language description rules and computing procedures; 3) a TTS prototype for the Sardinian language with two distinct voices; 4) a book containing the project results; 5) a web site containing the project results, the prototype and the linguistic materials.

All the results are innovative in their respective fields, in both quantitative and qualitative terms, for the native languages of Sardinia (not only Sardinian but also Gallurese and Sassarese) and likewise for the two other alloglot languages spoken on the island (Tabarchino, a Genoese dialect, and the Catalan of Alghero). We are not aware of similar work for other minority languages of Italy.

The research results contribute significantly to the advancement of phonetic studies on the Sardinian language. They may also serve as a reference for teachers and students who need a reliable phonetic description of the language. Other possible uses include serving as a basis for the development of language software and of educational or social applications, and as a testing ground for problems of language normalization.

The benefits of this research may also extend to other TTS systems, for example applications dealing with regional varieties of Italian. Last but not least, the work provides a basis for the future development of an Automatic Speech Recognition (ASR) system.