Abstract:
Multilingual Text-to-Speech systems based on concatenative synthesis require a different database of phonetic units for each language. We are currently developing a TTS system able to convert text in either Catalan or Spanish, in which some modules are shared by both languages while others are language-specific. To reduce the total number of units, a bilingual database containing all the units needed for both languages has been derived from two monolingual databases recorded by the same speaker. Units common to both languages have been selected according to their phonetic representation. The bilingual database has 1099 units, including diphones and some longer units, whereas the two monolingual databases together would amount to 1545 units. An analysis of Catalan unit frequencies has been carried out to decide which units should be included in the database. The experiments showed that the synthetic speech has a strong Catalan accent, probably due to the speaker's own accent. Some common units, even if they are represented by the same symbol, should be considered separately in a bilingual database in order to cope with acoustically different allophones.
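As an illustration of the merging criterion, the following Python sketch treats each unit as a phonetic label string (a hypothetical labelling, e.g. "a-b" for a diphone) and merges the two inventories so that identically labelled units are stored only once. The `keep_separate` parameter is an illustrative assumption reflecting the observation that some units sharing a symbol may need language-specific versions. `select_by_frequency` and its `min_count` threshold are likewise hypothetical stand-ins for the Catalan frequency analysis, whose exact criterion is not stated here. With the reported figures, merging saves 1545 - 1099 = 446 units.

```python
from collections import Counter


def merge_inventories(catalan, spanish, keep_separate=frozenset()):
    """Merge two unit inventories keyed by phonetic label (a sketch).

    Units with the same label are stored once as ("shared", label).
    Labels listed in keep_separate (hypothetical parameter) are stored
    per language, e.g. ("ca", label) and ("es", label), so acoustically
    different allophones written with one symbol can be kept apart.
    """
    merged = set()
    for lang, units in (("ca", catalan), ("es", spanish)):
        for label in units:
            if label in keep_separate:
                merged.add((lang, label))
            else:
                merged.add(("shared", label))
    return merged


def select_by_frequency(corpus_units, min_count):
    """Keep units occurring at least min_count times in a corpus.

    The threshold rule is an assumption standing in for the
    frequency analysis mentioned in the abstract.
    """
    counts = Counter(corpus_units)
    return {label for label, count in counts.items() if count >= min_count}


# Toy inventories (invented labels, not the real databases).
catalan = ["a-b", "b-a", "@-l", "l-@"]
spanish = ["a-b", "b-a", "x-a"]

print(len(catalan) + len(spanish))                         # separate: 7
print(len(merge_inventories(catalan, spanish)))            # merged: 5
print(len(merge_inventories(catalan, spanish, {"a-b"})))   # "a-b" split: 6
```

In this toy case the two shared labels are stored once, so 7 separate units become 5; forcing "a-b" to stay language-specific costs one extra unit, which mirrors the trade-off between database size and allophonic accuracy described above.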