CHAPTER 15

Prosody

It isn't what you said; it's how you said it!

Sheridan pointed out the importance of prosody more than 200 years ago [53]:

Children are taught to read sentences, which they do not understand; and as it is impossible to lay the emphasis right, without perfectly comprehending the meaning of what one reads, they get a habit either of reading in a monotone, or if they attempt to distinguish one word from the rest, as the emphasis falls at random, the sense is usually perverted, or changed into nonsense.

Prosody is a complex weave of physical, phonetic effects that is employed to express attitude, assumptions, and attention as a parallel channel in our daily speech communication. The semantic content of a spoken or written message is referred to as its denotation. The emotional and attentional effects intended by the speaker or inferred by a listener are part of the message's connotation.

Prosody has an important supporting role in guiding a listener's recovery of the basic message (denotation). It has a starring role in signaling connotation, or the speaker's attitude toward the message, toward the listener(s), and toward the whole communication event. From the listener's point of view, prosody consists of systematic perception and recovery of a speaker's intentions based on:

Pauses: to indicate phrases and to avoid running out of air.
Pitch: rate of vocal-fold cycling (fundamental frequency, or F0) as a function of time.
Rate/relative duration: phoneme durations, timing, and rhythm.
Loudness: relative amplitude/volume.

Pitch is the most expressive of the prosodic phenomena.
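The pitch entry in the list above defines F0 as the rate of vocal-fold cycling over time. The following is a minimal Python sketch of one common way to measure it: autocorrelation-based F0 estimation, applied frame by frame to produce a pitch contour. The function names `estimate_f0` and `f0_contour` are hypothetical. So are the 60 to 400 Hz search range and the 800-sample frame / 400-sample hop (50 ms / 25 ms at 16 kHz). These are illustrative assumptions, not a method prescribed by this chapter.

```python
import math


def estimate_f0(frame, sample_rate, f0_min=60.0, f0_max=400.0):
    """Estimate F0 (Hz) of one frame from its autocorrelation peak.

    Only lags whose periods fall between 1/f0_max and 1/f0_min are searched
    (the range is an assumption). Returns 0.0 when no positive correlation
    is found, e.g. for silence.
    """
    n = len(frame)
    if n < 2:
        return 0.0
    mean = sum(frame) / n
    x = [s - mean for s in frame]  # remove DC offset
    min_lag = max(1, int(sample_rate / f0_max))
    max_lag = min(int(sample_rate / f0_min), n - 1)
    best_lag, best_r = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation at this lag. The overlap shrinks as the
        # lag grows, which slightly favors the true period over its multiples.
        r = sum(x[i] * x[i + lag] for i in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag if best_lag else 0.0


def f0_contour(samples, sample_rate, frame_len=800, hop=400):
    """Track F0 over time using overlapping frames (hypothetical defaults)."""
    return [
        estimate_f0(samples[start:start + frame_len], sample_rate)
        for start in range(0, len(samples) - frame_len + 1, hop)
    ]


if __name__ == "__main__":
    sr = 16000
    # Synthetic "voice": 0.25 s at 100 Hz followed by 0.25 s at 200 Hz.
    signal = [math.sin(2 * math.pi * 100 * t / sr) for t in range(4000)]
    signal += [math.sin(2 * math.pi * 200 * t / sr) for t in range(4000)]
    contour = f0_contour(signal, sr)
    print([round(f) for f in contour])
```

On this synthetic input, the contour starts near 100 Hz and ends near 200 Hz, tracing pitch as a function of time. Real speech would additionally need a voicing decision and smoothing to suppress octave errors, which this sketch omits.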

As we speak, we systematically vary our fundamental frequency to express our feelings about what we are saying or to direct the listener's attention to especially important aspects of our spoken message. If a paragraph is spoken on a constant, uniform pitch with no pauses, or with uniform pauses between words, it sounds highly unnatural. In some languages, pitch variation is partly constrained by lexical and syntactic conventions.

For example, Japanese words usually exhibit a sharp pitch fall at a certain vowel on a consistent, word-specific basis. In Mandarin Chinese [52], word meaning depends crucially on shape and register distinctions among four highly stylized syllable pitch contour types. This is a grammatical and lexical use of pitch.

However, every language, and especially English, allows some range of pitch variation that can be exploited for emotive and attentional purposes. While this chapter concentrates primarily on American English, the use of some prosodic effects to indicate emotion, mood, and attention is probably universal, even in languages that also make use of pitch for signaling word identity, such as Chinese. It is tempting to speculate that speakers of some languages use expressive and affective lexical particles and interjections to express some of the same emotive effects for which American English speakers typically rely on prosody.

We discuss pausing, pitch generation, and duration separately because it is convenient to separate them when building systems. Bear in mind, however, that all the prosodic qualities are highly correlated in human speech production. Loudness is not nearly as important in synthesizing speech as the other factors and thus is not discussed here.

In addition, for many concatenative systems, loudness is generally embedded in the speech segments themselves.

THE ROLE OF UNDERSTANDING

To date, most work on prosody for TTS has focused exclusively on the utterance, which is the literal content of the message. That is, a TTS system learns whatever it can from the isolated, textual representation of a single sentence or phrase to aid in prosodic generation. A TTS system typically relies on word identity, part of speech, punctuation, sentence or phrase length, and other superficial characteristics.
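The following is a minimal Python sketch of the kind of superficial, single-sentence processing described above. It predicts pauses from only two of the listed cues: punctuation and phrase length. The function `predict_pauses`, the `PAUSE_MS` table, the pause durations, and the `max_run` length threshold are all hypothetical and chosen for illustration. This is not the method of any particular TTS system.

```python
import re

# Hypothetical pause lengths (milliseconds) keyed by punctuation mark.
# These values are illustrative assumptions, not measured figures.
PAUSE_MS = {",": 150, ";": 250, ":": 250, ".": 500, "?": 500, "!": 500}


def predict_pauses(sentence, max_run=8, default_break_ms=100):
    """Return (word, pause_after_ms) pairs using only surface cues.

    Punctuation sets the pause after the preceding word. After every
    `max_run` consecutive words without punctuation, a short default break
    is inserted, standing in for a phrase-length cue.
    """
    tokens = re.findall(r"[\w']+|[.,;:?!]", sentence)
    result = []
    run = 0  # words since the last break
    for tok in tokens:
        if tok in PAUSE_MS:
            if result:  # attach the pause to the preceding word
                word, pause = result[-1]
                result[-1] = (word, max(pause, PAUSE_MS[tok]))
            run = 0
            continue
        if run == max_run and result:
            # Phrase is getting long: force a short break before this word.
            word, pause = result[-1]
            result[-1] = (word, max(pause, default_break_ms))
            run = 0
        result.append((tok, 0))
        run += 1
    return result


if __name__ == "__main__":
    print(predict_pauses("Children are taught to read sentences, "
                         "which they do not understand."))
```

Because the sketch sees only the isolated text of one sentence, it cannot tell which words the speaker intends to emphasize. This is the limitation of utterance-only prosody that this section describes.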

As more sophisticated NLP.
