In touch with real speech

Listening Cherry 04 – The decoding gap

Listening Cherry no. 4 continues my report on recent work by Mirjam Ernestus in speech perception.

[This post follows on from Listening Cherry 03]

One aspect of Mirjam Ernestus’s work is her focus on the different soundshapes that words can have, and in particular on various levels of reduction. Her classic examples are (for English) yesterday being yeshay |jɛʃeɪ| and a little while being ud.l.wa |əɾl̩wa| (for more examples in English, Dutch and French, cf. Ernestus and Warner 2011, p. 254). About such examples, Ernestus makes the following two points:

speakers are typically not aware of such variants in their own speech or in the speech of others, and … they do not recognise these variants when presented out of context (Ernestus, 2014, p. 11)

The first point, that speakers are not aware of the soundshapes they utter, is in line with my own findings, and it results in a phenomenon that I refer to in Phonology for Listening as the ‘Blur gap’ (Cauldwell, 2013, p. 17).

The justification for her second point (that native speakers don’t recognise reduced variants out of context) comes from experiments reported in Ernestus, Baayen and Schreuder (2002). They used a mixture of high-, medium- and low-reduction forms of Dutch words, presented in three contextual conditions of decreasing size:

  • in the full sentence,
  • with vowels and ‘intervening consonants’ of neighbouring words,
  • in isolation.

They found that

Participants recognised the tokens with low or medium reduction more than 85% of … [cases] independently of how much context they heard.

But with the highly reduced words, the recognition rate fell dramatically as the amount of context decreased:

Highly reduced word recognition

Context                  Percent recognition
Full sentence            92%
Neighbouring vowels      70%
In isolation             nearly 50%

The table shows that when informants heard the full sentence, they recognised the highly reduced forms in 92% of cases; with only the neighbouring vowels as context, this figure dropped to 70%; and in isolation it dropped to nearly 50%. Ernestus (2014, p. 12) speculates that

listeners unconsciously reconstruct reduced variants to their unreduced counterparts on the basis of context [emphasis added]

That is, a speaker-created, speaker-shaped stream of sounds arrives at the ears of the listener, who immediately (way below the level of awareness) associates the reduced variant with its full counterpart and believes that he/she has heard the full form.

This speed-of-light reconstructive perception on the part of native-speaker and expert listeners is an obstacle to teaching listening to fast (normal) spontaneous speech. Why? Precisely because it is unconscious. It happens below the level of awareness, and we (expert-listener) teachers, textbook writers and teacher-trainers believe we hear full(ish) forms when in fact the sound substance consists of fast-moving word-traces in a continuous blur. Our students, on the other hand, hear the mush of reduced forms for what it really is (an acoustic blur), because they have yet to acquire the unconscious ability to reconstruct words from the sound substance they hear.

This results in the awkwardness and discomfort that often occur in the classroom when recordings of real, normal speech are played: teachers are deceived by their own expert perception into believing that the sound substance is unproblematic, whereas learners, with undeceived and honest ears, hear the mush for what it is (an acoustic blur), and they have difficulty matching the mush with the words they know.

So teachers and learners differ in their perceptions of ‘what is actually there’ in recordings. Hence the awkwardness. This is a situation I describe (Cauldwell, 2013, p. 255) as the decoding gap.


