Richard Cauldwell
54 - Treasuring speed
In my work I jump for joy when I find examples of fast speech, because these are locations in recordings that are likely to prove difficult for learners to decode: soundshapes of words are very likely to have been streamlined - and, if so, are likely to have become unfamiliar and/or difficult to catch. An example of such fast speech I have recently come across is provided by Julian Treasure in a TED talk he gave in July 2009.
One of his closing sentences went: I’m going to leave you with with a little bit more birdsong. But interestingly, the TED transcript gives fewer words than this: I’ll leave you with more birdsong. (Omitting words/fillers which don’t contribute hugely to meaning is quite common in such transcripts.)
TED 660: Julian Treasure, The 4 ways sound affects us, 21 July 2009
The length of this talk is 5 minutes 26 seconds, with an average speed of 207.42 words per minute (as measured by the TED Corpus Search Engine here).
Speed of the sentence
I measure the duration of Julian’s I’m going to leave you with with a little bit more birdsong at 1.7 seconds. If we take a Garden approach to counting words (counting standard contractions - I’m and gonna - as single words, and I’m gonna as three syllables) then:
- there are 10 words
- the speed is 5.9 words per second,
- which comes to 354 words per minute - 75% faster than the average speed of 207
- there are 13 syllables
- the speed is 7.7 syllables per second
- which comes to 460 syllables per minute
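These figures are simple ratios: count divided by duration. For readers who want to run the same calculation on their own recordings, here is a minimal sketch (my own illustration, using the counts and the 1.7-second duration given above):

```python
# Speech-rate arithmetic: counts divided by duration, scaled to per-minute.
def speech_rates(n_words, n_syllables, duration_s):
    """Return words/sec, words/min, syllables/sec, syllables/min."""
    wps = n_words / duration_s
    sps = n_syllables / duration_s
    return wps, wps * 60, sps, sps * 60

# Julian Treasure's sentence: 10 words, 13 syllables, 1.7 seconds.
wps, wpm, sps, spm = speech_rates(10, 13, 1.7)
print(f"{wps:.1f} wps, {wpm:.0f} wpm, {sps:.1f} sps, {spm:.0f} spm")
```

Depending on whether you round the per-second rate before or after multiplying by 60, the output lands within a rounding step of the figures quoted above.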
So this whole sentence is much faster than the talk as a whole. At this point I would like to compare the speed-in-syllables of this sentence with the speed-in-syllables of the recording as a whole, but I don’t have that figure available to me. However, what we can do is compare it to the notions of ‘fastest’ speech in the research literature. A commonly accepted measure of the ‘fastest’ speed for spontaneous speech is 5.4 sps (Field, 2019:63; Laver, 1994:541). This benchmark is some 30% lower than the speed of Julian Treasure’s sentence.
But if we focus on the verb group (I’m gonna leave you) and the word cluster (with a little bit more) we find that these go at 9.0 syllables per second, whereas birdsong goes at 4.3 syllables per second. So average speeds, usually minute-averages (words or syllables per 60 seconds), can hide a wide variation in speeds around the average - from off-the-scale fast (9.0+) to slow and stately.
For me, it is on these very short stretches of speech that decoding work needs to focus, precisely because they present such a speedy challenge to learners.
53 - The accommodation question
My talks and workshops always involve explanations and demonstrations of fast spontaneous speech, and the consequent variability of sound shapes that occur in this stream of speech. I always emphasise the fact that I am talking about the teaching and learning of listening, not pronunciation, and that the goals for mastery of the two skills are different.
A question, or point, that is often put to me in my talks and workshops is this: surely people accommodate to each other, adjusting the speed and clarity of their speech to the ability levels of the person, or people, listening to them. And (the question continues) since most interactions in English around the world are between ELF speakers, who have learned English as a second language, they will be better at accommodating than L1 English speakers. The underlying thinking behind such questions seems to me to be: ‘Surely we need not bother with this messy stuff that you are presenting - we can rely on accommodation, and on what we currently do in listening, to smooth the way.’
Hey ho!
This brings up loads of things.
[1] This accommodation view adopts (in my view) an over-optimistic view of (a) the range of circumstances in which language interactions take place, (b) the niceness of speakers of English - they are not always willing, or able, to be helpful - and (c) the fact that language interactions often take place under pressure, when people don’t have the time, or the will, to be nice.
[2] I concede that interactions can often take place where accommodation is possible - speakers facing each other, with at least one of them having the skill to moderate their speed and accent to match the abilities of the person listening to them. But equally often, language use occurs either (a) in non-reciprocal circumstances (radio, television, public announcements), where speakers do not know who their listeners are and cannot adjust to their levels of understanding, or (b) in pressured or unfamiliar situations, where the urgency of the task precludes the time required to accommodate.
[3] Perhaps sympathetic accommodation occurs more often between L2 speakers of English, but my experience (though limited) of witnessing such interactions, or at least of analysing recordings of people using ELF, is that the same fast speeds and streamlining effects, with their consequent multiple sound shapes, often occur. In short, ELF speech can also be messy!
I would like to believe in a world where all interactions in English involve cooperative speakers and hearers who accommodate to each other’s abilities and needs, and who therefore speak with the clarity and intelligibility that is modelled in textbooks and in the classroom. But I know I don’t live in such a world, and I believe that pretending we do shortchanges our learners. It gives us yet another excuse to ignore the realities of the speech signal.
52 - Answering questions 2 - That’s horrible!
Sometimes people don’t ask questions - they just exclaim. At a conference in Barcelona, I had just played one of my favourite sound files, one that I call ‘able-zoo’ which consists of a variety of voices, and a variety of versions of ‘be able to’. There was an immediate reaction from someone close to the stage: ‘That’s horrible’. Listen to it below - see if you agree.
This was one of a number of occasions when, on reflection, I was not happy with the response I gave. This response was to the effect that what you call ‘horrible’ is normal for the Jungle. My answer rode on the assumption that hearing the same words said in different circumstances, by different speakers, would be an enlightening - even delightful - experience which would hammer home the fact that the sound shapes of all words and phrases can vary enormously.
But actually, presenting examples of sound shapes in such a way runs quite counter to our normal experience of speech (even though they may be samples of normality). Our normal experience of listening is that we have time to acclimatise to the accents, speech patterns, and individual characteristics of each speaker, and we work with streams of speech which are intended to make sense. Our hearing/listening mechanisms are therefore not accustomed to encountering a sequence of non-sense-making language samples containing ‘the same words’. So when confronted with isolated samples of sound shapes joined together in a bewildering chain of ‘same words/different sounds’, the reaction ‘That’s horrible!’ seems quite reasonable.
So my short answer (short because I was anxious to progress through my workshop) ignored this dimension of the ‘That’s horrible!’ reaction.
I leap, I dance, I jump for joy when I have compiled such chains of sound shapes, forgetting that for people listening to them for the first time this can be a shocking challenge. A good starting point for a workshop, perhaps.
51 - Answering questions
Years ago, I was very timorous about the questions people would ask at the end of a talk, as I felt they would find a weak link or even a severe fault in my presentation. And their finding it would leave me gasping in panic, red-faced with embarrassment, my credibility shot to pieces, my cover blown.
But the older I get the more confident I get that I’ve got a good, helpful reply somewhere in my locker. I (now) love it when people ask questions. The questions-from-the-audience part of any talk or workshop is something that I now really enjoy (most of the time).
That doesn’t mean that I am happy with every answer I give - the ones that went wrong I still remember with acute embarrassment. But I love it most when someone asks a question from a point of view that I have not thought of, and I find that the questioner has knocked on a door I didn’t know existed, behind which a previously un-thought-of proposition emerges in reasonably good order to serve as an answer.
One such question came from a participant at a workshop I gave at the University of Birmingham last year. The question was ‘Which type of Jungle speech do you most dislike?’ (to understand this question, you need to know about the botanic metaphor of the Greenhouse, the Garden and the Jungle - see here).
It has never occurred to me to dislike any aspect of the Jungle (fast, messy, normal, everyday spontaneous speech) - the messier it gets, the happier I get: I am reassured that there is something to describe and make teachable/learnable.
So how did I answer this one? Well, I said that there is nothing about the Jungle that I dislike, but that if I have a dislike of anything, it is that ELT treats the norms of the Garden (careful, rule-governed, sentence-twinned, intelligible speech) as if they were the complete picture of what is true of all speech. In the Garden, connected speech rules hold sway. And in ELT, we have had only the rules of connected speech as our metalanguage to help us cope with naturally occurring speech.
Don’t get me wrong: these rules are a helpful first step in explaining what happens when word meets word in phrases and sentences in the genteel circumstances of written text read aloud. But the set of such rules that we operate with in ELT is too much oriented towards the tidy, rule-governed styles of speech: it cannot cope with the messiness of everyday spontaneous speech. We need to relish, describe, and teach the mess, so that our students can become familiar and comfortable with such speech, and so that the task of listening to and understanding fast normal everyday speech is made much easier than it currently is.
The next post will concern a reply with which I am not so happy.
More on relishing the messy here.
50 - day of school = dave school
I have long contended that ELT’s ‘rules of connected speech’ are inadequate to describe what happens in the stream of normal spontaneous speech (see here). They are inadequate, in other words, to help with the listening challenges that many if not most learners face when listening to ‘real stuff’.
My colleague Curt Ford has a lovely listening website here, which features a blog post that came about as a result of a discussion we had about one of his examples.
One of the rules of connected speech predicts that given the words ‘day of school’ the Greenhouse pronunciations of the two words |deɪ.ɒv| will become - in the Garden pronunciation - linked by a glide giving |deɪ.ʲɒv|. Have a listen and see what you think.
After we discussed this example, Curt and I agreed that the two words ‘day of’ become a single syllable sounding very close to ‘Dave’: we don’t get |deɪ.ʲɒv|, we get something close to |deɪv|. The two syllables have become one - in my terms, a ‘syllablend’ has occurred, resulting in a ‘sylldrop’ (cf. Cauldwell, 2018: 111-113).
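The contrast between what the rule predicts and what we actually heard can be spelled out in a toy sketch (entirely my own illustration; the transcription symbols are those used above):

```python
# Toy contrast between the Garden prediction (linking glide) and the
# Jungle observation (syllablend) for 'day of'. Illustrative only.
def garden_form(greenhouse):
    """Apply the textbook linking rule: at the word boundary (marked '.')
    after a syllable ending in a front close vowel, insert the glide 'ʲ'."""
    return greenhouse.replace(".", ".ʲ")

prediction = garden_form("deɪ.ɒv")   # what the connected-speech rule predicts
observed = "deɪv"                    # the soundshape actually heard

print(prediction)  # deɪ.ʲɒv - still two syllables, linked by a glide
print(observed)    # deɪv - the two syllables blended into one
```

The rule tidies the boundary but keeps both syllables; the recording deletes the boundary altogether, which is exactly what no rule-based derivation predicts.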
Listen to ‘day of’ as ‘dave’ in this extract (it occurs three times).
Curt has now put together a collection of similar examples here, with nice dictation exercises.
I believe it is important for teachers and materials writers to spend some time with their capacity to make meaning turned off, and use this time to attend to the nature of the sound substance. We should realise that soundshapes may occur which are in defiance of meaning and context. Thus we get ‘Dave school’ for ‘day of school’ in the sound substance, even though in context/meaning terms this results in nonsense.
Also - and this is very important - you may disagree that ‘day of’ has become ‘dave’ - and that’s fine. But I would implore you to allow the validity of alternative hearings of the sound substance. By this I mean that it is perfectly reasonable and acceptable for people to have different perceptions of what the sound substance sounds like to them. Particularly when ‘different people’ include your students: they may be far better than you at dissociating meaning from substance, and their perceptions may be closer to the realities of the sound substance than your own.
Back to Curt’s website - here’s another of his examples - have a listen. What sound shape do you hear? What do you think the words are that the speaker intended to say?
Now go to Curt’s website here and see if you are right.
Cauldwell, R.T. (2018). A Syllabus for Listening - Decoding. Birmingham: Speech in Action. Available here.
49 - Insights into Student Listening - one = wouldn’t
This is the third of three posts in which I discuss issues which arise from Beth Sheppard and Brian Butler’s wonderful research paper Insights into Student Listening from Paused Transcription (Sheppard & Butler 2017). They asked a total of 77 students to write down four-word phrases which had just occurred in a recording: the recording was paused in order that they could write down their versions of the four words.
One particularly interesting result that emerged from the study was that the word wouldn’t, in the four-word chunk that wouldn’t seem like, had no correct transcriptions: it was omitted by 35 of the 77 participants, and the other 42 transcribed it as one, was, will, would, or want. None of the participants transcribed it ‘correctly’.
Before we go any further, have a listen.
In the four-word phrase that wouldn’t seem like, the two syllables of wouldn’t have, to my ears, a monosyllabic soundshape |wʊn|. What do you think? The extract below consists of the soundshape of wouldn’t on its own.
You have to be careful how you listen. If you approach the recording with the certainty that this is the soundshape of the bisyllabic form of wouldn’t, you will be priming yourself to hear it as consisting of two syllables. If, however, you listen with naive ears - your starting position being that you are attending solely to the sound substance, not to meaning - I think you will agree that it is closer to monosyllabic than bisyllabic.
Personally, I don’t blame the 35 participants who did not transcribe (and presumably did not hear) these syllables. For two reasons: what actually occurs is a monosyllabic soundshape close to |wʊn| (as I have mentioned); and according to my measurements, this soundshape goes at 12.5 syllables per second, whereas the four-word phrase as a whole goes at 6.5 syllables per second. [NB In doing this calculation, I treated wouldn’t as two syllables]. Both speeds are extremely fast: the standard measure of fast speech is ca. 5.3 syllables per second (cf. Cauldwell, 2013: Chapter 7), and wouldn’t is going at more than double this speed.
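The speed figures here are the same ratio as before: syllables divided by duration. The durations in the sketch below are back-calculated from the speeds I report (about 0.16 seconds for wouldn’t, about 0.77 seconds for the phrase); they are illustrative, not fresh measurements:

```python
# Syllables-per-second: syllable count divided by duration in seconds.
def sps(n_syllables, duration_s):
    return n_syllables / duration_s

# 'wouldn't' counted as two syllables, as in the calculation above;
# the phrase 'that wouldn't seem like' then has five syllables in all.
print(round(sps(2, 0.16), 1))   # 12.5 sps for 'wouldn't'
print(round(sps(5, 0.77), 1))   # 6.5 sps for 'that wouldn't seem like'
```

The word-level figure is nearly double the phrase-level figure, which is itself above the ‘fastest speech’ benchmark - a reminder that minute-averages conceal extreme local bursts.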
But 13 of the participants transcribed it as one which again I think is a reasonable hearing: it is a representation of the actual soundshape |wʊn| which is more accurate - in terms of soundshapes - than the ‘correct’ representation given by the spelling wouldn’t with its associated citation-form soundshape of |wʊdᵊnt|.
As with the previous post on the word study, students’ perceptions can guide us expert listeners (teachers and authors) to hear the reality of the sound substance of speech.
We need to respect and value their perceptions. Not least because it is with these perceptions that we should start the process of teaching them about the variability of the sound substance.
Cauldwell, R.T. (2013). Phonology for Listening: Teaching the Stream of Speech. Birmingham: Speech in Action.
Cauldwell, R.T. (2018). A Syllabus for Listening - Decoding. Birmingham: Speech in Action.
Sheppard, B. & Butler, B. (2017). Insights into student listening from paused transcription. CATESOL Journal, 29.2, 81-107.
48 - Insights into Student Listening: study = stay
This is the second of three posts inspired by Beth Sheppard and Brian Butler’s wonderful paper ‘Insights into Student Listening … ‘ (Sheppard & Butler, 2017). It is a research paper, but it clearly demonstrates the value of an activity which - in my view - ought to be an essential component of every listening comprehension class.
This activity is the paused dictation (Field, 2008). In this post we will look at and (most importantly) listen to what some of Beth and Brian’s students made of the word study in the four-word phrase study that was done. The two syllables of study were transcribed by four students as a single syllable stay; six other students had monosyllabic transcriptions (stay, stayed, stains, still, stand, state) and sixteen did not transcribe it at all. (A total of 77 students were involved).
This is the recorded extract of the four-word phrase study that was done:
To my ears, on first hearing, the soundshape of study is clearly two syllables - all segments of the citation form are present: |stʌd.i|. So why would students hear it as stay? Maybe their perception dropped momentarily in the middle of the word; maybe they are good at hearing the beginnings and ends of words, but not the middles. Maybe, in other words, it is their fault that they do not hear the full word. And anyway, the word stay does not fit the contextual meaning - how could they be so stupid!?
They are not stupid at all. For two reasons: first, in doing such decoding exercises students very commonly work by setting considerations of meaning to one side, and wrestle with the sound substance in a meaning vacuum; second, the representation of the soundshape of study as stay turns out to be accurate.
Don’t believe me? Listen to the soundshape of study on its own:
Oh dear. Now that the word study is isolated like this, to my ears it has become a monosyllabic-like |stʌi| with the consonant |d| dropped. So give credit to those students who wrote monosyllabic representations of the soundshape as stay - they have succeeded in hearing the soundshape for what it is.
It happens not to match the speaker’s intended word, but it is a more accurate representation of the soundshape than the orthodox spelling study and its citation form soundshape |stʌd.i|.
Theirs is a perfectly reasonable hearing. So, learners have had more success than the expert listener in perceiving and reporting on what the sound substance contains. And this is such an important point, of which I write fairly extensively in A Syllabus for Listening - Decoding (Cauldwell, 2018, Chapter 4).
Expert listeners suffer from the blur-gap
We therefore need to respect the students’ perceptions, and work with perceptions such as these: our students may not understand what they are listening to, but they may be hearing the sound substance better than expert listeners do! And their perceptions can guide us blur-gapped teachers and authors to a true appreciation of the nature of the sound substance of speech.
By the way, the streamlining process which changes study into stay is a type of consonant death I term d-drop (cf. Cauldwell, 2018, Chapter 17) and it is not a rare occurrence.
Cauldwell, R.T. (2018). A Syllabus for Listening - Decoding. Birmingham: Speech in Action.
Field, J. (2008). Listening in the Language Classroom. Cambridge: Cambridge University Press.
Sheppard, B. & Butler, B. (2017). Insights into student listening from paused transcription. CATESOL Journal, 29.2, 81-107.
47 - Insights into Student Listening …
Beth Sheppard and Brian Butler of the University of Oregon published a paper last year (Sheppard & Butler, 2017 - full reference below) which interests me greatly, as it investigates learners’ ability to decode the stream of speech. This is the first of three posts where I take one of their findings, and then rant and riff about it.
Beth and Brian asked students to write down the last four words that they heard in paused transcription exercises (following Field’s 2008b suggestion). The methodology was carefully designed:
- students at two levels of proficiency (upper-level and mid-level);
- three types of recording including ‘an authentic university lecture available online’;
- 36 four-word phrases were chosen (twelve from each recording).
Learners listened to each of the three recordings, and during each of the twelve pauses they were asked to write down the last four words that they had heard. (For a more complete description of what they did, read the paper itself; the reference is given at the foot of this post.)
Briefly some overall statistics:
- 33% of all words were incorrectly transcribed …
- Upper-level students got 27% of all words incorrect
- Mid-level students got 46% of all words incorrect
- 24% of all content words were incorrectly transcribed and …
- 46% of function words were incorrectly transcribed
In this post I am going to focus on the last finding - that 46% of function words were incorrectly transcribed. This percentage is nearly double the figure for content words.
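To make the content/function split concrete, here is a minimal sketch of how one might score a paused-transcription response word by word. The function-word list, the align-by-position simplification, and the example attempt are all my own illustrative assumptions, not details from Beth and Brian’s paper:

```python
# Score a transcribed phrase against the reference, tallying content words
# and function words separately. Illustrative function-word list only.
FUNCTION_WORDS = {"that", "wouldn't", "was", "the", "a", "to", "of", "like"}

def score(reference, transcribed):
    """Return (content_correct, content_total, function_correct,
    function_total), aligning words naively by position."""
    ref = reference.lower().split()
    attempt = transcribed.lower().split()
    cc = ct = fc = ft = 0
    for i, word in enumerate(ref):
        hit = i < len(attempt) and attempt[i] == word
        if word in FUNCTION_WORDS:
            ft += 1
            fc += hit
        else:
            ct += 1
            cc += hit
    return cc, ct, fc, ft

# A hypothetical student attempt at a phrase discussed later in this series.
print(score("that wouldn't seem like", "that one seem like"))  # → (1, 1, 2, 3)
```

Positional alignment is a crude simplification (a real scorer would handle insertions and omissions); the point is only that the same response can be scored separately for the two word classes - which is what produces splits like the 24% versus 46% reported above.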
The function-word fallacy
It is a common view in ELT that learners do not need to be able to decode the function words, as they are not as ‘meaning-bearing’ as the content words. Beth and Brian explain the commonly expressed reason for ignoring function words:
… these words are often reduced in speech and also are usually less essential to understanding the overall message of an utterance (p. 89)
I am going to rant and riff about this statement, but before I do I should exclude Beth and Brian from criticism, because they immediately go on to write
…function words can have a significant effect on meaning
And they go on to state that mastery of hearing and understanding function words can free up processing space to work on the content words.
My rant
It is a complete disgrace, in English Language Teaching, that even relatively advanced learners cannot decode the most frequent words in the language - words that they encounter very early in their learning, and which they encounter frequently at every stage of learning.
This inability is a result of two failings in listening pedagogy: a ‘function-word fallacy’ - the false belief that the teacher of listening, and their learners, can ignore these important components of the stream of speech, and leap on the content words to make meaning; and the false belief that continuous practice in coping (the hope-to-cope listening comprehension method) is the only way to teach listening.
These failings are part of an avoidance-of-difficulty strategy which enables teachers to move on to other more tractable, teachable, matters in the classroom, and avoid the difficulty (which in all fairness, they have not been trained to deal with) of teaching students how to decode. (PS John Field also found that advanced learners could not decode function words, cf. Field, 2008a).
My riff
A number of points:
- The sound substance consists of a continuously varying sequence of content and function words. The mushiness of function words will mush-i-fy (‘en-mush’) the soundshapes of the content words, because function words and content words are damaged by collisions, and the resulting mess exists in a shared rhythmic and perceptual space.
- Therefore I believe it is better not to think of individual function words as the unit of learning/decoding, but rather to prefer word clusters as the unit of learning (cf. Cauldwell, 2018: 96; Carter & McCarthy, 2006: 828ff.). Word clusters are frequently occurring groups of words which consist of blends of function words and very common content words (…I think it’s a..it’s a bit of a…)
- Word clusters comprise a large proportion of the sound substance of the stream of speech.
- If learners want to learn the language of normal speech (rather than simply hope to cope), then they need to learn the variety of sound shapes that word clusters can have.
In the following two posts, I look at two particular findings from Beth and Brian’s paper: students’ perceptions of the word study, and their perceptions of the word wouldn’t.
References
Carter, R. & McCarthy, M. (2006). Cambridge Grammar of English: A Comprehensive Guide to Spoken Grammar and Usage. Cambridge: Cambridge University Press.
Cauldwell, R.T. (2018). A Syllabus for Listening - Decoding. Birmingham: Speech in Action.
Field, J. (2008a). Bricks or Mortar: Which Parts of the Input Does a Second Language Listener Rely on? TESOL Quarterly, 42/3, 411-432.
Field, J. (2008b). Listening in the Language Classroom. Cambridge: Cambridge University Press.
Sheppard, B. & Butler, B. (2017). Insights into Student Listening from Paused Transcription. CATESOL Journal, 29.2, 81-107.
46 - Lucy Pickering’s new book
It has been many years - very many - since a book was last written about Discourse Intonation, but Lucy Pickering has just published a beautifully clear account of its systems. It will, I’m sure, rapidly become a classic in teacher education.
I am biased, of course. I have been immersed in the Discourse Intonation framework since I first encountered it in the early 1980s, and all of my work has used the transcription system and principles of David Brazil’s work (on which there is a major section on my website here). An additional reason for my bias is that Lucy has given me a very complimentary acknowledgement:
Although I never had the opportunity to meet David, I have been able to “channel” him through Richard Cauldwell …
Jeepers, I didn’t know I had that skill. (Lucy also thanks her pet dogs).
Lucy’s book is in ten short chapters, each of which (apart from the last) usefully ends with sections entitled ‘Check your learning’ and ‘Activities’. While Lucy makes very occasional use of fancy words to prove her academic credentials (‘obfuscate’ and ‘lacunae’ are favourites of mine), the brevity of the chapters, and the clarity of the explanations make this an ideal book for teachers in training on a Master’s course, or other course designed for teachers who already have an initial qualification.
I particularly admire her balancing act of giving explanations of the meanings of intonation, while constantly acknowledging that the choices made by speakers are not rule-bound in the traditional sense (thus not the answer to ‘What is the correct way of saying this sentence?’).
New to me were her Chapters 7 and 8, which concern (respectively) Variation between Traditional and New Varieties of English, and Variation between Traditional Englishes and English as a Lingua Franca. These chapters alone make the book worth a serious read.
Beautifully clear, easy to understand, excellent decision-making about how much to include, and how much to leave to others to explain. A fantastic publication.
Pickering, L. (2018) Discourse Intonation: A Discourse-Pragmatic Approach to Teaching the Pronunciation of English. Michigan Teacher Training. Ann Arbor: University of Michigan Press. Available here.
45 - Gap-fill 03 - Able-evil
At the end of Listening Cherry 43, I included a recording which contained six instances of the words be able to; it is given again below. My focus in this post is on the penultimate sound shape - which contains just able on its own - where the word sounds (to my ears) close to the word evil.
This sound shape comes from a four-second extract (transcript below) containing ten words, in which able occurs on its own in speech unit 04. The speaker is Karam, from California, recorded in 2003.
01 || NOT || 4.0
[pause 0.316]
02 || Everybody || 5.1
03 || …is… || 4.3
[pause 0.232]
04 || Able || 4.2
05 || TO || 3.2
06 || TEACH someone else a SKILL || 5.1
Now listen to her able on its own - first at original speed, and then at 50% speed.
To my ears, it sounds close to evil with the diphthong |eɪ| being smoothed towards a version of its second element |ɪ| (or, you could argue, a raised version of its first element) and with the consonant |b| being blurred towards |v|.
So what do we do - what can we do - about this kind of dramatic difference between the citation form of able and the sound shape evil in the teaching of the sound substance?
The evil soundshape is not a predictable one. It is extremely unlikely that any work which starts with the citation form and proceeds to derive by rule a range of candidates would include evil. This is - possibly (but I doubt it) - a unique occurrence, a one-off. But it has occurred. So it could occur again. That might make it worth learning.
But what it is representative of is the fact that - in the Jungle - ‘anything can happen for no apparent reason’. So in teaching listening/decoding we need to think of words as extremely malleable flexiforms, streamlike phenomena which can be coaxed or bullied into a large range of soundshapes.
A few principles therefore:
- Accept what happens (because anything can happen)
- Don’t turn one event into a rule
- Give learners practice in handling a variety of soundshapes
- Encourage them to bend soundshapes in unlikely/extraordinary ways
In the table below, I demonstrate a possible range of sound shapes for to be able to. These are scripted speech units, each of which has the same vowel in both prominent syllables - so in 01, for example, the long vowel |iː| occurs in both STEVE (column 2) and LEAD (column 4). In column 3, the word able is flavoured with the same long vowel. The audio contains three instances of each speech unit: a fast-speed original, then a slowed-down version (50%), then the fast-speed original once more. Some of them may sound very unlikely, even preposterous. But what could be more preposterous than the version that Karam actually gives us above?
In teaching the sound substance of language - helping students learn to decode - we need to practise the preposterous.
|    | 1   | 2     | 3                          | 4     | 5    |
|----|-----|-------|----------------------------|-------|------|
| 01 | so  | STEVE | was able to / eebulltuh    | LEAD  | them |
| 02 | and | BELL  | -a was able to / ebbluh    | TELL  | them |
| 03 | in  | TIME  | i was able to / eyebulluh  | FIND  |      |
| 04 | so  | JOY   | was able to / oibulluh em  | PLOY  | them |
| 05 | and | JUL   | -ia was able to / oobulluh | FOOL  | them |
| 06 | so  | BERT  | was able / erbulluh        | SERVE | them |





