in touch with real speech

Speech in Action

38 – Lifetime achievement award – Ron Carter

At the British Council’s annual awards ceremony, the ELTons, Professor Ron Carter was given the lifetime achievement award. Unfortunately, because of illness, he was not able to attend in person, but his friend and longtime colleague Chris Kennedy read his acceptance speech (beautifully). Below is an extract from the acceptance speech, which I find particularly important … ‘don’t be dazzled, there is a lot we don’t know’.

Each year we see, as witnessed at the ELTons, astonishing levels of pedagogic and technical innovation in all aspects of course materials. The field we are in is exciting. My main hope for the future is that we do also continue to keep a precise description of the English language in our sights. It is easy to think we know a lot about the English language, and of course we do. But there is a risk that dazzled, and rightly so, by ever more creative technologies, that we may take for granted our knowledge of the English language. For there is a lot we don’t know, especially about the spoken language, about language beyond the level of the sentence, and about its newest forms in e-communication. We all need to continue learning about the English language in all its globally relevant forms.

You can see the full acceptance speech here, starting about 01:25:45.

37 – Earworms 2 – Would you like a … July … liar?

Last summer (2016) I was invited to IH London to give a two-hour seminar to teachers of English from the Basque country. They were amazingly enthusiastic and very receptive. I had given (as I usually do) a talk/workshop on how to prepare students for their listening encounters with normal everyday speech, focussing on examples of fast speech.

At the end of the session, one of the teachers came up to me and told me of her experience shopping at a local supermarket, and more particularly, of a question that she was asked at the checkout. The question was ‘Would you like a receipt with that?’ but it was spoken so fast that the teacher had – at first – no idea what had been said. She was utterly bamboozled. Quite how she got from being bamboozled to knowing what the words were (I regret to say) I did not find out.

But I thought I might make another ear-worm out of it, for the purposes of demonstration at the latest IATEFL conference in Glasgow. At the conference I justified the use of ear-worms such as this by arguing that they will stick annoyingly (or amusingly) in the heads of learners, and thereby get them accustomed to short stretches of rhythmic sound substance. My hope and belief is that such repetitions would accustom their short-term memory and their mechanisms of speech perception to better decode the stream of speech of the language they were learning – in this case English. So in the table below you can hear the Greenhouse, Garden and Jungle versions of this question. The Greenhouse and Garden versions contain just one run-through, but in the Jungle versions you will hear the question twice, interrupted by July and liar.

Greenhouse: Would you like a receipt with that |wʊd juː laɪk ə risiːt wɪð ðæt|
Garden: Wuhju lykuh receipt withthat |wʊʤjuː laɪkə risiːt wɪðæt|
Jungle – July: Wuh julyuhareseewithat? |wʊjuːlaɪərisiːʔwɪðæt|
Jungle – liar: Wuhyouliareseewithat? |wʊjuːlaɪərisiːʔwɪðæt|

The reason that we have a July version of the Jungle version is that part of the sound substance (the end of would, the whole of you and the beginning of like) is hearable as July. And the reason we have a liar version is that part of the sound substance (like – with the |k| dropped, thus giving us lie – and the indefinite article a) is hearable as liar.

Those four parts were stitched together to give us the following:

I am not teaching at the moment, so I don’t know myself whether these ear-worms are as useful as I like to think they are. They certainly got my audience of teachers very much amused, but that is not (of course!) proof of their usefulness.



36 – Earworms

One of the teacher-trainers that I most admire is Adrian Underhill. I like the way he encourages learners and teacher trainees to explore. He encourages people to mouth (maʊð) a full range of sounds, not simply the ‘correct’ ones – and he encourages people to dance around, using their whole body to get the feel of sounds.

And I think those of us who teach decoding in our listening classes have something to learn from his methods – and this brings me to the notion of earworms. This is an idea that I first heard about in a presentation by Annie McDonald, of which you can find more details here.

An earworm is an annoying extract from a song that keeps on replaying in your head long after you have heard the song. For me, this song by Hank Williams creates an earworm: ‘Why don’t you love me like you used to do? How come you treat me like a worn-out shoe? My hair’s still curly, and my eyes are still blue, why don’t you love me like you used to do?’

I wonder (I’ve never tried it, so I don’t know if it will work) if we could attempt to create and plant earworm-like stretches of speech in our learners’ heads, and encourage them to cherish them (rather than banish them) and repeat them over and over as they walk, jog, run, or exercise in the gym.

I’ve created one, which you can hear below, which uses the words ‘where there were’ (these words also feature on this page) as part of a follow-up to a Jungle Listening lesson (no. 10 of a pilot publication you can find here).

The idea of this particular ear worm is to provide as many different soundshapes of the word cluster ‘where there were’ as it was possible for me to do (yep, the voice is me).

The idea is to explore a sample of the range of ways in which these words might be said and heard in a world (the real world) where people have accents and the words can be said in an infinite number of ways which cannot be constrained by rules. An earworm such as this is a form of vocal gymnastics which might go some way towards fulfilling an important requirement of any decoding work identified by John Field:

‘… to encounter the same words in a wide range of contexts and voices … [by assembling] … examples of the same groups of words uttered in different circumstances and at different speeds by a number of different speakers’ (Field, 2008: 166)

My hope is that by creating earworms, we can produce the effect of hearing words in different voices, accents, and speeds that learners can carry around in their heads, and repeat. The desired effect would be to get them accustomed to handling real-like English at fast speeds, expanding the capacity of their short-term memory to hold stretches of the sound substance of English in their minds.

35 – Travelling without a map

The way we teach listening is like insisting that travellers arrive at a destination via several stops without giving them the means of travelling.

It’s like asking people to move from point to point with a map that has geographical features (hills, valleys, rivers) but with the roads, tracks and trails missing. You may want them to be able to identify Mount Big Meaning, but not allow for the fact that they may find themselves in a tunnel when the moment for identification comes. You may want them to identify Castle Tikbox, without allowing for the fact that they are wholly focussed on crossing a wild river – jumping from stone to stone – without losing their balance.

We need to describe the whole journey, teach the means, teach the patterns of the stream of speech: the roads, the trails, the footpaths. We need to focus far more on the relationship between sound substance and its interpretation (decoding, meaning building).

What we currently do is pretend our learners can travel meaningfully, without giving them the means of navigating through the stream of the sound substance. We keep them in ignorance of the sound substance (because we teachers are ourselves ignorant) and therefore deny learners the means of learning how to navigate.

Listening Cherry 34 – Aping the goal

Imagine a concert pianist, on stage playing a virtuoso sonata by Liszt. Wonderful patterns of notes, a beautiful and moving (in all senses) soundscape of colours, major and minor keys, cascades, soft then loud, etc. etc. This is a public display of expertise which is a wonder to behold. Expert behaviour, hard earned and hard learned over a long period of time.

But what if someone claiming to be a teacher decided to teach pupils such expert behaviour by focussing on the visible, observable aspects of performance? Their pupils would be encouraged to sit at a piano and splash away at the keys, imitating the observable rapid movement of the fingers, the coordination of the hands, and the foot movements on the pedals. All without learning the means of playing the piano: disciplined, controlled slow movements, simple scales, starting with easy pieces. The result may look enchanting, an accurate depiction of what it takes to be a famous pianist – but the sound would be awful. All without paying attention to the sound.

This would be a case of aping the goal (where ‘ape’ means ‘to imitate someone or something, especially in an absurd or unthinking way’) at the cost of dealing with the major issue of sound.

In many approaches to listening comprehension exercises we ape the behaviour that is the goal, while minimising the amount of detailed instruction that provides the means towards this goal – increasing students’ mastery of the sound substance. We are thus goal-obsessed, and we starve our learners of the means of achieving the goal.

We expect learners to role-play native speaker/expert listener behaviour in listening comprehension lessons by catching meanings. But we don’t teach them how to perceive words in the sound substance of speech.

We get them to ape the goal behaviour (the describable elements of it) without giving them the means (the dimension of sound) whereby the full goal behaviour can be achieved.

The belief seems to be that through undergoing repeated listening comprehension exercises of this type, learners will eventually learn how to perceive words in the sound substance. It is as if we are leaving the undescribable (or what we believe to be the undescribable) to work its magic on the learners’ perception unconsciously while we focus attention on what we can describe.

Image from here. Oh, and Igor Levit, who is pictured, is a wonderful pianist who produces the most gorgeous sounds.

Listening Cherry 33 – Selective reality

We like to think that making listening as real-life as possible is the best way to teach listening. But our use of ‘reality’ in the design of lesson materials is selective. We steer our lessons as close as we dare to real-life listening, and we focus on extracting meaning, but we remain in denial about the realities of the sound substance.

We keep the number of listenings as close as possible to one, because – the argument goes – in real life we only get one chance. We make the learners listen as though they are present and active at the interaction that has been recorded. And we fill their minds with contextual information about the people, the situation, the purpose and predictions about what will be said. We plug learners into a reality role. We plug them into a mindset and situation.

The problem is that the more we steer closer to these realities at the level of meaning, the less time there is to focus on the realities of the sound substance of speech – the normal messiness of everyday speech. The urge to mimic reality leads us to forget that the classroom is a place for teaching and learning, and that (pretty much) anything goes as long as learning is effective.

But it’s nobody’s fault. ELT simply does not (yet) have a model of speech which encompasses the messiness and wildness of everyday speech (as I have said frequently in this blog, the ‘rules of connected speech’ are wholly inadequate). The only model of speech that exists in ELT is the Careful Speech Model – optimised for clear, intelligible pronunciation.

In the absence of a model of spontaneous speech (optimised for listening), the requirement to mimic reality is convenient for us, because it takes up a lot of time and it enables us to feel we are doing a good job as teachers and materials writers. Because ‘that is the way good teachers teach listening’ – we conform to the expected behaviour. The trouble is we are ignoring the realities of everyday normal speech.

We are in denial about the realities of everyday speech. To adapt the words of a famous Calvin and Hobbes cartoon: ‘It’s not denial, we’re just very selective about the reality we accept’.

Listening Cherry 32 – The black box

We still behave, as a profession, as if the secrets of learning to listen are hidden inside a black box whose mechanisms are unknowable and unteachable. Two things inside the black box seem particularly unknowable and unteachable: (a) the messy, unruly sound substance of normal everyday speech and (b) knowledge of what our students make of this sound substance. Because we ‘don’t know’ what goes on in this black box we focus almost all our efforts on what happens before and after the black box. We focus on the input and the output.

We strive very hard to make the input authentic, useful and appropriate – matching topics, vocabulary, context, and characters in a way that will motivate learners and facilitate transition to work on other parts of the syllabus.

We also strive hard to make the output appropriate: making the tasks that the students have to do while/after listening valid acts of meaning and communication.

We put extraordinary focus on the input and output, relying on the power of contextual meaning and contextual appropriacy to skip over the problems and challenges of the black box processes. We seem content to let the black box continue to be impenetrable and intractable.

But hang on, is that fair? Don’t we give students strategies to take with them while they are engaged inside the black box? Indeed we do. Before they enter the black box, we get them into a good learner frame of mind (focussed on the task, feeling good about themselves as learners) and we exhort them to apply good behaviours (don’t strive to hear every word, listen for the stresses, build meanings, re-evaluate and reconsider). We then exhort them to apply these behaviours when they go through the black box. And after they have been through the black box we focus on their performance of these good behaviours.

But this is still about input and output – it’s like giving people warm clothes and motivating talks before they go for a walk through an unlit mine – they have to navigate without a light, at speed, and afterwards report what they sensed in the mine (they couldn’t see anything, remember). And they have to report on the state of their clothes, whether they stayed warm, and whether they still felt good about themselves after the walk. So the preparation before and the report after are more concerned with the mine walkers’ self-management strategies than with the nature of the mine. So it is with listening classes. We are expert at the before and after, but largely inexpert in our knowledge of the sound substance of speech. We do the before and afters very well – but we avoid the sound substance, with our focus on the peripheral (worthy, useful, but still peripheral) rather than on the central issues.

This idea of listening as a black box comes from Michael Rost, who wrote fifteen years ago:

Listening is still often considered a mysterious “black box” for which the best approach seems to be ‘more practice’. Much work needs to be done to modernise the teaching of listening. (Rost, 2001: 13)

Personally, I am wholly against the idea that the best approach to listening is ‘more practice’. If you are interested in modernising the teaching of listening, keep following this blog. You can also attend a workshop I am giving in April 2017 in London at the London Language Lab here. You can buy my Phonology for Listening: Teaching the Stream of Speech here, or wait for my Syllabus for Listening: Bottom-up approach – due late 2017.

Rost, M. (2001). Listening. In R. Carter & D. Nunan (Eds.), The Cambridge Guide to Teaching English to Speakers of Other Languages. Cambridge: Cambridge University Press.

Listening Cherry 31 – Thinking warm

One of the problems with our current approach to teaching listening is that we can overdo/dose on the preparatory and post-listening activities. And we thereby run the risk of stealing time away from direct encounters with the sound substance of speech which is contained in the recordings.

It is like spending most of the time of a swimming lesson outside the pool, having long preparation and post-swim talks which deal with:

  • Security of belongings
  • Lifeguards and first aid
  • Being safe – no jumping
  • Following health procedures – foot bath, hair wash before entering the pool
  • Warming up activities
  • [Swim]
  • Showering
  • Drying
  • Dressing
  • Feedback
  • Filling in evaluation forms for the pool administration

And rather than teaching them to swim, we give them things to think about while in the water which will make them good controllers of their own metabolism, as they move from the warmth of their clothes, to the cold of the pool, and back again.

Yes, it will feel cold, but how are you feeling at the moment in your everyday clothes? Warm, good. So while you are swimming, I want you to remember how you feel right now, in warm clothes. I want you to ‘think warm’ throughout the whole swim. You will feel a whole lot better about swimming when you think warm – you almost won’t notice the cold.

Listening Cherry 30 – Waterfall listening

Some listening lessons are like standing under a waterfall – you centre the student under the main flow of the water so that it is directed at the centre of their head. Sometimes the flow of water is a gentle trickle and they wonder what the value of standing there is. Suddenly torrents hit them hard on the head and cascade down over the shoulders and become a force under which it is difficult to stand. The student moves to one side and looks up as if to reprimand the waterfall and is surprised by a new, differently-angled cascade that catches them full in the face. They take in mouthfuls of water, and are blinded by dollops of water catching them in the eyes, which they have to rub to clear. In doing so they lose balance and stagger, and a new harder cascade catches the top of their swimming costume, and hands come off the eyes onto the swimming costume to prevent it slipping. The student is flustered, embarrassed, temporarily blinded, coughing and spluttering water.

And the teacher asks ‘Did you see the way the sunlight caught the stream of water and made a rainbow out of the fine spray?’

This was, of course, a strategy-free lesson.

Listening Cherry 29 – Two substances

There is one major prerequisite for becoming an effective user of a language. (Being a prerequisite means that it is an essential requirement before you can start doing anything meaningful.) This prerequisite is ‘substance mastery’. By this I mean mastery of the substances of both writing and speech, the ability to form (in writing and speaking), and to perceive (in reading and listening) the words that you and other people write or say. Substance mastery is that essential something which precedes the tasks of sharing and understanding meanings.*

In Listening Cherry 28, I used the terms sight shapes and sound shapes to refer to the different physical forms (marks on a surface, streams of sound) that we employ to create and understand language. In my work on A Syllabus for Listening (forthcoming 2017) I find that I need both these terms and the terms sight substance and sound substance to refer respectively to (a) writing, or graphic matter in general and (b) the stream of speech, the acoustic matter, of language.

In A Syllabus for Listening my focus will be helping students learn to decode the sound substance of language, so that they can perceive the words intended and uttered by the speaker, and thence proceed to build and understand meanings. Although the end goal of learning to listen to language is understanding, my focus will be on the means of getting to that goal.

This focus is needed, because one of the problems with the current orthodoxy around the teaching of listening is that we practise goal behaviour, rather than take students step by step from their starting point as learner-listeners through a programme of learning which will gradually equip them with the knowledge and skills to enable them to become increasingly expert listeners. A Syllabus for Listening will be a book which presents the knowledge and skills that can be used to help them on their learning journey to become efficient listeners.

‘Sight substance’ refers to the visual, graphic shape of words, phrases, sentences and paragraphs that are perceived by the eyes. The sight substance exists on paper, computer screens, boards and walls – anywhere where print can be presented to the eye. It hangs around, it doesn’t disappear, it can be read, inspected and returned to.

‘Sound substance’ refers to the auditory, acoustic shapes of words, word clusters, speech units and longer stretches of speech that are perceived by the ears. And in contrast to the sight substance, the sound substance does not have a stable existence: it happens and it is gone. It happens at speeds that the listener cannot usually control. It passes through short-term memory and is continually replaced by the sound substance which follows. Unless it has been recorded, it cannot be inspected at leisure, it cannot be re-found.

In the sight substance, word recognition is generally very easy, but in the sound substance it is much less easy. Crucially, if there is a decoding difficulty in the sight substance, the substance stays in sight so that you can devote as much time as you like to working out which word it is. So even if you are having to learn a new script (Arabic, Chinese, Greek, Korean, the IPA) the script stays still, in sight, so that you can study it.

However, in the sound substance of speech, words occur as a continuous blur of sound, any section of which has only a momentary existence: occurring and disappearing, being replaced by more speech. It is never in sight; and out of sight, it is very quickly out of mind.

As native listeners we easily perceive words in the sound substance. In fact, with native listening we seem not to require precise perception – which is good, because what occurs in the stream of speech is rarely precise. The perceptual skills are a prerequisite, but it often seems that once we have acquired these prerequisite skills, we can operate at the level of meaning without having to give attention to the level of decoding.

But in a language we are learning, when we are not yet able to operate at the level of meaning, we need to give greater amounts of attention to the work of perception (bottom-up decoding) of the stream of speech. And language teaching is deficient in this area.

If we had similar perceptual difficulties with the sight substance, the written language, we would do something about it. We would have to, because learners could point to things and ask ‘What is that? Why is it shaped like that?’

And the thing they are asking about stays on the page. If you don’t help them, they can walk with the book to the Director of Studies and point to the page and complain that you cannot explain something they want to know, and that you should be dismissed, because you know nothing about the language.

And actually, as a profession, ELT is guilty of knowing close to nothing of the true nature of the sound substance of normal everyday speech. We think we know about it, but what we actually know is a sight substance representation – the misrepresentations of printed orthography.

*This is not necessarily true of L1 language use between expert speaker/listeners, where there is a much more complex relationship between perception and understanding.


