in touch with real speech

Speech in Action

Listening Cherry 36 – Earworms

One of the teacher-trainers that I most admire is Adrian Underhill. I like the way he encourages learners and teacher trainees to explore. He encourages people to mouth (maʊð) a full range of sounds, not simply the ‘correct’ ones – and he encourages people to dance around, using their whole body to get the feel of sounds.

And I think those of us who teach decoding in our listening classes have something to learn from his methods – and this brings me to the notion of earworms. This is an idea that I first heard about in a presentation by Annie McDonald, of which you can find more details here.

An earworm is an annoying extract from a song that keeps on re-playing in your head long after you have heard the song. For me, this song by Hank Williams creates an earworm: ‘Why don’t you love me like you used to do? How come you treat me like a worn-out shoe? My hair’s still curly, and my eyes are still blue, why don’t you love me like you used to do?’

I wonder (I’ve never tried it, so I don’t know if it will work) if we could attempt to create and plant earworm-like stretches of speech in our learners’ heads, and encourage them to cherish them (rather than banish them) and repeat them over and over as they walk, jog, run, or exercise in the gym.

I’ve created one, which you can hear below, which uses the words ‘where there were’ (these words also feature on this page) as part of a follow-up to a Jungle Listening lesson (no. 10 of a pilot publication you can find here).

The idea of this particular earworm is to provide as many different soundshapes of the word cluster ‘where there were’ as it was possible for me to do (yep, the voice is me).

The idea is to explore a sample of the range of ways in which these words might be said and heard in a world (the real world) where people have accents and the words can be said in an infinite number of ways which cannot be constrained by rules. An earworm such as this is a form of vocal gymnastics which might go some way towards fulfilling an important requirement of any decoding work identified by John Field:

‘… to encounter the same words in a wide range of contexts and voices … [by assembling] … examples of the same groups of words uttered in different circumstances and at different speeds by a number of different speakers’ (Field, 2008: 166)

My hope is that, by creating earworms, we can produce the effect of hearing words in different voices, accents, and speeds that learners can carry around in their heads, and repeat. The desired effect would be to get them accustomed to handling real-like English at fast speeds, expanding the capacity of their short-term memory to hold stretches of the sound substance of English in their minds.

Listening Cherry 35 – Travelling without a map

The way we teach listening is like insisting that travellers arrive at a destination via several stops without giving them the means of travelling.

It’s like asking people to move from point to point with a map that has geographical features (hills, valleys, rivers) but with the roads, tracks and trails missing. You may want them to be able to identify Mount Big Meaning, but not allow for the fact that they may find themselves in a tunnel when the moment for identification comes. You may want them to identify Castle Tikbox, without allowing for the fact that they are wholly focussed on crossing a wild river – jumping from stone to stone – without losing their balance.

We need to describe the whole journey, teach the means, teach the patterns of the stream of speech: the roads, the trails, the footpaths. We need to focus far more on the relationship between sound substance and its interpretation (decoding, meaning building).

What we currently do is pretend our learners can travel meaningfully, without giving them the means of navigating through the stream of the sound substance. We keep them in ignorance of the sound substance (because we teachers are ourselves ignorant) and therefore deny learners the means of learning how to navigate.

Listening Cherry 34 – Aping the goal

Imagine a concert pianist, on stage playing a virtuoso sonata by Liszt. Wonderful patterns of notes, a beautiful and moving (in all senses) soundscape of colours, major and minor keys, cascades, soft then loud, etc. etc. This is a public display of expertise which is a wonder to behold. Expert behaviour, hard earned and hard learned over a long period of time.

But what if someone claiming to be a teacher decided to teach pupils such expert behaviour by focussing on the visible, observable aspects of performance? Their pupils would be encouraged to sit at a piano and splash away at the keys, imitating the observable rapid movement of the fingers, the coordination of the hands, and the foot movements on the pedals. All without learning the means of playing the piano: disciplined, controlled, slow movements, simple scales, starting with easy pieces. The result may look enchanting, an accurate depiction of what it takes to be a famous pianist – but the sound would be awful. All without paying attention to the sound.

This would be a case of aping the goal (where ‘ape’ means ‘to imitate someone or something, especially in an absurd or unthinking way’) at the cost of dealing with the major issue of sound.

In many approaches to listening comprehension exercises we ape the behaviour that is the goal, while minimising the amount of detailed instruction that provides the means towards this goal – increasing students’ mastery of the sound substance. We are thus goal-obsessed, and we starve our learners of the means of achieving the goal.

We expect learners to role-play native speaker/expert listener behaviour in listening comprehension lessons by catching meanings. But we don’t teach them how to perceive words in the sound substance of speech.

We get them to ape the goal behaviour (the describable elements of it) without giving them the means (the dimension of sound) whereby the full goal behaviour can be achieved.

The belief seems to be that through undergoing repeated listening comprehension exercises of this type, learners will eventually learn how to perceive words in the sound substance. It is as if we are leaving the undescribable (or what we believe to be the undescribable) to work its magic on the learners’ perception unconsciously while we focus attention on what we can describe.

Image from here. Oh, and Igor Levit, who is pictured, is a wonderful pianist who produces the most gorgeous sounds.

Listening Cherry 33 – Selective reality

We like to think that making listening as real-life as possible is the best way to teach listening. But our use of ‘reality’ in the design of lesson materials is selective. We steer our lessons as close as we dare to real-life listening, and we focus on extracting meaning, but we remain in denial about the realities of the sound substance.

We keep the number of listenings as close as possible to one, because – the argument goes – in real life we only get one chance. We make the learners listen as though they are present and active at the interaction that has been recorded. And we fill their minds with contextual information about the people, the situation, the purpose and predictions about what will be said. We plug learners into a reality role. We plug them into a mindset and situation.

The problem is that the more we steer closer to these realities at the level of meaning, the less time there is to focus on the realities of the sound substance of speech – the normal messiness of everyday speech. The urge to mimic reality leads us to forget that the classroom is a place for teaching and learning, and that (pretty much) anything goes as long as learning is effective.

But it’s nobody’s fault. ELT simply does not (yet) have a model of speech which encompasses the messiness and wildness of everyday speech (as I have said frequently in this blog, the ‘rules of connected speech’ are wholly inadequate). The only model of speech that exists in ELT is the Careful Speech Model – optimised for clear, intelligible pronunciation.

In the absence of a model of spontaneous speech (optimised for listening), the requirement to mimic reality is convenient for us, because it takes up a lot of time and it enables us to feel we are doing a good job as teachers and materials writers. Because ‘that is the way good teachers teach listening’ – we conform to the expected behaviour. The trouble is we are ignoring the realities of everyday normal speech.

We are in denial about the realities of everyday speech. To adapt the words of a famous Calvin and Hobbes cartoon: ‘It’s not denial, we’re just very selective about the reality we accept’.

Listening Cherry 32 – The black box

We still behave, as a profession, as if the secrets of learning to listen are hidden inside a black box whose mechanisms are unknowable and unteachable. Two things inside the black box seem particularly unknowable and unteachable: (a) the messy, unruly sound substance of normal everyday speech and (b) knowledge of what our students make of this sound substance. Because we ‘don’t know’ what goes on in this black box we focus almost all our efforts on what happens before and after the black box. We focus on the input and the output.

We strive very hard to make the input authentic, useful and appropriate – matching topics, vocabulary, context, and characters in a way that will motivate learners and facilitate transition to work on other parts of the syllabus.

We also strive hard to make the output appropriate: making the tasks that the students have to do while/after listening valid acts of meaning and communication.

We put extraordinary focus on the input and output, relying on the power of contextual meaning and contextual appropriacy to skip over the problems and challenges of the black box processes. We seem content to let the black box continue to be impenetrable and intractable.

But hang on, is that fair? Don’t we give students strategies to take with them while they are engaged inside the black box? Indeed we do. Before they enter the black box, we get them into a good learner frame of mind (focussed on the task, feeling good about themselves as learners) and we exhort them to apply good behaviours (don’t strive to hear every word, listen for the stresses, build meanings, re-evaluate and reconsider). We then exhort them to apply these behaviours when they go through the black box. And after they have been through the black box we focus on their performance of these good behaviours.

But this is still about input and output – it’s like giving people warm clothes and motivating talks before they go for a walk through an unlit mine – they have to navigate without a light, at speed, and afterwards report what they sensed in the mine (they couldn’t see anything, remember). And they have to report on the state of their clothes, whether they stayed warm, and whether they still felt good about themselves after the walk. So the preparation before and the report after are more concerned with the mine walkers’ self-management strategies than with the nature of the mine. So it is with listening classes. We are expert at the before and after, but largely inexpert in our knowledge of the sound substance of speech. We do the before and afters very well – but we avoid the sound substance, with our focus on the peripheral (worthy, useful, but still peripheral) rather than on the central issues.

This idea of listening as a black box comes from Michael Rost, who wrote fifteen years ago:

Listening is still often considered a mysterious “black box” for which the best approach seems to be ‘more practice’. Much work needs to be done to modernise the teaching of listening. (Rost, 2001: 13)

Personally, I am wholly against the idea that the best approach to listening is ‘more practice’. If you are interested in modernising the teaching of listening, keep following this blog. You can also attend a workshop I am giving in April 2017 in London at the London Language Lab here. You can buy my Phonology for Listening: Teaching the Stream of Speech here, or wait for my Syllabus for Listening: Bottom-up approach – due late 2017.

Rost, M. (2001). Listening. In R. Carter & D. Nunan (Eds.), The Cambridge Guide to Teaching English to Speakers of Other Languages. Cambridge: Cambridge University Press.

Listening Cherry 31 – Thinking warm

One of the problems with our current approach to teaching listening is that we can overdo/dose on the preparatory and post-listening activities. And we thereby run the risk of stealing time away from direct encounters with the sound substance of speech which is contained in the recordings.

It is like spending most of the time of a swimming lesson outside the pool, having long preparation and post-swim talks which deal with:

  • Security of belongings
  • Lifeguards and first aid
  • Being safe – no jumping
  • Following health procedures – foot bath, hair wash before entering the pool
  • Warming up activities
  • [Swim]
  • Showering
  • Drying
  • Dressing
  • Feedback
  • Filling in evaluation forms for the pool administration

And rather than teaching them to swim, we give them things to think about while in the water which will make them good controllers of their own metabolism, as they move from the warmth of their clothes, to the cold of the pool, and back again.

Yes, it will feel cold, but how are you feeling at the moment in your everyday clothes? Warm, good. So while you are swimming, I want you to remember how you feel right now, in warm clothes. I want you to ‘think warm’ throughout the whole swim. You will feel a whole lot better about swimming when you think warm – you almost won’t notice the cold.

Listening Cherry 30 – Waterfall listening

Some listening lessons are like standing under a waterfall – you centre the student under the main flow of the water so that it is directed at the centre of their head. Sometimes the flow of water is a gentle trickle and they wonder what the value of standing there is. Suddenly torrents hit them hard on the head, cascade down over the shoulders and become a force under which it is difficult to stand. The student moves to one side and looks up as if to reprimand the waterfall and is surprised by a new, differently-angled cascade that catches them full in the face. They take in mouthfuls of water, and are blinded by dollops of water catching them in the eyes, which they have to rub clear. In doing so they lose balance and stagger, and a new harder cascade catches the top of their swimming costume, and hands come off the eyes onto the swimming costume to prevent it slipping. The student is flustered, embarrassed, temporarily blinded, coughing and spluttering water.

And the teacher asks ‘Did you see the way the sunlight caught the stream of water and made a rainbow out of the fine spray?’

This was, of course, a strategy-free lesson.

Listening Cherry 29 – Two substances

There is one major prerequisite for becoming an effective user of a language. (Being a prerequisite means that it is an essential requirement before you can start doing anything meaningful.) This prerequisite is ‘substance mastery’. By this I mean mastery of the substances of both writing and speech, the ability to form (in writing and speaking), and to perceive (in reading and listening) the words that you and other people write or say. Substance mastery is that essential something which precedes the tasks of sharing and understanding meanings.*

In Listening Cherry 28, I used the terms sight shapes and sound shapes to refer to the different physical forms (marks on a surface, streams of sound) that we employ to create and understand language. In my work on A Syllabus for Listening (forthcoming 2017) I find that I need both these terms and the terms sight substance and sound substance to refer respectively to (a) writing, or graphic matter in general and (b) the stream of speech, the acoustic matter, of language.

In A Syllabus for Listening my focus will be helping students learn to decode the sound substance of language, so that they can perceive the words intended and uttered by the speaker, and thence proceed to build and understand meanings. Although the end goal of learning to listen to language is understanding, my focus will be on the means of getting to that goal.

This focus is needed, because one of the problems with the current orthodoxy around the teaching of listening is that we practise goal behaviour, rather than take students step by step from their starting point as learner-listeners through a programme of learning which will gradually equip them with the knowledge and skills to enable them to become increasingly expert listeners. A Syllabus for Listening will be a book which presents the knowledge and skills that can be used to help them on their learning journey to become efficient listeners.

‘Sight substance’ refers to the visual, graphic shape of words, phrases, sentences and paragraphs that are perceived by the eyes. The sight substance exists on paper, computer screens, boards and walls – anywhere where print can be presented to the eye. It hangs around, it doesn’t disappear, it can be read, inspected and returned to.

‘Sound substance’ refers to the auditory, acoustic shapes of words, word clusters, speech units and longer stretches of speech that are perceived by the ears. And in contrast to the sight substance, the sound substance does not have a stable existence: it happens and it is gone. It happens at speeds that the listener cannot usually control. It passes through short-term memory and is continually replaced by the sound substance which follows. Unless it has been recorded, it cannot be inspected at leisure, it cannot be re-found.

In the sight substance, word recognition is generally very easy, but in the sound substance it is much less easy. Crucially, if there is a decoding difficulty in the sight substance, the substance stays in sight so that you can devote as much time as you like to working out which word it is. So even if you are having to learn a new script (Arabic, Chinese, Greek, Korean, the IPA) the script stays still, in sight, so that you can study it.

However, in the sound substance of speech, words occur as a continuous blur of sound, any section of which has only a momentary existence: occurring and disappearing, being replaced by more speech. It is never in sight; and out of sight, it is very quickly out of mind.

As native listeners we easily perceive words in the sound substance. In fact, with native listening we seem not to require precise perception – which is good, because what occurs in the stream of speech is rarely precise. The perceptual skills are a prerequisite, but it often seems that once we have acquired these prerequisite skills, we can operate at the level of meaning without having to give attention to the level of decoding.

But in a language we are learning, when we are not yet able to operate at the level of meaning, we need to give greater amounts of attention to the work of perception (bottom-up decoding) of the stream of speech. And language teaching is deficient in this area.

If we had similar perceptual difficulties with the sight substance, the written language, we would do something about it. We would have to, because learners could point to things and ask ‘What is that? Why is it shaped like that?’

And the thing they are asking about stays on the page. If you don’t help them, they can walk with the book to the Director of Studies and point to the page and complain that you cannot explain something they want to know, and that you should be dismissed, because you know nothing about the language.

And actually, as a profession, ELT is guilty of knowing close to nothing of the true nature of the sound substance of normal everyday speech. We think we know about it, but what we actually know is a sight substance representation – the misrepresentations of printed orthography.

*This is not necessarily true of L1 language use between expert speaker/listeners, where there is a much more complex relationship between perception and understanding.

Listening Cherry 28 – Distorted blur

In this blog I will attempt to demonstrate, using distorted orthography, the difficulties involved in learning to listen in a language in which you are not yet expert (second language learning). First, I need to tell you about Anna’s anger.

Anna is a friend of mine (a very prominent professor of English) whose first language was not English; she had to learn it at school. When she was a student, she hated the approach of one of her teachers to listening lessons. And she told me:

…I’ve hated the underuse of the material. I’ve … answered three silly questions … then someone tells me patronisingly (it IS bloody patronising) that the rest doesn’t matter. Well it does if I want to learn the language!

Let’s explore the kind of classroom activity that might have made her angry. Because this is a written blog (in sight substance) I am going to transpose her listening comprehension activity into a reading comprehension exercise (using distorted orthography) on a very short text.

Here is the teacher’s introduction (remember I am recreating in reading-activity form what was a listening activity, so this is not actuality).

You will read about a young woman, Emily, who was given a teaching job in a school in South Africa. She had gone there to work as a young volunteer, at the age of eighteen, before coming back to the UK to enter university. Read the following questions – there are three choices of answer, and then read – scan – the text to find the answers.

  1. What subject was she given to teach? (a) Maths (b) English or (c) Technology
  2. How many pupils were in the school? (a) 500 (b) 1500 (c) 2000
  3. How many times did the word ‘teacher’ occur? (a) Once (b) Twice (c) Three times



(What you are looking at, and inspecting, is a blurred version of an orthographic transcription. There are a few differences from normal orthography: upper case letters are for prominent syllables, and lower case letters are for non-prominent syllables – the mush of speech.)

Having given the students time to arrive at answers, the teacher would then ask the class what they thought the answers were, and praise them for getting the answers which are:

  1. What subject …? (c) Technology
  2. How many pupils …? (b) 1500
  3. How many times … ‘teacher’? (b) Twice

If our teacher were Anna’s teacher, he/she would move on to another activity – refusing to answer any other questions, because, having arrived at the answers, there is – in his/her view – nothing else to do. The communicative act of arriving at an understanding of the meaning has been achieved, so the teacher believes that their work is done as far as this activity is concerned.

But remember that Anna wants to learn the language. And because (in this blog) the language is presented as sight substance (albeit blurred), it remains available for inspection. This is completely unlike sound substance, which will have departed the scene, and would therefore be invisible. Out of sight, out of mind – ignorable.

Because (in our imaginary scenario) Anna is looking at sight substance, she and her fellow students can point to the blur and ask: ‘What does this mean?’ But actually (and this is the point) they are more likely to ask ‘What are these words?’ Because picking out the words in this blurred sight substance is difficult.

Anna’s desire to learn the language is not satisfied by having done the communicative task of answering the questions. She wants to stay with the substance, and improve her ability to recognise words in the substance.

So rather than walk away from the substance, what could Anna’s teacher have done in the listening lesson? (We now revert to sound substance). The answer is simple: stay with the sound substance of the recording. Always, always, always allow time for this question:

Now that you know the answers, listen again, and try and identify the words that lead to the correct answers.

Then, always, always, always, do something with a short extract (which either you or your students choose) and go to work on it. Ask them how the words in the selected extract sound to them, and tell them how they sound to you. Do vocal gymnastics (see here), get them to savour different ways of saying the words: different speeds, different accents, severely reduced, extra carefully elongated. Encourage them to create raps, or earworms. (More on this in future blogs, and in A Syllabus for Listening, forthcoming.)

Listening Cherry 27 – Sight and sound shapes

Imagine that every time you see a word written down it looks different: letters in a different order, letters missing, different fonts, different sizes, different use of caps and lower case. Additionally, imagine that the spacing (or lack of it) between words varies according to the speed at which the author originally wrote them. So if the author wrote very fast, many words would squash up close together so that they become a disheartening, difficult-to-decipher mess of word-on-top-of-word. And if the author wrote very slowly, pausing to think after every couple of words, the letters of each word would be widely spaced and the words themselves would move apart, leaving annoyingly big gaps.

Sight shapes

Imagine, to put it differently, that words have different sight shapes every time you see them on the page. This would add to our workload as teachers and textbook writers, but we would teach this, and we would seek to find useful generalisations and patterns in the varying sight shapes. We would teach this because we (and our students) could see the words; they would exist to be inspected, analysed, studied and learned. They would be in sight, and very much in mind. We could assemble all the different sight shapes of a given word, and put them in a visual, learnable sequence.

Of course, in reality, words have very few sight shapes, and these are easily learned: they do not change shape according to the speed at which the author writes; they do not suffer squashing, squeezing or elongation. And if sight shapes present a difficulty, then they still have the immense advantage of staying in the same place on the page – in sight – so that we can work on deciphering them.

Sound shapes

These shapings (squashing, squeezing, elongations) happen to the sound shapes of words. (All words have a wide variety of sound shapes – not just the ‘weak forms’.) And the trouble is, because they are not visible or inspectable, we don’t teach, study, or learn them. They are out of sight, and therefore out of mind.

Out of sight, out of mind

Our students cannot (with conventional print plus audio) point to the sound shapes and say ‘Look at them – tell me why they are like that!’ Teachers cannot say ‘Look at this sound shape, it is one of the many sound shapes of the word produced’.

You might think we could do all this with the transcript of a recording, but unfortunately words have the same sight shape each time they occur in the transcript. The sight shape version of a recording – the transcript – is thus a misrepresentation of the recording, and the sound shapes that it contains.

One way forward might be to find ways to help our students point to the sound shapes and say ‘Look at them – tell me why they are like that!’ and to help teachers say ‘Look at this sound shape, it is one of the many sound shapes of the word produced’ (see here).

But we have to be careful using the verb ‘look’. Any sight shape version of a word – including phonetic versions – misrepresents the sound shape. ‘Looking’ may help, but the audio, the sound shapes have to be immediately accessible. The imperative needs to be ‘Listen!’ – or (better) ‘Look and listen!’ with the sight shapes and sound shapes either embedded in each other (as in the Flash movies you can find here), or placed side by side, as can be done in Sonocent’s AudioNotetaker (see here). There has to be immediacy of access to the different sound shapes of a word.

Although main course textbooks and contemporary listening methodology do not yet include these things, examples of what needs to become orthodoxy can be found in Hancock and McDonald’s Authentic Listening here and here, and in my own work here, here and here.

(NB Some of these links require you to play Flash Movies, which – in Safari – you may need to enable.)



Richard can be contacted at

Tel: 07790 629859