Wednesday, December 18, 2013

The Key to Learning Pronunciation by Gabe Wyner


As rumor has it, you can’t learn to have a good accent if you’re above the age of 7, or 12, or some other age that you’ve most definitely already exceeded. But that can’t possibly be true. Singers and actors learn new accents all the time, and they’re not, on average, smarter than everyone else (and they certainly don't all start before the age of 7).

So what’s going on here? Why does everybody tell you that you can’t learn good pronunciation as an adult? And if that’s not true, what is?

In this article, we’ll take a tour through the research on speech perception and pronunciation, and we’ll talk about learning pronunciation efficiently as an adult. But first, allow me a moment on my soapbox:

Pronunciation is important

This is a big topic, and as an opera singer, it’s a topic close to my heart. I find accents extraordinarily important.

For one, if you don’t learn to hear the sounds in a new language, you’re doomed to have a hard time remembering it. We rely upon sound to form our memories for words, and if you can’t even comprehend the sounds you’re hearing, you’re at a disadvantage from the start. (Try memorizing Hungarian's word for camera, fényképezőgép [recording] or train station, vasútállomás [recording]. These words are brutal until you really get a feel for Hungarian sounds.)

But in addition to the memory issue, a good accent connects you to people. It shows people from another culture that you’ve not only taken the time and effort to learn their vocabulary and their grammar; you’ve taken the time to learn how their mouths, lips and tongues move. You’ve changed something in your body for them – you’ve shown them that you care – and as a result, they will open up to you.

I’ve seen this repeatedly when I sing or watch concerts in Europe. As a rule, audiences are kind, but when you sing in their native language, they brace themselves. They get ready to smile politely and say, “What a lovely voice!” or “Such beautiful music!” But beneath the surface, they are preparing for you to butcher their language and their heritage before their eyes. No pressure.

At that moment, if you surprise them with a good accent, they open themselves up. Their smiles are no longer polite; they are genuine. You’ve shown them that you care, not just with your intellect, but with your body, and this sort of care is irresistible.

But enough romanticizing; how do you actually do something about pronunciation?

Research on Ear Training and Pronunciation

Good pronunciation is a combination of two main skills: Ear training and mouth training. You learn how to hear a new sound, and you learn how to make it in your mouth. It’s the first of these two skills that’s the trickiest one; if you can hear a sound, you can eventually learn to produce it accurately, but before then, you’re kind of screwed. So for the moment, we’ll focus on ear training.

While doing research for my book, I came upon a wonderful set of studies by James McClelland, Lori Holt, Julie Fiez and Bruce McCandliss, where they tried to teach Japanese adults to hear the difference between “Rock” and “Lock.” After reading their papers, I called up and interviewed Dr. McClelland and Dr. Holt about their research.

The first thing they discovered is that ear training is tricky, especially when a foreign language contains two sounds that are extremely similar to one sound in your native language. This is the case in Japanese, where their “R” [ɺ] is acoustically right in between the American R [ɹ] and L [ɫ]. When you test Japanese adults on the difference between Rock and Lock (by playing a recording of one of these words and asking them which one they think you played), their results are not significantly better than chance (50%). So far, so bad.

The researchers tried two kinds of practice. First, they just tested these Japanese adults on Rock and Lock for a while, and checked to see whether they improved with practice.

They didn’t.

This is very bad news. It suggests that practice doesn’t actually do anything. You can listen to Rock and Lock all day (or for English speakers, 불/풀/뿔 [bul/pul/ppul] in Korean), and you’re not going to learn to hear the differences between those sounds. This only confirms the rumors that it’s too late to do anything about pronunciation. Crap.

Their second form of practice involved artificially exaggerating the difference between L and R. They began with extremely clear examples (RRrrrrrrrrock), and if participants improved, stepped up the difficulty until they reached relatively subtle distinctions between the two recordings (rock). This worked a little better. The participants began to hear the difference between Rock and Lock, but it didn’t help them hear the difference between a different pair of words, like Road and Load. In terms of a pronunciation training tool, this was another dead end.

Then they tried feedback, and everything changed.

Testing pairs of words with feedback

They repeated the exact same routine, only this time, when a participant gave their answer ("it was 'Rock'"), a computer screen would tell them whether or not they were right ("*ding* Correct!"). In three 20-minute sessions of this type of practice, participants permanently acquired the ability to hear Rs and Ls, and they could do it in any context.
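For the programmers in the audience, here is a rough sketch of what one of these feedback sessions might look like in Python. It is not the researchers' software, just my guess at the basic loop: pick a word from a pair, play it, ask for an answer, and say right away whether the answer was correct. The names `run_session`, `play` and `ask` are made up for illustration, and `play` stands in for whatever you would use to play a recording.

```python
import random


def run_session(pairs, play, ask, trials=20, seed=None):
    """Quiz a listener on minimal pairs, with feedback after every answer.

    pairs: list of (word_a, word_b) tuples, e.g. ("rock", "lock")
    play:  hypothetical callback that plays a recording of one word
    ask:   hypothetical callback that receives the pair and returns a guess
    Returns the fraction of trials answered correctly.
    """
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        pair = rng.choice(pairs)      # which pair to test
        target = rng.choice(pair)     # which word of the pair to play
        play(target)
        guess = ask(pair)
        # The feedback step: tell the listener immediately how they did.
        if guess == target:
            correct += 1
            print("*ding* Correct!")
        else:
            print(f"Not quite, that was '{target}'.")
    return correct / trials
```

Dropping the two `print` calls would give you the first kind of practice (testing without feedback), which is the version that did not help.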

Not coincidentally, this is how actors and singers learn. We use coaches instead of computerized tests, but the basic principle is the same. We sit with an accent coach and have them read our texts. Then we say our texts out loud, and the coach tells us when we’re right and when we’re wrong. They’re giving us feedback. They’ll say things like “No, you’re saying siehe, and I need sehe. Siehe…Sehe. Hear that?” And as we get closer, they’ll continue to supply feedback (“You’re saying [something that’s almost ‘sehe’] and I need sehe.”) After the coaching, we’ll go home, listen to recordings of these coaching sessions, and use those recordings to provide us with even more feedback.

Now, some caveats. Participants didn’t reach a full native ability to hear the difference between Rock and Lock. Their accuracy seemed to peak around 80%, compared to the ~100% of a native speaker. Further investigation revealed what was going on.



Consonant sounds have lots of different components (known as 'formants'). Basically, a consonant is a lot like a chord on a piano: on a piano, you play a certain combination of notes together, and you hear a chord. For a consonant, you make a certain (more complex) combination of notes, and you hear a consonant. This isn’t just a metaphor; if you have a computerized piano, you can even use it to replicate human speech.




English speakers tell the difference between their R’s and L’s by listening for a cue known as the 3rd formant – basically, the third note up in any R or L chord. Japanese native speakers have a hard time hearing this cue, and when they went through this study, they didn’t really get any better at hearing it. Instead, they learned how to use an easier cue, the 2nd formant – the second note in R/L chords. This works, but it’s not 100% reliable, thus explaining their less-than-native results.

When I talked to these researchers on the phone, they had basically given up on this research, concluding that they were somewhat stumped as to how to improve accuracy past 80%. They seemed kind of bummed out about it.

Possibilities for the future

But step back a moment and look at what they’ve accomplished here.

In three 20-minute sessions, they managed to take one of the hardest language challenges out there – learning how to hear new sounds – and bring people from 50% accuracy (just guessing) to 80% accuracy (not bad at all).

What if we had this tool in every language? What if we could start out by taking a few audio tests with feedback and leave with pre-trained, 80% accuracy ears, even before we began to learn the rest of our language?

We have the tools to build trainers like this on our own. All you need is a spaced repetition system that supports audio files, like Anki, and a good set of recorded example words (a bunch of rock/lock’s, thigh/thy’s, and niece/knee’s for English, or a bunch of sous/su’s, bon/ban’s and huis/oui’s for French). They take work to make, but that work only needs to be done once, and then the entire community can benefit.
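To make that a little more concrete, here is a minimal sketch of how a word list could be turned into a tab-separated file that Anki can import. It assumes one recording per word, named like `rock.mp3`, already copied into Anki's media folder; the file names, the three-field layout (audio, question, answer) and the function names are my own assumptions, not a finished design.

```python
import csv
import io


def build_cards(pairs):
    """Make one card per word: the audio plays, and you choose between the pair."""
    rows = []
    for a, b in pairs:
        for target in (a, b):
            rows.append((
                f"[sound:{target}.mp3]",  # Anki's tag for playing a media file
                f"{a} or {b}?",           # the question shown on the front
                target,                   # the answer revealed on the back
            ))
    return rows


def to_tsv(rows):
    """Serialize the cards as tab-separated text for Anki's text import."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    cards = build_cards([("rock", "lock"), ("thigh", "thy")])
    print(to_tsv(cards), end="")
```

Revealing the answer on the back of the card is what supplies the feedback, and the spaced repetition system takes care of scheduling the pairs you keep missing.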


Pronunciation is too important, and this solution is too valuable to wait for some big company to take over. Over the next 9 months, I’m going to start developing good example word lists, commissioning recordings and building these decks. I’m going to recruit bilinguals, because with bilinguals, we can get recordings to learn not only the difference between two target-language sounds, like sous and su, but also the difference between target-language sounds and our own native language sounds (sous vs Sue). I ran this idea by Dr. McClelland, and he thought that might work even better (hell, we might be able to break the 80% barrier). And I’m going to do a few open-ish beta tests to fine-tune them until they’re both effective and fun to use.

Hopefully, with the right tools, we can set the “It’s too late to learn pronunciation” rumors to rest. We’ll have a much easier time learning our languages, and we’ll have an easier time convincing others to forget about our native languages and to speak in theirs.

Gabriel Wyner is the author of Fluent Forever (Harmony/Random House, August 2014) and the Fluent Forever blog. His Kickstarter project, a series of pronunciation trainers in 11 languages, will run until January 2, 2014.