How to perfect perfect speech

Count Smorltalk does voiceover

Many moons ago I wrote about heteronyms. You know, those awkward words spelled the same but pronounced differently in different contexts. If you only have a second to think about it, how do you know it is second and not second? How did you know it was “How to perfect perfect speech” in the title? Context, as always, is the answer.

In this fourth and final post on AI and interpreting, I turn my attention to speech synthesis, the bit of the automated interpreting system that gets translations into the ears of the listeners. Having speculated that AI speech-to-text is only about 10 years from human parity (Were Dare Aerate) and noted that English to German machine translation is already at human parity (Je pense, donc je suis), I now deliberate on the quality of AI text-to-speech. It’s deliberate, of course. See what I did there?

A computer has a range of different things to grapple with when reading text aloud. Heteronyms are one problem. But there are also things like 75, $, and Mr. To us they are obviously seventy-five, dollar and Mister. Easy for us. More difficult for the computer. Of course, many an interpreter working from French will tell you that the number soixante-quinze isn’t quite as easy for them to get right as the word Paris or even the words plaque d’immatriculation. But that’s just by way of illustration of the kind of problems humans have.
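For the curious, here is a rough sketch in Python of what that first tidying-up step might look like. It is only an illustration under generous assumptions: the function names (`number_to_words`, `normalize`) and the tiny abbreviation table are mine, it only copes with whole numbers from 0 to 99, and a real speech synthesizer handles vastly more cases than this.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]

# Hypothetical, deliberately tiny abbreviation table (an assumption, not a standard).
ABBREVIATIONS = {"Mr.": "Mister", "Mrs.": "Missus"}


def number_to_words(n: int) -> str:
    """Spell out a whole number from 0 to 99 in English."""
    if not 0 <= n <= 99:
        raise ValueError("this sketch only handles 0 to 99")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def normalize(text: str) -> str:
    """Expand abbreviations, dollar amounts and small numbers into words."""
    for abbreviation, spoken in ABBREVIATIONS.items():
        text = text.replace(abbreviation, spoken)

    def dollars(match: re.Match) -> str:
        amount = int(match.group(1))
        unit = "dollar" if amount == 1 else "dollars"
        return f"{number_to_words(amount)} {unit}"

    # Dollar amounts first, so "$75" becomes "seventy-five dollars".
    text = re.sub(r"\$(\d{1,2})\b", dollars, text)
    # Then any remaining stand-alone one- or two-digit numbers.
    text = re.sub(r"\b(\d{1,2})\b",
                  lambda m: number_to_words(int(m.group(1))), text)
    return text


print(normalize("Mr. Smith paid $75 for 2 tickets."))
```

Anything outside the sketch's narrow range (three-digit numbers, other currencies, other abbreviations) is simply left untouched, which is exactly the sort of gap that makes the real job hard.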

So, before text can be voiced, it has to be converted into some kind of phonetic transcription. It also has to be broken into chunks so that the synthesizer knows where its intonation should rise and fall to reflect the phrasing.
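To make those two steps a little more concrete, here is a toy sketch in Python. Everything in it is an assumption made for illustration: the informal stress respellings, the one-entry heteronym table, and the crude rule that a heteronym straight after "to" is a verb. Real systems use proper phonetic alphabets and far richer context than the previous word.

```python
import re

# Hypothetical heteronym table with informal stress respellings (not real phonetics).
HETERONYMS = {
    "perfect": {"verb": "per-FECT", "other": "PER-fect"},
}

# Crude assumption: a heteronym directly after one of these words is a verb.
VERB_CUES = {"to"}


def chunk(text: str) -> list[str]:
    """Split text into phrases at punctuation, as rough intonation units."""
    parts = re.split(r"[,.;:?!]+", text)
    return [part.strip() for part in parts if part.strip()]


def pronounce_phrase(phrase: str) -> list[str]:
    """Replace known heteronyms with a pronunciation chosen from context."""
    words = phrase.split()
    spoken = []
    for index, word in enumerate(words):
        entry = HETERONYMS.get(word.lower())
        if entry is None:
            spoken.append(word)  # not a heteronym we know: leave as written
            continue
        previous = words[index - 1].lower() if index > 0 else ""
        reading = "verb" if previous in VERB_CUES else "other"
        spoken.append(entry[reading])
    return spoken


print(chunk("So, before text can be voiced, it has to be converted."))
print(pronounce_phrase("How to perfect perfect speech"))
```

Looking only at the previous word is about the simplest form of context there is. It happens to get the title of this post right, and it will get plenty of other sentences wrong.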

So, given that even the simple job of reading text aloud is so difficult for a computer compared to a human, surely this is one area where we reign supreme. Yes, we do. If I were to read this text out and a computer were to read it out, I bet I would sound better. No contest.

But let’s just pause for a minute on our own pedestal and think about those times where that might not be quite as clear cut. What about when I have a cold? What about when I sound like an Englishman but my audience is Scottish? What about when my brain is so busy listening and translating that I misspeak and don’t notice silly slips like “present” instead of “president” or, another common one, “three” instead of “four”? Strange things happen when cognitive load is high and the pressure is on. Then there is the commonest of all problems for interpreters, sound…ing…… …liiiiiike….we’re….strugglingandtryingto geeeet…..the words out……………. right.

So you see, not everything that reaches the listener’s ears coming from the interpreting booth sounds dulcet. See also my earlier post on frogs.

I don’t need to tell you how good speech synthesis has become. Many newspapers have speech synthesis buttons on their articles. Cleverly, the newspaper articles that land in my mailbox seem to know that I have a preference for a softly spoken posh lady from southern England. In the future I could imagine that listeners will be able to select the voice and the accent they would like the interpretation in. Just like the voice in the SatNav in your car. No snotty colds. No irritating accents. No …. runningbitsalltogether… then leaving loooooooong…….. gaps.

But perhaps, like vinyl records in the time of streaming, in the future, after humans stop interpreting, niche listeners will miss the idiosyncrasies of human delivery and pay more for a human. Stranger things have happened.

I have gone on long enough. It is time to sum up this romp through AI and interpreting.

In my first post I mentioned an experiment with a computer and a booth of two interpreters in a contest to see which would win. And I conceded that as things stand it’s a hands-down win to humans. But I did ask the question of whether computers would catch up, and if so when. For me, “when” is possibly not that troubling as I am fairly long in the tooth and only have a relatively short time in the booth before I retire. I may get asked to retire before then, but that’s another story.

But what about youngsters entering the profession now in their twenties and thirties? I would have thought that the “when” question was really rather important. It’s a long slow struggle to get into interpreting, and a longer struggle to get comfortable and make a living.

In a time when everything is so uncertain, when travel bans and quarantines and closed borders are making it impossible for some freelance interpreters to work in the booth, it may seem like the supply/demand curves are in a good place for interpreters. Yes, that seems to work to the advantage of some, at least. New ways of solving old problems are being rushed to market by institutional employers and more generally in the marketplace. Remote Simultaneous Interpreting is seen by some as a salvation in dark times. By others it is a vRSIus that must be contained and preferably eradicated. Interpreters on contracts may feel exposed to the risks of Coronavirus but at the same time feel secure from the risk of worklessness and penury experienced by others. This is a good time to ask the question “when” because prospects and opportunities and policy will depend on what the answer is.

In the excellent Artificial Intelligence and the Interpreter webinar series hosted recently by AIIC UK & Ireland, the conclusion reached by panellists was that AI is not currently a major threat to interpreters. The Head of the Strategy and Innovation Unit at the European Parliament’s Directorate-General for Logistics and Interpretation for Conferences (DG LINC) said that AI was not in his top five threats. Similar views were espoused by the Head of the Meeting Services and Interpretation Section, Conference Division at the International Maritime Organisation. And interestingly, academics, recruiters, entrepreneurs and experts working in the sector tended to agree with the notion that interpreters are not at risk.

As you will have seen from what I have written, I am no expert in AI or in speech and language processing. But I am a little bit of an expert in simultaneous interpreting, having spent over a quarter of a century doing it. And sometimes quite badly. My observation is that with so many parties putting interpreters on a pedestal, but with little real understanding on the AI side of what interpreter quality is like in the real world, and with little appreciation on the interpreter side of quite how close science is to cracking this one, there is a cleft in which the reality and imminence of the threat are getting lost.

Change comes. Change is not sentimental.
Let’s not pull the wool over our own eyes. We’re toast.

Wait! We’re not toast. We are so not toast. WE ARE SAVED!

Google says “We’re toast” is:
Nous sommes truffés
Wir stoßen an
Siamo brindisi
Jesteśmy tosty

I’ll let you know when it gets anything right.


Images: Pete Linforth, Pixabay; Elena Mozhvilo, unsplash

Count Smorltalk is an English booth interpreter. He wishes to remain anonymous.
