Count Smorltalk fails the Turing test
On some dark days in the depths of the Covid lockdown, if you had asked my friends whether the text messages emanating from my account were generated by human or by computer, they would have answered that the thing at the other end was definitely not human. Sadly, at some points during those bleak times I would have failed the Turing test.
Ok, so I got the Turing test all upside down, I know. But then that’s kind of my point. Humans are weird and say weird things.
Humans are weird, and that is what makes us so lovable… and detestable. Weird, irrational, sometimes brilliant, often mind-blowingly stupid. So it is with speakers. And so too with interpreters. Perhaps that’s why we tend to click.
Thing is, humans begat computers. And when our AI progeny started to analyse all the nonsense humanity produced, it too started to see the world awry. After all, computers failed to recognise black faces in scans not because the computer didn’t see a black face but because the white programmer trained it only on white faces. Stupidity begets stupidity. We are the authors of much of our own misfortune.
It is good then, that at the heart of the policies that govern our profession, and in conferences that bring together the sharpest minds on AI and interpreting, the guiding principle is that whereas human interpreters are accurate, sensitive and well-read, computers are error prone, unwittingly offensive, and ignorant.
Here is an experiment I would like to perform, just to check quite how superior humans are to computers when it comes to interpreting. Pit a state-of-the-art computer interpreting system against a booth of two experienced professional conference interpreters working C or B to A in a conference situation for an 8-hour day with a 1-hour lunch break, and compare all the output.
Clearly the human interpreters would win. It would be a decisive victory I hear you say.
Ok, fair enough.
Let’s do it for 4 days in a row. And then for 4 weeks in a row and compare the results. Let’s throw in at least one early start and one night shift. Perhaps one of the interpreters might have a cold. Perhaps one of the interpreters might have a baby at home. Perhaps one of the interpreters might have been out on the town the night before… Now let’s compare.
Still no contest I hear you say.
Ok, fair enough.
Let’s do the same next year, and the year after, and keep comparing. The question is: will there be parity? And if so, when?
Nope, I hear you say, not happening. Not happening because computers don’t understand the world. Computers don’t understand the world and computers don’t like accents.
Ok, fair enough.
Let’s just examine how good humans are at all this.
Human interpreters like to get the written texts of complex speeches ahead of time. Why is that? It’s because written language often lacks the redundancy that equates to thinking time for us humans, and without thinking time we get swamped, because we cannot complete one sentence before the next one is on top of us. We need to read the speech through beforehand if we are to get it right. Even then we often struggle if it is delivered too fast.
Speech-to-text systems on computers tend not to make any distinction between sentences with dense structures and ones with looser structures. So long as the speech follows a logical pattern of syntactically correct discourse, the computer will just chomp through it. And if you have ever tried slotting a huge chunk of complex text into an automatic translation module you’ll have had the results back in a fraction of a second. So, speed is not the computer’s enemy; it is the human’s enemy.
Computers don’t like accents. Well, we don’t like accents either. Accents throw us off kilter; we don’t always hear what’s being said. This is particularly true in our C languages. The point is that a typical human interpreter will make mistakes with accents. Train us with enough speeches in Glaswegian English or Viennese German and we’ll get better. The only problem is that exactly the same applies to computers.
In fact, the reason speech-to-text often goes wrong is that the computer’s statistical model didn’t have enough data. The reason a computer might not get the words “Count Smorltalk is not human” right is that most systems have not been trained on a corpus including Dickens’s Pickwick Papers, where a near namesake of mine, Count Smorltork, is to be found. But let’s be honest, really honest here: how many human interpreters have read Pickwick Papers?
So, both the computer and the interpreter will go for the next most statistically proximate solution, “Count small talk is not human”. That is where things might go off the rails, because the computer might then translate it into French as “Le Comte parler de tout et de rien n’est pas un être humain”. But if that were the output, then the computer would simply not yet have seen enough training data. This can be remedied over time.
Talking of data, when I speak at a conference the name Count Smorltalk is on the speakers list. The interpreters will hopefully have that list, cast their eye down it, and get my name right when I am introduced. But so too will the computer. It will have been trained on a corpus for the meeting including the speakers list complete with biographical data, all relevant technical material for the agenda items, and all submitted speeches.
Like human interpreters, provided computers get the audio input in a recognisable accent and have had access to a big enough and relevant enough corpus, they perform tolerably well. They don’t mind speed. They don’t get drunk the night before. And they don’t get upset when it goes wrong.
So, back to me and the Turing test. Pitch these musings at a group of conference interpreters, ask them whether a human wrote them, and I’m guessing that the majority would answer, “Count Smorltalk is not human!”
Images: geralt / Gerd Altmann, Pixabay