9 reasons not to automate glossary building

Getting AI, an app, or even another human, to complete your glossaries for you is, at best, a zero-sum game. At worst, it’s going to make you make mistakes in the booth.

The recent explosion of AI into our lives has brought the perennially unfashionable issue of glossary-building back to the fore. Can AI or a computer tool create or complete glossaries for us?1 The idea of automating glossary-building has been around much longer than ChatGPT. For years technophile interpreters (including yours truly) have been experimenting with, and software companies have been promoting, a variety of different ways to quickly create glossaries or fill in those pesky blanks in existing glossaries. The methods include:

  • asking AI to create a glossary (ChatGPT / Notion);
  • terminology extraction (SketchEngine, Interpreters’ Help);
  • autofilling via a platform & associated databases (InterpretBank);
  • collaborating on glossaries (Interpreters’ Help);
  • autofilling Excel tables with a macro linked to Google Translate.

Let me tell you why I don’t think any of them are worth the time… yet, at least.

First of all it’s worth thinking about the reasons for creating the glossary in the first place. I see three reasons amongst interpreters I know:
1) to search for terms while they are actually interpreting;
2) to research & activate terminology so that they can recall a term, its equivalent in the other language and how to use it, instantaneously while interpreting;
3) to create preparation and activation material for the next similar meeting.

An ideal world

In an absolute best-case scenario, the method we choose for this automation produces correct equivalents for all the terms sought. So the small group of interpreters that does 1) is all right Jack. But what about the rest of us? 

And even in this ideal world there are at least two problems I see with automation: relevance and activation.

1. Relevance

An AI or computer hasn’t been in the meeting you’re going to work in, so even if it gives correct terms and translations, they may not actually be the ones you need. Here, for example, I asked Notion for “the 15 most useful terms used in Opposition proceedings at the European Patent Office”. The result is a mostly accurate list of terms and their translations. However, they are FAR from being the most useful 15 terms. 

Human interpreters at the EPO would have (and have for years) provided new colleagues with the following: Proprietor, Opponent, novelty, inventive step, sufficiency of disclosure, inadmissible extension of subject matter, clarity, independent claim, allowable, admissible, feature, revocation, maintenance. None of them provided by Notion. An interpreter learning and activating Notion’s list is then not just not saving time, but actively wasting it! 

(The results can be improved by better prompts, for example by asking for “frequent” rather than “useful” terms, but this can only be done in hindsight. An interpreter doesn’t know the results aren’t useful until AFTER they’ve been less than useful!)

2. Activation

Those of us who want to activate terminology are going to have to engage with the terms, several times. (See Gile’s gravitational model of language acquisition.) To manufacture this repetition, many interpreters will do one or more of the following:

  • highlight and annotate terms in documents;
  • manually extract terminology (by writing up a word-list2);
  • re-read or test themselves on their word-lists;
  • go over the highlights and annotations in the documents;
  • and sight-translate texts that they have prepped and in which the terms appear.

Quite naturally the most frequently recurring (and therefore most important) terms will be activated most by those same frequent appearances. This is time-efficient.

Any automation will remove at least one of these stages. If activation requires repetition however, then an activation stage will have to be put back in later, for example, testing yourself on the word-list three times instead of two, or adding a stage with flashcards. This is not bad in itself, but we should be aware that the automation has not saved us any time, it has merely pushed an activation activity further down the road. In other words automation is a zero-sum game. 

To lay the foundations of preparation for a future meeting on the same topic, we must first compare the terms we found, or which automation proposed, with what was actually used during the meeting and then update the glossary accordingly. Ideally a glossary for a given meeting would only contain authentic expressions that were actually used at the meeting. So if you didn’t check the computer’s output BEFORE the meeting, you are going to have to check it AFTERWARDS. That takes time. So, once again, automation becomes a zero-sum game.

The real world

However… then we come to a not-so-best-case scenario in which the automation proposes equivalents in the target language that are wrong – a situation that is exacerbated by the natural forces pushing us to accept these terminological offerings at face value:

  • humans have a tendency to believe what they are told rather than question it;
  • it’s hard to reject work that is being done for you;
  • the less experience we have in a field, the harder it is to spot mistakes in the terms.3 (And almost by definition we are looking up terms in fields we don’t know so much about).

Below is an example of a popular interpreters’ glossary tool getting 5 out of 10 terms wrong when asked to automatically translate a word list.

Logically, but also ironically, interpreters seem to be using these tools for topic areas they DON’T know about. It is therefore much harder to judge whether the results are any good. But if we accept the need to check EVERYTHING a machine produces, then where is the time saving that automation is supposed to offer? If we don’t check, we risk making some pretty awful mistakes.

So let’s have a look at how automated systems get things wrong.

How and why do machines get terminology wrong?

3. Sources

Automated systems are not so fussy about selecting their sources4 (unless you specify the sources… but it is hard to choose reliable sources if we are not familiar with a topic). And they don’t tell us what those sources are anyway. So how can we know if it’s a reliable source?

Different sources will give different results. If you ask for the German for “secondment” you might see “Abordnung”. That’s great if you work at the European Space Agency. But if you’re talking about EU legislation, then you’ll need “Entsendung”5. Choosing a Canadian database instead of a French one or a Catholic Bible instead of a Protestant one could make for embarrassment or even scandal, depending on how unlucky you get.

A glossary entry will not be much use in the future either, if you don’t indicate where the term is used and what the source is for its translation.
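To make that last point concrete, here is a minimal sketch of a glossary entry that records where a term is used and where its translation came from, using the secondment example above. The field names and the `lookup` helper are illustrative assumptions rather than the format of any existing tool, and the source labels are placeholders, not real references.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlossaryEntry:
    term: str
    translation: str
    used_at: str  # institution or field where this equivalent is used
    source: str   # where the translation was found (placeholder labels below)


# The two German equivalents of "secondment" discussed above.
ENTRIES = [
    GlossaryEntry("secondment", "Abordnung", "European Space Agency",
                  "placeholder: ESA document"),
    GlossaryEntry("secondment", "Entsendung", "EU legislation",
                  "placeholder: EU document"),
]


def lookup(term, used_at=None):
    """Return every entry for a term, optionally narrowed to one setting."""
    return [
        entry for entry in ENTRIES
        if entry.term == term and (used_at is None or entry.used_at == used_at)
    ]


# Without a setting, both options come back and the interpreter decides.
all_options = [entry.translation for entry in lookup("secondment")]
# With a setting, only the equivalent used there is returned.
eu_option = lookup("secondment", used_at="EU legislation")[0].translation
```

An entry without the last two fields would look identical for both settings, which is exactly the ambiguity described above.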

4. Context

Machines are not brilliant at context. And there are at least 3 types of context:

  • the sentence in which an expression finds itself;
  • the semantic context (the meaning of the text surrounding the term);
  • the place (the people or institution who are using the expression).

Machine translation engines have got quite good at the first type of context in this list. As you would expect of things now called “Large Language Models”. But they are only good at it when they are provided with full sentences and paragraphs as context. However, tools that promise to translate lists of terms for you will by their very nature first remove the term from its context. As such the automation may not make the rather significant distinction between the following:

scaling … Verzunderung (flaking of metal at high temperatures, e.g. gas turbines)

scaling … Belagsbildung (calcium deposits in dishwashers)

scaling … vergrössern / verstärken (doing more of something)

Some context is semantic (meaning-based). At the European Parliament, the French short-cut Conférence des Présidents could mean either Conference of Committee Chairs or Conference of Presidents of Political Groups. Only an understanding of the surrounding discussion will make it clear which. Similarly, in the past rapporteur meant either rapporteur or draftsman in English depending on the type of document being drafted, something you could only know by understanding the context.

For the importance of location-as-context for terminological accuracy, see the example of secondment above.

5. Meaning

We linguists know that the scope of the meaning of a term is not always a clear-cut thing. AND that the scope of meaning of two equivalent expressions in two different languages may be different. Take the example of turboréacteur in French. This term is very broad in French and covers all sorts of jet engines, including those used on passenger jets. Linguee (and even Wikipedia) will suggest turbojet as the English equivalent. A turbojet however is a very specific type of jet engine (in which all the incoming air passes through the combustion chamber) and very much excludes the engines used on most passenger aircraft (usually turbofans).

If you do the research work yourself (read and understand the explanations that go with the terminology), then you will come across, engage with, and hopefully understand these subtleties.

6. Alternatives

Terminology is also not always a clear-cut thing in that there can be several identical terms with different meanings (homonyms). In the following extract from my own terminology database, the initialism CIP can be seen to mean two different things in the EU context, 3 different things at the European Space Agency (AFC and PB STS) and even two different things in the same ESA meeting (AFC). It has a further meaning in the context of IPR.

Even if AI correctly works out which institution or field to take a term from, AI cannot decide which of several options is correct. 

The reverse is also a problem. Automated systems tend not to offer multiple synonymous options to the interpreter for them to decide for themselves. So if you are prepping a meeting about turbines, you will want to know that BOTH Schaufel and Blatt are acceptable German versions of the English blade. But a machine will only suggest one to you. That leaves you unprepared for the moment a delegate uses the other term.

Beware any automation that doesn’t propose several options.

There is a simple way to test any automation system though – input terms in one language from a glossary that you already know to be correct and ask the tool to translate. This is what I have done, many times, to arrive at this article.  
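That test can be sketched in a few lines of code. This is only an illustration under stated assumptions: `translate` stands in for whatever tool is being tested (here a made-up `fake_tool`, not a real API), and the reference glossary is a tiny example built from terms mentioned in this article, with several acceptable translations allowed per term.

```python
def check_tool(reference, translate):
    """Compare a tool's output with a glossary already known to be correct.

    reference: dict mapping each source term to a set of acceptable translations
    translate: callable standing in for the tool under test (hypothetical)
    Returns the share of correct answers and a dict of the wrong ones.
    """
    if not reference:
        raise ValueError("the reference glossary is empty")
    wrong = {}
    for term, accepted in reference.items():
        proposed = translate(term)
        # Ignore case and stray spaces when comparing.
        if proposed.strip().lower() not in {a.lower() for a in accepted}:
            wrong[term] = proposed
    accuracy = 1 - len(wrong) / len(reference)
    return accuracy, wrong


# Known-correct glossary for an EU meeting about turbines (illustrative only).
reference = {
    "blade": {"Schaufel", "Blatt"},
    "secondment": {"Entsendung"},
}


def fake_tool(term):
    """Made-up stand-in for an automated translation tool."""
    return {"blade": "Blatt", "secondment": "Abordnung"}[term]


accuracy, wrong = check_tool(reference, fake_tool)
```

The list of wrong answers is more informative than the score alone, because it shows which kind of mistake the tool makes, for instance picking an equivalent from the wrong institution.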

7. Noise

Even if you eliminate the risk of “error” by specifying that terms should be taken from two different language versions of the same document, you still have the issue of noise, that is, unnecessary results. Automation tools tend to define “term” as being something unusual compared to their control corpus of words. An interpreter thinks of terms as 1) something unusual compared to their own very personal mental control corpus and 2) what will be used in the meeting. It takes time to sift through a long list of automatically extracted (but not very useful) terms to arrive at a list of useful terms.

The question is, when you get the document last minute, whether those 5 minutes are better spent skim reading the document or sifting through the haystack of automatically generated terms. Quite apart from the terminological advantages, reading also gives you an overview of the content of the document.
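To show where the noise comes from, here is a rough sketch of the “unusual compared to a control corpus” idea described in this section. It is a simplified illustration, not the method of any particular tool: it just ranks words by how much more frequent they are in the document than in the control text, and both texts are invented for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into words (letters only)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def unusual_words(document, control, top_n=5):
    """Rank the document's words by frequency relative to a control corpus."""
    doc = Counter(tokenize(document))
    ctl = Counter(tokenize(control))
    doc_total = sum(doc.values())
    ctl_total = sum(ctl.values())

    def score(word):
        doc_rate = doc[word] / doc_total
        # Add one so that words absent from the control corpus do not divide by zero.
        ctl_rate = (ctl[word] + 1) / (ctl_total + 1)
        return doc_rate / ctl_rate

    return sorted(doc, key=score, reverse=True)[:top_n]


# Invented texts, for illustration only.
document = ("the opponent argued that the claim lacks novelty "
            "and the opponent cited prior art")
control = "the cat sat on the mat and the dog sat on the rug that day"

candidates = unusual_words(document, control)
```

With these texts “opponent” comes first, but ordinary words such as “argued” or “lacks” rank just as highly as “novelty”, simply because they do not appear in the control text. That is the haystack the interpreter then has to sift through.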

Unrelated to the technological tools above but related to the topic of this article we also have:

8. Collaborative terminology

Some tools (Interpreters’ Help, Quizlet) will let you share glossaries and/or flashcards. This may not strictly speaking count as automation but the principle is the same, in that someone (rather than something) else is making the glossary for you. Many of the same problems apply. Handle with extreme caution! Unless you know the person in question is an expert, and was working in exactly the same type of meeting/institution as you will be, then avoid this. Also, we all need/want slightly different terms, so relevance is an issue here too.

9. Equivalence

There is an issue that is fundamental to translation theory and glossary building which I can hardly omit from an article aimed at linguists. That is the question of whether word-for-word equivalence – on which glossaries are inevitably built – is the best way to translate from one language to another. To give just one example of the problem, nouns in noun-heavy languages are very often best translated by a verb phrase in English. Expressions like la féminisation de la pauvreté are consistently translated by machines into English as feminisation where any human translator or interpreter worth their salt would most likely opt for something like poverty is increasingly affecting women. This problem does also affect manually created glossaries but is exacerbated by automatic translation tools’ preference for translating with the same part of speech.

Finally, I don’t even have the time and space here to discuss other potential reasons not to automate glossary building. One is that even a future tool that is fast, reliable and accurate (and available at a price we’re willing to pay) could still fall foul of a given client’s confidentiality rules and be unusable as a result. Another is the advantages (over and above activation) of doing the legwork yourself, like the improved motivation and understanding that come from that engagement with the topic.


It’s a good thing that new tools are being developed, and that interpreters are testing them, reviewing them and generously sharing their experience. But we must remain realistic about what they can actually do for us and whether that is good enough to be useful. 

It’s all too easy to be impressed by an apparently instantaneous set of results when we are asking about topics that are new to us (and we can’t really judge the results). 

We should also be clear about what automation adds and what it takes away. And whether or not we are really saving time or just moving it around our preparation process. Or worse still, taking longer. Understanding what each tool can do for you, learning how to use it, organising the documents into a format it can use, and checking the machine’s output are all time-consuming processes that surround any “instantaneous” result.

Some things, like pierogi, are still best made by hand!


1. See for example:  5 Tedious Non-Translation Tasks ChatGPT Can Do Amazingly Well; Automating translation and crowdsourcing

2. NB hand-written notes are recalled better than typed notes, so old-school hand-written glossaries may be a better activation tool.

3. The negative effects are aggravated by the fact that research suggests that inexperienced interpreters rely more on glossaries and glossary-building than experienced interpreters, who tend to focus on conceptual understanding of the field in question.

4. InterpretBank does allow you to choose which glossary to use for the translation.

5. Actual test result with Notion.


Andrew Gillies

Andy Gillies is a conference interpreter and interpreter trainer based in Paris. He works from French, German and Polish into English at the European Parliament, the European Patent Office and the European Space Agency, amongst other places. He’s written extensively on interpreter training and has published other occasional musings about the profession for industry outlets.

A part-time nerd, Andy has always been interested in how interpreters can leverage tech to their advantage - be it glossary software; tablets for consecutive; Learning Management Systems for training; or more recently… RSI.

And failing that he's also written about how not to advertise being a luddite.

1 Comment

  • Hi Andrew & the readers

    Considering that this AI is not intelligent at all, but just a bit more sophisticated than MT, because it stea., hem it grabs data from the Web, I think it cannot beat a human translation made by an over-average translator, while it can certainly beat a sub-average one like MT does since years

    That said, in the name of the Mammon, high-quality translations (and glossaries are at its core) are less and less valued nowadays, apart certain niche fields (and Medicine isn’t YUK), so I am very stressed by this novelty, and if I were thinking today to start a career in the linguistic field (or in creative fields that are also deeply affected), I’d change my path for sure
