Violeta Demonte is Emeritus Professor of Spanish Language at the Autonomous University of Madrid and a corresponding member of the Royal Spanish Academy. Her main fields of specialization are linguistic theory and descriptive grammar. She created the Linguistics and Cognitive Science Group at the Spanish National Research Council (CSIC). She has received, among other awards, the “Ramón Menéndez Pidal” Spanish National Research Award in the Humanities.
***
BT – The UN Digital Agency is working on a proposal to create a “watermark” to identify AI-generated text and multimedia materials. Do you think this is a viable proposal?
VD– Several large technology companies, especially those working with Large Language Models (LLMs) such as ChatGPT, as well as many states and large organisations, have been considering the need to develop technologies that make environments where artificial intelligence (AI) plays a central role more secure and reliable. One idea that has been in development for some time is designing techniques to “watermark” AI-generated documents. These watermarks would make it possible to identify not only the nature but also the origin of the content generated. It is not easy to sum up the technology for inserting watermarks into AI-generated text, images and audiovisual material. To put it very simply: watermarks are built in during the training of the models, through algorithms that teach them to “insert” a certain mark into whatever they generate.
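By way of illustration, here is a minimal, self-contained sketch of one approach discussed in the research literature (a “green list” sampling bias of the kind proposed by Kirchenbauer et al.). The vocabulary, the toy “model” and all parameters below are invented for illustration only and do not represent any particular provider’s scheme:

```python
# Toy sketch of statistical text watermarking via a key-dependent "green list".
# A real system would bias the logits of a full LLM; here a tiny vocabulary
# and uniform "model" stand in for it.
import hashlib
import random

VOCAB = ["the", "cat", "sat", "on", "a", "mat", "dog", "ran", "fast", "slowly"]

def is_green(prev_token: str, token: str) -> bool:
    """Pseudo-randomly assign roughly half the vocabulary to a 'green list'
    that depends on the previous token (this dependency acts as the key)."""
    digest = hashlib.sha256(f"{prev_token}|{token}".encode()).hexdigest()
    return int(digest, 16) % 2 == 0

def sample_next(prev_token: str, bias: float = 4.0) -> str:
    """Sample the next token, boosting the weight of green tokens.
    In a real model the bias would be added to the logits before softmax."""
    weights = [bias if is_green(prev_token, t) else 1.0 for t in VOCAB]
    return random.choices(VOCAB, weights=weights, k=1)[0]

def generate(n_tokens: int = 50) -> list[str]:
    tokens = ["the"]
    for _ in range(n_tokens):
        tokens.append(sample_next(tokens[-1]))
    return tokens

if __name__ == "__main__":
    print(" ".join(generate()))
```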
In principle, this is a very interesting and powerful technique, as it could be used, among many other things, to identify fake or distorted news, to detect cases of copyright infringement, and to authenticate content. Furthermore, such mechanisms, properly used, would allow states to better regulate the implementation and use of AI-generated products. Legislation in the United States, the European Union, and China already requires that AI-generated content be detectable and traceable to its origin (see the European Parliamentary Research Service briefing of 23 December 2023, “Generative AI and watermarking”).
Are these techniques and these laws viable? At the moment they face some major problems. Technologies for removing or modifying watermarks, whose use may or may not be justified, already exist. Watermarks also vary between providers, and they are not all detected by the same procedures. Their insertion appears to introduce bias into the content being generated. They also sometimes yield false positives. All of these can be considered technical problems, which will most likely be solved, given the efficiency of the developers of artificial intelligence techniques. But their effectiveness will not be guaranteed, and therefore they will not be truly viable, as long as there is no strong civic consensus among technologists, distributors, and users that pushes them to use AI for the common good, with guarantees of reliability and veracity. Unfortunately, when a tool can be used both righteously and perversely, the first thing to do is to persuade humans that its misuse is not acceptable. At the same time, measures must be taken to restrict misuse as much as possible. Widespread use of watermarks in the field of machine translation would be very important for detecting plagiarism, errors of specialisation, or the ill-advised replacement of experts by tools that offer fewer guarantees.
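To make the detection problem, and the source of false positives, concrete, here is a companion sketch under the same toy assumptions as above: detection simply tests whether the share of “green” tokens is higher than chance, and wherever the decision threshold is set, some ordinary human text will occasionally cross it while light paraphrasing can push watermarked text below it.

```python
# Toy sketch of the detection side of a "green list" watermark.
# Under the null hypothesis (no watermark) each token pair is green with
# probability 0.5, so the green count follows a binomial distribution.
import hashlib
import math

def is_green(prev_token: str, token: str) -> bool:
    digest = hashlib.sha256(f"{prev_token}|{token}".encode()).hexdigest()
    return int(digest, 16) % 2 == 0

def z_score(tokens: list[str]) -> float:
    """Standardised excess of green tokens over the chance level of 50%."""
    n = len(tokens) - 1
    greens = sum(is_green(tokens[i], tokens[i + 1]) for i in range(n))
    return (greens - 0.5 * n) / math.sqrt(0.25 * n)

def looks_watermarked(tokens: list[str], threshold: float = 4.0) -> bool:
    # A high threshold keeps false positives rare, but paraphrasing or
    # re-translation (which breaks token/previous-token pairs) can still
    # remove the watermark; a low threshold flags more human text by mistake.
    return z_score(tokens) > threshold

if __name__ == "__main__":
    human_text = "the dog ran on the mat and the cat sat slowly".split()
    print(z_score(human_text), looks_watermarked(human_text))
```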
BT – Do you think that linguistics can contribute in any way to defining ethical and legal standards for machine translation?
VD– It must certainly contribute. But let us first recall the general context. Legal scholars are considering the legal implications of AI and its products (texts, images, videos) and are asking questions such as: who is the intellectual owner of a work generated by AI or machine translation? How is copyright infringed when texts are generated by AI? Should the rights of authors whose texts are used to train the algorithms of large language models and machine translation (MT) systems be recognised?
As regards copyright, even though I am by no means a specialist in such a complex subject, the answer to the first question is divided: some argue that authorship of machine-translated texts should be recognised, while many believe that authorship is only attributable to human beings, since the creation of texts requires subjectivity and agency – properties which machines do not possess. Thus, models that generate language cannot benefit from copyright. This view implies that if, for example, a translation model happens (just by chance!) to generate a text over which someone holds intellectual property rights, the penalty for that infringement should fall on the individual who generated the content, not on the company that designed a model capable of generating copyright-protected texts.
It seems that machines (and their owners) are always spared. Surely all this should be the subject of intense legal scrutiny.
To focus on your question: of course, I think that linguistics and translation studies have much to contribute to the task of establishing these legal standards. Specialised translators have the necessary linguistic training to spot various types of error in the translation of legal (or medical, or administrative) texts. But this alone is not enough: courts and administrations should have protocols for accepting translations of texts with legal implications, establishing extensive “adequacy and style requirements” that would illustrate the potential errors that may be found and assure users that the documents accepted by the court or office in question are fully legible and correspond fully to the original document.
As far as ethical standards are concerned, it is for states to ensure that the use of these tools is compatible with the standards of correctness, truthfulness, “what ought to be”, and proper behaviour – the ethical norms – that should govern human conduct and action. In both cases, translators and the language sciences can help to set standards if they adequately define and illustrate linguistic biases, deliberate falsehood and lies, among many other issues. In my opinion, we are still far from having established them.
BT– Regarding the medium- and long-term future of translation, do you think that translation, as a profession, will eventually become a form of editing – that translators will become post-editors of machine translations (to a much greater extent than they already are)?
VD– As I tried to show in my lecture at Fundación Telefónica in September this year, machine translation, like everything that comes out of any of the forms of generative artificial intelligence, can produce various kinds of results: some are very acceptable, some are relatively flawed (hallucinations, errors in the use of certain meanings of a word, and problems with discursive connectors are persistent even in the best MT systems), some are highly flawed, and some are simply awful. Some translation experts argue that, despite this, the movement is unstoppable and that the fluency and consistency of MT can only get better and better. Thus the replacement of translators by MT will come, sooner or later.
I am not so sure. I think it depends on the type of translation in question and, of course, on the ability of AI engineers and technicians to handle the many aspects of this technology that have to do with creativity, precision, and a sophisticated mastery of linguistic resources such as irony, metaphor, idioms, rhythm, tone, the expression of emotion, the ability to resonate with the user… all areas in which machine translation has not been very successful so far. The “evaluation methods” (such as BLEU) applied to the various types of machine translation based on neural networks yield very different figures when quantifying the quality of machine translations of what we would call a standard or normal text (a job application letter, the summary of a conference, a conventional administrative document that uses widely known terminology) – which generally have very few errors – compared with translations of texts in highly specialised fields (such as the legal texts I mentioned before), let alone translations of literary texts, which can yield an 80% error rate.
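For readers unfamiliar with such metrics, here is a minimal sketch of how a BLEU-style score is computed (modified n-gram precision plus a brevity penalty). The two example sentences are invented, and production evaluation normally relies on tools such as sacreBLEU over full test sets rather than a toy implementation like this one; BLEU also correlates poorly with quality on literary text.

```python
# Minimal BLEU-style score: clipped n-gram precision up to 4-grams,
# geometric mean, and a brevity penalty for short candidates.
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams, ref_ngrams = ngrams(cand, n), ngrams(ref, n)
        overlap = sum((cand_ngrams & ref_ngrams).values())   # clipped counts
        total = max(sum(cand_ngrams.values()), 1)
        precisions.append(max(overlap, 1e-9) / total)        # smooth zero counts
    # Brevity penalty punishes candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

if __name__ == "__main__":
    mt = "the committee approved the request for translation"
    ref = "the committee approved the translation request"
    print(f"BLEU ≈ {bleu(mt, ref):.3f}")
```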
In any case, let us not fool ourselves – it is a fact that many translators are already post-editors. Machine translation is very fast and easy, which is why corporations and international organisations are interested in it. As I recalled at the Fundación Telefónica talk, the European Commission’s Directorate-General for Translation currently (Google, August 2024) has a team of 1,145 people: 660 translators, 235 translation assistants and 235 support staff members. In 2020 there were 2,000. In other words, the team has shrunk by roughly 40%. It should also be borne in mind that good publishers continue to hire human translators whose names appear on published texts. In the field of literary translation, great contemporary authors continue to translate the classics. The Colombian writer Juan Gabriel Vásquez has just translated Joseph Conrad’s Heart of Darkness, just as, years ago, Borges and Cortázar translated Virginia Woolf and Edgar Allan Poe respectively. We shall see.
BT– How do you think this will affect the economic aspects of translation? Do you think that post-editing could become a technological as well as a humanistic profile, or are we heading for job degradation?
VD– I don’t know what the situation is like right now. Decades ago I translated a number of philosophy and linguistics books for good Spanish publishers, and the truth is that there was no correlation between the time and effort involved and the remuneration I was paid. In all fairness, a distinct role for translators (who are still translators when editing or proofreading) should not lead to a difference in salary. In principle, the specialised work of grammatical, semantic, conceptual correction, and stylistic adaptation is highly technical and complex, and perhaps requires a greater effort than translation without any intermediary other than the author of the text.
Editing/proofreading should be a profession that is highly valued both socially and professionally. And of course it should have a professional profile equal to that of a translator. Both translation and specialised editing/proofreading require adequate scientific training in the content of the text, deep knowledge of both the source language and the target language, good materials to consult as needed, extreme concentration, a desire to achieve a good style, extensive cultural knowledge… and I am probably forgetting many other factors. The difference is that the workload for proofreaders/editors is greater since, in addition to all this, they face the often unpleasant task of detecting potential flaws in the text they are assigned. Depending on the subject, this task may be easier or harder.