
Emily M. Bender is a professor of Linguistics at the University of Washington in the US, specializing in Computational Linguistics, Natural Language Processing, and the societal impact of language technologies, particularly artificial intelligence. She is the co-author of the papers in which the Stochastic Parrot and the Statistical Octopus, both discussed in our Glosses from the AI Age series, first made their appearance.

In her remarks at an expert round table held this year at the US Congress, she used the metaphor of information as an ecosystem that must be protected from “spills” of synthetic media produced by AI. “Deepfakes” – synthetic images, audio, and video – have been the focus of much attention, but, she points out, not enough has been done about synthetic text.

Bender mentions the case of AI-generated books on mushroom foraging, published on Amazon, that could lead readers to make lethal mistakes. Some online sports news sites have also published AI-written articles attributed to non-existent writers, complete with AI-generated headshots – not quite deadly, but still misleading.

In a paper written with computer scientist Chirag Shah, Bender has expounded on the metaphor of information as an ecosystem: “a system of interlocking actors which depend on each other and are irretrievably connected to each other.” This ecosystem, they argue, is now being polluted and endangered by the use of large language models (LLMs) and image generation systems. 

We consider the Web as built up out of relationships of trust and ask what happens when those relationships are damaged, look at it as an interconnected system where a toxic spill in one place can spread to others, and underscore the importance of considering the system as a whole when designing tools. (Shah & Bender, 2024, p. 13)

As Bender points out, AI has made it extremely easy to generate articles that closely mimic real reporting and cast doubt on crucial topics such as vaccine efficacy. This has polluted the information ecosystem to the point that it is now more difficult than ever to distinguish fact from fiction, with enormous consequences. As she puts it:

When it’s hard to find trustworthy sources of information, it’s also hard to trust them once we’ve found them. And a public that cannot find and trust trustworthy information cannot participate effectively in collective efforts like democracy or public health. (Bender, 2023)

Hence, she argues, protections are needed. 

Bender’s argument has met with resistance on the grounds that placing restrictions on the use of AI would be tantamount to restricting free speech. Restricting AI technology, the reasoning goes, implies placing restrictions on the code that supports it, and code, being considered a form of speech, is protected by the First Amendment to the US Constitution.

However, this legal argument seems to stand on shaky ground. Salib, Bloomfield, and Rozenshtein have pointed out that the design of an AI system – the statistical weights placed on its components – is almost never the expression of a human being. Unlike computer code written by human programmers, the weights that run an AI model are generated by automated means:

No human writes them: They are created through vast machine learning operations. No human can even read them: It took a group of the world’s leading experts of AI interpretability months, and large computational resources, to even begin unpacking the inner workings of a current medium-sized model. Thus, humans have virtually no meaningful influence over or understanding of what model weights might communicate in their raw form. In general, code like that—including the “machine code” of traditional software—is not well understood as human expression.  
As to the AI system’s outputs, these are by design not the expressions of any human being—neither of an AI system’s creators nor its users. (…) This is the whole point of large language models—to be conversational engines whose range extends far beyond the limits of their creators and users’ own knowledge, beliefs, and intentions. (Salib, Bloomfield, & Rozenshtein, 2024)

Bender herself notes, from a linguistic point of view, that “not everything that looks like speech is in fact speech in the sense of conveying a person or group of people’s communicative intent”. As we have argued elsewhere, there can be no real speech (free or otherwise) if there is no cognition.

One approach to identifying AI-generated content would be to create a system of machine-readable watermarks, which would mark synthetic media and make it possible for users to set filters. Indeed, the ITU, the United Nations agency for digital technologies, is already calling for international standards for AI watermarking.
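To make the idea concrete, here is a minimal sketch, in Python, of how such a user-side filter might work, assuming publishers attach a machine-readable watermark payload to each item. The field names (`watermark`, `ai_generated`) and the feed structure are illustrative assumptions, not part of any existing standard.

```python
# Hypothetical sketch: filtering a content feed by a machine-readable
# "synthetic media" watermark flag. Field names are illustrative only.

from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MediaItem:
    url: str
    title: str
    # Assumed watermark payload attached by the publisher;
    # None means the item carries no watermark at all.
    watermark: Optional[dict] = None


def is_declared_synthetic(item: MediaItem) -> bool:
    """Return True if the item's watermark declares it as AI-generated."""
    return bool(item.watermark and item.watermark.get("ai_generated"))


def filter_feed(feed: List[MediaItem], hide_synthetic: bool = True) -> List[MediaItem]:
    """Apply a user preference: optionally hide declared synthetic media."""
    if not hide_synthetic:
        return feed
    return [item for item in feed if not is_declared_synthetic(item)]


if __name__ == "__main__":
    feed = [
        MediaItem("https://example.org/a", "Field report", watermark=None),
        MediaItem("https://example.org/b", "Generated recap",
                  watermark={"ai_generated": True, "generator": "some-llm"}),
    ]
    for item in filter_feed(feed):
        print(item.title)  # prints only "Field report"
```

The point of such a design is that the decision stays with the reader: the watermark only declares provenance, while what to do with that declaration remains a user preference.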

Another possible approach is that taken by the Coalition for Content Provenance and Authenticity (C2PA) – a collaborative initiative by leading firms including Adobe, Intel, Microsoft, and Sony – to promote provenance data: metadata embedded in digital content and cryptographically bound to it, identifying the creator and allowing the content’s authenticity to be verified. The same metadata could also carry consent information, such as whether the creator consents to AI data mining.
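As a rough illustration of the provenance idea, the following sketch binds a creator identity and a consent flag to a content hash and adds a simple integrity check. It is an assumption-laden toy, not the C2PA manifest format: the real specification defines its own structures and relies on standard public-key signatures, whereas the `signature` field below is just a keyed hash for demonstration.

```python
# Simplified, illustrative sketch of provenance metadata in the spirit of
# C2PA-style manifests. Field names and the toy "signature" are assumptions.

import hashlib
import json
from datetime import datetime, timezone


def make_provenance_record(content: bytes, creator: str,
                           allow_ai_mining: bool, signing_key: bytes) -> dict:
    """Build a provenance record binding creator and consent data to content."""
    record = {
        "creator": creator,
        "created_at": datetime.now(timezone.utc).isoformat(),
        "content_sha256": hashlib.sha256(content).hexdigest(),
        # Consent assertion, e.g. whether the creator permits AI data mining.
        "consent": {"ai_data_mining": allow_ai_mining},
    }
    # Toy integrity check: hash of the record plus a secret key. A real system
    # would use a proper digital signature (public-key cryptography) instead.
    payload = json.dumps(record, sort_keys=True).encode()
    record["signature"] = hashlib.sha256(signing_key + payload).hexdigest()
    return record


def verify_provenance(content: bytes, record: dict, signing_key: bytes) -> bool:
    """Check that the content matches the record and the record is untampered."""
    unsigned = {k: v for k, v in record.items() if k != "signature"}
    payload = json.dumps(unsigned, sort_keys=True).encode()
    expected = hashlib.sha256(signing_key + payload).hexdigest()
    return (record.get("signature") == expected
            and unsigned["content_sha256"] == hashlib.sha256(content).hexdigest())


if __name__ == "__main__":
    key = b"demo-secret"
    photo = b"...image bytes..."
    rec = make_provenance_record(photo, "Jane Photographer", False, key)
    print(verify_provenance(photo, rec, key))             # True
    print(verify_provenance(b"altered bytes", rec, key))  # False
```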

The ITU recently held a global workshop on “Detecting deepfakes and Generative AI” as part of its “AI for Good” series. Furthermore, research is now underway in both the public and the private sectors to find ways to identify AI-generated content and ensure data authenticity.

After all, as Bender puts it, we should all be free to choose whose speech we attend to.

In this article:

Bender, E. M. (2023, June 14). Advocating for protections for the information ecosystem. Medium. https://medium.com/@emilymenonbender/advocating-for-protections-for-the-information-ecosystem-89fbe95e9de2

Coalition for Content Provenance and Authenticity. (n.d.). C2PA: The Coalition for Content Provenance and Authenticity. C2PA. https://c2pa.org/

Hern, A. (2023, September 1). Mushroom pickers urged to avoid foraging books on Amazon that appear to be written by AI. The Guardian. https://www.theguardian.com/technology/2023/sep/01/mushroom-pickers-urged-to-avoid-foraging-books-on-amazon-that-appear-to-be-written-by-ai

International Telecommunication Union. (2024, May). AI watermarking: A watershed for multimedia authenticity. ITU Hub. https://www.itu.int/hub/2024/05/ai-watermarking-a-watershed-for-multimedia-authenticity/

ITU AI for Good. (2024, April 10). Detecting deepfakes and generative AI: Standards for AI watermarking and multimedia authenticity. AI for Good. https://aiforgood.itu.int/event/detecting-deepfakes-and-generative-ai-standards-for-ai-watermarking-and-multimedia-authenticity/

Leclerc, A. (2023, August 3). Sports Illustrated AI-generated writers: What it means for the future of journalism. Futurism. https://futurism.com/sports-illustrated-ai-generated-writers

Nixon, J. (n.d.). The Unconstitutionality of California Senate Bill 1047: A First Amendment Analysis [Preprint]. https://drive.google.com/file/d/1a7JLpcKvgNeqL-Eum3yE-85tNL7-e2ni/view

Salib, P. N., Bloomfield, D., & Rozenshtein, A. Z. (2024, June 7). AI Safety Laws Are Not (Necessarily) a First Amendment Problem. Lawfare. https://www.lawfaremedia.org/article/ai-safety-laws-are-not-(necessarily)-a-first-amendment-problem

Shah, C., & Bender, E. M. (2024). Envisioning Information Access Systems: What Makes for Good Tools and a Healthy Web? ACM Trans. Web, 18(3). https://doi.org/10.1145/3649468