Abstract: The term ‘hallucination’ is used in relation to both human perception and machine learning. ZYX is a sound work that considers how Automatic Speech Recognition (ASR) might be made to hallucinate and how that differs from human hallucination, specifically hallucinations triggered by LSD and grief. The work consists of a voice-over accompanied by filtered speech sounds. Both elements were made through the development and application of an audio filter that overemphasises disfluencies in speech in order to force errors in ASR. The script for the voice-over was written using erroneous output generated this way. Given the problematic ethics of the capitalist development of ASR systems that misrecognise large parts of human speech, the work proposes the forcing of errors as a potential form of resistance—as a disruption to ‘smoothness’ and also as a generative writing method. The sound piece should be listened to on its own, preferably with your eyes closed. A link to endnotes will appear afterwards.
Keywords: hallucination, Automatic Speech Recognition, machine learning, error, disfluency, grief
02:19 … excess …
‘Poetry is the excess which breaks the limit and escapes measure. The ambiguousness of poetical words, indeed, may be defined as semantic overinclusiveness. Like the schizo, the poet does not respect the conventional limits of the relation between the signifier and signified, and reveals the infinitude of the process of meaning-making (signification). Exactness and compliance are the conditions of merit and exchange. Excessiveness is the condition of revelation, of emancipation from established meaning and of the disclosure of an unseen horizon of signification: the possible.’
Franco ‘Bifo’ Berardi, Breathing: Chaos and Poetry, South Pasadena: Semiotext(e), 2018, p. 20.
03:01 … mantra …
Repeating a word or phrase over and over removes those sounds from the normal exchanges of functional daily communication and instead acts on the rhythm of the body. This is the fundamental idea of a mantra or incantation where numinous, sacred or magical qualities are ascribed to the repetition of speech sounds—phonemes, syllables, words or phrases that may or may not have syntactic structure and literal meaning. A mantra might be how you come to make something manifest, calling it to presence.
03:55 … memory …
‘Back in the mainframe era, IBM never used the word “memory” for computer data banks. Instead, they always used “storage.” The rationale for this is that it’s misleading to refer to what is nothing more than a bunch of boxes which store and retrieve bytes (computer storage) with the associative, pattern-matching, highly parallel function of human memory. Calling the computer’s data bank “memory” attributes to it anthropomorphic powers it doesn’t have.’
John Walker, ‘Computation, Memory, Nature, and Life: Is Digital Storage the Secret of Life?’ Fourmilab, September 29, 2004, https://www.fourmilab.ch/documents/comp_mem_nat_life/.
04:26 … reality …
The following is taken from artist Moyra Davey’s Penn Design Lecture, delivered on April 5, 2012, at ICA Philadelphia. See https://vimeo.com/48154937 around 01:19.
Audience member: I was so taken with the uh [clears throat] moment in the goddesses where um you’re letting us hear the recording that you’re speaking over and there’s this kind of sense of, your re repetition is a kind of imprinting of the word that you’re hearing, the word that we’re hearing, then it’s kind of redacting because you’re talking over your own voice [MD yeah] and I got really kind of caught up in this layering and then suddenly you’re you’re going to remember a quote and you can’t remember it and it’s about the void and then there’s the void being the image of the film, what kind of happens in that particular moment in the film you set yourself up to jump off a cliff or something [MD relaxes and laughs] and getting yourself tangled in the layers of [MD sighs yeah] of writing [MD wants to talk now] and repetition and voice and the stamping out and
Moyra Davey (MD): Yeah I mean its um, it’s really ah I think, you know it’s it’s really difficult to just to um to have something be… in a video for instance to just like have something be read to you, you know by a narrator who’s reading a script it’s um, I… I… I at any rate I find that qu often difficult to follow unless you know it’s it’s a trained voice and it’s it’s done really well but often it’s it’s, I mean it’s a lot easier as I’m sure you know, to follow uh the voice of someone who’s like thinking as they speak because its slowed down and it’s um it has a completely different quality from you know the scripted ah narration which this is, but I um, and in 50 minutes I actually I had a script but I ah mem… I attempted to memorise the whole thing and and speak it and I realised um that the most intr interesting parts, you know when I was looking at all the takes, were the moments where I I forgot something, where I you you know repeated something, where I got flustered, I made a mistake, those were like, because those were the moments of spontaneity in this you know very um rehearsed delivery and so for this video ‘Les Goddesses’ I um I couldn’t memorise, it was like there was no way I could memorise this all of that and I got the idea, actually from another artist, Suzanne Bocanegra made made a wonderful piece called when a priest marries a witch and she she used th this device and um but it but I just I kept in ah all of the kind of slips and mistakes because you know as I was saying that’s you know that becomes you know this um uh [vocal creak] it it you know one of this kind of, the for the aspects of layering you know that you mentioned there’s the layering of you know the echo, well it’s not really an echo ’cause you’re hearing it before the voice um but then just this whole sort of layer of the botched performance becomes becomes something interesting and I think it is also is, is more more um, it’s just more ah apprehen.. apprehendable for a viewer or listener its sort of slowed down, and ah you know you can sort of hear you know the gears [smiling] kind of trying to cogitate around this thing um this mistake that’s being made and trying to um uh you know just kind of uh just get caught up [nodding]
(Transcription by Anna Barham)
06:22 … see …
‘When we think about hallucination, we typically think of some kind of internally generated perception, a seeing or a hearing of something that isn’t actually there—as can happen in schizophrenia, or [with] psychedelic[s]. These associations place hallucination in contrast to “normal” perception, which is assumed to reflect things that actually exist out in the world. On the top-down view of perception, this sharp distinction becomes a matter of degree. Both “normal” perception and “abnormal” hallucination involve internally generated predictions about the causes of sensory inputs, and both share a core set of mechanisms in the brain. The difference is that in “normal” perception, what we perceive is tied to—controlled by—causes in the world, whereas in the case of hallucination our perceptions have, to some extent, lost their grip on these causes. When we hallucinate, our perceptual predictions are not properly updated in light of prediction errors. If perception is controlled hallucination, then—equally—hallucination can be thought of as uncontrolled perception. They are different, but to ask where to draw the line is like asking where the boundary is between day and night.’
Anil Seth, Being You: A New Science of Consciousness, London: Faber & Faber, 2021, Kindle, p. 101.
08:21 … texture …
‘Acuity enhancement is defined as a heightening of the clearness and clarity of vision. This results in the visual details of the external environment becoming sharpened to the point where the edges of objects become perceived as extremely focused, clear, and defined. The experience of acuity enhancement can be likened to bringing a camera or projector lens that was slightly blurry into focus. At its highest level, a person may experience the ability to observe and comprehend their entire visual field simultaneously, including their peripheral vision. This is in contrast to the default sober state where a person is only able to perceive the small area of central vision in detail.
It is thought that a fundamental feature of information-processing dysfunction in both hallucinogen-induced states and schizophrenia-spectrum disorders is the inability to screen out, inhibit, filter, or gate irrelevant stimuli and to attend selectively to more important features of the environment.
The CSTC model of the brain posits that the thalamus plays a key role in controlling or gating external sensory information to the conscious faculties and is thereby fundamentally involved in the regulation of a person’s awareness and attention. The interruption of psychedelics to these neural pathways that inhibit the sensory gating systems may, therefore, result in an enhanced availability of sensory information which is usually filtered out by these systems. This process is likely also involved in the various visual, tactile, and auditory enhancements which commonly occur when under the influence of a psychedelic experience.’
‘Acuity enhancement,’ Psychonaut Wiki, last modified May 20, 2022, https://psychonautwiki.org/wiki/Acuity_enhancement.
08:23 … limescale …
ZYX developed by thinking about what aspects of speech are missed or disregarded in processes of ASR—i.e., what does smoothness elide? I think of these disregarded paralinguistic aspects as a kind of excess or residue—the characteristics of the actual physical production of speech in an individual’s mouth.
I began by thinking about the difficulty of telling where one word begins and another ends in connected speech. I developed a process of cutting speech recordings at a phonemic level, to remove the precise sounds that are comprehensible as words and to see what remains (quite a lot). This residue included disfluencies such as ‘uh,’ ‘um,’ stutters, repetitions and half-spoken words, as well as audible breathing, over-enunciation, assimilations, laughter, background noises and the acoustics of the space.
Subsequently, I developed an audio filter (an inverted gate) that could isolate these residues automatically. By applying this filter on top of my own speech and readings of the texts below, I was able to emphasise the ‘excess’ and disfluencies and force the ASR (enhanced dictation macOS 12.3.1) to make errors. In turn, I read the erroneous outputs with the filter, causing the errors to feed back and proliferate within the system, so that the ASR began to ‘hallucinate.’ I wrote the script for the voice-over from the multiple versions generated in the process.
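As a rough illustration of what an inverted gate does, the sketch below may help. It is not the filter used for ZYX, whose implementation is not described here. It assumes, purely for the sake of example, that the gate works on loudness: a conventional gate silences anything quieter than a threshold, so an inverted gate silences anything louder and lets the quiet residue through. The function names, the frame length and the threshold value are all hypothetical.

```python
import math


def frame_rms(samples, frame_len):
    """Return the RMS level of each consecutive frame of `samples`."""
    levels = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        levels.append(math.sqrt(sum(x * x for x in frame) / len(frame)))
    return levels


def inverted_gate(samples, frame_len=4, threshold=0.3):
    """Keep frames quieter than `threshold` and silence louder ones.

    Assumption for illustration only: the 'residue' is lower in level
    than the clearly articulated speech. A conventional gate would do
    the opposite and mute the quiet frames.
    """
    out = []
    for i, level in enumerate(frame_rms(samples, frame_len)):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        if level < threshold:
            out.extend(frame)                 # quiet frame: let it through
        else:
            out.extend([0.0] * len(frame))    # loud frame: mute it
    return out


# Hypothetical toy signal: one loud frame followed by one quiet frame.
loud = [0.9, -0.8, 0.9, -0.9]
quiet = [0.1, -0.05, 0.1, -0.1]
residue = inverted_gate(loud + quiet)
```

In this toy signal the loud frame is replaced by silence and the quiet frame passes unchanged. A working audio filter would also need smoothing between frames (attack and release times) to avoid clicks, and might select the residue by criteria other than level.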
Texts used in creating the script:
Personal account of taking LSD.
‘Computer Hallucination,’ Tech Target. n.d. https://whatis.techtarget.com/definition/computer-hallucination.
Seth, Anil, ‘Your Brain Hallucinates Your Conscious Reality.’ Filmed 2017. TED video. https://www.ted.com/talks/anil_seth_your_brain_hallucinates_your_conscious_reality?language=en.
McKenna, Terence, Psychedelic Advice. Psychedelic (podcast). May 20, 2020. https://podcasts.apple.com/gb/podcast/psychedelic-advice-terence-mckenna/id1500437808?i=1000475208144.
Flaubert, Gustave, The Temptation of St Anthony. Translated by L. Hearn. Introduction by M. Foucault. New York: Modern Library, 2001.
McCarthy-Jones, Simon, ‘Sensing the Dead Is Perfectly Normal and Often Helpful.’ The Conversation. July 19, 2017. https://theconversation.com/sensing-the-dead-is-perfectly-normal-and-often-helpful-81048.
During the development of this work for APRIA, the editors and contributors met regularly on Zoom to share our research and work in progress. In one of these meetings, the editors questioned whether the errors I was trying to force in ASR could really be called hallucinations. They thought apophenia might be a more appropriate term.
Apophenia is the tendency to perceive a connection or meaningful pattern between unrelated or random things. A common example of apophenia is pareidolia—seeing a face in an arrangement of objects like a door handle and screws, or a figure in the clouds. Originating from the study of schizophrenia in the 1950s, the term apophenia is now used in statistics to describe a ‘type I error’ or false positive, and ‘algorithmic apophenia’ describes ‘the illusory consolidation of correlations or causal relations [from a data set] that do not exist in the material world but only in the mind of AI’ (Matteo Pasquinelli, ‘How A Machine Learns and Fails—A Grammar of Error for Artificial Intelligence,’ Spectres of AI, spheres journal #5, 2019. See also Hito Steyerl, ‘A Sea of Data: Apophenia and Pattern (Mis)Recognition’, e-flux journal 72, April 2016).
I stuck with the word hallucination, in part because I had found it used within the technical community to describe proliferating misinterpretations in ASR specifically, but mainly because hallucination implies a more total illusion. In the above examples of pareidolia, you don’t not see the door handle or the clouds: you are not mistaking them for a face but recognising a similarity between patterns and seeing both at once. The phenomenon I was interested in, by contrast, was the feeling of believing what you see, of not being able to separate the image and the reality, and of that belief altering subsequent behaviour or interpretations. Apophenia might be part of that mechanism, but it doesn’t seem to account for the quality or strength of the feeling.
In an important way, though, I found that this question of terms misses the point. The issue with calling proliferating errors in ASR ‘hallucination’ is the anthropomorphism of the metaphor, and in this sense, substituting ‘apophenia’ for ‘hallucination’ is just splitting hairs. Both terms imply some sort of intelligence or consciousness, whereas in reality ‘artificial intelligence’ is a manifestation of computationally intensive training with large datasets and predefined rules and rewards that, in turn, depend on a much wider set of political structures. (Kate Crawford, Atlas of AI, New Haven, Connecticut: Yale University Press, 2021.) Anthropomorphism is a sleight of hand that distracts from this edifice, and vocal interfaces are a particularly potent form of the misdirection.
- ‘Acuity enhancement,’ Psychonaut Wiki, last modified May 20, 2022, https://psychonautwiki.org/wiki/Acuity_enhancement.
- Berardi, Franco ‘Bifo,’ Breathing: Chaos and Poetry. South Pasadena: Semiotext(e), 2018.
- ‘Computer Hallucination,’ Tech Target. n.d. https://whatis.techtarget.com/definition/computer-hallucination.
- Crawford, Kate, Atlas of AI. New Haven, Connecticut: Yale University Press, 2021.
- Davey, Moyra, ‘Penn Design Lecture.’ ICA Philadelphia. Filmed April 5, 2012. https://vimeo.com/48154937.
- Flaubert, Gustave, The Temptation of St Anthony. Translated by L. Hearn. Introduction by M. Foucault. New York: Modern Library, 2001.
- McCarthy-Jones, Simon, ‘Sensing the Dead Is Perfectly Normal and Often Helpful.’ The Conversation. July 19, 2017. https://theconversation.com/sensing-the-dead-is-perfectly-normal-and-often-helpful-81048.
- McKenna, Terence, Psychedelic Advice. Psychedelic (podcast). May 20, 2020. https://podcasts.apple.com/gb/podcast/psychedelic-advice-terence-mckenna/id1500437808?i=1000475208144.
- Pasquinelli, Matteo, ‘How A Machine Learns and Fails—A Grammar of Error for Artificial Intelligence.’ Spectres of AI, spheres journal 5, 2019.
- Seth, Anil, Being You: A New Science of Consciousness. London: Faber & Faber, 2021.
- ———, ‘Your Brain Hallucinates Your Conscious Reality.’ Filmed 2017. TED video. https://www.ted.com/talks/anil_seth_your_brain_hallucinates_your_conscious_reality?language=en.
- Steyerl, Hito, ‘A Sea of Data: Apophenia and Pattern (Mis)Recognition.’ e-flux journal 72, April 2016.
- Walker, John, ‘Computation, Memory, Nature, and Life: Is Digital Storage the Secret of Life?’ Fourmilab. September 29, 2004. https://www.fourmilab.ch/documents/comp_mem_nat_life/.