The Mental States of Language Models

Behaviorist psychologists refused to talk about mental states or internal computations in human brains on the grounds that everything should be understood in terms of stimulus-response relationships. Linguists and cognitive psychologists, on the other hand, assume the existence of internal computations and attempt to understand and model them. Today we are faced with large language models with surprising and perhaps disturbing linguistic behavior. In contrast to behaviorist thinking, it seems important to recognize that during a dialogue a language model chatbot has an internal state: the activations of the artificial neurons in its artificial neural network. By analogy with people I will call this the mental state of the language model. In most current models the mental state of a chatbot is determined by the weights of the model and the dialogue context. One might take a behaviorist position and object that we can then just take the state to be the context (the stimulus). But even if people were shown to be deterministic, cognitive scientists would still be interested in their internal computations. When we ask whether a model believes a certain statement we are not asking a question about the stimulus. We are asking what conclusion the model has drawn and how it will respond to a question. Also, in a model with an internal train of thought, as described below, the state depends on stochastically generated thoughts.

This mental state, this system of neural activations, is often referred to as a “black box”. It is very difficult to interpret. However, in the current discussions of large language models such as ChatGPT and Bing, I believe that it is fruitful to consider what might be contained in the mental states of language models. In particular, the coherence of the text generated by the new wave of chatbots would seem to imply that their mental states incorporate some degree of understanding of the dialogue. Considering the mental states of language models provides a cognitive, anti-behaviorist perspective.

I also want to address the question of whether language models could in principle become sentient. Of course this raises the question of what “sentience” is. When should a system of mental states causally connected to behavior be called sentient? There are various sentience-related properties that also call for more explicit definitions. Consider “understanding”, “awareness”, “goals”, “motivation”, “feelings”, and “thoughts”. Can a chatbot have internal thoughts and be aware of them? Can a chatbot be aware of the passage of time and the current moment? Rather than try to define sentience I will discuss these related properties and leave it to the reader to decide whether it is plausible that a language model chatbot might possess these properties and whether possession of a constellation of sentience-related properties constitutes sentience.

Memory: Before discussing particular sentience-related properties I want to discuss the issue of memory. Recently there has been much interest in retrieval models. A retrieval model is given access to a corpus of knowledge such as news articles or perhaps pre-digested facts. In a retrieval dialogue model, information is retrieved from the corpus during the dialogue. I will call the corpus that the model can retrieve from the model’s memory. It seems a trivial step to allow the model’s memory to include past dialogues.
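
As a rough illustration of memory in this sense, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the Memory class, its add and retrieve methods, and the word-overlap score are hypothetical stand-ins, and a real retrieval model would use learned representations rather than shared words. The point is only that corpus items and past dialogues can sit in the same store and be retrieved in the same way.

```python
import re
from collections import Counter
from dataclasses import dataclass, field
from typing import List


def tokenize(text: str) -> Counter:
    """Lowercased bag of words; a crude stand-in for a learned embedding."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


@dataclass
class Memory:
    """Hypothetical memory: a flat list of text entries the bot can retrieve from."""
    entries: List[str] = field(default_factory=list)

    def add(self, text: str) -> None:
        self.entries.append(text)

    def retrieve(self, query: str, k: int = 2) -> List[str]:
        """Return up to k entries that share the most words with the query."""
        q = tokenize(query)
        scored = [(sum((q & tokenize(e)).values()), e) for e in self.entries]
        scored = [(s, e) for s, e in scored if s > 0]   # drop entries with no overlap
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [e for _, e in scored[:k]]


memory = Memory()
memory.add("News: the city council approved the new bridge.")        # corpus item
memory.add("Past dialogue: the user said their dog is named Rex.")   # earlier conversation
hits = memory.retrieve("What is my dog called?")
```

Here the past dialogue is retrieved because it shares words with the question, while the unrelated news item is not.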

Trains of Thought: Chain of thought prompting is a recent development in the interaction with language models. One asks the model to generate a sequence of reasoning steps. Here I will use the more colloquial phrase “train of thought” for the simple idea that the model can generate statements that become part of the context but are not shown to the user. A train of thought model could be made to tag each internal thought as internal so that the bot can determine that the user has not seen the model’s thoughts. All events in the dialogue can also be time stamped so that the model can observe the passage of time. During a dialogue the model has a current mental state which defines “now”, and each event in the dialogue (turn or thought) could be tagged with the moment of time (the “now”) at which it occurred.
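
The tagging idea can be made concrete with a small Python sketch. It is only one way the bookkeeping might look, not a description of how any existing chatbot works: the Event record, the tag format, and the functions model_context and user_view are all hypothetical names introduced here. The sketch shows the two views of one dialogue: the model sees turns and thoughts, tagged and time stamped, while the user sees only the turns.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Event:
    """One entry in the dialogue context (all names here are hypothetical)."""
    time: float              # time stamp, e.g. seconds since the dialogue began
    speaker: str             # "user" or "bot"
    text: str
    internal: bool = False   # True for a thought that is never shown to the user


def model_context(events: List[Event], now: float) -> str:
    """Everything the model conditions on: turns and thoughts, tagged and time stamped."""
    lines = [f"[now={now:.0f}s]"]
    for e in events:
        tag = "thought" if e.internal else e.speaker
        lines.append(f"[t={e.time:.0f}s] <{tag}> {e.text}")
    return "\n".join(lines)


def user_view(events: List[Event]) -> List[str]:
    """What the user sees: the same events with internal thoughts removed."""
    return [f"{e.speaker}: {e.text}" for e in events if not e.internal]


events = [
    Event(0.0, "user", "Is 91 prime?"),
    Event(1.0, "bot", "91 = 7 * 13, so it is not prime.", internal=True),
    Event(2.0, "bot", "No, 91 is divisible by 7."),
]
context = model_context(events, now=2.0)
visible = user_view(events)
```

Because the thought carries an explicit tag in the context, the model has what it needs to tell that the user never saw the factorization, only the final answer.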

Understanding: Consider the sentence “Sally finally understood that John would never change his mind”. What does this sentence mean? It means that Sally can now make a certain set of true predictions that she would not have made before she had this understanding. To understand something is to know its consequences. Language models are not perfect at this but neither are people. Perfection is not required in the colloquial notion of what it means “to understand”. This form of understanding is important for generating coherent text and hence it seems reasonable to assume that the mental states of language models embody understanding. If its mental states embody understanding it seems reasonable to say that the language model understands.

Goals and Motivations: It is often said that the goal of a language model is to predict the next word in a corpus of text. To do this well, however, the machine must model the world being described in the language. To predict what a character in a novel will say one must understand that character, including their goals and motivations. To generate coherent text the language model must build a representation of the context defining the goals and temperament of the characters. The text in a novel often describes an action. A full understanding of that action typically involves an understanding of the goal that it achieves or the motivation behind it. Modeling goals and motivations seems an important part of modeling language.

A Self-Model: At the beginning of each dialogue the Bing chatbot (Sydney) is fed a rather lengthy start-up prompt describing itself. This start-up prompt seems clearly designed to provide Sydney with an understanding of itself, that is, a self-model. Sydney’s generation of first person speech, as in “I want to x”, presumably reflects an aspect of a self-model represented in its mental state. Its self-model presumably provides coherence in its self-descriptions as well as other linguistic actions such as recommendations or requests for information. Since it is trained to produce coherent text it is essentially trained to behave in accordance with a self-model. If Sydney’s self-model includes being friendly then one should expect Sydney to be friendly. It should be noted, however, that Sydney’s mental state, and perhaps its self-model, can evolve throughout the course of a dialogue.

Feelings: Feelings are closely related to goals. To generate text accurately the language model must infer feelings, such as anger or fear, as well as goals and intentions. In generating realistic first person speech, such as “I feel angry” or “I feel hungry” the language model’s mental state needs to develop a model of its own feelings. Clearly our own feelings must somehow be embodied in the neural firings that are our own mental states.

Language models model people: There is a natural tendency to assume that an artificial intelligence would be like a person. After all, people are the only instance of intelligent entities that we know. I have always thought that this anthropomorphic view of AI agents is a mistake: intelligent machines could be dramatically different from people in very fundamental ways. However, language models are trained to model people and the start-up prompt describes Sydney as a person. Hence Sydney’s self-model will be a model of a human: it will incorporate aspects of human nature compatible with the self-description given in its start-up prompt. Although its self-model may evolve over time, it will always be a model of a person. So this form of chatbot will tend to be very human-like. People can be very dangerous.

Intelligence and Hallucination: Of course a sentient machine need not be superintelligent. No one would call Bing a superintelligence. In fact Bing’s understanding seems quite limited. Understanding and intelligence seem intimately related to truth. To be intelligent is to be able to determine truth. Bing is notorious for its “hallucinations”: a tendency to confidently assert stuff it makes up. Hallucination is a failure to be faithful to the truth. Over time models will undoubtedly improve their ability to determine truth. There are two comments I want to make here. First, populist politicians can be dangerous even when, and perhaps especially when, they have a very poor ability to determine truth. Second, if a machine ever does reach the level of superintelligence, and can really know what is true, we may all have to face the truth.

Conclusions: I personally expect that within this calendar year (2023) chatbots will emerge that fully possess the sentience-related properties described above. I will leave it to the reader to decide if this actually constitutes sentience. Sentience is different from superintelligence. The nature of highly advanced intelligence is difficult to foresee. I expect that machines with general intelligence more advanced than people will emerge within this decade. Even in the absence of superintelligence the issues of AI safety are pressing. An accurate understanding of the machines we are creating is now of critical importance.


4 Responses to The Mental States of Language Models

  1. realist says:

    Isn’t sentience another name for consciousness?
    And there is no consensus on what consciousness IS, it’s a huge jungle.

  2. MarkJ says:

    I agree with you about sentience (I think). Eye-rolling is a common response from experts to LLM sentience claims, but personally I don’t think these sentience claims are crazy (I think they are wrong, though, for reasons I give below).

    The reason I think LLM sentience claims are not crazy is that it’s reasonable to accept that there are many different kinds of sentience and consciousness, and someone can reasonably claim that LLMs have a kind of strange, alien sentience. For example, some psychologists and philosophers not unreasonably claim that certain kinds of animals have kinds of self-awareness and goal-directedness that are related to a kind of sentience and possibly a kind of consciousness.

    It’s true that it’s hard to see how consciousness can arise in the activations and connections in a neural net, but it’s equally puzzling to understand how sentience could arise in our neural wetware where it clearly does arise. Our inability to conceive of the physical bases of sentience and consciousness is clearly no barrier to its actual occurrence. (And in fairness to neural nets, they are much more similar to neural wetware than the symbolic and statistical models we both spent decades working on).

    My guess is that current LLMs cannot become sentient because they are missing a crucial component of self-awareness and consciousness: episodic or long-term memory. I think our sense of self arises from our memories of our own previous thoughts. My mental identity is a collection of (more or less) coherent past thoughts, and my current thoughts extend and project my previous thoughts onto my current sensory context to produce new thoughts, which in turn will be added to the memories that will be the basis of future thoughts. (Of course it’s possible there are other essential components for consciousness missing from current LLMs).

    In terms of current model architectures, I think the popular Retriever+Reader architecture should be extended to a 3-component Retriever+Reader+Writer architecture, where the Writer writes “memories” into a permanent store that the Retriever can retrieve in the future. I’m not sure what “memories” are, but I don’t see any reason why they need to be symbolic (e.g., text strings). Perhaps “memories” could be activation vectors – maybe the same vectors that the Retriever uses for its Approximate Nearest Neighbour retrieval.

    I like your suggestion of using a Retriever+Reader architecture to retrieve previous dialogs; it seems to be a good starting point for the 3-component model I sketched above. For example, it by-passes the need to figure out what “memories” ought to be; they would be the dialog itself.

    I’m not sure how we would train such a model. Maybe we could adapt existing multi-step QA or inference tasks, encouraging the model to rely on its own previous output conclusions to answer the current question.

    • McAllester says:

      I agree with all of this. Memory, including memory of thoughts, is discussed in the blog post. I agree that it is essential to a human-like cognitive process.

      • Mark Johnson says:

        How would you train the model to use the memory? Multi-hop QA seems like one way (where you’d gradually introduce longer and longer inferential sequences).

        But that doesn’t really get at what I think episodic memory is really about.

        I think episodic memory plays a role like:

        * given an input sensory stimulus, we retrieve similar situations from episodic memory
        * the episodic memories tell the model details of those situations, what I did in those situations, and the overall outcome of my actions (positive or negative)
        * we use these memories to plan our response to the current input. If the current input is sufficiently similar to one of the episodic memories and if the outcome was positive, then we might act in a similar way to the way we did in our memory.

        Do you have any ideas about training tasks that might encourage a model to use memory in this kind of way?
