This spring I had a disagreement with an old friend about AGI. They claimed there was essentially no chance of AGI arriving in the next, say, fifty years. I have always said we just don’t know. They also wanted a definition of “AGI”. I said I would consider AGI to have arrived when a majority of the US population takes AI agents to be sentient after speaking with them for at least several months. Soon after that discussion Blake Lemoine claimed that the language model LaMDA was sentient. I don’t for a moment take that claim seriously. But it did get me thinking about whether language models might start fooling more people in the near future. What might a language model five years from now be like, and what dangers might it present?
There are two fundamental reasons to think that language models might become operationally sentient (fooling most people) some time this decade. First, progress up to now has been very fast and is possibly accelerating. In January a prominent paper introduced the influential “chain of thought” approach. While earlier proposals were similar, this particular paper seems to have driven an increased interest in having language models generate explicit reasoning before answering a question (or generating a response). Chain-of-thought approaches have led to significant advances on various benchmarks in the last six months. The second reason for thinking that operational sentience might arrive sooner (five or ten years) rather than later (fifty years) is the enormous amount of research effort being devoted to this endeavor.
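To make the chain-of-thought idea concrete, here is a minimal Python sketch of the prompting pattern. The function name and the `generate` callable are hypothetical stand-ins for a call to whatever language model is being used, not the interface of any particular system; the only point is that the prompt demonstrates explicit intermediate reasoning before the final answer.

```python
# A minimal sketch of chain-of-thought prompting. The `generate` argument is a
# hypothetical stand-in for a call to a language model (str -> str).

def answer_with_chain_of_thought(question: str, generate) -> str:
    """Show the model one worked example of explicit reasoning, then ask the
    new question so that it generates its own reasoning before the answer."""
    few_shot_example = (
        "Q: A jug holds 4 liters and a cup holds 250 ml. "
        "How many cups fill the jug?\n"
        "A: 4 liters is 4000 ml, and 4000 / 250 = 16. The answer is 16.\n\n"
    )
    # Demonstrating intermediate steps encourages the model to spell out its
    # own reasoning rather than jumping straight to an answer.
    prompt = few_shot_example + f"Q: {question}\nA:"
    return generate(prompt)
```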
Let me try to paint a picture of a future large language model (LLM). First, I expect the LLM to have long-term memory. This will include a memory of all the conversations it has had and when it had them. An enormous amount of research has been, and continues to be, done on incorporating memory into language models. Second, I expect the LLM to include some form of chain of thought. It will have a total memory of its internal thoughts (internally generated sentences) tagged with when those thoughts occurred. The LLM will be able to honestly say things like “I was thinking this morning about what you said last night”. Third, I expect future language models to be much better at maintaining consistency in what they say. This will include consistency in how they describe the world and themselves. Blake Lemoine’s “interview” of LaMDA showed that a language model can already generate a lot of compelling first-person sentences — statements about what it believes and wants. Assuming memory, the things that a language model says or thinks about itself become part of its background knowledge — a kind of self model. The language model should be able to do a good job of seeming to be self-aware.
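As a rough illustration of the memory just described, here is a Python sketch in which every utterance and every internally generated thought is stored with a timestamp and fed back into later prompts. The names (`RememberingAgent`, `MemoryItem`, `generate`) are my own illustrative choices, not an existing system, and the recency-based retrieval is deliberately naive; a real system would need relevance-based retrieval over a far larger store.

```python
# Sketch of an LLM agent with timestamped long-term memory of conversations
# and of its own chain-of-thought. All names here are illustrative.

from dataclasses import dataclass
from datetime import datetime
from typing import Callable, List


@dataclass
class MemoryItem:
    timestamp: datetime
    kind: str   # "heard", "thought", or "said"
    text: str


class RememberingAgent:
    def __init__(self, generate: Callable[[str], str]):
        self.generate = generate            # hypothetical LLM call
        self.memory: List[MemoryItem] = []

    def _remember(self, kind: str, text: str) -> None:
        self.memory.append(MemoryItem(datetime.now(), kind, text))

    def _recall(self, limit: int = 20) -> str:
        # Naive retrieval: just the most recent items, rendered with timestamps.
        recent = self.memory[-limit:]
        return "\n".join(
            f"[{m.timestamp:%Y-%m-%d %H:%M}] ({m.kind}) {m.text}" for m in recent
        )

    def respond(self, user_utterance: str) -> str:
        self._remember("heard", user_utterance)
        # Chain of thought: generate and record a private, timestamped thought.
        thought = self.generate(
            f"Memory:\n{self._recall()}\n\nThink privately about how to reply:"
        )
        self._remember("thought", thought)
        # Then generate the reply itself, conditioned on the updated memory.
        reply = self.generate(f"Memory:\n{self._recall()}\n\nReply to the user:")
        self._remember("said", reply)
        return reply
```

Because both the conversation and the internal thoughts carry timestamps, an agent built along these lines could truthfully report that a particular thought occurred this morning about something said last night.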
While I have always looked forward to the arrival of AGI, I am finding this picture of operationally sentient LLMs rather dystopian. The fundamental problem is the black-box nature of an LLM in combination with the scale of its training data. By definition a language model is trained to say what a person would say. Ultimately, predicting what a person would say seems to require a model of human nature — what we want and how that influences what we say and believe. The language model’s self-understanding will be based on its understanding of people. It seems likely, therefore, that its self model and its speech will exhibit human tendencies such as a drive for power and respect. The language model’s understanding of human nature, and hence its understanding of itself, will be buried in its many trillions of parameters and would seem to be impossible to control.
In the past I have always assumed that we could control intelligent machines by specifying a mission — the purpose of the machine. A machine with an explicit mission would not have all the self-interests that complicate human relations. I have advocated the “servant mission”, where each AI agent is given a mission of serving a particular individual. We could each have our own computer advocate or “advobot”. But if language models can become sufficiently human just by reading, with human nature woven into their many trillions of parameters, control becomes much more subtle …