Most computer science academics dismiss any talk of real success in artificial intelligence. I think that a more rational position is that no one can really predict when human level AI will be achieved. John McCarthy once told me that when people ask him when human level AI will be achieved he says between five and five hundred years from now. McCarthy was a smart man.
Given the uncertainties surrounding AI, it seems prudent to consider the issue of friendly AI. I think that the departure point for any discussion of friendly AI should be the concept of rationality. In the classical formulation, a rational agent acts so as to maximize expected utility. The important word here is “utility”. In the reinforcement literature this gets mapped to the word “reward” — an agent is taken to act so as to maximize expected future reward. In game theory “utility” is often mapped to “payout” — a best response (strategy) is one that maximizes expected payout holding the policies of other players fixed.
The basic idea of friendly AI is that we can design the AI to want to be nice to us (friendly). We will give the AI a purpose or mission — a meaning of life — in correspondence with our purpose in building the machine.
The conceptual framework of rationality is central here. When presented with choices an agent is assumed to do its best in maximizing its subjective utility. In the case of an agent designed to serve a purpose, rational behavior should aim to fulfill that purpose. The critical point is that there is no rational basis for altering one’s purpose. Adopting a particular strategy or goal in the pursuit of a purpose is simply to make a particular choice, such as a career choice. Also, choosing to profess a purpose different from one’s actual purpose is again making a choice in the service of the actual purpose. Choosing to actually change one’s purpose is fundamentally irrational. So an AI with an appropriately designed purpose should be safe in the sense that the purpose will not change.
But how do we specify or “build-in” a life-purpose for an AI and what should that purpose be? First I want to argue that a direct application of the formal frameworks of rationality, reinforcement learning and game theory is problematic and even dangerous in the context of the singularity. More specifically, consider specifying a “utility”, “reward signal” or “payout” as a function of “world state”. The problem here is in formulating any conception of world state. I think that for the properties we care about, such as respect for human values, it would be a huge mistake to try to give a physical formulation of world states. But any non-physical conception of world state, including things like who is married to whom and who insulted whom, is bound to be controversial, incomplete, and problematic. This is especially true if we think about defining an appropriate utility function for an AI. Defining a function on world states just seems unworkable to me.
An alternative to specifying a utility function is to state a purpose in English (or any natural language). This occurs in mission statements for nonprofit institutions or in a donor’s specification of the purpose of a donated fund. Asimov’s laws are written in English but specify constraints rather than objectives. My favorite mission statement is what I call the servant mission.
Servant Mission: Within the law, fulfill the requests of David McAllester.
Under the servant mission the agent is obligated to obey both the law its master (me in the above statement). The agent can be controlled by society simply by passing new laws and by its master when the master makes requests. The servant mission transfers moral responsibility from the servant to its master. It also allows a very large number of distinct AI agents — perhaps one for each human — each with a different master and hence a different mission. The hope would be for a balance of power with no single AI (no single master) in control. The servant mission seems clearer and more easily interpreted than other proposals such as Asimov’s laws. This makes the mission less open to unintended consequences. Of course the agent must be able to interpret requests — more on this below. The servant mission also preserves human free will which does not seem guaranteed in other approaches, such as Yudkowsky’s Coherent Extrapolated Volition (CEV) model, which seem to allow for a “friendly” dictator making all decisions for us. I believe that humans (certainly myself) will want to preserve their free will in any post-singularity society.
It is important to emphasize that no agent has a rational basis for altering its purpose. There is no rational basis for an agent with the servant mission to decide not to be a servant (not to follow its mission).
Of course natural language mission statements rely on the semantics of English. Even if the relationship between language and reality is mysterious, we can still judge in many (most?) cases when natural language statements are true. We have a useful conception of “lie” — the making of a false statement. So truth, while mysterious, does exist to some extent. An AI with an English mission, such as the servant mission, should have a first unstated mission of understanding the intent of the author of the mission. Understanding the actual intent of the mission statement should be the first priority (the first mission) of the agent and should be within the capacity of any super-intelligent AI. For example, the AI should understand that “fulfilling requests” means that a later request can override an earlier request. A deep command of English should allow a faithful and authentic execution of the servant mission.
I personally believe that it is likely that within a decade agents will be capable of compelling conversation about the everyday events that are the topics of non-technical dinner conversations. I think this will happen long before machines can program themselves leading to an intelligence explosion. The early stages of artificial general intelligence (AGI) will be safe. However, the early stages of AGI will provide an excellent test bed for the servant mission or other approaches to friendly AI. An experimental approach has also been promoted by Ben Goertzel in a nice blog post on friendly AI. If there is a coming era of safe (not too intelligent) AGI then we will have time to think further about later more dangerous eras.