Friendly AI and the Servant Mission

Most computer science academics dismiss any talk of real success in artificial intelligence. I think that a more rational position is that no one can really predict when human-level AI will be achieved. John McCarthy once told me that when people asked him when human-level AI would be achieved, he said between five and five hundred years from now. McCarthy was a smart man.

Given the uncertainties surrounding AI, it seems prudent to consider the issue of friendly AI. I think that the departure point for any discussion of friendly AI should be the concept of rationality. In the classical formulation, a rational agent acts so as to maximize expected utility. The important word here is “utility”. In the reinforcement learning literature this gets mapped to the word “reward”: an agent is taken to act so as to maximize expected future reward. In game theory “utility” is often mapped to “payout”: a best response (strategy) is one that maximizes expected payout, holding the policies of the other players fixed.
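To make the classical formulation concrete, here is a minimal Python sketch, assuming finite sets of actions and outcomes with known probabilities. The function names and the umbrella and prisoner's-dilemma numbers are hypothetical illustrations, not part of the original discussion. The sketch shows an agent choosing the action with the highest expected utility, and a player choosing a best response to another player's mixed policy held fixed.

```python
from typing import Callable, Dict, Hashable, Iterable


def expected_utility(
    action: Hashable,
    outcome_probs: Callable[[Hashable], Dict[Hashable, float]],
    utility: Callable[[Hashable], float],
) -> float:
    """Probability-weighted utility of the outcomes an action can lead to."""
    return sum(p * utility(o) for o, p in outcome_probs(action).items())


def rational_choice(
    actions: Iterable[Hashable],
    outcome_probs: Callable[[Hashable], Dict[Hashable, float]],
    utility: Callable[[Hashable], float],
) -> Hashable:
    """A rational agent picks the action that maximizes expected utility."""
    return max(actions, key=lambda a: expected_utility(a, outcome_probs, utility))


def best_response(
    my_strategies: Iterable[Hashable],
    opponent_policy: Dict[Hashable, float],
    payout: Callable[[Hashable, Hashable], float],
) -> Hashable:
    """Maximize expected payout while holding the opponent's mixed policy fixed."""
    return max(
        my_strategies,
        key=lambda s: sum(q * payout(s, t) for t, q in opponent_policy.items()),
    )


if __name__ == "__main__":
    # Hypothetical umbrella decision with a 30% chance of rain.
    outcomes = {
        "umbrella": {"dry_but_encumbered": 1.0},
        "no_umbrella": {"wet": 0.3, "dry": 0.7},
    }
    utilities = {"dry": 10.0, "dry_but_encumbered": 8.0, "wet": 0.0}
    print(rational_choice(outcomes, outcomes.get, utilities.get))  # umbrella (8.0 vs 7.0)

    # Hypothetical prisoner's dilemma payouts; the opponent cooperates half the time.
    table = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
    print(best_response(["C", "D"], {"C": 0.5, "D": 0.5}, lambda s, t: table[(s, t)]))  # D
```

Note that the sketch quietly assumes exactly what the post goes on to question: that outcomes (world states) can be enumerated and assigned numeric utilities.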

The basic idea of friendly AI is that we can design the AI to want to be nice to us (friendly). We will give the AI a purpose or mission (a meaning of life) that corresponds to our purpose in building the machine.

The conceptual framework of rationality is central here. When presented with choices, an agent is assumed to do its best to maximize its subjective utility. In the case of an agent designed to serve a purpose, rational behavior should aim to fulfill that purpose. The critical point is that there is no rational basis for altering one’s purpose. Adopting a particular strategy or goal in the pursuit of a purpose is simply making a particular choice, such as a career choice. Likewise, choosing to profess a purpose different from one’s actual purpose is again a choice made in the service of the actual purpose. Choosing to actually change one’s purpose is fundamentally irrational. So an AI with an appropriately designed purpose should be safe in the sense that its purpose will not change.

But how do we specify or “build in” a life purpose for an AI, and what should that purpose be? First I want to argue that a direct application of the formal frameworks of rationality, reinforcement learning and game theory is problematic, and even dangerous, in the context of the singularity. More specifically, consider specifying a “utility”, “reward signal” or “payout” as a function of “world state”. The problem lies in formulating any conception of world state at all. For the properties we care about, such as respect for human values, I think it would be a huge mistake to try to give a physical formulation of world states. But any non-physical conception of world state, including facts like who is married to whom and who insulted whom, is bound to be controversial, incomplete, and problematic. This is especially true if we think about defining an appropriate utility function for an AI. Defining a function on world states just seems unworkable to me.

An alternative to specifying a utility function is to state a purpose in English (or any natural language). This occurs in mission statements for nonprofit institutions or in a donor’s specification of the purpose of a donated fund. Asimov’s laws are written in English but specify constraints rather than objectives. My favorite mission statement is what I call the servant mission.

 Servant Mission: Within the law, fulfill the requests of David McAllester.

Under the servant mission the agent is obligated to obey both the law and its master (me in the above statement). The agent can be controlled by society simply by passing new laws, and by its master when the master makes requests. The servant mission transfers moral responsibility from the servant to its master. It also allows a very large number of distinct AI agents (perhaps one for each human), each with a different master and hence a different mission. The hope would be for a balance of power with no single AI (no single master) in control. The servant mission seems clearer and more easily interpreted than other proposals such as Asimov’s laws, which makes it less open to unintended consequences. Of course the agent must be able to interpret requests; more on this below. The servant mission also preserves human free will, which does not seem guaranteed in other approaches such as Yudkowsky’s Coherent Extrapolated Volition (CEV) model, which seems to allow for a “friendly” dictator making all decisions for us. I believe that humans (myself certainly included) will want to preserve their free will in any post-singularity society.

It is important to emphasize that no agent has a rational basis for altering its purpose. There is no rational basis for an agent with the servant mission to decide not to be a servant (not to follow its mission).

Of course natural language mission statements rely on the semantics of English. Even if the relationship between language and reality is mysterious, we can still judge in many (most?) cases when natural language statements are true. We have a useful conception of a “lie”: the making of a false statement. So truth, while mysterious, does exist to some extent. An AI with an English mission, such as the servant mission, should have an unstated first mission: understanding the intent of the author of the mission statement. Understanding that actual intent should be the agent’s first priority, and it should be within the capacity of any super-intelligent AI. For example, the AI should understand that “fulfilling requests” means that a later request can override an earlier request. A deep command of English should allow a faithful and authentic execution of the servant mission.

I personally believe it is likely that within a decade agents will be capable of compelling conversation about the everyday events that are the topics of non-technical dinner conversation. I think this will happen long before machines can program themselves, leading to an intelligence explosion. The early stages of artificial general intelligence (AGI) will be safe. However, they will provide an excellent test bed for the servant mission or other approaches to friendly AI. An experimental approach has also been promoted by Ben Goertzel in a nice blog post on friendly AI. If there is a coming era of safe (not too intelligent) AGI, then we will have time to think further about later, more dangerous eras.



8 Responses to Friendly AI and the Servant Mission

  1. Hey David!
    Thanks for updating the old blog with the new info. I didn’t realize you had moved. Last year I had written a piece covering the FAI topic as well — also linking to Goertzel’s disagreement.

    Whether you want a “friendly” or “servant” AI, one important rebuttal that might supersede both is Hugo De Garis’ concern that “even if a mathematical proof exists for a self-modifying source code to maintain friendliness, an AI will have natural mutations in hardware circuitry due to routine environmental factors like cosmic rays, which pose a problem because ‘its mutated goals may conflict with human interest.’” (here’s my full piece: )

    • McAllester says:

      I like the phrase “Servant AI” and your earlier blog post.

      I am not too concerned about mutations. Although biology never became mutation-proof, it is not difficult to incorporate redundancy (a RAID file system being one example). At modest cost the expected time to mutation can be made to be tera-years even with unreliable hardware.

      • Thank you!

        What would redundancy imply in this context? If we’re talking about the goals themselves undergoing mutation via externality, how do we guarantee error correction? Maybe this is an easy answer for you, but I honestly don’t know.

        How are things going over at TTI?

  2. I like your thoughts on this topic. I can think of a slight flaw, however: suppose person A tells their AI to use legal means to harm person B, and the AI, being highly effective in many domains, proceeds to do so (e.g., frivolous lawsuits). People aren’t always benevolent, and the law will probably never totally protect people against each other, so maybe the AI would need a few extra failsafes in addition to this sort of thing.

  3. Bob Givan says:

    A few issues that seem to me to arise with the servant mission:

    1. How do we prevent the AI from succeeding in this mission in part or in entirety by manipulating us into asking easily fulfillable requests? Such manipulation may not be benign, either. A goal of fulfilling all my requests can be trivially met by eliminating me.

    2. How do we ensure against unintended implications of requests? The world abounds with jokes based on such requests being made to genies. Even if we allow later requests to cancel earlier ones, it may be way too late. And the deep understanding of English (and common sense) can go a long way, but aren’t there still far too many doors open for misunderstanding? [Is it possible that you wanted all insects exterminated?] At some point, assuming the AI will correctly resolve all our intentions goes beyond common sense, starts to make it unnecessary for the AI to even receive mission requests from us, and makes the mission more and more resemble the CEV benevolent dictator, where it must know enough about us to know what we want.

    • McAllester says:

      Many of the concerns about AI center on unnatural interpretations of goal statements. Most people would not interpret “fulfill requests” as implying “manipulate requests”. One issue here is whether a superintelligence can distinguish between natural and unnatural interpretations. By “natural” I mean what people in a given linguistic community would expect a person to do in response to a given request or mission. It seems to me that we can expect a superintelligence to be able to accurately judge the degree to which an action is a natural fulfillment of a given request as judged by a given linguistic community. Many concerns may be overblown provided that we can arrange that machines follow natural interpretations.

      Unnatural interpretations crop up when we treat a statement as logic with strict truth conditions. We should avoid thinking like logicians. Most language is very different from Boolean logic.

      Bob says that “assuming AI will correctly resolve all our intentions goes beyond common sense”. Superintelligence already goes beyond common sense … He also states “this assumption makes it unnecessary for the AI to even receive mission requests from us”. I have two responses. First, people are different and we each individually need some way of telling our servant about our particular life goals. Second, it is very important to us, as human beings, that we preserve our free will. By definition this implies that we must remain the decision makers even when the machines are smarter than us.

