Friendly AI and the Servant Mission

Most computer science academics dismiss any talk of real success in artificial intelligence. I think that a more rational position is that no one can really predict when human-level AI will be achieved. John McCarthy once told me that when people ask him when human-level AI will be achieved, he says between five and five hundred years from now. McCarthy was a smart man.

Given the uncertainties surrounding AI, it seems prudent to consider the issue of friendly AI. I think that the departure point for any discussion of friendly AI should be the concept of rationality. In the classical formulation, a rational agent acts so as to maximize expected utility. The important word here is “utility”. In the reinforcement learning literature this gets mapped to the word “reward” — an agent is taken to act so as to maximize expected future reward. In game theory “utility” is often mapped to “payout” — a best response (strategy) is one that maximizes expected payout holding the policies of the other players fixed.

The basic idea of friendly AI is that we can design the AI to want to be nice to us (friendly). We will give the AI a purpose or mission — a meaning of life — in correspondence with our purpose in building the machine.

The conceptual framework of rationality is central here. When presented with choices an agent is assumed to do its best in maximizing its subjective utility. In the case of an agent designed to serve a purpose, rational behavior should aim to fulfill that purpose. The critical point is that there is no rational basis for altering one’s purpose. Adopting a particular strategy or goal in the pursuit of a purpose is simply to make a particular choice, such as a career choice. Also, choosing to profess a purpose different from one’s actual purpose is again making a choice in the service of the actual purpose. Choosing to actually change one’s purpose is fundamentally irrational. So an AI with an appropriately designed purpose should be safe in the sense that the purpose will not change.

But how do we specify or “build-in” a life-purpose for an AI and what should that purpose be? First I want to argue that a direct application of the formal frameworks of rationality, reinforcement learning and game theory is problematic and even dangerous in the context of the singularity. More specifically, consider specifying a “utility”, “reward signal” or “payout” as a function of “world state”. The problem here is in formulating any conception of world state. I think that for the properties we care about, such as respect for human values, it would be a huge mistake to try to give a physical formulation of world states. But any non-physical conception of world state, including things like who is married to whom and who insulted whom, is bound to be controversial, incomplete, and problematic. This is especially true if we think about defining an appropriate utility function for an AI. Defining a function on world states just seems unworkable to me.

An alternative to specifying a utility function is to state a purpose in English (or any natural language). This occurs in mission statements for nonprofit institutions or in a donor’s specification of the purpose of a donated fund. Asimov’s laws are written in English but specify constraints rather than objectives. My favorite mission statement is what I call the servant mission.

 Servant Mission: Within the law, fulfill the requests of David McAllester.

Under the servant mission the agent is obligated to obey both the law and its master (me in the above statement). The agent can be controlled by society, simply by passing new laws, and by its master, when the master makes requests. The servant mission transfers moral responsibility from the servant to its master. It also allows a very large number of distinct AI agents — perhaps one for each human — each with a different master and hence a different mission. The hope would be for a balance of power with no single AI (no single master) in control. The servant mission seems clearer and more easily interpreted than other proposals such as Asimov’s laws. This makes the mission less open to unintended consequences. Of course the agent must be able to interpret requests — more on this below. The servant mission also preserves human free will, which does not seem guaranteed in other approaches, such as Yudkowsky’s Coherent Extrapolated Volition (CEV) model, which seems to allow for a “friendly” dictator making all decisions for us. I believe that humans (certainly myself) will want to preserve their free will in any post-singularity society.

It is important to emphasize that no agent has a rational basis for altering its purpose. There is no rational basis for an agent with the servant mission to decide not to be a servant (not to follow its mission).

Of course natural language mission statements rely on the semantics of English. Even if the relationship between language and reality is mysterious, we can still judge in many (most?) cases whether natural language statements are true. We have a useful conception of a “lie” — the making of a false statement. So truth, while mysterious, does exist to some extent. An AI with an English mission, such as the servant mission, should have an unstated first mission: understanding the intent of the mission’s author. Grasping that intent should be the agent’s first priority and should be within the capacity of any super-intelligent AI. For example, the AI should understand that “fulfilling requests” means that a later request can override an earlier one. A deep command of English should allow a faithful and authentic execution of the servant mission.

I personally believe that it is likely that within a decade agents will be capable of compelling conversation about the everyday events that are the topics of non-technical dinner conversations. I think this will happen long before machines can program themselves, leading to an intelligence explosion. The early stages of artificial general intelligence (AGI) will be safe. However, the early stages of AGI will provide an excellent test bed for the servant mission or other approaches to friendly AI. An experimental approach has also been promoted by Ben Goertzel in a nice blog post on friendly AI. If there is a coming era of safe (not too intelligent) AGI then we will have time to think further about later, more dangerous eras.

 


AI and Free Will: When do choices exist?

This post was originally made on July 15, 2013.

This is a philosophical post instigated by Scott Aaronson’s recent paper and blog post regarding free will.  A lot hinges, I think, on how one phrases the question.  I like the question “When do choices exist?” as opposed to “Do people have free will?”.  I will take two passes at this question.  The first is a discussion of game theory.  The second is a discussion of colloquial language regarding choice.  My conclusion is that choices exist even when the decision-making process is deterministic.

Game Theory. Game theory postulates the existence of choices.  A bimatrix game is defined by two matrices each of which is indexed by two choices — a choice for player A and a choice for player B.  Given a choice for each player the first matrix specifies a payout for player A and the second matrix specifies a payout for player B.  Here choices exist by definition.
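As a minimal sketch (the payouts below are made up for illustration), a bimatrix game and a best response can be written down directly; the choices are nothing more than row and column indices:

```python
# A 2x2 bimatrix game with made-up payouts.  Rows index player A's
# choices, columns index player B's choices.
payout_A = [[3, 0],
            [5, 1]]
payout_B = [[3, 5],
            [0, 1]]

def best_response_A(b_choice):
    """A's best response: maximize A's payout holding B's choice fixed."""
    return max(range(len(payout_A)), key=lambda a: payout_A[a][b_choice])

def best_response_B(a_choice):
    """B's best response: maximize B's payout holding A's choice fixed."""
    return max(range(len(payout_B[0])), key=lambda b: payout_B[a_choice][b])

print(best_response_A(0), best_response_B(1))  # the choices exist by definition
```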

We write programs that play games.  A computer chess program has choices — playing chess involves selecting moves.  Furthermore, it seems completely appropriate to describe the computation taking place in a min-max search as “considering” the choices and “selecting” a choice with desirable outcomes.  Note that most chess programs use only deterministic computation.  Here the choices exist by virtue of the rules of chess.
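As a toy illustration (a trivial game of Nim standing in for chess), a deterministic min-max search that “considers” the choices and “selects” the best one might look like this:

```python
# Toy game of Nim standing in for chess: a state is a number of stones,
# a move removes 1 or 2 stones, and the player taking the last stone wins.

def legal_moves(stones):
    return [m for m in (1, 2) if m <= stones]

def minimax(stones, maximizing):
    moves = legal_moves(stones)
    if not moves:                 # no stones left: the previous player took the last one
        return -1 if maximizing else 1
    values = [minimax(stones - m, not maximizing) for m in moves]
    return max(values) if maximizing else min(values)

def select_move(stones):
    """Consider each choice and select one with the best outcome."""
    return max(legal_moves(stones), key=lambda m: minimax(stones - m, False))

print(select_move(5))  # deterministic computation, yet a genuine selection among choices
```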

It seems perfectly consistent to me to assume that my own consideration of choices, like the considerations of a chess program, is based on deterministic computation.  Even if I am determined and predictable, the world presents me with choices and I must still choose.  Furthermore, I would argue that, even if I am determined, the choices still exist — for a given chess position there is actually a set of legal moves.  The choices are real.

Colloquial Language.  Consider a sentence of the form “she had a choice”.  Under what conditions do we colloquially take such a sentence to be true?  For example, we might say she had a choice between attending Princeton or attending Harvard.  The typical condition under which this is true is when she was accepted to both.  The fact that she was accepted to both says nothing about determinism vs. nondeterminism.  It does, however, imply colloquially that the choice exists.

The semantics of natural language is a difficult subject.  I plan various blog posts on semantics. The central semantic phenomenon, in my opinion, is paraphrase and entailment — what are the different ways of saying the same or similar things and what conclusions can we draw from given statements.  I believe that a careful investigation of paraphrase and entailment for statements of the form “she had a choice” would show that the existence of choices is taken to be a property of the world, and perhaps the abilities of the agent to perform certain actions, but not a property of the fundamental nature of the computation that makes the selection.

Summary. It seems to me that “free will” cannot be subjectively distinguished from having choices.  And we do have choices — like the chess program we must still choose, even if we are determined and predictable.


Metaphor and Mathematics

When I was in graduate school there was considerable interest in a then-new book, Metaphors We Live By, Lakoff and Johnson (1980). More recently I encountered Where Mathematics Comes From, Lakoff and Nunez (2001).  Like Lakoff and Nunez, I believe that metaphor is central to both language and thought. I believe that mentalese is close to surface language and that surface language is highly metaphorical. I have also long believed that mathematics itself is a manifestation of an innate human cognitive architecture. I have recently posted a manuscript giving a new type-theoretic foundation of mathematics that is, I believe, a better formulation of a cognitive architecture than, say, Zermelo-Fraenkel set theory.

But how is it possible that thought is highly metaphorical and, at the same time, based on a cognitive architecture reflected in the precise logical structure of mathematics? The answer is, I believe, that the structure of metaphor is essentially the same as the general structure of mathematical thought. The central idea in both, I would argue, is roles and assertions.

Roles and Assertions

The concept of frame or script, in the sense of Fillmore, Minsky or Schank, seems central to the lexicon of natural language. Frames and scripts have roles. Words such as “buy” and “sell” invoke semantic roles such as the buyer, the seller, the thing being sold and the money transferred. The semantic roles associated with English verbs have been catalogued in Martha Palmer’s verbnet project and semantic role labeling has become one of the central tasks of computational linguistics.

In addition to roles, frames and scripts invoke assertions. For example, we have

before x sold y to z, x owned y

after x sold y to z, z owned y
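As a sketch of how such a frame might be represented (the world model below is invented purely for illustration), the roles become named slots and the assertions become checks against a toy database of ownership facts:

```python
# The "sell" frame: roles (buyer, seller, goods) plus the two
# before/after assertions above, checked against a toy world model.

def check_sell_event(event, owns):
    """owns(agent, thing, time) -> bool is a hypothetical world predicate."""
    x, y, z, t = event["seller"], event["goods"], event["buyer"], event["time"]
    assert owns(x, y, t - 1), "before x sold y to z, x owned y"
    assert owns(z, y, t + 1), "after x sold y to z, z owned y"

# Ownership facts as (owner, thing, time) triples.
facts = {("x", "y", 0), ("z", "y", 2)}
owns = lambda owner, thing, time: (owner, thing, time) in facts

check_sell_event({"seller": "x", "goods": "y", "buyer": "z", "time": 1}, owns)
```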

But what is the relationship between the roles and assertions of lexical frames and metaphor? As an example, the lead story on news.google.com as of this writing starts with

The Israeli military continued its substantial military attacks around Rafah in southern Gaza on Sunday.

 Consider the word “continue”. This is a very abstract word. It seems to have a role for an agent (Israel) and the activity that is continued (the attacks). But the assertions associated with this word make minimal assumptions about the agent and the activity. Some assertions might be

If during s, x continued to do y then before s, x was doing y

If during s, x continued to do y then during s, x was doing y

If we consider any two particular continuation events the relationship between them seems metaphorical. For example, consider “he continued to travel west” and “the pump continued to fail”. What these two continuations have in common is the above general assertions about continuation. But we can think of this as forming a metaphor between traveling and the “activity” of failing. The real phenomenon, however, is that in both cases the assertions of the continuation frame apply. The assertions are polymorphic — they can be applied to instances of different types such as traveling and failing.

Object Oriented Programming

Roles and assertions also arise in object-oriented programming. In a graphical user interface one writes code for displaying objects without making any commitment to what those objects are. A displayable object has to support certain methods (roles) that satisfy certain conditions (assertions) assumed by the GUI program. The same code can display avatars and pie charts by treating both at an appropriate level of abstraction. The GUI code (metaphorically?) treats an avatar as an image.
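A minimal sketch of this situation (all names invented for illustration): the display code commits only to the roles it needs and the assertion it assumes.

```python
from typing import Protocol

class Displayable(Protocol):
    """The roles a displayable object must support."""
    def bounding_box(self): ...   # role: a (width, height) pair
    def render(self): ...         # role: draw yourself, here as a string

class Avatar:
    def bounding_box(self): return (64, 64)
    def render(self): return "<avatar image>"

class PieChart:
    def bounding_box(self): return (200, 200)
    def render(self): return "<pie chart>"

def display(obj: Displayable):
    """GUI code: no commitment to what obj actually is, only to its
    roles and to the assertion that its bounding box is non-degenerate."""
    w, h = obj.bounding_box()
    assert w > 0 and h > 0
    print(obj.render(), (w, h))

display(Avatar())     # the GUI (metaphorically?) treats the avatar as an image
display(PieChart())
```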

Mathematics

Roles and assertions arise in almost all mathematical concepts.   A tree is a directed graph with nodes and edges (the roles) satisfying the tree conditions. The nodes can be anything — the concept of “tree” is polymorphic. The concept of tree is very general and can be used in many different situations. All we have to do is to find some way to view a given object as a tree. For example, a chess game can be (metaphorically?) viewed as a tree. My recently posted type-theoretic foundation of mathematics is organized around the notion of structure types (frames) each of which involves roles and assertions.
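As a sketch, the tree conditions can be checked with no commitment at all to what the nodes are:

```python
# A "tree" is anything with nodes and edges (the roles) satisfying
# the tree conditions (the assertions).  The nodes can be chess
# positions, strings, anything -- the concept is polymorphic.

def is_tree(nodes, edges, root):
    parents = {}
    for parent, child in edges:
        if child in parents or child == root:
            return False          # assertion: each non-root node has exactly one parent
        parents[child] = parent

    def reaches_root(n, seen=()):
        if n == root:
            return True
        if n in seen or n not in parents:
            return False
        return reaches_root(parents[n], seen + (n,))

    # assertion: every node is connected to the root with no cycles
    return all(reaches_root(n) for n in nodes)

print(is_tree({"a", "b", "c"}, {("a", "b"), ("a", "c")}, "a"))  # True
```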

Summary

In this post I wanted to simply introduce the idea that metaphor and mathematics are related through the common structure of roles and assertions. There are many issues not discussed here such as the distinction between logical (hard) and statistical (soft) assertions and the role of embodiment (grounding). But I will leave these for later posts.


Chomsky vs. Hinton

This was originally posted on Saturday, September 28, 2013

 This is a continuation of my “free lunch” post.  I want to provide more context.  I also feel bad about phrasing the discussion in the context of a particular textbook.  The no free lunch theorem is widely quoted and seems to be a standard fixture of the machine learning community.

I would like to rephrase the issue here as a debate between Chomsky and Hinton.  I will start with caricatures of their positions:

Chomsky:  Generalizing to unseen inputs — for example judging the grammaticality of sentences never seen before — is impossible without some a priori knowledge of the set of allowed predictors (the set of possible human languages).  Hence we must be born with knowledge of the space of possible human languages — some form of universal grammar must exist.

Hinton:  Neural networks are a Turing-complete model of computation.  Furthermore, there exists some (perhaps yet to be discovered) universal algorithm for learning networks which can account for learning in arbitrary domains, including language.

This debate centers on an empirical question — what algorithms or knowledge are provided by the human genome?

It is important here to distinguish information-theoretic issues from computational complexity issues.

Information-Theoretic Issues: Chomsky takes the position that on purely information-theoretic grounds universal grammar must exist. Chomsky’s argument is essentially an invocation of the no free lunch theorem.  But the free lunch theorem shows, I think, that there exist information-theoretically adequate universal priors on the set of all computable functions.  On information-theoretic grounds I think Hinton’s position is much more defensible than Chomsky’s.

Computational Issues: It is clearly silly to consider enumerating all C++ programs as part of a learning algorithm.  But training deep neural networks is not silly — it has led to a 30% reduction in word error rate in speech recognition.  Chris Manning is working on deep neural network approaches to learning grammar.

Computational issues do provide good motivations for certain learning algorithms such as the convex optimization used in training an SVM.  But a computational efficiency motivation for a restricted predictor class is different from claiming that for information-theoretic reasons we must be given some restriction to a proper subclass of the computable functions.

There are many Turing-complete models of computation and the choice of model does seem to matter. We seem to gain efficiency in programming when we become familiar with the C++ standard library. Somehow the library provides useful general-purpose constructs.

We seem to be capable of general-purpose thought.  Thought seems related to language. Chomsky might be right that some form of linguistic/cognitive architecture is provided by the human genome. But the adaptive advantage provided by the details of an architecture for general-purpose thought seems likely to be related to computational issues rather than to an information-theoretic requirement for a learning bias.


The Free Lunch Theorem

This was originally posted on Saturday, September 21, 2013

Different learning theorists have different approaches to teaching learning theory. Shai Shalev-Shwartz and Shai Ben-David have written a book,  Understanding Machine Learning: From Theory to Algorithms, which gives a significant amount of space to the no free lunch theorem (preliminary notes). Here I will state a “free lunch theorem” and argue that the free lunch theorem is a good (better) departure point for learning theory.

 

I will state the free lunch theorem in terms of C++ code.  One can specify a predictor as a C++ program.  The predictor can be a linear classifier, a polynomial, a decision tree, or whatever.  I will assume access to an arbitrarily large (but finite) standard library so that programs can be written extremely compactly if you know the right library function to call.  The library can include every form of function used as a predictor by machine learning practitioners and be designed in such a way as to allow predictors (of all kinds) to be written as compactly as possible.  The library can also include all of the programming abstractions in use by programmers today so that new programs can be written as compactly as possible.  For any code library L and program f written using L, I will write | f |_L for the length of the code for f (in bits).  Here we are not counting the length of the code for the library functions, just the number of bits in the name of a library function when that function is called.  The theorem states, in essence, that as long as the library L is written prior to seeing the training data, any predictor f written in C++ using L will perform well provided that it performs well on the training data and that | f |_L is small compared to the size of the training data.  For example, a polynomial predictor with n numerical parameters that performs well on the training data will perform well on new data provided that the number of training points is large compared with bn, where b is the number of bits used for each numerical parameter.  (Note the lack of any reference to VC dimension.)

To state the free lunch theorem precisely we assume a data distribution D on input-output pairs and a sample S of m such pairs drawn IID from this distribution.  For simplicity we will assume binary classification so that the output is always either 1 or -1. For any predictor f mapping inputs to outputs we define the generalization error rate of f, denoted err(f,D), to be the fraction of the time that f makes a mistake when drawing input-output pairs from the data distribution D.  We write err(f,S) for the fraction of times that f makes a mistake on the sample.  The theorem states that with probability at least 1-delta over the draw of the sample S (of m training pairs) the following holds simultaneously for all C++ programs f that can be written with library L and all values of the parameter lambda.

(1)   err(f,D) <= 1/(1 – 1/(2 lambda)) [ err(f,S)  +  [lambda (ln 2)/m] | f |_L  + [lambda/m] ln(1/delta) ]

This is a special case of theorem 1 of the PAC-Bayesian tutorial referenced in my post on a generalization bound for dropouts. For fixed lambda, this bound expresses a linear trade-off between the accuracy on the training data err(f,S) and the simplicity of the rule as measured by | f |_L.  It is a form of Occam’s razor. An important point is that we can use an arbitrarily large code library when writing f. Another important point is that it is meaningless to make lambda very large. The only reason for making lambda large is that this reduces the leading factor in the bound. However, the leading factor rapidly approaches 1 as lambda becomes large.  Hence the weight on | f |_L fundamentally goes as 1/m rather than 1/sqrt{m}. (Some people will argue with this interpretation, but in any case the theorem is true as stated.)
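As a numeric sketch of bound (1) (all values below are invented), using the polynomial example above with | f |_L ≈ bn, note how the bound is vacuous until m is large compared with bn and then tightens roughly as 1/m:

```python
from math import log

def free_lunch_bound(train_err, code_bits, m, delta=0.05, lam=10.0):
    """Right-hand side of bound (1): err(f,S), |f|_L in bits, m, delta, lambda."""
    lead = 1.0 / (1.0 - 1.0 / (2.0 * lam))
    return lead * (train_err + (lam * log(2) / m) * code_bits
                   + (lam / m) * log(1.0 / delta))

b, n = 32, 10                     # 10 numerical parameters, 32 bits each
for m in (1_000, 10_000, 100_000):
    # bound is vacuous (> 1) at m = 1000 but near the training error by m = 100000
    print(m, round(free_lunch_bound(0.10, b * n, m), 3))
```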

The free lunch theorem says that we do not have to fix any particular form of predictor before we see the training data — we can use an enormous library of essentially all classes of predictors known to learning theory, and in fact all predictors that can be written in C++.  This includes any predictor that could possibly be shipped to a customer.

So what about the no free lunch theorem?  As stated by Shai Shalev-Shwartz in chapter 6 of his notes, the theorem states, in essence, that for any learning algorithm there exists a data distribution for which a low error rate prediction rule exists but where the learning algorithm fails to find such a predictor. When applied in the setting of the above free lunch theorem (1), the no free lunch theorem states that there exists a data distribution such that the only good predictors f have | f |_L > m.   In this case the true input-output relation has no compact representation.  Structureless functions exist.  We are not going to learn structureless functions.

The pedagogical role of the no free lunch theorem is to justify the introduction of restricted classes of predictors — one assumes away the possibility that the true input-output relation is structureless.  I will admit that one does often want to consider restricted classes, such as linear predictors over a fixed (but very high dimensional) feature map.  This is an important class of predictors and it is useful to try to understand it.  But the no free lunch theorem is often used to justify restrictions to finite capacity classes.  This leads to bad theory in my opinion.  But that is a subject for another post.

In summary, the free lunch theorem (1) seems like a better departure point for the study of learning theory than the observation that structureless functions exist.


Situations

This was originally posted on Saturday, September 14, 2013

The concept of a “situation” seems central to artificial intelligence and the semantics of natural language. The situation calculus (McCarthy and Hayes, 1969) underlies the notion of STRIPS planning used in the AI planning community.  The notion of “possible world” underlies the Kripke semantics of modal logic, which is used in theories of natural language semantics.  Situations (or possible worlds) play a central role in type-logical semantics.  “Situation Semantics” has been proposed as an alternative to possible world semantics in which situations contain partial rather than complete information. Situations seem closely related to the events of Davidsonian event semantics.  The notion of situation also seems closely related to the notion of state in reinforcement learning.

Before going any further, the first sentence of the lead story at news.google.com at the moment of this writing is:

With the announcement on Saturday that the U.S. and Russia have reached an agreement on securing Syria’s chemical weapons stockpile, the American threat of U.S. military action was effectively taken off the table.

A search of news.google.com with keywords “situation in Syria” finds the following title of a Sept 10 Washington Post article.

Syria situation further strains Obama’s relationship with the antiwar movement.

But what is “the Syria situation”?  Is the (actual) Syria situation a natural example of the general concept of a situation?  Should the first sentence above be considered to be a statement about, or true of, the Syria situation?  It seems to be true that the Syria situation evolves over time — there is the current Syria situation, a history of the Syria situation, and possible future outcomes of the Syria situation.

This may seem hopelessly complex.  But perhaps the complexity arises from a feeling that we need to define a system of concepts and rules that specify exactly what situations are.  Perhaps formal definitions and rules are not necessary.  Perhaps there are just soft (weighted) lexically-driven syntactic entailment rules where nothing is formally defined.

Quine took up the slogan “to be is to be the value of a variable”.  I would modify this and say “to be is to be the referent of a mention”.  If we think in terms of mention equivalence classes, as in mention-mention models of coreference, then the slogan becomes “to be is to be a mention equivalence class”. But in a database model of reality it seems more natural to work with a mention-entity model and use the term “reference” rather than “coreference”.  Of course Santa Claus and unicorns do not exist in reality simply because we mention them.  However, they do exist in certain stories — in the situations described in certain works of fiction.  Does a fictional situation “exist” as a “possible world”?   I think the most intellectually coherent answer is yes — fictional worlds and the entities in them exist in much the same sense that the vector space R^100 exists; they are things we can think about.  I am not too concerned about contradictory entities such as a round square — the simple assumption that all mentions refer to some entity, where some entities may not be part of our physical reality, seems likely to work well for practical NLP systems.

Perhaps what really defines the notion of a situation is the fact that the same proposition — the same proposition entity or fluent — can have different truth values in different situations.  For example, the sentence “David is hungry” (for a fixed referent of “David”) is true in some situations but not in others. At a syntactic level propositions are interpreted relative to situations as in “When he arrived at the party last night, David was hungry”.  Here the phrase “when he arrived at the party last night” seems to be acting as a situation mention and “David was hungry” is being interpreted in the situation that is the referent of the situation mention.  I would extend the slogan “to be is to be the referent of a mention” with a second slogan “to be a situation is to be the referent of a situation mention”.  Somewhat more explicitly, a situation mention is a sentential modifier that intuitively gives the situation in which the sentence is to be interpreted.  For example, “Israel is only a minor player in the Syria situation”.

This is an informal blog post — there are no theorems here.  However, I hope that it provides food for thought.  My over-arching theme is that syntax plus reference (to database entities) plus, perhaps, lexically-driven syntactic paraphrase rules, covers a lot of “semantics”.

I promise that a significant fraction of future blog posts will discuss theorems.


Tarski and Mentalese

This was originally posted on Friday, August 30, 2013

Blogging gives me a chance to unload many years of unpublishable thinking.  One of the things that I have thought long and hard about is semantics in the sense of Tarski’s definition of truth and its relationship to Platonist vs. formalist philosophies of mathematics.  Here I will try to reconcile Platonist and formalist philosophies and relate both to the notion of semantics for natural language. We are faced with two positions on the nature of mathematics, both of which have powerful arguments.

Platonism: Gödel and Tarski each took a Platonic approach to meta-mathematics (mathematical logic). This is perhaps clearest when discussing the natural numbers.  Terms and formulas about the natural numbers can be defined by the following grammar.
t ::= 0 \;|\; 1 \;|\; x \;|\; t_1 + t_2 \;|\; t_1 \times t_2
\Phi ::= t_1 = t_2 \;|\; \neg \Phi \;|\; \Phi_1 \vee \Phi_2 \;|\; \Phi_1 \wedge \Phi_2 \;|\; \forall x \;\Phi[x] \;|\; \exists x \;\Phi[x]
If we let \rho denote an assignment of a natural number to each variable then Tarski defined the meaning or value {\cal V}\llbracket t \rrbracket\rho and {\cal V}\llbracket \Phi \rrbracket \rho recursively by equations such as
{\cal V}\llbracket x \rrbracket \rho = \rho(x)
{\cal V} \llbracket t_1 + t_2 \rrbracket \rho = {\cal V} \llbracket t_1 \rrbracket \rho + {\cal V} \llbracket t_2 \rrbracket \rho
and
{\cal V} \llbracket \forall x \;\Phi[x] \rrbracket \rho = T if for all n we have {\cal V} \llbracket \Phi[x] \rrbracket \rho[x := n] = T.
Here we have a clear distinction between syntax and semantics.  The syntax consists of the formal expressions and the semantics is the relationship between the formal expressions and the actual natural numbers.
These definitions of semantic values are Platonic in the sense that we, the mathematicians, are taking the natural numbers to actually exist and are taking English statements such as “for every natural number n we have …” to be meaningful.  Of course Platonism extends beyond natural numbers — working mathematicians adopt a Platonic stance toward all of the objects of mathematics.
Platonism seems essential to the clear-headed thinking of Gödel and Tarski, both of whom proved fundamental (true) theorems in mathematical logic, such as the statement that no finite axiom system for the numbers can be both sound and complete.  Such theorems cannot even be stated without reference to the actual, Platonic, natural numbers.  But neither can any normal arithmetic statement, such as Fermat’s last theorem.  Fermat’s last theorem is, after all, about the actual numbers.
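As an aside, Tarski’s value function for the grammar above is easy to sketch in code, with one telling caveat: unbounded quantification over all the naturals is not computable, so a toy evaluator must bound its quantifiers. This gap between the code and the Platonic definition foreshadows the formalist position discussed next.

```python
# A toy Tarskian evaluator: formulas are nested tuples, rho assigns
# numbers to variables.  Quantifiers range over 0..BOUND-1, a finite
# stand-in for the actual natural numbers.

BOUND = 100

def V(e, rho):
    op = e[0]
    if op == "num":    return e[1]                        # the constants 0 and 1
    if op == "var":    return rho[e[1]]                   # V[[x]]rho = rho(x)
    if op == "+":      return V(e[1], rho) + V(e[2], rho) # V[[t1+t2]] = V[[t1]] + V[[t2]]
    if op == "*":      return V(e[1], rho) * V(e[2], rho)
    if op == "=":      return V(e[1], rho) == V(e[2], rho)
    if op == "not":    return not V(e[1], rho)
    if op == "and":    return V(e[1], rho) and V(e[2], rho)
    if op == "or":     return V(e[1], rho) or V(e[2], rho)
    if op == "forall": return all(V(e[2], {**rho, e[1]: n}) for n in range(BOUND))
    if op == "exists": return any(V(e[2], {**rho, e[1]: n}) for n in range(BOUND))
    raise ValueError(op)

# forall x . x * 1 = x
phi = ("forall", "x", ("=", ("*", ("var", "x"), ("num", 1)), ("var", "x")))
print(V(phi, {}))   # True -- but only verified up to BOUND
```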
Formalism:  I firmly believe that my brain performs classical (possibly stochastic) computation (I do not believe that my brain exploits quantum computing in any significant way).  Hence, whatever access I have to the actual natural numbers must be explainable by a computational process taking place in my brain.  The best model I know for what such a computation might look like is the manipulation of symbolic expressions.  It seems to be true that the axioms of Zermelo-Fraenkel set theory with the axiom of choice (ZFC) characterize the set of theorems that Platonic mathematicians accept as true.  So at some level the formalist position — that mathematics is ultimately just the manipulation of formal expressions — must be true.
Synthesis: Many years ago I was discussing Tarskian semantics with Marvin Minsky and he remarked that it all seemed silly to him — Tarski is just saying that the symbol string “snow is white” means that snow is white.  Paraphrasing Minsky — Tarskian semantics is just a way of removing the quotation marks.  Once the quotation marks are removed we become Platonists — with the quotation marks removed we are talking in the normal (Platonic) way.
This can be made a little clearer, I think, by considering a mathematician robot who, by construction, thinks by formal inference in a formal rule system.  Now, I argue, one can still imagine both Platonist and formalist Robots.  When thinking about arithmetic the Platonist robot manipulates formulas which are “about” numbers such as
There do not exist natural numbers x,y,z,w with w greater than 2 and x^w+y^w = z^w.
I have written an English sentence, but there is no reason an English sentence can’t be a formula (see my previous post on Wittgenstein’s public language hypothesis).  The formalist robot, by contrast, thinks about the manipulation of formulas, and manipulates formulas such as
If \Phi is derivable and \Phi \rightarrow \Psi is derivable then \Psi is derivable.
Again this is an English sentence just like the Platonic formula except that this sentence (this formula) is about formulas rather than about numbers.  A statement of Tarskian semantics is simply another formula (in English) defining a value function which maps terms to numbers and formulas to truth values.
Bottom Line: My position is that Platonic reasoning is simply formal reasoning taking place in a native mentalese.   Following Wittgenstein, I think that mentalese is closely related to the surface form of natural language.
When I think about the vector space \mathbb{R}^{100} there is no magical or mystical connection between my brain and an actual 100 dimensional vector space (Roger Penrose seems to think otherwise).  When I think about my mother there is no magical or mystical connection between my brain and my mother (many people think otherwise). Tarskian denotational semantics is a process of erasing quotation marks — of mapping external formal symbol strings into the language of mentalese.  This said, the Platonic treatment of formal logic is essential and central to the understanding of logic, and Marvin should not be calling it silly.  (Marvin, if you read this, please forgive my 30-year-old, possibly misremembered quotation.)
When we talk about semantics from an NLP perspective we should be talking about a translation into some kind of mentalese and we should be striving for an understanding of that mentalese.