## This was originally posted on Wednesday August 7, 2013

The basic idea in vector semantics is that the “meaning” of a word (or phrase or sentence) can be expressed as a vector. For a long time I thought that this was just silly, but I have recently become more sympathetic to the work done in this area. It seems to me, however, that what is really being done is “vector categories” rather than “vector semantics”.

Before going any further I will look up a sentence from news.google.com. The first sentence of the lead story at the moment of this writing is:

*With his decision to cancel next month’s planned summit with Russia’s President Vladimir Putin, President Obama took a principled stance that has lawmakers on both sides of the aisle cheering him for finally standing up to an international bully.*

I believe that reference is fundamental to semantics. I think reference should be viewed as a mapping from a mention (a phrase in a sentence) to an entity in a database model of reality. In the above sentence the phrases “Obama”, “Putin”, “next month’s summit”, “lawmakers” and “his decision” all refer to entities that the author seems to expect readers are already familiar with. The phrases “took a stance” and “stood up to” seem to be co-referential with “his decision” and, at the same time, assert properties of, or descriptions of, the entity (the action). The reference interpretation of semantics is emphasized in a recent post on Google’s research blog.

Vector semantics seems to be in direct conflict with modeling reality by a database and taking reference — the mapping from mentions to database entities — as core to semantics. However, if we interpret the vectors as vector categories rather than as a denotational meaning (a referent) then these vectors seem quite useful.

My first exposure to vector semantics was in the context of latent semantic indexing (LSI) which assigns a vector to each word by doing singular value decomposition on a word-document matrix. The vector assigned to a word in this way can perhaps be viewed as a representation of a distribution over topics. Indeed we might expect “was not cancerous” and “was cancerous” to have the same topic distribution — they both will be common in certain medical diagnosis documents — even though they have opposite meanings.
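As a rough illustration of the LSI construction described above, here is a minimal sketch using NumPy. The word list and the counts are made up for the example, and the choice to scale each kept direction by its singular value is one common convention rather than anything fixed by LSI itself.

```python
import numpy as np

# Hypothetical toy word-document count matrix (rows: words, columns: documents).
# Documents 0-1 stand in for medical reports, documents 2-3 for political news.
words = ["cancerous", "benign", "biopsy", "summit", "president"]
counts = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
], dtype=float)


def lsi_word_vectors(matrix, k):
    """Return one k-dimensional vector per row (word) from a truncated SVD."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    # Keep the k largest singular directions, scaled by their singular values.
    return u[:, :k] * s[:k]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


word_vec = dict(zip(words, lsi_word_vectors(counts, k=2)))

# Words with opposite meanings but the same document distribution end up close.
print(cosine(word_vec["cancerous"], word_vec["benign"]))
# Words from unrelated topics end up far apart.
print(cosine(word_vec["cancerous"], word_vec["summit"]))
```

In this toy matrix “cancerous” and “benign” occur in the same documents, so their vectors point in essentially the same direction even though the words mean opposite things, which is the topic-versus-meaning gap noted above.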

A more interesting interpretation of vector semantics (in my opinion) was pointed out to me by Michael Collins. He has been doing work on using spectral methods for learning of latent syntactic subcategories. Here we assume that each standard syntactic category, such as NP, has some latent set of subcategories, perhaps “person”, “place”, or “thing”. We can then assign a vector to a phrase where the vector can be interpreted as assigning a score for each possible latent subcategory.

Vector representations for the purpose of scoring “grammaticality” naturally yield factored representations analogous to formal models of English Grammar based on feature structures.

The idea is that, for the purpose of determining grammaticality, the syntactic category of a phrase has features such as gender, person, and number. Feature structures can be viewed as a simple kind of vector with discrete values along each coordinate. This interpretation of vector semantics is especially suggested by the observation that certain directions in the vector space can be interpreted as gender as in the equation Phi(queen) = Phi(king) – Phi(man) + Phi(woman) where Phi(w) is the vector associated with word w [paper].
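The analogy equation can be checked mechanically on a toy example. The sketch below uses hand-built two-dimensional vectors, which are purely hypothetical: one coordinate plays the role of “royalty” and the other of “gender”. Real learned embeddings satisfy the equation only approximately, which is why the answer is taken to be the nearest word by cosine similarity.

```python
import numpy as np

# Hypothetical 2-d embeddings: coordinate 0 ~ "royalty", coordinate 1 ~ "gender".
phi = {
    "king":  np.array([1.0, 1.0]),
    "queen": np.array([1.0, -1.0]),
    "man":   np.array([0.0, 1.0]),
    "woman": np.array([0.0, -1.0]),
    "apple": np.array([-1.0, 0.0]),
}


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def nearest_word(target, exclude=()):
    """Return the vocabulary word whose vector is most similar to target."""
    candidates = [w for w in phi if w not in exclude]
    return max(candidates, key=lambda w: cosine(phi[w], target))


# Phi(king) - Phi(man) + Phi(woman): remove the "male" offset, add the "female" one.
target = phi["king"] - phi["man"] + phi["woman"]
answer = nearest_word(target, exclude={"king", "man", "woman"})
print(answer)
```

Excluding the three query words from the candidates is a common convention in analogy evaluation and is an assumption of this sketch.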

Perhaps even more interestingly, we can abandon any direct interpretation of the vectors associated with phrases other than that they be useful in parsing and then discriminatively train parsers which compute latent vectors at each node. The vectors can even be computed by deep neural networks trained for this purpose [paper]. While the precise meaning of the vectors remains obscure, the fact that they are trained to produce accurate parsing would seem to imply that they encode information relevant to the selectivity of words for arguments: what things do people “give” or “eat”? Many parsers use bilexical statistics (actual verb-object pair statistics), but abstracting the words to vectors should allow greater generalization to unseen pairs.

It has also been proposed that vector semantics can usefully encode relations analogous to the relations of a database [paper]. Although vectors, like databases and bit strings, are information-theoretically universal (a single real number can hold an infinite number of bits), I do not believe that vectors should replace databases as a model of the facts of reality. However, it does seem possible that vectors have an important role to play in uncertain inference such as that modeled by Markov logic networks. A Markov logic network is essentially a weighted SAT formula where the sum of the weights of the violated clauses is interpreted as an energy (cost) for any given truth assignment to the Boolean variables. Each Boolean variable typically has some internal structure, such as R(b,c) where R is a relation and b and c are entities. We can assign entities vector categories and assign R a matrix for combining the categories of its arguments. We can then compute an energy for R(b,c), where this energy intuitively represents a negative log prior probability of R(b,c). In this way vector categories could play an important role in some variant of Markov logic and could even be trained so as to produce correct inferences.
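To make the two energies in this paragraph concrete, here is a minimal sketch. The clause encoding, the category vectors, and the matrix for a relation called “eats” are all hypothetical, and the bilinear form b^T M c is just one simple way a matrix could combine the categories of its two arguments.

```python
import numpy as np


def clause_energy(assignment, clauses):
    """Sum the weights of the clauses violated by a truth assignment.

    Each clause is (weight, literals). A literal (name, polarity) is satisfied
    when assignment[name] == polarity; a clause is violated when none of its
    literals is satisfied.
    """
    return sum(
        weight
        for weight, literals in clauses
        if not any(assignment[name] == polarity for name, polarity in literals)
    )


def relation_energy(b, m, c):
    """Bilinear energy b^T M c for an atom R(b, c); lower means more plausible."""
    return float(b @ m @ c)


# Hypothetical weighted clauses over two Boolean variables.
clauses = [
    (1.5, [("A", True), ("B", False)]),  # A or not B
    (2.0, [("B", True)]),                # B
]
e_clauses = clause_energy({"A": False, "B": True}, clauses)

# Hypothetical category vectors: coordinate 0 ~ "person", coordinate 1 ~ "food".
person = np.array([1.0, 0.0])
food = np.array([0.0, 1.0])
# Hypothetical matrix for the relation "eats".
eats = np.array([[1.0, -2.0],
                 [3.0, 1.0]])

e_plausible = relation_energy(person, eats, food)    # a person eats food
e_implausible = relation_energy(food, eats, person)  # food eats a person
print(e_clauses, e_plausible, e_implausible)
```

Under the negative-log-prior reading, an atom with energy E gets unnormalized prior weight exp(-E), so the lower-energy atom is the one considered more likely before any evidence is seen.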

The bottom line is that the term “vector categories” seems more appropriate than “vector semantics”. Semantics remains, for me, the relationship between language and a database model of reality.