For a couple of years now Michael Douglas and I have been contemplating a meaning representation — simple sentence graphs — based on breaking sentences down into simple sentence components. Over the last week or so we have been playing with chain-of-thought prompting to get GPT-3 to do semantic parsing, generating simple sentence graphs. An example with just a single verb involves the phrase “being jealous”.

We are using a chain-of-thought prompt consisting of seven examples of simple sentence parses. The training parses do not contain the word “jealous”. We follow the training parses with the input sentence and a request for a simple sentence parse of the phrase headed by “being jealous”. GPT-3 then generates a semantic parse consisting of a generic “lexical entry” together with an “episodic instantiation”. The lexical entry consists of a simple sentence specifying roles as parenthesized generic definite noun phrases, plus other sentences that are likely to be true of instances of the entry. The episodic instantiation specifies the particular entities and events filling the roles of the lexical entry.
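For concreteness, here is a minimal sketch (in Python) of how such a prompt can be assembled. The function name, the list variable, and the exact request-line wording are illustrative assumptions; the actual prompt text is the seven training parses given at the end of this post.

```python
# Sketch of the chain-of-thought prompt layout described above.
# TRAINING_PARSES would hold the seven training parses given at the
# end of this post, each as a plain string in the format shown there.
TRAINING_PARSES: list[str] = []  # fill in the seven parses

def build_prompt(sentence: str, target: str | None = None) -> str:
    """Concatenate the training parses, the input sentence, and
    (optionally) a request for a lexical entry for a target phrase.
    Omitting the request line lets GPT-3 decide for itself which
    lexical entries to retrieve."""
    parts = TRAINING_PARSES + [f"INPUT: {sentence}"]
    if target is not None:
        parts.append(f"Retrieving lexical entry for ({target}).")
    return "\n".join(parts)
```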
One can also ask GPT-3 to decide for itself which lexical entries to retrieve. For example, when given
INPUT: Russian lawmakers Monday ratified the illegal annexation of four partially occupied Ukrainian regions.
GPT-3 generates
Retrieving lexical entries for (ratifying), (annexing), and (occupying).
(ratifying) is when (the ratifier) ratifies (the treaty);
(the treaty) is a legal document that has been signed by (the ratifier) and (the other party);
(the treaty) creates an agreement between (the ratifier) and (the other party).
(annexing) is when (the annexer) annexes (the territory);
(the territory) is under the control of (the annexer).
(occupying) is when (the occupier) occupies (the territory);
(the territory) is under the control of (the occupier).
Episodic Instantiation:
(the ratifying):
(the ratifier) is Russian lawmakers
(the treaty) is the annexation of four partially occupied Ukrainian regions
(the other party) is unknown
(the annexing):
(the annexer) is Russian lawmakers
(the territory) is four partially occupied Ukrainian regions
(the occupying):
(the occupier) is Russian lawmakers
(the territory) is four partially occupied Ukrainian regions
And when given
INPUT: Rex Tillerson testified that he never asked Tom Barrack to conduct diplomacy.
GPT-3 generates
Retrieving lexical entries for (testifying), (asking to do), (conducting diplomacy).
(testifying) is when (the witness) testifies (the information) to (the court); before (the testimony), (the witness) should know (the information); before (the testimony), (the witness) is under oath; (the testimony) causes (the court) to know (the information).
(asking to do) is when (the asker) asks (the hearer) to do (the action); before (the asking), (the asker) wants (the hearer) to do (the action); (the asking) causes (the hearer) to try to do (the action).
(conducting diplomacy) is when (the diplomat) conducts diplomacy with (the other party); before (the diplomacy), (the diplomat) wants (the diplomacy) to happen; during (the diplomacy), (the diplomat) tries to resolve (the issue) with (the other party).
Episodic Instantiation:
(the testimony): (the witness) is Rex Tillerson; (the information) is that he never asked Tom Barrack to conduct diplomacy; (the court) is unknown
(the asking): (the asker) is Rex Tillerson; (the hearer) is Tom Barrack; (the action) is to conduct diplomacy.
(the diplomacy): (the diplomat) is Tom Barrack; (the other party) is unknown; (the issue) is unknown
A simple sentence parse can be viewed as a graph structure (a simple sentence graph) where episodic instantiations connect (instantiate) semantic lexical entries with particular entities and events. A given entity can occur in various different instantiations.
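As a rough illustration of reading GPT-3’s output into such a graph, here is a small Python sketch; the function and its dictionary representation are my own choices here, not part of our pipeline. It handles instantiation blocks formatted as in the annexation example above.

```python
from collections import defaultdict

def parse_instantiation(block: str) -> dict[str, dict[str, str]]:
    """Read an "Episodic Instantiation" block, formatted as in the
    annexation example above, into event -> {role -> filler} maps."""
    graph: dict[str, dict[str, str]] = defaultdict(dict)
    event = None
    for line in block.splitlines():
        line = line.strip()
        if line.startswith("(") and line.endswith(":"):
            event = line[:-1]                     # e.g. "(the ratifying):"
        elif event is not None and " is " in line:
            role, filler = line.split(" is ", 1)  # e.g. "(the ratifier) is ..."
            graph[event][role.strip()] = filler.strip()
    return graph

# In the annexation example, the string "Russian lawmakers" fills
# (the ratifier), (the annexer), and (the occupier): a single entity
# node shared by three instantiations.
```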
An advantage of simple sentence parsing (SSP) is that it is largely devoid of formal linguistic theory, yet it seems to provide compelling semantic parses. At the end of this post we give the seven training parses appearing in the chain-of-thought text prepended to the above inputs.
I will consider three questions raised by SSP. First, do these semantic parses contribute to the current debate between Yann LeCun and Gary Marcus on Nativism vs. Empiricism for symbolic processing? Second, does the fact that GPT-3 can do this suggest that GPT-3 is already doing some form of semantic parsing? Does SSP with GPT-3 in some way open GPT-3’s black box? Finally, does SSP suggest new memory-based transformer architectures?
Nativism vs. Empiricism. The issue here is whether we have to “bake in” some mechanism explicitly designed to handle symbolic reasoning. Does progress in AI require, as Marcus believes, something like the innate language faculty assumed by many linguists? Or, as LeCun believes, can symbolic reasoning emerge from deep learning alone, without any prior bias? GPT-3’s ability to do semantic parsing would seem to support LeCun’s position. However, GPT-3’s ability to do SSP, while good, is not perfect. Furthermore, GPT-3’s training seems extremely data-inefficient, and baking in symbolic reasoning might greatly reduce the data requirements. So I think the question remains open. My intuition is to agree with Marcus. However, I expect that the “language faculty” will turn out to be a deep learning architecture inspired by symbolic reasoning — a neuro-symbolic compromise between nativism and empiricism.
Does SSP open the black box of GPT-3? It is very tempting to view the above semantic parses as a kind of demonstration of GPT-3’s understanding of words and of how that understanding is applied to sentences. But this could easily be an illusion — an overly anthropomorphic view of GPT-3. On the other hand, it seems intuitively plausible that factoring knowledge into semantic knowledge and episodic instantiations would reduce the number of parameters needed to store it, since the semantic knowledge can be shared across many episodic instances. It would be somewhat surprising if GPT-3 were not doing some form of semantic-episodic factoring. Lexical semantics could be nothing more than word (or token) embeddings: GPT-3’s token embeddings have over 12 thousand dimensions — enough to store fairly elaborate lexical information.
Might SSP suggest new deep learning architectures? This is of course a difficult question. One idea is to try to design memory architectures around the separation of “semantic” and “episodic” memory. A semantic memory would hold lexical entries and facilitate few-shot learning of new words. The episodic memory would hold episodic instantiations of lexical entries. In both cases the thing remembered is the output of language processing rather than raw input text as in existing memory architectures. My intuition is that the set of memories should be a discrete set (symbols?) but that the contents of the memories should be combinations of vectors and pointers to other memories (defining a graph structure).
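Purely as a speculative sketch of this idea, and not a worked-out architecture, the separation might look as follows in Python; all names and the embedding dimension are invented for illustration.

```python
from dataclasses import dataclass, field

DIM = 512  # embedding dimension; an arbitrary illustrative choice

@dataclass
class LexicalEntry:
    """A cell of semantic memory: one generic lexical entry."""
    head: str                # e.g. "ratifying"
    roles: list[str]         # e.g. ["the ratifier", "the treaty"]
    embedding: list[float] = field(default_factory=lambda: [0.0] * DIM)

@dataclass
class Episode:
    """A cell of episodic memory: an instantiation of a lexical entry."""
    entry: LexicalEntry         # pointer into semantic memory
    fillers: dict[str, object]  # role -> entity string or another Episode

semantic_memory: dict[str, LexicalEntry] = {}  # a discrete, keyed set of cells
episodic_memory: list[Episode] = []            # graph structure via pointers
```

The point of the sketch is only the shape of the data: the memories form a discrete set, while their contents mix vectors (embeddings) with pointers to other memories.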
I will close with two simple examples showing word sense disambiguation.
When given
INPUT: Mary painted the door red.
Retrieving lexical entry for (painting).
GPT-3 generates
(painting) is when (the painter) paints (the painting surface) with (the paint); before (the painting), (the painting surface) is a different color than (the paint); after (the painting), (the painting surface) is the same color as (the paint); during (the painting), (the painter) is applying (the paint) to (the painting surface).
But when given
INPUT: Mary painted the flowers.
Retrieving lexical entry for (painting).
GPT-3 generates:
(painting) is when (the painter) paints (the painting); before (the painting), (the painter) wants to paint (the painting); (the painting) is a picture of (the thing painted); after (the painting), (the painting) is finished.
The training parses prepended to these examples are given below. Davinci-002 is used with temperature 0, top P 0, frequency penalty 0, presence penalty 0, no stop token, no start text, and no restart text. The training examples can undoubtedly be improved.
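For reference, these settings correspond to the following call under the pre-1.0 `openai` Python client, assuming Davinci-002 refers to the text-davinci-002 completion model and reusing the `build_prompt` sketch from earlier; the `max_tokens` value is my own, since no length limit is stated above.

```python
import openai  # pre-1.0 client interface

response = openai.Completion.create(
    model="text-davinci-002",  # assuming this is the Davinci-002 above
    prompt=build_prompt("Mary painted the door red.", "painting"),
    max_tokens=256,            # illustrative; the post states no limit
    temperature=0,
    top_p=0,
    frequency_penalty=0,
    presence_penalty=0,        # no stop sequence, matching the settings above
)
print(response["choices"][0]["text"])
```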
ENJOY!
======== Training Parses ========
INPUT: Fred ate scrambled eggs.
Retrieving lexical entries for (eating) and (an egg).
(eating) is when (the eater) eats (the food);
before (the eating), (the eater) is hungry;
after (the eating), (the eater) is not hungry;
after (the eating), (the food) is inside (the eater);
after (the eating), (the food) is gone.
(an egg) is a means of reproduction for (the animal);
(the egg) is laid by (the animal);
(the egg) hatches to produce a new (the animal).
Episodic Instantiation:
(the eating):
(the eater) is Fred
(the food) is scrambled eggs
(the egg):
(the animal) is a chicken
INPUT: George Floyd was murdered by the police.
Retrieving lexical entries for (murdering) and (police).
(murdering) is when (the victim) is murdered by (the murderer);
before (the murder), (the victim) is alive;
(the murder) causes (the victim) to be dead;
before (the murder), (the victim) does not want (the murder) to happen;
(the murder) is a crime.
(police) are people with the authority to enforce the law.
Episodic Instantiation:
(the murder):
(the victim) is George Floyd
(the murderer) is the police
(the police) are unknown
INPUT: Putin told the general to attack.
Retrieving lexical entries for (telling to do) and (a general).
(telling to do) is when (the teller) tells (the hearer) to do (the action);
(the teller) has authority over (the hearer);
after (the telling to do), (the hearer) should do (the action).
(a general) is a person who controls an army.
Episodic Instantiation:
(the telling to do):
(the teller) is Putin
(the hearer) is the general
(the action) is to attack
(the general) is unknown.
INPUT: Putin told the general to attack.
Retrieving lexical entry for (attacking).
(attacking) is when (the attacker) attacks (the attacked);
before (the attack), (the attacked) does not want (the attack) to happen;
during (the attack), (the attacker) tries to damage (the attacked);
after (the attack), (the attacked) is angry at (the attacker).
Episodic Instantiation:
(the attack):
(the attacker) is the general
(the attacked) is unknown
INPUT: Mary told John about the party.
Retrieving lexical entries for (telling about) and (a party).
(telling about) is when (the teller) tells (the hearer) about (the thing);
before (the telling about), (the teller) wants (the hearer) to know about (the thing);
(the telling about) causes (the hearer) to know about (the thing).
(a party) is when (the partiers) get together at (the place) and (the time) to have fun;
during (the party), (the partiers) have fun.
Episodic Instantiation:
(the telling about):
(the teller) is Mary
(the hearer) is John
(the thing) is the party
(the party):
(the partiers) are unknown
(the place) is unknown
(the time) is unknown
INPUT: The principal said that the teacher had resigned when they were actually on vacation.
Retrieving lexical entries for (saying), (resigning), and (vacationing).
(saying) is when (the sayer) says (the statement) to (the hearer);
before (the saying), (the sayer) wants (the hearer) to hear (the statement);
(the saying) causes (the hearer) to hear (the statement);
after (the saying), (the hearer) has heard (the statement).
(resigning) is when (the resigner) resigns from (the job);
before (the resignation), (the resigner) has (the job);
before (the resignation), (the resigner) does not want (the job);
(the resignation) causes (the resigner) to not have (the job).
(vacationing) is when (the vacationer) is on vacation from (the job) and in (the location);
before (the vacation), (the vacationer) wants (the vacation);
before (the vacation), (the vacationer) is doing (the job);
during (the vacation), (the vacationer) is not doing (the job);
during (the vacation), (the vacationer) is at (the location);
after (the vacation), (the vacationer) is again doing (the job).
Episodic Instantiation:
(the saying):
(the sayer) is the principal
(the statement) is (the teacher had resigned)
(the statement) is false
(the hearer) is unknown
(the resigning):
(the resigner) is the teacher
(the job) is teaching
(the vacationing):
(the vacationer) is the teacher
(the job) is the same as (the job of (the resignation))
(the location) is unknown
INPUT: Herschel Walker pledges to sue over the report that he paid for an abortion.
Retrieving lexical entries for (pledging), (suing), (reporting), (paying) and (an abortion).
(pledging) is when (the pledger) pledges to do (the action);
(the pledge) is a public promise by (the pledger) to do (the action).
(suing) is when (the plaintiff) sues (the defendant) over (the issue) and is decided by (the judge);
(the plaintiff) wins (the lawsuit) if (the judge) rules in favor of (the plaintiff);
(the defendant) wins (the lawsuit) if (the judge) rules in favor of (the defendant).
(reporting) is when (the reporter) reports (the information) to (the recipient);
before (the report), (the recipient) expects (the report);
before (the report), (the reporter) should make (the report);
(the report) causes (the recipient) to know (the information).
(paying) is when (the payer) pays (the amount) to (the payee) for (the thing);
before (the payment), (the payer) has (the amount);
before (the payment), (the payee) wants (the payment);
(the payment) causes (the payee) to have (the amount).
(an abortion) is when (the woman) has (the abortion);
before (the abortion), (the woman) is pregnant;
before (the abortion), (the woman) wants to not be pregnant;
(the abortion) causes (the woman) to not be pregnant.
Episodic Instantiation:
(the pledge):
(the pledger) is Herschel Walker
(the action) is to sue
(the suit):
(the plaintiff) is Herschel Walker
(the defendant) is unknown
(the issue) is (the report)
(the judge) is unknown
(the report):
(the reporter) is the same as (the defendant of (the suit))
(the information) is that Herschel Walker paid for an abortion
(the payment):
(the payer) is Herschel Walker
(the payee) is unknown
(the amount) is unknown
(the thing) is (the abortion)
(the abortion):
(the woman) is unknown
(the procedure) is unknown
(the pregnancy) is unknown
Nice! Does this bear on the question in your previous post, i.e., is multi-modal data necessary and/or useful for training a model that “understands” language? You could try spatial understanding sentences of the “the pen is in the box” vs “the box is in the pen” ilk. (There are better examples but I’ve forgotten them; let me know if you’d like me to look them up).
When I first read this post I mistakenly thought that the “eat scrambled eggs” example was GPT-3 output, not part of the prompt you wrote. Anyway, this is a nice “naive physics” example. If your GPT-3 SSG parser can reliably demonstrate understanding of both spatial organisation and naive physics, then I’d be more inclined to believe your claim that multi-modal models are not necessary for understanding.
Marcus would probably like you to try some of the famous reversible passive examples “my dog bit the postman” vs “the postman bit my dog”.
But what exactly is the claim that you’re making here? Pre-deep learning I thought that syntactic parsing was an essential intermediate step in language understanding. A syntactic parse makes all the important linguistic relationships local, which is necessary if our models can only work with extremely local relationships involving at most pairs or triples of elements. But deep learning models demonstrate that this explicit locality is not necessary for even very sophisticated language processing (e.g., machine translation), and so it’s probably not essential for language understanding either.
Of course I’m very impressed with chain-of-thought prompting. This does suggest that a certain kind of locality is important for reasoning, but it’s not the kind of locality that a syntactic parse encodes.
My guess is that episodic memory is a necessary element in intelligence and (dare I say) consciousness that all our current models are missing. In fact, I suspect that consciousness requires episodic memory; our sense of self is generated by how we remember our own past thoughts.
Mark, great comments. I will experiment with physical language.
Your SSPs seem to be an interesting way of unlocking the linguistic information present in GPT-3 type models. This might be an alternative to probing.
Emily Bender’s books survey a range of linguistic phenomena, if you want ideas for what to test your models on.
https://www.morganclaypool.com/doi/abs/10.2200/S00493ED1V01Y201303HLT020
https://link.springer.com/book/10.1007/978-3-031-02172-5
Another nice thing about SSP is that it’s a prompt-based technique for extracting linguistic knowledge from GPT-3. It would be interesting to compare this with linear probing, which I’ve always found a bit problematic (after all, a wide single-layer network can compute very complex functions); comparing what you can extract with prompt-based techniques like SSP versus linear probing on the same networks might make an interesting student project.