Logic
2
11module logic
In this chapter we will develop the formal language of first order logic upon which we will build mathematics. Although the objects of our study will be sets it is first order logic which allows us to make statements about those sets.
First order logic is itself an extension of a simpler system called propositional logic. Propositional logic encompasses all the logical statements that do not involve the quantifiers and . That includes and, or, not, and implications.
Propositional Logic 2.1
Formal Language
When defining propositional logic we will eventually want to be able to express all of and, or, not, and implies. One possibility is to make all four operators part of our formal language. However, it turns out that and, or, and not can all be defined in terms of only implication and a constant falsum, which we will denote as . The advantage of doing things this way is that later we will need fewer axioms because we will have fewer foundational terms we need to define the meaning of. The disadvantage is that to define and and or in terms of implication we will have to assume the law of the excluded middle or equivalently the law of double negation. This means that we will can only do classical logic. However, supporting intuitionist logic is not a goal of this development. So we will take the simpler formal language.
The first piece of formal language syntax we want to define is the implication
operator. Below is the Watson command to define the operator. Watson calls
mathematical statements sentences and so defines the non-terminal that
represents mathematical statements sentence. The syntax
non-terminal ::= replacement for defining productions in a context free
grammar is actually standard and is called Backus–Naur form. The (500)
is a Watson specific extension and allows defining the precedence of different
pieces of syntax. This is not mathematically meaningful but simply helps with
writing notation. For example, we want ¬p → q to mean (¬p) → q and not
¬(p → q).
37syntax implication
38 sentence (250) ::= sentence " → " sentence
39end
The next piece of syntax we will define is the constant symbol falsum. This represents a statement that is always false. The utility of this will become apparent shortly.
45syntax falsum
46 sentence ::= "⊥"
47end
This is the entire formal language of propositional logic. It is remarkably small yet it is capable of expressing all quantifier free logical statements.
Other Logical Operators
Although only implication and falsum are part of our formal language, we still need to be able to discuss and, or, and not. To do this we will use another Watson feature: definitions.
Definitions allow us to define new notation along with a replacement rule to explain what the definition means. Internally, Watson expands definitions into the underlying formal language before checking proofs. So definitions have no effect on proof checking. Instead they are simply a convenience for writing proofs. In theory, we do not need to employ definitions at all.
Let us make our first definition in order to define the negation operator. How can we express negation using only implication and falsum? If a proposition leads to a contradiction, then we can conclude that is false. We can write leading to a contradiction as since represents a false statement, a.k.a. a contradiction. So since we know that means is false, we can make this our definition of .
In order to explain this definition to Watson there are two steps. First we must define the actual syntax of the notation we want to use.
74notation not
75 sentence (400) ::= "¬" sentence
76end
This allows Watson to correctly parse the notation when we later write it. But
we also need to define what the notation means. We can do this with a
replacement rule. The reason these are separated into two different steps is
that often we use the same notation to refer to multiple different things. For
example, later we will use |x| to mean both numeric absolute value and set
cardinality.
85definition
86 ¬p := p → ⊥
87end
The definition syntax is intuitive. You simply place the expression you want
to define on the left, and the definition on the right. Where we wrote sentence
in the notation command we give a variable name p in the definition command.
We use := instead of = because = is a part of our formal language that we
will later want to define. Instead we use := as a metalogical symbol for
definitions. Note that this means it is not a good idea to make := a symbol
in your formal language.
Next we can define or. Again we must think in terms of the operations we have already defined. One property of or that we know is that if is true and is not true, then is true. In fact, we can use this as our definition, defining as .
102notation or
103 sentence (300>) ::= sentence " ∨ " sentence
104end
106definition
107 p ∨ q := ¬p → q
108end
Notice that the definition is not symmetric. When we prove the properties of or we will initially have to have separate proofs for the left and right sides.
We can define and in terms of or using the standard DeMorgan's Law relationship. That is we define to mean . However, before we can make this definition we need one other definition: parentheses! Don't laugh. Parentheses are a part of mathematical notation too and they need a definition just like everything else.
119notation sentence.parentheses
120 sentence ::= "(" sentence ")"
121end
Parenthesis are somewhat unique though in that inserting parentheses shouldn't effect the meaning of the sentence beyond disambiguation. To track this fact, we are going to use another Watson feature: attributes. We can declare a new attribute like so:
128attribute reducible
129 attribute ::= @kw"reducible"
130end
Then when we make our definition we attach the annotation.
134@[reducible]
135definition
136 (p) := p
137end
How, exactly, this reducible annotation will be used will be explained later.
With this we are able to define and.
143notation and
144 sentence (350>) ::= sentence " ∧ " sentence
145end
147definition
148 p ∧ q := ¬(¬p ∨ ¬q)
149end
There is one final definition we would like to make: . The definition is completely standard however there is one thing to comment on. Notice that the definition of includes two copies each of and . If and themselves contain if and only ifs, then the size of the formal language sentence our notation describes would grow exponentially! Luckily Watson is equipped to handle this and stores the formal language sentences in a form where the size is linear with respect to the notation used.
160notation iff
161 sentence (200) ::= sentence " ↔ " sentence
162end
164definition
165 p ↔ q := (p → q) ∧ (q → p)
166end
Axioms
Having defined our notation we are now ready to give the axioms of propositional logic. Just as we amazingly need only two operators to define the complete syntax of propositional logic, we also need just four axioms to completely describe the properties of those operators. When we introduce quantifiers we will add two more axioms which means we need just six axioms to define all of logic!
The first axiom is the most venerable and most intuitive of the four. It is
the axiom of modus ponens. It says that if and are both true then
is also true. This might seem so obvious as to barely require stating but
remember from the introduction that it is only the axioms that give the
sentences in our formal language meaning. So far all we have told Watson is that
we are allowed to write the literal symbol →. We have not yet told Watson what
this symbol means. The axiom of modus ponens is what gives the implication
symbol its meaning.
185axiom modus_ponens [p q : sentence] : (p → q) (p)
186 |- q
187end
Let's take a moment to understand the axiom command. Immediately following the
keyword axiom is the name of the axiom. When we later write proofs we will
invoke the axiom by referring to this name. Next comes the templates of the axiom.
When we say that and together imply we mean that this
deduction holds for any possible and . It is this idea of the statement
being true for any possible and that the syntax [p q : sentence]
tells Watson. After the templates comes a : to indicate the end of the
templates and the beginning of the hypotheses. Finally, we mark the conclusion
of the axiom with the |- symbol. This is a standard symbol from logic called
a tack or turnstile and can be read as "entails." So we get that the
hypotheses entail the conclusion.
In the axiom the implication has been eliminated from the hypothesis to reach the conclusion. As such the axiom of modus ponens is also often called the axiom of implication elimination. Modus ponens allows us to derive a new fact if we already know an implication. But how will we learn an implication in the first place? This will be accomplished with the second axiom of propositional logic: the axiom of implication introduction or, as we will call it, the deduction axiom.
208axiom deduction [p q : sentence] : (assume p |- q)
209 |- p → q
210end
This axiom has a hypothesis which is quite different from the ones we have seen
before. You will notice that assume p |- q is neither part of our formal
language nor a piece of notation we have defined. In fact it is not a
mathematical statement at all, it is a metamathematical statement. When we
write assume p |- q as a hypothesis, what we are telling Watson is that the
conclusion can be reached if knowing that was true would allow us to prove
that was true. Thus assume p |- q is not a statement that has a truth
value within the mathematical system we are defining. It is not even a statement
in that system! This is a question about the behavior of the proof system itself.
Therefore it is metamathematical.
In Watson we call statements of the form assume p |- q facts to distinguish
them from statements in our formal mathematical system which you will recall we
are calling sentences. Watson will record which facts have been proven and
facts can be hypotheses but a fact cannot be a conclusion of an axiom. To do so
would be a category error. An axiom tells Watson about the definition of truth
in the formal language we are defining. However it is Watson itself that decides
which things are provable. To make a fact the conclusion of an axiom would be to
declare that a certain proof exists, which might not be true. Whether such a
proof exists is a concrete fact about the world, not an abstract truth function
we can define specific cases of.
The final two axioms concern the falsum constant. The third axiom is the axiom of double negation. It says that if is true then is also true. It is the addition of this deduction that makes our logic classical. Because we defined and and or in terms of implication and falsum this axiom is required to prove facts about and and or. So almost no intuitionist reasoning will be possible in our system.
241axiom not_not.elim [p : sentence] : (¬¬p)
242 |- p
243end
The fourth and final axiom is the principle of explosion. It states that if we know to be true, anything at all can be proven.
248axiom explosion [p : sentence] : (⊥)
249 |- p
250end
This axiom may seem surprising but it would actually be implied by any system of reasoning about falsity. If we know that and are both true then we can make the following (right now informal) argument that an arbitrary is true. Since is true is true. But is true so since is true it must be that is true. Hence . As we can see the principle of explosion will be true once we have defined all the normal logical deductions anyway. So, since it turns out that the principle of explosion will help us prove those deductions, it is reasonable to add it as an axiom.
Tactics
With our definitions and axioms in hand, we are now ready to begin proving
theorems. Theorem statements in Watson look identical to axioms except, of
course, that they come with a proof. What is the syntax for these proofs?
The syntax is entirely defined by us! Watson has no built-in proof syntax.
Instead we use the tactic command to define a new type of proof syntax and
then write code in the scripting language Lua that transforms this proof syntax
into a formal deduction that Watson can check. Let's define a simple proof syntax
we can use to prove our first theorems.
The simplest proof step is claiming that a fact is true and providing a
justification for that step. There are multiple ways of justifying a fact so we
will create a non-terminal symbol that represents justifications. We can do this
with the grammar_category command.
277grammar_category justification
Then we will claim that a fact is true by using the follows tactic. Notice
that we provide both the justification for the given fact and the tactic that
will complete the rest of the proof given this new fact.
283tactic tactic.follows
284 -- The name before the colon is the name that can be used in Lua to access
285 -- the sub-tactic. The name after the colon is the non-terminal.
286 tactic ::= @kw"follows" goal:@fact just:justification ";" next:tactic
287end
One form of justification for a fact is applying a theorem. We will call this
justification by. To use a theorem we need to fill in any template parameters
the theorem has. We will create a new non-terminal for the templates.
293grammar_category templates
295tactic justification.by
296 justification ::= @kw"by" thm:@name tmps:templates
297end
And then we will fill in the syntax for a list of template instantiations.
301grammar_category template
303tactic templates.some
304 templates ::= tmp:template rest:templates
305end
307tactic templates.none
308 templates ::=
309end
311tactic template
312 template ::= "[" frag:@any_fragment "]"
313end
The pattern of a some and none variant is a common way of expressing a list
within a context free grammar.
Another possible justification is using a whole subproof to prove a given fact.
We will call this justification as which reminds us of a proof remark like
"the fact follows as this subproof demonstrates."
322tactic justification.tactic
323 justification ::= @kw"as" tactic:tactic
324end
Another useful tactic will be todo. This can be used when we are in the middle
of writing our proofs. It will prove anything we want but Watson will mark our
proof as unfinished so we know to return and finish it later. This is similar to
sorry in Lean.
331tactic tactic.todo
332 tactic ::= @kw"todo"
333end
The final tactic we will need is the simplest tactic of all: do nothing. This is useful if the goal of our proof has already been proven and we don't need to do anything more.
339tactic tactic.empty
340 tactic ::=
341end
And and Or
We will start by proving the basic facts about or. For or there are two introduction rules. Left and right introduction. That is, if we know is true then we know is true. And also if we know is true then is true. As we mentioned before, the definition of or is asymmetric so the proofs of these two facts are different.
351theorem or.intro_right [p q : sentence] : (q)
352 |- p ∨ q
353proof
354 -- p ∨ q means ¬p → q by definition. so we want to prove an implication.
355 -- we will start by assuming that ¬p is true.
356 follows assume ¬p |- q as
357 -- however q is a hypothesis so there is nothing to be proved
358 ;
359
360 -- we have assume ¬p |- q but we want ¬p → q so we will use the deduction
361 -- theorem to convert between these
362 follows ¬p → q by deduction [¬p] [q];
363qed
This proof features a very common style of deduction: we assume and deduce . Then we apply the deduction theorem to conclude . Let's add a new tactic that automatically performs this reasoning.
369tactic tactic.assume_then
370 tactic ::= @kw"assume" assumption:@fragment(sentence) @kw"then"
371 conclusion:@fragment(sentence) just:justification ";" next:tactic
372end
TODO: explanation
375tactic tactic.assume
376 tactic ::= kw:@kw"assume" assumption:@fragment(sentence) ";" next:tactic
377end
We can use this new tactic to prove the other or introduction rule.
381theorem or.intro_left [p q : sentence] : (p)
382 |- p ∨ q
383proof
384 -- since p ∨ q is really ¬p → q it suffices to show ¬p → q.
385 assume ¬p then q as
386 -- we have both p and ¬p which is a contradiction. recall that ¬p means
387 -- p → ⊥ so we can derive ⊥ and then use explosion to get q.
388 follows ⊥ by modus_ponens [p] [⊥];
389 follows q by explosion [q];
390 ;
391qed
In this proof we see two more common proof patterns that we will make into
tactics. If we can derive then our proof is always done because we can
use the explosion theorem. This represents the normal proof idea of having
reached a contradiction. The contradiction tactic will automatically search
the facts we have proven to see if we have both a fact and its negation. If it
finds such a contradiction it will complete the proof. Otherwise it will produce
an error.
401tactic tactic.contradiction
402 tactic ::= @kw"contradiction" just:justification
403end
The other proof pattern we observed came at the start. We saw that
was defined as so it suffices to show this second fact. We wrote
a comment to explain this but it is better to have Watson actually check the
equivalence for us. We will introduce a new tactic suffices that changes the
goal to a new goal using the given justification to show that the new goal is
sufficient to prove the old goal.
412tactic tactic.suffices
413 tactic ::= @kw"suffices" goal:@fragment(sentence) just:justification ";"
414 next:tactic
415end
If the two goals are definitionally equal, as they were in or.intro_left, no
justification is needed. Let's add a new, empty, justification for when there
isn't actually anything to prove.
421tactic justification.empty
422 justification ::=
423end
Armed with these new tactics we can prove the elimination rule for or. It says that if we know is true and that both and imply then is also true. This is a natural inference rule and we could have chosen this as an axiom instead of double negation elimination.
430theorem or.elim [p q r : sentence] : (p ∨ q) (p → r) (q → r)
431 |- r
432proof
433 -- assuming ¬r will lead to a contradiction. thus we will have ¬¬r which
434 -- we can convert to r using the not_not.elim axiom.
435 assume ¬r then ⊥ as
436 -- p gives us r which is a contradiction. so we have ¬p
437 assume p then ⊥ as
438 follows r by modus_ponens [p] [r];
439 contradiction
440 ;
441 -- but since p ∨ q we must then have q
442 follows q by modus_ponens [¬p] [q];
443 -- which also gives us r
444 follows r by modus_ponens [q] [r];
445 -- so we have a contradiction still
446 contradiction
447 ;
448 follows r by not_not.elim [r];
449qed
Once again there is a common proof pattern here we can make into a tactic. We
assume a fact and then derive a contradiction from it. This allows us to
conclude the negation of that fact. We will call the tactic assume for_contra.
455tactic tactic.assume_contra
456 tactic ::= @kw"assume" assumption:@fragment(sentence) @kw"for_contra" ";"
457 next:tactic
458end
Also, and this is an entirely aesthetic concern, writing follows everywhere is
boring. Let's introduce thus and 'hence' which do the same thing as follows but makes the
proof easier to read when one fact follows directly from what was previously shown.
464tactic tactic.thus
465 tactic ::= @kw"thus" goal:@fact just:justification ";" next:tactic
466end
468tactic tactic.hence
469 tactic ::= @kw"hence" goal:@fact just:justification ";" next:tactic
470end
We are now ready to prove that or is commutative.
474theorem or.comm [p q : sentence] : (p ∨ q)
475 |- q ∨ p
476proof
477 assume p then q ∨ p by or.intro_right [q] [p];
478 assume q then q ∨ p by or.intro_left [q] [p];
479 thus q ∨ p by or.elim [p] [q] [q ∨ p];
480qed
If we know and that is false then we must have . Once again, since the definition of or is asymmetric the two proofs are not the same.
485theorem or.not_left [p q : sentence] : (p ∨ q) (¬p)
486 |- q
487proof
488 -- this theorem is how we defined or! p ∨ q is by definition ¬p → q
489 follows ¬p → q;
490 thus q by modus_ponens [¬p] [q];
491qed
Just like with suffices it is helpful to have Watson check if our goal is
definitionally equal, let's add the have tactic when we are claiming a fact
we already have is definitionally equal to a new form.
497tactic tactic.have
498 tactic ::= @kw"have" f1:@fragment(sentence) @kw"from" f2:@fragment(sentence)
499 ";" next:tactic
500end
We can apply or.not_left to prove or.not_right.
504theorem or.not_right [p q : sentence] : (p ∨ q) (¬q)
505 |- p
506proof
507 assume ¬p for_contra;
508 thus q by or.not_left [p] [q];
509 contradiction
510qed
Next let's move on to proving properties about and. Whereas or has two introduction rules and one elimination rule, and has one introduction rule and two elimination rules.
516theorem and.intro [p q : sentence] : (p) (q)
517 |- p ∧ q
518proof
519 -- we can expand the definition of p ∧ q
520 suffices ¬(¬p ∨ ¬q);
521 assume ¬p ∨ ¬q for_contra;
522 -- but either of these possibilities leads to a contradiction
523 assume ¬p then ⊥ as contradiction;
524 assume ¬q then ⊥ as contradiction;
525 thus ⊥ by or.elim [¬p] [¬q] [⊥];
526qed
This time the definition of and is symmetrical so the proofs of the two elimination rules are the same.
531theorem and.elim_left [p q : sentence] : (p ∧ q)
532 |- p
533proof
534 assume ¬p for_contra;
535 thus ¬p ∨ ¬q by or.intro_left [¬p] [¬q];
536 have ¬(¬p ∨ ¬q) from p ∧ q;
537 contradiction
538qed
540theorem and.elim_right [p q : sentence] : (p ∧ q)
541 |- q
542proof
543 assume ¬q for_contra;
544 thus ¬p ∨ ¬q by or.intro_right [¬p] [¬q];
545 have ¬(¬p ∨ ¬q) from p ∧ q;
546 contradiction
547qed
Then the proof of and commutativity is very simple.
551theorem and.comm [p q : sentence] : (p ∧ q)
552 |- q ∧ p
553proof
554 follows p by and.elim_left [p] [q];
555 follows q by and.elim_right [p] [q];
556 thus q ∧ p by and.intro [q] [p];
557qed
If And Only If
Since if and only if is just a special case of and it has the same introduction and elimination rules. We just refer to the theorems we proved for and.
564theorem iff.intro [p q : sentence] : (p → q) (q → p)
565 |- p ↔ q
566proof
567 suffices (p → q) ∧ (q → p);
568 follows (p → q) ∧ (q → p) by and.intro [p → q] [q → p];
569qed
571theorem iff.elim_fwd [p q : sentence] : (p ↔ q)
572 |- p → q
573proof
574 follows p → q by and.elim_left [p → q] [q → p];
575qed
577theorem iff.elim_rev [p q : sentence] : (p ↔ q)
578 |- q → p
579proof
580 follows q → p by and.elim_right [p → q] [q → p];
581qed
And we can also prove a few more standard properties.
585theorem iff.mp [p q : sentence] : (p ↔ q) (p)
586 |- q
587proof
588 follows p → q by iff.elim_fwd [p] [q];
589 thus q by modus_ponens [p] [q];
590qed
592theorem iff.mpr [p q : sentence] : (p ↔ q) (q)
593 |- p
594proof
595 follows q → p by iff.elim_rev [p] [q];
596 thus p by modus_ponens [q] [p];
597qed
If and only if is an equivalence relation.
601theorem iff.refl [p : sentence] :
602 |- p ↔ p
603proof
604 assume p then p;
605 thus p ↔ p by iff.intro [p] [p];
606qed
608theorem iff.symm [p q : sentence] : (p ↔ q)
609 |- q ↔ p
610proof
611 follows p → q by iff.elim_fwd [p] [q];
612 follows q → p by iff.elim_rev [p] [q];
613 thus q ↔ p by iff.intro [q] [p];
614qed
616theorem iff.trans [p q r : sentence] : (p ↔ q) (q ↔ r)
617 |- p ↔ r
618proof
619 assume p then r as
620 thus q by iff.mp [p] [q];
621 thus r by iff.mp [q] [r];
622 ;
623 assume r then p as
624 thus q by iff.mpr [q] [r];
625 thus p by iff.mpr [p] [q];
626 ;
627 thus p ↔ r by iff.intro [p] [r];
628qed
Sometimes if we say that and are equivalent. We can make this notion rigorous in the following sense: for any property that holds of , that same property also holds of (and vice versa). We can show that this equivalence holds by induction over all possible formal sentences containing .
637theorem iff.equiv.imp1 [p q r : sentence] : (p ↔ q) (p → r)
638 |- q → r
639proof
640 assume q then r as
641 follows p by iff.mpr [p] [q];
642 thus r by modus_ponens [p] [r];
643 ;
644qed
646theorem iff.equiv.imp2 [p q r : sentence] : (p ↔ q) (r → p)
647 |- r → q
648proof
649 assume r then q as
650 follows p by modus_ponens [r] [p];
651 thus q by iff.mp [p] [q];
652 ;
653qed
655theorem iff.equiv.falsum [p q : sentence] : (p ↔ q) (⊥)
656 |- ⊥
657proof
658 -- not much to prove...
659qed
With these three proofs we can see that for any property , if then we will also have . To complete the proof we would have to perform induction over all possible properties . However such an induction would have to take place outside of the formal system. In any specific case we could prove it as a theorem, but the general case is a metatheorem. Watson is not capable of proving such metatheorems, so we will have to add it as an axiom. But we should remember that our reasons for adding this axiom are different than our reasons for the other axioms.
To write the axiom we will need some way of denoting a property of .
672notation sentence.with_sentence
673 sentence (1000) ::= @name "(" sentence ")"
674end
Then we can state the axiom, which we will call iff.rw for if and only if
rewrite.
679axiom iff.rw [p q : sentence] [m(_) : sentence(sentence)] : (p ↔ q) (m(p))
680 |- m(q)
681end
First Order Logic 2.2
The logical connectives we have developed thus far allows us to discuss the
relationships between various propositions. However, if you examine the formal
language we have thus far defined, you will realize that the only sentence
that does not contain another sentence within it is . This means that
every sentence we can discuss is built up from logical combinations of .
This means that every formal sentence is either trivially false or trivially
true. Obviously, this doesn't make for a very interesting mathematical system!
Terms and Equality
When we do mathematics, what form do our most basic assertions take? That is, if you have some sentence with no logical connectives in it, what will it look like? We claim, rather boldly, that every single assertion in traditional mathematics that is not trivially true or false is a claim about the relationship between two or more objects. (If you do not believe this, it may help to consider the definition of the assertion you are thinking of. For example, the statement " is even" may at first appear to be about just one object, until you remember that the definition of "is even" is "there is some such that ".)
Thus, in order to do mathematics, we will need both objects and at least one
relationship between them. We will call an object of study in our mathematical
system a "term." (This name comes from the fact that all mathematical
statements, if examined deeper and deeper, will eventually terminate at an
object). We will inform Watson of our desire for a new category of syntactical
object using the syntax_category command.
713syntax_category term
We will want parentheses for terms as well.
717notation term.parentheses
718 term ::= "(" term ")"
719end
721@[reducible]
722definition
723 (a:term) := a
724end
Next, we will need some relationship between terms. All but the most exotic of mathematical systems contain a notion of equality.
729syntax equality
730 sentence (500) ::= term " = " term
731end
We can also define notation for not equals.
735notation inequality
736 sentence (500) ::= term " ≠ " term
737end
739@[reducible]
740definition
741 a ≠ b := ¬a = b
742end
We will need just two axioms to describe the behavior of equality. Once again, these can be thought of as introduction and elimination rules. The first axiom, the introduction rule, asserts that any term is equal to itself. In other words, equality is reflexive.
749axiom eq.refl [a : term] :
750 |- a = a
751end
The second axiom, the elimination rule, says that, if two objects are equal, then they should have all the same properties. Precisely, we mean that if then for any sentence which is true, if we replace all occurrences of with , the sentence remains true. In order to describe this rule to Watson, we will need notation for a sentence which contains a term.
759notation sentence.with_term
760 sentence (1000) ::= @name "(" term ")"
761end
And now we can state the axiom.
765axiom eq.subst [p(_) : sentence(term)] [a b : term] : (a = b) (p(a))
766 |- p(b)
767end
From these two axioms, and with judicious choice of the substitution property, we can derive that equality is an equivalence relation.
772theorem eq.symm [a b : term] : (a = b)
773 |- b = a
774proof
775 follows a = a by eq.refl [a];
776 thus b = a by eq.subst [_ = a] [a] [b];
777qed
779theorem eq.trans [a b c : term] : (a = b) (b = c)
780 |- a = c
781proof
782 follows a = c by eq.subst [a = _] [b] [c];
783qed
Quantifiers
You may have noticed that, although we now have equality as a nontrivial sentence, we in fact cannot use it because we have no terms! Right now, our formal language contains no terms whatsoever, so we could never actually write an equality. To fix this, we will now introduce quantifiers, statements of the form or . The bound variables in the quantifiers will be our first terms.
Similarly to how we only needed to define one of and and then could define the remaining one in terms of the other, we only need to add one of and to our formal language. (In fact, this connection is no coincidence. Consider applying DeMorgan's laws to a statement with an infinite number of s or s).
For reasons that will become apparent in a moment, we will choose to
be in our formal language. The @binding(term) syntax tells Watson to add a new
variable name which holds a term when evaluating the sentence. As discussed
in the introduction, this notion is not context free which is why it necessitates
a special case within Watson.
806syntax exists
807 sentence (000) ::= "∃" @binding(term) ", " sentence
808end
And we can now define in terms of . The explanation of which
variables are bound goes on the notation command not the definition command
because Watson needs to know which variables exist in order to parse statements
before it resolves definitions. Unlike with the syntax commands, bindings in
a notation command can be present only for certain children. This will be
important later for notation like where should be
available on the right but doesn't make sense when evaluating .
818notation forall
819 sentence (000) ::= "∀" x:@binding(term) ", " sentence(x)
820end
822definition
823 ∀x, p(x) := ¬∃x, ¬p(x)
824end
We will also introduce notation for the “at most one” and “unique” existential quantifiers.
829notation exists.at_most_one
830 sentence (000) ::= "∃*" x:@binding(term) ", " sentence(x)
831end
833@[reducible]
834definition
835 ∃*x, p(x) := ∀y, ∀z, p(y) ∧ p(z) → y = z
836end
838notation exists.unique
839 sentence (000) ::= "∃!" x:@binding(term) ", " sentence(x)
840end
842@[reducible]
843definition
844 ∃!x, p(x) := (∃x, p(x)) ∧ (∃*x, p(x))
845end
To make sense of quantifiers we will need two axioms. Did you guess that they are once again an introduction rule and an elimination rule? The introduction rule for is simple: if we have a particular object with a property, then we certainly know that there exists an object with that property.
852axiom exists.intro [p(_) : sentence(term)] [a : term] : (p(a))
853 |- ∃x, p(x)
854end
The elimination rule is significantly more complex. In traditional mathematics, elimination works like this: given we can obtain a new variable such that . The variable must be "fresh" in the sense that we haven't used it before. We call the witness to the existential statement. Watson has no way of expressing that a variable is fresh and so we cannot use this system. The reason Watson doesn't support freshness is twofold: On the one hand, as we will soon see, it is unnecessary. But more generally, it makes the system significantly more complicated as sentences can no longer be understood entirely on their own but must now carry with them the context of which variables were "fresh" when they were created.
How can we do without freshness? To understand the problem let's try a simple solution first. We can add a new term to our constant language to serve as the witness. If we imagine the witness is called then we might use the following syntax declaration and axiom.
syntax witness
term ::= "ε"
end
axiom exists.elim [p(_) : sentence(term)] : (∃x, p(x))
|- p(ε)
end
What would be the problem with this? The issue is that we are using the same witness for every existential. For example, it is the case both that and . Then using our axiom we would get that and but obviously no such could exist. This is exactly why the witness is required to be fresh.
Since the issue arises from using the same witness for different existentials, we might imagine using the following alternate idea.
syntax witness
term ::= "ε" @binding(term) ", " sentence
end
axiom exists.elim [p(_) : sentence(term)] : (∃x, p(x))
|- p(εx, p(x))
end
Returning to our even-odd example, this time we would get that and which is not contradictory as and are two different terms. This system does work. In fact, it is called Hilbert's Epsilon Operator and has certain uses in formal logic and set theory.
Why then are we not using the epsilon operator? The issue is that the epsilon operator actually implies the Axiom of Choice! To see why, consider a collection of non-empty sets . Then construct the map defined as . We can prove that such a exists (every is non-empty) and so this is a valid choice function. We do not want our logical system to imply the Axiom of Choice so we will have to make some changes.
The trouble with the epsilon operator is that it allowed us to make an infinite number of parallel choices. Extracting a witness from an existential should behave less like a function and more like a reasoning step. To try and enforce this we might imagine using the same witness term as above but with the following new axiom:
axiom exists.elim [p(_) : sentence(term)] [q : sentence]
: (∃x, p(x)) (p(εx, p(x)) → q)
|- q
end
The idea behind this axiom is that we don't actually get to extract a witness for any proposition, we just get to imagine we could do so for one particular proposition. However, this axiom is no different from the previous axiom. By simply setting to be we extract the previous, bad, axiom.
The issue is that the witness itself appears in the conclusion we are drawing. However we can use a trick to prevent this: we include the conclusion we want in the witness term. This brings us to the final existential elimination axiom we will actually be using.
942syntax witness
943 term (000) ::= "∃ᵉ" @binding(term) ", " sentence " ⊢ " sentence
944end
946axiom exists.elim [p(_) : sentence(term)] [q : sentence]
947 : (∃x, p(x)) (p(∃ᵉx, p(x) ⊢ q) → q)
948 |- q
949end
We may read the syntax as "the witness for the existential which will be used to prove ." The fact that is embedded into the witness prevents us from proving the axiom of choice. The reason is that if we attempt to define a proof function in the same way as before, we will no longer be able to apply the existential elimination axiom to all the witnesses simultaneously. The goal in the existential elimination cannot name the witness. So we must finish the proof as the next step after applying the axiom and thus won't be able to quantify over all witnesses.
Using these two axioms we can prove the introduction and elimination rules for as theorems. The elimination rule is the dual of the introduction rule:
965theorem forall.elim [p(_) : sentence(term)] [a : term] : (∀x, p(x))
966 |- p(a)
967proof
968 -- we rewrite the forall as an existential
969 have ¬∃x, ¬p(x) from ∀x, p(x);
970
971 -- and then if ¬p(a) we have a counterexample to the above
972 assume ¬p(a) for_contra;
973 thus ∃x, ¬p(x) by exists.intro [¬p(_)] [a];
974
975 -- which leads to a contradiction
976 contradiction
977qed
The introduction rule is dual to the elimination rule. Traditionally, we would pick a fresh variable to prove the property of and then deduce that the property must hold for all variables. Here, our existential witnesses serve as the arbitrary variable.
984theorem forall.intro [p(_) : sentence(term)] : (p(∃ᵉx, ¬p(x) ⊢ ⊥))
985 |- ∀x, p(x)
986proof
987 assume ∃x, ¬p(x) for_contra;
988
989 -- from the existential we extract the witness. the specific x such that
990 -- ¬p(x). we had p(x) as our hypothesis, chosen to create a contradiction
991 -- here. of course, the point is it shouldn't matter what value we had as
992 -- the witness in our hypothesis; all should be equally easy to prove.
993 assume ¬p(∃ᵉx, ¬p(x) ⊢ ⊥) then ⊥ as contradiction;
994
995 -- then we extract the conclusion from the existential
996 thus ⊥ by exists.elim [¬p(_)] [⊥];
997qed
As you can see, using the witnesses can be quite cumbersome. We can abstract these four rules into the following tactics:
The existential introduction rule becomes:
1003tactic justification.using
1004 justification ::= kw:@kw"using" use:@fragment(term)
1005end
The existential elimination rule becomes:
1008grammar_category obtain_vars
1010tactic tactic.obtain
1011 tactic ::= @kw"obtain" list:obtain_vars just:justification ";" next:tactic
1012end
1014tactic obtain_vars.last
1015 obtain_vars ::= var:@name @kw"st" st:@fragment(sentence)
1016end
1018tactic obtain_vars.more
1019 obtain_vars ::= var:@name @kw"and" rest:obtain_vars
1020end
1022tactic obtain_vars.more_st
1023 obtain_vars ::= var:@name @kw"st" st:@fragment(sentence) @kw"and" rest:obtain_vars
1024end
This is the forall introduction rule:
1027tactic tactic.arbitrary
1028 tactic ::= kw:@kw"arbitrary" var:@name ";" next:tactic
1029end
1031tactic tactic.arbitrary_st
1032 tactic ::= kw:@kw"arbitrary" var:@name @kw"st" st:@fragment(sentence) ";" next:tactic
1033end
And here is the forall elimination rule:
1036tactic justification.specializing
1037 justification ::= @kw"specializing" forall:@fragment(sentence)
1038end
We can see these all play nicely together in the following classic theorem (which will be the first theorem using a predicate on two terms).
1043notation sentence.with_term_term
1044 sentence (1000) ::= @name "(" term ", " term ")"
1045end
1047theorem exists_forall.swap [p(_0, _1) : sentence(term, term)] : (∃x, ∀y, p(x, y))
1048 |- ∀y, ∃x, p(x, y)
1049proof
1050 obtain x' st ∀y, p(x', y);
1051 arbitrary y';
1052 follows p(x', y') specializing ∀y, p(x', y);
1053 thus ∃x, p(x, y') using x';
1054qed
With our development of quantifiers we have completed the traditional presentation of first order logic. In fact, it is called first order logic because the quantifiers quantifier over just one sort of object, terms. Everything we have done so far makes sense in almost any mathematical system. The next step on our journey is to introduce sentences and terms specific to our context, in this case set theory. However, before we do this, there is one extension we can make to our system that will make our lives easier.
The Iota Operator 2.3
Using the tools of Watson it is easy to define complex sentences that take other terms or sentences as arguments. This is because, in some sense, a sentence is self contained. However, terms do not behave this way. Thus far, the only terms we have defined come on the inside of quantifiers. This makes it impossible to define self contained terms.
definition my_term
foo := what could go here? we aren't in a quantifier
end
This situation could be resolved if our specific mathematical language contained some primitive, self-contained terms for us to use. However, set theory is usually presented without any such self contained terms. Of course, mathematicians use self contained terms all the time. What do they mean by this? Take the empty set for example. Traditionally, it is understood that if you see a formula containing you are supposed to expand it using the following three rules:
This is a significantly more complex notion of definition than the find and replace system that Watson supports. These notions could be instead be added as axioms, which is what the Metamath proof system does.
However there is a different way of thinking about term definitions we can use instead. A term is well defined if we know that it exists and we know that it is unique. We can introduce a new term into our formula language that represents the unique term with a given property. Following the tradition of the Principia Mathematica, this is called the iota operator and is denoted with an inverted iota.
1103syntax iota
1104 term (000) ::= "℩" @binding(term) ", " sentence
1105end
This term only has meaning if there is actually a unique object with the given property. Otherwise, it can be thought of as an arbitrary term.
1110axiom iota.unique [p(_) : sentence(term)] : (∃!x, p(x))
1111 |- p(℩x, p(x))
1112end
In the future we will make definitions in terms of this iota operator. One advantage of the iota operator over other systems is that it makes sense for any first order logical system, regardless of the particular predicates or terms that are present. Additionally it is a conservative extension meaning that every sentence containing an iota operator could be converted to one without and all sentences which correspond in this way will have the same truth value. This means that the iota operator is just a notation convenience and doesn't fundamentally change our system.
With all the pieces in place, we are ready to begin set theory.